Information Technology and Libraries | September 2021
Accessibility of Tables in PDF Documents | Fayyaz, Khusro, and Ullah

Microsoft Word has an option, "text alternative," to add a description of a table or figure for visually impaired people, who use screen readers to read the document. Adobe Acrobat also has an accessibility pane to tag tables and add alternative text and descriptions of tables, which the NVDA screen reader uses to read them aloud. Moreover, CommonLook Office, whose motto is "build accessibility into documents early," has add-ins for Microsoft Word and PowerPoint that add enough accessibility content to the documents to make the resulting PDF accessible. However, already-developed unstructured documents, without any accessibility features, still need some measures to make them understandable to visually impaired or blind users. Keeping in mind the statistics on visually impaired people and the unstructured data of the future (the global datasphere will grow from 33 ZB to 175 ZB, and 80% of this worldwide data will be unstructured), visually impaired individuals cannot be ignored in their access to knowledge.68 Therefore, we need mechanisms for making these unstructured documents understandable to as many people as possible by incorporating accessibility measures into document readers. The following section highlights some of the key issues in this domain.

Table 1. Solutions and libraries for table extraction and processing.

| S. No. | Tool | Open source | Image based | Comments |
|--------|------|-------------|-------------|----------|
| 1 | Tabula | Y | N | Extracts data tables from PDF and saves them as CSV or Excel spreadsheets. Works on native PDF files and cannot extract scanned tables. Supports multiple platforms but not batch processing. |
| 2 | PDFTables | N | N | Extracts page, table, table row, and even table cell. A fully automated API. Supports multiple platforms and programming languages. |
| 3 | Docparser | N | Y | Extracts information from images and forms. A cloud-based application that supports batch processing. Parses documents and offers more features but needs human intervention. Shows poor accuracy on handwritten application forms. |
| 4 | PDFTron | N | N | Supports multiple platforms and programming languages. |
| 5 | Camelot | Y | Y | A Python library that extracts tables from images. Has built-in OCR. |
| 6 | Excalibur | Y | Y | A web-based solution powered by Camelot. |
| 7 | PyPDF2 | Y | N | A Python library that can batch-process multiple files. |
| 8 | pdfplumber | Y | Y | A Python library built on PDFMiner. |
| 9 | PDF Table Extractor | Y | N | A web-based tool built on Tabula. Supports scraping of multipage tables and comparison of cell values. |
| 10 | PDFMiner | Y | N | A Python library that extracts information such as location, fonts, and lines of text. Focuses on analyzing text, includes a PDF parser, and figures out the semantic relationships among structured tables. |

Issues and Challenges in the Existing Systems

Tables can be utilized in multiple scenarios, including information extraction, table search, ontology engineering, conversion to DBMS, and document engineering.69 The situation becomes difficult when a blind or visually impaired person needs to understand a table. The issues and challenges in dealing with PDF tables are categorized in the following sections.
Table Structure

Tables in PDF documents need more focus on table structure detection because they do not follow a defined formal structure.70 Several knowledge gaps are identified in the literature regarding table structure, such as the identification of functional areas of tables, for which Silva argued for the use of multiple heuristics and machine learning algorithms in parallel or in sequence.71 The variety of structural layouts creates problems in their identification, which can be handled by defining more rules at the lexical and syntactic layers of table processing. This could also be fruitful for better semantic annotations.72 In addition, the variety of cell content, or inconsistent cell content, along with implicit header cells, creates problems in understanding tables, especially by machines.73 The vector representation of web tables may be applied to PDF tables for semantic annotation and identification of column types.74 Along with that approach, graphical representation and graph neural networks (GNNs) can also be used for better structure identification in multiple domains.75 New data sets need to be introduced for structure recognition in various domains, including business and finance, as they use a huge number of tables in their documents.76 From the discussion above, table structure inconsistencies, cell content inconsistencies, and the functional and logical processing of tables need more research effort to eliminate the stated problems. Along with that, the inclusion of more data sets will help in handling the diversity in the field.
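As a small illustration of the heuristic, rule-based side of table structure recognition, column boundaries can often be recovered by clustering word positions along the x-axis. The sketch below is hypothetical: it assumes word bounding boxes have already been extracted (for example, by one of the tools in Table 1), and real extractors combine many such rules with fuller layout analysis.

```python
# Heuristic column detection: cluster words into columns by their
# left x-coordinate. A purely illustrative sketch, not any specific
# tool's algorithm.

def detect_columns(words, gap=20):
    """words: list of (x_left, text) tuples from one table row region.
    Words whose left edges are within `gap` units join the same column."""
    columns = []
    for x, text in sorted(words):
        if columns and x - columns[-1][0] <= gap:
            columns[-1][1].append(text)   # close to previous x: same column
        else:
            columns.append((x, [text]))   # large horizontal jump: new column
    return [" ".join(texts) for _, texts in columns]

# Example: three visual columns near x = 10, 120, 240
row = [(10, "Tabula"), (122, "Y"), (240, "extracts"), (255, "tables")]
print(detect_columns(row))  # ['Tabula', 'Y', 'extracts tables']
```

A real pipeline would apply this across all rows and reconcile the column boundaries, which is exactly where the layout variety discussed above makes fixed rules break down.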
Table Formats

The existing format of tables in PDF lacks the metadata needed for further processing; therefore, the conversion of PDF tables to other formats, especially open formats, will open new possibilities. Some researchers have worked on converting tables to CSV format, which retains the basic structure but lacks some cell formatting. Others worked on the transformation of web tables into relational tables for easy manipulation.77 In contrast, XML can handle complex data and is more easily read by humans; therefore, a methodology has been presented for working with tables in XML format, but it considers only tables containing text and numerical data.78 JSON, another format, can be used as an alternative to XML; it is smaller in size than XML and can handle complex and hierarchical data. The JSON format has less support than XML but is preferred for web applications due to its interoperability and lightweight nature.

Table Interpretation

The variable representation patterns of table values, dense content, and natural language processing create problems in the correct interpretation of tables.79 Anaphora resolution techniques and document-level discourse parsers have been suggested to handle complex references across multiple domains.80 Moreover, handling the locality features of a table and annotating its property features can lead to better interpretation of tables.81 The use of a knowledge base is suggested for understanding and annotating the relationships among tables and text to get more information about the entities extracted from tables and text.82 Similarly, the extraction of data, and its precision, in medical and financial tables is an issue that needs the attention of researchers, as both fields have crucial and important data in their tables.83
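The open-format conversions discussed under Table Formats can be sketched with standard-library tools. The table content below is hypothetical; a real pipeline would take the header and rows from a table extractor.

```python
import csv
import io
import json

# Hypothetical extracted table: first row is the header.
header = ["Country", "Population (millions)"]
rows = [["Pakistan", "220.9"], ["Iceland", "0.36"]]

# CSV keeps the grid but drops formatting and hierarchy.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
csv_text = buf.getvalue()

# JSON maps each row onto the header, producing self-describing
# records that web applications can consume directly.
records = [dict(zip(header, row)) for row in rows]
json_text = json.dumps(records, indent=2)

print(csv_text)
print(json_text)
```

Note how the JSON records carry their column names with every value, which is what makes the format attractive for downstream navigation and querying, at the cost of repeating the header in each record.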
For easy interpretation of tables, machine learning classifiers based on table headings and captions can be used to classify tables into their respective domains.84 The relationships of tables within a specific domain, or among multiple domains, can be established by developing ontologies.85 This will enable tables to be published on an LOD cloud, which will establish more relationships and allow insights to be inferred from multiple domains.

Table Evaluation

Most of the researchers working on PDF tables have tried to evaluate their work with popular data sets such as ICDAR 2013, ICDAR 2015, ICDAR 2017 POD, PubMed, UNLV, and Mormont. As we have PDF documents in multiple domains, new data sets should be introduced for structure recognition, especially in business and finance, as these domains use a large number of tables in their documents.86 An evaluation methodology has been proposed for table detection, structure recognition, and functional and semantic analysis.87 Unfortunately, there are no standard metrics, parameters, or formal methodology for evaluating table processing.88 Therefore, standard evaluation metrics should be defined for PDF tables in order to standardize the evaluation of algorithms and frameworks.

Table Presentation to Blind and Visually Impaired Users

The available tools and techniques for reading documents aloud to blind and visually impaired people either read the table caption only and ignore the content, or treat tables as text and read the rows line by line. This does not help these users understand the semantics of the table and its content.
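The difference between the line-by-line reading that current tools produce and a header-aware rendering can be illustrated with a small sketch; the table content here is hypothetical.

```python
# Contrast flat row reading with header-aware reading, which pairs
# every cell with its column header, the way accessible renderings do.

def read_flat(table):
    """What many tools do today: each row read as undifferentiated text."""
    return [" ".join(row) for row in table]

def read_with_headers(table):
    """Header-aware reading: each cell is announced with its column name."""
    header, *body = table
    return [
        f"Row {i}: " + "; ".join(f"{h}: {cell}" for h, cell in zip(header, row))
        for i, row in enumerate(body, start=1)
    ]

table = [["Tool", "Open source"], ["Tabula", "Yes"], ["Docparser", "No"]]
print(read_flat(table))          # ['Tool Open source', 'Tabula Yes', 'Docparser No']
print(read_with_headers(table))  # ['Row 1: Tool: Tabula; Open source: Yes', ...]
```

Even this trivial pairing restores the column-to-cell association that a sighted reader infers visually; the research challenges discussed above (spanning cells, implicit headers, nested layouts) are precisely what make the general case hard.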
Besides the content of a table, its layout shows groupings and connections among the content, which current solutions do not present to blind and visually impaired people.89 Therefore, tools and screen readers need to present tables in a nonvisual format, or give a summarized view of tables by following the W3C guidelines, instead of reading the table like text.90 The summarized view of tables can become part of bibliographic metadata and can contribute to cataloging from the perspective of linked and open data.91 A study highlighted the accessibility of PDF articles published by four journal publishers and presented the findings in graphs to show the trend from 2009 to 2013, using parameters including meaningful title, alternate text for images, and logical reading order.92 The author further applied the same methodology to analyze the articles published in the next four years (2014 to 2018) and concluded that the accessibility of PDF documents had improved. However, the journal publishers, who should be more aware of disability and accessibility, did not consistently follow the PDF/UA accessibility requirements and WCAG 2.0 when producing PDF versions of their articles.93 Therefore, visually impaired individuals should be provided with a mechanism for understanding digital content and its underlying semantics at multiple levels of abstraction, such as general information about the document and its elements (including tables), its structure and content, navigation within the table, and querying the table to get more details and lessen cognitive overload.

Accessibility of Digital Library Collections

The accessibility of large-scale digital library collections can enhance content for sighted as well as visually impaired users.
The traditional utilization of digital library collections needs to be broadened by making computation-ready collections meant to be used and consumed in multiple domains.94 Researchers made an effort to digitize and archive a digital repository of images and convert them to PDF/A documents but, unfortunately, ended up with limited semantics because they did not consider the elements within the documents themselves.95 The accessibility of these converted documents may be compromised by these limited semantics. The rich semantics of tables can be used in the bibliographic classification of a digital library's collection to increase the search breadth of the digital library.96 Blind and visually impaired users can be assisted in using digital libraries, as they may need help at the physical and cognitive levels. At the physical level, blind users may face difficulty in accessing information, identifying path and status, and efficiently evaluating information. At the cognitive level, they may face problems in understanding the multiple structures, programs, information, and features of the digital library, and in the need to stick to specific formats. Therefore, the inclusion of help features, along with meaningful descriptions for nontextual elements, will make the digital library friendly to blind and visually impaired people.97 The sight-centered nature of the digital library creates problems for blind and visually impaired users due to missing textual or verbal instructions.
Some researchers identified the inclusion of labels and meaningful descriptions for hyperlinks, instructions, structure, multimedia content, and nontext content as a way to make digital libraries friendly to blind and visually impaired people.98 At the same time, others argue for improving usability by introducing help features in terms of usefulness, ease of use, and user satisfaction.99 The accessibility of digital libraries in general, and of their content in particular, may be improved by accommodating help features in the interface and meaningful descriptions for the nontext elements of the content, including tables.

Conclusions and Future Research Directions

This study discusses the accessibility of tables included in PDF documents in general, as well as in the specific environment of digital libraries. Existing frameworks, algorithms, and solutions for the processing and interpretation of PDF tables, and specifically their presentation to blind and visually impaired people, are thoroughly discussed. A general workflow of table processing is also presented in Figure 1. The available solutions for reading PDF documents aloud to blind and visually impaired people are analyzed for their output, specifically for how they handle tables. Furthermore, a list of resources for table interpretation and presentation is discussed, along with their features. The issues and challenges in table structure, format, interpretation, evaluation, presentation to blind users, and the accessibility of digital library collections are discussed. Researchers working in the domains of accessibility, digital libraries, and PDF tables can extend and modify the current solutions and algorithms by following the future research directions given below.

• The structure of a table carries implicit semantic information which a sighted reader can infer but a blind reader needs assistance to understand.
The structure of a PDF table is extracted using multiple approaches, such as heuristics, ontologies, machine learning, and segmentation, whereas vectors are used for web tables.100 Therefore, combinations of multiple approaches and the use of vectors for PDF tables may produce better results.

• The content of a table is usually numeric or very short text and needs proper interpretation. Therefore, a knowledge base can be used to get more information about the entities extracted from tables and text in order to understand and annotate the relationships among tables and text.101 These knowledge bases can be predetermined or may be selected automatically according to the table content or domain.

• Table interpretation becomes easier if tables are classified according to their domains using machine learning classifiers. The classification can be based on table headings and captions, as well as the title and author of the document.102

• Ontologies are used to relate tables within a specific domain or among multiple domains, and publishing them on an LOD cloud will establish new relationships.103 This will help in inferring new insights from complex, long, and numerical tables.

• Unstructured data and content can be made available for multiple uses and interpretations if converted to open formats like CSV, JSON, and XML.104 Among these, CSV comes with repeated content, XML needs special parsers, whereas JSON is lightweight and easy to write and read.105 It has support from NoSQL databases like MongoDB and Apache CouchDB, and from web application APIs like Twitter, YouTube, and Facebook. Therefore, JSON might be the better option for the conversion of PDF tables, for its multiple interpretations and for navigation within tables.
• The processes used for the evaluation of tables have no defined metrics.106 Therefore, table evaluation processes should be defined with their respective metrics in order to standardize research in this domain.

• The precision of the extracted content of a table is crucial, especially in medical, financial, and experimental tables that contain numeric data. Therefore, the preprocessing of tables, or their conversion to other formats, needs more attention to avoid any truncation or rounding off of the data.

• The presentation of tables to blind or visually impaired people can be in nonvisual or summarized form.107 The summaries may be presented nonvisually, including the structural layout as well as a brief introduction to the table, to minimize the cognitive overload on these individuals.

• To evaluate the accessibility of digital library interfaces, 16 heuristics were proposed to put digital libraries within reach of users; however, more heuristics are needed to make interfaces generalized for all individuals.108

• The nontext elements of digital library collections should have meaningful descriptions for better understandability by blind and visually impaired individuals. The user-generated content about these nontext elements could be used for cataloging.109

• The rich semantics of tables can be exploited for cataloging and classification, which will be helpful in exploratory searching.

• As the Michigan State University Libraries have taken the initiative of assessing and improving the accessibility of digital library content by adopting the WCAG guidelines, other libraries can also adopt this model to provide accessible content to their users, including blind and visually impaired individuals.

• The development of new data sets for tables in multiple domains can facilitate researchers in interpreting tables and establishing cross-domain relationships.
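The call above for defined evaluation metrics can be made concrete: a common starting point in the table-recognition literature is cell-level precision, recall, and F1 against a ground-truth grid. The sketch below is illustrative only, with hypothetical tables, and real benchmarks additionally score cell adjacency and spanning.

```python
# Cell-level evaluation sketch: compare a predicted table against a
# ground-truth table by matching (row, column, text) triples.

def cell_set(table):
    """Flatten a table (list of rows) into (row, col, text) triples."""
    return {(r, c, cell)
            for r, row in enumerate(table)
            for c, cell in enumerate(row)}

def evaluate(predicted, truth):
    p, t = cell_set(predicted), cell_set(truth)
    correct = len(p & t)                       # cells in the right place with the right text
    precision = correct / len(p) if p else 0.0
    recall = correct / len(t) if t else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

truth = [["Name", "Age"], ["Alice", "30"]]
predicted = [["Name", "Age"], ["Alice", "3O"]]  # one OCR error: "3O" vs "30"
print(evaluate(predicted, truth))  # (0.75, 0.75, 0.75)
```

A shared metric like this is what lets results from different detection and structure-recognition algorithms be compared at all; without one, reported accuracies remain incommensurable.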
This review paper is an attempt to highlight the knowledge gaps in processing PDF tables and in their accessibility for blind and visually impaired individuals. An efficient, open-source solution for making PDF documents accessible to blind and visually impaired people needs to exploit heuristics, ontologies, machine learning, and deep learning, using open-source libraries and tools for understanding and interpreting tabular content, in order to reduce information overload.

Endnotes

1 Roya Rastan, "Automatic Tabular Data Extraction and Understanding" (PhD diss., University of New South Wales, 2017).

2 Mark T. Maybury, "Communicative Acts for Explanation Generation," International Journal of Man-Machine Studies 37, no. 2 (1992): 135–72.

3 Patricia Wright, "The Comprehension of Tabulated Information: Some Similarities between Reading Prose and Reading Tables," NSPI Journal 19, no. 8 (1980): 25–29, https://doi.org/10.1002/pfi.4180190810.

4 Jean-Claude Guédon et al., Future of Scholarly Publishing and Scholarly Communication: Report of the Expert Group to the European Commission (Brussels: European Commission, Directorate-General for Research and Innovation, 2019), https://doi.org/10.2777/836532.

5 World Health Organization, World Report on Vision, October 8, 2019, https://www.who.int/publications-detail/world-report-on-vision/.

6 Mireia Ribera Turró, "Are PDF Documents Accessible?" Information Technology and Libraries 27, no. 3 (2008): 25–43, https://doi.org/10.6017/ital.v27i3.3246.

7 Kyunghye Yoon, Laura Hulscher, and Rachel Dols, "Accessibility and Diversity in Library and Information Science: Inclusive Information Architecture for Library Websites," Library Quarterly 86, no. 2 (2016): 213–29, https://doi.org/10.1086/685399.
8 Iris Xie et al., "Using Digital Libraries Non-Visually: Understanding the Help-Seeking Situations of Blind Users," Information Research 20, no. 2 (2015): 673.

9 Heidi M. Schroeder, "Implementing Accessibility Initiatives at the Michigan State University Libraries," Reference Services Review 46, no. 3 (2018): 399–413, https://doi.org/10.1108/rsr-04-2018-0043.

10 Joanne Oud, "Accessibility of Vendor-Created Database Tutorials for People with Disabilities," Information Technology and Libraries 35, no. 4 (2016): 7–18, https://doi.org/10.6017/ital.v35i4.9469.

11 Rakesh Babu and Iris Xie, "Haze in the Digital Library: Design Issues Hampering Accessibility for Blind Users," Electronic Library 35, no. 5 (2017): 1052–65, https://doi.org/10.1108/el-10-2016-0209.

12 Rachel Wittmann et al., "From Digital Library to Open Datasets," Information Technology and Libraries 38, no. 4 (2019): 49–61, https://doi.org/10.6017/ital.v38i4.11101.

13 Xinxin Wang, "Tabular Abstraction, Editing, and Formatting" (PhD diss., University of Waterloo, 1996).

14 Rastan, "Automatic Tabular Data Extraction," 25.

15 Azadeh Nazemi, "Non-Visual Representation of Complex Documents for Use in Digital Talking Books" (PhD diss., Curtin University, 2015).

16 Rastan, "Automatic Tabular Data Extraction," 14.

17 Max Göbel et al., "ICDAR 2013 Table Competition," in 2013 12th International Conference on Document Analysis and Recognition (2013): 1449–53, https://doi.org/10.1109/icdar.2013.292.

18 Burcu Yildiz, Katharina Kaiser, and Silvia Miksch, "pdf2table: A Method to Extract Table Information from PDF Files," in Proceedings of the 2nd Indian International Conference on Artificial Intelligence (IICAI, 2005): 1773–85; Tamir Hassan and Robert Baumgartner, "Table Recognition and Understanding from PDF Files," in Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) (2007): 1143–47, https://doi.org/10.1109/icdar.2007.4377094; Alexey Shigarov et al., "TabbyPDF: Web-Based System for PDF Table Extraction," in International Conference on Information and Software Technologies (Springer International Publishing, 2018): 257–69, https://doi.org/10.1007/978-3-319-99972-2_20.

19 Minghao Li et al., "TableBank: Table Benchmark for Image-Based Table Detection and Recognition," preprint, arXiv:1903.01949; Sebastian Schreiber et al., "DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (2017): 1162–67, https://doi.org/10.1109/icdar.2017.192.

20 Zewen Chi et al., "Complicated Table Structure Recognition," preprint, arXiv:1908.04729.

21 Michael Cafarella et al., "Ten Years of WebTables," in Proceedings of the VLDB Endowment 11, no. 12 (August 2018): 2140–49, https://doi.org/10.14778/3229863.3240492.

22 Shah Khusro, Asima Latif, and Irfan Ullah, "On Methods and Tools of Table Detection, Extraction and Annotation in PDF Documents," Journal of Information Science 41, no. 1 (2015): 41–57, https://doi.org/10.1177/0165551514551903.

23 Hassan, "Table Recognition and Understanding"; Richard Zanibbi, Dorothea Blostein, and James R. Cordy, "A Survey of Table Recognition," Document Analysis and Recognition 7, no. 1 (2004): 1–16, https://doi.org/10.1007/s10032-004-0120-9; Andreiwid Sheffer Corrêa and Pär-Ola Zander, "Unleashing Tabular Content to Open Data: A Survey on PDF Table Extraction Methods and Tools," in Proceedings of the 18th Annual International Conference on Digital Government Research (June 2017): 54–63, https://doi.org/10.1145/3085228.3085278; Christopher Clark and Santosh Divvala, "Looking beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers" (paper, AAAI Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, January 25–26, 2015).

24 Ermelinda Oro and Massimo Ruffolo, "PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents," in 2009 10th International Conference on Document Analysis and Recognition (ICDAR) (2009): 906–10, https://doi.org/10.1109/icdar.2009.12.

25 Vidhya Govindaraju, Ce Zhang, and Christopher Ré, "Understanding Tables in Context Using Standard NLP Toolkits," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Sofia, Bulgaria: Association for Computational Linguistics, August 2013): 658–64.

26 Nikola Milosevic et al., "Disentangling the Structure of Tables in Scientific Literature," in Natural Language Processing and Information Systems, NLDB 2016, Lecture Notes in Computer Science 9612 (Springer, Cham), https://doi.org/10.1007/978-3-319-41754-7_14.

27 Rastan, "Automatic Tabular Data Extraction," 48.
28 Alexey Shigarov, Andrey Mikhailov, and Andrey Altaev, "Configurable Table Structure Recognition in Untagged PDF Documents," in Proceedings of the 2016 ACM Symposium on Document Engineering (2016): 119–22, https://doi.org/10.1145/2960811.2967152.

29 Shigarov et al., "TabbyPDF," 262, 263, 265.

30 Dae Hyun Kim et al., "Facilitating Document Reading by Linking Text and Tables," in Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (October 2018): 423–34, https://doi.org/10.1145/3242587.3242617.

31 Hassan, "Table Recognition and Understanding," 1145.

32 Jing Fang et al., "A Table Detection Method for Multipage PDF Documents via Visual Separators and Tabular Structures," in 2011 International Conference on Document Analysis and Recognition (2011): 779–83, https://doi.org/10.1109/icdar.2011.304.

33 Bahadar Ali and Shah Khusro, "A Divide-and-Merge Approach for Deep Segmentation of Document Tables," in Proceedings of the 10th International Conference on Informatics and Systems (May 2016): 43–49, https://doi.org/10.1145/2908446.2908473.
34 Wenyuan Xue et al., "Table Analysis and Information Extraction for Medical Laboratory Reports," in 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech) (2018): 193–99, https://doi.org/10.1109/dasc/picom/datacom/cyberscitec.2018.00043.

35 Roya Rastan, Hye-Young Paik, and John Shepherd, "TEXUS: A Unified Framework for Extracting and Understanding Tables in PDF Documents," Information Processing & Management 56, no. 3 (2019): 895–918, https://doi.org/10.1016/j.ipm.2019.01.008.

36 Dafang He et al., "Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (2017): 254–61, https://doi.org/10.1109/icdar.2017.50.

37 Jing Fang et al., "Table Header Detection and Classification," in Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (July 2012): 599–605.

38 He et al., "Multi-Scale Multi-Task," 255.

39 Martha O. Perez-Arriaga, Trilce Estrada, and Soraya Abad-Mota, "TAO: System for Table Detection and Extraction from PDF Documents," Florida Artificial Intelligence Research Society Conference, North America (2016).

40 Saman Arif and Faisal Shafait, "Table Detection in Document Images Using Foreground and Background Features," in 2018 Digital Image Computing: Techniques and Applications (DICTA) (2018): 1–8, https://doi.org/10.1109/dicta.2018.8615795.

41 Schreiber et al., "DeepDeSRT," 1163, 1164.
42 Shoaib Ahmed Siddiqui et al., "DeCNT: Deep Deformable CNN for Table Detection," IEEE Access 6 (2018): 74151–61, https://doi.org/10.1109/access.2018.2880211.

43 Chi et al., "Complicated Table Structure Recognition."

44 Rahul Anand, Hye-Young Paik, and Cheng Wang, "Integrating and Querying Similar Tables from PDF Documents Using Deep Learning," preprint, arXiv:1901.04672.

45 Jiaoyan Chen et al., "ColNet: Embedding the Semantics of Web Tables for Column Type Prediction," in Proceedings of the AAAI Conference on Artificial Intelligence 33, no. 1: 29–36, https://doi.org/10.1609/aaai.v33i01.330129.

46 Ziqi Zhang, "Towards Efficient and Effective Semantic Table Interpretation," in International Semantic Web Conference (2014): 487–502, https://doi.org/10.1007/978-3-319-11964-9_31.

47 Ivan Ermilov, Sören Auer, and Claus Stadler, "User-Driven Semantic Mapping of Tabular Data," in Proceedings of the 9th International Conference on Semantic Systems (September 2013): 105–12, https://doi.org/10.1145/2506182.2506196.

48 Martha O. Perez-Arriaga, Trilce Estrada, and Soraya Abad-Mota, "Table Interpretation and Extraction of Semantic Relationships to Synthesize Digital Documents," in Proceedings of the 6th International Conference on Data Science, Technology and Applications (DATA) (2017): 223–32, https://doi.org/10.5220/0006436902230232.

49 Varish Mulwad, "TABEL: A Domain-Independent and Extensible Framework for Inferring the Semantics of Tables" (PhD diss., University of Maryland, 2015).
50 Syed Tahseen Raza Rizvi et al., "Ontology-Based Information Extraction from Technical Documents," in Proceedings of the 10th International Conference on Agents and Artificial Intelligence (ICAART) (2018): 493–500, https://doi.org/10.5220/0006596604930500.

51 Corrêa and Zander, "Unleashing Tabular Content to Open Data," 55.

52 Irfan Ullah et al., "An Overview of the Current State of Linked and Open Data in Cataloging," Information Technology and Libraries 37, no. 4 (2018): 47–80, https://doi.org/10.6017/ital.v37i4.10432.

53 Nosheen Fayyaz, Irfan Ullah, and Shah Khusro, "On the Current State of Linked Open Data: Issues, Challenges, and Future Directions," International Journal on Semantic Web and Information Systems (IJSWIS) 14, no. 4 (2018): 110–28, https://doi.org/10.4018/ijswis.2018100106.

54 Govindaraju, Zhang, and Ré, "Understanding Tables in Context Using Standard NLP Toolkits," 660, 661.

55 Perez-Arriaga, Estrada, and Abad-Mota, "Table Interpretation and Extraction," 227.

56 Kim et al., "Facilitating Document Reading," 425, 426.

57 Rastan, Paik, and Shepherd, "TEXUS," 906.

58 Nikola Milosevic et al., "A Framework for Information Extraction from Tables in Biomedical Literature," International Journal on Document Analysis and Recognition (IJDAR) 22, no. 1 (2019): 55–78, https://doi.org/10.1007/s10032-019-00317-0.
59 chi et al., “complicated table structure recognition.” 60 wenhao yu et al., “tablepedia: automating pdf table reading in an experimental evidence exploration and analytic system,” in the world wide web conference (may 2019): 3615–19, https://doi.org/10.1145/3308558.3314118. 61 anand, paik, and wang, “integrating and querying similar tables.” 62 turró, “are pdf documents accessible?” 2, 4. 63 nazemi, “non-visual representation of complex documents,” 110, 111, 112, 118. 64 juan cao, “generating natural language descriptions from tables,” ieee access 8 (2020): 46206–16, https://doi.org/10.1109/access.2020.2979115. 65 maartje ter hoeve et al., “conversations with documents: an exploration of document-centered assistance,” in proceedings of the 2020 conference on human information interaction and retrieval (march 2020): 43–52, https://doi.org/10.1145/3343413.3377971. 66 guédon et al., “future of scholarly publishing,” 42. 67 w3c, “wcag 2.0.” 68 world health organization, “world report on vision”; david reinsel, john gantz, and john rydning, “data age 2025: the digitization of the world, from edge to core,” idc white paper, #us44413318 (framingham, ma: idc, november 2018), https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataagewhitepaper.pdf/. 69 rastan, “automatic tabular data extraction,” 18, 19. 70 arif and shafait, “table detection in document images,” 1. 71 ana costa e silva, “parts that add up to a whole: a framework for the analysis of tables,” (phd diss., edinburgh university, uk, 2010). 72 milosevic et al., “a framework for information extraction from tables,” 60. 73 rastan, “automatic tabular data extraction,” 14. 74 chen et al., “colnet,” 31. 75 mulwad, “tabel,” 23; zewen, “complicated table structure recognition.” 76 siddiqui et al., “decnt,” 74160. 
https://doi.org/10.1007/s10032-019-00317-0 https://doi.org/10.1145/3308558.3314118 https://doi.org/10.1109/access.2020.2979115 https://doi.org/10.1145/3343413.3377971 https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf/ https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf/ information technology and libraries september 2021 accessibility of tables in pdf documents | fayyaz, khusro, and ullah 19 77 david w embley, sharad seth, and george nagy, “transforming web tables to a relational database,” 2014 22nd international conference on pattern recognition (2014) 2781–86, https://doi.org/10.1109/icpr.2014.479. 78 milosevic et al., “a framework for information extraction from tables,” 56. 79 milosevic et al., “a framework for information extraction from tables,” 55, 56. 80 kim et al., “facilitating document reading,” 432. 81 chen et al., “colnet,” 36. 82 asima latif et al., “a hybrid technique for annotating book tables,” int. arab j. inf. technol 15, no. 4 (2018): 777–83. 83 rastan, paik, and shepherd, “texus,” 909. 84 milosevic et al., “a framework for information extraction from tables,” 61, 62, 65, 66. 85 rizvi et al., “ontology-based information extraction,” 496. 86 siddiqui et al., “decnt,” 74160. 87 max göbel et al., “a methodology for evaluating algorithms for table understanding in pdf documents,” in proceedings of the 2012 acm symposium on document engineering (september 2012): 45–48, https://doi.org/10.1145/2361354.2361365. 88 rastan, paik, and shepherd, “texus,” 917. 89 david pinto et al., “table extraction using conditional random fields,” in proceedings of the 26th annual international acm sigir conference on research and development in information retrieval (july 2003): 235–42, https://doi.org/10.1145/860435.860479. 
90 nazemi, “non-visual representation of complex documents,” 118–44; w3c, “wcag 2.0.” 91 ullah et al., “current state of linked and open data in cataloging,” 47, 48. 92 julius t. nganji, “the portable document format (pdf) accessibility practice of four journal publishers,” library and information science research 37, no.3 (2015): 254–62, https://doi.org/10.1016/j.lisr.2015.02.002. 93 julius t. nganji, “an assessment of the accessibility of pdf versions of selected journal articles published in a wcag 2.0 era (2014–2018),” learned publishing 31, no. 4 (2018): 391–401, https://doi.org/10.1002/leap.1197. 94 wittmann et al., “from digital library to open datasets,” 49, 50. 95 yan han and xueheng wan, “digitization of text documents using pdf/a,” information technology and libraries 37, no. 1 (2018): 52–64, https://doi.org/10.6017/ital.v37i1.9878. https://doi.org/10.1109/icpr.2014.479 https://doi.org/10.1145/2361354.2361365 https://doi.org/10.1145/860435.860479 https://doi.org/10.1016/j.lisr.2015.02.002 https://doi.org/10.1002/leap.1197 https://doi.org/10.6017/ital.v37i1.9878 information technology and libraries september 2021 accessibility of tables in pdf documents | fayyaz, khusro, and ullah 20 96 asim ullah, shah khusro, and irfan ullah, “bibliographic classification in the digital age: current trends & future directions,” information technology and libraries 36, no. 3 (2017): 48–77, https://doi.org/10.6017/ital.v36i3.8930. 97 xie et al., “using digital libraries non-visually,” paper 673. 98 babu and xie, “haze in the digital library,” 1057–59. 99 iris xie et al., “enhancing usability of digital libraries: designing help features to support blind and visually impaired users,” information processing and management 57, no. 3 (2020): 102110, https://doi.org/10.1016/j.ipm.2019.102110. 100 chen et al., “colnet,” 31, 32. 101 kim et al., “facilitating document reading,” 432. 102 milosevic et al., “a framework for information extraction from tables,” 61. 
Applying Gamification to the Library Orientation: A Study of Interactive User Experience and Engagement Preferences

Karen Nourse Reed and A. Miller

Information Technology and Libraries | September 2020. https://doi.org/10.6017/ital.v39i3.12209

Karen Nourse Reed (karen.reed@mtsu.edu) is Associate Professor, Middle Tennessee State University. A. Miller (a.miller@mtsu.edu) is Associate Professor, Middle Tennessee State University. © 2020.

Abstract

By providing an overview of library services as well as the building layout, the library orientation can help newcomers make optimal use of the library. The benefits of this outreach can be curtailed, however, by the significant staffing required to offer in-person tours. One academic library overcame this issue by turning to user experience research and gamification to provide an individualized online library orientation for four specific user groups: undergraduate students, graduate students, faculty, and community members. The library surveyed 167 users to investigate preferences regarding orientation format, as well as likelihood of future library use as a result of the gamified orientation format. Results demonstrated a preference for the gamified experience among undergraduate students as compared to other surveyed groups.

Introduction

Background

Newcomers to the academic campus can be a bit overwhelmed by their unfamiliar environment: there are faces to learn, services and processes to navigate, and an unexplored landscape of academic buildings to traverse. Whether one is an incoming student or a recently hired employee of the university, all need to become quickly oriented to their surroundings to ensure productivity. In the midst of this transition, the academic library may or may not be on the list of immediate inquiries; however, the library is an important place to start. Newcomers would be wise to familiarize themselves with the building and its services so that they can make optimal use of its offerings. Two studies found that students who used the library received better grades and had higher retention rates.1 Another study regarding university employees revealed that untenured faculty made less use of the library than tenured faculty, a problem attributed to lack of familiarity with the library.2 Researchers have also found that faculty will often express interest in different library services without realizing that these services are in fact available.3 It is safe to say that libraries cannot always rely on newcomers to discover the physical and electronic services on their own; they need to be shown these items in order to mitigate the risk of unawareness.

In consideration of these issues, the Walker Library at Middle Tennessee State University (MTSU) recognized that more could be done to welcome its new arrivals to campus. The public university enrolls approximately 21,000 students, the majority of whom are undergraduates. However, with a Carnegie classification of Doctoral/Professional and over one hundred graduate degree programs, there was a strong need for specialized research among the university's graduate students and faculty. Other groups needed to use the library too: non-faculty employees on campus as well as community users who frequently used Walker Library for its specialized and general collections. The authors realized that when new members of these different groups arrived on campus, few opportunities were available for acclimation to the library's services or building layout. Limited orientation experiences were conducted within library instruction classes, but these sessions primarily taught research skills and targeted freshman general-education classes as well as select upper-division and graduate classes.
In short, it appeared that students, employees, and visitors to the university would largely have to discover the library's services on their own through a search on the library website or an exploration of the physical library. It was very likely that, in doing so, the newcomers might miss out on valuable services and information. As MTSU librarians, the authors felt strongly that library orientations were important to everyone at the university so that they might make optimal use of the library's offerings. The authors based this opinion on their knowledge of relevant scholarly literature as well as their own anecdotal experiences with students and faculty.4 The authors defined the library orientation differently from library instruction: in their view, an orientation should acquaint users with the services and physical spaces of the library, as compared to instruction that would teach users how to use the library's electronic resources such as databases. The desired new approach would structure orientations in response to the different needs of the library's users. For example, the authors found that undergraduates typically had distinct library interests compared to faculty. It was recognized that library orientations were time-consuming for everyone: library patrons at MTSU often did not want to take the time for a physical tour, nor did the library have the staffing to accommodate large-scale requests. The authors turned to the gamification trend, and specifically interactive storytelling, as a solution. Interactive storytelling has previous applications in librarianship as a means of creating an immersive and self-guided user experience.5 However, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups.
To overcome this gap, the authors developed an online, interactive, game-like experience via storytelling software to orient four different groups of users to the library's services. These groups were undergraduate students, graduate students, faculty members (which included both faculty and staff at the university), and community members (i.e., visitors to the university or alumni); see figure 1 for an illustration of each group's game avatars. These groups were invited to participate in the gamified experience called LibGO (short for Library Game Orientation). After playing LibGO, participants gave feedback through an online survey. This paper will give a brief explanation of the creation of the game, as well as describe the results of research conducted to understand the impact of the gamified experience across the four user groups.

Figure 1. LibGO players were allowed to self-select their user group upon entering the game. Each of the four user groups was assigned an avatar and followed a logic path specified for that group.

Literature Review

Traditional Orientation

Searches for literature on library orientation yield very broad and yet limited details about users of the traditional library orientation method.
It is important to note that the terms "library tour" and "library orientation" can be somewhat vague, because this terminology is not interchangeable, yet is frequently treated as such in the literature.6 These terms are often included among library instruction materials, which predominately influence undergraduate students.7 Kylie Bailin, Benjamin Jahre, and Sarah Morris define orientation as "any attempt to reduce library anxiety by introducing students to what a college/university library is, what it contains, and where to find information while also showing how helpful librarians can be."8 Their book is a culmination of case studies of academic library orientation in various forms worldwide, where the common theme across most chapters is the need to assess, revise, and change library orientation models as needed, especially in response to feedback, staff demands, and the evolving trend of libraries and technology.9 Furthermore, the majority of these studies are undergraduate-focused, and often freshman-focused, while only a few studies are geared towards graduate students. Other traditional orientation problems discussed in the literature include students lacking intrinsic motivation to attend library orientation, library staff time required to execute the orientation, and lack of attendance.10 Additionally, among librarians there seems to be consensus that traditional library tours are the least effective means of orientation, yet they are the most highly used, with attention predominately focused on the undergraduate population alone.11

In 1997, Pixey Anne Mosley described the traditional guided library tour as ineffective, and documented the trend of libraries discontinuing it in favor of more active learning options.12 Her study surveyed 44 students who took a redesigned library tour, all of whom were undergraduates (with freshmen as the target population). Although Mosley's study only addressed one group of library users, it does attempt to answer a question on library perception, whereby 93 percent of surveyed students indicated feeling more comfortable in using the library after the more active learning approach.13 A comparison study by Marcus and Beck looked at traditional versus treasure hunt orientations, and ultimately discovered that perception of the traditional method is limited by the selective user population and lack of effective measurements. They cited the need for continued study of alternative approaches to academic library orientation.14 A study by Kenneth Burhanna, Tammy Eschedor Voelker, and Julie Gedeon looked at the traditional library tour from the physical and virtual perspective. Confronted with a lack of access to the physical library, these researchers at Kent State University decided to add an online option for the required traditional freshman library tour.15 Their study compared the efficacy of learning and affective outcomes between face-to-face library tours and those of online library tours. Of the 3,610 students who took the required library tour assignment, 3,567 chose the online tour method and 63 opted or were required to take the in-person, librarian-led tour. Surveys were later sent to a random list of 250 students who did not take the in-person tour and the 63 students who did take the in-person tour.
Of the 46 usable responses, all but one were undergraduates, and 39 (85 percent) of them were freshmen.16 This is a small sample size, with a ratio of slightly greater than 2:1 for online versus in-person tour participation. Although results showed that an instructor's recommendation on format selection was the strongest influencing factor, convenience was also significant for those who selected the online option (81.5 percent). In contrast, only 18.5 percent of the students who took the face-to-face tour rated it as convenient. The authors found that regardless of tour type, students were more comfortable using the library (85 percent) and more likely to use library resources (80 percent) after having taken a library tour. Interestingly, students who took the online tour seemed slightly more likely to visit the physical library than those who took the in-person tour. Ultimately the analysis of both tours showed this method of library orientation encourages library resource use, and the "online tour seems to perform as well, if not slightly better than the in-person tour."17

Gamification Use in Libraries

An alternative format to the traditional method is gamification. Gamification has become a familiar trend within academic libraries in recent years, and most often refers to the use of a technology-based game delivery within an instructional setting. Some users find gamified library instruction to be more enjoyable than traditional methods. For these people, gamification can potentially increase student engagement as well as retention of information.18 The goal of gamification is to create a simplified reality with a defined user experience.
Kyle Felker and Eric Phetteplace emphasized the importance of user interaction over "specific mechanics or technologies" in thinking about the gamification design process.19 Proponents of gamification of library instructional content indicate that it connects to the broader mission of library discovery and exploration as exemplified through collaboration and the stimulation of learning.20 Additional benefits of gamification are its teaching, outreach, and engagement functions.21

Many researchers have documented specific applications of online gaming as a means of imparting library instruction. Mary J. Broussard and Jessica Urick Oberlin described the work of librarians at Lycoming College in developing an online game as one approach to teaching about plagiarism.22 Melissa Mallon offered summaries of nine games produced for higher education, several of which were specifically created for use by academic libraries.23 Many of these online library games used Flash, or required players to download the game before playing. By contrast, J. Long detailed an initiative at Miami University to integrate gamification into library instruction, a project which utilized Twine.24 Twine is an in-browser method and therefore avoids the problem of requiring users to download additional software prior to playing the game.

Other libraries have used online gamification specifically as a tool for library orientations. Although researchers have demonstrated that the library orientation is an important practice in establishing positive first impressions of the library and counteracting library anxiety among new users, the differences between in-person versus online delivery formats are unclear.25 Several successful instances have been documented in which the orientation was moved to an online game format.
Nancy O'Hanlon, Karen Diaz, and Fred Roecker described a collaboration at Ohio State University Libraries between librarians and the Office of First Year Experience; for this project, they created a game to orient all new students to the library prior to arrival on campus.26 The game was called "Head Hunt," and was cited among those games listed in the article by Mallon.27 Anna-Lise Smith and Lesli Baker reported the "Get a Clue" game at Utah Valley University, which oriented new students over two semesters.28 Another orientation game developed at California State University-Fresno was noteworthy for its placement in the university's learning management system (LMS).29

In reviewing the literature regarding online library gamification efforts, there appear to be several best practices. Several studies cite initial student assessment to understand student knowledge and/or perceptions of the content, followed by an iterative design process with a team of librarians and computer programmers.30 Felker and Phetteplace reinforced the need for this iterative process of prototyping, testing, deployment, and assessment as one key to success; however, they also stated that the most prevalent reason for failure is that the games are not fun for users.31 Librarians are information experts, and are not necessarily trained in fun game design. Some libraries have solved this problem by partnering with or hiring professional designers; however, for many under-resourced libraries, this is not an option.32 Taking advantage of open-source tools, as well as the documented trial-and-error practices of others, can be helpful to newcomers who wish to break into new library engagement methods utilizing gamification.

As the literature has shown, a traditional library tour may have a place in the list of library services, but for whom and at what cost are questions with limited answers in studies done to date.
Gamification has offered an alternative perspective, but with narrow accounts of its success in the online storytelling format and for users outside of the heavily studied freshman group. Across all literature of library orientation studies, there is little reference to other library user populations such as faculty, staff, community users, distance students, or students not formally part of a class that requires library orientation.

Development of the Library Game Orientation (LibGO)

LibGO was developed by the authors with not only a consideration for the Walker Library user experience, but also with specific attention to the differing needs of the multiple user groups served by the library. This user-focused concern led to exploring creative methodologies such as user experience research and human-centered design thinking, a process of overlapping phases that produces a creative and meaningful solution in a non-linear way. The three pillars of design thinking are inspiration, ideation, and iteration.33 Defining the problem and empathizing with the users (inspiration) led into the ideation phase, whereby the authors created low- and high-fidelity prototypes. The prototypes were tested and improved (iteration) through the use of beta testing, in which playtesters interacted with the gamified orientation. The authors were novice developers of the gamified orientation, and this entailed a learning curve for not only the design thinking mindset but also the technical achievability. The development started with design thinking conversations and quickly turned to low-fidelity prototypes designed on paper. The development soon advanced to the actual coding so that the authors could get early designs tested before launching the final version.
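That coding was done in Twine, which stores each screen of the story as a linked "passage." Purely as an illustration (the passage names and text below are hypothetical, not taken from LibGO itself), a branching choice in Twine's Twee notation looks like this:

```twee
:: Start
Welcome to the library orientation! Which best describes you?
[[I am an undergraduate student->Undergrad]]
[[I am a graduate student->Grad]]

:: Undergrad
(The undergraduate storyline begins here, with its own links onward.)

:: Grad
(The graduate storyline begins here, with its own links onward.)
```

Each `[[link text->Target]]` sends the player to another passage, which is how a "choose your own adventure" path is assembled; Twine compiles the full set of passages into a single self-contained HTML file.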
Prior to deployment on the library's website, LibGO underwent a series of playtesting by library faculty, staff, and student employees. This testing was invaluable and led to such improvements as streamlining of processes and less ambiguity of text.

LibGO was developed with the Twine open-source software (https://twinery.org), a product which is primarily used for telling interactive, non-linear stories with HTML. Twine was an excellent application for this project as it allowed the creation of an online, interactive, "choose your own adventure" styled library orientation game, in which users could explore the library based upon their selection of one of multiple available plot directions. With a modest learning curve and as open-source software, Twine is highly accessible for those who are not accustomed to coding. For those who know HTML, CSS, JavaScript, variables, and conditional logic, Twine's capabilities can be extended. The library's interactive orientation adventure requires users to select one of the four available personas: undergraduate student, graduate student, faculty, or community member. Users subsequently follow that persona through a non-linear series of places, resources, and points of interest built with the HTML output of Twee (Twine's programming language). See figure 2 for an example point-of-interest page and figure 3 for an example of a user's final score after completing the gamified experience. Once the Twine story went through several iterations of design and testing, the HTML file was placed on the library's website for the gamified orientation to be implemented with actual users.

Figure 2. This instructional page within LibGO explains how to reserve different library spaces online. Upon reading this content, the user will progress by clicking on one of the hypertext lines in blue font at the bottom.

Figure 3. Based upon the displayed avatar, this LibGO page is representative of a graduate student's completion of LibGO. The page indicates the player's final score and gives additional options to return to the home page or complete the survey.

Purpose of Study

LibGO utilized the common "choose your own adventure" format whereby players progress through a storyline based upon their selection of one of multiple available plot directions. Although the literature suggests that other technology-based methods are an engaging and instructive mode of content delivery, little prior research exists regarding this specific approach to library outreach. Furthermore, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. The researchers wanted to understand the potential of interactive storytelling as a means to educate a range of users on library services as well as make the library more approachable from a user perspective. The study was designed to understand the user experience of each of the four groups. The researchers hoped to discern which users, if any, found the gamified experience to be a helpful method of orientation to the library's physical and electronic services. Another area of inquiry was to determine whether this might be an effective delivery method by which to target certain segments of the campus for outreach. Finally, the study intended to determine whether this method of orientation might incline participants toward future use of the library.
Methodology

Overview

The authors selected an embedded mixed methods design approach in which quantitative and qualitative data were collected concurrently through the same assessment instrument.34 The survey instrument primarily collected quantitative data; however, a qualitative open-response question was embedded at the end of the survey. This question gathered additional data by which to answer the research questions. Each data set (one quantitative and one qualitative) was analyzed separately for each participant group, and then the groups were compared to develop a richer understanding of participant behavior.

Research Questions

The data collection and subsequent analysis attempted to answer the following questions:

1. Which group(s) of library users prefer to be oriented to library services and resources through the interactive storytelling format, as compared to other formats?
2. Which group(s) of library users are more likely to use library services and resources after participating in the interactive storytelling format of orientation?
3. What are user impressions of LibGO, and are there any differences in impression based on the characteristics of the unique user group?

Participants

Participants for the study were recruited in person and via the library website. In-person recruitment entailed the distribution of flyers and use of signage to recruit participants to play LibGO in a library computer lab during a one-day event. Online recruitment lasted approximately ten weeks and simply involved the placement of a link to LibGO on the home page of the library's website. A total of 167 responses were gathered through both methods, and participants were distributed as shown in table 1.

Table 1. Composition of study's participants

Group 1, undergraduate students: 55 responses
Group 2, graduate students: 62 responses
Group 3, faculty: 13 responses
Group 4, staff: 28 responses
Group 5, community members: 9 responses
Total: 167 responses

For the purposes of statistical data analysis, groups 3 and 4 were combined to produce a single group of 41 university employee respondents; also, group 5's data was not included in the statistical analysis due to the low number of participants. Qualitative data for all groups, however, was included in the non-statistical analysis.

Survey Instrument

A survey with twelve total questions was developed for this study and was administered online through Qualtrics. After playing LibGO, participants were asked to voluntarily complete the survey; if they agreed, they were redirected to the survey's website. Before answering any survey questions, the instrument administered an informed consent statement to participants. All aspects of the research, including the survey instrument, were approved through the university's institutional review board (protocol number 18-1293). The first part of the survey (see appendix A) consisted of ten questions, each with a ten-point Likert-scaled response. The first five questions were each designed to measure a preference construct, and the next five questions each measured a likelihood construct. The preference construct referred to the participant's preference for a library orientation: did they prefer LibGO's online interactive storytelling format, or did they prefer another format such as in-person talks? The likelihood construct referred to the participant's self-perceived likelihood of more readily engaging with the library in the future (both in person and online) after playing LibGO. The second part of the survey gathered the participant's self-reported affiliation (see table 1 for the list of possible group affiliations) as well as offered participants an open-ended response area for optional qualitative feedback.
Data Collection
The study's data was collected in two stages. In stage one, LibGO was unveiled to library visitors during a special campus-wide week of student programming events. On the library's designated event day, the researchers held a drop-in event at one of the library's computer labs (see Figure 4 for an example of event advertisement). Library visitors were offered a prize bag and snacks if they agreed to play LibGO and complete the survey. During the three-hour drop-in session, 58 individual responses were collected: the vast majority came from undergraduate students (n = 51), with additional responses from graduate students (n = 4), university staff employees (n = 2), and one community member. Community members were defined as anyone not currently directly affiliated with the university; this group may have included prospective students or alumni. Stage two began the day after the library drop-in event and simply involved the placement of a link to LibGO on the home page of the library's website. Any visitor to the library's website could click on the advertisement to be taken to LibGO. This link remained active on the library website for ten weeks, at which point the final data was gathered. A total of 167 responses were gathered during both stages, and participants were distributed as previously shown in Table 1.

Figure 4. Example of student LibGO event advertisement

Results

Quantitative Findings
Statistical analysis of each of the ten quantitative questions required the use of one-way ANOVA in SPSS. A post hoc test (Hochberg's GT2) was run in each instance to account for the different sample sizes. For all statistical analyses, only the data from undergraduates, graduate students, and university employees (a group which combined both faculty and staff results) were utilized.
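The study ran its one-way ANOVAs in SPSS; the same test is available in Python. The sketch below is an illustration only: the three response vectors are invented placeholders, not the study's data (which was not published at the item level), and SciPy does not implement the Hochberg's GT2 post hoc test the authors used, so a separate multiple-comparison step would still be needed.

```python
# Sketch only: the study used SPSS. The response vectors below are
# invented placeholders on the survey's 10-point scale, NOT the study's data.
from scipy.stats import f_oneway

undergraduates = [8, 9, 7, 8, 10, 6, 9]
graduates = [6, 7, 5, 6, 8, 7]
employees = [7, 6, 8, 5, 7]

# One-way ANOVA across the three groups
f_stat, p_value = f_oneway(undergraduates, graduates, employees)

# Degrees of freedom: k - 1 between groups, N - k within groups
df_between = 3 - 1
df_within = len(undergraduates) + len(graduates) + len(employees) - 3
print(f"F({df_between}, {df_within}) = {f_stat:.3f}, p = {p_value:.3f}")
```

A significant F would then be followed by pairwise post hoc comparisons corrected for the unequal group sizes, as the authors did with Hochberg's GT2 in SPSS.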
A listing of mean comparisons by group, for each of the ten survey questions, may be found in Table 2. The one-way ANOVAs yielded statistically significant results for three of the ten individual questions in the first part of the survey: questions 2, 3, and 6 (see Table 3).

Table 2. Descriptive statistics for survey results (10-point scale, with 10 as most likely). Group means are shown as (undergraduate students / graduate students / university employees).

1. In considering the different ways to learn about Walker Library, do you find this library orientation game to be more or less preferable as compared to other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)? (7.02 / 6.39 / 6.02)
2. In your opinion, was the library orientation game a useful way to get introduced to the library's services and resources? (8.13 / 6.94 / 7.12)
3. If your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)? (7.38 / 5.94 / 5.98)
4. Please indicate your level of agreement with the following statement: "As compared to playing the game, I would have preferred to learn about the library's resources and services by my own exploration of the library website." (6.11 / 6.50 / 5.88)
5. Please indicate your level of agreement with the following statement: "As compared to playing the game, I would have preferred to learn about the library's resources and services through an in-person orientation tour." (6.11 / 5.08 / 5.76)
6. After playing this orientation game, are you more or less likely to visit Walker Library in person? (8.27 / 6.94 / 6.90)
7. After playing this library orientation game, are you more or less likely to use the Walker Library website to find out about the library (such as hours of operation, where to go to get different materials/services, etc.)? (7.82 / 6.97 / 7.20)
8. After playing this library orientation game, are you more or less likely to seek help from a librarian at Walker Library? (6.95 / 6.58 / 6.63)
9. After playing this library orientation game, are you more or less likely to use the library's online resources (such as databases, journals, e-books)? (7.67 / 7.15 / 6.90)
10. After playing this library orientation game, are you more or less likely to attend a library workshop, training, or event? (6.96 / 6.73 / 6.24)

Table 3. Overall statistically significant group differences

Question | df | F | p | ω²
Question 2 | 2 | 3.714 | .027 | .03
Question 3 | 2 | 4.508 | .012 | .04
Question 6 | 2 | 7.178 | .001 | .07

Question 2 asked, "In your opinion, was the library orientation game a useful way to get introduced to the library's services and resources?" The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 3.714, p = .027, ω² = .03). The post hoc comparison using the Hochberg's GT2 test revealed that undergraduates rated LibGO's usefulness statistically significantly higher (M = 8.13, SD = 1.94, p = .031) than graduate students did (M = 6.94, SD = 2.72). There was no statistically significant difference between undergraduates and university employees (p = .145).
According to criteria suggested by Roger Kirk, the effect size of .03 indicates a small effect in the perceived usefulness of LibGO as an introduction among undergraduates.35

Question 3 asked, "If your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?" The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 4.508, p = .012, ω² = .04). The post hoc comparison using the Hochberg's GT2 test found that undergraduates were statistically significantly more likely to prefer LibGO over other orientation options (M = 7.38, SD = 2.49, p = .021) as compared to graduate students (M = 5.94, SD = 3.06). There was no statistically significant difference between undergraduates and university employees (p = .053). The effect size of .04 indicates a small effect regarding undergraduate preference for LibGO versus other orientation options.

Question 6 asked, "After playing this library orientation game, are you more or less likely to visit Walker Library in person?" The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 7.178, p = .001, ω² = .07). The post hoc comparison using the Hochberg's GT2 test revealed that undergraduates were statistically significantly more likely to visit the library after playing LibGO (M = 8.27, SD = 2.09, p = .003) as compared to graduate students (M = 6.94, SD = 2.20). Additionally, the test found that undergraduates were statistically significantly more likely to visit the library after playing LibGO (p = .007) as compared to university employees (M = 6.90, SD = 2.08). According to criteria suggested by Kirk, the effect size of .07 indicates a medium effect regarding undergraduate potential to visit the library in person after playing LibGO.36
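The reported ω² values follow directly from the F statistics and degrees of freedom. The sketch below applies the standard omega-squared formula for a one-way ANOVA, ω² = df_b(F − 1) / (df_b(F − 1) + N), to the values in Table 3 and reproduces .03, .04, and .07; the function name is ours for illustration, not from the study.

```python
def omega_squared(f_stat, df_between, df_within):
    """Omega-squared effect size recovered from a one-way ANOVA F statistic."""
    n = df_between + df_within + 1  # total number of observations
    adj = df_between * (f_stat - 1.0)
    return adj / (adj + n)

# F values reported in Table 3 for questions 2, 3, and 6 (df = 2, 155; N = 158)
for label, f in [("Q2", 3.714), ("Q3", 4.508), ("Q6", 7.178)]:
    print(label, round(omega_squared(f, 2, 155), 2))  # prints 0.03, 0.04, 0.07
```

This cross-check also explains why the three effects land in Kirk's "small" to "medium" ranges despite the significant p values: with N = 158, even a clearly significant F can correspond to a modest share of explained variance.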
In addition to testing each individual survey question, tests were run to examine possible group differences by construct (preference and likelihood). The preference construct was an aggregate of survey questions 1-5, and the likelihood construct was an aggregate of survey questions 6-10. For both constructs, the one-way ANOVA found no statistically significant results. In all, the quantitative findings indicated three areas in which the experience of playing LibGO was more helpful for the surveyed undergraduates than for the other surveyed groups (i.e., graduate students or university employees). At this point, the analysis turned to the qualitative data so as to better understand participant views of LibGO.

Qualitative Findings
Analysis of the qualitative results was limited to the data collected in the survey's final question. Question 12 was an open-response area and was intentionally prefaced with a vague prompt: "Do you have any final thoughts for the library (suggestions, additions, modifications, comments, criticisms, praise, etc.)?" Of the 167 total survey responses, 67 individuals chose to answer this question. Preliminary analysis showed that the feedback derived from this question covered a spectrum of topics, ranging from remarks on the LibGO experience itself to broader concerns regarding other library services. Open coding strategies were utilized to interpret the content of participant responses. Under this methodology, the responses were evaluated for general themes and then coded and grouped under a constant comparative approach.37 NVivo 12 software was used to code all 67 participant responses. Initial coding yielded eight open codes, but these were later consolidated into six final codes (see Table 4). One code (LibGO improvement tip) was rather nuanced and yielded five axial codes (see Table 5).
Axial codes denoted secondary concerns which fell under a larger category of interest. Although some participants gave longer feedback which addressed multiple concerns, care was taken to assign each distinct concern to a specific code. It is therefore important to note that because some comments addressed multiple concerns, the total number of concerns (n = 76) is greater than the total number of individuals responding to the prompt (n = 67).

Table 4. Distribution of qualitative codes by user group

Code | Undergraduate | Graduate | Faculty | Staff | Community member | Total # concerns
Positive feedback | 7 | 7 | 1 | 4 | 2 | 21
Negative feedback | 1 | 2 | 0 | 3 | 0 | 6
In-person tour preference | 2 | 3 | 0 | 1 | 0 | 6
LibGO improvement tip | 5 | 11 | 1 | 3 | 3 | 23
Library services feedback | 2 | 4 | 3 | 0 | 0 | 9
Library building feedback | 1 | 7 | 1 | 2 | 0 | 11
Total | 18 | 34 | 6 | 13 | 5 | 76

Discussion of Qualitative Themes

Positive feedback (21 separate concerns). Affirmative comments regarding LibGO were primarily split between undergraduate and graduate students, with a small number of comments coming from the other groups. Although all groups stated that the game was helpful, one undergraduate wrote, "I wish I would've received this orientation at the very beginning of the year!" A graduate student declared, "This was a creative way to engage students, and I think it should be included on the website for fun." Both community members commented on the utility of LibGO in providing an orientation without having to physically come to the library; for example, "Interactive without having to actually attend the library in person which I liked." Additionally, a community member pointed out the instructional capability of LibGO, writing, "I think I learned more from the game than walking around in the library."

Negative feedback (6 separate concerns). Unfavorable comments regarding LibGO primarily challenged the orientation's characterization as a "game" in terms of its lack of fun.
One graduate student wrote a comment representative of this concern: "The game didn't really seem like a game at all." A particularly searing comment came from a university staff member who wrote, "Calling this collection of web pages an 'interactive game' is a stretch, which is a generous way of stating it."

In-person tour preference (6 separate concerns). A small number of concerns indicated a preference for in-person orientations over online ones. One undergraduate cited the ability to ask questions during an in-person tour as an advantage of that delivery medium. A graduate student mentioned their desire for kinesthetic learning over an online approach, writing, "I prefer hands on exploration of the library."

LibGO improvement tip (23 separate concerns). Suggested improvements to LibGO were the largest area of qualitative feedback and produced five axial themes (subthemes); see Table 5 for a breakdown of the five axial themes by group.
1. Design issues were the largest cited area of improvement, and the most commonly mentioned design problem was the inability of the user to go back to previously seen content. Although this functionality did in fact exist, it was apparently not intuitive to users; design modifications in future iterations are therefore critical. Other users made suggestions regarding the color scheme used and the ability to magnify image sizes.
2. User experience was another area of feedback and primarily included suggestions on how to make LibGO a more fun experience. One graduate student offered a role-playing game alternative. Another graduate student expressed interest in a game with side missions, in addition to the overall goals, where tokens could be earned for completed missions; the student justified these changes by stating, "I feel that incorporating these types of ideas will make the game more enjoyable." In suggesting similar improvements, one undergraduate stated that LibGO "felt more like a quiz than a game."
3. Technology issues primarily addressed two related problems: images not loading and broken links. Images not loading could depend on many factors, including the user's browser settings, internet traffic (volume) delaying load time, or broken image links, among others. Broken links could be the root issue, since the images used in LibGO were taken from other areas of the library website. This method of gathering content pointed out a design vulnerability of relying on existing image locations (controlled by non-LibGO developers) rather than on images hosted exclusively for LibGO.
4. Content issues were raised exclusively by graduate students. One student felt that LibGO placed an emphasis on physical spaces in the library and did not give a deep enough treatment to library services. Another graduate student asked for "an interactive map to click on so that we physically see the areas" of the library, thus making the interaction more user-friendly with a visual.
5. Didn't understand purpose is a subtheme where improvement is needed and is based on two comments made by the two university staff members. One wrote that "an online tour would have been better and just as informative," although LibGO was designed to be not only an online tour of the library but also an orientation to the library's services. The other staff member wrote, "I read the rules but it was still unclear what the objective was." In all, it is clear that LibGO's purpose was confusing for some.
Table 5. LibGO improvement tip axial codes by user group

Axial code | Undergraduate | Graduate | Faculty | Staff | Community member | Total # concerns
Design | 4 | 3 | 0 | 0 | 1 | 8
User experience | 1 | 2 | 1 | 0 | 1 | 5
Tech issue | 0 | 1 | 0 | 1 | 0 | 2
Content | 0 | 5 | 0 | 0 | 1 | 6
Didn't understand purpose | 0 | 0 | 0 | 2 | 0 | 2
Total | 5 | 11 | 1 | 3 | 3 | 23

Library services feedback (9 separate concerns). Several participants took the opportunity to provide feedback on general library services rather than on LibGO itself. Undergraduates simply gave general positive feedback about the value of the library, but many graduate students gave recommendations regarding specific electronic resource improvements. Additionally, one graduate student wrote, "I think it is critical to meet with new graduate students before they start their program," something the library used to do but had not pursued in recent years. Although these comments did not directly pertain to LibGO, the authors accepted all of them as valuable feedback to the library.

Library building feedback (11 separate concerns). This was another theme in which graduate students dominated the comments. Feedback ranged from requests for microwave access and additional study tables to better temperature control in the building. Several participants asked for greater enforcement of quiet zones. As with the library services feedback, the authors took these comments as helpful to the overall library rather than to LibGO.

Discussion
The results of this study indicated that some groups of library visitors received the gamified library orientation experience better than other groups did. Undergraduate students indicated the largest appreciation for a library orientation via LibGO.
Specifically, they demonstrated a statistically significant difference over the other groups in supporting LibGO's usefulness as an orientation tool, in preferring LibGO over other orientation formats, and in their likelihood of future use of the physical library after playing LibGO. These very encouraging results provide evidence for the efficacy of alternative means of library orientation. The qualitative results provided additional helpful insight regarding user impressions from each of the five surveyed groups. This feedback demonstrated that a variety of groups benefited from the experience of playing LibGO, including some community members who appreciated LibGO as a means of becoming acclimated to the library without having to enter the building. A virtual orientation format was not ideal for a few players who indicated a preference for a face-to-face orientation due to the ability to ask questions. Many people identified areas of improvement for LibGO. Graduate students in particular offered a disproportionate number of suggestions as compared to the other groups. While they provided a great deal of helpful feedback, it is possible that graduate students were so distracted by the perceived problems that they could not fully take in the experience or gain value from LibGO's orientation purpose. It is also very likely that LibGO simply was not very fun for these players: several players noted that it did not feel like a game but rather a collection of content. The review of literature indicated that this amusement issue is a common pitfall of educational games. Although the authors tried to design an enjoyable orientation experience, it is possible that more work is needed to satisfy user expectations. The mixed-methods design of this study was instrumental in providing a richer understanding of user perceptions.
While the statistical analysis of participant survey responses was very helpful in identifying clear trends between groups, the qualitative analysis helped the authors draw valuable conclusions. Specifically, the open-response data demonstrated that additional groups such as graduate students and community members appreciated the experience of playing LibGO; this information was not readily apparent through the statistical analysis. Additionally, the qualitative analysis demonstrated that many groups had concerns regarding areas of improvement that may have impaired their user experience. These important findings could help guide future directions of the research. In all, the authors concluded this phase of the research satisfied that LibGO showed great promise for library orientation delivery but could benefit from continued development and future user assessment. Although undergraduate students seemed most receptive overall to a virtual orientation experience, other groups appeared to have benefited from the resource.

Study Limitations
A primary limitation of this study was its small sample size. Although the entire university campus was targeted for participation in the study, the number of respondents was far too small to generalize the results. Despite this limitation, however, the study's population reflected many different groups of library patrons on campus. The findings are therefore valuable as a means of stimulating future discussion regarding the value of alternative library orientation methods utilizing gamification. Another limitation is that the authors did not pre-assess the targeted groups for their prior knowledge of Walker Library services and building layout, nor for their interest in learning about these topics. It is possible that various groups did not see the value in learning about the library for a variety of reasons.
Faculty members, in particular, may have considered their prior knowledge adequate for navigating the electronic holdings or building layout without recognizing the value of the many other services offered physically and electronically by the library. All groups may have experienced a level of "library anxiety" that prevented them from being motivated to learn more about the library.38 It is difficult to understand the range of covariate factors without a pre-assessment. Finally, there was qualitative evidence supporting the limitation that LibGO did not properly convey its stated purpose of orientation rather than imparting research skills. Without understanding LibGO's focus on library orientation, users could have been confused or disappointed by the experience. Although care was taken to make this purpose explicit, some users indicated their confusion in the qualitative data. This observed problem points to a design flaw that undoubtedly had some bearing on the study's results.

Conclusion & Future Research
Convinced of the importance of the library orientation, the authors sought to move this traditional in-person experience to a virtual one. The quantitative results indicated that the gamified orientation experience was useful to undergraduate students in its intended purpose of acclimating users to the library, as well as in encouraging their future use of the physical library. At a time in which physical traffic to the library has shown a marked decline, new outreach strategies should be considered.39 The results were also helpful in showing that this particular iteration of the gamified orientation was preferred over other delivery methods by undergraduate students, as compared to other groups, to a statistically significant level.
This is an important finding, as it demonstrates that a diversified outreach strategy is necessary: different groups of library patrons desire their orientation information in different formats. The next logical question to ask, however, is: why did the other groups examined through the statistical data analysis (graduate students and university employees) not appreciate the gamified orientation to the same level as undergraduates? The answers to this question are complicated and may be explained in part by the qualitative analysis. Based upon those findings, it is possible that the game did not appeal to these groups on the basis of fun or enjoyment; this concern was specifically mentioned by graduate students. Faculty and staff provided a smaller volume of qualitative feedback; it is therefore difficult to speculate as to their exact reasons for disengagement with LibGO. With this concern in mind, the authors would like to concentrate their next iteration of research on the specific library orientation needs of graduate students and faculty. Both groups present different, but critical, needs for outreach. Graduate students were the largest group of survey respondents, presumably indicating a high level of interest in learning more about the library. Many graduate programs at MTSU are delivered partially or entirely online; as a result, these students may be less likely to come to campus. Due to graduate students' relatively infrequent visits to campus, a virtual library orientation could be even more meaningful for them in meeting their need for library services information. Faculty are another important group to target because, if they lack a full understanding of the library's offerings, they are unlikely to create assignments that fully utilize the library's services. Although it is possible that faculty prefer an in-person orientation, many new faculty have indicated limited availability for such events. A virtual orientation seems conducive to busy schedules.
However, it is possible that the issue is simply a matter of marketing: faculty may not know that a virtual option is available, nor do they necessarily understand all that the library has to offer. In all, future research should begin with a survey to understand what both groups already know about the library, as well as the library services they desire. Another necessary step in future research would be the expansion of the development team to include computer programmers. Although the authors feel that LibGO holds great promise as a virtual orientation tool, more needs to be done to enhance the user's enjoyment of the experience. Twine is a user-friendly software tool that other librarians could pick up without having to be computer programmers; however, programmers (professional or student) could bring design expertise to the project. Future iterations of this project should incorporate the skills of multiple groups, including expertise in libraries, user research, visual design, interaction design, programming, and marketing, along with testers from each type of intended audience. Collectively, this group will have the greatest impact on improving the user experience and ultimately the usefulness of a gamified orientation experience. This experience with gamification, and specifically interactive storytelling, was valuable for Walker Library. These results should encourage other libraries seeking an alternate delivery method for orientations. The authors hope to build upon the lessons learned from this mixed methods research study of LibGO to find the correct outreach medium for their range of library users.

Acknowledgments
Special thanks to our beta playtesters and student assistants who worked the LibGO event, which was funded, in part, by MT Engage and Walker Library at Middle Tennessee State University.
Appendix A: Survey Instrument

[The survey instrument, reproduced as images on pages 19-24 of the published article, is not rendered in this text version.]

Endnotes

1 Sandra Calemme McCarthy, "At Issue: Exploring Library Usage by Online Learners with Student Success," Community College Enterprise 23, no. 2 (January 2017): 27-31; Angie Thorpe et al., "The Impact of the Academic Library on Student Success: Connecting the Dots," portal: Libraries and the Academy 16, no. 2 (2016): 373-92, https://doi.org/10.1353/pla.20160027.
2 Steven Ovadia, "How Does Tenure Status Impact Library Usage: A Study of LaGuardia Community College," Journal of Academic Librarianship 35, no. 4 (January 2009): 332-40, https://doi.org/10.1016/j.acalib.2009.04.022.
3 Chris Leeder and Steven Lonn, "Faculty Usage of Library Tools in a Learning Management System," College & Research Libraries 75, no. 5 (September 2014): 641-63, https://doi.org/10.5860/crl.75.5.641.
4 Kyle Felker and Eric Phetteplace, "Gamification in Libraries: The State of the Art," Reference and User Services Quarterly 54, no. 2 (2014): 19-23, https://doi.org/10.5860/rusq.54n2.19; Nancy O'Hanlon, Karen Diaz, and Fred Roecker, "A Game-Based Multimedia Approach to Library Orientation" (paper, 35th National LOEX Library Instruction Conference, San Diego, May 2007), https://commons.emich.edu/loexconf2007/19/; Leila June Rod-Welch, "Let's Get Oriented: Getting Intimate with the Library, Small Group Sessions for Library Orientation" (paper, Association of College and Research Libraries Conference, Baltimore, March 2017), http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/letsgetoriented.pdf.
5 Kelly Czarnecki, "Chapter 4: Digital Storytelling in Different Library Settings," Library Technology Reports, no. 7 (2009): 20-30; Rebecca J. Morris, "Creating, Viewing, and Assessing: Fluid Roles of the Student Self in Digital Storytelling," School Libraries Worldwide, no. 2 (2013): 54-68.
6 Sandra Marcus and Sheila Beck, "A Library Adventure: Comparing a Treasure Hunt with a Traditional Freshman Orientation Tour," College & Research Libraries 64, no. 1 (January 2003): 23-44, https://doi.org/10.5860/crl.64.1.23.
7 Lori Oling and Michelle Mach, "Tour Trends in Academic ARL Libraries," College & Research Libraries 63, no. 1 (January 2002): 13-23, https://doi.org/10.5860/crl.63.1.13.
8 Kylie Bailin, Benjamin Jahre, and Sarah Morriss, Planning Academic Library Orientations: Case Studies from Around the World (Oxford, UK: Chandos Publishing, 2018): xvi.
9 Bailin, Jahre, and Morriss, Planning Academic Library Orientations.
10 Marcus and Beck, "A Library Adventure"; A. Carolyn Miller, "The Round Robin Library Tour," Journal of Academic Librarianship 6, no. 4 (1980): 215-18; Michael Simmons, "Evaluation of Library Tours," EDRS, ED 331513 (1990): 1-24.
11 Marcus and Beck, "A Library Adventure"; Oling and Mach, "Tour Trends"; Rod-Welch, "Let's Get Oriented."
12 Pixey Anne Mosley, "Assessing the Comfort Level Impact and Perceptual Value of Library Tours," Research Strategies 15, no. 4 (1997): 261-70, https://doi.org/10.1016/s0734-3310(97)90013-6.
13 Mosley, "Assessing the Comfort Level Impact and Perceptual Value of Library Tours."
14 Marcus and Beck, "A Library Adventure," 27.
15 Kenneth J. Burhanna, Tammy J. Eschedor Voelker, and Jule A. Gedeon, "Virtually the Same: Comparing the Effectiveness of Online versus In-Person Library Tours," Public Services Quarterly 4, no. 4 (2008): 317-38, https://doi.org/10.1080/15228950802461616.
16 Burhanna, Voelker, and Gedeon, "Virtually the Same," 326.
17 Burhanna, Voelker, and Gedeon, "Virtually the Same," 329.
18 Felker and Phetteplace, "Gamification in Libraries."
19 Felker and Phetteplace, "Gamification in Libraries," 20.
20 Felker and Phetteplace, "Gamification in Libraries."
21 Felker and Phetteplace, "Gamification in Libraries"; O'Hanlon et al., "A Game-Based Multimedia Approach."
22 Mary J. Broussard and Jessica Urick Oberlin, "Using Online Games to Fight Plagiarism: A Spoonful of Sugar Helps the Medicine Go Down," Indiana Libraries 30, no. 1 (January 2011): 28-39.
23 Melissa Mallon, "Gaming and Gamification," Public Services Quarterly 9, no. 3 (2013): 210-21, https://doi.org/10.1080/15228959.2013.815502.
24 J. Long, "Chapter 21: Gaming Library Instruction: Using Interactive Play to Promote Research as a Process," Distributed Learning (January 1, 2017): 385-401, https://doi.org/10.1016/b978-0-08-100598-9.00021-0.
25 Rod-Welch, "Let's Get Oriented."
26 O'Hanlon et al., "A Game-Based Multimedia Approach."
27 Mallon, "Gaming and Gamification."
28 Anna-Lise Smith and Lesli Baker, "Getting a Clue: Creating Student Detectives and Dragon Slayers in Your Library," Reference Services Review 39, no. 4 (November 2011): 628-42, https://doi.org/10.1108/00907321111186659.
29 Monica Fusich et al., "HML-IQ: Fresno State's Online Library Orientation Game," College & Research Libraries News 72, no. 11 (December 2011): 626-30, https://doi.org/10.5860/crln.72.11.8667.
30 Broussard and Oberlin, "Using Online Games"; Fusich et al., "HML-IQ"; O'Hanlon et al., "A Game-Based Multimedia Approach."
31 Felker and Phetteplace, "Gamification in Libraries."
32 Felker and Phetteplace, "Gamification in Libraries"; Fusich et al., "HML-IQ."
33 "Design Thinking for Libraries: A Toolkit for Patron-Centered Design," IDEO (2015), http://designthinkingforlibraries.com.
34 John W. Creswell and Vicki L. Plano Clark, Designing and Conducting Mixed Methods Research (Thousand Oaks, CA: Sage Publications, 2007).
35 Roger Kirk, "Practical Significance: A Concept Whose Time Has Come," Educational and Psychological Measurement, no. 5 (1996).
Japanese Character Input: Its State and Problems

Ichiko Morita: Ohio State University, Columbus.

Computer processing of information is highly advanced in Japan, and it continues to be researched and improved by the cooperative efforts of the government, private corporations, and individual scientists, who are among the best in the world. This paper introduces various approaches to the computer input of information currently developed in Japan, and discusses the possibility of their application to the processing of East Asian vernacular-language materials in large research libraries in this country.

Processing of catalog information through an on-line shared-cataloging system has become a part of American libraries' common practice, and its financial and temporal savings have been proven. However, there are some materials not yet considered appropriate for computer processing. The Library of Congress's plans for romanizing catalog information for all non-Roman-language materials and putting them on MARC tapes for quick distribution of information have been objected to by a large number of specialists in the field.
The opponents' reason has been that computerization of vernacular languages by means of transliteration is not satisfactory. Such materials are best handled in their own writing systems (the languages in this category include Chinese, Japanese, Korean, Hebrew, Arabic, and various languages in India). Those specialists in the field who see systems working for Roman-alphabet materials generally agree that automated systems are very efficient and useful for their research. It would be best if non-Roman-language materials could be processed through computers using their own writing systems. As far as technology goes, it is possible to process such materials in their original form. Systems that have the capability of handling those languages directly have been developed; among the most advanced are the Japanese systems.

Japan has overcome numerous difficulties in developing systems that are capable of handling Japanese characters. Although automation of libraries is not as widespread as in the United States (due perhaps to a delay in the development of computers), some Japanese libraries already have a decade of experience with advanced systems. Many others have recently started to adopt them. Wide utilization of these systems seems to be just a matter of time. It will be beneficial to review Japanese methods and consider possible adaptation of them to our systems. In the following sections, various Japanese approaches to inputting the Japanese language are explained with an eye to future automation of non-Roman-language materials in this country.

[Manuscript received August 1980; accepted December 1980.]

The Japanese Language and the Computer

It should be noted, first of all, that the Japanese language is an entirely different language from Chinese, although they are often confused because they both use the same Chinese ideographs in writing. Each Chinese ideograph, or character, symbolizes a certain object or denotes a certain meaning.
The Japanese use them in the Japanese language with its own pronunciation in the context of its own grammar, whereas the Chinese use them in the Chinese language with its own pronunciation in the context of its own grammar. This means that a Chinese ideograph could mean the same thing in both languages, but be pronounced or read differently and used in different grammatical environments. The Chinese ideographs used in Japanese are referred to as kanji, which are, to complicate the matter, used along with Japanese syllabaries called kana. Kana, in two styles called hiragana and katakana, total about 170 characters. Depending on whether a kanji is used with another kanji or kana, the reading of it varies. At different times one set of kanji may be read in two or three different ways.

The total number of kanji is about 50,000. In comprehensive dictionaries, about 40,000 or more kanji are included. Medium-sized ones, such as Ueda's Daijiten, include about 15,000; concise ones about 8,000 to 10,000.[1] According to several tests on frequency of kanji occurrence made in various Japanese institutions, approximately 3,000 kanji appear in high frequency, 3,000 are of moderate frequency, and several thousand more are of infrequent occurrence. As for geographical names, 2,279 kanji will cover most of Japan, and 1,500 kanji will suffice to cover personal names, except for very unusual names.[2] Approximately 6,300 characters are needed for major newspapers such as the Asahi and the Nikkei. The trends in the use of kanji are to simplify the characters themselves and not to use difficult kanji with many strokes. In 1946, the Japanese government established 1,850 kanji as those for daily use,[3] and today newspapers and official documents use only those kanji, except for some personal and geographical names. The implication of this trend for computerization of kanji is that, depending on the documents to be covered, the need in number and kind of kanji varies.
That is, institutions that deal with scientific or current information do not need as many kanji as other types of institutions that handle documents covering longer periods and larger areas of knowledge.

[Journal of Library Automation 14/1, March 1981]

For example, the Japan Information Center for Science and Technology, which mainly handles the latest scientific information, claims that with approximately 6,000 kanji it can function satisfactorily. An example from the other extreme is the National Institute of Japanese Literature, whose collection covers older historical periods, during which a great number of kanji were used and many kanji went through changes, mostly simplification in style. The latter institute is constantly adding new kanji to its system.

It is obvious then that the first problem in the computerization of Japanese materials is the number and kind of kanji to be included in the system. This is a problem of hardware. The other problem concerns software. When Japanese is written, its words are not divided as in English, for the combination of kanji and kana helps visually to make sentences understandable without word division. Also, compound nouns are made by adding other words to a noun, so that, if a set of kanji represents one noun, one can expand its meaning by adding another kanji to it. Though word division has been a problem in transliteration and is not new in computerization, both arbitrarily divided words and undivided words in particular become serious problems in computer files and in the retrieval of information.

A question may be raised as to why we need kanji processing in spite of these problems; why isn't computer handling of alphanumerics and kana, which is in use today, sufficient? The answer is mainly that kanji possess a definite visual effect. Also, if only romanized languages or kana alone are used, many homonyms may make the meaning ambiguous.
While it is quite possible to write Japanese only in kana or in romanized forms, as proven by the systems in use, it is better, for efficiency and precision, to express the language in the way it is actually written. As for the problem of word division, study is in progress on methods of dividing words systematically and automatically, incorporating the latest research in the field of applied linguistics. This is more concerned with the development of software, and this paper will not delve into it.

Inputting

Various Japanese approaches to inputting kanji and kana are organized below into six major groupings according to different inputting devices. They are: (1) full keyboard, (2) component pattern input, (3) kana keyboard, (4) stenotype, (5) optical character recognition, and (6) voice recognition. These six methods are further divided into subvariations as shown in table 1.[4]

[Table 1. Input systems. Major approaches, variations, and subvariations: full keyboard (kanji teletypewriter; Japanese typewriter: character location, coded-plate scanning, coded typeface, modified coded typeface; tablet style: electromagnetic, electrostatic, photoelectric); component pattern input; kana keyboard (two-key stroke: location correspondence, association memory; display selection; kana-kanji conversion: word conversion, sentence conversion); stenotype; optical character recognition; voice recognition. For each, the table gives the training needed (small to extensive), input speed (roughly 20 to 120 characters per minute), and the number of characters accommodated (roughly 1,000 to 4,096).]

Full Keyboard

The main feature of this approach is the use of a full character keyboard as the inputting device. The operator uses the full character keyboard rather than codes or other symbols.
The keyboard varies depending on models, usually consisting of frequently used kanji and both sets of kana, supplemented by Arabic numerals, Roman, Cyrillic, and Greek alphabets in upper and lower cases, often with italics, signs, and diacritical marks. To each character, a two-byte binary code (expressed by a four-digit numeral) is assigned, so that when the inputter types a character the code for the character is punched on paper or cassette tape.

Kanji Teletypewriter

The oldest method for kanji inputting, still widely in use, is the kanji teletypewriter system or multishift system. One variation of this approach, developed by the National Diet Library at an early stage of its computerization, has 192 character keys, each having fourteen characters in three columns and five lines, as shown in figure 1. In addition, there are fourteen selection keys arranged in three columns and five rows on the lower left of the keyboard to correspond to the pattern of characters on each character key. When an operator strikes character key B with the right hand and selection key A with the left hand at the same time, the code for character C is punched on the tape.

[Figure 1. Kanji teletypewriter keyboard of the National Diet Library.]
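As a rough illustration of the multishift idea, the sketch below uses a hypothetical character-key layout and hypothetical two-byte codes (the glyphs, key number, and code values are inventions for illustration, not the National Diet Library's actual assignments): a simultaneous character-key and selection-key stroke picks one glyph and yields the four-digit numeral punched for it.

```python
# Sketch of multishift (kanji teletypewriter) input. Assumptions:
# a hypothetical character key holding up to 14 glyphs, a selection
# key numbered 0-13 picking one of them, and hypothetical two-byte
# codes written as four-digit numerals.

# Hypothetical layout of one character key (14 glyphs).
CHAR_KEY_42 = ["日", "本", "語", "図", "書", "館", "情", "報",
               "処", "理", "計", "算", "機", "学"]

# Hypothetical code table: glyph -> two-byte code.
CODES = {glyph: 0x2120 + i for i, glyph in enumerate(CHAR_KEY_42)}

def keystroke(char_key, selection):
    """Simultaneous character-key + selection-key stroke -> punched code."""
    glyph = char_key[selection]
    code = CODES[glyph]
    return glyph, f"{code:04X}"  # code punched as a four-digit numeral

glyph, code = keystroke(CHAR_KEY_42, 2)  # selection key 2 picks "語"
```

The layout mirrors the described geometry: 192 such character keys, each disambiguated by one of fourteen selection keys struck with the other hand.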
Included on this keyboard are:

  Kanji: 2,006
  Kana: 90
  Western alphabets: 144
  Numerals: 20
  Symbols and marks: 210
  Kanji patterns: 40
  Kanji components: 139
  Space: 1
  Total: 2,650[6]

By using shift keys on the upper left of the keyboard, kana in both styles and alphabets in upper and lower cases can be input. For satisfactory operation, the keyers must be professionally trained, and it is said that one to three months are necessary for them to be fully trained and able to input an average of fifty to sixty kanji per minute. This is not as fast as most other methods discussed.

Japanese Typewriter

The second of the full keyboard approaches is the Japanese typewriter method, which uses a modification of the standard Japanese typewriter with a tray filled with kanji printing types. The operator finds a character in the tray and punches it by moving a metal handle as the type bar is punched down to print the character. This is rather primitive and different in its operation from the English typewriter, which uses the ten-finger touch method. There are four variations:

Character location method. Kanji are arranged on a keyboard by their codes, so that when a key is punched, the kanji is typed on regular paper as if it had been done by a regular Japanese typewriter. At the same time, the code is automatically read from the location of the key and is punched on tape.

Coded-plate scanning method. Each type bar has a plate attached on its side, and the code for the character is marked on the plate. When a key is typed, the kanji is printed on paper and the code from the plate is optically scanned at the same time.

Coded typeface method. Each typeface is made with a character on the upper half and a code for it on the lower half. When a key is typed, both the character and the code are printed. The code on the bottom half is optically scanned from the printed paper.

Modified coded typeface method.
Instead of typing both characters and codes on the paper, this method prints only the characters on the front of the paper and, at the same time, prints a bar code on the back of the paper. The machine capable of doing this is complicated. The size of the character on a typeface can be bigger than in the variation above, and the bar code can be larger to make the scanning of the code easier and more precise.

As the discussion of the four variations indicates, the Japanese typewriter offers the advantage of being able to monitor input at the time of keying. Since the Japanese typewriter has been in use for a long time in offices where a quantity of official documents are dealt with, and since ordinary Japanese typists can use this system without any additional training, the use of equipment similar in operation was considered advantageous. However, it should be noted that Japanese typewriters have never become as prevalent as English typewriters, and the demand for computers comes from more areas than just those where Japanese typewriters are used. For this reason, the use of Japanese typewriters is not as advantageous as its proponents claim. An obvious disadvantage is its slow speed of operation: thirty to fifty characters per minute on the average. Another disadvantage is that the number of characters on the keyboard is limited to about 3,000.

Tablet Style

This method, also known as the pen-touch method, was recently developed. Each character has a key, and characters are arranged in a certain order. The location of the characters on a matrix sheet determines the two-byte binary code, which consists of a two-digit numerical abscissa and a two-digit numerical ordinate. The operator touches the key with a pen-shaped detector and the code for the character is punched on the paper tape. The operation is one-handed, requiring only a light touch of the key by a detector.
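The tablet's code assignment (the position on the matrix sheet gives a two-digit abscissa followed by a two-digit ordinate) can be sketched as follows; the coordinate values used are illustrative, not taken from any actual tablet layout.

```python
# Sketch of the pen-touch (tablet) code assignment: a character's
# position on the matrix sheet determines its punched code, a
# two-digit abscissa followed by a two-digit ordinate.

def location_code(abscissa, ordinate):
    """Compose the four-digit code punched for a key at (abscissa, ordinate)."""
    if not (0 <= abscissa <= 99 and 0 <= ordinate <= 99):
        raise ValueError("coordinates must be two-digit numbers")
    return f"{abscissa:02d}{ordinate:02d}"

# Touching the key at abscissa 23, ordinate 7 punches code "2307".
assert location_code(23, 7) == "2307"
```

Because the code is read directly from the key's position, no code table needs to be memorized, which is why this method requires no special training.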
Keys are on one flat keyboard and are color-coded by sections to make it easier for the operator to locate them. Light-touch operation reduces operator fatigue. This method does not require special training. However, the number of kanji on a keyboard of reasonable size is limited to approximately 3,500. By shifting, twice as many characters can be handled, though not all characters are indicated on the keyboard. Speed of input is not very high: thirty to seventy characters per minute. This system, already used in many libraries, is becoming increasingly popular because of its easy operation. There are three different technologies used: electromagnetic, electrostatic, and photoelectric. There are no differences in actual input operation among these electronically different methods.

Component Pattern Input

Although not a full keyboard method, component pattern input is closely related to these methods. The idea behind this approach is that most kanji are composed of one or more basic component units, two or more of which can be put together into one kanji according to one predetermined pattern out of forty general patterns. The inputting device has keys for those forty patterns along with keys for individual components on a special keyboard. To compose a kanji, a key for an appropriate pattern is selected and typed, and components are chosen to fill each individually numbered block of the selected pattern, following the established order.[7] Each pattern has a code, and so does each component. When a key is typed, the code is punched on a paper tape, as shown in figure 2. There are cases where a kanji with two components can be a component of another kanji, as shown in the first and second examples in figure 2. A kanji is constructed by punching at least three codes: one for a pattern and at least two for components.
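The pattern-plus-components scheme can be sketched as a small lookup. All codes below (the pattern code, the component codes, and the resulting kanji code) are hypothetical values for illustration, not the actual assignments of any deployed system.

```python
# Sketch of component pattern input: a kanji is punched as one
# pattern code plus two or more component codes, and a kanji
# dictionary maps that combination to the kanji's own two-byte code.
# All code values here are hypothetical.

PATTERN_LEFT_RIGHT = "2804"               # hypothetical left/right pattern
COMPONENTS = {"木": "3813", "寸": "2723"}  # hypothetical component codes

# Hypothetical dictionary: punched code sequence -> kanji code.
KANJI_DICT = {("2804", "3813", "2723"): "8118"}

def punch(pattern, *components):
    """The sequence of codes punched on tape for one composed kanji."""
    return (pattern,) + components

def to_kanji_code(punched):
    """Dictionary conversion to the kanji's own two-byte code."""
    return KANJI_DICT[punched]

seq = punch(PATTERN_LEFT_RIGHT, COMPONENTS["木"], COMPONENTS["寸"])
assert to_kanji_code(seq) == "8118"
```

The second function corresponds to the dictionary stored on the magnetic drum: the multi-code sequence exists only on the input tape, and downstream processing sees a single two-byte code per kanji.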
Then a kanji dictionary consisting of several thousand master-code combinations (see figure 3) is stored on a magnetic drum, and the several codes punched on paper or cassette tape to compose a kanji are converted through this dictionary to a two-byte binary code assigned to that particular kanji. These are then handled like other kanji with an individual code.

[Figure 2. Component pattern input.]

[Figure 3. Kanji dictionary.]

Though this can be a stand-alone approach to inputting kanji, the principle has been adopted by the National Diet Library to supplement the inputting of kanji on the full keyboard kanji teletypewriter. The National Diet Library uses this system when inputting kanji that are not included on its keyboard. Instead of having a special separate keyboard, the kanji teletypewriter of the National Diet Library integrates patterns and components as equivalents to other characters. Its keyboard includes forty patterns and approximately 140 components.

This was the most elementary approach to computerizing kanji. Conceived in the early developmental stage of kanji processing, it used one of the characteristics of kanji, the composition from several components. In actual situations, this technique requires at least three key strokes for one kanji and consumes time to locate the needed component on the keyboard.
Furthermore, it requires the complicated extra step of putting input codes through a kanji dictionary to combine component codes into a code per kanji. No library is currently using this system by itself.

Kana Keyboard System

The keyboard of a Japanese syllabary typewriter has adapted the conventional English typewriter keyboard and has standard Roman-alphabet keys that contain katakana in shift (figure 4). Since the number of katakana exceeds that of Roman letters, the katakana keys are extended to the keys for numerals and punctuation marks. This means that this typewriter can be used either for kana or for Roman letters by changing its mode.

[Figure 4. Kana typewriter keyboard.]

Two-Key Stroke Method

This variation of the kana keyboard system is referred to as the two-key stroke system, and uses kana as codes, not as letters. Roman letters can be used as codes, too. There are two different subvariations:

Location correspondence. Keys are divided into two sections: one for the right hand, and the other for the left hand. If two keys are to be stroked, there will be four possible combinations of key strokes: (1) left hand twice, (2) left and right, (3) right and left, and (4) right twice. The keyboard is accompanied by a kanji table in which characters are arranged in several blocks and in a certain order within each block. Each block, which contains twenty-six kanji in a four-by-six arrangement, is made according to each combination of strokes: the first block is left and left; the second block is left and right, etc. Within each block, the ordinate consists of keys for the first stroke and the abscissa for the second. A kanji at the intersection of the two indicates which keys are to be typed. When kanji A is to be typed (see figure 5), since it is in block A, indicating the stroke combination left and left, the operator types A and W with the left hand. If kanji B is to be typed, the operator types key A with the left hand and key P with the right.
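In the two-key stroke scheme each stroke contributes one byte, and the pair forms the kanji's two-byte code; shifting changes a bit to reach a second table of kanji. A minimal sketch, with hypothetical one-byte key codes:

```python
# Sketch of the two-key-stroke composite code. Assumptions: each key
# carries a hypothetical one-byte code; the first stroke supplies the
# high byte and the second the low byte of a kanji's two-byte code;
# shifting flips the top bit to select a second kanji table.

KEY_CODES = {"Q": 0x31, "W": 0x32, "A": 0x41, "P": 0x50}  # hypothetical

def composite_code(first_key, second_key, shifted=False):
    """Two strokes -> one two-byte code, as a four-digit numeral."""
    code = (KEY_CODES[first_key] << 8) | KEY_CODES[second_key]
    if shifted:
        code |= 0x8000  # changed bit selects the alternate kanji table
    return f"{code:04X}"

assert composite_code("A", "P") == "4150"
assert composite_code("A", "P", shifted=True) == "C150"
```

Because the code is produced by touch-typed strokes rather than by searching a keyboard, this is the fastest method described, at the cost of the operator memorizing the block arrangement or the kana associations.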
Each key has a byte code, and a combination of two key strokes makes a composite, a two-byte binary code, for a kanji. The bit may be changed by shifting, and different kanji can be typed if another table is prepared for kanji with different bits.

[Figure 5. Kanji table for location correspondence method.]

Association memory method. In this method, each kanji is given two kana which usually represent a reading of that kanji. The operator associates a kanji to be input with the two kana assigned to that kanji, and types them with two strokes using the kana keys.

Both of the key-stroke methods are economical as well as convenient because of the wide availability of kana typewriters. Mainly for that reason, both of these systems have been well accepted and are expected to grow further. Since this touch method does not require the operator to look for the character on the keyboard, it is the fastest to operate and is considered suitable for input in quantity. It is possible to input 60 to 120 characters per minute. The only drawback is that the operator must get acquainted with the arrangement of kanji in the first variation, and must memorize all the associated kana spellings for many kanji in the case of the second variation. In either case, the operator must be professionally trained. The Japan Information Center for Science and Technology, which indexes many scientific publications, employs a vendor who uses the location correspondence variation of this system for inputting information.

Display Selection

This also uses a kana typewriter, with a screen in front.
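Display selection can be sketched with a hypothetical homophone table: the kana spelling retrieves every kanji writing that shares the sound, and the operator picks the intended one. The readings and candidates below are ordinary dictionary facts, but the table itself and function names are inventions for illustration.

```python
# Sketch of display selection. Assumption: a hypothetical table
# mapping a kana spelling to all kanji writings sharing that sound.

HOMOPHONES = {
    "こうえん": ["公園", "講演", "公演", "後援"],  # all read "kōen"
}

def candidates(kana_word):
    """The group of kanji displayed on screen for a kana spelling."""
    return HOMOPHONES.get(kana_word, [])

def select(kana_word, choice):
    """Operator picks the intended kanji, e.g. with a light pen."""
    return candidates(kana_word)[choice]

assert select("こうえん", 1) == "講演"
```

The homonym problem noted earlier is exactly what makes this selection step necessary: kana alone cannot distinguish the four candidates above.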
When a word is typed in kana, a group of kanji with that sound are displayed on the screen. The operator chooses the right kanji with a light pen, a slow but accurate operation. The operator does not have to be specially trained for this.

Kana-Kanji Conversion

In contrast to the conventional approach of full keyboard inputting, an entirely new method for inputting kanji is gaining popularity as the availability of sophisticated software increases. This uses a kana typewriter keyboard to input Japanese in syllabary or romanized form, converting it to kanji by software. There are two ways of conversion: one converts word by word, and the other sentence by sentence.

Stenotype

The stenotype is a typewriterlike device. The operator must be able to take shorthand. When the stenotype is used, it punches words on paper tapes. Therefore, inputting is high speed. However, the operator must receive proper training.

Optical Character Recognition

This system, developing quickly and expected to gain wider use, can scan a maximum of 2,500 printed kanji.[8] One variation connects a writing tablet to a computer so that as the operator writes kanji on the tablet, the computer scans them in stroke order. This function of scanning by stroke order is considered to be an advantage for processing some types of Japanese documents. The drawbacks are that the system is still very expensive, and the number of recognizable characters is fewer than 2,000.

Voice Recognition

This is an oral-visual system, in which the human voice is read by a computer. Obviously the most difficult to develop, this system is still in an experimental stage. However, a prototype has been demonstrated at various exhibitions, and the system apparently possesses great potential.

Summary

Pattern configuration and output devices for Japanese characters are basically the same as those for English.
However, the pattern generation of characters is mechanically more complicated than that of the Roman alphabet, because kanji have a more complicated structure than Roman letters and the number of components is greater. Each kanji is represented by a two-byte binary code rather than one byte as in the Roman alphabet. Because of this, the efficiency of retrieval is low. Presently, hard copy and typesetting for printing of hard copy are the major output forms, and very little on-line retrieval of information with kanji is in current operation.

Problems Particular to Kanji Processing

Among the numerous problems in processing kanji through computers, the major ones are: (1) which kanji are to be included; (2) how many characters are to be handled; (3) what code should be assigned and how it should be arranged on the keyboard or table; and (4) how the kanji not included on the keyboard should be treated.

In the early stage of kanji computer development, different institutions handled the problems in ways best suited to their individual needs, according to the nature of the literature covered, the amount of literature processed, and the kinds of output needed. They experimented with the then best available capabilities. As a result, the finished systems are all independent and mutually incompatible. Standardization is obviously necessary for exchange of information among the systems. In order to set standards for selection of characters and assignment of codes, JIS (Japan Industrial Standard) C6226-1978 has been compiled by the Japan Association for Development of Information Processing. This is a table of characters designed for information exchange (a portion of which is shown in figure 6). It has a one-byte code as its abscissa and another as its ordinate. Characters are arranged so that the intersection of abscissa and ordinate determines a kanji whose code consists of four numerals, two from the abscissa and two from the ordinate.
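The four-numeral code composition can be sketched as follows. The 94-by-94 extent is the table size JIS C6226 (today's JIS X 0208) actually uses; the specific coordinates in the example are illustrative.

```python
# Sketch of a JIS C6226-style code assignment: a character's code is
# four numerals, two from the abscissa (row) and two from the
# ordinate (column) of the 94-by-94 table.

def jis_code(row, column):
    """Compose a four-numeral character code from table coordinates."""
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError("JIS C6226 coordinates run from 1 to 94")
    return f"{row:02d}{column:02d}"

# The character at row 16, column 1 has code "1601".
assert jis_code(16, 1) == "1601"
```

Because every character's code is derived from its fixed position in one published table, two systems that both adopt the standard can exchange coded text without a private conversion dictionary.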
Included in the table are kana in both styles; Roman, Greek, and Cyrillic alphabets in upper and lower cases; diacritical marks; numerals; and punctuation marks, as follows:

1. Special characters: 108
2. Numerals (Arabic): 10
3. Roman alphabets: 52
4. Hiragana: 83
5. Katakana: 86
6. Greek alphabets: 48
7. Cyrillic alphabets: 66
8. Kanji: 6,349
Total: 6,802[9]

In the first section of the table, numerals, alphabets, kana, and special characters are grouped. In the second section, a total of 2,965 frequently used kanji are arranged as the first-priority group, and an additional 3,384 kanji are selected as the second group[10] in the bottom half of the table. Kanji are printed in the preferred style for printing typeface. This table will resolve problems 1 to 3 mentioned above. Institutions that had arranged their own codes for kanji, including the National Institute of Japanese Literature, are now automatically translating their own codes into JIS codes.

In cases where needed kanji are not included on the keyboard, handling varies. With the Japanese typewriter, because each kanji is inscribed on a typeface, only the kanji on that typeface is printed when the type bar is stroked. Therefore, only kanji that have typefaces can be input in this system, while some other handling is possible in other methods.

[Figure 6. Code of the Japanese graphic character set for information interchange (portion).]

While the number of characters that can be accommodated on keyboards is limited to 2,000 to 3,500, depending on the type of equipment, character generators have the capability of outputting more than the number of characters on the keyboard. Figure 7 shows their relationship. Characters that are in the generator but not on the keyboard must frequently be processed, because the number of characters needed for most documents could reach 6,000 to 6,500. Using a shift key to enter another mode is a fairly common technique for inputting uncommon kanji. The keyboard may not have a character but, if the character generator has it, the code for that character can be input by shifting. For example, if a character on the keyboard has the code 0117, a bit is changed so the code 8117 can be typed by shifting and typing that key. If the code 8117 is assigned to another kanji not on the keyboard but indexed in the dictionary, it can be input. This applies for the kanji teletypewriter, tablet style, and the two-key stroke variations of the kana typewriter.
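Read as hexadecimal, the change from 0117 to 8117 is simply the top bit of the two-byte code being set; a minimal sketch, assuming the four-digit codes are hexadecimal (the article does not state the base, so this reading is an assumption):

```python
# Sketch of the shift-key bit change: shifting sets the high bit of a
# two-byte code, producing the code of a kanji that is indexed in the
# code dictionary but absent from the keyboard.
# Assumption: the four-digit codes (0117, 8117) are hexadecimal.

def shifted_code(code):
    """Set the high bit of a two-byte code (e.g. 0117 -> 8117)."""
    return code | 0x8000

assert f"{shifted_code(0x0117):04X}" == "8117"
```

One keyboard position thus addresses two dictionary entries, which is how a keyboard of 2,000 to 3,500 keys can reach a character generator holding far more.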
In the kanji teletypewriter system used by the National Diet Library, the keyboard accommodates 2,650 characters, while its character generator has the capability for 5,717. Operators in the National Diet Library input kanji that are not on the keyboard by using the component pattern input method. Or, if the operator finds the kanji code in the specially compiled dictionary in which codes for kanji are indexed, a shift key is used to change the bit, thus creating the code for a kanji not on the keyboard. Most other tablet systems use code dictionaries. In the two-key stroke variations of kana typewriters, tables of kanji for second and third or more shifts can be built, especially when the location association method is used.

[Figure 7. Kanji creating capability: keyboard characters, within character generator capability, within system capability, outside system capability.]

The handling of kanji that are not in character generators is more difficult. Only the digital character generator, the kind that uses either dot or stroke, can add characters fairly easily. In the flying-spot system, characters can be added, but it must be done professionally with an additional character cylinder and is very costly. The National Diet Library, which now uses flying spot, limits addition of kanji to a minimum. Because its output is solely in printed book form, the National Diet Library inputs a fill character for kanji not in the system. When the phototypeset masters are made, the fill characters are replaced by typeset characters. The use of a fill character suffices only when the output is phototypeset, because there is a step to replace fill characters by typeface. However, as long as the database includes many fill characters on the magnetic tapes, the on-line retrieval of information or later utilization of tapes becomes unsatisfactory.
the national institute of japanese literature uses a dot matrix and prints by wire-dot impact. if a kanji is not in the character generator, the institute's staff composes the kanji in an enlarged dot matrix and creates the capability for printing in the generator. if the kanji made in such a way is used only once, the kanji pattern is not stored in the character generator, so that the generator does not reach its full capacity quickly. the enlarged dot composite for kanji created in the institute is filed and indexed for future use. most other institutions simply do not use those less commonly used kanji, and substitute kana for them. in addition to the problems common to any character output, such as size and number of dots, the problem of the space for kanji in relation to other characters and the choice of vertical or horizontal printing of japanese sentences with kanji must be considered. kanji have many strokes and, as mentioned before, are expressed by two-byte codes. each kanji needs a double space when displayed on screens or printed. when a kanji is used with numerals or kana, the kanji part looks fine but the numerical part has too much space between each numeral. therefore, input of kanji is done in a kanji mode and input of kana, roman alphabets, and numerals is in a kana-numerical mode. in this way a multidigit figure looks like one whole figure rather than a line of one-digit figures. some formal documents must be printed in the traditional vertical arrangement. to cope with this situation, some line printers have the capability to precompose a vertical page before printing it. there are multicolor crts on the market that can be used for the retrieval of library-related information, e.g., main entry in red, series statement in yellow. one last problem that must be considered is that most of these systems require trained operators, or else the operation is very slow.
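the spacing rule described above can be sketched as a cell-width computation: each kanji occupies a double cell, while characters entered in the kana-numerical mode occupy a single cell. the character range used here is a simplified assumption (cjk unified ideographs only), not a full east-asian-width table:

```python
# sketch of the display spacing rule: kanji (two-byte codes) take a
# double cell on screens and printers; kana-numerical-mode characters
# take a single cell. range is simplified for illustration.

def cell_width(ch: str) -> int:
    """2 cells for a kanji (cjk unified ideographs), 1 otherwise."""
    return 2 if "\u4e00" <= ch <= "\u9fff" else 1

def line_width(text: str) -> int:
    """total number of display cells a string occupies."""
    return sum(cell_width(c) for c in text)

assert line_width("1981") == 4   # numerals: one cell each
assert line_width("漢字") == 4   # two kanji: two cells each
```

this is why mixing modes matters: a multidigit figure typed in kanji mode would be spread over double cells, while in kana-numerical mode it reads as one whole figure.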
the information is edited and compiled by the editors and prepared for input in the form of worksheets. so are the revisions. at various stages of revising the text, the information must be printed, given to the editors, and revised. further developments in simplifying input and revising texts for efficient flow are to be expected. application of kanji systems. processing of vernacular-language materials in their own writing systems is considered vital for research libraries in this country. in adopting the kanji systems in such libraries, there are three major factors that must be considered: the objectives and needs of the institution, the cost, and the personnel. first, the institution must know what it must accomplish by means of such a system. the needs may not be the same for all institutions. is the system for retrieving catalog information, or for inputting catalog and other information? is it for internal processing or patron use? is it for a large bibliographic utility to distribute information to its subscribers, or for an individual institution to process its own information? could the system be shared by the department of asian studies in any way? the character set needs of the institution are a major factor in choosing the system. since input and output devices are different, i.e., one cannot input kanji on a crt and retrieve kanji from the same crt, the institution must consider how much it will need to input, or whether it can rely on available data bases. some institutions may not need any input equipment if they utilize available data bases. if japan marc and other tapes are made accessible by a large bibliographic utility in this country, the institutions will be able to obtain bibliographic information in kanji on the screen. if they want only catalog cards or a com catalog, they will not need any equipment except the terminals supported by the utility.
if they want to input, they must consider what form or forms of output they need and how to create the characters not included in the system, in addition to which system to choose. second, cost is an important factor. is the expense justified in terms of the other needs of the library? what can be accomplished per dollar spent? the kanji systems are still expensive, though the cost will eventually be reduced. how much can be spent and how much continuing support can be expected are factors that modify system expectations. the budget must include not only the one-time hardware cost, but also the software, maintenance, and personnel. third, the availability of personnel will affect the choice of system. what degree of language expertise does the system require in each stage of operation, such as inputting, maintenance, and programming? does it need terminal operators trained in those languages? what other personnel does the system need as far as language-related qualifications are concerned? apart from the three major factors discussed above, there are some technical aspects that must be adjusted to library situations in this country. since japanese, chinese, and korean use the same chinese ideographs to different degrees and in different ways, libraries considering automated processing of these language materials are probably expected to handle all three languages with the same system, to say nothing of the other non-roman scripts. problems will arise in selecting characters for inclusion in the system. as pointed out earlier with regard to japanese character processing, there are simply too many characters for the present capacity of any computer. if the korean and chinese languages are to be handled by the same computer, this problem multiplies. the korean alphabet, called hangul, would have to be included. chinese has more characters than japanese.
worse yet is the fact that some kanji are simplified in different ways in japan and china, so that they are neither recognizable nor interchangeable between them. it will be an enormous task to accommodate both in the same system. another problem is the arrangement and indexing of kanji. if a full keyboard, a japanese typewriter keyboard, or a two-key stroke system, especially its location association method by kana typewriter, is considered for japanese, chinese, and korean, the arrangement of the characters must be indexed and accessed for the three languages, in addition to the multiple readings found in japanese. for example, kanji on the japanese keyboard are usually arranged by the initial sound of the japanese reading of the kanji. this arrangement will be useless for chinese and korean, because japanese readings are not the same as chinese or korean readings. the arrangement of kanji on the keyboards must be on some new principle common to these languages. even if the kana-kanji conversion is used, and roman alphabet-kanji conversion software is adopted, software to handle those three languages must be developed. such software would have to be highly sophisticated. the presence of many homonyms in chinese will cause a great problem to the extent that the system relies on transliterated or romanized forms of the language. recognition of the many identical spellings in different language contexts will be extremely difficult. the above discussion is based on what is currently available in japan. the combination of existing inputting, generating, and outputting equipment developed by japanese technology opens up various possibilities for us to build effective systems in this country. acknowledgment: this article is based on a study conducted in japan as a japan foundation professional fellow, and as a visiting research fellow of the center for research on information and library science, university of tokyo. references 1.
national institute of japanese literature, implementation of a computer system and a kanji handling system at nijl (tokyo: nijl, 1978), p.16. 2. toshio ishiwata, "kanji shori kenkyu ni motomerareru mono" ["requirements for study on kanji processing"], computopia no.9 (1977), p.35. 3. gendai yogo no kiso chishiki, 1980 [basic knowledge on current terms, 1980] (tokyo: jiyukokuminsha, 1980), p.999. 4. figures are taken from the following two sources and compiled by the author: hasegawa, jitsuro, "kanji shori sochi" ["kanji processing devices"], joho shori [information processing] 19, no.4:353 (april 1978); sugai, kazuro, "kanji nyu-shutsuryoku sochi no kaihatsu doko" ["a trend in development of kanji input-output devices"], business communication 16, no.7:41 (1979). 5. used for the pattern input mentioned in the following component pattern input system. 6. national diet library, library automation in the national diet library (tokyo: the library, 1979), p.4. 7. ibid., p.7. 8. asia business consultants is using an optical character recognition system that can scan handwritten kana and numerals on a small scale to input and process catalog information for a library collection. 9. "joho kokan no tame no kanji fugo no hyojunka" ["standardization of kanji code for information interchange"], kagaku gijutsu bunken sabisu [scientific and technical documents service] no.50 (1978), p.29. 10. ibid., p.28. ichiko morita is assistant professor in library administration and head, automated processing division, the ohio state university libraries. editor's notes: most jola readers are aware of significant delays in publication in the last volume. susan k. martin, a former editor of jola, and richard d. johnson, a former editor of college & research libraries, gave freely of their time and energy to bring the journal back on schedule.
mary madden, judith schmidt, and the members of the editorial board under the leadership of charles husbands all worked closely with sue and richard in this effort. this was a second time around for sue, who undertook a similar task when she assumed the jola editorship in 1972. the jola readership and this editor owe debts of gratitude to sue, richard, and all the others who helped. we do not foresee major changes in the format of the journal as established principally under the editorships of kilgour and martin. we look for increased strength in our book reviews section under the editorship of david weisbrod. the addition of tom harnish as assistant editor for video technologies indicates our recognition of the growing importance of video-based information systems. we encourage reader suggestions. we welcome brief communications of successes or failures that might be of interest to other readers. letters to the editor about any of our feature articles or communications are solicited. the next generation library catalog | yang and hofmann. sharon q. yang and melissa a. hofmann. the next generation library catalog: a comparative study of the opacs of koha, evergreen, and voyager. open source has been the center of attention in the library world for the past several years. koha and evergreen are the two major open-source integrated library systems (ilss), and they continue to grow in maturity and popularity. the question remains as to how much we have achieved in open-source development toward the next-generation catalog compared to commercial systems. little has been written in the library literature to answer this question. this paper intends to answer this question by comparing the next-generation features of the opacs of two open-source ilss (koha and evergreen) and one proprietary ils (voyager's webvoyage).
much discussion has occurred lately on the next-generation library catalog, sometimes referred to as the library 2.0 catalog or "the third generation catalog."1 different and even conflicting expectations exist as to what the next-generation library catalog comprises: in two sentences, this catalog is not really a catalog at all but more like a tool designed to make it easier for students to learn, teachers to instruct, and scholars to do research. it provides its intended audience with a more effective means for finding and using data and information.2 such expectations, despite their vagueness, eventually took concrete form in 2007.3 among the most prominent features of the next-generation catalog are a simple keyword search box, enhanced browsing possibilities, spelling corrections, relevance ranking, faceted navigation, federated search, user contribution, and enriched content, just to mention a few. over the past three years, libraries, vendors, and open-source communities have intensified their efforts to develop opacs with advanced features. the next-generation catalog is becoming the current catalog. the library community welcomes open-source integrated library systems (ilss) with open arms, as evidenced by the increasing number of libraries and library consortia that have adopted or are considering open-source options, such as koha, evergreen, and the open library environment project (ole project). librarians see a golden opportunity to add features to a system that would take years for a proprietary vendor to develop. open-source opacs, especially that of koha, seem to be more innovative than their long-established proprietary counterparts, as our investigation shows in this paper. threatened by this phenomenon, ils vendors have rushed to improve their opacs, modeling them after the next-generation catalog. for example, ex libris pushed out its new opac, webvoyage 7.0, in august of 2008 to give its opac a modern touch. one interesting question remains.
in a competition for a modernized opac, which opac is closest to our visions for the next-generation library catalog: open source or proprietary? the comparative study described in this article was conducted in the hope of yielding some information on this topic. for libraries facing options between open-source and proprietary systems, "a thorough process of evaluating an integrated library system (ils) today would not be complete without also weighing the open source ils products against their proprietary counterparts."3 ■■ scope and purpose of the study the purpose of the study is to determine which opac of the three ilss—koha, evergreen, or webvoyage—offers more in terms of services and is more comparable to the next-generation library catalog. the three systems include two open-source ilss and one proprietary ils. koha and evergreen were chosen because they are the two most popular and fully developed open-source ilss in north america. at the time of the study, koha had 936 implementations worldwide; evergreen had 543 library users.4 we chose webvoyage for comparison because it is the opac of the voyager ils by ex libris, the biggest ils vendor in terms of personnel and marketplace.5 it also is one of the more popular ilss in north america, with a customer base of 1,424 libraries, most of which are academic.6 as the sample only includes three ilss, the study is very limited in scope, and the findings cannot be extrapolated to all open-source and proprietary catalogs. but, hopefully, readers will gain some insight into how much progress libraries, vendors, and open-source communities have achieved toward the next-generation catalog. ■■ literature review a review of the library literature found two relevant studies on the comparison of opacs in recent years. the first study was conducted by two librarians in slovenia investigating how much progress libraries had made toward the next-generation catalog.7 (sharon q. yang (yangs@rider.edu) is systems librarian and melissa a. hofmann (mhofmann@rider.edu) is bibliographic control librarian, rider university. information technology and libraries, september 2010.) six online catalogs were examined and evaluated, including worldcat, the slovene union catalog cobiss, and those of four public libraries in the united states. the study also compared services provided by the library catalogs in the sample with those offered by amazon. the comparison took place primarily in six areas: search, presentation of results, enriched content, user participation, personalization, and web 2.0 technologies applied in opacs. the authors gave a detailed description of the research results, supplemented by tables and snapshots of the catalogs in comparison. the findings indicated that "the progress of library catalogues has really been substantial in the last few years." specifically, the library catalogues have made "the best progress on the content field and the least in user participation and personalization." when compared to services offered by amazon, the authors concluded that "none of the six chosen catalogues offers the complete package of examined options that amazon does."8 in other words, library catalogs in the sample still lacked features compared to amazon. the other comparative study was conducted by linda riewe, a library school student, in fulfillment of the requirements for her master's degree from san jose state university. the research described in her thesis is a questionnaire survey, targeted at 361 libraries, that compares open-source (specifically, koha and evergreen) and proprietary ilss in north america. more than twenty proprietary systems were covered, including horizon, voyager, millennium, polaris, innopac, and unicorn.9 only a small part of her study was related to opacs.
it involved three questions about opacs and asked librarians to evaluate the ease of use of their ils opac's search engines, their opac search engine's completeness of features, and their perception of how easy it is for patrons to make self-service requests online for renewals and holds. a scale of 1 to 5 was used (1 = least satisfied; 5 = very satisfied) regarding the three aspects of opacs. the mean and median satisfaction ratings for open-source opacs were higher than those of proprietary ones. koha's opac was ranked 4.3, 3.9, and 3.9, respectively, in mean, the highest on the scale in all three categories, while the proprietary opacs were ranked 3.9, 3.6, and 3.6.10 evergreen fell in the middle, still ahead of proprietary opacs. the findings reinforced the perception that open-source catalogs, especially koha, offer more advanced features than proprietary ones. as riewe's study focused more on the cost of and user satisfaction with ilss, it yielded limited information about the connected opacs. no comparative research has measured the progress of open-source versus proprietary catalogs toward the next-generation library catalog; therefore the comparison described in this paper is the first of its kind. as only koha, evergreen, and voyager's opacs are examined in this paper, the results cannot be extrapolated. studies on a larger scale are needed to shed light on the progress librarians have made toward the next-generation catalog. ■■ method the first step of the study was identifying and defining a set of measurements by which to compare the three opacs. a review of library literature on the next-generation library catalog revealed different and somewhat conflicting points of view as to what the next-generation catalog should be. as marshall breeding put it, "there isn't one single answer.
we will see a number of approaches, each attacking the problem somewhat differently."11 this study decided to use the most commonly held visions, which are summarized well by breeding and by morgan's lita executive summary.12 the ten parameters identified and used in the comparison were taken primarily from breeding's introduction to the july/august 2007 issue of library technology reports, "next-generation library catalogs."13 the ten features reflect some librarians' visions for a modern catalog. they serve as additions to, rather than replacements of, the feature sets commonly found in legacy catalogs. the following are the definitions of each measurement: ■■ a single point of entry to all library information: "information" refers to all library resources. the next-generation catalog contains not only bibliographic information about printed books, video tapes, and journal titles but also leads to the full text of all electronic databases, digital archives, and any other library resources. it is a federated search engine for one-stop searching. it not only allows for one search leading to a federation of results, it also links to full-text electronic books and journal articles and directs users to printed materials. ■■ state-of-the-art web interface: library catalogs should be "intuitive interfaces" and "visually appealing sites" that compare well with other internet search engines.14 a library's opac can be intimidating and complex. to attract users, the next-generation catalog looks and feels similar to google, amazon, and other popular websites. this criterion is highly subjective, however, because some users may find google and amazon anything but intuitive or appealing. the underlying assumption is that some internet search engines are popular, and a library catalog should be similar in order to be popular itself. ■■ enriched content: breeding writes, "legacy catalogs tend to offer text-only displays, drawing only on the marc record.
a next-generation catalog might bring in content from different sources to strengthen the visual appeal and increase the amount of information presented to the user."15 the enriched content includes images of book covers, cd and movie cases, tables of contents, summaries, reviews, and photos of items that traditionally are not present in legacy catalogs. ■■ faceted navigation: faceted navigation allows users to narrow their search results by facets. the types of facets may include subjects, authors, dates, types of materials, locations, series, and more. many discovery tools and federated search engines, such as villanova university's vufind and innovative interfaces' encore, have used this technology in searches.16 auto-graphics also applied this feature in their opac, agent iluminar.17 ■■ simple keyword search box: the next-generation catalog looks and feels like popular internet search engines. the best example is google's simple user interface. that means that a simple keyword search box, instead of a controlled vocabulary or specific-field search box, should be presented to the user on the opening page, with a link to an advanced search for users in need of more complex searching options. ■■ relevancy: traditional ranking of search results is based on the frequency and positions of terms in bibliographic records during keyword searches. relevancy has not worked well in opacs. in addition, popularity is another factor that has not been taken into consideration in relevancy ranking. for instance, "when ranking results from the library's book collection, the number of times that an item has been checked out could be considered an indicator of popularity."18 by the same token, the size and font of tags in a tag cloud or the number of comments users attach to an item may also be considered relevant in ranking search results.
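the broader relevancy vision just described, a text-match score combined with a popularity bump from circulation counts, can be sketched as follows. the weighting and record shape are illustrative assumptions, not how any of the three opacs actually ranks:

```python
# sketch of popularity-aware relevancy ranking: a crude term-frequency
# base score plus a bump proportional to checkout counts. the 0.1
# weight is an arbitrary illustration.

def score(record: dict, query_terms: list, popularity_weight: float = 0.1) -> float:
    text = record["title"].lower()
    base = sum(text.count(t.lower()) for t in query_terms)  # term frequency
    return base + popularity_weight * record.get("checkouts", 0)

records = [
    {"title": "python programming", "checkouts": 0},
    {"title": "programming in python", "checkouts": 40},
]
ranked = sorted(records, key=lambda r: score(r, ["python"]), reverse=True)
assert ranked[0]["checkouts"] == 40  # on a text-score tie, the popular item wins
```

tag-cloud size or comment counts could feed into the same bump term in place of checkouts.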
so far, almost no opacs are capable of incorporating circulation statistics into relevancy ranking. ■■ "did you mean . . . ?": when a search term is not spelled correctly or nothing is found in the opac in a keyword search, the spell checker will kick in and suggest the correct spelling or recommend a term that may match the user's intended search term. for example, a modern catalog may generate a statement such as "did you mean . . . ?" or "maybe you meant . . . ." this may be a very popular and useful service in modern opacs. ■■ recommendations and related materials: the next-generation catalog is envisioned as promoting reading and learning by making recommendations of additional related materials to patrons. this feature is an imitation of amazon and other websites that promote selling by stating "customers who bought this item also bought . . . ." likewise, after a search in the opac, a statement such as "patrons who borrowed this book also borrowed the following books . . ." may appear. ■■ user contribution—ratings, reviews, comments, and tagging: legacy catalogs only allow catalogers to add content. in the next-generation catalog, users can be active contributors to the content of the opac. they can rate, write reviews, tag, and comment on items. user contribution is an important indicator of use and can be used in relevancy ranking. ■■ rss feeds: the next-generation catalog is dynamic because it delivers lists of new acquisitions and search updates to users through rss feeds. modern catalogs are service-oriented; they do more than provide a simple display of search results. the second step is to apply these ten visions to the opacs of koha, evergreen, and webvoyage to determine if they are present or absent. the opacs used in this study included three examples from each system: product demos and live catalogs randomly chosen from the user lists on the product websites.
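the "did you mean . . . ?" behavior defined above, suggesting a close spelling while leaving the choice to the user rather than silently rewriting the query, can be sketched with a fuzzy match against the indexed vocabulary. the term list and cutoff are illustrative assumptions; `difflib` stands in for a real spell checker:

```python
# sketch of a "did you mean . . . ?" suggestion: when a term is not in
# the index, offer the closest indexed term instead of normalizing the
# query behind the user's back.

import difflib

INDEXED_TERMS = ["women", "homosexuality", "library", "catalog"]  # toy index

def did_you_mean(query: str):
    """return a suggested term, or None when the query matches the index."""
    if query in INDEXED_TERMS:
        return None
    matches = difflib.get_close_matches(query, INDEXED_TERMS, n=1, cutoff=0.6)
    return matches[0] if matches else None

assert did_you_mean("homoexuality") == "homosexuality"  # missing "s" still matches
assert did_you_mean("library") is None                  # exact hit: no suggestion
```

offering the suggestion rather than applying it preserves deliberate spellings such as "womyn," the case discussed later in the findings.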
the latest releases at the time of the study were koha 3.0, evergreen 2.0, and webvoyage 7.1. in case of discrepancies between product descriptions and reality, we gave precedence to reality over claims. in other words, even if the product documentation lists and describes a feature, this study does not include it if the feature is not in action in either the demo or live catalogs. despite the fact that a planned future release of one of the investigated opacs may add a feature, this study only recorded what existed at the time of the comparison. the following are the opacs examined in this paper. koha ■■ koha demo for academic libraries: http://academic.demo.kohalibrary.com/ ■■ wagner college: http://wagner.waldo.kohalibrary.com/ ■■ clearwater christian college: http://ccc.kohalibrary.com/ evergreen ■■ evergreen demo: http://demo.gapines.org/opac/en-us/skin/default/xml/index.xml ■■ georgia pines: http://gapines.org/opac/en-us/skin/default/xml/index.xml ■■ columbia bible college: http://columbiabc.evergreencatalog.com/opac/en-ca/skin/default/xml/index.xml webvoyage ■■ rider university libraries: http://voyager.rider.edu ■■ renton college library: http://renton.library.ctc.edu/vwebv/searchbasic ■■ shoreline college library: http://shoreline.library.ctc.edu/vwebv/searchbasic the final step includes data collection and compilation. a discussion of findings follows. the study draws conclusions about which opac is more advanced and has more features of the next-generation library catalog. ■■ findings each of the opacs of koha, evergreen, and webvoyage is examined for the presence of the ten features of the next-generation catalog. single point of entry for all library information: none of the opacs of the three ilss provides true federated searching.
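the single-point-of-entry criterion being tested here amounts to fanning one query out to several sources and merging the result lists. a minimal sketch with in-memory stand-ins for the catalog and an article database; the data and field names are illustrative assumptions, not any real api:

```python
# sketch of federated ("one-stop") search: one query against several
# sources, results merged into a single list. sources are toy data.

CATALOG = [{"title": "unix update", "source": "catalog"}]
DATABASES = [{"title": "unix update (full text)", "source": "serials solutions"}]

def federated_search(query: str) -> list:
    """collect matching records from every configured source."""
    hits = []
    for source in (CATALOG, DATABASES):
        hits += [r for r in source if query.lower() in r["title"].lower()]
    return hits

results = federated_search("unix")
assert {r["source"] for r in results} == {"catalog", "serials solutions"}
```

what the three opacs actually do falls short of this: at best (koha), the catalog record links out to the full-text source rather than searching it.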
to varying degrees, each is limited in access, showing an absence of content from electronic databases, digital archives, and other sources that generally are not located in the legacy catalog. of the three, koha is more advanced. while webvoyage and evergreen only display journal-holdings information in their opacs, koha links journal titles from its catalog to proquest's serials solutions, thus leading users to full-text journals in the electronic databases. the example in figure 1 (koha demo) shows the journal title unix update with an active link to the full-text journal in the availability field. the link takes patrons to serials solutions, where full text at the journal-title level is listed for each database (see figure 2). each link will take you into the full text in each database. state-of-the-art web interface: as beauty is in the eye of the beholder, the interface of a catalog can be appealing to one user but prohibitive to another. with this limitation in mind, the out-of-the-box user interface at the demo sites was considered for each opac. all three catalogs have google-like simplicity in presentation. all of the user interfaces are highly customizable; it largely depends on the library to make the user interface appealing and welcoming to users. figures 3–5 show snapshots from each ils's demo site and have not been customized. however, there are a few differences in the "state of the art." for one, koha's navigation between screens relies solely on the browser's forward and back buttons, while webvoyage and evergreen have internal navigation buttons that more efficiently take the user between title lists, headings lists, and record displays, and between records in a result set. while all three opacs offer an advanced search page with multiple boxes for entering search terms, only webvoyage makes the relationship between the terms in different boxes clear.
by the use of a drop-down box, it makes explicit that the search terms are by default anded and also allows for the selection of or and not. in koha's and evergreen's advanced search, however, the terms are anded only, a fact that is not at all obvious to the user. in the demo opacs examined, there is no option to choose or or not between rows, nor is there any indication that the search is anded. the point of providing multiple search boxes is to guide users in constructing a boolean search without their having to worry about operators and syntax. in koha, however, users have to type an or or not statement themselves within the text box, thus defeating the purpose of having multiple boxes. while evergreen allows for a not construction within a row ("does not contain"), it does not provide an option for or ("contains" and "matches exactly" are the other two options available). see figures 6–8. (figure 1. link to full-text journals in serials solutions in koha. figure 2. links to serials solutions from koha.) thus koha's and evergreen's advanced search is less than intuitive for users and certainly less functional than webvoyage's. enriched content: to varying degrees, enriched content is present in all three catalogs, with koha providing the most. while all three catalogs have book covers and movie-container art, koha has much more in its catalog. for instance, it displays tags, descriptions, comments, and amazon reviews. webvoyage displays links to google books for book reviews and content summaries but does not have tags, descriptions, and comments in the catalog. see figures 9–11. faceted navigation: the koha opac is the only catalog of the three to offer faceted navigation. the "refine your search" feature allows users to narrow search results by availability, places, libraries, authors, topics, and series.
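the multiple-box boolean search compared above can be sketched as a query builder: each row carries a term and an explicit operator (webvoyage-style drop-down), so the relationship between boxes never has to be guessed, whereas implicit and-only joining (the koha/evergreen behavior) forces users to type operators themselves. the query syntax here is an illustrative assumption:

```python
# sketch of an advanced-search query builder: rows of (operator, term)
# pairs joined with an explicit AND/OR/NOT drop-down per row, versus
# implicit and-only joining.

def build_query(rows: list) -> str:
    """rows: [(operator, term), ...]; the first row's operator is ignored."""
    parts = []
    for i, (op, term) in enumerate(rows):
        parts.append(term if i == 0 else f"{op} {term}")
    return " ".join(parts)

def build_query_and_only(terms: list) -> str:
    """the implicit behavior: every box is silently anded."""
    return " AND ".join(terms)

# explicit operators make the relationship between boxes clear:
assert build_query([("AND", "dogs"), ("OR", "cats"), ("NOT", "wild")]) \
    == "dogs OR cats NOT wild"
assert build_query_and_only(["dogs", "cats"]) == "dogs AND cats"
```

the design point matches the article's criticism: multiple boxes only help if the operator between them is visible and selectable.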
Clicking on a term within a facet adds that term to the search query and generates a narrower list of results. The user may then choose another facet to further refine the search. While Evergreen appears to have faceted navigation at first glance, it actually does not possess this feature. The following facets appear after a search generates hits: "relevant subjects," "relevant authors," and "relevant series." But choosing a term within a facet does not narrow down the previous search. Instead, it generates an entirely new search with the selected term; it does not add the new term to the previous query. Users must manually combine the terms in the simple search box or through the advanced search page. WebVoyage also does not offer faceted navigation; it only provides an option to "filter your search" by format, language, and date when a set of results is returned. See figures 12–14.

Figure 9. Koha enriched content
Figure 10. Evergreen enriched content
Figure 11. Voyager enriched content
Figure 12. Koha faceted navigation
Figure 13. Evergreen faceted navigation
Figure 14. Voyager faceted navigation

Keyword Searching

Koha, Evergreen, and WebVoyage all present a simple keyword search box with a link to the advanced search (see figures 3–5).

Figure 3. Koha: state-of-the-art user interface
Figure 4. Evergreen: state-of-the-art user interface
Figure 5. Voyager: state-of-the-art user interface

Information Technology and Libraries | September 2010

Relevancy

Neither Koha, Evergreen, nor WebVoyage provides any evidence of meeting the criteria of the next-generation catalog's more inclusive vision of relevancy ranking, such as accounting for an item's popularity or allowing user tags. Koha uses Index Data's Zebra program for its relevance ranking, which "reads structured records in a variety of input formats . . . and allows access to them through exact boolean search expressions and relevance-ranked free-text queries."19 Evergreen's DokuWiki states that the base relevancy score is determined by the cover density of the searched terms. After this base score is determined, items may receive score bumps based on word order, matching on the first word, and exact matches, depending on the type of search performed.20 These statements do not indicate that either Koha or Evergreen goes beyond the traditional relevancy-ranking methods of legacy systems such as WebVoyage.

Figure 6. Voyager advanced search
Figure 7. Koha advanced search
Figure 8. Evergreen advanced search

Did You Mean . . . ?

Only Evergreen has a true "did you mean . . . ?" feature. When no hits are returned, Evergreen provides a suggested alternate spelling ("maybe you meant . . . ?") as well as a suggested additional search ("you may also like to try these related searches . . ."). Koha has a spell-check feature, but it automatically normalizes the search term and does not give the option of choosing a different one. This is not the same as a "did you mean . . . ?" feature as defined above. While the normalizing process may be seamless, it takes the power of choice away from the user and may be problematic if a particular alternative spelling or misspelling is searched purposefully, such as "womyn." (When "womyn" is searched as a keyword in the Koha demo OPAC, 16,230 hits are returned. This catalog does not appear to contain the term as spelled, which is why it is normalized to "women." The fact that the term does not appear as-is may not be transparent to the searcher.) With normalization, the user may also be unaware that any mistake in spelling has occurred, and the number of hits may differ between the correct spelling and the normalized spelling, potentially affecting discovery. The normalization feature also only works with particular combinations of misspellings, where letter order affects whether a match is found. Otherwise the system returns a "no result found!" message with no suggestions offered. (Try "homoexuality" vs. "homoexsuality." In Koha's demo OPAC, the former, with a missing "s," yields 553 hits, while the latter, with a misplaced "s," yields none.) However, Koha is a step ahead of WebVoyage, which has no built-in spell checker at all. If a search fails, the system returns the message "search resulted in no hits." See figures 15–17.

Figure 15. Evergreen: did you mean . . . ?
Figure 16. Koha: did you mean . . . ?
Figure 17. Voyager: did you mean . . . ?

Recommendations/Related Materials

None of the three online catalogs can recommend materials for users.

User Contributions

Koha is the only system of the three that allows users to add tags, comments, descriptions, and reviews. In Koha's OPAC, user-added tags form tag clouds, and the font and size of each keyword or tag indicate that keyword or tag's frequency of use. All the tags in a tag cloud serve as hyperlinks to library materials. Users can write their own reviews to complement the Amazon reviews. All user-added reviews, descriptions, and comments have to be approved by a librarian before they are finalized for display in the OPAC. Nevertheless, the user contribution features in the Koha OPAC are not easy to use. It may take many clicks before a user can figure out how to add or edit text. It requires user login, and the system cannot keep track of the search hits after a login takes place. Therefore the user contribution features of Koha need improvement. See figure 18.

Figure 18. Koha user contributions

RSS Feeds

Koha provides RSS feeds, while Evergreen and WebVoyage do not.

Conclusion

Table 1 is a summary of the comparisons in this paper. These comparisons show that the Koha OPAC has six out of the ten compared features for the next-generation catalog, plus two halves. Its full-fledged features include a state-of-the-art web interface, enriched content, faceted navigation, a simple keyword search box, user contribution, and RSS feeds. The two halves indicate the existence of a feature that is not fully developed. For instance, "did you mean . . . ?" in Koha does not work the way the next-generation catalog is envisioned. In addition, Koha has the capability of linking journal titles to full text via Serials Solutions, while the other two OPACs only display holdings information.

Evergreen falls into second place, providing four out of the ten compared features: a state-of-the-art interface, enriched content, a keyword search box, and "did you mean . . . ?" WebVoyage, the Voyager OPAC from Ex Libris, comes in third, providing only three out of the ten features for the next-generation catalog. Based on the evidence, Koha's OPAC is more advanced and innovative than Evergreen's or Voyager's. Among the three catalogs, the open-source OPACs compare more favorably to the ideal next-generation catalog than the proprietary OPAC. However, none of them is capable of federated searching. Only Koha offers faceted navigation. WebVoyage does not even provide a spell checker. The ILS OPAC still has a long way to go toward the next-generation catalog. Though this study samples only three catalogs, hopefully the findings will provide a glimpse of the current state of open-source versus proprietary catalogs.

ILS OPACs are not comparable in features and functions to stand-alone OPACs, also referred to as "discovery tools" or "layers." Some discovery tools, such as Ex Libris' Primo, also are federated search engines and are modeled after the next-generation catalog. Recently they have become increasingly popular because they are bolder and more innovative than ILS OPACs.
Two of the best stand-alone open-source OPACs are Villanova University's VuFind and Oregon State University's LibraryFind.21 Both boast eight out of ten features of the next-generation catalog.22 Technically, it is easier to develop a new stand-alone OPAC with all the next-generation catalog features than to mend old ILS OPACs. As more and more libraries grow disappointed with their ILS OPACs, more discovery tools will be implemented. Vendors will stop improving ILS OPACs and concentrate on developing better discovery tools. The fact that ILS OPACs are falling behind current trends may eventually bear no significance for libraries, at least for the ones that can afford the purchase or implementation of a more sophisticated discovery tool or stand-alone OPAC. Certainly, small and public libraries that cannot afford a discovery tool or a programmer for an open-source OPAC overlay will suffer, unless market conditions change.

Table 1. Summary: features of the next-generation catalog (yes = present; no = absent; partial = present but not fully developed)

Feature                                             Koha      Evergreen  Voyager
Single point of entry for all library information   partial   no         no
State-of-the-art web interface                      yes       yes        yes
Enriched content                                    yes       yes        yes
Faceted navigation                                  yes       no         no
Keyword search                                      yes       yes        yes
Relevancy                                           no        no         no
Did you mean . . . ?                                partial   yes        no
Recommended/related materials                       no        no         no
User contribution                                   yes       no         no
RSS feed                                            yes       no         no

References

1. Tanja Merčun and Maja Žumer, "New Generation of Catalogues for the New Generation of Users: A Comparison of Six Library Catalogues," Program: Electronic Library & Information Systems 42, no. 3 (July 2008): 243–61.
2. Eric Lease Morgan, "A 'Next-Generation' Library Catalog—Executive Summary (Part #1 of 5)," online posting, July 7, 2006, LITA Blog: Library Information Technology Association, http://litablog.org/2006/07/07/a-next-generation-library-catalog-executive-summary-part-1-of-5/ (accessed Nov. 10, 2008).
3. Marshall Breeding, introduction to "Next Generation Library Catalogs," Library Technology Reports 43, no. 4 (July/Aug. 2007): 5–14.
4. Ibid.
5. Marshall Breeding, "Library Technology Guides: Key Resources in the Field of Library Automation," http://www.librarytechnology.org/lwc-search-advanced.pl (accessed Jan. 23, 2010).
6. Marshall Breeding, "Investing in the Future: Automation Marketplace 2009," Library Journal (Apr. 1, 2009), http://www.libraryjournal.com/article/ca6645868.html (accessed Jan. 23, 2010).
7. Marshall Breeding, "Library Technology Guides: Company Directory," http://www.librarytechnology.org/exlibris.pl?sid=20100123734344482&code=vend (accessed Jan. 23, 2010).
8. Merčun and Žumer, "New Generation of Catalogues."
9. Ibid.
10. Linda Riewe, "Integrated Library System (ILS) Survey: Open Source vs. Proprietary-Tables" (master's thesis, San Jose University, 2008): 2–5, http://users.sfo.com/~lmr/ils-survey/tables-all.pdf (accessed Nov. 4, 2008).
11. Ibid., 26–27.
12. Breeding, introduction.
13. Ibid.; Morgan, "A 'Next-Generation' Library Catalog."
14. Breeding, introduction.
15. Ibid.
16. Ibid.
17. Villanova University, "VuFind," http://vufind.org/ (accessed June 10, 2010); Innovative Interfaces, "Encore," http://encoreforlibraries.com/ (accessed June 10, 2010).
18. Auto-Graphics, "AGent Iluminar," http://www4.auto-graphics.com/solutions/agentiluminar/agentiluminar.htm (accessed June 10, 2010).
19. Breeding, introduction; Morgan, "A 'Next-Generation' Library Catalog."
20. Index Data, "Zebra," http://www.indexdata.dk/zebra/ (accessed Jan. 3, 2009).
21. Evergreen DokuWiki, "Search Relevancy Ranking," http://open-ils.org/dokuwiki/doku.php?id=scratchpad:opac_demo&s=core (accessed Dec. 19, 2008).
22. Villanova University, "VuFind"; Oregon State University, "LibraryFind," http://libraryfind.org/ (accessed June 10, 2010).
23. Sharon Q. Yang and Kurt Wagner, "Open Source Stand-Alone OPACs" (Microsoft PowerPoint presentation, 2010 Virtual Academic Library Environment Annual Conference, Piscataway, New Jersey, Jan. 8, 2010).
User-Centered Design of a Web Site | Manzari and Trinidad-Christensen

This study describes the life cycle of a library web site created with a user-centered design process to serve a graduate school of library and information science (LIS). Findings based on a heuristic evaluation and usability study were applied in an iterative redesign of the site to better serve the needs of this special academic library population. Recommendations for the design of web-based services for library patrons from LIS programs are discussed, as well as implications for web sites for special libraries within larger academic library settings.

User-centered design principles were applied to the creation of a web site for the library and information science (LIS) library at the C. W. Post campus of Long Island University. This web site was designed for use by master's degree and doctoral students in the Palmer School of Library and Information Science. The prototype was subjected to a usability study consisting of a heuristic evaluation and usability testing. The results were employed in an iterative redesign of the web site to better accommodate users' needs. This was the first usability study of a web site at the C. W. Post library.

Human-computer interaction, the study of the interaction of human performance with computers, imposes a rigorous methodology on the process of user-interface design. More than an intuitive determination of user-friendliness, a successful interactive product is developed by careful design, testing, and redesign based on the testing outcomes. Testing the product several times as it is being developed, or iterative testing, allows the users' needs to be incorporated into the design. The interface should be designed for a specific community of users and set of tasks to be accomplished, with the goal of creating a consistent, usable product.
The LIS library had a web site that was simply a description of the collection and did not provide access to online specialized resources. A new web site was designed for the LIS library by the incoming LIS librarian, who made a determination of what content might be useful for LIS students and faculty. The goal was to have such content readily accessible in a web site separate from the main library web site. The web site for the LIS library includes:

- access to all online databases and journals related to LIS;
- a general overview of the LIS library and its resources, as well as contact information, hours, and staff;
- a list of all print and online LIS library journal subscriptions, grouped by both title and subject, with links to access the online journals;
- links to other web sites in the LIS field;
- links to other university web pages, including the main library's home page, library catalog, and instructions for remote database access, as well as to the LIS school web site;
- a link to JAKE (Jointly Administered Knowledge Environment), a project by Yale University that allows users to search for periodical titles within online databases, since the library did not have this type of access through its own software.

This information was arranged in four top-level pages with sublevels. Design considerations included making the site both easy to learn and efficient once users were familiar with it. Since classes are taught at four locations in the metropolitan area, the site needed to be flexible enough to serve students at the C. W. Post campus library as well as remotely. The layout of the information was designed to make the web site uncluttered and attractive. Different color schemes were tried and informally polled among users. A version with white text on a black background prompted strong likes or dislikes when shown to users. Although this combination is easy to read, it was rejected because of the strong negative reactions from several users.
Photographs of the LIS library and students were included. The pages were designed with a menu on the left side; fly-out menus were used to access submenus. Where main library pages already existed for information to be included in the LIS web site, such as LIS hours and staff, links to those pages were made instead of re-creating the information in the LIS web site. An attempt was made to render the site accessible to users with disabilities, and pages were made compliant with World Wide Web Consortium (W3C) standards by using their HTML validator and their cascading style sheet validator.1

User-Centered Design of a Web Site for Library and Information Science Students: Heuristic Evaluation and Usability Testing

Laura Manzari and Jeremiah Trinidad-Christensen

Laura Manzari (manzari@liu.edu) is an Associate Professor and Library and Information Science Librarian at the C. W. Post campus of Long Island University, Brookville, N.Y. Jeremiah Trinidad-Christensen (jt2118@columbia.edu) is a GIS/Map Librarian at Columbia University, New York, N.Y.

Information Technology and Libraries | September 2006

Literature Review

Usability is a term with many definitions, varying by field.2 The fields of industrial engineering, product research and development, computer systems, and library science all share the study of human-and-machine interaction, as well as a commitment to users. Dumas and Redish explain it simply: "Usability means that the people who use the product can do so quickly and easily to accomplish their own tasks."3 User-centered design incorporates usability principles into product design and places the focus on the user during project development.
Gould and Lewis cite three principles of user-centered design: an early focus on users and tasks, empirical measurement of product usage, and iterative design to include user input in product design and modification.4 Jakob Nielsen, an often-cited usability engineering specialist, emphasizes that for increased functionality, engineering usability principles should apply to web design, which should be treated as a software development project. He advocates incorporating user evaluation into the design process, first through a heuristic evaluation, followed by usability testing, with a redesign of the product after each phase of evaluation.5 Usability principles have been applied to library web-site design; however, library web-site usability studies often do not include the additional heuristic evaluation recommended by Nielsen.6

In addition to usability, consideration should also be given during the design process to making the web site accessible to people with disabilities. Federal agencies are now required by the Rehabilitation Act to make their web sites accessible to the disabled. Section 508, part 1194.22, of the act enumerates sixteen rules for internet applications to help ensure web-site access for people with various disabilities.7 Similarly, the Web Accessibility Initiative hosted by the W3C works to ensure that accessibility practices are considered in web-site design.
They developed the Web Content Accessibility Guidelines for making web sites accessible to people with disabilities.8 Although articles have been written about usability testing of academic library web sites, very little has been written about usability testing of special-collection web sites for distinct user populations within larger academic settings.9

Heuristic Evaluation Methodology

Heuristic evaluation is a usability engineering method in which a small set of expert evaluators examine a user interface for design problems by judging its compliance with a set of recognized usability principles, or heuristics. Nielsen developed a set of ten widely adopted usability heuristics (see sidebar). After studying the use of individual evaluators as well as groups of varying sizes, Nielsen and Molich recommend using three to five evaluators for a heuristic evaluation.10 The use of multiple experts will catch more flaws than a single expert, but using more than five experts does not produce greater results. In comparisons of heuristic evaluation and usability testing, the heuristic evaluation uncovered more of the minor problems, while usability testing uncovered more major, global problems.11 Since each method tends to uncover different usability problems, it is recommended that both methods be used complementarily, particularly with an iterative design change between the heuristic evaluation and the usability testing.

For the heuristic evaluation, four people with expertise in web-site design and human-computer interaction were approached from the Palmer LIS school faculty and Ph.D. program. Three agreed to participate. They were asked to familiarize themselves with the web site and evaluate it according to Nielsen's ten heuristics, which were provided to them.

Heuristic Evaluation Results

The evaluators were all in agreement that the language was appropriate for LIS students. One evaluator said that if new students were not familiar with some of the terms, they soon would be.
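The reasoning behind Nielsen and Molich's three-to-five-evaluator recommendation can be illustrated with a small sketch. The heuristic labels and problem descriptions below are hypothetical, not the study's actual findings; the point is only that each evaluator catches a subset of the problems, so pooling a few reports covers more than any individual does.

```python
# Illustrative sketch (hypothetical data): pooling heuristic-evaluation
# reports. Each evaluator finds only some problems; the union across
# a handful of evaluators catches more flaws than any single expert.

def pooled_problems(reports):
    """Union of reported problems, keyed by the heuristic each violates."""
    found = {}
    for report in reports:
        for heuristic, problem in report:
            found.setdefault(heuristic, set()).add(problem)
    return found

evaluator_reports = [
    [("consistency", "menu options differ page to page"),
     ("match with real world", "jargon: 'JAKE' unexplained")],
    [("consistency", "menu options differ page to page"),
     ("user control", "no quick return to home page")],
    [("match with real world", "jargon: 'JAKE' unexplained"),
     ("aesthetic design", "journal subject list too long")],
]

pooled = pooled_problems(evaluator_reports)
total = sum(len(problems) for problems in pooled.values())
# Three evaluators together surface 4 distinct problems,
# though no single evaluator reported more than 2.
```

Adding a fourth or fifth report typically adds a few more unique problems; beyond that the union grows slowly, which is why larger panels stop paying off.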
Another thought JAKE, the tool to access full text, might not be clear to students at first, but that the LIS web-site explanation was fine the way it was. They were also in agreement that the web site was well designed. Comments included: "The purpose and description of each page is short and to the point, and there is a good, clean, viewable page for the users"; "the site was well designed and not over designed"; "very clear and user friendly"; "excellent example of limiting unnecessary irrelevant information." The only page to receive a "poor layout" comment was the lengthy subject list of journals, though no suggestions for improvement were made.

Concern was expressed about links to other web sites on campus. One evaluator thought new students might be confused about the relationship between Long Island University, C. W. Post, and the Palmer School. Two evaluators thought links to the main library's web site could cause confusion because of the different design and layout. A preference for the design of the LIS library web site over the main library and Palmer School web sites was expressed. To eliminate some confusion, the menu options for other campus web sites were moved down to a separate menu directly below the menu of LIS web pages. For additional clarity, some of the main library pages were re-created in the style of the LIS pages instead of linking to the original page.

The evaluators made several concrete suggestions for menu changes, which were included in the redesign. It was suggested that several menu options were unclear and needed clarification, so additional text was added for clarity at the expense of brevity. Long Island University's online catalog is named LIUCat and was listed that way on the menu. New students might not be familiar with this name, so the menu label was changed to LIUCat (Library Catalog).
For the link to JAKE, a description, "Find periodicals in online databases," was added for clarification. It was also suggested that the link to the main library web page for all databases could cause confusion, since the layout and design of that page is different. The wording was changed to "All databases (located in the C. W. Post library web site)."

Menu options were originally arranged in order of anticipated use (see figure 1). Thus, the order of menu options from the LIS home page was databases, journals, library catalog, other web sites, Palmer School, and main library. Evaluators suggested that putting the option for the LIS home page first would give users an easy "emergency exit" to return to the home page if they were lost. The original menu options also varied from page to page. For example, menu options on the database page referred only to pages that users might need while doing database searches. At the suggestion of the evaluators, the menu options were changed to be consistent on every page (see figure 2). A redesign based on these results was completed and posted to the internet for public use (see figure 3).

Figure 1. Original menu
Figure 2. Revised menu

Sidebar: Jakob Nielsen's Usability Heuristics

1. Visibility of system status: the system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
2. Match between system and the real world: the system should speak the user's language, with words, phrases, and concepts familiar to the user rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
3. User control and freedom: users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
4. Consistency and standards: users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
5. Error prevention: even better than good error messages is a careful design that prevents problems from occurring in the first place.
6. Recognition rather than recall: make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
7. Flexibility and efficiency of use: accelerators, unseen by the novice user, may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
8. Aesthetic and minimalist design: dialogues should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
9. Help users recognize, diagnose, and recover from errors: error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
10. Help and documentation: even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.12

Usability Testing Methodology

Usability testing is an empirical method for improving design. Test subjects are gathered from the population who will use the product and are asked to perform real tasks using the prototype while their performance and reactions to the product are observed and recorded by an interviewer. This observation and recording of behavior distinguishes usability testing from focus groups. Observation allows the tester to see when and where users become frustrated or confused. The goal is to uncover usability problems with the product, not to test the participants themselves. The data gathered are then analyzed to recommend changes to fix usability problems. In addition to recording empirical data such as the number of errors made or the time taken to complete tasks, active intervention allows the interviewer to question participants about the reasons for their actions as well as about their opinions of the product. In fact, subjects are asked to verbalize their thought processes as they complete the tasks using the interface. Test subjects are usually interviewed individually and are all given the same pretest briefing from a script with a list of instructions followed by tasks representing actual use. Test subjects are also asked questions about their likes and dislikes. In most situations, payment or other incentives are offered to help recruit subjects. Four or five subjects will reveal 80 percent of usability problems.13

Messages were sent to students via the Palmer School's mailing lists requesting volunteers. A ten-dollar gift certificate to a bookstore was offered as an inducement to recruitment. Input was desired from both master's degree and doctoral students. The first nine volunteers to respond, all master's degree students, were accepted. This group included students from both the main and satellite campuses. No Ph.D. students volunteered to participate at first, citing busy schedules, but eventually a doctoral student was recruited. Testing was conducted in computer labs at the library, at the Palmer School, and at the Manhattan satellite campus.
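The kind of empirical record an active-intervention usability test produces can be sketched as follows. The task names, timings, and results below are hypothetical, not the data from this study; the sketch only shows the per-subject, per-task structure (success, error count, time) that such testing yields for later analysis.

```python
# Minimal sketch (hypothetical data, not this study's results) of the
# empirical records a usability test produces: per-subject, per-task
# success, error counts, and completion times.

from dataclasses import dataclass

@dataclass
class Observation:
    subject: str
    task: str
    completed: bool
    errors: int
    seconds: float

def completion_rate(observations, task):
    """Fraction of subjects who completed the given task."""
    runs = [o for o in observations if o.task == task]
    return sum(o.completed for o in runs) / len(runs)

log = [
    Observation("S1", "find refereed journal", True, 1, 95.0),
    Observation("S2", "find refereed journal", True, 0, 60.0),
    Observation("S3", "find book review", False, 3, 240.0),
    Observation("S4", "find book review", True, 2, 180.0),
]

rate = completion_rate(log, "find book review")  # 0.5
```

Alongside such counts, the interviewer's notes on where subjects hesitated or verbalized confusion supply the qualitative half of the analysis.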
Demographic information was gathered regarding users' gender, age range, university status, familiarity with computers, the internet, and the LIS library, as well as the type of internet connection and browser usually used. The subjects were given eight tasks to complete using the web site. The tasks reflected both the type of assignment a student might receive in class and the type of information they might seek on the LIS web site on their own. The questions were designed to test the usability of different parts of the web site.

Usability Testing Results

The first task tested the print journals page and asked whether the LIS library subscribes to a specific journal and whether it is refereed. (The web site uses an asterisk next to a journal title to indicate that it is refereed.) All subjects were able to easily find that the LIS library does hold the journal title. Although it was not initially obvious that the asterisk was a notation indicating that the journal was refereed, most of the subjects eventually found the explanatory note. Many of the subjects did not know what a refereed journal was, and some asked if a definition could be provided on the site.

For the second task, subjects needed to use JAKE to find the full text of an article. None of the students were familiar with JAKE, but they were able to use the LIS web site to gain an understanding of its purpose and to access it. The third task asked subjects to find a library association, which required using the other web sites page. All subjects demonstrated an understanding of how to use this page and found the information.

The fourth task tested the full-text databases page. Only one subject actually used this page to complete the task. The rest used the all databases link to the main library's database list. That link appears above the link to full-text databases, and most subjects chose that link without looking at the next menu option. Several subjects became confused when they were taken to the main library's page, just as the evaluators had predicted. Even though wording was added warning users that they were leaving the LIS web site, most subjects did not read it and wondered why the page layout changed and was not as clear. They also had trouble navigating back to the LIS web site from the main library web site.

Figure 3. Final home page

The fifth task tested the journals by subject page. This task took longer for most of the subjects to answer, but all were able to use the page successfully to find a journal on a given subject. The sixth task required using the LIS home page, and everyone easily used it to find the operating hours. The seventh task required subjects to find an online journal title that could be accessed from the electronic journals page. All subjects navigated this page easily. The final task asked subjects to find a book review. Most subjects did not look at the page for library and information sciences databases to access the Books in Print database, saying they did not think it would be included there. Instead, they used the link to the main library's database page. One subject was not able to complete this task.

Problems primarily occurred during testing when subjects left the LIS page to use a non-library-science database located on the main web site. Subjects had problems getting back to the LIS site from the main library site. While performing tasks, some subjects would scroll up and down long lists instead of using the toolbars provided to bring the user to an exact location on the page. Some preferred using the back button instead of using the LIS web-site menu to navigate. These seemed to be individual styles of using the web and not usability problems with the site.
Several people consistently used the menu to return to the LIS home page before starting each new task, even though they could have navigated directly to the page they needed, making a return to the home page unnecessary. This validated the recommendation from the heuristic study that the link to the home page always be the first menu option, giving users a comfortable safety valve when they get lost.

The final questions asked subjects for their opinions on what they did and did not like about the web site, as well as any suggestions for improving the site. All subjects responded that they liked the layout of the pages, calling them uncluttered, clean, attractive, and logical. There were very few suggestions for improving the site. One person asked that contact information be included in the menu options in addition to its location right below the menu on the LIS home page. Another participant suggested adding class syllabi to the web site each semester, listing required texts along with a link to an online bookstore. Some of the novice users asked for explanations of unfamiliar terms such as "refereed journals." One participant suggested including a search engine instead of using links to navigate the site. This was considered during the initial site design but was not included, since the site did not have a large number of pages. However, a search engine may be worth including.

The one doctoral student had previously used only the main library's web page to access databases. Originally, he said he did not see the advantage of a site devoted to information science sources for doctoral candidates, since that program is more multidisciplinary. However, after completing the usability study, the student concluded that the LIS web site was useful. He suggested that it should be publicized more to doctoral candidates and that it be more prominently highlighted on the main library web site.
Though the questions asked were about the LIS web site, several subjects complained about the layout of the main library web site and suggested that it have better linking to the LIS web site to enable it to be accessed more easily.

Conclusions

Iterative testing and user-centered design resulted in a product that testing revealed to be easy to learn and efficient to use, and about which subjects expressed satisfaction. Based on findings that some students had not even been aware of the existence of the LIS web site, greater emphasis is now given to the web site and its features during new-student orientations. The biggest problem users had was navigating from the web pages of the main library back to the LIS site. It was suggested that the LIS site be highlighted more prominently on the main library web site. Some users were confused by the different layouts of the two sites, but no one expressed a preference for the design used by the main library web site. Despite this confusion, subjects overwhelmingly expressed positive feedback about having a specialized library site serving their specific needs.

Issues regarding web-site design can be problematic for smaller specialized libraries within larger institutions. In this case, some of the problems navigating between the sites could be resolved by changes to the main library site. The design of the LIS web site was preferred over the main campus web site by both the heuristic evaluators and the students in the usability test. However, designers of a main library web site might not be receptive to suggestions from a specialized or branch library. Although consistency in design would eliminate confusion, requiring the special collection's web site to follow a design set by the main institution could be a loss for users. In this instance, the main site was designed without user input, whereas the specialized library serving a smaller population was able to be more dynamic and responsive to its users.
finding an appropriate balance for a site used by students new to the field as well as advanced students is a challenge. although the students in the study were all experienced computer and web users, their familiarity with basic library concepts varied greatly. a few novice users expressed some confusion as to the difference between journals and index databases. there actually was a description of each of these sources on the site but it was not read. (the subjects barely read any of the site’s text, so it can be difficult to make some points clearer when users want to navigate quickly without reading instructions. several subjects who did not bother to read text on the site still suggested having more notes to explain unfamiliar terms. however, if the site becomes too overloaded with explanations of library concepts, it could become annoying for more advanced users.) a separate page with a glossary is a possibility—based on the study, however, it will probably not be read. another possibility is a handout for students that could have more text for new users without cluttering the web site. having such a handout would also serve to publicize the site. there was some concern prior to the study that offering more advanced features, such as providing access to jake or indicating which journals are refereed, might be off-putting for new students; therefore, test questions were designed to gauge reactions to these features. most students in the study did express some intimidation at not being familiar with these concepts. however, all the subjects eventually figured out how to use jake and, once they tried it, thought it was a good idea to include it. even new students who had the most difficulty were still able to navigate and learn from the site to be able to use it efficiently. an online survey was added to the final design to allow continuous user input.
the site consistently receives positive feedback through these surveys. it was planned that responses could be used to continually assess the site and ensure that it is kept responsive and up-to-date; however, specific suggestions have not yet been forthcoming. how valuable was usability testing to the web-site design? several good suggestions were made and implemented, and the process confirmed that the site was well designed. it provided some insight into how subjects used the web site that had not been anticipated by the designers. since usability studies are fairly easy and inexpensive to conduct, it is probably a step worth taking during the web-site design process even if it results in only minor changes to the design. references and notes 1. w3c, “the w3c markup validation service,” validator.w3.org (accessed nov. 1, 2005); w3c, “the w3c css validation service,” jigsaw.w3.org/css-validator (accessed nov. 1, 2005). 2. see carol m. barnum, usability testing and research (new york: longman international, 2002); alison j. head, “web redemption and the promise of usability,” online 23, no. 6 (1999): 20–29; international organization for standardization, ergonomic requirements for office work with visual display terminals. part 11: guidance on usability—iso 9241-11 (geneva: international organization for standardization, 1998); judy jeng, “what is usability in the context of the digital library and how can it be measured?” information technology and libraries 24, no. 2 (2005): 47–52; jakob nielsen, usability engineering (boston: academic, 1993); ruth ann palmquist, “an overview of usability for the study of users’ web-based information retrieval behavior,” journal of education for library and information science 42, no. 2 (2001): 123–36. 3. joseph s. dumas and janice c. redish, a practical guide to usability testing (portland: intellect bks., 1999), 4. 4. john d. gould and clayton h.
lewis, “designing for usability: key principles and what designers think,” communications of the acm 28, no. 3 (1985): 300–11. 5. jakob nielsen, “heuristic evaluation,” in jakob nielsen and robert l. mack, eds., usability inspection methods (new york: wiley, 1994), 25–62. 6. see denise t. covey, usage and usability assessment: library practices and concerns (washington, d.c.: digital library federation, 2002); nicole campbell, usability assessment of library-related web sites (chicago: ala, 2001); kristen l. garlock and sherry piontek, designing web interfaces to library services and resources (chicago: ala, 1999); anna noakes schulze, “user-centered design for information professionals,” journal of education for library and information science 42, no. 2 (2001): 116–22; susan m. thompson, “remote observation strategies for usability testing,” information technology and libraries 22, no. 3 (2003): 22–32. 7. general services administration, “section 508: section 508 standards,” www.section508.gov/index.cfm?fuseaction=content&id=12#web (accessed nov. 1, 2005). 8. w3c, “web content accessibility guidelines 2.0,” www.w3.org/tr/wcag20 (accessed nov. 1, 2005). 9. see susan augustine and courtney greene, “discovering how students search a library web site: a usability case study,” college and research libraries 63, no. 4 (2002): 354–65; brenda battleson, austin booth, and jane weintrop, “usability testing of an academic library web site: a case study,” journal of academic librarianship 27, no. 3 (2001): 188–98; janice krueger, ron l. ray, and lorrie knight, “applying web usability techniques to assess student awareness of library web resources,” journal of academic librarianship 30, no. 4 (2004): 285–93; thura mack et al., “designing for experts: how scholars approach an academic library web site,” information technology and libraries 23, no. 1 (2004): 16–22; mark shelstad, “content matters: analysis of a web site redesign,” oclc systems & services 21, no.
3 (2005): 209–25; robert l. tolliver et al., “web site redesign and testing with a usability consultant: lessons learned,” oclc systems & services 21, no. 3 (2005): 156–67; dominique turnbow et al., “usability testing for web redesign: a ucla case study,” oclc systems & services 21, no. 3 (2005): 226–34; leanne m. vandecreek, “usability analysis of northern illinois university libraries’ web site: a case study,” oclc systems & services 21, no. 3 (2005): 181–92. 10. jakob nielsen and rolf molich, “heuristic evaluation of user interfaces,” in proceedings of the acm chi ’90 (new york: association for computing machinery, 1990), 249–56. 11. robin jeffries et al., “user interface evaluation in the real world: a comparison of a few techniques,” in proceedings of the acm chi ’91 (new york: association for computing machinery, 1991), 119–24; jakob nielsen, “finding usability problems through heuristic evaluation,” in proceedings of the acm chi ’92 (new york: association for computing machinery, 1992), 373–86. 12. jakob nielsen, “heuristic evaluation,” 25–62. 13. jeffrey rubin, handbook of usability testing: how to plan, design, and conduct effective tests (new york: wiley, 1994); jakob nielsen, “why you only need to test with five users,” alertbox, mar. 19, 2000, www.useit.com/alertbox/20000319.html (accessed nov. 1, 2005). letter from the editor: september 2021 kenneth j. varnum information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13859 in the editorial section of this issue, we have two columns to share. the september editorial board thoughts essay is by paul swanson, “building a culture of resilience in libraries,” which reflects on the lessons of covid-driven flexibility and suggests that a culture of resilience in our libraries will help us more easily adapt to these and other emerging changes we will inevitably encounter.
that is followed by carole williams’ public libraries leading the way column, “delivering: automated materials handling for staff and patrons,” in which she discusses the effects of an automated materials handling system on both the staff and patrons of the charleston county (sc) public library. in peer-reviewed content, we have a diverse set of articles on a range of topics: bias mitigation in metadata; accessibility of pdf documents; two articles on automated classification of different kinds of texts; two articles with lessons learned due to our abrupt move to remote service; and a case study on the importance of product ownership. 1. mitigating bias in metadata: a use case using homosaurus linked data / juliet hardesty and allison nolan 2. accessibility of tables in pdf documents: issues, challenges and future directions / nosheen fayyaz, shah khusro, and shakir ullah 3. text analysis and visualization research on the hetu dangse during the qing dynasty of china / zhiyu wang, jingyu wu, guang yu, and zhiping song 4. topic modeling as a tool for analyzing library chat transcripts / hyunseung koh and mark fienup 5. expanding and improving our library’s virtual chat service: discovering best practices when demand increases / parker fruehan and diana hellyar 6. a rapid implementation of a reserve reading list solution in response to the covid-19 pandemic / matthew black and susan powelson 7. product ownership of a legacy institutional repository: a case study on revitalizing an aging service / mikala narlock and don brower kenneth j.
varnum, editor varnum@umich.edu september 2021 information technology and libraries | march 2010 michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago. president’s message: join us at the forum! the first lita national forum i attended was in milwaukee, wisconsin. it seems like it was only a couple of years ago, but in fact nine national forums have since passed. i was a new librarian, and i went on a lark when a colleague invited me to attend and let me crash in her room for free. i am so glad i took her up on the offer because it was one of the best conferences i have ever attended. it was the first conference that i felt was made up of people like me, people who shared my interests in technology within the library. the programming was a good mix of practical know-how and mind-blowing possibilities.
my understanding of what was possible was greatly expanded, and i came home excited and ready to try out the new things i had learned. almost eight years passed before i attended my next forum in cincinnati, ohio. after half a day i wondered why i had waited so long. the program was diverse, covering a wide range of topics. i remember being depressed and outraged at the current state of internet access in the united states as reported by the office for information technology policy. i felt that surge of recognition when i discovered that other universities were having a difficult time documenting and tracking the various systems they run and maintain. i was inspired by david lankes’s talk, “obligations of leadership.” if you missed it you can still hear it online. it is linked from the lita blog (http://www.litablog.org). while the next forum may seem like a long way off to you, it is in the forefront of my mind. the national forum 2010 planning committee is busy working to make sure this forum lives up to the reputation of forums past. this year’s forum takes place in atlanta, georgia, september 30–october 3. the theme is “the cloud and the crowd.” program proposals are due february 19, so i cannot give you specifics about the concurrent sessions, but we do hope to have presentations about projects, plans, or discoveries in areas of library-related technology involving emerging cloud technologies; software-as-a-service, as well as social technologies of various kinds; using virtualized or cloud resources for storage or computing in libraries; library-specific open-source software (oss) and other oss “in” libraries; technology on a budget; using crowdsourcing and user groups for supporting technology projects; and training via the crowd. each accepted program is scheduled to maximize the impact for each attendee. programming ranges from five-minute lightning talks to full-day preconferences.
in addition, on the basis of attendee comments from previous forums, we have also decided to offer thirty- and seventy-five-minute concurrent sessions. these concurrent sessions will be a mix of traditional single- or multispeaker formats, panel discussions, case studies, and demonstrations of projects. finally, poster sessions will also be available. while programs such as the keynote speakers, lightning talks, and concurrent sessions are an important part of the forum experience, so is the opportunity to network with other attendees. i know i have learned just as much talking with a group of people in the hall between sessions, during lunch, or at the networking dinners as i have sitting in the programs. not only is it a great opportunity to catch up with old friends, you will also have the opportunity to make new ones. for instance, at the 2009 national forum in salt lake city, utah, approximately half of the people who attended were first-time attendees. the national forum is an intimate event whose attendance ranges between 250 and 400 people, thus making it easy to forge personal connections. attendees come from a variety of settings, including academic, public, and special libraries; library-related organizations; and vendors. if you want to meet the attendees in a more formal setting you can attend a networking dinner organized on-site by lita members. this year the dinners were organized by the lita president, lita past president, lita president-elect, and a lita director-at-large. if you have not attended a national forum or it has been a while, i hope i have piqued your interest in coming to the next national forum in atlanta. registration will open in may! the most up-to-date information about the 2010 forum is available at the lita website (http://www.lita.org). i know that even after my lita presidency is a distant memory, i will still make time to attend the lita national forum. i hope to see you there! book reviews automation in libraries, by r. t.
kimber. oxford: pergamon press, 1968. 140 pp. $6.00. many books have been published in recent years on the subject of library automation. very few of them, however, have succeeded in making meaningful contributions to a better understanding of the subject. this volume has made a sincere effort to be one of the few. although library automation is an ambiguous term which lacks precise definition, it is used here clearly to mean the use of computers in libraries. the book is intended for those with no computer background but who are familiar with library operations. it attempts to give a good introduction to current practices in library automation and a fairly detailed account of the state of the art. in the first chapter, “libraries and automation,” mr. kimber discusses the relationship between the library and the computer. seeing the computer as a means of performing human clerical functions, he points out two important attitudes that must be observed: first, one must not change to a computer system just for the sake of changing, and second, one must be willing to change if the change means improvement. the monetary worth of the computer in the library is difficult to express because the end result is not increased profit but better service. since benefits from computer operations can be expressed in time and effort saved, these are the means of monetary comparison the author suggests. he also observes that although there are many good reasons for wanting computerized operations, some of these are merely emotional. chapter ii, “introduction to computers,” is written by anne h. boyd, lecturer in computation at queen’s university of belfast. miss boyd gives a brief review of the development and use of computers and discusses the fundamentals of computer systems. the next four chapters by mr.
kimber present computerized systems for various library activities: chapter iii, “ordering and acquisitions”; chapter iv, “circulation control”; chapter v, “periodicals listing and accessioning”; chapter vi, “catalogues and bibliographies.” each chapter, with a minimum of technical terminology, gives a good account of what is involved in automating a particular operation. his treatment is very informative on these matters. in his final chapter (chapter vii, “the present state of automation in libraries”) kimber discusses current trends of library automation and gives examples of libraries which use computers. his list is admittedly not comprehensive, but it does provide a comparison to the “ideal” systems he has described in the earlier chapters. in commenting on the future of computerized library systems, he sees these systems as an escape from the problems of everyday library operations. journal of library automation vol. 3/1, march 1970. this book should be a good addition to the current books on library automation. one unfortunate aspect, however, appears to be an absence of treatment regarding the psychological impact of automation on librarians and users, which is certainly one important aspect to be considered when automation of a system is proposed. also, at times the author, in attempting to simplify his discussion, has made a generalized statement without fuller explanation. this could be misleading and tend to confuse the uninitiated reader. these deficiencies are not of major consequence and do not prejudice the total work, but care should be taken in reading. sul h. lee 1968 international directory of research and development scientists. philadelphia: institute for scientific information, inc., 1969. 1,352 pages (approx.). $60.00.
the second issue of the “international directory of research and development scientists” (idr&ds) lists the names and organizational addresses of 152,648 authors whose papers were listed in either 4o implications of marc, and the library of congress systems studies. (this paper includes twenty-eight pages of appendices, mostly charts.) two additional papers include a discussion of the future of, and a tabulation of trends affecting, library automation. much of the material in these non-survey papers is reported more completely elsewhere and some of it now seems dated. the material presented in this publication must have produced a highly effective educational institute in 1967. in 1969, its value is at best as a first reader in library automation but not as the state-of-the-art review the title proclaims. charles t. payne computers and data processing: information sources, by chester morrill, jr. an annotated guide to the literature, associations, and institutions concerned with input, throughput, and output of data. detroit: gale research co., [1969]. 275 pp. $8.75. (management information guide, 15) this latest volume in the management information guide series should prove as useful as its predecessors, offering to those persons interested in or concerned with computers and data processing (and who now is not?) an organized and extensive survey of the basic and necessary sources of available information. thus the text is for the most part an annotated bibliography of pertinent references arranged in broad categories, each category prefaced with a paragraph or two of comment. this is in the style of mr. morrill’s earlier contribution to the series, systems and procedures including office management, 1967, and, in general, that of all the volumes of the series.
section 7, “operating,” is the largest category, some forty pages of references subdivided into “manuals,” “digital computers,” “data transmission,” “fortran,” “software,” and the like. section 9, entitled “front office references,” is of particular interest to the reference librarian, since it serves as a guide to desirable dictionaries, handbooks, and abstracting services in the fields of automation and data processing. individual annotations are usually brief, informative, and on occasion evaluative. they give evidence of considerable skill in the art of capsule characterization. the prefatory paragraphs and notes to each section characterize the particular topic as successfully and succinctly as do the individual annotations. the preface to section 3, “personnel,” is particularly felicitous. coverage is ample not only as to the subjects chosen but also as to numbers of references under individual subjects. an important thirty pages of appendices lists additional sources of information: associations, manufacturers, seminars, publishers, placement firms, etc., particularly valuable to the businessman or government official as a desk or front-office reference book, although the librarian will also find it of value in providing specific information for his clientele. in all, this is a highly competent and very welcome addition to the series as well as to the ranks of special reference sources so necessary to the proper practice of the reference librarian’s art. i think of crane’s a guide to the literature of chemistry and white’s sources of information in the social sciences and consider the author quite comfortable in their company as well as in that of his colleagues in the series. in addition, he evinces in his annotations and prefaces a wit, a turn of phrase, and a capacity for direct statement that inform and delight the user. he displays an expertise in the fields of management and computer science, and one feels one can rely on his selection and judgment.
eleanor r. devlin centralized book processing: a feasibility study based on colorado academic libraries, by lawrence e. leonard, joan m. maier, and richard m. dougherty. metuchen, n.j.: scarecrow press, 1969. 401 pp. $10.00. in october 1966 the national science foundation awarded a grant to the university of colorado libraries and the colorado council of librarians for research in the area of centralized processing. the project was in three phases. phase i involved an examination of the feasibility of establishing a book-processing center to serve the needs of the nine state-supported college and university libraries in colorado (which range in size from the university of colorado, with 805,959 volumes as of june 30, 1967, to metropolitan state college, a new institution with 8,310 volumes). phase ii involved a simulation study of the proposed center, while phase iii involved an operational book-processing center on a one-year experimental basis. this book summarizes the results of the first two phases of the study. phase i involved a detailed time-and-cost analysis of the acquisition, cataloging, and bookkeeping procedures in the nine participating libraries, with resultant processing costs per volume which are both convincing and somewhat startling, ranging as they do from $2.67 to $7.71 per volume. the operating specifications of the proposed book-processing center are then set forth and a mathematical model for simulating its operations under a variety of alternative conditions is prepared. the conclusions are less than surprising: “a centralized book processing center to serve the needs of the academic libraries in colorado is a viable approach to book processing.” project benefits are enumerated, in the areas of cost savings, time-lag reductions, and the more efficient utilization of personnel.
unfortunately, while many of the conclusions are buttressed by a dazzling array of tables and mathematical formulas (how can most librarians really argue with a regression analysis correlation coefficient matrix?), some of the most important savings cited are based on simple guesses, in some cases very simple guesses. to mention just two examples: 1) we are told that “a discount advantage expected through the use of combined ordering and a larger volume of ordering is conservatively estimated at 5% ...” (perhaps, but what is this based on?) 2) in the area of time-lag reduction, “the greatest savings in time will accrue when the center is able to purchase materials from a vendor who has built up his book stock to reflect the needs of academic institutions. up to now, vendors have been unwilling to do this because there is insufficient profit motive.” would nine libraries combining together change this profit picture? it is unfortunate that this report could not have waited on phase iii, the completion of the one-year trial of the operational center which was to have been ready in august 1969, so that we could see just how the predictions for the center worked out in practice. as it stands, however, the book is a valuable study in library systems analysis and design, and its identification and quantification of the various technical processing activities can yield real benefits to librarians everywhere, be they ever so decentralized. norman dudley a guide to a selection of computer-based science and technology reference services in the u.s.a. chicago: american library association, 1969. 29 pages. $1.50. this guide is an attempt to bring together those reference publications which are also available in machine-readable form. as a “selection” it is limited to eighteen sources from government, professional, and private organizations.
the guide is the result of a survey undertaken in 1968 by the science and technology reference services committee of the american library association reference services division. the committee was composed of elsie bergland, john mcgowan, william page, joseph paulukonis, margaret simonds, george caldwell, robert krupp, and richard snyder. each entry is broken down into three units: 1) the characteristics of the data base, 2) the equipment configuration, and 3) the use of the file. subject headings under characteristics of the data base include subject matter, literature surveyed, types of material covered, etc. the equipment configuration section describes computer model, core, operating systems, and programming language. the use of the file section covers potential uses of the data base by the producer and the subscriber. unfortunately for publications of this sort, they become out of date rather quickly. the continuing series, the directory of computerized information in science and technology, is updated periodically and is a very useful reference tool in this field. gerry d. guthrie orthographic error patterns of author names in catalog searches renata tagliacozzo, manfred kochen, and lawrence rosenberg: mental health research institute, the university of michigan, ann arbor, michigan an investigation of error patterns in author names based on data from a survey of library catalog searches. position of spelling errors was noted and related to length of name. probability of a name having a spelling error was found to increase with length of name. nearly half of the spelling mistakes were replacement errors; following, in order of decreasing frequency, were omission, addition, and transposition errors. computer-based catalog searching may fail if a searcher provides an author or title which does not match with the required exactitude the corresponding computer-stored catalog entry (1). in designing computer aids to catalog searching, it is important to build in safety features that decrease sensitivity to minor errors.
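the four error categories the survey reports (replacement, omission, addition, and transposition, in decreasing order of frequency for the latter three) are exactly the single-edit cases a catalog-matching program would need to recognize. as an illustrative sketch only, not code from the study, a short python routine classifying how a typed author name differs from the intended catalog entry might look like this:

```python
# hypothetical sketch (not from the tagliacozzo, kochen, and rosenberg study):
# classify a typed author name against the intended catalog entry using the
# study's four single-edit error categories.

def classify_error(typed, correct):
    """return the error category if `typed` differs from `correct` by one edit."""
    if typed == correct:
        return None  # exact match, no spelling error
    lt, lc = len(typed), len(correct)
    if lt == lc:
        diffs = [i for i in range(lt) if typed[i] != correct[i]]
        if len(diffs) == 1:
            return "replacement"  # one wrong character
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typed[diffs[0]] == correct[diffs[1]]
                and typed[diffs[1]] == correct[diffs[0]]):
            return "transposition"  # two adjacent characters swapped
    elif lt == lc - 1:
        # typed name is one character short: an omission
        for i in range(lc):
            if typed[:i] + correct[i] + typed[i:] == correct:
                return "omission"
    elif lt == lc + 1:
        # typed name has one extra character: an addition
        for i in range(lt):
            if typed[:i] + typed[i + 1:] == correct:
                return "addition"
    return "other"  # more than one edit apart

# examples against a hypothetical catalog form "tagliacozzo"
print(classify_error("tagliacozza", "tagliacozzo"))  # replacement
print(classify_error("tagliacozo", "tagliacozzo"))   # omission
```

a routine like this could tally error categories over a search log to reproduce the survey’s frequency counts, or help choose a compression code that is insensitive to the most common error types.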
for example, compression coding techniques may be used to minimize the effects of spelling errors on retrieval (2, 3, 4). preliminary to the design of good protection devices, the application of error-correction coding theory (5, 6, 7) and data on error patterns in actual catalog searches (8, 9) may be helpful. a recent survey of catalog use at three university libraries yielded some data of the above-mentioned kind (10). the aim of this paper is to present and analyze those results of the survey which bear on questions of error control in searching a computer-stored catalog. in the survey, users were interviewed at random as they approached the catalog. of the 2,167 users interviewed, 1,489 were searching the catalog for a particular item (“known-item searches”). of these, 67.9% first entered the catalog with an author’s or editor’s name, 26.2% with a title, and 5.9% with a subject heading. approximately half the searchers had a written citation, while half relied on memory for the relevant information. editorial board thoughts bradford lee eden musings on the demise of paper we have been hearing the dire predictions about the end of paper and the book since microfiche was hailed as the savior of libraries decades ago. now it seems that technology may be finally catching up with the hype. with the amazon kindle and the sony reader beginning to sell in the marketplace despite the cost (about $360 for the kindle), it appears that a whole new group of electronic alternatives to the print book will soon be available for users next year. amazon reports that e-book sales quadrupled in 2008 from the previous year. this has many technology firms salivating and hoping that the consumer market is ready to move to digital reading as quickly and profitably as the move to digital music. some of these new devices and technologies are featured in the march 3, 2009, fortune article by michael v.
copeland titled “the end of paper?”1 part of the problem with current readers is their challenges for advertising. because the screen is so small, there isn’t any room to insert ads (i.e., revenue) around the margins of the text. but new readers such as plastic logic, polymer vision, and firstpaper will have larger screens, stronger image resolution, and automatic wireless updates, with color screens and video capabilities just over the horizon.

still, working out a business model for newspapers and magazines is the real challenge. and how much will readers pay for content? with everything “free” over the internet, consumers have become accustomed to information readily available for no immediate cost. so how much to charge, and how to make money selling content? or will the “pay by the article” model, like that used for digital music sales, become the norm?

the plastic logic reader weighs less than a pound, is one-eighth of an inch thick, and resembles an 8½ x 11 inch sheet of paper or a clipboard. it will appear in the marketplace next year, using plastic transistors powered by a lithium battery. while not flexible, it is a very durable and break-resistant device. other e-readers will use flexible display technology that allows one to fold up the screen and place the device into a pocket. much of this technology is fueled by e-ink, a start-up company that is behind the success of the kindle and the reader. they are exploring the use of color and video, but both have problems in terms of reading experience and battery wear. in the long run, however, these issues will be resolved.

expense is the main concern: just how much are users willing to pay to read something in digital rather than analog? amazon has been hugely successful with the kindle, selling more than 500,000 for just under $400 in 2007. and with the drop in subscriptions for analog magazines and newspapers, advertisers are becoming nervous about their futures.
so what should or do these developments mean for libraries? it means that we should probably be exploring the purchase of some of these products when they appear and offering them (with some content) for checkout to our patrons. many of us did something similar when it became apparent that laptops were wanted and needed by students for their use. many of us still offer this service today, even though many campuses now require students to purchase them anyway. offering cutting-edge technology with content related to the transmission and packaging of information is one way for our clientele to see libraries as more than just print materials and a social space. and libraries shouldn’t pay full price (or any price) for these new toys; companies that develop these products are dying to find free research and development focus groups that will assist them in versioning and upgrading their products for the marketplace. what better avenue than college students?

related to this is the recent announcement by the university of michigan that their university press will now be a digital operation to be run as part of the library.2 decreased university and library budgets have meant that university presses have not been able to sell enough of their monographs to maintain viable business models. the move of a university press to a successful scholarly communication and open-source publishing entity like the university of michigan libraries means that the press will be able to survive, and it also indicates that the newer model of academic libraries as university publishers will have a prototypical example to point out to their university’s administration. in the long run, these types of partnerships are essential if academic libraries are to survive their own budget cuts in the future.

references
1. michael v. copeland, “the end of paper?” cnnmoney.com, mar. 3, 2009, http://money.cnn.com/2009/03/03/technology/copeland_epaper.fortune/ (accessed june 22, 2009).
2.
andrew albanese, “university of michigan press merged with library, with new emphasis on digital monographs,” libraryjournal.com, mar. 26, 2009, http://www.libraryjournal.com/article/ca6647076.html (accessed june 22, 2009).

bradford lee eden (eden@library.ucsb.edu) is associate university librarian for technical services and scholarly communication, university of california, santa barbara.

communication
a tale of two tools: comparing libkey discovery to quicklinks in primo ve
jill k. locascio and dejah rubel
information technology and libraries | june 2023 https://doi.org/10.6017/ital.v42i2.16253
jill k. locascio (jlocascio@sunyopt.edu) is associate librarian, suny college of optometry. dejah rubel (dejahrubel@ferris.edu) is metadata and electronic resources management librarian, ferris state university. © 2023.

introduction
consistent delivery of full-text content has been a challenge for libraries since the development of online databases. library systems have attempted to meet this challenge, but link resolvers and early direct linking tools often fell short of patron expectations. in the last several years, a new generation of direct linking tools has appeared, two of which will be discussed in this article: third iron’s libkey discovery and quicklinks by ex libris, a clarivate company. figure 1 shows the “download pdf” link added by libkey. figure 2 shows the “get pdf” link provided by quicklinks. the way we configured our discovery interface, a resource cannot receive both the libkey and quicklinks pdf links. these two direct linking tools were chosen because they were both relatively new to the market in april 2021 when this analysis took place and they can both be integrated into primo ve, the library discovery system of choice at the authors’ home institutions of suny college of optometry and ferris state university.
through analysis of the frequency of direct links, link success rate, and number of clicks, this study may help determine which product is most likely to meet your patrons’ needs.

figure 1. example of a libkey discovery link in primo ve.
figure 2. example of a quicklink in primo ve.

literature review
over the past 20 years link resolvers and direct linking have evolved in tandem. early link generator tools, such as proquest’s sitebuilder, often involved a process that “… proved too cumbersome for most end-users.”1 five years later, tools from ebsco, gale, ovid, and proquest had improved, but they were all proprietary. bickford postulates that metadata-based standards, like openurl, may make linking as simple as copying and pasting from the address bar; however, they may be more likely to fail “… as long as vendors use incompatible, inaccurate, or incomplete metadata.”2

the first research was wakimoto’s 2006 study of sfx, which relied on 224 test queries and 188,944 individual uses for its data set.3 of those queries, 39.7% of search results included a full-text link, and that link was accessed 65.2% of the time. unfortunately, wakimoto also discovered that 22.2% of all full-text results failed and concluded that most complaints against sfx were problems with the systems it links to and not the link resolver itself. although intended to be provider-neutral, the openurl standard is, in fact, vulnerable to metadata omissions. content providers, whether aggregators or publishers, have a vested interest in link stability and platform use and have therefore invested in building direct link generation tools.
in 2006, grogg examined ebsco’s smartlink, which checks access rights before generating the link; proquest’s crosslinks, which was used to link from proquest to another vendor’s content; and silverplatter and links@ovid, which relied on a knowledge base in the terabytes for static links.4 in 2008, cecchino described the national library of medicine’s linkout tool for selected publishers within pubmed.5 they also described two ovid products: links@ovid and linksolver, noting that the former is similar to linkout and the latter is similar to sfx. most of the time these tools worked well, but their use was restricted to a particular platform or set of publishers.

as online public catalogs became discovery layers, direct linking became a feature of the library management system. two studies have been done thus far: silton’s analysis of summon and stuart’s analysis of 360 link. in 2014, silton tested the percentage of full-text articles retrievable from summon by running a test query and examining the first 100 results. over a year, the total success rate for unfiltered queries rose from 61% to 76%. after direct linking was introduced, the success rate of link resolver links rose to 65.8–73% and direct links succeeded 90.48–100% of the time. silton concluded, “while direct linking had some issues in its early months, it generally performs better than the link resolver.”6

in 2011, stuart, varnum, and ahronheim began testing the 1-click feature of 360 link on 579 citations, 82.2% of which were successful. after direct linking became an option for summon in 2012, 61–70% of their sample relied on it. “between direct linking and 1-click about 93 to 94% of the time an attempt was made to lead users directly to the full text of the article … [and] … we were able to reach full text … from 79% to about 84% of the time.”7 direct linking outperformed 1-click with a 90% success rate compared to 58–67% for 1-click.
stuart also compared the actual error rate with one based on user reports and discovered that “relying solely on user reports of errors to judge the reliability of full-text links dramatically underreports true problems by a factor of 100.”8 openurl links were especially alarming, with approximately 20% of them failing. although direct linking is more reliable, stuart closes by noting that direct linking binds libraries closer to vendors, thereby decreasing institutional flexibility.

methods
the goal of this project was to assess two of the latest direct linking tools: ex libris’s native quicklinks feature and third iron’s libkey discovery. we performed a side-by-side comparison of the two tools by searching for specific articles in primo ve, the library discovery system used by the authors’ respective home institutions, suny college of optometry and ferris state university, and measuring
• how often each vendor’s direct links appeared on the brief record;
• success rate of the links; and
• number of clicks it takes from each link to reach the pdf full text.

both suny college of optometry and ferris state university use ex libris’ alma as their library services platform. alma provides a number of usage reports in their analytics module. we sourced the queries used in our analysis from the alma analytics link resolver usage report. the report contains a field, “number of requests,” which records the number of times an openurl request was sent to the link resolver. an openurl request is sent to the link resolver when the user clicks on a link to the link resolver from an outside source (such as google scholar), when the user submits a request using primo’s citation linker, or when the user accesses the article’s full record in primo by clicking on either the brief record’s title or availability statement.
this means that results that have a direct link (whether a quicklink or libkey discovery link) on the brief record will not appear in the report if the user clicked the direct link to the article. thus, in order to create test searches that would be an accurate representation of articles being accessed, we used article titles taken from suny optometry’s october 2019 alma link resolver usage report—a report that was generated prior to the implementation of both libkey discovery and quicklinks. the report was filtered to include only articles with the source type of primo/primo central to ensure that the initial search was taking place within the native primo interface, as requests from outside sources like google scholar or from primo’s citation linker are irrelevant to this analysis. this filtering generated a total of 412 articles. after further removal of duplicates and non-article material, there were 386 article titles in our test query set.

we created two separate primo views as test environments: one with libkey discovery and the other with quicklinks. we ran the test searches twice in each view. in the first round of testing, we recorded whether a direct link was present. we also recorded the name of the full-text provider (if present), as well as whether the article was open access. suny optometry does not filter their primo results by availability; therefore, many of the articles included in the initial search did not have any associated full-text activations. since these articles are irrelevant to our assessment, we removed them before analyzing the first round of data and proceeding with the second search. the exception to these removals were articles identified as open access by unpaywall, as the presence of unpaywall links is independent of any activations in alma. furthermore, third iron’s libkey discovery and ex libris’ quicklinks both incorporate unpaywall’s api into their products to provide direct links to pdfs of open access articles.
this functionality helps fill coverage gaps where institutions may not have activated a hybrid open access journal due to its paywalls. therefore, we are including the presence of direct links resulting from the unpaywall api when determining whether a libkey discovery link or quicklink is present. after filtering for availability, we had 254 article titles for the first round of searching and analysis.

the initial analysis revealed the need to further filter the articles used for the second round of searching, which would provide a much closer comparison of the two direct linking tools, as third iron had partnered with more content providers than ex libris. controlling for shared providers would give a more accurate representation of how each direct linking tool performs in relation to the other. when controlling for shared providers and open access articles, we were left with 145 article titles for the second query set. during the second round of searching, we measured whether the direct link was successful in linking to the full text—meaning that the link was neither broken nor linked to an incorrect article—and how many clicks were necessary to get from the direct link to the article pdf. along the way, additional qualitative measures were observed, such as document download time and metadata record quality. while not as easy to measure as the quantitative data, these observations provided additional insight into the strengths and weaknesses of each of these direct linking tools. since april 2022, when our research was conducted, ex libris has added several quicklinks providers, possibly increasing the current number of quicklinks available. additionally, both rounds of searching were conducted on campus, so our analysis excludes any consideration of authentication and/or proxy information.
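the presence and success percentages reported in the results section below follow from simple tallies over the per-article test records. this sketch is purely illustrative — the field names and the `TestedArticle` structure are our own assumptions, not the authors’ actual spreadsheet — but it shows the kind of computation the methodology implies:

```python
# Hypothetical tally of direct-link test results. Field names are
# illustrative assumptions, not the authors' actual data columns.
from dataclasses import dataclass

@dataclass
class TestedArticle:
    libkey_present: bool
    quicklink_present: bool
    libkey_success: bool = False
    quicklink_success: bool = False

def pct(part: int, whole: int) -> int:
    """Whole-number percentage, matching how the article reports rates."""
    return round(100 * part / whole) if whole else 0

def summarize(articles: list[TestedArticle]) -> dict:
    n = len(articles)
    libkey = [a for a in articles if a.libkey_present]
    quick = [a for a in articles if a.quicklink_present]
    return {
        # presence rates are computed over all tested articles
        "libkey_present_pct": pct(len(libkey), n),
        "quicklink_present_pct": pct(len(quick), n),
        # success rates are computed only over articles where the link appeared
        "libkey_success_pct": pct(sum(a.libkey_success for a in libkey), len(libkey)),
        "quicklink_success_pct": pct(sum(a.quicklink_success for a in quick), len(quick)),
    }
```

note the design choice this mirrors in the study: success rates are measured only against articles where a link actually appeared, so a tool with narrower provider coverage can still post a high success rate.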
results
of the 254 articles searched, 208 (82%) had libkey discovery links present while 129 (52%) had quicklinks present. while this seems like a large discrepancy between the two direct link providers, it can be explained by the fact that during the time of testing, ex libris was collaborating with fewer content providers than third iron. ex libris has since added more providers. while the provider discrepancy meant that there were many instances where a libkey discovery link was present where a quicklink was not, there were 5 articles where a quicklink was present while a libkey discovery link was not.

as mentioned previously, the criterion for the 254 articles included in the second round of searching was that the articles must be activated in alma or must be open access. of these 254 articles, we identified 137 (54%) as open access. of those open access articles, 132 (96%) had libkey discovery links present, and 118 (86%) had quicklinks present. we found that 113 (82%) of the open access articles had both libkey discovery links and quicklinks present. we also discovered within this set of 137 open access articles that 30 (22%) were from non-activated resources. of those 30 open access articles from non-activated titles, all 30 (100%) had libkey discovery links appearing on the brief results and 24 (80%) had quicklinks.

to get a better idea of how libkey discovery links and quicklinks compared in terms of linking success, we filtered to only those articles available from providers who were participating in both libkey discovery links as well as quicklinks. since both direct linking tools use unpaywall integrations, we continued to include open access articles. this filtering resulted in 145 articles, where libkey discovery links were present in 137 articles (94%) while quicklinks were present in 129 articles (89%). we found that 123 (85%) of these 145 articles had both libkey discovery links and quicklinks present.
there were 2 (1%) articles that had neither libkey discovery links nor quicklinks present despite being activated in a journal currently participating as a provider in both direct linking tools. there were also 14 articles (10%) that had libkey discovery links but not quicklinks; all of these articles were open access. in total, of the 145 articles searched, 128 (88%) were identified as open access.

as for the 137 libkey discovery links, 130 (95%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. of the 129 quicklinks, 126 (98%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. we also attempted to measure the time it took for the pages to load after the initial click on the libkey discovery links and quicklinks; however, the tools used to measure this, as well as the environments in which the links were being clicked, proved too varied to provide an appropriate comparison. nevertheless, we noted that page load times after clicking on libkey discovery links and quicklinks were generally consistent, but quicklinks attempts to connect to the wiley platform took a significant time (at least 10 seconds) to load.

conclusions
with high article linking success rates, both third iron’s libkey discovery and ex libris’ quicklinks deliver on the promise to provide fast and seamless access to full-text articles. however, the libkey discovery tool far outpaces quicklinks when it comes to coverage. both direct linking tools perform well with open access articles, supplying libraries with better options for full-text links to articles that may be in hybrid journals. as with any kind of full-text linking, both direct linking tools rely on metadata.
in conclusion, while libkey discovery provides a more complete direct linking solution, both libkey discovery and quicklinks are reliable tools that improve primo’s discovery and delivery experience.

endnotes
1 david bickford, “using direct linking capabilities in aggregated databases for e-reserves,” journal of library administration 41, no. 1/2 (2004): 31–45, https://doi.org/10.1300/j111v41n01_04.
2 bickford, 45.
3 wendy furlan, “library users expect link resolvers to provide full text while librarians expect accurate results,” evidence based library and information practice 1, no. 4 (2006): 60–63, https://doi.org/10.18438/b88c7p.
4 jill e. grogg, “linking without a stand-alone link resolver,” library technology reports 42, no. 1 (2006): 31–34.
5 nicola j. cecchino, “full-text linking demystified,” journal of electronic resources in medical libraries 5, no. 1 (2008): 33–42, https://doi.org/10.1080/15424060802093377.
6 kate silton, “assessment of full-text linking in summon: one institution’s approach,” journal of electronic resources librarianship 26, no. 3 (2014): 163–69, https://doi.org/10.1080/1941126x.2014.936767.
7 kenyon stuart, ken varnum, and judith ahronheim, “measuring journal linking success from a discovery service,” information technology and libraries 34, no. 1 (2015): 52–76, https://doi.org/10.6017/ital.v34i1.5607.
8 stuart, varnum, and ahronheim, 74.
public libraries leading the way
how covid affected our python class at the worcester public library
melody friedenthal
information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.14041
melody friedenthal (mfriedenthal@mywpl.org) is a public services librarian, worcester public library. © 2021.

in june 2020, ital published my account of how the worcester public library (ma) came to offer a class in python programming and how that class was organized. although readers may have read the article in the middle of our covid-year, i wrote it mostly in early january 2020, before libraries across the country closed in an effort to protect staff and patrons from the disease. from spring 2020 through april 2021, i taught intro to coding: python for beginners five more times. but, of course, these classes were not face-to-face. like virtually all other library, musical, political, religious, and cultural programming across the world, our python course was taught virtually. the public services team has one professional zoom account, which my colleagues and i share.

how did going remote affect this class? it depends on whether your perspective is that of a student or that of the instructor. many of us have read how difficult it’s been for teachers to effectively reach their elementary- through college-age students. i’ve had many of the same challenges, but since nearly all my students are adults and they all chose to take this class, i don’t need to grapple with fidgety kids or recess. on the other hand, there were few distractions in our computer lab, while covid-time students have to grapple with pets, children squabbling, or noise from a tv. i was teaching from my home office. at the library i have one monitor but at home i have two, which makes it easier for me to spread out my assorted documents.
to “protect” my students from seeing my messy house, i used a virtual background, one chosen not to distract. however, the software which determines the borders of a human presenter isn’t perfect and there is sometimes a halo behind my head of the things behind me; this may be distracting itself.

prior to covid, since we had twelve seats in the computer lab, we limited registration to fourteen, allowing for some no-shows (and we have two spare laptops, in case everyone showed up). a week prior to session one i would email the registrants, asking them to confirm their continued interest. if a student didn’t confirm, i’d give their seat to someone on the waitlist. while i was not prepared to make my class a mooc (massive open online course) because i individually review homework and give lots of feedback, we did increase maximum registration to fifteen since the number of seats in the computer lab was no longer a limiting factor. and, as before, i ask for confirmation via email, but i also include in that email two links and an attached word doc. the document is an excerpt from cory doctorow’s novel little brother on the joys of coding.

the first embedded link leads to the free version of zoom. the second link is to the thonny website (https://thonny.org). thonny is a free ide (integrated development environment) where students can write and execute python code. we used thonny when i taught face-to-face, but the lab computers all had thonny installed and were ready for students to use. now, i have to depend on the ability of students to download the software to their own computers. i ask students to do the two downloads ahead of session one. which brings us to two problems: the class was no longer accessible to students who live in a household without a computer and internet service.
and, as i found out with one prospective student, it’s not accessible to patrons who don’t have administrative rights to their computer; that is, the ability to download new software.

when a patron confirms their interest, i email them the course manual. it now contains about 93 pages. i told students they might choose to print it, but doing so is up to individual preference. the advantage of having a digital copy is that students can search for keywords easily. the disadvantage is that the cost of printing the manual is shifted to the student and may be prohibitive for some.

in session one, i acknowledged that it’s difficult to learn technical material via zoom, and i encouraged everyone to ask questions during class and to email me if they are stymied while working on the homework. i reiterated that invitation during every session. while teaching, i bounce back and forth between screen-sharing my thonny window and the manual, while trying to keep an eye on the little zoom windows showing my students. some students cannot or choose not to turn on their video. this is a problem for me, since i can’t readily determine who’s asking a question. moreover, it is helpful to associate a face with a name. and since i give out a certificate of completion to each student who does the homework and attends all sessions, i want to make sure the student is actually taking part. i’ve had students who sign in, leave their camera off, and then apparently leave (i call on students by name and sometimes the no-video ones never respond).

offering the class online has advantages in snowy worcester. students can tune in from the comfort of their own homes, avoid the slick roads, bypass paying for parking at the municipal lot next to our building or for a bus to downtown, and skip the discomfort of walking in a dark city center in the evening. another plus: as program organizers and program participants have discovered, with videoconferencing we are no longer limited geographically.
i had registrants who live in pennsylvania and georgia.

as always, students range from total beginners to experienced programmers-of-other-languages. i’ve thought about how i can give extra time to the former while not boring the latter. one thing i’ve done is to make some assignments optional and say, “if you want an extra challenge, give this a try….” i’ve slowed the class down a bit, leaving more time for coding during each session. if a student has difficulties, i invite them to share their screen. this pedagogical technique actually works better via zoom than in-person, because we could all see that screen equally well. in the computer lab, only the student who sat at the same (2-person) desk could easily see what the other person had coded. another thing i’ve done is to ratchet down the formality of the class: i am chattier and demo fun games i’ve written, e.g., hangman, tic-tac-toe, rock-paper-scissors, and you sunk my battleship, for inspiration. i experimented with using the built-in zoom whiteboard but that wasn’t satisfactory, so i wrote supplementary notes as comments in thonny.

parents were fearful their kids were not being intellectually challenged when schools were closed due to the pandemic, so maybe i shouldn’t have been surprised that the april 2021 class contained seven children. there would have been an eighth, but when i realized one registrant was just seven years old, i told his mother that, while she was the best judge of her son’s abilities, i discouraged him from taking the class. she decided to take it herself.

figure 1. a word-cloud of our fall 2020 project outcome evaluations (includes other digital learning programs).

at our sixth and final session i traditionally execute a program which draws colorful graphics, rather like spirograph. students were able to see each curve being drawn in a new window launched by the ide.
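a spirograph-style curve of this kind takes only a few lines of python to generate. the sketch below is a hypothetical reconstruction, not the course’s actual program: it computes the points of a hypotrochoid (the curve a spirograph wheel traces) and pauses briefly before returning them, the same kind of “sleep” workaround for screen-sharing described next. all parameter names here are illustrative.

```python
# Hypothetical sketch of a spirograph-style demo. Computes hypotrochoid
# points; a real class demo would feed these to a drawing window.
import math
import time

def hypotrochoid_points(R=5.0, r=3.0, d=5.0, steps=360):
    """Points of a hypotrochoid: a circle of radius r rolling inside a
    circle of radius R, with the pen at distance d from the small circle's
    center."""
    pts = []
    for i in range(steps):
        t = 2 * math.pi * i / steps
        x = (R - r) * math.cos(t) + d * math.cos((R - r) / r * t)
        y = (R - r) * math.sin(t) - d * math.sin((R - r) / r * t)
        pts.append((x, y))
    return pts

def demo(pause=3):
    # Pause so the presenter has time to share the freshly opened window
    # before the first curve segments appear.
    time.sleep(pause)
    return hypotrochoid_points()
```

the ratio of R to r controls how many “petals” the curve has before it closes, which makes for an easy in-class experiment: change one number, rerun, and watch the shape change.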
but this window doesn’t exist until i executed the program. while we were using zoom, when i attempted to share my screen, the students missed the first graphics, no matter how fast i was at screen-sharing. i made the execution “sleep” for a few seconds to give me time to switch screens before the graphics were drawn.

a larger percentage of students earned the certificate of completion during the virtual classes than on average in the in-person pre-covid classes, perhaps 75% vs. 40%. for the in-person classes our communications officer printed the certificates on heavy paper adorned with the wpl logo; i signed each and handed them out during the final session. for our virtual classes, the certificates were digitally signed and then emailed; students could print them if they chose.

this follow-up is being written during october 2021, and with a substantial percentage of massachusetts residents vaccinated for covid, the worcester public library is now back to offering many programs in-person, including python. the city of worcester requires mask use in all municipal buildings, and while some patrons don’t cooperate, i’ve told my students that anyone not wearing a mask properly will be asked to leave the computer lab. with so many people out of work due to the economic devastation wrought by covid, we were gratified to be able to offer a class that teaches in-demand skills, especially ones that can be applied in a work-from-home environment.

frbrization of a library catalog | dickey
the functional requirements for bibliographic records (frbr)’s hierarchical system defines families of bibliographic relationship between records and collocates them better than most extant bibliographic systems.
certain library materials (especially audio-visual formats) pose notable challenges to search and retrieval; the first benefits of a frbrized system would be felt in music libraries, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections. this report will summarize the benefits of frbr to next-generation library catalogs and opacs, and will review the handful of ils and catalog systems currently operating with its theoretical structure.

editor’s note: this article is the winner of the lita/ex libris writing award, 2007.

the following review addresses the challenges and benefits of a next-generation online public access catalog (opac) according to the functional requirements for bibliographic records (frbr).1 after a brief recapitulation of the challenges posed by certain library materials—specifically, but not limited to, audiovisual materials—this report will present frbr’s benefits as a means of organizing the database and public search results from an opac.2 frbr’s hierarchical system of records defines families of bibliographic relationship between records and collocates them better than most extant bibliographic systems; it thus affords both library users and staff a more streamlined navigation between related items in different materials formats and among editions and adaptations of a work. in the eight years since the frbr report’s publication, a handful of working systems have been developed. the first benefits of such a system to an average academic library system would be felt in a branch music library, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections.
■ current search and retrieval challenges

the difficulties faced first, but not exclusively, by music users of most integrated library systems fall into two related categories: issues of materials formats, and issues of cataloging, indexing, and marc record structure. music libraries must collect, catalog, and support materials in more formats than anyone else; this makes their experience of the most common ils modules—circulation, reserves, and acquisitions—by definition more complicated. “the study of music continues to rely on the interrelated use of three distinct information formats—scores (the notated manifestation of a composer’s or improviser’s thought), recordings (realizations in sound, and sometimes video, of such compositions and improvisations), and books and journals (intellectual thought regarding such compositions and improvisations)—music libraries continue to require . . . collections that integrate [emphasis mine] these three information formats appropriately.”3 put a different way, “relatedness is a pervasive characteristic of music materials.”4 this is why frbr’s model of bibliographic relationships offers benefits that will first impact the music collection.5 at present, however, musical formats pose search and retrieval challenges for most ils users, and the problem is certainly replicated with microforms and video recordings. the marc codes distinguish between material formats, but they support only one category for sound recordings, lumping together cd, dvd audio, cassette tape, reel-to-reel tape, and all other types.6 this single “sound recording” definition is easily reflected in opacs (such as those powered by innovative interfaces’ millennium and ex libris’ aleph 500) and union catalogs (such as worldcat.org).7 however, the distinction between sound recording formats is embedded in subfields of the 007 field, which presently cannot be indexed by many library automation systems because the subfields are not adjacent.
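the format distinctions buried in the 007 field are positional character codes, so a system that cannot index them can still recover them by inspecting the string directly. the sketch below illustrates the idea; the category and designation codes follow my reading of marc 21 (007/00 = “s” for sound recording, 007/01 = specific material designation) and should be verified against the marc 21 documentation before use.

```python
# Illustrative sketch: recovering the sound-recording format from a MARC 007
# string. The code-to-format mapping below is an assumption drawn from MARC 21
# conventions, not from any vendor's indexing rules.

SOUND_DESIGNATION = {
    "d": "sound disc",        # CD, LP, etc. (later positions refine speed/size)
    "e": "cylinder",
    "g": "sound cartridge",
    "i": "sound-track film",
    "s": "sound cassette",
    "t": "sound-tape reel",
}

def sound_format(field_007: str) -> str:
    """Return a human-readable format for a sound-recording 007 string."""
    if not field_007 or field_007[0] != "s":
        return "not a sound recording"
    if len(field_007) < 2:
        return "sound recording (unspecified)"
    return SOUND_DESIGNATION.get(field_007[1], "sound recording (other)")

print(sound_format("sd fsngnnmmned"))   # a disc-shaped carrier
print(sound_format("ss lunjlc-----"))   # a cassette
```

an opac that applied even this much parsing at index time could offer “cd” vs. “cassette” facets instead of the single “sound recording” category described above.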
an even more central challenge derives from the fact that music sound recordings—like journals and essay collections—contain within each item more than one work. thus, for one of the central material formats collected by a music library (as well as by a public library or other academic branches), users routinely find themselves searching for a distinct subset of the item record. perversely, though music catalogers do tend to include analytic added-entries for the subparts of a cd recording or printed score, and major ils vendors are learning to index them, aacr2 guidelines set arbitrary cutoff points of about fifteen tracks on a sound recording, and three performable units within a score.8 subsets of essay collections and journal runs are routinely exposed to users’ searches by indexing and abstracting services and major databases, but subsets of libraries’ music collections depend upon catalogers to exploit the marc records for user access.9

timothy j. dickey (dickeyt@oclc.org) is a post-doctoral researcher, oclc office of programs and research, dublin, ohio.

frbrization of a library catalog: better collocation of records, leading to enhanced search, retrieval, and display | timothy j. dickey | information technology and libraries | march 2008

in light of these pervasive bibliographic relationships, catalogers of music (again, with parallels in other subjects) have developed a distinctive approach to the marc metadata schema. in particular, they—with their colleagues in literature, fine arts, and theology—rely upon the 700t field for uniform work titles, and upon careful authority control.10 however, once again, many major ils portals have spotty records in affording access to library collections via these data.
innovative interfaces’ millennium, though it clearly leads other major library products in this market, frequently frustrates music librarians (it is, of course, not alone in doing so).11 its automatic authority control feature works poorly with (necessary) music authority records.12 and even though innovative has been one of the first vendors to add a database index to the 700t field, partly in response to concerns expressed to the company by the music librarians’ user group, millennium apparently does not allow for an appropriate level of follow-through on searching.13 an initial search by the name of a major composer, for instance, yields a huge and cluttered result set containing all indexed 700t fields.14 the results do helpfully include the appropriate see also references, but those references disappear in a subsidiary (limited) search. in addition, the subsidiary display inexplicably changes to an unhelpful arrangement of generic 245 fields (“mozart, symphonies”; “mozart, operas, excerpts”). similar challenges will be faced by other parts of an academic or large public library collection, including the literature collections (for works such as shakespeare’s plays), fine arts (for images and artists’ works), and theology (for works whose uniform title is in latin). the opac interfaces of other major ils vendors fare little better. the same search (for “mozart”) on the emory university library catalog (with an ils by sirsidynix) similarly yields a rich results set of more than one thousand records, and poses similar problems in refining the search.15 in the case of this opac, an index of 700t fields also exists, but it may only be searched from inside a single record; as with millennium, sirsidynix’s interface will then group the next set of results confusingly by 245 fields.
the library corporation’s carl-x apparently does not contain a 700t index; the simple “mozart” search returns a much-simplified set of only 97 results organized by 245a fields, and thus offers a more concise set of results but avoids the most incisive index for audio-visual materials.16 ex libris offers a somewhat more helpful display of its more restricted results; unfortunately for the present comparison, though the detailed results set does list the “format” of all mozart-authored items, the same term—“music”—is used for sound recordings, musical scores, and score excerpts, with no attempt logically to group the results around individual works.17 no 700t index appears present.

■ the frbr paradigm: review of literature and theory

from the earliest library catalogs in the modern age, the tools of bibliographic organization have sought to afford users both access to the collection and collocation of related materials. anglo-american cataloging practice has traditionally served the first function by main entries and alternate access points, and the second function by classification systems. however, as knowledge increases in scope and complexity, the systems of bibliographic control have needed to evolve. as early as the 1950s, theories were developing that sought to distinguish between the intellectual content of a work and its often manifold physical embodiments.18 the 1961 paris international conference on cataloging principles first reified within the cataloging community a work-item distinction, though even the 1988 publication of the anglo-american cataloging rules, 2nd ed., “continued to demonstrate confusion about the nature . . .
of works.”19 meanwhile, extensive research into the nature of bibliographic relationships groped toward a consensus definition of the entity types that could encompass such relationships.20 ed o’neill and diane vizine-goetz examined some one hundred editions of smollett’s the expedition of humphrey clinker over a two-hundred-year span of publication history to propose a hierarchical set of definitions to define entity levels.21 the theoretical entities include the intellectual content of a work—which, in the case of audio-visual works, may not even exist in any printed format—the various versions, editions, and printings in which that intellectual content manifests itself, and the specific copies of each manifestation which a library may hold.22 research has discovered such clusters of bibliographically related entities for as much as 50 percent or more of all the intellectual works in any given library catalog, and as many as 85 percent of the works in a music catalog.23 this work laid the foundation for frbr (and, once again, incidentally underscored the breadth of its applicability to, and beyond, music catalogs). the theoretical framework of frbr is most concisely set forth in the final report of the ifla study group. the long-awaited publication traces its genesis to the 1990 stockholm seminar and the resultant 1992 founding of the ifla study group on functional requirements for bibliographic records. the study group set out to develop: a framework that identifies and clearly defines the entities of interest to users of bibliographic records, the attributes of each entity, and the types of relationships that operate between entities . . . a conceptual model that would serve as the basis for relating specific attributes and relationships . . . to the various tasks that users perform when consulting bibliographic records.
the study makes no a priori assumptions about the bibliographic record itself, either in terms of content or structure.24 in other words, the intention of the group’s deliberations and the final report is to present a model for understanding bibliographic entities and the relationships between them to support information organization tools. it specifically adopts an approach that defines classes of entities based upon how users, rather than catalogers, approach bibliographic records—or, by natural extension, any system of metadata. the frbr hierarchical entities comprise a fourfold set of definitions:

■ work: “a distinct intellectual or artistic creation”;
■ expression: “the intellectual or artistic realization of a work” in any combination of forms (including editions, arrangements, adaptations, translations, performances, etc.);
■ manifestation: “the physical embodiment of an expression of a work”; and
■ item: “a single exemplar of a manifestation.”25

examples of these hierarchical levels abound in the bibliographic universe, but frequently music offers the quickest examples:

■ work: mozart’s die zauberflöte (the magic flute)
■ work: puccini’s la bohème
 ■ expression: the composer’s complete musical score (1896)
  ■ manifestation: edition of the score printed by ricordi in 1897
 ■ expression: an english-language edition for piano and voices
 ■ expression: a performance by mirella freni, luciano pavarotti, and the berlin philharmonic orchestra (october 1972)
  ■ manifestation: a recording of this performance released on 33¹/³ rpm sound discs in 1972 by london records
  ■ manifestation: a re-release of the same performance on compact disc in 1987 by london records
   ■ item: the copy of the compact disc held by the columbus metropolitan library
   ■ item: the copy of the compact disc held by the university of cincinnati

in fact, lis research has tended to demonstrate what music librarians have always
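the fourfold hierarchy is, in effect, a containment structure, and can be sketched as one. the following minimal model uses the la bohème example; the class and field names are illustrative, not drawn from any frbr schema or implementation.

```python
# A minimal sketch of FRBR's four entity levels as a containment hierarchy,
# using Puccini's La bohème. Names are illustrative assumptions, not a schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    holding: str                    # a single exemplar of a manifestation

@dataclass
class Manifestation:
    description: str                # the physical embodiment of an expression
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    description: str                # a realization of the work (edition, performance, ...)
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str                      # a distinct intellectual or artistic creation
    expressions: List[Expression] = field(default_factory=list)

boheme = Work("Puccini, La bohème", [
    Expression("1972 Freni/Pavarotti/Berlin Philharmonic performance", [
        Manifestation("1972 London Records 33 1/3 rpm discs"),
        Manifestation("1987 London Records CD re-release", [
            Item("copy held by the Columbus Metropolitan Library"),
            Item("copy held by the University of Cincinnati"),
        ]),
    ]),
])

# Collocation falls out of the structure: every item under a work is reachable.
copies = [i.holding for e in boheme.expressions
          for m in e.manifestations for i in m.items]
print(copies)
```

the point of the sketch is that once records are linked this way, collocating all holdings of a work is a simple traversal rather than a keyword search.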
understood—that relatedness among items and complexity of families is most prevalent in audio-visual collections. even before the ifla report had been penned, sherry vellucci had set out the task: “to create new catalog structures that better serve the needs of the music user community, it is important first to understand the exact nature and complexity of the materials to be described in the catalog.”26 even limiting herself to musical scores alone (that is, no recordings or monographs), vellucci found that more than 94.8 percent of her sample exhibited at least one bibliographic relationship with another entity in the collection; she further related this finding to the very “inherent nature of music, which requires performance for its aural realization,” as opposed to, for example, monographic book printing.27 vellucci and others have frequently commented on how the relatedness of manifestations—in different formats, arrangements, and abridgements—of musical works continues to be a problem for information retrieval in the world of music bibliography.28 musical works have been variously and industriously described by musicologists and music bibliographers. yet, in the information retrieval domain [and, i might add, under both aacr and aacr2] . . . systems for bibliographic information retrieval . . . have been designed with the document as the key entity, and works have been dismissed as too abstract . . .29 the work is the access point many users will bring—in their minds, and thus in their queries—to a system. they intend, however, to discover, identify, and obtain specific manifestations of that work. 
very recently, research has begun to demonstrate that the frbr model can offer specific advantages to music retrieval in cases such as these: “the description of bibliographic data in a frbr-based database leads to less redundancy and a clearer presentation of the relationships which are implicit in the traditional databases found in libraries today.”30 explorations of the theory in view of the benefits to other disciplines, such as audio-visual and other graphic materials, maps, oral literature, and rare books, have appeared in the literature as well.31 the admitted weakness of the frbr theory, of course, is that it remains a theory at its inception, with still precious few working applications.

■ frbr applications

working implementations of frbr to catalogs, opacs, and ilss are still relatively few but promise much for the future. the frbr theoretical framework has remained an area of intense research at oclc, which has even led to some prototype applications and, very recently, deployment in the worldcat local interface.32 a scattered few other researchers have crafted frbr catalogs and catalog displays for their own ends; the library of congress has a prototype as well. innovative, the leading academic ils vendor, announced a frbr feature for 2005 release, yet shelved the project for lack of a beta-testing partner library.33 ex libris’ primo discovery tool, one other complete ils (by visionary technologies for library systems, or vtls), and the national library of australia have each deployed operational frbr applications.34 the number of projects testifies to the high level of interest among the cataloging and information science communities, while the relatively small number of successful applications testifies to the difficulties faced.
oclc has engaged in a number of research projects and prototypes in order to explore ways that frbrization of bibliographic records could enhance information access. oclc research frequently notes the potential streamlining of library cataloging by frbrization; in addition, they have experienced “superior presentation” and “more intuitive clustering” of search results when the model is incorporated into systems.35 work-level definitions stand behind such oclc research prototypes as audience level, dewey browser, fictionfinder, xisbn, and live search. in every case, researchers determined that, though it was very difficult to automate any identification of expressions, application of work-level categories both simplifies and improves search result sets.36 an algorithm common to several of these applications is freely available as an open source application, and now as a public interface option in oclc’s worldcat local.37 the algorithm creates an author/title key to cluster worksets (often at a higher level than the frbr work, as in the case of the two distinct works that are the book and screenplay for gone with the wind). in the public search interface, the results sets may be grouped at the work level; users may then execute a more granular search for “all editions,” an option that then displays the group of expressions linked to the work record. unfortunately, as the software does not use 700t fields (its intention is to travel up the entity hierarchy, and it uses the 1xx, 24x, and 130 fields), its usefulness in solving the above challenges may not be immediate. a somewhat similar application (though merrilee proffitt declares it not to be a frbr product) was redlightgreen, a user interface for the ex-rlg union catalog based upon quasi-frbr clustering.38 the reports from designers of other automated systems offer interesting commentaries on the process.
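the core idea of an author/title clustering key is straightforward and can be sketched in a few lines. this is a deliberate simplification for illustration only, not oclc’s actual work-set algorithm; the normalization steps and record fields below are my assumptions.

```python
# Sketch of author/title-key clustering: normalize the author and title,
# join them into a key, and group records sharing that key into a workset.
# A simplification for illustration, not OCLC's published algorithm.
import re
import unicodedata
from collections import defaultdict

def workset_key(author: str, title: str) -> str:
    """Build a normalized author/title clustering key (illustrative)."""
    def norm(s: str) -> str:
        # fold diacritics, lowercase, strip punctuation, collapse spaces
        s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
        s = re.sub(r"[^a-z0-9 ]", "", s.lower())
        return re.sub(r"\s+", " ", s).strip()
    return norm(author) + "/" + norm(title)

records = [
    {"author": "Mozart, Wolfgang Amadeus", "title": "Die Zauberflöte", "format": "score"},
    {"author": "mozart wolfgang amadeus", "title": "Die Zauberflote", "format": "CD"},
    {"author": "Puccini, Giacomo", "title": "La bohème", "format": "LP"},
]

worksets = defaultdict(list)
for rec in records:
    worksets[workset_key(rec["author"], rec["title"])].append(rec)

for key, cluster in worksets.items():
    print(key, "->", len(cluster), "record(s)")
```

note how the score and the cd of die zauberflöte fall into one cluster despite differing punctuation and diacritics; that grouping is exactly what lets a public interface present a single work-level result with an “all editions” link.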
the team building an automatically frbrized database and user interface for austlit—a new union collection of australian literature among eight academic libraries and the national library of australia—acknowledged some difficulty with non-monographic works such as poems, though the majority of their database consisted of simpler work-manifestation pairs.39 based on strongly positive user feedback (“the presentation of information about related works [is] both useful and comprehensible”), a similar application was attempted on the australian national music gateway musicaustralia; it is unclear whether the project was shelved due to difficulties in automating the frbrization process.40 one recent application created for the perseus digital library adopts a somewhat different approach.41 rather than altering previously created marc records to allow hierarchical relationships to surface, this team created new records using crosswalks between marc and, for instance, mods, for work-level records. they claim some moderate level of success; though once again, their discussion of the process is more illuminating than their product. mimno and crane successfully allowed a single manifestation-level record to link upwards to many expressions, a necessary analytic feature especially for dealing with sound recordings. they did practically demonstrate the difficulty of searching elements from different levels of the hierarchy at the same time (such as work title and translator), a complication predicted by yee.42 three ils vendors have released products that use the frbr model: portia (visualcat), ex libris (primo), and vtls (virtua).43 the first product, a cataloging utility from a smaller player in the vendor market, claims to incorporate frbr into its metadata capture, yet the information available does not explain how, nor do they offer an opac to exploit it. 
the 2007 release of ex libris’ primo offers what the company calls “frbr groupings” of results.44 this discovery tool is not itself an ils, but promises to interoperate with major existing ils products to consolidate search results. it remains unclear at this time how ex libris’ “standard frbr algorithms” actually group records; the single deployment in the danish royal library allows searching for more records with the same title, for instance, but does not distinguish between translations of the same work.45 vtls, on the other hand, has since 2004 offered a complete product that has the potential to modify existing marc records—via local linking tags in the 001 and 004 fields—to create frbr relationships.46 their own studies agreed with oclc that a subset, roughly 18 percent, of existing catalog records (most heavily concentrated in music collections) would benefit from the process, and they thus allow for “mixed” catalogs, with only subsets (or even individually selected records) to be frbrized. the company’s own information suggests relatively simple implementation by library catalogers, coupled with robust functionality for users, and may be the leading edge of the next generation of catalog products.

■ frbr solutions

the ifla study group, following its user-centered approach, set out a list of specific tasks that users of a computer-aided catalog should be able to accomplish:

■ to find all manifestations embodying certain criteria, or to find a specific manifestation given identifying information about it;
■ to identify a work, and to identify expressions and manifestations of that work;
■ to select among works, among expressions, and among manifestations; and
■ to obtain a particular manifestation once selected.

it seems clear that the frbr model offers a framework of relationships that can aid each task.
unfortunately, none of the currently available commercial solutions may be in itself completely applicable for a single library. the oclc work-set algorithm is open source, as well as easily available through worldcat local, but it only works to create super-work records; it also ignores the 700t field so crucial to many of the issues noted above. none of the other home-grown applications may have code available to an institution. the virtua module from vtls offers a very tempting solution, but may require a change of vendor.47 either adapting one of these solutions or designing a local application, then, raises the question: what would the ideal system entail? catalog frbrization will transpire in two segments: enhancing the existing catalog so that bibliographic relationships surface in the retrieval phase, and designing or adapting a new interface and display to reflect the relationships.48 the first task may prove the more formidable, due to the size of even a modest catalog database and the difficulties often observed in automating such a task; while the librarians constructing the austlit system found that a relatively high percentage of records could be transferred en masse, the oclc research team had difficulty automatically pinpointing expressions from current marc records.49 despite current technology trends toward users’ application of tags, reviews, and other metadata, a task as specialized as adding bibliographic relationships to the catalog demands specialized cataloging professionals.50 the best approach within a current library structure may be to create a single new position to head the project and to act as liaison with cataloging staff in the various branches and with vendor staff, if applicable. each library branch may judge on its own the proportions of records to frbrize, beginning with high-traffic works and authors, those for whom search results tend to be the most overwhelming and confusing to users.
each branch can be responsible for allocation of cataloging staff effort to the process, and will thus have specialist oversight of subsets of the database. three technical solutions to actually changing the database structure have been attempted in the literature to date: incrementally improving the existing marc records to better reflect bibliographic relationships, adding local linking tags, and simply creating new metadata schemas. the vtls solution of adding local linking tags seems most appropriate; relationships between records are created and maintained via unique identifiers and linking statements in the 001 and 004 fields.51 oclc’s open source software could expedite the creation of work-level records, and the creation of expression-level records will be made easier by the large amount of bibliographic information already present in the current catalog. wherever possible, cataloging staff also should take the opportunity to verify or create links to authority files so as to enhance retrieval.52 creating a new catalog display option could be accomplished via additions to current opac coding, either by adopting worldcat local or by designing parts of a new local interface. it need not even require a complete revision; the single site (ucl) currently deploying vtls’ frbrized interface maintains a mixed catalog and offers, once again, a highly intuitive model.53 when a searcher comes across a bibliographic record for which frbr linking is available, they may click a link to open a new display screen. we should strive, however, to use simple interface statements such as “view all different kinds of holdings,” “this work has x editions, in y languages” or “this version of the work has been published z times” (both the oclc prototype and the austlit gateway offer such helpful and user-friendly statements). 
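the linking-tag approach that the article attributes to vtls, combined with the tree display recommended above, can be sketched together: each record carries its own identifier and, when frbrized, a pointer to its parent record. the field semantics below (“001” as record id, “004” as parent link) are assumptions taken from the article’s description, not from vtls documentation, and the record shapes are illustrative.

```python
# Sketch of the local-linking-tag idea: "001" holds a record's identifier and
# "004" links a FRBRized record to its parent. Field semantics are assumptions
# based on the article's description of VTLS Virtua, not vendor documentation.
from collections import defaultdict

records = [
    {"001": "w1", "level": "work", "label": "Puccini, La bohème"},
    {"001": "e1", "004": "w1", "level": "expression",
     "label": "1972 Berlin Philharmonic performance"},
    {"001": "m1", "004": "e1", "level": "manifestation",
     "label": "1972 London Records LP"},
    {"001": "m2", "004": "e1", "level": "manifestation",
     "label": "1987 London Records CD re-release"},
]

by_id = {r["001"]: r for r in records}
children = defaultdict(list)
for rec in records:
    if "004" in rec:                 # records without a link stay standalone,
        children[rec["004"]].append(rec["001"])  # which permits a "mixed" catalog

def tree(rec_id: str, depth: int = 0) -> list:
    """Render a FRBR family as an indented tree, one entity per line."""
    rec = by_id[rec_id]
    lines = ["  " * depth + f"{rec['level']}: {rec['label']}"]
    for child in children[rec_id]:
        lines.extend(tree(child, depth + 1))
    return lines

print("\n".join(tree("w1")))
```

because unlinked records are simply absent from the `children` map, the same catalog can hold frbrized and un-frbrized records side by side, which matches the “mixed” catalogs the vendor is said to allow.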
though the foundational work of both tillett and smiraglia focused upon taxonomies of relationships, the hierarchical structure of the ifla proposal should remain at the forefront of the display, with a secondary organization by type of relationship or type of entity. rather than adopting a design which automatically refreshes at each click, a tree organization of the display should be more user-friendly, allowing users to maintain a visual sense of the organization that they are encountering (see appendix for screenshots of this type of tree display).54 format information should be included in the display, as an indication of a user’s primary category, as well as a distinction among expressions of a work. with these changes, the library catalog will begin to afford its users better access to many of its core collections. frbrization of even part of the catalog—concentrating on high-incidence authors, as identified by subject specialists—will allow it better to reflect, and collocate, items within the families of bibliographic relationships that have been acknowledged a part of library collections for decades. this increased collocation will begin to counteract the pitfalls of mere keyword searching on the part of users, especially in conjunction with renewed authority work. finally, frbr offers a display option in a revamped opac that is at the same time simpler than current result lists, and more elegant in its reflection of relatedness among items. each feature should better enable the users of our catalog to find, select, and obtain appropriate resources, and will bring our libraries into the next generation of cataloging practice.

references and notes

1. ifla committee on the functional requirements for bibliographic records, final report (munich: k. g. saur, 1998); see also http://www.ifla.org/vii/s13/wgfrbr/bibliography.htm (accessed mar. 10, 2007).
2.
this paper began as a graduate research assignment for lis 60640 (library automation), in the kent state university mlis program, march 19, 2007. my thanks to jennifer hambrick, nancy lensenmayer, and joan lippincott for their helpful comments on earlier drafts. the curricular assignment asked for a library automation proposal in a specific library setting; the original review contained a set of recommendations concerning frbr through the lens of a (fictional) medium-sized academic library system, that of st. hildegard of bingen catholic university. as will be noted below, the branch music library typically serves a small population of music majors (graduate and undergraduate) within such an institution, but also a large portion of the student body that use the library’s collection to support their music coursework and arts distribution requirements. any music library’s proportion of the overall system’s holdings may be relatively small, but will include materials in a diverse set of formats: monographs, serials, musical scores, sound recordings in several formats (cassette tapes, lps, cds, and streaming audio files), and a growing collection of video recordings, likewise in several formats (vhs, laser discs, and dvd). it thus offers an early test case for difficulties with an automated library system.
3. dan zager, “collection development and management,” notes—quarterly journal of the music library association 56, no. 3 (mar. 2000): 569.
4. sherry l. vellucci, “music metadata and authority control in an international context,” notes—quarterly journal of the music library association 57, no. 3 (mar. 2001): 541.
5. the opac for the university of huddersfield library system famously first deployed a search option for related items (“did you mean . . . ?”); http://www.hud.ac.uk/cls (accessed july 10, 2007). frbr not only offers the related item search, but also logically groups related works throughout the library catalog.
6.
allyson carlyle demonstrated empirically that users value an object’s format as one of the first distinguishing features: “user categorization of works: toward improved organization of online catalog displays,” journal of documentation 55, no. 2 (mar. 1999): 184–208 at 197.
7. millennium will feature heavily in the following discussion, both because of its position leading the academic library automation market (being adopted wholesale by, for instance, the ohio statewide academic library consortium), and because it was the subject of the original paper.
8. see alastair boyd, “the worst of both worlds: how old rules and new interfaces hinder access to music,” caml review 33, no. 3 (nov. 2005), http://www.yorku.ca/caml/review/33-3/both_worlds.htm (accessed mar. 12, 2007); michael gorman and paul w. winkler, eds., anglo-american cataloging rules, 2nd ed. (chicago: ala, 1988).
9. in the past few years, a small subset of the search literature has described technical efforts to develop search engines that can query by musical example; see j. stephen downie, “the scientific evaluation of music information retrieval systems: foundations and future,” computer music journal 28, no. 2 (summer 2004): 12–23. a company called melodis corporation has recently announced a successful launch of a query-by-humming search engine, though a verdict from the music community remains out; http://www.midomi.com (accessed jan. 31, 2007).
10. see vellucci, “music metadata and authority control in an international context”; richard p. smiraglia, “uniform titles for music: an exercise in collocating works,” cataloging and classification quarterly 9, no. 3 (1989): 97–114; steven h. wright, “music librarianship at the turn of the century: technology,” notes—quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97.
each author builds upon the foundational work of barbara tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss., university of california at los angeles, 1987).
11. “at conferences, [my colleagues] are always groaning if they are a voyager client,” interview with an academic music librarian by the author, feb. 9, 2007.
12. several prominent music librarians only discovered that innovative’s system had such a feature when instances of the automatic system’s changing carefully crafted music authority records were discovered; mark sharff (washington university in st. louis) and deborah pierce (university of washington), postings to innovative music users’ group electronic discussion list, oct. 6, 2006, archive accessed feb. 1, 2007.
13. music librarians are the only subset of the millennium users to have formed their own innovative users’ group. sirsidynix has a separate users’ group for stm librarians, and ex libris hosts a law librarians’ users’ group, two other groups whose interaction with the ils poses discipline-specific challenges.
14. searches were tested on the ohio state university libraries’ opac, http://library.osu.edu (accessed mar. 10, 2007).
15. http://www.emory.edu/libraries.cfm (accessed june 27, 2007).
16. searches performed on the library of oklahoma state university, http://www.library.okstate.edu (accessed june 27, 2007); tlc has considered making frbrization a possible feature of their product. they offer some concatenation of “intellectually similar bibliographic records,” and “tlc continues to monitor emerging frbr standards”; don kaiser, personal communication to the author, july 8, 2007. i was unable to reach representatives of sirsidynix on this issue.
17. searches performed on the mit library catalog, powered by aleph 500, http://libraries.mit.edu (accessed june 27, 2007).
18.
eva verona, “literary unit versus bibliographic unit [1959],” in foundations of descriptive cataloging, ed. michael carpenter and elaine svenonius, 155–75 (littleton, colo.: libraries unlimited, 1985), and seymour lubetzky, principles of cataloging, final report phase i: descriptive cataloging (los angeles: institute for library research, 1969), are usually credited with the foundational work on such theories; see richard p. smiraglia, the nature of “a work”: implications for the organization of knowledge (lanham, md.: scarecrow, 2001), 15–33, to whom the following overview is indebted.
19. anglo-american cataloging rules, cited in smiraglia, the nature of “a work,” 33.
20. among the many library and information science thinkers contributing to this body of research, the most prominent have been patrick wilson, “the second objective,” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 5–16 (san diego: academic publ., 1989); edward t. o’neill and diane vizine-goetz, “bibliographic relationships: implications for the function of the catalog,” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 167–79 (san diego: academic publ., 1989); barbara ann tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss., university of california, los angeles, 1987); eadem, “bibliographic relationships,” in relationships in the organization of knowledge, ed. carol a. bean and rebecca green, 19–35 (dordrecht: kluwer, 2001) (summary of her dissertation findings on 19–20); martha m. yee, “manifestations and near-equivalents: theory with special attention to moving-image materials,” library resources and technical services 38, no. 3 (1994): 227–55.
21. o’neill and vizine-goetz, “bibliographic relationships”; see also edward t.
o’neill, “frbr: application of the entity-relationship model to humphrey clinker,” library resources and technical services 46, no. 4 (oct. 2002): 150–59. 22. theorists in music semiotics who have more or less profoundly influenced music librarians’ view of their materials include jean-jacques nattiez, music and discourse: toward a semiology of music, trans. carolyn abbate (princeton, n.j.: princeton univ. pr., 1990), and lydia goehr, the imaginary museum of musical works (new york: oxford univ. pr., 1992). see also smiraglia, the nature of “a work,” 64. for a concise overview of how semiotic theory has influenced thinking about literary texts, see d. c. greetham, theories of the text (oxford: oxford univ. pr., 1999), 276–325. 23. studies have found families of derivative bibliographic relationships in 30.2 percent of all worldcat records, 49.9 percent of records in the catalog of georgetown university library, 52.9 percent in the burke theological library (union theological seminary), 57.9 percent of theological works in the new york university library, and 85.4 percent in the sibley music library at the eastman school of music (university of rochester). see smiraglia, the nature of “a work,” 87, who cites richard p. smiraglia and gregory h. leazer, “derivative bibliographic relationships: the work relationship in a global bibliographic database,” journal of the american society for information science 50 (1999): 493–504; richard p. smiraglia, “authority control and the extent of derivative bibliographic relationships” (ph.d. diss., university of chicago, 1992); richard p. smiraglia, “derivative bibliographic relationships among theological works,” proceedings of the 62nd annual meeting of the american society for information science (medford, n.j.: information today, 1999): 497–506; and sherry l.
vellucci, “bibliographic relationships among musical bibliographic entities: a conceptual analysis of music represented in a library catalog with a taxonomy of the relationships” (d.l.s. diss., columbia university, 1994). 24. ifla, final report, 2–3. 25. ibid., 16–23. 26. sherry l. vellucci, bibliographic relationships in music catalogs (lanham, md.: scarecrow, 1997), 1. 27. ibid., 238, 251. 28. vellucci, “music metadata”; richard p. smiraglia, “musical works and information retrieval,” notes: quarterly journal of the music library association 58, no. 4 (june 2002). patrick le boeuf notes that users of music collections often use the single word “score” to indicate any one of the four frbr entities; “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco,” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 103–23 at 105–06 (new york: haworth, 2005). 29. smiraglia, “musical works and information retrieval,” 2. 30. marte brenne, “storage and retrieval of musical documents in a frbr-based library catalogue” (master’s thesis, oslo university college, 2004), 79. see also john anderies, “enhancing library catalogs for music,” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; powerpoint presentation accessed mar. 12, 2007, from http://academics.hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt; boyd, “the worst of both worlds.” 31. see the extensive bibliography compiled by ifla, cataloging division: “frbr bibliography,” http://www.ifla.org/vii/s13/wgfrbr.bibliography.htm (accessed mar. 10, 2007). 32. the first ils deployment of the worldcat local application using frbr is with the university of washington libraries: http://www.lib.washington.edu (accessed june 27, 2007). 33. innovative interfaces, inc., “millennium 2005 preview: frbr support,” inn-touch (june 2004), 9.
interestingly, the one-page advertisement for the new service chose a musical work, puccini’s opera la bohème, to illustrate how the sorting would work. innovative interfaces booth staff at the ala national conference, washington, d.c., june 24, 2007, told the author the company has moved in a different development direction now (investing more heavily in faceted browsing). 34. denmark’s det kongelige bibliotek has been the first ex libris partner library to deploy primo, http://www.kb.dk/en (accessed july 10, 2007). the vtls system has been operating since 2004 at the université catholique de louvain, http://www.bib.ucl.ac.be (accessed mar. 15, 2007). for austlit, see http://www.austlit.edu.au (accessed mar. 14, 2007). 35. rick bennett, brian f. lavoie, and edward t. o’neill, “the concept of a work in worldcat: an application of frbr,” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. work-level records allow manifestation and item records to inherit labor-intensive subject classification metadata; eric childress, “frbr and oclc research,” paper presented at the university of north carolina-chapel hill, apr. 10, 2006, http://www.oclc.org/research/presentations/childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007). 36. thomas b. hickey, edward t. o’neill, and jenny toves, “experiments with the ifla functional requirements for bibliographic records (frbr),” d-lib 8, no. 9 (sept. 2002), http://www.dlib.org/dlib/september02/hickey/09hickey.html (accessed mar. 12, 2007). 37. thomas b. hickey and jenny toves, “frbr work-set algorithm,” apr. 2005 report, http://www.oclc.org/research/projects/frbr/default.htm (accessed mar. 12, 2007); algorithm available at http://www.oclc.org/research/projects/frbr/algorithm.htm. on worldcat local, see above, note 32. 38.
merrilee proffitt, “redlightgreen: frbr between a rock and a hard place,” http://www.ala.org/ala/alcts/alctsconted/presentations/proffitt.pdf (accessed mar. 12, 2007). redlightgreen has been discontinued, and some of its technology incorporated into worldcat local. 39. http://www.austlit.edu.au (accessed mar. 14, 2007), but unfortunately a subscription database at this time, and thus unavailable for operational comparison. see marie-louise ayres, “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia,” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54, http://www.nla.gov.au/nla/staffpaper/2005/ayres1.html (accessed mar. 12, 2007). 40. ibid. 41. see david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib 11, no. 10 (oct. 2005); http://www.dlib.org/dlib/october05/crane/10crane.html (accessed mar. 12, 2007). 42. ibid. see also martha m. yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 3 (2005): 77–95, http://repositories.cdlib.org/postprints/715 (accessed mar. 12, 2007). 43. portia, “visualcat overview,” http://www.portia.dk/pubs/visualcat/present/visualcatoverview20050607.pdf (accessed mar. 14, 2007); vtls, inc., “virtua,” http://www.vtls.com/brochures/virtua.pdf (accessed mar. 14, 2007). 44. http://www.exlibrisgroup.com/primo_orig.htm (accessed july 10, 2007). 45. syed ahmed, personal communication to the author, july 10, 2007; searches run july 10, 2007, on http://www.kb.dk/en. the library’s holdings of manifestations of mozart’s singspiel opera, the magic flute, run to four different groupings on this catalog: one under the title “die zauberflöte,” one under the title “la flute enchantée: opéra fantastique en 4 actes,” and two separate groups under the title “tryllefløtjen.” 46.
“vtls announces first production use of frbr,” http://www.vtls.com/corporate/releases/2004/6.shtml (accessed mar. 14, 2007). unfortunately, though this press release indicates commitments on the part of the université catholique de louvain and vaughan public libraries (ontario, canada) to use fully frbrized catalogs, only the first is operating in this mode as of july 2007, and with only a subset of its catalog adapted. 47. virtua is not interoperable, for instance, with any of innovative’s other ils modules, which continue to dominate a number of larger academic consortia; john espley, vtls inc. director of design, personal communication to the author, mar. 15, 2007. 48. see allyson carlyle, “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays,” library resources and technical services 41, no. 2 (1997): 79–100. 49. even at the work-level, yee distinguished fully eight different places in a marc record in which the identity of a work may be located, “frbrization,” 79–80. 50. gregory leazer and richard p. smiraglia imply that cataloger-based “maps” of bibliographic relationships are inadequate; “bibliographic families in the library catalog: a qualitative analysis and grounded theory,” library resources and technical services 43, no. 4 (1999): 191–212. the cataloging failures they describe, however, are more a result of inadequacies in the current rules and practice, and do not really prove that catalogers have failed in the task of creating useful systems. 51. vinod chachra and john espley, “differentiating libraries through enriched user searching: frbr as the next dimensions in meaningful information retrieval,” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007). 52. see yee, “frbrization.” 53. http://www.bib.ucl.ac.be (accessed mar. 15, 2007). 54.
not only does the ex libris primo application need clickthroughs, it creates a new window for an extra step before presenting a new group of records. bibliography anderies, john. “enhancing library catalogs for music.” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; http://academics.hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt (accessed mar. 12, 2007). ayres, marie-louise. “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia.” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54; http://www.nla.gov.au/nla/staffpaper/2005/ayres1.html (accessed mar. 12, 2007). bennett, rick, brian f. lavoie, and edward t. o’neill. “the concept of a work in worldcat: an application of frbr.” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. boyd, alistair. “the worst of both worlds: how old rules and new interfaces hinder access to music.” caml review 33, no. 3 (nov. 2005); http://www.yorku.ca/caml/review/33-3/both_worlds.htm (accessed mar. 12, 2007). brenne, marte. “storage and retrieval of musical documents in a frbr-based library catalogue.” master’s thesis, oslo university college, 2004. carlyle, allyson. “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays.” library resources and technical services 41, no. 2 (1997): 79–100. ______. “user categorization of works: toward improved organization of online catalog displays.” journal of documentation 55, no. 2 (mar. 1999): 184–208. chachra, vinod, and john espley. “differentiating libraries through enriched user searching: frbr as the next dimensions in meaningful information retrieval.” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007). childress, eric.
“frbr and oclc research.” paper presented at the university of north carolina-chapel hill, apr. 10, 2006; http://www.oclc.org/research/presentations/childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007). hickey, thomas b., and edward o’neill. “frbrizing oclc’s worldcat.” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 239–51. new york: haworth, 2005. hickey, thomas b., and jenny toves. “frbr work-set algorithm.” apr. 2005 report; http://www.oclc.org/research/frbr (accessed mar. 12, 2007). hickey, thomas b., edward t. o’neill, and jenny toves. “experiments with the ifla functional requirements for bibliographic records (frbr).” d-lib 8, no. 9 (sept. 2002); http://www.dlib.org/dlib/september02/hickey/09hickey.html (accessed mar. 12, 2007). ifla study group on the functional requirements for bibliographic records. functional requirements for bibliographic records: final report. munich: k. g. saur, 1998. layne, sara shatford. “subject access to art images.” in introduction to art image access: issues, tools, standards, strategies, murtha baca, ed., 1–18. los angeles: getty research institute, 2002. leazer, gregory, and richard p. smiraglia. “bibliographic families in the library catalog: a qualitative analysis and grounded theory.” library resources and technical services 43, no. 4 (1999): 191–212. le boeuf, patrick. “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco.” in functional requirements for bibliographic records (frbr): hype or cure-all? patrick le boeuf, ed., 103–23. new york: haworth, 2005. markey, karen. subject access to visual resources collections: a model for computer construction of thematic catalogs. new york: greenwood, 1986. mimno, david, and gregory crane. “hierarchical catalog records: implementing a frbr catalog.” d-lib 11, no. 10 (oct.
2005); http://www.dlib.org/dlib/october05/crane/10crane.html (accessed mar. 12, 2007). o’neill, edward t. “frbr: application of the entity-relationship model to humphrey clinker.” library resources and technical services 46, no. 4 (oct. 2002): 150–59. o’neill, edward t., and diane vizine-goetz. “bibliographic relationships: implications for the function of the catalog.” in the conceptual foundations of descriptive cataloging. elaine svenonius, ed., 167–79. san diego: academic publ., 1989. proffitt, merrilee. “redlightgreen: frbr between a rock and a hard place.” paper presented at the 2004 ala annual conference, orlando, fla.; http://www.ala.org/ala/alcts/alctsconted/presentations/proffitt.pdf (accessed mar. 12, 2007). smiraglia, richard p. bibliographic control of music, 1897–2000. lanham, md.: scarecrow and music library association, 2006. ______. “content metadata: an analysis of etruscan artifacts in a museum of archaeology.” cataloging and classification quarterly 40, no. 3/4 (2005): 135–51. ______. “musical works and information retrieval.” notes: quarterly journal of the music library association 58, no. 4 (june 2002): 747–64. ______. the nature of “a work”: implications for the organization of knowledge. lanham, md.: scarecrow, 2001. ______. “uniform titles for music: an exercise in collocating works.” cataloging and classification quarterly 9, no. 3 (1989): 97–114. tillett, barbara ann. “bibliographic relationships.” in relationships in the organization of knowledge. carol a. bean and rebecca green, eds., 19–35. dordrecht: kluwer, 2001. vellucci, sherry l. bibliographic relationships in music catalogs. lanham, md.: scarecrow, 1997. ______. “music metadata and authority control in an international context.” notes: quarterly journal of the music library association 57, no. 3 (mar. 2001): 541–54. wilson, patrick. “the second objective.” in the conceptual foundations of descriptive cataloging. elaine svenonius, ed., 5–16. san diego: academic publ., 1989.
wright, h. s. “music librarianship at the turn of the century: technology.” notes: quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97. yee, martha m. “frbrization: a method for turning online public finding lists into online public catalogs.” information technology and libraries 24, no. 3 (2005): 77–95; http://repositories.cdlib.org/postprints/713 (accessed mar. 12, 2007). ______. “manifestations and near-equivalents: theory with special attention to moving-image materials.” library resources and technical services 38, no. 3 (1994): 227–55. zager, daniel. “collection development and management.” notes: quarterly journal of the music library association 56, no. 3 (2000): 567–73. information technology and libraries | march 2008. appendix: examples of a frbrized tree display. a search on also sprach zarathustra on the online public access catalog of the université catholique de louvain (a vtls opac), with results frbrized. selecting the first work yields a display which, when frbrized, shows a list of expressions; any part of the tree may be expanded to display manifestations, and item-level records follow. evaluating the impact of the long-s upon 18th-century encyclopedia britannica automatic subject metadata generation results. sam grabus. information technology and libraries | september 2020. https://doi.org/10.6017/ital.v39i3.12235. sam grabus (smg383@drexel.edu) is an information science phd candidate at drexel university’s college of computing and informatics, and a research assistant at drexel’s metadata research center. this article is the 2020 winner of the lita/ex libris student writing award. © 2020.
abstract this research compares automatic subject metadata generation when the pre-1800s long-s character is corrected to a standard < s >. the test environment includes entries from the third edition of the encyclopedia britannica, and the hive automatic subject indexing tool. a comparative study of metadata generated before and after correction of the long-s demonstrated an average of 26.51 percent potentially relevant terms per entry omitted from results if the long-s is not corrected. results confirm that correcting the long-s increases the availability of terms that can be used for creating quality metadata records. a relationship is also demonstrated between shorter entries and an increase in omitted terms when the long-s is not corrected. introduction the creation of subject metadata for individual documents has long been known to support standardized resource discovery and analysis by identifying and connecting resources with similar aboutness.1 in order to address the challenges of scale, automatic or semi-automatic indexing is frequently employed for the generation of subject metadata, particularly for academic articles, where the abstract and title can be used as surrogates in place of indexing the full text. when automatically generating subject metadata for historical humanities full texts that do not have an abstract, anachronistic typographical challenges may arise. one key challenge is that presented by the historical “long-s” < ſ >. in order to account for these idiosyncrasies, there is a need to understand the impact that they have upon the automatic subject indexing output. addressing this challenge will help librarians and information professionals to determine whether or not they will need to correct the long-s when automatically generating subject metadata for full-text pre-1800s documents.
the problem of the long-s in optical character recognition (ocr) for digital manuscript images has been discussed for decades.2 many scholars have researched methods for correcting the long-s through the use of rule-based algorithms or dictionaries.3 while the problem of the long-s is well-known in the digital humanities community, automatic subject metadata generation for a large corpus of pre-1800s documents is rare, as is research about the application and evaluation of existing automatic subject metadata generation tools on 18th-century documents in real-world information environments. the impact of the long-s upon automatic subject metadata generation results for pre-1800s texts has not been extensively explored. the research presented in this paper addresses this need. the paper reports results from basic statistical analysis and visualization using the helping interdisciplinary vocabulary engineering (hive) tool automatic subject indexing results, before and after the correction of the historical long-s in the 3rd edition of the encyclopedia britannica. background work was conducted over the summer and fall of 2019, and the research presented was conducted during winter 2020. the work was motivated by current work on the “developing the data set of nineteenth-century knowledge” project, a national endowment for the humanities collaborative project between temple university’s digital scholarship center and drexel university’s metadata research center.
the grant is part of a larger project, temple university’s “19th-century knowledge project,” which is digitizing four historical editions of the encyclopedia britannica.4 the next section of this paper presents background covering the historical encyclopedia britannica data, the automatic subject metadata generation tool used for this project, a brief background of “the long-s problem,” and the distribution of encyclopedia entry lengths in the 3rd edition. the background section will be followed by research objectives and method supporting the analysis. next, the results are presented, demonstrating prevalence of terms omitted from the automatic subject metadata generation results if the long-s is not corrected to a standard small < s > character, as well as the impact of encyclopedia entry length upon these results. the results are followed by a contextual discussion, and a conclusion that highlights key findings and identifies future research. background indexing for the 19th-century knowledge project the 19th-century knowledge project, an neh-funded initiative at temple university, is fully digitizing four historical editions of the encyclopedia britannica (the 3rd, 7th, 9th, and 11th). the long-term goal of the project is to analyze the evolving conceptualization of knowledge across the 19th century.5 the 3rd edition of the encyclopedia britannica (1797) is the earliest edition being digitized for this project. the 3rd edition consists of 18 volumes, with a total of 14,579 pages, and individual entries ranging from four to over 150,000 words. for each individual entry, researchers at temple have created individual tei-xml files from the ocr output. in order to enrich accessibility and analysis across this digital collection, the knowledge project will be adding controlled vocabulary subject headings into the tei headers of each encyclopedia entry xml file. 
considering the size of this corpus, both in terms of entry length and number of entries, automatic subject metadata generation will be required for the creation of this metadata. the knowledge project will employ controlled vocabularies to replace or complement naturally extracted keywords for this process. using controlled vocabularies adheres to metadata semantic interoperability best practices, ensures representation consistency, and helps to bypass linguistic idiosyncrasies of these 18th and 19th century primary source materials.6 we selected two versions of the library of congress subject headings (lcsh) as the controlled vocabularies for this project. lcsh was selected due to its relational thesaurus structure, multidisciplinary nature, and continued prevalence in digital collections due to its expressiveness and status as the largest general indexing vocabulary.7 in addition to the headings from the 2018 edition of lcsh, headings from the 1910 lcsh are also implemented in order to provide a more multi-faceted representation, using temporally-relevant terms that may have been removed from the contemporary lcsh. the tool applied for this process is hive, a vocabulary server and automatic indexing application.8 hive allows the user to upload a digital text or url and select one or more controlled vocabularies, and it performs automatic subject indexing by mapping naturally extracted keywords to the available controlled vocabulary terms. hive was initially launched as an imls linked open vocabulary and indexing demonstration project in 2009. since that time, hive has been further developed, with the addition of more controlled vocabularies, user interface options, and the rake keyword extraction algorithm.
the rake keyword extraction algorithm has been selected for this project after a comparison of topic relevance precision scores for three keyword extraction algorithms.9 the long-s problem early in our metadata generation efforts, we discovered that the 3rd edition of the encyclopedia britannica employs the historical long-s. originating in early roman cursive script, the long-s was used in typesetting up through the 18th century, both with and without a left crossbar. by the end of the 18th century, the long-s fell out of use with printers.10 as outlined by lexicographers of the 17th and 18th centuries, the rules for using the long-s were frequently vague, complicated, inconsistent over time, and varied according to language (english, french, spanish, or italian).11 these rules specified where in a word the long-s should be used instead of a short < s >, whether it is capitalized, where it may be used in proximity to apostrophes, hyphens, and the letters < f >, < b >, < h >, and < k >; and whether it is used as part of a compound word or abbreviation.12 this is further complicated by the inclusion of the half-crossbar, which occasionally results in two consequences: (a) the long-s may be interpreted by ocr as an < f >, and (b) an < f > may be interpreted by ocr as a long-s. figure 1 shows an example from the 3rd edition entry on russia, in which the original text specifies “of” (line 1 in top figure), yet the ocr output has interpreted the character as a long-s. the long-s may also occasionally be interpreted by the ocr as a lowercase < l >, such as the “univerlity of dublin” in the 3rd edition entry on robinson (the most rev sir richard). these complications and inconsistencies are challenges when developing python rules for correcting the long-s in an automated way, and even preexisting scripts will need to be adapted for individual use with a particular corpus. figure 1.
example from the 3rd edition entry on russia, comparing the original use of a letter < f > in “of” to the ocr output of the same passage, which mistakenly interprets the character as a long-s. despite the transition away from the long-s towards the end of the 18th century, the 3rd edition of the encyclopedia britannica (published in 1797) implements the long-s throughout, with approximately 100,594 instances of the long-s in the ocr output. when performing metadata generation with the hive tool on the ocr output for an entry, the long-s is most often interpreted by the automatic metadata generation tool as an < f >, which can result in (a) inaccurate keyword extraction (e.g., russians → ruffians), and (b) essential topics becoming unidentifiable when extracted keywords are mapped to controlled vocabulary terms; hive will subsequently omit them from the results because they cannot be mapped. figure 2 provides a truncated view of long-s words in the 3rd edition entry on rum, which are subsequently removed from the pool of automatically extracted keywords when performing the automatic subject indexing sequence in hive. using keyword extraction algorithms that are largely dependent upon term frequencies, automatic subject indexing for an entry on rum may be substantially hindered when meaningful and frequently occurring words such as sugar and yeast are removed. figure 2. examples of the long-s in the 3rd edition encyclopedia britannica entry on rum. using this example entry, the automatic subject indexing results were compared using python to determine which terms only appear when the long-s has been corrected to the standard < s >. the comparison showed that 16 total terms no longer appeared in the results when the long-s was not corrected to a standard < s >: ten terms using the 2018 lcsh, and six terms using the 1910 lcsh.
these omitted results included the terms sugar and yeast. the next section will discuss the encyclopedia entry word count for this corpus, and the possible impact that this may have upon automatic subject indexing between corrected and uncorrected long-s instances. encyclopedia entry lengths consistent with other encyclopedia britannica editions in the 18th and 19th centuries, the encyclopedia entries in the 3rd edition vary substantially in length. a convenience sample of 3,849 3rd edition entries ranging in length from 2 to 202,848 words demonstrated an arithmetic mean of 826.60 words and a median word count of 71. as shown in figure 3, this indicates a significant skew towards shorter entry lengths. for the vast majority of encyclopedia entries in this corpus, a low total word count may amplify the impact of the long-s upon automatic subject indexing results, given the importance of term availability and frequency for keyword extraction algorithms. figure 3. scatterplot of word count for a convenience sample of 3,849 3rd edition encyclopedia britannica entries. large-scale metadata generation requires time, labor, and resources, and it becomes more costly when accounting for the complications of correcting the long-s for a particular corpus. library and information professionals working with digital humanities resources will need to understand the impact of correcting or not correcting the long-s in the corpus before designating resources and developing a protocol for generating the automatic or semi-automatic metadata for full-text resources. this includes understanding whether or not the length of each individual document will affect the degree of long-s impact upon the results. this challenge, and the issues reviewed above, are addressed in the research presented below.
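the kind of rule-based python correction discussed above can be sketched in a few lines. this is a hypothetical simplification, not the project's actual script: the function names and the tiny known-word lexicon are invented for illustration, and a real corpus would need the fuller disambiguation rules described earlier.

```python
# Hypothetical sketch of rule-based long-s cleanup; function names and
# the KNOWN_WORDS lexicon are illustrative, not the project's code.

# A literal unicode long-s (U+017F) maps directly to "s".
def normalize_long_s(text: str) -> str:
    return text.replace("\u017f", "s")

# OCR often emits "f" where the page had a long-s. One simple
# heuristic: if swapping a single "f" for "s" turns an unknown token
# into a known word, prefer the known word.
KNOWN_WORDS = {"sugar", "yeast", "russia", "of", "university"}

def repair_f_for_s(token: str, lexicon: set = KNOWN_WORDS) -> str:
    if token in lexicon:
        return token  # already a known word; leave it alone
    for i, ch in enumerate(token):
        if ch == "f":
            candidate = token[:i] + "s" + token[i + 1:]
            if candidate in lexicon:
                return candidate
    return token  # no safe single-swap correction found
```

with this sketch, `normalize_long_s("ſugar")` yields `"sugar"` and `repair_f_for_s("yeaft")` yields `"yeast"`; the reverse confusion shown in figure 1 (a true "f" read as a long-s) would need the opposite swap plus context, which is why even preexisting scripts must be adapted per corpus.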
objectives the overriding goal of this work is to determine the prevalence of omitted terms in automatic subject indexing results when the long-s is not corrected in the 3rd edition entries of the encyclopedia britannica. research questions: 1. what is the average number of terms that are omitted from automatic subject indexing results when the long-s is not corrected to a standard < s >? 2. how does the encyclopedia entry length affect the number of terms that are omitted when the long-s is not corrected to a standard < s >? this analysis will approach these goals by performing a comparative analysis of automatic subject indexing results to determine the number of terms that are omitted from the results when the long-s is not corrected to a standard letter < s >. basic descriptive statistics are generated to determine central tendency. the quantity of terms omitted is then compared with encyclopedia entry word counts. these objectives were shaped by collaboration between drexel university’s metadata research center and temple university’s digital scholarship center. the next section of this paper will report on methods and steps taken to address these objectives. methods we approached this research by performing a comparative analysis of subject metadata generated both before and after the correction of the historical long-s in the 3rd edition of the encyclopedia britannica. the hive tool was used to automatically generate the subject metadata. descriptive statistics were applied, and visualizations produced from the results were also examined to identify trends. figure 4. the 30 encyclopedia britannica 3rd edition entries randomly selected for this study, sorted in ascending order by their word counts. the protocol for performing this research involved the following steps: 1. compile a sample for testing: 1.1.
a random sample of 30 encyclopedia entries was identified from a convenience sample of entries comprising the letter s volumes of the 3rd edition. the entries range in length from 6 to 6,114 words, and the median word count for entries in this sample is 99 words.
1.2. the sample of terms selected for this study and their respective word counts are visualized in figure 4.
1.3. for each entry, the long-s terms in the original xml file were extracted to a list.
2. perform an automatic subject indexing sequence upon the entries to generate lists of terms:
2.1. using the 2018 and 1910 versions of the lcsh.
2.2. with the fixed maximum of subject heading results set to 40: 20 maximum terms returned with the 2018 lcsh, and 20 maximum terms returned with the 1910 lcsh.
2.3. before long-s correction and after long-s correction, using the oxygen xml editor tei-to-txt transformation.
3. perform an outer join on python data frames between the terms generated when the long-s has been corrected and the terms generated when it has not. the resulting left outer join list displays terms that are omitted from the automatic indexing results if the long-s is not corrected to a standard small < s >. the quantity of terms omitted is recorded for comparison.
4. analysis: descriptive statistics were generated to determine central tendency for the number and percentage of words omitted when the long-s is not corrected. the quantity of terms omitted is also visualized in a continuous scatterplot against the corresponding word counts, to demonstrate that the quantity of terms omitted when the long-s is not corrected appears to relate to the length of the document being automatically classified.
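step 3 above, the outer join between corrected and uncorrected term lists, can be sketched with pandas. the term lists below are illustrative stand-ins, not the study's actual hive output:

```python
# sketch of step 3: identify subject terms omitted when the long-s is
# not corrected, via an outer join on two term lists (illustrative data).
import pandas as pd

corrected = pd.DataFrame({"term": ["sugar", "yeast", "salt", "soil"]})
uncorrected = pd.DataFrame({"term": ["salt", "soil"]})

# indicator=True adds a "_merge" column; rows marked "left_only" are
# terms returned only when the long-s was corrected, i.e., terms
# omitted from the uncorrected results
joined = corrected.merge(uncorrected, on="term", how="outer", indicator=True)
omitted = joined.loc[joined["_merge"] == "left_only", "term"].tolist()

print(omitted)        # terms lost without long-s correction
print(len(omitted))   # the per-entry count recorded for comparison
```

the `len(omitted)` count is the per-entry quantity that the analysis step then summarizes with descriptive statistics.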
results

the results report the prevalence of omitted terms when the long-s is not corrected to a standard < s >, as well as a visualization of the number of terms omitted as it relates to encyclopedia entry length. for each of the 30 sample entries automatically indexed with hive, a fixed maximum of 40 terms was returned: a maximum of 20 terms using the 2018 lcsh, and a maximum of 20 terms using the 1910 lcsh. as seen in table 1, central tendency is measured using the arithmetic mean and median, along with the standard deviation and range. the average number of terms omitted from an entry’s results is 6.73, and the average percentage of terms omitted from an entry’s results is 26.51 percent, with the 2018 and 1910 editions of the lcsh performing at similar rates. the full results are displayed in appendix a.

table 1. measures of centrality, standard deviation, range, and percentage for the quantity of terms omitted when the long-s is not corrected to a standard < s >, rounded to the hundredth. for each entry, a maximum of 40 terms were returned: 20 using the 2018 lcsh and 20 using the 1910 lcsh. the total results returned vary according to entry length; these totals are reported in appendix b. (n = 30 entries.)

for each entry in the sample, the results in appendix a display the total words omitted when the long-s is not corrected, the number of 2018 lcsh terms omitted, the number of 1910 lcsh terms omitted, and the encyclopedia entry word count. figure 5 visualizes the total number of terms omitted for each entry when the long-s is not corrected, demonstrating an increase in terms omitted for entries with lower word counts. these results are broken down by vocabulary in figure 6, demonstrating that both vocabularies used to generate these results indicate a significant increase in omitted terms for shorter entries.
measure | both vocabularies | 2018 lcsh | 1910 lcsh
average, terms omitted | 6.73 | 3.67 | 3.07
median, terms omitted | 5 | 3 | 2
standard deviation | 6.53 | 3.84 | 3.17
range, terms omitted | 0-24 | 0-13 | 0-11
average percentage, omitted terms | 26.51% | 27.51% | 24.28%
median percentage, omitted terms | 22.36% | 20.00% | 19.09%

figure 5. number of automatic subject indexing terms omitted when the long-s is not corrected to a standard < s >, compared by encyclopedia entry word count.

figure 6. number of automatic subject indexing terms omitted when the long-s is not corrected to a standard < s >, compared by encyclopedia entry word count and separated by controlled vocabulary version.

discussion

the analysis above presents measures of centrality for the quantity of terms omitted if the long-s is not corrected to a standard < s > prior to automatic subject indexing using hive, as well as a visualization representing the relationship between encyclopedia entry word count and the number of terms omitted. although researchers have identified challenges with the long-s and have focused a great deal on the technologies and methods used to correct it, there is still limited work examining the results of not correcting the long-s character when performing an automatic subject indexing sequence. this research demonstrated an average of 6.73 potentially relevant terms omitted from automatic indexing results when the long-s is not corrected, accounting for an average of 26.51 percent of the total results, with an approximately equal distribution of omitted terms across the two controlled vocabulary versions used.
when the quantity of terms omitted is visualized using a continuous scatterplot, the results also demonstrate a significant increase in omitted terms for shorter entries, with longer entries less affected. these results reflect the impact of term frequency and total word count in keyword extraction and automatic subject indexing, with longer documents having a greater pool of total terms from which to identify key terms.

considering the complexities and similarities of the typographical characters in the original manuscript, the ocr process for this corpus occasionally confuses the letters < s >, < f >, < r >, and < l >. as a result, an occasional long-s word in this study did not originally contain an < s > (e.g., sor instead of for). correction of these long-s ocr errors requires the development of a dictionary-based script. an additional complication of this research is that the corrected ocr output for the encyclopedia entries still contains a few errors not related to the long-s, which prevent the mapping of a term to any controlled vocabulary term (e.g., in the entry on sepulchre, the ocr output for the term palestine was palestinc).

these results are specific to this particular corpus of 3rd edition encyclopedia britannica entries, but it is very likely that testing another set of pre-1800s documents containing the long-s would also illustrate that, for best results with any algorithm or tool, the long-s needs to be corrected. the results are also specific to the two versions of the lcsh used, the 1910 lcsh and the 2018 lcsh, which are available in the hive tool. the 1910 version is key for the time period being studied, and the more contemporary 2018 version has supported additional analysis of the impact of the long-s. both of these vocabularies are important to the larger 19th-century knowledge project.
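the dictionary-based correction mentioned above can be sketched as follows. this is not the project's actual script; the wordlist and the single f-to-s substitution rule are illustrative assumptions (a real script would also handle the r and l confusions the article notes):

```python
# minimal sketch of dictionary-based long-s correction: when ocr has
# rendered a long-s as "f" (e.g., "fugar" for "sugar"), try substituting
# "s" for each "f" and keep the first variant found in a known wordlist.
# WORDLIST is a tiny illustrative stand-in for a real dictionary.
WORDLIST = {"sugar", "yeast", "same", "fame", "for", "sat"}

def correct_long_s(token: str) -> str:
    if token in WORDLIST:
        return token  # already a valid word; leave it alone
    # try replacing each "f" with "s", one position at a time
    for i, ch in enumerate(token):
        if ch == "f":
            candidate = token[:i] + "s" + token[i + 1:]
            if candidate in WORDLIST:
                return candidate
    return token  # no dictionary-backed correction found

print(correct_long_s("fugar"))  # corrected to a dictionary word
print(correct_long_s("fame"))   # valid word, left untouched
```

checking the token against the dictionary first is what prevents the script from mangling genuine f-words such as "fame," the core difficulty the article describes.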
it should be noted that while the lcsh is updated weekly, we were limited to what is available via the hive tool; any discrepancies that may be found with the 2020 lcsh will very likely have a minimal effect upon metadata generation results. the 2020 lcsh will be incorporated into hive soon and can be explored in future research.

conclusion and next steps

the objective of this research was to determine the impact of correcting the long-s in pre-1800s documents when performing an automatic metadata generation sequence using keyword extraction and controlled vocabulary mapping. this was accomplished by performing an automatic subject indexing sequence using the hive tool, followed by a basic statistical analysis to determine the quantity of terms omitted from the results when the long-s is not corrected to a standard < s >. the number of omitted terms was also compared with encyclopedia entry word counts and visualized to demonstrate a significant increase in omitted terms for shorter encyclopedia entries. the study was conclusive in confirming that the correction of the long-s is a critical part of our workflow. the significance of this research is that it demonstrates the necessity of correcting the long-s prior to performing automatic subject indexing on historical documents. beyond the correction of the long-s, the larger next steps for this project are to continue to explore automatic metadata generation for this corpus. these next steps include the comparison of results using contemporary vs. historical vocabularies, streamlining a protocol for bulk classification procedures, and integrating terms into the tei-xml headers.
the research presented here can inform other digital humanities and even science-oriented projects, where researchers may not be aware of the impact of the long-s on automatic metadata generation not only for subjects but also for named entities, particularly when automatic approaches with controlled vocabularies are desired.

acknowledgements

the author thanks dr. jane greenberg and dr. peter logan for their guidance. the author acknowledges the support of neh grant #haa-261228-18.

appendix a

entry term | total words omitted | 2018 lcsh terms omitted | 1910 lcsh terms omitted | encyclopedia entry word count
sardis | 24 | 13 | 11 | 381
suction | 24 | 13 | 11 | 38
stylites, pillar saints | 19 | 13 | 6 | 199
shadwell | 14 | 10 | 4 | 211
salicornia | 13 | 6 | 7 | 254
sepulchre | 11 | 3 | 8 | 348
sitta nuthatch | 9 | 5 | 4 | 620
sprat | 9 | 3 | 6 | 475
serapis | 8 | 5 | 3 | 587
strada | 8 | 1 | 7 | 189
shoad | 7 | 4 | 3 | 463
sign | 7 | 5 | 2 | 68
shooting | 6 | 3 | 3 | 6114
strata | 6 | 3 | 3 | 2920
stewartia | 5 | 4 | 1 | 72
subclavian | 5 | 3 | 2 | 20
schweinfurt | 4 | 2 | 2 | 84
scroll | 4 | 2 | 2 | 45
spalatro | 4 | 3 | 1 | 99
special | 4 | 3 | 1 | 24
samogitia | 3 | 2 | 1 | 112
shakespeare | 3 | 0 | 3 | 3855
sinapism | 2 | 1 | 1 | 25
sect | 1 | 1 | 0 | 20
severino | 1 | 1 | 0 | 38
shaddock | 1 | 1 | 0 | 6
scarlet | 0 | 0 | 0 | 65
shallop, shalloop | 0 | 0 | 0 | 42
soldanella | 0 | 0 | 0 | 56
spoletto | 0 | 0 | 0 | 99

appendix b

(n = 30 entries)

| average terms returned | median terms returned
corrected | 24.77 / 40 possible | 28 / 40 possible
uncorrected | 26.47 / 40 possible | 29 / 40 possible
2018 lcsh corrected | 14.10 / 20 possible | 19 / 20 possible
2018 lcsh uncorrected | 13.47 / 20 possible | 18.5 / 20 possible
1910 lcsh corrected | 11.27 / 20 possible | 11 / 20 possible
1910 lcsh uncorrected | 10.13 / 20 possible | 9 / 20 possible

editorial board thoughts: a&i databases: the next frontier to discover
mark dehmlow
information technology and libraries | march 2015

i think it is fair to say that the discovery technology space is a relatively mature market segment, not complete, but mature.
much of the easy-to-negotiate content has been negotiated, and many of the systems on the market are above or approaching a billion records. this may seem like a lot, but there is a whole slice of tremendously valuable content still not fully available across all platforms, namely specialized subject abstracting and indexing (a&i) database content. this content has significant value for the discovery community: many of those databases go further back than content pulled from journal publishers or full-text databases. equally important, they represent a portion of humanities and social sciences content that is less represented in discovery systems as compared to stem content.

for vendors of a&i content, the concerns are clear and realistic. unlike journal publishers, whose metadata is meant to direct users to their main content (full text), for a&i publishers the metadata is the main content. according to a recent nfais report, a major concern is that if they include their content in discovery systems, they “risk loss of brand awareness,” and the implication is that institutions will be more likely to cancel those subscriptions.1 the focus therefore seems to have been on how to optimize the visibility of their content in discovery systems before being willing to share it. in addition to the nfais report, some of the conversations i have seen on the topic focus on wanting discovery system providers to meet a more complex set of requirements that will maximize leveraging the rich metadata contained in those resources, the idea being that utilizing that metadata in specific ways will increase the visibility of the content.
in principle, i think it is a commendable goal to maximize the value of the comprehensive metadata a&i records contain, and the complexities of including a&i data in discovery systems need to be carefully considered, namely blending multiple subject and authority vocabularies and ensuring that metadata records are appropriately balanced with full text in the relevancy algorithm. but i also worry that setting too many requirements that are too complicated will lead to delayed access and biased search results. it is important that this content is blended in a meaningful way, but determining relevancy is a complex endeavor, and it is critically important for relevancy to be unbiased from the content provider perspective and to focus instead on the user, their query, and the context of their search.

another concern i have heard articulated is that results in discovery services are unlikely to be as good as native a&i systems because of the already mentioned blending issues. this is likely to be true, but i think it is critical to focus on the purpose of discovery systems. as donald hawkins recently wrote in a summary of a workshop called “information discovery and the future of abstracting and indexing services,” “a&i services provide precision discipline-specific searching for expert researchers, and discovery services provide quick access to full text.”2 hawkins indicates that discovery systems are not meant to be sophisticated search tools, but rather a quick means to search a broad range of scholarly resources, and, i think, sometimes a quick starting point for researchers.

(mark dehmlow (mark.dehmlow@nd.edu), a member of the ital editorial board, is program director, library information technology, university of notre dame, south bend, in.)
because of the nature of merging billions of scholarly records into a single system, discovery systems will never be able to provide the same experience as a native a&i system, nor should they. over time, they may become better tuned to provide a better overall experience for the three different types of searchers we have in higher education: novice users like undergraduates looking for a quick resource, advanced users like graduate students and faculty looking for more comprehensive topical coverage, and expert users like librarians who want sophisticated search features to home in on the perfect few resources. many of the discovery systems are working on building these features, but the industry will take time to solve this problem, and i tend to look at things through the lens of our end users: non-inclusion of this content directly impacts their overall discovery experience.

one might ask, if the discovery system experience isn’t as precise and complete as the native a&i experience, why bother? in addition to broadening the subject scope by including much of the narrower and deeper subject metadata, there is also the importance of serendipitous finding. that content, in the context of a quick user search, may drive the user to just the right thing that they need. in addition, my belief is that with that content we can build search systems that are deeper than google scholar, and by extension provide our end users with a superior search experience. and so i advocate for innovating now instead of waiting to work out all of the details. i am not suggesting moving forward callously, but swiftly. the work that niso has done on the open discovery initiative has resulted in some good recommendations about how to proceed.
for example, they have suggested two usage metrics that could be valuable for measuring a&i content use in discovery systems: search counts (by collection and customer for a&i databases) and result clicks (the number of times an end user clicks on a content provider’s content in a set of results).3 while these types of metrics are aligned with the kinds of measures by which libraries evaluate a&i database usage, i think they don’t really say much about the overall value of the resources themselves. sometimes in the library profession, our obsession with counting stuff loses connection with collecting metrics that actually say something about impact. of the two counts, i could see the result clicks as having more value: knowing that a user found something of interest from a specific resource at the very least indicates that it led the user some place. the measure of search counts by collection is less useful. at best it indicates that the resource was searched, but it tells us nothing about who was searching for an item, what they found, or what they subsequently did with the item once they found it.

i do think we in libraries need to consider the bigger picture. regardless of the number of searches (which doesn’t really tell us anything anyway), we need to recognize the value alone of including the a&i content, and instead of trying to determine the value of the resource by the number of times it was searched, focus more on the breadth of exposure that content is getting by inclusion in the discovery system. i think a more useful technical requirement for discovery providers would be to provide pathways to specific a&i resources within the context of a user’s search, not dissimilar to how google places sponsored content at the top of its search results: a kind of promotional widget.
in this case, using metadata returned from the query, the systems could calculate which one or two specific resources would guide the user to more in-depth research. by virtue of a resource's inclusion in the discovery system, it could become part of the promotional widget. this would guide users back to the native a&i resource, which both libraries and a&i providers want, and it would do so in a more intuitive and meaningful way for the end user.

all of the parties involved in the discovery discussion can bring something to the table if we want to solve these issues in a timely way. i hope that a&i publishers and discovery system providers make haste and get agreements underway for content sharing, and i would recommend that, instead of focusing on requiring finished implementations based on complex requirements before loading content, both should focus on some achievable short- and long-term goals. integrating a&i content perfectly will take some time to complete, and the longer we wait, the longer our users have a suboptimal discovery experience. discovery providers need to make long-term commitments to developing mechanisms that satisfy usage metrics for a&i content, although i would recommend defining measures that have true value. a&i providers should be measured in their demands: while their stake in system integration is real, there is a risk of content providers vying for their content to be preferred, when relevancy neutrality is paramount for a discovery system to be effective. i think it is worth lauding the efforts of a few trailblazing a&i publishers, such as thomson reuters and proquest, who have made agreements with some of the discovery providers and are already sharing their a&i content, providing some precedent for sharing a&i content.
lastly, libraries and knowledge workers need to develop better means of calculating overall resource value, moving beyond strict counts to ways of determining the overall scholarly and pedagogical impact of those resources, and they need to let the fact alone that an a&i publisher shares its data with a discovery provider indicate significant value for the resource.

references

1. nfais, recommended practices: discovery systems (nfais, 2013), https://nfais.memberclicks.net/assets/docs/bestpractices/recommended_practices_final_aug_2013.pdf.

2. donald t. hawkins, “information discovery and the future of abstracting and indexing services: an nfais workshop,” against the grain, 2013, http://www.against-the-grain.com/2013/08/information-discovery-and-the-future-of-abstracting-and-indexing-services-an-nfais-workshop/.

3. open discovery initiative working group, open discovery initiative: promoting transparency in discovery (baltimore: niso, 2014), http://www.niso.org/apps/group_public/download.php/13388/rp-19-2014_odi.pdf.

editor’s comments
bob gerrity
information technology and libraries | september 2013

this month’s issue

in this month’s issue, we welcome back the president’s message column, with incoming lita president cindi trainor describing upcoming lita events, priorities, and opportunities for members. university of denver mlis candidate gina schlesselman-tarango contributes a compelling piece describing the background, use, and potential library application of searchable signatures in web 2.0 applications such as instagram. jenny emanuel from the university of illinois reports on the complex relationship that millennial academic librarians have with technology. kristina l. southwell and jacquelyn slater from the university of oklahoma present the findings of a study evaluating the accessibility of special collections finding aids to screen readers for visually impaired users.
ping fu from central washington university and moira fitzgerald from yale university look at the potential effects of cloud-based next-generation library services platforms on staffing models for systems and technical-services departments. visiting the discovery side of library services, megan johnson from appalachian state university reports on usability testing of appalachian’s “one box” integrated articles and catalog search, using innovative interfaces’ encore discovery service.

speaking of usability, i had the chance recently to observe a usability testing session for my library’s website, and was reminded of the importance of designing library websites and delivering web-based library services that will actually be of value to our users, delivered with their context in mind rather than ours. my library, like many others, has a website rich in content and complexity and organized around our structure. to the user i was observing, the complexity and library-centric organization clearly were obstacles to the rich content we offer. an undergraduate art history major, she was primarily interested in library resources and services that were directly connected to her coursework and that were accessible from the university’s learning management system (lms). she valued the convenience of direct access from the lms to library-managed course readings and past exam papers. but, when asked to navigate to the same resources using the library homepage as a starting point rather than the lms, she quickly became frustrated and confused by the overload of search options with (to her) confusing labels. she was further stymied by our proclivity to make things more complex than they need to be (or should be). a simple example: a common occurrence at the beginning of semester is that students with outstanding library fines/fees are blocked from registering for classes.
rather than providing a simple, direct “resolve my library fees” link, with clear instructions on how to fix the problem as quickly as possible, we instead provide pages of information about how and why the fines/fees were calculated, with no link to a solution to the problem at hand. my takeaways from the session were that (1) our website needs to be radically simplified and (2) we should be focusing on designing and delivering services that can be embedded in the context of the user’s natural workflows, not the library’s. easier said than done, of course.

reviewers needed

the ital editorial board has room for a couple of additional members, to help us keep up with incoming article submissions. if you have a passion for library technology, a willingness to undertake a few reviews each year, and are a member of lita (or willing to join), please send me an e-mail indicating your interest and area(s) of expertise. as always, suggestions and feedback on ital are welcome, at the e-mail address above.

(bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia.)

technical communications

isad/solinet to sponsor institute

“networks and networking ii: the present and potential” is the theme of an isad institute to be held at the braniff place hotel on february 27-28, 1975, in new orleans. the sponsors are the information science and automation division of ala and the southeastern library network (solinet). this second institute on networking will be an extension of the previous one held in new orleans a year ago; the ground covered in that previous institute will be the point of departure for “networks ii.” the purpose of the previous institute was to review the options available in networking, to provide a framework for identifying problems, and to suggest evaluation strategies to aid in choosing alternative systems.
while the topics covered in the previous institute will be briefly reviewed in this one, some speakers will take different approaches to the subject of networking, while other speakers will discuss totally new aspects. in addition to the papers given and the resultant questions and answers from the floor, a period of round table discussions will be held during which the speakers can be questioned on a person-to-person basis. a new feature of isad institutes now being planned will be the presence of vendors’ exhibits. arrangements are being made with the many vendors and manufacturers whose services are applicable to networking to exhibit their products and systems. it is hoped that many of them will be interested in responding to this opportunity.

the program will include:

“a systems approach to selection of alternatives”: resource sharing, components, communications options, planning strategy. joseph a. rosenthal, university of california, berkeley.

“state of the nation”: review of current developments and an evaluation. brett butler, butler associates.

“the library of congress, marc, and future developments.” henriette d. avram, library of congress.

“data bases, standards and data conversions”: existing data bases, characteristics, standardization, problems. john f. knapp, richard abel & co.

“user products”: possibilities for product creation, the role of user products. maurice freedman, new york public library.

“on-line technology”: hardware and software considerations, library requirements, standards, cost considerations of alternatives. philip long, state university of new york, albany.

“publishers’ view of networks”: copyright, effect on publishers, effect on authorship, impact on jobbers, facsimile transmission. carol nemeyer, association of american publishers.

“national library of canada”: current and anticipated developments, cooperative plans in canada, international cooperation. rodney duchesne, national library of canada.
“administrative, legal, financial, organizational and political considerations”: actual and potential problems, organizational options, financial commitment, governance. fred kilgour, oclc.

registration will be $75.00 for members of ala and staff members of solinet institutions, $90.00 for nonmembers, and $10.00 for library school students. for hotel reservation information and registration blanks, contact donald p. hammer, isad, american library association, 50 e. huron st., chicago, il 60611; 312-944-6780.

316 journal of library automation vol. 7/4 december 1974

regional projects and activities

indiana cooperative library services authority

the first official meeting of the board of directors of the indiana cooperative library services authority (incolsa) was held june 4, 1974, at the indiana state library in indianapolis. a direct outgrowth of the cooperative bibliographic center for indiana libraries (cobicil) feasibility study project sponsored by the indiana state library and directed by mrs. barbara evans markuson, incolsa has been organized as an independent not-for-profit organization “to encourage the development and improvement of all types of library service.” to date, contracts have been signed by sixty-one public, thirteen academic, fourteen school, and five special libraries, a total of ninety-three libraries. incolsa is being funded initially by a three-year establishment grant from the u.s. office of education, library services and construction act (lsca) title i funds. officers are: president, harold baker, head of library systems development, indiana state university; vice-president, dr. michael buckland, assistant director for technical services, purdue university libraries; secretary, mary hartzler, head of catalog division, indiana state library; treasurer, mary bishop, director of the crawfordsville book processing center; three directors-at-large: phil hamilton, director of the kokomo public library; edward a.
howard, director of the evansville-vanderburgh county public library; and sena kautz, director of media services, duneland school corporation. stanford's ballots on-line files publicly available through spires september 16, 1974 the stanford university libraries automated technical processing system, ballots (bibliographic automation of large library operations using a timesharing system), has been in operation for twenty-two months and supports the acquisition and cataloging of nearly 90 percent of all materials processed. important components of the ballots operations are several on-line files accessible through an unusually powerful set of indexes. currently available are: a file of library of congress marc data starting from january 1, 1972 (with a gap from may to august 1972); an in-process file of individual items being purchased by stanford; an on-line catalog (the catalog data file) of all items cataloged through the system, whether copy was derived from library of congress marc data, was input from non-marc cataloging copy, or resulted from stanford's own original cataloging efforts; and a file of see, see also, and explanatory references (the reference file) to the catalog data file. in addition, during september and october 1974, the 85,000 bibliographic and holdings records (already in machine-readable form on magnetic tape) representing the entire j. henry meyer memorial undergraduate library were converted to on-line meyer catalog data and meyer reference files in ballots. these files are publicly available through spires (stanford public information retrieval system) to any person with a terminal that can dial up the stanford center for information processing's academic computer services computer (an ibm 360 model 67) and who has a valid computer account.
the marc file can be searched through the following index points: lc card number, personal name, corporate/conference name, and title. the in-process, catalog data, and reference files for stanford and for meyer can also be searched as spires public subfiles through the following index points: ballots unique record identification number, personal name, corporate/conference name, title, subject heading (catalog data and reference file records only), call number (catalog data and reference file records only), and lc card number. the title and corporate/conference name indexes are word indexes; this means that each word is indexed individually. search requests may draw on more than one index at a time by using the logical operators "and," "or," and "and not" to combine index values sought. if you plan to use spires to search these files, or if you would like more information, a publication called guide to ballots files may be ordered by writing to: editor, library computing services, s.c.i.p.-willow, stanford university, stanford, ca 94305. this document contains complete information about the ballots files and data elements, how to open an account number, and how to use spires to search ballots files. a list of ballots publications and prices is also available on request. as additional libraries create on-line files using ballots in a network environment, these files will also be available. these additions will be announced in jola technical communications. data base news interchange of aip and ei data bases a national science foundation grant (gn-42062) for $128,700 has been awarded to the american institute of physics (aip), in cooperation with engineering index (ei), for a project entitled "interchange of data bases." the grant became effective on may 1, 1974, for a period of fifteen months. the project is intended to develop methods by which ei and aip can reduce their input costs by eliminating duplication of intellectual effort and processing.
through sharing of the resources of the two organizations and an interchange of their respective data bases, aip and ei expect to improve the utilization of these computer-readable data bases. the basic requirement for the development of the interchange capability for computer-readable data bases is the establishment of a compatible set of data elements. each organization has unique data elements in its data base. it will therefore be necessary to determine which of the data elements are absolutely essential to each organization's services, which elements can be modified, and what other elements must be added. after the list of data elements has been established, it will be possible to write the specifications and programs for format conversions from aip to ei tape format and vice versa. simultaneously, there will be the development of language conversion facilities between ei's indexing vocabulary and aip's physics and astronomy classification scheme (pacs). it is also planned to investigate the possibility of establishing a computer program which can convert aip's indexing to ei's terms and vice versa. with the accomplishment of the above tasks, it will be possible to create new services and repackage existing services to satisfy the information demands in areas of mutual interest to engineers and physicists, such as acoustics and optics. eric data base users conference the educational resource information center (eric) held an eric data base users conference in conjunction with the 37th annual meeting of the american society for information science (asis) in atlanta, georgia, october 13-17, 1974. the eric data base users conference provided a forum for present and potential eric users to discuss common problems and concerns as well as interact with other components of the eric network: central eric, the eric processing and reference facility, eric clearinghouse personnel, and information dissemination centers.
although attendees have in the past been primarily oriented toward machine use of the eric files, all patterns of usage were represented at this conference, from manual users of printed indexes to operators of national on-line retrieval systems. a number of invited papers were presented dealing with subjects such as: • the current state and future directions of educational information dissemination. sam rosenfeld (nie), lee burchinal (nsf). • what services, systems, and data bases are available? marvin gechman (information general), harvey marron (nie). • the roles of libraries and industry, respectively, in disseminating educational information. richard de gennaro (university of pennsylvania), paul zurkowski (information industry association). several organizations (national library of canada, university of georgia, wisconsin state department of education) were invited to participate in "show and tell" sessions to describe in detail how they are using the eric system and data base. a status report covering eric on-line services for educators was presented by dr. carlos cuadra (system development corporation) and dr. roger summit (lockheed). interactive discussion groups covered a number of subjects including: • computer techniques-programming methods, use of utilities, file maintenance, search system selection, installation, and operation. • serving the end user of educational information. • introduction to the eric system-what tools, systems, and services are available and how are they used? • beginning and advanced sessions on computer searching the eric files. on-line terminals were used to demonstrate and explain use of machine capabilities. commercial services and developments scope data inc. ala train compatible terminal printers scope data inc. currently is offering a high-speed, nonimpact terminal printer for use in various interactive printing applications.
capability can be included in the series 200 printer as an extra-cost feature to print the eight-bit ascii character set for the ala character set with 176 characters. for further information contact alan g. smith, director of marketing, scope data inc., 3728 silver star rd., orlando, fl 32808. institute for scientific information puts life sciences data base on-line through system development corporation the institute for scientific information (isi) has announced that it will collaborate with system development corporation (sdc) to provide on-line, interactive, computer searches of the life sciences journal literature. scheduled to be fully operational by july 1, 1974, the isi-sdc service is called scisearch® and is designed to give quick, easy, and economical access to a large life sciences literature file. stressing ease of access, the sdc retrieval program, orbit, permits subscribers to conduct extremely rapid literature searches through two-way communications terminals located in their own facilities. after examining the preliminary results of their inquiries, searchers are able to further refine their questions to make them broader or narrower. this dialog between the searcher and the computer (located in sdc's headquarters in santa monica, california) is conducted with simple english-language statements. because this system is tied in to a nationwide communications network, most subscribers will be able to link their terminals to the computer through the equivalent of a local phone call. covering every editorial item from about 1,100 of the world's most important life sciences journals, the service will initially offer a searchable file of over 400,000 items published between april 1972 and the present. each month approximately 16,000 new items will be added until the average size of the file totals about one-half million items and represents two-and-one-half years of coverage.
to assure subscribers maximum retrieval effectiveness when dealing with this massive amount of information, the data base can be searched in several ways. included are searches by keywords, word stems, word phrases, authors, and organizations. one of the search techniques utilized-citation searching-is an exclusive feature of the isi data base. for every item retrieved through a search, subscribers can receive a complete bibliographic description that includes all authors, journal citation, full title, a language indicator, a code for the type of item (article, note, review, etc.), an isi accession number, and all the cited references contained in the retrieved article. the accession number is used to order full-text copies of relevant items through isi's original article tear sheet service (oats®). this ability to provide copies of every item in the data base distinguishes the isi service from many others. current library of congress catalog on-line for reference searches information dynamics corporation (idc) has agreed to collaborate with system development corporation (sdc) to provide reference librarians, researchers, and scholars with on-line interactive computer searches of all library materials being cataloged by the library of congress. scheduled to be fully operational as of october 1, 1974, the sdc-idc service is called sdc-idc/libcon and is designed to give quick, easy, and economical access to a large portion of the world's scholarly library materials. as in the isi service described above, the data base can be searched in several ways. included are compound logic searches by keywords, word stems, word phrases, authors, organizations, and subject headings for most english materials. one of the search techniques utilized-string searching-is an exclusive feature of sdc's orbit system.
keyword searching of cataloged items including all foreign materials processed by the library of congress is an exclusive feature of the idc data base not currently available in other on-line marc files. for individual items retrieved through a search, subscribers can receive a bibliographic description that includes authors, full title, an idc accession number, the lc classification number, and publisher information. standards the isad committee on technical standards for library automation invites your participation in the standards game editor's note: the tesla reactor ballot will be provided in forthcoming issues. to use, photocopy the ballot form, fill out, and mail to: john c. kountz, associate for library automation, office of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036. the procedure this procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives to existing, recognized standards organizations. to enter the procedure for an initiative standards proposal you must complete an "initiative standards proposal" using the outline which follows: initiative standard proposal outline-the following outline is designed to facilitate review by both the committee and the membership of initiative standards proposals and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable).
note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced on 8½" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number. i. title of initiative standard proposal (title). ii. initiator information (foreword). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process. iv. purpose. state the purpose of the standard proposal (scope and qualifications). v. description. briefly describe the standard proposal (specification of the standard). vi. relationship to other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks). vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification). viii. specifications. (optional) specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (specifications of the standard). kindly note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review.
in addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard. tesla reactor ballot. identification number for standing requirement: ___. reactor information: name ___, title ___, organization ___, address ___, city ___, state ___, zip ___, telephone: area code ___, number ___, ext. ___. need (for this standard): for [ ] against [ ]. specification (as presented in this requirement): for [ ] against [ ]. can you participate in the development of this standard? no [ ] yes [ ]. reason for position: (use format of proposal. additional pages can be used if required.) the reactor ballot is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information which is necessary to document and communicate formal standards proposals to standards organizations outside of the american library association. as you, the members, use the outline to present your standards proposals, tesla will publish them in jola-tc and solicit membership reaction via the reactor ballot. throughout the process tesla will insure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrently with membership review. from the review and the reactor ballot tesla will prepare a "majority recommendation" and a "minority report" on each standards proposal. the majority recommendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization where it should prove a source of guidance as official votes are cast.
in addition, the status of each standards proposal will be reported by tesla in jola-tc via the standards scoreboard. the committee (tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation. input to the editor: we have been asked by the members of the ala interdivisional committee on representation in machine readable form of bibliographic information (marbi) to respond to your editorial in the june 1974 issue of the journal of library automation. this editorial dealt with the council of library resources' [sic] involvement in a wide range of projects, ranging from the sponsorship of a group which is attempting to develop a subset of marc for use in inter-library exchange of bibliographic data (cembi), to management of a project which has as its goal the creation of a national serials data base (conser), and, more recently, to the convening of a conference of library and a&i organizations to discuss the outlook for comprehensive national bibliographic control. you raised several legitimate questions: 1) has sufficient publicity been given to these activities of the council so that all, not just a few, libraries are aware of what is happening and have an opportunity to exert an influence on developments? and, 2) is the council bypassing existing channels of operation and communication? you also suggest that proposals from groups such as cembi be channeled through an official ala committee such as marbi for intensive review and evaluation. it should be pointed out that marbi is not charged with the development of standards. it acts to monitor and review proposals affecting the format and content of machine readable bibliographic data, where that data has implications for national or international use.
this applies to proposals emanating from cembi and conser as well as from other concerned groups. all indications to date are that the council is fully aware of marbi's role and will not bypass marbi. a number of members of marbi are also members of cembi, and marbi is represented on the conser project. also reassuring is the fact that, unless we allow lc to fall by the wayside in its role as the primary creator and distributor of machine readable data, any standards for format or content developed by a council-sponsored group will eventually be reflected in the marc records distributed by lc. the library of congress has issued a statement, published in the june 1974 issue of jola, to the effect that it will not implement any changes in the marc distribution system which are not acceptable to marbi. marbi and lc have worked out a procedure whereby all proposed changes to marc are submitted to marbi. they are then published in jola and distributed to members of the marc users discussion group for comments. comments are collected and evaluated by marbi and a report submitted to lc, with its recommendations. the marbi review process does not guarantee perfection and there is no assurance that everyone will be satisfied. compromise and expediency are the name of the game in this extremely complicated and uncharted area of standards for machine readable bibliographic data. however, the council has undoubtedly learned from the isbd(m) experience that it cannot make decisions which affect libraries without the greatest possible involvement of librarians. it is the feeling of the marbi committee members that the council intends to work with marbi in future projects which fall into marbi's area of concern. velma veneziano, marbi past chairperson; ruth tighe, chairperson. editor's note: it is gratifying to note that marbi's response reflects the opinions expressed in the june 1974 editorial.
the library community will doubtless be pleased to learn of clr's intention to work closely with marbi.-skm to the editor: as briefly discussed with you, your editorial in the june 1974 issue of jola is both admirable and disturbing (to me, at least). the problem of national leadership in the area of library automation is a critical problem indeed. being in the "boondocks" and far removed from the scene of action, i can only express to you my perception as events and activities filter through to me. i can remember as far back as 1957 when adi had a series of meetings in washington, d.c., trying to establish a national program for bibliographic automation. i have been through eighteen years of meetings, committees, conferences, etc. concerned with trying to develop a national plan for bibliographic automation and information storage and retrieval systems. i have worked with nsf, usoe, department of commerce, u.s. patent office, engineering and technical societies, dod agencies-the entire spectrum. i spent a good many years working in adi and asis, sla, and most recently ala. at no time were we able to make significant progress towards a national system. even the great airlie house conference did not produce any significant changes in the fragmented, competitive "non-system." it has only been in the recent past since clr has taken an aggressive posture that i am able to see the beginning of orderly development of a national automated bibliographic system. i certainly agree that any topic as critical as those being discussed by cembi should be in the public domain, but i also believe that the progress made by cembi would not have been possible without clr taking the initiative in getting these key agencies together. thank goodness someone quit talking and started doing something at the national level!
i sincerely believe that in the absence of a national library and with the current lack of legally derived authority in this arena, clr provides a genuine service to the total library community in establishing cembi. hopefully, your very excellent article (in the same issue of jola) on "standards for library automation ..." will help to put the entire issue of bibliographic record standards into perspective. as a former chemist and corrosion engineer, i am fully aware of the absolute necessity for technical standards. i am also fully aware of the necessity of developing technical standards through the process you outlined in your article. hopefully, clr action with cembi will expedite this laborious process and help to push our profession forward into the twentieth century. since we ourselves have not been able to do it through all these years, i am personally grateful that some group such as clr took the initiative and forced us to do what we should have done years ago. maryann duggan, slice office director editor's note: positive action and progressive movement are, of course, desirable and are often lacking in large organizations. however, positive action without communication of this action to the affected population can only be detrimental. on issues of the complexity of those addressed by cembi and conser, review by the library community is always useful, even though action may be temporarily delayed.-skm to the editor: on page 233 of the september issue of jola there is a report from the information industry association's micropublishing committee chairman (henry powell). he states that "... the committee spelled out several areas of concern to micropublishers which will be the subject of committee action ...." one of the concerns of the committee is that a z39 standards committee has recommended "standards covering what micropublishers can say about their products." (emphasis mine.)
as chairman of the z39 standards subcommittee which is developing the advertising standard referred to, i wish to point out that there is no intention on the part of the subcommittee to tell micropublishers what they can say nor what they may say about their products. the subcommittee, which is composed of representatives from three micropublishing concerns, two librarians, and myself, has from the beginning taken the view that the purpose of the standard would be to provide guidance for micropublishers and librarians alike. we are most anxious that no one feel that the subcommittee has any intention of attempting to use the standards mechanism to tell any micropublisher how he must design his advertisements. in addition it should be noted that no ansi standard is compulsory. carl m. spaulding, program officer, council on library resources decision-making in the selection, procurement, and implementation of alma/primo: the customer perspective jin xiu guo and gordon xu information technology and libraries | march 2023 https://doi.org/10.6017/ital.v42i1.15599 jin xiu guo (jiguo@fiu.edu) is associate dean for technical services, florida international university. gordon xu (gordon.xu@njit.edu) is associate university librarian for collections & information technology, new jersey institute of technology. © 2023. abstract this case study examines the decision-making process of library leaders and administrators in the selection, procurement, and implementation of ex libris alma/primo as their library services platform (lsp). the authors conducted a survey of libraries and library consortia in canada and the united states who have implemented or plan to implement alma.
the results show that most libraries use both a request for information (rfi) and a request for proposal (rfp) in their system selection process, but the vendor-offered training is insufficient for effective operation. one-third of the libraries surveyed are considering switching to open-source options for their next automation system. these insights can benefit libraries and library consortia in improving their technological readiness and decision-making processes. introduction with the exponential growth of digital information, libraries have been seeking innovative systems to manage electronic resources and provide collection services. the next-generation integrated library system (ils) should address both current challenges and future demands. with that in mind, new cloud-based commercial products have come into the market in recent years. ex libris alma, oclc worldshare, and innovative sierra are often referred to as library services platforms (lsps), in contrast to client-based ilss. selecting and implementing a new system from among these products is no small task. studies show that libraries might overlook the capacity of an ils to accommodate many functions and make a tough choice between sticking with the current vendor or switching to another before investing time and resources to migrate to a completely new system.1 libraries do not make these kinds of decisions in a rational manner, a process that involves clearly defining the problem, identifying and evaluating potential options, weighing the pros and cons of each option, considering an organization's values, goals, and preferences, making a choice based on a systematic analysis, and continuously reassessing and adjusting the decision as new information becomes available. as a result, a selected system might not be the best fit for a library's actual needs.2 library consortia also face a similar challenge, but in a more complex context.
for example, sharing cost, level of collaboration, and integration with other library applications can be quite different from a small library to a large research library. additionally, the requirement for security and scalability can vary among consortial members. ninety-four percent of academic libraries migrated their systems to alma in 2018 by joining a consortium.3 at a consortial level, managing a system migration project adds a significant challenge because of the competing, often conflicting desires of constituent institutions. budgeting for a migration project needs to be secured before the project takes place. the one-time migration cost has a huge impact on a library's decision on a new system. lengthy procurement processes mean that it can take a year to communicate requirements, solicit bids, and make a final decision. libraries also wonder if they should acquire such a new system through a consortial deal or on their own. a successful implementation of a new system starts with making a sound choice. the system migration project encompasses various technological and management decisions made by project managers, team leaders, and library administrators. decisions about data cleanup, migration mapping, system configuration, communication, and training can have a tremendous impact on project outcomes, staffing, existing workflows, and job functions and responsibilities. in the meantime, the project itself also provides libraries a great opportunity to improve the existing operational and staffing model and to adjust their strategy for managing technological and organizational change. there are few studies on decision-making in the alma/primo selection, procurement, and migration from the user's perspective.
alma is a cloud-based library management system that helps libraries manage, deliver, and discover digital and physical resources. it offers functionalities such as resource discovery, resource management, resource sharing, and analytics. primo ve is a next-generation library discovery platform that provides users with access to a central index of the library's collections. it offers a personalized and intuitive search experience, with features such as faceted searching, saved searches, and item recommendations. both alma and primo ve are ex libris products. this case study fills the gap and provides a better understanding of how american and canadian library leaders and administrators make decisions for their libraries and consortia. the pairing of ex libris's alma and primo products has become a widely accepted next-generation system due to its cloud-based model for managing both electronic and print resources. the findings of this study offer insights and lessons learned to help library leaders and administrators make better decisions on future technological change. literature review the growing user demand for electronic resources over the last decade has led libraries to make a rapid digital transformation to manage and deliver online library services. consequently, system providers have been eager to develop next-generation library systems. organizations have started to adopt cloud computing as their infrastructure. a benefit of cloud computing is that local it staff no longer need to handle hardware failures and software installation. cloud computing streamlines processes and saves time and money. additionally, cloud computing not only enables libraries to deliver resources and services in a network and a library community but also frees libraries from managing technology so they can focus on collection building, service improvement, and innovation.
Therefore, libraries have started to migrate their client-based integrated library systems (ILSs) to cloud-based next-generation systems, often referred to as library services platforms (LSPs). These LSPs can be connected with other web applications, increase collection visibility and accessibility, streamline workflows, reduce duplication of staffing and collections, and create a greener ecosystem for organizations.4 Library consortia have been playing vital roles in resource sharing, cooperative purchasing, discovery, user experience, and technical support. Many libraries migrate to a shared next-generation ILS or LSP by joining a consortium. Besides sharing common needs, participating libraries are quite different with respect to their sizes, the kinds and numbers of resources they provide, services, priorities, and staffing. Although this can pose challenges for participating libraries, such as cost sharing, workflow design, policy, and a collaboration model, libraries still benefit greatly from the shared catalog and enhanced metadata, as well as cooperation at a global level through product communities such as ELUNA and IGeLU.5 The selection of a new system is not a small decision. Calvert and Read pointed out that some libraries succumb to "sheep syndrome," selecting what other libraries have bought, due to a lack of software knowledge.6 Their study suggested that a request for proposal (RFP) could be part of the LSP selection process by providing a consistent set of vendor responses with a narrow scope, a formal statement of requirements for benchmarking, and a mechanism for vendors to compete. Gallagher advised considering existing contracts, financial resources, and RFPs before beginning a system assessment.
He indicated that the expiration date of the current ILS and the opt-out clauses of the existing contract could be indicators of a go-live date. A price quote including a one-time implementation fee, along with a cost-benefit analysis of the current ecosystem compared to the vendor offer, could provide a helpful document that envisions future library services.7 In addition to an RFP, Yang and Venable also considered the library automation marketplace and the needs of their own library when migrating from SirsiDynix Symphony to Alma/Primo.8 Gallaway and Hines embraced competitive usability techniques, using focus groups at Loyola University New Orleans to test a set of standard tasks across multiple systems in order to select a next-generation system.9 They also collected anecdotal information and feedback on the performance of the current library online catalog through a survey of library staff. This evidence-based process makes system selection rational. Manifold, on the other hand, proposed a principled approach to selecting a new LSP. He believed that system selection is part of the continuing process of organizational change and needs to involve library staff and users throughout. Today's LSPs can connect almost the entire range of library operations, from resource management and acquisitions to user request fulfillment and the integration of subject guides for research, teaching, and learning. A system migration is much more than just a move to a new system; it is a transfer to a new culture. He suggested that the acquisitions process must start with educating participants on the features of various systems, methods of vendor assessment, the rules of contract negotiation, communication, and stress management.
The success of system selection and implementation should be measured over the life span of the system to guide new decisions along the way.10 In addition to commercial products, some libraries are acquiring open-source software (OSS) that gives them greater control over customization. The potential benefits of OSS include cost effectiveness, interoperability, user friendliness, reliability, stability, auditability, and customization. Koha, Evergreen, FOLIO, ABCD, WinISIS, NewGenLib, Emilda, PMB (PhpMyBibli), and WEBLIS are examples of OSS ILS/LSP products on the market.11 When selecting and implementing an OSS solution, small libraries such as the Paine College Collins-Callaway Library, with a limited budget and small staff, chose a hosted open-source ILS (Koha) to obtain specific expertise and services at a reasonable price.12 Once a system is selected, the implementation process itself can be critical to the perception of overall system success. Lovins expressed concern about choosing a project management approach that is schedule-driven rather than results-driven. He also recommended organizing implementation activities around the incoming system's functionality. For one consortium-wide system migration, a "train-the-trainer" strategy was adopted in the training program, which mostly offered demonstrations rather than instruction to future trainers.13 The program hardly met libraries' expectations for training. Active staff participation in a system migration is key to project success.
Banerjee and Middleton reported that when library staff owned the migration process, there were fewer mistakes, greater satisfaction with the new system, and quicker troubleshooting of problems that did arise as a result of the migration.14 Avery shared that the God's Bible College libraries did an informal pre- and post-assessment of library users and staff to gather feedback on both the legacy and target ILS. He recommended conducting a formalized pre- and post-evaluation of user satisfaction with the ILS.15 Stewart and Morrison observed that acquisitions workflows in a shared Alma environment must balance required consortial needs with local policies and procedures. Unmet training needs and the lack of an electronic resources management (ERM) module in Alma presented challenges for library staff in developing and managing Alma workflows. They argued that a two-year project cycle was extremely ambitious, especially when the consortium was large and its individual libraries varied widely.16 When migrating from Horizon to Symphony (both SirsiDynix products), King Fahd University of Petroleum and Minerals, based in Dhahran, Saudi Arabia, experienced a delayed implementation. Unmet needs, such as a dramatic shift in workflows, user interface customization, and training support from a system provider or its parent company not matched by a local vendor, became hurdles for the project.17 Although a new LSP, whether Alma/Primo or an OSS product, empowers libraries to create unified workflows across functional modules, this feature requires a system user to have cross-functional roles to conduct these activities.18 When migrating from non-Ex Libris product lines to Alma/Primo, libraries may need to make tough implementation decisions. For example, the University of South Carolina migrated library data to Alma/Primo from Innovative's Millennium and EBSCO's Full Text Finder.
When the legacy and target products are from different vendors, the system migration can be more complicated in communication, data mapping, data quality, and the expected results of data migration. For the USC library, the preexisting duplicate records for electronic resources should have been cleaned up before the migration.19 Libraries should address their concerns about key activities during the implementation to get the best possible result. The Joint Bank-Fund Library had a three-day onsite workflow training in the middle of the project; it would have been much more effective had the library asked the vendor to reschedule the training for a later stage of the migration, because library staff were not yet familiar with the LSP at the scheduled time.20 The University of North Carolina at Charlotte migrated from OCLC's WorldShare Management Services (WMS) to Alma/Primo after migrating from Millennium to WMS four and a half years previously. The Atkins Library went through the second system migration because WMS modules did not meet the library's needs. Going through two system migrations in the span of five years was particularly costly, and frustrated technical services staff spent more than half of their work time on data cleanup. Additional time for data cleaning, workflow design, and training was also needed after the migration to Alma.21 Fu and Fitzgerald studied the effect of LSP staffing models on library systems and technical services by analyzing the software architecture, workflows, and functionality of Voyager and Millennium against those realigned in Alma, WMS, and Innovative's Sierra.
They discovered that the workload of systems staff could be reduced by around 40 percent, giving library systems staff additional time to focus on local application development, the discovery interface, and system integration. Meanwhile, the functionality of a next-generation ILS provides a centralized data-services platform to manage all types of library assets with unified workflows. Consequently, libraries can streamline and automate workflows for both physical and electronic resources through systems integration and enhanced functionality. This change requires libraries to reconsider their staffing models, redefine job descriptions, and even reorganize the library structure to leverage the benefits of a new LSP.22 Western Michigan University (WMU) decided to reorganize its technical services department after the Alma migration was completed in 2015. After the Alma implementation, it was observed that staff spent 38 percent less time working with physical materials. The systems department also shifted its focus from back-end system support to front-end user support and other new technologies. WMU consolidated fourteen departments into six and renamed Technical Services to Resource Management, composed of Cataloging and Metadata, Collections and Stacks, and Electronic Resources. LSP administration was shared by four certified Alma administrators and one discovery administrator residing in the Resource Management department.23 Although researchers and library practitioners have studied ILS selection and implementation processes and the impact of migration on library operations and staffing, only the studies on RFPs and usability testing have focused on decision-making in ILS selection. Today, library administrators and leaders face technological change more often as they transform to a digital business model. They should understand how decisions are made at different organizational levels when managing change.
This study aims to fill this gap and help library administrators and leaders better prepare for future change through the following research questions:

• What is the decision-making process, and what do libraries consider?
• How do libraries evaluate the migration project?
• What are the impacts of the system migration on library staffing and operations?
• What lessons have libraries learned from the system migration?
• What will libraries do differently in a future system migration?

Methods

Researchers have adopted both qualitative and quantitative methods for studies of system migration. The literature indicates that both interviews and surveys have been employed to collect data for these studies.24 Usability testing with a set of tasks across systems has also been utilized in system selection.25 A comparative analysis of vendor documents, RFP responses, and webinars has been applied in studying the impact of system migration on staffing models.26 In this research, the authors used a qualitative method, a survey, to understand decision-making in system selection, procurement, and implementation.

Data Collection

The population for this study is those libraries that implemented or are planning to implement Alma. Through the ELUNA membership management site (https://eluna40.wildapricot.org/), the authors identified 1,440 libraries in the United States and Canada that use at least one Ex Libris product. With help from Sue Julich at the University of Iowa Libraries, who manages the site, 1,150 Alma libraries were identified. The authors also contacted Marshall Breeding, the founder and publisher of Library Technology Guides (https://librarytechnology.org/), and obtained a list of 1,134 Alma libraries in the United States and Canada.
Comparing the Alma libraries acquired from the two different sources, they eventually identified 1,079 libraries from the United States and 55 libraries from Canada as eligible survey participants. The authors developed a 13-question survey in Qualtrics. The questionnaire aimed to help participants recall the project experience and offer them an opportunity to self-reflect and give feedback. The survey was distributed via email to the eligible libraries, and a few email reminders were sent out to encourage participation. Upon the closure of the survey, 291 libraries (27%) had completed the survey in full.

Data Analysis

Qualtrics generates data analysis and reports. The authors conducted a text analysis by manually categorizing responses to the open-ended survey questions to clarify the characteristics of each response, and then presented and analyzed the data in Microsoft Excel.

Findings

Part I: Library Profile & Background Information

The participating libraries have diverse profiles in terms of size and geographic location and reflect points of view ranging from small libraries to library consortia. Remarkably, during the survey, the authors received requests for the complete survey questionnaire so that respondents could coordinate and provide complete and accurate data on behalf of their libraries.

Respondents

The majority of the respondents in this survey were deans, directors of the library or university librarians, and system librarians (see table 1). A wide variety of other position titles across cataloging, acquisitions, technical support, and reference also participated in the survey (see table 2).

Geographic Location

The participating libraries were located in the United States and Canada, and the majority were American libraries (see table 3). The American libraries were distributed across 36 states, while the Canadian libraries came from 4 provinces.
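The manual categorization described under Data Analysis can be sketched programmatically. The category names and keyword lists below are invented for illustration (the authors did this by hand in Excel, and their actual coding scheme is not published here); the sketch only shows the general keyword-matching idea.

```python
# Sketch of keyword-based coding of open-ended survey responses into
# lesson categories. Categories, keywords, and sample responses are
# invented for illustration; they are not the authors' actual scheme.
from collections import Counter

CATEGORIES = {
    "training": ["training", "trainer", "webinar"],
    "communication": ["communication", "stakeholder", "buy-in"],
    "data cleanup": ["cleanup", "duplicate", "data quality"],
}

def categorize(response: str) -> str:
    """Return the first category whose keywords appear in the response."""
    text = response.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "other"

responses = [
    "We needed far more hands-on training before go-live.",
    "Weekly communication with stakeholders was essential.",
    "Duplicate records should have been removed first.",
]
counts = Counter(categorize(r) for r in responses)
print(counts)
```

In practice, manual coding remains more reliable for nuanced responses; a sketch like this is mainly useful for a first pass over a large response set.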
Table 1. The position titles of the respondents

Position title | Percentage
Dean/Director of the Library/University Librarian | 35%
System Librarian | 23%
Other | 42%

Table 2. The other position titles of the respondents

Assessment Librarian; Asset Management Librarian; Assistant Director; Associate Dean; Associate Director; Associate Law Librarian; Associate University Librarian; Cataloging and Metadata Librarian; Cataloging Librarian; Collections Librarian; Consortial Executive Director; Deputy Director of the Library; Director of Library Systems; Director of Library Technology Services; Director of Technical Services; Electronic Resources Librarian; Head Librarian; Head of Acquisitions; Head of Collection Management; Head of Library Systems; Head of Library Technology Services; Head of Metadata and Cataloging; Head of Technical Services; ILS Coordinator; Instructional Technology Librarian; Lead Librarian; Library Technician; Library Technology Manager; Manager of Archives & Access Services; Manager of Digital Services; Manager of Technical Support; Metadata Librarian; Project Director; Public Services Librarian; Reference Librarian/Webmaster; Resource Description and Access Librarian; Solutions Architect, Alma Implementation Project Manager; Supervisor for Access Services; Technical Services and Instruction Librarian; Technical Services Librarian; Technical Services Section Head; Technology Manager

Table 3. The geographic locations of the libraries

Country | Percentage
United States | 92%
Canada | 8%

Library Size

The libraries served a wide range of student populations, from fewer than 1,000 to over 50,000 students (see table 4).
The smallest library had only 199 students, while the largest library system or consortium had 482,000. The number of employees at those institutions ranged from fewer than 100 to over 20,000 faculty and staff (see table 5). The smallest institution may have had only 10 employees, while three larger institutions had over 50,000 faculty and staff.

Table 4. Student population (number of FTEs)

Student population (number of FTEs) | Percentage
<1,000 | 6%
1,000–1,999 | 14%
2,000–2,999 | 10%
3,000–3,999 | 8%
4,000–4,999 | 4%
5,000–5,999 | 6%
6,000–6,999 | 4%
7,000–7,999 | 6%
8,000–8,999 | 4%
9,000–9,999 | 1%
10,000–14,999 | 9%
15,000–19,999 | 8%
20,000–29,999 | 6%
30,000–39,999 | 5%
40,000–49,999 | 3%
50,000+ | 4%

Table 5. Faculty and staff population (number of FTEs)

Faculty/staff population (number of FTEs) | Percentage
<100 | 9%
100–499 | 25%
500–1,000 | 17%
1,000–1,999 | 14%
2,000–2,999 | 7%
3,000–4,999 | 12%
5,000–9,999 | 9%
10,000–19,999 | 4%
20,000+ | 5%

Library Type

The majority of the libraries were single campus libraries; some were part of a multicampus library system or were consortium libraries (see table 6). The other library types included single campus libraries serving more than one institution or location, central offices of a consortium, parts of a statewide system, and independent libraries involved in consortium purchase and implementation of Alma.

Table 6. Library type

Library type | Percentage
Single campus library | 45%
Part of a multicampus library system | 24%
Part of a consortium | 26%
Other | 5%

Previous Integrated Library System (ILS)

The majority of the previous ILSs used by the participating libraries were Voyager, Aleph, Millennium, and Sierra (see table 7), and their vendors were Ex Libris; Innovative Interfaces, Inc.; and SirsiDynix (see table 8).
Thirty-seven percent of libraries reported that they had used their previous ILS for over 20 years before they planned to migrate or migrated to Alma (see table 9). Also, one-fifth of libraries indicated that the system they used prior to Alma was their first ILS, so the move to Alma was their only experience of system migration (see table 10). All libraries used the cataloging, circulation, and OPAC modules in their previous ILSs, and many also used other modules (see tables 11 and 12).

Table 7. The previous ILSs

The previous ILS | Percentage
Voyager | 29%
Aleph | 24%
Millennium | 16%
Sierra | 12%
Symphony | 6%
WorldShare Management Services | 3%
Horizon | 2%
WorkFlows | 2%
TLC | 1%
CLIO | 1%
Evergreen | 1%
Surpass | 1%
The Library Corporation | 1%
Other | 3%

Table 8. The previous system vendors

The previous ILS vendor | Percentage
Ex Libris | 49%
Innovative Interfaces, Inc. | 28%
SirsiDynix | 11%
OCLC | 4%
Endeavor | 1%
TLC | 1%
Surpass | 1%
The Library Corporation | 1%
Other | 5%

Table 9. Years with the previous systems

Years with the previous system | Percentage
3 | 1%
4 | 1%
5–9 | 7%
10–14 | 18%
15–19 | 27%
20+ | 37%
Unknown | 9%

Table 10. Whether the previous system was the first ILS

Was it your first ILS? | Percentage
No | 72%
Yes | 20%
Unknown | 7%

Table 11. Modules used in previous ILSs

Modules used in previous ILSs | Percentage
Cataloging | 100%
Circulation | 100%
OPAC | 100%
Serials | 77%
Acquisitions | 76%
Course reserves | 64%
Interlibrary loan | 28%
Other | 9%

Table 12.
Other modules used in previous ILSs

Analytics; Booking; Course reserves; Discovery system; Electronic resource management; E-reserves; INN-Reach; Licensing

Part II: Implementation Process

Alma Modules/Functions

The majority of libraries reported that they will implement or have implemented the following Alma modules: fulfillment, Primo/Primo VE, resource management, and acquisitions (see table 13). Some libraries mentioned that they used Summon instead of Primo/Primo VE because they had used it before the system migration.

Table 13. Alma modules/functions implemented

Alma modules/functions implemented | Percentage
Fulfillment | 100%
Primo/Primo VE | 93%
Resource management | 92%
Acquisitions | 84%
ERM (electronic resources management) | 77%
Course reserves | 73%
Network Zone | 50%
Interlibrary loan | 40%
Digital collections | 21%
Other | 8%

Selection Process

RFI and RFP

When asked whether an RFI (request for information) was involved, more than half of the libraries responded affirmatively (see fig. 1). About half of the libraries reported that they did not conduct a system functionality survey to collect information from library users and colleagues (see fig. 2). More than half of the libraries indicated that an RFP (request for proposal) process was required for the system migration (see fig. 3). There were a variety of reasons why some libraries did not conduct an RFP process (see fig. 4): an RFP may not be necessary when migrating to a system from the same vendor, there was no increase in expenditure, the expenditure did not reach a budget threshold (e.g., less than $100,000), or the previous contract already covered upgrading to a new product from the same vendor.
Another reason was that libraries might have an existing relationship with a vendor and wanted to continue using its products. Some libraries were given authority by the university administration and library directors to handle the negotiation, or they thought an RFI offered sufficient information to make the decision. Other libraries had no choice about conducting an RFI or RFP process, for reasons such as their system being outdated and migration being unavoidable, the decision being made by the consortium, or Alma being their sole-source procurement.

Figure 1. Whether an RFI (request for information) was involved. (Yes 52%, No 40%, Unknown 8%)

Figure 2. Whether a system functionality survey was conducted. (No 51%, Yes 43%, Unknown 6%)

Figure 3. Whether an RFP (request for proposal) was involved.

Figure 4. The rationales of libraries that did not conduct an RFP.

Decision-Making

The authors found that the common roles involved in the decision-making process included the library dean/director, the Alma local implementation team, and the Alma project working group of a consortium (see fig. 5). Some libraries indicated that their system migration decision was made by university executives (provost, VP finance, CIO, and CFO), campus IT, the AUL for library technology, or all librarians/staff. One library reported that the dean of arts, languages & learning services made the selection decision instead of the library or librarians.

Figure 5. The decision makers.
Important Factors for System Selection

The authors found that the four most important factors in system selection were budget reality; electronic resource management (ERM), bibliographic, and authority control; discovery layers (Primo, Primo VE); and cloud hosting (see table 14).

Table 14. The important factors for system selection

Important factor for system selection | Strongly disagree | Somewhat disagree | Neither agree nor disagree | Somewhat agree | Strongly agree
The budget reality | 3% | 6% | 11% | 34% | 47%
The number of libraries adopted | 7% | 7% | 27% | 40% | 19%
ERM, bibliographic, & authority control | 2% | 2% | 17% | 38% | 41%
Discovery layers (Primo, Primo VE) | 6% | 4% | 13% | 27% | 50%
The analytics/reporting functionality | 4% | 6% | 15% | 41% | 35%
Cloud hosted | 3% | 3% | 12% | 36% | 47%
The campus IT infrastructure & its ecosystems | 8% | 12% | 31% | 31% | 18%
Integration with other ERPs | 12% | 15% | 30% | 33% | 10%
Customer support & satisfaction | 4% | 6% | 21% | 37% | 31%
System user training programs | 5% | 11% | 24% | 38% | 21%

Data Migrated

The most common types of data migrated to Alma were bibliographic records, holdings and items, patrons, and circulation data (see fig. 6). Some libraries reported that they also migrated other types of data, including vendor lists, e-resource data, and all available data types.

Figure 6. The data migrated to Alma.

Discovery Service

The survey asked whether any libraries that migrated to Alma did not choose Primo/Primo VE for their discovery service. Nine libraries reported being in this situation: four used Summon, four chose EBSCO Discovery Service, and one adopted a locally developed product.
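The "four most important factors" reading of table 14 can be reproduced by combining the two agreement columns for each factor. The sketch below uses the percentages from table 14 directly; the shortened factor labels are the only liberty taken.

```python
# Rank the table 14 factors by combined agreement
# ("somewhat agree" + "strongly agree"), using the published percentages.
likert = {
    "budget reality": (34, 47),
    "number of libraries adopted": (40, 19),
    "ERM, bibliographic, & authority control": (38, 41),
    "discovery layers": (27, 50),
    "analytics/reporting": (41, 35),
    "cloud hosted": (36, 47),
    "campus IT infrastructure": (31, 18),
    "integration with other ERPs": (33, 10),
    "customer support & satisfaction": (37, 31),
    "training programs": (38, 21),
}

agreement = {factor: sum(cols) for factor, cols in likert.items()}
top_four = sorted(agreement, key=agreement.get, reverse=True)[:4]
print(top_four)
# -> ['cloud hosted', 'budget reality',
#     'ERM, bibliographic, & authority control', 'discovery layers']
```

The result matches the four factors the authors single out, with cloud hosting (83% combined agreement) and budget reality (81%) at the top.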
When asked the reason for their choices, the nine libraries indicated that they wanted to stay with their existing discovery service. Additionally, two of the libraries cited a budget limitation as part of their reasoning, and one library believed its choice was the better discovery service for its users.

Part III: Feedback on Alma Migration

System Migration Evaluation

The majority of libraries reported that they did not conduct a formal post-migration evaluation. Half of the libraries thought the migration achieved their project goals or met the needs of library operations (acquisitions, cataloging, fulfillment, discovery, etc.) (see fig. 7).

Figure 7. Whether a formal post-migration evaluation was conducted.

Some libraries also described their own migration evaluations, including RFP mandatory-requirements signoff, an availability study, focus groups with library staff, usability testing with students and faculty, feedback and cross-checking with the consortium, and debriefing of library staff. Some did only an informal evaluation, which turned out to be poorly handled or unsatisfactory. For example, one consortium surveyed members about the migration and provided the feedback to Ex Libris for improvement. Other libraries reported that they had not done an evaluation because they had not started the migration process, were still in the migration stage, did not include evaluation in the decision-making process, or received Alma as a free product through their consortial partnerships.

Valuable Lessons Learned

The authors asked what were the most valuable lessons the libraries had learned from the migration project, and how they would implement the migration differently if they had the chance to do it again.
The most valuable lessons concentrated on training, communication, engagement, the implementation process, and data cleanup/preparation (see fig. 8). These lessons are shared in greater detail in the discussion section.

Figure 8. The valuable lessons learned from the migration project.

Prospective Migration

When asked if they would consider working with Ex Libris again if they migrated to a new system in the future, 70 percent of libraries gave an affirmative answer, but some libraries indicated that they would seek alternatives (see fig. 9). When asked how likely they would be to consider implementing an open-source ILS, the majority of libraries conveyed that they would not consider open source; only 7 percent would consider it (see fig. 10).

Figure 9. Whether Ex Libris products would be considered in the future.

Figure 10. Whether an open-source ILS would be considered in the future.

Discussion

The authors examine the above findings further through the lens of the research questions raised in the literature review section.

The Decision-Making Process and Factors Considered

The survey indicates that both the RFI and the RFP are important to a selection process. Fifty-two percent of the libraries conducted an RFI, and 57 percent required an RFP process for the system migration. Interestingly, some libraries did not roll out the RFP process for a variety of sound reasons, such as no increase in expenditure, staying within a budget threshold, existing relationships with vendors, sole-source procurement, a consortium decision, or contract riders.
Besides the RFI and RFP, 43 percent of libraries conducted a system functionality survey to collect information from library users and colleagues. For most libraries, the library dean or director, the Alma local implementation team, or the Alma project working group of a consortium was involved in the decision-making process. In some cases, university executives such as the provost, VP finance, CIO, or CFO, along with campus IT and the associate dean or associate university librarian for library technology, made a collective decision. In a rare case, the dean of arts, languages & learning services made the call on the system selection.

When considering system migration, many factors can be important. This survey shows that libraries mainly consider budget reality; ERM, bibliographic, and authority control; discovery layers; and cloud-hosted systems. It is notable that most libraries want to move to a cloud-based system with better functionality for discovery and electronic resources management. The survey also reveals that library administration needs to find a way to offset the cost increase of the system migration. The lack of comparable system or service offerings in the market also contributes to the decision on system selection.

Project Evaluation

Project evaluation provides important feedback from both system users and system providers, and a great opportunity for libraries to learn. The findings indicate that many libraries do not have a formal assessment process. Some consortia have conducted surveys and provided feedback to Ex Libris, but no response from Ex Libris to that feedback was reported. Both libraries and system vendors have lost the opportunity to learn and improve project management. For example, well-documented complaints about dissatisfaction with Ex Libris training have not been effectively addressed. Some libraries believe a demonstration-focused training model does not provide the same experience that onsite training offers.
Many libraries have had trouble with acquisitions workflows. The EOCR (electronic order confirmation record) and EDI (electronic data interchange) processes, which generate order records and create invoices automatically, are standard practice in libraries today and should be part of the implementation contract to ensure that libraries can operate properly after a new system goes live. It is time for both libraries and system providers to consider a formal project assessment as part of future system migrations. Libraries cannot improve if they do not know where previous projects went wrong, and project assessment is the best way to learn from those mistakes.

Impacts on Library Staffing and Operations

Some libraries reported that insufficient staffing during the system migration created additional problems and hardships. Some library departments were stretched very thin working on the migration project on top of their regular operational duties. Meanwhile, about one-third of the survey-participating libraries reported that meeting the needs of library operations, including acquisitions, cataloging, fulfillment, and discovery, was a criterion of project evaluation. The lack of dedicated LSP migration staff creates a challenge for system migration. Most importantly, additional staff time and technical capacity are important factors in whether libraries can take full advantage of the functionality of a new system. Libraries might manage the migration better by hiring additional technical staff on a project basis to handle technical aspects when existing staff cannot be released from library operations to focus on the migration project.
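The EOCR/EDI automation discussed above ultimately comes down to machine-readable order and invoice messages exchanged with vendors. As a heavily simplified illustration only: EDIFACT-style messages separate segments with `'` and elements with `+`; the sample message, the use of the `MOA+203` qualifier for line amounts, and the parser itself are a sketch, not a production-grade EDI implementation (real INVOIC messages carry many more segments and escape rules).

```python
# Simplified illustration of EDI invoice processing: split an EDIFACT-style
# message into segments and sum the line-item amounts. The sample message
# is invented and abbreviated; a real parser must handle envelopes,
# release characters, and the full INVOIC segment set.
def parse_segments(message: str) -> list[list[str]]:
    """Split an EDIFACT-style message into segments, each a list of elements."""
    return [seg.split("+") for seg in message.strip("'").split("'")]

def line_totals(message: str) -> float:
    """Sum MOA (monetary amount) segments carrying the 203 line-amount code."""
    total = 0.0
    for seg in parse_segments(message):
        if seg[0] == "MOA":
            qualifier, amount = seg[1].split(":")
            if qualifier == "203":
                total += float(amount)
    return total

sample = ("LIN+1++9780000000001:EN'MOA+203:45.50'"
          "LIN+2++9780000000002:EN'MOA+203:30.00'")
print(line_totals(sample))
```

The point of contractually requiring working EOCR/EDI at go-live is precisely so that library staff never have to hand-key the data such messages carry.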
The system integration and unified automated workflows of a modern LSP can enable libraries to run their operations more efficiently. Particularly in a shared environment or network, libraries could share bibliographic records for general collections more widely and deeply, which could dramatically reduce the need for both original and copy cataloging. System staff no longer need to install or upgrade proprietary software and maintain servers in house. These changes might cause job insecurity for some library staff. It is critical for library leaders to adjust some job responsibilities and help staff develop new skills to meet new demands. This requires library administration to create a culture of embracing change, learning, and collaboration. Staff can take advantage of a new system by being curious and reassessing previous workflows. Library administration could create a flexible structure to encourage learning and collaboration across departments.

Lessons Learned

Many libraries shared valuable lessons they learned from their migration projects. Those lessons concentrate on training, communication and engagement, the implementation process, and data cleanup and preparation.

Training

Many libraries expressed dissatisfaction with the training provided by their vendor. For example, libraries moving to Alma reported that Ex Libris could have focused more on in-person, post-migration training. As it was, staff felt undertrained because they had access only to online training before the libraries had access to their own data in Alma/Primo. Additionally, Ex Libris did not assign regular trainers to a particular library, so there was less continuity across training sessions than there could have been. Some suggest that Ex Libris conduct a concentrated, several-day initial training for migration so that libraries have a solid overview of the entire system before data exports for testing loads, and then delve into detailed weekly training that includes more library staff.
It seems a good idea to schedule more training sessions after implementation because libraries may not know how the system functions during the implementation period. In an ideal world, libraries would put more contractual obligations on Ex Libris to train staff more thoroughly. After all, libraries need to hold Ex Libris more accountable for project outcomes. Consortium libraries should insist that Ex Libris provide specialized individual trainers and technical contacts. Attending group training sessions conducted by a variety of different Ex Libris trainers does not work well in large migration projects. Ex Libris needs to train the library staff rather than focusing on training the consortium support staff and expecting them to do most of the staff training. Ex Libris does offer a variety of free training webinars; however, for bespoke or intimate training sessions, it charges its customers. A barrier for many libraries is that they simply cannot afford to pay more for these bespoke training sessions, so they depend on in-house training and best practices (e.g., work groups, training committees, in-house power users, etc.) to manage the training needs of their library personnel.

Communication and Engagement

Many libraries express that communication is extremely important and that buy-in from stakeholders at all levels is critical to the migration project's success. Investing the initial time to get all stakeholders on board will pay off. Blocking off time for weekly meetings with involved staff and Ex Libris is key. Some suggested asking more questions and seeking to understand the functionality of the new system more deeply.
For consortial libraries, librarians can become much closer to each other and learn to seek out and receive help from one another in ways they might never have before. This networking can be an invaluable source of mutual support going forward. Some libraries reported that, due to a lack of communication, an overly sudden decision on the implementation timeline was made at the legislative level. Information regarding requirements and expenses was not fully clarified before the process began and came as a surprise during the migration. The whole process felt very rushed by the vendor, with insufficient training, which turned out to be very dissatisfying.

Implementation Process

A system migration is complex and requires a great deal of time, institutional resources, and staff. Some key processes need to be better prepared in advance, such as staff training, project plans and major milestones, system analysis, customer input for implementation and configuration, data cleanup, physical-to-electronic (P2E) processing, source data extraction, validation and delivery, workflow analysis, the fulfillment network, authentication, third-party integrations, data review and testing, a go-live readiness checklist, etc. In practice, the migration was often more time- and resource-intensive than expected, meaning that libraries found it difficult to complete their part of the process in the contractually specified time. Libraries should clear the decks of core staff to focus on migration and make sure there are no other major projects occurring at the same time. If staff have insufficient time during the migration window, libraries need to hire temporary experienced staff for the project. This investment will benefit library operation in the long run. The implementation team members should have more dedicated time to be trained so that the library staff are well prepared and knowledgeable in the areas in which they work.
It is wise to clean up data as much as possible prior to migration. It would be ideal if the existing workflows were fully documented with diagrams so that it would be easier to determine what parts of the workflows need to change. Some libraries reported that their migration happened during the pandemic with state-issued stay-at-home orders in force. It was extremely stressful juggling all of the changes for the library while keeping up with the system migration. Ideally, it would be better to avoid a migration during a pandemic and postpone it. But if libraries have no other choice, one benefit is to take advantage of closures for cutover days. The stress of the implementation and of trying to get things done may cause frustrations to boil over. It is advised to manage these situations by adding additional support where needed and by always ensuring that communication is a top priority so that any confusion is kept to a minimum. For consortial libraries, it is important for individual member institutions to have their own project managers. Some consortial libraries would have tried to standardize more configurations across the consortia, like user groups, circulation settings, item types, etc. Some libraries felt the whole migration process was rushed by the vendor, which turned out not to be very successful. Libraries should not let the vendor talk them into a compressed, several-month migration timeline; instead, they should spend more time in the preparation and implementation process.

Data Cleanup and Preparation

Although it is tedious and time consuming, many libraries suggested cleaning up data as much as possible prior to migration. More pre-migration data cleanup would avoid a post-migration mess.
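Pre-migration cleanup of the kind recommended here is often scripted rather than done by hand. The sketch below shows one hypothetical pass: normalizing bibliographic titles and flagging likely duplicate records before extraction. The record fields and the matching rule are invented for illustration and are far simpler than real MARC data.

```python
# Hypothetical pre-migration cleanup pass: normalize titles and flag
# probable duplicate bibliographic records so they can be reviewed before
# the data is handed to the migration vendor. The record schema is invented.
import re
from collections import defaultdict

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and drop a leading article for matching."""
    t = re.sub(r"[^\w\s]", "", title.lower()).strip()
    return re.sub(r"^(the|a|an)\s+", "", t)

def find_duplicates(records):
    """Group records whose (normalized title, ISBN) keys collide."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_title(rec["title"]), rec.get("isbn"))
        groups[key].append(rec["id"])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

records = [
    {"id": "b1", "title": "The Library Handbook.", "isbn": "9780000000001"},
    {"id": "b2", "title": "Library handbook", "isbn": "9780000000001"},
    {"id": "b3", "title": "Cataloging Basics", "isbn": "9780000000002"},
]
print(find_duplicates(records))
```

Even a crude pass like this surfaces collisions that would otherwise migrate as duplicate bibs and become the "post-migration mess" the survey respondents describe.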
Some libraries recommended more stringent cleanup of catalog records, acquisitions data, circulation data, patron records, weeding, etc. It is important to make sure the cataloging structure matches the structure of the new system. Had they taken the data review stage more seriously and fully modeled the processes and workflows that would be needed, they would have had fewer data cleanup problems to address after the migration was complete. Some libraries cautioned that Alma's P2E (physical to electronic) migration process was more complex than anticipated. They stated that the P2E conversion did not work as it should have, and Ex Libris should do a better job in the future. Due to misalignment of source and target collections, the P2E process resulted in a large cleanup after the migration. A number of libraries would have asked more questions about what data was migrated and to where. Ex Libris had migrated data that should not have been migrated; as a result, a messy system became a reality.

Planning for Future System Migrations

When asked what they would do differently in a future system migration, many libraries provided very interesting insights. Some libraries believed that the system migration put library leadership in a difficult position. They needed to engage all library employees in decision-making and provide staff with the resources they needed to navigate change, experience the vulnerability of learning a new system, and even have difficult conversations with colleagues. At the same time, library leaders are accountable to their parent organizations and subject to budget pressure and mandates to follow procurement processes, which are geared around efficiency and hierarchy rather than promoting democratic decision-making and self-governance. Many libraries expressed a concern about training.
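Much of the P2E difficulty described above comes down to deciding which records should be treated as electronic before the conversion runs. The following is a hedged sketch of a pre-migration P2E pre-check; the record structure and the electronic indicators are hypothetical and much simpler than a real MARC/Alma export, and Alma's actual P2E input file format is different.

```python
# Hedged sketch of a P2E pre-check: flag records carrying electronic
# indicators (a URL, an "online" carrier note) so they can be reviewed and
# routed to the P2E list. The record schema here is invented for the example.

ELECTRONIC_HINTS = ("online resource", "electronic", "streaming")

def classify_record(rec: dict) -> str:
    """Return 'p2e-candidate' if the record looks electronic, else 'physical'."""
    if rec.get("url"):                      # e.g., a MARC 856$u would be present
        return "p2e-candidate"
    carrier = rec.get("carrier", "").lower()
    if any(hint in carrier for hint in ELECTRONIC_HINTS):
        return "p2e-candidate"
    return "physical"

def build_p2e_list(records):
    """Collect the IDs of records that should be reviewed for P2E conversion."""
    return [r["id"] for r in records if classify_record(r) == "p2e-candidate"]

records = [
    {"id": "b10", "carrier": "volume", "url": None},
    {"id": "b11", "carrier": "online resource", "url": "https://example.org/ebook"},
    {"id": "b12", "carrier": "volume", "url": "https://example.org/journal"},
]
print(build_p2e_list(records))
```

Reviewing a list like this before cutover is one way to catch the "misalignment of source and target collections" that respondents said forced a large post-migration cleanup.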
They stated that they would demand a separate contract for training in the future and put more contractual obligations on system providers to train staff more thoroughly. They would spell out in greater detail what a successful migration would consist of in order to hold Ex Libris responsible for outcomes. During the bidding process, library staff should be less distracted by smooth presentations and instead ask difficult questions about system functionality. Another concern is pricing. One early adopter of Alma stated that they learned the risks, rewards, and excitement of helping with a developing product, as they felt Aleph was a dead end and did not see many other alternatives. They would have negotiated more strongly with Ex Libris on pricing, considering the immaturity of the product and pricing model at the time of adoption. Some libraries felt they were not given competitive pricing, and their costs went up significantly, which constituted a large budget shift. Some small libraries believed Alma is too big for them, and OCLC might be more appropriate for the size of their collections and materials. They realized they underutilized a very expensive system. Some libraries preferred a customized implementation as opposed to the one-size-fits-all model Ex Libris offered. They stated that despite learning the new system, they found that the solutions Ex Libris offered for their implementation rarely worked. They would have been better off fitting their own workflows into Alma (especially for budgeting). Ex Libris seems not to be ready to work with single-campus small colleges. Other libraries reported that they had multiple people in a project management role, which created communication issues. They learned that in any future migration process they should have a single project manager empowered to make decisions.
For consortium libraries, some suggested taking advantage of cohorts of migrating institutions to share information and issues and raise common questions. They would have made some local decisions instead of simply going with the consortium. One consortium experienced a major difficulty in that the group implementation took place across different countries; the time difference with their implementation team added an additional dimension to project management. They would have done an individual migration instead of a group migration, since they had a very complex institutional structure. Some libraries strongly recommended open-source systems as well. They believed that the trend toward vertical consolidation of vendors is not healthy for the library system market in the long run. With mergers and acquisitions, gigantic companies are formed and might overly control the market and pricing.

Conclusions

Decision-making on the selection, procurement, and implementation of a new LSP is a process that requires gathering information and seeking input from library administration, experts, and different levels of stakeholders in a systematic way to ensure system quality, fitness, and a successful implementation. The findings suggest that libraries should adopt an RFI/RFP (request for information/proposal) or a system functionality survey as the basis for system selection. Budget, resource discovery, and electronic resources management are the most important factors to be considered in an ILS selection. Staffing time and technical capability must be addressed before implementing a new system to enable libraries to manage user expectations. Insufficient staff and a lack of technical skills could affect the realization of the benefits of a new system. Technological change can shift staff job responsibilities and lead to a new way of working together.
It is important for library administration to address organizational change when making technological change. A formal project assessment is essential for libraries and system providers to learn and improve collectively. Open-source systems could open doors for libraries seeking more customized and affordable systems.

Research Limitations

Like all research studies, this study has limitations that provide opportunities for further investigation. Firstly, because we asked for responses from individuals, not libraries, the findings might be biased by participants' individual experiences. Secondly, due to limitations of time, space, and the number of survey questions, the reported data mainly focused on Alma libraries and could not cover the migration experiences of libraries migrating to other products or all aspects of system migration. Further research interviewing participating libraries of different sizes, types, and geographic locations, as well as different system providers, would benefit the library community.

Practical Implications

Every new system has its advantages and downsides. To help libraries fully take advantage of a new system, it would be helpful if vendors could evaluate training, the physical-to-electronic (P2E) process, and system affordability. Providing training after a system goes live will help libraries implement workflows effectively and give staff a better experience. P2E is crucial for ensuring that all relevant information is transferred and maintained in the new system. Vendors could address potential P2E issues before a system migration takes place so that libraries might approach data cleanup differently. It would be great if vendors could customize system modules or functionalities as needed by both small and large libraries.
This will give libraries flexibility to invest in the most needed library operations at different prices to make the system affordable. Customer service can be crucial for libraries to continue optimizing the new system down the road. Regularly seeking libraries' feedback can foster a positive customer relationship and benefit both libraries and vendors.

Acknowledgements

The authors appreciate the support of Marshall Breeding and Sue Julich for providing the library contact lists. The authors would also like to thank the Office of Research Integrity for reviewing the survey questionnaire and providing comments. Much gratitude goes to the survey participants who volunteered their time to participate in this study and took the time to communicate with the authors in order to provide accurate responses for their libraries or consortia.

Appendix: Survey Questionnaire

Adult Online Consent to Participate in a Research Study
A Customers' Perspective: Decision-Making on System Migration

Summary Information

Things you should know about this study:
• Purpose: The purpose of the study is to understand how library leaders make decisions on system migration during technological change and the impact of these decisions on library operation and staff.
• Procedures: If you choose to participate, you will be asked to answer 12 multiple-choice questions and 3 open-ended questions.
• Duration: This will take about 15 to 20 minutes.
• Risks: There is little risk or discomfort from this research since you share your project experience anonymously.
• Benefits: The main benefit to you from this research is to self-reflect on the project and have an opportunity to share the project experience. We plan to publish our findings, which will bring potential benefits to you and the library community.
• Alternatives: There are no known alternatives available to you other than not taking part in this study.
• Participation: Taking part in this research project is voluntary.

Please carefully read the entire document before agreeing to participate.

Confidentiality

The records of this study will be kept private and will be protected to the fullest extent provided by law. In any sort of report we might publish, we will not include any information that will make it possible to identify you. Research records will be stored securely, and only the research team will have access to the records. The following questions are for general analytical use only. Although Qualtrics does not collect your email address, please do not provide personally identifiable information (PII) with your answers. If PII appears in the responses, we will apply a data anonymization process after the results are added into the final tally.

Right to Decline or Withdraw

Your participation in this study is voluntary. You are free to participate in the study or withdraw your consent at any time during the study. You will not lose any benefits if you decide not to participate or if you quit the study early. The investigator reserves the right to remove you without your consent at such time that he/she feels it is in the best interest.

Researcher Contact Information

If you have any questions about the purpose, procedures, or any other issues relating to this research study, you may contact Jin Guo (jiguo@fiu.edu) or Gordon Xu (gordon.xu@njit.edu).

IRB Contact Information

If you would like to talk with someone about your rights as a subject in this research study or about ethical issues with this research study, you may contact the FIU Office of Research Integrity by phone at 305-348-2494 or by email at ori@fiu.edu.
Participant Agreement

I have read the information in this consent form and agree to participate in this study. I have had a chance to ask any questions I have about this study, and they have been answered for me. By clicking on the "Consent to Participate" button below I am providing my informed consent.

[Consent to Participate]

Section I: Library Profile and Background Information

1. Your title:
a. Dean/director of the library/university librarian
b. System librarian
c. Other (please specify: _________________)

2. Describe your institution
a. Location
i. US
ii. Canada
iii. State

2. Total student and faculty population
a. Total student population (number of FTEs)
b. Total faculty population (number of FTEs)

3. Information about your library
a. Single campus library
b. Part of a multicampus library system
c. Part of a consortium
d. Other (please specify: _________________)

4. Previous ILS:
a. The previous ILS name:
b. The previous ILS vendor:
c. Years with the previous system:
d. Was it your first ILS?
a. Yes
b. No

5. ILS modules in use prior to Alma migration: (please check all that apply)
a. Acquisitions
b. Cataloging
c. Circulation
d. Interlibrary loan
e. Reserves
f. Serials
g. OPAC
h. Other (please specify: _____________________)

Section II: Alma Implementation Process

6. Alma modules/functions implemented: (please check all that apply)
a. Acquisitions
b. Resource management
c. Fulfillment
d. Interlibrary loan
e. Course reserves
f. ERM
g. Network Zone
h. Primo/Primo VE
i. Digital collections
j. Other (please specify: ________________________)

7. The system selection process
• Was an RFI (request for information) involved?
a. Yes
b.
No

• Did you conduct a system functionality survey to collect information from library users and colleagues?
a. Yes
b. No

• Was the RFP (request for proposal) process required?
a. Yes, please specify the person/department that prepared the RFP: _____
b. No, please provide the reason why (e.g., budget cap less than $100K, etc.): _____

8. Who was involved in the decision-making process?
• Alma project working group (consortium)
• Alma local implementation team
• Project manager(s)
• Library dean
• Institutional coordinators/leads
• Departmental heads
• Others (please specify: ______)

9. What are important factors for system selection (5 points, weight per response)?
• The budget reality
• The number of libraries that adopted it
• E-resource management (ERM), bibliographic, and authority control
• Discovery layers (Primo, Primo VE)
• The analytics/reporting functionality
• Cloud hosted
• The university/college IT infrastructure and its ecosystems
• Integration with other ERP (enterprise resource planning) systems/platforms
• Customer support & satisfaction
• System user training programs

10. What data was migrated (please select all that apply)?
• Authority data
• Bibliographic records
• Holdings and items
• Patrons
• Loans, holds, and fines
• Acquisitions
• Course reserves
• Digital metadata and objects

11. Please skip this question if you use Primo/Primo VE. If you chose non-Ex Libris products for discovery service, please specify the product ____ and select the possible reason below:
• Budget limitation
• Stay with the existing discovery service
• Others

Section III: Feedback on Alma Migration Project

12. How did your library evaluate the system migration project?
• No formal post-migration evaluation
• User satisfaction survey
• Achieved the project goals
• Met the needs of library operations (acquisitions, cataloging, fulfilment, discovery, etc.)

13. Open-ended questions
• What are the most valuable lessons you have learned from this project? If you had a chance to do it again, how would you implement the migration differently?
• Would the library consider working with Ex Libris again if it were to migrate to a new system in the future?
• How likely is it that this library would consider implementing an open-source ILS?

Endnotes

1 Zhonghong Wang, "Integrated Library System (ILS) Challenges and Opportunities: A Survey of US Academic Libraries with Migration Projects," The Journal of Academic Librarianship 35, no. 3 (2009): 207–20, https://doi.org/10.1016/j.acalib.2009.03.024.

2 Teri Oaks Gallaway and Mary Finnan Hines, "Competitive Usability and the Catalogue: A Process for Justification and Selection of a Next-Generation Catalogue or Web-Scale Discovery System," Library Trends 61, no. 1 (2012): 173–85.

3 Guoying Liu and Ping Fu, "Shared Next Generation ILSs and Academic Library Consortia: Trends, Opportunities and Challenges," International Journal of Librarianship 3, no. 2 (2018): 53–71.

4 Matt Goldner, "Winds of Change: Libraries and Cloud Computing," BCLA Browser: Linking the Library Landscape 4, no. 1 (2012): 1–7.

5 Liu and Fu, "Shared Next Generation," 53–71; Jone Thingbø, Frode Arntsen, Anne Munkebyaune, and Jan Erik Kofoed, "Transitioning from a Self-Developed and Self-Hosted ILS to a Cloud-Based Library Services Platform for the BIBSYS Library System Consortium in Norway," Bibliothek Forschung und Praxis 40, no.
3 (2016): 331–40, https://doi.org/10.1515/bfp-2016-0052.

6 Philip Calvert and Marion Read, "RFPs: A Necessary Evil or Indispensable Tool?" Electronic Library 24, no. 5 (2006): 649–61.

7 Matt Gallagher, "How to Conduct a Library Services Platform Review and Selection," Computers in Libraries 36, no. 8 (2016): 20.

8 Zhongqin (June) Yang and Linda Venable, "From SirsiDynix Symphony to Alma/Primo: Lessons Learned from an ILS Migration," Computers in Libraries 38, no. 2 (March 2018): 10–13.

9 Gallaway and Hines, "Competitive Usability," 173–85.

10 Alan Manifold, "A Principled Approach to Selecting an Automated Library System," Library Hi Tech 18, no. 2 (2000): 119–30, https://doi.org/10.1108/07378830010333455.

11 Ayoku A. Ojedokun, Grace O. O. Olla, and Samuel A. Adigun, "Integrated Library System Implementation: The Bowen University Library Experience with Koha Software," African Journal of Library, Archives and Information Science 26, no. 1 (2016): 31–42.

12 Lyn H. Dennison and Alana Faye Lewis, "Small and Open Source: Decisions and Implementation of an Open Source Integrated Library System in a Small Private College," Georgia Library Quarterly 48, no. 2 (Spring 2011): 6–9.

13 Daniel Lovins, "Management Issues Related to Library Systems Migrations. A Report of the ALCTS CaMMS Heads of Cataloging Interest Group Meeting, American Library Association Annual Conference, San Francisco, June 2015," Technical Services Quarterly 33, no. 2 (2016): 192–98, https://doi.org/10.1080/07317131.2016.1135005.
14 Kyle Banerjee and Cheryl Middleton, "Successful Fast Track Implementation of a New Library System," Technical Services Quarterly 18, no. 3 (2001): 21–33.

15 Joshua M. Avery, "Implementing an Open Source Integrated Library System (ILS) in a Special Focus Institution," Digital Library Perspectives 32, no. 4 (2016): 287–98, https://doi.org/10.1108/dlp-02-2016-0003.

16 Morag Stewart and Cheryl Aine Morrison, "Breaking Ground: Consortial Migration to a Next-Generation ILS and Its Impact on Acquisitions Workflows," Library Resources & Technical Services 60, no. 4 (2016): 259–69.

17 Zahiruddin Khurshid and Saleh A. Al-Baridi, "System Migration from Horizon to Symphony at King Fahd University of Petroleum and Minerals," IFLA Journal 36, no. 3 (2010): 251–58, https://doi.org/10.1177/0340035210378712.

18 Efstratios Grammenis and Antonios Mourikis, "Migrating from Integrated Library Systems to Library Services Platforms: An Exploratory Qualitative Study for the Implications on Academic Libraries' Workflows," Qualitative and Quantitative Methods in Libraries 9, no. 3 (September 2020): 343–57, http://qqml-journal.net/index.php/qqml/article/view/655/585.

19 Abigail Wickes, "E-Resource Migration: From Dual to Unified Management," Serials Review 47, no. 3–4 (2021): 140–42.

20 Yang and Venable, "From SirsiDynix," 13.

21 Joseph Nicholson and Shoko Tokoro, "Cloud Hopping: One Library's Experience Migrating from One LSP to Another," Technical Services Quarterly 38, no. 4 (2021): 377–94.
22 Ping Fu and Moira Fitzgerald, "A Comparative Analysis of the Effect of the Integrated Library System on Staffing Models in Academic Libraries," Information Technology and Libraries 32, no. 3 (September 2013): 47–58.

23 Geraldine Rinna and Marianne Swierenga, "Migration as a Catalyst for Organizational Change in Technical Services," Technical Services Quarterly 37, no. 4 (2020): 355–75, https://doi.org/10.1080/07317131.2020.1810439.

24 Vandana Singh, "Experiences of Migrating to Open Source Integrated Library Systems," Information Technology and Libraries 32, no. 1 (2013): 36–53, https://doi.org/10.6017/ital.v32i1.2268; Shea-Tinn Yeh and Zhiping Walter, "Critical Success Factors for Integrated Library System Implementation in Academic Libraries: A Qualitative Study," Information Technology and Libraries 35, no. 3 (2016): 27–42, https://doi.org/10.6017/ital.v35i3.9255; Grammenis and Mourikis, "Migrating from Integrated Library Systems," 343–54; Xiaoai Ren, "Service Decision-Making Processes at Three New York State Cooperative Public Library Systems," Library Management 35, no. 6 (2014): 418–32, https://doi.org/10.1108/lm-07-2013-0060; Wang, "Integrated Library System," 207–20; Pamela R. Cibbarelli, "Helping You Buy ILS," Computers in Libraries 30, no. 1 (2010): 20–48, https://www.infotoday.com/cilmag/cilmag_ilsguide.pdf; Calvert and Read, "RFPs," 649–61.

25 Gallaway and Hines, "Competitive Usability," 173–85.

26 Fu and Fitzgerald, "A Comparative Analysis," 47–58.
Book Reviews

The Future of the Printed Word: The Impact and Implications of the New Communications Technology. Edited by Philip Hills. Westport, Conn.: Greenwood, 1980. 172p. $25. LC: 80-1716. ISBN: 0-313-22693-8 (lib. bdg.).

The character of this volume is as much that of a topical journal or annual review as that of a monograph. A dozen authors have contributed thirteen chapters, all but one prepared especially for this publication. Ten of the chapters are by British authors, two by Americans, and one by European Community personnel located in Luxembourg. An amusing Punch satire about BOOK (Built-in Orderly Organized Knowledge) is reprinted as an unnumbered fourteenth chapter. In an excellent opening essay, John M. Strawhorn notes: "In this book, the expression printed word is construed very broadly, to include words in any kind of display: paper, microforms, CRTs, plasma panels and so on."
His essay is a terse but pointed review of the organization of information transfer, some current trends, factors affecting acceptance of new technologies, and some broad projections for the future. Provocative essays by Maurice B. Line and P. J. Hills, editor of the volume, explore the printed word from the points of view of a bookperson and an educator. In one of the most elegant metaphors to appear in information science literature, Line suggests: "The printed butterfly will emerge from its electronic chrysalis, but it will also return again to it in due time. The vast majority of documents will thus be stored in electronic (chrysalis) form, but the majority of those used at any given time will be in their printed (butterfly) form." Two incisive and thorough chapters on official information by Patricia Wright systematically explore the use of old and new technologies for forms, leaflets, and signs. Wright makes acute and useful observations on how technology can hinder or help the gathering and dispersion of governmental information. The Graphic Information Research Unit of the Royal College of Art has done excellent work in recent years in exploring how various display options affect comprehension. Linda Reynolds provides a good essay, "Designing for the New Communications Technology," based on that research. The review of prospects for electronic journal publishing by Donald W. King is a good overview, especially for beginners. A chapter on Euronet DIANE describes problems in creating an online database capability in the European political environment. Chapters on printing technologies, microforms, and videodiscs cover all major alternatives but suffer from brevity. Two brief but competent speculative essays, which add little, complete the volume. The work lacks a general index, but the organization of chapters makes this a minor flaw.
Use of presumably common British acronyms without explanation, especially in credits and citations, is an irritant for non-U.K. readers. The work would make an excellent supplementary text for a course on the history of the book. Practitioners in publishing or library and information science will find much of interest.-Brian Aveney.

Journal of Library Automation vol. 14/3, September 1981

Turnkey Automated Circulation Systems: Aids to Libraries in the Market Place. Edited by Judith Bernstein. Chicago: American Library Assn., 1980. 332p. $10.50.

When my library entered the marketplace for an automated circulation system, I searched the literature for aids. Had I found this book at that time I would have been disappointed. What I would expect from a 332-page book with the subtitle "Aids to Libraries in the Market Place" would be numerous examples of what had been done before. I would expect samples of the analyses that other libraries had done to justify entering the marketplace, samples of the RFPs that had been sent to vendors, and samples of the contracts that had been signed. I would like to see a case study (or two) of the complete process of procurement. Admittedly, this expectation is somewhat of an ideal, but these are "aids" that we searched for and that other libraries now ask from us. What does this book provide? An editorial introduction gives a sense of the difficulties of the marketplace and the frustrations encountered in it. A two-page bibliography gives a reasonable selection of readings to provide a background for decision making. A discussion titled "Hiring a Consultant-Why and How" is a very useful enumeration of details to be considered in the decision to hire a consultant and in the agreement with a consultant. A model request for proposal is a good synthesis of the details to be included in almost every library's RFP and thus provides a starting point for the library new to the marketplace.
All of this is what I consider to be the substance of this book, and it ends at page 40. The remaining 292 pages are devoted to the "profiles" of individual libraries which have installed automated circulation systems. The profiles are intended to assist in the identification of libraries to be contacted for further information, but provide little useful information by themselves. My primary objection to this book is the misleading nature of the citation. One expects more than three hundred pages of "aids" and finds a directory with a forty-page preface. But for the librarian new to the marketplace it may be worth the price.-Alan E. Hagyard, Yale University Library, New Haven, Connecticut.

Archives and the Computer, by Michael Cook. London: Butterworths, 1980. 152p. $29.95. LC: 80-41286. ISBN: 0-408-10734-0.

Michael Cook recognizes the special predicament of the archivist, whose job consists of trying to satisfy three contradictory needs: (1) the need to arrange and describe archives by their provenance, (2) the need to store them most efficiently by shape and size, and (3) the need to access them to answer inquiries that are mostly subject-oriented. The solution to these conflicting requirements may come from the computer. As Cook says, "the speed and variety of computerized lists and indexes derived from a single data base could solve this problem by producing finding aids in all possible sorts of order." In a very handsomely produced, sturdily bound book, Archives and the Computer, Michael Cook, archivist of the University of Liverpool, reports on various computer systems serving the needs of archivists. His book starts with a general discussion on the nature of automated systems and their relation to manual ones. This is followed by the description of a select group of archives systems-some still in use, others put to their well-deserved rest after a few years' use.
He covers records management systems (i.e., the area of handling current records) and archives management systems (i.e., the handling of noncurrent documents). In the final chapter Cook moves the discussion away from computer processing of traditional, familiar forms of archival material, focusing instead on processing archives that are themselves machine-readable data files. How does the archivist accomplish all of the necessary tasks if the archives are not readable by the human eye? How does he appraise, arrange, describe, and access them? I like Mr. Cook's cautious and sober attitude. Talking about system design, he remarks, "At this stage decisions will be made which will be irrevocable in practical terms, and may cause much trouble later." About implementation and testing: "Computer systems should help people to work more effectively in a more interesting environment; if they fail in this, or appear to fail, there is something wrong, and it would perhaps be better not to introduce the change." The records management systems he describes are used by British county and city record offices. An interesting feature in one of them, a system called ARMS, is a printout that tabulates for each class of documents the number of requests in a year, per year stored. This printout could be very helpful in modifying established retention periods on the basis of experience. The following archives systems are described: PROSPEC (adopted by the Public Record Office of London), NARS A-1 (used by the National Archives of the USA), SPINDEX (first used by the National Archives and the National Historical Publications and Records Commission), SELGEM (used by the archives of the Smithsonian Institution), STAIRS (an IBM system, used, among others, by the House of Lords Record Office in London), PARADIGM (developed and used at the University of Illinois), MISTRAL (used by the National Archives of Ivory Coast), and ARCAIC (used and abandoned by the East Sussex Record Office).
Of all these systems, I found the description of SELGEM the most educational. Besides listing the fields making up a computer record, Cook shows an example of an actual record as it appears in the master list and as it appears in the printed guide to the archives. He also includes an actual segment of the name/subject index. Although there is a brief mention of the choice between networking versus isolated, separate systems, the book does not speculate about the possibility of a network of many institutions building a common database. Nor does the author discuss the much debated and very timely question of whether archivists could possibly agree on a uniform computer record for the description of manuscripts and archives, similar to the way in which librarians have agreed on using the MARC formats for the description of their materials. A glossary of technical terms, a "select directory" of archival systems, and a "select bibliography" are useful additions to the main text. This book is recommended more to the archivist looking for a computer system than to the systems analyst who wants to learn how archives are processed.-Suzanna Lengyel, Yale University Library, New Haven, Connecticut.

The Library and Information Manager's Guide to Online Services. Edited by Ryan E. Hoover. White Plains, N.Y.: Knowledge Industry Publications, 1980. 270p. $29.50 hardcover, $24.50 softcover. LC: 80-21602. ISBN: 0-914236-60-1 (hardcover); 0-914236-52-0 (softcover).

Hoover and seven colleagues provide an overview of the main issues and techniques involved in starting and managing an online retrieval service. The emphasis is on a library setting-the implicitly broader focus conveyed by the title is not matched by any specific coverage of, for example, the online search activity of the for-profit information brokers, where funding, staffing, publicizing, and the search process itself are handled differently than in libraries.
The three large, general search services (Lockheed, SDC, and BRS) are used throughout for the descriptions and search examples, and their bibliographic databases inevitably receive the most attention. There is a noticeable slant toward the two agencies with which several of the contributors are or were affiliated-the University of Utah (which doesn't detract from the book's objectivity) and SDC (which does). The chapters are of uneven quality and scope. Most of the obvious areas are covered-the available search systems and databases; equipment needs; search techniques; managing an online service in a library; training searchers; promoting service; and measurement and evaluation. Taken as a whole, the book is a good state-of-the-art report, even though it is already becoming outdated in terms of industry facts. The numerous charts and tables serve to flesh out the text, but do we really need six photographs of terminals (two of them showing the same searcher at the same terminal, the only difference being that in one there is an onlooker) to illustrate that "some searchers prefer to have the user present"? Brief chapters on the growing network of online user groups, and on the future of online services (largely derived from Lancaster), end the text, and the book has a serviceable bibliography, glossary, and index. Six years ago I reviewed one of the first KIPI publications; it was in typescript, comb-bound, a little more than one hundred pages, and it cost $24.50. This is a much better production and, considering inflation since 1975, it represents vastly better value for money. It should serve as a useful handbook for those of us in the field, as well as those just starting, for another year or two.-Peter Watson, California State University, Chico.

Basics of Online Searching, by Charles T. Meadow and Pauline Atherton Cochrane. New York: Wiley, 1981. 245p. $15.95. LC: 80-23050. ISBN: 0-471-05283-3.
The use of online information retrieval services is becoming widespread throughout the information community, whether in traditional libraries or in business, industry, or government offices. The need for trained searchers is evident from the job advertisements and the quantity of training programs being offered around the country. The programs presented by the Machine-Assisted Reference Section (MARS) of the Reference and Adult Services Division of ALA are always packed. The librarians attending ALA annual conferences seem to be hungry for any information available about online information retrieval services. This text fills an obvious need for the professional who attended library school before course offerings in online information retrieval were available. Although online information retrieval is now being taught in most library and information science curriculums, there have been only a few attempts at providing a textbook for beginning students, and none of those has been very successful since the Lancaster and Fayen Information Retrieval On-Line in 1973. Basics of Online Searching is a text intended "to teach the principles of interactive bibliographic searching . . . to those with little or no prior experience. The major intended audiences are students, working information specialists and librarians, and end users, the people for whom all this searching is done." Because the authors have done an excellent job of targeting their audience and sticking to that target, this text will be useful at the introductory level. The authors cover the elements of interactive searching including the reference interview, Boolean logic, search strategy development, telecommunications and equipment, basic database structure, selective dissemination of information, and how to get help from search-service vendors. The text is relatively free of jargon and does a good job of defining new terms in context as they appear.
The authors begin with basic definitions and a brief overview of the process of interactive searching. The reference interview and search strategy development are covered adequately, first with an introduction and then in a later chapter providing more detailed information. Telecommunications and computer equipment are covered in enough detail for the novice. The next five chapters cover search language, databases, various types of text searching, and how to get on and off the computer. This section of the book uses examples that show the different approaches to the same process on three different systems-BRS, ORBIT, and DIALOG. The authors do not lose sight of their intent to demonstrate the principles of online searching. There is a brief chapter on selective dissemination of information (SDI) and cross-file searching. The chapter explains how SDI is used and gives examples of constructing and saving a search for SDI on each of the three systems. The last chapter of the book, "Search Strategy," is especially good. There seemed to be something beyond the basic elementary information of the preceding chapters. The authors clearly demonstrate concept development and search strategy formulation. The authors do an excellent job of integrating the discussion of the three major search service vendors: Lockheed's DIALOG, System Development's ORBIT, and Bibliographic Retrieval Services, Inc. Examples are used from each of the services with a discussion of the differences. The book does clarify the similarity of the services by showing how each function can be accomplished on each system. Searchers using only one system now might use this text to see how easily their knowledge could be transferred to another system. Problems with the text do not abound, but there are some that should be brought to the attention of the reader. There is a slight problem with the format of the examples.
The reviewer found herself searching for the completion of a paragraph of text on a few occasions. The examples are very good and clear; they are simply not separated from the text adequately for easy reading. There were a couple of instances of unnecessary redundancy. There were two separate discussions, one on truncation and one on searching word fragments, which could have been improved by integration into one section. There was a repetition of "steps in the presearch interview and the online search" in chapter 3 and then again in chapter 12. This is almost a page of steps, which are very good, but a simple reference back to the earlier list would have sufficed. But the biggest problem with the text in the eyes of this reviewer is that of omission. There was no discussion of citation searching or evaluation of search results, and no mention of the various training options available for the novice searcher. This reviewer would like to have seen more information on where to go next as guidance to the novice. The one hundred pages of appendixes seem unnecessary and will soon be out of date. Library school teachers planning to use this as a text would do well to request free, up-to-date materials rather than relying upon the documents in the appendix, which are more than a year old at the time of this writing. Almost every book on this topic has made the same mistake of reprinting search-service and database-producer literature. Overall, however, the authors have succeeded very capably in their intended endeavor "to teach principles, rather than the detailed mechanics of any particular search system." There is a place in the literature for this very basic text, which is well written, uses clear examples, and teaches in an understated way. For those people who are afraid of automation, afraid to touch a computer terminal, and are insecure about their ability to do online searching, this book will relieve most of those fears and insecurities.
The authors acknowledge their desire to give simple instructions and offer a chapter called "Assistance" for people who need more help. Novices might assume they could read this book, purchase a terminal, get a password and system manual, and begin searching. As a matter of fact one could do this, but the results would likely be a discredit to the search-service vendor because of a lack of system-specific training on the part of the searcher. Most people, like this reviewer, can conceptualize a new process, but would feel more comfortable with some type of formal hands-on training-even for half a day. There are too many little things that can be an impediment to success. The reviewer would heartily recommend this book to inexperienced searchers and library school students but would warn experienced searchers that there is nothing new for them.-Carolyn M. Gray, Western Illinois University, Macomb.

Quick•Search Cross-System Database Search Guides. San Jose, Calif.: California Library Authority for Systems and Services, 1980. 21 charts. $75 (CLASS members), $95 (nonmembers). ISBN: 0-938098-00-4.

The CLASS On-Line Reference Service (COLRS) is a cooperative program for public, academic, and special libraries offering training and consultation on almost any aspect of online reference searching through the major commercial vendors of databases. This service is a part of CLASS, the California Library Authority for Systems and Services, and acts as a contact point for searchers and the database industry through vendor-training sessions, database training, and the coordination of large group contracts with DIALOG Information Services and Bibliographic Retrieval Services (BRS). This close relationship to the online industry gives CLASS a unique position from which to supply information on databases from a multiple search-system perspective.
The publication of the Quick•Search Cross-System Database Search Guides is a natural outgrowth of the COLRS program in training and consulting. The twenty-one charts in Quick•Search show the formats used to search for information in a specific database across the two or three vendors offering the database commercially. The databases were selected as the most commonly searched through the major commercial search services: Bibliographic Retrieval Services, DIALOG Information Services, and System Development Corporation Search Service (SDC). Eight databases in the sciences, eight in the social sciences, and five multidisciplinary files are included in the complete set. Two subsets, of the science and multidisciplinary files and of the social science and multidisciplinary files, are available for $60 for CLASS members and $80 for nonmembers. The eight science databases are BIOSIS, CAB Abstracts, COMPENDEX, Energyline, Enviroline, Food Science & Technology Abstracts, INSPEC, and Oceanic Abstracts. The social science files are ABI/INFORM, ERIC, Exceptional Child Education Resources, Library and Information Science Abstracts, Management Contents, Psychological Abstracts, Social Scisearch, and U.S. Political Science Documents. The multidisciplinary databases are Conference Papers Index, Comprehensive Dissertation Index, NTIS, PAIS International, and SSIE Current Research. The stated purpose of the Quick•Search guides is to aid the experienced searcher who must use databases from more than one search service by showing the formats for each vendor of a database side by side for comparison. Because most searchers tend to use a database on only one system, the guides are really more appropriate to an organization where several searchers may be using the same database through different systems and a "universal" quick-reference chart is needed.
Because each guide covers only one database, the level of detail shown is much greater than in the simple-command comparison charts previously published. The guides are arranged to show particular features of the databases as they are used on the different search systems. The file label used to access the database and those fields that are searched when a term is entered with no restriction (the basic index) are shown at the top of each chart. The fields used in subject searching follow and show the field codes used to restrict subject searches, along with the format used online to enter search terms. The typical fields illustrated are title, subject descriptor, identifier, abstract, and category or section code. These fields vary according to database, but include the majority of subject access points used in the file. The balance of the chart is used to illustrate the field codes and formats used to retrieve information from other access points in the database such as author, journal source, language, publication date, document type, report numbers, or update code. These alternate access points vary widely by database, but each chart provides information on limiting searches by date, language, or update code at a minimum. The guides supply a useful amount of information for the experienced searcher needing a prompt on a form of entry for the fields available in a database, but a good understanding of the search system is required to use them properly. Given the close contact CLASS has with the database producers and online vendors, it is somewhat surprising to find inaccuracies and some misinterpretation in some of the guides. In the preface, for instance, the editor states, "In many BRS files, UJ and UN are paragraph labels used in addition to DE, MJ, and MN. They are used to indicate major (UJ) or minor (UN) single word descriptors, similar to the DF in DIALOG and IW in ORBIT."
It is true that DF is used in DIALOG to indicate a single-word descriptor, but in ORBIT the code is IT. In BRS, UJ and UN mean the term so restricted is an "unbound" part of a multiword descriptor-not a single-word descriptor (see BRS/ERIC database guide, p. 14). The use of IW in ORBIT retrieves "unbound" words from the IT field. The most trouble in the charts appears to be in the ORBIT sections. The basic index is misrepresented in several files, and the IW field is only irregularly listed, even when it is present in the SDC version of the database. Suggestions on the use of SENSEARCH and STRINGSEARCH are not consistently illustrated for fields that cannot be directly restricted in some databases on ORBIT, such as abstract or supplementary index terms. Many times the suggested search entry would not restrict retrieval to the field indicated on the chart. These inaccuracies would probably not doom an experienced searcher to failure in using a database, but they are annoying and do little to inspire absolute confidence in the information presented. CLASS is to be complimented on the graphic representations in Quick•Search and the heavy stock used for the guides (the paper will probably outlive the information printed on it). Addenda are planned for those databases changed or reloaded since the preparation of Quick•Search in October 1980, and a second edition is already under consideration. The Quick•Search guides are not meant as a replacement for vendor or database documentation and, in fact, are simply repackaged versions of the basic file descriptions available from the online vendors. Considering the price of this publication, organizations would do well to consider investing instead in detailed user guides and updates for their searchers in order to provide the most accurate and current information on databases on a specific system.-Rod Slade, University of Oregon Library, Eugene.

Viewdata and Videotext, 1980-81: A Worldwide Report.
Transcript of Viewdata '80, First World Conference on Viewdata, Videotex, and Teletext, London, March 26-28, 1980. White Plains, N.Y.: Knowledge Industry Publications, 1980. 623p. $75 softcover. LC: 80-18234. ISBN: 0-914236-77-6.

Videotex '81. Proceedings of Videotex '81 International Conference and Exhibition, May 20-22, 1981, Toronto, Canada. Northwood Hills, Middlesex, U.K.: Online Conferences Ltd., 1981. 470p. $85 softcover.

Viewdata '80 and Videotex '81 were two state-of-the-art conferences for the emerging videotex field. Videotex is the generic name for mass-market, consumer-oriented information retrieval systems of low cost and relative ease of use. Videotex, as a technology, is divided into teletext systems and viewdata systems. Teletext systems sequentially broadcast information using a portion of the television signal. Subscribers, using a special decoder, can select individual pages from the several hundred offered. Viewdata systems, on the other hand, are quite like online information systems except for their use of a television as a display device, their simplicity, and their broader range of transactions and information. These conference proceedings will be of interest to a limited audience. They are not for the complete beginner. Nor will they provide hours of entertaining reading. Neither meets academic publication criteria; many of the papers are fluff, outlines, or sales pitches. Both proceedings have their share, unfortunately large, of uninformative articles. But if you are seriously interested in videotex's technology, uses, and social implications, then by all means at least skim the 1981 conference papers. The proceedings do describe the state of the art. Moreover, the two proceedings, taken together, show some of the changes in the videotex field in the last year ... and not only in the spelling of "videotex." As state of the art, the Viewdata '80 conference proceedings are already superseded.
Most of the material has been adequately covered by now in other publications at a much lower cost. There are two exceptions to this, both worth noting. The proceedings has several excellent articles on the Japanese CAPTAIN system, the best published on that system. Of additional interest is a report on Control Data Corporation's (CDC) market test of their PLATO educational system. Their report suggests a large consumer market for high-quality educational services even at a relatively high price. The Videotex '81 conference proceedings are, of course, more current. There are four major topics of interest in the proceedings. Firstly, there are several good presentations on videotex services, such as electronic publishing, retailing, and banking. There is an excellent discussion on what videotex means to newspapers, both in opportunities and threats. Secondly, and particularly recommended, is a paper by Tydeman and Zwimpfer of the Institute for the Future. The paper outlines some of the social changes and problems that may result from large-scale videotex implementation. Thirdly, there are updates on the existing videotex technologies and efforts from the French, Japanese, Canadian, and British groups. The British are perhaps the most interesting, since they have a year of operational experience with their viewdata system, Prestel. They state that most usage was from the business community, and their reports suggest that services are shifting to attract that market. If this is the case, it is a significant change from the original consumer orientation. There is also a good article on a Prestel information provider's first year. Of additional interest is that Prestel-compatible databases and systems are being constructed in Britain. Thus, people will be able to access different systems using the same protocol. Finally, there are numerous fascinating papers on American efforts.
The Americans, in contrast to the British, seem very unsettled; there is still a multiplicity of designs. (AT&T's decision on a modified Telidon standard, not reported in the proceedings but a major event of the conference, may ameliorate that.) The papers indicate overall that the "classic" definitions of viewdata and teletext will crumble or will be supplemented in the face of 100-channel, two-way cable systems. Several papers document how these new cable capabilities will provide channels for large amounts of information to be delivered by teletext, viewdata, or hybrid systems. A paper by Simon notes that cable will not only provide large audiences for information services but will also eliminate some of the traditionally defined viewdata functions. For example, people will not buy commodity prices from a viewdata service if that same information is available on a cable channel at a lower price. Unfortunately, there are some topics missing from the 1981 conference proceedings. Consumer-oriented educational services are mentioned little. System-performance or human-factor considerations are rarely analyzed. There is much discussion of what services should be offered, but there is little discussion of how those services should be offered. No presentation is made on how to design very large databases for ease of use. Particularly distressing is the relative omission of the word "quality" from the American papers in both proceedings. One cannot expect every home to be wired to access the entire Library of Congress. Nonetheless, one can hope that videotex will not become merely a medium for used-car advertising.-Mark S. Ackerman, Department of Computer and Information Science, Ohio State University, and OCLC, Inc., Columbus.
A Candid Look at Collected Works: Challenges of Clustering Aggregates in GLIMIR and FRBR

Gail Thornburg

Information Technology and Libraries | September 2014

Abstract

Creating descriptions of collected works in ways consistent with clear and precise retrieval has long challenged information professionals. This paper describes problems of creating record clusters for collected works and distinguishing them from single works: design pitfalls, successes, failures, and future research.

Overview and Definitions

The Functional Requirements for Bibliographic Records (FRBR) was developed by the International Federation of Library Associations (IFLA) as a conceptual model of the bibliographic universe. FRBR is intended to provide a more holistic approach to retrieval and access of information than any specific cataloging code. FRBR defines a work as a distinct intellectual or artistic creation. Put very simply, an expression of that work might be published as a book. In FRBR terms, this book is a manifestation of that work.1 A collected work can be defined as "a group of individual works, selected by a common element such as author, subject or theme, brought together for the purposes of distribution as a new work."2 In FRBR, this type of work is termed an aggregate or "manifestation embodying multiple distinct expressions."3 Žumer describes an aggregate as "a bibliographic entity formed by combining distinct bibliographic units together."4 Here the terms are used interchangeably. In FRBR, the definition of aggregates applies only to group 1 entities, i.e., not to groups of persons or corporate bodies.
the ifla working group on aggregates has defined three distinct types of aggregates: (1) collections of expressions, (2) aggregates resulting from augmentation or supplementing of a work with additional material, and (3) aggregates of parallel expressions of one work in multiple languages.5 while noting the relationships between the categories, this paper will focus on the first type. aggregates of the first type include selections, anthologies, series, books with independent sections by different authors, and so on. aggregates may occur in any format, from a volume containing both of the j. d. salinger works catcher in the rye and franny and zooey, to a sound recording containing popular adagios from several composers, to a video containing three john wayne movies. gail thornburg (thornbug@oclc.org) is consulting software engineer and researcher at oclc, dublin, ohio.

the environment
the oclc worldcat database is replete with bibliographic records describing aggregates. it has been estimated that the database may contain more than 20 percent aggregates.6 this proportion may grow as worldcat coverage of recordings and videos increases. in the global library manifestation identifier (glimir) project, automatic clustering of the records into groups of instances of the same manifestation of a work was devised. glimir finds and groups similar records for a given manifestation and assigns two types of identifiers for the clusters. the first type is a manifestation id, which identifies parallel records differing only in language of cataloging or metadata detail, some of which are probably true duplicates that cannot safely be merged by a machine process. the second type is a content id, which describes a broader clustering, for instance, physical and digital reproductions and reprints of the same title from differing publishers.
this process started with the searching and matching algorithms developed for worldcat. the glimir clustering software is a specialization of the matching software developed for the batch loading of records to worldcat, deduplicating the database, and other search and comparison purposes.7 this form of glimirization compares an incoming record to database search results to determine what should match for glimir purposes. this is a looser match in some respects than what would be done for merging duplicates. the initial challenges of tailoring matching algorithms to suit the needs of glimir have been described in thornburg and oskins8 and in gatenby et al.9 the goals of glimir are (1) to cluster together different descriptions of the same resource and to get a clearer picture of the number of actual manifestations in worldcat so as to allow the selection of the most appropriate description, and (2) to cluster together different resources with the same content to improve discovery and delivery for end users. according to richard greene, “the ultimate goal of glimir is to link resources in different sites with a single identifier, to cluster hits and thereby maximize the rank of library resources in the web sphere.”10 glimir is related conceptually to the frbr model. if the goal of frbr is to improve the grouping of similar items for one work, then glimir similarly groups items within a given work. manifestation clusters specify the closest matches. content clusters contain reproductions and may be considered to represent elements of the expression level of the frbr model. the frbr and glimir algorithms this paper discusses have evolved significantly over the past three years. in addition, it should be recognized that the frbr algorithms use a map/reduce keyed approach to cluster frbr works and some glimir content while the full glimir algorithms use a more detailed and computationally expensive record comparison approach. 
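as described above, the frbr batch process clusters by constructing keys, while full glimir matching does a more detailed record-to-record comparison. the difference in cost and behavior can be sketched roughly as follows (a hypothetical illustration; the real algorithms use far richer comparison rules than a single normalized key):

```python
from collections import defaultdict

def cluster_by_key(records, make_key):
    # cheap keyed pass (frbr-style map/reduce): one pass over the records;
    # records that share a normalized key fall into the same candidate cluster
    clusters = defaultdict(list)
    for rec in records:
        clusters[make_key(rec)].append(rec)
    return list(clusters.values())

def cluster_pairwise(records, same_manifestation):
    # expensive pass (glimir-style): detailed record-to-record comparison,
    # quadratic in the worst case but able to weigh many fields at once
    clusters = []
    for rec in records:
        for cluster in clusters:
            if same_manifestation(rec, cluster[0]):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

# illustrative key: author plus title normalized for case and punctuation
records = [
    {"author": "homer", "title": "The Iliad"},
    {"author": "homer", "title": "the iliad."},
    {"author": "homer", "title": "The Odyssey"},
]
make_key = lambda r: (r["author"], r["title"].lower().strip(" ."))
print([len(c) for c in cluster_by_key(records, make_key)])  # [2, 1]
```

the keyed pass scales to a database the size of worldcat, which is presumably why the frbr algorithms use it; the pairwise comparison is reserved for the cases where a single key cannot capture the match decision.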
the frbr batch process starts with worldcat enhanced with additional authority links, including the production glimir clusters. it makes several passes through worldcat, each pass constructing keys that pull similar records together for comparison and evaluation. as described by toves, “successive passes progressively build up knowledge about the groups allowing us to refine and expand clusters, ending up with the work, content and manifestation clusters to feed into production.”11 each approach to clustering has its limits of feasibility, but the combined frbr and glimir teams have endeavored to synchronize changes to the algorithms and to share insights. some materials are easier to cluster with one approach, and some with the other.

clustering meets aggregates
in the initial implementation of glimir, the issue of handling collected works was considered out of scope for the project. with experience, the team realized there can be no effective automatic glimir clustering if collected works are not identified and handled in some way. why is this? suppose a record exists for a text volume containing work a. this matches a record containing work a but actually also containing work b. that record in turn matches a work containing b and also containing works c, d, and e. the effect is a snowballing of cluster members that serves no one. how could this happen? in a bibliographic database such as worldcat, items representing collected works can be catalogued in several ways. efforts to relax matching criteria in just the right degree to cluster records for the same work are difficult to devise and apply. the glimir and frbr teams consulted several times to discuss clustering strategies for works, content, and manifestation clusters. practical experience with glimir led to rounds of enhancements and distinctions to improve the software’s decisions.
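the snowballing effect described above is essentially transitive merging: any shared work pulls two clusters together, so a chain of overlapping collected works fuses into one giant cluster. a minimal sketch (hypothetical; records are reduced here to the set of works they contain):

```python
def snowball(records):
    # naive transitive clustering: merge each record into every existing
    # cluster it overlaps, chaining overlaps into one ever-larger cluster
    clusters = []
    for works in records:
        merged = set(works)
        keep = []
        for cluster in clusters:
            if cluster & merged:      # any shared work triggers a merge
                merged |= cluster
            else:
                keep.append(cluster)
        keep.append(merged)
        clusters = keep
    return clusters

# the example from the text: {a}, {a, b}, {b, c, d, e} fuse into one cluster
print(snowball([{"a"}, {"a", "b"}, {"b", "c", "d", "e"}]))
```

the record for work a alone ends up clustered with works it does not contain at all, which is exactly the behavior the team set out to prevent.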
while glimir clusters can be and have been undone and redone on more than one occasion, it took experience for the team to realize that the clues to a collected work must be recognized.

bible and beowulf
as with many initial production startups, the output of glimir processing was monitored. reports for changes in any clusters of more than fifty members were reviewed by quality control catalogers for suspicious combinations. and occasionally a library using a glimir- or frbr-organized display would report a strange cluster. this was the case with a huge malformed cluster of records for the bible. such a work set tends to be large and unmanageable by nature; there are a huge number of records for the bible in worldcat. however, it was noticed that the set had grown suddenly over the previous two months. user interface applications stalled when attempting to present a view organized by such a set. one day, a local institution reported that a record for beowulf had turned up in this same work set. this started the team on an investigation. after much searching and analysis of the members of this cluster, the index case was uncovered. in many cases bibliographic records are allowed to cluster based on a uniform title. what the team found connecting these disparate records was a totally unexpected use of the uniform title, field 240 subfield a, contents: “b.”. that’s right, “b.”. once the first case was located, it was not hard to figure out that there were numerous uniform “titles” with other single letters of the alphabet. so in this odd usage, bible and beowulf could come together if insufficient data were present in two records to discriminate by other comparisons, as could, potentially, other titles that started with “b.” seeing this unanticipated use of the uniform title field, the frbr and glimir algorithms were promptly modified to beware of it. the frbr and glimir clusters were then unclustered and redone.
this was a data issue, and unanticipated uses of fields in a record will crop up, if usually with less drama. further experience showed more. in the examination of another ill-formed cluster, a reviewer realized that one record had the uniform title stated as “iliad” but the item title was homer’s “odyssey.” of course these have the same author, and may easily have the same publisher. even the same translator (e.g., richmond lattimore) is not improbable for a work like this. this was a case of bad data, but it imploded two very large clusters.

music and identification of collected works
as music catalogers know, musical works are very frequently presented in items that are collections of works. the rules for creating bibliographic records for music, whether scores or recordings or other, are intricate. the challenges to software to distinguish minor differences in wording from critical differences seem to be endless. moreover, musical sound recordings are largely collected works due to the nature of publication. as noted by papakhian, personal author headings are repeated more often in sound recording collections than in the general body of materials.12 there are several factors that may contribute to such an observation. there are likely to be numerous recordings by the same performer of different works and numerous records of the same work by different performers. composers are also likely to be performers. the point is that for sound recordings an author statement and title may be less effective discriminators than for printed materials. vellucci13,14 and riley15 have written extensively on the problems of music in frbr models. the problem of distinguishing and relating whole/part relationships is particularly tricky. musical compositions often consist of units or segments that can be performed separately, so they are generally susceptible to extraction.
these extractive relationships are seen in cases where parts are removed from the whole to exist separately, or perhaps parts for a violin or other instrument are extracted from the full score. software must be informed with rules as to significant differences in description of varying parts and varying descriptions of instruments, and in this team’s experience that is particularly difficult. krummel has noted that the bibliographic control of sound recordings has a dimension beyond item and work, that is, performance.16 different performances of the same beethoven symphony need to be distinguished. cast and performer list evaluation and date checking are done by the software. however, the comparisons the software can make are susceptible to the fullness or scarcity of data provided in the bibliographic record. there is great variation observed in the number of cast members stated in a record. translator and adapter information can prove useful in the same sense of role discrimination for other types of materials. this is close scrutiny of a record. at the same time, consider that an opera can include the creative contributions of an author (plot), a librettist, and a musical composer. yet these all come together to provide one work, not a collected work. tillett has categorized seven types of bibliographic relationships among bibliographic entities, including the following:
1. equivalence, as exact copies or reproduction of a work. photocopies and microforms are examples.
2. derivative relationships, or a modification such as variations, editions, translations.
3. descriptive, as in criticism, evaluation, review of a work.
4. whole/part, such as the relation of a selection from an anthology.
5. accompanying, as in a supplement or concordance or augmentation to a work.
6. sequential, or chronological relationships.
7.
shared characteristic relationships, as in items not actually related that share a common author, director, performer, or other role.17
while it is highly desirable for a software system to notice category 1 to cluster different records for the same work, that same software could be confused by “clues” such as those in category 7. and the software needs to understand the significance of the other categories in deciding what to group and what to split. to handle these relations in bibliographic records, tillett discusses linking devices including, for instance, uniform titles. yet uniform titles are used for the categories of equivalence relationships, whole/part relationships, and derivative relationships. this becomes more and more complex for a machine to figure out. of course, uniform titles within bibliographic records are currently supposed to link to authority records via text string only. consideration should ideally be given to linking via identifiers, as has been suggested elsewhere.18

thematic indexes
review of scores and recordings glimir clusters showed a case where haydn’s symphonies a and b were brought together. these were outside the traditional canon of the 104 haydn symphonies and were referred to as “a” and “b” by the haydn scholar h. c. robbins landon. this misclustering highlighted the need for additional checks in the software. the original glimir software was not aware of thematic indexes as a tool for discrimination. thematic indexes are numbering systems for the works of a composer. the köchel mozart catalog, as in k. 626, is a familiar example. these designations are not unique to a given composer; that is, they are intended to be unique within a given composer’s output, but identical designators may coincidentally have been assigned to multiple composers.
while “b” series numbers may be applied to works of chambonnières, couperin, dvořák, pleyel, and others, the presence of more than one b number is suggestive of collected-work status. for more on the various numbering systems, see the interesting discussion by the music library association.19 however, the software cannot merely count likely identifiers in the usual place. this could lead to falsely flagging aggregates; one work by dvořák could have b. 193, which is incidentally equivalent to opus 105. clearly, any detection of multiple identifiers of this sort must be restricted to identifiers of the same series.

string quartet number 5, or maybe 6
cases of renumbering can cause problems in identifying collected works. an early suppressed or lost work, later discovered and added to the canon of the composer’s work, can cause renumbering of the later works. clustering software must be very attentive to discrete numbers in music, but can it be clever enough? the works of paul hindemith (1895–1963) offer an example. his first string quartet was written in 1915 but long suppressed. his publisher was generally schott. long after hindemith’s death, this first quartet was unearthed and then published by schott. the publisher then renumbered all the quartets, so quartets previously 1 through 6 became 2 through 7. the rediscovered work was then called “no. 1,” though sometimes called “no. 0” to keep the older numbering intact. further, the last two quartets did not even have opus numbers assigned and were both in the same key.20 this presents a challenge.

anything musical
another problem case emerged when reviewers noticed a cluster contained both of the unrelated songs “old black joe” and “when you and i were young maggie.” on investigation, the cluster held a number of unrelated pieces. here the use of alternate titles in a 246 field had led to overclustering, and the rules for use of 246 fields were tightened in frbr and glimir.
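the same-series restriction on thematic index numbers described above might look like the following sketch (the series patterns here are invented for illustration and are much cruder than real thematic-index parsing):

```python
import re

# hypothetical patterns for a few thematic-index series
SERIES_PATTERNS = {
    "b": re.compile(r"\bb\.\s*(\d+)", re.IGNORECASE),
    "k": re.compile(r"\bk\.\s*(\d+)", re.IGNORECASE),
    "opus": re.compile(r"\bop(?:us)?\.?\s*(\d+)", re.IGNORECASE),
}

def multiple_same_series(text):
    # flag as a likely collected work only when more than one DISTINCT
    # number appears within the SAME series, not merely more than one
    # identifier overall
    for pattern in SERIES_PATTERNS.values():
        if len(set(pattern.findall(text))) > 1:
            return True
    return False

# one work carrying equivalent designators from different series: not flagged
print(multiple_same_series("serenade, b. 193 (opus 105)"))   # False
# two distinct b numbers: flagged as a likely collected work
print(multiple_same_series("symphonies b. 9 and b. 12"))     # True
```

counting identifiers across series (rather than within one series) would wrongly flag the dvořák case, where b. 193 and opus 105 name the same single work.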
as in the other problem cases, cycles of testing were necessary to estimate sufficient yet not excessive restrictions. rules too strict split good clusters and defeat the purpose of frbr and glimir. at this point the glimir/frbr team recognized that rules changes were necessary but not sufficient. that is, a concerted effort to handle collected works was essential.

strategies for identifying collected works
the greatest problem, and most immediate need, was to stop the snowballing of clusters. clusters containing some member records that are collected works can suddenly mushroom out of control. rule 1 was that a record for a collected work must never be grouped with a record for a single work. if all the records in a group are collected works, that is closer to tolerable (more on that later). with time and experimentation, a set of checks was devised to allow collected works to be flagged. these clues were categorized as two types: (1) considered conclusive evidence, or (2) partial evidence. a type 2 clue needed another piece of evidence in the record. finding the best clues was a team effort. it was acknowledged that to prevent overclustering, overidentification of aggregates was preferable to failure to identify them. several cycles of tests were conducted and reviewed, assessing whether the software guessed right. table 1 illustrates the types of checks done for a given bibliographic record. here “$” is used as an abbreviation for subfield, and “ind” equals indicator.

area | field | rule | notes
uniform title | 240 | $a and no $m, $n, $p, or $r: a title in $a on a list of terms, without the other subfields listed, is a collected work. | this is a long list of terms such as “symphonies,” “plays,” “concertos,” and so on.
title | 245 | contains “selections”: is collected. |
title | 245 | 245 with multiple semicolons and doc type “rec”: is collected. |
title | 246 | if four or more 246 fields with ind2 = 2, 3, or 4: is collected. | if more than one 246, consider partial evidence.
extent | 300 | if 300 $a has “pagination multiple” or “multiple pagings”: is collected. |
contents notes | 505 $a and $t | 1. check $a for first and last occurrences of “movement”; if there are not multiple movement occurrences, check whether it has multiple “ / ” patterns. 2. if the above does not find multiple patterns, also look for “ ; ” patterns. 3. if the above checks do not produce more than one pattern, look for multiple “ – ” patterns. 4. count 505 $t cases. 5. count $r cases. | if all or any of the above produce more than one pattern instance, or more than one $t, or more than one $r: is collected.
various fields for thematic index clues | 505 $a | if any 505 $a, check for differing opuses (this also checks thematic index cases). if found: is collected. | for types score and recording.
related work | 740 | if one or more 740 fields and one has indicator 2 = 2: is collected. | if only multiple 740s, partial evidence.
author | 700/710/711/730 | check for $t and $n, and check 730 ind2 value of “2.” if a 730 with ind2 = 2 or multiple $t is found: is collected. | if only one $t, partial evidence.
 | 100/110/111, 700/710, 730 | if format is recording and both records are collected works, require a cast list match to cluster anything but manifestation matches. that is, do not cluster at the content level without verifying by cast. |
table 1. checks on bibliographic records.

frailties of collected works identification in well-cataloged records
the table above illustrates many areas in a bibliographic record that can be mined for evidence of aggregates. the problem is that cataloging practice offers no single mandatory rule for cataloging a collected work correctly. moreover, as worldcat membership grows, the use of multiple schemes of cataloging rules for different eras and geographic areas adds to the complexity, even assuming that all the bibliographic records are cataloged “correctly.” correct cataloging is not assumed by the team.
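the two-tier evidence scheme described in the strategies section (conclusive clues versus partial clues that need corroboration) can be expressed compactly. this sketch assumes the simplest reading, that one conclusive clue or any two pieces of partial evidence flag the record; the actual weighting used by the team is not spelled out in the text:

```python
def is_collected_work(clues):
    # clues: list of (kind, description) pairs gathered from checks like
    # those in table 1, where kind is "conclusive" or "partial"
    conclusive = sum(1 for kind, _ in clues if kind == "conclusive")
    partial = sum(1 for kind, _ in clues if kind == "partial")
    # one conclusive clue suffices; a partial clue needs another clue
    return conclusive >= 1 or partial >= 2

print(is_collected_work([("conclusive", '245 contains "selections"')]))  # True
print(is_collected_work([("partial", "more than one 246 field")]))       # False
print(is_collected_work([("partial", "more than one 246 field"),
                         ("partial", "multiple 740 fields")]))           # True
```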
software confounded
with all the checks outlined in the table, the team still found cases of collected works that seemed to defy machine detection. one record had two separate works, tom sawyer and huckleberry finn, in the same title field, with no other clues to the aggregate nature of the item. the work brustbild was another case. for this electronic resource set, brustbild appeared to be the collection set title, but the specific title for each picture was given in the publisher field. a cluster for the work gedichte von eduard morike (score) showed problems with the uniform title, which was for the larger work, while the cluster records each actually represented parts of the work. the bad cluster for si ku quan shu zhen ben bie ji, an electronic resource, contained records that each appeared to represent the entire collection of 400 volumes, but the link in each 856 field pointed only to one volume in the set.

limitations of the present approach
the current processing rules for collected works adopt a strategy of containment. the problem may be handled in the near term by avoiding the mixing of collected works with noncollected works, but the clusters containing collected works need further analysis to produce optimal results. for example, it is one thing to notice “arrangements” in scores as a clue to the presence of an aggregate. the requirement also exists that an arrangement should not cluster with the original score. the rules for clustering and distinguishing different sets of arrangements present another level of complexity. checks to compare and equate the instruments involved in an arrangement are quite difficult; in this team’s experience, they fail more often than they succeed. without initial explication of the rules for separating arrangements, reviewers quickly found clusters such as haydn’s schopfung, which included records for the full score, vocal score, and an arrangement for two flutes.
an implementation that expects one manifestation to have the identifier of only one work is a conceptual problem for aggregates. a simple case: if the description of a recording of bernstein’s mass has an obscurely placed note indicating the second side contains the work candide, mass is likely to be dominant in the clustering effect, with the second work effectively “hidden.” this manifestation would seem to need three work ids: one for the combination, one for mass, and one for candide. this does not easily translate to an implementation of the frbr model but could perhaps be achieved via links. several layers of links would seem necessary. a manifestation needs to link to its collected work. a collected work needs links to records for the individual works that it contains, and vice versa: individual works need to link to collective works. this can be important for translations, for example into russian, where collective works are common even where they do not exist in the original language.

lessons learned
first and foremost, plan to deal with collected works. for clustering efforts this must be addressed in some way for any large body of records. second, formats deserve focused attention. the initial implementation of the glimir algorithms used test sets mainly composed of a specific work. after all, glimir clusters should all be formed within one work. these sets were carefully selected to represent as many different types of work sets as possible, whether clear or difficult examples of work set members. plenty of attention was given to the compatibility of differing formats, given the looser content clustering. these were good tests of the software’s ability to cluster effectively and correctly within a set that contained numerous types of materials. random sets of records were also tested to cross-check for unexpected side effects.
in retrospect, the team would have expanded the sets focused on specific formats. recordings, scrutinized as a group, can show different problems than scores or books. the distinctions to be made are probably not complete. another lesson learned in glimir concerned the risks of clustering. the deliberate effort to relax the very conservative nature of the matching algorithms used in glimir was critical to success in clustering anything. singleton clusters don’t improve anyone’s view. in the efforts to decide what should and should not be clustered, it was initially hard to discern the larger-scale risks of overclustering. risks from sparse records were probably handled fairly well in this initial effort, but risks from complex records needed more work. collected works are only one illustration of the risks of overclustering.

future research
the current research suggests a number of areas for possible further exploration:
• the option for human intervention to rearrange clusters not easily clustered automatically would seem to be a valuable enhancement.
• there is next the general question: what sort of processing is needed, and feasible, to distinguish the members of clusters flagged as collected works?
• part-versus-whole relationships can be difficult to distinguish from the information in bibliographic records. further investigation of these descriptions is needed.
• arrangements of works in music are so complex as to suggest an entire study by themselves. work in this area is in progress, but it needs rules investigation.
• other derivative relationships among works: do these need consideration in a clustering effort? can and should they be brought together while avoiding overclustering of aggregates?
• how much clustering of collected works may actually be helpful to persons or processes searching the database? how can clusters express relationships to other clusters?
conclusion
clustering bibliographic records in a database as large as worldcat takes careful design and undaunted execution. the navigational balance between underclustering and overclustering is never easy to maintain, and course corrections will continue to challenge the navigators.

acknowledgments
this paper would have been a lesser thing without the patient readings by rich greene, janifer gatenby, and jay weitz, as well as their professional insights and help in clarifying cataloging points. special thanks to jay weitz for explicating many complex cases in music cataloging and music history.

references
1. barbara tillett, “what is frbr? a conceptual model for the bibliographic universe,” last modified 2004, accessed november 22, 2013, http://www.loc.gov/cds/frbr.html.
2. janifer gatenby, email message to the author, november 10, 2013.
3. international federation of library associations (ifla) working group on aggregates, final report of the working group on aggregates, september 12, 2011, http://www.ifla.org/files/assets/cataloguing/frbrrg/aggregatesfinalreport.pdf.
4. maja zumer and edward t. o’neill, “modeling aggregates in frbr,” cataloging and classification quarterly 50, no. 5–7 (2012): 456–72.
5. ifla working group on aggregates, final report.
6. zumer and o’neill, “modeling aggregates in frbr.”
7. gail thornburg and w. michael oskins, “misinformation and bias in metadata processing: matching in large databases,” information technology & libraries 26, no. 2 (2007): 15–22.
8. gail thornburg and w. michael oskins, “matching music: clustering versus distinguishing records in a large database,” oclc systems and services 28, no. 1 (2012): 32–42.
9. janifer gatenby et al., “glimir: manifestation and content clustering within worldcat,” code{4}lib journal 17 (june 2012), http://journal.code4lib.org/articles/6812.
10. richard o. greene, “cataloging alchemy: making your data work harder” (slideshow presented at the american library association annual meeting, washington, dc, june 26–29, 2010), http://vidego.multicastmedia.com/player.php?p=ntst323q.
11. jenny toves, email message to the author, december 17, 2013.
12. arsen r. papakhian, “the frequency of personal name headings in the indiana university music library card catalogs,” library resources & technical services 29 (1985): 273–85.
13. sherry l. vellucci, bibliographic relationships in music catalogs (lanham, md: scarecrow, 1997).
14. sherry l. vellucci, “frbr and music,” in understanding frbr: what it is and how it will affect our retrieval tools, ed. arlene g. taylor (westport, ct: libraries unlimited, 2007), 131–51.
15. jenn riley, “application of the functional requirements for bibliographic records (frbr) to music,” www.dlib.indiana.edu/~jenlrile/presentations/ismir2008/riley.pdf.
16. donald w. krummel, “musical functions and bibliographic forms,” the library, 5th ser. 31 (1976): 327–50.
17. barbara tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (phd diss., graduate school of library & information science, university of california, los angeles, 1987), 22–83.
18. program for cooperative cataloging (pcc) task group on the creation and function of name authorities in a non marc environment, “report on the pcc task group on the creation and function of name authorities in a non marc environment,” last modified 2013, http://www.loc.gov/aba/pcc/rda/rda%20task%20groups%20and%20charges/reportpcctgonnameauthina_nonmarc_environ_finalreport.pdf.
19. music library association, authorities subcommittee of the bibliographic control committee, “thematic indexes used in the library of congress/naco authority file,” http://bcc.musiclibraryassoc.org/bcc-historical/bcc2011/thematic_indexes.htm.
20. jay weitz, email message to the author, may 6, 2013.
book reviews
networks and disciplines: proceedings of the educom fall conference, october 11-13, 1972, ann arbor, michigan. princeton: educom, 1973. 209p. $6.00.
as with so many conferences, the principal beneficiaries of this one are those who attended the sessions, and not those who will read the proceedings. except for a few prepared papers, the text is a somewhat edited version of verbatim, ad lib summaries of a number of workshop sessions and two panels that purport to summarize common themes and consensus. since few people are profound in ad lib commentaries, the result is shallow and repetitive. the forest of themes is completely lost among a bewildering array of trees. the conference was, i am sure, exciting and thought-provoking for the participants. it was simply organized, starting with statements of networking activities in a number of disciplines, i.e., chemistry, language studies, economics, libraries, museums, and social research. the paper on economics is by far the best-organized presentation of the problems and potential of computers in any of the fields considered, and perhaps the best short presentation yet published for economics. the paper on libraries was short, that on chemistry lacking in analytical quality, that on language provocative, that on social research highly personal, and that on museums a neat mixture of reporting and interpreting.
Much of the information is conditional; that is, it described what might or could be in the realm of the application of computers to the various subjects. The speakers all directed their papers to the concept of networks, interpreted chiefly as widespread remote access to computational facilities. The papers are followed by very brief transcripts of the summaries of workshops in which the application of computers to each of the disciplines was presumably discussed in detail. Much of each summary is indicative and not really informative about the discussions. The concluding text again is the transcript of two final panels on themes and relationships among computer centers. The only description for this portion of the text is turgid.

In the midst of all this is the banquet paper presented by Ed Parker, who as usual was thoughtful and insightful, and several presentations by National Science Foundation officials that must have been useful at the time to guide those relying on federal funding for computer networks in developing proposals. I can't think of another reference that touches on the potential of computers in so many different disciplines, but it is apparent from the breadth of ideas and the range of suggested or tested applications that a coherent and analytical review should be done. This volume isn't it.

Russell Shank
Smithsonian Institution

The Analysis of Information Systems, by Charles T. Meadow. Second edition. Los Angeles: Melville Publishing Co., 1973. A Wiley-Becker & Hayes Series book.

This is a revised edition of a book first published in 1967. The earlier edition was written from the viewpoint of the programmer interested in the application of computers to information retrieval and related problems. The second edition claims to be "more of a textbook for information science graduate students and users" (although it is not clear who these "users" are).
Elsewhere the author indicates that his emphasis is on the "software technology of information systems" and that the book is intended "to bridge the communications gap among information users, librarians and data processors." The book is divided into four parts: language and communication (dealing largely with indexing techniques and the properties of index languages); retrieval of information (including retrieval strategies and the evaluation of system performance); the organization of information (organization of records, of files, of file sets); and computer processing of information (basic file processes, data access systems, interactive information retrieval, programming languages, generalized data management systems).

The second two sections are, I feel, much better than the first. These are the areas in which the author has had the most direct experience, and the topics covered, at least in their information retrieval applications, are not discussed particularly well or particularly fully elsewhere. It is these sections of the book that make it of most value to the student of information science.

I am less happy about Meadow's discussion of indexing and index languages, which I find unclear, incomplete, and inaccurate in places. The distinction drawn between pre-coordinate and post-coordinate systems is inaccurate; Meadow tends to refer to such systems simply as keyword systems, although it is perfectly possible to have a post-coordinate system based on, say, class numbers, which can hardly be considered keywords, while it is also possible to have keyword systems that are essentially pre-coordinate. In fact, Meadow relates the characteristic of being post-coordinate to the number of terms an indexer may use ("... permit their users to select several descriptors for an index, as many as are needed to describe a particular document"), but this is not an accurate distinction between the two types of system.
The real difference is related to how the terms are used (not how many are used), including how they are used at the time of searching. The references to faceted classification are also confusing, and a number of statements made throughout the discussion on index languages are completely untrue. For example, Meadow states (p. 51) that "a hierarchical classification language has no syntax to combine descriptors into terms." This is not at all accurate, since several hierarchical classification schemes, including UDC, do have synthetic elements which allow combination of descriptors, and some of these are highly synthetic. In fact, Meadow himself gives an example (p. 38-39) of this synthetic feature in the UDC. It is also perhaps unfortunate that the student could read all through Meadow's discussion of index languages without getting any clear idea of the structure of a thesaurus for information retrieval and how this thesaurus is applied in practice. Moreover, Meadow used Medical Subject Headings as his example of a thesaurus (p. 33-34), although this is not at all a conventional thesaurus and does not follow the usual thesaurus structure.

My other criticism is that the book is too selective in its discussion of various aspects of information retrieval. For example, the discussion of automatic indexing is by no means a complete review of techniques that have been used in this field. Likewise, the discussion of interactive systems is very limited, because it is based solely on NASA's system, RECON. The student who relied only on Meadow's coverage of these topics would get a very incomplete and one-sided view of what exists and what has been done in the way of research. In short, I would recommend this book for those sections (p. 183-412) that deal with the organization of records and files and with related programming considerations.
The author has handled these topics well and perhaps more completely, in the information retrieval context, than anyone else. Indexing and index languages, on the other hand, are subjects that have been covered more completely, clearly, and accurately by various other writers. I would not recommend the discussion on index languages to a student unless it is read in conjunction with other texts.

F. W. Lancaster
University of Illinois

Application of Computer Technology to Library Processes, a Syllabus, by Joseph Becker and Josephine S. Pulsifer. Metuchen, N.J.: Scarecrow Press, 1973. 173p. $5.00.

Despite the large number of institutions offering courses related to library automation, including just about every library school in North America, accredited or not, there is a remarkable shortage of published material to assist in this instruction. With the publication of this small volume a light has been kindled; let us hope it will be only the first of many, for larger numbers of better-educated librarians must surely result in higher standards in the field.

Journal of Library Automation Vol. 7/2, June 1974

This syllabus covers eight topics related to the use of computers in libraries, titled as follows: bridging the gap (librarians and automation); computer technology; systems analysis and implementation; MARC program; library clerical processes (which encompasses acquisitions, cataloging, serials, circulation, and management information); reference services; related technologies; and library networks. Each topic is treated as a unit of instruction, and each receives the identical treatment, as follows. The units each start with an introductory paragraph, explaining what the field encompasses and indicating the purpose of teaching that topic. The purpose of systems analysis, for example, is "to develop the sequence of steps essential to the introduction of automated systems into the library."
A series of behavioral objectives is then listed, to show what the student will be able to do (after he has learned the material) that he presumably was unable to do before. For example, there are seven behavioral objectives in the unit on computer technology, of which the first four are: "1) the student will be able to discuss the two-fold requirement to represent data by codes and data structures for purposes of machine manipulation, 2) the student will be able to identify the basic components of computer systems and describe their purposes, 3) the student will be able to differentiate hardware and software and describe briefly the part that programming plays in the overall computer processing operation, 4) the student will be able to define the various modes of computer operation and indicate the utility of each in library operations." The remaining three objectives refer to the student's ability to enumerate and compare types of input, output, and storage devices.

Then an outline of the instructional material is presented, followed by the detailed and well-organized material for instruction. In no case can the material presented here be considered all that an instructor would need to know about the field, but a surprising amount of specific detail is included, along with a carefully organized framework within which to place other knowledge. The end result is to present to the instructor a series of outlines that would encompass much of the material included in a basic introductory course in library automation. Every instructor would, presumably, want to add other topics of his own in addition to adding other material to the topics treated in this volume, but he has here an extremely helpful guide to a basic course, and the only work of its kind to be published to date.

Peter Simmons
School of Librarianship
University of British Columbia

The LARC Reports, Vol. 6, Issue 1.
Online Cataloging and Circulation at Western Kentucky University: An Approach to Automated Instructional Resources Management. 1973. 78p.

This is a detailed account of the design, development, and implementation of the online cataloging and circulation systems that have been in operation at Western Kentucky University for several years. The library's reasons for using computers are similar to those of many college and university libraries that experienced rapid growth during the 1960s. The faculty of the Division of Library Services first prepared a detailed proposal, with appropriate feasibility studies and cost analyses, to reclassify the collection from Dewey Decimal to Library of Congress classification. The proposal was approved by the administration of the university, and the decision was made to utilize campus computer facilities via online input techniques for reclassification, cataloging, and circulation. "Project Reclass" was accomplished during 1970-71 using IBM 2741 ATS/360 terminals. A circulation file was subsequently generated from the master record file.

The main library is housed in a new building and has excellent computer facilities within the library that are connected to the university computer center. Cataloging information is input directly into the system via ATS terminals; IBM 2260 visual display terminals are used for inquiry into the status of books and patrons; and IBM 1031/1033 data collection terminals are used to charge out and check in books. Catalog cards and book catalogs in upper/lower case are produced in batch mode on a regular schedule. The online circulation book record file is used in conjunction with the online student master record and payroll master record files for preparation of overdue and fine notices. Apparently the communication between library staff and computer personnel has been well above average, and the cooperation of the administration and other interested parties has been outstanding.
The attention given to planning, scheduling, training, and implementation is impressive. What has been accomplished to date is considered very successful, and plans are underway to develop online acquisitions ordering and receiving procedures. The report has some annoying shortcomings, such as referring to the Library of Congress as "National Library"; frequent use of the word "Xeroxing," which the Xerox Corporation is attempting to correct; "inputing" for "inputting"; and several other misspelled words. Some parts are poorly organized and unclear, but the report does provide many useful details for those considering a similar undertaking.

LaVahn Overmyer
School of Library Science
Case Western Reserve University

Letter from the Editors (March 2023)
Kenneth J. Varnum and Marisha C. Kelly
Information Technology and Libraries | March 2023
https://doi.org/10.6017/ital.v42i1.16319

Welcome to the March 2023 issue. Despite the date, snow still covers the ground where the editor lives, and winter still appears to be holding on tightly to both coasts. We're pleased to share with you the first issue of the calendar year and a collection of five peer-reviewed articles, as well as some news and updates (below). We also have a column in our Public Libraries Leading the Way series, "Virtual Production at Cloud901 in the Memphis Central Library" by Alan Ji and David Mason, about how that library has adapted cutting-edge production techniques used in streaming TV shows such as The Mandalorian to create virtual scenery in their teen-focused makerspace.
Peer-reviewed articles in the current issue are listed here:

• The Current State and Challenges in Democratizing Small Museums' Collections Online / Avgoustinos Avgousti and Georgios Papaioannou
• Services to Mobile Users: The Best Practice from the Top Visited Public Libraries in the US / Yan Quan Liu and Sarah Lewis
• Decision-Making in the Selection, Procurement, and Implementation of Alma/Primo: The Customer Perspective / Jin Xiu Guo and Gordon Xu
• Exploring Final Project Trends Utilizing Nuclear Knowledge Taxonomy: An Approach Using Text Mining / Faizhal Arif Santosa
• Japanese Military "Comfort Women" Knowledge Graph: Linking Fragmented Digital Records / Haram Park and Haklae Kim

Call for New Editorial Board Members Coming in April

The ITAL editorial board, a Core committee, will be issuing a call for volunteers in April. For those selected, two-year terms of service will start on July 1. Editorial board members have a critical role in building the foundation for the journal's future through setting policy and content guidelines. Members of the board have several key responsibilities:

• shaping the direction and strategy for the journal;
• participating in online editorial board meetings;
• soliciting contributions to the journal (based on personal networking, conference attendance, etc.); and
• optionally reviewing articles submitted to the journal, for those who want to be involved at an even deeper level (see the peer reviewer job description).

If you are interested in furthering the scholarly record for library technology and have a background in information technology in libraries, archives, or museums, this is an exciting opportunity to contribute to the profession and engage with colleagues across all types of organizations in examining the role of technology in libraries. Because we want the editorial board to reflect the broad diversity of Core's membership, we especially encourage individuals from underrepresented groups and identities to apply.
ITAL Will Move to a New Host This Summer

Over the past year, the editors of the three Core journals, ITAL, Library Leadership & Management (LL&M), and Library Resources and Technical Services (LRTS), have been working with Core and the Core Board to consolidate our journals on a single publishing platform. We're pleased to say that LL&M and ITAL will move this summer to ALA's Open Journal Systems platform, where LRTS is already published. We'll have more details to share in our June issue, before the move, but want to let you know some important details:

• ITAL's URLs will change, but DOIs will continue to resolve to the new home of the journal. We will work with our current host, Boston College, to set up redirects to the new location.
• ALA uses the same publishing platform as Boston College, Open Journal Systems, so for authors and reviewers the experience will remain the same.
• Articles published in ITAL (and our two sibling journals) will continue to be open access, with no fees charged to authors or readers. Authors maintain copyright in their work.

We are very grateful to Boston College for their support of Information Technology and Libraries over the past decade, and to the Core Board for supporting this project.

Be a Part of a Future Issue

As the U.S. academic year hurdles to a close this spring, it's a great time to think about the work you've accomplished and what you might share with your library colleagues near and far. Our call for submissions (https://ejournals.bc.edu/index.php/ital/call-for-submissions) outlines the topics of interest to the journal (basically, if the submission discusses the intersection of libraries/archives/museums and technology, it's potentially in scope) and the process for submitting an article. We'd love to consider your article for publication. Or, if you have an idea you'd like to discuss with ITAL's editors, contact either of us at the email addresses below.

Kenneth J. Varnum, Editor, varnum@umich.edu
Marisha C. Kelly, Assistant Editor, marisha.librarian@gmail.com

Content Management for the Virtual Library
Ed Salazar
Information Technology and Libraries | September 2006

Traditional, larger libraries can rely on their physical collection, coffee shops, and study rooms as ways to entice patrons into their library. Yet virtual libraries have merely their online presence to attract students to resources. This can only be achieved by providing a fully functional site that is well designed and organized, allowing patrons to navigate and locate information easily. One such technology significantly improving the overall usefulness of web sites is a content management system (CMS). Although the CMS is not a novel technology per se, it is a technology smaller libraries cannot afford to ignore. In the fall of 2004, the Northcentral University Electronic Learning Resources Center (ELRC), a small, virtual library, moved from a static to a database-driven web site. This article explains the importance of a CMS for the virtual or smaller library and describes the methodology used by the ELRC to complete the project.
State of the Virtual Library

The Northcentral University Electronic Learning Resource Center (ELRC), a virtual library, moved from a static to a database-driven web site in 2004.1 Before this, the site consisted of 450 static pages and continued to multiply due to the creation and expansion of Northcentral University (NCU) programs. To provide the type of service demanded by our internet-savvy patrons, the ELRC felt it needed to evolve to the next stage of web management and design. NCU, with a current enrollment of roughly twenty-one hundred full-time students, is one of many for-profit virtual universities (including the University of Phoenix, Capella, and Walden, among others) seeking to carve a niche in the education market by offering professional degrees entirely online.2

In the past few years, distance education has experienced exponential growth, causing virtual universities to flourish but forcing on their libraries the challenge of keeping pace.3 Typically, virtual libraries are staffed by one or two librarians who are responsible for all facets of the library, including interlibrary loan, virtual reference, library instruction, and web site management, among other library duties.4 Web site management, as expected, becomes cumbersome when a site exceeds two hundred or more static pages and a clear and structured system is not in place to maintain a proliferating number of web pages. Because virtual, for-profit libraries do not rely on public funding and taxes, they tend not to be as concerned about autonomy as public or state libraries, which must find ways to stay within budget and curtail expenses. On the same note, some academic libraries prefer to maintain a local area network (LAN), while other libraries may not have the staff, resources, or need for such a system. Thus, for some virtual libraries, such as the ELRC, the incorporation of technology takes on a more dependent role.
That is, where some libraries are encouraged to explore open source applications and create homegrown tools, the virtual, smaller-staffed library finds itself more or less reliant on its university's information technology (IT) department.5 Virtual libraries address the needs of distance education students, who demand a level of service and instruction equivalent to, if not surpassing, what they would expect to find at physical libraries.6 Meeting these needs requires a great deal of creativity, ingenuity, and a strong technical background. Recent trends in developing technologies such as MyLibrary, learning objects, blogs, virtual chat, and federated searching have broadened the scope of possibilities for the smaller-staffed, virtual library. In particular, a content management system (CMS) utilizes a combination of tools that provide numerous advantages, as outlined below:

1. the creation of templates that maintain a consistent design throughout the site
2. the convenience of adding, updating, and deleting information from a single, online location
3. the creation and maintenance of interactive pages or learning objects
4. the implementation of a simple editing interface that eliminates the need for library staff to know Extensible Hypertext Markup Language/Hypertext Markup Language (XHTML/HTML)

Simply defined, a CMS is comprised of a database; server pages such as Active Server Pages (ASP), Personal Home Page (PHP), or ColdFusion; a web server, for example, Internet Information Server (IIS), Personal Web Server (PWS), or Apache; and an editing tool to manage web content.7 These resources vary in price, but for a virtual library integrated into a larger university, it is ideal to implement applications and software supported by the university. For the autonomous academic library, this may differ.
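The database-plus-server-pages half of that definition can be sketched in a few lines. This is a minimal, hypothetical stand-in, using Python and SQLite rather than the SQL Server/ASP stack the ELRC actually used, and an invented one-table schema; it shows the core CMS idea that a single template shell pulls each page's unique content from the database.

```python
import sqlite3
from string import Template

# Hypothetical "pages" table standing in for the CMS database.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE pages (
    id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT, body TEXT)""")
db.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", [
    (1, None, "ELRC Home", "Welcome to the library."),
    (2, 1, "Course Guides", "Guides by program."),
])

# One template shell: design lives here, content lives in the database.
SHELL = Template("<html><head><title>$title</title></head>"
                 "<body><h1>$title</h1><p>$body</p></body></html>")

def render(page_id):
    # Fetch the unique content for the requested page, then fill the shell.
    title, body = db.execute(
        "SELECT title, body FROM pages WHERE id = ?", (page_id,)).fetchone()
    return SHELL.substitute(title=title, body=body)

print(render(2))  # one template serves any number of pages
```

A single change to `SHELL` restyles every page at once, which is the property the article credits with making one template serve hundreds of consistently designed pages.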
There are advantages and disadvantages to using proprietary and nonproprietary software, and it is left to the library, virtual or physical, to determine the type of resources needed to meet the goals and mission of the university.8 Although the scope of this article focuses on the creation of tools for a homegrown CMS, some libraries may wish to explore commercial CMS packages that include additional services such as technical support. These CMS packages will vary in price and services depending on the vendor and the needs of the library.9

Ed Salazar (esalazar@ncu.edu) is Reference/Web Librarian at Northcentral University.

ELRC Transformed

In fall 2004, a group that consisted of two librarians, the education chair, and a programmer convened to discuss the redesign of the ELRC web site, which had become increasingly difficult to manage. Specifically, the amount of duplicated content, inconsistent design and layout, and unstructured architecture of the site posed severe navigational and organizational problems. The group selected and compared other academic library sites to determine a desired design and theme for the new ELRC site. Discussions also involved the addition of features such as a site search and breadcrumbs, which the group felt were essential. As a result, the creation of a homegrown CMS using proprietary software became the route of choice to meet the increasing demands of patrons and the need to expand the site. Because NCU utilizes Microsoft (MS) information system products, it was agreed that MS or MS-compatible applications would be used to create the CMS, which consisted of SQL Server, IIS, ASP, Visual Basic Script (VBScript), JSpell iFrame, and MS Visual InterDev.
MS Visual InterDev and JSpell iFrame supplanted our previous web editor, MS FrontPage, which seemed to generate superfluous code and thus made it difficult to debug or alter the design and layout of pages. Also, using JSpell iFrame eliminated the need for future NCU librarians to possess expertise in XHTML/HTML. With these pieces in place, the arduous task of culling content from static pages and entering it into a database was begun.

The Database

The SQL Server database helped in organizing and structuring content, and allowed for the creation of templates and administration (admin) pages.10 In addition, the database played an integral part in creating the search, breadcrumb, and site map features the group so desperately wanted. A significant amount of time was spent weeding the site for information that had become obsolete or irrelevant to the ELRC. It should be noted that the group originally attempted to use Access for the database but stumbled across several problems, one being the inability to maintain a stable and reliable connection to the database.

The Templates

With the database nearly complete, the programmer began creating ASP templates in MS Visual InterDev. These templates basically serve as the shell of the web page, preserving the design and layout elements of the page while extracting unique content based on a user's request. In essence, a single template can produce hundreds of pages consistent in design. Likewise, a single change to the template can alter the entire design of the site. For the ELRC, seven templates were created for more than 450 pages. Figure 1 shows the ELRC course guides template. Figure 2 shows the public view of the ELRC course guide template. Changes to the templates are made using MS Visual InterDev, which offers a user-friendly environment for managing web pages.
MS Visual InterDev also includes helpful features, such as highlighting code errors for easy debugging and the ability to access, create, and maintain stable connections to databases.11 In addition, the MS Visual InterDev editor recognizes commonly used ASP commands, allowing the user to save time by using keyboard shortcuts when programming. Besides the templates, ASP server-side include files and cascading style sheets (CSS) were incorporated, allowing for the easy modification of code in a single file instead of on each and every page or template. This is particularly time-efficient when having to add or change database connections or design elements. The ELRC also took extra precaution to ensure that style elements met the accessibility requirements and standards set forth by the World Wide Web Consortium (W3C), and tested the site in other browsers, such as Firefox and Netscape.12

As the site continues to grow and expand, so may the need for additional templates. Creation or replication of templates is simple, requiring a basic understanding of programming and the reassigning of new variables in the code to match added or modified tables. There is some speculation that in the near future the site may migrate to the ASP.NET environment for added functionality and security. If and when that time comes, the ELRC will be ready. At present, NCU is not considering the use of open source code or applications (the exception being the Apache web server); this is primarily due to the available technical support, security, and intuitiveness of use associated with commercial software. In addition, the NCU information system was built using commercial software, and a complete transition to open source is not, at the moment, possible or desirable.

With the templates complete, the ELRC began running a prototype of the new site, making it accessible to students and faculty from a link on the old site. A survey was created that allowed users to comment on the new site.
One detail of importance to note is that the survey duplicated a prior survey done on the old site in 2003, in order to provide the ELRC with comparative data.

The Admin Pages

The next phase of the project required the creation of admin pages, which would allow content to be quickly added, updated, and deleted on the site. These pages, like the templates, were created in MS Visual InterDev; display content is housed within the database on the web, thus allowing it to be changed on the fly. Figure 3 shows all of the web pages for the ELRC within a table. What is particularly convenient about the admin edit pages is the incorporation of the JSpell iFrame editor, which serves as the front-end editor for the site. The reason for using JSpell iFrame, as stated earlier, is its ease of use: the simple toolbar provides the basic, essential tools necessary for creating content without the daunting number of buttons and menu selections other editors tend to have. Also, JSpell iFrame is reasonably priced and does not entail a complex installation or require any space on local hard drives; instead, the program is maintained on the server. Consequently, all that is required is the insertion of the JSpell iFrame JavaScript code into the web pages.

In addition to JSpell iFrame, fields within admin edit pages are or can be pre-populated with content from the database. For instance, the title or display order of links can be easily edited or changed. Longer text fields comprised of paragraphs are created or modified using JSpell iFrame. Deleting a page is simple, requiring only the click of a delete button in the bottom right-hand corner. Figure 4 shows JSpell iFrame embedded within an admin edit page. The admin add page is straightforward: information is entered into the fields appearing on a form page, and the proper page type designation is selected from a drop-down menu.
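The admin pages draw page titles, ordering, and hierarchy from the database, and the breadcrumb feature mentioned earlier can be derived from those same parent-child rows. The article does not give the ELRC's schema or code, so the following is a minimal sketch under an invented parent-pointer layout: each page stores its parent's id, and the breadcrumb trail is built by walking that chain to the root.

```python
# Hypothetical page records keyed by id; "parent" of None marks the root.
pages = {
    1: {"parent": None, "title": "ELRC Home"},
    2: {"parent": 1, "title": "Course Guides"},
    3: {"parent": 2, "title": "PSY 101"},  # invented example page
}

def breadcrumb(page_id):
    """Walk parent pointers to the root, then join titles root-first."""
    trail = []
    while page_id is not None:
        page = pages[page_id]
        trail.append(page["title"])
        page_id = page["parent"]
    return " > ".join(reversed(trail))

print(breadcrumb(3))  # ELRC Home > Course Guides > PSY 101
```

The same parent-child rows that power this trail are what the admin pages' sorting feature exposes, which is why the article calls that structure "useful and necessary" when adding pages to the site.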
Yet, more importantly, the admin add and admin edit pages can filter information to specific users for security purposes and library needs. Figure 5 shows an admin add page. Figure 6 shows an admin edit page.

The admin pages were designed with flexibility in mind. Main column headings may be sorted, as seen in figure 3, allowing one to locate a particular page. The sorting feature also displays the inner structure of the database, which, in turn, identifies parent-child relationships between pages in the ELRC; this is useful and necessary when adding pages to the ELRC site. Due to the careful thought used in creating the admin pages, they have proven to be extremely effective and useful in maintaining a library web site. Each and every change to the site can be made on the web, allowing content to be edited remotely and eliminating the need for installing and maintaining expensive editing software on local and remote machines.

Usability Testing

With the site completed, the ELRC felt it important to perform usability tests. But how does a virtual library conduct usability testing when all of its students are distance education students? This is a difficult question that requires some ingenuity to answer. In order to solve this problem, staff members were propositioned (begged) to volunteer for the study; the total staff acquired was five. Also, a local college class of about ten students was persuaded to participate in the study. Granted, the total number of subjects is not representative of the NCU student body; however, substantial changes to the site were made from the data gathered. More usability testing is expected in the immediate future.

Figure 1. ELRC course guide template
Figure 2. Public view of the ELRC course guide template

The Findings

Usability testing complete, the site was launched.
during this period, a few minor hang-ups were experienced, including broken links, form page errors, and stray design elements, but these were only minor problems that were quickly fixed. feedback from the elrc survey showed that nearly all of the students and faculty, roughly fifty respondents, approved of the changes by commenting that the site had improved in layout and organization of content as well as navigation. also, responses and comments from usability testing participants were equally positive and encouraging. figure 7 shows the new ncu learners elrc home page. although it is difficult to establish a direct connection between the elrc site and usage, recent statistics appear promising. since the inception of the new site in december 2004, the number of visits to the elrc learners home page has jumped 10 percent. this number is expected to rise as ncu continues to grow and students become more acquainted and familiar with the site. the project took nearly six months to complete and required the expertise of a programmer. although programming may be outside the requisites of a distance librarian, managing the site is not. a general understanding of control statements and sql is all that is needed. for the distance librarian who spends almost all of his or her time online, these skills can be acquired on the job or by taking introductory programming courses at a local college. in the hope that the site will continue to expand in concert with the growing body of ncu students, the elrc recently added a writing center and blog. with the entire site now being database driven, adding, updating, and deleting content is done effortlessly. ideally, students and faculty will play a greater role in the development of the elrc site as a result of the changes. involving patrons with the site can play an integral, beneficial role in their academic pursuits. figure 3. web pages for elrc within a table. figure 4.
jspell iframe editor embedded within an admin edit page. conclusion. the elrc at ncu encourages other virtual or smaller libraries to explore their resources for improving their library web sites, which involves understanding campus resources and personnel. with the ever-burgeoning growth of technological resources, every library—small or large, virtual or physical, public or private—can empower itself to meet the needs of internet-savvy students. it is only a matter of being aware of the resources and putting them to good use. references and notes. 1. the ncu elrc web site comprises three separate sites: the public site www.ncu.edu/elrc (accessed dec. 2, 2004), the mentors site http://mentors.ncu.edu/elrc (accessed dec. 2, 2004), and the learners site http://learners.ncu.edu/elrc (accessed dec. 2, 2004). although similar in design, each site is tailored to meet the needs of each individual group as well as protect ncu’s resources, services, and information. access to subscription resources and personal information is available upon authentication of the user to the site. 2. for a detailed overview of virtual libraries, see valerie a. akuna, “virtual universities: the new higher education paradigm,” estrella mountain college, http://students.estrellamountain.edu/drakuna/virtualuniversities.htm (accessed feb. 15, 2005). 3. u.s. department of education, national center for education statistics, “the condition of education 2004,” distance education at postsecondary institutions, http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2004077 (accessed feb. 8, 2005). 4. for more information on the role of the virtual librarian in a virtual university, see jan zastrow, “going the distance: academic librarians in the virtual university,” university of hawaii–kapiolani community college, http://library.kcc.hawaii.edu/~illdoc/de/depaper.htm (accessed jan. 29, 2005). 5.
for an overview on developing an open source cms, please see mark dahl, “content management strategy for a college library web site,” information technology and libraries 23, no. 1 (2004). 6. for a detailed discussion on distance education and virtual libraries, see smiti gandhi, “academic librarians and distance education: challenges and opportunities,” reference & user services quarterly 43, no. 2 (2003). 7. for detailed information on using asp pages for managing databases, see xiaodong li and john paul fullerton, “create, edit, and manage web database content using active server pages,” library hi tech 20, no. 3 (2002); see also, bryan h. davidson, “database driven, dynamic content delivery: providing and managing access to online resources using microsoft access and active server pages,” oclc systems and services 17, no. 1 (2001). figure 5. admin add page. figure 6. admin edit page. 8. for advantages and disadvantages of open source and proprietary software, see john caroll, “open source versus proprietary: both have advantages,” special to cnet asia, http://asia.cnet.com/builder/program/work/0,39009380,39181451,00.htm (accessed feb. 4, 2004); see also, stephen shankland, “study: open-source database going mainstream,” cnet, http://ecoustics-cnet.com.com/study+open-source+databases+going+mainstream/2100-7344_3-5171543.html (accessed feb. 4, 2004). 9. for information on commercial content management vendors and prices, see cms watch, www.cmswatch.com/cms/vendors (accessed feb. 15, 2005). “sql server 2000 product overview,” microsoft windows server system, www.microsoft.com/sql/evaluation/overview/default.asp (accessed feb. 15, 2005). 10. for a review on visual interdev, see maggie biggs, “visual studio 6.0 demonstrates improved integration,” infoworld 20, no. 35 (1998), www.infoworld.com/cgi-bin/displaytc.p1?/reviews/980831vstudio6.htm (accessed feb. 4, 2004). 11.
“checklist of checkpoints for web content accessibility guidelines 1.0,” w3c, www.w3.org/tr/wai-webcontent/full-checklist.html (accessed feb. 1, 2005). 12. jspell iframe 2004, www.jspell.com/iframe-spell-checker.html (accessed dec. 2, 2004). figure 7. elrc learners home page. who rules the rules? “why can’t the english teach their children how to speak?” wondered henry higgins, implying that a lack of widely and consistently followed rules of usage created linguistic backwardness and anarchy. higgins’ question might be rephrased today as: “when will the code teach its founders how to catalog?” the library of congress has historically fitted catalog codes to its own practices rather than following them slavishly. the best example is the lamentable policy of superimposition: continued use of preestablished forms of names that are not in compliance with the paris principles or aacr1. this was a cause of widespread confusion and complaint, and the practice was eventually discontinued ... well, sort of discontinued. the various interpretations of aacr1, the inclusion of new rules, and pressure for further modifications eventually led to the drafting of aacr2, a code that was supposed to end variance and controversial practices. one might assume that including lc as a principal author of the new text and an lc official as one of the editors might result in a code that it could actually follow. judging by the spate of exceptions and interpretations made so far (more than 300), this has not been the case. in the place of superimposition, we have new impositions known as “compatible headings.” they may not be readily ascertained according to the rules, but have been granted a sort of bibliographic squatter’s rights.
although it would be simpler for catalogers to follow the rules consistently, they must instead check several cataloging service bulletins and name authorities to see whether lc has determined that a given personal, corporate, or serial name is already “compatible” with aacr2. this can result in cataloging delays, higher processing costs, and inconsistent entries. aacr2 and uncertainties regarding its application by lc have been widely credited with lower cataloging productivity. this is not to imply that lc is behaving in a strictly arbitrary or capricious manner vis-a-vis the code. they can be seen as caught on the horns of a trilemma, with vast internal needs and increasing external demands competing for a shrinking budget. president reagan may have whispered sweet nothings during national library week, but during budget hearings it became clear that libraries are not as “truly needy” as impoverished generals and interior decorators. decisions to depart from aacr2 have been based primarily on cost factors. the decision by the rtsd catalog code revision committee and the joint steering committee not to consider cost and implementation factors has led both to widespread opposition to the code, resulting in a one-year delay in implementation, and to the modifications that lc has made and is making. some variations, such as using “dept.” for “department” and “house” for “house of representatives,” make fiscal and common sense. 148 journal of library automation vol. 14/3 september 1981. many other lc changes are simply bibliographic nit-picking, minor irritants to catalogers who must flip back and forth between the text of aacr2 and half a dozen bulletins to settle a minor point of description. why didn’t lc representatives attempt to say, “wait a minute, we just can’t do that now,” while the code was being considered rather than after it was published?
anyway, considering that lc was starting up a whole new catalog and closing the old one, one wonders why rules not to be applied retrospectively had to be tinkered with to such an extent. major questions still to be resolved include not only the compatible-name quandary, but the treatment of serials, microform reproductions, establishment of corporate names and determination of when works “emanate from” corporate bodies, and the romanization of slavic names. the decision to use title entry for serials and monographic series even in the case of generic titles has been controversial. there are, of course, exceptions to the rules, and there will be differences in how uncertain catalogers construct complex entries with parenthetical modifiers. unfortunately, rules establishing entries for serials have sometimes been muddied rather than clarified in the bulletin. consider the example in the winter 1981 issue wherein the bulletin of the engineering station of west virginia university is entered under “bulletin,” while the same publication for the entire university is entered under “west virginia university bulletin.” also, consider the complex cross-reference structure required to direct users between the two files, both of which may well be split again, historically, between author/title and title main entry. this is a special problem in the case of large monographic series generated by corporate bodies. the lc position on microform reproductions of previously published works is clearer, but is still a point of controversy. they have decided to provide the imprint and collation (er, make that “publication, distribution, etc., area” and “physical description area”) of the original work, with a description of the microform in a note. in other words, they’re sticking to aacr1. the rtsd ccs committee on cataloging: description and access is currently trying to resolve this conflict, one in which many research libraries have sided with lc.
this body is also trying to unravel the mystique of “corporate emanation” introduced in aacr2. another sore point has been the lc decision to follow an alternative rule, which prefers commonly known forms of romanized names over those established via systematic romanization. that lc is correctly following the spirit of the general principle for personal names is little comfort to research libraries with large slavic collections. how are other libraries responding to the murky form of aacr2? some are closing old card catalogs and continuing them with com or temporary card supplements. some of these are establishing cross-reference links between variant forms of names between catalogs, while others are not. editorial/dwyer 149. some are keeping their catalogs open and shifting files, while others are splitting files. some are shifting some files and splitting others. aacr2 was intended to provide headings that could be easily ascertained by the user. ironically, the temporary result is scrambled catalogs: access systems involving multiple lookups and built-in confusion. until most bibliographic records are in machine-readable form under reliable authority control, this will continue to be the case. authority control, it would seem, has long been an idea whose time has come but whose application is yet to be realized. the cooperative efforts of the library of congress and the major bibliographic utilities to establish reliable automated authority control will do much to ameliorate the problems presented by aacr2. it would also be helpful if lc, perhaps with the financial assistance of other libraries, networks, and foundations, would publish what might be called aacr2½, not a new edition of the code but one accurately reflecting actual lc practice. finally, future code makers would be wise to consider cost and other implementation factors in their deliberations.
professor higgins, ever the optimist, would rather sing “wouldn’t it be loverly” than hear another verse of “i did it my way.” james r. dwyer. editor’s notes. title change. it often seems that the only things that change their names as often as library publications are standards organizations. not to be left out, jola will be called information technology and libraries beginning with volume 1, number 1, the march 1982 issue. this name was approved by the lita board in san francisco this june as more accurately reflecting the true scope of the journal. new section. with this issue, we are initiating a new section: “reports and working papers.” this is intended to help disseminate documents of particular interest to the jola readership. we solicit suggestions of documents, often developed as working papers for a specific purpose or group but of interest and value to our readership. in general, documents in this section are neither refereed nor edited. mitch. i take great personal pleasure in publishing mike malinconico’s speech upon presenting the 1981 lita award to mitch freedman. readers’ comments. we do continue to solicit suggestions about the journal but receive few. is anybody reading it? if you have any thoughts about what we should or shouldn’t do, we would welcome your sharing them. statistical behavior of search keys. abraham bookstein: graduate library school, university of chicago. editor’s note: the editor and author are aware that varying approaches may be taken to the problem presented here. readers are invited to respond in the form of a paper or a technical communication. in discussion about search keys, concern has been expressed as to how the number of items retrieved by a single value relates to collection size. this paper creates a statistical model that attempts to give some insight into this behavior.
it is concluded that, in general, the observed behavior can be explained as being intrinsically statistical in nature rather than being a property of specific search keys. an attempt is made to relate this model to other research, and to indicate how this model may be made to yield more accurate predictions. introduction. various experiments suggest that it may be possible to develop, as an access route into a file of bibliographic records, a search key* whose values can be easily derived from such bibliographic data as is likely to be available to its users.1 (* by the phrase “search key” we mean a key similar to the 3-3 or 3-1-1-1 keys used at ohio college library center and other places, which is made up by concatenating truncations of bibliographic data elements.) some concern, however, has been expressed regarding the nonuniqueness of these keys: if the number of items retrieved were often to exceed an amount easily handled by a user of the system, the value of this access route would be considerably diminished. accordingly, an important measure of search key performance is the frequency with which a large number of records is retrieved as the search key is applied to the file. this measure is related, for example, to how many memory accesses will be required, on the average, to retrieve all records satisfying a request; it is also an important consideration in deciding which display device should be installed in a system.2, 3 after evaluating such a measure for a search key on a particular file, it is reasonable to ask how that measure will change over time, as the file increases in size. the nature of this variation has already been of concern to researchers in the field. 110 journal of library automation vol. 6/2 june 1973. kilgour, on the basis of a number of experiments carried out at oclc, notes that “there remains a major problem to be
the problem is constituted of those replies that contain a number of entries exceeding the optimal maximum .. .. the major question to be answered is how truncated search keys will perform on files ten and a hundred times the size of that used in this experiment."' he elsewhere observes that "as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio ... . "5 this paper presents a mathematical model that addresses itself to the problem defined by kilgour and attempts to explain his observation; it is suggested that the gross features of the behavior are statistical in nature and not properties of specific search keys. a view of collection growth the cause of the phenomenon observed by kilgour can best be understood by first considering a simple model which, while not itself valid, does cast light on the nature of the behavior. this first model neglects the effect of randomness both in the growth of the collection and in the arrival of requests. it supposes our search key has the following property: regardless of collection size, the fraction of the collection retrieved by a particular search key value, v~, is exactly given by a constant f;; thus, if the fil e holds n records, a request for v 1 will retrieve n 1 = f,n records. this model similarly assumes that among any sizeable number of requests, the fraction of the time any particular search key value will occur is fixed; thus, for any subset of search key values, it is possible to determine how often members of that subset will occur among a set of requests. in particular, for any integer n, we can form the set of all the search key values that will retrieve less than n items. we can then determine how often search key values from that set are requested. if, for example, requests for these values occur 99 percent of the time, then we can assert that 99 percent of the time less than n items will be retrieved. 
if the file contains $N$ items, then these $n$ items constitute the fraction $f = n/N$ of the file. should the collection size increase to $LN$, then the model predicts that 99 percent of the time less than $f \cdot (LN) = Ln$ items would be retrieved. in other words, we have precisely the behavior kilgour observes does not occur. this argument shows that a simple deterministic model does not conform to experience with search keys. the model breaks down in two ways, which accounts for the discrepancy between the results derived from it and kilgour's observations: 1. in any actual library, the fraction of the time that a particular request will appear within a sequence of requests will vary; and 2. in comparing two different samples having the same size, the number of items having a given search key value will vary. the first of these factors is easily dealt with, and its analysis will suggest the number of requests to use in a test of search key behavior in a given library. for a particular collection, let $S$ denote the set of search key values for which, say, twenty or more items are retrieved. we would like to find the fraction of the time that a request in $S$ occurs in the long run; suppose this value is in fact $q$. then among $M$ requests, the probability that $m$ members of $S$ occur is given by the binomial distribution $f_B(m \mid q, M)$. this distribution has a mean of $qM$ and a variance of $Mq(1-q)$. should we desire to estimate the actual fraction of the time that twenty or more items will be retrieved, we can take a sample of $M$ requests and compute $\hat{q}$, the fraction of the requests with search key values in $S$; if we do so, we will usually get a value for $\hat{q}$ between $q - \frac{2}{\sqrt{M}}\sqrt{q(1-q)}$ and $q + \frac{2}{\sqrt{M}}\sqrt{q(1-q)}$. if, for example, $q = .01$ and $M = 10{,}000$, we would tend to find $\hat{q}$ in the interval $.01 \pm .002$.
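the two-standard-deviation sampling bound just described, and the paper's numerical example, can be checked directly. a minimal sketch; the simulation's population fraction is the paper's illustrative q = .01, everything else is ordinary binomial arithmetic:

```python
import math
import random

def sampling_interval(q, m):
    """Interval q +/- (2/sqrt(m)) * sqrt(q*(1-q)) within which the
    observed fraction of requests falling in S will usually lie."""
    half_width = 2.0 * math.sqrt(q * (1.0 - q)) / math.sqrt(m)
    return q - half_width, q + half_width

# the paper's example: q = .01 and m = 10,000 gives roughly .01 +/- .002
lo, hi = sampling_interval(0.01, 10_000)
print(round(lo, 4), round(hi, 4))   # -> 0.008 0.012

# quick simulation: draw 10,000 requests and observe the fraction in S
random.seed(1)
m = 10_000
q_hat = sum(random.random() < 0.01 for _ in range(m)) / m
print(abs(q_hat - 0.01) < 0.005)    # q_hat lands close to q
```

the simulation illustrates the point of this passage: the error from randomness in request arrivals shrinks as the square root of the number of requests sampled.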
thus the effect of randomness in the arrival of requests can easily be controlled by increasing the number of requests considered; furthermore, the size of the error can be predicted. we next introduce the second factor; its analysis will suggest how the behavior of search keys will change as the collection grows in size. for this purpose we adopt a model of collection growth which assumes that as items arrive, they are randomly distributed among the search key values in accordance with some probability distribution. if we suppose that the probability of an item being assigned a specified search key value, $v_i$, is $p_i$, then in a collection of $N$ items we may conclude that the probability of $n$ items having that value is given by the binomial distribution: $f_B(n \mid p_i, N) = \binom{N}{n} p_i^n (1-p_i)^{N-n}$. if $g'(v_i)$ is the probability that the value $v_i$ is selected from the request population, then the probability that the “next” request retrieves $n$ items is given by $\sum_i g'(v_i)\, f_B(n \mid p_i, N) = \int g(p)\, f_B(n \mid p, N)\, dp$, where $g(p)\,dp = \sum_{p \le p_i \le p+dp} g'(v_i)$ is the probability that a request arrives with value $p_i$ in the interval $(p, p+dp)$; $g$ will be treated as a continuous function.** since the expectation of the binomial distribution is given by $pN$, we have $N \int p\, g(p)\, dp = N\bar{p}$ as the expected number of items retrieved by a random request; since this is proportional to $N$, doubling the size of the collection will, on the average, double the amount of material retrieved. similarly, the variance, $\sigma^2$, is given by $N^2(\overline{p^2} - \bar{p}^2) + N \int p(1-p)\, g(p)\, dp$. should $\overline{p^2} - \bar{p}^2$, the variance of $p$, be small, this reduces to $N \int p(1-p)\, g(p)\, dp = \bar{\sigma}^2 N$, so that approximately 95 percent of the time the amount of material retrieved would be less than $N\bar{p} + 2\sqrt{N}\,\bar{\sigma} = N\left(\bar{p} + \frac{2\bar{\sigma}}{\sqrt{N}}\right)$. ** this result would more precisely be expressed as $\int f_B(n \mid p, N)\, dG(p)$, which has the form of a stieltjes integral.
the expression used in the text is simpler and reasonably valid because of the vast number of values the search key can take. it is the factor $\frac{2\bar{\sigma}}{\sqrt{N}}$, and its dependence on $N$, that may account for kilgour's nonlinearity, and not any property intrinsic in the nature of any type of search key. thus, to the extent that this model reflects what is really happening, the 95 percent point increases roughly proportionately with file size; the “constant” of proportionality, however, is the sum of two terms: the first is a true constant, and the second is a term that approaches zero as the file gets larger. in particular, this model suggests that we will never reach a leveling-off point: as the file increases in size, the number of items retrieved will also increase, and the pattern of increase will become increasingly linear. up to this point this discussion has been qualitative in nature, being based upon general statistical considerations and making use of the normal approximation to some unknown distribution; its broad conclusions are, however, consistent with the findings of earlier workers and can explain certain unanticipated properties of search keys. to proceed further it will be necessary to restrict the form of the function $g(p)$; this will be attempted in the following section of this paper. relationship of model to earlier research. interest in access methods that are appropriate for files of bibliographic data has generated a considerable amount of empirical research on search key behavior. of necessity, this pioneering work has been of a descriptive nature, resulting in data showing search key behavior in specific environments. while these efforts have lent a good deal of insight into the nature of search keys, the basic weakness of such research lies in the difficulty of extending these findings to other situations. one purpose of a mathematical model such as
the one being developed here is to provide this increased generality by representing in a concise and easily manipulated form the results of previous research. it is accordingly of interest to indicate the relationship between previous work on search keys and our model. research on search key performance has been of two kinds. the first kind seeks to answer the question: for any number, $n$, how many search key values retrieve $n$ items? the answer to this question depends only on the search key and the collection; it is independent of the pattern of request arrivals. the second kind of research involves the actual arrival of requests; it tries to answer the question: for any number $n$, how frequently will requests resulting in the retrieval of $n$ items occur? to discuss this research in terms of our model requires a closer examination of the function $g(p)$ previously defined. we recall that $g(p)\,dp = \sum_{p \le p_i \le p+dp} g'(v_i)$, with $dp$ being a small number. thus $g(p)$ is determined by two factors: a. the number of search key values in the interval $(p, p+dp)$. let us denote this value by $f(p)\,dp$, so $f(p)$ is the density of search keys at $p$. we make use here of the fact that although the number of possible search key values is finite, the number is very large, so their distribution can be thought of as continuous. b. the average probability of search keys, with values $p_i$ near $p$, being requested. we shall refer to this quantity as $g''(p)$. by combining these factors we have $g(p) = g''(p)\, f(p)$. in terms of this discussion, the first type of research described above is in fact estimating $f(p)$: if there are $s$ search key values that retrieve $n$ items from a collection of $N$ items, then $s$ is an estimate of $\frac{1}{N} f\!\left(\frac{n}{N}\right)$; this relation uses $n = pN$ and $dp = \frac{1}{N}$. the second kind of research directly estimates $g(p)$.
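the first kind of estimate described here can be sketched numerically: given how many items each search key value retrieves in a file of N items, the count s of key values retrieving exactly n items estimates (1/N) f(n/N). a minimal sketch with a hypothetical toy collection (the counts are invented for illustration):

```python
from collections import Counter

def density_estimate(items_per_key, big_n):
    """For each retrieval size n, the number s of key values retrieving
    exactly n items estimates (1/N) * f(n/N); so N*s estimates the
    density f of search key values at p = n/N."""
    s = Counter(items_per_key)   # n -> number of key values retrieving n items
    return {n / big_n: big_n * count for n, count in s.items()}

# toy collection: six key values retrieving 1, 1, 2, 2, 2, 3 items (N = 11)
counts = [1, 1, 2, 2, 2, 3]
N = sum(counts)
f_hat = density_estimate(counts, N)
print(f_hat[2 / N])   # -> 33, since three key values retrieve 2 items and N*3 = 33
```

this mirrors the text: the estimate depends only on the key and the collection, not on the pattern of request arrivals.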
guthrie, in a recent paper, provides a bridge between the two types of research by discussing his findings in terms of two models.6 one of his models, which asserts that each search key value has an equal chance of being requested, is equivalent to the assumption that $g''(p) = 1$, and $g(p) = f(p)$. guthrie finds that this is not an adequate representation of his data. guthrie's second model asserts that each item has an equal chance of being requested. in our terms this becomes $g''(p) \propto p$, and $g(p) \propto p\, f(p)$. this model, while an improvement over the first, still disagrees with the data. furthermore, these models do not estimate $f(p)$; even if guthrie's model were correct, we would not know the probability that $n$ items would be retrieved until we were told how many search key values contained $n$ items. in the next section we will try to remedy this situation by means of a two-parameter representation of $g(p)$. a representation of $f(p)$. to get a more detailed account of search key behavior by experiment is difficult since the two aspects of randomness already discussed are confounded; the experimenter only sees the combined effect. we will, however, try to estimate the distribution $g(p)$ by a distribution of the form $\frac{(\alpha+\beta+1)!}{\alpha!\,\beta!}\, p^{\alpha} (1-p)^{\beta}$. we believe that such an attempt is reasonable on three grounds: a. it is not possible to find $g(p)$ exactly, and moreover, it is not clear that this would be desirable. we are interested in a reasonable approximation that is satisfactory for decision-making purposes; b. the above distribution assumes a wide variety of shapes as $\alpha$ and $\beta$ vary; it seems likely that values of $\alpha$ and $\beta$ can be found for which this distribution is close enough to $g(p)$; and c. this distribution is mathematically tractable. if we proceed using the above approximation for $g(p)$, we find: (i) the probability, $P(n)$, of $n$ items being retrieved is given by

1. $P(n) = \binom{N}{n}\, \frac{(\alpha+\beta+1)!}{\alpha!\,\beta!} \cdot \frac{(\alpha+n)!\,(N-n+\beta)!}{(\alpha+\beta+N+1)!}$;

(ii) the expected number of items retrieved, $E$, is given by

2. $E = N\, \frac{\alpha+1}{\alpha+\beta+2}$; and

(iii) the variance, $V$, of the number of items retrieved is given by

3. $V = N\, \frac{\alpha+1}{\alpha+\beta+2} \cdot \frac{\beta+1}{\alpha+\beta+3} \left(1 + \frac{N}{\alpha+\beta+2}\right)$.

if the experiment is performed on a small sample, the expectation and variance can be computed and the values of $\alpha$ and $\beta$ estimated from the relations

4. $\beta = (\alpha+1)\, \frac{N}{E} \left(1 - \frac{E}{N}\right) - 1$, and

5. $\alpha = \dfrac{E\left(1 - \frac{E}{N}\right) - \frac{V}{N}}{\frac{V}{E} - \left(1 - \frac{E}{N}\right)} - 1$.

usually $\frac{E}{N}$ will be much smaller than one; in this case we may use the approximations:

4'. $\beta = (\alpha+1)\, \frac{N}{E}$, and

5'. $\alpha = \dfrac{E}{\frac{V}{E} - 1} - 1$.

once $\alpha$ and $\beta$ have been evaluated, we can compute the probabilities $P(n)$ for files of arbitrary size, and with these values we can make assertions regarding the probability of, say, more than 30 items being retrieved. a relation that can be derived from formula 1 and may be of use when comparing this model with experiment is:

$\frac{P(n)}{P(n+1)} = \frac{n+1}{N-n} \cdot \frac{\beta+N-n}{\alpha+n+1}$.

the probability of zero retrievals is likely to be an extraordinary point in the distributions $g(p)$ and $P(n)$ since it is influenced by the knowledge that a user may have of the collection; this effect is likely to be encountered in a sampling process in which the requests have to be generated artificially. in such cases it would be advisable to treat $P(0)$ as an empirically derived parameter, $\theta$, and use the modified formula

6. $P'(n) = \begin{cases} \theta & \text{if } n = 0 \\ (1-\theta)\, \dfrac{P(n)}{1 - P(0)} & \text{if } n \neq 0. \end{cases}$

the value of $\theta$ can be estimated by the fraction of requests retrieving zero items; for sampling techniques using only productive requests, $\theta$ will be zero. $\alpha$ and $\beta$ can be calculated as before from the mean and variance of the sample. conclusion. the above discussion is intended as an attempt to provide some theoretical understanding of the puzzling behavior discovered in the use of search keys and also to provide some guide for those experimenting with samples of such files.
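the beta-binomial machinery of formulas 1-5 can be exercised numerically. a minimal sketch, assuming integer α and β so factorials suffice; the parameter values (N = 50, α = 1, β = 30) are hypothetical and chosen only for illustration, not taken from the paper's data:

```python
from math import comb, factorial

def p_n(n, N, a, b):
    """Formula 1: probability a random request retrieves n items from a
    file of N items, under the beta approximation to g(p)."""
    const = factorial(a + b + 1) // (factorial(a) * factorial(b))
    return (comb(N, n) * const * factorial(a + n) * factorial(N - n + b)
            / factorial(a + b + N + 1))

def mean_var(N, a, b):
    """Formulas 2 and 3: expectation E and variance V of items retrieved."""
    E = N * (a + 1) / (a + b + 2)
    V = E * (b + 1) / (a + b + 3) * (1 + N / (a + b + 2))
    return E, V

N, a, b = 50, 1, 30                    # hypothetical parameters
probs = [p_n(n, N, a, b) for n in range(N + 1)]
print(round(sum(probs), 9))            # distribution sums to one

# empirical mean/variance of the distribution match formulas 2 and 3
E = sum(n * p for n, p in enumerate(probs))
V = sum((n - E) ** 2 * p for n, p in enumerate(probs))
E2, V2 = mean_var(N, a, b)
print(round(E - E2, 8), round(V - V2, 8))

# formulas 4 and 5 recover alpha and beta from the observed E and V
alpha = (E * (1 - E / N) - V / N) / (V / E - (1 - E / N)) - 1
beta = (alpha + 1) * (N / E) * (1 - E / N) - 1
print(round(alpha), round(beta))       # -> 1 30
```

as the conclusion suggests, once α and β are fitted from a sample in this way, P(n) can be recomputed with a larger N to predict behavior as the file grows.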
we do, however, urge caution for the latter uses. an analysis similar to the above can be useful under several different circumstances, such as: determining the future behavior expected of a search key in a single library as the collection grows; determining the behavior for one library based upon experiments conducted on a different but similar library; and extrapolating from the performance of a search key in a sample of the collection to its performance in the full collection. if one wishes to compare two different libraries, one can note that, as far as search key values are concerned, a particular library's collection can be thought of as a random sample of the larger population from which it selects its material, and accordingly the formula for $P(n)$ should be valid. in this case, if two different collections are drawn from the same population, the $g(p)$ refers to this population and the libraries are distinguished by the parameter $N$; when we are considering samples from a single library, then $N$ is the sample size and $g(p)$ refers to the library itself. no theoretical basis exists at present for estimating to what extent the populations being considered depend upon the type of library, if any, so this problem must be dealt with empirically. we have assumed here that these populations are similar with regard to search key values. should these populations in fact vary, it is possible that they can be broken down, e.g., by language, into subpopulations that are stable and for each of which the analysis is valid. acknowledgments. this work was made possible by clr/neh grant no. e0-262-70-4658. i would like to express my gratitude to members of the university of chicago systems development office for their many comments and suggestions on this work. references. 1. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l.
Landgraf, "Title-Only Entries Retrieved by Use of Truncated Search Key," Journal of Library Automation 4:207–10 (Dec. 1971).
2. A. Bookstein, "Double Hashing," Journal of the American Society for Information Science 23:402–25 (Nov.–Dec. 1972).
3. A. Bookstein, "Hash Coding with a Non-Unique Search Key," to be published in the Journal of the American Society for Information Science.
4. Frederick G. Kilgour, Philip L. Long, Eugene B. Leiderman, and Alan L. Landgraf, "Retrieval of Bibliographic Entries from a Name-Title Catalog by Use of Truncated Search Keys." Preprint.
5. Kilgour, Long, Leiderman, and Landgraf, "Title-Only Entries," p. 209–10.
6. Gerry P. Guthrie and Steven D. Slifko, "Analysis of Search Key Retrieval on a Large Bibliographic File," Journal of Library Automation 5:96–100 (June 1972).

Letter from the Editor

Kenneth J. Varnum

Information Technology and Libraries | March 2018 1
https://doi.org/10.6017/ital.v37i1.10388

This issue marks 50 years of Information Technology and Libraries. The scope and ever-accelerating pace of technological change over the five decades since Journal of Library Automation was launched in 1968 mirrors what the world at large has experienced. From "automating" existing services and functions a half century ago, libraries are now using technology to rethink, recreate, and reinvent services, often in areas that were once simply the realm of science fiction. In an attempt to put today's technology landscape in context, ITAL will publish a series of essays this year, each focusing on the highlights of a decade. In this issue, editorial board member Mark Cyzyk talks about selected articles from the first two volumes of the journal. In the remaining issues this year, we'll tackle the 1970s, 1980s, 1990s, and 2000s. The journal itself, now as ever before, focuses on the present and the near future, so we will hold off recapitulating the current decade until our centennial celebration in 2068.
As we look back over the journal's history, the editorial board is also looking to the future. We want to make sure that we know for whom we are publishing these articles, and to make sure that the journal is as relevant to today's (and tomorrow's) readership as it has been for those who have brought us to the present. To that end, we invite anyone who is reading this issue to take this brief survey (https://umich.qualtrics.com/jfe/form/sv_6hafly0cyjpbk4j): tell us a little about how you came to ITAL today, how you're connected with library technology, and what you'd like to see in the journal. It won't take much of your time (no more than 5 minutes) and will help us understand the context in which we are working. There's another opportunity for you to help shape the future of the journal. Due to a number of terms being up at the end of June 2018, we have at least five openings on the editorial board to fill. If you are passionate about libraries and technology, enjoy working with authors to shape their articles, and want to help set out today's scholarly record for tomorrow's technologists, submit a statement of interest at https://goo.gl/forms/5gbqouuseolxrfx52. We seek to have an editorial board that represents the diversity of library technology practitioners, and particularly invite individuals from non-academic libraries and underrepresented demographic groups to apply.

Sincerely,
Kenneth J. Varnum
Editor
March 2018

Accessible, Dynamic Web Content Using Instagram

Jaci Wilkinson

Information Technology and Libraries | March 2018 19

Jaci Wilkinson (jaci.wilkinson@umontana.edu) is Web Services Librarian at the University of Montana.

Abstract

This is a case study in dynamic content creation using Instagram's application program interface (API).
An embedded feed of the Mansfield Library Archives and Special Collections' (ASC) most recent Instagram posts was created for their website's homepage. The process to harness Instagram's API highlighted competing interests: web services' desire to most efficiently manage content, ASC staff's investment in the latest social media trends, and everyone's institutional commitment to accessibility.

Introduction

The Mansfield Library Archives and Special Collections (ASC) at the University of Montana had a simple enough request. Their homepage had been static for years, and it was not possible to add more content creation to anyone's workload. However, they had a robust Instagram account with more than one thousand followers. Was there any way to synchronize workflows with an Instagram embed on the homepage? The solution was more complicated than we thought. We developed an Instagram embed, but in the process grappled with some fundamental questions of technology in the library. How do we streamline the creation and sharing of ephemeral, dynamic content? How do we reconcile web accessibility standards with the innovative new platforms we want to incorporate on our websites? Libraries have invested heavily in social media to improve their approachability, reduce library anxiety, and interact with their users. At the Mansfield Library, this investment has paid off for ASC. This unit was an early adopter of Instagram, a photo and short video-sharing application for sharing with the public or approved followers. The ASC Instagram account launched in January 2015, and staff quickly settled on the persona of "Banjo Cat" to share collection items and relevant history. Banjo Cat was inspired by a whimsical nineteenth-century photograph in ASC of a cat playing a banjo (see figure 1). ASC now has about 1,200 followers, including many other libraries, archives, and special collections.
In fact, connecting to a wider community of similar institutions was a driving factor in creating an Instagram account. The ASC staff member who updates the account said, "While we have lots of interactions with patrons on Facebook we have basically zero interactions with other institutions. Instagram is all about interacting with other institutions, sharing ideas for posts, commenting on posts. So by learning about this community and participating and interacting with it we are able to . . . learn about programs and ideas that we would probably not have access to otherwise."1

https://doi.org/10.6017/ital.v37i1.10230

Figure 1. Banjo Cat by L. A. de Ribas. Mansfield Library Archives and Special Collections. 1880s.

But while ASC's social media thrived, its website was bereft of dynamic content. Given that the ASC homepage is the ninth most visited page on the library site, it felt like a wasted opportunity to let such a highly trafficked area lack engaging, current, and appealing content. It seemed only natural to harness the energy put into the ASC Instagram account and embed that same light-hearted, community-oriented, and collection-focused content on the ASC homepage.

Literature Review

Libraries are enthusiastic adopters of social media; one study even shows that as of 2013, 94 percent of academic libraries had a social media presence.2 A 2006 Library Journal article observed the following about MySpace, then a popular social media platform: "Given the popularity and reach of this powerful social network, libraries have a chance to be leaders on their college campuses and in the larger community by realizing the possibilities of using social networking sites like MySpace to bring their services to the public."3 This open-minded spirit and willingness to try new technology trends was shrewd.
Pew Research reports that as of 2016, 69 percent of Americans use some type of social media.4 Social media use has grown more representative of the population: the percentage of older adults on at least one social media site continues to increase.5 For academic libraries, the pull of Facebook was immediately strong because of the initial requirement for users to have a .edu address. Academic libraries very early on attempted to connect with students about services, resources, and spaces using Facebook.6

Dynamic content is a gateway to building interest toward, and buy-in to, an institution. In user experience literature, "user delight" is "a positive emotional affect that a user may have when interacting with a device or interface."7 In Walter's hierarchy of user needs, pleasure tops all other needs.8

Figure 2. Aarron Walter's hierarchy of user needs, from Therese Fessenden, "A Theory of User Delight: Why Usability Is the Foundation for Delightful Experiences," Nielsen Norman Group, March 25, 2017, https://www.nngroup.com/articles/theory-user-delight/.

Using social media to engage users with special collections has its own niche. Special collections are typically housed in closed stacks and have no digital equivalent.
Often the materials housed in special collections are rare, fragile, exotic, beautiful, and unusual; a study of library blogs and social media found that those with higher aesthetic value received more visitors and more revisits.9 Social media "gives users an idea of what the collection offers while it promotes and potentially gains foot traffic."10 It has even been suggested that social media gives special collections the opportunity to stand in when digitization isn't possible: "Instead of digitizing a whole collection, librarians can highlight important parts of the collection with a snippet of its history."11 In creating UCLA's Powell Library Instagram account, librarian Danielle Salomon writes, "Special collections items and digital library images can be a treasure trove of social media content. One of our library's goals is to increase students' exposure to special collections items, so we draw heavily from these collections."12

Instagram is a relative newcomer to social media, but it has been consistently successful since its inception in 2010.13 As of 2016, 28 percent of Americans use Instagram, up from 11 percent in 2013.14 Facebook bought Instagram in 2012 and has since bolstered the application's success by making the two platforms easy to navigate and share between. After Vine, a short video application, was shuttered in 2017, Instagram's ability to take and post short videos has increased its value. Instagram is distinct in that it is mobile-dependent: it is difficult to run the application through a web browser, and only one device can operate an Instagram account. Within the library community, Instagram's adoption has been strongest in academic libraries.
This is tied to the high number of Instagram users who are college-age.15 Another reason libraries select Instagram is that it has more diverse users than other social media applications, specifically African Americans and Latinos.16 In a 2016 study, Instagram was the second most popular pick among college students at Western Oregon University when asked what social media application the library should use (Twitter came in first). The most popular use of Instagram in academic libraries is familiarizing students with services, resources, and spaces. Uses include first-year instruction activities to combat library anxiety and mini-contests that ask users to identify what posted photos depict.17 UCLA's Powell Library discovered students posting Instagram photos of their spaces, so they initially joined to repost those photos and interact with those users. Instagram makes a library seem approachable. Librarian Joanna Hare reflected on this discovery: "Instagram is really powerful in that respect because you can just snap a few photos [and] show what's going on . . . so that students don't view the library as being intimidating."18 Approachability is augmented by delegating photography and posting tasks to library student employees.

Social media is less often seen as a way to help create dynamic content for a library's website. The exceptions to this trend have come from institutions with substantial technology resources. North Carolina State University created open-source software that adds photos posted by anyone on Instagram to a library photo collection when a certain hashtag is used.19 The University of Nebraska's Calvin T. Ryan Library created an RSS feed that disseminates blog posts to Twitter, Facebook, and the library homepage. Posts from followed accounts on Twitter and Facebook are also a part of the resulting feed.
The RSS feed requires use of a third-party tool called dlvr.it (https://dlvrit.com/), which supports many other social media applications, but not Instagram.

A notable absence in the literature on social media use in libraries is any mention of accessibility concerns. The "Improving the Accessibility of Social Media for Public Service" toolkit developed by a group of US government offices is a useful resource that includes specific guidelines on making Instagram posts more accessible.20 The toolkit explains that "more and more organizations are using social media to conduct outreach, recruit job candidates and encourage workplace productivity. . . . But not all social media content is accessible to people with certain disabilities, which limits the reach and effectiveness of these platforms. And with 20% of the population estimated to have a disability, government agencies have an obligation to ensure that their messages, services and products are as inclusive as possible."21 Given the stated importance of social media in library literature, the lack of conversation about accessibility and social media is a barrier to inclusivity.

Mansfield Library Archives and Special Collections' Instagram Feed

Dynamic content was lacking from every part of the ASC website, but staff had little time for, and little knowledge of, the content management system needed to create web content. There was a drive to solve this problem because a new web services librarian had recently been hired. When the web services librarian learned of ASC's thriving Instagram presence, she pursued the possibility of including that content on the ASC website. She felt that, in addition to being more efficient, content creation should stay in-house given the highly specialized nature of ASC's collections, spaces, and resources.
The ideal solution would allow ASC staff to create and manage an Instagram feed unassisted; the web services librarian sought the simplest possible solution for them. Our content management system and Instagram's developer website were first consulted in the hope that one provided an automated embed or plugin. Our content management system, Cascade, could pull in content from Facebook and Twitter but not Instagram, and Instagram did not have an automated feed creator. After more research, we learned that third-party Instagram feed embeds are the only possible way to create an Instagram feed without using Instagram's API. The API was considered a last-resort option because we knew that ASC staff could not manage the code themselves. The idea of using any third-party service was undesirable because of a lack of control, stability, and accessibility. If the service had technical issues or went out of business, it would be very noticeable given the visibility of ASC's homepage.

In 2012, a student advocacy organization at the University of Montana filed a civil rights complaint with the US Department of Education focusing on disabled students' unequal access to electronic and information technologies. Since then, the Mansfield Library has been proactive in eliminating barriers to access.22 Given this history, we are wary of the accessibility of third-party applications to someone using assistive technology, most likely a screen reader. Juicer (https://www.juicer.io/), for example, is a freely available service for an Instagram feed, but in exchange it retains its branding prominently at the top of the feed. An example of Juicer in use can be found on the home page of the Baltimore Aquarium (http://aqua.org/). Tests of Juicer showed that it was not accessible for a screen reader. Finally, it didn't fit our need: Juicer curates posts from other users depending on hashtags and reposts, but we only wanted to feature our own content.
The unpredictability of other accounts' posts ending up on the ASC homepage was not desirable.

Instagram's developer site did not make finding a solution easy. The page titled "Embedding" is about embedding individual posts on a webpage, not a whole feed.23 This content does not even link out to an explanation of how to embed a feed. The "Authentication" page is where the process begins, because calling the API requires a token from an authenticated Instagram account user.24 A user is authenticated by creating a client ID and then receiving an access token. Another interesting roadblock provided by the Instagram developer site is that the "Authentication" page provides no further information about using the access token to call the API. It took outside research to finally figure out the steps needed to make the API requests for ASC's feed.25 PHP code is used to call the API and copy the three most recent ASC Instagram posts to a local server file. (Using JavaScript to call the API is a poor choice because that code would make the account's access token public. If anyone sees this token, they can use it themselves to pull your feed using the Instagram API.) CSS replicates the look and feel of Instagram with white, minimalistic icons and a simple photo display that darkens and shows the beginning of the description when a user's mouse hovers over it. All code from this project is freely available on GitHub.26

There is a catch to this embedded feed process. The directions given through Instagram and by the online sources we used only took us to sandbox mode (in web development, sandbox refers to a restricted or test version of a final product). In sandbox, Instagram limits the number of requests to the API. Unfortunately, a request was made every time someone went to the ASC page.
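The sandbox problem just described, one API request per page view against a rate-limited endpoint, is conventionally solved with a local cache in front of the API call. The sketch below is illustrative only: it is Python rather than the PHP used on the site, and the cache path, refresh interval, and feed URL are all placeholders rather than the project's actual values.

```python
import json
import os
import time
import urllib.request

CACHE_FILE = "instagram_cache.json"   # hypothetical local cache path
MAX_AGE = 24 * 60 * 60                # refresh at most once a day, in seconds

def get_recent_posts(feed_url):
    """Return cached posts, hitting the remote API at most once per MAX_AGE.

    feed_url stands in for whatever authenticated endpoint the account's
    access token permits. Running this server-side also keeps the access
    token out of client-facing JavaScript.
    """
    # Serve the cached copy if it is still fresh.
    if os.path.exists(CACHE_FILE):
        age = time.time() - os.path.getmtime(CACHE_FILE)
        if age < MAX_AGE:
            with open(CACHE_FILE) as f:
                return json.load(f)
    # Cache missing or stale: call the remote API and rewrite the cache.
    with urllib.request.urlopen(feed_url) as resp:
        posts = json.load(resp)[:3]   # keep the three most recent posts
    with open(CACHE_FILE, "w") as f:
        json.dump(posts, f)
    return posts
```

Because the freshness check happens before any network access, page views never trigger API traffic while the cached file is current, which keeps a rate-limited account well under its quota.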
The initial feed stopped working in minutes because we did not realize what this limitation of sandbox mode meant. Another look at the Instagram developer site taught us that the only way to leave sandbox was to have our "app," as Instagram called it, reviewed.27 In other words, Instagram has only set up its API to be used for full application development (like Juicer). We decided not to leave sandbox mode because of uncertainty about what Instagram's review process would entail. If our app were rejected, would they force us to discontinue our work? The timeline for the approval process was also uncertain. Distrust and uncertainty, unfortunately, guided our decision-making at this stage. Instead of undergoing the review process, the PHP code was reconfigured to call the API only once a day. This made the feed less dynamic because it was not updating in real time. For our purposes this was not a problem; the ASC Instagram account is updated at most once or twice a week anyway. As a result, we are "scraping" ASC's Instagram account. Although "crawling, scraping, and caching" are prohibited by Instagram's terms of use, other Instagram feeds on GitHub have similar workarounds and point out that a plugin/scraper "uses (the) same endpoint that Instagram is using in their own site, so it's arguable if the toc [terms of use] can prohibit the use of openly available information."28

While figuring out how to work with the Instagram API, a major accessibility roadblock cropped up: there was no place for alt text, the descriptive information about an image that is used by assistive technologies for users with low vision. Besides taking or uploading a photo, the only other actions offered when creating a new post were to write a caption, tag people, or add a location. Only the caption allowed for a text string. Without alt text, not only is the Instagram feed unintelligible to a screen reader, but it disturbs a screen reader user's interaction with all other content on that page.
An ASC staff member discovered a solution when she noticed a Joshua Tree National Park Instagram post with alt text at the bottom of the caption. Although initially put off by the "wordiness," we concluded this was the only logical way to move forward. The benefits of this format of alt text came into focus as we moved through the project: the ASC staff member was able to choose the desired alt text without any additional steps or skills, and we grew to relish the opportunity to explain to curious users what the #alttext hashtag meant and why it was important to us. PHP code isolates all text after #alttext and displays that as the alt text to a screen reader.

Since the Instagram feed was implemented, it has been interesting to follow how the Instagram developer site has changed and grown. Although Facebook has owned Instagram for five years, the Instagram developer site is only now starting to link out to Facebook developer content. Most recently, the Instagram developer site has been advertising the Instagram Graph API for use by business accounts. This type of development is useless for us because we have a personal Instagram account, not a business account. And the function of the Instagram Graph API is focused on the internal user and analytics, not the end user and user experience. Even if the Instagram Graph API were available for personal accounts, it is worth asking if this type of data collection would be of use to an organization that doesn't have the labor of a devoted marketing team.

Dynamic content through social media and web content provides opportunities to create user delight because it focuses on visually appealing, fun, timely, and interesting information. For archives, special collections, and other cultural heritage institutions, this content is particularly useful because it provides a look into collections that are interesting and rare but also fragile and housed in closed stacks.
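The #alttext convention described above amounts to a single string split. As a hedged illustration (Python here, whereas the site implements this in PHP; the function name is mine), the caption is partitioned at the #alttext hashtag, and everything after it is what would be exposed to assistive technology as the image's alt attribute:

```python
def split_caption(caption):
    """Split an Instagram caption into display text and alt text.

    Mirrors the convention described in the article: everything after
    the #alttext hashtag in the caption is treated as the image's alt
    text; everything before it is the visible caption.
    """
    marker = "#alttext"
    head, sep, tail = caption.partition(marker)
    display = head.strip()
    alt = tail.strip() if sep else ""   # no marker means no alt text
    return display, alt
```

A caption without the marker simply yields an empty alt string, so posts written before the convention was adopted degrade gracefully.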
These positives are tempered by the reality many of these institutions face: budgets are tight, staffs are small, and technical expertise might be lacking. This paper demonstrates how important and useful social media is in creating dynamic website content. Unfortunately, there is a gap in the library literature on accessibility and social media; even though social media content is ephemeral or lacks specific utility, libraries need to pay more attention to the various ways users access resources and information through social media, especially if that same content appears on the institution's website. The ASC's embedded homepage Instagram feed fits their needs, is accessible, and builds community around their unique collections. By providing all the code created in this project on GitHub,29 including the CSS we used, our hope is that institutions interested in this Instagram feed model can replicate it for their own purposes without extensive technical support.

Acknowledgments

I am thankful for the expertise of Carlie Magill, Donna McCrea, and Wes Samson. Without them this project would not have been possible.

References

1 Carlie Magill, e-mail message to author, August 8, 2017.
2 Michael Sutherland, "RSS Feed 2.0," Code4Lib 31, January 28, 2016, http://journal.code4lib.org/articles/11299.
3 Beth Evans, "Your Space or MySpace?" Library Journal 131 (2006): 8–12. Library, Information Science & Technology Abstracts, EBSCOhost.
4 "Social Media Fact Sheet," Pew Research Center, January 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
5 Ibid.
6 Brian S. Mathews, "Do You Facebook?" C&RL News, May 2006, http://crln.acrl.org/index.php/crlnews/article/viewfile/7622/7622.
7 Therese Fessenden, "A Theory of User Delight: Why Usability Is the Foundation for Delightful Experiences," Nielsen Norman Group, March 25, 2017, https://www.nngroup.com/articles/theory-user-delight/.
8 Ibid.
9 Daryl Green, "Utilizing Social Media to Promote Special Collections: What Works and What Doesn't" (paper, 78th IFLA General Conference and Assembly, Helsinki, Finland, June 2012), 11, https://www.ifla.org/past-wlic/2012/87-green-en.pdf.
10 Katrina Rink, "Displaying Special Collections Online," Serials Librarian 73, no. 2 (2017): 1–9, https://doi.org/10.1080/0361526x.2017.1291462.
11 Ibid.
12 Danielle Salomon, "Moving on from Facebook," College & Research Libraries News 74, no. 8 (2013): 408–12, https://crln.acrl.org/index.php/crlnews/article/view/8991.
13 Sarah Perez, "The Rise of Instagram," TechCrunch, April 24, 2012, https://techcrunch.com/2012/04/24/the-rise-of-instagram-tracking-the-apps-spread-worldwide/.
14 "Social Media Fact Sheet," Pew Research Center, January 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
15 Lauren Wallis, "#SelfiesInTheStacks: Sharing the Library with Instagram," Internet Reference Services Quarterly 19, no. 3–4 (2014): 181–206, https://doi.org/10.1080/10875301.2014.983287.
16 Elizabeth Brookbank, "So Much Social Media, So Little Time: Using Student Feedback to Guide Academic Library Social Media Strategy," Journal of Electronic Resources Librarianship 27, no. 4 (2015): 232–47, https://doi.org/10.1080/1941126x.2015.1092344; Salomon, "Moving on from Facebook."
17 Wallis, "#SelfiesInTheStacks"; Salomon, "Moving on from Facebook."
18 Wendy Abbott et al., "An Instagram Is Worth a Thousand Words: An Industry Panel and Audience Q&A," Library Hi Tech News 30, no. 7 (2013): 1–6, https://doi.org/10.1108/lhtn-08-2013-0047.
19 salomon “moving on from facebook.” 20 “federal social media accessibility toolkit hackpad,” digital gov, accessed november 25, 2017, https://www.digitalgov.gov/resources/federal-social-media-accessibility-toolkit-hackpad/ . 21 ibid. 22 donna e. mccrea, “creating a more accessible environment for our users with disabilities: responding to an office for civil rights complaint,” archival issues 38, no. 1 (2017): 7, https://scholarworks.umt.edu/ml_pubs/25/ 23 “embedding,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/embedding/. 24 “authentication,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/authentication/ . 25 pranay deegoju, “embedding instagram feed in your website,” logical feed, december 25, 2015, https://www.logicalfeed.com/embedding-instagram-feed-in-your-website . 26 wes samson, “ws784512 instagram,” github, 2016, https://github.com/ws784512/instagram. 27 “sandbox mode,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/sandbox/. 28 “terms of use,” instagram, accessed november 25, 2017, https://help.instagram.com/478745558852511; and “image-hashtag-feed,” digitoimisto dude oy, accessed november 25, 2017, https://github.com/digitoimistodude/image-hashtag-feed. 
29 samson, “ws784512 instagram.” https://crln.acrl.org/index.php/crlnews/article/view/8991 https://techcrunch.com/2012/04/24/the-rise-of-instagram-tracking-the-apps-spread-worldwide/ https://techcrunch.com/2012/04/24/the-rise-of-instagram-tracking-the-apps-spread-worldwide/ http://www.pewinternet.org/fact-sheet/social-media/ https://doi.org/10.1080/10875301.2014.983287 https://doi.org/10.1080/1941126x.2015.1092344 https://doi.org/10.1108/lhtn-08-2013-0047 https://doi.org/10.1108/lhtn-08-2013-0047 https://www.digitalgov.gov/resources/federal-social-media-accessibility-toolkit-hackpad/ https://scholarworks.umt.edu/ml_pubs/25/ https://www.instagram.com/developer/embedding/ https://www.instagram.com/developer/authentication/ https://www.logicalfeed.com/embedding-instagram-feed-in-your-website https://github.com/ws784512/instagram https://www.instagram.com/developer/sandbox/ https://help.instagram.com/478745558852511 https://github.com/digitoimistodude/image-hashtag-feed abstract introduction literature review mansfield library archives and special collections’ instagram feed acknowledgments references editorial: singularity—are we there, yet? | truitt 55 i n my last column, i wrote about two books—nicholas carr ’s the shallows and william powers’ hamlet’s blackberry—relating to learning in the always-on, always connected environment of “screens.”1 since then, two additional works have come to my attention. while i won’t be able to do them justice in the space i have here, they deserve careful consideration and open discussion by those of us in the library community. if carr’s and power’s books are about how we learn in an always-connected world of screens, sherry turkle’s alone together and elias aboujaoude’s virtually you are about who we are in the process of becoming in that world.2 turkle is a psychologist at mit who studies human– computer interactions. among her previous works are the second self (1984) and life on the screen (1995). 
Aboujaoude is a psychiatrist at the Stanford University School of Medicine, where he serves as director of the Obsessive Compulsive Disorder Clinic and the Impulse Control Disorders Clinic. Based on extensive coverage of specialist and popular literature, as well as numerous anonymized accounts of patients and subjects encountered by the authors, both works are characterized by thorough research and thoughtful analysis. While their approaches to the topic of "what we are becoming" as a result of screens may differ (Aboujaoude's, for example, focuses on "templates" and the terminology of traditional psychiatry, while Turkle's examines the relationship between loneliness and solitude, which are different, and how these in turn relate to the world of screens), their observations of the everyday manifestations of what might be called the pathology of screens bear many common threads. I'm acutely aware of the potential for injustice (at best) and misrepresentation or misunderstanding (rather worse) that I risk in seeking to distill two very complex studies into such a small space. And, frankly, I'm still trying to wrap my head around both the books and the larger issues they raise. With that caveat, I still think we should be reading about and widely discussing the phenomena reported, which many of us observe on a daily basis. In the sections that follow, I'd like to touch on a very few themes that emerge from these books.

"Why Do People No Longer Suffice?"3

A pair of anecdotes that Turkle recounts to explain her reasons for writing the current book seems worth sharing at the outset. In the first, she describes taking her then-fourteen-year-old daughter, Rebecca, to the Charles Darwin exhibition at New York's American Museum of Natural History in 2005. Among the many artifacts on display was a pair of live giant Galapagos tortoises: "One tortoise was hidden from view; the other rested in its cage, utterly still.
Rebecca inspected the visible tortoise thoughtfully for a while and then said matter-of-factly, 'They could have used a robot.'" When Turkle queried other bystanders, many of the children agreed, with one saying, "For what the turtles do, you didn't have to have live ones." In this case, "alive enough" was sufficient for the purpose at hand.4 Sometime later, Turkle read and publicly expressed her reservations about British computer scientist David Levy's book, Love and Sex with Robots, in which Levy predicted that by the middle of this century, "love with robots will be as normal as love with other humans, while the number of sexual acts and lovemaking positions commonly practiced between humans will be extended, as robots will teach more than is in all of the world's published sex manuals combined."5 Contacted by a reporter from Scientific American about her comments regarding Levy's book, Turkle was stunned when the reporter, equating the possibility of relationships between humans and robots with gay and lesbian relationships, accused her of likewise opposing these human-to-human relationships. If we have now reached a point where gay and lesbian relationships can strike us as comparable to human-to-machine relationships, something very important has changed; for Turkle, it suggested that we are on the threshold of what she terms the "robotic moment": "This does not mean that companionate robots are common among us; it refers to our state of emotional—and I would say philosophical—readiness. I find people willing to seriously consider robots not only as pets but as potential friends, confidants and romantic partners. We don't seem to care what these artificial intelligences 'know' or 'understand' of the human moments we might 'share' with them. At the robotic moment, the performance of connection seems connection enough. We are poised to attach to the inanimate without prejudice."6
marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 56 information technology and libraries | june 2011 while these examples are admittedly extreme, both authors agree that something very basic has changed in the way we conduct ourselves. turkle characterizes it as mobile technology having made each of us “pausable,” i.e., that a face-to-face interaction being interrupted by an incoming call, text message, or e-mail is no longer extraordinary; rather, in the “new etiquette,” it is “close to the norm.”10 and the rudeness, as well we know, isn’t limited to mobile communications. referring to “flame wars,” which regularly erupt in online communities, aboujaoude observes: the internet makes it easier to suspend ethical codes governing conduct and behavior. gentleness, common courtesy, and the little niceties that announce us as well-mannered, civilized, and sociable members of the species are quickly stripped away to reveal a completely naked, often unpleasant human being.11 even our routine e-mail messages—lacking as they often do salutations and closing sign-offs—are characterized by a form of curtness heretofore unacceptable in paper communications. remarkably, to those old enough to recall the traditional norms, the brusqueness is not only unintended, it is as well unconscious; “[we] just don’t think warmth and manners are necessary or even advisable in cyberspace.”12 ■■ castles in the air: avatars, profiles, and remaking ourselves as we wish we were finally, a place to love your body, love your friends, and love your life. —second life, “what is second life?”13 one of the interesting and worrisome themes in both turkle’s and aboujaoude’s studies is that of the reinvention and transformation of the self, in the form of online personas and avatars. 
this is the stock-in-trade of online communities and gaming sites such as facebook and second life. these sites cater to our nearly universal desire to be someone other than who we are: online, you’re slim, rich, and buffed up, and you feel you have more opportunities than in the real world. . . . we can reinvent ourselves as comely avatars. we can write the facebook profile that pleases us. we can edit our messages until they project the self we want to be.14 the problem is that for many there is an increasing fuzziness at the interface between real and virtual ■■ changing mores, or the triumph of rudeness i can’t think of any successful online community where the nice, quiet, reasonable voices defeat the loud, angry ones. . . . the computer somehow nullifies the social contract. —heather champ, yahoo!’s flickr community manager7 sadly, we’ve all experienced it. we get stuck on a bus, train, or in an elevator with someone engaged in a loud conversation on her or his mobile phone. all too often, the person is loudly carrying on about matters we wish we weren’t there to hear. perhaps it’s a fight with a partner. or a discussion of some delicate health matter. whatever it is, we really don’t want to know, but because of the limitations imposed by physical spaces, we can’t avoid being a party to at least half of the conversation. what’s wrong with these individuals? do they really have no consideration or sense of propriety? it turns out that in matters of tact and good taste, the ground has shifted, and where once we understood and abided by commonly accepted rules of conduct and respect for others, we do so no longer. indeed, the everyday obnoxious intrusions by those using public spaces for their private conversations are among the least of offenders. 
consider the following situations shared by turkle: sal, 62 years old, holds a small dinner party at his home as part of his “reentry into society” after several years of having cared for his recently deceased wife: i invited a woman, about fifty, who works in washington. in the middle of a conversation about the middle east, she takes out her blackberry. she wasn’t speaking on it. i wondered if she was checking her e-mail. i thought she was being rude, so i asked her what she was doing. she said that she was blogging the conversation. she was blogging the conversation.8 turkle later tells of attending a memorial service for a friend. several [attendees] around me used the [printed] program’s stiff, protective wings to hide their cell phones as they sent text messages during the service. one of the texting mourners, a woman in her late sixties, came over to chat with me after the service. matter-of-factly, she offered, “i couldn’t stand to sit that long without getting on my phone.” the point of the service was to take a moment. this woman had been schooled by a technology she’d had for less than a decade to find this close to impossible.9 enough” became yet more blurred. turkle’s anecdotes of children explaining the “aliveness” of these robots are both touching and disturbing. speaking of a tamagotchi, one child wrote a poem: “my baby died in his sleep. i will forever weep. then his batteries went dead. now he lives in my head.”19 the concept of “alive enough” is not unique to the very young, either. by 2009, sociable robots had moved beyond children’s toys with the introduction of paro, a baby seal-like “creature” aimed at providing companionship to the elderly and touted as “the most therapeutic robot in the world. . . . the children were onto something: the elderly are taken with the robots.
most are accepting and there are times when some seem to prefer a robot with simple demands to a person with more complicated ones.”20 where does it end? turkle goes on to describe nursebot, a device aimed at hospitals and long-term care facilities, which colleagues characterized as “a robot even sherry can love.” but when turkle injured herself in a fall a few months later, [i was] wheeled from one test to another on a hospital stretcher. my companions in this journey were a changing collection of male orderlies. they knew how much it hurt when they had to lift me off the gurney and onto the radiology table. they were solicitous and funny. . . . the orderly who took me to the discharge station . . . gave me a high five. the nursebot might have been capable of the logistics, but i was glad that i was there with people. . . . between human beings, simple things reach you. when it comes to care, there may be no pedestrian jobs.21 but need we librarians care about something as farfetched as nursebot? absolutely. now that ibm has proven that it can design a machine—okay, an array of machines, but something much more compact is surely coming soon—that can win at jeopardy!, is the robotic reference librarian really that much of a hurdle? take a bit of watson technology, stick it in nursebot, give it sensible shoes, and hey, i can easily imagine bibliobot, factory-standard in several guises, including perhaps donna reed (as mary, who becomes the town librarian in the alter-life of capra’s it’s a wonderful life) or shirley jones (as marian, the librarian, in the music man). i like donna reed as much as anyone, but do i really want reference assistance from her android doppelgänger? but then, for years after the introduction of the atm, i confess that i continued taking lunch hours off just so that i could deal with a “real person” at the bank, so perhaps it’s just me. the future is in the helping/service professions, indeed! 
and when we’re all replaced by robots (sociable and otherwise), what will we do to fill the time? personas: “not surprisingly, people report feeling let down when they move from the virtual to the real world. it is not uncommon to see people fidget with their smartphones, looking for virtual places where they might once again be more.”15 turkle speaks of the development of what she terms a “vexed relationship” between the real and the virtual: in games where we expect to play an avatar, we end up being ourselves in the most revealing ways; on social-networking sites such as facebook, we think we will be presenting ourselves, but our profile ends up as somebody else—often the fantasy of who we want to be. distinctions blur.16 and indeed, some completely lose sight of what is real and what is not. aboujaoude relates the story of alex, whose involvement in an online community became so consuming that he not only created for himself an online persona—“’i then meticulously painted in his hair, streak by streak, and picked “azure blue” for his eye color and “snow white” for his teeth.’”—but also left his “real” girlfriend after similarly remaking the avatar of his online girlfriend, nadia—“from her waist size to the number of freckles on her cheeks.” speaking of his former “real” girlfriend, alex said, “real had become overrated.”17 ■■ “don’t we have people for these jobs?”18 ageist disclaimer: when i grew up, robots—those that weren’t in science fiction stories or films—were things that were touted as making auto assembly lines more efficient, or putting auto workers out of jobs, depending on your perspective. while not technically a robot, the other machine that characterized “that time” was the automated teller machine (atm), which freed us from having to do our banking during traditional weekday hours, and not coincidentally resulted, again, in the loss of many entry-level jobs in financial institutions. 
as i recall, we were all reassured that the future lay in “helping/service” professions, where the danger of replacement by machines was thought to be minimal. now, fast forward 30 years. the first half of turkle’s book is the history of “sociable robots” and our interactions with them. moving from the reactions of mit students to joseph weizenbaum’s eliza in the mid-1970s, she recounts her studies of children’s interactions, first with electronic toys—e.g., tamagotchi—and later, with increasingly sophisticated and “alive” robots, such as furby, aibo, and my real baby. with each generation, these devices made yet more “demands” on their owners—for care, “feeding,” etc. and with each generation, the line between “alive” and “alive to admit that we’ve seen many examples of how connectedness between people we’d otherwise consider “normal” has changed and is changing our manners and mores.24 many libraries and other public spaces, reacting to patron complaints about the lack of consideration shown by some users, have had to declare certain areas “cell phone free.” in the interest of getting your attention, i’ve admittedly selected some fairly extreme examples from the two books at hand. however, i think the point is that, now that the glitter of always-on, always-connected, has begun to fade a bit, there is a continuum of dysfunctional behaviors that we are beginning to notice, and it’s time to talk about how we as librarians fit into all of this. are there things we in libraries are doing that encourage some of these less desirable and even unhealthy behaviors? which takes us to a second concern raised by some of my gentle draft-readers: we’ve heard this tale before. television, and radio before it, were technologies that, when they were new, were criticized as corrupting and leading us to all sorts of negative, self-destructive, and socially undesirable behaviors.
how are screens and the technology of always-connected any different? a part of me—the one that winces every time someone glibly refers to the “transformational” changes taking place around us—agrees. i was trained as a historian, to take a long view about change. and we’re talking about technologies that—in the case of the web— have been in common use for just over fifteen years. that said, my interest here is in seeing our profession begin a conversation about how connective technologies have influenced behavioral changes in people, and especially about how we in libraries may be unwittingly abetting those behavioral changes. television and radio were fundamentally different technologies in that they were one-way broadcast tools. and to the best of my recollection, neither has ever been widely adopted by or in libraries. yes, we’ve circulated videos and sound recordings, and even provided limited facilities for the playback of such media. but neither has ever really had an impact on the traditional core business of libraries, which is the encouragement and facilitation of the largely solitary, contemplative act of reading. connective technologies, in the form of intelligent machines and network-based communities, can be said to be antithetical to this core activity. we need to think about that, and to consider carefully the behaviors we may be encouraging. notwithstanding those critics of change in our profession who feel we move far too glacially, i would maintain that we have often been, if not at the forefront of the technology pack, then certainly among its most enthusiastic ■■ where from here? i titled this column “singularity.” for those not familiar with the literature of science fiction, turkle provides a useful explanation: this notion has migrated from science fiction to engineering. the singularity is the moment—it is mythic; you have to believe in it—when machine intelligence crosses a tipping point. 
past this point, say those who believe, artificial intelligence will go beyond anything we can currently conceive. . . . at the singularity, everything will become technically possible, including robots that love. indeed, at the singularity, we may merge with the robotic and achieve immortality. the singularity is technological rapture.22 i think it’s pretty clear that we’re still a fair distance from anything that one might reasonably term a singularity. but the concept is surely present, albeit to a somewhat less hubristic degree, when we speak in uncritical awe of “game-changing” or “transformational” technologies. turkle puts it this way: the triumphalist narrative of the web is the reassuring story that people want to hear and that technologists want to tell. but the heroic story is not the whole story. in virtual worlds and computer games, people are flattened into personae. on social networks, people are reduced to their profiles. on our mobile devices, we often talk to each other on the move and with little disposable time—so little, in fact, that we communicate in a new language of abbreviation in which letters stand for words and emoticons for feelings. . . . we are increasingly connected to each other but oddly more alone: in intimacy, new solitudes.23 some of my endlessly patient friends—the ones who provide both you and me with some measure of buffering from the worst of my rants in prepublication drafts of these columns—have asked questions about how all this relates to libraries, for example: how legitimate is it to generalize research findings from cases of obsessive-compulsive disorder to the broader population? the individuals studied are, of course, obsessive and compulsive, in relation to the internet and new technologies. do their behaviors not represent an extreme end of the population? a fair question. and yes, the examples i’ve provided in this column are admittedly somewhat extreme.
but turkle and aboujaoude both point to many examples that are far more common. i think all of us would have references and notes 1. marc truitt, “editorial: the air is full of people,” information technology and libraries 30 (mar. 2011): 3–5. http://www.ala.org/ala/mgrps/divs/lita/ital/302011/3001mar/editorial_pdf.cfm (accessed apr. 25, 2011). 2. sherry turkle, alone together: why we expect more from technology and less from each other (new york: basic books, 2011); elias aboujaoude, virtually you: the dangerous powers of the e-personality (new york: norton, 2011). 3. turkle, 19. 4. ibid., 3–4. 5. quoted in ibid., 5. 6. ibid., 9–10. emphasis added. 7. quoted in aboujaoude, 99. 8. turkle, 162. emphasis in original. 9. ibid., 295. 10. turkle, 161. 11. aboujaoude, 96. 12. ibid., 98. 13. quoted in turkle, 1. 14. ibid., 12. 15. ibid. 16. ibid., 153. 17. aboujaoude, 77–78. 18. turkle, 290. 19. ibid., 34. 20. ibid., 103–4. 21. ibid., 120–21. 22. ibid., 25. 23. ibid., 18–19. 24. for a recent and typical example, see david carr, “keep your thumbs still when i’m talking to you,” new york times, apr. 15, 2011, http://www.nytimes.com/2011/04/17/fashion/17text.html (accessed may 2, 2011). 25. aboujaoude, 283. adopters. in our quest to remain “relevant” to our university or school administrations, governing boards, and (in theory, at least) our patrons, we have embraced with remarkably little reservation just about every technology trend that’s come along in the past few decades. at the same time, we’ve been remarkably uncritical and unreflective about our role in, and the larger implications of, what we might be doing by adopting these technologies. aboujaoude, in a surprising, but i think largely correct summary comment, observes: extremely little is available, however, for the individual interested in learning more about how virtual technology has reshaped our inner universe and may be remapping our brains.
as centers of learning, public libraries, schools, and universities may be disproportionately responsible for this deficiency. they outdo one another in digitalizing their holdings and speeding up their internet connections, and rightfully see those upgrades as essential to compete for students, scholars, and patrons. in exchange, however, and with few exceptions, they teach little about the unintended, less obvious, and more personal consequences of the world wide web. the irony is, at least in some libraries’ case, that their very survival seems threatened by a shift that they do not seem fully engaged in trying to understand, much less educate their audiences about.25 i could hardly agree more. so, how do we answer aboujaoude’s critique? letter from the editor: farewell 2020 kenneth j. varnum information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.13051 i don’t think i’ve ever been so ready to see a year in the rear-view mirror as i am with 2020. this year is one i’d just as soon not repeat, although i nurture a small flame of hope. hope that as a society what we have experienced this year will exert a positive influence on the future. hope that we recall the critical importance of facts and evidence. hope that we don’t drop the effort to be better members of our local, national, and global communities and treat everyone equitably. hope that as a global populace we continue to get into “good trouble” and push back against institutionalized policies and practices of racism and discrimination and strive to be better. despite the myriad challenges this year has brought, it is welcome to see so many libraries continuing to serve their communities, adapting to pandemic restrictions, and providing new and modified access to books and digital information.
and equally gratifying, from my perspective as ital’s editor, is that so many library technologists continue to generously share what they have learned through submissions to this journal. along those lines, i’m extending my annual invitation to our public library colleagues to propose a contribution to our quarterly column, “public libraries leading the way.” items in this series highlight a technology-based innovation from a public library perspective. topics we are interested in could include any way that technologies have helped you provide or innovate service to your communities during the pandemic, but could touch on any novel, interesting, or promising use of technology in a public library setting. columns should be in the 1,000–1,500 word range and may include illustrations. these are not intended to be research articles. rather, public libraries leading the way columns are meant to share practical experience with technology development or uses within the library. if you are interested in contributing a column, please submit a brief summary of your idea. wishing you the best for 2021, kenneth j. varnum, editor varnum@umich.edu december 2020 highlights of lita board meetings these highlights are published to inform division members of the activities of their board. they are abstracted from the official minutes. 1981 ala annual conference san francisco first session june 29, 1981 board members present: s. michael malinconico, brigitte l. kenney, barbara e. markuson, nancy l. eaton, kenneth j. bierman, bonnie k. juergens, marilyn j. rehnberg, helen cyr, heike kordish, donald p. hammer. lita election results. vice-president/president-elect: carolyn m.
gray director-at-large: hugh atkinson ala councilor: bonnie k. juergens vccs vice-chairperson/chairperson-elect: mary h. karpinski vccs secretary: patricia m. paine vccs member-at-large: leon l. drolet, jr. avs chairperson: anne t. meyer avs vice-chairperson/chairperson-elect: louis r. pointon avs member-at-large: michael d. miller isas vice-chairperson/chairperson-elect: james c. thompson isas member-at-large: sherrie schmidt evaluation of electronic mail project. the members of the board reviewed their experiences and impressions with the ontyme electronic mail system. the general consensus was that the system was very good and everyone was pleased with it and wants to expand its use. the board has not yet used the source, although we are now subscribers to the system. motion was made by markuson, seconded by rehnberg, and passed that: the electronic mail project be extended through the midwinter meeting, 1982, with a total budget of $2,000 from the inception of the project. lita’s representation on ansi x-3. x-3 is the american national standards institute committee on computers and information processing. discussion included the mechanics of keeping the membership informed of proposed standards being considered, the large amount of time required of the representative to monitor, study, and disseminate the proposed standards, and the costs involved for lita to support a representative. juergens requested that if a division-wide representative to x-3 is appointed, that person should also be made ex officio to the isas/tesla committee or be liaison to the chair of isas. no action was taken. goals and long-range planning committee. kenney announced that she had appointed an ad hoc goals and long-range planning committee chaired by george abbott. directory of library systems in use. the suggestion was made that a directory of the many automated systems in use in libraries would be very useful.
a motion was made by markuson, seconded by kenney, and passed that: in response to inquiry about a directory to assist in identifying specific applications of technology in libraries, media, and information centers, that the publications committee explore the feasibility of an online lita directory of library, media, and information center use of technology. the investigation should consider format of description, potential of interactive online updating, and possible output byproducts, and should result in a draft rfp for consideration by the lita board for review at midwinter. president’s program at philadelphia. kenney announced her plans for the lita president’s program at the philadelphia ala annual conference. she is planning to transmit by satellite to fifty receiving sites around the country an “ala sampler” of outstanding technically-based programs from the philadelphia conference and short vignettes of what ala is all about. the subject of “on-line catalogs” has been chosen for the president’s program and segments of it and the rtsd/lita/rasd preconference institute on the same subject will be used. the program is intended for people who cannot get to ala conferences. if not enough registration is received by the coming ala midwinter meeting the whole activity would be cancelled. oral history project. at the 1980 new york ala conference, the suggestion was made that in the future many of the pioneers in the field of library automation will pass off the scene and it was felt that it was lita’s responsibility to capture for posterity the ideas and philosophy of those people. a motion was made by kenney, seconded by eaton, and passed that: an ad hoc committee be formed to investigate an oral history project in all aspects and submit a detailed set of alternative approaches for the board’s consideration. the library history roundtable will be informed of the committee’s activity and invited to participate.
second session june 30, 1981 board members present: s. michael malinconico, brigitte l. kenney, barbara e. markuson, nancy l. eaton, kenneth j. bierman, ronald f. miller, bonnie k. juergens, marilyn j. rehnberg, helen cyr, heike kordish, charles husbands, and donald p. hammer. journal of library automation vol. 14/4 december 1981 lita section reports: isas. bonnie juergens, chairperson of isas, reported that the section has approved three programs for the philadelphia conference. asis will be asked to cosponsor the program “information science, computer science, and library science: in search of common ground.” another program is “the uses of microcomputers in medium-sized public and academic libraries,” and the third one will be a detailed analysis and comparison of the marc format. juergens reported that the isas retrospective conversion discussion group and one of the same name in rtsd would like to combine. a motion was made by juergens, and passed that: isas pursue appropriate steps to invite the rtsd section which currently hosts a discussion group on retrospective conversion to combine that discussion group with the lita/isas retrospective conversion discussion group. the invitation to rtsd will include a specific description of mutual responsibilities. electronic library membership initiative group. (information report by richard sweeney, public library of columbus and franklin co., ohio; and neal kaske, oclc.) sweeney reviewed the discussions that took place at a meeting held in columbus on march 23–24, 1981 concerned with the whole area of remote electronic access to information and its impact on the library field. the group concluded that its members want to have some input on a very immediate level on the direction technology goes and the direction the policies and issues go. out of that meeting came a mission statement which is now the function statement of the ala electronic library membership initiative group (elmig).
sweeney read that statement and reported on the group’s concern for the future of libraries when these remote systems become established. he commented on the large number of programs and meetings on these areas that are not coordinated and not really providing the leadership our field should be giving. the almost total lack of research on these areas was also commented on. the need for the associations to provide the leadership was stressed. several members of the lita board expressed interest in providing a “home” for elmig within lita as many of lita’s interests are those of the mig. both groups are concerned with the same issues, it was pointed out. lita section reports: audio-visual section. avs recommended that an audiovisual task force be established, which would include other ala units, and would share information about their plans, and would try to avoid major schedule conflicts and overlaps. a motion was made by cyr, and passed that lita board approve ad hoc lita a-v section participation in a broad-based task force involving rtsd, pla, acrl, aasl and others to coordinate audiovisual-related activities. cyr asked the board’s sanction for an “a-v interest group breakfast” where people could just socialize and talk together. this would be sometime in the future. the board members had no objection. marbi committee report. elaine woods reported that the marbi committee is focusing more on the principles and the issues that need to be addressed in the marc format. the committee is current with l.c. proposals. marbi has drawn up a shopping list of issues to be addressed and they are now working on some of them. publications committee report. charles husbands informed the board that the publications committee feels it is time to change the title of jola. they have chosen a title of information technology and libraries, and it is to be effective with the march 1982 issue.
after discussion, a motion was made by bonnie juergens, and passed to that effect. the matter of raising the subscription price of jola was discussed. due to the fact that the division’s subsidy to the journal will greatly increase next budget year, the motion was made by ken bierman, and passed that: non-member prices for the journal of the division be increased to $20 for a one-year subscription and $5.50 for a single issue, effective with march 1982, and that the published member subscription price be raised sufficiently to conform to postal regulation. husbands requested that various members of the jola editorial board be included in the lita electronic mail system. approved by the board by consensus. husbands asked the board to keep in mind the possibility of publishing some of the results of the oral history project in jola. brian aveney asked the board to allow him to investigate the possibility of putting the full text of jola online. it would be an experiment to see what people would do with it. the board approved by consensus. aveney will return with a final proposal later. other such ideas were discussed including the proposals to put the “headlines” from the lita newsletter on the source, and to include the roster of lita committees in the oclc address directory. arrangements are in process for both of these activities. goals and long-range planning committee. george abbott, chairperson, asked the board’s permission to include his committee on lita’s electronic mail system. the intent would be to use it for text editing of committee documents. board approved by consensus. abbott reported that the committee expects to hold open hearings at midwinter and to have a basic document for discussion at that time. third session june 30, 1981 board members present: s. michael malinconico, brigitte l. kenney, ronald f. miller, kenneth j. bierman, marilyn j. rehnberg, heike kordish, and donald p. hammer. bylaws and organization report.
there have been seven changes to the lita bylaws that kordish will prepare in text form for the board to act on at midwinter in time for the spring ala ballot. ala priorities survey. ron miller reported that the ala executive board took action on the ala priorities and there are five of them. briefly, they are access to information, legislation and funding, intellectual freedom, public awareness, and personnel resources. joint council on educational telecommunications. lynne bradley reported that jcet has established a task force to bring information to its members about the new technologies and how they can best be used in education. since lita members have much of the necessary expertise, bradley suggested that lita organize a one-day program for jcet. some board members were very much interested and bradley was asked to work with the lita program planning committee to organize such a program. program planning committee. sue tyner reported that the telecommunications committee will hold a preconference institute at the philadelphia annual conference called “the teleconference center.” it is intended to teach librarians how to set up a teleconference center. the lita group that has been putting on the “data processing specifications and contracting” workshops has been asked to hold a workshop prior to the ifla meeting. malinconico suggested that the board adopt a policy of lita costs plus 15 percent, but that a subcommittee of the lita program planning committee should be set up to define policy in this area. carolyn gray was suggested as a person for this committee. marilyn rehnberg, chairperson of vccs, reported a request from national audio-visual association asking lita to put on a “video showcase” for the seminar part of the nava annual conference in anaheim in january.
LITA Board of Directors Meetings: Record of Votes, 1981 Annual Conference. Motions (in order of appearance in the "Highlights"):

Board member              1  2  3  4  5  6  7  8
S. Michael Malinconico    Y  Y  Y  Y  Y  Y  Y  Y
Brigitte L. Kenney        Y  Y  Y  Y  Y  Y  Y  Y
Barbara E. Markuson       Y  Y  Y  Y  Y  Y  Y  Y
Nancy L. Eaton            Y  Y  Y  Y  Y  Y  Y  Y
Kenneth J. Bierman        Y  Y  Y  Y  Y  Y  Y  Y
Ronald F. Miller          0  0  0  Y  Y  Y  Y  Y
Angie W. LeClerq          0  0  0  0  0  0  0  0
Helen Cyr                 Y  Y  Y  Y  Y  Y  Y  Y
Bonnie K. Juergens        Y  Y  Y  Y  Y  Y  Y  Y
Marilyn J. Rehnberg       Y  Y  Y  Y  Y  Y  Y  Y

Key: Y = yes, A = abstain, 0 = absent

President's Message. Andromeda Yelton. Information Technology and Libraries | March 2018. Andromeda Yelton (andromeda.yelton@gmail.com) is LITA President 2017–18 and Senior Software Engineer, MIT Libraries, Cambridge, Massachusetts.

In my last president's message, I talked about change — ITAL's transition to new leadership — and imagination — Wakanda and the archival imaginary. Today change and imagination are on my mind again as LITA contemplates a new path forward: potentially becoming a new combined division with ALCTS and LLAMA. As you may have already seen on LITA Blog (http://litablog.org/2018/02/lita-alcts-and-llama-document-on-small-division-collaboration/), the three divisional leadership teams have been envisioning this possibility, and all three division boards discussed it at Midwinter. While the idea sprang out of our shared challenges with financial stability, in discussing it we've realized how much opportunity we have to be stronger together. For instance, we've heard for years that you, LITA members, want more of a leadership training pathway, and more ways to stay involved with your LITA home as you move into management; alignment with LLAMA automatically opens up all kinds of possibilities. They have an agile divisional structure with their communities of practice and an outstanding set of leadership competencies.
And anyone involved with library technology knows that we live and die by metadata, but we aren't all experts in it; joining forces with ALCTS creates a natural home for people no matter where they are (or where they're going) on the technology/metadata continuum. ALCTS also runs far more online education than LITA and runs a virtual conference. Meanwhile, of course, LITA has a lot to offer to LLAMA and ALCTS. You already know how rewarding the networking is, and how great the depth of expertise on technology topics. We also bring strong publications (like this very journal), marquee conference programs (like Top Tech Trends and the Imagineering panel), and a face-to-face conference. (Speaking of which, please pitch a session (http://bit.ly/2gpgxdf) for the 2018 LITA Forum!) I want to emphasize that no decisions have been made yet. The outcome of our three board discussions was that we all feel there is enough merit to this proposal to explore it further, but none of us are formally committed to this direction. Furthermore, it is not practically or procedurally possible to make a change of this magnitude until at least 2019. In the meantime, we expect there will be numerous working groups to determine if and how this all could work, as well as open forums for the membership of all three divisions to express hopes, concerns, and ideas. Personally, my highest priority is to ensure that you, the members, continue to have a divisional home: one that gives you learning opportunities and a place for professional camaraderie, and that is on solid financial footing so it can continue to be here for you in the long term.
President's Message | March 2018. https://doi.org/10.6017/ital.v37i1.10386

So, I'm excited about the possibilities that a superhero team-up affords, but I'm even more excited to hear from you. Do you find this prospect thrilling, scary, both? Do you think we should absolutely go this way, or definitely not, or maybe but with caveats and questions? Please tell me what you think. You can submit anonymous feedback and questions at https://bit.ly/litamergefeedback. I will periodically collate and answer these questions on LITA Blog. You can also reach out to me personally any time (andromeda.yelton@gmail.com).

Automated Storage & Retrieval System: From Storage to Service. Justin Kovalcik and Mike Villalobos. Information Technology and Libraries | December 2019. Justin Kovalcik (jdkovalcik@gmail.com) is Director of Library Information Technology, CSUN Oviatt Library. Mike Villalobos (mike.villalobos@csun.edu) is Guest Services Supervisor, CSUN Oviatt Library.

Abstract. The California State University, Northridge (CSUN) Oviatt Library was the first library in the world to integrate an automated storage and retrieval system (AS/RS) into its operations. The AS/RS continues to provide efficient space management for the library. However, added value has been identified in materials security and inventory as well as customer service. The concept of library as space, paired with improved services and efficiencies, has resulted in the AS/RS becoming a critical component of library operations and future strategy.
Staffing, service, and security opportunities, paired with support and maintenance challenges, enable the library to provide a unique critique and assessment of an AS/RS.

Introduction. "Space is at a premium" is a phrase not unique to libraries; however, due to the inclusive and open environment promoted by libraries, their floor space is especially attractive to those within and outside of the building's traditional walls. In many libraries, the majority of floor space is used to house the library's collection. In the past, as collections grew, floor space became increasingly limited. Faced with expanding expectations and demands, libraries struggled to balance transforming space for new services with adding materials to a growing collection. In addition to management activities like weeding, other solutions such as offsite storage and compact shelving rose in popularity as methods to create library space in the absence of new building construction. Years later, as collections move away from print and physical materials, libraries are beginning to reexamine their buildings' space and envision new features and services. "Now that so many library holdings are accessible digitally, academic libraries have the opportunity to make use of their physical space in new and innovative ways."1 The CSUN Oviatt Library took a novel approach and launched the world's first automated storage and retrieval system (AS/RS) in 1991 as a storage solution to resolve its building space limitations. The project was a California State University (CSU) system Chancellor's Office initiative that began in 1989 and cost more than $2 million to implement. The original concept "came from the warehousing industry, where it had been used by business enterprises for years."2 By leveraging and storing physical materials in the AS/RS, the CSUN Oviatt Library is able to create space within the library for new activities and services.
"Instead of simply storing information materials, the library space can and should evolve to meet current academic needs by transforming into an environment that encourages collaborative work."3

Automated Storage & Retrieval System | Kovalcik and Villalobos. https://doi.org/10.6017/ital.v38i4.11273

Unfortunately, as the first stewards of an AS/RS, CSUN made decisions that led to mismanagement and neglect, resulting in the AS/RS facing many challenges in becoming a stable and reliable component of the library. However, recent efforts have sought to resolve these issues and have resulted in updates to the system's software, management, and functionality. Whereas in the past low-use materials were placed in the AS/RS to create space for new materials, now materials are moved into the AS/RS to create space for patrons, secure collections, and improve customer service. As part of this critical review, the functionality and maintenance, along with the historical and current management, of the AS/RS will be examined.

Background. CSUN is the second-largest member of the twenty-three-campus CSU system. The diverse university community includes over 38,000 students and more than 4,000 employees.4 Consisting of nine colleges offering 60 baccalaureate degrees, 41 master's degrees, 28 credentials in education, and various extended learning and special programs, CSUN provides a diverse community with numerous opportunities for scholarly success.5 The CSUN Oviatt Library's AS/RS is an imposing and impressive area of the library that routinely attracts onlookers and has become part of the campus tour. The AS/RS is housed in the library's east wing and occupies an area that is 8,000 square feet and 40 feet high, arranged into six aisles.
The 13,260 steel bins, each 2 × 4 feet, in heights of 6, 10, 12, 15, and 18 inches, are stored on both sides of the aisles, enabling the AS/RS to store an estimated 1.2 million items.6 Each aisle has a storage retrieval machine (SRM) that performs automatic, semiautomatic, and manual "picks" and "deposits" of the bins.7 The AS/RS was assessed in 2014 as responsibilities, support, and expectations of the system shifted and previous configurations were no longer viable. Discontinued and failing equipment, unsupported server software, inconsistent training and use, and decreased local support and management were identified as impediments to greater involvement in library projects and operations. The campus provided funding in 2015 to update the server software as well as major hardware components on three of the six aisles. Divided into two phases, the server software upgrade was completed in May 2017, followed by the hardware upgrade in January 2019.8

Literature Review. The continued growth of students, faculty, and academic programs, along with evolving expectations and needs since the late 1980s, has required the library to analyze library services and examine the building's physical space and storage capacity. In the late 1980s, identifying space for increasing printed materials was the main contributing factor in implementing the AS/RS. In the mid-2010s, creating space within the library for new services was dependent on a stable and reliable AS/RS. "The conventional way of solving the space problem by adding new buildings and off-site storage facilities was untenable."9 A benefit of an AS/RS, as Creaghe and Davis predicted in 1986, was that with "the probable slow transition from books to electronic media, an AAF [automated access facility] may postpone the need for future library construction indefinitely."10 The AS/RS has enabled the library to create space by removing physical materials while enhancing customer service, material security, and inventory control.
"The role of the library as service has been evolving in lockstep with user needs. The current transformative process that takes place in academia has a powerful impact on at least two functional areas of the library: library as space and library as collection."11 In addition, the "increased security the AAF … offers will save patrons time that would be spent looking for books on the open shelves that may be in use in the library, on the waiting shelves, misplaced, or missing."12 In subsequent years, library services have evolved to include computer labs with multiple high-use printers/scanners/copiers, instructional spaces, individual and group study spaces, makerspaces, etc., in addition to campus entities that have required large amounts of physical space within the library. "It is well-known that academic libraries have storage problems. Traditional remedies for this situation—used in libraries across the nation—include off-site storage for less used volumes, as well as, more recently, innovative compact shelving. These solutions help, but each has its disadvantages, and both are far from ideal. . . . When the Eastern Michigan University library had the opportunity to move into a new building, we saw that an AS/RS system would enable us to gain open space for activities such as computer labs, training rooms, a cafe, meeting rooms, and seating for students studying."13 The AS/RS provides all the space advantages of off-site storage and compact shelving while adding much more value, mitigating the time delays of off-site storage and the confusion of accessing and using compact shelving.

Staffing & Usage, 1991–1994. Following the 80/20 principle, low-use items were initially selected for storage in the AS/RS. "When the storage policy was being developed in [the] 1990s, the 80/20 principle was firmly espoused by librarians. . . .
Thus, by moving lower-use materials to AS/RS, the library could still ensure that more than 80% of the use of the materials occurs on volumes available in the open stacks."14 Low-use items were identified if one of the following three conditions was met: (1) the item's last circulation date was more than five years ago; (2) the item was a non-circulating periodical; or (3) the item was not designed to leave an area and received little patron usage, such as the reference collection. In 1991, the AS/RS was loaded with 800,000 low-use items and went live for the first time later that year. Staffing for the initial AS/RS department consisted of one full-time AS/RS supervisor (40 hours/week), one part-time AS/RS repair technician (20 hours/week), and 40 hours a week of dedicated student employees, for a total of 100 hours a week of dedicated AS/RS management. The AS/RS was largely utilized as a specialized service for internal library operations, with limited patron-initiated requests. AS/RS operations were uniquely created and customized for each AS/RS operator as well as for the desired task to be performed. Skills were developed internally, with knowledge and training shared by word of mouth or accompanied by limited documentation.

2000–Mid-2000s. The AS/RS department functioned in this manner until the 1994 Northridge earthquake struck the campus directly and required partial reconstruction of the library building. Although there was no damage to the AS/RS itself or its surrounding structure, extensive damage occurred in the wings of the library. The damage resulted in the library building being closed and inaccessible. When the library reopened in 2000, it was determined that, due to previous low AS/RS usage, a dedicated department was no longer warranted. The AS/RS supervisor position was dissolved, the student employee budget was eliminated, and the AS/RS technician position was not replaced after the employee retired in 2008.
AS/RS operational responsibilities were consolidated into the circulation department and AS/RS administration into the systems department. Both the circulation and systems departments redefined their roles and responsibilities to include the AS/RS without additional budgetary funding, staffing, or training. In order for AS/RS operations to be absorbed by these departments, changes had to occur in the administration, operating procedures, staffing assignments, and access to the AS/RS. All five circulation staff members and twenty student employees received informal training from members of the former AS/RS department in the daily operations of the AS/RS. The circulation members also received additional training for first-tier troubleshooting of AS/RS operations such as bin alignments, emergency stops, and inventory audits. The AS/RS repair technician remained in the systems department; however, AS/RS troubleshooting responsibility was shared among the systems support specialists, and dedicated AS/RS support was lost. The administrative tasks of scheduling preventive maintenance services (PMs), resolving AS/RS hardware/equipment issues with the vendor, and maintaining the server software remained with the head of the systems department. Without a dedicated department providing oversight for the AS/RS, issues and problems began to occur frequently. Circulation had neither the training nor the resources available to master procedures or enforce quality control measures. Similarly, the systems department became increasingly removed from daily operations. Many issues were not reported at all and became viewed as system quirks that required workarounds, or were viewed as limitations of the system.
For issues that were reported, troubleshooting had to start all over again, and systems relied on circulation staff being able to replicate the issue in order to demonstrate the problem. Systems personnel retained little knowledge of performing daily operations, and troubleshooting became more complex and problematic as different operators had different levels of knowledge and skill that accompanied their unique procedures.

Mid-2000s–2015. These issues became further exacerbated when areas outside of circulation were given full access to the AS/RS in the mid-2000s. Employees from different departments of the library began entering and accessing the AS/RS area and operated the AS/RS based on knowledge and skills they learned informally. Student assistants from these other departments also began accessing the area and performing tasks on behalf of their informally trained supervisors. Further, without access control, employees as well as students ventured into the "pit" area of the AS/RS, where the SRMs move and end-of-aisle operations occur. This area contains many hazards and is unsafe without proper training. During this period, the special collections and archives (SC/A) department loaded thousands of uncataloged, high-use items into the AS/RS that required specialized service from circulation. These items were categorized as "non-Library of Congress," and inventory records were entered into the AS/RS software manually by various library employees. In addition, paper copies were created and maintained as an independent inventory by SC/A. Over the years, the SC/A paper inventory copies were found to be insufficiently labeled, misidentified, or missing. Therefore, the AS/RS software inventory database and the SC/A paper copy inventory contained conflicts that could not be reconciled. To resolve this situation, an audit of SC/A materials was completed in spring 2019 to locate inventory that was thought to be missing.
All bound journals and current periodicals were eventually loaded into the AS/RS as well, causing other departments and areas to rely on the AS/RS more heavily. Departments such as interlibrary loan and reserves, as well as patrons, began requesting materials stored in the AS/RS more routinely and frequently. The AS/RS transformed from a storage space with limited usage to an active area with simultaneous usage requests of different types throughout the day. Without a dedicated staff to organize, troubleshoot, and provide quality control, there was an abundance of errors that led to long waits for materials, interdepartmental conflicts, and unresolved errors. High-use materials from SC/A, as well as currently received periodicals from the main collection, were the catalysts that drove and eventually warranted change in the AS/RS usage model from storage to service. The inclusion of these materials created new primary customers identified as internal library departments: SC/A and interlibrary loan (ILL). With over 4,000 materials contained in the AS/RS, SC/A requires prompt service for processing archival material into the AS/RS and filling specialized patron requests for these materials. In addition, ILL processes over 500 periodical requests per month that utilize and depend on AS/RS services. The additional storage and requests created an uptick in overall AS/RS utilization that carried over into circulation desk operations as well.

2015–Present. The move from storage to service was not only inevitable due to an evolving AS/RS inventory, but was necessary in order to regain quality control and manage the library-wide projects that involved the AS/RS. The increased usage of and reliance on the AS/RS required that the system be well maintained and managed. Administration of the AS/RS remains within systems, and circulation student employees continue to provide supervised assistance to the AS/RS.
The crucial change identified and emerging within circulation was the need for a dedicated operations and project manager. An AS/RS lead position was created with responsibilities for the daily operations and management of the system and service. However, this was not a complete return to the original staffing concept of the early 1990s. The concept for this new position focuses on project management and system operations rather than the original sole attention to system operations. The AS/RS lead is the point of contact for all library projects that utilize the AS/RS, relays any AS/RS issues or concerns to systems, and oversees daily AS/RS usage. This shift was necessary due to the increased demand on and reliance on the system that has changed its charge from storage to service.

Customer Service. The library noted over time that the AS/RS could be used as a tool in weeding and other collection-shift projects to create space and aid in reorganizing materials. As more high-use materials were loaded into the AS/RS, the indirect advantages of the AS/RS became more apparent. Patrons request materials stored within the AS/RS through the library's website and pick up the materials at the circulation desk. There is no need for patrons to navigate the library, successfully use the classification system, and search shelves to locate an item that may or may not be there. As Kirsch notes, "the ability to request items electronically and pick them up within minutes eliminates the user's frustration at searching the aisles and floors of an unfamiliar library."15 The vast majority of library patrons are CSUN students who commute and must make the best use of their time while on campus. Housing items in the AS/RS creates the opportunity to have hundreds of thousands of items all picked up and returned to one central location.
This makes it far easier for library patrons, especially users with mobility challenges, to engage with a plethora of library materials. The time allotted for library research and/or enjoyment becomes more productive as their desired materials are delivered within minutes of arriving in the building. As Heinrich and Willis state, "the provision of the nimble, just-in-time collection becomes paramount, and the demand for AS/RS increases exponentially."16 AS/RS items are more readily available than shelved items on the floor, as it takes minutes to have AS/RS items returned and made available once again. "They may be lost, stolen, misshelved, or simply still on their way back to the shelves from circulation—we actually have no way of knowing where they are without a lengthy manual search process, which may take days. . . . Unlike books on the open shelves, returned storage books are immediately and easily 'reshelved' and quickly available again."17 Another advantage is that there is no need to keep materials in call-number order, with the unpleasant reality of missing and misshelved items. Items in the AS/RS are assigned bin locations that can only be accessed by an operator- or user-initiated request. The workflow required to remove a material from the AS/RS involves multiple scans and procedures that increase accountability in a way that does not exist for items stored on floor shelves. Further, users are assured of an item's availability within the system. Storing materials in the AS/RS ensures that items are always checked out when they leave the library and not sitting unaccounted for in library offices and processing areas. It also avoids patron frustration over misshelved, recently checked-out, or missing items.
Security. The decision to follow the 80/20 principle and place low-use items in the AS/RS meant high-use items remained freely available to library patrons on the open shelves of each floor. This resulted in high-use items being available for patron browsing and checkout, as well as patron misuse and theft. The sole means of securing these high-use items involved tattle-tape and installing security gates at the main entrance. Therefore, the development of policies and procedures for the enforcement of these gates was also required. Beyond the inherent cost, maintenance, and issue of ensuring items are sensitized and desensitized correctly, gate enforcement became another issue that rested upon the circulation department. Assuming theft would occur by exiting the building through the gates at the main entrance of the library, enforcement is limited in the actions that may be performed by library employees. Touching, impeding the path of, following, detaining, and searching library patrons are restricted actions reserved for campus authorities such as the police, not library employees. Rather than attempting to enforce a security mechanism over which we have no authority, the AS/RS provides an alternative for the security of high-use and valuable materials. Storing items in the AS/RS eliminates the possibility of theft or damage by visitors and places control and accountability over the internal use of materials. "There would be far fewer instances of mutilation and fewer missing items."18 Further, access to the AS/RS area was restricted from all library personnel to only circulation and systems employees, with limited exceptions. Individual logins also provide a method of control and accountability, as each operator is required to use a personal account rather than a departmental account to perform actions on the AS/RS. Materials stored in the AS/RS are, "more significantly . . .
safer from theft and vandalism."19

Inventory. Conducting a full inventory of a library collection is time consuming, expensive, and often inaccurate by the time of completion. Missing or lost items, shelf-reading projects, in-process items, etc., create overhead for library employees and generate frustration for patrons searching for an item. Massive, library-wide projects such as collection shifts and weeding are common endeavors undertaken to create space, remove outdated materials, and improve collection efficiency. However, actions taken on an open-shelves collection are time consuming, costly, and inefficient, and they affect patron activities. These projects typically involve months of work across multiple departments. Items stored within the AS/RS do not experience these challenges because the system is managed by a full-time employee throughout the year and not on a project basis. The system is capable of performing inventory audits and does not affect public services. Therefore, while the cost of an item on an open shelf is $0.079, the cost of storing the same item in the AS/RS is $0.02.20 Routine and spot audits ensure an accurate inventory, confirm the capacity level of the system, and establish best management of the bins. AS/RS inventory audits are highly accurate and much more efficient than shelf reading, with little impact on patron services. "While this takes some staff time, it is far less time-consuming than shelf reading or searching for misshelved books."21 Storing materials in the AS/RS is more efficient than on open shelves; however, bin management is essential in ensuring bins are configured in the best arrangement to achieve optimal efficiency. The size and configuration of bins directly affect storage capacity. The type of storage, random or dedicated, also influences capacity, efficiency, and accessibility of items.
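The random-versus-dedicated distinction can be illustrated with a small assignment sketch. This is hypothetical code, not the library's AS/RS software: the bin labels, collection names, and the `assign_bins` helper are all invented for demonstration. Dedicated storage pins each collection to a fixed bin, so the most-used collections can be given bins that staff could still reach if a retrieval machine were out of service.

```python
# Illustrative sketch only: bin labels and collection names are hypothetical,
# not drawn from the actual CSUN AS/RS configuration.
REACHABLE_BINS = ["A1-01", "A1-02", "A2-01"]      # end-of-aisle, manually pullable
DEEP_BINS = ["A1-40", "A2-40", "A3-40", "A4-40"]  # retrievable only by an SRM

def assign_bins(collections_by_use):
    """Map collection names to fixed bins, highest-use collections first
    (i.e., dedicated rather than random storage)."""
    ranked = sorted(collections_by_use, key=lambda pair: -pair[1])
    bins = REACHABLE_BINS + DEEP_BINS
    return {name: bin_id for (name, _uses), bin_id in zip(ranked, bins)}

plan = assign_bins([("reference", 120), ("SC/A archives", 400), ("bound journals", 90)])
print(plan["SC/A archives"])  # the highest-use collection lands in a reachable bin
```

Random storage, by contrast, would place each incoming item in whatever bin has space, maximizing density at the cost of predictability, which is the trade-off the passage above describes.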
The 13,260 steel bins in the AS/RS range in height from 6 to 18 inches. The most commonly used bins are the 10- and 12-inch bins; however, there is a finite number of each bin height. Unfortunately, the smallest and largest bins are rarely used due to material sizes and weight capacity; therefore, optimal AS/RS capacity is unattainable, and the number of materials eligible for loading is limited by the number of bins available. The library also determined that dedicated, rather than random, bin storage aided in locating specialized materials, reduced loading and retrieval errors, and enhanced accessibility by arranging highly used bins in reachable locations. In the event an SRM breaks down and an aisle becomes nonfunctional for retrieving bins, strategically placing the highest-used and specialized locations in bins that can be manually pulled is a proactive strategy. However, this requires dedicated bins with an accurate and known inventory that has been arranged in accessible locations.

Lessons Learned

Disasters & Security. In 1994, the AS/RS proved to provide a much more stable and secure environment than the open stacks when it successfully endured a 6.9 earthquake. The reshelving of more than 300,000 items required a crew of more than thirty personnel over a year to complete. Many items were destroyed by the impact of falling to the floor and being buried underneath hundreds of other items. The AS/RS, in contrast, held over 800,000 items and successfully sustained the brunt of the earthquake's impact with no damage to any of the stored items. Unfortunately, the materials that had been loaded into the AS/RS in 1991 were low-use items that were viewed as one step from weeding.
Therefore, high-use items stored on open shelves were damaged and required the long process of recovery and reconstruction: identifying and cataloging damaged and undamaged materials, disposal of those damaged, renovation of the area, and purchase of new items. The low-use items stored in the AS/RS, by contrast, required only that a few bins that had slightly shifted be pushed back fully into their slots. AS/RS items have proven to be more secure from misplacement, theft, and physical damage from earthquakes as compared to items on open shelves.

Maintenance, Support, and Modernization. The CSUN Oviatt Library has received two major updates to the AS/RS since it was installed in 1991. In 2011, the AS/RS received updates for communication and positioning components. The second major update occurred in two phases between 2016 and 2018 and focused on software and equipment. In phase one, server and client-side software was updated from the original software created in 1989. In phase two, half the SRMs received new motors, drives, and controllers. Due to the many years of reliance on preventive maintenance (PM) visits and avoidance of modernization, our vendors were unable to provide support for the AS/RS software and had difficulty locating equipment that had become obsolete. Preventive maintenance visits were used to maintain the status quo and are not a long-term strategy for maintaining a large investment and critical component of business operations. Creaghe and Davis note that "current industrial facility managers report that with a proper AAF [automated access facility] maintenance program, it is realistic to expect the system to be up 95–98 percent of the time."22 PM service is essential for long-term AS/RS success; however, preventive maintenance alone is incapable of modernizing a system and ensuring equipment and software do not become obsolete. Maintenance is not the same as support; rather, maintenance is one aspect of support.
support includes points of contact who are available for troubleshooting, spare supplies on hand for quick repairs, a life-cycle strategy for major components, and long-term planning and budgeting. kirsch attested the following, describing eastern michigan university’s strategy: “although the dean is proud and excited about this technology, he acknowledges that just like any computerized technology, when it’s down, it’s down. to avoid system problems, emu bought a twenty-year supply of major spare parts and employs the equivalent of one-and-a-half full-time workers to care for its automated storage and retrieval system.”23 a system that relies solely on preventive maintenance will quickly become obsolete and require large and expensive projects in the future if the system is to continue functioning. further, modernization provides an avenue for new features and functions to be realized that increase functionality and efficiency.

networking
the csun oviatt library on average receives three to four visits a year along with multiple emails and phone conversations requesting information from different libraries regarding the as/rs. these conversations aid the library by offering different perspectives on the as/rs and force the library to review current practices.

information technology and libraries | december 2019 122

the library has learned through speaking with many different libraries that the needs, design, and configuration of an as/rs can be as unique as the libraries inquiring. the csun oviatt library, for example, is much different than the three other csu system libraries that have an as/rs. because our system is outdated, it has been difficult to form or establish meaningful groups or share information, since the systems are all different from each other. as more conversations occur and systems become more modern and standard, there is potential for knowledge sharing as well as group lobbying efforts for features and pricing.
buy in
user confidence in any system is required in order for that system to be successful. convincing a user base to move materials from readily available open shelves into steel bins housed within inaccessible 40-foot-high aisles will be difficult if the system is consistently down. therefore, the better the as/rs is managed and supported, the more reliable and dependable that system will be, and the more likely user confidence will grow. informing stakeholders of long-term planning and welcoming feedback demonstrates that the system is being supported and managed with an ongoing strategy that is part of future library operations. similarly, administrators need confirmation that large investments and mission-critical services are stable, reliable, and efficient. creating a new line item in the budget for as/rs support and equipment life-cycle requires justification along with a firm understanding of the system. in addition, staffing and organizational responsibilities must also be reviewed in order to establish an environment that is successful and efficient. continuous assessments of the as/rs regarding downtime, projects involved, services and efficiencies provided, etc., aid in illustrating the importance and impact of the system on library operations as a whole.

recording usage and statistics
unfortunately, usage statistics were not recorded for the as/rs prior to june 2017. therefore, data is unavailable to analyze previous system usage, maintenance, downtime, or project involvement. data-driven decisions require the collection of statistics for system analysis and assessment. following the server software and hardware updates, efforts have been taken to record project statistics, inventory audits, and srm faults, as well as public and internal paging requests.
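the kind of record keeping described above can be prototyped in a few lines; the event types and fields below are illustrative assumptions, not the library's actual schema:

```python
from collections import Counter
from datetime import date

# hypothetical event log of as/rs activity; field names are illustrative
events = [
    {"day": date(2019, 6, 3), "type": "public_paging_request"},
    {"day": date(2019, 6, 3), "type": "internal_paging_request"},
    {"day": date(2019, 6, 4), "type": "srm_fault", "srm": 3},
    {"day": date(2019, 6, 4), "type": "public_paging_request"},
]

# aggregate counts by event type for periodic assessment reports
totals = Counter(e["type"] for e in events)
print(totals["public_paging_request"])  # 2
```

even a flat log like this is enough to answer the assessment questions raised above (downtime, fault rates, paging volume) once it is collected consistently.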
conclusion
the as/rs remains, as heinrich and willis described it, “a time-tested innovation.”24 through lessons learned and objective assessment, the library is positioning the as/rs to be a critical component for future development and strategy. by expanding the role of the as/rs to include functions beyond low-use storage, the library discovered efficiencies in material security, customer service, inventory accountability, and strategic planning. the csun oviatt library has learned, experienced, and adjusted its perception, treatment, and usage of the as/rs over the past thirty years. factors such as access to the area, staffing, and inventory auditing are easily overlooked, while other potential functions such as material security and customer services may not be identified without ongoing analysis and assessment. critical review, without a limited or biased perception, has enabled the library to realize the greater functionality the as/rs is able to provide.

notes
1 shira atkinson and kirsten lee, “design and implementation of a study room reservation system: lessons from a pilot program using google calendar,” college & research libraries 79, no. 7 (2018): 916–30, https://doi.org/10.5860/crl.79.7.916.
2 helen heinrich and eric willis, “automated storage and retrieval system: a time-tested innovation,” library management 35, no. 6/7 (august 5, 2014): 444-53, https://doi.org/10.1108/lm-09-2013-0086.
3 atkinson and lee, “design and implementation of a study room reservation system,” 916–30.
4 “about csun,” california state university, northridge, february 2, 2019, https://www.csun.edu/about-csun.
5 “colleges,” california state university, northridge, may 8, 2019, https://www.csun.edu/academic-affairs/colleges.
6 estimated as/rs capacity was calculated by determining the average size and weight of an item for each size of bin along with the most common bin layout. the average item was then used to determine how many could be stored along the width and length (and, if appropriate, height) of the bin, and the counts were then multiplied. many factors affect the overall capacity, including bin layout (with or without dividers), stored item type (book, box, records, etc.), weight of the items, and operator determination of full, partial, or empty bin designation. the as/rs mini-loaders have a weight limit of 450 pounds, including the weight of the bin.
7 “automated storage and retrieval system (as/rs),” csun oviatt library, https://library.csun.edu/about/asrs.
8 “automated storage and retrieval system (as/rs),” csun oviatt library, https://library.csun.edu/about/asrs.
9 heinrich and willis, “automated storage and retrieval system,” 444-53.
10 norma s. creaghe and douglas a. davis, “hard copy in transition: an automated storage and retrieval facility for low-use library materials,” college & research libraries 47, no. 5 (september 1986): 495-99, https://doi.org/10.5860/crl_47_05_495.
11 heinrich and willis, “automated storage and retrieval system,” 444-53.
12 creaghe and davis, “hard copy in transition,” 495-99.
13 linda shirato, sarah cogan, and sandra yee, “the impact of an automated storage and retrieval system on public services,” reference services review 29, no. 3 (september 2001): 253-61, https://doi.org/10.1108/eum0000000006545.
14 heinrich and willis, “automated storage and retrieval system,” 444-53.
15 sarah e.
kirsch, “automated storage and retrieval—the next generation: how northridge’s success is spurring a revolution in library storage and circulation,” paper presented at the acrl 9th national conference, detroit, michigan, april 8-11, 1999, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/pdf/kirsch99.pdf.
16 heinrich and willis, “automated storage and retrieval system,” 444-53.
17 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61.
18 kirsch, “automated storage and retrieval.”
19 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61.
20 cost of material management was calculated by removing building operational costs (lighting, hvac, carpet, accessibility/open hours, etc.) and focusing on the management of the material instead. the management cost of materials (or unit cost) is determined by dividing the total amount of fixed and variable costs by the total number of units: dividing the $31,500 annual shelving student budget by 400,000 items equals $0.079 per material per year in open shelves; dividing the $18,000 annual as/rs student budget by 900,000 items equals $0.02 per material per year in the as/rs.
21 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61.
22 creaghe and davis, “hard copy in transition,” 495-99.
23 kirsch, “automated storage and retrieval.”
24 heinrich and willis, “automated storage and retrieval system,” 444-53.
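the unit-cost comparison in note 20 is a straightforward division, which can be checked directly; a minimal sketch using the figures given in the note:

```python
def unit_cost(annual_budget, items):
    # unit cost = total fixed and variable costs / total number of units
    return annual_budget / items

open_shelves = unit_cost(31_500, 400_000)  # approx. $0.079 per material per year
asrs = unit_cost(18_000, 900_000)          # $0.02 per material per year
```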
a systematic approach towards web preservation
muzammil khan and arif ur rahman
information technology and libraries | march 2019 71

muzammil khan (muzammilkhan86@gmail.com) is assistant professor, department of computer and software technology, university of swat. arif ur rahman (badwanpk@gmail.com) is assistant professor, department of computer science, bahria university islamabad.

abstract
the main purpose of the article is to divide the web preservation process into small explicable stages and design a step-by-step web preservation process that leads to creating a well-organized web archive. a number of research articles about web preservation projects and web archives were studied, and a step-by-step systematic approach for web preservation was designed. the proposed comprehensive web preservation process describes and combines the strengths of different techniques observed during the study for preserving digital web contents in a digital web archive. for each web preservation step, different approaches and possible implementation techniques have been identified that can be adopted in digital archiving. the potential value of the proposed model is to guide archivists, related personnel, and organizations to effectively preserve their intellectual digital contents for future use. moreover, the model can help to initiate a web preservation process and create a well-organized web archive to efficiently manage the archived web contents.
a section briefly describes the implementation of the proposed approach in a digital news stories preservation framework for archiving news published online from different sources.

introduction
the amount of information generated by institutions is increasing with the passage of time. one of the mediums that uses this information is the world wide web (www). the www has become a tool to share information quickly with everyone regardless of their physical location. the number of web pages is vast: google and bing each index approximately 4.8 billion.1 though the www is a rapidly growing source of information, it is fragile in nature. according to the available statistics, 80 percent of pages become unavailable after one year, and 13 percent of links (mostly web references) in scholarly articles are broken after 27 months.2 moreover, 11 percent of posts and comments on websites for various purposes are lost within a year. according to another study, conducted on 10 million web pages collected from the internet archive in 2001, the average survival time of web pages is 1,132.1 days with a standard deviation of 903.5 days; 90.6 percent of those web pages are inaccessible today.3 this fragility causes valuable scholarly, cultural, and scientific information to vanish and become inaccessible to future generations. in recent years, it was realized that the lifespan of digital objects is very short, and rapid technological changes make it more difficult to access these objects. therefore, there is a need to preserve the information available on the www.
digital preservation is performed using the primary methods of emulation and migration, in which emulation provides the preserved digital objects in their original format while migration provides objects in a different format.4

systematic approach towards web preservation | khan and ur rahman 72
https://doi.org/10.6017/ital.v38i1.10181

in the last two decades, a number of institutions worldwide, such as national and international libraries, universities, and companies, started to preserve their web resources (resources found at a web server, i.e., web contents and web structure). the first web archive, the internet archive, was initiated in 1996 by brewster kahle; it holds more than 30 petabytes of data, which includes 279 billion web pages, 11 million books and texts, and 8 million other digital objects such as audio, video, and image files. more than seventy web archive initiatives have been started in 33 countries since 1996, which shows the importance of web preservation projects and the preservation of web contents. this information era encourages librarians, archivists, and researchers to preserve the information available online for upcoming generations. while digital resources may not replace the information available in physical form, the digital version of these information resources improves access to the available information.5 there are different aspects of the preservation process and web archiving, e.g., digital objects’ ingestion into the archive during the preservation process, digital objects’ format and storage, archival management, administrative issues, access and security of the archive, and preservation planning. these aspects need to be understood for effective web preservation and will help in addressing the challenges that occur during the preservation process. the reference model for open archival information systems (oais) is an attempt to provide a high-level framework for the development and comparison of digital archives.
in web preservation, a challenging task is to identify the starting point of the preservation process and to complete the process effectively, which makes it possible to proceed to the other activities. moreover, the complicated nature of the web and the complex structure of web contents make the preservation of web content even more difficult. the oais reference model helps in achieving the goals of a preservation task in a step-by-step manner. the stakeholders are identified, i.e., producer, management, and consumer, and the packages that need to be processed, i.e., the submission information package (sip), archival information package (aip), and dissemination information package (dip), are clearly defined.6 this study aims to design a step-by-step systematic approach for web preservation that helps in understanding the challenges of preservation or archival activities, especially those that relate to digital information objects at various steps of the preservation process. the systematic approach may lead to an easy way to analyze, design, implement, and evaluate the archive with clarity and with different options for an effective preservation process and archival development. an effective preservation process is one that leads to a well-organized, easily managed web archive and accomplishes designated community requirements. this approach may help to address the challenges and risks that confront archivists and analysts during preservation activities.

step-by-step systematic approach
digital preservation is “the set of processes and activities that ensure long-term, sustained storage of, access to and interpretation of digital information.”7 the growth and decline rates of www content and the importance of the information presented on the web make it a key candidate for preservation. web preservation confronts a number of challenges due to the web’s complex structure, the variety of available formats, and the type of information (purpose) it provides.
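the oais package flow sketched above (a sip from the producer is ingested into an aip, and a dip is derived for the consumer) can be illustrated schematically; the class and field names below are illustrative only, not part of the oais standard's formal data model:

```python
from dataclasses import dataclass

@dataclass
class Package:
    contents: str   # the web resources being preserved
    metadata: dict  # descriptive and preservation metadata

def ingest(sip: Package) -> Package:
    """the archive turns a producer's sip into an aip by adding
    preservation metadata (fixity value is a placeholder)."""
    meta = dict(sip.metadata, package_type="aip", fixity="<sha-256 digest>")
    return Package(sip.contents, meta)

def disseminate(aip: Package) -> Package:
    """on a consumer request, the archive derives a dip from the aip."""
    return Package(aip.contents, dict(aip.metadata, package_type="dip"))

sip = Package("news-story.html", {"package_type": "sip", "producer": "crawler"})
dip = disseminate(ingest(sip))
print(dip.metadata["package_type"])  # dip
```

the point of the sketch is the separation of concerns: what the producer submits, what the archive stores, and what the consumer receives are three distinct packages derived from one another.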
the overall layout of the web varies from domain to domain based on the type of information and its presentation. websites can be categorized based on two things: first, the type of information (i.e., the web contents), and second, the way this information is presented (i.e., the layout or structure of the web page). examples include educational, personal, news, e-commerce, and social networking websites, which vary a lot in their contents and structure. the variations in the overall layout make it difficult to preserve different web contents in a single web archive. the web preservation activities are summarized in figure 1. the following sections explain the web preservation activities and their possible implementation in the proposed systematic approach.

defining the scope of the web archive
the www provides an opportunity to share information using various services, such as blogs, social networking websites, e-commerce, wikis, and e-libraries. these websites provide information on a variety of topics and address different communities based on their interests and needs. there are many differences in the way information is handled and presented on the www. in addition, the overall layout of the web changes from one domain to another.8 therefore, it is not practically feasible to develop a single system to preserve all types of websites for the long term. so, before starting to preserve the web, the archivist should define the scope of the web to be archived. the archive will be either a site-centric, topic-centric, or domain-centric archive.9

site-centric archive
a site-centric archive focuses on a particular website for preservation. these types of archives are mostly initiated by the website creator or owner. site-centric web archives allow access to old versions of the website.
topic-centric archive
topic-centric archives are created to preserve information on a particular topic published on the web for future use. for scientific verification, researchers need to refer to the available information, while it is difficult to ensure access to these contents due to the ephemeral nature of the web. a number of topic-centric archive projects have been carried out, including the archipol archive of dutch political websites,10 the digital archive for chinese studies (dachs),11 minerva by the library of congress,12 and the french elections web archive for archiving websites related to the french elections.13

domain-centric archive
the word “domain” refers to a location, network, or web extension. a domain-centric archive covers websites published under a specific dns domain name, using either a top-level domain (tld), e.g., .com, .edu, or .org, or a second-level domain (sld), e.g., .edu.pk or .edu.fr. an advantage of domain-centric archiving is that the archive can be created by automatically detecting specific websites. several projects have a domain-centric scope, e.g., the portuguese web archive (pwa) of national websites,14 the kulturarw, a swedish royal library web archive collection of .se and .com domain websites,15 and the uk government web archive collection of uk government websites, e.g., .gov.uk domain websites.

understanding the web structure
after defining the scope of the intended web archive, the archivist will have a better understanding of the interests and expected queries of the intended community based on the resources available or the information provided by the selected domain. the focus in this step is to understand the type of information (contents) provided by the selected domain and how the information has been presented. the web can be understood along two dimensions.

figure 1. systematic approach for web preservation process.

the first dimension considers the web as a medium that communicates contents using various protocols, e.g., http, and the second considers the web as a content container, which presents the contents to the viewers and is not simply contents, e.g., the underlying technology used to display the contents.16 the preservation team should understand such parameters as the technical issues, the future technologies, and the expected inclusion of other related content.

identify the web resources
the archivist should understand the contents and the representation of the contents of the selected domain, e.g., blogs, social networking websites, institutional websites, educational institutional websites, newspaper websites, or entertainment websites. all of these websites provide different information and address individual communities that have distinct information needs. a web page is the combination of two things, i.e., web contents and web structure.17 the resources which can be preserved are as follows.

web contents
web contents or web information can be categorized into the following categories:
• textual contents (plain text): this category describes textual information that appears on a web page. it does not include links, behaviors, and presentation stylesheets.
• visual contents (images): these contents are the visual forms of information or are complementary material to the information provided in textual form.
• multimedia contents: as another form of information, multimedia contents mainly include audio and video. they may also include animation or even text as part of a video or a combination of text, audio, and video.

web structure
web structure can be categorized into the following categories:
• appearance (web layout or presentation): this category indicates the overall layout or presentation of a web page.
the look and feel of a web page (the representation of the contents) is important and is maintained with different technologies, e.g., html or stylesheets.
• behavior (code navigations): characterized by link navigations, these can be links within a website or to other websites, external document links, or dynamic and animated features, such as live feeds, comments, tagging, or bookmarking.

identify designated community
the archivist should identify the designated community of the intended web archive and carefully analyze their functional requirements and expected queries. the designated community means the potential users who may access the archived web contents for different purposes, e.g., accessing old information that is not available in normal circumstances, referring to an old news article that was not bookmarked properly, or retrieving relevant news articles published long ago.

prioritize the web resources
after a comprehensive assessment of the resources of the selected domain and the identification of potential users’ requirements and expected queries, the archivist should prioritize the web resources. the complexity of web resources and their representation causes complications in the digital preservation process. generally, it may be undesirable or unviable to preserve all web resources; therefore, it is worthwhile to designate the web resources for preservation. priority should be assigned on the basis of two things: first, the potential reuse of the resource, and second, the frequency with which the resource will be accessed. resources with no value, little value, or those managed elsewhere can be excluded.
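the content categories described above (textual, visual, multimedia, and structural resources) can be operationalized crudely by mapping file extensions to categories; the mapping below is an illustrative assumption, not a standard:

```python
import os

# illustrative extension-to-category map for harvested web resources
CATEGORIES = {
    "textual":    {".html", ".htm", ".txt", ".xml"},
    "visual":     {".png", ".jpg", ".jpeg", ".gif", ".svg"},
    "multimedia": {".mp3", ".mp4", ".avi", ".webm"},
}

def categorize(resource_url: str) -> str:
    """assign a harvested resource to a content category by extension."""
    ext = os.path.splitext(resource_url)[1].lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "structure"  # stylesheets, scripts, and other layout assets

print(categorize("https://example.org/story.html"))  # textual
```

in practice an archive would inspect the http content-type header rather than the extension, but the categorization step itself is the same.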
for prioritization of resources, the moscow method can be applied.18 the acronym moscow can be elaborated as:
m (must have): the resource must be preserved or must be a part of the archive. for example, in the digital news story archive (dnsa), the textual news story must be preserved in the archive because the preservation emphasis is on the textual news story.19 online news contains textual news stories, many news stories contain associated images, and a fraction of news stories contain associated audio-video contents.
s (should have): the resource should be preserved if at all possible. almost all news stories have associated images; a few news stories have associated audio and video that complement them and should be preserved as part of the news story in the web archive.
c (could have): the resource could be preserved if it does not affect anything else or is nice to have. the web structure in dnsa depends on the resources to be used for the preservation of news stories; the layout of the newspaper website could be part of the preservation process if it does not affect anything else, e.g., storage capacity and system efficiency.
w (won’t have): the resource would not be included. archiving multiple versions of the layout or structure of the online newspaper is not worthwhile, and hence they would not be preserved.
the prioritization of these resources is very important in the context of web preservation planning because it avoids wasting time and energy and is the best way to handle users’ requirements and fulfill their expected queries.

how to capture the resource(s)
the selection of a feasible capturing technique depends on two things: first, the resources to be captured, and second, the frequency of the capturing task. there are three web resource capturing techniques, i.e., capture by browser, by web crawler, and by authoring system.
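the moscow scheme, as applied to the dnsa resource types above, might be encoded like this; the priority mapping follows the paper's example, while the code structure and names are assumptions:

```python
# moscow priorities for dnsa web resources, following the example above
MOSCOW = {
    "textual_news_story":  "M",  # must have
    "images":              "S",  # should have
    "audio_video":         "S",  # should have
    "site_layout":         "C",  # could have, if storage/efficiency allow
    "old_layout_versions": "W",  # won't have
}

def to_preserve(priorities, include_could=False):
    """select resources to archive: always m and s, optionally c, never w."""
    keep = {"M", "S"} | ({"C"} if include_could else set())
    return [r for r, p in priorities.items() if p in keep]

print(to_preserve(MOSCOW))  # ['textual_news_story', 'images', 'audio_video']
```

the `include_could` switch mirrors the conditional nature of the c category: "could have" items enter the archive only when capacity and efficiency permit.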
each capturing technique has associated advantages and disadvantages.7

web capturing using browsers
the intended web content can be captured using browsers after a web page is rendered, when the http transaction occurs. this technique is also referred to as a snapshot or post-rendering technique. the method captures those things which are visible to the users; the behavior and other attributes remain invisible. capturing only static contents is one of the disadvantages of the browser approach; this approach generally preserves contents in the form of images. it is best for well-organized websites, and commercial tools are available for capturing the web. the following are well-known tools for capturing the web using browsers.

webcapture (https://web-capture.net/) is a free online web-capturing service. it is a fast web page snapshot tool that can grab web pages in seven different formats, i.e., jpeg, tiff, png, and bmp image formats, and pdf, svg, and postscript files of high quality. it also allows downloading the intended format in a zip file and is suitable for long vertical web pages with no distortion in layout.

a.nnotate (http://a.nnotate.com/) is an online annotating web snapshot tool to keep track of information gathered from the web efficiently and easily. it allows adding tags and notes to the snapshot and building a personal index of web pages as a document index. the annotation feature can be used for multiple purposes, for example, compiling an annotated library of objects for an organization, sharing commented web pages, product comparison, etc.

snagit (https://www.techsmith.com/screen-capture.html) is a well-known snapshot tool for capturing screens with built-in advanced image editing features and screen recording. snagit is a commercial and advanced screen capture tool that can capture web pages with images, linked files, source code, and the url of the web page.
acrobat webcapture (file > create > pdf from web page...) creates a tagged pdf file from the web page that a user visits, while the adobe pdf toolbar is used for the entire website.20

the capture-by-browser technique has the following advantages:
• the archivist can capture only the displayed contents, which is an advantage if you need to preserve the displayed contents only.
• it is a relatively simple technique for well-organized websites.
• commercial tools exist for web capturing using browsers.
in addition, the disadvantages are the following:
• capturing displayed contents only is a disadvantage if the focus is not only on displayed contents.
• it results in frozen contents and treats contents as if they were publications.
• it loses the web structure, such as the appearance, behavior, and other attributes of the web page.

web capturing using an authoring system/server
the authoring system capturing technique is used for web harvesting directly from the website hosting server. all the contents, e.g., textual information, images, and source code, are collected from the source web server. the authoring system allows the archivist to preserve different versions of the website. the authoring system depends on the infrastructure of the content management system and is not a good choice for external resources. the system is best for an owned web server and works well for limited internal purposes. the web curator tool (http://webcurator.sourceforge.net/), pandas (an old british library harvesting tool), and netarchivesuite (https://sbforge.org/display/nas/netarchivesuite) are known tools used for planning and scheduling web harvesting. they can be used by non-technical personnel for both the selection and harvesting of web content according to selection policies. these web archiving tools were developed in a collaboration of the national library of new zealand and the british library and are used for the uk web archive (http://www.ariadne.ac.uk/issue50/beresford/).
the tools can interface with web crawlers, such as heritrix (https://sourceforge.net/projects/archivecrawler/). authoring systems are also referred to as workflow systems or curatorial tools.

the authoring system has the following advantages:
• it is best for web harvesting, as it captures everything available.
• it is easy to perform if you have proper access permission or you own the server or system from which the resources are captured.
• it works for short- to medium-term resources and is feasible for internal access within organizations.
the disadvantages of web capturing using the authoring system are:
• it captures all available raw information, not only presentations.
• it may be too reliant on the authoring infrastructure or the content management system.
• it is not feasible for long-term resources or for external access from outside the organization.

web capturing using web crawlers
web crawlers are perhaps the most widely used technique for capturing web contents in a systematic and automated manner.21 crawler development needs expertise and experience with different tools, i.e., the positives and negatives of technologies and the viability of a tool in a specific scenario. the main advantage of crawlers is that they extract embedded content. heritrix, httrack, wget, and deeparc are common examples of web crawlers.

heritrix (https://github.com/internetarchive/heritrix3/wiki) is an open-source, freely available web crawler developed in java by the internet archive. heritrix is one of the most widely used extensible, web-scale web crawlers in web preservation projects. initially, heritrix was developed for purpose-specific crawling of particular websites; it is now a resourceful, customizable web crawler for archiving the web.

httrack (https://www.httrack.com/) is a freely available configurable browser utility.
httrack crawls html, images, and other files from a server to a local directory and allows offline viewing of the website. the httrack crawler downloads a complete website from the web server to a local computer system and makes it available for offline viewing with all its related link structure, so that it seems as if the user is browsing it online. it also updates archived websites at the local system from the server and resumes interrupted extractions. httrack is available for both windows and linux/unix operating systems.

wget (http://www.gnu.org/software/wget/) is a freely available non-interactive command line tool that can easily be configured with other technologies and different scripts. it can capture files from the web using the widely used ftp, ftps, http, and https protocols, and supports cookies as well. it also updates archived websites and resumes interrupted extractions. wget is available for both microsoft windows and unix operating systems.

the advantages of web crawling:
• it is a widely used capturing technique.
• it can capture specific content or everything.
• it avoids some of the access issues, such as link rewriting and embedding external content from an archive or the live web.
disadvantages associated with web crawling:
• much work is required, as well as tools or development expertise and experience.
• the web crawler may not have the right scope: sometimes it does not capture everything that it should, and sometimes it captures too much content.

web content selection policy
in the previous steps, the web resources were identified and prioritized based on the requirements and expected queries of the designated community, and a feasible capturing technique was identified based on the capturing frequency. now, the contents need to be prepared and filtered for selection, and a feasible selection approach needs to be chosen based on the contents.
A web content selection policy helps to determine and clarify which web contents need to be captured, based on the priorities, purpose, and scope of the web contents already defined.22 The selection policy decision comprises a description of the context, the intended users, the access mechanisms, and the expected uses of the archive. The selection policy may comprise the selection process and the selection approach.

The selection process can be divided into subtasks which, in combination, provide a reasonably qualitative selection of web contents: preparation, discovery, and filtering, as shown in figure 2. The main objective of the preparation phase is to determine the targeted information space, the capturing technique and tools, the extension categorization, the granularity level, and the frequency of the archiving activity. The people best placed to help with preparation are domain experts, regardless of the scope of the web archive; the domain experts may be archivists, researchers, librarians, or any other authoritative reference, i.e., a document or a research article. The tools defined in the preparation phase help to discover the intended information in the discovery phase, which can be divided into the following four categories:
1. Hubs may be global or topical directories, collections of sites, or even a single web page with essential links related to a particular subject or topic.
2. Search engines can facilitate discovery through a precise query or a set of alternative queries related to a topic. Specialized search engines can significantly improve the results of discovering related information.
3. Crawlers can be used to extract web contents such as textual information, images, audio, video, and links. Moreover, the overall layout of a web page or of a whole website can also be extracted in a well-defined, systematic manner.
4.
External sources may be non-web sources of any kind, such as printed material or mailing lists, which can be monitored by the selection team.

The main objective of the discovery phase is to determine the sources of information to be stored in the archive. This determination can be achieved in two ways, corresponding to two discovery methods: exogenous and endogenous. In the first, a manually created entry point list determines the entry points (usually links) for crawling; the collection is selected manually and the list is updated during the crawl. Exogenous discovery is used in manual selection and relies mostly on exploiting an entry point list of hubs, search engines, and non-web documents. In the second, an automatically created entry point list determines the entry points by extracting links automatically, yielding an updated list on every crawl. Endogenous discovery is used in automatic selection and relies on link extraction using crawlers that explore the entry point list.

Figure 2. Selection process.

The main objective of the filtering phase is to optimize and condense the discovered web contents (the discovery space). Filtering is important in order to collect more specific web content and to remove unwanted or duplicated content. Usually an automatic filtering method is used for preservation; manual filtering is useful when robots or automatic tools cannot interpret the web content. The discovery and filtering phases can be combined practically or logically. Several evaluation axes can be used for the selection policy (e.g., quality, subject, genre, and publisher). The selection approach can be either automatic or manual.
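The filtering step, removing duplicated content from the discovery space, is easy to approximate in code. The sketch below deduplicates a discovered URL list by normalizing each URL (lowercasing the host, dropping fragments and default ports) before comparing; the normalization rules and example URLs are illustrative assumptions, not a standard.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Reduce a URL to a canonical form for duplicate detection."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Drop the default port, which does not change the resource.
    if parts.scheme == "http" and host.endswith(":80"):
        host = host[:-3]
    path = parts.path or "/"
    # Fragments address positions inside a page, not distinct pages.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

def filter_discovery_space(urls):
    """Keep the first occurrence of each normalized URL."""
    seen, kept = set(), []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept

discovered = [
    "http://Example.edu/a",
    "http://example.edu:80/a#section2",   # duplicate of the first
    "http://example.edu/b",
]
print(filter_discovery_space(discovered))
```

Real archives also deduplicate by content hash, since the same page is often reachable under several distinct URLs.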
Manual content selection is very rare because it is labor intensive: it requires automatic tools to find the content and then manual review of that collection to identify the subset that should be captured. Automatic selection policies are used frequently in web preservation projects for web collection, especially for web archives.23 The choice of collection approach depends on the frequency with which the web content is to be preserved in the archive. There are four different selection approaches for web content collection.

Unselective approach
The unselective approach implies collecting everything possible: the whole website and its related domains and subdomains are downloaded to the archive. It is also referred to as automatic harvesting or selection, bulk selection, or domain selection.24 This approach is used where a web crawler usually performs the collection, for example, collecting all websites from a domain (i.e., .edu, meaning all educational institution websites, at the domain level) or collecting all possible contents/pages from a website by extracting the embedded links (harvesting at the website level). A section of the data preservation community believes that it is technically a relatively cheap and quick collection approach that yields a comprehensive picture of the web as a whole. Its significant drawbacks, in contrast, are that it generates a huge amount of unsorted, duplicated, and potentially useless data, consuming too many resources.
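Domain-level harvesting, as in the .edu example, reduces in code to a predicate over the hostname. A minimal sketch, with an invented seed list, selecting everything under the .edu domain:

```python
from urllib.parse import urlsplit

def in_domain(url, suffix):
    """True if the URL's hostname falls inside the given domain suffix."""
    host = urlsplit(url).hostname or ""
    return host == suffix or host.endswith("." + suffix)

seeds = [
    "http://cs.example.edu/",        # invented seed URLs
    "http://www.example.com/",
    "http://library.example.edu/news",
]
# Unselective harvest at domain level: keep every .edu host.
harvest = [u for u in seeds if in_domain(u, "edu")]
print(harvest)
```

The same predicate, applied inside the crawl loop rather than to a fixed seed list, is what keeps an unselective domain crawl from wandering off into other domains.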
The Swedish Royal Library's project Kulturarw3, which harvests websites at the domain level, i.e., collecting websites from the .se domain (websites physically located in Sweden), was one of the first projects to adopt this approach.25 National web archive initiatives usually adopt the unselective approach, most notably NEDLIB, a Helsinki University Library harvester, and AOLA, an Austrian online archive.26

Selective approach
The selective approach was adopted by the National Library of Australia (NLA) in the PANDAS project in 1997. In this approach, a website is included for archiving based on certain predefined strategies and on the access and information provided by the archive. The Library of Congress's project MINERVA and the British Library project "Britain on the Web" are other known projects that have adopted the selective approach. According to the NLA, the selected websites are archived based on NLA guidelines after negotiation with the owners.27 The inclusion decision can be taken at one of the following levels:
• Website level: which websites should be included from a selected domain, e.g., archiving all educational websites from the top-level domain .pk.
• Web page level: which web pages should be included from a selected website, e.g., archiving the homepages of all educational websites.
• Web content level: which types of web content should be preserved, e.g., archiving all the images from the homepages of educational websites.
A selective approach is best when the number of websites to be archived is very large, or when the archiving process targets the entire WWW and needs to narrow its scope by identifying the resources in which the archivists are most interested. This approach makes implicit or explicit assumptions about the web contents that are not to be selected for preservation. It can be very helpful to initiate a pilot preservation project, which identifies: What is possible?
What can be managed? In addition, some tangible results may be obtained easily and quickly in order to broaden the scope of the project later. The selective approach may be based on predefined criteria or on an event.

A criteria-based selective approach involves selecting web resources against various predefined sets of criteria. The NLA's guidance characterizes the criteria-based selective approach as the "most narrowly defined method" and describes it as "thematic selection." Simple or complex content-selection criteria can be defined, depending on the overall goal of preservation: for example, all resources owned by an organization; all resources of one genre, e.g., all programming blogs; resources contributing to a common subject; resources addressing a specific community within an institution, e.g., students or staff; all publications belonging to an individual organization or group of organizations; or all resources that may benefit external users or an external user community, e.g., historians or alumni.

An event-based selective approach involves selecting web resources or websites related to various time-based events. The archivists may focus on websites that address important national or international events, e.g., disasters, elections, or the football World Cup. Event-based websites have two characteristics: (1) very frequent updates, and (2) content that is lost after a short time, e.g., a few weeks or months. Examples include the start and end of a term or academic year, the duration of an activity such as a research project, or the appointment or departure of a new senior official.
Deposit approach
In the deposit collection approach, the information package is submitted by the administrator or owner of the website and includes a copy of the website with the related files that can be accessed through its hyperlinks. This archival information package approach is applicable to small collections (of a few websites), or the owner of the website can initiate the preservation project, e.g., a company can initiate a project to preserve its own website. The deposit approach was adopted by the National Archives and Records Administration (NARA) for the collection of US federal agency websites in 2001 and by Die Deutsche Bibliothek (DDB, http://deposit.ddb.de/) for the collection of dissertations and some online publications. New digital initiatives are heavily dependent on administrator or owner support and provide an easy way to deposit new content into the repository; for example, in MacEwan University's institutional repository, the librarians leading the project tried to offer an easy and effective way for depositors to submit their archival contents.28

Combined approach
There are advantages and disadvantages associated with each collection approach, and the ongoing debate is about which approach is best in a given situation; for example, the deposit approach requires an inexpensive agreement with the depositors. The emphasis is on combining the automatic harvesting and selective approaches, as these two are cheaper than the other selection approaches because they require only a few staff and can cope with the technological challenges. This initiative was taken by the Bibliothèque nationale de France (BnF) in 2006.
The BnF automatically crawls information about updated web pages, stores it in an XML-based "site delta," and uses page relevance and importance, similar to how Google ranks pages, to evaluate individual pages.29 The BnF used a selective approach for the deep web (that is, web pages or websites that are behind a password or otherwise not generally accessible to search engines), referred to as the "deposit track."

Metadata identification
Cataloging is required to discover a specific item in a digital collection: an identifier or set of identifiers is required to retrieve a digital record from a digital repository or an archive. For digital documents, this catalog, registration, or identifier is referred to as metadata.30 Metadata are structured information about resources that helps describe, locate (discover or place), manage, retrieve (access), and use digital information resources. Metadata are often referred to as "data about data" or "information about information," but it may be more helpful and informative to describe them as "descriptive and technical documentation."31 Metadata can be divided into the following three categories:
1. Descriptive metadata describe a resource for discovery and identification purposes. They may consist of elements such as a document's title, author(s), abstract, and keywords.
2. Structural metadata describe how compound objects are put together, for example, how sections are ordered to form chapters.
3. Administrative metadata impart information to facilitate resource management, such as when and how a file was created, who can access it, its type, and other technical information.
Administrative metadata are classified into two types: (1) rights management metadata, which address intellectual property rights, and (2) preservation metadata, which contain the information needed to archive and preserve a resource.32

Owing to new information technologies, digital repositories, especially web-based repositories, have grown rapidly over the last two decades. This growth has prompted the digital library community to devise metadata strategies for managing the immense amount of data stored in digital libraries.33 Metadata play a vital role in the long-term preservation of digital objects, and it is important to identify the metadata that will help to retrieve a specific object from the archive after preservation. According to Duff et al., "the right metadata is the key to preserving digital objects."34 Hundreds of metadata standards have been developed over the years for different user environments, disciplines, and purposes; many of them are in their second, third, or nth edition.35 Digital preservation and archiving require metadata standards to trace digital objects and ensure access to them. Several of the common standards are briefly discussed below.

The Dublin Core Metadata Initiative (DCMI, http://dublincore.org/) was initiated at the second World Wide Web conference in 1994 and was standardized as ANSI/NISO Z39.85 in 2001 and ISO 15836 in 2003.36 The main purpose of the DCMI was to define an element set for representing web resources; initially, thirteen core elements were defined, later increased to a fifteen-element set. The elements are optional, repeatable, can appear in any order, and are expressed in XML.37

The Metadata Encoding and Transmission Standard (METS, http://www.loc.gov/standards/mets/) is an XML metadata standard intended to represent information about complex digital objects.
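A minimal Dublin Core description expressed in XML can be generated with Python's standard library. The record below is an illustrative sketch using a handful of the fifteen DC elements; the element values are invented for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC)

def dublin_core_record(fields):
    """Build a simple XML record from Dublin Core element/value pairs."""
    root = ET.Element("record")
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record([
    ("title", "A Guide to Web Preservation"),   # invented example values
    ("creator", "Example Archivist"),
    ("date", "2010"),
    ("type", "Text"),
    ("identifier", "http://example.org/guide"),
])
print(record)
```

Because every element is optional and repeatable, the same function can emit as few or as many elements as a given resource warrants, which is exactly what makes Dublin Core attractive for heterogeneous web content.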
METS elements evolved from the early Making of America II (MOA2) project; the standard emerged in 2001, supported by the Library of Congress, sponsored by the Digital Library Federation (DLF), and registered with the National Information Standards Organization (NISO) in 2004. A METS document contains seven major sections, each covering a different aspect of the metadata.38

The Metadata Object Description Schema (MODS, http://www.loc.gov/standards/mods/) was initiated by the MARC21 maintenance agency at the Library of Congress in 2002. MODS elements are richer than DCMI, simpler than the MARC21 bibliographic format, and expressed in XML.39 MODS identifies the broadest facets or features of an object and presents nineteen high-level optional elements.40

The Visual Resources Association Core (VRA Core, http://www.loc.gov/standards/vracore/) was developed in 1996, and the current version, 4.0, was released in 2007. The VRA Core is a widely used standard in art, libraries, and archives for objects such as paintings, drawings, sculpture, architecture, and photographs, as well as books and decorative and performance art.41 The VRA Core contains nineteen elements and nine sub-elements.42

Preservation Metadata Implementation Strategies (PREMIS, http://www.loc.gov/standards/premis/), developed in 2005 and sponsored by the Online Computer Library Center (OCLC) and the Research Libraries Group (RLG), includes a data dictionary and supporting information about metadata. PREMIS defines a set of five interactive core semantic units, or entities, and an XML schema for supporting digital preservation activities. It is concerned not with discovery and access but with common preservation metadata; for descriptive metadata, other standards (Dublin Core, METS, or MODS) need to be used.
The PREMIS data model contains intellectual entities (contents that can be described as a unit, e.g., books, articles, databases), objects (discrete units of information in digital form, which can be files, bitstreams, or any representation), agents (people, organizations, or software), events (actions that involve an object and an agent known to the system), and rights (assertions of rights and permissions).43

It is indisputable that good metadata improve access to the digital objects in a digital repository; therefore, the creation and selection of appropriate metadata make the web archive accessible to the archive user. Structural metadata help to manage the archival collection internally, as well as the related services, but may not always help to discover the primary source of the digital object.44 Many semi-automated metadata generation tools now exist, and their use is crucial for the future, considering the complexity and cost of manual metadata creation.45

Archival format
Web archive initiatives select websites for archiving based on the relevance of the contents and the intended audience of the archived information. The size of web archives varies significantly depending on their scope and the type of content they preserve, e.g., web pages, PDF documents, images, audio, or video files.46 To preserve these contents, a web archive uses storage formats that contain metadata and utilize data compression techniques. The Internet Archive defined the ARC format (http://archive.org/web/researcher/arcfileformat.php), which was later used as a de facto standard. In 2009, the International Organization for Standardization (ISO) established the WARC format (https://goo.gl/0rbwsn) as the official standard for web archiving. Approximately 54 percent of web archive initiatives apply the ARC and WARC formats for archiving.
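WARC is a plain-text record format: each record is a header block of `Name: value` lines followed by a content block, with CRLF delimiters. The sketch below assembles a simplified WARC-style response record by hand to make the layout concrete; real archives are written with dedicated tooling, and a fully conformant record carries more detail than shown here.

```python
import uuid

def warc_response_record(target_uri, date, http_payload):
    """Assemble a minimal WARC/1.0 response record as bytes.

    Simplified for illustration: production archives use dedicated
    WARC-writing libraries rather than hand-built records."""
    body = http_payload.encode("utf-8")
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", date),                  # UTC timestamp of capture
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(body))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers)
    # Header block, blank line, content block, then two CRLFs end the record.
    return head.encode("utf-8") + b"\r\n" + body + b"\r\n\r\n"

record = warc_response_record(
    "http://example.org/",                    # hypothetical capture
    "2019-03-01T00:00:00Z",
    "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi",
)
print(record.decode("utf-8"))
```

Storing the raw HTTP response inside the record is what lets standard tools replay a capture later exactly as the server delivered it.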
The use of standard formats helps archivists by facilitating the creation of collaborative tools, such as search engines and UI utilities, to manipulate the archived data efficiently.47

Information dissemination mechanisms
A well-defined preservation process leads to a well-organized web archive that is easy to maintain and from which a specific digital object is easy to retrieve using information dissemination techniques. Poor search results are one of the main problems in information dissemination for web archives: users of a web archive spend excessive time retrieving the documents or information that would satisfy their queries. Archivists are more concerned with "ofness" (what collections are made up of), whereas archive users are concerned with "aboutness" (what collections are about).48 To exploit the full potential of web archives, a usable interface is needed to help the user search the archive for a specific digital object. Full-text and keyword search are the dominant ways to search an unstructured information repository, as is evident from online search engines, and the quality of the results returned for user queries depends on the ranking tools.49 Access tools and techniques are attracting researchers' attention; approximately 82 percent of European web archives concentrate on such tools, which makes these archives easily accessible.50 The Lucene full-text search engine and its extension NutchWAX are widely used in web archiving.
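Keyword search over an archive rests on an inverted index: a map from each term to the documents containing it. A minimal sketch of the general technique (not Lucene's implementation), over an invented document collection:

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND search)."""
    results = None
    for term in query.lower().split():
        postings = index.get(term, set())
        results = postings if results is None else results & postings
    return sorted(results or [])

docs = {  # invented archive contents
    "d1": "web archive preservation process",
    "d2": "digital preservation of news",
    "d3": "web crawler for the archive",
}
index = build_index(docs)
print(search(index, "archive web"))   # ['d1', 'd3']
```

Real engines add tokenization, stemming, and positional data, but intersecting per-term posting sets is the core of keyword retrieval.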
Moreover, by combining the semantic descriptions that already exist in, or are implicit within, the descriptive metadata, reasoning-based or semantic searching of the archival collection can open novel possibilities for retrieving and browsing archival content.51 Even in the current era of digital archives, mobile services are being adopted in digital libraries; for example, access to e-books, library databases, catalogs, and text messaging are common mobile services offered in university libraries.52

In a massive repository, a user query retrieves millions of documents, which makes it difficult for users to identify the most relevant information. To overcome this problem, a ranking model estimates the relevance of the results to the user's query using specified criteria and sorts the results, placing the most relevant at the top.53 A number of ranking models exist in the literature: conventional ranking models, e.g., TF-IDF and BM25F; temporal ranking models, e.g., PageRank; and learning-to-rank models, e.g., L2R.

The findings of the systematic approach for web preservation are used to automate the process of digital news story preservation. The steps of the proposed model have been carefully adapted to develop a tool that can add contextual information to the stories to be preserved.

Digital news stories preservation framework
The advancement of web technologies and the maturation of the internet attract news readers to access news online, provided by multiple sources, and to obtain the desired information comprehensively. The amount of news published online has grown rapidly, and it is cumbersome for an individual to browse all online sources for relevant news articles. News generation in the digital environment is no longer a periodic process with a fixed single output, such as a printed newspaper.
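The conventional TF-IDF model mentioned above scores a document by summing, for each query term, the term's frequency in the document weighted by the inverse of how many documents contain it. A small, self-contained sketch over invented documents, using the basic tf * log(N/df) weighting:

```python
import math
from collections import Counter

def tfidf_scores(docs, query):
    """Rank documents against a query with a basic TF-IDF scheme.

    tf = term count in the document; idf = log(N / df)."""
    n = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    df = Counter()                       # document frequency per term
    for terms in tokenized.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        scores[doc_id] = sum(
            tf[t] * math.log(n / df[t])
            for t in query.lower().split()
            if df[t]                     # skip terms absent everywhere
        )
    # Most relevant document first.
    return sorted(scores.items(), key=lambda kv: -kv[1])

docs = {  # invented example collection
    "d1": "web archive preservation",
    "d2": "archive of digital news archive",
    "d3": "digital libraries and metadata",
}
print(tfidf_scores(docs, "archive"))
```

Here "archive" appears twice in d2 and once in d1, so d2 ranks first; rare query terms score higher than common ones because log(N/df) shrinks as more documents contain the term.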
News is now generated and updated online instantly, in a continuous fashion. However, for several reasons, such as the short lifespan of digital information and the speed at which information is generated, it has become vital to preserve digital news for the long term. Digital preservation comprises the various actions needed to ensure that digital information remains accessible and usable for as long as it is considered important.54 Libraries and archives carefully digitize and preserve newspapers, regarding them as a good source for knowing history, and many approaches have been developed to preserve digital information for the long term. The lifespan of news stories published online varies from one newspaper to another, i.e., from one day to a month. Although a newspaper may be backed up and archived by the news publisher or by national archives, in the future it will be difficult to access particular information published in various newspapers regarding the same news story. The issues become even more complicated if a story is to be tracked through an archive of many newspapers, which requires different access technologies.

The digital news story preservation (DNSP) framework was introduced to preserve digital news articles published online by multiple sources.55 The DNSP framework is planned around the proposed step-by-step systematic approach for web preservation in order to develop a well-organized web archive. Initially, the main objectives defined for the DNSP framework are:
• To initiate a well-organized, national-level digital news archive of multiple news sources.
• To normalize news articles to a common format during preservation for future use.
• To extract explicit and implicit metadata, which will be helpful when ingesting stories into the archive and browsing the archive in the future.
• To introduce content-based similarity measures to link digital news articles during preservation.
The digital news story extractor (DNSE) is a tool developed to facilitate the extraction of news stories from online newspapers and their migration to a normalized format for preservation. The normalization also includes a step to add metadata to the digital news stories archive (DNSA) for future use.56 To facilitate the accessibility of news articles preserved from multiple sources, mechanisms need to be adopted for linking the archived digital news articles. An effective term-based approach, the "common ratio measure for stories (CRMS)," was introduced for linking similar news articles in the DNSA during the preservation process.57 The approach was analyzed empirically, and its results were compared to obtain conclusive arguments: the initial results, computed automatically using the common ratio measure, are encouraging when compared with similarity judgments of news articles made by humans. The results are generalized by defining a threshold value based on multiple experimental results using the proposed approach. Work is currently underway to extend the scope of the DNSA to two languages, i.e., Urdu and English, along with content-based similarity measures to link news articles published in Urdu and English. Moreover, research is underway to develop tools that exploit the linkage created among stories during the preservation process for search and retrieval tasks.

Summary
Effective strategic planning is critical in creating web archives; hence, it requires a well-understood and well-planned preservation process. The process should result in a well-organized web archive that includes not only the content to be preserved but also the contextual information required to interpret that content.
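The text does not spell out the CRMS formula, but term-based story linking in general works by comparing the word sets of two articles and linking them when the overlap crosses a threshold. The sketch below uses plain Jaccard overlap as a stand-in measure, not the CRMS itself; the articles and the 0.2 threshold are invented for illustration.

```python
def term_overlap(a, b):
    """Jaccard similarity between the term sets of two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def link_stories(stories, threshold=0.2):
    """Return pairs of story ids whose term overlap crosses the threshold."""
    ids = sorted(stories)
    return [
        (x, y)
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
        if term_overlap(stories[x], stories[y]) >= threshold
    ]

stories = {  # invented example articles
    "s1": "floods hit the northern region after heavy rain",
    "s2": "heavy rain causes floods in the northern region",
    "s3": "university announces admission schedule",
}
print(link_stories(stories))   # [('s1', 's2')]
```

Computing such links at ingest time, as the DNSP framework does during preservation, means a future reader can jump between versions of the same story across newspapers without a separate post-hoc matching pass.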
The study attempts to answer many questions to guide archivists and related personnel: How should the web preservation process be led effectively? How should the preservation process be initiated? How should one proceed through the different steps? What techniques may help to create a well-organized web archive? How can the archived information be used to its greatest potential? To answer these questions, the study presents an appropriate step-by-step process for web preservation leading to a well-organized web archive. The targeted goal of each step is identified by researching the existing approaches that can be adopted, and the possible techniques for those approaches are discussed in detail for each step.

References
1 “World Wide Web Size,” The Size of the World Wide Web, visited on Jan 31, 2019, http://www.worldwidewebsize.com/.
2 Brian F. Lavoie, “The Open Archival Information System Reference Model: Introductory Guide,” Microform & Imaging Review 33, no. 2 (2004): 68-81; Alexandros Ntoulas, Junghoo Cho, and Christopher Olston, “What's New on the Web? The Evolution of the Web from a Search Engine Perspective,” in Proceedings of the 13th International Conference on World Wide Web (New York, NY: ACM, 2004), 1-12.
3 Teru Agata et al., “Life Span of Web Pages: A Survey of 10 Million Pages Collected in 2001,” IEEE/ACM Joint Conference on Digital Libraries (IEEE, 2014), 463-64, https://doi.org/10.1109/jcdl.2014.6970226.
4 Timothy Robert Hart and Denise de Vries, “Metadata Provenance and Vulnerability,” Information Technology and Libraries 36, no. 4 (Dec. 2017): 24-33, https://doi.org/10.6017/ital.v36i4.10146.
5 Claire Warwick et al., “Library and Information Resources and Users of Digital Resources in the Humanities,” Program 42, no. 1 (2008): 5-27, https://doi.org/10.1108/00330330810851555.
6 Lavoie, “Open Archival Information System Reference Model.”
7 Susan Farrell, K. Ashley, and R.
Davis, “A Guide to Web Preservation”: Practical Advice for Web and Records Managers Based on Best Practices from the JISC-Funded PoWR Project (2010), https://jiscpowr.jiscinvolve.org/wp/files/2010/06/guide-2010-final.pdf.
8 Lavoie, “Open Archival Information System Reference Model”; Farrell, Ashley, and Davis, “Guide to Web Preservation.”
9 Peter Lyman, “Archiving the World Wide Web,” Washington, Library of Congress (2002), https://www.clir.org/pubs/reports/pub106/web/.
10 Diomidis Spinellis, “The Decay and Failures of Web References,” Communications of the ACM 46, no. 1 (2003): 71-77, https://dl.acm.org/citation.cfm?doid=602421.602422.
11 Digital Archive for Chinese Studies (DACHS), https://www.zo.uni-heidelberg.de/boa/digital_resources/dachs/index_en.html, visited on Jan 31, 2019.
12 Julien Masanès, “Web Archiving Methods and Approaches: A Comparative Study,” Library Trends 54, no. 1 (2005): 72-90, https://doi.org/10.1353/lib.2006.0005.
13 Hanno Lecher, “Small Scale Academic Web Archiving: DACHS,” in Web Archiving (Berlin/Heidelberg: Springer, 2006), 213-25, https://doi.org/10.1007/978-3-540-46332-0_10.
14 Daniel Gomes et al., “Introducing the Portuguese Web Archive Initiative,” in 8th International Web Archiving Workshop (Berlin/Heidelberg: Springer, 2009).
15 Gerrit Voerman et al., “Archiving the Web: Political Party Web Sites in the Netherlands,” European Political Science 2, no. 1 (2002): 68-75, https://doi.org/10.1057/eps.2002.51.
16 Sonja Gabriel, “Public Sector Records Management: A Practical Guide,” Records Management Journal 18, no. 2 (2008), https://doi.org/10.1108/00242530810911914.
17 Farrell, Ashley, and Davis, “Guide to Web Preservation.”
18 Jung-ran Park and Andrew Brenza, “Evaluation of Semi-Automatic Metadata Generation Tools: A Survey of the Current State of the Art,” Information Technology and Libraries 34, no.
3 (Sept. 2015): 22-42, https://doi.org/10.6017/ital.v34i3.5889.
19 Muzammil Khan and Arif Ur Rahman, “Digital News Story Preservation Framework,” in Digital Libraries: Providing Quality Information: 17th International Conference on Asia-Pacific Digital Libraries, ICADL 2015, Seoul, Korea, December 9-12, 2015, Proceedings, vol. 9469 (Springer, 2015), 350-52, https://doi.org/10.1007/978-3-319-27974-9; Muzammil Khan, “Using Text Processing Techniques for Linking News Stories for Digital Preservation,” PhD thesis, Faculty of Computer Science, Preston University Kohat, Islamabad Campus, HEC Pakistan, 2018.
20 Dennis Dimick, “Adobe Acrobat Captures the Web,” Washington Apple Pi Journal (1999): 23-25.
21 Trupti Udapure, Ravindra D. Kale, and Rajesh C. Dharmik, “Study of Web Crawler and Its Different Types,” IOSR Journal of Computer Engineering (IOSR-JCE) 16, no. 1 (2014): 01-05, https://doi.org/10.9790/0661-16160105.
22 Dora Biblarz et al., “Guidelines for a Collection Development Policy Using the Conspectus Model,” International Federation of Library Associations and Institutions, Section on Acquisition and Collection Development (2001).
23 Farrell, Ashley, and Davis, “Guide to Web Preservation”; E. Pinsent et al., “PoWR: The Preservation of Web Resources Handbook,” http://jisc.ac.uk/publications/programmerelated/2008/powrhandbook.aspx (2010); Michael Day, “Preserving the Fabric of Our Lives: A Survey of Web Preservation Initiatives,” Lecture Notes in Computer Science (Berlin/Heidelberg: Springer, 2003): 461-72, https://doi.org/10.1007/978-3-540-45175-4_42.
24 Pinsent et al., “PoWR”; Day, “Preserving the Fabric.”
25 Allan Arvidson, “The Royal Swedish Web Archive: A Complete Collection of Web Pages,” International Preservation News (2001): 10-12.
26 Andreas Rauber, Andreas Aschenbrenner, and Oliver Witvoet, “Austrian Online Archive Processing: Analyzing Archives of the World Wide Web,” Research and Advanced Technology for Digital Libraries, ECDL 2002,
Lecture Notes in Computer Science, vol. 2458 (Berlin/Heidelberg: Springer, 2002), 16-31, https://doi.org/10.1007/3-540-45747-x_2.
27 William Arms, “Collecting and Preserving the Web: The MINERVA Prototype,” RLG DigiNews 5, no. 2 (2001).
28 Sonya Betz and Robyn Hall, “Self-Archiving with Ease in an Institutional Repository: Micro Interactions and the User Experience,” Information Technology and Libraries 34, no. 3 (Sept. 2015): 43-58, https://doi.org/10.6017/ital.v34i3.5900.
29 Serge Abiteboul et al., “A First Experience in Archiving the French Web,” in International Conference on Theory and Practice of Digital Libraries (Berlin/Heidelberg: Springer, 2002), 1-15, https://doi.org/10.1007/3-540-45747-x_1; Sergey Brin and Lawrence Page, “Reprint of: The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Computer Networks 56, no. 18 (2012): 3825-33, https://doi.org/10.1016/j.comnet.2012.10.007.
30 Masanès, “Web Archiving.”
31 NISO Press, “Understanding Metadata,” National Information Standards (2004), http://www.niso.org/publications/understanding-metadata.
32 Ibid.
33 Jane Greenberg, “Understanding Metadata and Metadata Schemes,” Cataloging & Classification Quarterly 40, no. 3-4 (2009): 17-36, https://doi.org/10.1300/j104v40n03_02.
34 Michael Day, “Preservation Metadata Initiatives: Practicality, Sustainability, and Interoperability,” Archivschule Marburg (2004): 91-117.
35 Jenn Riley, Glossary of Metadata Standards (2010).
36 Corey Harper, “Dublin Core Metadata Initiative: Beyond the Element Set,” Information Standards Quarterly 22, no. 1 (2010): 20-31.
37 Jane Greenberg, “Dublin Core: History, Key Concepts, and Evolving Context (Part One),” slide presentation at DC-2010, International Conference on Dublin Core and Metadata Applications, Pittsburgh, PA (2010).
38 Morgan V. Cundiff, “An Introduction to the Metadata Encoding and Transmission Standard (METS),” Library Hi Tech 22, no.
1 (2004): 52-64, https://doi.org/10.1108/07378830410524495; leta negandhi, “metadata encoding and transmission standard (mets),”in texas conference on digital libraries, tcdl-2012 (2012). 39 sally h. mccallum, “an introduction to the metadata object description schema (mods),” library hi tech 22, no. 1 (2004): 82-88, https://doi.org/10.1108/07378830410524521. 40 r. gartner, “mode: metadata object description schema,” jisc techwatch report tsw (2003): 03-06. www.loc.gov/standards/mods/. 41 vra-core, “an introduction of vra core,” http://www.loc.gov/standards/vracore/vra core4 intro.pdf, created: oct 2014. 42 vra-core, “vra core element outline,” http://www.loc.gov/standards/vracore/vra core4 outline.pdf, created: feb 2007. 43 priscilla caplan, “understanding premis,” washington dc, usa: library of congress, (2009), https://www.loc.gov/standards/premis/understanding-premis.pdf; j. relay, “an introduction to premis,” singapore ipress tutorial, (2011), http://www.loc.gov/standards/premis/premistutorial ipres2011 singapore.pdf. systematic approach towards web preservation | khan and ur rahman 90 https://doi.org/10.6017/ital.v38i1.10181 44 jennifer schaffner, “the metadata is the interface: better description for better discovery of archives and special collections, synthesized from user studies,” making archival and special collections more accessible, 85 (2015). 45 joao miranda and daniel gomes, “trends in web characteristics,” in web congress, 2009. laweb'09. latin american, (ieee, 2009), 146-53, https://doi.org/10.1109/la-web.2009.28. 46 daniel gomes, joão miranda, and miguel costa, “a survey on web archiving initiatives,” research and advanced technology for digital libraries (2011): 408-20, https://doi.org/10.1007/978-3-642-24469-8_41. 47 ibid. 48 schaffner, “metadata is the interface.” 49 miguel costa and mário j. 
silva, “evaluating web archive search systems,” in international conference on web information systems engineering (berlin/heidelberg: springer, 2012), 440454. https://doi.org/10.1007/978-3-642-35063-4_32. 50 foundation, i, “web archiving in europe,” technical report, commercenet labs (2010). 51 georgia solomou and dimitrios koutsomitropoulos, “towards an evaluation of semantic searching in digital repositories: a dspace case-study,” program 49, no. 1 (2015): 63-90, https://doi.org/10.1108/prog-07-2013-0037. 52 liu yan quan and sarah briggs, “a library in the palm of your hand: mobile services in top 100 university libraries,” information technology and libraries 34, no. 2 (june 2015): 133, https://doi.org/10.6017/ital.v34i2.5650. 53 ricardo baeza-yates and berthier ribeiro-neto, modern information retrieval 463. (new york: acm pr., 1999). 54 daniel burda and frank teuteberg, “sustaining accessibility of information through digital preservation: a literature review,” journal of information science, 39, no. 4 (2013): 442-58, https://doi.org/10.1177/0165551513480107. 55 muzammil khan et al., “normalizing digital news-stories for preservation,” in digital information management (icdim), 2016 eleventh international conference on (ieee, 2016), 8590, https://doi.org/10.1109/icdim.2016.7829785. 56 khan, et al., “normalizing digital news.” 57 muzammil khan, arif ur rahman, and m. daud awan, “term-based approach for linking digital news stories,” in italian research conference on digital libraries (cham, switzerland: springer, 2018), 127-38, https://doi.org/10.1007/978-3-319-73165-0_13. generating collaborative systems for digital libraries | visser and ball 187 marijke visser and mary alice ball the middle mile: the role of the public library in ensuring access to broadband of fundamentally altering culture and society. in some circles the changes happen in real time as new web-based applications are developed, adopted, and integrated into the user’s daily life. 
these users are the early adopters; the internet cognoscenti. second tier users appreciate the availability of online resources and use a mix of devices to access internet content but vary in the extent to which they try the latest application or device. the third tier users also vary in the amount they access the internet but have generally not embraced its full potential, from not seeking out readily available resources to not connecting at all.1 regardless of the degree to which they access the internet, all of these users require basic technology skills and a robust underlying infrastructure. since the introduction of web 2.0, the number and type of participatory web-based applications has continued to grow. many people are eagerly taking part in creating an increasing variety of web-based content because the basic tools to do so are widely available. the amateur, creating and sharing for primarily personal reasons, has the ability to reach an audience of unprecedented size. in turn, the internet audience, or virtual audience, can select from a vast menu of formats, including multimedia and print. with print resources disappearing, it is increasingly likely for an individual to only be able to access necessary material online. web-based resources are unique in that they enable an undetermined number of people, personally connected or complete strangers, to interact with and manipulate the content thereby creating something new with each interaction and subsequent iteration. many of these new resources and applications require much more bandwidth than traditional print resources. with the necessary technology no longer out of reach, a crosssection of society is affecting the course the twenty-first century is taking vis à vis how information is created, who can create it, and how we share it.2 in turn, who can access web-based content and who decides how it can be accessed become critical questions to answer. 
as people become more adept at using web-based tools and eager to try new applications, the need for greater broadband will intensify. the economic downturn is having a marked effect on people’s internet use. if there was a preexisting problem with inadequate access to broadband, current circumstances exacerbate it to where it needs immediate attention. access to broadband internet today increases this paper discusses the role of the public library in ensuring access to the broadband communication that is so critical in today’s knowledge-based society. it examines the culture of information in 2010, and then asks what it means if individuals are online or not. the paper also explores current issues surrounding telecommunications and policy, and finally seeks to understand the role of the library in this highly technological, perpetually connected world. i n the last twenty years library collections have evolved from being predominantly print-based to ones that have a significant digital component. this trend, which has a direct impact on library services, has only accelerated with the advent of web 2.0 technologies and participatory content creation. cutting-edge libraries with next generation catalogs encourage patrons to post reviews, contribute videos, and write on library blogs and wikis. even less adventuresome institutions offer a variety of electronic databases licensed from multiple publishers and vendors. the piece of these library portfolios that is at best ignored and at worst vilified is the infrastructure that enables internet connectivity. in 2010, broadband telecommunication is recognized as essential to access the full range of information resources. telecommunications experts articulate their concerns about the digital divide by focusing on firstand last-mile issues of bringing fiber and cable to end users. 
the library, particularly the public library, represents the metaphorical middle mile providing the public with access to rich information content. equally important, it provides technical knowledge, subject matter expertise, and general training and support to library users. this paper discusses the role of the public library in ensuring access to the broadband communication that is so critical in today’s knowledge-based society. it examines the culture of information in 2010, and then asks what it means if individuals are online or not. the paper also explores current issues surrounding telecommunications and policy, and finally seeks to understand the role of the library in this highly technological, perpetually connected world. ■■ the culture of information information today is dynamic. as the internet continues on its fast paced, evolutionary track, what we call ‘information’ fluctuates with each emerging web-based technology. theoretically a democratic platform, the internet and its user-generated content is in the process marijke visser (mvisser@alawash.org) is information technology policy analyst and mary alice ball (maryaliceball@yahoo .com) former chair, telecommunications subcommittee, office for information technology policy, american library association, washington, dc. 188 information technology and libraries | december 2010 the geographical location of a community will also influence what kind of internet service is available because of deployment costs. these costs are typically reflected in varying prices to consumers. in addition to the physical layout of an area, current federal telecommunications policies limit the degree to which incentives can be used on the local level.7 encouraging competition between isps, including municipal electric utilities, incumbent local exchange carriers, and national cable companies, for example, requires coordination between local needs and state and federal policies. 
such coordinated efforts are inherently difficult when taking into consideration the numerous differences between locales. ultimately, though, all of these factors influence the price end users must pay for internet access. with necessary infrastructure and telecommunications policies in place, there are individual behaviors that also affect broadband adoption. according to the pew study, “home broadband adoption 2008,” 62 percent of dial-up users are not interested in switching to broadband.8 clearly there is a segment of the population that has not yet found personal relevance to high-speed access to online resources. in part this may be because they only have experience with dial-up connections. depending on dial-up gives the user an inherently inferior experience because bandwidth requirements to download a document or view a website with multimedia features automatically prevent these users from accessing the same resources as a user with a high-speed connection. a dial-up user would not necessarily be aware of this difference. if this is the only experience a user has it might be enough to deter broadband adoption, especially if there are other contributing factors like lack of technical comfort or availability of relevant content. motivation to use the internet is influenced by the extent to which individuals find content personally relevant. whether it is searching for a job and filling out an application, looking at pictures of grandchildren, using skype to talk to a family member deployed in iraq, researching healthcare providers, updating a personal webpage, or streaming video, people who do these things have discovered personally relevant internet content and applications. understanding the potential relevance of going online makes it more likely that someone would experiment with other applications, thus increasing both the familiarity with what is available and the comfort level with accessing it. 
without relevant content, there is little motivation for someone not inclined to experiment with internet technology to cross what amounts to a significant hurdle to adoption. anthony wilhelm argues in a 2003 article discussing the growing digital divide that culturally relevant content is critical in increasing the likelihood that non-users will want to access web-based resources.9 the scope of the issue of providing culturally relevant content is underscored in the 2008 pew study, the amount of information and variety of formats available to the user. in turn more content is being distributed as users create and share original content.3 businesses, nonprofits, municipal agencies, and educational institutions appreciate that by putting their resources online they reach a broader segment of their constituency. this approach to reaching an audience works provided the constituents have their own access to the materials, both physically and intellectually. it is one thing to have an internet connection and another to have the skill set necessary to make productive use of it. as reported in job-seeking in u.s. public libraries in 2009, “less than 44% of the top 100 u.s. retailers accept instore paper applications.”4 municipal, state, and federal agencies are increasingly putting their resources online, including unemployment benefit applications, tax forms, and court documents.5 in addition to online documents, the report finds social service agencies may encourage clients to make appointments and apply for state jobs online.6 many of the processes that are now online require an ability to navigate the complexities of the internet at the same time as navigating difficult forms and websites. the combination of the two can deter someone from retrieving necessary resources or successfully completing a critical procedure. 
while early adopters and policy-makers debate the issues surrounding internet access, the other strata of society, knowingly or not and to varying degrees, are enmeshed in the outcomes of these ongoing discussions because their right to information is at stake. ■■ barriers to broadband access by condensing internet access issues to focus on the availability of adequate and sustainable broadband, it is possible to pinpoint four significant barriers to access: price, availability, perceived relevance, and technical skill level. the first two barriers are determined by existing telecommunications infrastructure as well as local, state, and federal telecommunications policies. the latter barriers are influenced by individual behaviors. both divisions deserve attention. if local infrastructure and the internet service provider (isp) options do not support broadband access to all areas within its boundaries, the result will be that some community members can have broadband services at home while others must rely on work or public access computers. it is important to determine what kind of broadband services are available (e.g., cable, dsl, fiber, satellite) and if they are robust enough to support the activities of the community. infrastructure must already be in place or there must be economic incentive for isps to invest in improving current infrastructure or in installing new infrastructure. generating collaborative systems for digital libraries | visser and ball 189 at all. success hinges on understanding that each community is unique, on leveraging its strengths, and on ameliorating its weaknesses. local government can play a significant role in the availability of broadband access. from a municipal perspective, emphasizing the role of broadband as a factor in economic development can help define how the municipality should most effectively advocate for broadband deployment and adoption. 
gillett offers four initiatives appropriate for stimulating broadband from a local viewpoint. municipal governments can ■■ become leaders in developing locally relevant internet content and using broadband in their own services; ■■ adopt policies that make it easier for isps to offer broadband; ■■ subsidize broadband users and/or isps; or ■■ become involved in providing the infrastructure or services themselves.12 individually or in combination these four initiatives underscore the fact that government awareness of the possibilities for community growth made possible by broadband access can lead to local government support for the initiatives of other local agencies, including nonprofit, municipal, or small businesses. agencies partnering to support community needs can provide evidence to local policy makers that broadband is essential for community success. once the municipality sees the potential for social and economic development, it is more likely to support policies that stimulate broadband buildout. building strong local partnerships will set the stage for the development of a sustainable broadband initiative as the different stakeholders share perspectives that take into account a variety of necessary components. when the time comes to implement a strategy, not only will different perspectives have been included, the plan will have champions to speak for it: the government, isps, public and private agencies, and community members. it is important to know which constituents are already engaged in supporting community broadband initiatives and which should be tapped. the ultimate purpose in establishing broadband internet access in a community is to benefit the individual community members, thereby stimulating local economic development. key players need to represent agencies that recognize the individual voice. 
a 2004 study led by strover provides an example of the importance of engaging local community leaders and agencies in developing a successful broadband access project.13 the study looked at thirty-six communities that received state funding to establish community technology centers (ctc). it addressed the effective use and management of ctcs and called attention to the inadequacy of supplying the hardware without community support which found that of the 27 percent of adult americans who are not internet users, 33 percent report they are not interested in going online.10 that pew can report similar information five years after the wilhelm article identifies a barrier to equitable access that has not been adequately resolved. ■■ models for sustainable broadband availability in discussing broadband, the question of what constitutes broadband inevitably arises. gillett, lehr, and osoria, in “local government broadband initiatives,” offers a functional definition: “access is ‘broadband’ if it represents a noticeable improvement over standard dial-up and, once in place, is no longer perceived as the limiting constraint on what can be done over the internet.”11 while this definition works in relationship to dial-up, it is flexible enough to apply to all situations by focusing on “a noticeable improvement” and “no longer perceived as the limiting constraint” (added emphasis). ensuring sustainable broadband access necessitates anticipating future demand. short sighted definitions, applicable at a set moment in time, limit long-term viability of alternative solutions. devising a sustainable solution calls for careful scrutiny of alternative models, because the stakes are so high in the broadband debate. there are many different players involved in constructing information policies. this does not mean, however, that their perspectives are mutually exclusive. 
in debates with multiple perspectives, it is important to involve stakeholders who are aligned with the ultimate goal: assuring access to quality broadband to anyone going online. what is successful for one community may be entirely inappropriate in another; designing a successful system requires examining and comparing a range of scenarios. existing circumstances may predetermine a particular starting point, but one first step is to evaluate best practices currently in place in a variety of communities to come up with a plan that meets the unique criteria of the community in question. sustainable broadband solutions need to be developed with local constituents in mind and successful solutions will incorporate the realities of current and future local technologies and infrastructure as well as local, state, and federal information policies. presupposing that the goal is to provide the community with the best possible option(s) for quality broadband access, these are key considerations to take into account when devising the plan. in addition to the technological and infrastructure issues, within a community there will be a combination of ways people access the internet. there will be those who have home access, those who need public access, and those who do not seek access 190 information technology and libraries | december 2010 the current emphasis on universal broadband depends on selecting the best of the alternative plans according to carefully vetted criteria in order to develop a flexible and forward-thinking course of action. can we let people remain without access to robust broadband and the necessary skill set to use it effectively? no. as more and more resources critical to basic life tasks are accessible only online, those individuals that face challenges to going online will likely be socially and economically disadvantaged when compared to their online counterparts. 
recognition of this potential for intensifying digital divide is recognized in the federal communication commission’s (fcc) national broadband plan (nbp) released in march 2010.18 the nbp states six national broadband goals, the third of which is “every american should have affordable access to robust broadband service, and the means and skills to subscribe if they so choose.”19 research conducted for the recommendations in the nbp was comprehensive in scope including voices from industry, public interest, academia, and municipal and state government. responses to more than thirty public notices issued by the fcc provide evidence of wide concern from a variety of perspectives that broadband access should become ubiquitous if the united states is to be a competitive force in the twentyfirst century. access to essential information such as government, public safety, educational, and economic resources requires a broadband connection to the internet. it is incumbent on government officials, isps, and community organizations to share ideas and resources to achieve a solution for providing their communities with robust and sustainable broadband. it is not necessary to have all users up to par with the early adopters. there is not a one-size-fits-all approach to wanting to be connected, nor is there a one-size-fits-all solution to providing access. what is important is that an individual can go online via a robust, high-speed connection that meets that individual’s needs at that moment. what this means for finding solutions is ■■ there needs to be a range of solutions to meet the needs of individual communities; ■■ they need to be flexible enough to meet the evolving needs of these communities as applications and online content continue to change; and ■■ they must be sustainable for the long term so that the community is prepared to meet future needs that are as yet unknown. 
solutions to providing broadband internet access will be most successful when they are designed starting at the local level. community needs vary according to local demographics, geography, existing infrastructure, types of service providers, and how state and federal systems in place. users need a support system that highlights opportunities available via the internet and that provides help when they run into problems. access is more than providing the infrastructure and hardware. the potential users must also find content that is culturally relevant in an environment that supports local needs and expectations. strover found the most successful ctcs were located in places that “actively attracted people for other social and entertaining reasons.”14 in other words, the ctcs did not operate in a vacuum devoid of social context. successful adoption of the ctcs as a resource for information was dependent on the targeted population finding culturally relevant content in a supportive environment. an additional point made in the study showed that without strong community leadership, there was not significant use of the ctc even when placed in an already established community center.15 this has significant implications for what constitutes access as libraries plan broadband initiatives. investments in technology and a national commitment to ensure universal access to these new technologies in the 1990s provide the current policy framework. as suggested by wilhelm in 2003, to continue to move forward the national agenda needs to focus on updating policies to fit new information circumstances as they arise. today’s information policy debates should emphasize a similar focus. 
beyond accelerating broadband deployment into underserved areas, wilhelm suggests there needs to be support for training and content development that guarantees communities will actually use and benefit from having broadband deployed in their area.16 technology training and support for local agencies that provide the public with internet access, as well as opportunities for the individuals themselves, is essential if policies are going to actually lead to useful broadband adoption. individual and agency internet access and adoption require investment beyond infrastructure; they depend on having both culturally relevant content and the information literacy skills necessary to benefit from it. ■■ finding the right solution though it may have taken an economic crisis to bring broadband discussions into the living room, the result is causing renewed interest in a long-standing issue. many states have formed broadband task forces or councils to address the lack of adequate broadband access at the state level and, on the national front, broadband was a key component of the american recovery and reinvestment act of 2009.17 the issue changes as technologies evolve but the underlying tenet of providing people access to the information and resources they need to be productive members of society is the same. what becomes of generating collaborative systems for digital libraries | visser and ball 191 difficult to measure, these kinds of social and cultural capital are important elements in ongoing debates about uses and consequences of broadband access. an ongoing challenge for those interested in the social, economic, and policy consequences of modern information networks will be to keep up with changing notions of what it means to be connected in cyberspace.”20 the social contexts in which a broadband plan will be enacted influence the appropriateness of different scenarios and should help guide which ones are implemented. 
engaging a variety of stakeholders will increase the likelihood of positive outcomes as community members embrace the opportunities provided by broadband internet access. it is difficult, however, to anticipate the outcomes that may occur as users become more familiar with the resources and achieve a higher level of comfort with technology. ramirez states, the “unexpected outcomes” section of many evaluation reports tends to be rich with anecdotes . . . . the unexpected, the emergent, the socially constructed innovations seem to be, to a large extent, off the radar screen, and yet they often contain relevant evidence of how people embrace technology and how they innovate once they discover its potential.21 community members have the most to gain from having broadband internet access. including them will increase the community’s return on its investment as they take advantage of the available resources. ramirez suggests that “participatory, learning, and adaptive policy approaches” will guide the community toward developing communication technology policies that lead to a vibrant future for individuals and community alike.22 as success stories increase, the aggregation of local communities’ social and economic growth will lead to a net sum gain for the nation as a whole. ■■ the role of the library public libraries play an important role in providing internet access to their community members. according to a 2008 study, the public library is the only outlet for no-fee internet access in 72.5 percent of communities nationwide; in rural communities the number goes up to 82.0 percent.23 beyond having desktop or, in some cases, wireless access, public libraries offer invaluable user support in the form of technical training and locally relevant content. libraries provide a secondary community resource for other local agencies who can point their clients to the library for no-fee internet access. 
in today’s economy where anecdotal reports show an increase in library use, particularly internet use, the role of the public policies mesh with local ordinances. local stakeholders best understand the complex interworking of their community and are aware of who should be included in the decision-making process. including a local perspective will also increase the likelihood that as community needs change, new issues will be brought to the attention of policy makers and agencies who advocate for the individual community members. community agencies that already are familiar with local needs, abilities, and expectations are logical groups to be part of developing a successful local broadband access strategy. the library exemplifies a community resource whose expertise in local issues can inform information policy discussions on local, state, and federal levels. as a natural extension of library service, libraries offer the added value support necessary for many users to successfully navigate the internet. the library is an established community hub for informational resources and provides dedicated staff, technology training opportunities, and no-fee public access computers with an internet connection. libraries in many communities are creating locally relevant web-based content as well as linking to other community resources on their own websites. seeking a partnership with the local library will augment a community broadband initiative. it is difficult to appreciate the impacts of current information technologies because they change so rapidly there is not enough time to realistically measure the effects of one before it is mixed in with a new innovation. with web-based technologies there is a lag time between what those in the front of the pack are doing online and what those in the rear are experiencing. 
while there is general consensus that broadband internet access is critical in promoting social and economic development in the twenty-first century as is evidenced by the national purposes outlined in the nbp, there is not necessarily agreement on benchmarks for measuring the impacts. three anticipated outcomes of providing community access to broadband are ■■ civic participation will increase; ■■ communities will realize economic growth; and ■■ individual quality of life will improve. when a strategy involves significant financial and energy investments there is a tendency to want palpable results. the success of providing broadband access in a community is challenging to capture. to achieve a level of acceptable success it is necessary to focus on local communities and aggregate anecdotal evidence of incremental changes in public welfare and economic gain. acceptable success is subjective at best but can be usefully defined in context of local constituencies. referring to participation in the development of a vibrant culture, horrigan notes that “while inherently 192 information technology and libraries | december 2010 isolation. an individual must possess skills to navigate the online resources. as users gain an understanding of the potential personal growth and opportunities broadband yields, they will be more likely to seek additional online resources. by stimulating broadband use, the library will contribute to the social and economic health of the community. if the library is to extend its role as the information hub in the community by providing no-fee access to broadband to anyone who walks through the door, the local community must be prepared to support that role. it requires a commitment to encourage build out of appropriate technology necessary for the library to maintain a sustainable internet connection. it necessitates that local communities advocate for national information and communication policies that are pro-library. 
When public policy supports the library's efforts, the local community benefits and society at large can progress. What if the library's own technology needs are not met? The role of the library in its community is becoming increasingly important as more people turn to it for their internet access. Without sufficient revenue, the library will have a difficult time meeting this additional demand for services. In turn, in many libraries increased demand for broadband access stretches the limit of IT support for both the library staff and the patrons needing help at the computers. What will be the fallout from the library not being able to provide the internet services patrons desire and require? Will there be a growing skills difference between people who adopt emerging technologies and incorporate them into their daily lives and those who maintain the technological status quo? What will the social impact be of remaining offline, either completely or only marginally? Can the library be the bridge between those on the edge, those in the middle, and those at the end? With a strong and well-articulated vision for the future, the library can be the link that provides the community with sustainable broadband.
■■ Conclusion
The recent national focus on universal broadband access has provided an opportunity to rectify a lapse in effective information policy. Whether the goal includes facilitating meaningful access, however, continues to be more elusive. As government, organizations, businesses, and individuals rely more heavily on the internet for sharing and receiving information, broadband internet access will continue to increase in importance. Following the status quo will not necessarily lead to more people having broadband access in the long run. The early adopters will continue to stimulate technological innovation which, in turn, will trickle down the ranks of the different user types. Currently, the role of the library as a stable internet provider cannot be overestimated.
To maintain its vital function, however, the library must also resolve infrastructure challenges of its own. Because of the increased demand for access to internet resources, public libraries are finding that their current broadband services are not able to support the demand of their patrons. The issues are twofold: increased patron use means there are often neither sufficient workstations nor broadband speeds to meet patron demand. In 2008, about 82.5 percent of libraries reported an insufficient number of public workstations, and about 57.5 percent reported insufficient broadband speeds.24 To add to these already significant issues, the report indicates libraries are having trouble supporting the necessary information technology (IT) because of either staff time constraints or the lack of a dedicated IT staff.25 Public libraries are facing considerable infrastructure management issues at a time when library use is increasing. Overcoming the challenges successfully will require support on the local, state, and federal level. Here is where the librarian, as someone trained to become inherently familiar with the needs of her local constituency and ethically bound to provide access to a variety of information resources, needs to insert herself into the debate. Librarians need to be ahead of the crowd as the voice that assures content will be readily accessible to those who seek it. Today, the elemental policy issue regarding access to information via the internet hinges on connectivity to a sustainable broadband network. To promote equitable broadband access, the librarian needs to be aware of the pertinent information policies in place or under consideration, and be able to anticipate those in the future. Additionally, she will need to educate local policy makers about the need for broadband in their community. In some circumstances, the librarian will need to move beyond her local community and raise awareness of community access issues on the state and federal level.
The librarian is already able to articulate numerous issues to a variety of stakeholders and can transfer this skill to advocate for sustainable broadband strategies that will succeed in her local community. There are many strata of internet users, from those in the forefront of early adoption to those not interested in being online at all. The early adopters drive the market, which responds by making resources more and more likely to be primarily available only online. As we continue this trend, the social repercussions increase from merely not being able to access entertainment and news to being unable to participate in the knowledge-based society of the twenty-first century. By folding in added-value online access for the community, the library helps increase the likelihood that the community will benefit from broadband being available to the library patrons and, by extension, to the community as a whole. To realize the internet's full potential, access to it cannot be provided in …
Generating Collaborative Systems for Digital Libraries | Visser and Ball 193
… community, the entire community benefits regardless of where and how the individuals go online. The effects of the internet are now becoming broadly social enough that there is a general awareness that the internet is not decoration on contemporary society but a challenge to it.28 Being connected is no longer an optional luxury; to engage in the twenty-first century it is essential. Access to the internet, however, is more than simple connectivity. Successful access requires an understanding of the benefits of going online, technological comfort, information literacy, ongoing support and training, and the availability of culturally relevant content. People are at various levels of internet use, from those eagerly anticipating the next iteration of web-based applications to those hesitant to open an e-mail account. This user spectrum is likely to continue.
Though the starting point may vary depending on the applications that become important to the user in the middle of the spectrum, there will be those out in front and those barely keeping up. The implications of the pervasiveness of the internet are only beginning to be appreciated and understood. Because of their involvement at the cutting edge of internet evolution, librarians can help lead the conversations. Libraries have always been situated in neutral territory within their communities and closely aligned with the public good. Librarians understand the perspective of their patrons and are grounded in their local communities. Librarians can therefore advocate effectively for their communities on issues that may not completely be understood or even recognized as mattering. Connectivity is an issue supremely important to the library, as today access to the full range of information necessitates a broadband connection. Libraries have carved out a role for themselves as a premier internet access provider in the continually evolving online culture. As noted by Bertot, McClure, and Jaeger, the "role of internet access provider for the community is ingrained in the social perceptions of public libraries, and public internet access has become a central part of community perceptions about libraries and the value of the library profession."29 In times of both economic crisis and technological innovation, there are many unknowns. In part because of these two juxtaposed events, the role of the public library is in flux. Additionally, the network of community organizations that libraries link to is becoming more and more complex. It is a time of great opportunity if the library can articulate its role and frame it in relationship to broader society. Evolving internet applications require increasing amounts of bandwidth, and the trend is to make these bandwidth-heavy applications more and more vital to daily life.
One clear path the library community can take … However, the supply of internet resources is unevenly stimulating user demand, and the unequal distribution of broadband access has greater potential for significant negative social consequences. Staying the course and following a haphazard evolution of broadband adoption may, in fact, renew valid concerns about a digital divide. Without an intentional and coordinated approach to developing a broadband strategy, its success is likely to fall short of expectations. The question of how to ensure that internet content is meaningful requires instituting a plan on a very local level, including stakeholders who are familiar with the unique strengths and weaknesses of their community. Strover, in her 2000 article "The First Mile," suggests connectivity issues should be viewed from a first mile perspective, where the focus is on the person accessing the internet and her qualitative experience, rather than from a last mile perspective, which emphasizes ISP, infrastructure, and market concerns.26 Both perspectives are talking about the same physical section of the connection network: the piece that connects the user to the network. According to Strover, distinguishing between the first mile and last mile perspectives is more than an arbitrary argument over semantics. Instead, a first mile perspective represents a shift "in the values and priorities that shape telecommunications policy."27 By switching to a first mile perspective, connectivity issues immediately take into account the social aspects of what it means to be online. Who will bring this perspective to the table? And how will we ascertain what the best approach to supporting the individual voice should be? The first mile perspective is one the library is intimately familiar with, as an organization that traditionally advocates for the first mile of all information policies.
The library is in a key position in the connectivity debate because of its inclination to speak for the user and to be aware of the unique attributes and needs of its local community. As part of its mission, the library takes into account the distinctive needs of its user community when it designs and implements its services. A natural outgrowth of this practice is to be keenly aware of the demographics of the community at large. The library can leverage its knowledge and understanding to create an even greater positive impact on the social, educational, and economic community development made possible by broadband adoption. To extend the first mile perspective analogy, in the connectivity debate the library will play the role of the middle mile: the support system that successfully connects the internet to the consumer. While the target populations for stimulating demand for broadband are really those in the second tier of users, by advocating for the first mile perspective, the library will be advocating for equitable information policies whose implementation has bearing on the early adopters as well. By stimulating demand for broadband within a …
… Initiatives," 538.
12. Ibid., 537–58.
13. Sharon Strover, Gary Chapman, and Jody Waters, "Beyond Community Networking and CTCs: Access, Development, and Public Policy," Telecommunications Policy 28, no. 7/8 (2004): 465–85.
14. Ibid., 483.
15. Ibid.
16. Wilhelm, "Leveraging Sunken Investments in Communications Infrastructure," 282.
17. See, for example, the Virginia Broadband Round Table (http://www.otpba.vi.virginia.gov/broadband_roundtable.shtml), the Ohio Broadband Council (http://www.ohiobroadbandcouncil.org/), and the California Broadband Task Force (http://gov.ca.gov/speech/4596). See www.fcc.gov/recovery/broadband/ for information on broadband initiatives in the American Recovery and Reinvestment Act.
18.
Federal Communications Commission, National Broadband Plan: Connecting America, http://www.broadband.gov/ (accessed Apr. 11, 2010).
19. Ibid.
20. Horrigan, "Broadband: What's All the Fuss About?" 2.
21. Ricardo Ramirez, "Appreciating the Contribution of Broadband ICT with Rural and Remote Communities: Stepping Stones toward an Alternative Paradigm," The Information Society 23 (2007): 86.
22. Ibid., 92.
23. Denise M. Davis, John Carlo Bertot, and Charles R. McClure, "Libraries Connect Communities: Public Library Funding & Technology Access Study 2007–2008," 35, http://www.ala.org/ala/aboutala/offices/ors/plftas/0708/librariesconnectcommunities.pdf (accessed Jan. 24, 2009).
24. John Carlo Bertot et al., "Public Libraries and the Internet 2008: Study Results and Findings," 11, http://www.ii.fsu.edu/projectfiles/plinternet/2008/everything.pdf (accessed Jan. 24, 2009). These numbers represent an increase from the previous year's study, which suggests that libraries, while trying to meet demand, are not able to keep up.
25. Ibid.
26. Sharon Strover, "The First Mile," The Information Society 16, no. 2 (2000): 151–54.
27. Ibid., 151.
28. Clay Shirky, "Here Comes Everybody: The Power of Organizing without Organizations," Berkman Center for Internet & Society (2008), video presentation, http://cyber.law.harvard.edu/interactive/events/2008/02/shirky (retrieved Mar. 1, 2009).
29. John Carlo Bertot, Charles R. McClure, and Paul T. Jaeger, "The Impacts of Free Public Internet Access on Public Library Patrons and Communities," Library Quarterly 78, no. 3 (2008): 286, http://www.journals.uchicago.edu.proxy.ulib.iupui.edu/doi/pdf/10.1086/588445 (accessed Jan. 30, 2009).
… is to develop its role as the middle mile connecting the increasing breadth of internet resources to the general public. The broadband debate has moved out of the background of telecommunication policy and into the center of public attention.
Now is the moment that calls for an information policy advocate who can represent the end user while understanding the complexity of the other stakeholder perspectives. The library undoubtedly has its own share of stakeholders, but over time it is an institution that has maintained a neutral stance within its community, thereby achieving a unique ability to speak for all parties. Those who speak for the library are able to represent the needs of the public, work with a diverse group of stakeholders, and help negotiate a sustainable strategy for providing broadband internet access.
References and Notes
1. Lee Rainie, "2.0 and the Internet World," Internet Librarian 2007, http://www.pewinternet.org/presentations/2007/20-and-the-internet-world.aspx (accessed Mar. 4, 2009). See also John Horrigan, "A Typology of Information and Communication Technology Users," 2007, www.pewinternet.org/~/media//files/reports/2007/pip_ict_typology.pdf.pdf (accessed Feb. 12, 2009).
2. Lawrence Lessig, "Early Creative Commons History, My Version," video blog post, 2008, http://lessig.org/blog/2008/08/early_creative_commons_history.html (accessed Jan. 20, 2009). See the relevant passage from 20:53 through 21:50.
3. John Horrigan, "Broadband: What's All the Fuss About?" 2007, p. 1, http://www.pewinternet.org/~/media/files/reports/2007/broadband%20fuss.pdf.pdf (accessed Feb. 12, 2009).
4. "Job-Seeking in US Public Libraries," Public Library Funding & Technology Access Study, 2009, http://www.ala.org/ala/research/initiatives/plftas/issuesbriefs/brief_jobs_july.pdf (accessed Mar. 27, 2009).
5. Ibid.
6. Ibid.
7. Sharon E. Gillett, William H. Lehr, and Carlos Osorio, "Local Government Broadband Initiatives," Telecommunications Policy 28 (2004): 539.
8. John Horrigan, "Home Broadband Adoption 2008," 10, http://www.pewinternet.org/~/media//files/reports/2008/pip_broadband_2008.pdf (accessed Feb. 12, 2009).
9. Anthony G.
Wilhelm, "Leveraging Sunken Investments in Communications Infrastructure: A Policy Perspective from the United States," The Information Society 19 (2003): 279–86.
10. Horrigan, "Home Broadband Adoption," 12.
11. Gillett, Lehr, and Osorio, "Local Government Broadband …
Some considered 2000 the year of the e-book, and due to the dot-com bust, that could have been the format's high-water mark. However, the first quarter of 2004 saw the greatest number of e-book purchases ever, with more than $3 million in sales. A 2002 consumer survey found that 67 percent of respondents wanted to read e-books; 62 percent wanted access to e-books through a library. Unfortunately, the large amount of information written on e-books has begun to develop myths around their use, functionality, and cost. The author suggests that these myths may interfere with the role of libraries in helping to determine the future of the medium and access to it. Rather than fixate on the pros and cons of current versions of e-book technology, it is important for librarians to stay engaged and help clarify the role of digital documents in the modern library.
Although 2000 was unofficially proclaimed as the year of the electronic book, or e-book, due in part to the highly publicized release of a Stephen King short story exclusively in electronic format, the dot-com bust would derail a number of high-profile e-book endeavors. With far less fanfare, the e-book industry has been slowly recovering. In 2004, e-books represented the fastest-growing segment of the publishing industry. During the first quarter of that year, more than four hundred thousand e-books were sold, a 46 percent increase over the previous year's numbers.1 E-books continue to gain acceptance with some readers, although their place in history is still being determined—fad? Great idea too soon? Wrong approach at any time? The answers partly depend on the reader's perspective.
The main focus of this article is the role of e-book technologies in libraries. Libraries have always served as repositories of the written word, regardless of the particular medium used to store the words. From the ancient scrolls of Qumran to the hand-illuminated manuscripts of medieval Europe to the familiar typeset codices of today, the library's role has been to collect, organize, and share ideas via the written word. In today's society, the written word is increasingly encountered in digital form. Writers use word processors; readers see words displayed; and researchers can scan countless collections without leaving the confines of the office. For self-proclaimed book lovers, the digital world is not necessarily an ideal one. Emotional reactions are common when one imagines a world without a favorite writing pen or the musty-smelling, yellowed pages of a treasured volume from youth. One of the battle lines between the traditional bibliophile and the modern technologist is drawn over the concept of the e-book. Some see this digital form of the written word as an evolutionary step beyond printed texts, which have sometimes been humorously dubbed tree-books. Although a good deal of attention has been generated by the initial publicity regarding newer e-book technologies, the apparent failures of most of them have begun to establish myths around the concept. Abram points out that the relative success of e-books in niche areas (such as reference works) is in direct contrast with the public opinion of those purchasing novels and popular literature through traditional vendors.2 Crawford paraphrases Lewis Carroll in describing this confusion: "When you cope with online content about e-books, you can believe six impossible things before breakfast."3 Incidentally, this article will attempt to dispel a mere five of the myths about e-books.
The future of e-books and the critical role of libraries in this future are best served by uncovering these myths and seeking a balanced, reasoned view of their potential. A 2002 consumer survey on e-books found that 67 percent of respondents wanted to read an e-book, and 62 percent wanted that access to be from a library.4 Underlying this position is the assumption that the ideas represented by the written word are of paramount importance to both writers and readers. It is also assumed that libraries will continue their critical role in collecting, organizing, and sharing information.
■ Myth 1—E-books represent a new idea that has failed
Many libraries have invested in various forms of e-book delivery with mixed results.5 Sottong wisely warns of the premature adoption of e-book technology, which he dubs a false pretender as a replacement for printed texts.6 However, the last five years are but a small part of a longer history and, presumably, a still longer future. As is often the case with computer jargon, the term e-book has emerged and gained currency in a very short amount of time. However, the concept of providing written texts in an electronic format has existed for a long time, as demonstrated by Bush's description of the memex.7
Dispelling Five Myths about E-books | James E. Gall
James E. Gall (james.gall@unco.edu) is assistant professor of educational technology at the University of Northern Colorado, Greeley.
26 Information Technology and Libraries | March 2005
The Gutenberg Project put theory into practice by converting traditional texts into digital files as early as 1971.8 Even if the e-book merely represents the latest incarnation of the concept, it does so tenuously. Books in their present form have a history of hundreds of years, or thousands if their parchment and papyrus ancestors are included. This history is rich with successes and failures of technology.
For example, Petroski presents an interesting historical examination of the problem of storing books when the one book–one desk model collapsed under the proliferation of available texts.9 Similarly, a determination on the success or failure of e-books, or digital texts, based upon a relatively short period of time is fraught with difficulty. Rather, it is important to look at recent developments as merely a next step. The technology is clearly not ready for uncritical, widespread acceptance, but it is also deserving of more than a summary dismissal.
■ Myth 2—E-books are easily defined
The term e-book means different things depending on the context. At the simplest, it refers to any primarily textual material that is stored digitally to be delivered via electronic display. One of the confusing aspects of defining e-books is that in the digital world, information and the media used to store, transfer, and view it are loosely coupled. An e-book in digital form can be stored on CD-ROM or any number of other media and then passed on through computer networks or telephone lines. The device used to view an e-book could be a standard computer, a personal digital assistant (PDA), or an e-book reader (the dedicated piece of equipment on which an e-book can be read; confusingly, also referred to as an e-book). Technically, virtually any computing device with a display could be used as an e-book reader. From a practical point of view, our eyes might not tolerate reading great lengths of text on a wireless phone, and banks will not likely provide excerpts of Chaucer during ATM transactions. Another important factor in defining e-books is the actual content. A conservative definition is that an e-book is an electronic copy or version of a printed text. This appears to be the predominant view of publishers.
Purists often maintain that a true e-book is one that is specifically written for that format and not available in traditional printed form.10 This was one of the categories of the short-lived (2000–2002) Frankfurt E-book Awards. Of course, the multitude of textual materials that could be delivered via the technology exceeds these definitions. Magazines, primary-source documents, online commentaries and reviews, and transcripts of audio or video presentations are just a short list of nonbook materials that are finding their way into e-book formats. One can note with some sense of irony that the technology behind the web was originally designed as a way for scientists to disseminate research reports.11 Despite the web's popularity, reading research reports makes up an exceedingly small percentage of its use today. Although there is a continuing effort to reach a common standard for e-books (see www.openebook.org/), the current marketplace contains numerous noncompatible formats. This noncompatibility is the result of both design and competitive tradeoffs. In the case of the former, there is a distinct philosophical difference between formats that attempt to retain the original look and navigation of the printed page (such as Adobe's popular PDF files) versus those that retain the text's structure but allow variability in its presentation (as best exemplified by the free-flowing nature of texts presented as HTML pages). This difference can also be seen in the functionality built around the format. Traditional systems provide readers with familiar book characteristics such as a table of contents, bookmarks, and margin notes, a view that could be named bibliocentric. The alternative is one that takes more advantage of the new medium and could be labeled technocentric, as can most easily be seen in the extensive use of hyperlinking.12 The simplest use of hyperlinking provides an easy form of annotating texts and presenting related texts.
On the other extreme, hyperlinks are used in the creation of nonlinear texts in which the followed links provide a unique context for building meaning on the part of the reader.13 It is interesting to note that a preliminary study of e-book features found that the most desirable features tended to reflect the functionality of traditional books and the least desirable features provided functionality not found there.14 Competitive tradeoffs are a critical issue at the current point of e-book development. The current profit models of publishing entities and the copyright concerns of authors seem naturally opposed to e-book formats in which texts are freely shared, duplicated, and distributed. For example, the Open eBook Forum is the most prominent organization devoted to the development of standards for e-book technologies. In late 2004, its web site listed seventy-six current members. Although the American Library Association is a member, it is one of only six members representing library-oriented organizations. In comparison, thirty-five members (or 46 percent) are publishing organizations, and thirteen (or 17 percent) are technology companies.15 The number of traditional publishers versus technology companies on this list may suggest that a bibliocentric view of e-books would be more favored. This also appears to confirm one media prediction that traditional publishers would continue to dominate efforts with this new medium.16 However, the limited representation of libraries in this endeavor is troubling (despite the disclaimer of using an admittedly rough metric for measuring impact). It is clear that many industry formats attempt to limit the ability to distribute materials by keying files so that they may only be viewed on one device or a specific installed version of the reader software. This creates technological problems for entities like libraries that attempt to provide access to information for various parties.
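The device-keying idea described above can be illustrated in miniature: derive a key from a device identifier and use it to scramble the text, so that only the matching device recovers a readable copy. This is a toy sketch, not any vendor's actual DRM; the XOR "cipher," the `device_id` strings, and all function names are invented for the example (real systems use vetted ciphers and license servers).

```python
import hashlib

def derive_key(device_id: str) -> bytes:
    # Toy key derivation: hash the device identifier (illustration only).
    return hashlib.sha256(device_id.encode("utf-8")).digest()

def _xor(data: bytes, key: bytes) -> bytes:
    # Repeat the key over the data; XOR is its own inverse.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def lock_to_device(text: str, device_id: str) -> bytes:
    # "Key" the file to one device, as some e-book formats do.
    return _xor(text.encode("utf-8"), derive_key(device_id))

def open_on_device(blob: bytes, device_id: str) -> str:
    # Only the matching device identifier recovers readable text;
    # any other identifier yields gibberish or a decode error.
    return _xor(blob, derive_key(device_id)).decode("utf-8")

locked = lock_to_device("Call me Ishmael.", "reader-A")
print(open_on_device(locked, "reader-A"))  # prints: Call me Ishmael.
```

The sketch makes the library's problem concrete: a file locked to `reader-A` is useless on any other patron's device unless the lender can re-key it.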
The concept of fair use of copyrighted materials has to be reexamined under an entirely new set of assumptions. Another irony is that the availability of free, public-domain materials in e-book format can be viewed as negative by the publishing industry. After investing considerable time and effort in developing e-book technology, publishers would prefer that users continue purchasing new e-book material rather than spend time reading the vast library of free historical material. Many of these content issues are currently being played out in the courts and the marketplace, particularly with regard to digital music and video.17 Although one can humorously imagine the so-called problems associated with a population obsessed with downloading and reading great literature, the precedents set by these popular media will have a direct impact on the future of digital texts. Despite the labor required to scan or key entire print books into digital formats, there have been some reports of this type of piracy.18 Other models for the dissemination of digital intellectual property that are not determined by traditional material concerns of supply and demand will continually be attempted. For example, Nelson predicted a hypertext-publishing scheme in which all material was available, but royalties were distributed according to actual access by end users.19 Theoretically, such a system would provide a perfect balance between access and profitability. In Nelson's words, "nothing will ever be misquoted or out of context, since the user can inquire as to the origins and native form of any quotation or other inclusion. Royalties will be automatically paid by a user whenever he or she draws out a byte from a published document."20
■ Myth 3—E-books and printed books are competing media
Many, if not most, published articles regarding e-books follow classic plot construction; the writer must present a protagonist and an antagonist.
Bibliophiles place the printed page as the hero and the e-book as the potential bane of civilization. Proulx, one such author, was quoted as saying, "Nobody is going to sit down and read a novel on a twitchy little screen—ever."21 Technologists cast the e-book as the electronic savior of text, replacing the tired tradition of the printed word in the same way the printed word replaced oral traditions. Hawkins quotes an author who claims that e-books are "a meteor striking the scholarly publication world." His slightly more restrained view was that e-books had the potential "to be the most far-reaching change since Gutenberg's invention."22 Grant places this metaphorical battle at the forefront by titling an article "E-books: Friend or Foe?"23 Before deciding which side to take, consider whether this clash of media is an appropriate metaphor. This author has introduced samples of current e-book technology in graduate classes he has taught. When presented with the technology as part of the coursework, students quickly declare their allegiances. Bibliophiles most often suggest that the technology will never replace the love of curling up with a good book. The technologists will ask how many pages can be stored in the device and then fantasize about the types of libraries they can carry and the various venues for reading that they will explore. However, after a few weeks of using the devices, both groups tend to move to a middle ground of practical use. At that point, the discussion turns to what materials are best left on the printed page (usually described as pleasure reading) and what would be useful in e-book format (reference works, course catalogs, how-to manuals). Other instructors have reported similar patterns of use.24 At this point, the observation is largely anecdotal, but it does call into question the perceived need for a decisive referendum on the value of e-books. The issue is not whether e-books will replace the printed word.
The concern of librarians and others involved in the infrastructure of the book should be on developing the proper role for e-books in a broader culture of information. Unless this approach is taken, the true goal of libraries—disseminating information to the public—will suffer. The gap between bibliophile and technologist approaches can already be seen in the materials available in e-book format. The publishing industry in general treats the e-book as just another format, releasing the same titles in hardcover, book-on-tape, and e-book at the same time. On the opposite end of the spectrum, technologists have adopted various e-book formats for creating and transferring numerous reference documents. Given their preferences, it is easy to find e-book references on Unix, HTML coding, and the like, but there is a scarcity of materials in philosophy, history, and the arts. Librarians seem the most appropriate group for developing shared understanding. Publishers and e-book hardware and software manufacturers need to be concerned with the bottom line. Libraries, by design, are concerned with the preservation of information and its continued dissemination long after the need to sell a particular book has passed. The hobby of creating and transferring texts to digital form is idiosyncratic and unorganized when viewed from the highest levels. Libraries not only contain expertise in all areas of human endeavor, but also have strategies for categorizing and maintaining information in productive ways. In short, libraries are the best line of defense for maintaining the value of the printed page and promoting the value of digital texts.
■ Myth 4—E-books are expensive
A common complaint about e-books is that they are expensive. On the surface, this seems clear.
dedicated e-book readers seemed to bottom out at around $300, and a new bestseller in e-book format is priced about the same as the hardcover edition. add the immediate and long-term costs of rechargeable batteries and the electricity needed to power them, and the economic case against the e-book appears closed. what if we turn the same critical eye to the printed page? the manufacture and distribution of printed texts is highly developed and astounding. when gutenberg succeeded in putting the christian bible in the hands of the moneyed public, he surely could not have comprehended the billions of copies that would eventually be distributed. even with the wealth of printed material at hand, one must still consider the high cost of the system. the law of supply and demand rules books as a tangible product. the most profitable books are those that will reach the most readers. specialized texts have limited audiences and, therefore, will usually be priced higher. this produces problems for both groups. popular texts must be printed in high quantities and delivered to various outlets. unfortunately, the printed page does have maintenance costs. sellen and harper point out that the actual printing cost is insignificant compared with the cost of dealing with documents after printing. they cite one study that indicated that united states businesses spend about $1 billion per year designing and printing forms, but spend an additional $25 to $35 billion filing, storing, and retrieving them.25 books are no different; as any librarian knows, it costs money to maintain a collection and protect texts from the environment and the effects of age. in the retail arena, the competition is fiercer. books that do not sell are removed in favor of those that do. 
it is estimated that 10 percent of texts printed each year are turned to pulp, although, fortunately, many are recycled.26 the bbc reported that more than two million former romance novels were used in the construction of a new tollway.27 with more specialized texts, the problem is not wealth, but scarcity. if a text is not profitable, it will probably go out of print. this is often synonymous with inaccessible. from the publisher’s perspective, it is only cost-effective to commit to a printing when the demand is high enough. a library is a good source of out-of-print texts, provided that it has been funded appropriately to acquire and maintain the particular works that are needed. e-books are not a panacea. other innovations, such as on-demand publishing, may be part of the answer in solving the economic issues regarding collections. however, e-books can help alleviate some of these issues. e-books are easily copied and distributed, which is a boon to the researcher and information consumer. in many cases, the goal is the access to information, not the possession of a book. it could also benefit the author and publisher if appropriate reimbursement systems are put into place. as previously described, nelson originally envisioned his online hypertext system, xanadu, with a mechanism for royalties based on access—a supply-and-demand system for ideas, not materials.28 the systems used to manage access to digital materials continue to increase in complexity and have spawned a whole new business of digital rights management (drm).29 examples include reciprocal (www.reciprocal.com), overdrive (www.overdrive.com), and netlibrary (www.netlibrary.com). 
libraries are the specific target of netlibrary, which promotes an e-books-on-demand project that allows free access for short periods of time.30 the creation of a standard digital object identifier (doi) for published materials may also help online publishers and entities like libraries manage their digital collections more easily.31 online music systems, such as apple’s itunes (www.itunes.com), strike a workable balance between quick-and-easy access to music and a workable, economic model for reimbursing artists. e-books also have appeal for special audiences who already require assistive technologies for accessing print collections.32 having discussed the hidden costs of printed texts, another important economic issue of e-books to examine is a current trend in usage. despite the availability of dedicated e-book readers, the largest growth in e-book usage is surely in nondedicated devices. e-book–reading software is available for personal computers, laptops, and pdas. according to one source, microsoft had sold four million pocketpc e-book-enabled devices, and had two million downloads of the ms reader for the personal computer; palm had sold approximately 20 million e-book-enabled devices; and adobe had more than 30 million acrobat readers downloaded.33 these numbers alone indicate some 24 million reader-capable pdas, and 32 million reader-capable pcs, for a total of 56 million devices. although it is difficult to find data on actual use, one online bookseller reported some data on e-book use from an audience survey.34 although 88 percent had purchased books online, only 16 percent had read an e-book (11 percent using a pc, 3 percent on a handheld device, and 2 percent on both). it is presumed that in most cases this equipment was purchased for other reasons, with e-book reading being a secondary function. as such, it would be unfair to include the full cost of this equipment in any calculation of the cost of providing information in an e-book format. 
if it were, the cost of providing artificial lighting in any building where reading takes place would need to be calculated as part of the cost of the printed page. the potential user base for the e-book rises as more computers and pdas are sold, decreasing the need for special equipment. this does not mean that the dedicated e-book reader is obsolete. by most commercial accounts, the apple newton was a failure. its bulky size and awkward interface were the subject of much ridicule. however, it did introduce the concept of the pda. the success of the palm line of products owes much to the proof of concept provided by the newton. the makers of the portable gameboy videogame system are repositioning it for multimedia digital-content delivery, and plan to pilot a flash-memory download system for various content types, including e-books.35 innovative products such as e-paper are already developed in prototype form.36 they are likely to lead to another wave of dedicated e-book readers or provide e-book–reading potential embedded in other consumer applications.
■ myth 5—e-books are a passing fad
it is trendy to list the failures of past media (such as radio, film, and television) in impacting education despite great initial promise.37 however, all those media are still with us after having found particular niches within our culture. if the e-book is viewed as just an alternative format, comparisons with past experiences of library collections containing videotapes, record albums, and such are not appropriate.38 however, if e-books are viewed as a tool or way to access information, the questions change. instead of asking how digital formats will replace print collections, we can ask how an e-book version will extend the reach of our current collection or provide our readers with resources previously unavailable or unaffordable. 
when trying to locate a research article, one is generally not concerned with whether the local library has a loose copy, bound copy, microform, microfiche, or even has to resort to interlibrary loan. as long as the content is accessible and can be cited, it can be used. electronic access to journal content is becoming more common. perhaps dry journal articles do not conjure up the same romantic visions of exploring the stacks that may hinder greater acceptance of e-books. a parallel can be drawn to the current work of film-restoration experts. the medium of film has reached an age where some of the earliest influential works no longer exist or are in a condition of rapid deterioration. according to one film site, more than half of the films made before 1950 have already been lost due to decay of existing copies.39 the work of restoration involves finding what remains of a great work in various vaults and collections. often, the only usable film is a second- or third-generation copy. from digitized copies, cleaning, color correction, and other painstaking work, a restored and—it is hoped—complete work emerges. ironically, once this laborious process is completed, a near-extinct classic is suddenly available to millions in the form of a dvd disc at a local retailer. what if the same attitude was taken with the world’s collections of printed materials? jantz has described potential impacts of e-book technology on academic libraries.40 lareau conducted a study on using e-books to replace lost books at kent state university, but found that limited availability and high costs did not make it feasible at the time.41 project gutenberg (www.gutenberg.net) and the electronic text center at the university of virginia (http://etext.lib.virginia.edu) are two examples of scholars attempting to save and share book content in electronic forms, but more efforts are needed. unfortunately, the shift to digital content has also contributed to the sheer volume of content available. 
edwards has recently discussed issues in attempting to archive and preserve digital media.42 the web may be suffering from a glut of information, but the content is highly skewed toward the new and technology oriented. in a few years, we may find that nontechnology-related endeavors are no longer represented in our information landscape.
■ conclusion
the e-book industry is currently dominated by commercial-content providers, such as franklin, and software companies, most notably adobe, palm, and microsoft. traditional print-based publishers have also maintained continued interest in the medium. it is assumed that these publishers had the capital to weather the ups and downs of the industry more so than new publishers dedicated solely to e-book delivery. although the contributions and efforts of these organizations are needed, the future of e-book content should not be left to their largesse. when the rocket e-book device was initially released, a small but loyal following of readers contributed thousands of titles to its online library. some of these titles were self-published vanity projects or brief reference documents, but many were public-domain classics, painstakingly scanned or keyed in by readers wishing to share their favorite reads. when gemstar purchased rocket, the software’s ability to create non-purchased content was curtailed and the online library of free titles dismantled. apparently, both were viewed as limiting the profitability of the e-book vendor. however, gemstar recently announced that it was discontinuing its e-book reading devices, one would assume due to a lack of profitability. this can be seen as a cautionary tale for libraries, which often define success by number of volumes available and accessed rather than units sold. committing to a technology that concurrently requires consumer success can be problematic. bibliophile and technologist alike must take responsibility for the future of our collective information resources. 
the bibliophile must ensure that all aspects of human knowledge and creativity are nurtured and allowed to survive in electronic forms. the technologist must ensure that accessibility and intellectual-property rights are addressed with every technological innovation. parry provides three concrete suggestions for public libraries in response to new media demands: continue to acknowledge and respond to customer demands, revisit the library’s mission statement for currency, and promote or accelerate shared agreements with other institutions to alleviate the high costs of accumulating resources.43 the proper frame of mind for these activities is suggested by levy: we make a mistake, i believe, when we fixate on particular forms and technologies, taking them in and of themselves, to be the carriers of what we want to embrace or resist. . . . it isn’t a question, it needn’t be a question, of books or the web, of letters or e-mail, of digital libraries or the bricks-and-mortar variety, of paper or digital technologies. . . . these modes of operation are only in conflict when we insist that one or the other is the only way to operate.44 in the early 1930s, lomax dragged his primitive audio-recording equipment over the roads of the american south to capture the performances of numerous folk musicians.45 at the time, he certainly didn’t imagine that at one point in history someone with a laptop computer sitting in a coffee shop with wireless access could download the performances of robert johnson from itunes. however, without his efforts, those unique voices in our history would have been lost. it is hoped that the readers of the future will be thanking the library professionals of today for preserving our print collections and enabling their access digitally via our primitive, but evolving, e-book technologies.
references
1. 
open e-book forum, “press release: record e-book retail sales set in q1 2004,” june 4, 2004. accessed dec. 27, 2004, www.openebook.org. 2. stephen abram, “e-books: rumors of our death are greatly exaggerated,” information outlook 8, no. 2 (2004): 14–16. 3. walt crawford, “the white queen strikes again: an e-book update,” econtent 25, no. 11 (2002): 46–47. 4. harold henke, “consumer survey on e-books.” accessed dec. 27, 2004, www.openebook.org. 5. sue hutley, “follow the e-book road: e-books in australian public libraries,” aplis 15, no. 1 (2002): 32–37; andrew k. pace, “e-books: round two,” american libraries 35, no. 8 (2004): 74–75; michael rogers, “librarians, publishers, and vendors revisit e-books,” library journal 129, no. 7 (2004): 23–24. 6. stephen sottong, “e-book technology: waiting for the ‘false pretender,’” information technology and libraries 20, no. 2 (2001): 72–80. 7. vannevar bush, “as we may think,” atlantic monthly 176, no. 1 (1945): 101–108. 8. michael s. hart, “history and philosophy of project gutenberg.” accessed dec. 27, 2004, www.gutenberg.net/ about.shtml. 9. henry petroski, the book on the bookshelf (new york: vintage, 2000). 10. steve ditlea, “the real e-books,” technology review 103, no. 4 (2000): 70–73. 11. tim berners-lee, weaving the web: the original design and ultimate destiny of the world wide web by its inventor (new york: harpercollins, 1999). 12. james e. gall and annmari m. duffy, “e-books in a college course: a case study” (presented at the association for educational communications and technology conference, atlanta, ga., nov. 8–10, 2001). 13. george p. landow, hypertext 2.0: the convergence of contemporary critical theory and technology (baltimore, md.: johns hopkins univ. pr., 1997). 14. harold henke, “survey on electronic book features.” accessed dec. 27, 2004, www.openebook.org. 15. open e-book forum, “press release: record e-book retail sales set in q1 2004.” 16. 
lori enos, “report: e-book industry set to explode,” e-commerce times, 20 dec. 2000. accessed dec. 27, 2004, www.ecommercetimes.com/story/6215.html. 17. luis a. ubinas, “the answer to video piracy,” mckinsey quarterly no. 1. accessed dec. 27, 2004, www.mckinseyquarterly.com. 18. mark hoorebeek, “e-books, libraries, and peer-to-peer file-sharing,” australian library journal 52, no. 2 (2003): 163–68. 19. theodor h. nelson, “managing immense storage,” byte 13, no. 1 (1988): 225–38. 20. ibid., 238. 21. jacob weisberg, “the way we live now: the good e-book,” new york times, 4 june 2000. accessed dec. 27, 2004, www.nytimes.com. 22. donald t. hawkins, “electronic books: a major publishing revolution. part 1: general considerations and issues,” online 24, no. 4 (2000): 14–28. 23. steve grant, “e-books: friend or foe?” book report 21, no. 1 (2002): 50–54. 24. lori bell, “e-books go to college,” library journal 127, no. 8 (2002): 44–46. 25. abigail j. sellen and richard h. harper, the myth of the paperless office (cambridge, mass.: mit pr., 2002). 26. stephen moss, “pulped fiction,” sydney morning herald, 29 mar. 2002. accessed dec. 27, 2004, www.smh.com.au. 27. bbc news, “m6 toll built with pulped fiction,” bbc news uk edition, 18 dec. 2003. accessed dec. 27, 2004, http://news.bbc.co.uk. 28. nelson, “managing immense storage.” 29. michael a. looney and mark sheehan, “digitizing education: a primer on e-books,” educause 36, no. 4 (2001): 38–46. 30. brian kenney, “netlibrary, ebsco explore new models for e-books,” library journal 128, no. 7 (2003). 31. stephen h. wildstrom, “a library to end all libraries,” business week (july 23, 2001): 23.
online.” they have implemented several process improvements already and will complete their work by the 2005 ala annual conference. this past fall, michelle frisque, lita web manager, conducted a survey of our members about the lita web site. 
michelle and the web coordinating committee are already working on a new look and feel for the lita web site based on the survey comments, and the result promises to be phenomenal. on top of all of the current activities (new vision statement, strategic planning, and the lita web site redesign), mary taylor and the lita board worked with a graphic designer to develop a new lita logo. after much deliberation, the new logo debuted at the 2004 lita national forum with great enthusiasm. many members commented that the new logo expresses the “energy” of lita and felt the change was terrific. with your help, lita had a very successful conference in orlando. although there were weather and transportation difficulties, the lita programs and discussions were of the highest quality, as always. the program and preconference offerings for the upcoming annual conference in chicago promise to be as strong as ever. don’t forget, lita also offers regional institutes throughout the year. check the lita web site to see if there’s a regional institute scheduled in your area. lita held another successful national forum in fall 2004 in st. louis, “ten years of connectivity: libraries, the world wide web, and the next decade.” the three-day educational event included excellent preconferences, general sessions, and more than thirty concurrent sessions. i want to thank the wonderful 2004 lita national forum planning committee, chaired by diane bisom, the presenters, and the lita office staff who all made this event a great experience. the next lita national forum will be held at the san jose marriott, san jose, california, september 29–october 2, 2005. the theme will be “the ubiquitous web: personalization, portability, and online collaboration.” thomas dowling, chair, and the 2005 lita national forum planning committee are preparing another “must attend” event. next year marks lita’s fortieth anniversary. 
2006 will be a year for lita to celebrate our history, future, and our many accomplishments. we are fortunate to have lynne lysiak leading the fortieth anniversary task force activities. i know we all will enjoy the festivities. i look forward to working with many of you as we continue to make lita a wonderful and vibrant association. i encourage you to send me your comments and suggestions to further the goals, services, and activities of lita.
32. terence cavanaugh, “e-books and accommodations: is this the future of print accommodation?” teaching exceptional children 35, no. 2 (2002): 56–61. 33. skip pratt, “e-books and e-publishing: ignore ms reader and palm os at your own peril,” knowledge download, 2002. accessed dec. 27, 2004, www.knowledge-download.com/260802-e-book-article. 34. davina witt, “audience profile and demographics,” mar./apr. 2003. accessed dec. 27, 2004, www.bookbrowse.com/media/audience.cfm. 35. geoff daily, “gameboy advance: not just playing with games,” econtent 27, no. 5 (2004): 12–14. 36. associated press, “flexible e-paper on its way,” associated press, 7 may 2003. accessed dec. 27, 2004, www.wired.com/news. 37. richard mayer, multimedia learning (cambridge, uk: cambridge university press, 2000). 38. sottong, “e-book technology.” 39. amc, “film facts: read about lost films.” accessed june 19, 2003, www.amctv.com/article?cid=1052. 40. ronald jantz, “e-books and new library service models: an analysis of the impact of e-book technology on academic libraries,” information technology and libraries 20, no. 2 (2001): 104–15. 41. susan lareau, the feasibility of the use of e-books for replacing lost or brittle books in the kent state university library, 2001, eric, ed 459862. accessed dec. 27, 2004, http://searcheric.org. 42. eli edwards, “ephemeral to enduring: the internet archive and its role in preserving digital media,” information technology and libraries 23, no. 1 (2004): 3–8. 43. 
norm parry, format proliferation in public libraries, 2002, eric, ed 470035. accessed dec. 27, 2004, http://searcheric.org. 44. david m. levy, scrolling forward: making sense of documents in the digital age (new york: arcade pub., 2001). 45. about alan lomax. accessed dec. 27, 2004, www.alan-lomax.com/about.html.
(president’s column continued from page 2)
information technology and libraries | september 2007
wikis in libraries
matthew m. bejune
wikis have recently been adopted to support a variety of collaborative activities within libraries. this article and its companion wiki, librarywikis (http://librarywikis.pbwiki.com/), seek to document the phenomenon of wikis in libraries. this subject is considered within the framework of computer-supported cooperative work (cscw). the author identified thirty-three library wikis and developed a classification schema with four categories: (1) collaboration among libraries (45.7 percent); (2) collaboration among library staff (31.4 percent); (3) collaboration among library staff and patrons (14.3 percent); and (4) collaboration among patrons (8.6 percent). examples of library wikis are presented within the article, as is a discussion of why wikis are primarily utilized within categories i and ii and not within categories iii and iv. it is clear that wikis have great utility within libraries, and the author urges further application of wikis in libraries.
in recent years, the popularity of wikis has skyrocketed. wikis were invented in the mid-1990s to help facilitate the exchange of ideas between computer programmers. the use of wikis has gone far beyond the domain of computer programming, and now it seems as if every google search contains a wikipedia entry. wikis have entered into the public consciousness. 
so, too, have wikis entered into the domain of professional library practice. the purpose of this research is to document how wikis are used in libraries. in conjunction with this article, the author has created librarywikis (http://librarywikis.pbwiki.com/), a wiki to which readers can submit additional examples of wikis used in libraries. the article will proceed in three sections. the first section is a literature review that defines wikis and introduces computer-supported cooperative work (cscw) as a context for understanding wikis. the second section documents the author’s research and presents a schema for classifying wikis used in libraries. the third section considers the implications of the research results.
■ literature review
what’s a wiki? wikipedia (2007a) defines a wiki as: a type of web site that allows the visitors to add, remove, edit, and change some content, typically without the need for registration. it also allows for linking among any number of pages. this ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring. wikis have been around since the mid-1990s, though it is only recently that they have become ubiquitous. in 1995, ward cunningham launched the first wiki, wikiwikiweb (http://c2.com/cgi/wiki), which is still active today, to facilitate the exchange of ideas among computer programmers (wikipedia 2007b). the launch of wikiwikiweb was a departure from the existing model of web communication, where there was a clear divide between authors and readers. wikiwikiweb elevated the status of readers, if they so chose, to that of content writers and editors. this model proved popular, and the wiki technology used on wikiwikiweb was soon ported to other online communities, the most famous example being wikipedia. on january 15, 2001, wikipedia was launched by larry sanger and jimmy wales as a complementary project for the now-defunct nupedia encyclopedia. 
nupedia was a free, online encyclopedia with articles written by experts and reviewed by editors. wikipedia was designed as a feeder project to solicit new articles for nupedia that were not submitted by experts. the two services coexisted for some time, but in 2003 the nupedia servers were shut down. since its launch, wikipedia has undergone rapid growth. at the close of 2001, wikipedia’s first year of operation, there were 20,000 articles in eighteen language editions. as of this writing, there are approximately seven million articles in 251 languages, fourteen of which have more than 100,000 articles each. as a sign of wikipedia’s growth, when this manuscript was first submitted four months earlier, there were more than five million articles in 250 languages. author’s note: sources in the previous two paragraphs come from wikipedia. the author acknowledges the concerns within the academy regarding the practice of citing wikipedia within scholarly works; however, it was decided that wikipedia is arguably an authoritative source on wikis and itself. nevertheless, the author notes that there were changes—insubstantial ones—to the cited wikipedia entries between when the manuscript was first submitted and when it was revised four months later.
wikis and cscw
wikis facilitate collaborative authoring and can be considered one of the technologies studied under the domain of cscw. in this section, cscw is explained and it is shown how wikis fit within this framework. cscw is an area of computer science research that considers the application of computer technology to support cooperative, also referred to as collaborative, work. the term was first coined in 1984 by irene greif (1988) and paul cashman to describe a workshop they were planning on the support of people in work environments with computers. over the years there have been a number of review articles that describe cscw in greater detail, including bannon and schmidt (1991), rodden (1991), schmidt and bannon (1992), sachs (1995), dourish (2001), ackerman (2002), olson and olson (2002), dix, finlay, abowd, and beale (2004), and shneiderman and plaisant (2005). publication in the field of cscw primarily occurs through conferences. the first conference on cscw was held in 1986 in austin, texas. since then, the conference has been held biennially in the united states. proceedings are published by the association for computing machinery (acm, http://www.acm.org/). in 1991, the first european conference on computer supported cooperative work (ecscw) was held in amsterdam. ecscw also is held biennially, in odd-numbered years. ecscw proceedings are published by springer (http://www.ecscw.uni-siegen.de/). the primary journal for cscw is computer supported cooperative work: the journal of collaborative computing. publications also appear within publications of the acm and chi, the conference on human factors in computing.
cscw and libraries
as libraries are, by nature, collaborative work environments—library staff working together and with patrons—and as digital libraries and computer technologies become increasingly prevalent, there is a natural fit between cscw and libraries. the following researchers have applied cscw to libraries. twidale et al. (1997) published a report sponsored by the british library research and innovation centre that examined the role of collaboration in the information-searching process to inform how information systems design could better address and support collaborative activity.
matthew m. bejune (mbejune@purdue.edu) is an assistant professor of library science at purdue university libraries. he also is a doctoral student at the graduate school of library and information science, university of illinois at urbana-champaign. 
twidale and nichols (1998) offered ethnographic research of physical collaborative environments—in a university library and an office—to aid the design of digital libraries. they wrote two reviews of cscw as applied to libraries—the first was more comprehensive (twidale and nichols 1998) than the second (twidale and nichols 1999). sánchez (2001) discussed collaborative environments designed and prototyped for digital library environments.
classification of collaboration
technologies that facilitate collaborative work are typically classified within cscw across two continua: synchronous versus asynchronous, and co-located versus remote. if put together in a two-by-two matrix, there are four possibilities: (1) synchronous and co-located (same time, same place); (2) synchronous and remote (same time, different place); (3) asynchronous and remote (different time, different place); and (4) asynchronous and co-located (different time, same place). this classification schema was first proposed by johansen et al. (1988). nichols and twidale (1999) mapped work applications within the realm of cscw in figure 1. wikis are not present in the figure, but their absence is not an indication that they are not cooperative work technologies. rather, wikis were not yet widely in use at the time cscw was considered by nichols and twidale. the author has added wikis to nichols and twidale’s graphical representation in figure 2. interestingly, wikis are border-crossers fitting within two quadrants: the upper right—asynchronous and co-located; and the lower right—asynchronous and remote. wikis are asynchronous in that they do not require people to be working together at the same time. they are both co-located and remote in that people working collaboratively may not need to be working in the same place. it is also interesting to note that library technologies can be mapped using johansen’s schema. 
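johansen’s two-by-two scheme lends itself to a compact sketch. the following python fragment is purely illustrative and not from the article: the quadrant contents are paraphrased from the figures as described here, and the data structure and function name are invented. it shows how wikis, as border-crossers, turn up in both asynchronous quadrants:

```python
# illustrative sketch of johansen's two-by-two classification, keyed by
# (time, place). cell contents are paraphrased from the article's figures;
# note that "wikis" appears in both asynchronous cells.
TOOLS = {
    ("synchronous", "co-located"): ["meeting rooms"],
    ("synchronous", "remote"): ["video conferencing", "shared drawing", "collaborative writing"],
    ("asynchronous", "co-located"): ["team rooms", "organizational memory", "workflow", "wikis"],
    ("asynchronous", "remote"): ["web-based applications", "collaborative writing", "wikis"],
}

def quadrants_for(tool):
    """return every (time, place) quadrant in which a tool appears."""
    return [cell for cell, tools in TOOLS.items() if tool in tools]

print(quadrants_for("wikis"))
# [('asynchronous', 'co-located'), ('asynchronous', 'remote')]
```

because wikis sit in two cells, a one-to-one mapping from tool to quadrant would misrepresent them; returning a list of matching cells keeps the border-crossing visible.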
nichols and twidale (1999) also mapped this, and figure 3 illus trates the variety of collaborative work that goes on within libraries. ■ method in order to to discover the widest variety of wikis used in libraries, the author searched for examples of wikis used in libraries within three areas—the lis literature, the library success wiki, and within messages posted on three professional electronic discussion lists. when examples were found, they were logged and classified according to a schema created by the author. results are presented in the next section. the first area searched was within the lis literature. the author utilized the wilson library literature and figure 1. classification of cscw applications co-located remote synchronous asynchronous meeting rooms distributed meetings muds and moos shared drawing video conferencing collaborative writing team rooms organizational memory workflow web-based applications collaborative writing 2� information technology and libraries | september 20072� information technology and libraries | september 2007 information science database. there were two main types of articles: ones that argued for the use of wikis in libraries, and ones that were case studies of wikis that had been implemented. the second area searched was within library success: a best practices wiki (http://www.libsuccess.org/) (see figure 4), created by meredith farkas, distance learning librarian at norwich university. as the name implies, it is a place for people within the library community to share their success stories. posting to the wiki is open to the public, though registration is encouraged. there are many subject areas on the wiki, including management and leadership, readers’ advisory, reference services, infor mation literacy, and so on. there also is a section about collaborative tools in libraries (http://www.libsuccess .org/index.php?title=collaborative_tools_in_libraries), in which examples of wikis in libraries are presented. 
Within this section there is a presentation about wikis made by Farkas (2006) titled Wiki World (http://www.libsuccess.org/index.php?title=wiki_world), from which examples were culled. The third area that was searched was professional electronic discussion list messages from Web4Lib, DIG_REF, and LIBREF-L. The Web4Lib electronic discussion list (Tennant 2005) is “for the discussion of issues relating to the creation, management, and support of library-based World Wide Web servers, services, and applications.” The list is moderated by Roy Tennant and the Web4Lib advisory board and was started in 1994. The DIG_REF electronic discussion list is a forum for “people and organizations answering the questions of users via the internet” (WebJunction n.d.). The list is hosted by the Information Institute of Syracuse, School of Information Studies, Syracuse University, and was created in 1998. The LIBREF-L electronic discussion list is “a moderated discussion of issues related to reference librarianship” (Balraj 2005). Established in 1990, it is operated out of Kent State University and moderated by a group of list owners. These three electronic discussion lists were selected for two reasons. First, the author is a subscriber to each electronic discussion list, and prior to the research noted there were messages about wikis in libraries. Second, based on the descriptions of each electronic discussion list stated above, the selected electronic discussion lists reasonably covered the discussion of wikis in libraries within the professional library electronic discussion lists. One year of messages, November 15, 2005, through November 14, 2006, was analyzed for each list. Messages about wikis in libraries were identified through keyword searches against the author’s personal archive of electronic discussion list messages collected over the years.

Figure 2. Classification of CSCW applications including wikis. Synchronous and co-located: meeting rooms. Synchronous and remote: distributed meetings, MUDs and MOOs, shared drawing, video conferencing, collaborative writing. Asynchronous and co-located: team rooms, wikis. Asynchronous and remote: wikis, organizational memory, workflow, web-based applications, collaborative writing.

Figure 3. Classification of collaborative work within libraries. Synchronous and co-located: personal help, reference interview, issue of book on loan, face-to-face interactions. Synchronous and remote: use of OPACs, database search, video conferencing, telephone. Asynchronous and co-located: notice boards, post-it notes, memos, documents for study. Asynchronous and remote: social information filtering, e-mail, voicemail, distance learning, postal services.

Figure 4. Library Success: A Best Practices Wiki (http://www.libsuccess.org/)

Wikis in Libraries | Bejune

An alternative method would have been to search the web archive of each list, but the author found it easier to search within his mail client, Microsoft Outlook. The word “wiki” was found in 513 messages: 354 in Web4Lib, 91 in DIG_REF, and 68 in LIBREF-L. This approach had high recall, as discourse about wikis frequently included the use of the word “wiki,” though low precision, as there were many results that were not about wikis used in libraries. Common false hits included messages about the Nature study (Giles 2005) that compared Wikipedia to Encyclopaedia Britannica, and messages that included the word “wiki” but were simply referring to wikis, though not examples of wikis used within libraries. From the list of 513 messages, the author read each message and came up with a much shorter list of thirty-nine messages about wikis in libraries: thirty-two in Web4Lib, three in DIG_REF, and four in LIBREF-L.

■ Results

Classification of the results

After all wiki examples had been collected, it became clear that there was a way to classify the results.
In Farkas’s (2006) presentation about wikis, she organized wikis in two categories: (1) how libraries can use wikis with their patrons; and (2) how libraries can use wikis for knowledge sharing and collaboration. This schema, while it accounts for two types of collaboration, is not granular enough to represent the types of collaboration found within the wiki examples identified. As such, it became clear that another schema was needed. Twidale and Nichols (1998) identified three types of collaboration within libraries: (1) collaboration among library staff; (2) collaboration between a patron and a member of staff; and (3) collaboration among library users. Their classification schema mapped well to the examples of wikis that were identified; however, it too was not granular enough, as it did not distinguish between collaboration among library staff intra-organizationally and extra-organizationally, the two most common types of wiki usage found in the research (see appendix). To account for these types of collaboration, which are common not only to wiki use in libraries but to all professional library practice, the author modified Twidale and Nichols’s schema (see figure 6). The improved schema also uniformly represents entities across the categories: library staff and member of staff are referred to as “library staff,” and patrons and library users are referred to as “patrons.” Examples of wikis used in libraries for each category are provided to better illustrate the proposed classification schema.

■ Collaboration among libraries

The Library Instruction Wiki (http://instructionwiki.org/main_page) is an example of a wiki that is used for collaboration among libraries (figure 7). It appears as though the wiki was originally set up to support library instruction within Oregon (it is unclear if this was associated with a particular type of library, say academic or public), but now the wiki supports library instruction in general.
The wiki is self-described as “a collaboratively developed resource for librarians involved with or interested in instruction. All librarians and others interested in library instruction are welcome and encouraged to contribute.” The tagline for the wiki is “stop reinventing the wheel” (Library Instruction Wiki 2006).

Figure 5. Wiki World (http://www.libsuccess.org/index.php?title=wiki_world)

Figure 6. Four types of collaboration within libraries: 1. collaboration among libraries (extra-organizational); 2. collaboration among library staff (intra-organizational); 3. collaboration among library staff and patrons; 4. collaboration among patrons.

From this wiki there is a list of library instruction resources that includes the following: handouts, tutorials, and other resources to share; teaching techniques, tips, and tricks; class-specific websites and handouts; glossary and encyclopedia; bibliography and suggested reading; and instruction-related projects, brainstorms, and documents. Within the handouts, tutorials, and other resources to share section, the author found a wide variety of resources from libraries across the country. Similarly, there were a number of suggestions to be found under the teaching techniques, tips, and tricks section. Another example of a wiki used for collaboration among libraries is the Library Success wiki (http://www.libsuccess.org/), one of the sources of examples of wikis used in this research. Adding to earlier descriptions of this wiki as presented in this paper, Library Success seems to be one of the most frequently updated library wikis and perhaps the most comprehensive in its coverage of library topics.

■ Collaboration among library staff

The University of Connecticut Libraries’ staff wiki (http://wiki.lib.uconn.edu/) is an example of a wiki used for collaboration among library staff (figure 8).
This wiki is a knowledge base containing more than one thousand Information Technology Services (ITS) documents. ITS documents support the information technology needs of the library organization. Examples include answers to commonly asked questions, user manuals, and instructions for a variety of computer operations. In addition to being a repository of ITS documents, the wiki also serves as a portal to other wikis within the University of Connecticut Libraries. There are many other wikis connected to library units; teams; software applications, such as the libraries’ ILS; libraries within the University of Connecticut Libraries; and other University of Connecticut campuses. The Health Sciences Library Knowledge Base, Stony Brook University (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome), is another example of a wiki that is used for collaboration among library staff (figure 9). The wiki is described as “a space for the dynamic collaboration of the library staff, and a platform of shared resources” (Health Sciences Library 2007). On the wiki there are the following content areas: news and announcements; HSL departments; projects; troubleshooting; staff training resources, working papers, and support materials; and community activities, scholarship, conferences, and publications.

■ Collaboration among library staff and patrons

There are only a few examples of wikis used for collaboration among library staff and patrons to cite as exemplars. One example is the St. Joseph County Public Library (SJCPL) subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page), seen in figure 10. This wiki is a collection of resources and services in print and electronic formats to assist library patrons with subject area searching. As the wiki is published by library staff for public consumption, it has more of a professional feel than wikis from the first two categories.
Pages have images, and the content is structured to look like a standard web page. Though the wiki looks like a web page, there still remain a number of edit links that follow each section of text on the wiki. While these links bear importance for those editing the wiki (library staff only in this case), they undoubtedly puzzle library patrons who think that they have the ability to edit the wiki when, in fact, they do not.

Figure 7. Library Instruction Wiki (http://instructionwiki.org/)

Figure 8. The University of Connecticut Libraries’ staff wiki (http://wiki.lib.uconn.edu/)

Another example of collaboration between library staff and patrons that takes a similar approach is the USC Aiken Gregg-Graniteville Library website (http://library.usca.edu/), shown in figure 11. As with the SJCPL subject guides, this wiki looks more like a website than a wiki. In fact, the USC Aiken wiki conceals its true identity as a wiki even more so than the SJCPL subject guides. The only evidence that the website is a wiki is a link at the bottom of each page that says “powered by PmWiki.” PmWiki (http://pmwiki.org/) is a content management system that utilizes wiki technology on the back end to manage a website while retaining the look and feel of a standard website. It seems that the benefits of using a wiki in such a way are shared content creation and management.

■ Collaboration among patrons

As there are only three examples of wikis used for collaboration among patrons, all examples will be highlighted in this section. The first example is Wiki WorldCat (http://www.oclc.org/productworks/wcwiki.htm), sponsored by OCLC. Wiki WorldCat launched as a pilot project in September 2005. The service allows users of Open WorldCat, OCLC’s web version of WorldCat, to add book reviews to item records.
Though this wiki does not have many book reviews in it, even for contemporary bestsellers, it gives a taste of how a wiki could be used to facilitate collaboration among patrons. A second example is the Biz Wiki from Ohio University Libraries (http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page) (see figure 12). The Biz Wiki is a collection of business information resources available through Ohio University. The wiki was created by Chad Boeninger, reference and instruction librarian, as an alternate form of a subject guide or pathfinder. What separates this wiki from those in the third category, collaboration among library staff and patrons, is that the wiki is editable by patrons as well as librarians. Similarly, Butler WikiRef (http://www.seedwiki.com/wiki/butler_wikiref) is a wiki that has reviews of reference resources created by Butler librarians, faculty, staff, and students (see figure 13).

Figure 9. Health Sciences Library Knowledge Base (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome)

Figure 10. SJCPL subject guides (http://libraryforlife.org/subjectguides/index.php/main_page/)

Figure 11. USC Aiken Gregg-Graniteville Library (http://library.usca.edu/)

Full results

Thirty-three wikis were identified. Two wikis were classified in two categories each. The full results are available in the appendix. Table 1 illustrates how wikis were not uniformly distributed across the four categories: category I had 45.7 percent, category II had 31.4 percent, category III had 14.3 percent, and category IV had 8.6 percent. Nearly 80 percent of all examples were found within categories I and II. As seen in some of the examples in the previous section, wikis were utilized for a variety of purposes.
Here is a short list of purposes for which wikis were utilized: sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. Wiki software utilization is summarized in tables 2 and 3. MediaWiki is the most popular software utilized by libraries (33.3 percent), followed by unknown (30.3 percent), PBwiki (12.1 percent), PmWiki (12.1 percent), Seedwiki (6.1 percent), TWiki (3 percent), and XWiki (3 percent). If the values for unknown are removed from the totals (table 3), MediaWiki is utilized in almost half (47.8 percent) of all library wiki applications.

■ Discussion

With a wealth of examples of wikis in categories I and II and a dearth of examples of wikis in categories III and IV, the library community seems to be more comfortable using wikis to collaborate within the community, but less comfortable using wikis to collaborate with library patrons or to enable collaboration among patrons. The research results pose the questions: why are wikis predominantly used for collaboration within the library community? And why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another?

Why are wikis predominantly used for collaboration within the library community? This is perhaps the easier of the two questions to explain. There is a long legacy of cooperation and collaboration intra-organizationally and extra-organizationally within libraries. One explanation for this is the shared budgetary climate within libraries. All too often there are insufficient money, staff, and resources to offer desired levels of service. Librarians work together to overcome these barriers.
Prominent examples include cooperative cataloging, interlibrary lending, and the formation of consortia to negotiate pricing. Another explanation can be found in the personal characteristics of library professionals. Librarianship is a service profession that consequently attracts service-minded individuals who are interested in helping others, whether they are library patrons or fellow colleagues. A third reason is the role of library associations, such as the International Federation of Library Associations and Institutions, the American Library Association, the Special Libraries Association, and the Medical Library Association, as well as many others at the international, national, state, and local levels, and the work that is done through these associations at annual conferences and throughout the year. Libraries use wikis to collaborate intra-organizationally and extra-organizationally because collaboration is what they do most naturally.

Figure 12. Ohio University Libraries Biz Wiki (http://www.library.ohiou.edu/subjects/bizwiki)

Figure 13. Butler WikiRef (http://www.seedwiki.com/wiki/butler_wikiref)

Why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another? The reasons why libraries are only minimally using wikis to collaborate with patrons and for patron collaboration are more difficult to ascertain. However, due to the untapped potential of using wikis, the proposed answers to this question are more important and may lead to future implementations of wikis in libraries. Here are four possible explanations, some more speculative than others. First, perhaps one of the reasons is the result of the way in which libraries are conceived by library patrons and librarians alike. A strong case can be made for libraries as places of collaborative work, and the author takes this position.
However, historically libraries have been repositories of information, and this remains a pervasive and difficult concept to change: libraries are frequently seen simply as places to get books. In this scenario, the librarian is a gatekeeper that a patron interacts with to get a book, that is, if the patron interacts with a librarian at all. It also is worth noting that the relationship is one-way: the patron needs the assistance of the librarian, but not the other way around. Viewed in these terms, this is not a collaborative situation. For libraries to use wikis for the purpose of collaborating with library patrons, it might demand the reconceptualization of libraries by library patrons and librarians. Similarly, this extreme conceptualization of libraries does not consider patrons working with one another, even though it is an activity that occurs formally and informally within libraries, not to mention with the emergence of interdisciplinary and multidisciplinary work. If wikis are to be used to facilitate collaboration between patrons, the conceptualization of the library by library patrons and librarians must be expanded. Second, there may be fears within the library community about authority, responsibility, and liability. Libraries have long held the responsibility of ensuring the authority of the bibliographic catalog. If patrons are allowed to edit the library wiki, there is potential for negatively affecting the authority of the wiki and even the perceived authority of the library. Likewise, there is potential liability in allowing patrons to post to the library wiki.

Table 1. Classification summary
Category                                             No.      %
I: Collaboration among libraries                      16   45.7
II: Collaboration among library staff                 11   31.4
III: Collaboration among library staff and patrons     5   14.3
IV: Collaboration among patrons                        3    8.6
Total                                                 35  100.0

Table 2. Software totals
Wiki software   No.      %
MediaWiki        11   33.3
Unknown          10   30.3
PBwiki            4   12.1
PmWiki            4   12.1
Seedwiki          2    6.1
TWiki             1    3.0
XWiki             1    3.0
Total            33  100.0

Table 3. Software totals without unknowns
Wiki software   No.      %
MediaWiki        11   47.8
PBwiki            4   17.4
PmWiki            4   17.4
Seedwiki          2    8.7
TWiki             1    4.3
XWiki             1    4.3
Total            23  100.0

Similar concerns have been raised in the past about other collaborative technologies, such as blogs, bulletin boards, mailing lists, and so on, all aspects of the Library 2.0 movement. If libraries are fully to realize Library 2.0 as described by Casey and Savastinuk (2006), Miller (2006), and Courtney (2007), these issues must be considered. Third, perhaps it is a matter of fit. It might be the case that wikis are utilized in categories I and II and not within categories III and IV because the tools are better suited to support the types of activities within categories I and II. Consider some of the activities listed earlier: supporting association work, collecting software documentation, supporting conferences, creating digital repositories, creating intranets, and creating knowledge bases. Each of these illustrates a wiki that is utilized for the creation of a resource with multiple authors and readers, tasks that are well-suited to wikis. Wikipedia is a great example of a wiki with clear, shared tasks for multiple authors and multiple readers and a sense of persistence over time. In contrast, relationships between library staff and patrons do not typically lead to the shared creation of resources. While it is true that the relationship between patron and librarian in the context of a patron’s research assignment can be collaborative depending on the circumstances, authorship is not shared but is possessed by the patron. In addition, research assignments in the context of undergraduate coursework are short-lived and seldom go beyond the confines of a particular course.
In terms of patrons working together with other patrons, there is the precedent of group work; however, groups often produce projects or papers that share the characteristics of non-group research assignments listed above. This, of course, does not mean that wikis are not suitable for collaboration within categories III and IV, but perhaps the opportunities for collaboration are fewer, or they stretch the imagination of the types and ways of doing collaborative work. Fourth, perhaps it is a matter of “not yet.” While the research has shown that libraries are not utilizing wikis in categories III and IV, this may be because it is too soon. It should be noted that wikis are still new technologies. It might be the case that librarians are experimenting in safer contexts so they will gain experience prior to trying more public projects where their expertise will be needed. If this explanation is true, it is expected that more examples of wikis in libraries will soon emerge. As they do, the author hopes that all examples of wikis in libraries, new and old, will be added to the companion wiki to this article, LibraryWikis (http://librarywikis.pbwiki.com/).

■ Conclusion

It appears that wikis are here to stay and that their utilization within libraries is only just beginning. This article documented the current practice of wikis used in libraries using CSCW as a framework for discussion. The author located examples of wikis in three places: within the LIS literature, on the Library Success wiki, and within messages from three professional electronic discussion lists. Thirty-three examples of wikis were identified and classified using a classification schema created by the author. The schema has four categories: (1) collaboration among libraries; (2) collaboration among library staff; (3) collaboration among library staff and patrons; and (4) collaboration among patrons.
Wikis were used for a variety of purposes, including sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. By and large, wikis were primarily used to support collaboration among library staff intra-organizationally and extra-organizationally, with nearly 80 percent (45.7 percent and 31.4 percent, respectively) of the examples so identified, and less so in the support of collaboration among library staff and patrons (14.3 percent) and collaboration among patrons (8.6 percent). A majority of the examples of wikis utilized the MediaWiki software (47.8 percent). It is clear that there are plenty of examples of wikis utilized in libraries, and more to be found each day. It is at this time that the profession is faced with extending the use of this technology, and it is for the future to see how wikis will continue to be used within libraries.

Works cited

Ackerman, Mark S. 2002. The intellectual challenge of CSCW: The gap between social requirements and technical feasibility. In Human-computer interaction in the new millennium, ed. John M. Carroll, 179–203. New York: Addison-Wesley.

Balraj, Leela, et al. 2005. LIBREF-L. Kent State University Libraries. http://www.library.kent.edu/page/10391 (accessed June 12, 2007). Archive is available at this link as well.

Bannon, Liam J., and Kjeld Schmidt. 1991. CSCW: Four characters in search of a context. In Studies in computer supported cooperative work, ed. John M. Bowers and Steven D. Benford, 3–16. Amsterdam: Elsevier.

Casey, Michael E., and Laura C. Savastinuk. 2006. Library 2.0. Library Journal 131, no. 14: 40–42. http://www.libraryjournal.com/article/ca6365200.html (accessed June 12, 2007).

Courtney, Nancy. 2007.
Library 2.0 and beyond: Innovative technologies and tomorrow’s user (in press). Westport, Conn.: Libraries Unlimited.

Dix, Alan, et al. 2004. Socio-organizational issues and stakeholder requirements. In Human computer interaction, 3rd ed., 450–74. Upper Saddle River, N.J.: Prentice Hall.

Dourish, Paul. 2001. Social computing. In Where the action is: The foundations of embodied interaction, 55–97. Cambridge, Mass.: MIT Press.

Farkas, Meredith. 2006. Wiki World. http://www.libsuccess.org/index.php?title=wiki_world (accessed June 12, 2007).

Giles, Jim. 2005. Internet encyclopaedias go head to head. Nature 438: 900–01. http://www.nature.com/nature/journal/v438/n7070/full/438900a.html (accessed June 12, 2007).

Greif, Irene, ed. 1988. Computer supported cooperative work: A book of readings. San Mateo, Calif.: Morgan Kaufmann.

Health Sciences Library, State University of New York, Stony Brook. 2007. Health Sciences Library Knowledge Base. http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome (accessed June 12, 2007).

Johansen, Robert, et al. 1988. Groupware: Computer support for business teams. New York: Free Press.

Library Instruction Wiki. 2006. http://instructionwiki.org/main_page (accessed June 12, 2007).

Miller, Paul. 2006. Coming together around Library 2.0. D-Lib Magazine 12, no. 4. http://www.dlib.org/dlib/april06/miller/04miller.html (accessed June 12, 2007).

Nichols, David M., and Michael B. Twidale. 1999. Computer supported cooperative work and libraries. Vine 109: 10–15. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/vine.html (accessed June 12, 2007).

Olson, Gary M., and Judith S. Olson. 2002. Groupware and computer-supported cooperative work. In The human-computer interaction handbook: Fundamentals, evolving technologies and emerging applications, ed. Julie A. Jacko and Andrew Sears, 583–95. Mahwah, N.J.: Lawrence Erlbaum Associates.

Rodden, Tom T.
1991. A survey of CSCW systems. Interacting with Computers 3, no. 3: 319–54.

Sachs, Patricia. 1995. Transforming work: Collaboration, learning, and design. Communications of the ACM 38: 227–49.

Sánchez, J. Alfredo. 2001. HCI and CSCW in the context of digital libraries. In CHI ’01 extended abstracts on human factors in computing systems. Conference on Human Factors in Computing Systems, Seattle, Wash., Mar. 31–Apr. 5, 2001.

Schmidt, Kjeld, and Liam J. Bannon. 1992. Taking CSCW seriously: Supporting articulation work. Computer Supported Cooperative Work 1, no. 1/2: 7–40.

Shneiderman, Ben, and Catherine Plaisant. 2005. Collaboration. In Designing the user interface: Strategies for effective human-computer interaction, 4th ed., 408–50. Reading, Mass.: Addison-Wesley.

Tennant, Roy. 2005. Web4Lib electronic discussion. WebJunction.org. http://lists.webjunction.org/web4lib/ (accessed June 12, 2007). Archive is available at this link as well.

Twidale, Michael B., et al. 1997. Collaboration in physical and digital libraries. Report no. 64, British Library Research and Innovation Centre. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/bl/report/ (accessed June 12, 2007).

Twidale, Michael B., and David M. Nichols. 1998a. Using studies of collaborative activity in physical environments to inform the design of digital libraries. Technical report CSEG/11/98, Computing Department, Lancaster University, UK. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/cscw98.html (accessed June 12, 2007).

Twidale, Michael B., and David M. Nichols. 1998b. A survey of applications of CSCW for digital libraries. Technical report CSEG/4/98, Computing Department, Lancaster University, UK. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/survey.html (accessed June 12, 2007).

WebJunction. n.d. DIG_REF electronic discussion list. http://www.vrd.org/dig_ref/dig_ref.shtml (accessed June 12, 2007).

Wikipedia. 2007a. Wiki.
http://en.wikipedia.org/wiki/wiki (accessed April 29, 2007).

Wikipedia. 2007b. WikiWikiWeb. http://en.wikipedia.org/wiki/wikiwikiweb (accessed April 29, 2007).

Appendix. Wikis in libraries
I = collaboration between libraries
II = collaboration between library staff
III = collaboration between library staff and patrons
IV = collaboration between patrons

I. Library Success: A Best Practices Wiki, capturing library success stories and covering a wide variety of topics; also features a presentation about wikis (http://www.libsuccess.org/index.php?title=wiki_world). Location: http://www.libsuccess.org/. Software: MediaWiki.

I. Wiki for the school library association in Alaska. Location: http://akasl.pbwiki.com/. Software: PBwiki.

I. Wiki to support Reserves Direct, free, open-source software for managing academic reserves materials developed by Emory University. Location: http://www.reservesdirect.org/wiki/index.php/main_page. Software: MediaWiki.

I. SUNYLA New Tech Wiki, a place for State University of New York (SUNY) librarians to share how they are using information technologies to interact with patrons. Location: http://sunylanewtechwiki.pbwiki.com/. Software: PBwiki.

I. Wiki for librarians and faculty members to collaborate across campuses; being used with distance learning instructors and small groups. Location: message from Robin Shapiro on the DIG_REF electronic discussion list dated 10/18/2006. Software: unknown.

I. Discusses setting up three wikis in the last month: “one to support a preconference workshop, another for behind-the-scenes conference planning by local organizers, and one for conference attendees to use before they arrived and during the sessions” (30). Location: Fichter, Darlene. 2006. Using wikis to support online collaboration in libraries. Information Outlook 10, no. 1: 30–31.
Software: unknown.

I. Unofficial wiki for the American Library Association 2005 annual conference. Location: http://meredith.wolfwater.com/wiki/index.php?title=main_page. Software: MediaWiki.

I. Unofficial wiki for the 2005 Internet Librarian conference. Location: http://ili2005.xwiki.com/xwiki/bin/view/main/webhome. Software: XWiki.

I. Wiki for the Canadian Library Association (CLA) 2005 annual conference. Location: http://wiki.ucalgary.ca/page/cla. Software: MediaWiki.

I. Wiki for the South Carolina Library Association. Location: http://www.scla.org/governance/homepage. Software: PmWiki.

I. Wiki set up to support national discussion about institutional repositories in New Zealand. Location: http://wiki.tertiary.govt.nz/~institutionalrepositories. Software: PmWiki.

I. The Oregon Library Instruction Wiki, used for sharing information about library instruction. Location: http://instructionwiki.org/. Software: MediaWiki.

I. Personal Repositories Online Wiki Environment (PROWE), an online repository sponsored by the Open University and the University of Leicester that uses wikis and blogs to encourage the open exchange of ideas across communities of practice. Location: http://www.prowe.ac.uk/. Software: unknown.

I. LIS Wiki, a space for collecting articles and general information about library and information science. Location: http://liswiki.org/wiki/main_page. Software: MediaWiki.

I. Making of Modern Michigan, a wiki to support a statewide digital library project. Location: http://blog.lib.msu.edu/mmmwiki/index.php/main_page. Software: unknown (behind firewall).

I. Wiki used as a web content editing tool in a digital library initiative sponsored by Emory University, the University of Arizona, Virginia Tech, and the University of Notre Dame. Location: http://sunylanewtechwiki.pbwiki.com/. Software: PBwiki.

II. Wiki at SUNY Stony Brook Health Sciences Library used as a knowledge base. Location: http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome; a presentation can be found at http://ms.cc.sunysb.edu/%7edachase/wikisinaction.htm. Software: TWiki.

II. Wiki at York University used internally for committee work.
exploring how to use wikis as a way to collaborate with users message from mark robertson. on web4lib electronic discussion list dated 10/13/2006. unknown ii wiki for internal staff use at the university of waterloo. they utilize access control to restrict parts of the wiki to groups message from chris gray. on web4lib electronic discussion list dated 08/09/2006. unknown ii wiki at the university of toronto for internal communica tions, technical problems, and as a document repository message from stephanie walker. on librefl electronic discussion list dated 10/28/2006. unknown ii wiki used for coordination and organization of portable professor program, which appears to be a collaborative infor mation literacy program for remote faculty http://tfppcommittee.pbwiki.com/ pbwiki ii the university of connecticut libraries’ staff wiki which is a repository of information technology services documents http://wiki.lib.uconn.edu/wiki/ main_page mediawiki ii wiki used at binghamton university libraries for staff intranet. features pages for committees, documentation, policies, newsletters, presentations, and travel reports screenshots can be found at http://library.lib.binghamton.edu/ presentations/cil2006/cil%202006 _wikis.pdf mediawiki ii wiki used at the information desk at miami university described in: withers, rob. “something wiki this way comes.” c&rl news 66, no. 11 (2005): 775–77. unknown ii use of wiki as knowledge base to support reference service http://oregonstate.edu/~reeset/ rdm/ unknown ii university of minnesota libraries staff web site in wiki form https://wiki.lib.umn.edu/ pmwiki ii wiki used to support the mit engineering and science libraries bteam. the wiki may no longer be active, but is still available http://www.seedwiki.com/wiki/b team seedwiki iii a wiki that is subject guide at st. 
joseph county public library in south bend, indiana http://www.libraryforlife.org/ subjectguides/index.php/main_page mediawiki 3� information technology and libraries | september 20073� information technology and libraries | september 2007 category description location wiki software iii wiki used at the aiken library, university of south carolina as a content management system (cms) http://library.usca.edu/main/ homepage pmwiki iii doucette library of teaching resources wiki—a repository of resources for education students http://wiki.ucalgary.ca/page/ doucette mediawiki iv wiki worldcat (wikid) is an oclc pilot project (now defunct) that allowed users to add reviews to open worldcat records http://www.oclc.org/product works/wcwiki.htm unknown iii and iv wikiref lists reviews of reference resources—databases, books, web sites, etc. —created by butler librarians, faculty, staff, and students. http://www.seedwiki.com/wiki/ butler_wikiref; reported in matthies, brad, jonathan helmke, and paul slater. using a wiki to enhance library instruction. indiana libraries 25, no. 3 (2006): 32–34. seedwiki iii and iv wiki used as a subject guide at ohio university http://www.library.ohiou.edu/sub jects/bizwiki/index.php/main_page; presentation about the wiki: http://www.infotoday.com/cil2006/ presentations/c101102_boeninger .pps mediawiki evaluation and comparison of discovery tools: an update f. william chickering and sharon q. yang information technology and libraries | june 2014 5 abstract selection and implementation of a web-scale discovery tool by the rider university libraries (rul) in the 2011–2012 academic year revealed that the endeavor was a complex one. research into the state of adoption of web-scale discovery tools in north america and the evolution of product effectiveness provided a good starting point. 
in the following study, we evaluated fourteen major discovery tools (three open source and eleven proprietary), benchmarking sixteen criteria recognized as the advanced features of a “next generation catalog.” some of the features have been used in previous research on discovery tools. the purpose of the study was to evaluate and compare all the major discovery tools, and the findings serve to update librarians on the latest developments and user interfaces and to assist them in their adoption of a discovery tool.

introduction

in 2004, the rider university libraries’ (rul) strategic planning process uncovered a need to investigate federated searching as a means to support research. a tool was needed to search and access all journal titles available to rul users at that time, including 12,000+ electronic full-text journals. because federated search lacked the ability to provide relevancy ranking due to its real-time search operations, and given the cost of the products then available, the decision was made to defer implementation of federated search. monitoring developments yearly revealed no improvements strong enough to adopt the approach. by 2011, the number of electronic full-text journals had increased to 51,128, and by this time federated search as a concept had metamorphosed into web-scale discovery. clearly, the time had come to consider implementing this more advanced approach to searching the ever-growing number of journals available to our clients. though rul passed on federated searching, viewing it as too cumbersome to serve our students well, we anticipated the day when improved systems would emerge. vaughn nicely describes the ability of more highly evolved discovery systems to “provide quick and seamless discovery, delivery, and relevancy-ranking capabilities across a huge repository of content.”1 yang and hofmann anticipated the emergence of web-scale discovery with their evaluation of next generation catalogs.
2,3 by 2011, informed by yang and hofmann’s research, we believed that the systems in the marketplace were sufficiently evolved to make our efforts at assessing available systems worthwhile. this coincided nicely with an important objective in our strategic plan: investigate link resolvers and discovery tools for federated searching and opac by summer 2011.

author note: f. william chickering (chick@rider.edu) is dean of university libraries, rider university, lawrenceville, new jersey. sharon q. yang (yangs@rider.edu) is associate professor–librarian at moore library, rider university, lawrenceville, new jersey.

heeding alexander pope’s advice to “be not the first by whom the new are tried, nor yet the last to lay the old aside,”4 we set about discovering what systems were in use throughout north america and which features each provided.

some history

in 2006, antelman, lynema, and pace observed that “library catalogs have represented stagnant technology for close to twenty years.” better technology was needed “to leverage the rich metadata trapped in the marc record to enhance collection browsing. the promise of online catalogs has never been realized. for more than a decade, the profession either turned a blind eye to problems with the catalog or accepted that it is powerless to fix them.”6 dissatisfaction with catalog search tools led us to review the vufind discovery tool. while it had some useful features (spelling, “did you mean?” suggestions), it still suffered from inadequacies in full-text search and the cumbersome nature of searcher-designated boolean searching. it did not work well in searching printed music collections and, of course, only served as a catalog front end.
with this all in mind, rul developed a set of objectives to improve information access for clients:

• to provide information seekers with
  • an easy search option for academically valid information materials
  • an effective search option for academically valid information materials
  • a reliable search option for academically valid information materials across platforms
• to recapture student academic search activity from google
• to attempt revitalizing the use of monographic collections
• to provide an effective mechanism to support offerings of e-books
• to build a firm platform for appropriate library support of distance learning coursework

literature review

marshall breeding first discussed broad-based discovery tools in 2005, shortly after the launch of google scholar. he posits that federated search could not compete with the power and speed of a tool like google scholar. he proclaims the need for, as he describes it, a “centralized search model.”7 building on breeding’s observations four years earlier, diedrichs astutely observed in 2009 that “user expectations for complete and immediate discovery and delivery of information have been set by their experiences in the web 2.0 world. libraries must respond to the needs of those users whose needs can easily be met with google-like discovery tools, as well as those that require deeper access to our resources.”10 in that same year, dolski described the common situation in many academic libraries when, in reference to the university of nevada las vegas (unlv) library, he states, “our library website serves as the de facto gateway to our electronic, networked content offerings. yet usability studies have shown that findability, when given our website as a starting point, is poor. undoubtedly this is due, at least in part, to interface fragmentation.”11 this perfectly described the way we had come to view rul’s situation.
in 2010, breeding reviewed the systems in the market, noting that these are not just next-generation catalogs. he stressed “equal access to content in all forms,” a concept we now take for granted. a key virtue in discovery tools, he notes, is the “blending of the full text of journal articles and books alongside citation data, bibliographic, and authority records resulting in a powerful search experience. rather than being provided a limited number of access points selected by catalogers, each word and phrase within the text becomes a possible point of retrieval.” breeding further points out that: “web-scale discovery platforms will blur many of the restrictions and rules that we impose on library users. rather than having to explain to a user that the library catalog lists books and journal titles but not journal articles, users can simply begin with the concept, author, or title of interest and straightaway begin seeing results across the many formats within the library’s collection.”12 working with freshmen at rider university revealed that they are ahead of the professionals in approaching information this way, and we believed that web-scale discovery tools could help our users. as we began the process of selecting a discovery tool, we looked at the experiences of others. fabbi at the university of nevada las vegas (unlv) folded in a strong component of organizational learning in a highly structured manner that was unnecessary at rider.13 no information was disclosed on the process of selecting a discovery vendor, though the website reveals the presence of a discovery tool (http://library.nevada.edu/). in contrast, many librarians at rider explored a variety of libraries’ application of search tools. following hofmann and yang’s work, a process of vendor demonstrations and analysis of feasibility led to a trial of ebsco discovery service.
what we hoped for is what way at grand valley state reported in 2010 of his analysis of serials solutions’ summon:

an examination of usage statistics showed a dramatic decrease in the use of traditional abstracting and indexing databases and an equally dramatic increase in the use of full text resources from full text database and online journal collections. the author concludes that the increase in full text use is linked to the implementation of a web-scale discovery tool.14

method

understanding both rul’s objectives and the state of the art as reflected in the literature, we concluded that an up-to-date review of discovery tool adoptions was in order before moving forward in the process of selecting a product. the resulting study included these steps: (1) compiling a list of all the major discovery tools, (2) developing a set of criteria for evaluation, (3) examining four to seven websites where a discovery tool was deployed and evaluating each tool against each criterion, (4) recording the findings, and (5) analyzing the data. the targeted population for the study included all the major discovery tools in use in the united states. we define a discovery tool as a library user interface independent of any library systems. a discovery tool can be used to replace the opac module of an integrated library system or live side-by-side with the opac. other names for discovery tools include stand-alone opac, discovery layer, or discovery user interface. lately, a discovery tool is more often called a discovery service because most are becoming subscription-based and reside remotely in a cloud-based saas (software as a service) model.
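the record-keeping step described above (a spreadsheet with one field per criterion, marking each feature as present or absent) can be sketched as a small presence/absence matrix. this is only an illustrative sketch: the tool names match the study, but the criterion values shown are invented, not the study's actual findings.

```python
# illustrative sketch of the study's presence/absence matrix.
# criterion values below are made up, not the study's findings.
CRITERIA = [
    "one-stop search", "modern interface",
    "enriched content", "faceted navigation",
]

# each tool maps to the set of criteria observed as present
observations = {
    "summon": {"one-stop search", "modern interface", "faceted navigation"},
    "vufind": {"modern interface", "faceted navigation"},
}

def feature_row(tool):
    """Render one tool as a yes/no row, one cell per criterion."""
    present = observations[tool]
    return ["yes" if c in present else "no" for c in CRITERIA]

for tool in sorted(observations):
    print(tool, feature_row(tool))
```

analyzing the data then reduces to counting "yes" cells per row (per tool) or per column (per criterion).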
the authors compiled a list of fourteen discovery tools based on marshall breeding’s “major discovery products” guide published in “library technology guides.”15 those included aquabrowser library, axiell arena, bibliocommons (bibliocore), blacklight, ebsco discovery service, encore, endeca, extensible catalog, sirsidynix enterprise, primo, summon, visualizer, vufind, and worldcat local. two open-source discovery layers, sopac (the social opac) and scriblio, were excluded from this study because very few libraries are using them. for evaluation in this study, academic libraries were preferred over public libraries during the sample selection process. however, some discovery tools, such as bibliocommons, were more popular among public libraries. therefore examples of public library websites were included in the evaluation. the sites that made the final list were chosen either from the vendor’s website that maintained a customer list or breeding’s “library technology guides.”16 the following is the final list of libraries whose implementations were used in the study.

example library sites with proprietary discovery tools:

aquabrowser (serials solutions)
1. allen county public library at http://smartcat.acpl.lib.in.us/
2. gallaudet university library at http://discovery.wrlc.org/?skin=ga
3. harvard university at http://lib.harvard.edu/
4. norwood young america public library at http://aquabrowser.carverlib.org/
5. selco southeastern libraries cooperating at http://aquabrowser.selco.info/?c_profile=far
6. university of edinburgh (uk) at http://aquabrowser.lib.ed.ac.uk/

axiell arena (axiell)
1. doncaster council libraries (uk) at http://library.doncaster.gov.uk/web/arena
2. lerums bibliotek (lerums library, sweden) at http://bibliotek.lerum.se/web/arena
3. london libraries consortium-royal kingston library (uk) at http://arena.yourlondonlibrary.net/web/kingston
4. norddjurs (denmark) at https://norddjursbib.dk/web/arena/
5. north east lincolnshire libraries (uk) at http://library.nelincs.gov.uk/web/arena
6. someron kaupunginkirjasto (finland) at http://somero.verkkokirjasto.fi/web/arena
7. syddjurs (denmark) at https://bibliotek.syddjurs.dk/web/arena1

bibliocore (bibliocommons)
1. halton hills public library at http://hhpl.bibliocommons.com/dashboard
2. new york public library at http://nypl.bibliocommons.com/
3. oakville public library at http://www.opl.on.ca/
4. princeton public library at http://princetonlibrary.bibliocommons.com/
5. seattle public library at http://seattle.bibliocommons.com/
6. west perth (australia) public library at http://wppl.bibliocommons.com/dashboard
7. whatcom county library system at http://wcls.bibliocommons.com/

ebsco discovery service/eds (ebsco)
1. aston university (uk) at http://www1.aston.ac.uk/library/
2. columbia college chicago library at http://www.lib.colum.edu/
3. loyalist college at http://www.loyalistlibrary.com/
4. massey university (new zealand) at http://www.massey.ac.nz/massey/research/library/library_home.cfm
5. rider university at http://www.rider.edu/library
6. santa rosa junior college at http://www.santarosa.edu/library/
7. st. edward's university at http://library.stedwards.edu/

encore (innovative interfaces)
1. adelphi university at http://libraries.adelphi.edu/
2. athens state university library at http://www.athens.edu/library/
3. california state university at http://coast.library.csulb.edu/
4. deakin university (australia) at http://www.deakin.edu.au/library/
5. indiana state university at http://timon.indstate.edu/iii/encore/home?lang=eng
6. johnson and wales university at http://library.uri.edu/
7. st. lawrence university at http://www.stlawu.edu/library/

endeca (oracle)
1. john f. kennedy presidential library and museum at http://www.jfklibrary.org/
2. north carolina state university at http://www.lib.ncsu.edu/endeca/
3. phoenix public library at http://www.phoenixpubliclibrary.org/
4. triangle research libraries network at http://search.trln.org/
5. university of technology, sydney (australia) at http://www.lib.uts.edu.au/
6. university of north carolina at http://search.lib.unc.edu/
7. university of ottawa (canada) libraries at http://www.biblio.uottawa.ca/html/index.jsp?lang=en

enterprise (sirsidynix)
1. cerritos college at http://cert.ent.sirsi.net/client/cerritos
2. maricopa county community colleges at https://mcccd.ent.sirsi.net/client/default
3. mountain state university/university of charleston at http://msul.ent.sirsi.net/client/default
4. university of mary at http://cdak.ent.sirsi.net/client/uml
5. university of the virgin islands at http://uvi.ent.sirsi.net/client/default
6. western iowa tech community college at http://wiowa2.ent.sirsi.net/client/default

primo (ex libris)
1. aberystwyth university (uk) at http://primo.aber.ac.uk/
2. coventry university (uk) at http://locate.coventry.ac.uk/
3. curtin university (australia) at http://catalogue.curtin.edu.au/
4. emory university at http://web.library.emory.edu/
5. new york university at http://library.nyu.edu/
6. university of iowa at http://www.lib.uiowa.edu/
7. vanderbilt university at http://www.library.vanderbilt.edu

visualizer (vtls)
1. blinn college at http://www.blinn.edu/library/index.htm
2. edward via virginia college of osteopathic medicine at http://vcom.vtls.com:1177/
3. george c. marshall foundation at http://gmarshall.vtls.com:6330/
4. scugog memorial public library at http://www.scugoglibrary.ca/

summon (serials solutions)
1. arizona state university at http://lib.asu.edu/
2. dartmouth college at http://dartmouth.summon.serialssolutions.com/
3. duke university at http://library.duke.edu/
4. florida state university at http://www.lib.fsu.edu/
5. liberty university at http://www.liberty.edu/index.cfm?pid=178
6. university of sydney at http://www.library.usyd.edu.au/

worldcat local (oclc)
1. boise state university at http://library.boisestate.edu/
2. bowie state university at http://www.bowiestate.edu/academics/library/
3. eastern washington university at http://www.ewu.edu/library.xml
4. louisiana state university at http://lsulibraries.worldcat.org/
5. saint john's university at http://www.csbsju.edu/libraries.htm
6. saint xavier university at http://lib.sxu.edu/home

examples of open source and free discovery tools:

blacklight (the university of virginia library)
1. columbia university at http://academiccommons.columbia.edu/
2. johns hopkins university at https://catalyst.library.jhu.edu/
3. north carolina state university at http://historicalstate.lib.ncsu.edu
4. northwestern university at http://findingaids.library.northwestern.edu/
5. stanford university at http://www-sul.stanford.edu/
6. university of hull (uk) at http://blacklight.hull.ac.uk/
7. university of virginia at http://search.lib.virginia.edu/

extensible catalog/xc (extensible catalog organization/carli/university of rochester)
1. demo at http://extensiblecatalog.org/xc/demo
2. extensible catalog library at http://xco-demo.carli.illinois.edu/dtmilestone3
3. kyushu university (japan) at http://catalog.lib.kyushu-u.ac.jp/en
4. spanish general state authority libraries (spain) at http://pcu.bage.es/
5. thailand cyber university/asia institute of technology (thailand) at http://globe.thaicyberu.go.th/

vufind (villanova university)
1. auburn university at http://www.lib.auburn.edu/
2. carnegie mellon university libraries at http://search.library.cmu.edu/vufind/search/advanced
3. colorado state university at http://lib.colostate.edu/
4. saint olaf college at http://www.stolaf.edu/library/index.cfm
5. university of michigan at http://mirlyn.lib.umich.edu
6. western michigan university at https://catalog.library.wmich.edu/vufind/
7. yale university library at http://yufind.library.yale.edu/yufind/

the following list of criteria was used for the purpose of the evaluation. some were based on those used by previous studies on discovery tools.17,18,19 the list embodied the librarians’ vision for the next-generation catalog and contained some of the most desirable features for a modern opac. the authors were aware of other desirable features for a discovery layer, and the following list was by no means the most comprehensive, but it served the purpose of the study well.

1. one-stop search for all library resources.
a discovery tool should include all library resources in its search, including the catalog with books and videos, journal articles in databases, and local archives and digital repositories. this can be accomplished by a unified index or federated search, an essential component for a discovery tool. some of the discovery tools are described as web-scale because of their potential to search seamlessly across all library resources.

2. state-of-the-art web interface. a discovery tool should have a modern design similar to e-commerce sites, such as google, netflix, and amazon.

3. enriched content. discovery tools should include book cover images, reviews, and user-driven input, such as comments, descriptions, ratings, and tag clouds. the enriched content can come from library patrons, commercial sources, or both.

4. faceted navigation. discovery tools should allow users to narrow down the search results by categories, also called facets. the commonly used facets include locations, publication dates, authors, formats, and more.

5. simple keyword search box with a link to advanced search at the start page. a discovery tool should start with a simple keyword search box that looks like that of google or amazon. a link to the advanced search should be present.

6. simple keyword search box on every page. the simple keyword search box should appear on every page of a discovery tool.

7. relevancy. relevancy-ranking criteria should take into consideration circulation statistics and books with multiple copies. more frequently circulated books indicate popularity and usefulness, and they should be ranked at the top of the display. a book with multiple copies may also be an indication of importance.

8. “did you mean . . . ?” spell-checking.
when an error appears in the search, the discovery tool should present the corrected query spelling as a link so that users can simply click on it to get the search results.

9. recommendations/related materials. a discovery tool should recommend resources for readers in a similar manner to amazon or other e-commerce sites, based on transaction logs. this should take the form of “readers who borrowed this item also borrowed the following . . . ” or a link to recommended readings. it would be ideal if a discovery tool could recommend the most popular articles, a service similar to ex libris’ bx usage-based services.

10. user contribution. user input includes descriptions, summaries, reviews, criticism, comments, rating and ranking, and tagging or folksonomies.

11. rss feeds. a modern opac should provide rss feeds.

12. integration with social networking sites. when a discovery tool is integrated with social networking sites, patrons can share links to library items with their friends on social networks like twitter, facebook, and delicious.

13. persistent links. records in a discovery tool contain a stable url capable of being copied and pasted and serving as a permanent link to that record. these are also called permanent urls.

14. auto-completion/stemming. a discovery tool is equipped with a computational algorithm that can auto-complete the search words or supply a list of previously used words or phrases for users to choose from. google has stemming algorithms.

15. mobile compatibility. there is a difference between being “mobile compatible” and a “custom mobile website.” the former indicates a website can be viewed or used on a mobile phone, and the latter denotes a different version of the user interface specially built for mobile use. in this study we include both as “yes.”

16. functional requirements for bibliographic records (frbr). the latest development of rda certainly makes a discovery tool more desirable if it can display frbr relationships.
for instance, a discovery tool may display and link different versions, editions, or formats of a work, what frbr refers to as expressions and manifestations.

for record keeping and analysis, a microsoft excel file with sixteen fields based on the above criteria was created. the authors checked the discovery tools on the websites of the selected libraries and recorded those features as present or absent. rda compatibility was not used as a criterion in the study because most discovery tools allow users to add rda fields in marc. by now, all the discovery tools should be able to display, index, and search the new rda fields.

findings

one-stop searching for all library resources—this is the most desirable feature when acquiring a discovery tool. unfortunately it also presented the biggest challenge for vendors. both librarians and vendors have been struggling with this issue for the past several years, yet no one has worked out a perfect solution. based on the examples the authors examined, this study found that only five out of fourteen discovery tools can retrieve articles from databases along with books, videos, and digital repositories. those include ebsco discovery service, encore, primo, summon, and worldcat local. whereas encore uses an approach similar to federated search, performing live searches of databases, the other discovery tools build a single unified index. while the single unified index requires libraries to send their catalog data and local information to the vendor for update, and thus the discovery tools may fall behind in reflecting up-to-the-minute accuracy in local holdings, federated search does real-time searching and does not lag behind in displaying current information. both approaches are limited in what they cover.
both need permission from content providers, either for inclusion in the unified index or to develop a connection to article databases for real-time searching. those discovery tools that do not have their own unified index or real-time searching capability provide web-scale searching through other means. for instance, vufind has developed connectors to application programming interfaces (apis) by serials solutions or oclc to pull search results from summon and worldcat local. encore not only developed its own real-time connection to electronic databases but is enhancing its web-scale search by incorporating the unified index from other discovery tools such as the ebsco discovery service. aquabrowser is augmented by 360 federated search for the same purpose. despite those possibilities, the authors did not find article-level retrieval in the sample discovery tools other than the main five mentioned above. comparing the coverage of each tool’s web-scale index can be challenging. ebsco, summon, and worldcat local publicize their content coverage on the web, while primo and encore only share this information with their customers. this makes it hard to compare and evaluate content coverage without contacting vendors and asking for that information. at present, none of the five discovery tools (ebsco discovery service, encore, primo, summon, and worldcat local) can boast 100% coverage of all library resources. in fact, none of the internet search engines, including google or google scholar, can retrieve 100% of all resources. therefore web-scale searching is more a goal than a reality. apart from political and economic reasons, this is in part due to the nonbibliographic structure of the contents in databases such as scifinder and some others.
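the two architectures contrasted above can be sketched in miniature. this is a hedged illustration only: the source names and records below are entirely hypothetical, and real systems negotiate access with content providers rather than iterating over local lists.

```python
# illustrative contrast of the two approaches: a unified index is
# built ahead of time and searched locally, while federated search
# queries each source at request time. all data here is hypothetical.
SOURCES = {
    "catalog":  ["intro to library science", "cataloging rules"],
    "articles": ["web-scale discovery adoption", "library catalogs"],
}

# unified-index approach: merge everything up front. as noted above,
# the merged copy may lag behind the live sources, but one search
# covers all of them at once.
unified_index = [
    (src, title) for src, titles in SOURCES.items() for title in titles
]

def search_unified(term):
    return [(s, t) for s, t in unified_index if term in t]

# federated approach: hit each source live, then merge the results.
# always current, but one round-trip per source at query time.
def search_federated(term):
    results = []
    for src, titles in SOURCES.items():  # one live query per source
        results += [(src, t) for t in titles if term in t]
    return results
```

on this toy data both return the same hits; the trade-off in practice is freshness (federated) versus speed and unified relevancy ranking (single index).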
one-stop searching is still a work in progress because discovery tools give students a quick and simple way to retrieve a large number, but still an incomplete list, of resources held by a library. for more in-depth research, students are still encouraged to search the catalog, discipline-specific databases, and digital repositories separately.

information technology and libraries | june 2014 15

state-of-the-art interface—all the discovery tools are very similar in appearance to amazon.com. some are better than others. this study did not rate each discovery tool on a scale and thus did not distinguish fine degrees of difference in appearance; rather, each discovery tool was given a "yes" or "no" based on subjective judgment. all the discovery tools received "yes" because they are very similar in appearance.

enriched content—all the discovery tools have embedded book cover or video jacket images, but some display more, such as ratings and rankings, user-supplied or commercially available reviews, overviews, previews, comments, descriptions, title discussions, excerpts, and age suitability, to name a few. a discovery tool may display enriched content by default out of the box, but some need to be customized to include it. figure 1 lists the enriched content the authors found implemented in each discovery tool in the sample; the number in the last column indicates how many types of enriched content were found at the time of the study. bibliocommons and aquabrowser stand out from the rest and made the top two on the list based on the amount of enriched content from noncataloging sources (see figure 1). it is debatable how much nontraditional data a discovery tool should incorporate into its display, and it warrants another discussion as to how useful such data is for users.

faceted navigation—faceted navigation has become a standard feature in discovery tools over the last two years.
it allows users to subdivide search results into subsets based on predetermined terms. facets come from a variety of fields in marc records, and some discovery tools have more facets than others. the most commonly seen facets include location or collection, publication date, format, author, genre, and subject. faceted navigation is highly configurable, as many discovery tools allow libraries to decide on their own facets, and it has become an integral part of a discovery tool.

simple keyword search box on the starting page with a link to advanced search—the original idea is to allow a library's user interface to resemble google by displaying a simple keyword search box with a link to advanced search on the starting page. most discovery tools give libraries the flexibility to choose or reject this option. however, many librarians find this approach unacceptable, as they feel it lacks precision in searching and thus may mislead users. as the keyword box is highly configurable and it is up to the library to decide how to present it, many libraries have added a pull-down menu with options to search keywords, authors, titles, and locations. in doing so, the original intention of a google-like simple search box is lost, and only a few libraries follow the google-like box style on the starting page. most libraries altered the simple keyword search box on the starting page to include a drop-down menu or radio buttons, so the box is neither simple nor limited to keyword search only. nevertheless, this study gave all the discovery tools a "yes": all the systems are capable of this feature even though libraries may choose not to use it.
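the faceted navigation described above amounts to counting, for each facet field, how many hits fall under each value so the sidebar can show entries such as "format: book (2)". the record shape and field names below are assumptions made for the sketch, not any discovery tool's actual data model.

```python
# minimal facet-count sketch: tally each facet value across a result set.
from collections import Counter

def facet_counts(records, fields):
    """return {field: Counter({value: hits})} for the given facet fields."""
    counts = {f: Counter() for f in fields}
    for rec in records:
        for f in fields:
            value = rec.get(f)
            if value:
                counts[f][value] += 1
    return counts

# toy search results standing in for marc-derived records
hits = [
    {"title": "romeo and juliet", "format": "book", "location": "main"},
    {"title": "romeo + juliet",   "format": "dvd",  "location": "main"},
    {"title": "west side story",  "format": "book", "location": "branch"},
]
print(facet_counts(hits, ["format", "location"]))
```

clicking a facet in a real interface simply reruns the search filtered to records whose field matches the chosen value.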
rank | discovery tool | enriched content | total
1 | bibliocommons | cover images, tags, similar title, private note, notices, age suitability, summary, quotes, video, comments, and rating | 11
2 | aquabrowser | cover images, previews, reviews, summary, excerpts, tags, author notes & sketches, full text from google, rating/ranking | 9
3 | enterprise | cover images, reviews, google previews, summary, excerpts | 5
4 | axiell arena | cover images, tags, reviews, and title discussion | 4
4 | vufind | cover images, tags, reviews, comments | 4
5 | primo | cover images, tags, previews | 3
5 | worldcat local | cover images, tags, reviews | 3
6 | encore | cover images, tags | 2
6 | visualizer | cover images, reviews | 2
6 | summon | cover images, reviews | 2
7 | blacklight | cover images | 1
7 | ebsco discovery service | cover images | 1
7 | endeca | cover images | 1
7 | extensible catalog | cover images | 1
figure 1. the ranked list of enriched content in discovery tools.

simple keyword search box on every page—this feature enables a user to start a new search at every step of navigation in the discovery tool. most of the discovery tools provide such a box at the top of the screen as users navigate through search results and record displays, except extensible catalog and enterprise by sirsidynix. the feature is missing from the former, while the latter almost has it except when displaying bib records in a pop-up box.

relevancy—traditionally, relevancy is uniformly based on a computer algorithm that calculates the frequency and relative position of a keyword (field weighting) in a record and displays the search results based on the final score. other factors have never been part of the decision in the display of search results. in the discussion of next-generation catalogs, relevancy based on circulation statistics and other factors came up as a desirable possibility, and no discovery tool had met this challenge until now.
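the traditional relevancy calculation just described, keyword frequency weighted by the field it appears in, can be sketched in a few lines, together with the kind of usage-based boost discussed next. the field weights, record shape, and boost formula here are illustrative assumptions, not any vendor's published algorithm.

```python
# sketch of field-weighted keyword relevancy with an optional usage boost.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "description": 1.0}

def relevancy(record, keyword, views=0, popularity_weight=0.0):
    """score = sum over fields of (field weight x keyword occurrences),
    plus an optional boost proportional to how often the record was viewed."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = record.get(field, "").lower()
        score += weight * text.split().count(keyword.lower())
    return score + popularity_weight * views

rec = {"title": "romeo and juliet",
       "subject": "drama",
       "description": "the tragedy of romeo"}

print(relevancy(rec, "romeo"))                                # fields only
print(relevancy(rec, "romeo", views=8, popularity_weight=0.5))  # with usage boost
```

setting `popularity_weight` to zero reproduces the purely field-weighted score; a nonzero weight lets heavily viewed records rise, in the spirit of primo's popularity ranking described below.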
primo by ex libris is the only one among the discovery tools under investigation that can sort the final results by popularity. "primo's popularity ranking is calculated by use. this means that the more an item record has been clicked and viewed, the more popular it is."20 even though these are not real circulation statistics, this is considered a revolutionary step and a departure from traditional relevancy; three years ago none of the discovery tools provided this option.21 to make relevancy ranking even more sophisticated, scholarrank, another service by ex libris, can work with primo to sort search results based not only on the query match but also on an item's value score (its usage and number of citations) and a user's characteristics and information needs. this shows the possibility of more advanced relevancy ranking in discovery tools, and other vendors will most likely follow, incorporating more sophistication into their relevancy algorithms.

spell checker/"did you mean . . . ?"—the most commonly observed way of correcting a misspelling in a query is "did you mean . . . ?," but there are other variations providing the same or similar services, some of them very user-friendly. figure 2 lists the different responses when a user enters misspelled words; "xxx" represents the keyword being searched.

discovery tool | response to misspelled search words | notes
aquabrowser | did you mean to search: xxx, xxx, xxx? | the suggested words are hyperlinks that execute new searches.
axiell arena | your original search for xxx has returned no hits. the fuzzy search returned n hits. | automatically displays a list of hits based on fuzzy logic; "n" is a number.
bibliocommons | did you mean xxx (n results)? | displays the suggested word along with the number of results as a link.
blacklight | no records found. |
no spell checker, but possible to add by the local technical team.
ebsco discovery service | results may also be available for xxx. | the suggested word is a link that executes a new search.
encore | did you mean xxx? | the suggested word is a link that executes a new search.
endeca | did you mean xxx? | the suggested word is a link that executes a new search.
enterprise | did you mean xxx? | the suggested word is a link that executes a new search.
extensible catalog | sorry, no results found for: xxx. | no spell checker, but possible to add by the local technical team.
primo | did you mean xxx? | the suggested word is a link that executes a new search.
summon | did you mean xxx? | the suggested word is a link that executes a new search.
visualizer | did you mean xxx? | the suggested word is a link that executes a new search.
vufind | 1. no results found in this category. search alternative words: xxx, xxx, xxx. 2. perhaps you should try some spelling variation: xxx, xxx, xxx. 3. your search xxx did not match any resources. what should i do now? | 1. alternative words are links that execute new searches. 2. suggested words are links that execute new searches. 3. a list of suggestions of what to do next, including checking a web dictionary.
worldcat local | did you mean xxx? | the suggested word is a link that executes a new search.
figure 2. spell checker.

most of the discovery tools on the list provide this feature except blacklight and extensible catalog. open-source solutions sometimes provide a framework to which features are added, which leaves many possibilities for local developers: for instance, a dictionary or spell checker may be easily installed even if a discovery tool does not come with one out of the box. this feature may also be configurable.
recommendation—amazon has one of those search engines with a recommendation system such as "customers who bought item a also bought item b." ecommerce recommendation algorithms analyze the activities of shoppers on the web and build a database of buyer profiles, and recommendations are made based on shopper behavior. applied to library content, this could become "readers who were interested in item a were also interested in item b." however, most discovery tools do not have such a recommendation system and have adopted different approaches instead, most making recommendations from bibliographic data in marc records, such as subject headings, for similar items. primo is one of the few discovery tools with a recommendation system similar to those used by amazon and other internet commercial sites. its bx article recommender service is based on usage patterns collected from its link resolver, sfx. developed by ex libris, bx is an independent service that integrates well with primo but can also serve as an add-on for other discovery tools. bx is an excellent example of how discovery tools can suggest new leads and directions for scholars in their research. the authors counted all the discovery tools that provide some kind of recommendation, regardless of whether the technological approach uses marc data or algorithms. ten out of fourteen discovery tools provide this feature in various forms (see figure 3): axiell arena, bibliocommons, ebsco discovery service, encore, endeca, extensible catalog, primo, summon, worldcat local, and vufind. figure 3 shows some of the recommendation language found in those discovery tools. the authors did not find any recommendations in the libraries that use aquabrowser, enterprise, visualizer, or blacklight.
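the usage-based approach described above ("readers interested in item a were also interested in item b") can be sketched by counting how often two items appear in the same user session and recommending the most frequent co-occurring items. the session data and function names are invented for the example; real services like bx work from much richer link-resolver logs.

```python
# co-occurrence recommendation sketch: items viewed in the same session
# are counted as related, and the strongest co-occurrences are suggested.
from collections import Counter
from itertools import permutations

def build_cooccurrence(sessions):
    """count, for each item, how often every other item shared a session."""
    co = {}
    for items in sessions:
        for a, b in permutations(set(items), 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def recommend(co, item, n=2):
    """return up to n items most often co-viewed with the given item."""
    return [other for other, _ in co.get(item, Counter()).most_common(n)]

sessions = [
    ["hamlet", "macbeth"],
    ["hamlet", "macbeth", "othello"],
    ["hamlet", "king lear"],
]
print(recommend(build_cooccurrence(sessions), "hamlet"))
```

marc-based "more like this" features differ from this: they match shared subject headings in bibliographic records rather than mining usage, which is why the article distinguishes the two approaches.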
discovery tool | language used for recommending or linking to related items
axiell arena | "see book recommendations on this topic"; "who else writes like this?"
bibliocommons | "similar titles & subject headings & lists that include this title"
ebsco discovery service | "find similar results"
encore | "other searches you may try"; "additional suggestions"
endeca | "recommended titles for . . . view all recommended titles that match your search"; "more like this"
extensible catalog | "more like this"; "searches related to . . ."
primo | "suggested new searches by this author"; "suggested new searches by this subject"; "users interested in this article also expressed an interest in the following:"
summon | "searches related to . . ."
worldcat local | "more like this"; "similar items"; "related subjects"; "user lists with this item"
vufind | "more like this"; "similar items"; "suggested topics"; "related subjects"
figure 3. language used for recommendation.

some discovery tool recommendations are designed in a more user-friendly manner than others, and most recommendations exist exclusively for items. ideally, a discovery tool should provide an article recommendation system like ex libris's usage-based bx service, which shows users the most frequently used and most popular articles. at the time of this evaluation, no discovery tool had incorporated an article recommendation system except primo. research is needed to evaluate how patrons utilize recommendation services and whether they find recommendations in discovery tools beneficial.

user contribution—traditionally, bibliographic data has been safely guarded by cataloging librarians for quality control, and it was unthinkable that users would be allowed to add data to library records. the internet has brought new perspectives on this issue. half of the discovery tools (seven) under evaluation provide this feature to varying degrees (see figure 4).
designed primarily for public libraries, bibliocommons seems the most open to user-supplied data among all the discovery tools. many other discovery tools (seven) allow users to contribute tags and reviews, and all of them allow librarians to censor user-supplied data before releasing it for public display. figure 4 summarizes the types of data these discovery tools allow users to enter.

ranking | discovery tool | user contribution
1 | bibliocommons | tags, similar title, private note, notices, age suitability, summary, quotes, video, comments, and ratings (10)
2 | aquabrowser | tags, reviews, and ratings/rankings (3)
2 | axiell arena | tags, reviews, and title discussions (3)
2 | vufind | tags, reviews, comments (3)
3 | primo | tags and reviews (2)
3 | worldcat local | tags and reviews (2)
4 | encore | tags (1)
5 | blacklight, endeca, enterprise, extensible catalog, summon, visualizer | (0)
figure 4. discovery tools based on user contribution.

past research indicates that folksonomies, or tags, are highly useful.22 they complement library-controlled vocabularies, such as the library of congress subject headings, and increase access to library collections. a few discovery tools allow user-entered tags to form "word clouds," in which the relative importance of a tag is emphasized by font color and size. a tag list is another way to organize and display tags. in both cases, tags are hyperlinked to a relevant list of items; some tags serve as keywords to start new searches, while others narrow search results. only four discovery tools, aquabrowser, encore, primo, and worldcat local, provide both tag clouds and lists; bibliocommons provides only tag lists for the same purpose, and the rest of the discovery tools have neither. one setback of user-supplied tags for subject access is their incomplete nature.
tags may lead users to partial retrieval of information, as users add tags only to items that they have used; the coverage is not systematic or inclusive of all collections. therefore, data supplied by users in discovery tools remains controversial. it is possible to seed systems with folksonomies using services like librarything for libraries, which could reduce the impact of this issue.

rss feeds/email alerts—this feature can automatically send a list of new library resources to users based on their search criteria. it can be useful for experienced researchers or frequent library users, and some discovery tools may use email alerts as well. eight out of fourteen discovery tools in this evaluation provide rss feeds: aquabrowser, axiell arena, ebsco discovery service, endeca, enterprise, primo, summon, and vufind. an rss feed can be added as a plug-in in some discovery tools if it does not come as part of the base system.

integration with social networking sites—as most college students participate in social networking sites, this feature provides an easy way for them to share resources. users can click an icon in the discovery tool to place a link to a resource and share it with friends on facebook, twitter, delicious, and many other social networking sites. nine out of the fourteen discovery tools provide this feature, and some offer integration with many more social networking sites than others: aquabrowser, axiell arena, bibliocommons, ebsco discovery service, encore, endeca, primo, worldcat local, and extensible catalog. so far, the interaction between discovery tools and social networking sites is limited to sharing resources; social networking sites should be carefully evaluated for the possibility of integrating some of their popular features into discovery tools.
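the rss-feed feature described above boils down to serializing the new items matching a saved search as an rss 2.0 document. the sketch below uses only the python standard library; the item fields, search label, and urls are made up for the example and are not any discovery tool's real output.

```python
# minimal rss 2.0 sketch for a "new items matching your search" feed.
import xml.etree.ElementTree as ET

def new_items_feed(search, items):
    """build an rss 2.0 document listing new items for a saved search."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"new items for: {search}"
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")

feed = new_items_feed("shakespeare", [
    {"title": "romeo and juliet (new edition)", "link": "https://example.org/rec/1"},
])
print(feed)
```

an email-alert variant would render the same item list into a message body instead of xml; the underlying saved-search query is identical.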
persistent link—this is also called a permanent link or permurl. not all the links displayed in a browser's location box are persistent links, so some discovery tools specifically provide a link in the record for users to copy and keep. five out of fourteen discovery tools explicitly list this link in records: aquabrowser, axiell arena, blacklight, ebsco discovery service, and worldcat local. the authors marked a system as "no" when a permanent link is not prominently displayed in the discovery tool; in other words, only those discovery tools that explicitly provide a persistent link are counted as "yes." however, the url in a browser's location box during the display of a record may serve as a persistent link in some cases. for instance, vufind does not provide a permanent url in the record but indicates on the project site that the url in the location box is a persistent link.

auto-completion/stemming—when a user types keywords in the search box, the discovery tool supplies a list of words or phrases that he or she can choose from readily. this is a highly useful feature that google excels at. stemming not only automatically completes the spelling of a keyword but also supplies a list of phrases that point to existing items. the authors found this feature in six out of fourteen discovery tools: axiell arena, endeca, enterprise, extensible catalog, summon, and worldcat local.

mobile interface—the terms "mobile compatible" and "mobile interface" are two different concepts. a mobile interface is a simplified version of the normal browser version of a discovery tool interface, optimized for use on mobile phones, and the authors counted only those discovery tools that have a separate mobile interface. a discovery tool may be mobile friendly or compatible without having a separate mobile interface.
many discovery tools, such as ebsco, can detect a request from a mobile phone and automatically direct it to the mobile interface. eleven out of fourteen claim to provide a separate mobile interface; blacklight, enterprise, and extensible catalog do not seem to have one, even though they may be mobile friendly.

frbr—frbr groupings denote the relationships between works, expressions, manifestations, and items. for instance, a search will retrieve not only a title but also different editions and formats of the work. only three discovery tools can display frbr relationships: extensible catalog (open source), primo by ex libris, and worldcat local by oclc. so far, most discovery tools are not capable of displaying the manifestations and expressions of a work in a meaningful way. from the user's point of view, this feature is highly desirable. figure 5 is a screenshot from primo demonstrating the display of a large number of different adaptations of the work "romeo and juliet," and figure 6 displays the same intellectual work in different manifestations such as dvd, vhs, books, and more.

figure 5. display of frbr relationships in primo.
figure 6. different versions of the same work in primo.

summary

the following are the summary tables of our comparison and evaluation. proprietary and open-source programs are listed separately in these tables, and the total number of features the authors found in a particular discovery tool is displayed at the end of each column. proprietary discovery tools seem to have more of the advanced characteristics of a modern discovery tool than their open-source counterparts. the open-source program blacklight displays fewer advanced features but seems flexible for users to add features. see figures 7, 8, and 9.

figure 7. proprietary discovery tools.
criterion | aquabrowser | axiell arena | bibliocommons | ebsco/eds | encore | endeca
1. single point of search | no | no | no | yes | yes | no
2. state-of-the-art interface | yes | yes | yes | yes | yes | yes
3. enriched content | yes | yes | yes | yes | yes | yes
4. faceted navigation | yes | yes | yes | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes | yes | yes | yes
6. simple keyword search box on every page | yes | yes | yes | yes | yes | yes
7. relevancy | no | no | no | no | no | no
8. spell checker/"did you mean . . . ?" | yes | yes | yes | yes | yes | yes
9. recommendation | no | yes | yes | yes | yes | yes
10. user contribution | yes | yes | yes | no | yes | no
11. rss | yes | yes | no | yes | no | yes
12. integration with social network sites | yes | yes | yes | yes | yes | yes
13. persistent links | yes | yes | no | yes | no | no
14. stemming/auto-complete | no | yes | no | no | no | yes
15. mobile interface | yes | yes | yes | yes | yes | yes
16. frbr | no | no | no | no | no | no
total | 11/16 | 13/16 | 10/16 | 12/16 | 11/16 | 11/16

criterion | enterprise | primo | summon | visualizer | worldcat local
1. single point of search | no | yes | yes | no | yes
2. state-of-the-art interface | yes | yes | yes | yes | yes
3. enriched content | yes | yes | yes | yes | yes
4. faceted navigation | yes | yes | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes | yes | yes
6. simple keyword search box on every page | no | yes | yes | yes | yes
7. relevancy | no | yes | no | no | no
8. spell checker/"did you mean . . . ?" | yes | yes | yes | yes | yes
9. recommendation | no | yes | yes | no | yes
10. user contribution | no | yes | no | no | yes
11. rss | yes | yes | yes | no | no
12. integration with social network sites | no | yes | no | no | yes
13. persistent links | no | no | no | no | yes
14. stemming/auto-complete | yes | no | yes | no | yes
15. mobile interface | no | yes | yes | yes | yes
16. frbr | no | yes | no | no | yes
total | 7/16 | 14/16 | 11/16 | 7/16 | 14/16
figure 8. proprietary discovery tools (continued).

criterion | blacklight | extensible catalog | vufind
1. single point of search | no | no | no
2. state-of-the-art interface | yes | yes | yes
3. enriched content | yes | yes | yes
4. faceted navigation | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes
6. simple keyword search box on every page | yes | yes | yes
7. relevancy | no | no | no
8. spell checker/"did you mean . . . ?" | no | no | yes
9. recommendation | no | yes | yes
10. user contribution | no | no | yes
11. rss | no | no | yes
12. integration with social network sites | no | yes | no
13. persistent links | yes | no | no
14. stemming/auto-complete | no | yes | no
15. mobile interface | no | no | yes
16. frbr | no | yes | no
total | 6/16 | 9/16 | 10/16
figure 9. free and open-source discovery tools.

as one-stop searching is the core of a discovery tool, this consideration placed five discovery tools above the rest: encore, ebsco discovery service, primo, summon, and worldcat local (see figure 10). these five are web-scale discovery services. all of them use their native unified index except encore, which has incorporated the ebsco unified index into its search. despite the great progress made in the past three years in one-stop searching, none of the discovery tools can truly search across all library resources—all of them have some limitations in their coverage of content. each unified index may cover different databases as well as overlap the others in many areas. one possible solution may lie in a hybrid approach that combines a unified index with federated search (also called real-time discovery); those old and new technologies may work well when complementing each other. it remains to be seen whether libraries will ever have one-stop searching in its true sense.

discovery tool | one-stop searching
encore | yes
ebsco discovery service | yes
primo | yes
summon | yes
worldcat local | yes
figure 10. the discovery tools capable of one-stop searching.

it is also worth mentioning that one-stop searching is a vital and central piece of discovery tools; those without a native unified index or connectors to databases for real-time searching are at a disadvantage.
therefore, discovery tools that do not provide web-scale searching are investigating various possibilities for incorporating one-stop searching. some are drawing on the unified indexes of discovery tools that have them through connectors to the application programming interfaces (apis) of those products. for instance, vufind includes connectors to the apis of a few other systems that have a unified index or vast resources, such as summon and worldcat, and blacklight may provide one-stop searching through the primo api. such a practice may present other problems, such as calculating relevancy ranking across resources that do not live in the same centralized index and thus not achieving fully balanced relevancy ranking. nevertheless, discovery tool developers are working hard to achieve one-stop searching, and as a unified index can be shared across discovery tools, more and more discovery services may offer one-stop searching in the next few years.

based on the count of the sixteen criteria in the checklist, we ranked primo and worldcat local as the top two discovery tools. based on our criteria, primo has two unique features that make it stand out: relevancy enhanced by usage statistics and value score, and the frbr relationship display. worldcat local and extensible catalog are the other two discovery tools that can display frbr relationships (see figure 11).

rank | discovery tools | number of advanced features
1 | primo and worldcat local | 14/16
2 | axiell arena | 13/16
3 | ebsco discovery service | 12/16
4 | aquabrowser, encore, and endeca | 11/16
5 | bibliocommons, summon, and vufind | 10/16
6 | extensible catalog | 9/16
7 | enterprise and visualizer | 7/16
8 | blacklight | 6/16
figure 11. ranked discovery tools.

limitations

as discovery tools are going through new releases and improvements, what is true today may be false tomorrow.
discovery tools constantly improve and evolve, and many features are not included in this evaluation, such as integration with google maps for the location of an item and user-driven acquisitions; innovations are added to discovery tools constantly. this study covers only the most common features that the library community has agreed a discovery tool should have. some open-source discovery tools may provide a skeleton of an application that leaves the code open for users to develop new features, so different implementations of an open-source discovery tool may encompass totally different features that are not part of the core application. for instance, the university of virginia developed virgo based on blacklight, adding many advanced features. thus it is quite a challenge to distinguish what comes with the software from what are local developments. this study focused on the user interface of discovery tools; content coverage, application administration, and searching capability are not included, although all three are important factors when choosing a discovery tool.

conclusion

search technology has evolved far beyond federated searching. the concept of a "next-generation catalog" has merged with this idea and spawned a generation of discovery tools bringing almost google-like power to library searching. the problems facing libraries now are the intelligent selection of a tool that fits their contexts and structuring a process to adopt and refine that tool to meet the objectives of the library. our findings indicate that primo and worldcat local have better user interfaces, displaying more advanced features of a next-generation catalog than their peers. for rul, ebsco discovery service (eds) provides something approaching the ease of google searching from either a single search box or a very powerful advanced search.
being aware of the limitations noted above, rider's libraries elected to continue displaying traditional search options in addition to what we've branded "library one search." another issue we discovered in this process is that when negotiating for a vendor-hosted test, libraries must be sure that the test period begins when the configuration is complete rather than only when the data load begins. all phases of the project took far more time than anticipated; the client institution's implementation coordinator or team needs to review progress on a daily basis and communicate often with the vendor-based implementation team. with the evaluative framework this study provides, libraries moving toward discovery tools should consider the changing capabilities of the available discovery tools to make informed choices.

references

1. jason vaughan, "investigations into library web-scale discovery services," information technology & libraries 31, no. 1 (2012): 32–33, http://dx.doi.org/10.6017/ital.v31i1.1916.
2. sharon q. yang and melissa a. hofmann, "next generation or current generation? a study of the opacs of 260 academic libraries in the usa and canada," library hi tech 29, no. 2 (2011): 266–300.
3. melissa a. hofmann and sharon q. yang, "'discovering' what's changed: a revisit of the opacs of 260 academic libraries," library hi tech 30, no. 2 (2012): 253–74.
4. alexander pope, "alexander pope quotes," http://www.brainyquote.com/quotes/authors/a/alexander_pope.html.
5. f. william chickering, "linking information technologies: benefits and challenges," proceedings of the 4th international conference on new information technologies, budapest, hungary, december 1991, http://web.simmons.edu/~chen/nit/nit%2791/019-chi.htm.
6. kristin antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology & libraries 25, no. 3 (2006): 128–39, http://dx.doi.org/10.6017/ital.v25i3.3342.
7. marshall breeding, "plotting a new course for metasearch," computers in libraries 25, no. 2 (2005): 27–29.
8. judith carter, "discovery: what do you mean by that?" information technology & libraries 28, no. 4 (2009): 161–63, http://dx.doi.org/10.6017/ital.v28i4.3326.
9. priscilla caplan, "on discovery tools, opacs and the motion of library language," library hi tech 30, no. 1 (2012): 108–15.
10. carol pitts diedrichs, "discovery and delivery: making it work for users," serials librarian 56, no. 1–4 (2009): 79, http://dx.doi.org/10.1080/03615260802679127.
11. alex a. dolski, "information discovery insights gained from multipac, a prototype library discovery system," information technology & libraries 28, no. 4 (2009): 173, http://dx.doi.org/10.6017/ital.v28i4.3328.
12. marshall breeding, "the state of the art in library discovery," computers in libraries 30, no. 1 (2010): 31–34.
13. jennifer l. fabbi, "focus as impetus for organizational learning," information technology & libraries 28, no. 4 (2009): 164–71, http://dx.doi.org/10.6017/ital.v28i4.3327.
14. douglas way, "the impact of web-scale discovery on the use of a library collection," serials review 36, no. 4 (2010): 214–20, http://dx.doi.org/10.1016/j.serrev.2010.07.002.
15. marshall breeding, "library technology guides: discovery products," http://www.librarytechnology.org/discovery.pl.
16. ibid.
17. sharon q. yang and kurt wagner, "evaluating and comparing discovery tools: how close are we towards next generation catalog?" library hi tech 28, no. 4 (2010): 690–709.
18. yang and hofmann, "next generation or current generation?" 266–300.
19. melissa a. hofmann and sharon q. yang, "how next-gen r u? a review of academic opacs in the united states and canada," computers in libraries 31, no. 6 (2010): 26–29.
brown library of virginia western community college, “primo-frequently asked questions,” http://www.virginiawestern.edu/library/primo -faq.php#popularity_ranking. 21. yang and wagner, “evaluating and comparing discovery tools,” 690–709. 22. yanyi lee and sharon q. yang, “folksonomies as subject access—a survey of tagging in library online catalogs and discovery layers,” paper presented at ifla post-conference “beyond libraries-subject metadata in the digital environment and semantic web ,” tallinn, estoniai, 18 august 2012, http://www.nlib.ee/html/yritus/ifla_jarel/papers/4-1_yan.docx http://athena.rider.edu:2054/eds/viewarticle?data=dgjymppp44rp2%2fdv0%2bnjisfk5ie42eik6tmvsk6k63nn5kx94um%2bsa2otkewpq9lnqe4sk%2bws0yexss%2b8ujfhvhx4yzn5eyb4rorsbguteq1r7u%2b6tfsf7vb7d7i2lt94unjho6c8nnls79mpnfsvdgmrlg2rbdjsaeusk6mtlcwnosh8opfjlvc84tq6uoq8gaa&hid=20 http://www.librarytechnology.org/discovery.pl http://www.virginiawestern.edu/library/primo-faq.php#popularity_ranking http://www.nlib.ee/html/yritus/ifla_jarel/papers/4-1_yan.docx letter from the editor kenneth j. varnum information technology and libraries | june 2018 1 in this june 2018 issue, we continue our celebration of ital’s 50th year with a summary by editorial board member sandra shores of the articles published in the 1970s, the journal’s first full decade of publication. the 1970s are particularly pivotal in library technology, as it marks the introduction of the personal computer, as a hobbyist’s tool, to society. the web is still more than a decade away, but the seeds are being planted. with this issue, we introduce a new look for the journal — thanks to the work of lita’s web coordinating committee, and in particular kelly sattler (also a member of the editorial board), jingjing wu, and guy cicinelli. the new design is much easier on the eyes and more legible, and sports a new graphic identity for ital. board transitions june marks the changing of the editorial board. 
a significant number of board members’ terms expire this june 30, and i’d like to take this opportunity to thank those departing members for their years of service to information technology and libraries, and the support they have offered me this year as i began as editor. each has ably and generously contributed to the journal’s growth over the last years, and i thank them for their service to the journal and to ital:

• mark cyzyk (johns hopkins university)
• mark dehmlow (notre dame university)
• sharon farnel (university of alberta)
• kelly sattler (michigan state university)
• sandra shores (university of alberta)

these are big shoes to fill, but i am excited about the new members who have been appointed for two-year terms beginning july 1, 2018. in march, we extended a call for volunteers for two-year terms on the editorial board. we received almost 50 applications, and ultimately added seven new members:

• steven bowers (wayne state university)
• kevin ford (art institute of chicago)
• cinthya ippoliti (oklahoma state university)
• ida joiner (independent consultant)
• breanne kirsch (university of south carolina upstate)
• michael sauers (do space, omaha, nebraska)
• laurie willis (san jose public library)

readership survey summary

over the past three months, we ran a survey of the ital readership to try to understand a bit more detail about who you are, collectively. the survey received 81 complete responses out of about 11,000 views of pages with the survey link on them. here are some brief summary results:

• nearly half (46%) of respondents have attended at least one lita event (in-person or online).
• three quarters (75%) of respondents are from academic libraries. public, special, and lis programs make up an additional 20%.
• the majority (56%) are librarians, with the remaining spread across a number of other roles.
• almost two thirds (63%) of respondents have never been lita members, a quarter (25%) are current members, and the remainder are former members.
• about four fifths (81%) of responses came from the current issue (either the table of contents or individual articles).

an invitation

what can you share with your library colleagues in relation to technology? if you have interesting research about technology in a library setting, or are looking for a venue to share your case study, get in touch with me at varnum@umich.edu.

sincerely,
kenneth j. varnum, editor
varnum@umich.edu
june 2018

multimedia will have a profound effect on libraries during the next decade. this rapidly developing technology permits the user to combine digital still images, video, animation, graphics, and audio. it can be delivered in a variety of finished formats, including streaming video on the web, video on dvd/vcd, embedded digital objects within a web page or presentation software such as powerpoint, utilized within graphic designs, or printed as hardcopy. this article examines the elements of multimedia creation, as well as requirements and recommendations for implementing a multimedia facility in the library. the term multimedia, which some may remember being used in the early 1970s as the name for slide shows set to music, now is used to describe “a number of diverse technologies that allow visual and audio media to be combined in new ways for the purpose of communicating.”1 almost all personal computers sold today are capable of viewing multimedia; many can, with minor modifications, also create multimedia. one of the most important features of multimedia is its flexibility. multimedia creation has several distinct elements—inputs, processes performed on those inputs, and outputs (see figure 1). each element can be described as follows.
• inputs—new video can be recorded, or existing video, stored on a hard disk, cd/dvd, or tape, can be imported. the same is true of audio, with the added flexibility of creating soundtracks or sound effects later, during the editing process. digital still images can be used, either shot on a camera or created by scanning an existing picture. digital artwork or animated sequences created in other software also can be brought in.
• processing—regardless of the source, these digital inputs are loaded into the editing software. at this stage, the user will select and arrange the images and sounds, and the software may permit special effects to be created. in addition, the editing software may compress the file so that it is easier to use than the large file sizes produced in raw video and audio recording.
• outputs—at this point, the user has more choices to make. the new multimedia file can be sent to a program that will encode it for streaming video in any one of a variety of popular formats, such as windows media, realmedia, or clipstream. then it can be mounted on a web site (either a regular page or within courseware such as webct or blackboard), or the file could be burned onto a cd or dvd, or it could be used within presentation software such as microsoft powerpoint. or the output file from the editing process could be encoded and embedded so that it runs as an avatar within a web page with a product such as rovion bluestream. the possibilities are nearly endless.

all of this is made possible by advances in technology on a variety of fronts. one of the happy anomalies in technology is that greater performance is frequently accompanied by lower costs. this is certainly the case with much of the activity surrounding multimedia.
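the three-stage creation process just described can be modeled as a simple pipeline. the sketch below is purely illustrative: the function names, the sample clip sizes, and the default 50:1 compression ratio (corresponding to the 1 to 2 percent figure this article cites) are assumptions, not features of any actual editing package.

```python
# illustrative sketch of the multimedia creation pipeline:
# inputs -> processing (edit + compress) -> outputs.
# names, sizes, and the 50:1 ratio are hypothetical placeholders.

def gather_inputs():
    """collect raw media: video, audio, stills (sizes in MB)."""
    return [{"kind": "video", "size_mb": 1000},
            {"kind": "audio", "size_mb": 50},
            {"kind": "still", "size_mb": 5}]

def process(clips, compression_ratio=50):
    """edit and arrange the clips, then compress the combined project.
    a 50:1 ratio reflects the 1-2 percent figure cited in the article."""
    raw_total = sum(c["size_mb"] for c in clips)
    return raw_total / compression_ratio

def output(size_mb, target="web"):
    """encode the edited project for a delivery target
    (web stream, cd/dvd, or presentation software)."""
    return f"{target}: {size_mb:.1f} MB encoded file"

final = output(process(gather_inputs()))
```

on the sample inputs, roughly 1 gb of raw media compresses to about 21 mb, the kind of reduction that makes web delivery of the finished piece practical.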
the following factors have fostered advances in multimedia:

• increase in processing power and decrease in cost of computer hardware;
• quality and affordability of video equipment;
• compression of multimedia files;
• consumer broadband internet access; and
• current multimedia editing software.

the first two technology factors concern the equipment involved in multimedia production. leading off is the familiar, ever-increasing speed of processors and improved memory and hard-drive space, all delivered for less money. this trend is something that many people take for granted, but a reality check is sometimes in order. the processor in the typical desktop machine on advertised special today is approximately forty-four times as fast as the first pentium processor sold ten years ago, and is equipped with sixteen times as much ram and 117 times as much hard-drive space—at 20 percent of the cost of the old machine (not even adjusted for inflation!). the second factor is the incredible quality available in consumer-market video equipment at reasonable costs. while the images produced with consumer-grade video would not play well at the local megaplex movie theater, they look very good on the small screens found on computers, televisions, and classroom projectors. the third factor is that tremendous compression of multimedia files can be achieved during the editing process. an incoming raw-video file (in the standard .avi format) can be compressed with editing, encoding, and dedicated third-party compression software to an incredible 1 to 2 percent of its original size, and it will still retain very good quality as a digital object on the web and in other desktop viewing applications. the fourth factor is extremely critical for the success of multimedia web applications. home access is shifting away from dial-up access to broadband, with its greatly increased transfer rates. half of all united states homes with internet access are already using broadband, and the forecast is for steady increase in these numbers.2

distinctive expertise: multimedia, the library, and the term paper of the future
gregory a. mitchell
information technology and libraries | march 2005
gregory a. mitchell (mitchellg@utpa.edu) is assistant director, resource management at the university of texas—pan american library, edinburg, texas.

although not all broadband is created equal, it is all significantly faster than dial-up access. the final technology factor concerns the software that is currently available to the multimedia web developer. a developer can achieve some quite professional results with even the most basic products, and then can grow into more complex software that supports increasing levels of expertise. once again, this software is being sold in the price range that typical consumers can afford.

small really is beautiful

creating a multimedia lab in the library need not be a large, complex undertaking. in fact, it can be very low cost and as simple as a single workstation. so it is scalable, allowing the library to start small and build in complexity and cost as time, money, and human resources will permit. at the bare-bones minimum, a multimedia lab would consist of a workstation with the software necessary for acquiring, editing, and outputting the files. for practical purposes, though, the workstation should be equipped with a network connection, a cd/dvd burner, a scanner, and a webcam with microphone. another very useful option is an analog-digital bridge device, which enables the capture of analog input (such as vhs tape) into digital files for the editor. to achieve better-quality video when shooting original content, a digital-video camera, tripod, wireless microphone, and portable light kit would be recommended.
since more time typically is spent at the editing station than with the camera, the lab can be expanded with additional workstations before investing in another camera. experience at the author’s institution has shown that it is possible to operate a lab with ten workstations and only three video cameras and three still cameras. finally, output from the editing process will likely be printed, so a photo-quality printer is another convenient option. this illustrates that the entry into multimedia work need not be a large expense, especially if an existing workstation and any other equipment are already available. if a fairly recent workstation is available to dedicate to the project, the library’s total startup cost could range from $200 to $1,000. not many new library services can be launched for as little as that. rather than dwell on equipment specifications, as that is not the intent of this discussion, the reader may consult the excellent tutorials available from desktop video and pc magazine’s online product guide.3 one further worthwhile option is the creation of a studio. although some video will need to be shot on location, many times it is possible to set up and shoot in just one place. a studio is the best place in which to work because it is a controlled environment. it does not need to be large or complicated, and a quiet office or study room can be set up with little effort and expense. the studio gives the users control over the sound and the lighting, and involves minimal setup time for projects.

the research paper of the future

multimedia has begun to attract attention in the library community. joe janes, chair of library and information science at the information school at the university of washington and the person responsible for developing the internet public library, recently stated he foresees a growing role for multimedia in the library. it will replace much of the traditional, text-based communication that people are accustomed to.
for example, multimedia projects can become the research paper of the future for students.4 it is the media in which many library customers will be working. experience from the author’s institution with creating a multimedia lab would seem to confirm his observation. during the first year and a half of operation, use of the lab has steadily increased (see figure 2).

figure 1. multimedia creation process

collaboration

the multimedia lab opens the doors to collaborative opportunities with faculty and students from a variety of disciplines across campus. this is because multimedia, like geographic information systems (gis) or other electronic information and communication technologies, is a tool and is not discipline-specific. as important as it is to make the connection with faculty, this media is something with which the students will frequently lead the way. they are, after all, the mtv generation, and multimedia has an incredible appeal to their visual orientation. faculty themselves have used it to augment their web-based courses as well as traditional classroom instruction. the author’s library has even initiated a multimedia résumé service for graduating students. the students can record a video introduction of themselves, encode this as a rovion bluestream avatar, and post it with their résumés on the web. this creates a much stronger impression than a standard résumé, hopefully giving the students an edge in promoting themselves on the job market. even more impressive is the variety of projects that are created in the lab by the students. one might expect to see interest from students in art and communications classes, but students come from many other disciplines as well.
for example, business students have effectively used multimedia in their graduate-school business-plan presentations, while biology students like to use the graphics capabilities to study close-ups of slides. education students have employed it to produce multimedia instructional aids, and a sociology student put together a presentation on underserved, low-income neighborhoods. the library supplies the facility and instruction—only the imagination of students is needed. libraries have always been involved in the students’ research and writing process, by providing content, instruction, and facilities for producing the final research product. the same is true in the multimedia environment, although implementing a multimedia lab calls for some new skills for librarians. these include familiarity with basic principles of videography, learning how to use the cameras and other equipment, and gaining some mastery of the editing and encoding software.

why put it in the library?

in addition to the research-paper analogy, the author believes that librarians can point with pride to the values and value that libraries offer their communities. the library is a central and neutral location—not in one department’s or college’s turf. libraries are conveniently open for many hours per week. many of the information resources that students might use to prepare the presentation are in the library. and librarians have a professional ethic that drives them to provide instruction and assistance for the services the library offers. since multimedia production does have a learning curve and most new users need help in mastering the technology, it does not fit very well with the typical 24/7 drop-in computer lab that the campus information technology (it) department often operates. this is a good opportunity for librarians to recognize some of their strengths and capitalize on them. in addition, this can be a breath of fresh air for librarians.
here is an opportunity to learn about something new and creative. most people find that they have less room for creativity as time goes by.5 with a multimedia lab in the building, librarians will have the opportunity to create multimedia productions for the library, besides assisting students and faculty with their projects.

figure 2. university of texas—pan american library multimedia lab usage

potential problems

there are some obstacles to overcome, of course. they need not be seen as major, but it is best to be realistic when beginning any new venture. it is almost always a good idea to start small, with a pilot project that will yield valuable lessons before venturing into anything big.

• equipment—define what specifications are needed, see what is already available to use or borrow, then figure out what you will actually need to buy.
• software—check out the variety of software for editing and production; think about how you want to begin using multimedia (primarily on the web, in presentation software such as powerpoint, or as standalone videos on cds and dvds).
• money—if funding permits, a library can invest several thousand dollars in a high-end multimedia computer, associated peripherals such as a color printer and one or more scanners, and a software suite to meet initial anticipated demands for multimedia creation and editing. if funding is scarce, you may want to investigate what existing equipment could be used in support of a pilot project.
• location—this needs some space of its own, accessible to students and monitored by staff. although the editing workstation could be in an area with other computers, a quiet area is needed for shooting video so that there will not be interference from noise and unwanted foot traffic through the shots.
• staffing and training—a multimedia lab is not a good candidate for self-service. librarians and staff who will provide the service need to learn how to use the equipment and software.
make sure that they all have an acceptable level of competence and confidence so that the library can shine with its new service, but expect that everyone will need to continue to learn and grow in their proficiency. if your library plans to produce its own multimedia sessions as well, it would be a good investment to attend a class on television or video production.
• hours—how many hours per week will the new service be available? if it is the entire time the library is open, be prepared to train plenty of staff. repeat users will need less help as their skills increase (by the way, some of these students can be great work-study employees).
• instruction—plan to offer formal orientation and instruction sessions to faculty and their classes. if your lab is small, this is challenging, but it can be accomplished with some creativity. for example, a general instruction session on concepts can be done in a classroom, followed up by a series of small groups working by appointment for the applied-learning component in the multimedia lab. the author and a colleague have even done instruction outside the library using laptops and cameras, creating a de facto mobile studio.
• copyright—if there are already vcrs or photocopiers in the library, you have had to deal with this issue. the pan american library at the university of texas does not allow people to use its lab to copy movies, which is a request that surely will come to you, and we post the usual copyright notices just as we do at our photocopiers. for some excellent information on copyright, visit the american library association web site (www.ala.org).
• evaluation—plan on at least basic evaluation of the service. this can include an assessment of the effectiveness of the instruction sessions, a survey of satisfaction with the lab itself, a questionnaire on the intended uses of the multimedia projects, demographic data on the students, or other student input.
logs of the number of uses and peak-demand periods are extremely useful for planning and for justifying further expenditures and staffing requests.
• flexibility for the future—whatever you do in a pilot phase, always keep an open mind: you are trying to learn from the experience so that you can make good decisions for the direction of this new service. it may not go exactly the way you originally thought, because of serendipity, or changes in technology, or very strong demand from some segments of the campus instead of others, or other environmental factors.

conclusion

benefits to the library from the multimedia lab are many. one of the most important benefits is that it keeps the library involved in the process of academic communication as the medium of that communication changes with technology. by being involved in this evolving medium at its early stages, the library is poised to pounce on opportunities to employ it to the benefit of the library in instruction and content delivery. the library also would position itself on campus as a key player in it and the leading local expert in the growing field of multimedia. since multimedia is a tool that crosses the entire range of subject disciplines on campus, it opens the doors for faculty to collaborate with librarians in exciting new ways. just as many campuses already have learning and collaborative communities that grew around their web courseware or gis endeavors, so too can one develop around multimedia. the appendix offers a list of multimedia web sites to consider. libraries are more than warehouses of books and periodicals. as more and more of our resources have been made available electronically, and indeed more of higher education has moved to electronic delivery, many libraries have been faced with declining gate counts, circulations, and reference statistics. as someone observed, we are victims of our own success. so what is the role of the library?
we are intrinsically involved in the process of instruction, academic research, and communication. as kling observed, “one important strategic idea is that libraries configure their it services and activities to emphasize the distinctive expertise of their librarians rather than simply concentrate on the size and character of the documentary collection.”7 it is imperative therefore that libraries pick out the new trends that will allow them to excel by capitalizing on their traditional strengths.

references

1. scala, inc., multimedia directory, accessed apr. 21, 2004, www.scala.com/multimedia/multimedia-definition.html.
2. nielsen/netratings as of june 2004, accessed aug. 10, 2004, www.websiteoptimization.com/.
3. about.com, dvt101, accessed apr. 15, 2004, http://desktopvideo.about.com/library/weekly/aa040703a.htm; “anatomy of a video editing workstation,” pc magazine, accessed apr. 16, 2004, www.pcmag.com/article2/0,1759,1264650,00.asp.
4. college of dupage, “joe janes and colleagues: preparing for the future of digital reference,” a satellite broadcast from the college of dupage, 16 apr. 2004.
5. sandra kerka, creativity in adulthood (columbus, ohio: eric clearinghouse on adult career and vocational education, eric digest no. 204, ed429186, 1999).
6. american library association, “copyright issues, primer on the digital millennium,” accessed may 10, 2004, www.ala.org/ala/washoff/woissues/copyrightb/dmca/dmcprimer.pdf.
7. rob kling, “the internet and the strategic reconfiguration of libraries,” library administration & management 15, no. 3 (summer 2001): 144–51.

appendix. for further reading: a multimedia web-site tour

the following is a sampling of some of the most popular and interesting multimedia software, with examples of completed productions.
this is not an official endorsement of any one product over another, whether listed here or not. a look at these sites will, however, give the reader an idea about the power and possibilities of multimedia communications.

adobe (www.adobe.com)
the well-known makers of some of the most powerful and popular editing software packages for graphics and video.

camtasia (www.camtasia.com)
easy to use, this is a good example of the type of software that does screen capture and recording, which is handy for producing online tutorials.

clipstream (www.clipstream.com)
an excellent example of the type of newer encoding software that achieves incredible compression of video and delivers it over the web with no viewer or plug-ins required for the user.

finalcut pro (www.apple.com/finalcutpro)
a perennial favorite among the mac crowd, this software is relatively easy to learn and lets the developer achieve dramatic results.

flashants (www.flashants.com)
a handy program that converts flash animation into .avi video format so that you can integrate animated sequences into a video production.

macromedia (www.macromedia.com)
the makers of flash and director, which are some of the most popular graphics, animation, and multimedia editing tools in the business.

pinnacle (www.pinnaclesys.com)
what finalcut pro is to the mac, this package is for the pc environment. easy to use, yet sophisticated in the results achieved.

rovion (www.rovion.com)
rovion bluestream is an encoder that enables the creation of avatar characters to appear live on your web page. a plug-in is required for the user, but this approach definitely gets attention.

serious magic (www.seriousmagic.com)
an award-winning software package that allows you to turn a workstation into a studio, complete with teleprompter capability, sound effects, graphics, and editing.
university of texas—pan american library (www.lib.panam.edu/libinfo/media.asp)
links to multimedia projects at the author’s institution, including productions made by staff and students.

identifying key steps for developing mobile applications and mobile websites for libraries
devendra dilip potnis, reynard regenstreif-harms, and edwin cortez
information technology and libraries | september 2016

abstract

mobile applications and mobile websites (mamw) represent information systems that are increasingly being developed by libraries to better serve their patrons. because of a lack of in-house it skills and the knowledge necessary to develop mamw, a majority of libraries are forced to rely on external it professionals, who may or may not help libraries meet patron needs but instead may deplete libraries’ scarce financial resources. this paper applies a system analysis and design perspective to analyze the experience and advice shared by librarians and it professionals engaged in developing mamw. this paper identifies key steps and precautions to take while developing mamw for libraries. it also advises library and information science graduate programs to equip their students with the specific skills and knowledge needed to develop and implement mamw.

introduction

the unprecedented adoption and ongoing use of a variety of context-specific mobile technologies by diverse patron populations, the ubiquitous nature of mobile content, and the increasing demand for location-aware library services have forced libraries to “go mobile.” mobile applications and mobile websites (mamw), that is, web portals running on mobile devices, represent information systems that are increasingly being developed and used by libraries to better serve their patrons. however, a majority of libraries often lack the in-house human resources necessary to develop mamw.
because of a lack of staff equipped with the requisite it skills and knowledge, libraries are often forced to partner with and rely on external it professionals, potentially losing control over the process of developing mamw.1 partnerships with external it professionals do not always help libraries meet the information needs of their patrons but instead can deplete their scarce financial resources. it then becomes necessary for librarians to understand the process of developing mamw so they can better evaluate mamw for serving library patrons.

devendra dilip potnis (dpotnis@utk.edu) is associate professor, school of information sciences; reynard regenstreif-harms (reynardrh@gmail.com) is project archives technician, great smoky mountains national park, gatlinburg, tennessee; and edwin cortez (ecortez@utk.edu) is professor, school of information sciences, university of tennessee at knoxville. doi:10.6017/ital.v35i2.8652

one possibility is for librarians to re-educate themselves through continuing education or other professional development activities. another solution would be to see library and information science (lis) schools strengthen their curriculum in the area of management, evaluation, and application of mamw and related emerging technologies. issues, challenges, and strategies for providing librarians with these opportunities are abundant and have been debated for more than thirty years, especially since libraries started experiencing the impact of microchip and portable technologies.2 any practical and immediate guidance could help librarians in charge of developing mamw.3 however, a majority of the practical guidance available for developing mamw for libraries is limited to specific settings or patron populations.
also, the practical guidance is not theoretically validated, curtailing its generalizability to diverse library settings. for instance, a number of librarians and it professionals share their experience of mamw development in serving a specific patron population in a specific library setting.4,5 their accounts typically describe successes in developing mamw, the lessons learned during development, or advice for developing mamw. this paper applies a system analysis and design perspective from the information systems discipline to examine the experience and advice shared by librarians and it professionals, identifying the key steps and precautions to be taken when developing mamw for libraries. system analysis and design, a branch of the information systems discipline, is the most widely used theoretical knowledgebase available for developing information systems.6 according to the system analysis and design perspective, development, planning, analysis, design, implementation, and maintenance are the six phases of building any information system.7 the next section synthesizes our method for this secondary research. the following section discusses the key steps we identified for developing, planning, analyzing, designing, implementing, and maintaining mamw for libraries. the concluding section presents the implications of this study for libraries and lis graduate programs.
Information Technology and Libraries | September 2016

Method

We began this study with a practitioner's handbook guiding libraries in using mobile technologies to deliver services to diverse patron populations.8 To search the literature relevant to our research, we devised many key phrases, including but not limited to "mobile technolog*," "mobile applications for libraries," and "mobile websites for libraries." As part of our active information-seeking process, we applied a snowball sampling technique to collect more than seventy-five scholarly research articles, handbooks, ALA Library Technology Reports, and books hosted on the EBSCO and Information Science Source databases. Our passive information seeking was aided by article suggestions from Emerald Insight and Elsevier ScienceDirect, two of the most widely used journal hosting sites, in response to the journal articles we accessed there. We applied the following four criteria to establish the relevancy of publications to our research: accuracy of facts; period of publication (i.e., from 2000 to 2014); credibility of authors; and content focused on problems, solutions, advice, and tips for developing MAMW. Several research articles published by Information Technology and Libraries and Library Hi Tech, two top-tier journals covering the development of MAMW for libraries, built the foundation of this secondary research. We analyzed the collected literature using the qualitative data presentation and analysis method proposed by Miles and Huberman.9 We developed Microsoft Excel summary sheets to code the experience and advice shared by librarians and IT professionals. The coded data was read repeatedly to identify and name patterns and themes. Each relevant publication was analyzed individually and then compared across subjects to identify patterns and common categories. The inter-coder reliability between the two authors who analyzed the data was 85 percent.
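The 85 percent inter-coder reliability reads as simple percent agreement between the two coders. The paper does not state which agreement measure it used, so the following is only a minimal sketch of that calculation; the function name and the sample codes are hypothetical.

```javascript
// Illustrative sketch (not from the paper): inter-coder reliability as
// simple percent agreement between two coders labeling the same excerpts.
function percentAgreement(coderA, coderB) {
  if (coderA.length !== coderB.length) {
    throw new Error("Both coders must label the same items");
  }
  // Count positions where both coders assigned the same code.
  const matches = coderA.filter((label, i) => label === coderB[i]).length;
  return (matches / coderA.length) * 100;
}

// Hypothetical codes assigned by two coders to ten excerpts:
const a = ["plan", "plan", "design", "test", "plan", "design", "test", "plan", "design", "test"];
const b = ["plan", "design", "design", "test", "plan", "design", "test", "plan", "design", "plan"];
console.log(percentAgreement(a, b)); // 80
```

More robust measures (e.g., Cohen's kappa) correct for chance agreement, but percent agreement is the simplest reading of the figure reported above.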
Data analysis helped us identify the key steps needed for planning, analyzing, designing, implementing, and maintaining MAMW for libraries.

Findings and Discussion

Key Steps for Planning MAMW

Forming and managing a team

Building teams of people with the appropriate skills, knowledge, and experience is one of the first steps suggested by the existing literature for planning MAMW. It is essential for team members to be aware of new developments and trends in the market.10 For instance, developers should be aware of print resources on relevant technologies such as Apache, ASP, JavaScript, PHP, Ruby on Rails, and Python; online resources such as detectmobilebrowser.com and the W3C mobileOK Checker for testing catalogs, design functionality, and accessibility on mobile devices; and various online communities of developers who could provide peer support when needed.11 Team members are also expected to keep up with new developments in mobile devices, platforms, operating systems, digital rights management terms and conditions, and emerging standards for content formats.12 Periodic delegation of various tasks could help libraries develop MAMW effectively.13 Libraries should also form productive, financially feasible partnerships with external stakeholders such as internet service providers and network administrators for hosting MAMW on internet servers that meet desired safety and security standards.14,15

Requirements gathering

Requirements for developing MAMW can be collected through empirical research and secondary research. Typically, the goal of empirical research is to help libraries

- gather patron preferences for and expectations of MAMW;16,17
- stay abreast of the continual evolution of patron needs;18
- periodically (e.g., quarterly, annually, or biannually) gather and evaluate user needs;19
- index the content of MAMW;20
- investigate patrons' acceptance of the library's use of MAMW;21 and
- understand user needs and identify the top library services requested by patrons.

Empirical research in the form of usability testing, functional validation, user surveys, etc., should be carried out before developing MAMW to inform the development process and/or after developing MAMW to study their adoption by library patrons. Empirical research typically involves identifying the patrons and other stakeholders who will be affected by MAMW. This step is followed by developing data-collection instruments, collecting data from patrons and other stakeholders, and analyzing the qualitative and quantitative data using appropriate techniques and software.22 Secondary research mainly focuses on scanning and assessing the existing literature. For instance, using appropriate datasets on mobile use, librarians may be able to identify the factors responsible for the adoption of mobile technologies.23 Typically, such factors include but are not limited to the cognitive, affective, social, and economic conditions of potential users. MAMW developers could also scan the environment by examining existing MAMW and reviewing the literature to create sets of guidelines for replacing old information systems with new, well-functioning MAMW.24 Librarians could also scan the market for free software options to conserve financial resources.25

Making strategic choices

Mobile applications or mobile websites? One of the most important strategic decisions libraries need to make during this phase is whether to use a mobile app or a mobile website—that is, a web portal running on mobile devices—for offering services to patrons.
Mobile websites are web browser-based applications that might direct mobile users to a different set of content pages; serve a single set of content to all patrons while using different style sheets or templates reformatted for desktop or mobile browsers; or use a site transcoder (a rule-based interpreter), which resides between a website and a web client and intercepts and reformats content in real time for a mobile device.26,27 Mobile apps are more challenging to build than mobile websites because they require separate, specific programming for each operating system.28 Mobile apps also burden users and their devices: users are expected to remember the functionality of each menu item, and a significant amount of memory is required to store and support apps on mobile devices. However, potential profitability, better mobile-device functionality, and greater exposure through app stores can make mobile apps a more economical option than mobile websites.29

Buy or build? In the planning phase, libraries also need to decide whether to buy commercial, off-the-shelf (COTS) MAMW or build customized MAMW. When making this choice, MAMW need to be evaluated in terms of customer support and service, maintenance, and the ability to meet patron and library needs.30 Sometimes libraries purchase COTS products and end up customizing them, benefiting from both options. For example, some libraries first purchase packaged mobile frameworks to create simple, static mobile websites and subsequently develop dynamic library apps specific to library services.31

Managing scope

Many libraries have limited financial resources, which makes it necessary for their staff to manage the scope of MAMW development.
The ability to prioritize tasks and identify mission-critical features of MAMW is among the most common ways libraries manage this scope.32 For instance, it is not practical to make an entire library website mobile, because libraries would then end up serving only those patrons who access their sites over mobile devices alone. Instead, libraries should determine which parts of the website should go mobile. A growing trend of "mobile first" design, in which a mobile version of a website is designed first and then scaled up to a larger desktop version, could help librarians better manage the scope of MAMW development. Alternatively, Jeff Wisniewski, a leading web services librarian in the United States, advises libraries to create a new mobile-optimized homepage alone, which is faster than trying to retrofit the library's existing homepage for mobile.33 This advice is highly practical because no webmaster wants to maintain two distinct versions of the library's webpages containing details such as hours of operation and contact information.

Selecting the appropriate software development method

There are three key methods for developing MAMW: structured methodologies (e.g., waterfall or parallel development), rapid application development (e.g., phased development, prototyping, or throwaway prototyping), and agile development, an umbrella term for a collection of agile methodologies such as Crystal, Dynamic Systems Development Method, Extreme Programming, Feature-Driven Development, and Scrum. There is a bidirectional relationship between these development methods and the resources available for development: project resources such as funding, duration, and human resources influence and are affected by the type of software development method selected for developing MAMW.
However, studies rarely pay attention to this important dimension of the planning phase.34

Key Steps in the Analysis Phase

Requirements analysis

After collecting data from patrons, the next natural step is to analyze the data to inform the process of conceptualizing, building, and developing MAMW.35 The requirements-analysis phase helps libraries achieve a user-centered design for MAMW and assess the return on investment in MAMW. The context and goals of patrons using mobile devices, and the tasks they are likely and unlikely to perform on a mobile device, are the key considerations for developing user-centered MAMW for library patrons.36 It is critical to gather, understand, and review user needs.37 Surveys can be administered on paper or online and analyzed using advanced statistical techniques or qualitative analysis software.38,39 The analysis allows the following questions to be answered: Which library services do patrons use most frequently on their mobile devices? How satisfied are they with those services? What types of library services and products would they like to access with their mobile phones in the future? Survey analyses can help librarians predict which mobile services patrons will find most useful;40 they can also help librarians classify users on the basis of their perceptions, experience, and habits when using mobile technologies to access library services.41 As a result, libraries can identify and prioritize functional areas for their MAMW deployment.42 MAMW developers can also learn from their users' humbling and/or frustrating experiences of using mobile devices for library services.
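At their simplest, the survey analyses described above (which services patrons use most frequently, which they want next) reduce to tallying and ranking responses. The following is a hypothetical sketch; the function name, service names, and responses are invented for illustration, and real analyses would use the statistical or qualitative software mentioned above.

```javascript
// Hypothetical sketch: rank library services by how often survey
// respondents report using them on a mobile device.
function rankServices(responses) {
  const counts = new Map();
  for (const service of responses) {
    counts.set(service, (counts.get(service) || 0) + 1);
  }
  // Sort descending by count so the most-used services come first.
  return [...counts.entries()].sort((x, y) => y[1] - x[1]);
}

// Invented survey responses:
const responses = [
  "catalog search", "hours", "catalog search",
  "renew loans", "catalog search", "hours",
];
console.log(rankServices(responses));
// catalog search (3), hours (2), renew loans (1)
```

A ranking like this is what lets libraries identify and prioritize functional areas for MAMW deployment, as the studies cited above suggest.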
In addition, libraries can keep track of their patrons' positive and negative observations, their information-sharing practices, and how they create group experiences on the platform provided by their libraries.43 To improve existing MAMW, libraries could also use Google Analytics, a free web metrics tool, to identify the popularity of MAMW features and analyze statistics on how they are used.44 To develop operating system-specific mobile apps, Google Analytics can be used to learn about the popularity of the mobile devices used by patrons.45 Ideally, libraries should calculate and document return on investment (ROI) before investing in the development of MAMW.46 For instance, libraries can run a cost-benefit analysis on the process of developing MAMW and compare various library services offered over mobile devices.47 Typically, the following data could help libraries run the cost-benefit analysis: specific deliverables (e.g., features of MAMW), resources (e.g., resources needed and resources available), risks (e.g., types and levels of risk), performance requirements, and security requirements for developing MAMW. This analysis would help libraries make decisions on service provision, such as the specific goals to be set for developing MAMW, the feasibility of introducing desired MAMW features, and how to manage available resources to meet the set goals.48 Libraries should also examine what other libraries have already done to provide mobile services.49

Communication/liaising with stakeholders

Effective communication between developers and stakeholders influences almost every aspect of developing information systems. However, existing studies do not emphasize the significance of communication with stakeholders. For instance, several studies vaguely refer to the translation of user needs into technology requirements.50 But few studies point out the precise modeling technique (e.g., entity-relationship diagrams, Unified Modeling Language, etc.)
for converting user needs into a language understood by software developers. Developers should communicate best practices and suggestions for the future implementation of MAMW in libraries,51 which involves predicting and selecting appropriate MAMW for libraries,52 demonstrating what is possible and how services are relevant, and showing how new resources can help create value for libraries.53,54 Communication with users is also critical for creating value-added services for patrons who use different mobile technologies to meet needs related to work, leisure, commuting, etc.55 However, the existing literature on MAMW development for libraries does not discuss the significance of this activity.

Key Steps for Designing MAMW

Prototyping

Prototyping refers to the modeling or simulation of an actual information system. MAMW can have paper-based or computer-based prototypes. Prototyping allows developers to communicate directly with MAMW users to seek their feedback; developers can then correct or modify the original design of MAMW until users and developers agree on the system design. Building consensus between MAMW developers and potential users is a key challenge to overcome during this phase, and it may put a financial burden on MAMW development projects, requiring skilled personnel to manage the scope, time, human resources, and budget of such projects. Wireframing is one of the most prominent prototyping techniques practiced by librarians and IT professionals developing MAMW for libraries.56 This technique depicts schematic on-screen blueprints of MAMW, lacking style, color, or graphics, and focuses mainly on functionality, behavior, and priority of content.
Selecting hardware, programming languages, platforms, frameworks, and toolkits

The existing literature on the development of MAMW for libraries covers the selection and management of software; software development kits; scripting languages like JavaScript; data management and representation languages such as HTML and XML, along with their text editors; and AJAX for animations and transitions. The existing literature also guides libraries in training their staff to use MAMW to better serve patrons.57 A few studies also provide guidance on selecting COTS products such as WebKit, an open source web browser engine that renders webpages on smartphones and allows users to view high-quality graphics on data networks with faster throughput.58 It might be a good idea to use licensed open source COTS products, because licensed software allows libraries to legally distribute the software within their organizations as covered by the licensing agreement. Libraries that use software-licensing agreements may also be able to seek expert help and advice whenever they have a concern or query. In the authors' experience, librarians have shared a few effective strategies for designing MAMW. One key strategy is to acquire reliable device emulators and cross-compatible web editors. These technologies allow the user to work with the design at the most basic level, save documents as text, transfer documents between web programs, and direct designers toward simple solutions.59 Sample cross-compatible web editors include, but are not limited to, NoteTab Pro (http://www.notetab.com/), CodeLobster (http://www.codelobster.com/), and Bluefish (http://bluefish.openoffice.nl).
Hybrid mobile app frameworks like Bootstrap, Ionic, Mobile Angular UI, Intel XDK, Appcelerator Titanium, Sencha, Kendo UI, and PhoneGap use a combination of web technologies like HTML, CSS, and JavaScript for developing mobile-first, responsive MAMW. A majority of these frameworks use a drag-and-drop approach and do not require any coding for developing mobile apps; one-click API connections further simplify the process. User-interface frameworks like jQuery Mobile and Topcoat eliminate the need to design user interfaces manually. Importantly, MAMW developed using such frameworks can support many mobile platforms and devices. Toolkits like GitHub, Skyronic, CRUDKit, and HawHaw enable developers to quickly build mobile-friendly CRUD (create/read/update/delete) interfaces for PHP, Laravel, and CodeIgniter apps. Such mobile apps also work with MySQL and other databases, allowing applications to receive and process data and display information to users. Table 1 categorizes specific hardware and software features recommended for MAMW to better serve library patrons.
1. Human-computer interaction (HCI)
   - Behavioral, cognitive, motivational, and affective aspects of HCI: design responsive websites for libraries to enhance the user experience;60 design a user interface meeting the expectations and needs of potential users (e.g., a menu with items such as library catalog, patron accounts, ask a librarian, contact information, and listing of hours);61 design meaningful mobile websites based on user needs, documenting and maintaining mobile websites62
   - Usability engineering: design concise interfaces with limited links, descriptive icons, and home and parent-link icons;63 create a user-friendly site (e.g., the DOK Library Concept Center in Delft, Netherlands, offers a welcome text message to first-time visitors);64 effectively transition from traditional websites to mobile-optimized sites with responsive design;65 create user-friendly interface designs;66 present a clean, easy-to-navigate mobile version of search results67
   - Information visualization: automatically maintain reliable and stable fundamental information required by indoor localization systems;68 save time by redesigning existing sites69,70
2. Web programming
   - HTML, XML, etc.: design sites with a complete separation of content and presentation;71 code HTML and CSS for better user experiences;72 create and shorten links to make them easier to input using small or virtual keyboards73
   - Client-side and server-side scripting (such as JavaScript Object Notation): design and develop mashups;74 develop MAMW using client-server architecture, accessible on mobile devices75
   - Without scripting: implement widgetization to facilitate the integration of mobile websites, developing a widget library for mobile-based web information systems76
3. Open source: design mobile websites that allow users to leverage the same open source technology as the main websites;77 design mobile websites linking to other existing services like Library H3lp and library catalogs with mobile interfaces such as MobileCat78
4. Networking: design a mobile website capable of exploiting advancements in technology such as faster mobile data networks;79 identify and address technology issues (e.g., connectivity, security, speed, and signal strength) faced by patrons when using MAMW80
5. Input/output devices: use a mobile robot to determine the location of fixed RFID tags in space;81 design MAMW capable of processing data communicated using radio-frequency identification (RFID) devices, near-field communication technology, and Bluetooth-based technology like iBeacons;82 offer innovative services using augmented-reality tools83
6. Databases: integrate a back-end database of metadata with front-end mobile technologies;84 integrate the front end of MAMW with the back end of standard databases and services85
7. Social media and analytics: integrate social media sites (e.g., Foursquare, Facebook Places, Gowalla) with existing checkout services for accurate and information-rich entries;86 implement Google Voice or a free text-messaging service;87 use Google Analytics on a mobile-optimized website by copying the free JavaScript code generated by Google Analytics and pasting it into library webpages to gain insight into which resources are used and who uses them;88 integrate a geolocation feature with mobile services89

Table 1. MAMW with specific hardware and software features

From the table above, which is based on the analysis of the literature on developing mobile applications and mobile websites for libraries, it becomes clear that web programming and HCI are the two leading technology areas that shape the development of MAMW and, consequently, the services offered by them.

Designing user interfaces of MAMW

Librarians and IT professionals engaged in developing MAMW for libraries make the following recommendations.

Use two style sheets: CSS plays a key role in giving a uniform appearance to the user interfaces of all webpages. Studies recommend designing two style sheets—namely, mobile.css and iphone.css—when developing MAMW, since smartphones often ignore mobile style sheets.90 In that case, iphone.css can direct itself to browsers of a specific screen width, helping those mobile devices that are not directed to the mobile website by the mobile.css style sheet.91

Minimize the use of JavaScript: JavaScript is instrumental in detecting which mobile device a patron is using and then directing them to the appropriate webpage, with options including a full website, a simple text-based site, and a touch-mobile-optimized site. However, it is critical to minimize the use of JavaScript on library mobile websites because not every smartphone offers the minimum level of support required to run it.92

Handle images intelligently: to help patrons optimize their bandwidth use, image files on mobile sites should be incorporated with CSS rather than HTML code; also, to ensure consistency in the appearance of mobile website user interfaces, images should be kept to the same absolute size.93

Key Steps for Implementing MAMW

Programming for MAMW

Programming is at the heart of developing MAMW. As shown in table 1 above, web programming enables developers to build MAMW with a number of value-added features for patrons.
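One such feature, detecting the patron's device and choosing the appropriate page variant or style sheet as recommended above, can be sketched in a few lines. This is a hypothetical illustration: the regular expression and the 480px breakpoint are assumptions, not values taken from the cited studies; only the file names mobile.css and iphone.css come from the text.

```javascript
// Hypothetical sketch of device detection and the two-stylesheet rule.
function isMobileBrowser(userAgent) {
  // A deliberately small check; real detection scripts test many more tokens.
  return /Mobile|Android|iPhone|iPad|BlackBerry/i.test(userAgent);
}

function pickStylesheet(screenWidth) {
  // iphone.css targets narrow smartphone screens that ignore mobile
  // style sheets; wider mobile browsers fall back to mobile.css.
  // The 480px breakpoint is an assumption for illustration.
  return screenWidth <= 480 ? "iphone.css" : "mobile.css";
}

const ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 9_3 like Mac OS X) Mobile/13E233";
console.log(isMobileBrowser(ua)); // true
console.log(pickStylesheet(320)); // iphone.css
```

Consistent with the "minimize JavaScript" advice above, detection like this is often done server-side or via CSS media queries, so that phones with weak JavaScript support still receive a usable page.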
For instance, a web-application server running ColdFusion can process data communicated via web browsers on mobile devices; this feature allows MAMW users to access search engines on library websites via smartphones.94 Also, client-side processing of classes (with a widget library) allows patrons to use their mobile devices as thin clients, thereby optimizing the use of network bandwidth.95

Testing MAMW

Past studies recommend testing the content, display/design, and functionality of MAMW in a controlled environment (e.g., a usability lab) or in the real world (i.e., in libraries).

Content: librarians are advised to set up testing databases for testing image presentation, traditional free-text search, location-based search, barcode scanning for ISBN search, QR encapsulation, and voice search.96

Display/design: librarians can review and test MAMW on multiple devices to confirm that everything displays and functions as intended.97 They can also test a beta version of their mobile website on varying devices to provide guidance regarding image sizing;98 beta versions are also useful for testing mobile websites' display on different browsers and devices.99

Functionality: librarians can set up testing practices and environments for the most heavily used device platforms (e.g., HCI incubators such as eye-testing software, which combines virtual emulators with mobile devices not owned by libraries).100,101 They can also use the User Agent Switcher add-on for Firefox to test a mobile website, and use web-based services like DeviceAnywhere and BrowserCam, which offer mobile emulation, to test the functionality of MAMW.102

Training patrons

Unless patrons realize the significance of a new information system for managing information resources, they will hardly use it. However, training patrons to use newly developed MAMW is almost completely missing from the studies describing the process of developing MAMW for libraries.
Joe Murphy, a technology librarian at Yale University, identifies the significance of user training in managing the change from traditional to mobile search and advises librarians to explore the mobile literacy skills of their patrons and educate them on how to use new systems.103

Data management

MAMW cannot function properly without clean data. Cleaning up data, curating data, and addressing other data-related issues are some of the least mentioned activities in the literature on developing MAMW. However, it is necessary for librarians engaged in developing MAMW to identify and address the common challenges of managing the data used by MAMW. For example, it might be a good strategy for librarians to study best practices for managing data-related issues when offering reference services using SMS.104

Skills Needed for Maintaining MAMW

Documentation and version control of software

Past studies recommend developing a mobile strategy for building a mobile-tracking device and evaluating mobile infrastructure to ensure the continued assessment and monitoring of mobile usage and trends among patrons.105 However, past studies do not report or provide many details about the maintenance of MAMW, which leads us to infer that maintenance of MAMW, including documentation and version control, is a neglected aspect of their development. Open source software development is increasingly becoming a common practice for developing MAMW. Implementing version-control software (e.g., Subversion and GitHub) to accommodate the needs of developers distributed across the world is a necessity for developing MAMW.
Version-control software provides a code repository with a centralized database for developers to share their code, which minimizes the errors associated with overwriting or reverting code changes and maximizes collaboration in software development.106

Conclusion

Various forces are driving change in the knowledge and skills required of information professionals: technologies, changing environments, and the changing role of IT in managing and providing services to patrons. These forces affect all levels of IT-based professionals, both those responsible for information processing and those responsible for information services. This paper has examined the key steps and precautions to be taken when developing MAMW to better serve library patrons. After analyzing the existing guidance offered by librarians and IT professionals from the system analysis and design perspective, we find that some of the most ignored activities in MAMW development are selecting appropriate software development methodologies, prototyping, communicating with stakeholders, software version control, data management, and training patrons to use newly developed or revamped MAMW. The lack of attention to these activities could hinder libraries' ability to better serve patrons using MAMW. It is necessary for librarians and IT professionals to pay close attention to these activities when developing MAMW. Our study also shows that web programming and HCI are the two most widely used technology areas for developing MAMW for libraries. To save their scarce financial resources, which otherwise could be spent on partnering with external IT professionals, libraries could either train their existing staff or recruit LIS graduates equipped with the skills and knowledge identified in this paper to develop MAMW (see table 2).
# key steps for developing mamw skills and knowledge required for developing mamw a planning phase 1 forming and managing team human resource management 2 making strategic choices time management cost management quality management human resource management (e.g., staff capacity) 3 requirements gathering research (empirical and secondary) 4 managing scope (e.g., managing financial resources, prioritizing tasks, identifying mission-critical features of mamw, etc.) scope management 5 selecting an appropriate software development method time management cost management quality management b analysis phase 6 requirements analysis research (empirical and secondary) 7 communication/liaising with stakeholders communications management c design phase 8 prototyping software development (hci) 9 selecting hardware and programming languages and platforms software development (web programming and hci) 10 designing user interfaces of mamw software development (hci) d implementation phase 11 programming for mamw software development (web programming—e.g., android, ios, visual c++, visual c#, visual basic, etc.) 12 testing mamw software development (web programming and hci) identifying key steps for developing mobile applications & mobile websites for libraries | potnis, regenstreif-harms, and cortez |doi:10.6017/ital.v35i2.8652 56 13 training patrons human resource management 14 data management (e.g., cleaning up data, curating data, etc.) data management e maintenance phase 15 documentation and version control of software software development (web programming and hci) table 2. 
skills and knowledge necessary to develop mamw the management of scope, time, cost, quality, human resources, and communication related to any project is known as project management.107 in addition to the skills and knowledge related to project management, librarians would also need to be proficient in software development (with an emphasis on hci and web programming), data management, and the proper methods for conducting empirical and secondary research for developing mamw. if lis programs equip their graduate students with the skills and knowledge identified in this paper, the next generation of lis graduates could develop mamw for libraries without relying on external it professionals, which would make libraries more self-reliant and better able to manage their financial resources.108 this paper assumes a very small number of scholarly publications to be reflective of the realworld scenarios of developing mamw for all types of libraries. this assumption is one of the limitations of this study. also, the sample of publications analyzed in this study is not statistically representative of the development of mamw for libraries around the world. in the future, the authors plan to interview librarians and it professionals engaged in developing and maintaining mamw for their libraries to better understand the landscape of developing mamw for libraries. references 1. devendra potnis, ed cortez, and suzie allard, “educating lis students as mobile technology consultants” (poster presented at 2015 association for library and information science education annual meeting, chicago, january 25–27), http://f1000.com/posters/browse/summary/1097683. 2. edwin michael cortez, “new and emerging technologies for information delivery,” catholic library world no. 54 (1982): 214–18. 3. kimberly d. pendell and michael s. bowman, “usability study of a library’s mobile website: an example from portland state university,” information technology & libraries 31, no. 
2 (2012): 45–62, http://dx.doi.org/10.6017/ital.v31i2.1913.
4. godmar back and annette bailey, “web services and widgets for library information systems,” information technology & libraries 29, no. 2 (2010): 76–86, http://dx.doi.org/10.6017/ital.v29i2.3146.

information technology and libraries | september 2016 57

5. hannah gascho rempel and laurie bridges, “that was then, this is now: replacing the mobile optimized site with responsive design,” information technology & libraries 32, no. 4 (2013): 8–24, http://dx.doi.org/10.6017/ital.v32i4.4636.
6. june jamrich parsons and dan oja, new perspectives on computer concepts 2014: comprehensive, course technology (boston: cengage learning, 2013).
7. ibid.
8. andrew walsh, using mobile technology to deliver library services: a handbook (london: facet, 2012).
9. matthew b. miles and a. michael huberman, qualitative data analysis (thousand oaks, ca: sage, 1994).
10. bohyun kim, “responsive web design, discoverability and mobile challenge,” library technology reports 49, no. 6 (2013): 29–39, https://journals.ala.org/ltr/article/view/4507.
11. james elder, “how to become the ‘tech guy’ and make iphone apps for your library,” the reference librarian 53, no. 4 (2012): 448–55, http://dx.doi.org/10.1080/02763877.2012.707465.
12. sarah houghton, “mobile services for broke libraries: 10 steps to mobile success,” the reference librarian 53, no. 3 (2012): 313–21, http://dx.doi.org/10.1080/02763877.2012.679195.
13. pendell and bowman, “usability study.”
14. lisa carlucci thomas, “libraries, librarians and mobile services,” bulletin of the american society for information science & technology 38, no. 1 (2011): 8–9, http://dx.doi.org/10.1002/bult.2011.1720380105.
15. elder, “how to become the ‘tech guy.’”
16. kim, “responsive web design.”
17.
chad mairn, “three things you can do today to get your library ready for the mobile experience,” the reference librarian 53, no. 3 (2012): 263–69, http://dx.doi.org/10.1080/02763877.2012.678245.
18. rempel and bridges, “that was then.”
19. rachael hu and alison meier, “planning for a mobile future: a user research case study from the california digital library,” serials 24, no. 3 (2011): s17–25.
20. kim, “responsive web design.”
21. lorraine paterson and boon low, “student attitudes towards mobile library services for smartphones,” library hi tech 29, no. 3 (2011): 412–23, http://dx.doi.org/10.1108/07378831111174387.
22. jim hahn, michael twidale, alejandro gutierrez, and reza farivar, “methods for applied mobile digital library research: a framework for extensible wayfinding systems,” the reference librarian 52, no. 1-2 (2011): 106–16, http://dx.doi.org/10.1080/02763877.2011.527600.
23. paterson and low, “student attitudes.”
24. gillian nowlan, “going mobile: creating a mobile presence for your library,” new library world 114, no. 3/4 (2013): 142–50, http://dx.doi.org/10.1108/03074801311304050.
25. elder, “how to become the ‘tech guy.’”
26. matthew connolly, tony cosgrave, and baseema b. krkoska, “mobilizing the library’s web presence and services: a student-library collaboration to create the library’s mobile site and iphone application,” the reference librarian 52, no. 1-2 (2010): 27–35, http://dx.doi.org/10.1080/02763877.2011.520109.
27.
stephan spitzer, “make that to go: re-engineering a web portal for mobile access,” computers in libraries 3, no. 5 (2012): 10–14.
28. houghton, “mobile services.”
29. cody w. hanson, “mobile solutions for your library,” library technology reports 47, no. 2 (2011): 24–31, https://journals.ala.org/ltr/article/view/4475/5222.
30. terence k. huwe, “using apps to extend the library’s brand,” computers in libraries 33, no. 2 (2013): 27–29.
31. edward iglesias and wittawat meesangnill, “mobile website development: from site to app,” bulletin of the american society for information science and technology 38, no. 1 (2011): 18–23.
32. jeff wisniewski, “mobile usability,” bulletin of the american society for information science & technology 38, no. 1 (2011): 30–32, http://dx.doi.org/10.1002/bult.2011.1720380108.
33. jeff wisniewski, “mobile websites with minimal effort,” online 34, no. 1 (2010): 54–57.
34. hahn et al., “methods for applied mobile digital library research.”
35. j. michael demars, “smarter phones: creating a pocket sized academic library,” the reference librarian 53, no. 3 (2012): 253–62, http://dx.doi.org/10.1080/02763877.2012.678236.
36. kim griggs, laurie m. bridges, and hannah gascho rempel, “library/mobile: tips on designing and developing mobile websites,” code4lib no. 8 (2009), http://journal.code4lib.org/articles/2055.
37. demars, “smarter phones.”
38. hahn et al., “methods for applied mobile digital library research.”
39. beth stahr, “text message reference service: five years later,” the reference librarian 52, no.
1-2 (2011): 9–19, http://dx.doi.org/10.1080/02763877.2011.524502.
40. paterson and low, “student attitudes.”
41. ibid.
42. ibid.
43. hanson, “mobile solutions for your library.”
44. stahr, “text message reference service.”
45. spitzer, “make that to go.”
46. allison bolorizadeh et al., “making instruction mobile,” the reference librarian 53, no. 4 (2012): 373–83, http://dx.doi.org/10.1080/02763877.2012.707488.
47. maura keating, “will they come? get out the word about going mobile,” the reference librarian 52, no. 1-2 (2010): 20–26, http://dx.doi.org/10.1080/02763877.2010.520111.
48. paterson and low, “student attitudes.”
49. hanson, “mobile solutions for your library.”
50. paterson and low, “student attitudes.”
51. hanson, “mobile solutions for your library.”
52. cody w. hanson, “why worry about mobile?,” library technology reports 47, no. 2 (2011): 5–10, https://journals.ala.org/ltr/article/view/4476.
53. keating, “will they come?”
54. spitzer, “make that to go.”
55. kim, “responsive web design.”
56. wisniewski, “mobile usability.”
57. elder, “how to become the ‘tech guy.’”
58. sally wilson and graham mccarthy, “the mobile university: from the library to the campus,” reference services review 38, no. 2 (2010): 214–32, http://dx.doi.org/10.1108/00907321011044990.
59. brendan ryan, “developing library websites optimized for mobile devices,” the reference librarian 52, no. 1-2 (2010): 128–35, http://dx.doi.org/10.1080/02763877.2011.527792.
60. kim, “responsive web design.”
61. connolly, cosgrave, and krkoska, “mobilizing the library’s web presence and services.”
62.
demars, “smarter phones.”
63. mark andy west, arthur w. hafner, and bradley d. faust, “expanding access to library collections and services using small-screen devices,” information technology & libraries 25 (2006): 103–7.
64. houghton, “mobile services.”
65. rempel and bridges, “that was then.”
66. elder, “how to become the ‘tech guy.’”
67. heather williams and anne peters, “and that’s how i connect to my library: how a 42-second promotional video helped to launch the utsa libraries’ new summon mobile application,” the reference librarian 53, no. 3 (2012): 322–25, http://dx.doi.org/10.1080/02763877.2012.679845.
68. hahn et al., “methods for applied mobile digital library research.”
69. danielle andre becker, ingrid bonadie-joseph, and jonathan cain, “developing and completing a library mobile technology survey to create a user-centered mobile presence,” library hi tech 31, no. 4 (2013): 688–99, http://dx.doi.org/10.1108/lht-03-2013-0032.
70. rempel and bridges, “that was then.”
71. iglesias and meesangnill, “mobile website development.”
72. elder, “how to become the ‘tech guy.’”
73. andrew walsh, “mobile information literacy: a preliminary outline of information behavior in a mobile environment,” journal of information literacy 6, no. 2 (2012): 56–69, http://dx.doi.org/10.11645/6.2.1696.
74. back and bailey, “web services and widgets.”
75. ibid.
76. ibid.
77. spitzer, “make that to go.”
78. iglesias and meesangnill, “mobile website development.”
79. bohyun kim, “the present and future of the library mobile experience,” library technology reports 49, no. 6 (2013): 15–28, https://journals.ala.org/ltr/article/view/4506.
80. pendell and bowman, “usability study.”
81.
hahn et al., “methods for applied mobile digital library research.”
82. andromeda yelton, “where to go next,” library technology reports 48, no. 1 (2012): 25–34, https://journals.ala.org/ltr/article/view/4655/5511.
83. ibid.
84. hahn et al., “methods for applied mobile digital library research.”
85. houghton, “mobile services.”
86. ibid.
87. mairn, “three things you can do today.”
88. ibid.
89. tamara pianos, “econbiz to go: mobile search options for business and economics—developing a library app for researchers,” library hi tech 30, no. 3 (2012): 436–48, http://dx.doi.org/10.1108/07378831211266582.
90. demars, “smarter phones.”
91. ryan, “developing library websites.”
92. pendell and bowman, “usability study.”
93. ryan, “developing library websites.”
94. michael j. whitchurch, “qr codes and library engagement,” bulletin of the american society for information science & technology 38, no. 1 (2011): 14–17.
95. back and bailey, “web services and widgets.”
96. jingru hoivik, “global village: mobile access to library resources,” library hi tech 31, no. 3 (2013): 467–77, http://dx.doi.org/10.1108/lht-12-2012-0132.
97. elder, “how to become the ‘tech guy.’”
98. ryan, “developing library websites.”
99. west, hafner, and faust, “expanding access.”
100. hu and meier, “planning for a mobile future.”
101. iglesias and meesangnill, “mobile website development.”
102. wisniewski, “mobile usability.”
103. joe murphy, “using mobile devices for research: smartphones, databases and libraries,” online 34, no. 3 (2010): 14–18.
104.
amy vecchione and margie ruppel, “reference is neither here nor there: a snapshot of sms reference services,” the reference librarian 53, no. 4 (2012): 355–72, http://dx.doi.org/10.1080/02763877.2012.704569.
105. hu and meier, “planning for a mobile future.”
106. wilson and mccarthy, “the mobile university.”
107. project management institute, a guide to the project management body of knowledge (pmbok guide) (newtown square, pa: project management institute, 2013).
108. devendra potnis et al., “skills and knowledge needed to serve as mobile technology consultants in information organizations,” journal of education for library & information science 57 (2016): 187–96.

can bibliographic data be put directly onto the semantic web? | yee 55

martha m. yee

can bibliographic data be put directly onto the semantic web?

this paper is a think piece about the possible future of bibliographic control; it provides a brief introduction to the semantic web and defines related terms, and it discusses granularity and structure issues and the lack of standards for the efficient display and indexing of bibliographic data. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of more frbrized cataloging rules than those about to be introduced to the library community (resource description and access) and in creating an rdf data model for the rules.
i am now in the process of trying to model my cataloging rules in the form of an rdf model, which can also be inspected at http://myee.bol.ucla.edu/. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is sophisticated enough yet to deal with our data. this article is an attempt to identify some of those areas and explore whether or not the problems i have encountered are soluble—in other words, whether or not our data might be able to live on the semantic web. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

this paper is a think piece about the possible future of bibliographic control; as such, it raises more complex questions than it answers. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of frbrized descriptive and subject-cataloging rules. here my focus will be on the data model rather than on the frbrized cataloging rules for gathering data to put in the model, although i hope to have more to say about the latter in the future. the intent is not to present you with conclusions but to present some questions about data modeling that have arisen in the course of the experiment. my premise is that decisions about the data model we follow in the future should be made openly and as a community rather than in a small, closed group of insiders. if we are to move toward the creation of metadata that is more interoperable with metadata being created outside our community, as is called for by many in our profession, we will need to address these complex questions as a community following a period of deep thinking, clever experimentation, and astute political strategizing.

the vision

the semantic web is still a bewitching midsummer night’s dream.
it is the idea that we might be able to replace the existing html-based web consisting of marked-up documents—or pages—with a new rdf-based web consisting of data encoded as classes, class properties, and class relationships (semantic linkages), allowing the web to become a huge shared database. some call this web 3.0, with hyperdata replacing hypertext.

embracing the semantic web might allow us to do a better job of integrating our content and services with the wider internet, thereby satisfying the desire for greater data interoperability that seems to be widespread in our field. it also might free our data from the proprietary prisons in which it is currently held and allow us to cooperate in developing open-source software to index and display the data in much better ways than we have managed to achieve so far in vendor-developed ils opacs or in giant, bureaucratic bibliographic empires such as oclc worldcat. the semantic web also holds the promise of allowing us to make our work more efficient.

in this bewitching vision, we would share in the creation of uniform resource identifiers (uris) for works, expressions, manifestations, persons, corporate bodies, places, subjects, and so on. at the uri would be found all of the data about that entity, including the preferred name and the variant names, but also including much more data about the entity than we currently put into our work (name-title and title), such as personal name, corporate name, geographic, and subject authority records. if any of that data needed to be changed, it would be changed only once, and the change would be immediately accessible to all users, libraries, and library staff by means of links down to local data such as circulation, acquisitions, and binding data. each work would need to be described only once at one uri, each expression would need to be described only once at one uri, and so forth.
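the change-once, see-everywhere idea described above can be sketched in a few lines of python. this is only an illustration of the mechanism, not any actual authority file: the uri, the names, and the record fields are all invented for the example.

```python
# a shared registry holds one record per entity, addressed by uri;
# local catalog records store only the uri, never the name itself.
registry = {
    "http://example.org/person/42": {
        "preferred_name": "twain, mark",
        "variant_names": ["clemens, samuel langhorne"],
    }
}

local_record_a = {"title": "adventures of huckleberry finn",
                  "creator": "http://example.org/person/42"}
local_record_b = {"title": "the innocents abroad",
                  "creator": "http://example.org/person/42"}

def display_name(record):
    # resolve the uri at display time; the name lives in exactly one place
    return registry[record["creator"]]["preferred_name"]

# correcting the entity once at its uri updates every record that links to it
registry["http://example.org/person/42"]["preferred_name"] = "twain, mark, 1835-1910"

print(display_name(local_record_a))  # twain, mark, 1835-1910
print(display_name(local_record_b))  # twain, mark, 1835-1910
```

the design choice this illustrates is indirection: because the local records hold a link rather than a copy, there is nothing to re-key when the shared description changes.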
very much up in the air is the question of what institutional structures would support the sharing of the creation of uris for entities on the semantic web. for the data to be reliable, we would need to have a way to ensure that the system would be under the control of people who had been educated about the value of clean and accurate entity definition, the value of choosing “most commonly known” preferred forms (for display in lists of multiple different entities), and the value of providing access under all variant forms likely to be sought. at the same time, we would need a mechanism to ensure that any interested members of the public could contribute to the effort of gathering variants or correcting entity definitions when we have had inadequate information. for example, it would be very valuable to have the input of a textual or descriptive bibliographer applied to difficult questions concerning particular editions, issues, and states of a significant literary work. it would also be very valuable to be able to solicit input from a subject expert in determining the bounds of a concept entity (subject heading) or class entity (classification).

martha m. yee (myee@ucla.edu) is cataloging supervisor at the university of california, los angeles film and television archive. 56 information technology and libraries | june 2009

the experiment (my project)

to explore these bewitching ideas, i have been conducting an experiment. as part of my experiment, i designed a set of cataloging rules that are more frbrized than is rda in the sense that they more clearly differentiate between data applying to expression and data applying to manifestation. note that there is an underlying assumption in both frbr (which defines expression quite differently from manifestation) and on my part, namely that catalogers always know whether a given piece of data applies at either the expression or the manifestation level.
that assumption is open to questioning in the process of the experiment as well. my rules also call for creating a more hierarchical and degressive relationship between the frbr entities work, expression, manifestation, and item, such that data pertaining to the work does not need to be repeated for every expression, data pertaining to the expression does not need to be repeated for every manifestation, and so forth. degressive is an old term used by bibliographers for bibliographies that provide great detail about first editions and less detail for editions after the first. i have adapted this term to characterize my rules, according to which the cataloger begins by describing the work; any details that pertain to all expressions and manifestations of the work are not repeated in the expression and manifestation descriptions.

this paper would be entirely too long if i spent any more time describing the rules i am developing, which can be inspected at http://myee.bol.ucla.edu. here, i would like to focus on the data-modeling process and the questions about the suitability of rdf and the semantic web for encoding our data. (by the way, i don’t seriously expect anyone to adopt my rules! they are radically different from the rules currently being applied and would represent a revolution in cataloging practice that we may not be up to undertaking in the current economic climate. their value lies in their thought-experiment aspect and their ability to clarify what entities we can model and what entities we may not be able to model.)

i am now in the process of trying to model my cataloging rules in the form of an rdf model (“rdf” as used in this paper should be considered from now on to encompass rdf schema [rdfs], web ontology language [owl], and simple knowledge organization system [skos] unless otherwise stated); this model can also be inspected at http://myee.bol.ucla.edu.
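the degressive idea described above — record a detail once at the work level and let every expression and manifestation inherit it unless overridden — can be sketched with python’s `ChainMap`. this is a toy reading of the approach, not the author’s actual model, and the bibliographic values are invented.

```python
# each level holds only its own data; lookups fall back up the chain
# manifestation -> expression -> work, so nothing is repeated downward.
from collections import ChainMap

work = {"title": "moby dick", "creator": "melville, herman", "language": "eng"}

# a french translation: only the content changes are recorded here
expression = ChainMap({"language": "fre", "translator": "giono, jean"}, work)

# a particular published carrier of that translation
manifestation = ChainMap({"publisher": "gallimard", "year": "1941"}, expression)

print(manifestation["title"])     # inherited from the work: moby dick
print(manifestation["language"])  # overridden at the expression: fre
```

the override order matters: because the expression’s mapping sits in front of the work’s, a change in content (the translation’s language) shadows the work-level value while everything unchanged is still found by falling through the chain.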
in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is yet sophisticated enough to deal with our data. this article is an attempt to outline some of those areas and explore whether the problems i have encountered are soluble, in other words, whether or not our data might be able to live on the semantic web eventually. i have already heard from rdf experts bruce d’arcus (miami university) and rob styles (developer at talis, a semantic web technology company), whom i cite later, but through this article i hope to reach a larger community. my research questions can be found later, but first some definitions.

definition of terms

the semantic web is a way to represent knowledge; it is a knowledge-representation language that provides ways of expressing meaning that are amenable to computation; it is also a means of constructing knowledge-domain maps consisting of class and property axioms with a formal semantics.

rdf is a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats; an rdf metadata model is based on making statements about resources in the form of triples that consist of

1. the subject of the triple (e.g., “new york”);
2. the predicate of the triple that links the subject and the object (e.g., “has the postal abbreviation”); and
3. the object of the triple (e.g., “ny”).

xml is commonly used to express rdf, but it is not a necessity; it can also be expressed in notation 3 or n3, for example.1

rdfs is an extensible knowledge-representation language that provides basic elements for the description of ontologies, also known as rdf vocabularies. using rdfs, statements are made about resources in the form of

1. a class (or entity) as subject of the rdf triple (e.g., “new york”);
2.
a relationship (or semantic linkage) as predicate of the rdf triple that links the subject and the object (e.g., “has the postal abbreviation”); and
3. a property (or attribute) as object of the rdf triple (e.g., “ny”).

owl is a family of knowledge representation languages for authoring ontologies compatible with rdf.

skos is a family of formal languages built upon rdf and designed for representation of thesauri, classification schemes, taxonomies, or subject-heading systems.

research questions

actually, the full-blown semantic web may not be exactly what we need. remember that the fundamental definition of the semantic web is “a way to represent knowledge.” the semantic web is a direct descendant of the attempt to create artificial intelligence, that is, of the attempt to encode enough knowledge of the real world to allow a computer to reason about reality in a way indistinguishable from the way a human being reasons. one of the research questions should probably be whether or not the technology developed to support the semantic web can be used to represent information rather than knowledge. fortunately, we do not need to represent all of human knowledge—we simply need to describe and index resources to facilitate their retrieval. we need to encode facts about the resources and what the resources discuss (what they are “about”), not facts about “reality.” based on our past experience, doing even this is not as simple as people think it is. the question is whether we could do what we need to do within the context of the semantic web. sometimes things that sound simple do not turn out to be so simple in the doing.

my research questions are as follows:

1. is it possible for catalogers to tell in all cases whether a piece of data pertains to the frbr expression or the frbr manifestation?
2. is it possible to fit our data into rdf?
given that rdf was designed to encode knowledge rather than information, perhaps it is the wrong technology to use for our purposes?
3. if it is possible to fit our data into rdf, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (i.e., providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?

as stated previously, i am not yet ready to answer these questions. i hope to find answers in the course of developing the rules and the model. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

other relevant projects

other relevant projects include the following:

1. frbr, functional requirements for authority data (frad), functional requirements for subject authority records (frsar), and frbr-object-oriented (frbroo). all are attempts to create conceptual models of bibliographic entities using an entity-relationship model that is very similar to the class-property model used by rdf.2
2. various initiatives at the library of congress (lc), such as lc subject headings (lcsh) in skos,3 the lc name authority file in skos,4 the lccn permalink project to create persistent uris for bibliographic records,5 and initiatives to provide skos representations for vocabularies and data elements used in marc, premis, and mets. these all represent attempts to convert our existing bibliographic data into uris that stand for the bibliographic entities represented by bibliographic records and authority records; the uris would then be available for experiments in putting our data directly onto the semantic web.
3.
the dc-rda task group project to put rda data elements into rdf.6 as noted previously and discussed further later, rda is less frbrized than my cataloging rules, but otherwise this project is very similar to mine.
4. dublin core’s (dc’s) work on an rdf schema.7 dublin core is very focused on manifestation and does not deal with expressions and works, so it is less similar to my project than is the dc-rda task group’s project (see further discussion later).

why my project?

one might legitimately ask why there is a need for a different model than the ones already provided by frbr, frad, frsar, frbroo, rda, and dc. the frbr and rda models are still tied to the model that is implicit in our current bibliographic data in which expression and manifestation are undifferentiated. this is because publishers publish and libraries acquire and shelve manifestations. in our current bibliographic practice, a new bibliographic record is made for either a new manifestation or a new expression. thus, in effect, there is no way for a computer to tell one from the other in our current data. despite the fact that frbr has good definitions of expression (change in content) and manifestation (mere change in carrier), it perpetuates the existing implicit model in its mapping of attributes to entities. for example, frbr maps the following to manifestation: edition statements (“2nd rev. ed.”); statements of responsibility that identify translators, editors, and illustrators; physical description statements that identify illustrated editions; extent statements that differentiate expressions (the 102-minute version vs. the 89-minute version); etc. thus the frbr definition of expression recognizes that a 2nd revised edition is a new expression, but frbr maps the edition statement to manifestation.
in my model, i have tried to differentiate more cleanly data applying to expressions from data applying to manifestations.8

frbr and rda tend to assume that our current bibliographic data elements map to one and only one group 1 entity or class. there are exceptions, such as title, which frbr and rda define at work, expression, and manifestation levels. however, there is a lack of recognition that, to create an accurate model of the bibliographic universe, more data elements need to be applied at the work and expression level in addition to (or even instead of) the manifestation level. in the appendix i have tried to contrast the frbr, frad, and rda models with mine. in my model, many more data elements (properties and attributes) are linked to the work and expression level. after all, if the expression entity is defined as any change in work content, the work entity needs to be associated with all content elements that might change, such as the original extent of the work, the original statement of responsibility, whether illustrations were originally present, whether color was originally present in a visual work, whether sound was originally present in an audiovisual work, the original aspect ratio of a moving image work, and so on.

frbr also tends to assume that our current data elements map to one and only one entity. in working on my model, i have come to the conclusion that this is not necessarily true. in some cases, a data element pertaining to a manifestation also pertains to the expression and the work. in other cases, the same data element is specific to that manifestation, and, in other cases, the same data element is specific to its expression. this is true of most of the elements of the bibliographic description.
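the separation being argued for here — statements attached to the work, the expression, or the manifestation, each at its proper level — can be made concrete with a toy triple set in the spirit of the rdf triple model defined earlier. the subject strings stand in for real uris, and all of the bibliographic values are invented for illustration.

```python
# each statement is a (subject, predicate, object) tuple; the entities
# form a chain manifestation -> expression -> work, and each statement
# is attached at the level where the data actually applies.
triples = {
    ("work/1", "has title", "moby dick"),
    ("work/1", "has creator", "melville, herman"),
    ("expr/1", "is expression of", "work/1"),
    ("expr/1", "has language", "fre"),            # content change
    ("expr/1", "has translator", "giono, jean"),  # content change
    ("man/1", "is manifestation of", "expr/1"),
    ("man/1", "has publisher", "gallimard"),      # carrier data
    ("man/1", "has date", "1941"),                # carrier data
}

def about(subject):
    """all (predicate, object) pairs asserted about the given entity."""
    return {(p, o) for (s, p, o) in triples if s == subject}

# the translation (a change in content) lives at the expression level,
# while the publisher (mere carrier data) lives at the manifestation level
print(about("expr/1"))
print(about("man/1"))
```

a computer querying this data can tell expression-level facts from manifestation-level facts directly, which is exactly what the undifferentiated bibliographic record described above cannot support.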
frad, in attempting to deal with the fact that our current cataloging rules allow a single person to have several bibliographic identities (or pseudonyms), treats person, name, and controlled access point as three separate entities or classes. i have tried to keep my model simpler and more elegant by treating only person as an entity, with preferred name and variant name as attributes or properties of that entity.

frbroo is focused on the creation process for works, with special attention to the creation of unique works of art and other one-off items found in museums. thus frbroo tends to neglect the collocation of the various expressions that develop in the history of a work that is reproduced and published, such as translations, abridged editions, editions with commentary, etc.

dc has concentrated exclusively on the description of manifestations and has neglected expression and work altogether.

one of the tenets of semantic web development is that, once an entity is defined by a community, other communities can reuse that entity without defining it themselves. the very different definitions of the work and expression entities in the different communities described above raise some serious questions about the viability of this tenet.

assumptions

it should be noted that this entire experiment is based on two assumptions about the future of human intervention for information organization. these two assumptions are based on the even bigger assumption that, even though the internet seems to be an economy based on free intellectual labor, and even though human intervention for information organization is expensive (and therefore at more risk than ever), human intervention for information organization is worth the expense.

assumption 1: what we need is not artificial intelligence, but a better human–machine partnership such that humans can do all of the intellectual labor and machines can do all of the repetitive clerical labor.
currently, catalogers spend too much time on the latter because of the poor design of current systems for inputting data. the universal employment provided by paying humans to do the intellectual labor of building the semantic web might be just the stimulus our economy needs.

assumption 2: those who need structured and granular data—and the precise retrieval that results from it—to carry out research and scholarship may constitute an elite minority rather than most of the people of the world (sadly), but that talented and intelligent minority is an important one for the cultural and technological advancement of humanity. it is even possible that, if we did a better job of providing access to such data, we might enable the enlargement of that minority.

can bibliographic data be put directly onto the semantic web? | yee 59

granularity and structure issues

as soon as one starts to create a data model, one encounters granularity or cataloger-data parsing issues. these issues have actually been with us all along as we developed the data model implicit in aacr2r and marc 21. those familiar with rda, frbr, and frad development will recognize that much of that development is directed at increasing structure and granularity in cataloger-produced data to prepare for moving it onto the semantic web. however, there are clear trade-offs in an increase in structure and granularity. more structure and more granularity make possible more powerful indexing and more sophisticated display, but more structure and more granularity are more complex and expensive to apply and less likely to be implemented in a standard fashion across all communities; that is, it is less likely that interoperable data would be produced.
any switching or mapping that was employed to create interoperable data would produce the lowest common denominator (the simplest and least granular data), and once rendered interoperable, it would not be possible for that data to swim back upstream to regain its lost granularity. data with less structure and less granularity could be easier and cheaper to apply and might have the potential to be adopted in a more standard fashion across all communities, but that data would limit the degree to which powerful indexing and sophisticated display would be possible. take the example of a personal name: currently, we demarcate surname from forename by putting the surname first, followed by a comma and then the forename. even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is surname and which part is forename in a culture unfamiliar to the cataloger. in other words, the more granularity you desire in your data, the more often the people collecting the data are going to encounter ambiguous situations. another example: currently, we do not collect information about gender self-identification; if we were to increase the granularity of our data to gather that information, we would surely encounter situations in which the cataloger would not necessarily know if a given creator was self-defined as a female or a male or of some other gender identity. presently, if we are adding a birth and death date, whatever dates we use are all together in a $d subfield without any separate coding to indicate which date is the birth date and which is the death date (although an occasional “b.” or “d.” will tell us this kind of information). we could certainly provide more granularity for dates, but that would make the marc 21 format much more complex and difficult to learn. 
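as an illustration of the trade-off, here is a minimal sketch in python of what more granular date coding would demand. the function name and the set of $d patterns it accepts are invented for the example; the point is that an undifferentiated $d string must be parsed into separate birth and death elements, and ambiguous strings force a human decision.

```python
import re

def parse_subfield_d(d_value):
    """split an undifferentiated marc 21 $d string such as '1835-1910' into
    hypothetical, more granular birth and death elements.
    returns (birth, death); a component is None when the string does not say."""
    # an occasional 'b.' or 'd.' marks a lone birth or death date
    m = re.fullmatch(r"b\.\s*(\d{3,4})", d_value)
    if m:
        return (m.group(1), None)
    m = re.fullmatch(r"d\.\s*(\d{3,4})", d_value)
    if m:
        return (None, m.group(1))
    # a plain span '1835-1910' or an open span '1835-'
    m = re.fullmatch(r"(\d{3,4})-(\d{3,4})?", d_value)
    if m:
        return (m.group(1), m.group(2))
    # anything else ('fl. 1500', 'ca. 1400', ...) stays unparsed:
    # added granularity means more cases a human must decide
    return (None, None)
```

so "d. 1910" yields an explicit death date, while "fl. 1500" yields nothing at all: the coding gains indexing power at the price of more rules and more unresolvable cases.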
people who dislike the marc 21 format already argue that it is too granular and therefore requires too much of a learning curve before people can use it. for example, tennant claims that “there are only two kinds of people who believe themselves able to read a marc record without referring to a stack of manuals: a handful of our top catalogers and those on serious drugs.”9 how much of the granularity already available in marc 21 is actually used in existing records, and how much of it, even when present, is exploited by indexing and display software? granularity costs money, and libraries and archives are already starving for resources. granularity can only be provided by people, and people are expensive. granularity and structure also exist in tension with each other. more granularity can lead to less structure (or more complexity to retain structure along with granularity). in the pursuit of more granularity of data than we have now, rda, attempting to support rdf–compliant xml encoding, has been atomizing data to make it useful to computers, but this will not necessarily make the data more useful to humans. to be useful to humans, it must be possible to group and arrange (sort) the data meaningfully, both for indexing and for display. the developers of skos refer to the “vast amounts of unstructured (i.e., human readable) information in the web,”10 yet labeling bits of data as to type and recording semantic relationships in a machine-actionable way do not necessarily provide the kind of structure necessary to make data readable by humans and therefore useful to the people the web is ultimately supposed to serve. consider the case of music instrumentation.
if you have a piece of music for five guitars and one flute, and you simply code number and instrumentation without any way to link “five” with “guitars” and “one” with “flute,” you will not be able to guarantee that a person looking for music for five flutes and one guitar will not be given this piece of music in their results (see figure 1).11 the more granular the data, the less the cataloger can build order, sequencing, and linking into the data; the coding must be carefully designed to allow the desired order, sequencing, and linking for indexing and display to be possible, which might call for even more complex coding. it would be easy to lose information about order, sequencing, and linking inadvertently.

actually, there are several different meanings for the term structure:

1. structure is an object of a record (structure of document?); for example, elings and waibel refer to “data fields . . . also referred to as elements . . . which are organized into a record by a data structure.”12
2. structure is the communications layer, as opposed to the display layer or content designation.13
3. structure is the record, field, and subfield.
4. structure is the linking of bits of data together in the form of various types of relationships.
5. structure is the display of data in a structured, ordered, and sequenced manner to facilitate human understanding.
6. data structure is a way of storing data in a computer so that it can be used efficiently (this is how computer programmers use the term).

information technology and libraries | june 2009

i hasten to add that i am definitely in favor of adding more structure and granularity to our data when it is necessary to carry out the fundamental objectives of our profession and of our catalogs. i argued earlier that frbr and rda are not granular enough when it comes to the distinction between data elements that apply to expression and those that apply to manifestation.
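the five-guitars-and-one-flute problem just described can be sketched with triples and blank nodes, the grouping technique shown in figure 1. the sketch below uses python tuples as triples; the identifiers (ex:expr1, ex:hasInstrumentation, and so on) are invented for illustration, not taken from any published vocabulary.

```python
# triples as (subject, predicate, object); the blank nodes _:g1 and _:g2
# group each number with its instrument type, as in figure 1a.
work_a = [
    ("ex:expr1", "ex:hasInstrumentation", "_:g1"),
    ("_:g1", "ex:numberOfInstrument", "5"),
    ("_:g1", "ex:typeOfInstrument", "guitar"),
    ("ex:expr1", "ex:hasInstrumentation", "_:g2"),
    ("_:g2", "ex:numberOfInstrument", "1"),
    ("_:g2", "ex:typeOfInstrument", "flute"),
]

def instrumentation(triples, expr):
    """return the set of (number, instrument) pairs for an expression."""
    groups = [o for s, p, o in triples
              if s == expr and p == "ex:hasInstrumentation"]
    pairs = set()
    for g in groups:
        num = next(o for s, p, o in triples
                   if s == g and p == "ex:numberOfInstrument")
        typ = next(o for s, p, o in triples
                   if s == g and p == "ex:typeOfInstrument")
        pairs.add((num, typ))
    return pairs
```

because each number is attached to its instrument type through an intermediate blank node, a search for five flutes cannot accidentally match this five-guitars-and-one-flute expression; without the grouping node, the same six values would be indistinguishable from the reverse combination.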
if we could just agree on how to differentiate data applying to the manifestation from data applying to the expression instead of our current practice of identifying works with headings and lumping all manifestation and expression data together, we could increase the level of service we are able to provide to users a thousandfold. however, if we are not going to commit to differentiating between expression and manifestation, it would be more intellectually honest for frbr and rda to take the less granular path of mapping all existing bibliographic data to manifestation and expression undifferentiated, that is, to use our current data model unchanged and state this openly. i am not in favor of adding granularity for granularity’s sake or for the sake of vague conceptions of possible future use. granularity is expensive and should be used only in support of clear and fundamental objectives.

[figure 1a. extract from yee rdf model that illustrates one technique for modeling musical instrumentation at the expression level (using a blank node to group repeated number and instrument type)]

[figure 1b. example of encoding of musical instrumentation at the expression level based on the above model: “5 guitars” and “1 flute,” each pairing a number of a particular instrument with a type of instrument]

the goal: efficient displays and indexes

my main concern is that we model and then structure the data in a way that allows us to build the complex displays that are necessary to make catalogs appear simple to use. i am aware that the current orthodoxy is that recording data should be kept completely separate from indexing and display (“the applications layer”).
because i have spent my career in a field in which catalog records are indexed and displayed badly by systems people who don’t seem to understand the data contained in them, i am a skeptic. it is definitely possible to model and structure data in such a way that desired displays and indexes are impossible to construct. i have seen it happen! the lc working group report states that “it will be recognized that human users and their needs for display and discovery do not represent the only use of bibliographic metadata; instead, to an increasing degree, machine applications are their primary users.”14 my fear is that the underlying assumption here is that users need to (and can) retrieve the single perfect record. this will never be true for bibliographic metadata. users will always need to assemble all relevant records (of all kinds) as precisely as possible and then browse through them before making a decision about which resources to obtain. this is as true in the semantic web—where “records” can be conceived of as entity or class uris—as it is in the world of marc–encoded metadata. some of the problems that have arisen in the past in trying to index bibliographic metadata for humans are connected to the fact that existing systems do not group all of the data related to a particular entity effectively, such that a user can use any variant name or any combination of variant names for an entity and do a successful search. currently, you can only look for a match among two or more keywords within the bounds of a single manifestation-based bibliographic record or within the bounds of a single heading, minus any variant terms for that entity. 
thus, when you do a keyword search for two keywords, for example, “clemens” and “adventures,” you will retrieve only those manifestations of mark twain’s adventures of tom sawyer that have his real name (clemens) and the title word “adventures” co-occurring within the bounded space created by a single manifestation-based bibliographic record. instead, the preferred forms and the variant forms for a given entity need to be bounded for indexing such that the keywords the user employs to search for that entity can be matched using co-occurrence rules that look for matches within a single bounded space representing the entity desired. we will return to this problem in the discussion of issue 3 in the later section “rdf problems encountered.” the most complex indexing problem has always proven to be the grouping or bounding of data related to a work, since it requires pulling in all variants for the creator(s) of that work as well. otherwise, a user who searches for a work using a variant of the author’s name and a variant of the title will continue to fail (as they do in all current opacs), even when the desired work exists in the catalog. if we could create a uri for the adventures of tom sawyer that included all variant names for the author and all variant titles for the work (including the variant title tom sawyer), the same keyword search described above (“clemens” and “adventures”) could be made to retrieve all manifestations and expressions of the adventures of tom sawyer, instead of the few isolated manifestations that it would retrieve in current catalogs. we need to make sure that we design and structure the data such that the following displays are possible:

- display all works by this author in alphabetical order by title with the sorting element (title) appearing at the top of each work displayed.
- display all works on this subject in alphabetical order by principal author and title (with principal author and title appearing at top of each work displayed), or title if there is no principal author (with title appearing at top of each work displayed).

we must ensure that we design and structure the data in such a way that our structure allows us to create subgroups of related data, such as instrumentation for a piece of music (consisting of a number associated with each particular instrument), place and related publisher for a certain span of dates on a serial title change record, and the like.

which standards will carry out which functions?

currently, we have a number of different standards to carry out a number of different functions; we can speculate about how those functions might be allocated in a new semantic web–based dispensation, as shown in table 1. in table 1, data structure is taken to mean what a record represents or stands for; traditionally, a record has represented an expression (in the days of hand-press books) or a manifestation (ever since reproduction mechanisms have become more sophisticated, allowing an explosion of reproductions of the same content in different formats and coming from different distributors). rda is record-neutral; rdf would allow uris to be established for any and all of the frbr levels; that is, there would be a uri for a particular work, a uri for a particular expression, a uri for a particular manifestation, and a uri for a particular item. note that i am not using data structure in the sense that a computer programmer does (as a way of storing data in a computer so that it can be used efficiently).
table 1. possible reallocation of current functions in a new semantic web–based dispensation

function: data content, or content guidelines (rules for providing data in a particular element)
  current: defined by aacr2r and marc 21
  future: defined by rda and rdf/rdfs/owl/skos

function: data elements
  current: defined by isbd–based aacr2r and marc 21
  future: defined by rda and rdf/rdfs/owl/skos

function: data values
  current: defined by lc/naco authority file, lcsh, marc 21 coded data values, etc.
  future: defined as ontologies using rdf/rdfs/owl/skos

function: encoding or labeling of data elements for machine manipulation (same as data format?)
  current: defined by iso 2709–based marc 21
  future: defined by rdf/rdfs/xml

function: data structure (i.e., what a record stands for)
  current: defined by aacr2r and marc 21; also frbr?
  future: defined by rdf/rdfs/owl/skos

function: schematization (constraint on structure and content)
  current: marc 21, mods, dcmi abstract model
  future: defined by rdf/rdfs/owl/skos

function: encoding of facts about entity relationships
  current: carried out by matching data value strings (headings found in lc/naco authority file and lcsh, issn’s, and the like)
  future: carried out by rdf/rdfs/owl/skos in the form of uri links

function: display rules
  current: ils software, formerly isbd–based aacr2r
  future: “application layer” or yee rules

function: indexing rules
  current: ils software
  future: sparql, “application layer,” or yee rules

currently, the encoding of facts about entity relationships (see table 1) is carried out by matching data-value character strings (headings or linking fields using issns and the like) that are defined by the lc/naco authority file (following aacr2r rules), lcsh (following rules in the subject cataloging manual), etc. in the future, this function might be carried out by using rdf to link the uri for a resource to the uri for a data value. display rules (see table 1) are currently defined by isbd and aacr2r but widely ignored by systems, which frequently truncate bibliographic records arbitrarily in displays, supply labels, and the like; rda abdicates responsibility, pushing display out of the cataloging rules. the general principle on the web is to divorce data from display and allow anyone to display the data any way they want. display is the heart of the objects (or goals) of cataloging: the point is to display to the user the works of an author, the editions of a work, or the works on a subject. all of these goals only can be met if complex, high-quality displays can be built from the data created according to the data model. indexing rules (see table 1) were once under the control of catalogers (in book and card catalogs) in that users had to navigate through headings and cross-references to find what they wanted; currently indexing is in the hands of system designers who prefer to provide keyword indexing of bibliographic (i.e., manifestation-based) records rather than provide users with access to the entities they are really interested in (works, authors and subjects), all represented currently by authority records for headings and cross-references. rda abdicates responsibility, pushing indexing concerns completely out of the cataloging rules. the general principle on the web is to allow resources to be indexed by any web search engines that wish to index them. current web data is not structured at all for either indexing or display. i would argue that our interest in the semantic web should be focused on whether or not it will support more data structure—as well as more logic in that data structure—to support better indexes and better displays than we have now in manifestation-based ils opacs. crucial to better indexing than we have ever had before are the co-occurrence rules for keyword indexing, that is, the rules for when a co-occurrence of two or more keywords should produce a match.
we need to be able to do a keyword search across all possible variant names for the entity of interest, and the entity of interest for the average catalog user is much more likely to be a particular work than to be a particular manifestation. unfortunately, catalog-use studies only have studied so-called known-item searches without investigating whether a known-item searcher was looking for a particular edition or manifestation of a work or was simply looking for a particular work in order to make a choice as to edition or manifestation once the work was found. however, common sense tells us that it is a rare user who approaches the catalog with prior knowledge about all published editions of a given work. the more common situation is surely one in which a user desires to read a particular shakespeare play or view a particular david lean film and discovers that the desired work exists in more than one expression or manifestation only after searching the catalog. we need to have the keyword(s) in our search for a particular work co-occur within a bounded space that encompasses all possible keywords that might refer to that particular work entity, including both creator and title keywords. notice in table 1 the unifying effect that rdf could potentially have; it could free us from the use of multiple standards that can easily contradict each other, or at least not live peacefully together. examples are not hard to find in the current environment. one that has cropped up in the course of rda development concerns family names. presently the rules for naming families are different depending on whether the family is the subject of a work (and established according to lcsh) or whether the family is responsible for a collection of papers (and established according to rda). 
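the bounded-space idea can be sketched as follows: a toy python index in which every variant creator name and every variant title is bound to a single work identifier, so that co-occurrence is tested against the work entity rather than against any one manifestation record. the identifiers and variant lists are illustrative only.

```python
# one work entity's "bounded space": all variant creator names and
# variant titles indexed together under one work uri (illustrative)
work_index = {
    "work:tom-sawyer": {
        "creator_variants": ["mark twain", "samuel clemens", "samuel l. clemens"],
        "title_variants": ["the adventures of tom sawyer", "tom sawyer"],
    },
}

def keywords(work):
    """every keyword that might refer to this work entity."""
    kws = set()
    for name in work["creator_variants"] + work["title_variants"]:
        kws.update(name.split())
    return kws

def search(index, *terms):
    """co-occurrence rule: all search terms must match within a single
    work's bounded keyword space, not within a single manifestation record."""
    return [uri for uri, work in index.items()
            if all(t in keywords(work) for t in terms)]
```

a search for “clemens” and “adventures” (or for “twain” and “sawyer”) matches the work entity, and from there all of its expressions and manifestations could be retrieved, even though no single manifestation record contains both keywords.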
types of data

rda has blurred the distinctions among certain types of data, apparently because there is a perception that on the semantic web the same piece of data needs to be coded only once, and all indexing and display needs can be supported from that one piece of data. i question that assumption on the basis of my experience with bibliographic cataloging. all of the following ways of encoding the same piece of data can still have value in certain circumstances:

- transcribed; in rdf terms, a literal (i.e., any data that is not a uri, a constant value). transcribed data is data copied from an item being cataloged. it is valuable for providing access to the form of the name used on a title page and is particularly useful for people who use pseudonyms, corporate bodies that change name, and so on. transcribed data is an important part of the historical record and not just for off-line materials; it can be a historical record of changing data on notoriously fluid webpages.
- composed; in rdf terms, also a literal. composed data is information composed by a cataloger on the basis of observation of the item in hand; it can be valuable for historical purposes to know which data was composed.
- supplied; in rdf terms, also a literal. supplied data is information supplied by a cataloger from outside sources; it can be valuable for historical purposes to know which data was supplied and from which outside sources it came.
- coded; in rdf, represented by a uri. coded data would likely transform on the semantic web into links to ontologies that could provide normalized, human-readable identification strings on demand, thus causing coded and normalized data to merge into one type of data. is it not possible, though, that the coded form of normalized data might continue to provide for more efficient searching for computers as opposed to humans? coded data also has great cross-cultural value, since it is not as language-dependent as literals or normalized headings.
- normalized headings (controlled headings); in rdf, represented by a uri. normalized or controlled headings are still necessary to provide users with coherent, ordered displays of thousands of entities that all match the user’s search for a particular entity (work, author, subject, etc.). the reason google displays are so hideous is that, so far, the data searched lacks any normalized display data. if variant language forms of the name for an entity are linked to an entity uri, it should be possible to supply headings in the language and script desired by a particular user.

the rdf model

those who have become familiar with frbr over the years will probably not find it too difficult to transition from the frbr conceptual model to the rdf model. what frbr calls an “entity,” rdf calls a “subject” and rdfs calls a “class.” what frbr calls an “attribute,” rdf calls an “object” and rdfs calls a “property.” what frbr calls a “relationship,” rdf calls a “predicate” and rdfs calls a “relationship” or a “semantic linkage” (see table 2). the difficulty in any data-modeling exercise lies in deciding what to treat as an entity or class and what to treat as an attribute or property. the authors of frbr decided to create a class called expression to deal with any change in the content of a work. when frbr is applied to serials, which change content with every issue, the model does not work well. in my model, i found it useful to create a new entity at the manifestation level, the serial title, to deal with the type of change that is more relevant to serials, the change in title. i also created another new entity at the manifestation level, title-manifestation, to deal with a change of title in a nonserial work that is not associated with a change in content. one hundred years ago, this entity would have been called title-edition.
i am also in the process of developing an entity at the expression level—surrogate—to deal with reproductions of original artworks that need to inherit the qualities of the original artwork they reproduce without being treated as an edition of that original artwork, which ipso facto is unique. these are just examples of cases in which it is not that easy to decide on the classes or entities that are necessary to accurately model bibliographic information. see the appendix for a complete comparison of the classes and entities defined in four different models: frbr, frad, rda, and the yee cataloging rules (ycr). the appendix also shows variation among these models concerning whether a given data element is treated as a class/entity or as an attribute/property. the most notable examples are name and preferred access point, which are treated as classes/entities in frad, as attributes in frbr and ycr, and as both in rda.

rdf problems encountered

my goal for this paper is to institute discussion with data modelers about which problems i observed are insoluble and which are soluble:

1. is there an assumption on the part of semantic web developers that a given data element, such as a publisher name, should be expressed as either a literal or using a uri (i.e., controlled), but never both? cataloging is rooted in humanistic practices that require careful recording of evidence. there will always be value in distinguishing and labeling the following types of data:

- copied as is from an artifact (transcribed)
- supplied by a cataloger
- categorized by a cataloger (controlled)

tim berners-lee (the father of the world wide web and the semantic web) emphasizes the importance of recording not just data but also its provenance for the sake of authenticity.15 for many data elements, therefore, it will be important to be able to record both a literal (transcribed or composed form or both) and a uri (controlled form). is this a problem in rdf?
as a corollary, if any data that can be given a uri cannot also be represented by a literal (transcribed and composed data, or one or the other), it may not be possible to design coherent, readable displays of the data describing a particular entity. among other things, cataloging is a discursive writing skill. does rdf require that all data be represented only once, either by a literal or by a uri? or is it perhaps possible that data that has a uri could also have a transcribed or composed form as a property? perhaps it will even be possible to store multiple snapshots of online works that change over time to document variant forms of a name for works, persons, and so on.

2. will the internet ever be fast enough to assemble the equivalent of our current records from a collection of hundreds or even thousands of uris? in rdf, links are one-to-one rather than one-to-many. this leads to a great proliferation of reciprocal links. the more granularity there is in the data, the more linking is necessary to ensure that atomized data elements are linked together. potentially, every piece of data describing a particular entity could be represented by a uri leading out to a skos list of data values. the number of links necessary to pull together all of the data just to describe one manifestation could become astronomical, as could the number of one-to-one links necessary to create the appearance of a one-to-many link, such as the link between an author and all the works of an author. is the internet really fast enough to assemble a record from hundreds of uris in a reasonable amount of time?

table 2. the frbr conceptual model translated into rdf and rdfs

frbr          rdf        rdfs
entity        subject    class
attribute     object     property
relationship  predicate  relationship/semantic linkage
given the often slow network throughput typical of many of our current internet connections, is it really practical to expect all of these pieces to be pulled together efficiently to create a single display for a single user? we may yet feel nostalgia for the single manifestation-based record that already has all of the relevant data in it (no assembly required). bruce d’arcus points out, however, that:

    i think if you’re dealing with rdf, you wouldn’t necessarily be gathering these data in real-time. the uris that are the targets for those links are really just global identifiers. how you get the triples is a separate matter. so, for example, in my own personal case, i’m going to put together an rdf store that is populated with data from a variety of sources, but that data population will happen by script, and i’ll still be querying a single endpoint, where the rdf is stored in a relational database.16

in other words, d’arcus essentially will put them all in one place, or in one database that “looks” from a uri perspective to be “one place” where they’re already gathered.

3. is rdf capable of dealing with works that are identified using their creators? we need to treat author as both an entity in its own right and as a property of a work, and in many cases the latter is the more important function for user service. lexical labels, or human-readable identifiers for works that are identified using both the principal author and the title, are particularly problematic in rdf given that the principal author is an entity in its own right. is rdf capable of supporting the indexing necessary to allow a user to search using any variant of the author’s name and any variant of the title of a work in combination and still retrieve all expressions and manifestations of that work, given that author will have a uri of its own, linked by means of a relationship link to the work uri?
is rdf capable of supporting the display of a list of one thousand works, each identified by principal author, in order first by principal author, then by title, then by publication date, given that the preferred heading for each principal author would have to be assembled from the uri for that principal author and the preferred title for each work would have to be assembled from the uri for that work? for fear that this will not, in fact, be possible, i have put a human-readable work-identifier data element into my model that consists of principal author and title when appropriate, even though that means the preferred name of the principal author may not be able to be controlled by the entity record for the principal author. any guidance from experienced data modelers in this regard would be appreciated. according to bruce d’arcus, this is purely an interface or application question that does not require a solution at the data layer.17 since we have never had interfaces or applications that would do this correctly, even though the data is readily available in authority records, i am skeptical about this answer! perhaps bruce’s suggestion under item 9 of designating a sortname property for each entity is the solution here as well. my human-readable work identifier consisting of the name of the principal creator and uniform title of work could be designated the sortname property for the work. it would have to be changed whenever the preferred form of the name for the principal creator changed, however.

4. do all possible inverse relationships need to be expressed explicitly, or can they be inferred? my model is already quite large, and i have not yet defined the inverse of every property as i really should to have a correct rdf model.
in other words, for every property there needs to be an inverse property; for example, the property iscreatorof needs to have the inverse property iscreatedby; thus “twain” has the property iscreatorof, while “adventures of tom sawyer” has the property iscreatedby. perhaps users and inputters will not actually have to see the huge, complex rdf data model that would result from creating all the inverse relationships, but those who maintain the model will have to deal with a great deal of complexity. however, since i’m not a programmer, i don’t know how the complexity of rdf compares to the complexity of existing ils software. 5. can rdf solve the problems we are having now because of the lack of transitivity or inheritance in the data models that underlie current ilses, or will rdf merely perpetuate these problems? we have problems now with the data models that underlie our current ilses because of the inability of these models to deal with hierarchical inheritance, such that whatever is true of an entity in the hierarchy is also true of every entity below that entity in the hierarchy. one example is that of cross-references to a parent corporate body that should be held to apply to all subdivisions of that corporate body but never are in existing ils systems. there is a cross-reference from “fbi” to “united states. federal bureau of investigation,” but not from “fbi counterterrorism division” to “united states. federal bureau of investigation. counterterrorism division.” for that reason, a search in any opac name index for “fbi counterterrorism division” will fail. we need systems that recognize that data about a parent corporate body is relevant to all subdivisions of that parent body. we need systems that recognize that data about a work is relevant to all expressions and manifestations of that work. 
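returning to the question raised in issue 4: one plausible answer, sketched below in python under the assumption that each property declares its inverse once, model-wide, is that inverse triples can be inferred mechanically rather than input by hand. the property names follow the iscreatorof/iscreatedby example above; everything else is invented for the sketch.

```python
# declared once for the whole model: each property and its inverse
INVERSES = {"iscreatorof": "iscreatedby", "iscreatedby": "iscreatorof"}

def with_inverses(triples):
    """materialize the inverse of every triple whose property has a
    declared inverse, so only one direction ever needs to be input."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            inferred.add((o, INVERSES[p], s))
    return inferred

# only the forward direction is stated; the iscreatedby triple is derivable
stated = {("twain", "iscreatorof", "adventures of tom sawyer")}
```

whether such inference belongs in the model itself, in the store, or in the application layer is exactly the design question the issue raises; the sketch only shows that the inverses need not be typed by hand, so inputters and users could be spared the doubled model.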
rdf allows you to link a work to an expression and an expression to a manifestation, but i don't believe it allows you to encode the information that everything that is true of the work is true of all of its expressions and manifestations. rob styles seems to confirm this: "rdf doesn't have hierarchy. in computer science terms, it's a graph, not a tree, which means you can connect anything to anything else in any direction."18 of course, not all links should be this kind of transitive or inheritance link. one expression of work a is linked to another expression of work a by links to work a, but whatever is true of one of those expressions is not necessarily true of the other; one may be illustrated, for example, while the other is not. whatever is true of one work is not necessarily true of another work related to it by a related work link. it should be recognized that bibliographic data is rife with hierarchy. it is one of our major tools for expressing meaning to our users. corporate bodies have corporate subdivisions, and many things that are true for the parent body also are true for its subdivisions. subjects are expressed using main headings and subject subdivisions, and many things that are true for the main heading (such as variant names) also are true for the heading combined with one of its subdivisions. geographic areas are contained within larger geographic areas, and many things that are true of the larger geographic area also are true for smaller regions, counties, cities, etc., contained within that larger geographic area. for all these reasons, i believe that, to do effective displays and indexes for our bibliographic data, it is critical that we be able to distinguish between a hierarchical relationship and a nonhierarchical relationship.

6.
to recognize the fact that the subject of a book or a film could be a work, a person, a concept, an object, an event, or a place (all classes in the model), is there any reason we cannot define subject itself as a property (a relationship) rather than a class in its own right? in my model, all subject properties are defined as having a domain of resource, meaning there is no constraint as to the class to which these subject properties apply. i'm not sure if there will be any fall-out from that modeling decision.

7. how do we distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location? sometimes a place is a jurisdiction and behaves like a corporate body (e.g., united states is the name of the government of the united states). sometimes place is a physical location in which something is located (e.g., the birds discussed in a book about the birds of the united states). to distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location, i have defined two different classes for place: place as jurisdictional corporate body and place as geographic area. will this cause problems in the model? will there be times when it prevents us from making elegant generalizations in the model about place per se? there is a similar problem with events. some events are corporate bodies (e.g., conferences that publish papers) and some are a kind of subject (e.g., an earthquake). i have defined two different classes for event: conference or other event as corporate body creator and event as subject.

8. what is the best way to model a bound-with or an issued-with relationship, or a part–whole relationship in which the whole must be located to obtain the part? the bound-with relationship is actually between two items containing two different works, while the issued-with relationship is between two manifestations containing two different works (see figure 2).
is this a work-to-work relationship? will designating it a work-to-work relationship cause problems for indicating which specific items or manifestation-items of each work are physically located in the same place? this question may also apply to those part–whole relationships in which the part is physically contained within the whole and both are located in the same place (sometimes known as analytics). one thing to bear in mind is that in all of these cases the relationship between two works does not hold between all instances of each work; it only holds for those particular instances that are contained in the particular manifestation or item that is bound with, issued with, or part of the whole. however, if the relationship is modeled as a work-1-manifestation to work-2-manifestation relationship, or a work-1-item to work-2-item relationship, care must be taken in the design of displays to pull in enough information about the two or more works so as not to confuse the user.

9. how do we express the arrangement of elements that have a definite order? i am having trouble imagining how to encode the ordering of data elements that make up a larger element, such as the pieces of a personal name. this is really a desire to control the display of those atomized elements so that they make sense to human beings rather than just to machines. could one define a property such as natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of a person given that the ideal order of these elements might vary from one person to another? could one define properties such as sorting element 1, sorting element 2, sorting element 3, etc., and assign them to the various pieces that will be assembled to make a particular heading for an entity, such as an lcsh heading for a historical period? (depending on the answer to the question in item 11, it may or may not be possible to assign a property to a property in this fashion.)
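the "sorting element 1, 2, 3" idea can be sketched in ordinary code. all names, property layouts, and values here are invented for illustration and are not part of the actual ycr model:

```python
# hypothetical sketch: each atomized piece of a name carries an explicit sort
# position, so a heading can be assembled without hard-coding a surname-first
# rule that fails for persons whose names do not fit that pattern.

# (value, element type, sort position) -- positions can vary person by person
twain = [("mark", "forename", 2), ("twain", "surname", 1)]
sting = [("sting", "single name", 1)]

def assemble_heading(elements):
    """assemble a display/sort heading from explicitly ordered name elements."""
    ordered = sorted(elements, key=lambda e: e[2])
    return ", ".join(value for value, _kind, _pos in ordered)

# the same mechanism then supports the author/title/date ordering of item 3:
# sort works by assembled creator heading, then title, then date
works = [
    {"creator": assemble_heading(twain),
     "title": "adventures of tom sawyer", "date": "1876"},
    {"creator": assemble_heading(twain),
     "title": "adventures of huckleberry finn", "date": "1884"},
]
in_order = sorted(works, key=lambda w: (w["creator"], w["title"], w["date"]))
```

the point of the sketch is only that explicit per-element positions decouple the stored atomized data from any one assembly rule; the maintenance problem noted above (re-deriving every assembled heading when a preferred name changes) remains.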
are there standard sorting rules we need to be aware of (in unicode, for example)? are there other rdf techniques available to deal with sorting and arrangement? bruce d'arcus suggests that, instead of coding the name parts, it would be more useful to designate sortname properties;19 might it not be necessary to designate a sortname property for each variant name as well, for cases in which variants need to appear in sorted displays? and wouldn't these sortname properties complicate maintenance over time as preferred and variant names changed?

10. how do we link related data elements in such a way that effective indexing and displays are possible? some examples: number and kind of instrument (e.g., music written for two oboes and three guitars); multiple publishers, frequencies, subtitles, editors, etc., with date spans for a serial title change (or will it be necessary to create a new manifestation for every single change in subtitle, publisher name, place of publication, etc.?). the assumption seems to be that there will be no repeatable data elements. based on my somewhat limited experience with rdf, it appears that there are record equivalents (every data element—property or relationship—pertaining to a particular entity with a uri), but there are no field or subfield equivalents that allow the sublinking of related pieces of data about an entity. indeed, rob styles goes so far as to argue that ultimately there is no notion of a "record" in rdf.20 it is possible that blank nodes might be able to fill in for fields and subfields in some cases for grouping data, but there are dangers involved in their use.21 to a cataloger, it looks as though the plan is for rdf data to float around loose without any requirement that there be a method for pulling it together into coherent displays designed for human beings.

11. can a property have a property in rdf?
as an example of where it might be useful to define a property of a property, robert maxwell suggests that date of publication is really an attribute (property) of the published by relationship (another property).22 another example: in my model, a variant title for a serial is a property. can that property itself have the property type of variant title to encompass things like spine title, key title, etc.? another example appeared in item 9, in which it is suggested that it might be desirable to assign sort-element properties to the various elements of a name property.

12. how do we document record display decisions? there is no way to record display decisions in rdf itself; it is completely display-neutral. we could not safely commit to a particular rdf-based data model until a significant amount of sample bibliographic data had been created and open-source indexing and display software had been designed and user-tested on that data. it may be that we will need to supplement rdf with some other encoding mechanism that allows us to record display decisions along with the data. current cataloging rules are about display as much as they are about content designation. isbd concerns the order in which the elements should be displayed to humans. the cataloging objectives concern display to users of such entity groups as the works of an author, the editions of a work, and the works on a subject.

13. can all bibliographic data be reduced to either a class or a property with a finite list of values? another way to put this is to ask if all that catalogers do could be reduced to a set of pull-down menus. cataloging is the art of writing discursive prose as much as it is the ability to select the correct value for a particular data element. we must deal with ambiguous data (presented by joe blow could mean that joe created the entire work, produced it, distributed it, sponsored it, or merely funded it). we must sometimes record information without knowing its exact meaning.
we must deal with situations that have not been anticipated in advance. it is not possible to list every possible kind of data and every possible value for each type of data up front before any data is gathered.

figure 2. examples of part–whole relationships. how might these be best expressed in rdf?

issued-with relationship: a copy of charlie chaplin's 1917 film the immigrant can be found on a videodisc compilation called charlie chaplin, the early years along with two other chaplin films. this compilation was published and collected by many different libraries and media centers. if a user wants to view this copy of the immigrant, he or she will first have to locate charlie chaplin, the early years, then look for the desired film at the beginning of the first videodisc in the set. the issued-with relationship between the immigrant and the other two films on charlie chaplin, the early years is currently expressed in the bibliographic record by means of a "with" note: first on charlie chaplin, the early years, v. 1 (62 min.) with: the count – easy street.

bound-with relationship: the university of california, los angeles film & television archive has acquired a reel of 16 mm. film from a collector who strung five warner bros. cartoons together on a single reel of film. we can assume that no other archive, library, or media collection will have this particular compilation of cartoons, so the relationship between the five cartoons is purely local in nature. however, any user at the film & television archive who wishes to view one of these cartoons will have to request a viewing appointment for the entire reel and then find the desired cartoon among the other four on the reel. the bound-with relationship among these cartoons is currently expressed in a holdings record by means of a "with" note: fourth on reel with: daffy doodles – tweety pie – i love to singa – along flirtation walk.
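the issued-with case in figure 2 can be sketched as a relationship between particular manifestations rather than between works, which captures the point made in item 8 that the relationship does not hold for every instance of a work. the identifiers and field names here are invented for illustration:

```python
# hypothetical sketch: the issued-with relationship recorded at the
# manifestation level, so it holds only for the copies that actually share
# a physical container, not for every manifestation of the same work.

manifestations = {
    "m1": {"work": "the immigrant (1917)",
           "container": "charlie chaplin, the early years, v. 1"},
    "m2": {"work": "the count",
           "container": "charlie chaplin, the early years, v. 1"},
    "m3": {"work": "easy street",
           "container": "charlie chaplin, the early years, v. 1"},
    # a different manifestation of the same work, issued on its own
    "m4": {"work": "the immigrant (1917)", "container": None},
}

def issued_with(m_id):
    """other manifestations issued in the same physical container."""
    container = manifestations[m_id]["container"]
    if container is None:
        return []
    return [other for other, rec in manifestations.items()
            if other != m_id and rec["container"] == container]
```

the relationship holds for the compilation copy but not for the stand-alone copy, even though both carry the same work; a display built from this data would still need to pull in enough information about each related work to avoid confusing the user.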
it will always be necessary to provide a plain-text escape hatch. the bibliographic world is a complex, constantly changing world filled with ambiguity.

what are the next steps?

in a sense, this paper is a first crude attempt at locating unmapped territory that has not yet been explored. if we were to decide as a community that it would be valuable to move our shared cataloging activities onto the semantic web, we would have a lot of work ahead of us. if some of the rdf problems described above are insoluble, we may need to work with semantic web developers to create a more sophisticated version of rdf that can handle the transitivity and complex linking required by our data. we will also need to encourage a very complex existing community to evolve institutional structures that would enable a more efficient use of the internet for the sharing of cataloging and other metadata creation. this is not just a technological problem, but also a political one. in the meantime, the experiment continues. let the thinking and learning begin!

references and notes

1. "notation3, or n3 as it is more commonly known, is a shorthand non–xml serialization of resource description framework models, designed with human-readability in mind: n3 is much more compact and readable than xml rdf notation. the format is being developed by tim berners-lee and others from the semantic web community." wikipedia, "notation 3," http://en.wikipedia.org/wiki/notation_3 (accessed feb. 19, 2009).

2. frbr review group, www.ifla.org/vii/s13/wgfrbr/; frbr review group, franar (working group on functional requirements and numbering of authority records), www.ifla.org/vii/d4/wg-franar.htm; frbr review group, frsar (working group, functional requirements for subject authority records), www.ifla.org/vii/s29/wgfrsar.htm; frbroo, frbr review group, working group on frbr/crm dialogue, www.ifla.org/vii/s13/wgfrbr/frbr-crmdialogue_wg.htm.

3.
library of congress, response to on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 24, 39, 40, www.loc.gov/bibliographic-future/news/lcwgrptresponse_dm_053008.pdf (accessed mar. 25, 2009).

4. ibid., 39.

5. ibid., 41.

6. dublin core metadata initiative, dcmi/rda task group wiki, http://www.dublincore.org/dcmirdataskgroup/ (accessed mar. 25, 2009).

7. mikael nilsson, andy powell, pete johnston, and ambjorn naeve, expressing dublin core metadata using the resource description framework (rdf), http://dublincore.org/documents/2008/01/14/dc-rdf/ (accessed mar. 25, 2009).

8. see for example table 6.3 in frbr, which maps to manifestation every kind of data that pertains to expression change with the exception of language change. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998): 95, http://www.ifla.org/vii/s13/frbr/frbr.pdf (accessed mar. 4, 2009).

9. roy tennant, "marc must die," library journal 127, no. 17 (oct. 15, 2002): 26.

10. w3c, skos simple knowledge organization system reference, w3c working draft 29 august 2008, http://www.w3.org/tr/skos-reference/ (accessed mar. 25, 2009).

11. the extract in figure 1 is taken from my complete rdf model, which can be found at http://myee.bol.ucla.edu/ycrschemardf.txt.

12. mary w. elings and gunter waibel, "metadata for all: descriptive standards and metadata sharing across libraries, archives and museums," first monday 12, no. 3 (mar. 5, 2007), http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1628/1543 (accessed mar. 25, 2009).

13. oclc, a holdings primer: principles and standards for local holdings records, 2nd ed. (dublin, ohio: oclc, 2008), 4, http://www.oclc.org/us/en/support/documentation/localholdings/primer/holdings%20primer%202008.pdf (accessed mar. 25, 2009).

14.
the library of congress working group, on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 30, http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf (accessed mar. 25, 2009).

15. talis, sir tim berners-lee talks with talis about the semantic web: transcript of an interview recorded on 7 february 2008, http://talis-podcasts.s3.amazonaws.com/twt20080207_timbl.html (accessed mar. 25, 2009).

16. bruce d'arcus, e-mail to author, mar. 18, 2008.

17. ibid.

18. rob styles, e-mail to author, mar. 25, 2008.

19. bruce d'arcus, e-mail to author, mar. 18, 2008.

20. rob styles, e-mail to author, mar. 25, 2008.

21. w3c, "section 2.3, structured property values and blank nodes," in rdf primer: w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-primer/#structuredproperties (accessed mar. 25, 2009).

22. robert maxwell, frbr: a guide for the perplexed (chicago: ala, 2008).

entities/classes in rda, frbr, frad compared to yee cataloging rules (ycr)

rda, frbr, and frad | ycr
group 1: work | work
group 1: expression | expression; surrogate
group 1: manifestation | manifestation; title-manifestation; serial title
group 1: item | item
group 2: person | person; fictitious character; performing animal
group 2: corporate body | corporate body; corporate subdivision; place as jurisdictional corporate body; conference or other event as corporate body creator; jurisdictional corporate subdivision
family (rda and frad only) |
group 3: concept | concept
group 3: object | object
group 3: event | event or historical period as subject
group 3: place | place as geographic area
 | discipline
 | genre/form
name |
identifier |
controlled access point |
rules (frad only) |
agency (frad only) |

appendix.
entity/class and attribute/property comparisons

attributes/properties in frbr compared to frad

work
frbr: title of the work; form of work; date of the work; other distinguishing characteristics; intended termination; intended audience; context for the work; medium of performance (musical work); numeric designation (musical work); key (musical work); coordinates (cartographic work); equinox (cartographic work)
frad: form of work; date of the work; medium of performance; subject of the work; numeric designation; key; place of origin of the work; original language of the work; history; other distinguishing characteristic

expression
frbr: title of the expression; form of expression; date of expression; language of expression; other distinguishing characteristics; extensibility of expression; revisability of expression; extent of the expression; summarization of content; context for the expression; critical response to the expression; use restrictions on the expression; sequencing pattern (serial); expected regularity of issue (serial); expected frequency of issue (serial); type of score (musical notation); medium of performance (musical notation or recorded sound); scale (cartographic image/object); projection (cartographic image/object); presentation technique (cartographic image/object); representation of relief (cartographic image/object); geodetic, grid, and vertical measurement (cartographic image/object); recording technique (remote sensing image); special characteristic (remote sensing image); technique (graphic or projected image)
frad: form of expression; date of expression; language of expression; technique; other distinguishing characteristic

surrogate
(no attributes listed)

manifestation
frbr: title of the manifestation; statement of responsibility; edition/issue designation; place of publication/distribution; publisher/distributor; date of publication/distribution; fabricator/manufacturer; series statement; form of carrier; extent of the carrier; physical medium; capture mode; dimensions of the carrier; manifestation identifier; source for acquisition/access authorization; terms of availability; access restrictions on the manifestation; typeface (printed book); type size (printed book); foliation (hand-printed book); collation (hand-printed book); publication status (serial); numbering (serial); playing speed (sound recording); groove width (sound recording); kind of cutting (sound recording); tape configuration (sound recording); kind of sound (sound recording); special reproduction characteristic (sound recording); colour (image); reduction ratio (microform); polarity (microform or visual projection); generation (microform or visual projection); presentation format (visual projection); system requirements (electronic resource); file characteristics (electronic resource); mode of access (remote access electronic resource); access address (remote access electronic resource)
frad: edition/issue designation; place of publication/distribution; publisher/distributor; date of publication/distribution; form of carrier; numbering

title-manifestation
(no attributes listed)

serial title
(no attributes listed)

item
frbr: item identifier; fingerprint; provenance of the item; marks/inscriptions; exhibition history; condition of the item; treatment history; scheduled treatment; access restrictions on the item
frad: location of item
person
frbr: name of person; dates of person; title of person; other designation associated with the person
frad: dates associated with the person; title of person; other designation associated with the person; gender; place of birth; place of death; country; place of residence; affiliation; address; language of person; field of activity; profession/occupation; biography/history

fictitious character
(no attributes listed)

performing animal
(no attributes listed)

corporate body
frbr: name of the corporate body; number associated with the corporate body; place associated with the corporate body; date associated with the corporate body; other designation associated with the corporate body
frad: place associated with the corporate body; date associated with the corporate body; other designation associated with the corporate body; type of corporate body; language of the corporate body; address; field of activity; history

corporate subdivision
(no attributes listed)

place as jurisdictional corporate body
(no attributes listed)

conference or other event as corporate body creator
(no attributes listed)

jurisdictional corporate subdivision
(no attributes listed)

family
frad: type of family; dates of family; places associated with family; history of family

concept
frbr: term for the concept
frad: type of concept

object
frbr: term for the object
frad: type of object; date of production; place of production; producer/fabricator; physical medium

event
frbr: term for the event
frad: date associated with the event; place associated with the event

place
frbr: term for the place
frad: coordinates; other geographical information

discipline
(no attributes listed)

genre/form
(no attributes listed)

name
frad: type of name; scope of usage; dates of usage; language of name; script of name; transliteration scheme of name

identifier
frad: type of identifier; identifier string; suffix

controlled access point
frad: type of controlled access point; status of controlled access point; designated usage of controlled access point; undifferentiated access point; language of base access point; script of base access point; script of cataloguing; transliteration scheme of base access point; transliteration scheme of cataloguing; source of controlled access point; base access point; addition

rules
frad: citation for rules; rules identifier

agency
frad: name of agency; agency identifier; location of agency

attributes/properties in rda compared to ycr

work
rda: title of the work; form of work; date of work; place of origin of work; medium of performance; numeric designation; key; signatory to a treaty, etc.; other distinguishing characteristic of the work; original language of the work; history of the work; identifier for the work; nature of the content; coverage of the content; coordinates of cartographic content; equinox; epoch; intended audience; system of organization; dissertation or theses information
ycr: key identifier for work; language-based identifier (preferred lexical label); variant language-based identifier (alternate lexical label); language-based identifier (preferred lexical label) for work; language-based identifier for work (preferred lexical label) identified by principal creator in combination with uniform title; language-based identifier (preferred lexical label) for work identified by title alone (uniform title); supplied title for work; variant title for work; original language of work; responsibility for work; original publication statement of work; dates associated with work; original publication/release/broadcast date of work; copyright date of work; creation date of work; date of first recording of a work; date of first performance of a work; finding date of naturally occurring object; original publisher/distributor/broadcaster of work; places associated with work; original place of publication/distribution/broadcasting for work; country of origin of work; place of creation of work; place of first recording of work; place of first performance of work; finding place of naturally occurring object; original method of publication/distribution/broadcast of work; serial or integrating work original numeric and/or alphabetic designations—beginning; serial or integrating work original chronological designations—beginning; serial or integrating work original numeric and/or alphabetic designations—ending; serial or integrating work original chronological designations—ending; encoding of content of work; genre/form of content of work; original instrumentation of musical work; instrumentation of musical work—number of a particular instrument; instrumentation of musical work—type of instrument; original voice(s) of musical work; voice(s) of musical work—number of a particular type of voice; voice(s) of musical work—type of voice; original key of musical work; numeric designation of musical work; coordinates of cartographic work; equinox of cartographic work; original physical characteristics of work; original extent of work; original dimensions of work; mode of issuance of work; original aspect ratio of moving image work; original image format of moving image work; original base of work; original materials applied to base of work; work summary; work contents list; custodial history of work; creation of archival collection; censorship history of work; note about relationship(s) to other works

expression
rda: content type; date of expression; language of expression; other distinguishing characteristic of the expression; identifier for the expression; summarization of the content; place and date of capture; language of the content; form of notation; accessibility content; illustrative content; supplementary content; colour content; sound content; aspect ratio; format of notated music; medium of performance of musical content; duration; performer, narrator, and/or presenter; artistic and/or technical credits; scale; projection of cartographic content; other details of cartographic content; awards
ycr: key identifier for expression; language-based identifier (preferred lexical label) for expression; variant title for expression; nature of modification of expression; expression title; expression statement of responsibility; edition statement; scale of cartographic expression; projection of cartographic expression; publication statement of expression; place of publication/distribution/release/broadcasting for expression; place of recording for expression; publisher/distributor/releaser/broadcaster for expression; publication/distribution/release/broadcast date for expression; copyright date for expression; date of recording for expression; numeric and/or alphabetic designations for serial expressions; chronological designations for serial expressions; performance date for expression; place of performance for expression; extent of expression; content of expression; language of expression text; language of expression captions; language of expression sound track; language of sung or spoken text of expression; language of expression subtitles; language of expression intertitles; language of summary or abstract of expression; instrumentation of musical expression; instrumentation of musical expression—number of a particular instrument; instrumentation of musical expression—type of instrument; voice(s) of musical expression; voice(s) of musical expression—number of a particular type of voice; voice(s) of musical expression—type of voice; key of musical expression; appendages to the expression; expression series statement; mode of issuance for expression; notes about expression

surrogate
ycr: [under development]
manifestation
rda: title; statement of responsibility; edition statement; numbering of serials; production statement; publication statement; distribution statement; manufacture statement; copyright date; series statement; mode of issuance; frequency; identifier for the manifestation; note; media type; carrier type; base material; applied material; mount; production method; generation; layout; book format; font size; polarity; reduction ratio; sound characteristics; projection characteristics of motion picture film; video characteristics; digital file characteristics; equipment and system requirements; terms of availability
ycr: key identifier for manifestation; publication statement of manifestation; place of publication/distribution/release/broadcast of manifestation; manifestation publisher/distributor/releaser/broadcaster; manifestation date of publication/distribution/release/broadcast; carrier edition statement; carrier piece count; carrier name; carrier broadcast standard; carrier recording type; carrier playing speed; carrier configuration of playback channels; process used to produce carrier; carrier dimensions; carrier base materials; carrier generation; carrier polarity; materials applied to carrier; carrier encoding format; intermediation tool requirements; system requirements; serial manifestation illustration statement; manifestation standard number; manifestation isbn; manifestation issn; manifestation publisher number; manifestation universal product code; notes about manifestation

title-manifestation
ycr: key identifier for title-manifestation; variant title for title-manifestation; title-manifestation title; title-manifestation statement of responsibilities; title-manifestation edition statement; publication statement of title-manifestation; place of publication/distribution/release/broadcasting of title-manifestation; publisher/distributor/releaser, broadcaster of title-manifestation; date of publication/distribution/release/broadcast of title-manifestation
title-manifestation series; title-manifestation mode of issuance; notes about title-manifestation; title-manifestation standard number

serial title
ycr: key identifier for serial title; variant title for serial title; title of serial title; serial title statement of responsibility; serial title edition statement; publication statement of serial title; place of publication/distribution/release/broadcast of serial title; publisher/distributor/releaser/broadcaster of serial title; date of publication/distribution/release/broadcast of serial title; serial title beginning numeric and/or alphabetic designations; serial title beginning chronological designations; serial title ending numeric and/or alphabetic designations; serial title ending chronological designations; serial title frequency; serial title mode of issuance; serial title illustration statement; notes about serial title; serial title issn-l

item
rda: preferred citation; custodial history; immediate source of acquisition; identifier for the item; item-specific carrier characteristics
ycr: key identifier for item; item barcode; item location; item call number or accession number; item copy number; item provenance; item condition; item marks and inscriptions; item exhibition history; item treatment history; item scheduled treatment; item access restrictions
person
rda: name of the person; preferred name for the person; variant name for the person; date associated with the person; title of the person; fuller form of name; other designation associated with the person; gender; place of birth; place of death; country associated with the person; place of residence; address of the person; affiliation; language of the person; field of activity of the person; profession or occupation; biographical information; identifier for the person
ycr: key identifier for person; language-based identifier (preferred lexical label) for person; clan name of person; forename/given name/first name of person; matronymic of person; middle name of person; nickname of person; patronymic of person; surname/family name of person; natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of person; affiliation of person; biography/history of person; date of birth of person; date of death of person; ethnicity of person; field of activity of person; gender of person; language of person; place of birth of person; place of death of person; place of residence of person; political affiliation of person; profession/occupation of person; religion of person; variant name for person

fictitious character
ycr: [under development]

performing animal
ycr: [under development]

corporate body
rda: name of the corporate body; preferred name for the corporate body; variant name for the corporate body; place associated with the corporate body; date associated with the corporate body; associated institution; other designation associated with the corporate body; language of the corporate body; address of the corporate body; field of activity of the corporate body; corporate history; identifier for the corporate body
ycr: key identifier for corporate body; language-based identifier (preferred lexical label) for corporate body; dates associated with corporate body; field of activity of corporate body; history of corporate body; language of corporate body; place associated with corporate body; type of corporate body; variant name for corporate body

corporate subdivision
ycr: [under development]

place as jurisdictional corporate body
ycr: [under development]

conference or other event as corporate body creator
ycr: [under development]

jurisdictional corporate subdivision
ycr: [under development]

family
rda: name of the family; preferred name for the family; variant name for the family; type of family; date associated with the family; place associated with the family; prominent member of the family; hereditary title; family history; identifier for the family

concept
rda: term for the concept; preferred term for the concept; variant term for the concept; type of concept; identifier for the concept
ycr: key identifier for concept; language-based identifier (preferred lexical label) for concept; qualifier for concept language-based identifier; variant name for concept

object
rda: name of the object; preferred name for the object; variant name for the object; type of object; date of production; place of production; producer/fabricator; physical medium; identifier for the object
ycr: key identifier for object; language-based identifier (preferred lexical label) for object; qualifier for object language-based identifier; variant name for object

event
rda: name of the event; preferred name for the event; variant name for the event; date associated with the event; place associated with the event; identifier for the event
ycr: key identifier for event or historical period as subject; language-based identifier (preferred lexical label) for event or historical period as subject; beginning date for event or historical period as subject; ending date for event or historical period as subject; variant name for event or historical period as subject

place
rda: name of the place; preferred name for the place; variant name for the place; coordinates; other geographical information; identifier
for the place key identifier for place as geographic area language-based identifier (preferred lexical label) for place as geographic area qualifier for place as geographic area variant name for place as geographic area discipline key identifier for discipline language-based identifier (preferred lexical label) (name or classification number or symbol) for discipline translation of meaning of classification number or symbol for discipline attributes/properties in rda compared to ycr (cont.) 80 information technology and libraries | june 2009 model entity rda ycr genre/form key identifier for genre/form language-based identifier (preferred lexical label) for genre/form variant name for genre/form name scope of usage date of usage identifier controlled access point rules agency note: in rda, the following attributes have not yet been assigned to a particular class or entity: extent, dimensions, terms of availability, contact information, restrictions on access, restrictions on use, uniform resource locator, status of identification, source consulted, cataloguer’s note, status of identification, and undifferentiated name indicator. name is being treated as both a class and a property. identifier and controlled access point are treated as properties rather than classes in both rda and ycr. attributes/properties in rda compared to ycr (cont.) utilizing technology to support and extend access to students and job seekers during the pandemic public libraries leading the way utilizing technology to support and extend access to students and job seekers during the pandemic daniel berra information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13261 daniel berra (danielb@pfulgervilletx.gov) is assistant director, pflugerville (texas) public library. © 2021. “public libraries leading the way” is a regular column spotlighting technology in public libraries. the ongoing pandemic necessitated a reimaging of public library services and resources. 
out of this challenge rose opportunities to better serve the needs of our communities during the pandemic and beyond. when our library first closed our doors to the public last march, we began discussions on how the needs of our community had changed. we identified two key groups for whom the pandemic had forced an uncomfortable shift: students suddenly thrust into virtual learning and adults who had lost their jobs. while we continue to serve all members of our community in a variety of ways, we looked to increase support for these specific groups utilizing available technology. like many public libraries, the pflugerville public library quickly shifted our service model to include virtual programs, curbside pickup, library cards issued remotely, and a focus on electronic resources. our community is rapidly growing and diverse. many of our nearly 70,000 residents are frequent users of library services, attend our wide array of programs, hold meetings, study or work inside the building, and enjoy both the physical and virtual library collection. the pandemic shift required our talented staff to find ways to provide a similar level of service to a community that heavily utilizes the library. for both students and job seekers, we took steps to alleviate some of the difficulties the building’s closure caused by utilizing existing technology. we worked with the city’s it department to extend the library’s wi-fi to cover the entire parking lot, allowing for 24-hour access. we also utilized our existing print-from-your-own-device system to allow library users to submit print jobs and then pick them up through our curbside service. we added additional wi-fi hotspots available for checkout to ensure access at home for those lacking internet. since these services were already offered to some degree, the expansion of access was relatively easy to implement.
for students we drew upon our existing relationship with the pflugerville independent school district (pfisd) to provide support and extend access. we expanded the offering of our special digit cards, which allow students to sign up for an account giving them access to all of our electronic resources and wi-fi hotspots. the school district’s librarians handle the signups and then submit the forms so we can set up the accounts and contact students by email or phone. we further extended access to ebooks by working with the district and our vendor, overdrive, to provide a direct way for students to browse and check out through the district’s own ebook app. this allows students to seamlessly see both of our collections, significantly increasing their reading options and removing barriers to access. on the support front, we utilized a portion of the city’s cares act funds directed toward the library to launch a live, virtual tutoring service called brainfuse helpnow. students of all ages have anonymous access to tutors from home seven days a week, as well as additional homework support resources. this piece meshes nicely with some of our virtual programming for teens, like our sat and act practice tests and other test- and career-preparation e-resources. recognizing the pandemic’s impact on the economy, and how this directly affects our community, we worked to prioritize support for the unemployed and under-employed. we added a resume review/job-search coaching service led by two of our circulation staff members. we utilized another portion of our cares act funds to offer career online high school, providing adults with access to an online program to obtain their high school diploma. we also began lending laptops for home use to ensure access to necessary technology.
some of our support was already in place before the pandemic began, and we made a significant marketing push to highlight these e-resources. for instance, we partner with the pflugerville community development corporation to provide the online training resource lynda.com (soon to be linkedin learning). we saw a large increase in usage, particularly in the first few months of the pandemic, as community members looked to add employable skills to their toolboxes. we also created a page on our website with all of our job search assistance resources and services highlighted in one place. while the main emphasis of these efforts is on technology, serving the needs of the entire community also requires supporting those who are generally less connected. we have to balance our digital expectations with something more tangible, recognizing many library users still utilize the library in a more traditional way. for students, our senior youth services librarian partnered with pfisd for a book giveaway in conjunction with the district’s food distribution program to get books in the hands of children for the summer. we also began distributing “care kits” through our curbside service that include personal grooming products and cold weather gear for anyone in need. while 2020 featured the addition of many new services or significant expansion of existing ones, we are focused in 2021 on increasing our marketing efforts for these offerings. relying too heavily on digital forms of communication can limit the impact of our services. for instance, if we want to let people who do not have access to the internet at home know we have wi-fi hotspots and laptops available for checkout, then spreading the word through our standard methods of social media, website, and email will prove ineffective. with the building currently closed to the public, we face an additional barrier to communication.
to help alleviate some of this, we have created a job search assistance flyer that we are distributing at places like local food pantries. we plan to expand on similar methods of marketing throughout the year. while positive feedback is often hidden from libraries since we prioritize patron privacy and anonymity, we have received a few specific stories that highlight our impact. our first scholarship recipient for career online high school shared how the opportunity to obtain her high school diploma will open up new professional avenues and erase the stigma of having not completed high school. another community member who took advantage of our job search coaching to prepare for an interview expressed gratitude to the library staff who helped increase his employment chances. we also see resumes and homework assignments printed through our virtual printing service, hear from parents with children utilizing hotspots for virtual schooling, see cars in the parking lot using the extended wi-fi, and track statistics showing a large increase in the usage of our electronic resources. the ongoing pandemic necessitated a re-imagining of library services. the needs of our community changed, and we set out to find ways to provide assistance to those who need it the most utilizing technology, while remaining mindful of those who are not as comfortable in the digital age. the combination of utilizing technology to address current needs and expanding access to this technology has allowed us to better serve the community. we are in the process now of evaluating all of our changes to determine which ones will continue even after the pandemic ends.
we already know that we will keep our methods of extending access like the expanded wi-fi availability, laptops for checkout, digit cards for students, and the seamless connection to our ebook collection for pfisd. in the area of support, we will continue to offer career online high school, brainfuse helpnow for virtual tutoring, and our resume review/job search coaching service. public libraries are well positioned to innovate and adjust to changes in society. it is one of the things we do extremely well, out of necessity, but also out of a deep desire to serve our communities. all of the shifts the pflugerville public library made related to supporting students and job seekers drew upon existing technology and available resources. what changed were the areas on which we chose to focus our efforts. by prioritizing support and access while pinpointing the needs of the moment, we found ways to better serve our community within the context of everything else we provide. while the jury is still out on how successful some of these initiatives will prove, we already know that many of these changes will continue long after the pandemic ends.

editorial board thoughts
mark dehmlow

the ten commandments of interacting with nontechnical people

more than ten years of working with technology and interacting with nontechnical users in a higher education environment has taught me many lessons about successful communication strategies. somehow, in that time, i have been fortunate to learn some effective mechanisms for providing constructive support and leading successful technical projects with both technically and “semitechnically” minded patrons and librarians. i have come to think of myself as someone who lives in the “in between,” existing more in the beyond than the bed or the bath, and, while not a native of either place, i like to think that i am someone who is comfortable in both the technical and traditional cliques within the library.
ironically, it turns out that the most critical pieces to successfully implementing technology solutions and bridging the digital divide in libraries have been categorically nontechnical in nature; it all comes down to collegiality, clear communication, and a commitment to collaboration. as i ruminated on the last ten-plus years of working in technology, i began to think of the behaviors and techniques that have proved most useful in developing successful relationships across all areas of the library. the result is this list of the top ten dos and don’ts for those of us self-identified techies who are working more and more often with the self-identified nontechnical set.

1. be inclusive—i have been around long enough to see how projects that include only technical people are doomed to scrutiny and criticism. the single best strategy i have found for getting buy-in for technical projects is to include key stakeholders and those with influence in project planning and core decision-making. not only does this create support for projects, but it encourages others to have a sense of ownership in project implementation—and when people feel ownership for a project, they are more likely to help it succeed.

2. share the knowledge—i don’t know if it is just the nature of librarianship, but librarians like to know things, and more often than not they have a healthy sense of curiosity about how things work. i find it goes a long way when i take a few moments to explain how a particular technology works. our public services specialists, in particular, often want to know the details of how our digital tools work so that they can teach users most effectively and answer questions users have about how they function. sharing expertise is a really nice way to be inclusive.

3. know when you have shared enough—in the same way that i don’t need to know every deep detail of collections management to appreciate it, most nontechies don’t need hour-long lectures on how each component of technology relates to the other. knowing how much information to share when describing concepts is critical to keeping people’s interest and generally keeping you approachable.

4. communicate in english—it is true that every specialization has its own vocabulary and acronyms (oh how we love acronyms in libraries) that have no relevance to nonspecialists. i especially see this in the jargon we use in the library to describe our tools and services. the best policy is to avoid jargon and explain concepts in lay-person’s terms or, if using jargon is unavoidable, define specialized words in the simplest terms possible. using analogies and drawing pictures can be excellent ways to describe technical concepts and how they work. it is amazing how much from kindergarten remains relevant later in life!

5. avoid techno-snobbery—i know that i am risking virtual ostracism in writing this, but i think it needs to be said. just because i understand technology does not make me better than others, and i have heard some variant of the “cup holder on the computer” joke way too often. even if you don’t make these kinds of comments in front of people who aren’t as technically capable as you, the attitude will be apparent in your interactions, and there is truly nothing more condescending.

6. meet people halfway—when people are trying to ask technology-related questions or converse about technical issues, don’t correct small mistakes. instead, try to understand and coax out their meaning; elaborate on what they are saying, and extend the conversation to include information they might not be aware of. people don’t like to be corrected or made to feel stupid—it is embarrassing. if their understanding is close enough to the basic idea, letting small mistakes in terminology slide can create an opening for a deeper understanding. you can provide the correct terminology when talking about the topic without making a point to correct people.

7. don’t make a clean technical/nontechnical distinction—after once offering the “technical” perspective on a topic, one librarian said to me that it wasn’t that they themselves didn’t have any technical perspective, it just wasn’t perhaps as extensive as mine. each person has some level of technical expertise; it is better to encourage the development of that understanding than to compartmentalize people on the basis of their area of expertise.

8. don’t expect everyone to be interested—just because i chose a technical track and am interested in it doesn’t mean everyone should be. sometimes people just want to focus on their area of expertise and let the technical work be handled by the techies.

9. assume everyone is capable—at least at some level. sometimes it is just a question of describing concepts in the right way, and besides, not everyone should be a programmer. everyone brings their own skills to the table, and that should be respected.

10. expertise is just that—and no one, no one knows everything. there just isn’t enough time, and our brains aren’t that big. embrace those with different expertise, and bring those perspectives into your project planning. a purely technical perspective, while perhaps being efficient, may not provide a practical or intuitive solution for users. diversity in perspective creates stronger projects.

mark dehmlow (mdehmlow@nd.edu) is digital initiatives librarian, hesburgh libraries, university of notre dame, notre dame, indiana.
in the same way that the most interesting work in academia is becoming increasingly more multidisciplinary, so too the most successful work in libraries needs to bring diverse perspectives to the fore. while it is easy to say libraries are constantly becoming more technically oriented because of the expanse of digital collections and services, the need for the convergence of the technical and traditional domains is clear—digital preservation is a good example of an area that requires the lessons and strengths learned from physical preservation, and, if anything, the technical aspects still raise more questions than solutions—just read henry newman’s article “rocks don’t need to be backed up” to see what i mean.1 increasingly, as we develop and implement applications that better leverage our collections and highlight our services, their success hinges on their usability, user-driven design, and implementations based on user feedback. these “user”-based evaluation techniques fit more closely with traditional aspects of public services: interacting with patrons. lastly, it is also important to remember that technology can be intimidating. it has already caused a good deal of anxiety for those in libraries who are worried about long-term job security as technology continues to initiate changes in the way we perform our jobs. one of the best ways to bring people along is to demystify the scary parts of technology and help them see a role for themselves in the future of the library. going back to maslow’s hierarchy of needs, people want to feel a sense of security and belonging, and i believe it is incumbent upon those of us with a deep understanding of technology to help bring the technical to the traditional in a way that serves everyone in the process.

reference

1. henry newman, “rocks don’t need to be backed up,” enterprisestorageforum.com (mar. 27, 2009), www.enterprisestorageforum.com/continuity/features/article.php/3812496 (accessed april 24, 2009).
student use of library computers: are desktop computers still relevant in today’s libraries?
susan thompson
information technology and libraries | december 2012

abstract

academic libraries have traditionally provided computers for students to access their collections and, more recently, facilitate all aspects of studying. recent changes in technology, particularly the increased presence of mobile devices, call into question how libraries can best provide technology support and how it might affect the use of other library services. a two-year study conducted at california state university san marcos library analyzed student use of computers in the library, both the library’s own desktop computers and laptops owned by students. the study found that, despite the increased ownership of mobile technology by students, they still clearly preferred to use desktop computers in the library. it also showed that students who used computers in the library were more likely to use other library services and physical collections.

introduction

for more than thirty years, it has been standard practice in libraries to provide some type of computer facility to assist students in their research. originally, the focus was on providing access to library resources, first the online catalog and then journal databases. for the past decade or so, this has expanded to general-use computers, often in an information-commons environment, capable of supporting all aspects of student research from original resource discovery to creation of the final paper or other research product. however, times are changing, and the ready access to mobile technology has brought into question whether libraries need to or should continue to provide dedicated desktop computers. do students still use and value access to computers in the library? what impact does student computer use have on the library and its other services?
have we reached the point where we should reevaluate how we use computers to support student research? california state university san marcos (csusm) is a public university with about nine thousand students, primarily undergraduates from the local area. csusm was established in 1991 and is one of the youngest campuses in the 23-campus california state university system. the library, originally located in space carved out of an administration building, moved into its own dedicated library building in 2004. one of the core principles in planning the new building was the vision of the library as a teaching and learning center. as a result, a great deal of thought went into the design of technology to support this vision. rather than viewing technology’s role as just supporting access to library resources, we expanded its role to providing cradle-to-grave support for the entire research process. we also felt that encouraging students to work in the library would promote use of traditional library materials and the expertise of library staff, since these resources would be readily available.1

susan thompson (sthompsn@csusm.edu) is coordinator of library systems, california state university san marcos.

rethinking our assumptions about library technology’s role in the student research process led us to consider the entire building as a partner in the students’ learning process. rather than centralizing all computer support in one information commons, we wanted to provide technology wherever students want to use it. we used two strategies. first, we provided centralized technology using more than two hundred desktop computers, most located in four of our learning spaces: reference, classrooms, the media library, and the computer lab. three of these spaces are configured like information commons, providing full-service research computers grouped around the service desks near each library entrance.
in addition, simplified “walk-up” computers are available on every floor. the simplified computers provide limited web services to encourage quick turnaround and have no login requirement, ensuring ready access to library collections for everyone, including community members. the other major component of our technology plan was the provision of wireless throughout the building, along with extensive power outlets to support mobile computing. more than forty quiet study rooms, along with table “islands” in the stacks, help support the use of laptops for group study. however, only two of these quiet study rooms, located in the media library, provide desktop computers designed specifically to support group work. in 2009 and again in 2010, we conducted computer use studies to evaluate the success of the library’s technology strategy and determine whether the library’s desktop computers were still meeting student needs as envisioned by the building plan. the goal of the study was to obtain a better understanding of how students use the library’s computers, including types of applications used, computer preferences, and computer-related study habits. the study addressed several specific research questions. first, librarians were concerned that the expanded capabilities of the desktop computers distracted students from an academic and library research focus. were students using the library’s computers appropriately? second, the original technology plan had provided extensive support for mobile technology, but the technology landscape has changed over time. how did the increase in student ownership of mobile devices—now at more than 80 percent—affect the use of the desktop computers? finally, did providing an application-rich computer environment encourage students to conduct more of their studying in the library, leading them to use traditional library collections and services more frequently?
this article will focus on the study results pertaining to the second and third research questions. we found that, in line with our expectations, students using library computer facilities also made extensive use of traditional library services. however, we were surprised to discover that the growing availability of mobile devices had relatively little impact on students’ continuing preference for library-provided desktop computers.

literature review

the concept of the information commons was just coming into vogue in the early 2000s, when we were designing our library building, and it strongly influenced our technology design as well as building design. information commons, defined by steiner as the “functional integration of technology and service delivery,” have become one of the primary methods by which libraries provide enhanced computing support for students studying in the library.2 one of the changes in libraries motivating the information-commons concept is the desire to support a broad range of learning styles, including the propensity to mix academic and social activities. particularly influential to our design was the concept of the information commons supporting students’ projects “from inception to completion” by providing appropriate technologies to facilitate research, collaboration, and consultation.3 providing access to computers appears to contribute to the value of libraries as “place.” shill and toner, early in the era of information commons, noted “there are no systematic, empirical studies documenting the impact of enhanced library buildings on student usage of the physical library.”4 since then, several evaluations of the information-commons approach seem to show a positive correlation between creation of a commons and higher library usage because students are now able to complete all aspects of their assignments in the library.
for example, the university of tennessee and indiana university have shown significant increases in gate counts after they implemented their commons.5 while many studies discuss the value of information commons, very few look at why library computers are preferred over computers in other areas on campus. burke looked at factors influencing students’ choice of computing facilities at an australian university.6 given a choice of central computer labs, residence hall computers, and the library’s information commons, most students preferred the computers in the library over the other computer locations, with more than half using the library computers more than once a week. they rated the library most highly on its convenience and closeness to resources. perhaps the most important trend likely to affect libraries’ support for student technology needs is the increased use of mobile technology. the 2010 nationwide educause center for applied research (ecar) study, from the same year as the second csusm study, showed that 89 percent of students had laptops.7 other nationwide studies have corroborated this high level of laptop ownership.8 so, does this increased use of laptops and mobile devices affect the use of desktop computers? the 2010 ecar study reported that desktop ownership (about 50 percent in 2010) had declined by more than 25 percent between 2006 and 2009, a significant period in the lifetime of csusm’s new library building. pew’s internet & american life project trend data showed desktop ownership as the only gadget category in which ownership is decreasing, from 68 percent in 2006 to 55 percent at the end of 2011.9 some libraries and campuses are beginning to respond to the increase in laptop ownership by changing their support for desktop computers.
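the ownership figures above mix two ways of describing a decline: a drop in percentage points versus a decline relative to the starting share. a small sketch, using the pew figures quoted above (the helper names are mine, not from either study), shows how the same numbers read differently depending on which measure is meant:

```python
def point_drop(old_pct, new_pct):
    # absolute change, in percentage points
    return old_pct - new_pct

def relative_decline(old_pct, new_pct):
    # change relative to the starting share, as a percentage
    return (old_pct - new_pct) / old_pct * 100

# pew trend data: desktop ownership 68% (2006) -> 55% (end of 2011)
print(point_drop(68, 55))                  # 13 percentage points
print(round(relative_decline(68, 55), 1))  # 19.1, i.e., about a fifth of the 2006 share
```

read this way, ecar's "declined by more than 25 percent" is plausibly a relative decline rather than a percentage-point drop, though the study's own wording does not say which is meant.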
university of colorado boulder, in an effort to decrease costs and increase availability of flexible campus spaces, is making a major move away from providing desktop computers.10 while they found that 97 percent of their students own laptops and other mobile devices, they were concerned that many students still preferred to use desktop computers when on campus. to entice students to bring their laptops to campus, the university is enhancing their support for mobile devices by converting their central computer labs into flexible-use space with plentiful power outlets, flexible furniture, printing solutions, and access to the usual campus software. nevertheless, it may be premature for all libraries and universities to eliminate their desktop computer support. tom, voss, and scheetz found students want flexibility with a spectrum of technological options.11 certainly, they want wi-fi and power outlets to support their mobile technology. however, students also want conventional campus workstations providing a variety of functions, such as quick print and email computers, long-term workstations with privacy, and workstations at larger tables with multiple monitors that support group work. while the ubiquity of laptops is an important factor today, other forms of mobile devices may become more important in the future. a 2009 wall street journal article reported the trend for business travelers is to rely on smartphones rather than laptops.12 for the last three years, educause’s horizon reports have made support for non-laptop mobile technologies one of the top trends. 
the 2009 horizon report mentioned that in countries like japan, “young people equipped with mobiles often see no reason to own personal computers.”13 in 2010, horizon reported an interesting pilot project at a community college in which one group of students was issued mobile devices and another group was not.14 members of the group with the mobile devices were found to work on the course more during their spare time. the 2011 horizon report discusses mobiles as capable devices in their own right that are increasingly users’ first choice for internet access.15 therefore, rather than trying to determine which technology is most important, libraries may need to support multiple devices. trends described in the ecar and horizon studies make it clear that students own multiple devices. so how do they use them in the study environment? head’s interviews with undergraduate students at ten us campuses found that “students use a less is more approach to manage and control all of the it devices and information systems available to them.”16 for example, in the days before final exams, students were selective in their use of technology to focus on coursework yet remain connected with the people in their lives. the question then may not be which technology libraries should support but rather how to support the right technology at the right time.

method

the csusm study used a mixed-method approach, combining surveys with real-time observation to improve the effectiveness of assessment and generate a more holistic understanding of how library users made their technology choices. the study protocol received exempt status from the university human subjects review board. it was carried out twice over a two-year period to determine whether the time of the semester affected usage. in 2009, the study was administered at the end of the spring term, april 15 to may 3.
we expected that students near the end of the term would be preparing for finals and completing assignments, including major projects. the 2010 study was conducted near the beginning of the term, february 4 to february 18. we expected that early-term students would be less engaged in academic assignments, particularly major research projects. we carried out each study over a two-week period and attempted to keep conditions consistent by duplicating each survey time and location across both years. each location was surveyed monday through thursday, once in the morning and once in the afternoon, during the heavy-use times of 11 a.m. and 2 p.m. the survey locations included two large computer labs (more than eighty computers each), one located near the library reference desk and one near the academic technology helpdesk. other locations included twenty computers in the media library, a handful of desktop computers in the curriculum area, and laptop users, mostly located on the fourth and fifth floors of the library. the fourth- and fifth-floor observations also included the library’s forty quiet study rooms. for the 2010 study, the other large computer lab on campus (108 computers), located outside the library, was also included for comparison purposes. we used two techniques: a quantitative survey of library computer users and a qualitative observation of software application usage and selected study habits. the survey tried to determine the purpose for which the student was using the computer that day, what their computer preference was, and what other business they might have in the library. it also asked students for their suggestions for changes in the library. the survey was usually completed within the five-minute period that we had estimated and contained no identifying personal information.
the survey administrator handed out the one-page paper survey, along with a pencil if desired, to each student using a library workstation or a laptop during each designated observation period. users who refused to take the survey were counted in the total number of students asked to do the survey. however, users who indicated they refused because they had already completed a survey on a previous observation date were marked as “dup” in the 2010 survey and were not counted again. the “dup” statistic proved useful as an independent confirmation of the popularity of the library computers. the second method involved conducting “over-the-shoulder” observations of students using the library computers. while students were filling out the paper survey, the survey administrator walked behind the users and inconspicuously looked at their computer screens. all users in the area were observed whether or not they had agreed to take the survey. the one exception was users in group-study rooms: the observer did not enter the room and could only note behaviors visible from the door window, such as laptop usage or group studying. based on brief (one minute or less) observations, administrators noted on a form the type of software application the student was using at that point in time. the observer also noted other, non-desktop technical devices in use (specifically laptops, headphones, and mobile devices such as smartphones) and study behaviors, such as group work (defined as two or more people working together). the student was not identified on the form. we felt that these observations could validate information provided by the users on the survey.

results

we completed 1,452 observations in 2009 and 2,501 observations in 2010. the gate counts for the primary month each study took place—70,607 for april 2009 and 59,668 for february 2010—show the library was used more heavily during the final exam period.
the larger number of results the second year was due to more careful observation of laptop and study-group computer users on the fourth and fifth floors and the addition of observations in a nonlibrary computer lab, rather than an increase in students available to be observed. the observations looked at application usage, study habits, and devices present, but this article will only discuss the observations pertaining to devices. in 2009, 17 percent of students were observed using laptops (see table 1). this number almost doubled in 2010 to 33 percent. most laptop users were observed on the fourth and fifth floors, where furniture, convenient electrical outlets, and quiet study rooms provided the best support for this technology. very few desktop computers were available there, so students desiring to study on these floors had to bring their own laptops. almost 20 percent of students in 2010 were observed with other mobile technology, such as cell phones or ipods, and 16 percent were wearing headphones, which indicated there was other, often not visible, mobile technology in use.

table 1. mobile technology observed

in 2009, 1,141 students completed the computer-use survey. however, we were unable to accurately determine the return rate that year. the nature of the study, which surveyed the same locations multiple times, meant that many of the students were approached more than once to complete the survey. thus the majority of the refusals to take the survey were because the subject had already completed one previously. the 2010 study accounted for this phenomenon by counting refusals and duplications separately. in 2010, 1,123 students completed the survey out of 1,423 unique asks, resulting in a 79 percent return rate. the 619 duplicates counted represented about half of the 2010 surveys completed and could be considered another indicator of frequent use of the library’s computers.
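as a quick illustrative check, the 2010 return-rate and duplicate figures above can be recomputed directly. this is our own recalculation from the reported numbers, not part of the original study materials, and the variable names are our own:

```python
# illustrative recomputation of the 2010 survey figures cited above;
# the numbers come from the text, the variable names are ours.
completed = 1123    # surveys completed in 2010
unique_asks = 1423  # unique students asked to take the survey
duplicates = 619    # refusals because a survey was already completed

return_rate = completed / unique_asks
duplicate_share = duplicates / completed

print(f"return rate: {return_rate:.0%}")          # 79%, as reported
print(f"duplicate share: {duplicate_share:.0%}")  # 55%, i.e., "about half"
```

both reported figures check out: 1,123 / 1,423 rounds to 79 percent, and the 619 duplicates are indeed about half of the completed surveys.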
the 2010 results included an additional 290 surveys completed by students using the other large computer lab on campus outside the library.

table 1 (mobile technology observed): laptop in use, 17% in 2009 and 33% in 2010; headphones in use, 16% in 2010; mobile device in use (cell phone or ipod), 18% in 2010.

table 2. frequency of computer use: daily when on campus, 49% in 2009 and 42% in 2010; several times a week, 33% and 30%; several times a month, 11% and 15%; rarely use computers in the library, 9% and 10%.

in both years of the study, 78 percent of students said they preferred to use computers in the library to other computer lab locations on campus. students also indicated they were frequent users (see table 2). in 2009, 82 percent of students used the library computers frequently—49 percent daily and 33 percent several times a week. the frequency of use in the 2010 early-term study dropped about 10 percent to 72 percent, but with the same proportion of daily vs. weekly users. convenience and quiet were the top reasons, given by more than half of students, as to why they preferred the library computers, followed closely by atmosphere. about a quarter of students preferred library computers because of their close access to other library services.

table 3. preferred computer to use in the library

the types of computer that students preferred to use in the library were desktop computers, followed by laptops owned by the students (see table 3). it is notable that the preference for desktop computers changed significantly from 2009 to 2010: 84 percent of students preferred desktop computers in 2009 vs. 72 percent in 2010—a 12 percent decrease. not surprisingly, few students preferred the simplified walk-up computers used for quick lookups. however, we did not expect such little interest in checking out laptops, with only 2 percent preferring that option.
the 2010 study added a new question to the survey to better understand the types of technology devices owned by students (see table 4). in 2010, 84 percent of students owned a laptop (combining the netbook and laptop statistics). almost 40 percent of students owned a desktop, so many students owned more than one type of computer. of the 85 percent of students who indicated they had a cell phone, about one-third indicated they owned smartphones. the majority of students owned music players. the one technology students were not interested in was e-book readers, with less than 2 percent indicating ownership.

table 3 (preferred computer to use in the library): sit-down pc, 84% in 2009 and 71% in 2010; walk-up pc, 6% and 5%; own laptop, 23% and 28%; laptop checked out in library, 2% in both years.

table 4. technology devices owned by students (2010)

to understand how the use of technology might affect use of the library in general, the survey asked students what other library services they used on the same day they were using library computers. table 5 shows survey responses are very similar between the late-term 2009 study and the early-term 2010 study. by far the most popular use of the library, by more than three-quarters of the students, was for study. around 25 percent of the students planned to meet with others, and 20 percent planned to use the media services. around 15 percent of students planned to check out print books, 15 percent planned to use journals, and 10 percent planned to ask for help. the biggest difference for students early in the term was an increased interest (5 percent more) in using the library for study. the late-term students were 9 percent more likely to meet with others. by contrast, users in the nonlibrary computer lab were much less likely to make use of other library services. only 24 percent of nonlibrary users planned to study in the library, and 8 percent planned to meet with others in the library that day.
use of all other library services was less than 5 percent among the nonlibrary computer users.

table 4 (technology devices owned by students, 2010): laptop, 77%; ipod/mp3 music player, 59%; regular cell phone, 52%; desktop computer, 40%; smartphone, 31%; netbook, 7%; other handheld devices, 1%; kindle/book reader, 1%.

table 5. other library services used

in 2010, we also asked users what changes they would like in the library, and 58 percent of respondents provided suggestions. the question was not limited to technology, but by far the biggest request for change was to provide more computers (requested by 30 percent of all respondents). analysis of the other survey questions regarding computer ownership and preferences revealed who was requesting more traditional desktops in the library. surprisingly, most were laptop users: 90 percent of laptop owners wanted more computers, and 88 percent of the respondents making this request were located on the fourth and fifth floors, which were used almost exclusively by laptop users. the next most common comments were remarks indicating student satisfaction with the current library services: 19 percent of students said they were satisfied with current library services, and 9 percent praised the library and its services. commonality of requests dropped quickly at that point, with the fourth most common request being for more quiet (2 percent).

table 5 (other library services used; 2009, 2010, nonlibrary lab): study, 76%, 81%, 23%; meet with others, 35%, 26%, 7%; use media, 20%, 22%, 4%; checkout a book, 16%, 13%, 3%; look for journals/newspapers, 15%, 13%, 3%; ask questions/get help, 10%, 10%, 2%; use a reserve book, 8%, 9%, 2%; create a video/web page, 6%, 3%, 0%; pick up ill/circuit, 3%, 3%, 0%; other, 0%, 4%, 1%.

discussion

the results show that students consistently prefer to use computers in the library, with 78 percent declaring a preference for the library over other computer locations on campus in both years of the study.
this preference is confirmed by the statistics reported by csusm’s campus it department, which tracks computer login data. this data consistently shows the library computer labs are used more than nonlibrary computer labs, with the computers near the library reference desk the most popular, followed closely by the library’s second large computer lab, which is located next to the technology help desk. for instance, during the 2010 study period, the reference desk lab (80 computers) had 6,247 logins compared to 3,218 logins in the largest nonlibrary lab (108 computers)—double the amount of usage. the data also shows that use of the computers near the reference desk increased by 15 percent between 2007 and 2010. supporting the popularity of using computers in the library is the fact that most students are repeat customers. table 2 shows 82 percent of the 2009 late-term respondents used the library computers at least several times a week, with almost half using our computers daily. in contrast, 72 percent of the 2010 early-term students used the library computers daily or several times a week. the 10 percent drop in frequency of visits to the library for computing applied to both laptop and desktop users and seems to be largely due to students not yet having received enough work from classes to justify more frequent use. the kind of computer that users preferred changed somewhat over the course of the study. the preference for desktop computers dropped from 84 percent of students in 2009 to 72 percent in 2010 (see table 3). one reason for this 12 percent drop may be related to how the survey was administered. the 2010 study did a more thorough job of surveying the fourth and fifth library floors, where most laptop users are. as a result, the laptop floors represented 29 percent of the responses in 2010 vs. only 13 percent in 2009. these numbers are also reflected in the proportion of laptops observed each year—33 percent in 2010 vs. 17 percent in 2009 (see table 1).
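the login comparison above can also be normalized per computer, which makes the difference even starker. this is a back-of-the-envelope calculation of ours from the reported figures, not a statistic from the campus it department:

```python
# per-computer normalization of the login statistics cited above
# (our own calculation from the reported figures).
library_logins, library_computers = 6247, 80         # reference desk lab
nonlibrary_logins, nonlibrary_computers = 3218, 108  # largest nonlibrary lab

print(round(library_logins / nonlibrary_logins, 2))        # ~1.94x raw logins
print(round(library_logins / library_computers, 1))        # ~78.1 logins per computer
print(round(nonlibrary_logins / nonlibrary_computers, 1))  # ~29.8 logins per computer
```

so the library lab saw roughly double the raw logins, but since it also had fewer machines, each library computer was used more than two and a half times as heavily as a nonlibrary one.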
the drop in desktop computer preference is interesting because it was not matched by an equally large increase in laptop preference, which only increased by 5 percent. the other reason for the decrease in desktop preference is likely the larger change seen nationwide in student laptop ownership. for instance, the pew study of gadget ownership showed a 13 percent drop in desktop ownership over a five-year period, 2006–2011, while at the same time laptop ownership almost doubled, from 30 percent to 56 percent.17 however, it is interesting to note that, according to the pew study, in 2011 the percentage of adults who owned each type of device was nearly equal—55 percent for desktops and 56 percent for laptops. the 2010 survey tried to better understand students’ preferences by identifying all the kinds of technology they had available to them. we found that 77 percent of csusm students owned laptops and an additional 7 percent owned the netbook form of laptop (see table 4). the combined 84 percent laptop ownership is comparable with the 2010 ecar study’s finding of 89 percent student laptop ownership nationwide.18 this high level of laptop ownership may explain why the users who preferred laptop computers almost all preferred to use their own rather than laptops checked out from the library. despite the high laptop ownership and decrease in desktop preference, it is significant that the majority of csusm students still prefer to use desktop computers in the library. aside from the 72 percent of respondents who specifically stated a preference for desktop computers, the top suggestion for library improvement was to add more desktop computers, requested by 38 percent of respondents. further analysis of the survey data revealed that it was the laptop owners and the fourth- and fifth-floor laptop users who were the primary requestors of more desktop computers.
to try to better understand this seemingly contradictory behavior, we did some further investigation. anecdotal conversations with users during the survey indicated that convenience and reliability are two factors affecting students’ decisions to use desktop computers. the desktop computers’ speed and reliable internet connections were regarded as particularly important when uploading a final project to a professor, with some students stating they came to the library specifically to upload an assignment. in may 2012, the csusm library held a focus group that provided additional insight into the question of desktops vs. laptops. all eight of the focus group participants owned laptops, yet all eight indicated that they preferred to use desktop computers in the library. when asked why, participants cited the reliability and speed of the desktop computers and the convenience of not having to remember to bring their laptop to school and “lug” it around. another influence on the convenience factor may be that our campus does not require that students own a laptop and bring it to class, so they may have less motivation to travel with their laptops. supporting the idea that students perceive different benefits for each type of computer, six of the eight participants owned a desktop computer in addition to a laptop. the 2010 study also showed that students see value in owning both a desktop and a laptop computer, since the 40 percent ownership of desktop computers overlaps the 84 percent ownership of laptops (see table 4).

table 6. reasons students prefer using library computer areas

for almost half of the students surveyed, one of the reasons for their preference for using computers in the library was the ready access to library services or staff (see table 6).
even more significant, when specifically asked what else they planned to do in the library that day besides using the computer (see table 5), more than 80 percent of the students indicated that they intended to use the library for purposes other than computing. the top two uses for the library were studying (76 percent in 2009, 81 percent in 2010) and meeting with others (35/26 percent), indicating the importance of the library as place. the most popular library service was the media library (20/22 percent), followed by collections, with 16/13 percent planning to check out a book and 15/13 percent planning to look for journals and newspapers. it is interesting that the level of use of these library services was similar whether early or late in the term. the biggest difference was that early-term students were less likely to be working with a group but were slightly more likely to be engaged in general studying. even the less-used services, such as asking a question (10 percent) or using a reserve book (8 percent), exhibited a notable amount of usage if one looks at the actual numbers. for example, 8 percent of the 1,123 2010 survey respondents represents 90 students who used reserve materials sometime during the 8 hours of the two-week survey period. to put the use of the library by computer users into perspective, we also asked students using the nonlibrary computer lab if they planned to use the library sometime that same day. only 24 percent of the nonlibrary computer users planned to study in the library that day vs. 81 percent of the library computer users; only 4 percent planned to use media vs. 24 percent; and 2 percent planned to check out a book vs. 13 percent. the implication is clear: students using computers in the library are much more likely to use the library’s other services.
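the conversion from percentages back to headcounts, as done above for reserve-book users, generalizes to any of the survey figures. a minimal sketch, with a helper function of our own:

```python
# converting survey percentages back into approximate headcounts, as the
# article does for reserve-book users; the helper function is ours.
RESPONDENTS_2010 = 1123  # completed 2010 surveys, per the text

def headcount(percent: float, n: int = RESPONDENTS_2010) -> int:
    """approximate number of respondents behind a reported percentage."""
    return round(n * percent / 100)

print(headcount(8))   # ~90 students who used reserve materials
print(headcount(10))  # ~112 students who asked a question
```

this confirms the article's figure: 8 percent of 1,123 respondents is about 90 students.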
we usually think of providing desktop computers as a service for students, and so it is. however, the study results show that providing computers also benefits the library itself: it reinforces the library’s role as place by providing a complete study environment for students and encouraging all study behaviors, including communication and working with others. the popularity of the library computers provides us with a “captive audience” of repeat customers.

conclusion

the csusm library technology that was planned in 2004 is still meeting students’ needs. although most of our students own laptops, most still prefer to use desktop computers in the library. in fact, providing a full-service computer environment to support the entire research process benefits the entire library. students who use computers in the library appear to conduct more of their studying in the library and thus make more use of traditional library collections and services. going forward, several questions arise for future studies. csusm is a commuter school. students often treat their work space in the library as their office for the day, which increases the importance of a reliable and comfortable computer arrangement. one question that could be asked is whether the results would be different for colleges where most students live on campus or nearby. if a university requires that all students own laptops and expects them to bring them to class, how does that affect the relevance of desktop computers in the library? the 2010 study was completed just a few weeks before the first ipad was introduced. since students have identified convenience and weight as reasons for not carrying their laptops, are tablets and ultra-light computers, like the macbook air, more likely to be carried on campus by students and used more frequently for their research?
how important is it to have a supportive mobile infrastructure with features such as high-speed wifi, the ability to use campus printers, and access to campus applications? are students using smartphones and other mobile devices for study purposes? in fact, are we focusing too much on laptops, and are other mobile devices starting to take over that role? this study’s results make it clear that we can’t just look at data such as ecar’s, which show high laptop ownership, and assume that means students don’t want or won’t use library computers. as the types of mobile devices continue to grow and evolve, libraries should continue to develop ways to facilitate their research role. however, the bottom line may not be that one technology will replace another but rather that students will have a mix of devices and will choose which device is best suited to a particular purpose. therefore libraries, rather than trying to pick which device to support, may need to develop a broad-based strategy to support them all.

references

1. susan m. thompson and gabriella sonntag, “chapter 4: building for learning: synergy of space, technology and collaboration,” learning commons: evolution and collaborative essentials (oxford: chandos publishing, 2008): 117–199.
2. heidi m. steiner and robert p. holley, “the past, present, and possibilities of commons in the academic library,” reference librarian 50, no. 4 (2009): 309–332.
3. michael j. whitchurch and c. jeffery belliston, “information commons at brigham young university: past, present, and future,” reference services review 34, no. 2 (2006): 261–78.
4. harold shill and shawn tonner, “creating a better place: physical improvements in academic libraries, 1995–2002,” college & research libraries 64 (2003): 435.
5. barbara i. dewey, “social, intellectual, and cultural spaces: creating compelling library environments for the digital age,” journal of library administration 48, no.
1 (2008): 85–94; diane dallis and carolyn walters, “reference services in the commons environment,” reference services review 34, no. 2 (2006): 248–60.
6. liz burke et al., “where and why students choose to use computer facilities: a collaborative study at an australian and united kingdom university,” australian academic & research libraries 39, no. 3 (september 2008): 181–97.
7. shannon d. smith and judith borreson caruso, the ecar study of undergraduate students and information technology, 2010 (boulder, co: educause center for applied research, october 2010), http://net.educause.edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed march 21, 2012).
8. pew internet & american life project, “adult gadget ownership over time (2006–2012),” http://www.pewinternet.org/static-pages/trend-data-(adults)/device-ownership.aspx (accessed june 14, 2012); the horizon report: 2009 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2010 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2011 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012).
9. pew internet, “adult gadget ownership.”
10. deborah keyek-franssen et al., computer labs study, university of colorado boulder office of information technology, october 7, 2011, http://oit.colorado.edu/sites/default/files/labsstudy-penultimate-10-07-11.pdf (accessed june 15, 2012).
11. j.
s. c. tom, k. voss, and c. scheetz[full names?], “the space is the message: first assessment of a learning studio,” educause quarterly 31, no. 2 (2008), http://www.educause.edu/ero/article/space-message-first-assessment-learning-studio (accessed june 25, 2012).
12. nick wingfield, “time to leave the laptop behind,” wall street journal, february 23, 2009, http://online.wsj.com/article/sb122477763884262815.html (accessed june 15, 2012).
13. the horizon report: 2009 edition.
14. the horizon report: 2010 edition.
15. the horizon report: 2011 edition.
16. alison j. head and michael b. eisenberg, “balancing act: how college students manage technology while in the library during crunch time,” project information literacy research report, information school, university of washington, october 12, 2011, http://projectinfolit.org/pdfs/pil_fall2011_techstudy_fullreport1.1.pdf (accessed june 14, 2012).
17. pew internet, “adult gadget ownership.”
18. smith and caruso, ecar study.

digital collections are a sprint, not a marathon: adapting scrum project management techniques to library digital initiatives

michael j. dulock and holley long

information technology and libraries | december 2015 5

abstract

this article describes a case study in which a small team from the digital initiatives group and metadata services department at the university of colorado boulder (cu-boulder) libraries conducted a pilot of the scrum project management framework. the pilot team organized digital initiatives work into short, fixed intervals called sprints—a key component of scrum.
working for more than a year in the modified framework yielded significant improvements to digital collection work, including increased production of digital objects and surrogate records, accelerated publication of digital collections, and an increase in the number of concurrent projects. adoption of sprints has improved communication and cooperation between participants, reinforced teamwork, and enhanced their ability to adapt to shifting priorities.

introduction

libraries in recent years have freely adapted methodologies from other disciplines in an effort to improve library services. for example, librarians have
• employed usability testing techniques to enhance users’ experience with digital library interfaces,1 improve the utility of library websites,2 and determine the efficacy of a visual search interface for a commercial library database;3
• adopted participatory design methods to identify information visualizations that could augment digital library services4 and determine user needs in new library buildings;5 and
• utilized principles of continuous process improvement to enhance workflows for book acquisition and implementation of serial title changes in a technical services unit.6
librarians often come to the profession with disciplinary knowledge from an undergraduate degree unrelated to librarianship, so it should come as no surprise that they bring some of that disciplinary knowledge to their work. the interdisciplinary nature of librarianship also creates an environment that is amenable to adoption or adaptation of techniques from a variety of sources, not only those originating in library science. in this paper, the authors describe their experiences
michael j. dulock (michael.dulock@colorado.edu) is assistant professor and metadata librarian, university of colorado boulder.
holley long (longh@uncw.edu), previously assistant professor and systems librarian for digital initiatives at university of colorado boulder, is digital initiatives librarian, randall library, university of north carolina wilmington.
digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 6
in applying a modified scrum management framework to facilitate digital collection production. they begin by elucidating the fundamentals of scrum, then describe a pilot project using aspects of the methodology. they discuss the outcomes of the pilot and posit additional features of scrum that may be adopted in the future.

fundamentals of scrum project management

the scrum project management framework—one of several techniques under the rubric of agile project management—originated in software development and has been applied in a variety of library contexts, including the development of digital library platforms7 and library web applications.8 scrum’s salient characteristics include self-managing teams that organize their work into “short iterations of clearly defined deliverables” and focus on “communication over documentation.”9 the scrum primer: a lightweight guide to the theory and practice of scrum describes the roles, tools, and processes involved in this project management technique.10 scrum teams are cross-functional and consist of five to nine members who are cross-trained to perform multiple tasks. in addition to the team, two individuals serve specialized roles: scrum master and product owner. the scrum master is responsible for ensuring that scrum principles are followed and for removing any obstacles that hinder the team’s productivity. hence the scrum master is not a project manager but a facilitator. the product owner’s role is to manage the product by identifying and prioritizing its features.
this individual represents the stakeholders’ interests and is ultimately responsible for the product’s value. the team divides their work into short, fixed intervals called sprints that typically last two to four weeks and are never extended. at the beginning of each sprint, the team meets to select and commit to completing a set of deliverables. once these goals are set, they remain stable for the duration; course corrections can occur in later sprints. in software development, the scrum team aims to complete a unit of work that stands on its own and is fully functional, known as a potentially shippable increment. it is selected from an itemized list of product features called the product backlog. the backlog is established at the outset of development and consists of a comprehensive list of tasks that must occur to complete the product. a well-constructed backlog has four characteristics. first, it is prioritized with the features that will yield the highest return on investment at the top of the list. second, the backlog is appropriately detailed, so that the tasks at the top of the list are well-defined whereas those at the bottom may be more vaguely demarcated. third, each task receives an estimation for the amount of effort required to complete it, which helps the team to project a timeline for the product. finally, the backlog evolves in response to new developments. individual tasks may be added, deleted, divided, or reprioritized over the life of the project. during the course of a sprint, team members meet to plan the sprint, check-in on a daily basis, and then debrief at the conclusion of the sprint. they begin with a two-part planning meeting in which the product owner reviews the highest priority tasks with the team. in the second half of the meeting, the team and the scrum master determine how many of the tasks can be accomplished in information technologies and libraries |december 2015 7 the given timeframe, thus defining the goals for the sprint. 
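the four backlog characteristics described above (prioritized, appropriately detailed, effort-estimated, and evolving) can be sketched as a small data structure. this is an illustrative python sketch only; the class and field names are hypothetical, not part of scrum or the scrum primer:

```python
from dataclasses import dataclass

@dataclass
class BacklogItem:
    description: str
    priority: int       # lower number = higher expected return on investment
    effort_points: int  # the team's estimate of the effort required
    detail: str = ""    # top-of-list items are well-defined; lower ones may stay vague

class ProductBacklog:
    def __init__(self):
        self.items = []

    def add(self, item):
        # backlogs evolve: items may be added, divided, or reprioritized at any time
        self.items.append(item)
        self.items.sort(key=lambda i: i.priority)  # keep highest-priority tasks on top

    def top(self, n):
        # sprint planning draws deliverables from the top of the list
        return self.items[:n]

backlog = ProductBacklog()
backlog.add(BacklogItem("publish collection online", priority=3, effort_points=2))
backlog.add(BacklogItem("scan correspondence folder", priority=1, effort_points=5))
print([i.description for i in backlog.top(1)])  # ['scan correspondence folder']
```

the key property is that the list stays sorted as it evolves, so the top of the backlog always reflects the current highest-value work.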
this meeting generally lasts no longer than four hours for a two-week sprint. every day, the team holds a brief meeting to get organized and stay on track. during these “daily scrums,” each team member shares three pieces of information: what has been accomplished since the previous meeting, what will be accomplished before the next meeting, and what, if any, obstacles are impeding the work. these fifteen-minute meetings provide the team with a valuable opportunity to communicate and coordinate their efforts. sprints conclude with two meetings, a review and retrospective. during the review, the team inspects the deliverables that were produced during that sprint. the retrospective provides an opportunity to discuss the process, what is working well, and what needs to be adjusted.

figure 1. typical meeting schedule for a two-week sprint

evidence in the literature suggests that scrum improves both outcomes and process. one meta-analysis of 274 programming case studies found that implementing scrum led to improved productivity as well as greater customer satisfaction, product quality, team motivation, and cost reduction.11 proponents of this project management technique find that it leads to a more flexible and efficient process. scrum’s brief iterative work cycles and evolving product backlog promote adaptability so the team can address the inevitable changes that occur over the life of a project. by contrast, traditional project management techniques have been criticized for requiring too much time upfront on planning and being too rigid to respond to changes in later stages of the project.12 scrum also promotes communication over documentation,13 resulting in less administrative overhead as well as increased accountability and trust between team members.
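the meeting cadence just described, a planning meeting on day one, a short daily scrum thereafter, and a closing review and retrospective, can be laid out for a two-week sprint with a short sketch. the start date, helper name, and the choice to hold the review on the final workday are assumptions for illustration only:

```python
from datetime import date, timedelta

def sprint_schedule(start, workdays=10):
    # collect the sprint's working days (weekdays only)
    days, d = [], start
    while len(days) < workdays:
        if d.weekday() < 5:
            days.append(d)
        d += timedelta(days=1)
    # planning on day one, daily scrums thereafter, review/retrospective at the end
    schedule = [(days[0], "sprint planning (up to four hours)")]
    schedule += [(d, "daily scrum (fifteen minutes)") for d in days[1:]]
    schedule += [(days[-1], "sprint review and retrospective")]
    return schedule

for day, meeting in sprint_schedule(date(2015, 12, 7)):
    print(day.isoformat(), meeting)
```

for a sprint starting on a monday, this yields eleven meeting entries across ten working days, mirroring the schedule shown in figure 1.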
scrum pilot at university of colorado boulder libraries

the university of colorado boulder (cu-boulder) libraries digital initiatives team was interested in adopting scrum because of its incremental approach to completing large projects, its focus on communication, and its flexibility. these attributes meshed well with the group’s goals to publish larger collections more quickly and to more effectively multitask the production of multiple high-priority collections. the group’s staffing model and approach to collection building prior to the scrum pilot are described here to provide some context for this choice of project management tool. digital collection proposals are vetted by a working group composed of ten members, the digital library management group (dlmg), to ensure that major considerations such as copyright status are fully investigated before undertaking the collection. approved proposals are prioritized by the appropriate collection manager as high, medium, or low and then placed in a queue for scanning and metadata provisioning. a core group of individuals generally works on all digital collections, including the metadata librarian, the digital initiatives librarian, and one or both of the digitization lab managers. additionally, the team frequently includes the subject specialist who nominated the collection for digitization, staff catalogers, and other library staff members whose expertise is required. at any given time, the queue may contain as many as fifteen collections, and the core team works on several of them concurrently to address the separate needs of participating departments. while this approach allows the team to distribute resources more equitably across departments, progress on individual collections can be slower than if they are addressed one at a time.
prior to implementing aspects of scrum, the team also completed the scanning and metadata records for every object in the collection before it was published. as a result, publication of larger collections trailed behind smaller collections. the details of digital collection production vary depending on the nature of the project, but the process usually follows the same broad outline. unless the entire collection will be digitized, the collection manager chooses a selection of materials on the basis of criteria such as research value, rarity, curatorial considerations, copyright status, physical condition, feasibility for scanning, and availability of metadata. photographs and paper-based materials are then evaluated by the preservation department to ensure that they are in suitable condition for scanning. likewise, the media lab manager evaluates audio and video media for condition issues such as sticky shed syndrome, which will affect digitization.14 depending on format, the material is then digitized by the digitization lab manager or the media lab manager and their student assistants according to locally established workflows that conform to nationally recognized best practices. once materials are digitized, student assistants apply post-processing procedures as appropriate and required, such as running ocr (optical character recognition) software to convert images to text or equalizing levels on an audio file. the lab managers then check the files for quality assurance and move the files to the appropriate location on the server. the metadata librarian creates a metadata template appropriate to the material being digitized by using industry standards such as visual resources association core (vra core), metadata object description schema (mods), pbcore, and dublin core (dc). metadata creation methods depend on the existence of legacy metadata for the analog materials and in what format legacy metadata is contained.
the metadata librarian, along with his staff and/or student assistants, adapts legacy metadata into a format that can be ingested by the digital library software or creates records directly in the software when there is no legacy metadata. metadata is formatted or created in accordance with existing input standards such as cataloging cultural objects (cco) and resource description and access (rda), and it is enhanced as much as possible using controlled vocabularies such as the art and architecture thesaurus (aat) and library of congress subject headings. the metadata librarian performs quality assurance on the metadata records during creation and before the collection is published. in the final stages, the collection is created in the digital library software, at which time search and display options are established: thumbnail labels, default collection sorting, faceted browsing fields, etc. then the files and metadata are uploaded and published online. the highlight of the cu-boulder digital library is the twenty-seven collections drawn from local holdings in archives, special collections department, music library, and earth sciences and map library, among others. the library also contains purchased content and “luna commons” collections created by institutions that use the same digital library platform, for a total of more than 185,000 images, texts, maps, audio recordings, and videos. the following four collections were created during the scrum pilot and illustrate the types of materials available in the cu-boulder digital library: the colorado coal project consists of video and audio interviews, transcripts, and slides collected between 1974 and 1979 by the university of colorado coal project.
the project was funded by the colorado humanities program and the national endowment for the humanities to create an ethnographic record of the history of coal mining in the western united states from immigration and daily life in the coal camps to labor conditions and strikes, including ludlow (1913–14) and columbine (1927). the mining maps collection provides access to scanned maps of various mines, lodes, and claims in colorado from the late 1800s to the early 1900s. these maps come from a variety of creators, including private publishers and us government agencies. the vasulka media archive showcases the work of pioneering video artists steina and woody vasulka and contains some of their cutting-edge studies in video that experiment with form, content, and presentation. steina, an icelander, educated in music at the prague conservatory of music, and woody, a graduate of prague's film academy, arrived in new york city just in time for the new media explosion. they brought with them their experience of the european media awakening, which helped them blend seamlessly into the youth media revolution of the late sixties and early seventies in the united states. the 3d natural history collection comprises one hundred archaeology and paleontology specimens from the rocky mountain and southwest regions, including baskets, moccasins, animal figurines, game pieces, jewelry, tools, and other everyday objects from the fremont, clovis, and ancestral puebloan cultures as well as a selection of vertebrate, invertebrate, and track paleontology specimens from the mesozoic through the cenozoic eras (250 ma to the present). the diffusion of effort across multiple collections and a slower publication rate for larger collections offered opportunities for improvement.
after attending a conference session on scrum project management for web development projects, one of the team members recognized scrum’s potential to improve production processes since the technique divides large projects into manageable subtasks that can be accomplished in regular, short intervals.15 this approach would allow the team to switch between different high priority collections at regularly defined intervals to facilitate steady progress on competing priorities. working in sprints would also make it easier to publish smaller portions of a large collection at regular intervals. thus scrum held the potential to increase the production rate for larger collections and make the team’s progress more transparent to users and colleagues. in april 2013, a small team of cu-boulder librarians and staff initiated a pilot to assess the effect on processes and outcomes for digital collection production. rather than involving individuals from all affected units, regardless of their level of engagement in a particular project, the scrum pilot was limited to the three individuals who were involved in most, if not all, of the projects undertaken: the digital initiatives librarian, metadata librarian, and digitization lab manager.16 by including these three individuals, the major functions of metadata provision, digitization, and publication were covered in the trial with no disruption to the existing workflows or organizational structures. selecting this group also ensured that scrum would be tested in a broad range of scenarios and on collections from several different departments. to begin, the team met to review the scrum project management framework and considered how best to pilot the technique. taking a pragmatic approach, they only adopted those aspects of scrum that were deemed most likely to result in improved outcomes.
if the pilot were successful, other aspects of scrum could be incrementally incorporated later. the group discussed how scrum roles, processes, and tools could be adapted to digital collection workflows and determined that sprints would likely have the highest return on investment. they also chose to adapt and hybridize certain aspects of the planning meeting and daily scrum to achieve goals that were not being met by other existing meetings. sprint planning and end meetings were combined so that all three participants knew what each had completed and what was targeted for the next sprint. select activities of sprint planning and end meetings were already a part of the monthly dlmg meetings, making additional sprint meetings redundant. daily scrum meetings were excluded as the team felt that daily meetings would not produce enough benefit to justify the costs. in addition, two of the three participants have numerous responsibilities that lie outside of projects subject to the scrum pilot, so each person does not necessarily perform scrum-related work every day. however, the short meeting time was adopted into the planning/end meeting, as were elements of the three core questions of the daily scrum meeting, with some modifications. the questions addressed in the biweekly meetings are: what have you done since the last meeting? what are you planning for the next meeting? what impediments, if any, did you encounter during the sprint? the latter question was sometimes addressed mid-sprint through emails, phone calls, or one-off meetings that include a larger or different group of stakeholders. the team adopted the two-week duration typical of scrum sprints for the pilot. this has proven to be a good medium-term timeframe. it was short enough that the team could adjust priorities quickly, but long enough to complete significant work. the team chose to combine the sprint planning and sprint review meetings into a single meeting. 
part of the motivation for a trial of the scrum technique was to minimize additional time away from projects while maximizing information transfer during the meetings. a single biweekly planning/review meeting was determined to be sufficient to report accomplishments and set goals, and could be kept substantive and free of irrelevant content without becoming overly burdensome as “yet another meeting.” at each sprint meeting, each participant reported on results from the previous sprint. work that was completed allowed the next phase of a project to proceed. based on the results of the last sprint, each team member set measurable goals that could be realistically met in the next two-week sprint. there has been a concerted effort to keep the meetings short, limited to about twenty to twenty-five minutes. to enforce this habit, the sprint meetings were scheduled to begin twenty minutes before other regularly scheduled meetings for most or all of the participants. this helped keep participants on-topic and reinforced the transfer-of-information aspect of the meetings, with minimal leeway for extraneous topics.

reflection

the modified scrum methodology described above has been in place for more than a year. there have been several positive outcomes resulting from this practice. beginning with the most practical, production has become more regular than it was before scrum was implemented. the nature of digital initiatives in this environment dictates that many projects are in progress at once, in various stages of completion. the production work, such as digitizing media or creating metadata records, has become more consistent and regular. instead of production peaks and valleys, there is more of a straight line as portions of projects are finished and others come online. this in turn has resulted in faster publication of collections.
in 2013, the team published six new collections, twice as many as the previous year. the ability to put all hands on deck for a project for a two-week period can increase productivity. since sprints allow for short, concentrated bursts of work on a single project, smaller projects can be completed in a few sprints and larger projects can be divided into “potentially shippable units” and thus published incrementally. another benefit of scrum is that the variability of the two-week sprint cycle allows the team to work on more collections concurrently. for example, during a given sprint, scanning is underway for one collection, a metadata template is being constructed for another, the analog material in a third is being examined for pre-scanning preservation assessment, and a fourth collection is being published. while this type of multitasking occurred before the team piloted sprints, the scrum project management framework lends more structure and coordination to the various team members’ efforts. collection building activities can be broken down into subtasks that are accomplished in nonconsecutive sprints without undercutting the team’s concerted efforts. as a result, the team can juggle competing priorities much more effectively. the team is working with multiple stakeholders at any given time, each of whom may have several projects planned or in progress. as focus shifts among stakeholders and their respective projects, the scrum team is able to adjust quickly to align with those priorities, even if only for a single sprint. this also makes it easier to respond to emerging requests or address small, focused projects on the basis of events such as exhibits or course assignments. additional benefits of the scrum methodology pertain to communication and work style among the three scrum participants. the frequent, short meetings are densely packed and highly focused. 
each person has only a few minutes to describe what has been accomplished, explain problems encountered, troubleshoot solutions, and share plans for the next sprint. the return on the time investment of twenty minutes every two weeks is significant—there is no time to waste on issues that do not pertain directly to the projects underway, just completed, or about to start. a further result is that the group’s sense of itself as a team is enhanced. as stated above, the three scrum participants do not all work in the same administrative unit within the library. though they shared frequent communication by email as projects progressed, regular sprint meetings have fostered a closer sense of team. the participants know from sprint to sprint what the others are doing; they can assist one another with problems face-to-face and coordinate with one another so that work segments progress toward production in a logical sequence. with more than a year of experience with scrum, the pilot team has determined that several aspects of the methodology have worked well in our environment. in general, the sprint pattern fits well with existing operating modes. the monthly dlmg meeting, which includes a large and diverse group, provides an opportunity to discuss priorities, review project proposals, establish standards, and make strategic decisions. the bi-weekly sprint meetings dovetail nicely, with one meeting taking place at a midpoint between dlmg meetings, and one just prior to dlmg meetings. this allows the three scrum participants to focus on strategic items during the dlmg meeting but keep a close eye on operational items in between. the scrum methodology has also accommodated the competing priorities that the three participants must balance on an ongoing basis.
there is considerable variation between participants in terms of roles and responsibilities, but the division of work into sprints has given the team greater opportunity to fit production work in with other responsibilities, such as supervision and training; scholarly research and writing; service performed for disciplinary organizations; infrastructure building; and planning, research, and design work for future projects. the two-week sprint duration is a productive time interval during which the team can set and reach incremental goals, whether that is starting and finishing a small project on short notice, making a big push on a large-scale project, or continuing gradual progress on a large, deliberately paced initiative. the brief meetings ensure that participants focus on the previous sprint and the upcoming sprint. there is usually just enough time to discuss accomplishments, goals, and obstacles, with some time left to troubleshoot as necessary. the meeting schedule and structure allow each individual to set his or her own goals so that he or she can make maximum progress during the sprint. this in turn feeds into accountability. there is always an external check on one’s progress—the next meeting comes up in two weeks, creating an effective deadline (which also sometimes corresponds to a project deadline). it becomes easier to stay on task and keep goals in sight with the sprint report looming in a matter of days. at the same time, scrum helps to define each person’s role and clarifies how roles align with each other. some tasks are completely independent, while others must be done in sequence and depend on another’s work. the sprint schedule allows large, complex projects to be divided into manageable pieces so that each sprint can result in a sense of accomplishment, even if it may require many sprint cycles to actually complete a project. this is especially true for large digital initiatives.
for instance, completing the entire project may take a year, but subsets of a collection may be published in phases at more frequent intervals in the meantime.

summary of benefits

● enhanced ability to manage multiple concurrent projects
● published large collections incrementally, increasing responsiveness to users and other stakeholders
● improved team building
● increased communication and accountability among team members

future considerations

based on these outcomes, the team can safely say that it met its objectives for the pilot. one of the reasons that it was feasible to try this when the participants were already highly committed is that the pilot used a small portion of the scrum methodology and was not too rigid in its approach. the team felt that a hybrid of the scrum planning and scrum review meeting held twice a month would provide the benefits without overburdening schedules with additional meetings. there were also plans to have a virtual email check-in every other week to loosely achieve the goals of the daily scrum meeting, that is, to improve communication and accountability. the email check-in fell by the wayside; the team found it wasn’t necessary because there were already adequate opportunities to check in with each other over the course of a two-week sprint. the team has found the sprints and modified scrum meetings to be highly useful and relatively easy to incorporate into their workflows. the next phase of the pilot will implement product backlogs and burn down charts, diagrams showing how much work remains for the team in a single sprint, with the goal of tracking collections’ progress at the item level through each step of the planning, selection, preservation assessment, digitization, metadata provisioning, and publication workflows.

figure 2. hypothetical backlog for the first sprint of a digital collection17

scrum backlogs are arranged on the basis of a task’s perceived benefit for customers. to adapt backlogs for digital collection production work, the backlog task list’s order will instead be based in part on the workflow sequence. for example, pieces from the physical collection must be selected before preservation staff can assess them. additionally, the backlog items will be sequenced according to the materials’ research value or complexity. for instance, the digitization of a folder of significant correspondence from an archival collection would be assigned a higher priority in the backlog than the digitization of newspaper clippings of minor importance from the same collection. or, materials that are easy to scan would be listed in the backlog ahead of fragile or complex items that require more time to complete. this will allow the team to publish the most valuable items from the collection more quickly. according to scrum best practices, backlogs are also appropriately detailed. in the context of digital collection production work, collections’ backlogs would begin with a standard template of high-level activities: materials’ selection, copyright analysis, preservation assessment, digitization, metadata creation, and publication. as the team progresses through backlog items, they will become increasingly detailed. backlogs also evolve. scrum’s ability to respond to change has been one of its strongest assets in this environment, and therefore the backlog’s ability to evolve will make it a valuable addition to the team’s process. for example, materials that a collection manager uncovers and adds to the project late in the process can be easily incorporated into the backlog, or materials in the collection that are needed to support an upcoming instruction session can be moved up in the backlog for the next sprint.
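the ordering scheme just described, workflow stage first and research value second, amounts to a simple sort key. in this illustrative python sketch, the stage names follow the standard template of high-level activities, while the task names and research-value scores are invented for the example:

```python
# order backlog items by workflow stage first, then by research value (higher first)
WORKFLOW = ["selection", "copyright analysis", "preservation assessment",
            "digitization", "metadata creation", "publication"]

def backlog_order(item):
    # item is a (task, workflow_stage, research_value) triple
    task, stage, research_value = item
    return (WORKFLOW.index(stage), -research_value)

items = [
    ("scan newspaper clippings", "digitization", 1),
    ("scan significant correspondence", "digitization", 5),
    ("select materials from archive", "selection", 3),
]
for task, _, _ in sorted(items, key=backlog_order):
    print(task)
```

sorting places selection work ahead of digitization, and within the digitization stage places the significant correspondence ahead of the minor newspaper clippings, so the most valuable items surface first.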
in this way, the backlog will support the team’s goal to nimbly respond to shifting priorities and emerging opportunities.

figure 3. hypothetical burn down chart18

the final relevant feature of a backlog, the “effort estimates,” taken in conjunction with the burn down chart will help the team develop better metrics for estimating the time and resources required to complete a collection. when items are added to the backlog, team members estimate the amount of effort needed to complete each one. the burn down chart illustrates how much work remains and, in general practice, is updated on a daily basis. given that the team has truncated the scrum meeting schedule, this may occur on a weekly basis, but will nonetheless benefit the team in several ways. initially, it will keep the team on track and provide valuable and detailed information for stakeholders on the collections’ progress. as the team accrues old burn down charts from completed collections, they can use the data to hone their ability to estimate the amount of time and resources needed to complete a given project.

conclusion

through the pilot conducted for digital initiatives at cu-boulder libraries, application of aspects of the scrum project management framework has demonstrated significant benefits with no discernible downside. adoption of sprint planning and end meetings resulted in several positive outcomes for the participants. digital collection production has become more regular; work can be underway on more collections simultaneously; and collections are, on average, published more quickly. in addition, communication and cooperation among the sprint pilot participants have increased and strengthened the sense of teamwork among them. the sprint schedule has blended well with existing digital initiatives meetings and workflows, and has enhanced the team’s ability to handle ever-shifting priorities.
additional aspects of scrum, such as product backlogs and burn down charts, will be incorporated into the participants’ workflows to allow them to better track the work done at the item level, provide more detailed information for stakeholders during the course of a project, and predict how much time and effort will be required for future projects. the positive results of this pilot demonstrate the benefits to be gained by looking outside standard library practice and adopting techniques developed in another discipline. given the range of activities performed in libraries, the possibilities to improve workflows and increase efficiency are limitless as long as those doing the work keep an open mind and a sharp eye out for methodologies that could ultimately benefit their work, and in turn, their users. references 1. sueli mara ferreira and denise nunes pithan, “usability of digital libraries,” oclc systems & services: international digital library perspectives 21, no. 4 (2005): 316, doi: 10.1108/10650750510631695. 2. danielle a. becker and lauren yannotta, “modeling a library web site redesign process: developing a user-centered web site through usability testing,” information technology & libraries 32, no. 1 (2013): 11, doi: 10.6017/ital.v32i1.2311. 3. jodi condit fagan, “usability testing of a large, multidisciplinary library database: basic search and visual search,” information technology & libraries 25 no. 3 (2006): 140–41, 10.6017/ital.v25i3.3345. http://dx.doi.org/10.1108/10650750510631695 http://dx.doi.org/10.6017/ital.v32i1.2311 http://dx.doi.org/10.6017/ital.v25i3.3345 information technologies and libraries |december 2015 17 4. panayiotis zaphiris, kulvinder gill, terry h.-y. ma, stephanie wilson and helen petrie, “exploring the use of information visualization for digital libraries,” new review of information networking 10, no. 1 (2004): 58, doi: 10.1080/1361457042000304136. 5. 
benjamin meunier and olaf eigenbrodt, “more than bricks and mortar: building a community of users through library design,” journal of library administration 54 no. 3 (2014): 218–19, 10.1080/01930826.2014.915166. 6. lisa a. palmer and barbara c. ingrassia, “utilizing the power of continuous process improvement in technical services,” journal of hospital librarianship 5 no. 3 (2005): 94–95, 10.1300/j186v05n03_09. 7. javier d. fernández et al., “agile dl: building a delos-conformed digital library using agile software development,” in research and advanced technology for digital libraries, edited by birte christensen-dalsgaard et al. (berlin: springer-verlag, 2008), 398–9, doi: 10.1007/978-3540-87599-4_44. 8. michelle frisque, “using scrum to streamline web applications development and improve transparency” (paper presented at the 13th annual lita national forum, atlanta, georgia, september 30–october 3, 2010). 9. frank h. cervone, “understanding agile project management methods using scrum,” oclc systems & services 27, no. 1 (2011): 19, 10.1108/10650751111106528. 10. pete deemer, gabrielle benefield, craig larman, and bas vodde, “the scrum primer: a lightweight guide to the theory and practice of scrum," (2012), 3-15, www.infoq.com/minibooks/scrum_primer. 11. eliza s. f. cardozo et al., “scrum and productivity in software projects: a systematic literature review” (paper presented at the 14th international conference on evaluation and assessment in software engineering (ease), 2010), 3. 12. cervone, “understanding agile project management,” 18. 13. ibid., 19. 14. sticky shed syndrome refers to the degradation of magnetic tape where the binder separates from the carrier. the binder can then stick to the playback equipment rendering the tape unplayable. 15. frisque, “using scrum.” 16. 
the media lab manager responsible for audio and video digitization did not participate because his lab offers fee-based services to the public and thus has long-established business processes in place that would not have blended easily with sprints. 17. figure 2 is based on an illustration created by mountain goat software, “sprint backlog,” https://www.mountaingoatsoftware.com/agile/scrum/sprint-backlog. 18. figure 3 is adapted from a template created by expert project management, “burn down chart template,” www.expertprogrammanagement.com/wp-content/uploads/templates/burndown.xls. eclipse editor for marc records bojana dimić surla information technology and libraries | september 2012 65 abstract editing bibliographic data is an important part of library information systems. in this paper we discuss existing approaches in developing user interfaces for editing marc records. there are two basic approaches: screen forms that support entering bibliographic data without knowledge of the marc structure, and direct editing of marc records shown on the screen. this paper presents the eclipse editor, which fully supports editing of marc records. it is written in java as an eclipse plug-in, so it is platform-independent. it can be extended for use with any data store. the paper also presents a rich client platform (rcp) application made of the marc editor plug-in, which can be used outside of eclipse.
the practical application of the results is the integration of the rcp application into the bisis library information system. introduction an important module of every library information system (lis) is the one for editing bibliographic records (i.e., cataloging). most library information systems store their bibliographic data in the form of marc records. some of them support cataloging by direct editing of marc records; others have a user interface that enables entering bibliographic data by a user who knows nothing about how marc records are organized. the subject of this paper is user interfaces for editing marc records. it gives software requirements and analyzes existing approaches in this field. as the main part of the paper, we present the eclipse editor for marc records, developed at the university of novi sad as a part of the bisis library information system. the editor uses the marc 21 variant of the marc format. the remainder of this paper describes the motivation for the research, presents the software requirements for cataloging according to marc standards, and provides background on the marc 21 format. it also describes the development of the bisis software system, reviews the literature concerning tools for cataloging, and analyzes existing approaches in developing user interfaces for editing marc records. the results of the research are presented in the final section, which describes the functionality and technical characteristics of the eclipse marc editor. the rich client platform (rcp) version of the editor, which can be used independently of eclipse, is also presented. motivation the motivation for this paper was to provide an improved user interface for cataloging by the marc standard that will lead to more efficient and comfortable work for catalogers. bojana dimić surla (bdimic@uns.ns.ac.yu) is an associate professor, university of novi sad, serbia.
there are two basic approaches in developing user interfaces for marc cataloging. the first approach uses a classic screen form made of text fields and labels with descriptions of the bibliographic data, without any indication of the marc standard. the second approach is direct editing of a record that is shown on the screen. those two approaches will be discussed in detail in “existing approaches in developing user interfaces for editing marc records” below. the current editor in the bisis system is a mixture of these two approaches—it supports direct editing, but data input is done via a text field, which opens on double click.1 the idea presented in this paper is to create an editor that overcomes all drawbacks of previous solutions. the approach taken in creating the editor was direct record editing with real-time validation and no additional dialogs. software requirements for marc cataloging the user interface for marc cataloging needs to support the following functions:
• creating marc records that satisfy constraints proposed by the bibliographic format
• selecting codes for field tags, subfield names, and values of coded elements, such as character positions in the leader and control fields, indicators, and subfield content
• validating entered data
• access to data about the marc format (a “user manual” for marc cataloging)
• exporting and importing created records
• providing various previews of the record, such as catalog cards
background marc 21 as was previously mentioned, the eclipse editor uses the marc 21 variant. marc 21 consists of five formats: bibliographic data, authority data, holdings data, classification data, and community information.2 marc 21 records consist of three parts: record leader, set of control fields, and set of data fields. the record leader content, which follows the ldr label, includes the logical length of the record (first five characters) and the code for record status (sixth character).
after the record leader, there are control fields. every control field is written on a new line and consists of a three-character numeric tag and the content of the control field. the content of a control field can be a single datum or a set of fixed-length bibliographic data. control fields are followed by data fields in the record. every line in the record that contains a data field consists of a three-character numeric tag, the values for the first and the second indicator—or the number sign (#) if an indicator is not defined for the field—and the list of subfields that belong to the field. detailed analysis of marc 21 shows that there are some constraints on the structure and content of the marc 21 record. constraints on the structure define which fields and subfields can appear more than once in the record (i.e., whether the fields and subfields are repeatable or not), the allowed length of the record elements, and all the elements of the record defined by marc 21. constraints on the record content are defined on the content of the leader, indicators, control fields, and subfields. moreover, some constraints connect several elements in the record (when the content of one element depends on the content of another element in the record). an example of a constraint on the structure for data field 016 is that the field has a first indicator whereas the second indicator is undefined. field 016 can have subfields a, z, 2, and 8, of which z and 8 are repeatable. bisis the results presented in this paper belong to the research on the development of the bisis library information system. this system, which has been in development since 1993, is currently in its fourth version. the editor for cataloging in the current version of bisis was the starting point for the development of eclipse, the subject of this paper.
3 apart from an editor for cataloging, the bisis system has a module for circulation and an editor for creating z39.50 queries.4 the indexing and searching of bibliographic records was implemented using the lucene text server.5 as a part of the editor for cataloging, we developed a module for generating various reports and catalog cards from marc records.6 bisis also supports creating an electronic catalog of unimarc records on the web, where the input of bibliographic data can be done without knowing unimarc; the entered data are mapped to unimarc and stored in the bisis database.7 the recent research within the bisis project relates to its extension for managing research results at the university of novi sad. for that purpose, we developed a current research information system (cris) following the recommendations of the nonprofit organization eurocris.8 the paper “cerif compatible data model based on marc 21 format” proposes a data model, compatible with the common european research information format (cerif), that is based on marc 21. in this model, the part of the cerif data model that relates to research results is mapped to marc 21. furthermore, on the basis of this model, a research management system at the university of novi sad was developed.9 the paper “cerif data model extension for evaluation and quantitative expression of scientific research results” explains the extension of cerif for the evaluation of published scientific research. the extension is based on the semantic layer of cerif, which enables classification of entities and their relationships by different classification schemas.10 the current version of the bisis system is based on a variant of the unimarc format. the development of the next version of bisis, which will be based on marc 21, is in progress. the first task was migrating the existing unimarc records.11 the second task is developing the editor for marc 21 records, which is the subject of this paper.
cataloging tools an editor for cataloging is a standard part of a cataloger’s workstation and the subject of numerous studies. lange describes the development of cataloging from handwritten catalog cards, to typewriters (first manual, then electronic), to the appearance of marc records and pc-based cataloger’s workstations.12 leroya and thomas debate the influence of web development on cataloging. they stress that the availability of information on the web, as well as the possibility of having several applications open at the same time in different windows, greatly influences the process of creating bibliographic records. their paper also indicates that there are some problems that result from using large numbers of resources from the web, such as errors that arise from copy-paste methods. consequently, there is a need for an automatic check for spelling errors and the possibility of a detailed review by the cataloger during editing.13 khurshid deals with the general principles of the cataloger’s workstation, its configuration, and its influence on a cataloger’s productivity. in addition to efficient access to remote and local electronic resources, khurshid includes record transfer through a network and sophisticated record editing as important functions of a cataloger’s workstation. furthermore, khurshid says it is possible to improve cataloging efficiency in the windows-based cataloger’s workstation by finding bibliographic records in other institutions and cutting and pasting lengthy parts of the record (such as summary notes) into one’s own catalog.14 existing approaches in developing user interfaces for editing marc records the basic source for this analysis of existing user interfaces for editing marc records was the official marc standards site of the library of congress, in addition to scientific journals and conferences.
the analysis of existing systems shows that there are two basic approaches in the implementation of editing marc records:15
• entering bibliographic data in classic screen forms made of text fields and labels, which does not require knowledge of the marc format (concourse,16 koha,17 j-marc18)
• direct editing of a marc record shown on the screen (marcedit,19 isismarc,20 catalis,21 polaris,22 marcmaker and marcbreaker,23 exlibris voyager24).
both of these approaches have advantages and disadvantages. the drawback of the first approach is that it provides a limited set of bibliographic data to edit, and extending that set implies changes to the application or, in the best case, changes in configuration. another problem is that there are usually a lot of text fields, text areas, combo boxes, and labels on the screen that need to be organized into several tabs or additional windows. this situation usually makes it difficult for users to see errors or to connect different parts of the record when checking their work. moreover, all of the solutions found in the first group perform little validation of the data entered by the user.25 one important advantage of the first approach is that the application can be used by a user who is not familiar with the standard, so the need for access to marc data can be avoided (one of the functions listed above). as for the second approach, editing a marc record directly on the screen overcomes the problem of extending the set of bibliographic data to enter. it also enables users to scan entered data and check the whole record, which appears on the screen. users can also copy and paste parts of records from other resources into the editor.
however, a majority of those applications are actually editors for marc files that are later uploaded into some database or transformed into some other format (marcedit, marcmaker and marcbreaker, polaris), and they usually support little or no data validation.26 they allow users to write anything (i.e., the record structure is not controlled by the program) and validate only at the end of the process, when uploading or transforming the record. among these editors there are some, such as catalis and isismarc, that present the marc record as a table. they support control of the structure, but a record presented in this way is usually too big to fit on the screen, so it is separated into several tabs. an important function of editing marc records is selecting codes for coded elements, which can be character positions in the leader or a control field, values of the indicators, or values of subfields. there are also field tags and subfield codes that sometimes need to be selected for addition to a record. all of the analyzed editors provide additional dialogs for picking these codes, which requires the user to constantly open and close dialogs, which can sometimes be annoying. one important fact about editors in the second group is that they can be used only by a user who is familiar with marc, so access to a large set of marc element descriptions can make the job easier. some of the mentioned systems provide descriptions of the fields and subfields (e.g., isismarc), but most of them do not. findings the editor for marc records was developed as a plug-in for eclipse; therefore it is similar to eclipse’s java code editors. as the editor is written in java, it is platform-independent. the main part of the editor was created using the oaw xtext framework for developing textual domain-specific languages.27 it was created using model-driven software development, by specifying the model of the marc record in the form of an xtext grammar and generating the editor.
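the field 016 rule quoted earlier (first indicator defined, second undefined, subfields a, z, 2, and 8, of which only z and 8 are repeatable) is typical of the structural constraints such a specification has to capture. a minimal sketch of the corresponding check, using a simplified data model that is an assumption here, not the bisis implementation:

```java
import java.util.*;

/** sketch of a structural check for marc 21 data field 016, following the
 *  constraints quoted in the text: the second indicator is undefined
 *  (written '#'), subfields a, z, 2, 8 are allowed, and only z and 8 are
 *  repeatable. the data model is a simplified assumption. */
public class Marc016Check {
    private static final Set<Character> ALLOWED = Set.of('a', 'z', '2', '8');
    private static final Set<Character> REPEATABLE = Set.of('z', '8');

    public static List<String> validate(char ind2, List<Character> subfields) {
        List<String> errors = new ArrayList<>();
        if (ind2 != '#')
            errors.add("second indicator of 016 is undefined; use '#'");
        Map<Character, Integer> counts = new HashMap<>();
        for (char code : subfields) {
            if (!ALLOWED.contains(code))
                errors.add("subfield $" + code + " is not defined for 016");
            counts.merge(code, 1, Integer::sum);
        }
        for (Map.Entry<Character, Integer> e : counts.entrySet())
            if (e.getValue() > 1 && !REPEATABLE.contains(e.getKey()))
                errors.add("subfield $" + e.getKey() + " is not repeatable");
        return errors;
    }

    public static void main(String[] args) {
        // repeated $z with undefined second indicator: no errors
        System.out.println(validate('#', List.of('a', 'z', 'z')));
        // repeated $a and a defined second indicator: two errors
        System.out.println(validate('1', List.of('a', 'a')));
    }
}
```

in the generated editor such checks come from the grammar and constraint specification rather than hand-written code, which is what makes the editor adjustable by changing the specification alone.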
all the main characteristics of the editor were generated on the basis of the specification of constraints and extensions of the xtext grammar—therefore all changes to the editor can be realized by changing the specification. moreover, this editor can easily be adjusted for any database by using the concept of extensions and extension points in the eclipse plug-in architecture. we made the application independent of eclipse by using rich client platform (rcp) technology. the editor is implemented for the marc 21 bibliographic and holdings formats. user interface figure 1 shows the editor opened within eclipse. the main area is marked with “1”—it shows the marc 21 file that is being edited. that file contains one marc 21 bibliographic record. the tags of the fields and the subfield codes are highlighted in the editor, which contributes to presentation clarity. the area marked with “2” serves for listing the errors in the record, that is, nonvalid elements entered in the record. the area marked with “3” shows data about marc 21 in tree form. this part of the screen has two other possible views: a marc 21 holdings format tree and a navigator, which is the standard eclipse view for browsing the resources of the opened project. the actions available for creating a record are available in the cataloging menu and on the cataloging toolbar, which is marked with “4.” these are actions for previewing the catalog card, creating a new bibliographic record, loading a record from a database (importing the record), uploading a record to a database (exporting the record), and creating a holdings record for the bibliographic record. figure 1. eclipse editor for marc records in the eclipse editor for marc, selecting codes is enabled without opening additional dialogs or windows (figure 2). this is the standard eclipse mechanism for code completion: typing ctrl + space opens a dropdown list with all possible values for the cursor’s current position.
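the ctrl + space mechanism described above amounts to looking up the set of legal codes for the element under the cursor. a minimal sketch of such a lookup; the context keys and the code tables are tiny illustrative samples assumed for the example, not the full marc 21 code lists:

```java
import java.util.*;

/** sketch of a completion-proposal lookup like the one described for the
 *  editor: given the record element under the cursor, return the codes the
 *  dropdown should offer. the tables are small illustrative samples, not
 *  the full marc 21 code lists. */
public class CompletionSketch {
    private static final Map<String, List<String>> PROPOSALS = Map.of(
        // record status, leader position 5 (sample values)
        "leader/05", List.of("a - increase in encoding level",
                             "c - corrected or revised",
                             "n - new"),
        // first indicator of field 016 (sample values)
        "016/ind1", List.of("# - library and archives canada",
                            "7 - source specified in subfield $2"));

    public static List<String> proposalsFor(String context) {
        return PROPOSALS.getOrDefault(context, List.of());
    }

    public static void main(String[] args) {
        System.out.println(proposalsFor("016/ind1"));
    }
}
```

in the real editor these tables are derived from the grammar specification, so adding a code list requires no change to the completion machinery itself.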
figure 2. selecting codes record validation is done in real time, and every violation is shown while editing (figure 3). figure 3 depicts two errors in the record: one is a wrong value in the second character position of control field 008, and the other is that two 100 fields were entered, although this field cannot be repeated in a record. figure 3. validation errors rcp application of the cataloging editor as shown above, the editor is available as an eclipse plug-in, which raises the question of what a cataloger will do with all the other functions of the eclipse integrated development environment (ide). as seen in figures 1 and 3, there are a lot of additional toolbars and menus that are not related to cataloging. the answer lies in rcp technology, which generates independent software applications on the basis of a set of eclipse plug-ins.28 the main window of an rcp application with additional actions is shown in figure 4. besides the cataloging menu that is shown, the window also contains the file menu, which includes the save and save as actions, as well as the edit menu, which includes the undo and redo actions. all of these actions are also available via the toolbar. figure 4. rcp application conclusion the goal of this paper was to review current user interfaces for editing marc records. we presented the two basic approaches in this field and analyzed the advantages and disadvantages of each. we then presented the eclipse marc editor, which is part of the bisis library software system. the idea behind the editor is inputting structured marc data in a form similar to programming language editors. the author did not find this approach in the accessible literature. the rcp application of the presented editor will find practical application in future versions of the bisis system.
it represents an upgrade of the existing editor and a starting point for forming the version of the bisis system that will be based on marc 21. the acquired results can also be used for the input of other data into the bisis system, including data from the cris system used at the university of novi sad. this paper shows that eclipse plug-in technology can be used for creating end-user applications. development with the plug-in technology enables the use of a large library of ready-made components from the eclipse user interface, so that much source-code writing is avoided. additionally, the plug-in technology enables the development of extendible applications by using the concept of the extension point. in this way, we can create software components that can be used by a great number of different information systems. by using the concept of the extension point, the editor can be extended with functions that are specific to a data store. an extension point was created for the export and import of marc records, which means the marc editor plug-in can be used with any database management system by extending this extension point. future work on the eclipse marc editor is to implement support for the additional marc formats: authority data, classification data, and community information. these formats prescribe the same record structure but have different constraints on the content and different sets of fields and subfields, as well as different codes for character positions and subfields. therefore the appearance of the editor will remain the same; the only difference will be the specification of the constraints and the codes for code completion. another interesting topic for discussion is the implementation of other modules of library information systems in eclipse plug-in technology. references 1.
bojana dimić and dušan surla, “xml editor for unimarc and marc21 cataloging,” electronic library 27 (2009): 509–28; bojana dimić, branko milosavljević, and dušan surla, “xml schema for unimarc and marc 21 formats,” electronic library 28 (2010): 245–62. 2. library of congress, “marc standards,” http://www.loc.gov/marc (accessed february 19, 2011). 3. dimić and surla, “xml editor”; dimić, milosavljević, and surla, “xml schema.” 4. danijela tešendić, branko milosavljević, and dušan surla, “a library circulation system for city and special libraries,” electronic library 27 (2009): 162–68; branko milosavljević and danijela tešendić, “software architecture of distributed client/server library circulation,” electronic library 28 (2010): 286–99; danijela boberić and dušan surla, “xml editor for search and retrieval of bibliographic records in the z39.50 standard,” electronic library 27 (2009): 474–95. 5. branko milosavljević, danijela boberić, and dušan surla, “retrieval of bibliographic records using apache lucene,” electronic library 28 (2010): 525–36. 6. jelena rađenović, branko milosavljević, and dušan surla, “modelling and implementation of catalogue cards using freemarker,” program: electronic library and information systems 43 (2009): 63–76. 7. katarina belić and dušan surla, “model of user friendly system for library cataloging,” comsis 5 (2008): 61–85; katarina belić and dušan surla, “user-friendly web application for bibliographic material processing,” electronic library 26 (2008): 400–410; eurocris homepage, www.eurocris.org (accessed february 21, 2011). 8. dragan ivanović, dušan surla, and zora konjović, “cerif compatible data model based on marc 21 format,” electronic library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1906945. 9.
eurocris, “common european research information format,” http://www.eurocris.org/index.php?page=cerifreleases&t=1 (accessed february 21, 2011); dragan ivanović et al., “a cerif-compatible research management system based on the marc 21 format,” program: electronic library and information systems 44 (2010): 229–51. 10. gordana milosavljević et al., “automated construction of the user interface for a cerif-compliant research management system,” the electronic library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1954429; dragan ivanović, dušan surla, and miloš racković, “a cerif data model extension for evaluation and quantitative expression of scientific research results,” scientometrics 86 (2010): 155–72. 11. gordana rudić and dušan surla, “conversion of bibliographic records to marc 21 format,” electronic library 27 (2009): 950–67. 12. holley r. lange, “catalogers and workstations: a retrospective and future view,” cataloging & classification quarterly 16 (1993): 39–52. 13. sarah yoder leroya and suzanne leffard thomas, “impact of web access on cataloging,” cataloging & classification quarterly 38 (2004): 7–16. 14. zahiruddin khurshid, “the cataloger’s workstation in the electronic library environment,” electronic library 19 (2001): 78–83. 15. library of congress, “marc standards,” http://www.loc.gov/marc (accessed february 19, 2011). 16. book systems, “concourse software product,” http://www.booksys.com/v2/products/concourse (accessed february 19, 2011). 17. koha library software community homepage, http://koha-community.org (accessed february 19, 2011). 18.
wendy osborn et al., “a cross-platform solution for bibliographic record manipulation in digital libraries” (paper presented at the sixth iasted international conference on communications, internet and information technology, july 2–4, 2007, banff, alberta, canada). 19. terry reese, “marcedit—your complete free marc editing utility,” http://people.oregonstate.edu/~reeset/marcedit/html/index.php (accessed february 19, 2011). 20. united nations educational, scientific and cultural organization, “isismarc,” http://portal.unesco.org/ci/en/ev.php-url_id=11041&url_do=do_topic&url_section=201.html (accessed february 19, 2011). 21. fernando j. gómez, “catalis,” http://inmabb.criba.edu.ar/catalis (accessed february 19, 2011). 22. polaris library systems homepage, http://www.gisinfosystems.com (accessed february 19, 2011). 23. library of congress, “marcmaker and marcbreaker user’s manual,” http://www.loc.gov/marc/makrbrkr.html (accessed february 19, 2011). 24. exlibris, “exlibris voyager,” http://www.exlibrisgroup.com/category/voyager (accessed february 19, 2011). 25. book systems, “concourse software product.” 26. bonnie parks, “an interview with terry reese,” serials review 31 (2005): 303–8. 27. eclipse.org, “xtext,” http://www.eclipse.org/xtext (accessed february 19, 2011). 28. the eclipse foundation, “rich client platform,” http://wiki.eclipse.org/index.php/rich_client_platform (accessed february 19, 2011).
methods of randomization of large files with high volatility 79 patrick c. mitchell: senior programmer, washington state university, pullman, washington, and thomas k. burgess: project manager, institute of library research, university of california, los angeles, california key-to-address conversion algorithms which have been used for a large, direct access file are compared with respect to record density and access time. cumulative distribution functions are plotted to demonstrate the distribution of addresses generated by each method. the long-standing practice of counting address collisions is shown to be less valuable in judging algorithm effectiveness than considering the maximum number of contiguously occupied file locations. the random access disk file used by the washington state university library acquisition sub-system is a large file with a sizable number of records being added and deleted daily. this file represents not only materials on order by the acquisitions section, but all materials which are in process within the technical services area of the library. the size of the file currently varies from approximately 12,000 to 15,000 items and has a capacity of 18,000 items. over 40,000 items are added and purged annually. each record consists of both fixed-length fields and variable-length fields. fixed fields primarily contain quantity and accounting information; the variable-length fields represent bibliographic data.
records are blocked at 1,000 characters for file-structuring purposes; however, the variable-length information is treated as strings of characters with delimiters. the key to the file is a 16-character structure which is developed from the purchase order number. the structure of the key is as follows: six digits of the original purchase order number, two digits of partial order and credit information, and eight digits containing the computed relative record address. 80 journal of library automation vol 3/1 march, 1970 proper development of this key turns out to be the most important factor in achieving efficiency in both file access time and record density within the file. the w.s.u. purchase order numbering system, developed from a basic six-digit purchase order number, allows up to one million entries. of these, the library currently uses four blocks: one block for standing orders, one block for orders originating from the university after the system becomes operational, another block used by the systems people in prototype testing of the system, and a fourth block which was given to one vendor who operates an approval book program. in mapping a possible million numbers into eighteen thousand disk locations, there is a high probability that the disk addresses for more than one record will be the same. disk location—also called disk address, home position, and relative record address (rra) in this paper—refers to the computed offset address of a record in the file, relative to the starting address of the file. currently, the file resides on an ibm 2316 disk pack which can store six 1,000-character records per track. thus if the starting address of the file is track 40, a record with rra = 5 would have its home position on track 40, while a record with rra = 6 would have its home position on track 41.
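the home-position arithmetic described above can be sketched in a few lines; the starting track and blocking factor below are the values quoted in the text:

```java
/** sketch of the home-position arithmetic described in the text:
 *  six 1,000-character records fit on one ibm 2316 track, so a record's
 *  home track is the file's starting track plus rra / 6. */
public class HomePosition {
    static final int RECORDS_PER_TRACK = 6;

    public static int homeTrack(int startTrack, int rra) {
        return startTrack + rra / RECORDS_PER_TRACK;
    }

    public static void main(String[] args) {
        // the examples from the text: the file starts at track 40
        System.out.println(homeTrack(40, 5)); // rra = 5 -> track 40
        System.out.println(homeTrack(40, 6)); // rra = 6 -> track 41
    }
}
```

this is also why the routines need never compute an absolute track address themselves: the rra alone, together with the file's starting address, determines the home position.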
it should be noted that the routines in this system are required to calculate neither absolute track addresses nor relative track addresses, and therefore the file could be moved to any direct access device supported by os/bdam without program modification. when two records map into the same address, it is called a collision. for a write statement under the ibm 360 operating system basic direct access method, the system locates the disk address generated, and if another record is found there, it sequentially searches from that point forward until a vacant space is found and then stores the new record in that space. the sequential search is done by a hardware program in the i/o channel and proceeds at the rotational speed of the device on which the file resides. the cpu is free during this period to service other users. similarly, when searching for a record, the system locates the disk address and matches keys; if they do not match, it sequentially searches forward from that point. long sequential searches sharply degrade the operating efficiency of on-line systems. in initial experimentation with this file, it was discovered that some records were 2,500 disk positions away from their computed locations. this seriously reduced response time to the terminals which were operating against those records. the necessity of developing a method for placing each record close to its calculated location became quite obvious. however, the methodology for doing this was not as clear. the upper-bound delay for a direct access read/write operation can be defined as the largest number of contiguously occupied record locations within the file. the problem of minimizing this upper bound for a particular file is equivalent to finding an algorithm which maps the keys in such a way that unoccupied locations are interspersed throughout the file space. one method for doing this is to triple the amount of space required for the file.
this has been a traditional approach but is unsatisfactory in terms of its efficiency of space utilization. the method first used by the library was motivated by the necessity to "get on the air." its requirements were that it be easily implemented and perform to a reasonable degree. the prime modulo scheme seemed to qualify and was selected. under this algorithm, the largest prime number within the file size was divided into the purchase order number and the modulo remainder was used as an address; that is, rra = po modulo pr, where rra is the relative record address, po is the purchase order number, and pr is a prime number. during the initial period the file grew to about 8,000 records. because the acquisitions section was converting from its manual operation, the file continued to grow in size and the collision problem became pronounced. when the file reached about 70% capacity (that is, when 70% of the space allocated for the file was occupied by records), this method became unusable; records were then located so far from their original addresses that terminal response times became degraded and batch process routines began to show significant increases in run times. with no additional space available to expand the size of the file, it became necessary to increase the record density within the existing file bounds. therefore an adaptation of the original algorithm was developed.
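a quick sketch of the prime modulo scheme shows why it clusters badly on this kind of input: sequential purchase order numbers produce sequential addresses, which pack records into contiguous runs. the prime chosen below is illustrative, not the one actually used.

```python
# Sketch of the prime modulo scheme: rra = po mod pr. With sequential
# purchase order numbers the generated addresses are themselves
# sequential, so records pack into contiguous runs.
PR = 17989  # a prime within the ~18,000-location file size; illustrative

def rra_prime(po: int) -> int:
    return po % PR

addrs = [rra_prime(po) for po in range(600000, 600010)]
# sequential input -> sequential addresses, i.e. densely clustered
assert addrs == list(range(600000 % PR, 600000 % PR + 10))
```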
in addition to generating the original number by dividing a prime number into the purchase order number and keeping the modulo remainder, the purchase order number was multiplied by 300 and divided by that same prime number to get an additional modulo remainder; the latter was added to the first modulo remainder and the sum then divided by 2: rra = [(po modulo pr) + (300 · po modulo pr)] / 2. again this scheme brought some relief, but the file continued to grow as the system was implemented, and it became obvious that this procedure would also fail because of over-crowded areas in the file. a search of the literature, using w. b. climenson's chapter on file structure (2) as a start, provided some other methods for reducing the collision problem (1, 3, 4, 5, 6). several randomization or hashing schemes were examined; however, none of these methods appeared to be particularly pertinent to the set of conditions at washington state. in order to bring relief from the continuing problem of file and program maintenance involved with changing the file-mapping algorithm, research was initiated to devise an algorithm which would, independent of the input data, map records uniformly across the available file space. the algorithm which resulted utilizes a pseudo-random number generator, rand (7), developed at the w.s.u. computing center (randl, program 360l-13.5.004, computing center library, washington state university, pullman, washington). the normal use of rand is to generate a sequence of uniformly distributed integers over the interval [1, m], where m is a specified upper bound in the interval [1, 2^31 - 1]. in addition to m, rand has a second input parameter, n, which is the last number generated by rand. given m and n, rand generates a result r.
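the adapted averaging formula can be written out directly; integer division stands in for whatever truncation the original program performed, and the prime is again illustrative.

```python
# The adapted algorithm described above: average of the plain modulo
# remainder and a scaled (x300) modulo remainder.
PR = 17989  # illustrative prime within the file size

def rra_adapted(po: int) -> int:
    return ((po % PR) + (300 * po % PR)) // 2

# each remainder is < PR, so their average is also a valid address < PR
assert all(0 <= rra_adapted(po) < PR for po in range(600000, 600100))
```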
rand is used by the algorithm to generate relative disk addresses by setting m to the size or capacity of the file, setting n to the purchase order number of the record to be located, and using r as the relative address of the record: rra = rand(po, m). in order to test the effectiveness of this algorithm and others which might be devised, a file simulation program, bdamsim, was written (program 360l-06.7.008, computing center library, washington state university, pullman, washington). inputs to this program are: a) an algorithm to generate relative record locations; b) a sequential file which contains the input data for "a"; c) various scalar values such as file capacity, approximate number of records in the file, title of output, etc. the program analyzes the numbers generated by "a" operating on "b" within the constraints of "c". the outputs of the program are some statistical results and a graphical plot showing the cumulative distribution function of the generated addresses. figures 1, 2, and 3 show the plotted output of the three algorithms operating against the current acquisitions file; the abscissas of the plots are the relative record addresses (x 10^2). [figures not reproduced: fig. 1. rra = po modulo pr. fig. 2. rra = ((po modulo pr) + (300 x po modulo pr)) / 2. fig. 3. rra = rand(po, pr).] while any abandoned cluster (14,692,237 out of 24,030,176!) was erroneously described as follows: this xml empty statement omits the specific information about the abandoned cluster.
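a rough analogue of the simulation program just described can be sketched in a few lines: feed it an address-generating algorithm and a stream of keys, and report how the generated addresses distribute over the file space. all names and sizes below are illustrative; the original bdamsim was an os/360 program.

```python
# Minimal bdamsim-style analyzer: given an algorithm "a", input keys
# "b", and a file capacity "c", report occupancy and the longest
# contiguous occupied run (the upper-bound delay defined earlier).
from collections import Counter

def simulate(algorithm, keys, capacity):
    hits = Counter(algorithm(k) % capacity for k in keys)
    longest_run = run = 0
    for slot in range(capacity):
        run = run + 1 if hits[slot] else 0
        longest_run = max(longest_run, run)
    return {"occupied": len(hits), "longest_run": longest_run}

# sequential keys under a prime modulo scheme form long contiguous runs
stats = simulate(lambda k: k % 101, range(500, 520), capacity=101)
assert stats == {"occupied": 20, "longest_run": 15}
```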
to obtain this invaluable information again, we filed a bug by email.29 the decision taken was drastic: starting in may 2020, viaf stopped including this information in its monthly dump, as stated at the bottom of the page itself.30 as a result, the only recourse available to viaf contributors, or to any other institution that would synchronize their authority records with viaf identifiers, is to rely on an external identification tool such as wikidata!
[information technology and libraries, june 2021. beyond viaf | bianchini, bargioni, and pellizzari di san girolamo]
materials and methods
any comparison between viaf and wikidata must consider their different content. viaf contains personal name clusters, corporate name clusters, geographic name clusters, and work clusters, whereas wikidata allows items to describe any kind of entity relevant in the universe of discourse of the users' data, irrespective of its bibliographic nature. even if all kinds of viaf clusters are relevant for bibliographic control, this study is limited to the analysis of personal name clusters in viaf and of items having "instance of: human" (p31:q5) in wikidata, because they are by far the most represented in viaf and they can be directly compared.31 some entities, such as mythological persons, legendary persons, etc., that are personal clusters in viaf are not treated as humans in wikidata and belong to other instances (e.g., https://www.wikidata.org/wiki/q95074). a double approach was used to compare viaf and wikidata: first, data analyses of viaf and wikidata were performed, to compare viaf clusters and wikidata items and to investigate their reciprocal relationships (see the data analysis section).
second, a comparison of several general characteristics, such as scope, objectives, philosophy, authority control, and identification, was made based on the respective websites and the available literature to find and highlight differences and similarities. full viaf dumps are available in native xml, rdf, marc-21 xml, or iso-2709 marc-21 (http://viaf.org/viaf/data/). viaf clusters were analyzed using an xml dump published on september 6, 2020 (http://viaf.org/viaf/data/viaf-20200906-clusters.xml.gz). full wikidata dumps are available in xml, json, or rdf.32 however, given the size of the entire dataset, it is much more convenient to create customized rdf dumps using the tool wdumper (https://wdumps.toolforge.org/). all the information (settings, dimension, and date of base dump) about dumps created using wdumper remains traceable (https://wdumps.toolforge.org/dumps). wikidata items were analyzed using a customized rdf dump updated to september 14, 2020 (https://wdumps.toolforge.org/dump/732). the customized dump contains all statements with non-deprecated values33 present in items having both "instance of: human" (p31:q5) in best rank and at least one value of "viaf id" (p214) in best rank. both dumps were parsed using three perl scripts. dumps and scripts were uploaded to zenodo and are all available for analysis and reuse.34 the perl scripts generate json data that are published on the html page http://catalogo.pusc.it/beyond_viaf/, where they are interpreted by javascript scripts in order to populate eight tables: three dedicated to viaf (tables 1–3) and five to wikidata (tables 4–8).
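the authors' perl scripts are not reproduced in the text; purely as an illustration, this is the kind of per-cluster tally such a parser performs on a viaf-style xml dump. the element names below are simplified assumptions, not the actual viaf schema.

```python
# Illustrative tally of sources per cluster, the statistic behind the
# "isolated cluster" counts discussed below (clusters with one id).
import xml.etree.ElementTree as ET

sample = """<clusters>
  <cluster id="100000001"><source>LC</source><source>DNB</source></cluster>
  <cluster id="100000002"><source>ISNI</source></cluster>
</clusters>"""

root = ET.fromstring(sample)
counts = {c.get("id"): len(c.findall("source")) for c in root}
isolated = [cid for cid, n in counts.items() if n == 1]
assert counts == {"100000001": 2, "100000002": 1}
assert isolated == ["100000002"]
```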
in order to select the statements to be analyzed in wikidata items, three sets of relevant properties were found through three distinct sparql queries at the end of september 2020: viaf members (table 5), authority controls related to libraries but not being viaf members (table 6), and biographical dictionaries (table 7).35 at the beginning of october 2020, another sparql query was performed to find all the personal items containing the authority controls related to libraries but not being viaf members (table 6, column 4), without filtering the search to personal items having at least one value of "viaf id" (p214).36
data analysis: viaf clusters and wikidata items
for this paper, two different versions of the data tables were produced: the first version, available at http://catalogo.pusc.it/beyond_viaf/, is a full, commented, and dynamic version of all the tables. within that version, links to the acronyms (such as lc, dnb, sudoc, etc.) of all the viaf contributors and other data providers are available too. static versions of these tables are included in this paper with commentary.
viaf
viaf has 22,099,715 personal clusters, half of which (50.90%; table 1, col. 2) are isolated clusters (i.e., they contain only one id).
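the exact sparql queries used for the selection described above are not reproduced in the text; the sketch below shows the general shape such a query takes, selecting items with "instance of: human" (p31:q5) and a "viaf id" (p214), built as a string so it can be sent to any sparql endpoint.

```python
# Hedged sketch of a wikidata sparql query for personal items having a
# viaf id; the authors' actual queries selected property sets and are
# not shown in the text.
def humans_with_viaf_query(limit: int = 10) -> str:
    return f"""
SELECT ?item ?viaf WHERE {{
  ?item wdt:P31 wd:Q5 ;    # instance of: human
        wdt:P214 ?viaf .   # viaf id
}} LIMIT {limit}"""

q = humans_with_viaf_query()
assert "wdt:P31 wd:Q5" in q and "wdt:P214" in q
```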
the presence of isolated clusters is interesting because it means that those clusters are created based on data coming from just one source. what is more, the percentage of isolated clusters is much higher (71.19%; table 1, col. 12) if just viaf contributors are taken into account (i.e., excluding isolated clusters due to data from other data providers, such as isni). it is worth noting that other data providers can form isolated clusters, with the relevant exception of wikidata (for which viaf uses the acronym wkp), which never appears in isolated clusters (table 1, cols. 7 and 8).
table 1. viaf personal clusters by number of sources [adapted from http://catalogo.pusc.it/beyond_viaf/#tb1]
the total number of ids present in viaf clusters is 51,327,847 (table 2), distributed in 22,099,715 clusters; the most relevant contributors include lc (7,266,628 ids), dnb (5,677,731 ids), sudoc (3,278,189 ids), and nta (2,754,036 ids), while the most relevant other data providers are isni (8,455,814 ids) and wkp (2,148,680 ids) (table 2). apart from lc and dnb, data about isolated clusters (table 2, col. 5) show that the number of isolated clusters tends to decrease slowly over time and that clustering has improved: recently added sources tend to have a higher share of isolated ids. another relevant figure is that sources in non-latin alphabets usually have higher shares of isolated ids.37 thus, a high number of isolated clusters may reveal a source whose records still partially need to be gathered into existing clusters.
table 2.
viaf personal clusters by source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb2]
the histories of viaf clusters, as contained in the xml dumps, appear weird and incoherent. for example, many viaf contributors in their first year of appearance seem to have no additions and many removals (e.g., the bav row; for complete information see table 3 on the website at http://catalogo.pusc.it/beyond_viaf/#tb3). this incoherence is due to the absence of redirected and abandoned clusters in the data. nevertheless, the histories allow us to reconstruct the year of first contribution of each source (information otherwise unavailable) and to detect major changes in the data provided to viaf by each source.38
table 3. viaf history of personal clusters by source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb3]
wikidata
wikidata has 8,304,947 personal items, 2,061,046 of which contain a viaf id. usually one or more viaf sources are extracted from the viaf id(s), so that 1,905,470 personal items containing a viaf id have at least one viaf source id (table 4, col. 1). wikidata records ids from a wide range of other resources, such as non-viaf bibliographic agencies and biographical dictionaries (investigated in these tables), but also encyclopedias and various online databases. considering the 2,061,046 items containing a viaf id, 684,367 items contain only one viaf source id (table 4, col. 1), but only 353,710 items contain only one id among viaf source ids, non-viaf source ids, and biographical dictionary ids (table 4, col. 15); so, more than 300,000 items containing only one viaf source id have at least one non-viaf source id and/or one biographical dictionary id.
table 4. wikidata personal items (pers. it.)
by number of ids [adapted from http://catalogo.pusc.it/beyond_viaf/#tb4]
viaf and wikidata: a data comparison
from a quantitative perspective, wikidata personal items (8,304,947) are 37.58% of viaf personal clusters (22,099,715), while wikidata personal items having a viaf id (2,061,046) are 9.26%. ids from viaf sources present in wikidata personal items containing a viaf id (6,292,778; table 5, col. 3) are 12.91% of the ids present in viaf personal clusters (48,740,933; table 5, col. 4). in the authors' opinion, a quantitative comparison between viaf and wikidata must be considered carefully. it could be argued that this is a noticeable disadvantage of wikidata with respect to viaf, but that would be right only from a bibliographic control perspective, and the other side of the coin must be examined too. as wikidata represents any kind of entity relevant for its users (libraries, archives, museums, and many other stakeholders), viaf covers only about a quarter of wikidata personal items. furthermore, a very large part of the personal entities represented in wikidata (at present, more than 6,200,000, i.e., about 75%) cannot rely on viaf for identification purposes (for example, because wikidata personal items can also represent singers, lawyers, pilots, and so on). it can be concluded that, in the domain of the semantic web and with respect to the objectives of wikidata, viaf can be considered just one specialized source. considering single viaf sources, wikidata surpasses viaf by number of ids in only two cases, perseus (135.18%) and simacob (102.17%) (table 5, col. 5). this is possible because wikidata and viaf gather different sets of data from both sources: the former uses sets of data obtained by its users, while the latter uses only data sent by the contributor. all the other sources, because of the absence of systematic imports, are much rarer in wikidata than in viaf.
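the headline percentages above follow directly from the counts reported in the text and can be checked with a few lines of arithmetic:

```python
# Figures taken from the text; the percentages are simple ratios.
viaf_clusters = 22_099_715     # viaf personal clusters
wd_personal_items = 8_304_947  # wikidata personal items
wd_source_ids = 6_292_778      # viaf-source ids in wikidata items (table 5)
viaf_ids = 48_740_933          # ids in viaf personal clusters (table 5)

assert round(100 * wd_personal_items / viaf_clusters, 2) == 37.58
assert round(100 * wd_source_ids / viaf_ids, 2) == 12.91
```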
table 5. wikidata personal items (pers. it.) by viaf source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb5]
table 6 and table 7 show authority control in wikidata leaving aside viaf. wikidata contains some non-viaf sources (usually non-national libraries or groups of libraries which could not become viaf contributors); their ids in personal items having a viaf id (894,161) are 86.04% of their ids in all personal items (958,206; table 6, col. 4), meaning that wikidata provides a clusterization for more than 64,000 ids (6%) probably corresponding to non-existent viaf clusters (table 6, totals).
table 6. wikidata personal items (pers. it.) by non-viaf sources [adapted from http://catalogo.pusc.it/beyond_viaf/#tb6]
table 7. wikidata personal items (pers. it.) by biographical dictionary [adapted from http://catalogo.pusc.it/beyond_viaf/#tb7]
in general, the presence of the ids of biographical dictionaries (796,609 ids in total) in 725,755 personal items having a viaf id helps significantly in the definition of authoritative dates of birth and death (table 7, total of column 2, and table 4, total of column 12).
a comparison between table 1, column 7, and table 2, row wkp (the acronym for wikidata wrongly used by viaf) shows that 2,147,319 clusters contain 2,148,680 wkp ids; it means that, from a viaf point of view, wikidata duplicates are only 1,361. furthermore, a comparison between the total and row 0 in table 8, col. 1, shows that 2,061,046 items contain at least one viaf id and that 2,037,638 items contain exactly one viaf id; so, items containing one or more viaf duplicates number 23,408. as a result, it can be concluded that the percentage of duplicates in wikidata is less than 0.01% and in viaf is about 0.01%, so wikidata is as trustworthy as viaf. viaf and wikidata are not only able to discover reciprocal duplicates, but also duplicates in viaf sources, as shown by a comparison between table 8, col. 3 (the total number of cases in which a viaf source has at least one duplicate) and table 8, col. 5 (the total number of cases in which viaf sources are duplicated). however, while duplicates recorded by viaf are findable only by querying the monthly dumps using in-house programs, duplicates discovered by wikidata are easily findable through sparql queries detecting single-value constraint violations.
table 8. wikidata personal items (pers. it.) by repeated viaf sources and viaf source ids [adapted from http://catalogo.pusc.it/beyond_viaf/#tb8]
discussion
viaf and wikidata are quite different in their purpose, scope, organizational and theoretical approach, and data harvesting and management.
a major difference between viaf and wikidata is in their purpose. on the one hand, viaf aims to identify bibliographic entities and to connect authority data provided by selected contributors (national libraries, cultural agencies, and other major institutions) and extracted from other data providers (such as isni, rism or de663, wikidata, etc.) through the creation of clusters by means of software. on the other hand, like isni, wikidata focuses on both identification and description of entities and has the purpose of collaboratively building a database concerning the sum of all relevant knowledge (provided that each item complies with its notability criteria) using a crowdsourced approach (https://www.wikidata.org/wiki/wikidata:notability). another relevant difference between viaf and wikidata is their scope: while viaf aims to identify a few selected types of entities already described within the bibliographic universe by national agencies, wikidata aims to identify and describe any kind of entity of interest for the wikidata community. wikidata items may exist for any kind of entity and may contain a very broad range of data and of external identifiers.
so, wikidata can represent bibliographic data and entities (e.g., at present wikidata records data for 54% of all the bibliographic sources cited in wikipedia entries), any other kind of entity provided for in viaf (i.e., agents, works, expressions, and places), any other entity defined by the frbr-ifla lrm model (e.g., manifestations, items, timespans, nomens, res, etc.), and entities from other models relevant for the glam universe (such as frbroo and cidoc).39 but it is open to any data model, because it can also include any kind of entity outside the bibliographic or cultural heritage universe: it is a knowledge base capable of containing any kind of statement on any entity users want to describe. in addition, for any kind of entity there is no minimum or maximum number of statements that must or can be added; as soon as an entity is clearly identified, it can be added to wikidata. moreover, when missing, new identifiers (and properties for description) can be proposed by anyone through property proposals and, if well defined, they are usually approved within two weeks (https://www.wikidata.org/wiki/wikidata:property_proposal). a broader scope is supposed to be much more convenient for users who wish to discover previously unknown links and information in the semantic web.
organizational model
due to the viaf top-down approach, data is completely managed by oclc, with no chance for common users or for medium and small libraries or other institutions to directly improve viaf clusters (e.g., by adding other data coming from their collections or from encyclopedias or online databases, merging duplicates, solving conflations, etc.). as the wikidata approach is "to crowdsource data acquisition, allowing a global community to edit the data," data is curated directly by users interested in its creation and use.40 so, in wikidata, data is produced by volunteers, by means of semiautomatic or manual data harvesting from any desired and available source.
moreover, users' statistics show that authoritative data from national bibliographic agencies and other libraries, archives, and museums are normally uploaded by common users, not by librarians (or any other kind of institutional data curator).41
identification function
the theoretical approach differs too, both as to the form of the names and as to the identification function. in viaf, preferred and variant forms of names for persons are based on national cataloguing codes. because national codes differ, viaf is needed, and it works as a neutral hub of all the national preferred forms. cataloguing rules can assure uniformity and univocity to the forms of the names of the entities within a national catalogue, but they are quite complicated for users to understand and use. in ranganathan's words, "the cataloguing conventions are on the surface quite contrary to what mr. everybody is familiar with."42 in contrast, preferred forms in wikidata are based on the international principles of the convenience of the user and common usage.43 a clear example is the use of the direct form of name (jane doe) instead of the inverted form (doe, jane). a different usage in the forms of names could be an issue for the integration of library metadata in wikidata. in practice, however, it is not. first, there is no conflict between the wikidata form and any other form from a theoretical point of view, as the wikidata form is already treated in viaf as the preferred form within its specific context.44 in addition, wikidata accepts any library identifier, so that any library-controlled form can be linked to a wikidata item and vice versa. furthermore, a wikidata bot could be programmed to dump authorized and variant access points from national authority files and add them to the item labels and aliases.45 lastly, it could be argued that national cataloguing codes are compliant with the icp principles and with the convenience of the user and common usage; but a remarkable difference is that, while in national codes the principles are applied by cataloguers for users, in wikidata they are expressed directly by the users themselves. as the identification function is a major feature of the semantic web, the different approaches of viaf and wikidata to this issue must be underlined. as noted, "viaf remains neutral towards differences in the cataloguing policy of its data contributors" and, for this reason, viaf accepts all ids provided by its sources, even when they are not clearly identifiable entities but are just labels (see for example https://viaf.org/viaf/307171748 or https://viaf.org/viaf/305052259).46 on the contrary, wikidata explicitly requires each item to refer to "a clearly identifiable conceptual or material entity" (second notability criterion; https://www.wikidata.org/wiki/wikidata:notability). as a consequence, many isolated clusters formed by viaf on the basis of single contributors' ids related to not-clearly-identifiable entities are not acceptable in wikidata and remain unlinked. moreover, data on cluster duplication show that identification in wikidata is performed at the same quality level as in viaf. clusters for identification purposes are created both in viaf and in wikidata; but, unlike in viaf, in wikidata external identifiers (like all the other data) are not provided in a structured way by national libraries or other institutions (with very few exceptions); instead, identifiers are usually found and added by common users through web scrapers and after data cleaning. what is more, matches are not performed automatically but semiautomatically (through tools such as openrefine or mix'n'match; https://openrefine.org/ and https://mix-n-match.toolforge.org/) or manually.
an enhanced feature of wikidata in clusterization is that it records a wider variety of sources and their ids: due to its openness, wikidata refers to viaf and its sources, but also to any other library or cultural institution, and to a large number of reference sources such as encyclopedias and biographical dictionaries (table 7). a wider variety of identification sources and manual work assure a higher level of identification.
data quantity
data harvesting affects both the quantity and the quality of data. in viaf, data are collected from periodical contributions of viaf participants, with very large sets of data. therefore, from a quantitative point of view, viaf has a far larger number of people (22,099,715 personal clusters) in comparison with wikidata (8,304,947 personal items). even though wikidata was created in 2012, the number of personal items in wikidata is currently only just over a third (37%) of all viaf personal clusters. although the quantities are not directly comparable due to the different universes to be described, in the last few years initiatives to enhance organized cooperation between libraries and wikidata and to promote data production in wikidata have been increasing. a very high-quality initiative is supported by cornell university, harvard university, stanford university, and the university of iowa's school of library and information science, in collaboration with the library of congress and the program for cooperative cataloging (pcc).
their linked data for production (ld4p) wikidata project is "an in-depth exploration of how wikidata could serve as a platform for publishing, linking, and enriching library linked data" (https://www.wikidata.org/wiki/wikidata:wikiproject_linked_data_for_production). an additional example is the ifla wikidata working group, formed "to explore and advocate for the use of and contribution to wikidata by library and information professionals, the integration of wikidata and wikibase with library systems, and alignment of the wikidata ontology with library metadata formats such as bibframe, rda, and marc" (https://www.ifla.org/node/92837). even so, wikidata is still very far from having a structured workflow to ingest data from national or local libraries, museums, and archives. while the projects mentioned above are mainly dedicated to explaining to librarians and institutions why wikidata is important and how to contribute to it, there are still very few projects mainly dedicated to the concrete, massive synchronisation of library and bibliographic data with wikidata; such projects also require a relevant effort in the manual cleaning of discrepancies and oddities emerging from the synchronisation. relevant exceptions are the national library of wales47 and the biblioteca europea di informazione e cultura, where significant work has been done to synchronise the respective databases of authors (and of other types of entities) with wikidata.48
data quality
data quality also needs to be analyzed in detail.
even if data from national libraries are authoritative and of high quality, as a virtual file viaf neither has nor produces its own data. consequently, viaf data do not always remain authoritative, because errors can be both inherited and added, and clusters can be duplicated. the issue is well known to isni, which "whenever necessary [. . .] splits and merges data coming from viaf, and even applies protection to data that has been fixed manually."49 as shown in table 2 and table 8, viaf clusters are subject to isolation and duplication when they are created, and to many changes and updates when they are maintained. so, even if viaf collects a huge amount of authoritative data and creates clusters of ids, viaf users cannot always safely and continuously rely on them. data flows in just one direction (from national libraries to viaf), viaf deletes and rebuilds clusters without giving priority to the stability of one cluster over another, and, after april 2020, viaf no longer makes available to users a record of its changes.50 on the contrary, wikidata data is always under the strict control of its users, as its structure is designed to trace any minimal change to the data. every single addition or deletion is documented, not just to easily recover from eventual vandalism, but also to support any decision with clear evidence. at any moment, any stakeholder can know exactly whether, how, when, and why data changed. what is more, from a qualitative point of view, wikidata seems to offer a better solution for the recording of authority data than viaf. first, it can store a wider variety of data about a person in a more semantic way: not only is it possible in wikidata to express preferred and variant forms of the name, related names, works, co-authors, publication statistics, and other data about the person (as in viaf), but all these data are expressed in a semantic way.
for example, whereas in viaf "bach, anna magdalena" is just a related name of johann sebastian bach, in wikidata she is recorded and qualified as the person who married the musician. thanks to that different approach, wikidata can represent and show bach's full genealogical tree (https://magnus-toolserver.toolforge.org/ts2/geneawiki/?q=q1339). as adamich noted, "building graphs from bibliographic entities is really about making the data machine readable and understandable. it is about making the data web enabled. in terms of translation, linked data opens up a whole new world over our marc entrapment."51 quality is enhanced by matching methods too; whereas viaf matches identities by an algorithm based on explicit identifiers or string matching (such as the forms of the name, dates, and bibliographic relationships),52 wikidata matches are usually decided by a human, the user, or (in the case of semiautomatic imports) at least checked a posteriori by a human after some time. the higher precision of manual over automatic matching is also recognized in the viaf guidelines.53 furthermore, as seen above, notability requires that, when clear identification is impossible, no item be created in wikidata.

data maintenance and usability
data quality also relies on maintenance. a comparison between wikidata items and viaf clusters shows a very small but constant presence of errors to be fixed in both (around 0.01%), even if it is impossible to determine with certainty whether viaf uses wikidata's error pages.
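the contrast between viaf's algorithmic matching and wikidata's human-checked matching can be made concrete with a toy heuristic: shared identifiers decide first, with string similarity of names (plus a date check) as a fallback. this is a hedged sketch only: the record fields, the date check, and the 0.9 threshold are assumptions for illustration, not viaf's actual algorithm.

```python
# Toy illustration of identifier-first, then string-based matching of
# authority records. This is NOT VIAF's real algorithm; field names,
# sample identifiers, and the similarity threshold are illustrative.
from difflib import SequenceMatcher

def match(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Return True if two authority records likely describe the same agent."""
    # 1. Explicit shared identifiers (e.g., an ISNI) win outright.
    shared = set(rec_a.get("ids", [])) & set(rec_b.get("ids", []))
    if shared:
        return True
    # 2. Otherwise require compatible dates (when both records carry them)
    #    and fall back to string similarity of the preferred name.
    if rec_a.get("dates") and rec_b.get("dates") and rec_a["dates"] != rec_b["dates"]:
        return False
    ratio = SequenceMatcher(None, rec_a["name"].lower(), rec_b["name"].lower()).ratio()
    return ratio >= threshold

bach_dnb = {"name": "Bach, Johann Sebastian", "dates": "1685-1750", "ids": ["isni:example"]}
bach_lc  = {"name": "Bach, Johann Sebastian", "dates": "1685-1750", "ids": []}
wrong    = {"name": "Bach, Anna Magdalena",   "dates": "1701-1760", "ids": []}

print(match(bach_dnb, bach_lc))  # True: identical names, compatible dates
print(match(bach_dnb, wrong))    # False: dates differ
```

a heuristic of this kind conflates homonyms and splits variant forms exactly as described for viaf clusters above, which is why a posteriori human checking, as practiced on wikidata, raises precision.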
issues with fixing viaf errors directly through viaf contributors have already been noted: "while clustering anomalies can be handled by viaf itself, reporting errors found in source data of viaf partners raise problems related to the efficiency of the notification workflows. at this point, involvement of viaf partners themselves in the process is needed."54 on the other hand, in wikidata anyone can edit items, add new data or delete mistakes, merge items, fix various issues, and so on, on the fly. due to its openness, wikidata may also suffer from vandalism, but it has its own solutions.55 along with this, data receive special attention as to their accuracy and reliability, because they are uploaded and maintained by users who are direct stakeholders. for this reason, in wikidata, references to bibliographical or biographical sources and to other data-provider ids, such as any national and international identification system, are suggested, promoted, and carefully examined. moreover, there is a commitment to monitor the consistency of viaf clusters. the ability of wikidata to identify inconsistent viaf clusters, and the fact that viaf isolated clusters can be reduced by at least 30%56 by referring to identifiers from wikidata and other data providers, are the best demonstration of the quality of its data and of the importance of the other data providers in viaf clusterization. as to the usability of data, the internal search of viaf offers little more than basic functions: the only available filter allows results to be limited to clusters having one specific source; on the contrary, filters for clusters having and/or not having a specific group of sources, or for clusters having more or fewer sources, would be very useful, especially in order to find duplicates.
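wikidata's public sparql endpoint supports precisely this kind of source-filtered search. a minimal sketch of composing such a query in python: p213 (isni) and p214 (viaf id) are real wikidata properties, but the query itself is an illustrative assumption of the authors' described workflow and is only printed here, not sent.

```python
# Build a SPARQL query for the Wikidata Query Service that finds persons
# carrying one identifier (here ISNI, P213) but lacking another (here
# VIAF ID, P214) -- the "has source A and/or not source B" filter that
# VIAF's own search interface lacks. No network request is made.
ENDPOINT = "https://query.wikidata.org/sparql"  # public endpoint, unused here

def source_filter_query(having: str, lacking: str, limit: int = 10) -> str:
    """Return a SPARQL query for humans with `having` but without `lacking`."""
    return f"""
    SELECT ?item ?itemLabel WHERE {{
      ?item wdt:P31 wd:Q5 .                      # instance of: human
      ?item wdt:{having} ?id .                   # has the first identifier
      FILTER NOT EXISTS {{ ?item wdt:{lacking} ?x . }}   # lacks the second
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """

query = source_filter_query("P213", "P214")
print(query)
```

sent to the endpoint with any http client, such a query would list persons carrying an isni but no viaf id; swapping in other property numbers reproduces variations like the iccu example cited in note 57.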
in contrast, wikidata has a sparql query service which returns results based on the current status of the database, and its internal search can integrate some of the functions of the query service, allowing users to look for items having and/or not having specific statements (https://www.wikidata.org/wiki/special:search).57 considering cases in which viaf and wikidata discover potential duplicates in their sources, viaf has no page dedicated to listing cases of (supposedly) duplicate ids from its sources, while wikidata makes it easy to find cases in which single sources have (supposedly) duplicate ids, through constraint violations58 and appropriate sparql queries.

a comparison table
a comparison table was built to compare scope, role, system, and functions between viaf and wikidata, inspired by and adapted from a viaf vs. isni comparison.59

table 9. comparison between and complementarity of viaf and wikidata features

scope
viaf: persons; organizations; works; expressions; locations
wikidata: any kind of viaf entity; any "res" of ifla lrm; any entity of cidoc; any other non-glam entity; any entity in the universe of discourse

software
viaf: unknown
wikidata: wikibase60

data. person entity properties
viaf: preferred form of name, based on national cataloguing rules; very rich variant forms of name, identified by national agencies; sources
wikidata: preferred form of name (label), based on convenience of the user and common usage61; variant forms of name (aliases), organized by languages and scripts62; sources (as statements and references, and with qualifiers)

data. quantity (persons)
viaf: number of clusters: 33,656,281 (sept. 2020); number of personal clusters: 22,099,715 (sept. 2020)
wikidata: number of entities: 90,260,081 (oct. 2020); number of personal items: 8,304,947 (oct. 2020); number of personal items with viaf id: 2,061,046 (sept. 2020)

data. harvesting
viaf: data are provided by authoritative national bibliographic agencies
wikidata: data are added through massive semiautomatic imports and/or manually by any interested user

data. quality
viaf: data are granted by authoritative national bibliographic agencies
wikidata: data are controlled by any directly interested user, based on data from viaf, available bibliographic agencies, and other authoritative bibliographic sources

data. other entities properties
viaf: isbn, titles, dates included in the cluster; dates, genre, bibliographic references from sources, xlinks, etc.; properties are unchangeable
wikidata: any kind of property applicable to an entity can be used (multimedia included)63; all statements admit references, which are strongly recommended in some cases; unavailable properties can be freely added through a process of property proposal64

data. dates
viaf: dates are extracted from authority and bibliographic records using a parsing technique; calendars and precision are not available65
wikidata: dates are imported semiautomatically from various sources or filled in manually; different calendars are available and further statements can be made through qualifiers66

data. vandalism
viaf: no vandalism: data are editable only by oclc
wikidata: everyone can edit, but items which are frequently vandalized can be temporarily or permanently protected from the edits of unregistered users67

data. fixing errors, deduplicating, or unmerging clusters/items
viaf: suggestions and requests via email; asynchronous; presumably, automated processes and human interventions; viaf rebuilds clusters and does not give priority to the stability of one cluster over another68
wikidata: everyone can edit69; instantaneous; probable errors (constraint violations) are detected in an automated way (by bots and through queries); pages with lists of probable errors (constraint violations) are freely available and constantly updated in an automated way (by bots)70

data. license
viaf: all public data (license: http://opendatacommons.org/licenses/by/1.0/)
wikidata: all public data (license: https://creativecommons.org/publicdomain/zero/1.0/deed.it)

role
viaf: create clusters; ingest authority records from viaf contributors and other data providers (including wkd and isni); publish and diffuse viaf ids and data
wikidata: create items with a worldwide recognized and standard identifier; interlink items with any available external identifier; ingest data from viaf, from viaf contributors, and from other data providers (e.g., isni); allow the creation and maintenance on toolforge of free tools—e.g., mix'n'match—to ingest external identifiers71; manage library, bibliographic, and non-library and non-bibliographic linked data; publish and diffuse wikidata ids and data

organizational model
viaf: oclc service, guided by the viaf council of participating institutions; hierarchical, top-down; membership on request and subordinated to approval; largely limited to national bibliographic agencies
wikidata: wikimedia project; distributed, bottom-up; everyone can take part in the project72; open to any bibliographic or non-bibliographic institution (national, large, medium, and small)

system. website
viaf: interface in english only
wikidata: interface in nearly any language and script; new ones can be added; online facilities (end-user input; online edit facilities for end users); login enhances users' experience (by gadgets and scripts)

system. updating
viaf: periodical (asynchronous) ingestions
wikidata: continuous, instantaneous, free updates

system. versioning
viaf: history is included in each present cluster and for abandoned clusters; history is inaccessible in redirected clusters
wikidata: page history is available in each item and for redirected items; for deleted items, history is accessible only to administrators

long-term preservation policy
viaf: oclc maintains the hosting, software, and data for viaf73
wikidata: the wikimedia foundation maintains the hosting, software, and data for wikidata74

notifications to stakeholders
viaf: notifications to be sent to data providers
wikidata: notifications are sent to end users and contributors

display, search, and download
viaf: multiple formats: xml and json, including justlinks.json; basic search interface; clusters are listed without a clear ranking rule; integrating monthly dumps; api endpoint75; before april 2020, monthly dumps with persist links; after, monthly dumps without persist links
wikidata: multiple formats: json, php, n3, ttl, nt, rdf, jsonld, html76; search interface77; api endpoint78; sparql query endpoint79; dumps80, also customizable81; see https://www.wikidata.org/wiki/help:about_data

linked data and sru
viaf: linked data; sru82 (search and browse indexes, using cql syntax; output formats are xml or html)
wikidata: linked data

interoperability. local
viaf: local institutions can only reconcile viaf ids to their own data; as changes are made by viaf, synchronization must be periodically performed by sources and local institutions
wikidata: full reconciliation, upload, and synchronization of local ids on wikidata and vice versa; dedicated tools: mix'n'match; other tools: openrefine; bots; manual editing

conclusion
main viaf and wikidata features and personal-entity data were analyzed and compared in this study to focus on analogies and differences, and to highlight their reciprocal role and helpfulness in the worldwide bibliographical context and in the semantic web environment. viaf is a major international initiative to address the challenge of reliably identifying bibliographic agents on the web, by means of authoritative data based on national cataloguing codes and coming from the national libraries involved in the ubc program. moreover, viaf is a pillar of the identification process that users enact within wikidata. still, the comparison emphasized a few relevant issues in viaf's approach, designed more than twenty years ago: a very selective policy regarding the inclusion of its sources—contributors and other data providers—and their participation in the governance, which prevents a worldwide openness of the project to non-national libraries and cultural institutions; an obvious neutrality toward data coming from its contributors, even when data are not compliant with the identification requirements of the semantic web; problems in the correct clustering of ids (duplicate clusters to be merged and conflated clusters to be split); and a one-way flow of data due to its top-down approach that prevents a quick and cooperative workflow
to identify and fix errors; and the ability to identify only a narrow range of entities (i.e., mainly bibliographic entities, and not even all those provided by ifla lrm). on the other hand, the semantic web has offered important new tools and opportunities to libraries, archives, museums, and other cultural institutions, and their data are recognized as a relevant asset for building the backbone of the semantic web as to the control of entities of bibliographic and cultural interest. after eight years of existence, wikidata is playing a relevant role in the publication, aggregation, and control of bibliographic and non-bibliographic information in the semantic web too. it is increasingly indicated as a hub for identifiers in the semantic web.83 wikidata depends on viaf for a large part of the identification work of its items, and viaf's preeminent role in wikidata is acknowledged by its primary position in the identifiers section of each item's data. for this reason, the wikidata community constantly monitors the consistency of viaf clusters and continuously updates lists of errors present in them. in turn, if viaf is undoubtedly very useful to the wikidata community, wikidata can support the consistency of viaf clusters. the wikidata informational ecosystem is much larger and wider, can be built by any interested institution and person, and its identification function can also count on the authority work of national and non-national libraries excluded from the viaf environment, and on authoritative non-bibliographical reference sources too. this study opens some research perspectives. the analysis was limited to data about personal entities, as this kind of entity was the only one directly comparable, while further research is needed to extend the analysis to other kinds of entities.
moreover, more research should be devoted to the treatment of special categories of persons and their names, such as mythological and legendary characters, ancient greek and latin authors, kings, queens, popes, saints, and so on, as the viaf guidelines84 themselves list the clusterization of such names among viaf's typical problems (and such persons often get five or more viaf ids in wikidata). a further line of research should consider the relevance of the clusterization of encyclopedias and other reference sources in the identification process within wikidata. lastly, isolated clusters would need more consideration; as a matter of fact, in this study they were used as a clue to relatively recent uploads in viaf, but lc and dnb show a high rate of isolated clusters too (perhaps due to the richness of their collections and metadata). more research on isolated clusters could help to describe more precisely the possible role of non-national libraries and institutions, and of their locally rich collections, in identifying lesser-known agents (not just persons) in a worldwide perspective. from the analyzed data and direct comparison, it can be concluded that viaf and wikidata can be constantly improved through reciprocal comparison, which allows discovery of errors in both. viaf and wikidata are two relevant tools for authority control in the semantic web, and they each have a specific role to play and different stakeholders. unfortunately, as opposed to the relationship between viaf and isni, at present no aspect of viaf-wikidata interoperability is discussed between the managing structures of the two systems, on either a regular or an irregular basis.
while wikidata appears to be more reliable with regard to the identification process, its most significant weakness lies in its unorganized and unplanned crowdsourced data acquisition, even if it can at present count on about 11,500 active editors.85 furthermore, the wikidata community still lacks the constant support and cooperation of institutional data curators such as librarians, archivists, and museum curators. many current projects are mainly dedicated to explaining to potential institutional stakeholders the importance and usefulness of wikidata for their institutional missions, but there are still too few projects devoted to the massive synchronization of data from institutional silos to wikidata. as soon as these initiatives reach a critical mass, however, wikidata will become the real global hub of the web of data.

acknowledgements
all the authors cooperated in the drafting and revision of the article. nevertheless, each author mainly authored specific sections and subsections:
• stefano bargioni: data analysis; viaf; wikidata; viaf and wikidata: a data comparison.
• carlo bianchini: introduction; discussion; organizational model; identification function; data quantity; data quality; data maintenance and usability.
• camillo carlo pellizzari di san girolamo: relationship between viaf and libraries; relationship between wikidata and academic, research, and public libraries; relationship between viaf and wikidata; wikidata controls on viaf; materials and methods; conclusion.
all authors contributed to the comparison table. the authors wish to thank the anonymous reviewer, whose suggestions helped to improve and enrich the paper, and the editor for his helpful edits.
endnotes

1 thomas baker et al., library linked data incubator group final report, sec. 2 (w3c incubator group, october 25, 2011), http://www.w3.org/2005/incubator/lld/xgr-lld-20111025/.

2 baker et al., library linked data.

3 dorothy anderson, universal bibliographic control: a long term policy—a plan for action (münchen: verlag dokumentation, 1974), 11.

4 anila angjeli, andrew mac ewan, and vincent boulet, "isni and viaf: transforming ways of trustfully consolidating identities," in ifla wlic 2014 (ifla 2014 lyon, ifla, 2014), 2, http://library.ifla.org/985/1/086-angjeli-en.pdf.

5 rick bennett et al., "viaf (virtual international authority file): linking the deutsche nationalbibliothek and library of congress name authority files," international cataloguing and bibliographic control 36, no. 1 (2007): 12–18; barbara b. tillett, the bibliographic universe and the new ifla cataloging principles: lectio magistralis in library science = l'universo bibliografico e i nuovi principi di catalogazione dell'ifla: lectio magistralis di biblioteconomia (fiesole (firenze): casalini libri, 2008), 14–15, http://digital.casalini.it/9788885297814; "viaf. connect authority data across cultures and languages to facilitate research," oclc, 2020, https://www.oclc.org/en/viaf.html.

6 gildas illien and françoise bourdon, "a la recherche du temps perdu, retour vers le futur: cbu 2.0" (paper, ifla wlic 2014, lyon, france, 2014), 13–14, http://library.ifla.org/956/.

7 illien and bourdon, "a la recherche," 15.

8 gordon dunsire and mirna willer, "the local in the global: universal bibliographic control from the bottom up" (paper, ifla wlic 2014, lyon, france, 2014), 11, http://library.ifla.org/817/.

9 luca martinelli, "wikidata: la soluzione wikimediana ai linked open data," aib studi 56, no.
1 (march 2016): 75–85, https://doi.org/10.2426/aibstudi-11434; jesús tramullas, "objetos culturales y metadatos: hacia la liberación de datos en wikidata," anuario thinkepi 11 (2017): 319–21, https://doi.org/10/ghbj63; xavier agenjo-bullón and francisca hernández-carrascal, "wikipedia, wikidata y mix'n'match," anuario thinkepi 14 (2020), https://doi.org/10/ghbj6t; claudio forziati and valeria lo castro, "the connection between library data and community participation: the project share catalogue-wikidata," jlis.it 9, no. 3 (2018): 109–20, https://doi.org/10/ggxj9n; adrian pohl, "was ist wikidata und wie kann es die bibliothekarische arbeit unterstützen?," abi technik 38, no. 2 (2018): 208, https://doi.org/10/ghbj6w; arl white paper on wikidata: opportunities and recommendations (the association of research libraries, 2019), https://www.arl.org/wp-content/uploads/2019/04/2019.04.18-arl-white-paper-on-wikidata.pdf; regine heberlein, "on the flipside: wikidata for cultural heritage metadata through the example of numismatic description" (paper, ifla wlic 2019, libraries: dialogue for change, session 206: art libraries with subject analysis and access, athens, greece, august 28, 2019), http://library.ifla.org/2492/1/206-heberlein-en.pdf.

10 arl white paper on wikidata, 27–30; theo van veen, "wikidata: from 'an' identifier to 'the' identifier," information technology and libraries 38, no.
2 (2019): 72–81, https://doi.org/10/ghbj62; hilary thorsen, "ld4p: linked data for production: wikidata as a hub for identifiers" (slideshow presentation, june 11, 2020), https://docs.google.com/presentation/d/1jwz3_ncf5rdd-7ejetglfv99uv2pnd1v/edit?usp=embed_facebook.

11 tillett, the bibliographic universe, 15.

12 open data commons attribution license (odc-by) v1.0 (as stated in http://viaf.org/viaf/data/).

13 "viaf admission criteria," oclc, 2020, https://www.oclc.org/content/dam/oclc/viaf/viaf%20admission%20criteria.pdf.

14 the description of the wikidata source at http://viaf.org/viaf/partnerpages/wkp.html seems to refer to wikipedia before the existence of wikidata. the same acronym wkp reflects this anachronism, whereas isni correctly uses wkd. in any case, this description, as well as many others, requires an update.

15 stacy allison-cassin and dan scott, "wikidata: a platform for your library's linked open data," code4lib journal 40 (may 4, 2018), https://journal.code4lib.org/articles/13424.

16 carlo bianchini and pasquale spinelli, "wikidata at fondazione levi (venice, italy): a case study for the publication of data about fondo gambara, a collection of 202 musicians' portraits," jlis.it 11, no. 3 (september 15, 2020): 24.
17 ifla working group on functional requirements and numbering of authority records (franar), functional requirements for authority data: a conceptual model (münchen: k. g. saur, 2009), 46, https://www.ifla.org/files/assets/cataloguing/frad/frad_2013.pdf. for qualifiers, see https://www.wikidata.org/wiki/help:qualifiers; for references, see https://www.wikidata.org/wiki/help:sources.

18 partial lists are linked from https://wikibase-registry.wmflabs.org/wiki/main_page.

19 see https://www.transition-bibliographique.fr/fne/french-national-entities-file/; the proof of concept is available at https://github.com/abes-esr/poc-fne.

20 jean godby et al., creating library linked data with wikibase: lessons learned from project passage (dublin, oh: oclc research, 2019): 8, https://doi.org/10.25333/faq3-ax08.

21 ifla, "opportunities for academic and research libraries and wikipedia" (discussion paper, 2016), 10, https://www.ifla.org/files/assets/hq/topics/info-society/iflawikipediaopportunitiesforacademicandresearchlibraries.pdf.

22 john riemer, "the program for cooperative cataloging & a wikidata pilot" (slideshow presentation, june 16, 2020), slide 5, https://docs.google.com/presentation/d/1npkaqdggft1wi2vx0zgmtixwxwjpq96ntxx4mmyxffi/edit#slide=id.p.

23 godby et al., "creating library linked data," 8.
24 maximilian klein and alex kyrios, "viafbot and the integration of library data on wikipedia," code4lib journal 22 (october 14, 2013), https://journal.code4lib.org/articles/8964.

25 ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp) (den haag: ifla, 2016), para. 5.3.

26 https://www.wikidata.org/wiki/mediawiki:wikibase-sortedproperties#ids_with_datatype_%22external-id%22; isni (p213, https://www.wikidata.org/wiki/property:p213) is presently sorted after viaf instead of in the iso section because it is considered primarily as a viaf source.
27 epìdosis, viaf e wikidata.mpg, 2020, https://commons.wikimedia.org/wiki/file:viaf_e_wikidata.mpg; a list of gadgets is available at https://www.wikidata.org/wiki/wikidata:viaf/cluster#gadgets.

28 the main error-report page is https://www.wikidata.org/wiki/wikidata:viaf/cluster/conflating_entities; its subpage https://www.wikidata.org/wiki/wikidata:viaf/cluster/conflating_specific_entries is designed for collecting "easy" cases of conflation, when only a few members of a cluster should be moved elsewhere, while the cluster is substantially sound.

29 moreno hayley, email to author, march 23, 2020. to the question of whether data about abandoned clusters would be maintained, viaf answered, "we recognize that the data in the file was not usable. viaf is in a period of transition and it was decided that we could not at this time fix the file so it has been removed from the list of available downloads."

30 the statement read: "the persist-rdf.xml file has been removed and will no longer be available," accessed october 23, 2020.

31 angjeli, mac ewan, and boulet, "isni and viaf," 3.

32 https://dumps.wikimedia.org/wikidatawiki/; instructions and a list of kinds of data dumps are available at https://www.wikidata.org/wiki/wikidata:database_download.

33 a general explanation of ranks is available at https://www.wikidata.org/wiki/help:ranking. in brief: values of statements can be ranked in three ways, "preferred," "normal" (default), and "deprecated"; the expression "values with non-deprecated rank" includes all values with preferred or normal rank; the expression "values with best rank" includes only values with preferred or normal rank, with this condition: if the same statement has two or more values and at least one of them has preferred rank, values with normal rank are not counted; if there are no values with preferred rank, all values with normal rank are counted.
34 viaf and wikidata dumps, together with the scripts, were published on zenodo at https://doi.org/10.5281/zenodo.4457114.

35 the queries can be performed using the following links: viaf members: https://w.wiki/i5j; authority controls related to libraries but not being viaf members: https://w.wiki/i5k; biographical dictionaries: https://w.wiki/i5n.

36 the query can be performed using the following link: https://w.wiki/i5p.

37 it could be because they are probably more difficult to cluster, but in some cases also because they represent infrequently described entities.

38 as suggested by the reviewer, more removals than additions may be a clue of a cleanup project.
39 pat riva, patrick le boeuf, and maja zumer, ifla library reference model, draft (den haag: ifla, 2017), https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla_lrm_2017-03.pdf; nick crofts et al., "definition of the cidoc conceptual reference model," version 5.0.4, icom/cidoc crm special interest group, 2011, http://www.cidoc-crm.org/html/5.0.4/cidoc-crm.html; chryssoula bekiari et al., eds., frbr object-oriented definition and mapping from frbrer, frad and frsad, version 2.0 (international working group on frbr and cidoc crm harmonisation, 2013), http://old.cidoc-crm.org/docs/frbr_oo/frbr_docs/frbroo_v2.0_draft_2013may.pdf; lydia pintscher, lea lacroix, and mattia capozzi, "what's new on the wikidata features this year," youtube video, october 26, 2020, truocolo, https://www.youtube.com/watch?v=ebxdzk54gru.

40 denny vrandečić and markus krötzsch, "wikidata: a free collaborative knowledgebase," communications of the acm 57, no. 10 (september 23, 2014): 80, https://doi.org/10/gftnsk.

41 for a general statistic see http://wikidata.wikiscan.org/users; for a statistic about the viaf property see https://bambots.brucemyers.com/navelgazer.php?property=p214; changing the id of the property at the end of the url allows exploring other property statistics.

42 shiyali ramamrita ranganathan, reference service, 2nd ed., ranganathan series in library science 8 (bombay: asia publishing house, 1961), 74.

43 ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp), 5, https://www.ifla.org/publications/node/11015.

44 wikidata does have a guideline for a preferred label, and its choice is based on users' convenience (https://www.wikidata.org/wiki/help:label, par. 1.2), as required by the international cataloguing principles (2016).
as to the choice of the wikidata label in a specific language, viaf does not show any clear principle, while the authors believe that it would be preferable to use the english (“en”) label, whenever available. see ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp). 45 for example, in september it was done for nkc using openrefine (sample edit: https://www.wikidata.org/w/index.php?title=q520487&diff=1269046867&oldid=12668704 64). https://w.wiki/i5j https://w.wiki/i5k https://w.wiki/i5n https://w.wiki/i5p https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla_lrm_2017-03.pdf http://www.cidoc-crm.org/html/5.0.4/cidoc-crm.html http://old.cidoc-crm.org/docs/frbr_oo/frbr_docs/frbroo_v2.0_draft_2013may.pdf http://old.cidoc-crm.org/docs/frbr_oo/frbr_docs/frbroo_v2.0_draft_2013may.pdf https://www.youtube.com/watch?v=ebxdzk54gru https://doi.org/10/gftnsk http://wikidata.wikiscan.org/users https://bambots.brucemyers.com/navelgazer.php?property=p214 https://www.ifla.org/publications/node/11015 https://www.wikidata.org/wiki/help:label https://www.wikidata.org/w/index.php?title=q520487&diff=1269046867&oldid=1266870464 https://www.wikidata.org/w/index.php?title=q520487&diff=1269046867&oldid=1266870464 information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 29 46 angjeli, mac ewan, and boulet, “isni and viaf,” 9. 47 simon cobb (https://www.wikidata.org/wiki/user:sic19) became wikidata visiting scholar in 2017 (https://en.wikipedia.org/wiki/user:jason.nlw/wikidata_visiting_scholar). 48 federico leva and marco chemello, “the effectiveness of a wikimedian in permanent residence: the beic case study,” jlis.it 9, no. 3 (september 2018): 141–47, https://doi.org/10.4403/jlis.it-12481. 49 angjeli, mac ewan, and boulet, “isni and viaf,” 11. 
50 andrew mac ewan, “isni, viaf and naco and their relationship to orcid, discussion paper for pcc policy committee, 4 november,” 2013, 2, http://www.loc.gov/aba/pcc/documents/isni%20poco%20discussion%20paper%202013.d ocx. 51 tom adamich, “library cataloging workflows and library linked data: the paradigm shift,” technicalities 39, no. 3 (may/june 2019): 14. 52 oclc, viaf guidelines, rev. july 16, 2019, 2, https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf. 53 oclc, viaf guidelines, 5. “when viaf is unable to algorithmically match some of the source authority records with each other, they can be manually pulled together into a single cluster using an internal table.” 54 angjeli, mac ewan, and boulet, “isni and viaf,” 16. 55 stefan heindorf et al., “vandalism detection in wikidata,” in proceedings of the 25th acm international conference on information and knowledge management, cikm ’16 (new york, ny: association for computing machinery, 2016), 327–36, https://doi.org/10/gg2nmm; amir sarabadani, aaron halfaker, and dario taraborelli, “building automated vandalism detection tools for wikidata,” in proceedings of the 26th international conference on world wide web companion, www ’17 companion (republic and canton of geneva, che: international world wide web conferences steering committee, 2017), 1647–54, https://doi.org/10/ghhtzf. 56 see table 1, col. 1 vs col. 9; it should be noted that col. 9 considers only non-viaf sources and biographical dictionaries, but wikidata also links to encyclopedias and other online databases. 57 for example, people not having viaf id but having iccu id (https://tinyurl.com/y6hbtjuo); instructions about the internal search are available at https://www.mediawiki.org/wiki/help:extension:wikibasecirrussearch. 58 https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations. 59 angjeli, mac ewan, and boulet, “isni and viaf,” 16. 60 https://www.mediawiki.org/wiki/wikibase/datamodel. 
https://www.wikidata.org/wiki/user:sic19 https://en.wikipedia.org/wiki/user:jason.nlw/wikidata_visiting_scholar https://doi.org/10.4403/jlis.it-12481 http://www.loc.gov/aba/pcc/documents/isni%20poco%20discussion%20paper%202013.docx http://www.loc.gov/aba/pcc/documents/isni%20poco%20discussion%20paper%202013.docx https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf https://doi.org/10/gg2nmm https://doi.org/10/ghhtzf https://tinyurl.com/y6hbtjuo https://www.mediawiki.org/wiki/help:extension:wikibasecirrussearch https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations https://www.mediawiki.org/wiki/wikibase/datamodel information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 30 61 “the label is the most common name that the item would be known by” (https://www.wikidata.org/wiki/help:label). see also ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp), 5., https://www.ifla.org/publications/node/11015. 62 bots exist to create more and more variant forms based on matching properties, such as date of birth (p569) and date of death (p570), and to import variant forms of names from national authority files. see, for example, https://www.wikidata.org/w/index.php?title=q5669&diff=611600491&oldid=608231160 . 63 https://www.wikidata.org/wiki/help:data_type. 64 https://www.wikidata.org/wiki/wikidata:property_proposal. 65 jenny a. toves and thomas b. hickey, “parsing and matching dates in viaf,” code4lib journal, 26 (october 21, 2014), https://journal.code4lib.org/articles/9607; stefano bargioni, “from authority enrichment to authoritybox : applying rda in a koha environment,” jlis.it 11, no. 1 (2020): 175–89, https://doi.org/10/gg66rq. 66 https://www.wikidata.org/wiki/help:dates. 
67 see heindorf et al., “vandalism detection in wikidata.” 68 see mac ewan, “isni, viaf and naco.” 69 see https://www.wikidata.org/wiki/help:merge, https://www.wikidata.org/wiki/help:split_an_item, and https://www.wikidata.org/wiki/help:conflation_of_two_people. 70 complete list at https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations (e.g., https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations/p214). 71 https://admin.toolforge.org/; see also xavier agenjo-bullón and francisca hernándezcarrascal, “registros de autoridades, enriquecimiento semántico y wikidata,” anuario thinkepi 12 (2018): 361–72, https://doi.org/10/ghbj6z. 72 https://www.wikidata.org/wiki/wikidata:property_proposal. 73 https://www.oclc.org/en/viaf.html. 74 https://www.wikidata.org/wiki/wikidata:introduction. 75 https://platform.worldcat.org/api-explorer/apis/viaf. 76 https://www.wikidata.org/wiki/special:entitydata; see also https://www.wikidata.org/wiki/wikidata:database_download. 77 https://www.wikidata.org/wiki/special:search. 
https://www.wikidata.org/wiki/help:label https://www.ifla.org/publications/node/11015 https://www.wikidata.org/w/index.php?title=q5669&diff=611600491&oldid=608231160 https://www.wikidata.org/wiki/help:data_type https://www.wikidata.org/wiki/wikidata:property_proposal https://journal.code4lib.org/articles/9607 https://doi.org/10/gg66rq https://www.wikidata.org/wiki/help:dates https://www.wikidata.org/wiki/help:merge https://www.wikidata.org/wiki/help:split_an_item https://www.wikidata.org/wiki/help:conflation_of_two_people https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations/p214 https://admin.toolforge.org/ https://doi.org/10/ghbj6z https://www.wikidata.org/wiki/wikidata:property_proposal https://www.oclc.org/en/viaf.html https://www.wikidata.org/wiki/wikidata:introduction https://platform.worldcat.org/api-explorer/apis/viaf https://www.wikidata.org/wiki/special:entitydata https://www.wikidata.org/wiki/wikidata:database_download https://www.wikidata.org/wiki/special:search information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 31 78 https://www.wikidata.org/w/api.php. 79 https://query.wikidata.org/. 80 https://dumps.wikimedia.org/wikidatawiki/. 81 https://wdumps.toolforge.org/. 82 https://www.oclc.org/developer/develop/web-services/viaf/authority-source.en.html. 83 van veen, “wikidata.” 84 see “typical problems” in viaf guidelines: https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf. 
on-line acquisitions by lolita

frances g. spigai: former information analyst, oregon state university library; and thomas mahan: research associate, oregon state university computer center, corvallis, oregon.

the on-line acquisition program (lolita) in use at the oregon state university library is described in terms of development costs, equipment requirements, and overall design philosophy. in particular, the record format and content of records in the on-order file, and the on-line processing of these records (input, search, correction, output) using a cathode ray tube display terminal are detailed.

the oregon state university library collection has grown by 15,000-20,000 new titles per year (corresponding to 30,000-35,000 volumes per year) for the past three years, to a total of approximately 275,000 titles (600,000 volumes); continuing serials account for a large percentage of annual "volume" growth. these figures indicate an average input of 60-80 new titles per day. on average, a corresponding number of records are removed each day upon completion of the processing cycle.
a like number of records are updated when books and invoices are received. in addition, approximately 200 searches per day are made to determine whether an item is being ordered or to determine the status of an order. since the mid-1960's, and with the introduction of time-sharing, a handful of academic libraries (1, 2, 3) and several library networks (4, 5, 6) have introduced the advantages (7) of on-line computer systems to library routines. most of the on-line library systems use teletypewriter terminals. use of visual displays for library routines has been limited, although stanford anticipates using visual displays with ibm 2741 typewriter terminals in a read-only mode (1), and the library of the ibm advanced systems development division at los gatos, sharing an ibm 360/50, uses an ibm 2260 display for ordering and receiving (8). in addition, an institute of library research study, focusing on on-line maintenance and search of library catalog holdings records, has concluded that even with the limited number of characters available on all but the most expensive display terminals "... the high volume of data output associated with bibliographic search makes it desirable to incorporate crt's as soon as possible, in order to facilitate testing on a basis superior to that achievable with the mechanical devices." (9). many academic libraries, during shelflist conversion or input of acquisition data, use a series of tags for bibliographic information. some of these tags are for in-house use, while others presumably are used to aid in the conversion of marc tape input to the library's own input format. the number of full-time staff required to design and operate automated systems in individual academic libraries typically ranges from seven to fifteen. this does not seem an inordinate range, since most departments of a medium-large to large academic library require a similar size staff for operational purposes alone.
journal of library automation, vol. 3/4, december 1970

lolita (library on-line information and text access) is the automated acquisition system used by the oregon state university library. it operates in an on-line, time-shared, conversational mode, using a cathode ray tube (cdc-210) or a 35-ksr teletype as a terminal, depending upon the operation required. both types of equipment are in the acquisitions department of the library; each interacts with the university's main computer (cdc-3300, 91k core, 24-bit words), which, in turn, accesses the mass storage disk (cdc-814, capable of storing almost 300 million characters) through the use of lolita's programs in conjunction with the executive program, os-3 (10). under the os-3 time-sharing system, lolita shares the use of the central computer memory and processor with up to 59 other concurrent users; the use of the mass storage disk is also shared with other users of the university's computer center. (lolita will require approximately 11 million characters of disk storage.) lolita's programs are written in fortran and in the assembly language, compass, and are composed of two sets: those which maintain the outstanding order file, and those which produce printed products and maintain the accounting and vendor files. several key factors have shaped the design of lolita. an on-line, time-sharing system has been operating at osu since july 1968, and on-line capabilities have been available for test purposes since the summer of 1967. programming efforts could be concentrated exclusively on the design of lolita and an earlier pilot project (11), for no time was needed to design, debug, or redesign the operating system software, as was necessary at washington state u. and the u. of chicago (2, 12). heavy reliance was put on assembly language coding for the usual reasons, plus the knowledge that the computer center's next computer is to be a cdc-3500, with an instruction set identical to that which the library now uses. in short, neither the os-3 operating system nor the assembly language will change for the next few years. an added motivation influencing program design was the desire to minimize response time for the user. in view of the transient nature of a university library's student and civil service staff, the need for an easily learned and maintained system is paramount. the flexible display format of the crt allows a machine-readable worksheet, with a built-in, automatic tagging scheme; it obviates the need for a paper worksheet, and thus eliminates a time-consuming, tedious, and error-prone conversion process. the book request slip contains the source information for input. proofreading and correction are done on-line at time of input. alterations can be made at any later time as well. lolita has used from 1.5 to 3.0 fte through the period of design to operation. after an initial testing and data base buildup period, anticipated to last about six months, and during which lolita will be run in parallel with the manual system, it is expected that the on-order/in-process, vendor, and accounting files will be maintained automatically and that reports and forms currently output by the acquisitions department staff will be generated automatically. specifically, records comprising three files will be kept on-line: 1) the outstanding order file (a slight misnomer, since it includes and will include three types of book request data: outstanding orders, desiderata of high priority, and in-process material), 2) name and address for those vendors of high use (approximately 200 of 2500, or about 8%), and codes and use-frequency counts for all vendors, and 3) accounting data for all educational resource materials purchased by the oregon state university library.
it should be kept in mind that, although lolita is designed for book order functions, the final edited record, after the item has been cataloged, will be captured on magnetic tape as a complete catalog record. thus, all statistics and information, except circulation data, will be available for future book acquisitions. this project is being undertaken for two reasons: 1) the oregon state university library is concerned that librarians achieve their potential as productive professionals through the use of data processing equipment for routine procedures, and that cost savings may be realized as the library approaches a total system encompassing all of the technical services routines, and 2) a uniquely receptive computer center and a successful on-line time-sharing facility are available.

record format and content

each book request is described by 27 data elements which are grouped into three logical categories and are displayed in three logical "pages" of a crt screen. the categories are: 1) bibliographic information, 2) accounting information, and 3) inventory information; figures 1, 2, and 3 list the data elements in the same sequence as they appear on the crt screen. though most data elements listed are self-explanatory, eight require some description.

fig. 1. bibliographic information: order number, flag word, author, title, edition, id number, publisher, year published, notes.

fig. 2. accounting information: order number, date requested, date ordered, estimated price, number of copies, account number, vendor code, vendor invoice number, invoice date, actual price, date received, date 1st claim sent, date 2nd claim sent.

fig. 3. inventory information: order number, bib cit, date cataloged, volume, issue, location code, lc class number.

flag word

this data element indicates the status of a request. the normal order procedure needs no flag word.
exceptions are dealt with automatically by entering an appropriate flag word. as more requests are added to the system, and as more exceptional instances are uncovered, more flag words will undoubtedly be added. to date there are twelve flag words, plus one data element which serves both as a data element and as a status signal. flag words and the procedures they activate are described below.

conf.: confirming orders for materials ordered by phone or letter, and for unsolicited items which are to be added to the collection. the order form is not mailed, but is used only for processing internal to the library. accounting routines are activated.

gift: for gift or exchange items, a special series number prefixed by a "g" is assigned and the printed purchase order is used internally only. this flag word also acts as a signal so that accounting routines will not encumber any money. the primary reason for assigning a purchase order number is to provide a record indexing mechanism (this is also true for held orders).

held: selected second-priority orders being held up for additional book budget funds. these order records are kept on line, and are assigned a special series of purchase order numbers, prefixed by an "h." no accounting procedures accompany these orders, although a purchase order is generated and manually filed by purchase order number.

live: held orders which have been activated. this word causes a reassignment of purchase order numbers to the next number in the main sequence (instead of an "h"-prefixed number) and sets up the natural chain of accounting events. the new purchase order number is then written or typed on the order form, the order date added, and the order mailed.

cash: orders for books from vendors who require advance payment. an expenditure, instead of an encumbrance, is recorded.

rush: used for books which are to be rush ordered and/or rush cataloged. rush will also be rubber-stamped on the purchase order for emphasis.
no special procedures are activated within the computer programs; rush is an instruction for people.

docs: used when ordering items from vendors with whom the osu library maintains deposit accounts (e.g., government printing office). this causes a zero encumbrance in the accounting scheme; cash is used to put additional money into deposit accounts.

canc: cancelled orders. unencumbers monies and credits accounts for cash orders.

reis: used to reissue an order for an item which has been cancelled. a new purchase order containing a new order number, vendor, etc. will automatically be issued. re-input is not necessary; however, changes in vendor number, etc., can be made.

part: denotes a partial shipment for one purchase order. no catalog date can be entered while part appears as the flag word. invo will replace part when the final shipment has been received; canc will replace part if the final shipment is not received, and the order is reissued for the portion received.

invo: when invoice information is entered into the file, invo is typed in as the flag word. this causes accounting information (purchase order number, vendor code, invoice number, actual price, invoice date, account number) to be duplicated in the accounting file.

kill: used to remove an inactive record from the file (cf. date cataloged).

date cataloged: a value entered for this data element signals the end of processing. the record is removed from the main file and transferred to magnetic tape. changes and additions to inventory and bibliographic data elements are anticipated at this final point, to bring the record into line with those of the catalog dept.

author(s)

all authors are to be included in this data element: corporate authors, joint authors, etc. the entry form is last name first (e.g. smith, john a.). for compound authors, a slash is used as the delimiter separating names (e.g. smith, john a. / jones, john paul).
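the author field's conventions (entries last-name-first, compound authors separated by a slash) can be sketched as a small parser. this is an illustrative sketch only; the function name and dictionary keys are invented, not part of lolita:

```python
def parse_authors(field: str) -> list[dict]:
    """split a slash-delimited author field into entries; each entry
    is stored last-name-first, e.g. "smith, john a. / jones, john paul"."""
    authors = []
    for entry in field.split("/"):
        entry = entry.strip()
        if not entry:
            continue
        # corporate authors carry no comma; the whole string is the name
        surname, _, forenames = entry.partition(",")
        authors.append({"surname": surname.strip(),
                        "forenames": forenames.strip()})
    return authors
```

a corporate author such as "american medical association" simply yields an entry with no forenames, so the same field format serves both cases.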
id number

standard book number, vendor catalog number, etc.

order number

the order number is automatically assigned to one of three series depending on the flag word: the main number series, with the fiscal year as prefix; the held order series, with an "h" prefix (stored in the order number index as 101; the "h" is what is printed on the order forms); and the gift series, with a "g" prefix (likewise stored in the order number index as 102).

vendor code

a sample of 18 months of invoice data (obtained from the comptroller's office) for the library resource account number indicates the use of 2200 vendors during that period of time. by sorting by invoice frequency and dollar amount, about 200 vendors were identified who either invoiced the library more than 12 times during this time period (since the invoices tended to contain more than one item for frequently used vendors, the number of purchase orders issued could easily be several times this amount), or whose invoices totalled over $110.00. of these, 171 have been selected for on-line storage. they will be assigned code numbers 1 to 171, and names and addresses of these vendors will be included on the computer-generated purchase orders. authority files for all vendors are kept on rolodex units; one set is arranged alphabetically by vendor name, the other by vendor code.

account number

the library account to which the book is charged. the number is divided into four sections: 1) a two-digit prefix identification for osu, 2) a four-digit identification for osu library resource expenditures, 3) a one- or two-digit identification of the particular library resource fund account to be charged (e.g. science, humanities, serials, binding, etc.), and 4) a one- or two-digit code identifying the subject which most closely describes the request. from this data, statistics will be derived which describe expenditures by subject as well as by fund allocation.
this will provide a powerful tool for collection building and may also be a political aid in governing departmental participation in book selection.

bibcit

bibliographic citation code which cites the location, by acquisitions dept. personnel, of bibliographic data (l.c. copy, etc.). this information is included on the catalog work slip (4th copy of the purchase order) so that duplicate searching by the catalog dept. can be avoided.

lc classification number

refers to the call number as it is assigned by the osu catalog dept.

file organization

on-order record

the operating system for oregon state university's on-line, time-sharing system reads into memory a quarter page (or file block) of 510 computer words at a time. each on-order (outstanding order) record is composed of a block of 51 computer words (204 6-bit characters), or linked lists of blocks, in order to best use this system. thus, each quarter page is divided into ten physical records of 51 computer words apiece. for records requiring more than one block, the nearest available block of 51 words within the same 510-word file-block is used; but if none is vacant within the same file-block, the first available 51-word block in the file is used. if none is free, the file is lengthened to provide more blocks. a bit array is used to keep track of the status (in use, vacant) of records in the main file. in the bit array, each of 20 bits of each 24-bit computer word corresponds to a 51-word block in the main file. as in figure 4, the 13th bit has a zero value, indicating a vacancy in the 13th 51-word block of the main file; the 14th bit has a value of 1, indicating the 14th 51-word block in the on-order file is in use. a total of 10,120 block locations can be monitored by each file block of the bit array. records in this file are logically ordered by purchase order number, the arrangement effected by pointers which string the blocks together.
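the bit-array bookkeeping described above can be modelled in a few lines. this is a toy sketch, not lolita's compass code; the most-significant-bit-first ordering within each 20-bit word is an assumption, since the article does not specify bit order:

```python
BITS_PER_WORD = 20  # only 20 of each 24-bit word's bits map to blocks

def first_vacant_block(bit_words: list) -> int:
    """scan the bit array for the first 0 bit; each bit maps to one
    51-word record block (0 = vacant, 1 = in use). returns a block
    index, or -1 when the file must be lengthened for more blocks."""
    for w, word in enumerate(bit_words):
        for b in range(BITS_PER_WORD):
            if not (word >> (BITS_PER_WORD - 1 - b)) & 1:
                return w * BITS_PER_WORD + b
    return -1

def mark_in_use(bit_words: list, block: int) -> None:
    """set the bit for a newly allocated block."""
    w, b = divmod(block, BITS_PER_WORD)
    bit_words[w] |= 1 << (BITS_PER_WORD - 1 - b)
```

packing twenty block flags per word keeps the whole allocation map small enough to scan in memory, which is presumably why the original design chose a bit array over a free list.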
fig. 4. bit array monitor of record block use in the on-order file (diagram: a one-word bit array, with 4 bits unused, mapping the blocks of a 510-word file block).

access points

order number

the order number index is arranged by the main portion of the order number, and within that, it is in prefix number sequence. the sequence in figure 5 illustrates order number index arrangement (as well as the logical arrangement of the on-order file). the order number index allows quick access to selected points within the main file. conceptually, the ordered main file is segmented into strings of records whose order numbers fall into certain ranges. more specifically, items whose sequence numbers range from 0 to 4 (ignoring the prefix of the order number) comprise the first segment, 5 to 9 the second, etc. the index itself merely contains pointers to the leading record in each (conceptual) segment. thus, in the records whose purchase order numbers are shown in figure 5, there would be pointers to the second (69-124) and sixth (70-125), but not to the others. to reach the fourth (101-124) one follows the index to the second, and then follows the block pointers through the third to the fourth.

fig. 5. order number index sequence:
102-118
69-124 (fiscal year 1969, order number 124)
70-124 (fiscal year 1970, order number 124)
101-124 (held order number 124 for the current year)
102-124 (gift order number 124 for the current year)
70-125
102-125
70-126
(note: the prefix 'h,' which is printed on the purchase orders, is represented as the number 101 for internal computer processing; likewise 102 represents the prefix 'g'.)

fig. 6. "on order" record organization: p.o. number forward pointer; p.o. number backward pointer; time of last update; p.o. number; title forward pointer; title backward pointer; pointers to author(s); title; date of request; date ordered; encumbered price; number of copies; account number (2 words); vendor number; flag word; publisher; date of publication; notes; edition; id number; bibcit; lc classification number; volume number; issue; location code; vendor's invoice number; invoice date; actual price; date received; date first claim sent; date second claim sent.

author(s)

the author index is in the form of a multi-tiered inverted tree. the lowest tier is an inverted index containing the only representation of the authors' names (they are not stored in the on-order record, figure 6) and, for each author, pointers to the records of each of his books (figure 7). the entries for several authors may be packed into a single 51-word block, if space permits. each higher tier serves to direct the indexing mechanism to the proper block in the next tier below, and to this end as much as needed of an author's name is filed upwards into higher tiers; this method is described in more detail by lefkovitz (13) as "the unique truncation variable length key-word key."

fig. 7. author index organization and access to on order file (diagram: an author index directory above an inverted author index, whose entries carry a control word and pointers into the on-order file).

title

not yet programmed.
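the segmented order number index can be modelled as below. this is a minimal illustration under stated assumptions: the class and function names are invented, and the prefix portion of the number is ignored, as in the article's segmentation rule:

```python
class Record:
    """one on-order record block, strung to the next by a forward pointer."""
    def __init__(self, order_no: int):
        self.order_no = order_no  # main (sequence) portion only
        self.next = None

def build_index(head):
    """one index entry per 5-number segment (0-4, 5-9, ...), each
    pointing at the leading record of that segment."""
    index, rec = {}, head
    while rec is not None:
        index.setdefault(rec.order_no // 5, rec)
        rec = rec.next
    return index

def find(index, order_no: int):
    """enter the file at the segment leader, then follow the block
    pointers until the number is found or passed."""
    rec = index.get(order_no // 5)
    while rec is not None and rec.order_no <= order_no:
        if rec.order_no == order_no:
            return rec
        rec = rec.next
    return None
```

the design trades a little pointer-chasing within a segment for a much smaller index, which matters when both the index and the file live on a shared disk.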
on-line record processing

record creation

after a number of new book requests have been searched to determine their absence from osu's collection, and after they have been bibliographically identified, they are batched for vendor assignment and readied for entry into the on-line file of book requests via the crt (figure 8).

fig. 8. book request processing (flowchart).

lolita's starting page is obtained by typing in the word lolita on the crt screen. the text illustrated in figure 9 is then displayed on the screen of the crt. when '1' is typed in, indicating a wish to create a record, the first data element of the first page of input appears (figure 10). (since the majority of records do not need a flag word upon input, the flag word fill-in line appears only on a redisplay of this page, and the flag word may be inserted at that time.)

fig. 9. "starting" page of function choices:
main file
please indicate a choice
1. create a new entry
2. locate an existing entry
9. terminate all processing

fig. 10. first data element displayed in new record creation process:
author(s):
examples: jones
dequincey, thomas
washington, booker t.
adams, john quincy/ doe, john
american medical association

at this point the user can go in one of two directions. the first page of input information may be entered one data element at a time, each element being requested in a tutorial fashion by lolita. alternately, all of the first-page data may be input at once, with data elements separated by delimiters. the user can switch from one method to the other at any point.
a control key (return) is the delimiter used to signal the end of each data element; at the same time, return repositions the cursor (which indicates the position of the next character to be typed on the crt screen) to the location of the next data element to be filled in. another control key (send): 1) serves as a terminal delimiter, and 2) transmits data on the screen to the computer, thereby 3) triggering the continuation of processing until the next screen display is generated. thus, with page one, data elements are displayed, filled in, and sent one at a time in the tutorial approach; or all seven data elements are typed in at once, a return mark following items 1-6, then sent after the last data element. return or send must be used with each data element, even those for which there is no information. this secures the sequence of element input, thus providing an easy (for the user) and automatic way of tagging elements for any future tape searches to provide statistics or analytical reports. in particular, this process obviates all content restrictions on variable (i.e., free-form) items. each of the pages is redisplayed after input, and corrections can be made at this time. the crt is used for all input, and its write-over capabilities are utilized for corrections, as compared to the "read-only" use planned for the crt displays used for stanford's ballots (1). except for the flag word, all the data elements on the first page are variable in length and unrestricted as to content. data elements on pages 2 and 3 (figures 2 and 3) are more fixed-length in nature; thus with these pages, a whole page at a time is always filled in and sent: the tutorial function is inherent in the display. the concluding display is shown in figure 11.

fig. 11. review option: "send if all done, type 1-3 to review pages."
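the positional tagging scheme (every page-one element delimited in a fixed order, blank or not) can be modelled like this. the element names are paraphrased from figure 1 as an assumption, omitting the machine-assigned order number and the flag word, which appears only on redisplay:

```python
# page-one input sequence, paraphrased from figure 1 (assumed names)
PAGE_ONE_ELEMENTS = ["author", "title", "edition", "id_number",
                     "publisher", "year_published", "notes"]

def tag_page_one(raw: str) -> dict:
    """tag return-delimited fields purely by their position in the
    fixed sequence; an empty field still occupies its slot, which is
    what makes positional tagging safe for free-form content."""
    fields = raw.split("\n")
    if len(fields) != len(PAGE_ONE_ELEMENTS):
        raise ValueError("each element needs its delimiter, even when empty")
    return dict(zip(PAGE_ONE_ELEMENTS, fields))
```

because position alone identifies each element, the field values themselves may contain anything except the delimiter, which is the point the article makes about obviating content restrictions.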
because batched searching and input are assumed, when one search or input is finished, the program recycles to continue searching or inputting without going back to the starting page (figure 9) each time.

record search

searching programs have been completed which will search by order number and by author. title searching will be implemented within the next few months, although a satisfactory scheme for title searching (improving on manual methods, yet economical) has not been uncovered. methods suggested or used by ames, kilgour, ruecking, and spires have been noted (14, 15, 16, 17). the procedure for searching within the outstanding order file begins with the display of choices shown in figure 9. one types a "2," indicating a desire to locate an existing entry, and the text shown in figure 12 is displayed on the crt screen. at this point one chooses to search either by order number or by author. if one selects a valid order number representing a request record, the first page of that record, containing bibliographic information, is displayed. this is followed by the display shown in figure 11, so that accounting and inventory information may also be reviewed. for the user's convenience the order number is displayed in the upper right-hand corner of each of the three pages, both upon record input and search redisplay. to search by author, one types the author's name on the second line of figure 12, using the same format as that used in record creation.

________________________: order number
________________________: author
supply one of the above (start on the appropriate line)

fig. 12. display of search options.

if the author has only one entry in the outstanding order file, the first page of the entry will appear, etc. (as in the order number search above). if the author entered has more than one entry in the on-line file, information depicted in figure 13 will be displayed on the screen of the crt.
_______________: enter number or 'nf' (not found)
1. night of the iguana
2. the milk-train doesn't stop here anymore
3. cat on a hot tin roof
n. the glass menagerie

fig. 13. display of multiple titles on file for one author.

if the requested title is one of the titles displayed, one types its number and the record for that title will be displayed. if the title isn't among those displayed, typing nf would result in a redisplay of the text in figure 12 in order for searching to continue. for personal authors, variant forms of the name may be located using the following procedure. the word others is entered at the top of the screen, after an unsuccessful author search, so that a search for author j. p. jones would find all documents by john paul jones, joseph p. jones, j. peter jones, etc., as well as j. p. jones. a search for john p. jones would find all documents by j. p. jones, john jones and j. peter jones as well as john p. jones.

record changes

additions and corrections to the original record are made by first locating the record (by order number, author, or eventually, title), adding to the data elements, or writing over them (for corrections), and transmitting the information. examples of this procedure include: 1) entering the date received, 2) recording the vendor invoice number, invoice date, and actual price and 3) inserting or changing a flag word. in addition, after an item has been cataloged, the record is revised to include catalog data, as well as to exclude extraneous order notes.

output

aside from the crt displays, output is in three forms: off-line tape, printed forms and on-line files (figure 14). examples of output are library purchase orders, accounting reports, vendor data, and records of cataloged items. the number of potential reporting uses is limited only by money and imagination.

fig. 14. output from on-line on order file input.
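the variant-form author search described above behaves like an initials-compatibility test on the given names. the sketch below is a hedged reconstruction of that behavior, not the actual lolita procedure; it assumes a "surname, given names" input format.

```python
# Hedged sketch (not the actual LOLITA code) of the variant-name search:
# surnames must match exactly; each given-name token must be "compatible" --
# an initial matches any name sharing its first letter, two full names must
# agree, and a missing middle name is compatible with anything.

def _tokens(name):
    # "jones, john paul" -> ("jones", ["john", "paul"])
    surname, _, given = name.partition(",")
    return surname.strip().lower(), given.lower().replace(".", "").split()

def _compatible(a, b):
    if len(a) == 1 or len(b) == 1:   # at least one side is an initial
        return a[0] == b[0]
    return a == b                    # two spelled-out names must agree

def matches(query, candidate):
    """True if candidate could be the author the query names."""
    qs, qg = _tokens(query)
    cs, cg = _tokens(candidate)
    if qs != cs:
        return False
    # zip stops at the shorter list, so an absent middle name is compatible
    return all(_compatible(x, y) for x, y in zip(qg, cg))
```

under this rule, "jones, j. p." finds "jones, john paul" and "jones, joseph p.", while "jones, john p." finds "jones, john" and "jones, j. peter" but not "jones, joseph p.", matching the article's examples.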
fig. 15. purchase order. [form fields include: order number, date, id number, author, title, publisher, vendor name, vendor address, volumes, edition, estimated price, no. of copies, vendor code, account, date of pub., flag, gift or held order no., bib. cit.; form heading "library purchase order".]

the purchase order, shown in figure 15, is composed of four copies: 1) the vendor's copy to be retained by him, 2) a vendor "report" copy, 3) the copy which is kept as a record in the osu library, and 4) a catalog work slip to be forwarded to the catalog department with the book. purchase orders are printed on the library's teletype, which is equipped with a sprocket-feed. orders can also be printed on the line printer in the computer center. while this is a slightly cheaper data processing procedure, since no terminal costs are incurred, convenience and security have produced a victory in "economics over economies" (18), and the librarian's time has been considered in the total scheme. for gift items, purchase orders are produced as the cheapest means of preparing a catalog work slip. held purchase orders are produced and manually filed in purchase order number sequence, but when their status is changed to live, the old numbers are automatically replaced by a purchase order number in the main series. these new numbers are written onto the purchase orders, along with any other changes, and the orders are mailed. the flag word live also activates accounting procedures. there are two sets of accounting reports. the first is generated when the purchase orders are issued and contains tabulated information for the library's bookkeeper, the head of business records in the acquisitions dept., and the comptroller of the oregon state system of higher education.
the second summary report is issued after the book and invoice have been received and will contain additional information, pertinent to the invoicing procedure; this report has the same distribution as the first. periodic reports are planned for the library's subject divisions summarizing expenditures by account number, reference area, and subject. programming for this has not yet been done. a frequency count will be stored with each vendor code and periodic listings will be printed for use in retaining vendors. after an item has been cataloged, the catalog work slip and a slip equivalent to a main-entry catalog card are sent to acquisitions, and all remaining information and changes are recorded in the on-line record. this record is then transferred to a file from which it is dumped onto a magnetic tape. this off-line file will be used for statistical analyses and will be the start of a machine readable data base. future plans will, of course, depend on funding; however, two logical steps which could follow immediately and require no additional conversion are: 1) additional computer generated paper products (charge cards, catalog cards, book spine labels, new book lists, etc.), and 2) a management information system using acquisition and cataloging data. the construction of a central serial record in machine readable form would produce many valuable by-products. a program for the translation of the marc ii test tape has been written which causes these records to be printed out on the computer center's line printer; and since a subscription to the marc tapes is now available to osu for test purposes, its advantages and compatibility with lolita will be investigated as time permits. unsolved problems, aside from those which everyone working in a data processing environment faces (e.g.
system and hardware breakdown, continued project funding, and lengthy delivery times for hardware), include: 1) the widely varying system response times (commonly from a fraction of a second up to 60 seconds; usually 2-15 seconds); 2) the lack of personnel skilled in both data processing and library techniques; 3) the limited print train currently available on the line printer (62 character set); and 4) bureaucratic policy, which can render the most sophisticated plans for automation unfeasible if properly applied. it is recognized that all these problems can be solved by money, time, and priorities. meanwhile, the period of in-parallel operation will be valued as a time to educate, to test, to gather statistics, and to further refine the programs and procedures which comprise lolita.

evaluation

preliminary input samples indicate that a daily average of from 8 hours, 20 minutes, to 10 hours and 45 minutes will be necessary for input, searches, editing and corrections using the crt. an additional 3 hours per day of terminal time using the teletype will be required to produce the purchase orders, answer rush search questions if the crt is busy, and activate the daily batch programs (accounting reports, etc.). the sad economic plight of most libraries causes librarians to cast an especially suspicious eye on the costs of automation; a few words on osu's data processing costs may be of interest. the cost of total development efforts to produce lolita is under $90,000 (though considerably less was actually expended), or an average annual cost of $30,000 over a three-year period. this compares favorably with average annual incomes of from $50,000 to over $300,000 in federal funds alone for other on-line library acquisition projects in universities (19, 20, 21, 22). a total of 6.75 man-years was required to design lolita.
the 6.75 man-years comprises 2.5 years of programming, 3.25 years of systems analysis, coordination and documentation, and 1.0 year of clerical work, and represents the efforts of four students and six professional workers. this total does not include the time spent by acquisitions department personnel in reviewing lolita's abilities or in learning to use the terminals. current data processing rates charged by the computer center include the following: crt rental-$100/mo.; cpu time-$300/hr.; terminal time-$2.00/hr.; on-line storage costs-15c/2040 characters/mo. the teletype has been purchased, thus only local phone line charges are incurred. the on-line system is available for use from 7:30 a.m. to 11:00 p.m. each week-day, and from 7:30 a.m. to 5:00 p.m. on saturday, which more than covers the 8-5 schedule of the acquisitions department.

acknowledgments

the work on which this paper is based was supported by the administration, the computer center and the library of oregon state university. special mention is due robert s. baker, systems analyst, osu library, and lawrence w. s. auld, head, technical services, osu library, for their extensive participation in the lolita project and for their many suggestions which benefitted the final version of this paper. hans weber, head, business records, osu library, also contributed much to lolita's design.

references

1. veaner, allen b.: project ballots: bibliographic automation of large library operations using a time-sharing system. progress report, march 27, 1969-june 26, 1969 (stanford, california: stanford university libraries, 29 july 1969), ed-030 777.
2. burgess, thomas k.; ames, l.: lola: library on-line acquisition subsystem (pullman, washington: washington state university, systems office, july 1968), pb-179 892.
3. payne, charles: "the university of chicago's book processing system."
in stanford conference on collaborative library systems development: proceedings, stanford, california, october 4-5, 1968 (stanford, california: stanford university libraries, 1969), ed-031 281, 119-139.
4. pearson, karl m.: marc and the library service center: automation at bargain rates (santa monica, california: system development corporation, 12 september 1969), sp-3410.
5. nugent, william r.: "nelinet-the new england library information network." in congress of the international federation for information processing (ifip), 4th: proceedings, edinburgh, august 5-10, 1968 (amsterdam: north holland publishing co., 1968), g28-g32.
6. blair, john r.; snyder, ruby: "an automated library system: project leeds," american libraries, 1 (february 1970), 172-173.
7. warheit, i. a.: "design of library systems for implementation with interactive computers," journal of library automation, 3 (march 1970), 68-72.
8. overmyer, lavahn: library automation: a critical review (cleveland, ohio: case western reserve university, school of library science, december 1969), ed-034 107.
9. cunningham, jay l.; schieber, william d.; shoffner, ralph m.: a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley, california: university of california, institute of library research, march 1969), ed-029 679, pp. 13-14.
10. meeker, james w.; crandall, n. ronald; dayton, fred a.; rose, g.: "os-3: the oregon state open shop operating system." in american federation of information processing societies: proceedings of the 1969 spring joint computer conference, boston, mass., may 14-16, 1969 (montvale, new jersey: afips press, 1969), 241-248.
11. spigai, frances; taylor, mary: a pilot-an on-line library acquisition system (corvallis, oregon: oregon state university, computer center, january 1968), cc-68-40, ed-024 410.
12. university of chicago.
library: development of an integrated, computer-based, bibliographical data system for a large university library (chicago, illinois: university of chicago, library, 1968), pb-179 426.
13. lefkovitz, david: file structures for on-line systems (new york: spartan books, 1969), pp. 98-104.
14. ames, james lawrence: an algorithm for title searching in a computer based file (pullman, washington: washington state university library, systems division, 1968).
15. kilgour, frederick g.: "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science, 5 (new york: greenwood publishing corp., 1968), 133-136.
16. ruecking, frederick h., jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-238.
17. parker, edwin b.: spires (stanford physical information retrieval system). 1967 annual report (stanford, california: stanford university, institute for communication research, december 1967), 33-39.
18. kilgour, frederick g.: "effect of computerization on acquisitions," program, 3 (november 1969), 100-101.
19. "university library systems development projects undertaken at columbia, chicago and stanford with funds from national science foundation and office of education," scientific information notes, 10 (april-may 1968), 1-2.
20. "grants and contracts," scientific information notes, 10 (october-december 1968), 14.
21. "university of chicago to set up total integrated library system utilizing computer-based data-handling processes," scientific information notes, 9 (june-july 1967), 1.
22. "washington state university to make preliminary library systems study," scientific information notes, 9 (april-may 1967), 6.
editorial: facing what's next, together

lita president's message
emily morton-owens
information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.12383

emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the acting associate university librarian for library technology services at the university of pennsylvania libraries.

when i wrote my march editorial, i was optimistically picturing some of the changes that we are now seeing for lita—while being scarcely able to imagine how the world and our profession would need to adapt quickly to the impacts on library services as a result of covid-19. it is a momentous and exciting change for us to turn the page on lita and become core, yet this suddenly pales in comparison to the challenges we face as professionals and community members.

libraries' rapid operational changes show how important the ingenuity and dedication of technology staff are to our libraries. since states began to shut down, our listserv, lita-l, has hosted discussions on topics like how to provide person-to-person reference and computer assistance remotely, how to make computer labs safe for re-occupancy, how to create virtual reading lists to share with patrons, and how to support students with limited internet access. there has been an explosion in practical problem-solving (ils experts reconfiguring our systems with new user account settings and due dates), ingenuity (repurposing 3d printers and conservation materials to make masks), and advocacy (for controlled digital lending).

sometimes the expense of library technologies feels heavy, but these tools have the ability to scale services in crucial ways—making them available to more people at the same time, available to people who can only take advantage after hours, available across distances. technologists are focused on risk, resilience, and sustainability, which makes us adaptable when the ground rules change.
our websites communicate about our new service models and community resources; ill systems regenerate around increased digital delivery; reservation systems for laptops now allocate the use of study seating. our library technology tools bridge past practices, what we can do now, and what we'll do next.

one of our values as ala members is sustainability. (we even chose this as the theme for lita's 2020 team of emerging leaders.) sustainability isn't about predicting the future and making firm plans for it; it's about planning for an uncertain future, getting into a resilient mindset, and including the community in decision-making. although the current crisis isn't climate-related per se, this way of thinking is relevant to helping libraries serve their communities.

we will need this agile mindset as we confront new financial realities. our libraries and ala itself are facing difficult budget challenges, layoffs, reorganizations, and fundamental conversations about the vitalness of the services we provide.

my favorite example from my own library of a covid-19 response is one where management, technical services, and it innovated together. our leadership negotiated an opportunity for us to gain access to digitized, copyrighted material from hathitrust that corresponds to print materials currently locked away in our library building. thanks to decades of careful effort by our technical services team, we had accurate data to match our print records with records for the digital versions. our it team had processes for loading the new links into our catalog almost instantaneously. the result was a swift and massive bolstering of our digital access precisely when our users needed it most. this collaboration perfectly illustrates how natural our merger with alcts and llama is.
as threats to our profession and the ways we've done things in the past gather around us, i am heartened by the strengths and opportunities of core. it is energizing to be surrounded by the talent of our three organizations working together. i hope more of our members experience that over the summer and fall, as we convene working groups and hold events together, including a unique social hour at ala virtual and an online fall forum. i close out my year serving as the penultimate lita president in a world with more sadness and uncertainty than we could have foreseen. we are facing new expectations and new pressures, especially financial ones. as professionals and community members, we are animated by our sense of purpose. while lita has been transformed by our vote to continue as core, the support and inspiration we provide each other in our association will carry on.

journal of library automation vol. 5/2 june, 1972

automation of acquisitions at parkland college

ruth c. carter: system librarian, university of pittsburgh libraries. when this article was in preparation, the author was head of technical services and automation, parkland college, champaign, illinois

this paper presents a case study of the automation of acquisitions functions at parkland college. this system, utilizing batch processing, demonstrates that small libraries can develop and support large-scale automated systems at a reasonable cost. in operation since september 1971, it provides machine-generated purchase orders, multiple order cards, budget statements, overdue notices to vendors, and many cataloging by-products. the entire collection, print and nonprint, of the learning resource center is being accumulated gradually into a machine-readable data base.

introduction-background

parkland college, opened in 1967, is a two-year community college located in champaign, illinois.
before the librarian-analyst, who combines a library degree with several years' experience as a computer systems analyst and six months of programming training, was hired by parkland, the administration decided that automation of some library procedures was feasible. at the time the library decided to initiate automation planning (december 1970), it had a book collection just under 30,000 plus 1000 audio-visual items. the decision to automate would not have been possible unless a computer was available at the college. in the spring of 1970 when the librarian-analyst was hired, parkland owned an ibm 360/30 with 32k. before automation plans were under way, the college purchased an ibm 360/30 with 64k. the computer's increased capacity provided even more incentive for utilizing the computer for significant projects in addition to instructional and administrative functions. among the reasons in favor of automation was a general consensus indicating that automation was the way to go, and that the library with its many individual records is a natural for utilizing the computer. the automation of library acquisitions at parkland is notable for several reasons. first, automation was done relatively easily and rapidly; actual systems design and programming were completed in six months. full implementation was achieved within nine months of the formal beginning of the project. second, documentation of the system is exhaustive and is based on a detailed method of communication between the system's librarian-analyst and the programmer. third, automation in this instance was accomplished economically. fourth, the entire system can be run on an ibm 360/30 with 32k having two disk drives and two tape drives, and a standard print chain consisting of just upper-case letters.

what to automate?

this, of course, is a crucial question. where out of the various alternatives of circulation, acquisitions, cataloging, and others does one begin?
neither the librarian-analyst nor the rest of the library staff made any attempt to work out an answer during the fall of 1970. the librarian-analyst, as head of technical services, spent the first four months concentrating on cataloging and learning the problems in the acquisitions area. by december she was ready to begin planning for automation. meetings were arranged with the director of the learning resource center and the director of the computer center. informal discussions with the library staff were held. circulation was eliminated early from consideration, since parkland is in temporary quarters. it seemed more logical to develop the area of circulation with the move to the permanent campus. in addition, the volume of circulation did not appear to warrant the time and personnel commitment necessary to develop a comprehensive system at this time. several possibilities remained: the acquisition of new materials, conversion of our whole catalog, and periodicals control, including automatic claim notices. periodicals seemed the least likely of the three, because our holdings numbered less than 700, and it was felt that the volume involved did not justify the effort and expense of going to a computer system, particularly the first computer system within the library. converting the whole catalog had some positive arguments. it would provide a data base for later circulation efforts and also make it possible to produce bibliographies and other service features for faculty members. however, this idea was discarded due to the large initial data-conversion problem, and because it did not provide relief for existing problems within the library. the library staff concluded that acquisitions had first priority for automation.
to this the director of the computer center heartily agreed on the grounds that it was a conventional data processing type of application, and it would dovetail with existing data bases already maintained for administrative purposes, in particular, the vendor file and financial reporting files. furthermore, the library could then produce its encumbrance data to be entered into the budget programs for the business office accounting records. from the standpoint of the library staff, it was believed that by utilizing the computer in acquisitions we could improve the overall staff utilization in the area. probably the strongest point is that, while we did not expect clerical work time to be decreased, its nature would be changed. one specific function to be eliminated was the manual bookkeeping done, although a machine system would still require checking for accuracy. we expected that the acquisitions librarian, once freed from some routine responsibilities concerning the budget, would be able to devote that time to more professional activities. other advantages in automating acquisitions were: more accurate and up-to-date information, especially in regard to budget figures would be available; human errors in sending out orders would be cut down; and statistics on orders could be compiled automatically. at this point, as well as previously, the literature was searched for relevant discussions of acquisitions systems and/or mechanization applications in small libraries. relatively little had appeared in print describing library automation in junior colleges. those articles found to be helpful included: burgess, cage, corbin, dobb, dunlap, macpherson, morris, and vagianos (see references 1-5 and 7-9). also, hayes and becker's handbook of data processing for libraries (6) became available at this time.
it was especially useful for the summary of features usually present within the scope of standard acquisitions applications. along with use of the literature, several visits to other libraries with operational systems were made. a visit of particular importance was made in january (1971) to study an established off-line acquisitions system. as soon as there was general agreement on proceeding with plans for acquisitions, a list was prepared of the criteria the library staff would expect from the automation of acquisitions. the list items included:

1. the system should be open-ended, i.e., it should be planned with other potential future systems in mind.
2. it should handle the preparation of outgoing forms such as purchase orders, book-order cards, notifications to faculty requestors, and overdue notices to vendors.
3. the system should perform bookkeeping functions and provide many different access points for inquiry into the data base.
4. there must be a status list of items in the acquisitions process, up to and including the point of receiving cataloging.
5. it should have as much automatic editing of input data as possible.
6. the system must have flexible updating and file maintenance routines.
7. it should provide the library staff with decision-aiding information including many of our previously manually maintained statistics.
8. it must be flexible.
9. it should maintain simplicity. and,
10. it should provide better service to the faculty through faster and more accurate ordering and notifications.

along with the criteria for an acquisitions system, a possible sequence of automation development was submitted. this was to provide a means for keeping clearly in mind that, while acquisitions would get first attention, this was only a starting point, and that the system should be planned in such a manner as to facilitate its compatibility with future developments.
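criterion 3 above asks the system to perform bookkeeping functions. the core of library fund accounting — encumber the estimated price when an order is placed, then relieve the encumbrance and book the actual price when the invoice is paid — can be sketched as follows. this is an illustrative model only, not parkland's program; all names are invented.

```python
# Illustrative fund-accounting sketch (not Parkland's actual system):
# ordering encumbers the fund by the estimated price; paying the invoice
# relieves that encumbrance and records the actual expenditure.

class Fund:
    def __init__(self, budget):
        self.budget = budget
        self.encumbered = 0.0
        self.expended = 0.0

    def encumber(self, estimate):
        self.encumbered += estimate

    def pay(self, estimate, actual):
        self.encumbered -= estimate   # relieve the original encumbrance
        self.expended += actual       # book the invoiced amount

    @property
    def free_balance(self):
        # money neither spent nor committed to outstanding orders
        return self.budget - self.encumbered - self.expended

fund = Fund(budget=1000.00)
fund.encumber(7.95)
fund.pay(7.95, 6.75)                  # invoice came in under the estimate
# fund.free_balance -> 993.25
```

keeping estimate and actual separate is what lets a budget statement show both outstanding commitments and real expenditures, the two figures the manual bookkeeping had tracked by hand.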
as originally stated, acquisitions, strictly speaking, represented phase 1, and materials added to the collection were phase 2. however, phases 1 and 2 were planned and programmed at the same time. thus, from the beginning, parkland college has included in its system cataloging information such as the complete call number, and up to three subject headings of fifty characters each. the decision regarding number and length of subject headings will be discussed later. (see master record layout at figure 1.)

time estimate-schedule

in january, 1971, a proposed time estimate (see figure 2) was submitted to the director of the computer center for his approval. this time estimate was prepared with the goal of automating acquisitions beginning with the fiscal year 1972 (i.e., july 1971). the proposed schedule also took into account the fact that most of the librarians were expected to be on vacation all (or at least most) of august, and also that during september, with the registration of students and other demands on the computer resulting from the beginning of a new academic year, computer time and personnel would be tight and probably could not provide the necessary support to a system still in its developmental stages. the schedule called for the librarian-analyst to begin full-time work on analysis on february 15 with final implementation of the system by the end of july. preparation of this estimate was based on computer output if everything went right. it was an extremely rigorous schedule. considering that problems did arise, the implementation of this system during the first week of august is truly notable. of course, bugs remained after the system was actually in operation, and, as with all systems, changes were still being made several months later both in specifications for programming and in the programs conforming to the specifications.
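the master record of figure 1 is a fixed-length, 400-character layout, so fields are recovered by positional slicing. the sketch below uses the two subject-heading slices actually given in the layout (positions 301-350 and 351-400); the placement of the first heading at 251-300 is a guess inferred from the fifty-character heading size, not stated in the article.

```python
# Sketch of positional field access on Parkland's 400-character master
# record. Positions 301-350 and 351-400 come from fig. 1; the 251-300
# slice for the first subject heading is an assumption for illustration.

LAYOUT = {
    "subject_heading_1": (251, 300),  # assumed, not from the article
    "subject_heading_2": (301, 350),  # stated in fig. 1
    "subject_heading_3": (351, 400),  # stated in fig. 1
}

def field(record: str, name: str) -> str:
    """Extract one blank-padded field by its 1-based, inclusive positions."""
    if len(record) != 400:
        raise ValueError("master record must be exactly 400 characters")
    start, end = LAYOUT[name]
    return record[start - 1:end].rstrip()

rec = " " * 250 + "HISTORY".ljust(50) + "EDUCATION".ljust(50) + "ART".ljust(50)
# field(rec, "subject_heading_2") -> "EDUCATION"
```

fixed positions are what let an ibm 360/30 batch program, with no parsing logic, read the same field from every record in a 400 x 9 blocked tape file.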
when the time estimate was submitted, it was also necessary to make firm decisions regarding personnel to perform all the necessary tasks. the librarian-analyst assumed responsibility for all systems analysis and program definitions. the library staff supplied the keypunching support. one clerk had been hired previously because of her keypunch training. on july 1, an additional clerk was hired with this skill. the main problem was programming, because the computer center did not have the full-time personnel to support a major new effort. this was resolved by hiring a programmer on a special three-month contract running from april 15 to july 15, 1971. prior to implementation, the library was forced to rely on the availability of keypunch machines at the computer center. in september 1971, an ibm model 129 keypunch and verifier was installed in the technical services department of the library. a model 129 was chosen for the library in conformance with the initial requirement set by the director of the computer center-that all library data for the computer be verified. this has proven to be a wise decision, as we have had relatively limited problems with invalid or erroneous data.

fig. 1. master record layout. [tape layout form prepared by r. carter; library master files: on order, in process, history; record length and blocking 400 x 9; positions 301-350 = subject heading no. 2; positions 351-400 = subject heading no. 3.]

requirements specification phase (analysis)

three weeks were allowed for identification and specification of all output desired from the initial system.
many of these requirements were alluded to in the preliminary list of criteria for the system. to meet the library's needs, we decided that the system must produce: purchase orders; individual order cards (including a copy used to order catalog cards from the library of congress); budget statements including all encumbrances and payments as well as other financial data; lists of all books on order, in process, or cancelled; notices to vendors regarding items on order more than 120 days; notices to each faculty member of the additions to the collection of items they requested, complete with call number; and a monthly accession list of all newly cataloged items that could be circulated to all faculty members.

    development steps                             time required   date to start   date to complete
    i.   requirements specifications              3 weeks         feb. 15         march 5
    ii.  detailed design-system flow              3 weeks         march 8         march 26
    iii. detailed design-programming
         specifications                           10 weeks        march 29        june 4
    iv.  programming-acquisitions                 10 weeks        april 15        june 23
    v.   programming-materials accessioned        3 weeks         june 24         july 14
    vi.  computer program system test-
         acquisitions & materials accessioned     2 1/2 weeks     july 1          july 26
    vii. implementation                                                           july 1971

fig. 2. time estimate for automation of acquisitions at parkland college as submitted in january 1971. a beginning and ending date for each phase is indicated and the actual time in weeks required is shown.

124 journal of library automation vol. 5/2 june, 1972

once it was known what forms were required, orders were placed for the necessary pre-printed forms. with some outside advice in the matter of forms suppliers, specifications for three new forms were delineated, two of which would be for use on the computer. the first form encountered in outlining the acquisitions process was a request form. the request form is used to make a record of all items ordered and to serve as a checklist in the searching process (see figure 3).
later, it is stamped with a six-position control number and serves as the source document for keypunching new orders, which require three input cards per item ordered. the request form is then retained in control-number sequence until the item has completed its way through the technical services process. specifications for the purchase orders were drawn up by parkland's business manager. the machine-generated purchase orders used by parkland are almost identical to the conventional manual purchase orders used throughout the college. in this case, automation of the library's purchase orders is a likely precursor to automation of the purchase orders for the remainder of the college. the most complicated form to design, from the library's viewpoint, was the individual order form. this was required in five parts, including a copy complying with library of congress specifications for use with ocr equipment. (this is illustrated in figure 4.)

(fig. 3. request form, used as a control record for each item ordered. fields include searching checkboxes (bip, pbip, card catalog), fund, vendor, format code, author, title/vol., publisher, year, no. of copies, series/edition, lc card no., requester, control no., order code, price, and sbn.)
(fig. 4. copies one and two of the multiple-part order form: the original copy, used to order catalog cards from the library of congress, and the second copy, used to send to the vendor.)

it was important to determine forms requirements early, as it was anticipated that several months' time would elapse before they would be received. naturally, it was desired that the forms be on hand by the time the programs would be ready for testing, which was planned for late june or early july. one of the most critical parts of the requirements specification phase was the determination of data elements to be included in the master records. perhaps the most perplexing of those possibilities considered was subject headings. since we wanted an open-ended system which would leave us some room for future development, without major modifications, a decision was made to include three 50-character subject headings in each record. here we were limited because of the decision made (for purposes of simplicity of design and programming) to confine the system to fixed-length records. it was considered desirable for storage purposes to keep the master record length within 400 characters. while the decision on subject headings may not prove to be adequate in the long run, it does give parkland's library a good starting point for some projects using subject headings, such as developing bibliographies on demand. despite possible future modifications to the data base, all items going into the history (master) file included headings as defined above.
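the fixed-length master record described above can be sketched as follows. this is a minimal illustration in python rather than the cobol of the original system: the 400-character length and the positions of subject headings no. 2 and no. 3 come from fig. 1; placing subject heading no. 1 at positions 251-300, and the helper names, are our own assumptions.

```python
# a sketch of a 400-character fixed-length master record with three
# 50-character subject-heading fields. positions for headings 2 and 3
# follow fig. 1; heading 1's slot (251-300) is an assumed extension.
RECORD_LEN = 400
FIELDS = {
    "subject_heading_1": (250, 50),  # positions 251-300 (assumed)
    "subject_heading_2": (300, 50),  # positions 301-350 (per fig. 1)
    "subject_heading_3": (350, 50),  # positions 351-400 (per fig. 1)
}

def pack(values):
    """build a 400-character fixed-length record, blank-filled."""
    record = [" "] * RECORD_LEN
    for name, text in values.items():
        start, length = FIELDS[name]
        record[start:start + length] = list(text[:length].ljust(length))
    return "".join(record)

def extract(record, name):
    """pull one field back out of the record and strip blank padding."""
    start, length = FIELDS[name]
    return record[start:start + length].rstrip()
```

the blank-fill and truncate-to-field-width behavior mirrors the trade-off the article describes: fixed-length records are simple to design and program, at the cost of wasted space and hard limits on field sizes.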
additional determinations made in the initial phase regarded files to be maintained. here a crucial factor was the physical limitations of the college's computer system. as only two tape drives and two disk drives comprised the primary storage facilities, the capability for performing sorts was limited. in fact, one of the disk drives was reserved strictly for systems programs, and could not be utilized directly by the library. this contributed to the decision to maintain separate on-order and in-process files, as well as a history file, on tape. the college vendor file and the library budget file are maintained on disk. a final area of effort in the initial phase was developing codes to be utilized throughout the system. naturally, many conditions would be indicated in the computer records by the use of a one- or two-position code. one example is the format code, a one-position code, which indicates the types of items used, such as: b=book, r=record, and s=filmstrip.

design phase-system flow

three weeks were allotted to developing the overall systems flow chart. this time was spent working out each separate program that would be required, and flow-charting the entire series of programs. a flow chart of the system (without minor additions dating after september 1971) is shown in figure 5. however, it does not necessarily indicate the sequence in which programs are run. in general, maintenance of each of the separate files is run prior to new data. this procedure has proved to work well.

(fig. 5. system flow chart.)

in most cases, pre-sorting of card input is provided. this decision was not based on optimum efficiency but on compatibility with routine procedures and facilities in the computer center.
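the one-position format codes described earlier can be sketched as a simple lookup. only the codes b, r, and s are given in the article; the function name and the handling of unlisted codes are our own illustration.

```python
# the one-position format codes from the article. only b, r, and s are
# documented; any other code is flagged rather than guessed at.
FORMAT_CODES = {"b": "book", "r": "record", "s": "filmstrip"}

def describe_format(code):
    """translate a one-position format code, flagging anything unlisted."""
    return FORMAT_CODES.get(code, "unknown format code: " + code)
```

a table like this keeps the edit criteria for valid codes in one place, which matters later in the article when codes and constants had to be changed across many programs.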
design phase-program specifications

one of the most significant parts of the development of parkland's automated library acquisitions system is the exhaustive documentation provided by detailed written specifications for each program in the system. each program, including utilities such as sorts, was assigned a job number and then described under each of the following topics: purpose, frequency, definitions (any unusual terms), input, output, and method. a format was provided for each input and output, whether it was a card, tape, disk, list, or other printed report or form. these accompanied each individual program specification. the method section is particularly important. here the librarian-analyst stated the procedure used to arrive at the given output based on the given input. any necessary constants were defined. because the librarian-analyst has had programming training, these specifications are detailed to the point where the programmer does not have to do much more than code the problem, making it possible for programming to proceed quickly. this thorough problem definition for each program by the librarian-analyst was one of the major factors (perhaps the primary key) in acquisitions being accomplished rapidly and efficiently. it had the advantage of obviating the need for a senior programmer, or for having someone from the computer center become highly involved in the analysis of library details. furthermore, and perhaps most important, it provides the detailed documentation of the system. there should be no doubt as to the procedures within each program. an example of a specification for one of the programs in the parkland college library acquisition series is presented in the appendix. it should be mentioned that most of the programs are written in cobol. there are a few in assembler, and some minimal use is made of rpg.
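the per-program specification structure described above can be sketched as a small data type. the topic names (purpose, frequency, definitions, input, output, method) and the job number come from the text; the dataclass itself and its field types are our own illustration.

```python
# a sketch of the parkland program-specification structure: one record
# per program, with the documented topics as fields.
from dataclasses import dataclass, field

@dataclass
class ProgramSpec:
    job_number: str
    purpose: str
    frequency: str                                   # e.g., "weekly"
    inputs: list = field(default_factory=list)       # card, tape, disk, or form layouts
    outputs: list = field(default_factory=list)      # lists, reports, printed forms
    method: str = ""                                 # procedure from input to output, constants defined
    definitions: dict = field(default_factory=dict)  # any unusual terms
```

keeping every program's definition in one uniform shape is precisely what the article credits for letting the programmer "do little more than code the problem."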
testing of the program

the original plans called for testing with test data which would proceed simultaneously with programming. however, as things developed, most coding was done prior to very much testing. as a result, the period originally devoted to live-data testing of the whole system was instead devoted to testing the programs with test data. thus, in early july, we were about two weeks behind the original time estimate, and that gap remained to the end. the usual problems showed up in testing with test data. moreover, during the first week of july, it was learned that the business office was changing the length of the account numbers from 9 to 11 positions. fortunately, space had been planned for up to a 12-position field, so the lengthened number could be easily accommodated by the system. however, the changing of numbers required modification of any program which edited data for valid account numbers. this was a minor problem and easily resolved. on july 15 the programmer completed the job for which he was hired, i.e., to complete a programming and systems test utilizing live data and to make appropriate changes as identified during testing. since not even test-data testing was complete on july 15, he stayed until july 20 and finished that work. meanwhile, the director of the computer center had already selected the individual to be the operator when the library's jobs were being run on a regular basis. this employee would also provide program maintenance. on july 21, this permanent staff member took over programming. for the next two weeks, while summer school classes were in session, most of the trial runs of the library series had to be done during evenings, nights, and on weekends. by the end of july, most of the major bugs appeared to be out of the programs.

impact on technical services

success on the first usable purchase order and order cards came on august 3.
within the next day or two, a workable budget statement was produced, along with a wits list (work in technical services). by august 13, when the vacation time came, nearly one thousand books had been ordered via the automated system. while a few bugs remained to be dealt with in september, the system was accomplishing its basic mission essentially on time. it took less than eight months to identify requirements and to design, program, and test a system consisting of twenty-seven programs in its original design! during the remainder of 1971, various bugs were found and, it is to be hoped, eliminated from the system. more bugs occurred in the budget series than in any other single segment of the system. over a period of several months, these were worked out; as of march, 1972, the budget sequence of programs worked smoothly.

implementation

following the implementation of the automated technical services system, several effects were evident. an obvious effect was the saving of two to three days per month formerly spent on bookkeeping. on the other hand, one permanent staff member was added to technical services. this addition had two causes: the keypunching load, and the fact that many more books were ordered directly from publishers, with a consequent major increase in processing in-house. therefore, much of what was expended in salary for the extra clerk was saved by eliminating most prepaid processing costs. for several months after implementation, some duplication of effort was required, especially by acquisitions personnel. thus, the total effect on changing the nature of work was not immediately obvious. by march 1972, duplication was essentially phased out, and more realistic assessments of the impact of automation in changing the nature of the workload are now being made. one of the most obvious changes is the increased number of bills to be approved for payment.
by utilizing the computer to batch purchase orders and order cards, almost all materials are now ordered directly from publishers, rather than pre-processed from a jobber. although the speed with which items are received and processed has increased substantially, there has been a corresponding increase in paperwork in this regard.

additional services

besides the immediate effects of the automation of acquisitions within technical services, other parts of the library and the college felt the impact. this is especially true of reference, which now has a weekly updated listing of all items on order, in process, or cataloged within the last month, in both author/main entry and title sequence. budget statements are now available to the director of the learning resource center and other personnel on a weekly rather than monthly basis. not only are they received sooner, but they provide more information than is present in the statement originating from the computer center. a useful fringe benefit is the availability of overdue notices to vendors when items have been on order more than 120 days. a computer-generated notice is sent each week to faculty members regarding items requested, cancelled, or cataloged. the response of the library staff and the rest of the faculty to the automated system has been very favorable.

cost

at this date (march 1972), costs are difficult to assess, but certainly seem minimal. the only direct costs are the installation of a 129 keypunch, which rents for $170 per month, plus the salary of the extra staff member for keypunching. however, the extra salary is compensated for by no longer ordering items pre-processed at an average cost of $2.05 per item. naturally, there is some local cost for processing materials such as pockets and labels, but it is minor on a per-volume basis. in addition, by being processed locally, materials are available to the users much more rapidly.
among other costs, the learning resource center had to pay a three-month salary for a programmer. other computer support, whether personnel or machine time, has not been directly billed to the library. analyst time is absorbed, in part, in general library salaries, as the librarian-analyst is also head of technical services and is responsible for original cataloging. about one-half of her time is devoted to automation activities. as an indirect cost of automation, it is reasonable to include the cost of a special summer project contract of about $1500 for the reference librarian to catalog a-v materials. this was necessary because the librarian-analyst was directly involved with automation and thus not able to keep up with all media of materials to be cataloged. purchase-order forms previously covered by the business office budget cost the library $900. however, it was a two-year supply which was paid for by money the college, if not the library, would have expended anyway. the multiple-order forms for computer use exceed the cost of more standard forms by several hundred dollars per year. the library also expends about $400 per year to buy punch cards and magnetic tape. some direct savings resulted from what are by-products of the automated system, but which were previously done manually. these include production of a monthly accession list and notices to faculty members of items they requested which were ordered, cancelled, or cataloged. the accession list was previously compiled by xeroxing, in ten copies, the shelflist card for all items added to the collection during a month. this involved both xerox charges and student assistant time. notices to faculty were previously sent out by both the order and processing sections. now these notices are consolidated, which produces savings in addressing time, as well as eliminating manual production of each notice.
overall, in calculating costs and savings, direct and indirect, it appears at this point that parkland has automated many library routines very inexpensively, although specific cost figures remain to be determined. with the availability of a similar computer, many other libraries should be able to undertake automation of certain basic functions without large expenditures of either money or personnel time.

problems

as with all automated efforts, some problems were encountered at almost every stage of development. taken as a whole, these were minor and, for the most part, few hitches were encountered. however, so that others may profit from the library automation experience at parkland, those problems will be discussed. the major problem was the original programmer of the series. this person was not a regular employee of parkland and was not concerned with being retained. since he was not part of the staff, he worked erratically and frequently was hard to get hold of. we were working on a tight time schedule, and it was very important to maintain close supervision of the progress being made, although sometimes this was difficult. in addition, even though it was strongly desired that tests be conducted throughout the three-month period, the programmer waited until all coding and compiling was completed before beginning even test-data testing with most programs. fortunately, it worked out satisfactorily, as the regular staff member of the computer center, who presently runs our jobs and does program maintenance, took over in mid-july and was available for live-data tests. all staff members directly involved with automation worked very hard the last two weeks of july and the first week of august to complete testing with live data. the programs were further refined during august and september, and most of the bugs were out by early fall. naturally, changes in specifications continued to be made, and our acquisitions system is definitely not static.
the lesson we learned from the experience with the initial programmer is that, if a regular staff member of the institution can be assigned to the development of programs for the library, avoiding other assignments during that time period, a more satisfactory response can be achieved from the programmer. also, in such an operation it would be possible to monitor progress on a more regular basis. another group of problems arose in connection with the new forms required for the automated system. fortunately, these were not serious. the forms arrived later than they were promised, and, without exception, their cost was about 25 percent more than the original estimates. because custom forms can take a long time to be completed, it is wise to identify output requirements early in the development of an automated system, so that the forms can be completed and delivered when the system is ready for final testing and implementation. a few minor problems revolved around decisions made in file design. to conserve space and hold down the size of the master record, it was decided to pack numerical fields. this would have been satisfactory if packing had been limited to such fields as the julian date (e.g., 72001 rather than 01-01-72; this form of the date was used to provide easy computation when calculating overdue orders). unfortunately, fields such as the numerical part of the lc card number and the parkland college account numbers were also packed. no problem existed except when the lc card number was blank at order time; then the lc number printed as zeros. of course, these could be suppressed once the problem was identified, although it was decided to make space to unpack the field.
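the julian-date packing described above can be sketched as follows: the date is stored as a five-digit yyddd number (72001 for january 1, 1972), which makes overdue-order arithmetic easy. the helper names and the use of python's datetime module are our own; the 120-day threshold is the vendor overdue-notice cutoff mentioned earlier in the article.

```python
# a sketch of julian-date packing (yyddd) and the overdue-order check
# it was meant to simplify. names and the 120-day default are from the
# article's description; the implementation is illustrative.
import datetime

def to_julian(d):
    """pack a date as yyddd: two-digit year followed by day of year."""
    return (d.year % 100) * 1000 + d.timetuple().tm_yday

def is_overdue(order_date, today, threshold_days=120):
    """true when an order has been outstanding longer than the threshold."""
    return (today - order_date).days > threshold_days
```

note that the yyddd form is compact and sorts correctly within a year, but comparing dates across a year boundary (or across 2000) needs real date arithmetic, which is why the check above subtracts dates rather than packed numbers.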
it was learned that packed fields always print zero when unpacked, unless this is specifically suppressed, and also that it is impossible to debug packed fields on routine file dumps unless the dump is requested with provisions for unpacking and reformatting, because packed fields print blank when they are dumped. other minor difficulties included:

1. the print chain did not print colons or semicolons, except as zeros; therefore, the library's records all contain commas instead.
2. in the midst of programming the account numbers, all the college's funds were changed, thus requiring the change of constants and edit criteria in many programs.
3. as originally specified for input, the lc classification number did not sort in shelflist order; for instance, bf 8 sorted after bf 21. this was eventually remedied by left-justifying the letters and right-justifying the numbers within separate fixed fields.
4. routine delays for machine repair and maintenance were a concern, since it is necessary to adhere to a tight schedule in systems development.

future development

as is so frequently the case, now that parkland is committed to automated functions within the library, more and more applications are seen. even the former skeptics on the staff are enthusiastic, and all the professionals have made suggestions for the future. several additions to the acquisitions system were made in the first six months following implementation of the system. these included a list of purchase orders sequenced by vendor and enlarging the machine-generated notices to faculty requestors to cover items ordered and cancelled. various additions have been made in several programs originally part of the system, which expand the services the system can provide for the library staff. many more minor modifications and supplementary features in acquisitions have been identified for inclusion in the system, and will be added as time permits.
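the shelflist-sort remedy in the problems above can be sketched as a sort key: left-justify the letters and right-justify the digits in separate fixed fields, so that bf 8 collates before bf 21. the field widths (3 letters, 5 digits) are our own assumption for illustration.

```python
# a sketch of the shelflist sort fix: fixed-field justification so that
# lc class numbers compare correctly as plain character strings.
def lc_sort_key(call_number):
    """build a fixed-field sort key from a 'letters digits' class number."""
    letters, numbers = call_number.split()
    return letters.ljust(3) + numbers.rjust(5)

shelf_order = sorted(["bf 21", "bf 8", "b 99"], key=lc_sort_key)
# -> ['b 99', 'bf 8', 'bf 21']
```

without the justification, a plain character compare puts "bf 21" before "bf 8" because "2" precedes "8"; padding the digits on the left restores numeric order while keeping a simple string sort, which is all the era's card-sorting equipment could do.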
the first additional area to benefit directly from computer availability has been periodicals. without involving complicated programming, the periodicals holdings have been converted to a card file which is then listed directly, card by card, without changes, except for suppression of a control and sequence number. nothing more is planned for periodicals in the near future, because the new card file enables the master holdings list of 800 titles to be updated in technical services by the periodicals assistant, who also keypunches one-half time. the time-consuming retyping of the holdings list is now eliminated, and multiple copies of up-to-date holdings lists can be produced more frequently with less effort. another new area, for which programming specifications were released in december 1971, is reference. in this system it is hoped that subject bibliographies and holdings lists, based on library of congress classification, can be produced. this system will have a multitude of purposes, one of the primary ones being to give better service to our faculty members. we get many requests for copies of portions of our shelflist or other extracts of holdings. rather than filling these requests by xeroxing cards or tedious typing, a few extract specifications will permit computerized retrieval and printing. also, search time in the catalog will be cut down considerably. in the subject bibliographies, the library plans to be able to extract on any heading, stem of a heading, or any part of a heading, thus getting much more flexibility than in manual use of the card catalog. programming for this is currently under way, and after the system has been completed and is operational, some interesting results should be identified. by including three subject headings of fifty characters in our original file design, it was possible to design and program the reference series as a spin-off of the acquisitions-technical services system with a minimum of additional effort.
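the subject-bibliography extraction described above can be sketched as follows: select records on a whole heading, a stem (prefix) of a heading, or any part (substring) of a heading. the record structure, function name, and mode labels are illustrative assumptions, in python rather than the cobol of the original system.

```python
# a sketch of extract-by-heading: whole heading, stem, or any part.
def extract_by_heading(records, term, mode="part"):
    """return records with at least one subject heading matching the term."""
    term = term.lower()

    def hit(heading):
        h = heading.lower()
        if mode == "heading":      # match the whole heading
            return h == term
        if mode == "stem":         # match a heading beginning with the term
            return h.startswith(term)
        return term in h           # "part": match the term anywhere

    return [r for r in records if any(hit(h) for h in r["headings"])]

shelflist = [
    {"title": "handbook of data processing for libraries",
     "headings": ["data processing", "library automation"]},
    {"title": "lighter than a feather", "headings": ["fiction"]},
]
matches = extract_by_heading(shelflist, "automation")   # "part" match
```

the three match modes correspond directly to the flexibility the article claims over manual use of the card catalog, where only left-to-right filing order can be browsed.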
even if it is eventually decided to lengthen either the number or size of the subject headings contained in parkland's file, useful services will have been provided under the original design, as well as a base for further decisions and developments. other projects which are being considered for future action are serials holdings (in parkland's case, mostly annuals and yearbooks which get cataloged), including an anticipation list, and management statistics consisting of holdings percentages by class letter versus collection additions and circulation figures by class letter. circulation itself will undoubtedly not be designed prior to actual residence on the permanent campus (anticipated for fall 1973), but all of the above are possibilities and some will receive attention in the immediate future. by building a data base which includes subject headings and call numbers, many future projects will be practical to consider, as the file maintenance programs and the data base will already exist. these, of course, may be modified from time to time to meet changing conditions and requirements. additionally, parkland's library staff has been following cooperative library automation efforts involving other libraries, and would happily consider participation in appropriate cooperative ventures.

conclusion

in the opinion of both the library and computer staff, the automation of acquisitions is a success. it was accomplished rapidly, essentially on time, and economically, with few costs higher than originally anticipated. now that the system is operating smoothly, with only an occasional bug cropping up, the extra workload caused by parallel operations has been phased out, and the total efficiency of the system should continue to improve. the system to date has been running on a weekly basis, and this has proved satisfactory to both the computer center personnel and the library.
the library is among the first parts of parkland to be on a regular weekly schedule using the computer. most other processing is on a monthly and quarterly cycle. in approaching any automated systems development, a general attitude of flexibility combined with thoroughness is very important and will probably bring the best long-term results. by being flexible and open-ended, regardless of what portion of a library's functions were originally automated, the way will be paved to provide a data nucleus for other applications in areas of the library. thoroughness in design and attention to initial detail are also important, as sometimes it is harder to find the time to make changes than was expected. there is probably a tendency to get along with an operational system as it is, rather than making minor non-crucial modifications to it, although such changes do get worked in as time permits. nonetheless, it is very important that in the initial stages a system be as comprehensively planned as feasible. the parkland college learning resource center is fortunate in that the original specifications (on the whole) were well thought out and provided a cohesive unit, which is also characterized by built-in flexibility and, as a result, is adaptable to future growth.

acknowledgments

numerous individuals have participated in and supported library automation efforts at parkland college. david l. johnson, director of the learning resource center, provided the initial inspiration and determination. robert o. carr, director of the computer center, welcomed the library's commitment to automation and provided technical advice where necessary. sandra lee meyer, acquisitions librarian, gave full cooperation, including tireless aid in clarification of requirements and debugging test results. since late july 1971, bill abraham has been the programmer-operator for the library system and has consistently given more than one hundred percent effort.
jim whitehead from western illinois university contributed valuable advice based on his prior experience in acquisitions automation. finally, kathryn luther henderson, an inspirational teacher and friend, voluntarily spent many hours writing test data and offering the opportunity for many fruitful discussions.

references

1. thomas k. burgess, "criteria for design of an on-line acquisitions system at washington state university library," in proceedings of the 1969 clinic on library applications of data processing, edited by dewey e. carroll (urbana: university of illinois, graduate school of library science, 1970), p. 50-66.
2. alvin c. cage, "data processing applications for acquisitions at the texas southern university library," in proceedings, texas conference on library automation, 1969 (houston: texas library association, acquisitions round table, 1969), p. 35-57.
3. john b. corbin, "the district and its libraries-tarrant county junior college district, fort worth, texas," in proceedings of the 1969 clinic on library applications of data processing, edited by dewey e. carroll (urbana: university of illinois, graduate school of library science, 1970), p. 114-34.
4. t. c. dobb, "administration and organization of data processing for the library as viewed from the computing centre," in proceedings of the 1969 clinic on library applications of data processing, edited by dewey e. carroll (urbana: university of illinois, graduate school of library science, 1970), p. 75-80.
5. connie dunlap, "automated acquisitions procedures at the university of michigan library," library resources & technical services 11:192-202 (spring 1967).
6. robert m. hayes and joseph becker, handbook of data processing for libraries (new york: wiley-becker and hayes, 1970).
7. john f. macpherson, "automated acquisition at the university of western ontario," in automation in libraries. papers presented at the c.a.c.u.l.
workshop on library automation at the university of british columbia, vancouver, april 10-12, 1967 (ottawa, ontario: canadian library association, 1967).
8. ned c. morris, "computer-based acquisitions system at texas a & i university," journal of library automation 1:1-12 (march 1968).
9. louis vagianos, "acquisitions: policies, procedures, and problems," in automation in libraries. papers presented at the c.a.c.u.l. workshop on library automation at the university of british columbia, vancouver, april 10-12, 1967 (ottawa, ontario: canadian library association, 1967), p. 1-9.

158 information technology and libraries | december 2009

michelle frisque, president's message

i know the president's message is usually dedicated to talking about where lita is now or where we are hoping lita will be in the future, but i would like to deviate from the usual path. the theme of this issue of ital is "discovery," and i thought i would participate in that theme. like all of you, i wear many hats. i am president of lita. i am head of the information services department at the galter health sciences library at northwestern university. i also am a new part-time student in the master's program in learning and organizational change at northwestern university. as a student and a practicing librarian, i am now on both sides of the discovery process. as head of the information systems department, i lead the team that is responsible for developing and maintaining a website that assists our health-care clinicians, researchers, students, and staff with selecting and managing the electronic information they need when they need it. as a student, i am a user of a library discovery system. in a recent class, we were learning about the burke-litwin causal model of organization performance and change. the article we were reading described the model; however, it did not answer all of my questions. i thought about my options and decided i should investigate further.
before i continue, i should confess that, like many students, i was working on this homework assignment at the last minute, so the resources had to be available online. this should be easy, right? i wanted to find an overview of the model. i first tried the library's website using several search strategies and browsed the resources in metalib, the library catalog, and libguides with no luck. the information i found was not what i was looking for. i then tried wikipedia without success. finally, as a last resort, i searched google. i figured i would find something there, right? i didn't. while i found many scholarly articles and sites that would give me more information for a fee, none of the results i reviewed gave me an overview of the model in question. i gave up. the student in me thought: it should not be this hard! the librarian in me just wanted to forget i had ever had this experience. this got me to thinking: why is this so hard? libraries have "stuff" everywhere. we access "stuff," like books, journals, articles, images, datasets, etc., from hundreds of vendors and thousands of publishers who guard their stuff and dictate how we and our users can access that stuff. that's a problem. i could come up with a million other reasons why this is so difficult, but i won't. instead, i would like to think about what could be. in this same class we learned about appreciative inquiry (ai) theory. i am simplifying the theory, but the essence of ai is to think about what you want something to be instead of identifying the problems of what is. i decided to put ai to the test and tried to come up with my ideal discovery process. i put both my student and librarian hats on, and here is what i have come up with so far:
- i want to enter my search in one place and search once for what i need. i don't want to have to search the same terms many times in various locations in the hopes one of them has what i am looking for.
- i don't care where the stuff is or who provides the information. if i am allowed to access it i want to search it.
- i want items to be recommended to me on the basis of what i am searching. i also want the system to recommend other searches i might want to try.
- i want the search results to be organized for me. while perusing a result list can be loads of fun because you never know what you might find, i don't always have time to go through pages and pages of information.
- i want the search results to be returned to me in a timely manner.
- i want the system to learn from me and others so that the results list improves over time.
- i want to find the answer.
i'm sure if i had time i would come up with more. while we aren't there yet, we should continually take steps, both big and small, to perfect the discovery process. i look forward to reading the articles in this issue to see what other librarians have discovered, and i hope to learn new things that will bring us one step closer to creating the ultimate discovery experience. michelle frisque (mfrisque@northwestern.edu) is lita president 2009-10 and head, information systems, northwestern university, chicago. book reviews information technology and libraries | march 2014 44. epub 3: best practices, by matt garrish and markus gylling. sebastopol, ca: o'reilly, 2013. 345 pp. isbn: 978-1-449-32914-3. $29.99. there is much of value in this book (there aren't really that many books out right now about the electronic book markup framework, epub 3), yet i have a hard time recommending it, especially if you're an epub novice like me. so much of the book assumes a familiarity with epub 2. if you aren't familiar with this version of the specification, then you will be playing a constant game of catch-up. also, it's clear that the book was written by multiple authors; the chapters are sometimes jarringly disparate with respect to pacing and style. the book as a whole needs a good edit.
this is surprising since o'reilly is almost uniformly excellent in this regard. the first three chapters form the core of the book. the first chapter, "package document and metadata," illustrates how the top-level container of any epub 3 book is the "package document." this document contains metadata about the book as well as a manifest (a list of files included in the package as a whole), a spine (a list of the reading order of the files included in the book), and an optional list of bindings (a lookup list similar to the list of helper applications contained in the configurations of most modern web browsers). the second chapter, "navigation," addresses and illustrates the creation of a proper table of contents, a list of landmarks (sort of an abbreviated table of contents), and a page list (useful for quickly navigating to a specific print-equivalent page in the book). the third chapter, "content documents," is the heart of the core of the book. this chapter addresses markup of actual chapters in a book, pointing out that epub 3 markup here is mostly a subset of html5, but also pointing out such things as the use of mathml for mathematical markup, svg (scalable vector graphics), page layout issues, use of css, and the use of document headers and footers. after reading these first three chapters, my sense is that one is ready to dive into a markup project, which is exactly what i did with my own project. that said, i think a reread of these core chapters is due, which i intend to do presently. the rest of the book is devoted to specialty subjects such as how to embed fonts, use of audio and video clips, "media overlays" (epub 3 supports a subset of smil, the synchronized multimedia integration language, for creating synchronized text/audio/video presentations), interactivity and scripting (with javascript), global language support, accessibility issues, provision for automated text-to-speech, and a nice utility chapter on validation of epub 3 xml files.
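as a rough illustration of the package document structure the review describes (metadata, a manifest of files, and a spine giving reading order), the sketch below renders a bare-bones opf document as a string. this is not an example from the book; the file names and identifier are hypothetical, and a real epub 3 package would need further required metadata.

```python
# minimal sketch of an epub 3 "package document" (opf): metadata,
# a manifest of included files, and a spine (reading order).
# file names and the identifier below are hypothetical.
def package_document(title, identifier, manifest, spine):
    """render a bare-bones opf package document as a string."""
    items = "\n".join(
        f'    <item id="{item_id}" href="{href}" media-type="{mtype}"/>'
        for item_id, (href, mtype) in manifest.items()
    )
    refs = "\n".join(f'    <itemref idref="{item_id}"/>' for item_id in spine)
    return f"""<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
{items}
  </manifest>
  <spine>
{refs}
  </spine>
</package>"""

doc = package_document(
    "sample book",
    "urn:uuid:00000000-0000-0000-0000-000000000000",
    {"nav": ("nav.xhtml", "application/xhtml+xml"),
     "c1": ("chapter1.xhtml", "application/xhtml+xml")},
    ["c1"],
)
print(doc.splitlines()[0])
```

the manifest lists everything in the package; the spine lists only the documents in their reading order, which is why the navigation file appears above in the manifest but not the spine.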
of these, the chapter on global language support i found to be fascinating. for us native english speakers, it's not immediately obvious some of the problems one will inevitably encounter when trying to create an electronic publication that can work in non-western languages. just consider languages that read vertically and from right to left, for one! as an epub novice, my greatest desire would be for the book to provide, maybe in an appendix, a fairly comprehensive example of an epub 3 marked-up book. maybe this is a tall order? nevertheless, i would love to see an example of marked-up text including bidirectional footnotes, pagination, a table of contents, etc.; simple, foundational things, really. examples of each of these are included in the book, but not in one place. having such an example in one place would be something that could be used as a quick-start template for us epub beginners. to be fair, code examples of all of this are up on the accompanying website, and i am using these examples as i learn to code epub 3 for my own project. but having a single, relatively comprehensive example as an appendix to the book would be very useful. as i read this book, something kept bothering me. epub 2 and epub 3 are so very different, with reading systems designed to render epub 3 documents being fairly rare at this point. so if different versions of the same spec are so different, with no guarantee that a future reading system will be able to read documents adhering to a previous version, then the prospect of reading epub documents into the future is pretty sketchy. are e-books, then, just convenient and cool mechanisms for currently reading longish narrative prose, convenient and cool, but transitory? mark cyzyk is the scholarly communication architect in the sheridan libraries, johns hopkins university, baltimore, maryland, usa. 78 design principles for a comprehensive library system tamer uluakar, anton r.
pierce, and vinod chachra: virginia polytechnic institute and state university, blacksburg, virginia. this paper describes a project that takes a step-by-step or incremental approach to the development of an online comprehensive system running on a dedicated computer. the described design paid particular attention to present and predicted capabilities in computing as well as to trends in library automation. the resultant system is now in its second of three releases, having tied together circulation control, catalog access, and serial holdings. perspective the use of computers in libraries is no longer a speculative venture for the daring few. rather, library automation has become the accepted prerequisite for effective library service. the question faced is not "if," but rather "how" and "when." the reasons for this evolution are diverse, but fundamental is the recognition of online computer processing as the most effective means of simultaneously handling inventory control, information retrieval, and networking of large, complex, and volatile stores of data. most areas of current library practice could now benefit from effective computer-based control. mature and proven systems exist for cataloging, circulation, serials control, acquisitions, catalog access, and "reader guidance"; the latter by virtue of online literature searching facilities such as dialog, medlars, or brs. the challenge is to find or develop an optimal mix of capabilities. two common limitations from which library automation projects suffer are the use of nonstandardized, incomplete records and the lack of functional integration of different tasks. in most cases these limitations are due to historic circumstances. the pioneering systems, say, those online systems introduced between 1967 and 1975, had to conserve carefully the available computing resources. a decade ago it was unthinkable for any library to store a million marc records online.
manuscript received july 1980; accepted february 1981. design principles/uluakar, et al. 79. mass storage costs alone precluded that option. to best realize the benefits of automation, short records, usually of fixed length, were employed. there is little question that systems based on short records were helpful to their users. however, one characteristic of these systems was their proliferation within a particular library. after the first system was shown to be a success, it became compelling to try another. the problem was that these separate systems were usually not communicating directly with each other because of limitations imposed by program complexity and load on available resources. thus, the use of incomplete records breeds isolated, noncommunicating systems. however, system users have come to demand that all relevant data be available at a single terminal from a single system. it is not enough to know that a particular title is due back in twenty-five days; the user must also know that copy two has just been received, and that copy three is expected to arrive from the vendor in one week. that is, the functions of catalog access, circulation, and acquisitions must be brought together at a single place: the user's terminal. and while the importance of functional integration has been recognized for some time, only a very few report successful implementations.1,2 the kafkaesque alternative to functional integration becomes the library that has been "well computerized" but where the librarian must use five different terminals, one for each task. as computer-based systems have grown to maturity, increasing stress has been placed on standardization. in library automation the measure of standardization is wide-scale use of the marc formats for documents and authorities; the use of bibliographic "registry" entries such as isbn, issn, or coden; the use of standard bibliographic description; and so forth.
however, the application of common languages and standardized protocols, data description, and definition has been less pervasive. we find many applications that eschew use of the common high-level languages, database management systems, and standard "off-the-shelf" or general-purpose hardware. the emergence of powerful and easy-to-use database management systems, the spectacular price reductions in hardware, and the concomitant, and equally spectacular, improvements in system capabilities have made it clear that it is practical to think ambitiously. perhaps the major articulation of these developments has been the pervasive shift from a central computer shared with nonlibrary users to the utilization of dedicated minicomputers.3 our analysis of the requirements of a comprehensive system led to recognition of the key role played by serials in research libraries. serials form the most critical factor in automating library service because of the complexity of their bibliographic, order, and inventory records, and because of their importance to research.4 journal of library automation vol. 14/2 june 1981, p. 80. a fundamental error in designing a comprehensive library system would involve focusing on the requirements of monographs and/or other "one-shot" forms of the literature. the reason is, simply, that monographs and other such publications can be treated as an easy limiting case of a continuing set of publications. this observation is borne out by christoffersson, who reports an application that extends the idea of seriality and develops a means to provide useful control and access to all classes of material.5 design philosophy the concerns outlined above mean that a viable library system should meet the following design criteria: functional integration. functional integration is simply the ability to conduct all appropriate inquiries, updates, and transactions on any terminal.
this envisages a cradle-to-grave system wherein a title is ordered, has its bibliographic record added to the database, is received and paid, has its bibliographic record adjusted to match the piece, is bound, found by author, title, subject, series, etc., charged out, and, alas, flagged as missing. in this way a terminal linked to the system will be a one-stop place to conduct all the business associated with a particular title, subject, series, order, claim, vendor, or borrower. completeness of data. if the system is to be functionally integrated, it is clear that it must carry the data required to support all functions. in particular, data completeness is required to satisfy the access and control functions. consider, for example, the problems associated with the cataloging function. a book is frequently known by several titles or authors. creating these additional access points is a large portion of the cataloger's responsibility. only systems that allow the user access to these additional entries utilize the effort spent in building the catalog record. such system capabilities must be present to allow the labor-intensive card catalog to be closed and, more important, to allow maintenance of the catalog within the system. use of standardized data and networking. in an excellent article, silberstein reminds us that, in general, the primary rationale for adhering to standards is interchangeability.6 we give great importance to being able to project our data to whatever systems may develop in the future. we believe this consideration is of the highest priority because, fundamentally, the only thing that will be preserved into the future is the data itself.* without interchangeability of data, sharing of resources is impossible.
data interchangeability is, of course, a basic assumption that has been made in speculation concerning the national bibliographic network7 developing from the bibliographic utilities, notably oclc, inc., the research libraries group's rlin facility, the washington library network, and the university of toronto's utlas facility. *this state of affairs seems to be true for all computer-based systems because their lifetime is, typically, no greater than ten years. today, nearly all research libraries participate in some utility. while their participation is primarily directed to utilization of the cataloging support services, we find an increasing amount of interest and use of additional capabilities, notably interlibrary loan. we expect a steady and continual growth of these library networking capabilities. however, networking is not problem free. perhaps the biggest single problem in using the network is the misalignment between the record as found on the bibliographic database and the requirements of individual libraries. while such variability between the resource database record and the user's needed version is well understood,8 the local library frequently has a difficult time adjusting records to meet local needs. one example is oclc's inability to "remember" in the online database a particular library's version of a record. another example is the conser project's practice of "locking" very dynamic records as soon as they are authenticated. this locking frequently means that required updates cannot be made and users cannot share with one another corrections to the base record. after locking, each must, independently, go about bringing the record up to date. thus, as roughton notes, "the next library to call up the record loses the benefit of the previous library's work.
"9 this inhospitable state of affairs forces individual libraries to maintain their own records if they wish to change bibliographic records after initial entry. the problem of local adjustment of bibliographic records in no way conflicts with the goal of standardized bibliographic data. standardized data provides a quick means of delivering an intelligible package to a variety of users who will adapt the package to meet their particular needs. standardization does not mean making adaptation inefficient or more costly than it need be; rather, standards provide a framework around which the details are filled in. these observations on standardized data formats imply that the library's data must be based on marc records for books, serials, authorities, etc.; and on the ansi standards for summary serials holdings notation, book numbers, library addresses, and so forth. microscopic data description. at this point, system administrators face a fundamental problem: many of the library's important records have no standard format. the most conspicuous example involves the notation for detailed serials holdings.10 the only alternative one has when trying to build a system without standardized formats is to rely on "microscopic" description. that is, each and every distinct type of data element that makes up (or can make up) a field in a record must be accounted for and uniquely tagged. in this way, whatever standard format is ultimately set, it will be possible, in principle, to assemble by algorithm the data elements into an arrangement that will be in conformity with the standard. only if the library is using microscopic data description will the library be able to maintain its independence of particular lines of hardware or software. we are convinced that the use of untagged, free-form input will, in the long run, spell disaster. use of general purpose hardware and software.
many strategies in dealing with library automation involve redesigning standard hardware or software. for example, one vendor has reported an interesting design of mass storage units that improved access time.11 we feel that future applications should, as much as possible, steer clear of such customized implementations because the standard capabilities of most affordable systems allow sufficient processing power and storage economies even if these capabilities are suboptimal for a particular application. the use of general-purpose hardware and system software promotes system sharing between different installations. moreover, an application based on general-purpose hardware and system software will be easier to maintain and far less vulnerable to changes in personnel. for turnkey installations, the greater the degree of use of general-purpose hardware and software, the better shielded will the installation be against changes in product line or the vendor's ultimate demise. a noteworthy application of this principle of compatibility is seen in the system being developed by the national library of medicine.12 system description the functional capabilities of the virginia tech library system (vtls) have been developed in two software releases, with the third release soon to appear. the initial release met the needs associated with circulation control and also provided rudimentary access to the catalog and serials holdings. the present release has benefited from the use of the marc format, and allows vastly improved catalog access and control. release iii, the comprehensive library system now being developed, will draw together acquisitions, authority control, and serials control with the current capabilities. vtls release i the initial release of the system was developed in 1976 to meet needs generated by rapid library growth.
circulation transactions had been increasing at about 10 percent annually for the previous decade and were straining the manually maintained circulation files beyond acceptable limits. the main library* at virginia tech is organized in subject divisions, each essentially "owning" one floor of a 100,000-square-foot facility. a 100,000-square-foot addition to the library had been approved. because virginia tech's library has only one card catalog, some means was necessary to distribute catalog information throughout a facility that was to double its size. *only two quite small branch libraries (architecture and geology) exist on campus. in addition there is a reserve collection located in the washington, d.c., area that supports off-campus graduate programs in the areas of education, business administration, and computer science. all these sites are linked to the system. after reviewing the alternative means of distributing the catalog (e.g., a duplicate card catalog, photographic reproduction of the catalog, or a com catalog), it was decided to attack both problems, circulation control and remote catalog access, within a single online system. vtls was installed on a full-time basis in august 1976. its first release ran continuously on the library's dedicated hewlett-packard 3000 minicomputer until december 1979. at that time the system held brief bibliographic data for approximately 325,000 monographs and 25,000 journals and other serial titles: records for about half the collection. while the first release ably met its goals, it became clear that it would prove to be an unsuitable host for additional modules involving acquisitions and serials control, primarily because of the brief, fixed-length bibliographic records.
as a result of highly favorable price reductions in computer hardware and improvements in capability, it was possible to think in terms of storing one million marc records online as well as supporting the additional terminals required for a comprehensive library system. vtls release ii vtls runs under a single online program for all real-time transactions. the major goals in the design of this program were the following: 1. two conflicting requirements had to be accommodated: first, the program had to be easy to use for library patrons. this is requisite for a system that will eventually replace the card catalog. second, the program had to be practical, efficient, and versatile for its professional users. the keystrokes required had to be minimal, and related screens had to be easily accessible from one to another. 2. the response time had to be good, especially for more frequent transactions. 3. the contents of all screens had to be balanced to provide enough information without being overcrowded and difficult to read or comprehend. further, each screen of vtls had to be arranged by some logical arrangement of the data it contains; for most screens this meant alphabetical sorting of the data according to ala rules. 4. the format of all screens, especially those to be viewed by the patrons, had to be visually pleasing. thus, the use of special symbols (which are so abundant on many computer system displays), nonstandard abbreviations, and locally (and often quite arbitrarily) defined terms were unacceptable. 5. the program had to have security provisions to restrict certain classes of users from addressing particular modules of the program. considerable effort was spent to satisfy these goals. the first goal was achieved by the "network of screens" approach. the second goal, prompt system response, necessitated the use of the "data buffer method," which, in turn, proved to have other uses (both of these techniques are discussed below). to satisfy goals three and four, a committee of librarians and analysts spent months drafting and reviewing each screen until it was finally approved by the design group. goal five, security provisions, was reached without much difficulty. network of screens vtls's data-access system is designed to be used as easily as a road map. this is accomplished by the use of a "network of screens." the network of screens is much like a road map in which a set of related data (a screen displayed in one or more pages) acts as a "city," and the commands that lead from one set to another act as "highways." vtls has nineteen screens including various menu screens, bibliographic screens (see "the data buffer method" below), serial holdings screens, item (physical piece) screens, and screens for patron-related data. the user can "drive" from one "city" to another using system commands. the system commands are either "global" or "local." global commands, as the name implies, may be entered at any point during the execution of the online program. a local command is peculiar to a given screen. global commands are of two types: search commands and processing commands. search commands are used to access the database by author, title, subject, added entries, call number, lc card number, isbn, issn, patron name, etc. processing commands, on the other hand, initiate procedures such as check-out, renewal, or check-in of items. the user first enters a global (search) command to access one of the screens in the network. from there, local commands that are specific to the current screen can be used. there are three different types of local commands: commands that take the user from one screen to another; commands that page within the current screen; and commands that update data related to the screen.
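the "network of screens" with its global and local commands can be sketched as a small dispatcher: global commands are accepted anywhere, while local commands are valid only on the current screen. the screen names, command names, and transition table below are hypothetical illustrations, not vtls's actual command set.

```python
# sketch of a "network of screens": screens are nodes, global commands
# may be entered from any screen, local commands only from the current
# one. all names here are hypothetical.
GLOBAL_COMMANDS = {"author_search": "author screen",
                   "title_search": "title screen",
                   "checkout": "item screen"}

# local commands: from a given screen, which screens can be reached
LOCAL_COMMANDS = {"author screen": {"show_items": "item screen"},
                  "item screen": {"show_patron": "patron screen"},
                  "patron screen": {"show_activity": "patron activity screen"}}

class Session:
    def __init__(self):
        self.screen = "main menu"

    def enter(self, command):
        if command in GLOBAL_COMMANDS:            # usable from any screen
            self.screen = GLOBAL_COMMANDS[command]
        elif command in LOCAL_COMMANDS.get(self.screen, {}):
            self.screen = LOCAL_COMMANDS[self.screen][command]
        else:
            raise ValueError(f"command {command!r} not valid on {self.screen}")
        return self.screen

s = Session()
s.enter("author_search")   # global command: enter the network anywhere
s.enter("show_items")      # local: author screen -> item screen
s.enter("show_patron")     # local: item screen -> patron screen
print(s.screen)            # patron screen
```

the point of the road-map analogy is visible in the data: the global commands are on-ramps into the network, and each screen's local commands are the highways leading out of that "city."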
for example, it is possible to start by entering an author search command to access the network and then proceed not only to find what books the author has in the system but also the availability of each of the books. if the books are checked out, information about the patrons who have them can also be reached. this display is called the patron screen. from the patron screen, one can "drive" to the patron activity screen, which displays circulation information about the patrons. thus, each displayed screen leads to another. in fact, the searches can start at ten different screens and proceed in many different ways through the network. database design image/3000, hewlett-packard's database management system used by vtls, is designed to be used with fixed-length records. this fact, coupled with the need to sort entries on most screens, created serious problems in the early stages of the system design. but various techniques were devised to overcome these apparent roadblocks. figure 1 illustrates the breakdown of the bibliographic record in the database and the way it is linked with piece-specific data. bibliographic data are stored in three distinct groups for subsequent retrieval: 1. controlled vocabulary terms (authority data set); 2. title and title-like data (title data set); 3. all remaining bibliographic data, i.e., data that is not indexed (marc-other data set). this grouping of the marc record extends to subfields, thus splitting mixed fields such as author-title added entries. when individual fields are parsed in this way, a single field may contribute more than one access point, such as variant forms of author, title, series name, subject, and added entries. access by the standard bibliographic control numbers is effected by use of inverted files (not shown in the figure). a fundamental characteristic of this layout involves the storage of controlled vocabulary terms (i.e., authors and subjects).
regardless of the number of references made to an authority term from different bibliographic records, the controlled vocabulary term is stored only once. the system assigns a unique number (authority id) to each such term and uses this number to keep records of the references made to it in a separate data set (authority-bibliographic linkage data set). this particular structure makes an authority control subsystem possible, speeds up online retrieval and display, and economizes mass storage. fig. 1. bibliographic layout of the vtls database (simplified). the data buffer method the system displays bibliographic records in two different formats. if the terminal used is designated for librarians, the records are displayed in the marc format (the resulting screen is referred to as the marc screen); otherwise, they are displayed in a screen that is formatted similar to a catalog card. before displaying these screens, the online program collects and formats the data to be displayed and stores it in one of the two "buffer" data sets. the records stored in the buffer data sets are called buffer records. buffer records can be edited, as required, by adding new lines, deleting, or modifying existing character strings. these updates can be executed quickly and without placing much load on the system since they involve little, if any, analysis, indexing, and sorting. thus, the buffer data sets store all bibliographic updates and new data entry of the day. at night, these records are transferred to the rest of the database by a batch program. the data buffer method has had several pronounced effects on the system. by transferring periods of heavy resource demand to off-hours, the system can work with full marc records in a library that has a heavy real-time load of data entry, inquiry, and circulation.
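the data buffer idea (cheap daytime edits against pre-formatted buffer records, with analysis, indexing, and sorting deferred to a nightly batch merge) can be sketched roughly as follows. the record structures here are hypothetical simplifications, not vtls's actual data sets.

```python
# sketch of the "data buffer method": daytime edits touch only small
# buffer records; re-indexing is deferred to a nightly batch merge into
# the main database. structures are hypothetical.
class Catalog:
    def __init__(self):
        self.database = {}   # record id -> list of display lines
        self.buffer = {}     # today's formatted/edited records

    def display(self, rec_id):
        # a search is satisfied from the buffer if the record was
        # already formatted (or edited) today
        if rec_id in self.buffer:
            return self.buffer[rec_id]
        lines = list(self.database.get(rec_id, []))
        self.buffer[rec_id] = lines          # cache for later searches
        return lines

    def edit(self, rec_id, line_no, text):
        # cheap in-place edit of the buffer record; no indexing now
        self.display(rec_id)
        self.buffer[rec_id][line_no] = text

    def nightly_batch(self):
        # transfer the day's buffer records into the main database,
        # where full analysis and indexing would happen off-hours
        for rec_id, lines in self.buffer.items():
            self.database[rec_id] = lines
        self.buffer.clear()

cat = Catalog()
cat.database["b1"] = ["100 smith, john", "245 a sample title"]
cat.edit("b1", 1, "245 a corrected title")
cat.nightly_batch()
print(cat.database["b1"][1])   # 245 a corrected title
```

note how the two effects described in the text fall out of this shape: heavy work is moved to the batch step, and a second display of the same record during the day is served from the buffer rather than rebuilt.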
the data buffer approach also improves access efficiency because once a buffer record is prepared for a screen, subsequent searches for the same record are satisfied by the buffer record. data entry and the oclc interface the most frequently encountered method of entering marc records into a local computer involves use of tape in the marc ii communications format. alternative methods include the use of microprocessors or digital recorders which "play back" a marc-tagged screen image from oclc or some other bibliographic utility. these alternative methods have the strong advantage of shortening the delay introduced while waiting for a tape to be delivered. we have been able to link the utility's terminal to the data buffer.13 data flows from the utility to the buffer in real time. no intervention in the utility's terminal was required for the local processor to be able to capture the marc-tagged screen. batch programs running on the hp 3000 read records from printer ports of oclc terminals and pass them directly to the data buffer. once a record gets into the data buffer, it is accessible by oclc number so that subsequent editing and linkage to piece-specific data or serial holdings can be made right away in the local system. buffer records can also be created by direct keyboarding of the full array of fixed and variable fields using the vtls terminals. circulation as with most other online circulation systems, vtls uses machine-sensible bar-code labels to identify books and borrowers to the system. all efforts have been made to humanize the system. one consequence is that the system does not make decisions better made by responsible staff. thus, two kinds of circulation stations reside side by side. the first is staffed by students who typically work a ten-to-twenty-hour week and historically have shown high turnover.
their circulation stations deal only with inquiries and with heavily used but nondiscretionary transactions: check-out, renewal, and check-in. should problems arise, the borrower is directed to the adjacent station staffed by a full-time employee who, using the system, can articulate circulation policy to borrowers and make decisions with regard to any questions concerning fines, lost books, or reinstatement of invalidated or blocked privileges.

start-up

we found system start-up to be a relatively easy task. it was convenient to use the so-called rolling conversion, in which items were labeled upon their initial circulation through the system. the greatest benefit was seen in the first year, when the probability that items brought to the circulation desk were already known to the system increased exponentially. after six months this probability had risen to 65 percent, with only 10 percent of the circulating collection having been labeled. at the end of the year the probability increased linearly at 0.7 percent per month. after three years of operation, the probability was 90 percent, with approximately 50 percent of the circulating collection having been labeled.

reference use

the ability to distribute catalog access as well as circulation information provides a powerful information tool. a subset of all functions previously described is available to the nonlibrarian users of the system through user-cordial screens. a "help" function may also be initiated at any screen to guide users through the network of screens.

current development

critical to the overall design of vtls is the system's ability to treat serials and continuations. without this capability, the modules being developed to support acquisitions, serials check-in and claiming, and binding will not function satisfactorily. equally important, the design lays the foundation for authority control by virtue of its use of a dictionary for all controlled vocabulary terms.
thus a name or subject entry is carried internally as a four-byte code, which is translated to the authority entry upon display. another internally coded data element, the bib-id, is designed to handle many of the linkage problems associated with serials and continuations. the bib-id is unique for each marc record. prior to establishing the serials control modules governing receipt, claiming, and binding, the coded holdings module must be functioning. this module will allow automatic identification of volume (or binding unit) closure and automatic identification of gaps in holdings or overdue receipts. thus, highest priority has been given to the development of this module so that these other modules can, in turn, develop. the holdings module serves two functions: first, it allows the detailed recording of serials holdings consistent with the principle stated earlier concerning microscopic data description; and second, these microscopic data are coded so that the system can recognize (and predict) particular pieces or binding units in terms of enumerative and chronological data. the next three areas of development are modules for acquisitions and fund control, serials receipts and binding, and authority control. the final development will be comprehensive management reports. it should be noted that each one of these developments will result in a specific benefit to the user community. the project is incremental in that the development of area a does not mean that area b must be developed for a to have lasting value. this incremental approach offers designers and administrators the advantages associated with an orderly growth in complexity and budget requirements. further, the capabilities of the host hardware and software are stressed in smaller steps than would be the case if the comprehensive system were written and then turned on.
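the holdings module's use of coded enumerative data — predicting the next expected piece and flagging gaps in holdings — can be illustrated with a short sketch. this is assumed logic for illustration only, not the vtls module; it supposes issues are coded as (volume, issue) pairs with a known number of issues per volume.

```python
# Illustrative sketch (assumed logic, not the VTLS holdings module):
# pieces are coded as (volume, issue) tuples, with issue numbers
# assumed to run from 1 to issues_per_volume within each volume.

def next_issue(volume, issue, issues_per_volume):
    """Predict the enumeration of the next expected piece."""
    if issue < issues_per_volume:
        return (volume, issue + 1)
    return (volume + 1, 1)  # volume closure: a new volume begins

def find_gaps(received, issues_per_volume):
    """Flag expected pieces missing from a run of received issues."""
    pieces = set(received)
    current, last = min(pieces), max(pieces)
    gaps = []
    while current != last:
        current = next_issue(*current, issues_per_volume)
        if current != last and current not in pieces:
            gaps.append(current)
    return gaps

# e.g., a quarterly: vol. 14 no. 3 never arrived
missing = find_gaps([(14, 1), (14, 2), (14, 4), (15, 1)], 4)
```

because the enumeration is coded rather than free text, both volume closure (issue 4 of 4 received) and gaps (issue 3 missing) fall out of simple comparisons, which is the point the article makes about microscopic, machine-recognizable holdings data.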
the key move appears to be predefining the scope and capabilities of each stage so that a useful product emerges at its completion, and so that it lays a foundation for the next.

references

1. velma veneziano and james s. aagaard, "cost advantages of total system development," in proceedings of the 1976 clinic on library applications of data processing (urbana, ill.: university of illinois press, 1976), p. 133-44.
2. charles payne and others, "the university of chicago data management system," library quarterly 47:1-22 (jan. 1977).
3. audrey n. grosch, minicomputers in libraries (new york: knowledge industry press, 1979), 142p.
4. richard degennaro, "wanted: a mini-computer serials system," library journal 102:878-79 (april 15, 1977).
5. john g. christoffersson, "automation at the university of georgia libraries," journal of library automation 12:23-38 (march 1979).
6. stephen m. silberstein, "standards in a national bibliographic network," journal of library automation 10:142-53 (june 1977).
7. network technical architecture group, "message delivery system for the national library and information service network: general requirements," in david c. hartmann, ed., library of congress network planning paper, no. 4, 1978, 35p.
8. arlene t. dowell, cataloging with copy (littleton, colo.: libraries unlimited, 1976), 295p.
9. michael roughton, "oclc serials records: errors, omissions, and dependability," journal of academic librarianship 5:316-21 (jan. 1980).
10. tamer uluakar, "needed: a national standard for machine-interpretable representation of serial holdings," rtsd newsletter 6:34 (may/june 1981).
11. c.l. systems, inc., "the libs 100 system: a technological perspective," clsi newsletter, no. 6 (fall/winter 1977).
12.
lister hill national center for biomedical communications, national library of medicine, "the integrated library system: overview and status" (lhc/ctb internal documentation, bethesda, md., october 1, 1979), 55p.
13. francis j. galligan to pierce, 11 feb. 1980.

tamer uluakar is manager of the virginia tech library automation project. anton r. pierce is planning and research librarian at the university libraries. vinod chachra is director of computing resources and associate professor of industrial engineering.

using the harvesting method to submit etds into proquest: a case study of a lesser-known approach

marielle veve
information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12197

marielle veve (m.veve@unf.edu) is metadata librarian, university of north florida. © 2020.

abstract

the following case study describes an academic library's recent experience implementing the harvesting method to submit electronic theses and dissertations (etds) into the proquest dissertations & theses global database (pqdt). in this lesser-known approach, etds are deposited first in the institutional repository (ir), where they get processed, to be later harvested for free by proquest through the ir's open archives initiative (oai) feed. the method provides a series of advantages over some of the alternative methods, including students' choice to opt in or out from proquest, better control over the embargo restrictions, and more customization power without having to rely on overly complicated workflows. institutions interested in adopting a simple, automated, post-ir method to submit etds into proquest, while keeping the local workflow, should benefit from this method.

introduction

the university of north florida (unf) is a midsize public institution established in 1972, with the first theses and dissertations (tds) submitted in 1974.
since then, copies have been deposited in the library, where bibliographic records are created and entered in the library catalog and the online computer library center (oclc). during the period of 1999 to 2012, some tds were also deposited in proquest by the graduate school on behalf of students who chose to do so. this practice, however, was discontinued in the summer of 2012, when the institutional repository, digital commons, was established and submission to it became mandatory. five years later, in the summer of 2017, interest in getting unf tds hosted in proquest resurfaced. this renewed interest grew out of a desire of some faculty and graduate students to see the institution's electronic theses and dissertations (etds) posted there, in addition to a recent library subscription to the proquest dissertations & theses global database (pqdt). a month later, conversations between the library and graduate school began on the possibility of resuming hosting unf etds in proquest. consensus was reached that the pqdt database would be a good exposure point for our etds, in addition to the institutional repository (ir), yet some concerns were raised. one of the concerns was the cost of the service and who would be paying for it. neither the library nor the graduate school had allocated funds for this. the next concern was the possibility of proquest imposing restrictions that could prevent students, or the university, from posting etds in other places. it was important to make sure there were no such restrictions. another concern was expressed over students entering embargo dates in proquest that do not match the embargo dates selected for the ir. this is a common problem encountered by other libraries.1 for that reason, we wanted to keep the local workflow. the last concern expressed during the conversations was preserving students' right to opt in or out from distributing their theses in proquest.
this is something both the graduate school and library have been adamant about. in higher education, requiring students to submit to proquest is a controversial issue which has raised ethical concerns and has been highly debated over the years.2 once conversations between the library and graduate school were held and concerns were gathered, the library moved ahead to investigate the available options to submit etds into proquest.

literature review

currently, there are three options to submit etds into proquest: (1) submission through the proquest etd administrator tool, (2) submission via file transfer protocol (ftp), and (3) submission through harvests performed by proquest.3

proquest etd administrator submission option

in this option, a proprietary submission tool called proquest etd administrator is used by students, or assigned administrators, to upload etds into proquest. inside the tool, a fixed metadata form is completed with information on the degree, subject terms are selected from a proprietary list, and keywords are provided. the whole administrative and review process gets done inside the tool. afterwards, zip packages with the etds and proquest's extensible markup language (xml) files are sent to the institution via ftp transfers, or through direct deposits to the ir using the simple web-service offering repository deposit (sword) protocol. the etd administrator submission method presents several shortcomings.
first, the proquest xml metadata that is returned to the institutions must be transformed into ir metadata for ingest in the ir, a process that can be long and labor intensive.4 second, the subject terms supplied in the returned files come from a proprietary list of categories maintained by proquest, which does not match the library of congress subject headings (lcsh) used by libraries.5 third, control over the metadata provided is lost because the metadata form cannot be altered, plus customizations to other parts of the system can be difficult to integrate.6 fourth, there have been issues with students indicating different embargo periods in the proquest and ir publishing options, with instances of students choosing to embargo etds in the ir, while not in proquest.7 lastly, this method does not allow students' choice, unless the etds are submitted separately in two systems in a process that can be burdensome. ultimately, for these reasons, we found the etd administrator not a suitable option for our institution.

ftp submission option

in this option, an administrator sends zip packages with the institution's etd files and proquest xml metadata to proquest via ftp.8 at the time of this investigation, there was a $25 charge per etd submitted through this method.9 we did not want to pursue this option because of the charge and the tedious metadata transformations that would be needed between ir and proquest xml schemas. another way around this would have been to submit the etds through the vireo application. vireo is an open source etd management system used by libraries to freely submit etds into proquest via ftp.10 this alternative, however, was not an option for us, as our ir, digital commons, does not support the vireo application.

harvesting submission option

this is the latest method available to submit etds into proquest.
in this option, etds are submitted first into an ir, or other internal system, where they get processed to be later harvested by proquest through the ir's existing open archives initiative (oai) feed.11 at the time of this writing, we were not able to find a single study that documents the use of this method. this option looked appealing and worth pursuing as it met most of our desired criteria. first, with this option, students' choice would not be compromised, as etds would be submitted to proquest after being posted in the ir. second, because the etd administrator would not be used, issues with conflicting embargo dates and unalterable metadata forms would be avoided. in addition, the local workflow would be retained, thus eliminating the need for tedious metadata transformations between proquest and ir schemas. from the available options, this one seemed the most feasible solution for our institution.

implementation of the harvesting method at unf

after research on the different submittal options was performed, the library approached proquest to express interest in depositing our future etds into their system by using a post-ir option. in the first communications, proquest suggested we use the etd administrator to submit etds because it is the most commonly used method. when we expressed interest in the harvesting option, they said "we have not been harvesting from bepress sites" (the company that makes digital commons) and suggested we use the ftp option instead.12 ten months later, they clarified that the harvests could be performed from bepress sites and that the option is free, with the only requirement being a non-exclusive agreement between the university and proquest.
the news allayed both the library's and the graduate school's previous concerns, as we would be able to adopt a free method that would not compromise on students' choice nor restrict students from posting in other places, while keeping the local workflow. after agreement on the submittal method was established, planning and testing of the harvesting method began. the library worked with proquest and bepress to customize the harvesting process, while the university's office of the general counsel worked with proquest on the negotiation process.

negotiation process

before proquest could harvest unf etds, two legal documents needed to be in place. the first document was the theses and dissertations distribution agreement, which specifies the conditions under which etds can be obtained, reproduced, and disseminated by proquest. the document had to be signed by unf's board of trustees and proquest. the agreement stipulated the following conditions:

• the agreement must be non-exclusive.
• the university must make the full-text uniform resource locators (urls) and abstracts of etds available to proquest.
• proquest must harvest the etds from the university's ir.
• the university and students have the option to elect not to submit individual works or to withdraw them.
• no fees are due from the university or students for the service.
• proquest must include the etds in the pqdt database.

the second document that needed to be in place was the theses and dissertations availability agreement, which grants the university the non-exclusive right to reproduce and distribute the etds. this agreement between students and unf specifies the places where etds can be hosted and the embargo restrictions, if any.
unf had already been using this document as part of its etd workflow, but the document needed to be modified to include the additional option to submit etds into proquest. beginning with the spring 2019 semester, the revised version of the agreement provided students with two hosting alternatives: posting in the ir only, or in the ir and proquest.

local steps performed before the harvesting

the workflow begins when students upload their etds and supplemental files (certificate of approval and availability agreements) directly into the digital commons ir. there, students complete a metadata template with information on the degree, and keywords related to the thesis are provided. after this, the graduate school reviews the submitted etds and approves them inside the ir platform. next, the library digital projects' staff downloads the native pdf files of the etds, processes them, and creates public and archival versions for each etd. availability agreements are reviewed to determine which students chose to embargo their etds and which ones chose to host them in proquest, in addition to the ir. if students choose to embargo their etds, the embargo dates are entered in the metadata template. if students choose to publish their etds in proquest, a "proquest: yes" option is checked in their metadata template, while students who choose not to host in proquest get a "proquest: no" in their template. (the proquest field is a new administrative field that was added to the etd metadata template, starting with the spring 2019 semester, to assist with the harvesting process. it was designed to alert proquest of the etds that were authorized for harvesting. more detail on its functionality will be provided in the next section.)
the reason library staff enters the proquest and embargo fields on behalf of students is to avoid having students enter incorrect data on the template. following this review, the metadata librarian assigns library of congress subject headings to each etd and creates authority files for the authors. these are also entered in the metadata template. afterwards, the etds get posted in the digital commons' public display, with the full-text pdf files available only for the non-embargoed etds. information that appears in the public display of digital commons will also appear immediately in the oai feed for harvesting. at this point, two separate processes take place:

1. the metadata librarian harvests the etds' metadata from the oai feed and converts it into marc records that are sent to oclc, with the ir's url attached. the workflow is described at https://journal.code4lib.org/articles/11676.
2. on the seventh of each month, proquest harvests the full-text pdf files, with some metadata, of the non-embargoed etds that were authorized for harvesting from the oai feed.

harvesting process (customized for our institution)

to perform the harvests, proquest creates a customized robot for each institution that crawls oai-pmh compliant repositories to harvest metadata and full-text pdf files of etds.13 the robot performs a date-limited oai request to pull everything that has been published or edited in an ir's publication set during a specific timeframe. information to formulate the date-limited request is provided to proquest by the institution for the first harvest only; subsequently, the process gets done automatically by the robot.
the request contains the following elements:

• base url of the oai repository
• publication set
• metadata prefix or type of metadata
• date range of titles to be harvested

in the particular case of our institution, we needed to customize the robot to limit the harvests to authorized etds only. to achieve this, we worked with bepress to add a new, hidden field at the bottom of our digital commons' etd metadata template. the field, called proquest, consisted of a dropdown menu with two alternatives: "proquest yes" or "proquest no" (see figure 1). the field was mapped to an element in the oai feed that displays the value of "proquest: yes" or "proquest: no," thus alerting the robot of the etds that were authorized for harvesting and the ones that were not. the element used to map the proquest field in the oai feed is a qualified dublin core (qdc) element (figure 2). for that reason, the robot needs to perform the harvests from the qdc oai feed in order to see this field.

[figure 1. display of the proquest field's dropdown menu in the metadata template]
[figure 2. display of the proquest field in the qdc oai feed]

after the etds authorized for harvesting have been identified with help from the "proquest: yes" field, the robot narrows down the ones that can be harvested at the present moment by using a second element, which provides the date when the full-text file of an etd becomes available. it also displays in the qdc oai feed (see figure 3). if the date is on or before the monthly harvest day, the etd is currently available for harvesting. if the date is in the future, the robot identifies that etd as embargoed and adds its title to a log of embargoed etds with some basic metadata (including the etd's author and the last time it was checked).
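the date-limited request and the selection logic described above can be sketched as follows. the oai-pmh verb and parameter names (verb, set, metadataprefix, from, until) are standard; everything else — the repository url, set name, metadata prefix, and the dictionary fields standing in for parsed qdc records — is a hypothetical placeholder, not proquest's actual implementation.

```python
# Sketch of the harvest selection described in the text (assumed logic).
from datetime import date
from urllib.parse import urlencode

def list_records_url(base_url, set_spec, prefix, from_date, until_date):
    """Build a date-limited OAI-PMH ListRecords request URL."""
    params = {
        "verb": "ListRecords",       # standard OAI-PMH verb
        "set": set_spec,             # the IR's publication set
        "metadataPrefix": prefix,    # e.g., a QDC-style prefix
        "from": from_date,           # date range of titles to harvest
        "until": until_date,
    }
    return f"{base_url}?{urlencode(params)}"

def select_for_harvest(records, harvest_day):
    """Split parsed records into harvestable-now vs. embargo-logged."""
    harvest, embargo_log = [], []
    for rec in records:
        if rec["proquest"] != "yes":          # not authorized: skip
            continue
        if rec["available"] <= harvest_day:   # availability date passed
            harvest.append(rec["title"])
        else:                                 # future date: log it
            embargo_log.append(rec["title"])
    return harvest, embargo_log

url = list_records_url("https://example.edu/do/oai/", "publication:etd",
                       "qdc", "2019-05-01", "2019-06-07")
```

the two checks mirror the article's description exactly: the "proquest: yes" flag gates authorization, and the availability-date comparison against the monthly harvest day decides between immediate harvest and the embargo log.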
the log of embargoed etds is then pulled up in the future to identify the etds that come out of embargo so the robot can retrieve them.

[figure 3. display of the element in the qdc oai feed]

after the etds that are currently available for harvesting have been identified (because they have the "proquest: yes" field and a present or past availability date), the robot performs a harvest of their full-text pdf files by using a third element, which displays at the bottom of records in the oai feed (figure 4). this third element contains a url with direct access to the complete pdf file of etds that are currently not embargoed. etds that are currently on embargo contain a url that redirects the user to a webpage with the message: "the full-text of this etd is currently under embargo. it will be available for download on [future date]" (see figure 5).

[figure 4. display of the third element at the bottom of records in the qdc oai feed]
[figure 5. message that displays in the url of embargoed etds]

once the metadata and full-text pdf files of authorized, non-embargoed etds have been obtained by the robot, they get queued for processing by the proquest editorial team, who then assigns them international standard book numbers (isbns) and proquest's proprietary terms. it takes an average of four to nine weeks for the etds to display in the pqdt database after being harvested. records in the pqdt come with the institutional repository's original cover page and a copyright statement that leaves copyright with the author. afterwards, the process gets repeated once a month. this frequency can be set to quarterly or semi-annually if desired.

additional points on the harvesting method

handling of etds that come out of embargo.
when the embargo period of an etd expires, its full-text pdf automatically becomes available on the ir's webpage, and consequently, in the third element that displays in the oai record. each month, when the robot prepares to crawl the oai feed, it will first check the titles in the log of embargoed etds to determine if any of them have become fully available through the third element. the ones that become available are then pulled by the robot through this element.

handling of metadata edits performed after the etds have been harvested and published in pqdt.

edits performed to the metadata of etds will trigger a change of date in a date element that displays in the oai records. this change of date alerts the robot that an update took place in a record, which is then manually edited or re-harvested, depending on the type of update that took place.

sending marc records to oclc.

as part of the harvesting process, proquest provides free marc records for the etds hosted in their pqdt database. these can be delivered to oclc on behalf of the institution on an irregular basis. records are machine-generated "k" level and come with urls that link to the pqdt database and with proquest's proprietary subject terms. we requested to be excluded from these deliveries and to continue our local practice of sending marc records to oclc with lcsh, authority file headings, and the ir's urls.

notifications of harvests performed by proquest and imports to the pqdt database.

when harvests or imports to the pqdt have been performed by proquest, institutions do not get automatically notified. still, they can request to receive scheduled monthly reports of the titles that have been added to the pqdt. unf requested to receive these monthly reports.

usage statistics of etds hosted in pqdt.

usage statistics of an institution's etds hosted in the pqdt can be retrieved from a tool called dissertation dashboard.
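the monthly recheck of the embargo log described above might look like the following sketch. the logic and field names are assumptions for illustration, not proquest's implementation: each logged entry carries an availability date, and on the harvest day the robot splits the log into titles now due for retrieval and titles still embargoed (whose last-checked date is refreshed, as the article notes the log records when each entry was last checked).

```python
# Sketch of the embargo-log recheck (assumed logic and field names).
from datetime import date

def recheck_embargo_log(log, harvest_day):
    """Split the embargo log into (now harvestable, still embargoed)."""
    due = [e for e in log if e["available"] <= harvest_day]
    still = [dict(e, last_checked=harvest_day)  # refresh check date
             for e in log if e["available"] > harvest_day]
    return due, still

log = [
    {"title": "thesis a", "available": date(2020, 1, 1)},
    {"title": "thesis b", "available": date(2021, 1, 1)},
]
due, still = recheck_embargo_log(log, date(2020, 6, 7))
```

entries in `due` would then be pulled through the full-text url element, while `still` is carried forward to the next monthly crawl.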
this tool is available to the institution's etd administrators and provides the number of times some aspect of an etd (e.g., citations, abstract viewings, page previews, and downloads) has been accessed through the pqdt database.

royalty payments to authors.

students who submit etds through this method are also eligible to receive royalties from proquest.

obstacles faced

during the planning phase, we encountered some obstacles that hindered progress on the implementation. these were:

• the amount of time it took to get the ball rolling. initially, we were misled by the assumption that we would not be able to use the harvesting method to submit etds into proquest because we were bepress users, as we were originally told, but that ended up not being the case. ten months later, we were notified by the same source that the harvesting option for bepress sites would be possible and doable by proquest. these were ten months that delayed the implementation process.
• the amount of time it took to get the paperwork finalized and signed before the harvesting. from the moment first contact was initiated with proquest to the moment the last agreement was finalized and signed by both parties, 21 months went by. there was a lot of back and forth in the negotiation process and paperwork between the university and proquest.
• inconsistent lines of communication. there were multiple parties involved in the communication process, and some of the emails began with one person only to be later transferred to someone else. this lack of consistency in the communication lines made it difficult to determine who was in charge of particular tasks at certain stages of the process.

conclusion and recommendations

although problems were encountered at the beginning, implementation of the harvesting process at unf was a complete success.
once the process started, it ran smoothly without complications. harvests were performed on schedule, and no issues with unauthorized content being pulled from the oai were faced. the fields used to alert the robot in the oai of the etds authorized for harvesting worked as planned, and so did the embargo log used to identify and pull the out-of-embargo etds. it should be noted that digital commons users who want to exclude embargoed etds from displaying in the oai can do so by setting up an optional yes/no button in their submission form. this button prevents the metadata of particular records from displaying in the oai feed. we did not pursue this option because we have been using the etd metadata that displays in the oai to generate the marc records we send to oclc. in addition, we took the necessary precautions to avoid exposing the full content of the embargoed etds in the oai feed. institutions planning to use this method should be very careful with the content they display in the oai so as to prevent embargoed etds from being mistakenly pulled by proquest. access restrictions can be set by either suppressing the metadata of embargoed etds from displaying in the oai or by suppressing the urls with full access to the embargoed etds. the same precaution should be taken if planning to provide students with the choice to opt in or out from proquest. altogether, the harvesting option proved to be a reliable solution to submit etds into proquest without having to compromise on students' choice or rely on complicated workflows with metadata transformations between ir and proquest schemas. institutions interested in adopting a simple, automated, post-ir method, while keeping the local workflow, should benefit from this method.
endnotes

1 dan tam do and laura gewissler, "managing etds: the good, the bad, and the ugly," in what's past is prologue: charleston conference proceedings, eds. beth r. bernhardt et al. (west lafayette, in: purdue university press, 2017), 200-04, https://doi.org/10.5703/1288284316661; emily symonds stenberg, september 7, 2016, reply to wendy robertson, "anything to watch out for with etd embargoes?," digital commons google users group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj.

2 gail p. clement, "american etd dissemination in the age of open access: proquest, noquest, or allowing student choice," college & research libraries news 74, no. 11 (december 2013): 562-66, https://doi.org/10.5860/crln.74.11.9039; fuse, 2012-2013, graduate students re-fuse!, https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y.

3 "pqdt submissions options for universities," proquest, http://contentz.mkt5049.com/lp/43888/382619/pqdtsubmissionsguide_0.pdf.

4 meghan banach bergin and charlotte roh, "systematically populating an ir with etds: launching a retrospective digitization project and collecting current etds," in making institutional repositories work, eds. burton b. callicott, david scherer, and andrew wesolek (west lafayette, in: purdue university press, 2016), 127-37, https://docs.lib.purdue.edu/purduepress_ebooks/41/.

5 cedar c. middleton, jason w. dean, and mary a. gilbertson, "a process for the original cataloging of theses and dissertations," cataloging and classification quarterly 53, no. 2 (february 2015): 234-46, https://doi.org/10.1080/01639374.2014.971997.
6 wendy robertson and rebecca routh, "light on etd's: out from the shadows" (presentation, annual meeting for the ila/acrl spring conference, cedar rapids, ia, april 23, 2010), http://ir.uiowa.edu/lib_pubs/52/; yuan li, sarah h. theimer, and suzanne m. preate, "campus partnerships advance both etd implementation and ir development: a win-win strategy at syracuse university," library management 35, no. 4/5 (2014): 398-404, https://doi.org/10.1108/lm-09-2013-0093.

7 do and gewissler, "managing etds," 202; banach bergin and roh, "systematically populating," 134; donna o'malley, june 27, 2017, reply to andrew wesolek, "etd embargoes through proquest," digital commons google users group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj.

8 gail p. clement and fred rascoe, "etd management & publishing in the proquest system and the university repository: a comparative analysis," journal of librarianship and scholarly communication 1, no. 4 (august 2013): 8, http://doi.org/10.7710/2162-3309.1074.

9 "u.s. dissertations publishing services: 2017-2018 fee schedule," proquest.
https://doi.org/10.5703/1288284316661 https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj https://doi.org/10.5860/crln.74.11.9039 https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y http://contentz.mkt5049.com/lp/43888/382619/pqdtsubmissionsguide_0.pdf https://docs.lib.purdue.edu/purduepress_ebooks/41/ https://doi.org/10.1080/01639374.2014.971997 http://ir.uiowa.edu/lib_pubs/52/ https://doi.org/10.1108/lm-09-2013-0093 https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj http://doi.org/10.7710/2162-3309.1074 information technology and libraries september 2020 using the harvesting method to submit etds into proquest | veve 10 10 “support: proquest export documentation,” vireo users group, https://vireoetd.org/vireo/support/proquest-export-documentation/. 11 “pqdt global submission options, institutional repository + harvesting,” proquest, https://media2.proquest.com/documents/dissertations-submissionsguide.pdf. 12 marlene coles, email message to author, january 19, 2018. 13 “proquest dissertations & theses global harvesting process,” proquest. 
smartphones: a potential discovery tool
wendy starkweather and eva stowers
the anticipated wide adoption of smartphones by researchers is viewed by the authors as a basis for developing mobile-based services. in response to the unlv libraries' strategic plan's focus on experimentation and outreach, the authors investigate the current and potential role of smartphones as a valuable discovery tool for library users.
when the dean of libraries announced a discovery mini-conference at the university of nevada las vegas libraries to be held in spring 2009, we saw the opportunity to investigate the potential use of smartphones as a means of getting information and services to students. being enthusiastic users of apple's iphone, we, along with the web technical support manager, developed a presentation highlighting the iphone's potential value in an academic library setting.
because wendy is unlv libraries' director of user services, she was interested in the applicability of smartphones as a tool for users to more easily discover the libraries' resources and services. eva, as the health sciences librarian, was aware of a long tradition of pda use by medical professionals. indeed, first-year bachelor of science nursing students are required to purchase a pda bundled with select software. together we were drawn to the student-outreach possibilities inherent in new smartphone applications such as twitter, facebook, and myspace.
■ presentation
our brief review of the news and literature about mobile phones in general provided some interesting findings and served as a backdrop for our presentation:
■ a total of 77 percent of internet experts agreed that the mobile phone would be "the primary connection tool" for most people in the world by 2020.1 the number of smartphone users is expected to top 100 million by 2013. there are currently 25 million smartphone users, with sales in north america having grown 69 percent in 2008.2
■ smartphones offer a combination of technologies, including gps tracking, digital cameras, and digital music, as well as more than fifty thousand specialized apps for the iphone and new ones being designed for the blackberry and the palm pre.3 the palm pre offered fewer than twenty applications at its launch, but one million application downloads had been performed by june 24, 2009, less than a month after launch.4
■ the 2009 horizon report predicts that the time to adoption of these mobile devices in the educational context will be "one year or less."5
data gathered from campus users also was presented, providing another context. in march 2009, a survey of university of california, davis (uc-davis) students showed that 43 percent owned a smartphone.6 uc-davis is participating in apple's university education forum.
here at unlv, 37 percent of students and 26 percent of faculty and staff own a smartphone.7 the presentation itself highlighted the mobile applications that were being developed in several libraries to enhance student research, provide library instruction, and promote library services. two examples were abilene christian university (http://www.acu.edu/technology/mobilelearning/index.html), which in fall 2008 distributed iphones and ipod touches to the incoming freshman class, and stanford university (http://www.stanford.edu/services/wirelessdevice/iphone/), which participates in "itunes u" (http://itunes.stanford.edu/). if the libraries were to move forward with smartphone technologies, they would be following the lead of such universities. readers also may be interested in joan lippincott's recent concise summary of the implications of mobile technologies for academic libraries as well as the chapter on library mobile initiatives in the july 2008 library technology report.8
■ goals: a balancing act
ultimately the goal for many of these efforts is to be where the users are. this aspiration is spelled out in unlv libraries' new strategic plan relating to infrastructure evolution, namely, "work towards an interface and system architecture that incorporates our resources, internal and external, and allows the user to access from their preferred starting point."9 while such a goal is laudable and fits very well into the discovery emphasis of the mini-conference presentation, we are well aware of the need for further investigation before proceeding directly to full-scale development of a complete suite of mobile services for our users. of critical importance is ascertaining where our users are and determining whether they want us to be there and in what capacity. the value of this effort is demonstrated in booth's research report on student interest in emerging technologies at ohio university.
wendy starkweather (wendy.starkweather@unlv.edu) is director, user services division, and eva stowers (eva.stowers@unlv.edu) is medical/health sciences librarian at the university of nevada las vegas libraries.
188 information technology and libraries | december 2009
the report includes the results of an extensive environmental survey of their library users. the study is part of ohio university's effort to actualize their culture of assessment and continuous learning and to use "extant local knowledge of user populations and library goals" to inform "homegrown studies to illuminate contextual nuance and character, customization that can be difficult to achieve when using externally developed survey instruments."10 unlv libraries are attempting to balance early experimentation and more extensive data-driven decision-making. the recently adopted strategic plan includes specific directions associated with both efforts. for experimentation, the direction states, "encourage staff to experiment with, explore, and share innovative and creative applications of technology."11 to that end, we have begun working with our colleagues to introduce easy, small-scale efforts designed to test the waters of mobile technology use through small pilot projects. "text-a-librarian" has been added to our existing group of virtual reference services, and we introduced a "text the call number and record" service in our library's opac in july 2009.
unlv libraries' strategic plan helps foster the healthy balance by directing library staff to "emphasize data collection and other evidence based approaches needed to assess efficiency and effectiveness of multiple modes and formats of access/ownership" and "collaborate to educate faculty and others regarding ways to incorporate library collections and services into education experiences for students."12 action items associated with these directions will help the libraries learn and apply information specific to their users as the libraries further adopt and integrate mobile technologies into their services. as we begin our planning in earnest, we look forward to our own set of valuable discoveries.
references
1. janna anderson and lee rainie, the future of the internet iii, pew internet & american life project, http://www.pewinternet.org/~/media//files/reports/2008/pip_futureinternet3.pdf (accessed july 20, 2009).
2. sam churchill, "smartphone users: 110m by 2013," blog entry, mar. 24, 2009, dailywireless.org, http://www.dailywireless.org/2009/03/24/smartphone-users-100m-by-2013 (accessed july 20, 2009).
3. mg siegler, "state of the iphone ecosystem: 40 million devices and 50,000 apps," blog entry, june 8, 2009, techcrunch, http://www.techcrunch.com/2009/06/08/40-million-iphones-and-ipod-touches-and-50000-apps (accessed july 20, 2009).
4. jenna wortham, "palm app catalog hits a million downloads," blog entry, june 24, 2009, new york times technology, http://bits.blogs.nytimes.com/2009/06/24/palm-app-catalog-hits-a-million-downloads (accessed july 20, 2009).
5. larry johnson, alan levine, and rachel smith, horizon report, 2009 edition (austin, tex.: the new media consortium, 2009), http://www.nmc.org/pdf/2009-horizon-report.pdf (accessed july 20, 2009).
6. university of california, davis, "more than 40% of campus students own smartphones, yearly tech survey says," technews, http://technews.ucdavis.edu/news2.cfm?id=1752 (accessed july 20, 2009).
7.
university of nevada las vegas, office of information technology, "student technology survey report: 2008–2009," http://oit.unlv.edu/sites/default/files/survey/surveyresults2008_students3_27_09.pdf (accessed july 20, 2009).
8. joan lippincott, "mobile technologies, mobile users: implications for academic libraries," arl bi-monthly report 261 (dec. 2008), http://www.arl.org/bm~doc/arl-br-261-mobile.pdf (accessed july 20, 2009); ellyssa kroski, "library mobile initiatives," library technology reports 44, no. 5 (july 2008): 33–38.
9. "unlv libraries strategic plan 2009–2011," http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 20, 2009): 2.
10. char booth, informing innovation: tracking student interest in emerging library technologies at ohio university (chicago: association of college and research libraries, 2009), http://www.ala.org/ala/mgrps/divs/acrl/publications/digital/ii-booth.pdf (accessed july 20, 2009); "unlv libraries strategic plan 2009–2011," 6.
11. "unlv libraries strategic plan 2009–2011," 2.
12. ibid.
76 information technology and libraries | june 2010
in this paper we discuss the design space of methods for integrating information from web services into websites. we focus primarily on client-side mash-ups, in which code running in the user's browser contacts web services directly without the assistance of an intermediary server or proxy. to create such mash-ups, we advocate the use of "widgets," which are easy-to-use, customizable html elements whose use does not require programming knowledge. although the techniques we discuss apply to any web-based information system, we specifically consider how an opac can become both the target of web services integration and also a web service that provides information to be integrated elsewhere. we describe three widget libraries we have developed, which provide access to four web services. these libraries have been deployed by us and others.
our contributions are twofold: we give practitioners an insight into the trade-offs surrounding the appropriate choice of mash-up model, and we present the specific designs and use examples of three concrete widget libraries librarians can directly use or adapt. all software described in this paper is available under the lgpl open source license.
■■ background
web-based information systems use a client-server architecture in which the server sends html markup to the user's browser, which then renders this html and displays it to the user. along with html markup, a server may send javascript code that executes in the user's browser. this javascript code can in turn contact the original server or additional servers and include information obtained from them into the rendered content while it is being displayed. this basic architecture allows for myriad possible design choices and combinations for mash-ups. each design choice has implications for ease of use, customizability, programming requirements, hosting requirements, scalability, latency, and availability.
server-side mash-ups
in a server-side mash-up design, shown in figure 1, the mash-up server contacts the base server and each source when it receives a request from a client. it combines the information received from the base server and the sources and sends the combined html to the client. server-side mash-up systems that combine base and mash-up servers are also referred to as data mash-up systems. such data mash-up systems typically provide a web-based configuration front-end that allows users to select data sources, specify the manner in which they are combined, and create a layout for the entire mash-up.
godmar back and annette bailey
web services and widgets for library information systems
as more libraries integrate information from web services to enhance their online public displays, techniques that facilitate this integration are needed.
this paper presents a technique for such integration that is based on html widgets. we discuss three example systems (google book classes, tictoclookup, and majax) that implement this technique. these systems can be easily adapted without requiring programming experience or expensive hosting.
to improve the usefulness and quality of their online public access catalogs (opacs), more and more librarians include information from additional sources into their public displays.1 examples of such sources include web services that provide additional bibliographic information, social bookmarking and tagging information, book reviews, alternative sources for bibliographic items, table-of-contents previews, and excerpts. as new web services emerge, librarians quickly integrate them to enhance the quality of their opac displays. conversely, librarians are interested in opening the bibliographic, holdings, and circulation information contained in their opacs for inclusion into other web offerings they or others maintain. for example, by turning their opac into a web service, subject librarians can include up-to-the-minute circulation information in subject or resource guides. similarly, university instructors can use an opac's metadata records to display citation information ready for import into citation management software on their course pages. the ability to easily create such "mash-up" pages is crucial for increasing the visibility and reach of the digital resources libraries provide. although the technology to use web services to create mash-ups is well known, several practical requirements must be met to facilitate its widespread use. first, any environment providing for such integration should be easy to use, even for librarians with limited programming background. this ease of use must extend to environments that include proprietary systems, such as vendor-provided opacs.
second, integration must be seamless and customizable, allowing for local display preferences and flexible styling. third, the setup, hosting, and maintenance of any necessary infrastructure must be low-cost and should maximize the use of already available or freely accessible resources. fourth, performance must be acceptable, both in terms of latency and scalability.2
godmar back (gback@cs.vt.edu) is assistant professor, department of computer science, and annette bailey (afbailey@vt.edu) is assistant professor, university libraries, virginia tech, blacksburg.
web services and widgets for library information systems | back and bailey 77
examples of such systems include dapper and yahoo! pipes.3 these systems require very little programming knowledge, but they limit mash-up creators to the functionality supported by a particular system and do not allow the user to leverage the layout and functionality of an existing base server, such as an existing opac. integrating server-side mash-up systems with proprietary opacs as the base server is difficult because the mash-up server must parse the opac's output before integrating any additional information. moreover, users must now visit—or be redirected to—the url of the mash-up server. although some emerging extensible opac designs provide the ability to include information from external sources directly and easily, most currently deployed systems do not.4 in addition, those mash-up servers that do usually require server-side programming to retrieve and integrate the information coming from the mash-up sources into the page. the availability of software libraries and the use of special purpose markup languages may mitigate this requirement in the future. from a performance scalability point of view, the mash-up server is a bottleneck in server-side mash-ups and therefore must be made large enough to handle the expected load of end-user requests.
on the other hand, the caching of data retrieved from mash-up sources is simple to implement in this arrangement because only the mash-up server contacts these sources. such caching reduces the frequency with which requests have to be sent to sources if their data is cacheable, that is, if real-time information is not required. the latency in this design is the sum of the time required for the client to send a request to the mash-up server and receive a reply, plus the processing time required by the server, plus the time incurred by sending a request and receiving a reply from the last responding mash-up source. this model assumes that the mash-up server contacts all sources in parallel, or as soon as the server knows that information from a source should be included in a page. the availability of the system depends on the availability of all mash-up sources. if a mash-up source does not respond, the end user must wait until such failure is apparent to the mash-up server via a timeout. finally, because the mash-up server acts as a client to the base and source servers, no additional security considerations apply with respect to which sources may be contacted. there also are no restrictions on the data interchange format used by source servers as long as the mash-up server is able to parse the data returned.
client-side mash-ups
in a client-side setup, shown in figure 2, the base server sends only a partial website to the client, along with javascript code that instructs the client which other sources of information to contact. when executed in the browser, this javascript code retrieves the information from the mash-up sources directly and completes the mash-up. the primary appeal of client-side mashing is that no mash-up server is required, and thus the url that users visit does not change. consequently, the mash-up server is no longer a bottleneck.
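the latency bookkeeping described above can be made concrete with a small sketch; the functions and the example timings below are illustrative assumptions, not measurements from the paper:

```javascript
// illustrative sketch (not from the paper): latency of the two mash-up
// architectures discussed above, assuming sources are contacted in parallel.

// server-side: client/server round trip + server processing time + the
// slowest of the mash-up sources contacted by the server.
function serverSideLatency(clientRoundTripMs, processingMs, sourceMs) {
  return clientRoundTripMs + processingMs + Math.max(...sourceMs);
}

// client-side: the browser fetches the base page, then contacts all
// sources in parallel itself; the page is complete after the slowest one.
function clientSideLatency(basePageMs, sourceMs) {
  return basePageMs + Math.max(...sourceMs);
}

// with hypothetical timings, the two models come out similar, as the text
// argues, provided each source answers browsers and servers alike:
const sources = [80, 120, 250];
const serverSide = serverSideLatency(100, 20, sources); // 100 + 20 + 250 = 370
const clientSide = clientSideLatency(100, sources);     // 100 + 250 = 350
```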
equally important, no maintenance is required for this server, which is particularly relevant when libraries use turnkey solutions that restrict administrative access to the machine housing their opac. on the other hand, without a mash-up server, results from mash-up sources can no longer be centrally cached. thus the mash-up sources themselves must be sufficiently scalable to handle the expected number of requests.
figure 1. server-side mash-up construction
figure 2. client-side mash-up construction
as a load-reducing strategy, mash-up sources can label their results with appropriate expiration times to influence the caching of results in the clients' browsers. availability is increased because the mash-up degrades gracefully if some of the mash-up sources fail, since the information from the remaining sources can still be displayed to the user. assuming that requests are sent by the client in parallel or as soon as possible, and assuming that each mash-up source responds with similar latency to requests sent by the user's browser as to requests sent by a mash-up server, the latency for a client-side mash-up is similar to that of a server-side mash-up. however, unlike in the server-side approach, the page designer has the option to display partial results to the user while some requests are still in progress, or even to delay sending some requests until the user explicitly requests the data by clicking on a link or other element on the page. because client-side mash-ups rely on javascript code to contact web services directly, they are subject to a number of restrictions that stem from the security model governing the execution of javascript code in current browsers. this security model is designed to protect the user from malicious websites that could exploit client-side code and abuse the user's credentials to retrieve html or xml data from other websites to which a user has access.
such malicious code could then relay this potentially sensitive data back to the malicious site. to prevent such attacks, the security model allows the retrieval of html text or xml data only from sites within the same domain as the origin site, a policy commonly known as the same-origin policy. in figure 2, sources a and b come from the same domain as the page the user visits. the restrictions of the same-origin policy can be avoided by using the javascript object notation (json) interchange format.5 because client-side code may retrieve and execute javascript code served from any domain, web services that are not co-located with the origin site can make their results available using json. doing so facilitates their inclusion into any page, independent of the domain from which it is served (see source c in figure 2). many existing web services already provide an option to return data in json format, perhaps along with other formats such as xml. for web services that do not, a proxy server may be required to translate the data coming from the service into json. if the implementation of a proxy server is not feasible, the web service is usable only on pages within the same domain as the website using it. client-side mash-ups lend themselves naturally to enhancing the functionality of existing, proprietary opac systems, particularly when a vendor provides only limited extensibility. because they do not require server-side programming, the absence of a suitable vendor-provided server-side programming interface does not prevent their creation. oftentimes, vendor-provided templates or variables can be suitably adapted to send the necessary html markup and javascript code to the client. the amount of javascript code a librarian needs to write (or copy from a provided example) determines both the likelihood of adoption and the maintainability of a given mash-up creation.
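the same-origin workaround described above, loading the service's response as a script so that it may come from any domain, is commonly called jsonp. a minimal sketch of the pattern follows; the url layout and parameter names are illustrative assumptions, not any particular service's api:

```javascript
// minimal jsonp sketch of the pattern described above: a <script> tag may
// load code from any domain, so a json web service that wraps its response
// in a caller-named callback function can be used cross-domain.

// build a request url carrying the callback name as a query parameter
// (the base url and parameter names here are illustrative assumptions).
function buildJsonpUrl(base, params, callbackName) {
  const query = Object.entries(params)
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
  return base + "?" + query + "&callback=" + callbackName;
}

// in a browser, a widget library would register the callback globally and
// inject a script tag; the same-origin policy does not restrict this.
function requestJsonp(url, callbackName, handler) {
  window[callbackName] = handler; // the service's response invokes this
  const script = document.createElement("script");
  script.src = url;
  document.head.appendChild(script);
}
```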
the less javascript code there is to write, the larger the group of librarians who feel comfortable trying and adopting a given implementation. the approach of using html widgets hides the use of javascript almost entirely from the mash-up creator. html widgets represent specially composed markup, which will be replaced with information coming from a mash-up source when the page is rendered. because the necessary code is contained in a javascript library, adapters do not need to understand programming to use the information coming from the web service. finally, html widgets are also preferable for javascript-savvy users because they create a layer of abstraction over the complexity and browser dependencies inherent in javascript programming.
■■ the google book classes widget library
to illustrate our approach, we present a first example that allows the integration of data obtained from google book search into any website, including opac pages. google book search provides access to google's database of book metadata and contents. because of the company's book scanning activities as well as through agreements with publishers, google hosts scanned images of many book jackets as well as partial or even full previews for some books. many libraries are interested in using the book jackets when displaying opac records, in alerting their users if google can provide a partial or full view of an item a user selected in their catalog, or in both.6 this service can help users decide whether to borrow the book from the library.
the google book search dynamic link api
the google book search dynamic link api is a json-based web service through which google provides certain metadata for items it has indexed. it can be queried using bibliographic identifiers such as isbn, oclc number, or library of congress control number (lccn).
it returns a small set of data that includes the url of a book jacket thumbnail image, the url of a page with bibliographic information, the url of a preview page (if available), as well as information about the extent of any preview and whether the preview viewer can be embedded directly into other pages. table 1 shows the json result returned for an example isbn.
widgetization
to facilitate the easy integration of this service into websites without javascript programming, we developed a widget library. from the adapter's perspective, the use of these widgets is extremely simple. the adapter places html <span> or <div> tags into the page where they want data from google book search to display. these tags contain an html title
attribute that acts as an identifier to describe the bibliographic item for which information should be retrieved. it may contain the item's isbn, oclc number, or lccn. in addition, the tags carry one or more widget classes in the html class attribute to describe which processing should be done with the information retrieved from google to integrate it into the page. these widget classes can be combined with a list of traditional css classes in the class attribute to apply further style and formatting control.
examples
as an example, consider the following html an adapter may use in a page:
<span title="isbn:0596000278" class="gbs-thumbnail gbs-link-to-preview"></span>
when processed by the google book classes widget library, the class "gbs-thumbnail" instructs the widget to embed a thumbnail image of the book jacket for isbn 0596000278, and "gbs-link-to-preview" provides instructions to wrap the tag in a hyperlink pointing to google's preview page. the result is as if the server had contacted google's web service and constructed the html shown in example 1 in table 2, but the mash-up creator does not need to be concerned with the mechanics of contacting google's service and making the necessary manipulations to the document. example 2 in table 2 demonstrates a second possible use of the widget. in this example, the creator's intent is to display an image that links to google's information page if and only if google provides at least a partial preview for the book in question. this goal is accomplished by placing the image inside the span and using style="display:none" to make the span initially invisible. the span is made visible only if a preview is available at google, displaying the hyperlinked image. the full list of features supported by the google book classes widget library can be found in table 3.
integration with legacy opacs
the approach described thus far assumes that the mash-up creator has sufficient control over the html markup that is sent to the user.
this assumption does not always hold if the html is produced by a vendor-provided system, since such systems automatically generate most of the html used to display opac search results or individual bibliographic records. if the opac provides an extension system, such as a facility to embed customized links to external resources, it may be used to generate the necessary html by utilizing variables (e.g., "@#isbn@" for isbn numbers) set by the opac software. if no extension facility exists, accommodations by the widget library are needed to maintain the goal of not requiring any programming on the part of the adapter. we implemented such accommodations to facilitate the use of google book classes within a iii millennium opac.7 we used magic strings such as "isbn:millennium.record" in the title attribute to instruct the widget library to harvest the isbn from the current page via screen scraping. figure 3 provides an example of how a google book classes widget can be integrated into an opac search results page.

table 1. sample request and response for google book search dynamic link api
request:
http://books.google.com/books?bibkeys=isbn:0596000278&jscmd=viewapi&callback=process
json response:
process({
  "isbn:0596000278": {
    "bib_key": "isbn:0596000278",
    "info_url": "http://books.google.com/books?id=ezqe1hh91q4c\x26source=gbs_viewapi",
    "preview_url": "http://books.google.com/books?id=ezqe1hh91q4c\x26printsec=frontcover\x26source=gbs_viewapi",
    "thumbnail_url": "http://bks4.books.google.com/books?id=ezqe1hh91q4c\x26printsec=frontcover\x26img=1\x26zoom=5\x26sig=acfu3u2d1usnxw9baqd94u2nc3quwhjn2a",
    "preview": "partial",
    "embeddable": true
  }
});

table 2. example of client-side processing by the google book classes widget library (each example shows the html written by the adapter, the browser display, and the resultant html after client-side processing)

table 3. supported google book classes
gbs-thumbnail: include an <img> embedding the thumbnail image
gbs-link-to-preview: wrap span/div in link to preview at google book search (gbs)
gbs-link-to-info: wrap span/div in link to info page at gbs
gbs-link-to-thumbnail: wrap span/div in link to thumbnail at gbs
gbs-embed-viewer: directly embed a viewer for the book's content into the page, if possible
gbs-if-noview: keep this span/div only if gbs reports that the book's viewability is "noview"
gbs-if-partial-or-full: keep this span/div only if gbs reports that the book's viewability is at least "partial"
gbs-if-partial: keep this span/div only if gbs reports that the book's viewability is "partial"
gbs-if-full: keep this span/div only if gbs reports that the book's viewability is "full"
gbs-remove-on-failure: remove this span/div if gbs doesn't return book information for this item

■■ the tictoclookup widget library
the tictocs journal table of contents service is a free online service that allows academic researchers and other users to keep up with newly published research by giving them access to thousands of journal tables of contents from multiple publishers.8 the tictocs consortium compiles and maintains a dataset that maps issns and journal titles to rss-feed urls for the journals' tables of contents.
the tictoclookup web service
we used the tictocs dataset to create a simple json web service called "tictoclookup" that returns rss-feed urls when queried by issn and, optionally, by journal title. table 4 shows an example query and response. to accommodate different hosting scenarios, we created two implementations of this tictoclookup: a standalone and a cloud-based implementation.
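following the request and response shapes shown in table 4, a client for the tictoclookup service could be sketched as below; everything beyond that one documented example (parameter handling, missing-record behavior) is an assumption:

```javascript
// sketch of a tictoclookup client following the request/response example
// shown in table 4; behavior beyond that example is an assumption.

// build a lookup url such as
// http://tictoclookup.appspot.com/0028-0836?title=nature&jsoncallback=process
function buildTictocUrl(issn, title, callbackName) {
  return "http://tictoclookup.appspot.com/" + issn +
         "?title=" + encodeURIComponent(title) +
         "&jsoncallback=" + callbackName;
}

// pull the rss feed urls out of a response shaped like table 4's example
function feedUrlsFrom(response) {
  return (response.records || []).map(function (record) {
    return record.rssfeed;
  });
}
```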
The standalone version is implemented as a Python web application conformant to the Web Server Gateway Interface (WSGI) specification. Hosting this version requires access to a web server that supports a WSGI-compatible environment, such as Apache's mod_wsgi. The Python application reads the ticTOCs dataset and responds to lookup requests for specific ISSNs. A cron job periodically downloads the most up-to-date version of the dataset.

The cloud version of the TicTocLookup service is implemented as a Google App Engine (GAE) application. It uses the highly scalable and highly available GAE Datastore to store ticTOCs data records. GAE applications run on servers located in Google's regional data centers so that requests are handled by a data center geographically close to the requesting client. As of June 2009, Google hosting of GAE applications is free, which includes a free allotment of several computational resources. For each application, GAE allows quotas of up to 1.3 million requests and the use of up to 10 GB of bandwidth per twenty-four-hour period. Although this capacity is sufficient for the purposes of many small and medium-size institutions, additional capacity can be purchased at a small cost.

Widgetization

To facilitate the easy integration of this service into websites without JavaScript programming, we developed a widget library. Like Google Book Classes, this widget library is controlled via HTML attributes associated with HTML span or div tags that are placed into the page where the user decides to display data from the TicTocLookup service. The HTML title
attribute identifies the journal by its ISSN, or by its ISSN and title. As with Google Book Classes, the HTML class attribute describes the desired processing; it may also contain traditional CSS classes.

Figure 3. Sample use of Google Book Classes in an OPAC results page

Table 4. Sample request and response for the TicTocLookup web service

Request:
http://tictoclookup.appspot.com/0028-0836?title=nature&jsoncallback=process

JSON response:
process({
  "lastmod": "Wed Apr 29 05:42:36 2009",
  "records": [{
    "title": "Nature",
    "rssfeed": "http://www.nature.com/nature/current_issue/rss"
  }],
  "issn": "00280836"
});

Example

Consider the following HTML an adapter may use in a page:

<span class="tictoc-link tictoc-preview tictoc-alternate-link"
      title="issn:0028-0836" style="display:none">
  Click to subscribe to table of contents for this journal
</span>

When processed by the TicTocLookup widget library, the class "tictoc-link" instructs the widget to wrap the span in a link to the RSS feed at which the table of contents is published, allowing users to subscribe to it. The class "tictoc-preview" associates a tooltip element with the span, which displays the first entries of the feed when the user hovers over the link. We use the Google Feeds API, another JSON-based web service, to retrieve a cached copy of the feed. The "tictoc-alternate-link" class places an alternate link into the current document, which in some browsers triggers the display of the RSS feed icon in the status bar. The span element, which is initially invisible, is made visible if and only if the TicTocLookup service returns information for the given pair of ISSN and title.

Figure 4. Sample use of TicTocLookup classes

Figure 4 provides a screenshot of the display if the user hovers over the link. As with Google Book Classes, the mash-up creator does not need to be concerned with the mechanics of contacting the TicTocLookup web service and making the necessary manipulations to the document. Table 5 provides a complete overview of the classes TicTocLookup supports.
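On the server side, the `jsoncallback` parameter shown in Table 4 implies that the service wraps its JSON payload in the caller-supplied function name. The article does not reproduce the service's source, so the following is a minimal WSGI-style sketch under assumed names (`tictoc_app`, an in-memory `DATASET` holding one ticTOCs record) rather than the actual implementation.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory slice of the ticTOCs dataset: ISSN -> record.
DATASET = {
    "0028-0836": {"title": "Nature",
                  "rssfeed": "http://www.nature.com/nature/current_issue/rss"},
}

def tictoc_app(environ, start_response):
    """Minimal WSGI application sketch: GET /<ISSN>?jsoncallback=<fn>
    returns the matching record wrapped as JSONP."""
    issn = environ.get("PATH_INFO", "/").lstrip("/")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    callback = query.get("jsoncallback", ["callback"])[0]
    record = DATASET.get(issn)
    payload = {"issn": issn.replace("-", ""),
               "records": [record] if record else []}
    body = "%s(%s);" % (callback, json.dumps(payload))
    start_response("200 OK", [("Content-Type", "application/javascript")])
    return [body.encode("utf-8")]

# Exercising the app directly, without a web server:
def call(path, qs):
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(tictoc_app({"PATH_INFO": path, "QUERY_STRING": qs},
                               start_response))
    return captured["status"], body.decode("utf-8")

status, body = call("/0028-0836", "jsoncallback=process")
```

Because WSGI applications are plain callables, the same lookup logic could back either the standalone mod_wsgi deployment or a hosted variant.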
Integration with Legacy OPACs

Similar to the Google Book Classes widget library, we implemented provisions that allow the use of TicTocLookup classes on pages over which the mash-up creator has limited control. For instance, specifying a title attribute of "issn:millennium.issnandtitle" harvests the ISSN and journal title from the III Millennium's record display page.

■■ MAJAX

Whereas the widget libraries discussed thus far integrate external web services into an OPAC display, MAJAX is a widget library that integrates information coming from an OPAC into other pages, such as resource guides or course displays. MAJAX is designed for use with a III Millennium integrated library system (ILS), whose vendor does not provide a web-services interface. The techniques we used, however, extend to other OPACs as well.

Table 5. Supported TicTocLookup classes

tictoc-link: wrap span/div in link to table of contents
tictoc-preview: display tooltip with preview of current entries
tictoc-embed-n: embed preview of first n entries
tictoc-alternate-link: insert an alternate link element into the document
tictoc-append-title: append the title of the journal to the span/div

Like many legacy OPACs, Millennium not only lacks a web-services interface, it lacks any programming interface to the records contained in the system and does not provide access to the database or file system of the machine housing the OPAC.

Providing OPAC Data as a Web Service

We implemented two methods to access records from the Millennium OPAC using bibliographic identifiers such as ISBN, OCLC number, bibliographic record number, and item title. Both methods provide access to complete MARC records and holdings information, along with locations and real-time availability for each held item. MAJAX extracts this information via screen scraping from the MARC record display page.
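The article does not show the Millennium markup that MAJAX scrapes, so the HTML fragment and field pattern below are invented for illustration; the sketch only conveys the general screen-scraping technique of pattern-matching the record display page to recover MARC fields.

```python
import re

# Invented sample of what MARC fields might look like in a legacy
# OPAC's HTML record display; the real Millennium markup differs.
SAMPLE_PAGE = """
<td class="marcTag">020</td><td class="marcData">|a 0596000278</td>
<td class="marcTag">245</td><td class="marcData">|a Sample title :</td>
"""

def scrape_marc_fields(html):
    """Collect (tag, data) pairs from the record display markup.
    Any change to the OPAC's output format breaks this pattern,
    which is the maintenance cost inherent to screen scraping."""
    pattern = re.compile(
        r'class="marcTag">(\d{3})</td>.*?class="marcData">([^<]*)<',
        re.DOTALL)
    return {tag: data.strip() for tag, data in pattern.findall(html)}

fields = scrape_marc_fields(SAMPLE_PAGE)
# Pull the ISBN out of the 020 field by dropping the subfield marker.
isbn = fields.get("020", "").replace("|a", "").strip()
```

Keeping the pattern in one place makes the yearly-or-less format changes mentioned below a localized fix rather than a rewrite.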
As with all screen-scraping approaches, the code performing the scraping must be updated if the output format provided by the OPAC changes. In our experience, such changes occur at a frequency of less than once per year. The first method, MAJAX 1, implements screen scraping using JavaScript code contained in a document placed in a directory on the server (/screens) that is normally used for supplementary resources, such as images. This document is included in the target page as a hidden HTML