President's Message: 50 Years
Andromeda Yelton
Information Technology and Libraries | September 2017
https://doi.org/10.6017/ital.v36i3.10086

Andromeda Yelton (andromeda.yelton@gmail.com) is LITA President 2017-18 and owner/consultant of Small Beautiful Useful LLC.

Fifty years. LITA was voted into existence (as ISAD, the Information Science and Automation Division) in Detroit at Midwinter 1966. Therefore we have just completed our first fifty years, a fact celebrated (thanks to our 50th Anniversary Task Force) with a slide show and cake at Annual in Chicago.

It's truly humbling to take office upon this milestone. Looking back, some of the true giants of library technology have held this office. In 1971-72, Jesse Shera, who in his wide-ranging career challenged librarians to think deeply about the epistemological and sociological dimensions of librarianship; ALA makes several awards in his name today. In 1973-74 and again in 1974-75, Frederick Kilgour, the founding director of OCLC, who also has an eponymous award. In 1975-76, Henriette Avram, the mother of MARC, herself.

Moreover, thanks to the work of countless LITA volunteers, much of this history is available open access. I strongly recommend reading http://www.ala.org/lita/about/history/ for an overview of the remarkable people and key issues across our history. You can also read papers by Avram and Kilgour, among many others, in the archives of this very publication.

In fact, reading the ITAL archives is deeply engaging. It turns out library technology has changed a bit in 50 years! (I trust that isn't a shock to you.) The first articles (in what was then the Journal of Library Automation) are all about instituting first-time computer systems to automate traditional library functions such as acquisitions, cataloging, and finance. The following passage caught my eye:

"A functioning technical processing system in a two-year community college library utilizes a Model 2201 Friden Flexowriter with punch card control and tab card reading units, an IBM 026 key punch, and an IBM 1440 computer, with two tape and two disc drives, to produce all acquisitions and catalog files based primarily on a single typing at the time of initiating an order" ("An Integrated Computer Based Technical Processing System in a Small College Library," Jack W. Scott; https://doi.org/10.6017/ital.v1i3.2931).

How many of us are still using punch cards today? And, indeed, how many of us are automating libraries for the first time? The topics discussed among LITA members today are far more wide-ranging: user experience, privacy, accessibility. They're more likely to be about assessing and improving existing systems than creating new ones, and more likely to center on patron-facing technologies.

And yet, with a few substitutions (say, "Raspberry Pi" for "Friden Flexowriter"), the blockquote above would not be out of place today. Then as now, LITA members were doing something exciting, yet deeply practical, that cleverly repurposed new technology to make library experiences better for both patrons and staff.

Our job descriptions have changed enormously in fifty years; in fact, the LITA Board charged a task force to develop LITA member personas, so that we can better understand whom we serve, and work to align our publications, online education, conference programming, and committee work toward your needs.
(You can see an overview of the task force's stellar work on LITA Blog: http://litablog.org/2017/03/who-are-lita-members-lita-personas/.) At the same time, the spirit of pragmatic creativity that runs throughout the first issues of the Journal of Library Automation continues to animate LITA members today. I'm looking forward to seeing where we go in our next fifty years.

President's Message: Sustaining LITA
Emily Morton-Owens
Information Technology and Libraries | September 2019
https://doi.org/10.6017/ital.v38i3.11627

Emily Morton-Owens (egmowens.lita@gmail.com) is LITA President 2019-20 and the Assistant University Librarian for Digital Library Development & Systems at the University of Pennsylvania Libraries.

Recently, at the 2019 Midwinter Meeting in Seattle, ALA decided to adopt sustainability as one of the core values of librarianship. The resolution includes the idea of a triple bottom line: "to be truly sustainable, an organization or community must embody practices that are environmentally sound and economically feasible and socially equitable" (http://www.ala.org/aboutala/sites/ala.org.aboutala/files/content/governance/council/council_documents/2019_ms_council_docs/ala%20cd%2037%20resolution%20for%20the%20adoption%20of%20sustainability%20as%20a%20core%20value%20of%20librarianship_final1182019.pdf).

If you had thought of sustainability mainly in terms of the environment, you have plenty of company. I originally pictured it as an umbrella term for a variety of environmental efforts: clean air, waste reduction, energy efficiency. But in fact the idea encompasses human development in a broader sense. One definition of sustainability involves making decisions in the present that take into account the needs of the future.

Of course our current environmental threats demand our attention, and libraries have found creative ways to promote environmental consciousness (myriad examples include books on bikes, seeking LEED or Passive House certification for library buildings, providing resources on xeriscaping, and many more). Even if you're not presently working in a position that allows you to engage directly on the environment, though, the concept of sustainability turns out to permeate our work and values. The ideas of solving problems in a way that doesn't create new challenges for future people, developing society in a way that allows all people to flourish, and fostering strong institutions: these concepts all resonate with the work we do daily, not only in what we offer our users but also in how we work with each other.

As a profession, we have a history of designing future-proof systems (or at least attempting to). Whenever I've been involved in planning a digital library project, one of the first questions on the table is "how do we get our data back out of this, when the time comes?" No matter how enamored we are of the current exciting new solution, we remember that things will look different in the future. Library metadata schemas are all about designing for interoperability and reusability, including in new ways that we can't picture yet. Someone who is unaccustomed to this kind of planning may see a high project overhead in these concerns, but we have consistently incorporated long-term thinking into our professional values due to the importance we place on free access, data preservation, and interoperability.

The triple bottom line approach, considering economic, social, and environmental factors, also influences the LITA leadership. I recently announced the LITA Board's decision to reduce our in-person participation at ALA Midwinter for 2020, which is partly in response to ALA's deliberations about reinventing the event starting in 2021.
With all the useful collaboration technologies now at our fingertips, it is harder to justify requiring our members to meet in person more than once per year. It is possible for us to do great work, on a continuous and rolling basis, throughout the year. More importantly, we want to offer committee and leadership positions to members who may not be able to travel extensively, for personal or work reasons, especially when many do not receive financial support from their employers. (And, to come back around to environmental concerns for a moment, think of all the flights our in-person meetings require.) By being more flexible about what participation looks like, we sustain the effort that our members put into LITA through a changing world of work.

Financial sustainability is also a factor in our pursuit of a merger with ALCTS and LLAMA. We are three smaller divisions based on professional role, not library type, who share interests and members. We also have similar needs and processes for running our respective associations. Unfortunately, LITA has been on an unsustainable course with our budget for some time: we spend more than we take in annually, due to overhead costs and working within ALA's processes and infrastructure. The LITA Board has engaged for many years with the question of how to balance our financial future against the fact that our programs require full-time staff, instructors, technology, printing, meeting rooms, and so on. Core, as the new merged division will be known, will allow us to correct that balance by combining our operations, streamlining workflows, and containing our costs. The staff will also be freed up to invest more effort in member engagement. We can't predict all the services that associations will offer in the future, but we know that, for example, online professional development is always needed, so we're ensuring that the plan allows it to continue. It is inspiring to talk about the new collaborations and subject-matter synergies that the merger will bring with it, but Core will also achieve something important for sustaining a level of service to our membership.

At the ALA level, the Steering Committee on Organizational Effectiveness (SCOE) is also looking at ways to streamline the association's structure and make it more approachable and welcoming to new members. I would add that a simplified structure should make ALA more accountable to members as well, which is crucial for positioning it as an organization worth devoting yourself to. These shifts are essential because member volunteers are what make ALA happen, and we need a structure that invites participation from future generations of library workers.

Taken together, these may look like a confusing flurry of changes. But librarians have evolved to be excellent at long-term thinking about our goals and values and how to pursue an exciting future vision based on what we know now and what tools (technology, people, ideas) we have at hand. We care about helping our users thrive and are able to take a broad view of what that encompasses.
In particular, with the new resolution about sustainability, we're including the health of our communities and the security of our environment as a part of that mission. Due to their innovative spirit and principled sense of commitment, our members are well placed to lead transformations in their home institutions and to participate in the development of LITA. As we weigh all these changes, we value the achievements of our association and its past leaders and members, and seek to honor them by making sure those successes carry on for our future colleagues.

Letter from the Editor (December 2019)
Kenneth J. Varnum
Information Technology and Libraries | December 2019
https://doi.org/10.6017/ital.v38i4.11923

Earlier this fall, I had the privilege of participating in the Sharjah Library Conference (https://www.sibfala.com/program), a three-day event hosted by the Sharjah Book Authority (http://www.sba.gov.ae/) in the United Arab Emirates, with programming coordinated by the ALA International Relations Office (http://www.ala.org/aboutala/offices/iro). The experience of meeting with so many librarians from cultures different from my own was truly rewarding and enriching. It was both refreshing and invigorating to see, first-hand, the global importance of the local matters that occupy so much of my professional life. I returned to my regular job with a newfound appreciation for how universal the issues I spend so much of my professional time on (information access, equity, user experience, and the like) really are. It is easy to get lost in the weeds of my own circumstances and environment, and sometimes difficult to look up and explore what colleagues, known and unknown, are doing and thinking.

The experience reinforces the importance of open access publications such as Information Technology and Libraries. While "open access" doesn't remove every possible barrier to accessing the knowledge, experience, and lessons contained within its virtual covers, it does remove the all-important paywall. And that is no small thing, in a community of library technologists who interact and exchange information through social media, email, and other tools. Our open access status gives this journal a vibrant platform for sharing knowledge, experience, and expertise with all who seek it.

I hope you find this issue's contents useful and informative, and will share the items you find most important with your peers at your institutions and beyond. I invite you to add your own knowledge and experience to our collective wisdom through a contribution to the journal. For more details, see the About the Journal page (https://ejournals.bc.edu/index.php/ital/about) or get in touch with me.

Sincerely,
Kenneth J. Varnum, Editor
varnum@umich.edu
December 2019

Editorial Board Thoughts: Halfway Home: User Centered Design and Library Websites
Mark Cyzyk
Information Technology and Libraries | March 2018

Mark Cyzyk (mcyzyk@jhu.edu), a member of LITA and the ITAL Editorial Board, is the Scholarly Communication Architect in the Sheridan Libraries, The Johns Hopkins University, Baltimore, Maryland.

Our library website has now gone through two major redesigns in the past five or so years. In both cases, a user centered design approach was used to plan the site. In contrast to the single-person-vision and design-by-committee approaches, user centered design focuses on the empirical study and elicitation of the needs of users.
Great attention is paid to studying them, listening to them, and exposing their needs as expressed. In both of our cases, the overall design, functionality, and content of the new site was then focused exclusively on the results of such study. If a proposed design element, a bit of functionality, or a chunk of content did not appear as an expressly desired feature for our users, it was considered clutter and did not make it onto the site. Both iterations of our website redesign were strictly governed by this principle.

But user centered design has blind spots.

First, it may well be that what you take to be your comprehensive user base is not as comprehensive as you think. In my library, our primary users are our faculty and student researchers, so great attention was paid to them. This makes sense insofar as we are an academic library within a major research university. Faculty and student researchers will always be our primary user group. But they are not our comprehensive user group. We have staff, administrators, visitors, members of our board of trustees, members of our Friends, outside members of the profession, etc., and they are all important constituencies in their own ways.

Second, unless your sample size of users is large enough to be statistically valid, you are merely playing a game of three blind men and the elephant. Each user individually will be expressing his or her own experience and perceived needs based on that experience, and yet none of them, even taken as a group, will be reporting on the whole beast. While personal testimony definitely counts as evidence, it also frequently and insidiously results in blind spots that would otherwise be exposed through having a statistically valid sample of study participants.

Third, and perhaps most importantly, user centered design discounts the expertise of librarians. Nobody knows a library's users and patrons as well as librarians. Knowing their users, eliciting their needs, is part of what librarians as one of the "helping professions" do; it is a central tenet of librarianship. There is no substitute for experience and the expertise that follows from it. In the art world, this is connoisseurship. Somehow, the art historian just knows that what is before him is not a genuine Rembrandt. The empirical evidence may ineluctably lead to a different conclusion, yet there remains something missing, something the connoisseur cannot fully elucidate. Similarly, in the medical world the radiologist somehow just knows that the subtle gradations on his screen indicate one type of malady and not another. Interestingly, in the poultry industry there is something called a "chicken sexer." This is a person who quickly and accurately sorts baby chicks by sex. Training for this vocation largely employs what the philosophers call "ostensive definition": "This one is male; that one is female." The differences are so small as to be imperceptible. And yet, experienced chicken sexers can accurately sort chicks at an astonishing rate. They just know through experience. Such is the nature of tacit knowledge.

In the case of our most recent website redesign, none of our users expressed any interest whatsoever, for example, in including floor maps as part of the new site. We were assured a demand for floor maps on the site was "not a thing." So floor maps were initially excluded from the site.
This was met with a slow crescendo of grumbling from the librarians, and rightly so. Librarians, and the graduate students at our information desk, know through long experience that researchers of varying types find floor maps of the building useful. That's why we've handed out paper copies for years. The fact that this need was missed through our focus on user centered design points to a blind spot in that process. Valuable experience, and the expertise that follows from it, should not be dismissed or otherwise diminished through dogmatic adherence to the core principle of user centered design.

...And yet, don't get me wrong: insofar as it's the empirical study of select user groups and their expressed concerns and needs, user centered design as a design technique and foundational principle is crucially important and useful. It gets us halfway home.

Core Leadership Column: Making Room for Change through Rest
Margaret Heller
Information Technology and Libraries | June 2021
https://doi.org/10.6017/ital.v40i2.13513

Margaret Heller (mheller1@luc.edu) is Digital Services Librarian, Loyola University Chicago, and (as of July 1, 2021) President-Elect of the Core: Leadership, Infrastructure, Futures division of ALA.

I write this column from the vantage point of my current role as a member of the Core Technology Section leadership team, and as a newly elected president-elect of Core, with my term starting in July 2021. The planning for Core began years ago, but it became a real division of ALA in the most chaotic of times. Visions for the first year of Core were set aside as we had to face the reality of all the work needing to be done remotely, without any conferences that would allow for in-person conversations, and with all the leadership and members under personal and professional strain. Yet being forced to start up slowly and deliberately provides some advantages. Settling into this new situation has allowed staff, leaders, and members to acclimate to a new division and learn how we want to do things in the future, rather than relying too much on how we did things in the past or feeling pressure to meet every demand.

Right now, we are all at a juncture in our personal and professional lives, thinking about how to approach the coming months. Summer offers the promise of growth and reinvention. The pause that a break implies allows time for us both as individuals to make time for what is important to us, and as members or employees of institutions to reconsider our priorities. For people working in library technology, however, the "summer break" is often anything but. Public libraries become a hub of activity as schools close, and school and academic libraries may use slow periods when classes are not in session for necessary systems upgrades or to roll out a new service. The summer of 2020 was one of the most challenging of my life, both professionally and personally, and meeting all the demands of the moment left hardly any time for a true break. This year, just like last year, feels like a summer in which we might not let ourselves rest for a moment.

While many libraries have been open to some degree over the past year, the upcoming summer has the potential for a return to something like normal. Shutting down regular in-person services and buildings felt chaotic because it required new ways of providing those services and building up new technical infrastructure, without the advance planning of a normal summer project.
The return may also feel chaotic, but rather than approaching it as a series of tasks in a plan that requires lots of energy and work, I hope we can treat the time as a period of reflective practice and give ourselves time to understand what has changed. Adapting to the realities of life since spring of 2020 has changed us all in various ways, and so too our library users have new needs and expectations. In some cases, they have embraced new services, though this has not been a smooth process for everyone. I have a family member who started using an e-reader for the first time during the pandemic to access library e-books when her public library was closed or had limited services. She was grateful for the option to access books this way, but occasionally struggled to follow the complex workflow from library app to vendor site to device. Without the ability to visit a physical reference desk to ask for help, she asked me to assist with device troubleshooting on several occasions. That worked well for her, but not everyone has a digital services librarian in their quarantine bubble. I share this to illustrate that while some people will have adapted or gotten the help they need, for many, this time has been one of doing without, or of maladaptation. Going back to "normal" will not help those who will need even more than they did pre-pandemic. Taking time to understand that fact, and to accept that it will not be a quick process of return for many people, will allow us to give each other space to find a way back to our lives as library users and library employees.

While many of us feel uncomfortable when we see slow progress (I know I do), I am coming to realize the value of making space for slowness and for rest. Rest comes in all forms. It could be physical rest, but it could be pursuing an artistic or athletic hobby, intentional social interactions, or spiritual practices. Institutions might give extra time off or set healthy expectations for work hours and meeting-free days, while also discarding old practices and attitudes to create better future work environments. There are crises to which we must immediately react and respond, but without personal and institutional energy in reserve, we will not do as good a job when they occur. Crises include political upheaval, public health emergencies, and other major events, but we can also appreciate how they unfold on a more mundane level. Information technology work often requires odd hours, intense bursts of energy to complete projects in a small window of time, and unpredictable problems that require dropping everything else to address an emergency. It is natural to constantly look toward the most urgent and the newest problem. This tendency results in lengthy backlogs for requests and accumulates technical debt from deferred maintenance or refactoring.

Yet as we bring our libraries and other institutions out of pandemic mode over the next few years, allowing for reflective space can help us to be cautious about the choices we make. For example, during earlier stages of the pandemic, many of us probably had to set up systems for some type of surveillance to maintain social distancing and aid in contact tracing.
Taking some time to review all those new procedures and systems, and purposefully dismantle those with negative privacy implications, will help us go forward as more ethical and empathetic institutions.

Taking it slow is going to be the only way through the next period. Summer 2021 should be about reflection on collective trauma. We responded to the events of the past year, whether that meant closing libraries, keeping libraries open as safely as possible, racial justice work, or election support, and now we must consider how to turn what we started into lasting change. That reflection will require rest. We know how important rest is, but finding space for it is not usually a high priority. Rest allows us to integrate our experiences, and it will build us back up so we can keep responding to what comes next. I am challenging myself to spend time in deliberate reflection at the cost of mindless productivity over the coming months so that I can keep helping my library and Core succeed. I hope you will consider doing the same.

President's Message: For the Record
Aimee Fifarek
Information Technology and Libraries | June 2017
https://doi.org/10.6017/ital.v36i2.10019

Aimee Fifarek (aimee.fifarek@phoenix.gov) is LITA President 2016-17 and Deputy Director for Customer Support, IT and Digital Initiatives at Phoenix Public Library, Phoenix, AZ.

This is my final column as LITA president. Having just finished the 2016/17 annual report, I must admit I'm a little tapped out. Over the last year I've written on the events of ALA Annual and Midwinter conferences, a LITA Forum, a new strategic plan, information ethics, and advocacy. Even for an English major and a librarian, that's a lot of words.

As I work with Executive Director Jenny Levine and the rest of the LITA Board to prepare the agenda for our meetings at Annual, the temptation is to focus on all the work that is yet to be done. But with the end of school and fiscal years approaching, it is the ideal time to celebrate everything that has been accomplished over the last 12 months.

First off, at some magical point during the year we completed the LITA staff transition period. Jenny has truly made the executive director position her own, and although she and Mark Beatty have more than enough work for six people, they are well on their way to guiding LITA to a bright new future. With her knowledge of the inner workings of ALA and her desire to make everything easier, faster, and better, Jenny is truly the right person for this job.

Next, we have a great new set of people coming in to lead LITA. Andromeda Yelton is going to be a fabulous LITA president. She is an eloquent speaker, has more determination than anyone I know, and is a kick-ass coder to boot. Bohyun Kim has an amazing talent for organizing and motivating people, and as president-elect will work wonders with the new Appointments Committee. Our new directors-at-large, Lindsay Cronk, Amanda Goodman, and Margaret Heller, are all devoted LITAns who will be great additions to the Board. I'm glad I get to work with them all in their new roles as I transition to past-president.

And last but certainly not least, we have started to make inroads on our advocacy and information policy strategic focus. The Privacy Interest Group has already raised LITA's profile by supplementing ALA's Intellectual Freedom Committee's privacy policies with privacy checklists.1 A group of Board members, along with Office for Information Technology Policy liaison David Lee King and Advocacy Coordinating Committee liaison Callan Bignoli, are working on a new task force proposal to outline strategies for effectively collaborating with the ALA Washington Office.
These are just the first steps towards a future in which LITA is not only relevant but necessary.

With all that hard work accomplished, it must be time to toast our successes. I hope that everyone who will be at ALA Annual in Chicago (http://2017.alaannual.org/) later this month will join us as we conclude our 50th anniversary year. Sunday with LITA promises to be amazing, with Hugo Award winner Kameron Hurley (http://www.kameronhurley.com) speaking at the President's Program, followed by what is sure to be a spectacular LITA Happy Hour at the Beer Bistro (http://www.thebeerbistro.com/). We are still working on our goal to raise $10,000 for professional development scholarships. We're only halfway there, so please donate at https://www.crowdrise.com/lita-50th-anniversary.

Being LITA president during the association's 50th anniversary year has been both an honor and a challenge. During a milestone year like this you become acutely aware of all the hard work and innovation that was required for the association to thrive for half a century, and feel more than a little pressure to leave an extraordinary legacy that will ensure another fifty years of success. It's a tall order, especially in an era of rapid political and societal change. But as I navigated through my presidential year I realized that I didn't have to do anything more than ensure that people who already want to work hard for the greater good have a welcoming place to do just that. After fifty years, LITA still has the thing that made it a success in the first place: a core group of volunteers committed to the belief that new technologies can empower libraries to do great things. The talented and passionate people I have worked with on the Board, in the committee and interest group leadership, and throughout the membership are the best legacy that an association can have. Now more than ever, the people in libraries who "do tech" can be leaders in their communities and on the national stage. Now more than ever, it is LITA's time to shine.

References
1. http://litablog.org/2017/02/new-checklists-to-support-library-patron-privacy/

Letter from the Editor
Kenneth J. Varnum
Information Technology and Libraries | March 2018
https://doi.org/10.6017/ital.v37i1.10388

This issue marks 50 years of Information Technology and Libraries. The scope and ever-accelerating pace of technological change over the five decades since the Journal of Library Automation was launched in 1968 mirrors what the world at large has experienced. From "automating" existing services and functions a half century ago, libraries are now using technology to rethink, recreate, and reinvent services, often in areas that were once simply the realm of science fiction.

In an attempt to put today's technology landscape in context, ITAL will publish a series of essays this year, each focusing on the highlights of a decade. In this issue, Editorial Board member Mark Cyzyk talks about selected articles from the first two volumes of the journal. In the remaining issues this year, we'll tackle the 1970s, 1980s, 1990s, and 2000s. The journal itself, now as ever before, focuses on the present and the near future, so we will hold off recapitulating the current decade until our centennial celebration in 2068.
As we look back over the journal's history, the Editorial Board is also looking to the future. We want to make sure that we know for whom we are publishing these articles, and that the journal is as relevant to today's (and tomorrow's) readership as it has been for those who have brought us to the present. To that end, we invite anyone who is reading this issue to take this brief survey (https://umich.qualtrics.com/jfe/form/sv_6hafly0cyjpbk4j): tell us a little about how you came to ITAL today, how you're connected with library technology, and what you'd like to see in the journal. It won't take much of your time (no more than 5 minutes) and will help us understand the context in which we are working.

There's another opportunity for you to help shape the future of the journal. Due to a number of terms being up at the end of June 2018, we have at least five openings on the Editorial Board to fill. If you are passionate about libraries and technology, enjoy working with authors to shape their articles, and want to help set out today's scholarly record for tomorrow's technologists, submit a statement of interest at https://goo.gl/forms/5gbqouuseolxrfx52. We seek to have an Editorial Board that represents the diversity of library technology practitioners, and particularly invite individuals from non-academic libraries and underrepresented demographic groups to apply.

Sincerely,
Kenneth J. Varnum, Editor
March 2018

Letter from the Editor (September 2020)
Kenneth J. Varnum
Information Technology and Libraries | September 2020
https://doi.org/10.6017/ital.v39i3.12xxx

With "unprecedented" rising to first place on my personal list of words I would prefer never to need to use again, let alone hear used, I find it eminently satisfying that some activities and events from before COVID continue in their usual, predictable ways. For me, the quarterly rhythm of publication of Information Technology and Libraries is one of those activities. It is helping keep me grounded. While it is certainly not much in the scope of what is happening all around me, it is at least something.

One thing that is changing is that this journal, along with Library Resources and Technical Services and Library Leadership & Management, is now a publication of ALA's newest division, Core: Leadership, Infrastructure, Futures. You'll notice a new logo at the top of our site, reflecting the new organizational structure. I am excited about the possibilities of richer cross-Core cooperation and collaboration as we explore our new structure.

This issue includes the first, and last, LITA President's Message from incoming and outgoing LITA President Evviva Weinraub Lajoie. Evviva assumed the LITA presidency this summer, just before the merger of LITA, LLAMA, and ALCTS into the new Core division took place on September 1. Members of those three merged divisions should watch for information about elections for the new Core president in October.

I am pleased that this issue includes the 2020 LITA/Ex Libris Student Writing Award winning article, "Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results," by Sam Grabus of Drexel University (https://doi.org/10.6017/ital.v39i3.12235).
Julia Bauder, the chair of this year's selection committee (I was also a member, as ITAL editor), said, "This valuable work of original research helps to quantify the scope of a problem that is of interest not only in the field of library and information science, but that also, as Grabus notes in her conclusion, could affect research in fields from the digital humanities to the sciences."

Before closing, I would like to express my appreciation to Breanne Kirsch, who ably served on the Editorial Board from 2018-2020.

Sincerely,
Kenneth J. Varnum, Editor
varnum@umich.edu
September 2020

Letter from the Editor
Kenneth J. Varnum
Information Technology and Libraries | June 2018
https://doi.org/10.6017/ital.v37i2.10571

In this June 2018 issue, we continue our celebration of ITAL's 50th year with a summary by Editorial Board member Sandra Shores of the articles published in the 1970s, the journal's first full decade of publication. The 1970s are particularly pivotal in library technology, as the decade marks the introduction of the personal computer, as a hobbyist's tool, to society. The web is still more than a decade away, but the seeds are being planted.

With this issue, we introduce a new look for the journal, thanks to the work of LITA's Web Coordinating Committee, and in particular Kelly Sattler (also a member of the Editorial Board), Jingjing Wu, and Guy Cicinelli. The new design is much easier on the eyes and more legible, and sports a new graphic identity for ITAL.

Board Transitions

June marks the changing of the Editorial Board. A significant number of board members' terms expire this June 30, and I'd like to take this opportunity to thank those departing members for their years of service to Information Technology and Libraries, and the support they have offered me this year as I began as editor. Each has ably and generously contributed to the journal's growth over the last years, and I thank them for their service to the journal and to ITAL:

• Mark Cyzyk (Johns Hopkins University)
• Mark Dehmlow (Notre Dame University)
• Sharon Farnel (University of Alberta)
• Kelly Sattler (Michigan State University)
• Sandra Shores (University of Alberta)

These are big shoes to fill, but I am excited about the new members who have been appointed for two-year terms beginning July 1, 2018. In March, we extended a call for volunteers for two-year terms on the Editorial Board. We received almost 50 applications, and ultimately added seven new members:

• Steven Bowers (Wayne State University)
• Kevin Ford (Art Institute of Chicago)
• Cinthya Ippoliti (Oklahoma State University)
• Ida Joiner (independent consultant)
• Breanne Kirsch (University of South Carolina Upstate)
• Michael Sauers (Do Space, Omaha, Nebraska)
• Laurie Willis (San Jose Public Library)

Readership Survey Summary

Over the past three months, we ran a survey of the ITAL readership to try to understand a bit more about who you are, collectively. The survey received 81 complete responses out of about 11,000 views of pages with the survey link on them. Here are some brief summary results:

• Nearly half (46%) of respondents have attended at least one LITA event (in-person or online).
• Three quarters (75%) of respondents are from academic libraries. Public, special, and LIS programs make up an additional 20%.
• The majority (56%) are librarians, with the remainder spread across a number of other roles.
• Almost two thirds (63%) of respondents have never been LITA members, a quarter (25%) are current members, and the remainder are former members.
• About four fifths (81%) of responses came from the current issue (either the table of contents or individual articles).

An Invitation

What can you share with your library colleagues in relation to technology? If you have interesting research about technology in a library setting, or are looking for a venue to share your case study, get in touch with me at varnum@umich.edu.

Sincerely,
Kenneth J. Varnum, Editor
June 2018

Letter from the Editor: Reviewers Wanted
Kenneth J. Varnum
Information Technology and Libraries | March 2021
https://doi.org/10.6017/ital.v40i1.13xxx

Information Technology and Libraries (ITAL) and one of the other journals published by ALA's Core division, Library Leadership and Management (LL&M), invite applications for peer reviewers (https://docs.google.com/forms/d/e/1faipqlsc7fxjjk6vwute5pwxwpu_udxjrygpatkpqu4fzib9lj08sna/viewform?usp=sf_link). Serving as a reviewer is a great opportunity for individuals from all types of libraries, and with a wide variety of experience, to contribute to scholarship within our chosen profession. We are seeking the broadest pool of reviewers possible.

Reviewer responsibilities for both journals are to have an interest in or experience with the journal's topics, as described below. Reviewers should expect to review two to four articles a year and should provide thoughtful and actionable comments to authors and the editor. Reviewers will work with the editor, associate editor, and/or editorial board of the corresponding journal. See the job description for ITAL reviewers (https://docs.google.com/document/d/1vtgq8fcfm9ux2u0elvhjrdlm6vxut7ybu6cytqw-nz4/edit?usp=sharing) for more details about this new role.

We welcome applications from individuals at libraries of all types, levels of experience, locations, perspectives, and voices, especially those from underrepresented groups. Reviewers will be selected to maximize the diversity of representation across these areas, so if you're not sure whether you should apply, please do!

Increasing the pool of reviewers for Information Technology and Libraries is part of the Editorial Board's desire to provide equitable treatment to submitted articles, and it will enable us to follow a more typical process for peer-reviewed journals: a two-reviewer, double-blind process. That will be a welcome and, frankly, overdue change from ITAL's current process, in which submitted articles are typically reviewed by one person. Expanding the number of reviewers across the breadth of subject areas our journal covers will foster a more rigorous yet more open review process.

Should you be more interested in the policy side of this journal, please watch for a call for volunteers for the ITAL Editorial Board (https://ejournals.bc.edu/index.php/ital/about/editorialteam). That process will start in April.

* * * * * * *

As this issue of the journal goes online, COVID as a global health crisis has just entered its second year. I'm constantly reminded of the duality of our collective ability to show resilience and exhibit fragility as we continue to endure this period. When I wrote the letter from the editor a year ago (https://doi.org/10.6017/ital.v39i1.12137), I focused on the imminent vote to establish a new ALA division, Core, as the most important question facing me. How quickly things changed! By the time the March 2020 issue was published, everything was different.
Wherever you are, however you have adapted to the situation, I hope you are well and, like me, are turning from wondering when this period will end to wondering what "normal" will be in the post-pandemic world.

Kenneth J. Varnum, Editor
varnum@umich.edu
March 2021

President's Message
Andromeda Yelton
Information Technology and Libraries | December 2017
https://doi.org/10.6017/ital.v36i4.10238

Andromeda Yelton (andromeda.yelton@gmail.com) is LITA President 2017-18 and Senior Software Engineer, MIT Libraries, Cambridge, United States.

Before I dive into my column, I'd like to recognize and thank Bob Gerrity for his six years of service as ITAL's editor in chief. He oversaw our shift from a traditional print journal to a fully online one, recognized by Micah Vandegrift and Chealsye Bowley as having the strongest open-access policies of all LIS journals (http://www.inthelibrarywiththeleadpipe.org/2014/healthyself/). I'd like to further extend a welcome to Ken Varnum as our new editor in chief. Ken's distinguished record of LITA service includes stints on the ITAL Editorial Board and the LITA Board of Directors, so he knows the journal very well and I am enthusiastic about its future under his lead.

I'm particularly curious to see what will be discussed in ITAL under Ken's leadership because I've just come back from two outstanding conferences which drove home the significance of the issues we wrestle with in library technology, and I'm looking forward to a third.

In early November, I attended LITA Forum in scenic Denver. The schedule was packed with sessions on intriguing topics (too many, of course, for me to attend them all), but two in particular stand out to me. In one, Sam Kome detailed how he's going about a privacy audit at the Claremont Colleges Library. He walked us through an extensive, and sometimes surprising, list of places personally identifiable information can lurk on library and campus systems, and talked through what his library absolutely needs (which is less than he'd thought, and far less than the library has been logging without thinking about it). In the other, Mary Catherine Lockmiller took a design thinking approach to serving transgender populations. She shared a fantastic, practical LibGuide (http://libguides.southmountaincc.edu/transgenderresources), but the part that stuck with me most is her statement that many trans people may never physically enter a library because public spaces are not safe spaces; for this population, our electronic services are our public services. As technologists, we create the point of first, and maybe only, contact.

A week later, I attended the inaugural Data for Black Lives conference (http://d4bl.org/) at the MIT Media Lab, steps from my office. This was, and I think everyone in the room felt it, something genuinely new.
From the galvanizing topic, to the sophisticated visual and auditory design, to the frisson of genius and creativity buzzing all around a room of artists, activists, professors, poets, data scientists, and software engineers, it was a remarkable experience for us all.

Those of you who heard Dr. Safiya Noble speak at Thomas Dowling's LITA President's Program in 2016 are familiar with algorithmic bias. Numerous speakers discussed this at D4BL: the ways that racial disparities in underlying data sets can be replicated, magnified, and given a veneer of objective power when run through the black boxes that power predictive policing or risk assessment for bail hearings. Absent and messy data was a theme as well: in a moment that would make many librarians chuckle (and then wince) knowingly, a panel of music industry executives estimated that 40% of their metadata is wrong, thus making it impossible to credit and compensate artists appropriately.

And yet, in a memorable keynote, Dr. Ruha Benjamin called on us not only to collect data about Black death, as she showed us an image of the ambulance bill sent to Tamir Rice's family, but to listen to our artists and poets as we use our data to imagine Black life, this in front of an image of Wakanda. With our data and our creativity, what new worlds can we map?

Several of my MIT colleagues also attended D4BL, and as we discussed it afterward we started thinking about how these ideas can drive our own work. How does the imaginary world of Wakanda connect to the archival imaginary, and what worlds can we empower our own creators to imagine with what we collect and preserve? How can we use our data literacy and access to sometimes un-Googleable resources to help community groups collate data on important issues that are not tracked by our public institutions, such as police violence (https://mappingpoliceviolence.org/) or racial disparities in setting bail?

With these ideas swirling in my mind, I am looking forward with tremendous excitement to LITA Forum 2018. Building on the work of our Forum Assessment Task Force, we'll be doing a lot of things differently; in particular, aiming for lots of hands-on, interactive sessions. This will be a conference where, whether you're a presenter or an attendee, you'll be able to do things. And these last two conferences have driven home for me how very much there is to do in library technology. Our work to select, collect, preserve, clean, and provide access to data can indeed have enormous impact. Technology services are front-line services.

Editorial: I Inhaled
John F. Helmer
Information Technology and Libraries | June 2000; 19(2): 59

John F. Helmer (jhelmer@darkwing.uoregon.edu) is Executive Director, Orbis Library Consortium.

This editorial introduces the third special issue of Information Technology and Libraries dedicated to library consortia, and the second primarily aimed at surveying consortial activities outside the United States.1 The concept of a special consortial issue began in 1997 as an outgrowth of a sporadic and wide-ranging discussion with Jim Kopp, editor of ITAL 1996-98.
At the time, Jim and I were involved in the creation and maturation of the Orbis consortium in Oregon and Washington. Jim was a member and later chair of the governing council, and I was chief volunteer staff person, finding myself increasingly absorbed by consortial work. Our discussions lasted more than a year and were sustained by many e-mail messages and several enjoyable conversations over bottles of nut brown ale.

In the mid-1990s it seemed obvious that we were witnessing the beginning of a renaissance in library consortia. Consortia had been around for many years, but now established groups were showing renewed vigor and new groups seemed to be forming every day. Why was this happening? What were all these consortia doing? Jim and I discussed these questions and speculated on future roles for library consortia and their impact on member libraries. Library consortia seemed an ideal topic for a special issue of ITAL.

My initial goal as guest editor of ITAL was to take a snapshot of a variety of consortia and begin to better understand the implications of the explosive growth we were witnessing. While assembling the March 1998 issue I soon realized that consortia were all over the map, both figuratively and literally. A small amount of study revealed a tremendous variety of consortia and a truly worldwide distribution. Although American consortia were starting to receive attention in the professional literature, a great deal of important work was occurring abroad. This realization gave rise to the September 1999 issue and the present issue dedicated to consortia from around the world. In addition to six articles from the United States, these three special issues of ITAL include contributions from South Africa, Canada, Israel, Spain, Australia, Brazil, China, Italy, Micronesia, and the United Kingdom.

Taken together these groups represent a dizzying array of organizing principles, membership models, governance structures, and funding models. Although most are geographically defined, many are also defined by the type of library they serve. Virtually all license electronic resources for their membership, but many offer a wide variety of other services including shared catalogs, union catalogs, patron-initiated borrowing systems, authentication systems, cooperative collection development, digitizing, instruction, preservation, courier systems, and shared human resources.

Each consortium is formed by unique political and cultural circumstances, but a few themes are common to all. It is clear that the technology of the web, the increasing importance of electronic resources, and advances in resource-sharing systems have created new opportunities for consortia. Beyond these technological and economic motivations, I believe that in consortia we see the librarian's instinct for collaboration being brought to bear at a time of great uncertainty and rapid change. Librarians often forget that as a profession we collaborate and cooperate with an ease seldom seen in other endeavors. There is safety in numbers, and in uncertain times it helps to confer with others, spread risk over a larger group, and speak with a collective voice. Library consortia fulfill these functions very well, and their future continues to look bright.

As I conclude my duties as guest editor I would like to thank Jim Kopp for sparking my interest in this project and for several years of stimulating conversation.
Special thanks are due to managing editors Ann Jones and Judith Carter, as well as the helpful and professional staff at ALA Production Services. Obstacles of language and time differences make composing and editing a publication such as this unusually challenging. The quality and cohesiveness of these issues of ITAL are due in large measure to the efforts of these individuals.

In "Inhaling the Spore," the editorial introduction to the first special consortial issue, I compared a librarian's involvement in consortia to the Cameroonian stink ant's inhalation of a contagious spore. The effect of this spore is featured in Mr. Wilson's Cabinet of Wonder, Lawrence Weschler's remarkable history of the Museum of Jurassic Technology.2 Weschler explains that, once inhaled, the spore lodges in the brain and "immediately begins to grow, quickly fomenting bizarre behavioral changes in its ant host." Although the concept of a consortial spore is somewhat extreme (or "icky," according to my nine-year-old daughter), the editorial was an accurate reflection of my own sense of being inexorably drawn into a consortium: drawn not so much against my will but as a willing, crazed participant. At the time I was nominally working for the University of Oregon library system and vainly trying to keep consortial work in perspective.

By the time of my second editorial, "Epidemiology of the Consortial Spore," I was exploring consortia around the world but still laboring under the illusion that I could keep my own consortium at arm's length. I must have failed since, as of this writing, I have left my position at the UO and now serve as the executive director of the Orbis Library Consortium.
Like the Cameroonian stink ant, I have inhaled the spore and am now happily laboring under its influence.

References and Notes
1. See ITAL 17, no. 1 (Mar. 1998) and ITAL 18, no. 3 (Sept. 1999).
2. Lawrence Weschler, Mr. Wilson's Cabinet of Wonder (New York: Vintage Books, 1995). The Museum of Jurassic Technology (www.mjt.org) is located in Culver City, Calif. See www.mjt.org/exhibits/stinkant.html for more on the Cameroonian stink ant.

Google Us! Capital Area District Libraries Gets Noticed with Google Ads Grant
Public Libraries Leading the Way
Sheryl Cormicle Knox and Trenton M. Smiley
Information Technology and Libraries | March 2020
https://doi.org/10.6017/ital.v39i1.12089

Sheryl Cormicle Knox (knoxs@cadl.org) is Technology Director for Capital Area District Libraries. Trenton M. Smiley (smileyt@cadl.org) is Marketing & Communications Director for Capital Area District Libraries.

Increased choices in the marketplace are forcing libraries to pay much more attention to how they market themselves. Libraries can no longer simply employ an inward marketing approach that speaks to current users through printed materials and promotional signage plastered on the walls. Furthermore, they cannot rely on occasional mentions by the local media as the primary driver of new users. That's why in 2016, Capital Area District Libraries (CADL), a 13-branch library system in and around Lansing, Michigan, began using more digital tactics as a cost-effective way to increase our marketing reach and to have more control over promoting the right service, at the right time, to the right person. One example of these tactics is ad placement on the Weather Channel app. This placement allows ads about digital services like OverDrive and Hoopla to appear when certain weather conditions, such as a snowstorm, occur in the area.

In 2017, while attending the Library Marketing and Communications Conference in Dallas, our marketing and communications director had the good fortune of sitting in on a presentation by Trey Gordner and Bill Mott from Koios (www.koios.co) on how to receive up to $10,000 of in-kind advertising every month from a Google Ad Grant (www.google.com/grants). During this presentation, Koios offered participants a 60-day trial of their services to help secure the Google Ad Grant and create a few starter campaigns.

Google Ads are text-based and appear in the top section of Google's search results, along with the ads of paying advertisers. Nonprofits in the Google Ad Grants program can set up various ad campaigns to promote whatever they like: the overall brand of the library, the collection, various events, meeting room offerings, or any other product or service. The appearance of each Google ad is triggered by keywords chosen for each campaign.

After CADL's trial period expired, we decided to retain Koios to oversee the Google Ad Grants project. While the library had used Google Ads for the sharing of video, we had not done much with keyword advertising. So we were excited to learn more about the process of using keywords and the funding available through the grant. We viewed this as a great new tool to add to our marketing toolbox. It would help us achieve a few of our marketing goals: expanding our overall marketing reach and digital footprint by 50 percent; increasing the library's digital advertisement budget by 300 percent (by using alternative funding); and promoting the right service at the right time.
getting started koios coached us through the slalom course of obtaining accounts and setting them up. to secure the monthly ad grant, we first obtained a validation key from techsoup (www.techsoup.org), the nonprofit that makes technology accessible to other nonprofits and libraries. that, in turn, pre-qualified us for a google for nonprofits account. (at the time, we were able to get a validation token from our existing techsoup account, but koios currently recommends starting by registering a 501c3 friends organization or library foundation with techsoup whenever possible.) after creating our google for nonprofits account, we used the same account username to create a google ads account. finally, to work efficiently with koios, we provided them access to our google analytics property (which we have configured to scrub patron-identifying information) and our google tag manager account (with the ability to create tags that we in turn review and approve). if you are taking the do-it-yourself approach, google has a step-by-step google ad grants activation guide and extensive help online. designing campaigns spending money well is hard work, and that holds true for keyword search ads as well. there are performance and ad-quality requirements in the grant program that must be observed to retain your monthly allotment. understanding these guidelines, and implementing campaigns that respect them while working well enough to spend your grant allocation, requires study and patience. again, we relied on koios to guide us. they helped us create campaigns, and ad groups within those campaigns, that were effective within the grant program. figure 1. example of minecraft title keyword landing page created by koios. in august 2018, we started with campaigns for general branding awareness that included ads aimed at people actively searching for local libraries and our core services. these ads funnel users to our homepage and our online card signup. they are configured to display only to searchers who are geographically located in our service area. this campaign has grown and been perfected over 18 months into one of our most successful campaigns, garnering over 2,300 impressions and 650 clicks in january 2020, yet it spends just $450 of our grant funds. another consistent performer for us has been our digital media campaign, with ads targeting users searching for ebooks and audiobooks. by june 2019 we had grown our grant spend to $1,500 a month using 27 different campaigns. the game changer for us has been working with koios to create campaigns based on an export of marc records from our catalog. we worked with koios to massage this data into a very simple pseudo-catalog of landing pages based on item titles. each landing page is very simple and seo-friendly, so that it ranks well in the split-second ad auction that determines whether your ad will be displayed. it has cover images and clear calls to action, loads fast, is mobile friendly, and communicates the breadth of formats held by the library (see figure 1). clicking the item title or the borrow button sends users straight into our full catalog to get more information, request the item, or link to the digital version.
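the pseudo-catalog idea is straightforward to prototype. as a rough sketch, the script below generates one bare-bones landing page per title; it assumes the marc export has already been flattened to a csv of titles and catalog links, and none of the file names, fields, or html here reflect koios’s actual implementation.

# rough sketch: one simple, seo-friendly landing page per exported title.
# the csv layout, file names, and html are hypothetical, not koios's pipeline.
import csv
from pathlib import Path

TEMPLATE = """<!doctype html>
<html><head><title>{title} | your library</title></head>
<body>
<h1>{title}</h1>
<p>borrow this title free from the library.</p>
<a href="{catalog_url}">borrow</a>
</body></html>"""

out_dir = Path("landing_pages")
out_dir.mkdir(exist_ok=True)

with open("titles.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # expects columns: title, catalog_url
        # make a filesystem-safe slug out of the title
        slug = "".join(ch if ch.isalnum() else "-" for ch in row["title"].lower())
        html = TEMPLATE.format(title=row["title"], catalog_url=row["catalog_url"])
        (out_dir / f"{slug}.html").write_text(html, encoding="utf-8")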
figure 2. a user search in google for “dad jokes” showing a catalog campaign ad. grant program ads are displayed below paid ads. the format of the ad may vary as well; this version shows several extensions, like phone number, site links, and directions links. figure 3. the landing page displayed to the searcher after they click on the ad, and the resulting catalog page if the searcher clicks the borrow button. in google ads, koios created 14 catalog campaigns out of the roughly 250,000 titles we sent them. each campaign has keywords (single words and phrases from titles) derived from roughly 18,000 titles ranked by how frequently they are used in google search. again, these ads are limited geographically to our service area. figures 2 and 3 illustrate what a google searcher in ingham county, michigan, potentially encounters when searching for “dad jokes”. since their inception in september 2019, these catalog campaigns have been top performers for us, generating clickthrough rates of 8-15 percent and a couple thousand additional ad clicks monthly: the aggregation of a small number of clicks on any one ad from our “long tail” of titles. we are now spending over $5,000 of our grant funds and garnering nearly 23,000 impressions and 3,000 ad clicks monthly. results in general, we find that our google ads have succeeded in drawing additional new visitors to our website. using our long-established google analytics implementation, which measures visits to our website and catalog combined, we compared the third quarter of 2018, when we were ramping up our google ad grants campaigns, to the third quarter of 2019, after our catalog campaign was firmly established. the summary numbers are encouraging. the number of users is up 17 percent, and the number of sessions is up 4 percent. within the overall rise in users, returning users are up 9 percent, but new users are up 25 percent. therefore, we are getting more of those coveted, elusive “non-library-users” to visit us online. when comparing the behavior of new and returning visitors, we also see that the overall increase in sessions was achieved despite the headwind of a 4 percent decline in returning-visitor sessions. however, are the new visitors engaging? perhaps the most tangible measure of engagement for a public library catalog is placing holds. we have a google analytics conversion goal that measures those holds. the rate of conversion on the hold goal among new visitors rose 7 percent, while dropping 13 percent among returning visitors. from other analysis, we know that our highly engaged members are migrating to our mobile app and to digital formats, so the drop for returning users is explainable and the rise among new visitors is hopeful. we are working on ways to study these new visitors more closely so that we can discover and remove more of the barriers in the way of them becoming highly engaged members of their public library. future plans with the help of koios, new campaigns will be created to promote our blogs and podcasts. we will also link a campaign to our demco events database. finally, in partnership with koios, we will work with patron point to incorporate our automated email marketing system into google ad campaigns. we will add campaigns for pop-up ads that encourage library card signup through our online registration system.
once someone signs up for a library card online, the system will trigger a welcome email that promotes some of our core services. this onboarding setup will also include an opportunity for the new cardholder to fill out a form to tailor the content of future emails to their interests. through all these means, cadl leads the way in delivering the right service, at the right time, to the right person. public libraries leading the way journey with veterans: virtual reality program using google expeditions jessica hall information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12857 jessica hall (jessica.hall@fresnolibrary.org) is community librarian, fresno county public library. © 2020. “where would you like to go?” is the question of the day. we have stood atop the great wall of china, swum with sea lions in the galapagos islands, and walked along the vast red sands of mars. each journey was unique and available through the library. as a community librarian in charge of outreach to seniors and veterans, i first learned about the virtual tour idea from a colleague who returned from a conference excited to tell me about a workshop she had attended. the workshop described a program that utilized google expeditions to take seniors on virtual tours. this idea stayed with me for months, until fresno county public library obtained the $3,000 value of libraries grant, which was funded by the california library services act. as a part of this grant, $2,905 went to purchase a google expeditions kit and supplies to create a virtual reality program called journey with veterans. the kit includes 5 viewers and 1 tablet. a viewer is basically a google cardboard, except the case is plastic and there is a smartphone inside the case. during the program, i use the tablet to select and run each tour. the tour i select on the tablet is projected to the 5 viewers so participants can experience it. in this manner, veterans can explore places without physically having to travel anywhere. the journey with veterans program took the technology to the veterans instead of requiring them to come into the library. the two locations chosen were the veterans home of california-fresno and the community living center at the va medical center in fresno, ca. from the time the program began in september 2019 to march 2020, when the pandemic shutdown brought a halt to the program, the library hosted 26 sessions at these two locations with 182 veterans. in sessions where more than 5 people were in attendance, the viewers were shared between the participants. the tablet and the smartphones inside the viewers have an app installed on them called google expeditions, which is the software that runs the tours. one hotspot, already owned by the library, was used for this program. it is a requirement that all the viewers and the tablet be connected to the same wifi, and having a portable wifi connection was necessary to run this program in locations without access to a strong internet connection. each tour is a selection of still 360-degree views. the landscape does not move; instead, the participant turns their head around, up, and down to look at the entire scene. the control tablet included additional menu items not seen by participants.
these items included scripts i could read about the landscape we were looking at and suggested points of interest i could highlight for participants. when i selected a point of interest on the tablet, the participant would see arrows pointing to that area of their screen. the participant would follow the arrows by turning their head in the direction indicated. participants knew they were looking at the area of interest when the arrows disappeared and were replaced by a white circle surrounding the relevant portion of the screen. the viewers did not have straps attached to them, and there was no way to attach straps, so the viewer could not be strapped to the participant’s head. instead, the participant had to hold up the viewer the entire time they wished to look through it. this presented a challenge for participants who did not have the ability to hold the viewer on their own. at the locations i went to, there were staff available to help, and they would hold the viewer up to a participant’s eyes. in some cases, one staff person held the viewer up for the participant while another turned the participant’s wheelchair in a circle so they could see the entire image. each program lasted 30-45 minutes, but the amount of time looking through the viewer was kept to around 15-20 minutes. the rest of the time was filled with talking about the location we were viewing. for the veterans in memory care at the veterans home of california-fresno, this program was designed with the hope that it would allow the veterans to reminisce about places they had visited and lived in and encourage them to talk about their experiences. some of the participants had been to the countries we visited virtually, and they reminisced about their time there. at every session, the participants shared their enthusiasm and eagerness to continue the program. the program was once tried with music. on one of my first visits to the community living center at the va medical center, a participant asked if he could play music in the background. since i had thought about incorporating music into the program, i agreed, and the participant played some classical music from his own device. though it was a good idea, the execution did not work well. the music came from one location, which made it too loud when one stood near it but too quiet once one walked too far away, and i found the music difficult to talk over while giving the tour. i believe that incorporating sounds of the location we visit, such as the sounds of the countryside or a big city, would make the experience more immersive; however, i have yet to find a way to do so successfully. after the grant ended, i continued the program at both locations. the partnership i had created at the veterans home of california-fresno grew into a second program, storytime with veterans, which was requested specifically by the residents. i alternated my visits so that some weeks we did a virtual reality program and some weeks i read to them. one time there was a miscommunication: the activity coordinator thought i had come to read a story, but i was under the impression that it was a virtual reality week, so i had brought the google expeditions kit with me. the solution was to do both.
one of the google expeditions tours is a very short and much-abridged virtual reality version of twenty thousand leagues under the sea by jules verne. the tour uses artwork to represent scenes from the book, and each scene tells a different part of the story. the veterans home’s residents were treated to both a story and a virtual reality tour at the same time. up until the library’s shutdown in mid-march due to covid-19, i was in the process of expanding the use of google expeditions but was unable to continue. since then, the equipment has not been used. restarting the program now involves multiple challenges, not the least of which is sanitizing the devices. sanitation was a consideration even before covid-19, and sanitary virtual reality masks were acquired using grant funds as part of the initial program. these masks look like strips of cloth that line the eyes, with strings that hook around the ears to hold them in place. cleaning products were also purchased and used to clean the devices after each program. before covid-19, a viewer could be handled by multiple people before it was cleaned. i always handled the viewers first to prepare them for use, then handed each one to a participant; occasionally they were also handled by staff. i always cleaned the viewers right after the program ended, but not during the program. with the current covid-19 restrictions, the sanitation practices previously used are inadequate. i do not know the future of the program in a post-covid-19 world, but i intend to begin the program again once it becomes safe to do so, and i will incorporate all required precautions and restrictions. i look forward to once more being able to take veterans on exciting virtual journeys. public libraries leading the way intro to coding using python at the worcester public library melody friedenthal information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12207 melody friedenthal (mfriedenthal@mywpl.org) is a public services librarian, worcester public library. abstract the worcester public library (wpl) offers several digital learning courses to our adult patrons, and among them is “intro to coding using python”. this 6-session class teaches basic programming concepts and the vocabulary of software development. it prepares students to take more intensive, college-level classes. the bureau of labor statistics predicts a bright future for software developers, web developers, and software engineers. wpl is committed to helping patrons increase their “hireability,” and we believe our python class will help patrons break into these lucrative and gratifying professions… or just have fun. history and details of our class i came to librarianship from a long career in software development, so when i joined the worcester public library in january 2018 as a public services librarian, my manager proposed that i teach a class in programming. she asked me to research which language would be best. python got high marks for ease of use, flexibility, growing popularity, and a very active online community. once i selected a language, i had to choose an environment to teach it in – or so i thought. i had absolutely no experience in front of a classroom, and few pedagogical skills, so i sought out an online python course within which to teach. i decided to use the code academy (ca) website as our programming environment.
ca has self-guided classes in a number of subjects, and the free beginning python course seemed to be just what we needed. i went through the whole class myself before using it as courseware. my intent was to help students register for ca and then, each day, teach them the concepts in that day’s ca lesson. they would then be set to do the online lesson and assignments. we first offered python in june 2018. problems with ca came up right from the start: students registered for the wrong class (despite the handout explicitly naming the correct class), and ca frequently tried to upsell a not-free python class. since ca’s classes are moocs (massive open online courses), the developers built in an automated way of correcting student code: embedded behind each web page of the course, there’s code that examines the student’s code and decides whether it is acceptable or not. good in theory, not so good in practice. ca’s “code-behind” is flawed and sometimes prevented students from advancing to the next lesson. moreover, some of the ca tasks were inane. for example, one lesson incorporated a kind of mad libs game, where the instructions ask, for example, for 13 nouns and 11 adjectives, and these are combined with set sentences to generate a silly story. this assignment turned out to be too long and difficult to fulfill, preventing students from advancing. although i used ca the first few times i offered the class, i subsequently abandoned it and wrote my own classroom material. after determining that ca wasn’t appropriate, i chose an online ide where the students could code independently. this platform worked well when i tested it ahead of time, but when the whole class tried to log on at once, we received denial-of-service error messages. hurriedly moving on to plan c, i chose thonny, a free python ide which we downloaded to each pc in the lab (see https://thonny.org/). each student receives a free manual (see figure 1), which i wrote. every time i’ve offered this class i’ve edited the manual, clarifying those topics the students had a hard time with. i’ve also added new material, including commands students have shown me. it is now 90 pages long, written in microsoft word, and printed in color. we use soft binders with metal fasteners. figure 1. intro to coding using python manual developed for the course. the manual consists of the following sections: • cover: course name, dates we meet, time class starts and ends, location, instructor’s name, manual version number, and a place for the student to write their own name. • syllabus: goals for each of the six sessions. this is aspirational. • basic information about programming, including an online alternative to thonny for students who don’t have a computer at home and wish to use our public computers for homework. • lessons 1-17: “hello world” and beyond. • lesson 18: object-oriented design, which i consider to be advanced, optional material; skipped if time is pressing or the class isn’t ready for it. • lesson 19: wrap-up: o how to write good code. o how to debug. o list of suggested topics for further study. o online resources for python forums and community. • list of wpl‘s print resources on python and programming.
• relevant comic strips and cartoons. in march 2019, my manager asked me to start assigning homework. if a student attends all six sessions and makes a decent attempt at each assignment, at the sixth session they receive a certificate of completion. the certificate has the wpl name & logo, the student’s name, and my signature. typically three or four students earn a certificate. homework is emailed to me as an attachment. this class meets on tuesday evenings, and i tell students to send me their homework as soon as possible; inevitably, several students don’t email me until the following monday. while i don’t give out grades, i do spend considerable time reviewing homework, line by line, and i email back detailed feedback. when the january 2020 course started, i found that between october’s class and january, outlook had implemented a security protocol which removes certain file extensions from incoming email. and – you can see where this is going – the .py python extension was one of them. i told students to rename their python code files from xxxx.py to xxxx.py.doc, where “xxxx” is their program name. this fools outlook into thinking the file is a microsoft word document, and the email is delivered to me intact. when it arrives, i remove the .doc extension from the attachment and save it to a student-specific file. then i open the file in thonny and review it. (this rename-and-restore step is easy to script; see the sketch below.) physically, our computer lab contains an instructor’s computer and twelve student computers (see figure 2). it also has a projector which projects the active window from the instructor’s computer onto a screen: usually the class manual. i use dry-erase markers in a variety of colors to illustrate concepts on a whiteboard. there is also a supply of pencils on hand for student notetaking. the class is offered once per season. although the classroom can accommodate twelve students, we set our maximum registration to fourteen, which allows us to maximize attendance even if patrons cancel or don’t show up. and if all fourteen do attend the first class, we have two laptops i can bring into the lab. we also maintain a small waitlist, usually of five spots. we’ve offered this class seven times, and the registration and waitlists have been full every time. sometimes we have to turn students away. figure 2. classroom at worcester public library. however, we had a problem with registered patrons not showing up, so last spring we implemented a process where, about a week before class starts, i email each student, asking them to confirm their continued interest in the class. i tell them that if they are no longer interested, or don’t respond, i will give the seat we reserved for them to another interested patron (from the waitlist). in this email i also outline how the course is structured and note that they can each earn a certificate of completion. i tell them class starts promptly at 5:30 and to please plan accordingly. some students don’t check their email. some patrons show up without ever registering; they are told registration is required and to try again in a few months. i keep track of attendance on an excel spreadsheet. here in worcester, ma, weather is definitely a factor for our winter sessions.
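a minimal sketch of that rename-and-restore step, in the same python the class teaches: it strips the disguising .doc extension and files each attachment by student. the folder layout and the student_program.py naming convention are hypothetical, not wpl’s actual practice.

# minimal sketch: restore xxxx.py.doc attachments to xxxx.py and file them
# by student. folder layout and naming convention are hypothetical.
from pathlib import Path

inbox = Path("downloads")    # where emailed attachments get saved
homework = Path("homework")  # one subfolder per student

for attachment in inbox.glob("*.py.doc"):
    restored = attachment.name[: -len(".doc")]  # "ana_lab3.py.doc" -> "ana_lab3.py"
    student = attachment.name.split("_")[0]     # assumes "student_program.py" names
    dest = homework / student
    dest.mkdir(parents=True, exist_ok=True)
    attachment.rename(dest / restored)          # move the restored file into place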
over time i’ve made the class more dynamic. i have a student read a paragraph in the manual aloud. i’ve switched around the order of some lessons in response to student questions. i have them play a game to teach boolean logic: “if you live in worcester and you love pizza, stand up!”… then: “if you live in worcester or you love pizza, stand up!” students range from experienced programmers (of other languages), to people with no experience but great aptitude, to people who just never seem to “get it”. this material is technical, and i try hard to communicate the concepts, but i lose a few students every time. we ask our patrons for feedback on all of our programs. our python students have written: • “… the classes were formatted in an organized manner that was beginner friendly” • “the manual is a big help. i’m thankful that the program is free.” • “… coding is fun and i learned a new skill.” • “this made me think critically and helped me understand where my errors in the programs were.” wpl is proud to offer classes that make a difference in our patrons’ lives. public libraries leading the way online ticketed-passes: a mid-tech leap in what libraries are for jeffrey davis information technology and libraries | june 2019 https://doi.org/10.6017/ital.v38i2.11141 jeffrey davis (jtrappdavis@gmail.com) is branch manager at san diego public library, san diego, california. last year a library program received coverage from the new york times, the wall street journal, the magazines mental floss and travel+leisure, many local newspapers and tv outlets, online and trade publications like curbed, thrillist, and artforum, and more. that program is new york’s culture pass, a joint program of the new york, brooklyn, and queens public libraries. culture pass is an online ticketed-pass program providing access to area museums, gardens, performances, and other attractions. as the new york daily news wrote in their lede: “it’s hard to believe nobody thought of it sooner: a new york city library card can now get you into 33 museums free.” libraries had thought of it sooner, of course. museum pass programs in libraries began at least as early as 1995 at boston public library, and the online ticketed model in 2011 at contra costa (ca) county library. the library profession has paid this “mid-tech” program too little attention, i think, but that may be starting to change. what are online ticketed-passes? the original museum pass programs in libraries circulate a physical pass that provides access to an attraction or group of attractions. sometimes libraries are able to negotiate free or discounted passes, but many times the passes are purchased outright. the circulating model is still the most common for library pass programs, but it suffers from many limitations. passes by necessity are checked out for longer than they’re used. they sit waiting for pickup on hold shelves and in transit to their next location. long queues make it hard for patrons to predict when their requests will be filled, and therefore difficult to plan on using. for the participating attractions, physical passes are typically good anytime and so compete with memberships and paid admission. there are few ways to shape who borrows the passes in order to meet institutional goals, and there are few ways to limit repeat use by library patrons to both increase exposure and nudge users toward membership. as a result, most circulating pass programs only connect patrons to a small number of venues.
despite these limitations, circulating passes have been incredibly popular: as of this writing there are 967 requests for san diego public library’s 73 passes to the new children’s museum. we sometimes see that sort of interest in a new bestseller, but this is a pass that sdpl has offered continuously since 2009. in 2011, contra costa county library launched the first “ticketed-pass” program, discover & go. discover & go replaced circulating physical passes with an online system with which patrons, remotely or in the library with staff assistance, retrieve day-passes — tickets — by available date or venue. this relatively simple and common-sense change makes an enormous difference. in addition to convenience and predictability for patrons, availability is markedly increased because venues are much more comfortable providing passes when they can manage their use: patrons can be restricted to a limited number of tickets per venue per year, and venues can match the number of tickets available to days that they are less busy. the latter preserves the value of their memberships while making use of their own “surplus capacity” to bring in new visitors and potential new members. funding and internal expectations at many venues carry obligations to reach underserved communities, and the programs allow partner attractions to shape public access and receive reporting by patron zip code and other factors. the epass software behind discover & go is regional by design and supports sharing of tickets across multiple library systems in ways that are impractical to do with physical passes. as new library systems join the program, they bring new partner attractions into the shared collection with them. the oakland zoo, for example, needs only to negotiate with their contact at oakland public library to coordinate access for members of oakland, san francisco, and san jose public libraries. because of the increased attractiveness of participation, it’s been easier for libraries to bring venues into the program. in 2011, discover & go hoped for a launch collection of five museums but ultimately opened with forty. the success of ticketed-pass programs in turn attracts more partners. today, discover & go is available through 49 library systems in california and nevada with passes to 137 participating attractions. similarly, new york’s culture pass launched with 33 participating venues and has grown in less than a year to offer a collection of 49. while big city programs attract the most attention, pass programs are offered by county systems like alamance county (nc), consortiums like libraries in clackamas county (or), small cities like lawrence (ma), small towns like atkinson (nh), and statewide like the michigan activity pass, which is available through over 600 library sites with tickets to 179 destinations plus state parks, camping, and historical sites. for each library, the participating destinations form a unique collection: a shelf of local riches, idiosyncratic and rooted in place. through various libraries one can find tickets for the basketball hall of fame, stone barns center for food and agriculture, dinosaur ridge, eric carle museum of picture book art, bushnell park carousel, california shakespeare theater, children’s museums, zoos, aquariums, botanical gardens, tours, classes, performances, and on to the met, moma, crocker, de young, and many, many, many more.
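to picture the policy mechanics concretely, here is a minimal sketch of the kind of availability check such a system might perform: per-day allotments (with blackouts) and per-patron, per-venue annual limits. all names, limits, and data structures are hypothetical; they are not drawn from epass, whose internals this article does not describe.

# illustrative ticket-availability check: per-day allotments (with blackouts)
# and per-patron, per-venue annual limits. all rules are hypothetical.
from collections import Counter
from datetime import date

ANNUAL_LIMIT_PER_VENUE = 2  # hypothetical: tickets per patron, per venue, per year
allotments = {              # venue -> {day: tickets the venue offers that day}
    "city zoo": {date(2019, 6, 4): 10, date(2019, 6, 5): 0},  # june 5 blacked out
}
issued = Counter()          # (venue, day) -> tickets already claimed
history = Counter()         # (patron, venue) -> tickets claimed this year

def issue_ticket(patron: str, venue: str, day: date) -> bool:
    """issue a day-pass only if the daily allotment and annual limit both allow it."""
    offered = allotments.get(venue, {}).get(day, 0)
    if issued[(venue, day)] >= offered:
        return False  # nothing left that day, or a blackout date
    if history[(patron, venue)] >= ANNUAL_LIMIT_PER_VENUE:
        return False  # patron has used up this venue for the year
    issued[(venue, day)] += 1
    history[(patron, venue)] += 1
    return True

print(issue_ticket("patron-123", "city zoo", date(2019, 6, 4)))  # True
print(issue_ticket("patron-123", "city zoo", date(2019, 6, 5)))  # False (blackout)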
for kids, “enrichments” like these are increasingly understood as essential parts of learning and exploration. for adults, access to our cultural treasures, including partners like san francisco’s museum of the african diaspora or chicago’s national museum of puerto rican arts & culture — besides being its own reward — enhances local connection and understanding. we’re also starting to see the ticketing platform itself become an asset to smaller organizations — craft studios, school performances, farm visits, nature centers, and more — that want to increase public access without having to take on a new capability themselves. importantly, ticketed-pass programs are built on the core skills of librarians: information management, collection development, community outreach, user-centered design, customer service, and technological savvy. the technology discover & go was initially funded by a $45,000 grant from the bay area library and information system (balis) cooperative. contra costa contracted with library software company quipu group to develop the epass software that runs the program and that is also used by ny’s culture pass, multnomah county (or) library’s my discovery pass, and a consortium of oregon libraries as cultural pass. ticketed-pass software is also offered by the libraryinsight and plymouth rocket companies and used by denver public library, seattle public library, the michigan activity pass, and others. the software consists of a web application with a responsive patron interface and connects over sip2 or vendor api to patron status information from the library ils. administrative tools set fine-grained ticket availability, blackout dates, and policies including restrictions by patron age, library system, zip code, municipality, number of uses allowed globally and per venue, and more. recent improvements to epass include geolocation to identify nearby attractions and improved search filters. still in development are transfer of tickets between accounts, re-pooling of unclaimed tickets, and better handling of replaced library cards. the strength that comes from multi-system ticketed-pass programs also carries with it challenges on the patron account side. ilses each implement protocols and apis for working with patron account information differently, and library systems maintain divergent policies around patron status. there’s a role for lita and for library consortia and state libraries to push for more attention to and consistency on patron account policies and standards. the emphasis in library automation is similarly shifting. our ilses originated to manage the circulation of physical items, a catalog-centric view. today, as robert anderson of quipu group suggested to me, a diverse range of online and offline services and non-catalog offerings orbit our users, calling for a new frame of reference: “it’s a patron-centric world now.” the vision library membership is the lynchpin of ticketed-pass and complementary programs in the technical sense, as above, and conceptually: library membership as one’s ticket to the world around. though i’m not aware of academic libraries offering ticketed-passes, they have been providing local access through membership. at many campuses, the library is the source for one’s library card, which is also one’s campus id, on- and off-campus cash card, transit pass, electronic key, print management, and more.
that’s kind of remarkable and deserving of more attention. traditionally, librarians have responded to patron needs by providing information, resources, and services ourselves. new models and technologies are making it easier to complement this with the facilitation approach, of which online ticketed-passes are the quintessential example. we further increase access by reducing barriers of complexity, language, know-how, and social capital, for example, by maintaining community calendars of local goings-on or helping communities take advantage of nearby nature. online ticketed-pass programs will grow and take their place in the public’s expectations of libraries and librarians: that libraries are the place that helps us (better, more equitably) access the resources and riches around us. powering this are important new tools for library technologists to interrogate and advance with the same attention we give to both more established and more speculative applications. president’s message: imagination and structure in times of change bohyun kim information technology and libraries | december 2018 https://doi.org/10.6017/ital.v37i4.10850 bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, ri. in my last column, i talked about the discussion that lita had begun regarding forming a new division to achieve financial sustainability and more transparency, responsiveness, and agility. this proposed new division would merge lita with alcts (association for library collections and technical services) and llama (library leadership and management association). when this topic was brought up and discussed at an open meeting at the 2018 ala annual conference in new orleans, many members of these three divisions expressed interest and excitement. at the same time, there were many requests for more concrete details. you may recall that, in response to those requests, the steering committee, which consists of the presidents, presidents-elect, and executive directors of the three divisions, decided to form four working groups with the aim of providing more complete information about what the new division would look like. today, i am happy to report that the work of the steering committee and the four working groups is well underway. the operations working group that i have been chairing for the last two months submitted its recommendations on november 23. the activities working group finished its report on december 5. the budget and finance working group also submitted its second report. the communications working group continues to engage members of all three divisions by sharing new updates and soliciting opinions and suggestions. most recently, it started gathering input and feedback on potential names for the new division.1 you can see the charges, member rosters, and current statuses of these four working groups on the ‘current information’ page in the ‘alcts/ llama/ lita alignment discussion’ community on the ala connect website (https://connect.ala.org/communities/allcommunities/all/all-current-information).2 to give you a glimpse of our work preparing for the proposed new division, i would like to share some of my experience leading the operations working group. the operations working group consisted of nine members, three from each division, in addition to myself as chair and one staff liaison. we quickly became familiar with the organizational and membership structures of the three divisions.
the three divisions are similar to one another in size, but they have slightly different structures. lita has 18 interest groups (igs), 25 committees, and 4 (current) task forces; llama has 7 communities of practice (cops) and 46 discussion groups, committees, and task forces; alcts has 5 sections, 42 igs, and 61 committees (20 at the division level and 41 at the section level). all committees and task forces in lita are division-level, while alcts and llama have committees that are either division-level or section/cop-level. alcts is unique in that it elects section chairs, who serve on the division board alongside alcts directors-at-large. alcts also has a separate executive committee in addition to the board. llama has self-governed cops, which are formed by the board’s approval. among all three, lita has the flattest and simplest structure, due to its intentional efforts in the past. for example, there are neither sections nor communities of practice in lita, and the lita board eliminated the executive committee a few years ago. the steering committee of the three divisions agreed upon several guiding principles for the potential merger. these include (i) open, flexible, and straightforward member engagement, (ii) simplified and streamlined processes, and (iii) a governance and coordinating structure that engages members and staff in meaningful and productive work. the challenge is how to translate those guiding principles into a specific organizational structure, membership structure, and bylaws. clearly, some shuffling of existing sections, cops, and igs in the three divisions will be necessary to make the new division as effective, agile, and responsive as promised. but when and how should such consolidation take place? furthermore, what kind of guidance should the new division provide for members to reorganize themselves into a new and better structure? these are not easy questions to answer, nor are they something that can be immediately answered. some changes may require going through multiple stages to be completed. this may concern some members. they may prefer all these questions to have definitive answers before they decide whether they will support the proposed new division. people often assume that a change takes place after a big vision is formed, and that the change is then executed by a clear plan that directly translates that vision into reality in an orderly fashion. however, that is rarely how change takes place in reality. more often than not, a possible change builds up its own pressure, showing up in a variety of forms on multiple fronts by many different people while getting stronger, until the idea of the change gains enough urgency. finally, some vision of the change is crafted to give a form to that idea. the vision for a change also does not materialize in one fell swoop. it often begins with incomplete details and ideas that may even conflict with one another in its first iteration. it is up to all of us to sort them out and make them consistent, so that they become operational in the real world. recently, the steering committee reached an agreement regarding the final version of the mission, vision, and values of the proposed new division.
i hope these resonate with our members and guide us well in navigating the challenges ahead if the membership votes in favor of the proposal. the new division’s mission: we connect library and information practitioners in all career stages and from all organization types with expertise, colleagues, and professional development to empower transformation in technology, collections, and leadership, and to advocate for access to information for all. the new division’s vision: we shape the future of libraries and catalyze innovation across boundaries. the new division [name to be determined] amplifies diverse voices and advocates for equal and equitable access to information for all. the new division’s values: shared and celebrated expertise; strategically chosen work that makes a difference; transparent, equitable, flexible, and inclusive structures; an empowering framework for experimental and proven approaches; intentional amplification of diverse perspectives; expansive collaboration to become better together. in deciding on all operational and logistical details for the new division, the most important criteria will be whether a proposed change will advance the vision and mission of the new division and how well it aligns with the agreed-upon values and guiding principles. the steering committee and the working groups are busy finalizing the details about the new division. those details will first be reviewed by the board of each division and then shared with the membership at midwinter for feedback. i did not anticipate that during my service as lita president-elect and president, i would be leading a change as great as dissolving lita and forming a new division with two other divisions, alcts and llama. it has been an adventure filled with many surprises, difficulties, and challenges, to say the least. this adventure has taught me a great deal about leading organizational change at a high level. when we move from the high-level vision of a change to the matter of details deep in the weeds, it is easy to lose sight of the original aspiration and goal that led us to the change in the first place. trying to determine as many logistical details as possible becomes tempting to those in a leadership role, because we all want to reassure people in our organizations at a time of uncertainty and to make the transition smooth. however, creating a new division is itself a huge change at the highest level. it would be wrong to backtrack on the original goal in order to make the transition smooth, for it is the original goal that requires a transition, not vice versa. i believe those in a leadership role should accept that their most important work during a time of change is not to try to wrangle logistics at all levels but to keep things on track and moving in the direction of the original aspiration and goal. lita and the two other divisions have many talented and capable members who will be happy to lend a hand in developing new logistics. the responsibility of leaders is to create space where those people can achieve that freely and swiftly and to provide the right amount of framework and guidance. i hope that all lita members and those associated and involved with lita see themselves in the vision, mission, and values of the new division, embrace changes from the lowest to the highest level, and work towards making the new vision into reality together.
1 you can participate in this process at https://connect.ala.org/communities/community-home/digestviewer/viewthread?groupid=109804&messagekey=625e8823-21e0-419c-ab2b-1cb4a82b8d09 and at http://www.allourideas.org/newdivisionname. 2 this ‘current information’ page will be updated as the plans for the new division develop. editorial board thoughts: critical technology cinthya ippoliti information technology and libraries | december 2018 https://doi.org/10.6017/ital.v37i4.10810 cinthya ippoliti (cinthya.ippoliti@ucdenver.edu) is university librarian and director, auraria library, university of colorado. critical librarianship has brought many changes: libraries have examined their programs and services, created new positions dedicated to equity, inclusion, and diversity, and paved the way to challenge existing assumptions about our work and environment. technology also exists in a space that is not neutral, as library systems and services reflect specific perspectives in their content and focus as well as in how they are made accessible (or not). i would like to briefly examine how we can begin to think about these issues within academic libraries and offer some additional readings for further reflection in four technology-related areas: spaces, services/programming, systems, and engaging with our users. technology spaces we might assume that because we see students using our classrooms, makerspaces, and study areas, we have been successful in meeting the needs of a wide variety of users. to a large extent that may be true, but we should also be asking ourselves who does not feel welcome in such a space and, more importantly, why not? there are two facets to this question. the first involves the degree to which libraries strive to create a welcoming environment. staff interactions, signage, hours, and institutional values are all part of a complex and broader environment that signals to users how these spaces function and how they are perceived by the organization. these same elements can also serve as deterrents through choices in layout, policy, or other intangible aspects, and they may in fact prevent individuals from entering these spaces in the first place. the second revolves around the notion that each technology-rich space conveys its level of friendliness and intended purpose through its physical presence. ensuring that furniture, paint, and layout are compliant with ada standards, and integrating these features with each other as opposed to setting them apart so that they are not considered “special” or “different,” is one small and vital step in this direction. maggie beers and teggin summers cover these issues in an educause review article and discuss how asking questions about how power structures are reinforced by having a “front” of the room, or by other configurations, can enrich planning and assessment efforts.
similarly, developing a plan so that new technology in areas such as makerspaces rotates as much as possible will help provide access for those who may not be able to use these resources outside of the library, and will accommodate differing skill levels, interests, and learning styles. in addition, students may not always be present on campus due to family, job, or other life circumstances, and planning with the assumption that everyone who could benefit from using a particular space is in fact taking advantage of that benefit is problematic. one way around that is to ensure that each space is as flexible as possible and (ideally) can be reconfigured for quiet reflection or collaborative work, or transformed into a sensory space or other type of specialized environment. the reservation process should be available both online and manually (as not everyone may have access to a computer and/or the internet), hour limitations should have several counter options, and the space should be available as much of the time as possible when it is not in use for a more formalized purpose. any space usage assessments should also purposefully include non-users or perceived non-users and integrate questions about barriers to or about the space in their methodologies. finally, ensuring the right level of staffing to support both the intended, and perhaps the unintended, uses of the space and the activities that occur within it will help create a sense that not only is the space itself valued, but the experiences occurring within it are even more important. this is not easy to accomplish, as it is difficult to predict exactly how a space will be used unless there are very strict confines placed around its configuration and accessibility. but assuming that most spaces in libraries are designed to be malleable, keeping in constant communication with users via some of the methods described above should help. technology services and programming similarly, services and programs cannot be built around a one-size-fits-all model. this can prove quite challenging given the limited resources libraries face. engagement and learning lie not only in access to tools, but in the very process of sharing knowledge and experiences — whether for academic growth, social action, or simply personal enjoyment. matt ratto, who coined the term “critical making,” defines it as the process “intended to highlight the interwoven material and conceptual work that making involves.” he argues that “critical making is dependent on open design technologies and processes that allow the distribution and sharing of technical work and its results.” ratto makes the further point that this process also has the capacity of “unpacking the social and technical dimensions of information technologies.” this in turn allows technology to become more than simply a cool resource: a mechanism for democratizing the creative work of making and designing while dealing with its messy, political, and uncomfortable aspects, which do not exist in a vacuum outside of the tools themselves. an approach in this instance might involve taking technology outside of library spaces, such as on campus or within the community, offering as much for free as possible, and capitalizing on programs such as girls who code (https://girlswhocode.com/) and grow with google (https://grow.google/).
capturing how these resources are used in all of their possible permutations enables stories of individuals to shine through. the impact of these programs takes on a personal element through showcases, speaker events, and hackathons that are designed to bring the community together and engage people in sharing knowledge, perspectives, and conversations. in addition, this will hopefully shrink the barriers for those who don’t see themselves as having a role in these activities. integrated library systems i do not have a background in systems, but simon barron and andrew preater have written a great chapter unpacking the inherent power structures that manifest themselves in library systems such as the integrated library system (ils), discovery interfaces, and the third-party resources we provide access to. they suggest taking action by thinking about user privacy: ensuring that the information libraries are able to view, gather, and store is used ethically, and that decisions for derivative services or actions are not made based on assumptions about gender identity, economic status, or other identifiers via access to these types of data. openness is another area the authors explore, as they discuss how libraries can use open source software whenever possible in order to balance the field against profit-based licensing models. barron and preater also raise a concern, however, that while crowdsourcing is in theory a good way to include the community in developing ways to help itself, it still does not recognize the limited resources marginalized populations can dedicate to these efforts. finally, they discuss how it is crucial for libraries to recognize and support the expertise needed in this arena in order to avoid overreliance on vendor systems that can prove alluring with out-of-the-box solutions but which compromise things like privacy, autonomy, and customization that might otherwise benefit from equity, diversity, and inclusion-centered practices. equity-driven design engaging with users in developing shared solutions to challenges is an important aspect of the user experience and can help pave the way for deeper conversations. taking a step back and making sure the assessment and design process itself is transparent for everyone is one of the first things that needs to be in place. i would like to harken to the work of gretchen rossman and sharon rallis, who make a crucial distinction between user-centered design, in which the user seldom has a voice in what the final process or product looks like, and what they term “emancipatory design,” in which participants are “collaboratively producing knowledge to improve their work and their lives.” in addition, emancipatory design is one where “users are in charge; their power, their indigenous knowledge are more powerful and respected than those of the expert designer.” this approach can therefore be a means to promoting equity, diversity, and inclusion in technology work in libraries by focusing on the users’ voice as opposed to our own and working collaboratively to develop shared solutions to address their challenges. a specific example of how this framework might be applied comes from the stanford school of design, which is famous for its course in design thinking.
stanford has recently taken that concept even further and integrated an equity focus into the first steps of the progression, where the designer not only identifies existing built-in biases but also raises questions such as who the users are, what equity challenges need to be addressed, who has institutional power, and how that power is manifested in the decisions that drive the organization. the stanford model also provides specific methods focusing on human values and on developing relational trust as a way to bookend the design thinking process, reflecting on the blind spots that were uncovered in order to help inform action items and next steps and to ensure that users are actively collaborating to develop the services and programs which in turn affect them. this version of the program is available at https://dschool.stanford.edu/resources/equity-centered-design-framework. as a final thought, one idea to keep at the forefront in all of these areas is that of universal design, which is defined by the center for universal design at ncsu as “the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design.” the first principle is that of equitable use, and it can be applied to many technology-related aspects, whether physical or virtual: • provide the same means of use for all users: identical whenever possible; equivalent when not • avoid segregating or stigmatizing any users • provisions for privacy, security, and safety should be equally available to all users • make the design appealing to all users further readings: barron, s. and preater, a. j. “critical systems librarianship.” in the politics of theory and the practice of critical librarianship (sacramento: litwin books, 2018). https://repository.uwl.ac.uk/id/eprint/4512/1/2018-barron-and-preater-critical-systems-librarianship.pdf. beers, m. and summers, t. “educational equity and the classroom: designing learning-ready spaces for all students,” educause review, may 7, 2018. https://er.educause.edu/articles/2018/5/educational-equity-and-the-classroom-designing-learning-ready-spaces-for-all-students. north carolina state university center for universal design. “center for universal design.” https://projects.ncsu.edu/design/cud/ (accessed november 25, 2018). ratto, m. “critical making,” open design now. http://opendesignnow.org/index.html%3fp=434.html (accessed november 7, 2018). rossman, g. b., and rallis, s. f. learning in the field: an introduction to qualitative research (thousand oaks, ca: sage, 1998). editorial board thoughts: who will use this and why? user stories and use cases kevin m.
information technology and libraries | march 2019 5

kevin m. ford (kefo@loc.gov) is librarian, linked data specialist, library of congress.

perhaps i'm that guy. the one always asking for either a "user story" or a "use case," and sometimes both. they are tools employed in software or system engineering to capture how, and importantly why, actors (often human users, but not necessarily) interact with a system. both have protagonists, but one is more a creative narrative, the other more a strict, unvarnished retelling. user stories relate what an actor wants to do and why. use cases detail to varying degrees how that actor might go about realizing that desire. the concepts, though distinct, are often confused and conflated. and, jargon though they may be, the concepts have sometimes been employed outside of technology to capture what an actor needs and the path the actor takes to his or her objective, including any decisions that might be made along the way, with all of this effort undertaken in order to identify the best solution. by giving the actors a starring role, user stories and use cases ensure focus is on the actors, their inputs, and the expected outcome. they protect against incorporating unnecessary elements, which could clutter and, even worse, weaken the end product, and they create a baseline understanding by which the result can be measured. and so i find myself frequently asking in meetings, and mumbling in hallways: "what's the use case for that?" or "is there a user story? if not, then why are we doing it?" you get the idea.

it's a little ironic that i would become this person. not because i didn't believe in user stories and use cases – quite the contrary, i've always believed in their importance and utility – but because of a book i was assigned during graduate coursework for my lis degree, and my initial reaction to it. it's not just an unassuming book; it has a downright boring appearance, as one might expect of a book entitled "use case modeling."1 it's a shocking 347 pages. it was a joint endeavor by two authors: kurt bittner and ian spence. i think i read it, but i can't honestly recall. i assume i did because i was that type of student and i had a long chicago el commute at the time. in any case, i know beyond doubt that i was assigned this book, dutifully obtained it, and then picked it up, thumbed through it, rolled my eyes, and probably said, "ugh, really?" and that's just it. the joke's on me. the concepts, and as such the book, which i've moved across the country a couple of times, remain near-daily constants in my life. as a developer, i basically don't do anything without a user story and a use case, especially one whose steps (including preconditions, alternatives, variables, triggers, and final outcome) haven't been reasonably sketched out. "sketched out" is an interesting phrase, because one would think that if entire books were being authored on the topic of use cases, for example, then use cases would be complicated and involved affairs. they can be, but they need not be. the same holds for user stories. imagine you were designing a cataloging system; here's an example of the latter:

as a librarian i want my student catalogers to be guided through selection of vocabulary terms to improve both their accuracy and speed.2
that single-sentence user story identifies the actors (student catalogers), what they need (a "guided … selection of vocabulary terms"), and why ("to improve their accuracy and speed"). the use case would explore how the student catalogers (the actors) would interact with the system to realize that user story. the use case might be narrowly defined ("adding controlled terms to records") or might be part of a broader use case ("cataloging records"), but in either instance the use case might go to some length to describe the interaction between the student catalogers and the system in order to generate a clear understanding of the various interactions. by doing this, the use case helps to identify functional requirements, and it clearly articulates user/system expectations, which can be reviewed by stakeholders before work begins and used to verify delivery of the final product.
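a use case of this kind need not be heavyweight. here is a minimal sketch of the user story above paired with a lightweight use case, expressed as plain python data. the field names (preconditions, trigger, main flow, alternatives, outcome) follow common practice and every step below is an illustrative assumption, not a template taken from bittner and spence.

# a minimal, hypothetical pairing of a user story with a lightweight use case.
# all names and steps are invented for illustration.

user_story = (
    "as a librarian i want my student catalogers to be guided through "
    "selection of vocabulary terms to improve both their accuracy and speed."
)

use_case = {
    "name": "adding controlled terms to records",
    "actors": ["student cataloger"],
    "preconditions": ["cataloger is signed in", "a record is open for editing"],
    "trigger": "cataloger begins typing in a subject field",
    "main_flow": [
        "system suggests matching terms from the controlled vocabulary",
        "cataloger selects a suggested term",
        "system stores the term's label and identifier on the record",
    ],
    "alternatives": ["no match found: cataloger flags the heading for review"],
    "outcome": "the record carries only controlled, unambiguous vocabulary terms",
}

even in this skeletal form, the artifact names the actors, the expected interactions, and the measurable outcome, which is all a stakeholder review needs to begin.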
as i have presented this, using these tools might strike you as overly formal and time-consuming. in many circumstances they might be, if the developer has sufficient user and domain knowledge (rare, very, very rare) and especially if the "solution" is not an entirely new system but just an enhancement or augmentation to an existing system. yet, whether it is a completely new system being developed by someone who has long and profound experience with the domain or a simple enhancement, it may be worth entertaining the questions/process, even if informally. i find it is often sufficient to ask "who will use this and why?" essentially i'm asking for the "user story" but dispensing with the jargon. doing so may lead to additional questions, the answers to which would likely check the boxes of a "use case" even if the effort is not identified as such, and it certainly ensures the user-driven nature and need of the request.

this might all sound obvious, but i like to think of it as defensive programming, which is like defensive driving. yes, the driver coming up to the stop sign on my right is going to stop, but i take my foot off the gas and position it over the brake just in case. likewise, i'm confident the functional requirements i'm being handed have been fully considered and address a user need, but i'm going to ask for the user story anyway. i'm also leery of scope creep which, if i were to continue the driving analogy, would be equivalent to driving to one store because you need to, but then also driving to two additional stores for items you think might be good to have but for which you have no present need. it's time-consuming, you've complicated your project, you've added expense to your budget, and the extra items might be of little or no use in the end. the number of times i've been in meetings in which new, additional features are discussed because the designers think it is a good idea (that is, there has been no actual user request or input sought) is alarmingly high. that's when i pipe up, "is there a user story? if not, then why are we doing it?"

user stories and use cases help focus any development project on those who stand to benefit, i.e., the project's stakeholders, and can guard simultaneously against insufficient planning and software bloat. and the concepts, though most often thought of with respect to large-scale projects, apply in all circumstances, from the smallest feature request for an existing system to the redesign of a complex system. if you are not in the habit of asking, try it next time: who will use this and why?

endnotes

1 kurt bittner and ian spence, use case modeling (boston: addison-wesley, 2003). also useful: alistair cockburn, writing effective use cases (boston: addison-wesley, 2001).

2 "use case 3.4: authority tool for more accurate data entry," linked data for libraries (ld4l), accessed march 1, 2019, https://wiki.duraspace.org/display/ld4l/use+case+3.4%3a+authority+tool+for+more+accurate+data+entry.

editorial board thoughts
seeing through vocabularies
kevin ford
information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.12367

kevin ford (kevinford@loc.gov) is librarian, linked data specialist in the library of congress's network development and marc standards office. he works on the library's bibframe initiative, and similar projects, such as mads/rdf, and is a member of the ital editorial board. the ideas and opinions expressed here are those of the author and do not necessarily reflect those of his employer.

"ontologies" are popular in library land. "vocabularies" are popular too, but it seems that the library profession prefers "ontologies" over "vocabularies" when it comes to defining classes and properties that attempt to encapsulate some realm of knowledge. bibframe, mads/rdf, bibo, premis, and frbr are well-known "ontologies" in use in the library community.1 they were defined either by librarians or to be used mainly in the library space, or both. skos, foaf, dublin core, and schema are well-known "vocabularies."2 they are used widely by libraries though none were created by librarians or specifically for library use. in all cases, those ontologies and vocabularies were created for the very purpose of publication for broader use, which is one of the primary objectives behind creating one: to define a common set of metadata elements to facilitate the description and sharing of data within a group or groups of users.

ontologies and vocabularies are common when working with rdf (resource description framework), a very simple data model in which information is expressed as a series of triple statements, each consisting of three parts: a subject, a predicate, and an object. the types of ontologies and vocabularies referred to here are in fact defined using rdf—thing a is a class and thing z is a property. those using any given ontology or vocabulary employ the defined classes and properties to further describe their things, for lack of a better word. it is useful to provide an example. the first block of triples below represents class and property definitions in rdf schema (rdfs), which provides some very basic means to define classes and properties and some relationships between them, such as the domains and ranges for properties. the second block is instance data.

ontovoc:book rdf:type rdfs:class
ontovoc:authoredby rdf:type rdf:property
ontovoc:authorof rdf:type rdf:property

ex:12345 rdf:type ontovoc:book
ex:12345 ontovoc:authoredby ex:abcde

ontovoc:book is defined as a class and ontovoc:authoredby is defined as a property. using those declarations, it is possible to then assert that ex:12345, which is an identifier, is of type ontovoc:book and was authored by ex:abcde, an identifier for the author.
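for readers who want to run the example, a minimal sketch using python's rdflib library follows. rdflib, the invented namespace uris, and the turtle serialization are assumptions of this sketch; the column itself deals only in abstract triples.

# sketch of the ontovoc example using rdflib (pip install rdflib).
# the ontovoc and ex namespace uris are invented for illustration.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

ONTOVOC = Namespace("http://example.org/ontovoc/")
EX = Namespace("http://example.org/ex/")

g = Graph()
# the definitions: one class and two properties
g.add((ONTOVOC.book, RDF.type, RDFS.Class))
g.add((ONTOVOC.authoredBy, RDF.type, RDF.Property))
g.add((ONTOVOC.authorOf, RDF.type, RDF.Property))
# the instance data
g.add((EX["12345"], RDF.type, ONTOVOC.book))
g.add((EX["12345"], ONTOVOC.authoredBy, EX.abcde))

print(g.serialize(format="turtle"))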
is the first block—the definitions—an "ontology" or a "vocabulary?" putting aside the question for now, air quotes—in this case literal quotes—have been employed around "ontologies" and "vocabularies" to suggest that these are more terms of art than technical distinctions, though it must also be acknowledged that there is a technical distinction to be made.

ontologies in the rdf space frequently, if not always, use classes and properties from the web ontology language (known as owl) to define a specific realm's classes and properties and how they relate to each other within that realm of knowledge. this is because owl is a more expressive definition language than basic rdfs. using owl, and considering the example above, ontovoc:authoredby could be defined as an inverse of ontovoc:authorof.

ontovoc:authoredby owl:inverseof ontovoc:authorof

in this way, and given the little instance data above (the two triples that begin ex:12345), it is then possible to infer the following bit of knowledge:

ex:abcde ontovoc:authorof ex:12345

now that the owl:inverseof triple/declaration has been added to the definitions, it's worth re-asking: do the definitions represent an "ontology" or a "vocabulary?" a purist might answer "not an ontology," but only because those statements have not been combined in a document, which itself has been given a uri and declared to be an owl:ontology. that's the actual owl class that says, "this is an owl ontology." but let's say those statements had been added to a document published at a uri and declared to be an owl:ontology. is it an ontology now? perhaps in a strict sense the answer is "yes." but in a practical sense few would view those four declarations, wrapped neatly in a document that has been given a uri and called an ontology, as an "ontology." it doesn't quite rise to the occasion—"ontologies" almost always have a broader scope and employ more formal semantics—making its use a term of art, often, rather than a real technical distinction. yet, based on the same narrow definition (a published document declaring itself to be an owl:ontology) combined with a far more extensive set of class and property definitions with defined relationships between them, it is possible to describe foaf as an ontology.3 but it is widely known as, and understood as, a "vocabulary." (there is also an experimental version of schema as owl.4)

and that gets to the crux of the issue in many ways. putting aside the technical distinction that can be argued to identify something as an "ontology" versus a "vocabulary," there are non-technical semantics at work here—what was earlier described as a "term of art"—about when, how, and why something is deemed an "ontology" versus a "vocabulary." the library community appears to think of their creations as "ontologies" and not "vocabularies," even when the documentation tends to avoid the word "ontology." for example, the opening sentence of the bibframe and mads/rdf documentation very clearly introduces each as a "vocabulary," as does frbr in rdf.5 on the surface they may be presented as "vocabularies," which they are of course, but despite this prominent self-declaration they are not seen in the same light as foaf or schema but instead as something more exacting, which they also are. it is worth contemplating why they are viewed principally as "ontologies" and to examine whether this has been beneficial.
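before moving on, the owl:inverseof inference shown earlier can be watched happening in code. plain rdflib stores triples but does not reason over owl, so the sketch below materializes the entailment by hand; a real owl reasoner would generalize this, and the hand-rolled rule is an assumption made for brevity.

# continuing the earlier sketch: declare the inverse and materialize the
# entailed triple by hand, since plain rdflib performs no owl reasoning.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, OWL

ONTOVOC = Namespace("http://example.org/ontovoc/")
EX = Namespace("http://example.org/ex/")

g = Graph()
g.add((EX["12345"], RDF.type, ONTOVOC.book))
g.add((EX["12345"], ONTOVOC.authoredBy, EX.abcde))
g.add((ONTOVOC.authoredBy, OWL.inverseOf, ONTOVOC.authorOf))

# hand-rolled owl:inverseof rule: if p owl:inverseOf q and (s p o), add (o q s)
for p, _, q in list(g.triples((None, OWL.inverseOf, None))):
    for s, _, o in list(g.triples((None, p, None))):
        g.add((o, q, s))

# the inferred bit of knowledge is now present in the graph
assert (EX.abcde, ONTOVOC.authorOf, EX["12345"]) in g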
perhaps the ideas behind designating something a "vocabulary" are, in fact, more in line with the way libraries operate, whereas "ontologies" represent an ideal (and who doesn't set their sights on the ideal?), striving toward which only exposes shortcomings and sows confusion. the answer to "why" is historical and probably derives from a combination of lofty thinking, traditional standards practices, and good ol' misunderstanding.

traditional standards practices favor more formal approaches. libraries' decades-long experience with xml and xml schema contributed significantly to this mindset. xml schema provides a way to describe the precise construction of an xml document, and it can then be used to validate the xml document. xml schema defines what elements and attributes are permitted in the xml document and frequently dictates their order. it can further constrain the values of an element or attribute to a select list of options. in many ways, xml schema was the very expression of metadata quality control. librarians swooned. with the right controls and technology in place, it was impossible to produce poor, variable metadata.

in the case of semantic modelling, owl is certainly a more formal approach. it's founded in description logics, whose expressions take the form of occult-like mathematics, at least as viewed by a librarian with a humanities background. owl can be used to declare domains and ranges for properties. one can also designate a property as a datatype property, meaning it takes a literal such as a string or a date as its value, or an object property, meaning it references another rdf resource as its object. but these declarations are actually more about inferencing—deriving information by applying the ontology against some instance data—and not about restrictions, constraints, or validation. to be clear, there are ways to apply restrictions in owl—"wine can be either red or white"—but this is a form of advanced owl modelling that is not well understood and not often implemented, and virtually never in ontologies designed by librarians. conversely, indicating a domain for a property, for example, is easy, relatively straightforward, and seductive because it gives the appearance that the property can only be used with resources of a specific class. consider: the domain of ontovoc:authoredby is ontovoc:book. that does not mean that ontovoc:authoredby can only be used with an ontovoc:book resource. it means that whatever resource uses ontovoc:authoredby must therefore be an ontovoc:book. defining that domain for that property is not restricting its use only to books; it allows one to derive the additional knowledge that the thing it is used with must be a book even if it doesn't identify itself as one.
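that domain behavior can be seen in running code. the sketch below continues the rdflib example and again materializes the rdfs:domain entailment by hand (an assumption made for brevity); note that nothing prevents the "wrong" subject from using the property, the rule simply concludes the subject must be a book.

# sketch: rdfs:domain yields an inference; it does not block "wrong" subjects.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

ONTOVOC = Namespace("http://example.org/ontovoc/")
EX = Namespace("http://example.org/ex/")

g = Graph()
g.add((ONTOVOC.authoredBy, RDFS.domain, ONTOVOC.book))
# ex:99999 never declares itself a book, and nothing stops this triple:
g.add((EX["99999"], ONTOVOC.authoredBy, EX.abcde))

# hand-rolled rdfs:domain rule: if p rdfs:domain c and (s p o), add (s rdf:type c)
for p, _, c in list(g.triples((None, RDFS.domain, None))):
    for s, _, _o in list(g.triples((None, p, None))):
        g.add((s, RDF.type, c))

# the apparent "constraint" turns out to be derived knowledge:
assert (EX["99999"], RDF.type, ONTOVOC.book) in g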
this may seem like a subtle distinction and/or it may seem like tortured logic, but if it does, it may suggest that one's point of view, one's mindset, favors constraints, restrictions, and validations. and that's ok. that's library training and conditioning, completely reinforced in our daily work. it's what has been taught in library schools for decades and practiced by library professionals even longer. names should be entered "last name, first name" and any middle initial, if known, included. the data in this field should only be a three-character language code from this approved list of language codes. these rules, and the consistency resulting from these rules, are what make library data so often very high quality. google loves marc records from our community for this very reason. wishing to exert strong control at the definition level when creating a model or metadata scheme with an eye to data quality, it is a natural inclination for librarians to gravitate to a more formal means of defining a model, especially one that seems to promise constraints. so, despite these models self-describing at a high level as vocabularies, the models themselves employ a considerable amount of owl at the technical level, which becomes the focus of any users wishing to implement the model. users comprehend these models as something more than a vocabulary and therefore view the model through this more complex lens.

unfortunately, because owl is poorly understood (sometimes by creators and sometimes by users, and sometimes by both), this leads to various problems. on the one hand, creators and users believe there are technical restrictions or constraints where there are, in fact, none. when this happens, the "constraint" is either identified as a problem ("consider removing the range for this property") or—and this is more damaging—the property (read: model/vocabulary/ontology) is avoided. even when it is recognized that the "constraint" is not a real restriction (just a means to infer knowledge), forging ahead can generate new issues. when faced with a domain and range declaration, for example, forging ahead can result in inaccurate, imprecise, or simply undesirable inferences. most of the currently open "issues" (about 50 at the time of writing) about bibframe follow a basic pattern: 1) there is a declaration about this property or this class that makes it difficult to use because of how it has been defined with owl; 2) we cannot really use it presently because it would cause potential inferencing issues; 3) consider altering the owl definitions.6

pursuing an (owl) ontology, while formal and seemingly comforting because it feels a little like constraining the metadata schema, can result in confusion and a lack of adoption. given that vocabularies and ontologies are developed and published to encourage users to describe their data in a way that fosters wide consumption by others, this is unfortunate to say the least. it is notable that skos, foaf, dublin core, and schema have very different scopes and potentially much wider user bases than the more library-specific ontologies (bibframe, mads/rdf, bibo, etc.). there is something to be learned here: the smaller the domain, the more effective an ontology might be; the larger the universe, the better a more general approach may be. it is further true that foaf, dublin core, and schema define specific domains and ranges for many of their properties, but they have strived for clarity and simplicity. the creators of schema, for example, eschewed the formal semantics behind rdfs and owl and redefined domain and range to better match their needs and (perhaps unexpectedly) most users' automatic understanding.7 what is generally true is that each of the "vocabularies" approached the creation and defining of their models so as to minimize the use of formal semantics, and promoted this as a feature. in this way, they limited or removed altogether the actual or psychological barriers to adoption. their offering was more accessible, less fussy.
bearing in mind the differences in scale and scope, they have been rewarded with a wider adopter base and passionate advocates.

the decision to create a "vocabulary" or an "ontology" is a technical one and a political one, both of which must be in alignment. it's a mindset and it is a statement. it is entirely possible to define the model at a technical level using owl, making it by definition an ontology, but to have it be perceived, and used, as a vocabulary because it is flexible and not strictly defined. likewise, it is not enough to call something a vocabulary when it is in reality a model burdened with formal semantics that is then expected to be adopted and used widely. if the objective is to fashion a (pseudo?) restrictive metadata set with rules that inform its use, and which is strongly bonded with a specific community, develop an "ontology," but recognize that this may result in confusion and lack of uptake. if, however, the desire is to cultivate a metadata element set that is flexible, readily usable, and positioned to grow in the future because it employs fewer rules and formal semantics, create a "vocabulary." that's really what is being communicated when we encounter ontologies and vocabularies. interestingly, the political difference between "vocabulary" and "ontology" appears, in fact, to be understood by librarians: library models self-identify as "vocabularies." but once past those introductory remarks, the truth is exposed quickly in the widespread use of owl, revealing beyond doubt that it is not a flexible, accommodating vocabulary but a strictly defined model. to dispense with the air quotes: as librarians we're creating ontologies and calling them vocabularies. we really want to be creating vocabularies that are ontologies in name only.

endnotes

1 "bibframe ontology," library of congress, accessed may 21, 2020, http://id.loc.gov/ontologies/bibframe.html; "mads/rdf (metadata authority description schema in rdf)," library of congress, accessed may 21, 2020, http://id.loc.gov/ontologies/madsrdf/v1.html; "bibliographic ontology specification," the bibliographic ontology, accessed may 21, 2020, http://bibliontology.com/; "premis 3 ontology," premis editorial committee, accessed may 21, 2020, http://id.loc.gov/ontologies/premis3.html; ian davis and richard newman, "expression of core frbr concepts in rdf," accessed may 21, 2020, https://vocab.org/frbr/.

2 alistair miles and sean bechhofer, eds., "skos simple knowledge organization system reference," w3c, accessed may 21, 2020, https://www.w3.org/tr/skos-reference/; dan brickley and libby miller, "foaf vocabulary specification 0.99," accessed may 21, 2020, http://xmlns.com/foaf/spec/; "dcmi metadata expressed in rdf schema language," dublin core metadata initiative, accessed may 21, 2020, https://www.dublincore.org/schemas/rdfs/; "welcome to schema.org," schema.org, accessed may 21, 2020, http://schema.org/.

3 "foaf ontology," xmlns.com, accessed may 21, 2020, http://xmlns.com/foaf/spec/index.rdf.

4 see "owl" at "developers," schema.org, accessed may 21, 2020, https://schema.org/docs/developers.html.

5 see "bibframe ontology" and "mads/rdf (metadata authority description schema in rdf)" above.

6 "issues," bibframe ontology at github, accessed may 21, 2020, https://github.com/lcnetdev/bibframe-ontology/issues.

7 r.v. guha, dan brickley, and steve macbeth, "schema.org: evolution of structured data on the web," acmqueue 15, no. 9 (15 december 2015): 14, https://dl.acm.org/ft_gateway.cfm?id=2857276&ftid=1652365&dwn=1.
a low-cost library database solution
mark england, lura joseph, and nem w. schlecht
information technology and libraries | march 2000; 19, no. 1: 46
two locally created databases are made available to the world via the web using an inexpensive but highly functional search engine created in-house. the technology consists of a microcomputer running unix to serve relational databases. cgi forms created using the programming language perl offer flexible interface designs for database users and database maintainers.

mark england (england@badlands.nodak.edu) is assistant director, lura joseph (ljoseph@badlands.nodak.edu) is physical sciences librarian, and nem w. schlecht (schlecht@plains.nodak.edu) is a systems administrator at the north dakota state university libraries, fargo, north dakota.

many libraries maintain indexes to local collections or resources and create databases or bibliographies concerning subjects of local or regional interest. these local resource indexes are of great value to researchers. the web provides an inexpensive means for broadly disseminating these indexes. for example, kilcullen has described a nonsearchable, web-based newspaper index that uses microsoft access 97.1 jacso has written about the use of java applets to publish small directories and bibliographies.2 sturr has discussed the use of wais software to provide searchable online indexes.3 many of the web-based local databases and search interfaces currently used by libraries may:

• have problems with functionality;
• lack provisions for efficient searching;
• be based on unreliable software;
• be based on software and hardware that is expensive to purchase or implement;
• be difficult for patrons to use; and
• be difficult for staff to maintain.

after trying several alternatives, staff members at the north dakota state university libraries have implemented an inexpensive but highly functional and reliable solution. we are now providing searchable indexes on the web using a microcomputer running unix to serve relational databases. cgi forms created at the north dakota state university libraries using the programming language perl offer flexible interface designs for database users and database maintainers. this article describes how we have implemented this technology to distribute two local databases to the world via the web. it is hoped that recounting our experiences will facilitate other such projects.

creating the databases

the two databases that we selected to use as demonstrations of this technology are a community newspaper index and a bibliography of publications related to north dakota geology.

the forum index

the fargo forum is a daily newspaper published in fargo, north dakota. it began publication in 1879 and is the paper of record for north dakota. for many years, the north dakota state university libraries have maintained an index to the forum.
beginning with the selective indexing of notable events and editions, we started offering full-text indexing of the entire paper in 1996. until early in the 1980s, all indexing was done manually and preserved on cards or paper. then for several years, indexing was done on one of the university's mainframe computers. starting in 1987, microcomputers were used to compile the index, first using dbase and then using pro-cite as the database management software. printed copies of the database were sold annually to subscribing libraries and businesses. starting in the summer of 1996, the library made arrangements with the publisher of the paper to acquire a digital copy of the text of each newspaper.

in early 1997, the ndsu libraries began a project to place all of our forum indexes on the web. dbase, pro-cite, wordperfect, or microsoft access computer files existed for the newspaper index from 1879 to 1975, 1988, and from 1990 to 1996. all other data was unavailable or unreadable. printed indexes from 1976 to 1987 and 1989 were scanned using a hewlett packard 4c scanner fitted with a page feeder. optical character recognition was accomplished using the software omnipage pro. once experience was gained with scanner and software settings, the scanning went very quickly with very few errors appearing in the data. various members of the library staff volunteered to check and edit the data, and the digitizing of approximately 1,500 pages was completed in about three weeks. all data were checked and normalized using microsoft's excel spreadsheet software and then saved as tab-delimited text. programmer's file editor was used to do the final text editing. because of variations in the completeness of the indexing, three separate relational database tables were created: one each for the years 1879-1975, 1976-1996, and 1996 to the present.

the collective bibliography of north dakota geology

in 1996 a project was initiated to combine three bibliographies of north dakota geology and to make the final product searchable and browsable on the web. all three of the original print bibliographies were published by the north dakota geological survey. scott published the first bibliography as a thesis. it is a bibliography of all then-known north dakota geological literature published between 1805 and 1960, and most entries are annotated.4 the second print bibliography, also by scott, focuses on north dakota geological literature published in the years 1960 through 1979, and also includes some material omitted in the first bibliography.5 most entries in the second bibliography include annotations in the form of keywords or keyword phrases. the third bibliography covers the years 1980 through 1993, and is not annotated.6 all three bibliographies are indexed. the third bibliography was available in digital format, whereas the first two were in print format only.

library staff members began rekeying the two print bibliographies using microsoft word. the remaining pages were digitally scanned using a new hewlett packard 4c scanner and the optical character recognition software omnipage pro. there were many errors in the resulting text. different font sizes in the original documents may have contributed to optical recognition errors. editing of the scanned pages was nearly as time-consuming and tedious as rekeying the documents. the microsoft word documents were saved as text files and combined as a single text file.
programmer's file editor was used as a final editor to remove any line breaks or other undesirable formatting. each record was edited to occupy one line, and each field was delimited by two asterisks. asterisks were used because there were many occurrences of commas, semicolons, and other symbols that would have made it difficult to parse any other way. because italics were removed by converting to a text file, some errors were made in parsing. in retrospect, parsing should have been done before the document was saved as a text file. punctuation between fields was removed because the database would be converted to a large table. it would have been better to leave the punctuation intact, since it cannot easily be put back in for the output to be presented in bibliographic form. the alphabetical additions to publication dates (e.g., baker, 1966a) were left intact to aid in hand-cutting and pasting index terms into the records at a later date. initially, the resulting document was converted to a microsoft access file so that it would be in a table format. however, many of the fields were well over the 256-character limit of individual fields. to solve this problem, the data were imported into a relational database called mysql, which allows large data fields called "blobs." running under unix, mysql is very flexible and powerful.

figure 1: secure database editing interface

database and search engine design

we examined the features and capabilities of various online bibliographies and indexes when deciding on our search interfaces and search engine designs. we wanted our databases to be both searchable and browsable and, in the case of the collective bibliography of north dakota geology, we wanted to provide the option of receiving search results accurately in a specific bibliographic format. we wanted both simple and advanced search capabilities, including the ability to do highly sophisticated boolean searching. finally, we wanted to provide those maintaining the databases with the ability to easily add, delete, and change records from within simple forms on the web and immediately see the results of this editing.

mysql uses a perl interface, dbi (database independent interface), which makes accessing the database simple from a perl script. essentially, a sql statement is generated, based on data from an html form. this sql statement is then run against the mysql database, returning matching rows that the same script can handle and display as needed. all of the dynamically generated pages in this database are created this way. using both mysql and perl provided a nice, elegant way to integrate database functionality with the web.
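the load-and-search pattern just described can be sketched compactly. the authors' scripts were perl with dbi against mysql; the sketch below substitutes python's built-in sqlite3 module purely for illustration, and the four-field record layout (author, date, title, source) is an assumption based on the editing form in figure 1.

# illustrative sketch of the pattern: parse '**'-delimited records, bulk-load
# them, then run a parameterized search with a capped result set. python and
# sqlite3 stand in here for the original perl/dbi/mysql stack.
import sqlite3

raw = "scott, m. w.**1972**annotated bibliography of the geology of north dakota**ndgs misc. series 49\n"

conn = sqlite3.connect(":memory:")
conn.execute("create table biblio (author text, date text, title text, source text)")

# bulk load: one record per line, fields split on the double-asterisk delimiter
rows = [tuple(line.split("**")) for line in raw.splitlines() if line]
conn.executemany("insert into biblio values (?, ?, ?, ?)", rows)

# search: build the sql from form input using placeholders, with a limit so a
# runaway query cannot overwhelm the patron's browser
form = {"author": "scott"}
cur = conn.execute(
    "select author, date, title, source from biblio "
    "where author like ? order by author limit 200",
    ("%" + form["author"] + "%",),
)
for row in cur:
    print(row)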
the databases were installed on a server and made available via the web. it soon became apparent that there were problems with large numbers of returns. depending upon the client machine's hardware configuration, browsers could lock up the machine. while an efficient search should not result in such a large number of hits, we decided to limit returns to reduce this problem. following suggestions from users, various search tips were added, and some search interface terminology was changed. from a secure gateway, it is possible to call up different forms that allow individual records to be displayed, edited, and saved (see figure 1). new records are added by using a simple html form. it is also possible to bulk-load large numbers of records by using a special perl program to load the data directly from a text file.

advantages of the unix/mysql solution

after first using glimpse, a popular web search engine, under linux, a free unix platform, and then microsoft's internet information server (iis) software on a windows nt platform to search the forum newspaper index, we settled on using mysql on a microcomputer running linux and the apache web server. we found we could write perl scripts that allowed users to make very sophisticated searches of the data from within very simple web forms. mysql is stable, reliable, free, and offers a high degree of functionality, flexibility, and efficiency. apache is reliable, extendible, very fast, free, and offers tight control of data access.

initially, each story received from the newspaper was maintained as a separate file on a microcomputer. by having the stories as separate files, it was easy to set up glimpse as a searching tool for the articles. although it did provide a nice preview of a workable system, glimpse did not provide enough flexibility in how records were displayed, organized, or searched. it was not meant for managing data of this sort. windows nt, although a popular and successful it solution, was found to be somewhat cumbersome to implement and did not provide enough flexibility. the installation of these tools was easy, but it was difficult to obtain a high level of database and web integration. reliability and cost were also concerns. we found that unix was more stable and practically eliminated any unavailability of the data.

perl, mysql, and apache were ultimately used to manage, store, and deliver the data. although these products are available for windows nt, their native platform is unix. by running these products on unix, we were able to take advantage of all the features offered by each of the products. we found that mysql offered the flexibility and power to manage both sets of data efficiently. also, loading the data into a relational database such as mysql required the data to be normalized. normalized data are data that are separated into logically distinct components. normalizing data often takes some extra effort, as fields must be defined to contain certain types of data, but in the end the data are easier to manage and well organized. by having articles and bibliographies in a relational database, we are able to easily make updates and additions and generate output or reports on the data in many different ways. there are several web servers available on the market today. however, apache is often singled out as being the most popular server. apache, like perl and mysql, is available free for all uses (educational and commercial).
using apache and .htaccess control files, we are able to restrict access to administrative pages where data are added or modified. many extensions for apache are available to increase web performance in different situations. for example, a module for apache allows the web server to execute perl code within the server without the need to run the regular perl interpreter.

conclusion and future plans

work is under way to refine and update the collective bibliography of north dakota geology. because bibliography number three was not annotated, index terms are being added to facilitate searching and retrieval of citations. we have recently updated the collective bibliography of north dakota geology to include citations to publications through 1998, and we plan to update the database annually. additionally, we receive monthly updates of forum articles, which are added using a simple perl script as soon as they are received. we have successfully implemented a number of other databases using these methods. we realize that this unix/mysql solution is likely to be most helpful to other academic libraries: there are generally students and staff available on many campuses who are capable of programming in perl and maintaining sql databases on unix servers. our perl scripts are available at www.lib.ndsu.nodak.edu/kids.

references and notes

1. m. kilcullen, "publishing a newspaper index on the world wide web using microsoft access 97," the indexer 20, no. 4 (1997): 195-96.
2. p. jacso, "publishing textual databases on the web," information today 15, no. 11 (1998): 33, 36.
3. n.o. sturr, "wais: an internet tool for full-text indexing," computers in libraries 15 (june 1995): 52-54.
4. m.w. scott, annotated bibliography of the geology of north dakota 1806-1959, north dakota geological survey miscellaneous series, no. 49 (grand forks, n.d.: north dakota geological survey, 1972).
5. m.w. scott, annotated bibliography of the geology of north dakota 1960-1979, north dakota geological survey miscellaneous series, no. 60 (grand forks, n.d.: north dakota geological survey, 1981).
6. l. greenwood and others, bibliography of the geology of north dakota 1980-1993, north dakota geological survey miscellaneous series, no. 83 (bismarck, n.d.: north dakota geological survey, 1996).

related urls

linux homepage: www.linux.org/
mysql homepage: www.mysql.com/
perl homepage: www.perl.com/
apache homepage: www.apache.org/
ndsu forum index: www.lib.ndsu.nodak.edu/forum/
collective bibliography of north dakota geology: www.lib.ndsu.nodak.edu/ndgs/

information technology and libraries at 50: the 1960s in review
mark cyzyk
information technology and libraries | march 2018 6

mark cyzyk (mcyzyk@jhu.edu), a member of lita and the ital editorial board, is the scholarly communication architect in the sheridan libraries, the johns hopkins university, baltimore, maryland.

in the quarter century since graduating from library school, i have now and then run into someone who had what i consider to be a highly inaccurate and unintuitive view of librarians and information technology. seemingly, in their view, librarians are at worst luddites and at best technological neophytes. not so! in my view, librarians have always been at worst technological power users and at best true it innovators. one has only to scan the first issues of ital, or the journal of library automation as it was then called, to put such debate to rest.
march 1968 saw the first issue of the first volume of the journal of library automation published. the first article of that inaugural issue sets the scene: "computer based acquisitions system at texas a&i university" by ned c. morris. here we find librarians not only employing computing technology to streamline library operations (using an ibm 1620 with 40k ram), but, as the article points out, this new system for computerizing acquisitions was an adjunct to the systems they already had in place at texas a&i for circulation and serials management. this first article in the first issue of the first volume indicates that we've dipped a toe into a stream that was already swiftly flowing.

the other bookend of that first issue, "the development and administration of automated systems in academic libraries" by harvard's richard de gennaro, goes meta and takes a comprehensive look at how automated library systems were already being created and the various system development and implementation rubrics under which such development occurred. much in this article should resonate with current readers of ital. i knew immediately that this article was going to be a good read when i encountered, in the very first paragraph:

development, administration, and operations are all bound up together and are in most cases carried on by the same staff. this situation will change in time, but it seems safe to assume that automated library systems will continue to be characterized by instability and change for the next several years.

i'd say that was a safe assumption.

the second and final volume of the 1960s contains gems as well. the entirety of volume 2, issue 2 that year was devoted to "usa standard for a format for bibliographic information interchange on magnetic tape," a.k.a. marc ii. is it possible for something to be dry, yet fascinating? some titles of this second volume point to the wide range of technological projects underway in the library world in 1969:

• "an automated music programmer (musprog)" by david f. harrison and randolph j. herber
• "a fast algorithm for automatic classification" by r. t. dattola
• "simon fraser university computer produced map catalogue" by brian phillips and gary rogers
• "management planning for library systems development" by fred l. bellomy
• "performance of ruecking's word-compression method when applied to machine retrieval from a library catalog" by ben-ami lipetz, peter stangl, and kathryn f. taylor

and this is only in the first two volumes. as this current 2018 volume of ital proceeds, we'll be surveying the morphing information technology and libraries landscape through ital articles of the seventies, eighties, and nineties. i think you will see what i mean when i say that librarians have always been at worst technological power users, at best true it innovators.

determining textbook cost, formats, and licensing with google books api: a case study from an open textbook project
eamon costello, richard bolger, tiziana soverino, and mark brown
information technology and libraries | march 2019 91

eamon costello (eamon.costello@dcu.ie) is assistant professor, open education at dublin city university. richard bolger (richard.bolger@dcu.ie) is lecturer at dublin city university. tiziana soverino (tiziana.soverino@dcu.edu) is researcher at dublin city university. mark brown (mark.brown@dcu.ie) is full professor of digital learning, dublin city university.
abstract

the rising cost of textbooks for students has been highlighted as a major concern in higher education, particularly in the us and canada. less has been reported, however, about the costs of textbooks outside of north america, including in europe. we address this gap in the knowledge through a case study of one irish higher education institution, focusing on the cost, accessibility, and licensing of textbooks. we report here on an investigation of textbook prices drawing from an official college course catalog containing several thousand books. we detail how we sought to determine metadata for these books, including the formats they are available in, whether they are in the public domain, and their retail prices. we explain how we used methods to automatically determine textbook costs using the google books api, and we make our code and dataset publicly available.

introduction

the cost of textbooks is a hot topic for higher education. it has been reported that by 2014 the average student spent $1,200 annually on textbooks.1 another study claimed that between 2006 and 2016 the cost of college textbooks increased at over four times the rate of inflation.2 despite this rise in textbook costs, a survey of more than 3,000 us faculty members ("the babson survey") found that almost every course (98 percent) mandated a textbook or related study resources.3

one response to the challenge of rising textbook costs is open textbooks. open textbooks are a type of open educational resource (oer). oers have been defined as "teaching, learning, and research resources that reside in the public domain or have been released under an intellectual property license that permits their free use and repurposing by others. open educational resources include full courses, course materials, modules, textbooks, streaming videos, tests, software, and any other tools, materials, or techniques used to support access to knowledge."4 oers stem from the principle that access to education is a human right and that, as such, education should be accessible to all.5 hence an open textbook is made available under terms which grant legal rights to the public, not only to use, but also to adapt and redistribute. creative commons licensing is the most prevalent and well-developed intellectual property licensing tool for this purpose.

open textbook projects aimed at promoting publishing and redistributing open textbooks, both in digital and print formats, have been growing. for example, the bccampus project in canada began in 2012 with the aim of creating a collection of open textbooks aligned with the most popular subject areas in british columbia.6 the project has shown strong growth, with over 230 open digital textbooks now available and more than forty institutions involved. a significant recent development in open textbooks occurred in march 2018, when the us congress announced a $5 million investment in an open textbook initiative.7

in addition to helping change institutional culture and challenge attitudes to traditional publishing models, one of the most oft-cited benefits of open textbooks is cost savings. according to the college board's survey of colleges, the average annual cost to us undergraduate students in 2017 for textbooks and materials was estimated at $1,250.8 this figure is remarkably close to the aforementioned figure of $1,200 a year, as reported by baglione and sullivan.
however, little is known about the monetary face value of books that students are expected to buy, beyond studies based on self-reported data. students themselves in the us have attempted to at least open the debate in this area by highlighting book price disparities.9 nonetheless, they only report on a very small number of books, and the college board representing on-campus us textbook retailers has disputed their results for this reason, claiming that they have been selective in the book prices they have chosen. hence this study seeks to address the gap that exists in knowledge about the true cost of textbooks in higher education. this is in the context of a wider research project we are conducting on open textbooks in ireland.10 determining the cost of books is not straightforward, as books can be new, used, rental, or digital subscription. however, the cost of new books does set a baseline for other forms, particularly rental and used books. our aim here is hence to start with new books, by analyzing the costs of all the required and recommended textbooks of one higher education institution (hei) in ireland.

the overarching research question this study sought to address is: what is known about the currently assigned textbooks in an irish university? the sub-questions were:

• rq1: what is the extent of textbooks that are required reading?
• rq2: what are the retail costs of textbooks?
• rq3: are textbooks available in digital or e-book form?
• rq4: are textbooks available in the public domain?

the next section outlines our methodology and how we sought to find answers to these questions.

methods

in this section we describe our approach, the dataset generated, and the methods we used to analyze the data. we identified a suitable data source comprising the official course catalog of a hei in ireland with more than ten thousand students. in the course catalog, faculty give required and recommended textbook details for all courses. this information is freely accessible on the website of the hei; the course catalog is powered by a software system known as akari (http://www.akarisoftware.com/). akari is a proprietary software system used by several heis in and outside ireland to create and manage academic course catalogs. the course team gained access to a download of all books recorded in the database of the course catalog (figure 1).

figure 1. course catalog screenshot.

in this catalog, fields are provided for lecturers to input information for students about books, such as title, international standard book number (isbn), author, and publisher. following manual and automated data cleansing, 3,014 unique records of books were created. due to the large number of books, at this stage we sought a programmatic solution for finding out more information about these books.

we initially thought that isbns might prove the best way to accurately reconcile records of books. however, many isbns were incomplete or mistyped. moreover, many instructors simply did not enter an isbn. given the capacity for errors in the data—for instance, some lecturers simply entered "i will tell you in class" in the book title field—we required a tool that could handle fuzzy search queries, e.g., cases where a book title or author was misspelled. the tool we selected was the google books application programming interface (api).11 this api provides an interface to the google books database of circa thirty million books. the service, like the main google search engine, is forgiving of queries that are mistyped or misspelled. hence, we constructed a query based on a combination of author name, book title, and publisher. following experimentation, we determined that these three search terms together allowed us to find books with a high degree of accuracy whilst also accounting for possible spelling errors.
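the lookup itself requires very little code. the sketch below is an assumed python equivalent of the project's javascript middleware, calling the public google books volumes endpoint with its intitle, inauthor, and inpublisher keywords; the sample title and author come from the json example in figure 3, and the publisher value is invented for illustration.

# illustrative sketch: query the public google books api for one book,
# combining title, author, and publisher as the article describes.
# python stands in for the authors' javascript middleware.
import json
import urllib.parse
import urllib.request

def lookup(title, author, publisher):
    q = f'intitle:"{title}" inauthor:"{author}" inpublisher:"{publisher}"'
    url = ("https://www.googleapis.com/books/v1/volumes?q="
           + urllib.parse.quote(q) + "&maxResults=1")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

result = lookup("psychiatric and mental health nursing", "phil barker", "routledge")
for item in result.get("items", []):
    info = item.get("volumeInfo", {})
    sale = item.get("saleInfo", {})
    print(info.get("title"), sale.get("retailPrice", {}).get("amount"))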
figure 2. system design.

we then wrote a custom javascript middleware program deployed in the google cloud platform. this program parsed the file of book search queries, passed them to the google books api as search requests, and saved the results. the api returned results in javascript object notation (json) format. json is a modern web language for describing data. it is related to javascript and can be used to translate objects in the javascript programming language into textual strings. it is used as a replacement for xml, as it is arguably more human readable and is considerably less verbose. we then imported this json into a mongodb database to filter and clean the data, before finally exporting them to excel for statistical analysis. mongodb is a document store database that natively stores objects in the json format and allows for efficient querying of the data.

the google books api provides some key metadata on books aside from the usual author, publisher, isbn, edition, pages, etc., as it gives prices for selected books. google draws this information from its own e-book store, which contains over three million books, and a network of resellers who sell print and digital versions of the books. in addition to price, google books also contains information on accessible versions of books, digital/e-pub versions, pdf versions, and whether the book is in the public domain. we have published a release of this dataset and all of our code to the software repository github. we then used the zenodo platform to generate a digital object identifier (doi) for the code.12 one of the functions of the zenodo platform is to allow for code to be properly cited and referenced. we published our code in this way for others interested in replicating this work in other contexts. in the next section we will provide an analysis of the results of our queries.

results

after extracting and processing the data from the course catalog and google platforms, we obtained 3,030 unique course names, and in these courses we found 15,414 books listed.

required versus recommended reading

from the course catalog data, we found that 11,022 (71.5 percent) books were required readings and the remaining 4,392 (28.5 percent) were recommended. upon cleaning and removing duplicates and missing data, we identified 3,014 books that could be queried using the google books api. querying the api returned results for 2,940 books, i.e., it found 97 percent of the books; only seventy-four books could not be found. the google books api returns information in json format. figure 3 below shows an example of the json information returned for one book.
{ "volumeinfo" : { "title" : "psychiatric and mental health nursing", "authors" : [ "phil barker" ], "industryidentifiers" : [ { "type" : "isbn_13", "identifier" : "9781498759588" }, { "type" : "isbn_10", "identifier" : "1498759580" } ], "imagelinks" : { "smallthumbnail" : "http://books.google.com/books/content?id=btsocgaaqbaj&printsec=frontcover&img=1&zo om=5&edge=curl&source=gbs_api" } }, "saleinfo" : { "isebook" : true, "retailprice" : { "amount" : 62.39, "currencycode" : "usd" } }, "accessinfo" : { "publicdomain" : false, "pdf" : { "isavailable" : true } } } figure 3. sample of book information returned by google books api. digital formats and public domain license figure 4 shows the numbers of pdf (1,219) and e-book (1,016) versions of books reported to be available. eight hundred and fifty-four were available in both pdf and e-book format. from the determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 96 https://doi.org/10.6017/ital.v38i1.10738 total of 2,940 individual books listed their availability was as follows: figure 4. availability of 2,940 books in digital formats and public domain license. as per figure 4, only 0.18 percent (six) of the books had a version available in the public domain according to google books. cost results the google books api only returned prices for 596 (20 percent) of the books that we searched for. within that sample, the cost ranged from $0.99 to over $452, as illustrated in figure 5. the median price of a book was $40, and the mean price was $56.67. as there are on average 3.96 books per course, this implies an average cost to students of $224.41 per course taken. as students take an average of 8.05 courses per year, this further implies a cost per year of $1,806.50 per student if they were to buy new versions of all the books. 1,219 (39.73% ) 1,016 (34.56% ) 6 (0. 18%) 0 500 1000 1500 2000 2500 pdf ebook openpdf e-book public domain information technology and libraries | march 2019 97 figure 5. summary of book prices (n = 596). discussion and conclusion we have demonstrated that it is possible to programmatically search and determine the prices of large numbers of books. we used this information to attempt to estimate the full economic cost of books to students on average in an irish hei. we are still actively developing this tool and encourage others to use and even contribute to the code which we have published with the dataset. this proof of concept tool may allow stakeholders with an interest in book costs for students to quickly get real data on large numbers of books. ultimately, we hope that this will help highlight the costs of many textbooks. our findings also highlight relatively low levels of digital book availability. very few books were found to be in the public domain. a limitation of this research is that there are issues around the coverage of google books and its index policies or algorithms. in a literature review of research articles about google books in 2017, fagan pointed out that the coverage of google books is “hit and miss.”13 in 2017, google books included about thirty million books, though google did not release specific details on its database, as emphasized by fagan. 
discussion and conclusion
we have demonstrated that it is possible to programmatically search for and determine the prices of large numbers of books. we used this information to attempt to estimate the full economic cost of books to students, on average, in an irish hei. we are still actively developing this tool and encourage others to use and even contribute to the code, which we have published with the dataset. this proof-of-concept tool may allow stakeholders with an interest in book costs for students to quickly get real data on large numbers of books. ultimately, we hope that this will help highlight the costs of many textbooks. our findings also highlight relatively low levels of digital book availability; very few books were found to be in the public domain. a limitation of this research is that there are issues around the coverage of google books and its index policies or algorithms. in a 2017 literature review of research articles about google books, fagan pointed out that the coverage of google books is "hit and miss."13 in 2017, google books included about thirty million books, though google did not release specific details on its database, as emphasized by fagan. it is known that the content includes digitized collections from over forty libraries, and that us and english-language books are overrepresented.14 furthermore, google books only reports whether a book is in the public domain; it cannot tell us if a book has been made available through an open license such as creative commons. accepting such caveats, however, we have found the google books api to be a very useful tool for answering questions about large numbers of books in a systematic way and hope that our findings can help others. the prices that we derived in this study were for new books only. however, new book prices provide a baseline for all other prices, e.g., a used book or a loan book price will be relative to a new book price, and library budgets will need to take account of new book prices.15 further study is required to determine a more realistic figure for the cost of textbooks, and the next phase of our wider open textbook research project involves interviews and focus groups with students to better understand the lived reality of their relationship with textbooks.16
references
1 stephen l. baglione and kevin sullivan, "technology and textbooks: the future," american journal of distance education 30, no. 3 (aug. 2016): 145-55, https://doi.org/10.1080/08923647.2016.1186466.
2 etan senack and robert donoghue, "covering the cost: why we can no longer afford to ignore high textbook prices," report, the student pirgs (feb. 2016), www.studentpirgs.org/textbooks.
3 elaine allen and jeff seaman, "opening the textbook: educational resources in u.s. higher education, 2015-16," report, babson survey research group (july 2016), https://www.onlinelearningsurvey.com/reports/openingthetextbook2016.pdf.
4 william and flora hewlett foundation (2019), http://www.hewlett.org/programs/education-program/open-educational-resources.
5 2012 paris oer declaration, http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaration.htm.
6 mary burgess, "the bc open textbook project," in open: the philosophy and practices that are revolutionizing education and science, rajiv s. jhangiani and robert biswas-diener (eds.) (london: ubiquity press, 2017): 227–36.
7 nicole allen, "congress funds $5 million open textbook grant program in 2018 spending bill," sparc open (mar. 20, 2018), https://sparcopen.org/news/2018/open-textbooks-fy18/.
8 jennifer ma et al., "trends in college pricing," report, the college board (oct. 2017), https://trends.collegeboard.org/sites/default/files/2017-trends-in-college-pricing_0.pdf.
9 kaitlyn vitez, "open 101: an action plan for affordable textbooks," report, student pirgs (jan. 2018), https://studentpirgs.org/campaigns/sp/make-textbooks-affordable.
10 mark brown, eamon costello, and mairéad nic giolla mhichíl, "from books to moocs and back again: an irish case study of open digital textbooks," in exploring the micro, meso and macro: proceedings of the european distance and e-learning network 2018 annual conference, genova, 17-20 june, 2018 (budapest: the european distance and e-learning network): 206-14.
11 google books api (2018), https://developers.google.com/books/docs/v1/reference/volumes.
12 eamon costello and richard bolger, "textbooks authors, publishers, formats and costs in higher education," bmc research notes 12, no. 1 (jan. 2019): 12-56, https://doi.org/10.1186/s13104-019-4099-1.
13 jody condit fagan, "an evidence-based review of academic web search engines, 2014-2016: implications for librarians' practice and research agenda," information technology and libraries 36, no. 2 (mar. 2017): 7-47, https://doi.org/10.6017/ital.v36i2.9718.
14 ibid.
15 anne christie, john h. pollitz, and cheryl middleton, "student strategies for coping with textbook costs and the role of library course reserves," portal: libraries and the academy 9, no. 4 (oct. 2009): 491-510, http://digital.library.wisc.edu/1793/38662.
16 eamon costello et al., "textbook costs and accessibility: could open textbooks play a role?" proceedings of the 17th european conference on e-learning (ecel), vol. 17 (athens, greece: 2018): 99-106.
assessing the effectiveness of open access finding tools
teresa auch schultz, elena azadbakht, jonathan bull, rosalind bucy, and jeremy floyd
information technology and libraries | september 2019
teresa auch schultz (teresas@unr.edu) is social sciences librarian, university of nevada, reno. elena azadbakht (eazadbakht@unr.edu) is health sciences librarian, university of nevada, reno. jonathan bull (jon.bull@valpo.edu) is scholarly communications librarian, valparaiso university. rosalind bucy (rbucy@unr.edu) is research & instruction librarian, university of nevada, reno. jeremy floyd (jfloyd@unr.edu) is metadata librarian, university of nevada, reno.
abstract
the open access (oa) movement seeks to ensure that scholarly knowledge is available to anyone with internet access, but being available for free online is of little use if people cannot find open versions. a handful of tools have become available in recent years to help address this problem by searching for an open version of a document whenever a user hits a paywall. this project set out to study how effective four of these tools are when compared to each other and to google scholar, which has long been a source of finding oa versions. to do this, the project used open access button, unpaywall, lazy scholar, and kopernio to search for open versions of 1,000 articles. results show none of the tools found as many successful hits as google scholar, but two of the tools did register unique successful hits, indicating a benefit to incorporating them in searches for oa versions. some of the tools also include additional features that can further benefit users in their search for accessible scholarly knowledge.
introduction
the goal of open access (oa) is to ensure as many people as possible can read, use, and benefit from scholarly research without having to worry about paying to read and, in many cases, restrictions on reusing the works. however, oa scholarship helps few people if they cannot find it. this is especially problematic for green oa works, which are those that have been made open by being deposited in an open online repository even if they were published in a subscription-based journal. opendoar reports more than 3,800 such repositories.1 as users are unlikely to search each individual repository, an efficient search method is needed to find the oa items spread across so many locations.
in recent years, several browser extensions have been released that allow a user to search for an open version of an article while on a webpage for that article. the tools include:
• lazy scholar, a browser extension that searches google scholar, pubmed, europepmc, doai.io, and dissem.in. it has extensions for both the chrome and firefox browsers.2
• open access button, which uses both a website and a chrome extension to search for oa versions.3
• unpaywall, which also acts through a chrome extension to search for open articles via the digital object identifier.4
• kopernio, a browser extension that searches subject and institutional repositories and is owned by clarivate analytics. kopernio has extensions for chrome, firefox, and opera.5
some of the tools offer other services, such as open access button's ability to help the user email the author of an article if no open version is available, as well as integration with libraries' interlibrary loan workflows. kopernio and lazy scholar offer to sync with a user's institutional library to see if an article is available through the library's collection.6 although other similar extensions might also exist, this article is focused on the four mentioned above based on the authors' knowledge of available oa finding tools at the time of the project.
literature review
as noted above, scholars have indicated for several years a need for reliable and user-friendly methods, systems, or tools that can help researchers find oa materials. bosman et al. advanced the idea of a scholarly commons—a set of principles, practices, and resources to enable research openness—that depends upon clear linkages between digital research objects.7 bulock notes that oa has "complicated" retrieval in that oa versions are often housed in various locations across the web, including institutional repositories (irs), preprint servers, and personal websites.8 there is no perfect search option or tool, although some have tried creating solutions, such as the open jericho project from wayne state university, which is seeking to create an aggregator to search institutional repositories and eventually other sources as well.9 however, this lack of a central search tool can lead to confusion among researchers.10 nicholas and colleagues found that their sample of early career scholars drawn from several countries relied heavily on google and google scholar to find articles that interested them.11 many also turn to researchgate and other social media platforms and risk running afoul of copyright. the results of ithaka s+r's 2015 survey of faculty in the united states reflect these findings to a certain extent, as variations exist between researchers in different disciplines.12 a majority of the respondents also indicated an affinity for freely accessible materials. as more researchers become aware of and gravitate toward oa options, the efficacy of various discovery tools, such as the browser extensions evaluated in this study, will become even more pertinent.
previous studies on the findability of oa scholarship have focused primarily on google and google scholar.13 a few have assessed tools such as oaister, opendoar, and pubmed central.14 norris, oppenheim, and rowland sought a selection of articles using google, google scholar, oaister, and opendoar.15 while oaister and opendoar found just 14 percent of the articles' open versions, google and google scholar combined managed to locate 86 percent. jamali and nabavi assessed google scholar's ability to retrieve the full text of scholarly publications and documented the major sources of the full-text versions (publisher websites, institutional repositories, researchgate, etc.).16 google scholar was able to locate full-text versions of more than half (57.3 percent) of the items included in the study. most recently, martín-martín et al. likewise used google scholar to gauge the availability of oa documents across different disciplines.17 they found that roughly 54.6 percent of the scholarly content for which they searched was freely available, although only 23.1 percent of their sample were oa by virtue of the publisher. as of yet, no known studies have systematically evaluated the growing selection of oa finding tools' efficiency and effectiveness at retrieving oa versions of articles. however, several scholars and journalists have reviewed these new tools, especially the more established open access button and unpaywall.18 these reviews were mostly positive, even as some acknowledged that the tools are not a wholesale solution for locating oa publications. despite pointing out these tools' limitations, reviewers voiced their hope that the oa finding tools could help disrupt the traditional scholarly publishing industry.19 at least one study has used the open access button to determine the green oa availability of journal articles. emery used the tool as the first step to identify oa article versions and then searched individual institutional repositories, followed by google scholar, as the final steps.20 emery found that 22 percent of the study sample was available as green oa but did not say what portion of that was found by the open access button. emery did note that the open access button returned 17 false positives (six in which the tool took the user to the wrong article or other content, and 11 in which it took the user to a citation of the article with no full text available). she also found at least 38 cases of false-negative returns from the open access button, or articles that were openly available that the tool failed to find. the study did not count open versions found on researchgate or academia.edu.
methodology
oa finding tools
this study compared the chrome browser extensions for google scholar and four oa finding tools: lazy scholar, unpaywall, open access button, and kopernio. each extension was used while in the chrome browser to search for open versions of the selected articles, and the success of each extension in finding any free, full version was recorded. the authors did not track whether an article was licensed for reuse. for the four oa finding tools, the occurrences of false positives (e.g., the retrieval of an error page, a paywalled version, or the wrong article entirely) were also tracked. false positives were not tracked for google scholar, which does not purport to find only open versions of articles. data collection occurred over a six-week period in october and november 2018.
the authors used web of science to identify the test articles (n = 1,000), with the aim of selecting articles that would give the tools the best chance of finding a high number of open versions. articles selected were published in 2015 and 2016. these years were selected in order to avoid embargoes that might have prevented articles from being made open through deposit. the articles were selected from two disciplines, applied physics and oncology, both of which have a large share in web of science and each of which comes from a broader discipline with a strong oa culture.21 each comparison began with searching the google scholar extension by article doi, or by title if a doi was not available. all versions retrieved by google scholar were examined until an open version was located or until the retrieved versions were exhausted. the remaining oa tools were then tested from the webpage for the article record on the journal's website (if available). if no journal page was available, the article pdf page was tested. all data were recorded in a shared google sheet according to a data dictionary. searches for open versions of paywalled articles were performed away from the authors' universities to ensure the institutions' subscriptions to various journals did not impact the results. authors were limited in the number of articles they could search each day, as some tools blocked continued use, presumably over concerns of illegitimate web activity, after as few as 15 searches.
study limitations
this methodology might have missed open versions of articles, even using these five search tools. although studies have found google scholar to be one of the most effective ways of searching for open versions, way has shown that it is not perfect.22 therefore, it is possible that this study undercounted the number of oa articles. the study tested the ability of oa finding tools to locate open articles from a journal's main article page, not other possible webpages (e.g., the google scholar results page). this design may have limited the effectiveness of some tools, such as kopernio, which appear to work well with some webpages but not others.
results
overall, the tools found open versions for just under half of the study sample (490), whereas they found no open versions for 510 articles. although lazy scholar, unpaywall, open access button, and kopernio all found open versions, google scholar returned the most, with 462 articles (94 percent of all articles with at least one open version). open access button, lazy scholar, and unpaywall all found a majority of the open articles (62 percent, 73 percent, and 67 percent, respectively); however, kopernio found open versions for just 34 percent of the articles (see figure 1).
figure 1. number of open versions found by each tool.
it was most common for three or more of the tools to find an open version for an article, with just 48 found by two tools and 98 found by only one tool (see figure 2).
figure 2. number of articles where x number of oa finding tools found an open version.
when looking at articles where only one tool returned an open version, google scholar had the highest results (84). open access button (4) and lazy scholar (10) also returned unique hits, but unpaywall and kopernio did not.
open access button returned the most false positives with 46, or nearly 5 percent of all 1,000 articles. lazy scholar returned 31 false positives (3 percent), unpaywall returned 14 (1 percent), and kopernio returned 13 (1 percent).
discussion
the results for the oa search tools show that while all four options met with some success, none of them performed as well as google scholar. three of the tools—lazy scholar, open access button, and unpaywall—did find at least half of the open versions that google scholar did. it is important to note that open access button, which found the second fewest open versions, does not search researchgate and academia.edu because of legal concerns over article versions that are likely infringing copyright.23 this could have affected open access button's performance. likewise, kopernio's lower percentage of finding oa resources might relate to concerns over article versions as well. when creating an account on kopernio, the user is asked to affiliate themselves with an institution so that the tool can search existing library subscriptions at that institution. for this study, the authors did not affiliate with their home institutions when setting up kopernio, to get a better idea of which content was open as opposed to content being accessible because the tool connects to a library's subscription collection. if the authors were to identify with an institution, the number of accessible articles would likely increase, but this access would not be a true representation of what open content is discoverable. in addition, some tools might work better with certain publishers than others. for instance, kopernio did not appear to work with spandidos publications, a leading biomedical science publisher that publishes much of its content as gold oa, meaning the entire journal is published as oa. kopernio found just one open version of a spandidos article, compared to 153 by google scholar. this could be an unintentional malfunction either with spandidos or kopernio, which, if fixed, could greatly increase the efficacy of this finding tool. however, open access button, lazy scholar, unpaywall, and google were able to find oa publications from spandidos at similar rates (135, 138, and 139) with no false positives. while none of the tools performed as well as google scholar, some of the tools were easier to use than google scholar. google scholar does not automatically show an open version first; instead, users often have to first select the "all x versions" option at the bottom of each record and then open each version until they find an open one. lazy scholar and unpaywall appear (for the most part) automatically, meaning users can see right away if an open version is available and then click a button once to be taken to that version. although open access button and kopernio do not show automatically whether they have found an open version, users need to click a button on their toolbar only once to activate each tool and see if it was able to find an open version. open access button also provides the extra benefit of making it easy for users to email authors to make their works open if an open version is not already available. relying on lazy scholar, unpaywall, or open access button first causes users no harm, and they can always rely on google scholar as a backup.
whether all four tools are needed is questionable. for instance, a few of the authors found kopernio difficult to work with, as it seemed to be incompatible with at least one publisher's website and it introduced extra steps in downloading a pdf file. the fact that it also returned by far the fewest open versions—just 36 percent of the ones google scholar found, and no unique hits—does not argue well for users to include it in their oa finding toolbox. also, while lazy scholar, unpaywall, and open access button all performed better on their own, the authors wonder what improvements could be created by combining the resources of the individual tools.
conclusion
the growth of oa finding tools is encouraging insofar as it helps make oa works more discoverable. although the study showed that google scholar uncovered more articles than any of the other tools, the utility of at least two of the tools—lazy scholar and open access button—can still be seen in that both found articles not discovered by the other tools, including google scholar. indeed, using the tools in conjunction with one another appears to be the best method. and although open access button found the second fewest articles, the tool's effort to integrate with interlibrary loan and discovery workflows, as well as its concern about legal issues, are all promising for its future. likewise, kopernio might be a better tool for those interested in combining access to a library collection—which likely has a large number of final, publisher versions of scholarship—with their search for openly available scholarship. future studies can include newer oa finding tools that have entered the market, as well as evaluate the user experience of the tools. another study could also look at how well open access button's author email feature works. also, as open access button and unpaywall continue to move into new areas, such as interlibrary loan support, research could explore whether these are more effective ways of connecting users to oa material, as well as measure users' understanding of the oa versions they find. overall, the emergence of oa finding tools offers much potential for increasing the visibility of oa versions of scholarship, although no tool is perfect. however, if scholars wish to support oa through their research practices or find themselves unable to purchase or legally acquire the publisher's version, each of these tools can be a valuable addition to their work.
data statement
the data used for this study has been shared publicly in the zenodo database under a cc-by 4.0 license at https://doi.org/10.5281/zenodo.2602200.
endnotes
1 jisc, "browse by country and region," accessed february 15, 2019, http://v2.sherpa.ac.uk/view/repository_by_country/countries_by_region.html.
2 colby vorland, "extension," accessed march 14, 2019, http://www.lazyscholar.org/; colby vorland, "data sources," lazy scholar (blog), accessed march 14, 2019, http://www.lazyscholar.org/data-sources/.
3 "avoid paywalls, request research," open access button, accessed march 14, 2019, https://openaccessbutton.org/.
4 unpaywall, "browser extension," accessed march 14, 2019, https://unpaywall.org/products/extension.
5 kopernio, "faqs," accessed march 14, 2019, https://kopernio.com/faq.
6 colby vorland, "features," lazy scholar (blog), accessed march 14, 2019, http://www.lazyscholar.org/category/features/.
7 jeroen bosman et al., "the scholarly commons—principles and practices to guide research communication," open science framework, september 15, 2017, https://doi.org/10.17605/osf.io/6c2xt.
8 chris bulock, "delivering open," serials review 43, no. 3–4 (october 2, 2017): 268–70, https://doi.org/10.1080/00987913.2017.1385128.
9 elliot polak, email message to author, june 4, 2019.
10 bulock, "delivering open."
11 david nicholas et al., "where and how early career researchers find scholarly information," learned publishing 30, no. 1 (january 1, 2017): 19–29, https://doi.org/10.1002/leap.1087.
12 christine wolff, alisa b. rod, and roger c. schonfeld, "ithaka s+r us faculty survey 2015," 2015, 83, https://sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/.
13 mamiko matsubayashi et al., "status of open access in the biomedical field in 2005," journal of the medical library association 97, no. 1 (january 2009): 4–11, https://doi.org/10.3163/1536-5050.97.1.002; michael norris, charles oppenheim, and fytton rowland, "the citation advantage of open-access articles," journal of the american society for information science and technology 59, no. 12 (october 1, 2008): 1963–72, https://doi.org/10.1002/asi.20898; doug way, "the open access availability of library and information science literature," college & research libraries 71, no. 4 (2010): 302–09; charles lyons and h. austin booth, "an overview of open access in the fields of business and management," journal of business & finance librarianship 16, no. 2 (march 31, 2011): 108–24, https://doi.org/10.1080/08963568.2011.554786; hamid r. jamali and majid nabavi, "open access and sources of full-text articles in google scholar in different subject fields," scientometrics 105, no. 3 (december 1, 2015): 1635–51, https://doi.org/10.1007/s11192-015-1642-2; alberto martín-martín et al., "evidence of open access of scientific publications in google scholar: a large-scale analysis," journal of informetrics 12, no. 3 (august 1, 2018): 819–41, https://doi.org/10.1016/j.joi.2018.06.012.
14 norris, oppenheim, and rowland, "the citation advantage of open-access articles"; michael norris, fytton rowland, and charles oppenheim, "finding open access articles using google, google scholar, oaister and opendoar," online information review 32, no. 6 (november 21, 2008): 709–15, https://doi.org/10.1108/14684520810923881; maria-francisca abad‐garcía, aurora gonzález‐teruel, and javier gonzález‐llinares, "effectiveness of openaire, base, recolecta, and google scholar at finding spanish articles in repositories," journal of the association for information science and technology 69, no. 4 (april 1, 2018): 619–22, https://doi.org/10.1002/asi.23975.
15 norris, rowland, and oppenheim, "finding open access articles using google, google scholar, oaister and opendoar."
16 jamali and nabavi, "open access and sources of full-text articles in google scholar in different subject fields."
17 martín-martín et al., "evidence of open access of scientific publications in google scholar."
18 stephen curry, "push button for open access," the guardian, november 18, 2013, sec. science, https://www.theguardian.com/science/2013/nov/18/open-access-button-push; bonnie swoger, "the open access button: discovering when and where researchers hit paywalls," scientific american blog network, accessed may 30, 2017, https://blogs.scientificamerican.com/information-culture/the-open-access-button-discovering-when-and-where-researchers-hit-paywalls/; lindsay mckenzie, "how a browser extension could shake up academic publishing," chronicle of higher education 68, no. 33 (april 21, 2017): a29; joyce valenza, "unpaywall frees scholarly content," school library journal 63, no. 5 (may 2017): 11; barbara quint, "must buy? maybe not," information today 34, no. 5 (june 2017): 17; michaela d. willi hooper, "product review: unpaywall [chrome & firefox browser extension]," journal of librarianship & scholarly communication 5 (january 2017): 1–3, https://doi.org/10.7710/2162-3309.2190; terry ballard, "two new services aim to improve access to scholarly pdfs," information today 34, no. 9 (november 2017): cover-29; diana kwon, "a growing open access toolbox," the scientist, accessed december 11, 2017, https://www.the-scientist.com/?articles.view/articleno/51048/title/a-growing-open-access-toolbox/; kent anderson, "the new plugins — what goals are the access solutions pursuing?," the scholarly kitchen, august 23, 2018, https://scholarlykitchen.sspnet.org/2018/08/23/new-plugins-kopernio-unpaywall-pursuing/.
19 curry, "push button for open access"; swoger, "the open access button"; mckenzie, "how a browser extension could shake up academic publishing"; kwon, "a growing open access toolbox."
20 jill emery, "how green is our valley?: five-year study of selected lis journals from taylor & francis for green deposit of articles," insights 31 (june 20, 2018): 23, https://doi.org/10.1629/uksg.406.
21 anna severin et al., "discipline-specific open access publishing practices and barriers to change: an evidence-based review," f1000research 7 (december 11, 2018): 1925, https://doi.org/10.12688/f1000research.17328.1.
22 way, “the open access availability of library and information science literature.” 23 open access button, “open access button library service faqs,” google docs, accessed february 19, 2019, https://docs.google.com/document/d/1_hwkryg7qj7ff05cx8kw40ml7exwrz6ks5fb10gegg/edit?usp=embed_facebook. https://doi.org/10.7710/2162-3309.2190 https://www.the-scientist.com/?articles.view/articleno/51048/title/a-growing-open-access-toolbox/ https://www.the-scientist.com/?articles.view/articleno/51048/title/a-growing-open-access-toolbox/ https://scholarlykitchen.sspnet.org/2018/08/23/new-plugins-kopernio-unpaywall-pursuing/ https://scholarlykitchen.sspnet.org/2018/08/23/new-plugins-kopernio-unpaywall-pursuing/ https://doi.org/10.1629/uksg.406 https://doi.org/10.12688/f1000research.17328.1 https://docs.google.com/document/d/1_hwkryg7qj7ff05-cx8kw40ml7exwrz6ks5fb10gegg/edit?usp=embed_facebook https://docs.google.com/document/d/1_hwkryg7qj7ff05-cx8kw40ml7exwrz6ks5fb10gegg/edit?usp=embed_facebook abstract introduction literature review methodology oa finding tools study limitations results discussion conclusion data statement endnotes letter from the editor (september 2019) letter from the editor kenneth j. varnum information technology and libraries | september 2019 1 https://doi.org/10.6017/ital.v38i3.11631 editorial board changes thanks to the dozens of lita members who applied to join the board this spring. the large number of interested volunteers made the selection process challenging. i’m pleased to welcome six new members to the ital editorial board for two-year terms (2019-2021): • lori ayre (independent technology consultant) • jon goddard (north shore public library) • soo-yeon hwang (sam houston state university) • holli kubly (syracuse university) • brady lund (emporia state university) • paul swanson (minitex) in this issue welcome to lita’s new president, emily morton-owens. in her inaugural president’s message, “sustaining lita,” morton-owens discusses the many ways lita strives to provide a sustainable organization for its members. we also have the next edition of our “public libraries leading the way column. this quarter’s essay is by thomas lamanna, “on educating patrons on privacy and maximizing library resources.” joining those essays are six excellent peer-reviewed articles: • “library-authored web content and the need for content strategy,” by courtney mcdonald and heidi burkhardt • “use of language-learning apps as a tool for foreign language acquisition by academic libraries employees,” by kathia ibacache • “is creative commons a panacea for managing digital humanities intellectual property rights?,” by yi ding • “am i on the library website?,” by suzanna conrad and christy stevens • “assessing the effectiveness of open access finding tools,” by teresa auch schultz, elena azadbakht, jonathan bull, rosalind bucy, and jeremy floyd • “creating and deploying usb port covers at hudson county community college,” by lotta sanchez and john delooper call for pllw contributions if you work at a public library, you’re invited to submit a proposal for a column in our “public libraries leading the way” series for 2020. our series has gotten off to a strong start with essays by thomas finley, jeffrey davis, and thomas lamanna. if you would like to add your voice, please submit a proposal through this google form. kenneth j. 
kenneth j. varnum, editor
varnum@umich.edu
september 2019
academic libraries on social media: finding the students and the information they want
heather howard, sarah huber, lisa carter, and elizabeth moore
information technology and libraries | march 2018
heather howard (howar198@purdue.edu) is assistant professor of library science; sarah huber (huber47@purdue.edu) is assistant professor of library science; lisa carter (carte241@purdue.edu) is library assistant; and elizabeth moore (moore658@purdue.edu) is library assistant and student supervisor at purdue university.
librarians from purdue university wanted to determine which social media platforms students use, which platforms they would like the library to use, and what content they would like to see from the library on each of these platforms. we conducted a survey at four of the nine campus libraries to determine student social media habits and preferences. results show that students currently use facebook, youtube, and snapchat more than other social media types; however, students responded that they would like to see the library on facebook, instagram, and twitter. students wanted nearly all types of content from the libraries on facebook, twitter, and instagram, but they did not want to receive business news or content related to library resources on snapchat. youtube was seen as a resource for library service information. we intend to use this information to develop improved communication channels, a clear social media presence, and a cohesive message from all campus libraries.
introduction
in his book tell everyone: why we share and why it matters, alfred hermida states, "people are not hooked on youtube, twitter or facebook but on each other. tools and services come and go; what is constant is our human urge to share."1 libraries are places of connection, where people connect with information, technologies, ideas, and each other. as such, libraries look for ways to increase this connection through communication. social media is a key component of how students communicate with classmates, families, friends, and other external entities. it is essential for libraries to communicate with students regarding services, collections, events, library logistics, and more. purdue university is a large, land-grant university located in west lafayette, indiana, with an enrollment of more than forty thousand. the purdue libraries consist of nine libraries, which have been presented collectively on the social media platforms facebook and twitter since 2009 and youtube since 2012. going forward, the purdue libraries want to ensure they establish a cohesive message and brand, communicated to students on platforms they use and will engage with.
the purpose of this study was to determine which social media platforms the students are currently using, which platforms they would like the library to use, and what content they would like to see from the libraries on each of these platforms.
literature review
academic libraries and social media
academic libraries have been slow to accept social media as a venue either for promoting their services or for academic purposes. a 2007 study of 126 academic librarians found that only 12 percent of those surveyed "identified academic potential or possible benefits" of facebook, while 54 percent saw absolutely no value in social media.2 however, the mission of academic libraries has shifted in the last decade from being a repository of knowledge to being a conduit for information literacy; new roles include being a catalyst for on-campus collaboration and a facilitator for scholarly publication within contemporary academic librarianship.3 academic librarians have responded to this change, with many now believing that "social media, which empowers libraries to connect with and engage its diverse stakeholder groups, has a vital role to play in moving academic libraries beyond their traditional borders and helping them engage new stakeholder groups."4
student perceptions about academic libraries on social media
as the use of social media has grown with college-aged students, so has an increasing acceptance of academic libraries using social media to communicate. a pew research center report from 2005 showed just 7 percent of eighteen- to twenty-nine-year-olds using social media; by 2016, 86 percent were using it.5 in 2007 the oclc asked 511 college students from six different countries to share their thoughts on libraries using social networking sites. this survey revealed that "most college students would be unlikely to participate in social networking services offered by a library," with just 13 percent of students believing libraries have a place on social media.6 however, just two years later (in 2009), a shift was seen: students were open to connecting with academic libraries, as observed in a survey of 366 freshmen at valparaiso university. when asked their thoughts on the library sending announcements and communications to them via facebook or myspace (a social media powerhouse at the time), 42.6 percent answered that they would be "more receptive to information received in this way than any other response." a smaller group, 12.3 percent, responded more negatively to this approach. students showed concern for their privacy and the level of professionalism, as a quote from one student illustrates: "facebook is to stay in touch with friends or teachers from the past. email is for announcements. stick with that!!!"7 as students report becoming more open to academic libraries on social media, the question of whether they will engage through social media emerges. a recent study from western oregon university's hammersley library asked this question with promising results. forty percent of students said they were either "very likely" or "somewhat likely" to follow the library on instagram and twitter, as opposed to wanting communications sent to them directly through social media (for example, a facebook message).
pinterest followed, with 33 percent of students saying they were either "very likely" or "somewhat likely" to follow the library using this platform.8 throughout the literature, students have shown an interest in information about the libraries that is useful to them. in another survey, given to undergraduate students from three information technology classes at florida state university, one question examined the perceived importance of different library social media postings to students. the report showed students considered postings related to operations updates, study support, and events as the most important.9 in the hammersley study noted above, 78 percent and 87 percent of respondents said they were either "very interested" or "somewhat interested," respectively, in every category relating to library resources presented in the survey, but "interesting/fun websites and memes" received the least interest from participants.10 the literature shows an increase in students being receptive to academic libraries on social media. results vary campus to campus, and students are leery of libraries reaching out to them via social media, but they have an increasingly positive view of posted content that helps them use the library.
research questions
the aim of this project was to investigate the social media behaviors of purdue university students as they relate to the libraries, and to develop evidence-based practices for managing the library's social media accounts. the project focused on three research questions:
1. what social media platforms are students using?
2. what social media platforms do students want the library to use?
3. what kind of content do students want from the library on each of these platforms?
methods
we created the survey using the web-based qualtrics survey software. it was distributed in electronic form only, and it was promoted to potential respondents via table tents in the libraries, bookmarks at the library desk, facebook posts, and in-classroom promotion. potential respondents were advised that the survey was anonymous and voluntary. the survey consisted of closed questions, though many questions contained an open-ended field for answers that did not fall into the provided choices. inspiration for some of the options in our survey questions came from the hammersley library study, as we felt they did a good job capturing information about the social media usage of their patrons.11 our survey asked what social media platforms students use, what they use them for, how often they visit the library, how likely they are to follow the library on social media, which platforms they want the library to have, and what content they would like from the library on each of those platforms. the social media platforms included were facebook, flickr, g+, instagram, linkedin, pinterest, qzone, renren, snapchat, tumblr, twitter, youtube, and yik yak.12 there were also open-ended spaces where participants could write in additional platforms. the survey originally ran for three weeks in only the business library early in the spring 2017 semester, as its intended purpose was to inform how the business library would manage social media. after that survey was completed, we decided to replicate the survey in three additional libraries (humanities, social science, and education; engineering; and the main undergraduate libraries). this was done to expand the dataset and reach additional students in a variety of disciplines.
these libraries were chosen because they were the libraries in which the authors work, with the hope of expanding to additional libraries in the future. the second survey also lasted for three weeks, starting in mid-april of the spring 2017 semester. as a participation incentive, students who completed the initial survey and the second survey had an opportunity to enter a drawing for a $25 visa gift card. the survey was advertised across four different campus libraries and promoted in several ways to reach different populations. though the results are not from a random sample of the student population, they are broad enough that we intend to apply them to our entire student population.
results
survey
the survey was completed by 128 students. an additional 13 students began the survey but did not complete it; we removed their results from the analysis. the breakdown of respondents was 10 percent freshmen (n = 13), 22 percent sophomore (n = 28), 27 percent junior (n = 35), 20 percent senior (n = 25), and 21 percent graduate or professional (n = 27).
library usage
the students were asked how frequently they visit the library to determine if the survey was reaching a population of regular or infrequent library visitors. the results showed that the students who completed the survey were primarily frequent library users, with 93 percent (n = 119) visiting once a week or more.
social media platforms
the students were asked to identify which social media platforms they used and how frequently they used them. the most popular social media platforms were determined by combining the number of students who said they used them daily or weekly. the top five were facebook (n = 114, 88 percent), youtube (n = 102, 79 percent), snapchat (n = 90, 70 percent), instagram (n = 85, 66 percent), and twitter (n = 41, 32 percent). full results are in table 1.
table 1. usage frequency by platform (count and percent of respondents; columns: daily | weekly | monthly | < once per month | never)
facebook: 94 (72.87%) | 20 (15.50%) | 5 (3.88%) | 5 (3.88%) | 4 (3.10%)
flickr: 0 (0.00%) | 1 (0.78%) | 2 (1.55%) | 8 (6.20%) | 117 (90.70%)
g+: 3 (2.33%) | 6 (4.65%) | 4 (3.10%) | 16 (12.40%) | 99 (76.74%)
instagram: 68 (52.71%) | 17 (13.18%) | 5 (3.88%) | 11 (8.53%) | 27 (20.93%)
linkedin: 9 (6.98%) | 29 (22.48%) | 22 (17.05%) | 22 (17.05%) | 46 (35.66%)
pinterest: 12 (9.30%) | 12 (9.30%) | 16 (12.40%) | 19 (14.73%) | 69 (53.49%)
qzone: 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 4 (3.10%) | 124 (96.12%)
renren: 0 (0.00%) | 0 (0.00%) | 1 (0.78%) | 3 (2.33%) | 124 (96.12%)
snapchat: 84 (65.12%) | 6 (4.65%) | 6 (4.65%) | 7 (5.43%) | 25 (19.38%)
tumblr: 7 (5.43%) | 2 (1.55%) | 7 (5.43%) | 11 (8.53%) | 101 (78.29%)
twitter: 28 (21.71%) | 13 (10.08%) | 12 (9.30%) | 9 (6.98%) | 66 (51.16%)
youtube: 58 (44.96%) | 44 (34.11%) | 15 (11.63%) | 4 (3.10%) | 7 (5.43%)
yik yak: 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 11 (8.53%) | 117 (90.70%)
other: email: 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: groupme: 3 (2.33%) | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: reddit: 2 (1.55%) | 2 (1.55%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: skype: 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 1 (0.78%) | 0 (0.00%)
other: vine: 0 (0.00%) | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: wechat: 3 (2.33%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: weibo: 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: whatsapp: 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
social media activity
next, students were asked how much time they spend on social media doing the following activities: watching videos, keeping in touch with friends/family, sharing photos, keeping in touch with classmates/professors, learning about campus events, doing research, getting news, or following public figures. table 2 shows that students overwhelmingly use social media daily or weekly to watch videos (94 percent, n = 120), keep in touch with family/friends (93 percent, n = 119), and to get news (81 percent, n = 104). the least popular activities, those that students do less than once per month or never, were research (47 percent, n = 60) and following public figures (34 percent, n = 45).
table 2. social media activity (count and percent of respondents; columns: daily | weekly | monthly | < once per month | never)
watch videos: 85 (66.41%) | 35 (27.34%) | 1 (0.78%) | 4 (3.13%) | 3 (2.34%)
keep in touch with friends/family: 89 (69.53%) | 30 (23.44%) | 6 (4.69%) | 2 (1.56%) | 1 (0.78%)
share photos: 32 (25%) | 33 (25.78%) | 38 (29.69%) | 20 (15.63%) | 5 (3.91%)
keep in touch with classmates/professors: 34 (26.56%) | 47 (36.72%) | 21 (16.41%) | 19 (14.84%) | 7 (5.47%)
learn about campus events: 24 (18.75%) | 53 (41.41%) | 29 (22.66%) | 18 (14.06%) | 4 (3.13%)
do research: 24 (18.75%) | 26 (20.31%) | 18 (14.06%) | 23 (17.97%) | 37 (28.91%)
get news: 66 (51.56%) | 38 (29.69%) | 7 (5.47%) | 9 (7.03%) | 8 (6.25%)
follow public figures: 34 (26.56%) | 30 (23.44%) | 20 (15.63%) | 19 (14.84%) | 24 (18.75%)
other: 2 (1.56%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%)
social media and the library
the students were asked how likely they are to follow the libraries on social media. the response to this was primarily positive, with 57 percent of respondents saying they are either extremely likely or somewhat likely to follow the library. one response for this question was inexplicably null, so for this question n = 127. figure 1 contains the full results.
figure 1. library social media follows ("how likely are you to follow the library on social media?": extremely likely 12; somewhat likely 66; neither likely nor unlikely 23; somewhat unlikely 16; extremely unlikely 10).
the students were asked which social media platforms they thought the library should be on. five rose to the top of the results: facebook (82 percent, n = 105), instagram (55 percent, n = 70), twitter (40 percent, n = 51), snapchat (34 percent, n = 44), and youtube (29 percent, n = 37). full results can be seen in figure 2.
figure 2. library social media presence ("what social media platform should the library be on?": facebook 105; g+ 7; instagram 70; linkedin 23; pinterest 10; qzone 1; renren 1; snapchat 44; tumblr 5; twitter 51; youtube 37).
after a student selected a platform they wanted the library to be on, logic built into the survey directed them to an additional question that asked what content they would like to see from the library on that platform. content types included library logistics (hours, events, etc.), research techniques and tips, how to use library resources and services, library resource info (database instruction/tips, journal availability, etc.), business news, library news (e.g., if the library wins an award), campus-wide info/events, and interesting/fun websites and memes. for facebook, students widely selected all types of content, with the most selections made for library logistics (n = 73) and the fewest made for business news (n = 33). for instagram, students wanted all content except business news (n = 18). snapchat was similar, except along with business news (n = 8), students also were not interested in receiving content related to library resource information (n = 9). twitter was similar to facebook in that all content was widely selected. youtube had a focus on library services, with the three most-selected content options being research techniques and tips (n = 20), how to use library resources and services (n = 19), and library resource info (n = 16). table 3 contains the full results.
table 3. library social media content by platform ("what type of content would you like to see from the library?"; columns: facebook (n = 105) | g+ (n = 7) | instagram (n = 70) | linkedin (n = 23) | pinterest (n = 10) | snapchat (n = 44) | tumblr (n = 5) | twitter (n = 51) | youtube (n = 37))
library logistics (hours, events, etc.): 73 (69.52%) | 2 (28.57%) | 34 (48.57%) | 7 (30.43%) | 4 (40%) | 23 (52.27%) | 2 (40%) | 32 (62.75%) | 8 (21.62%)
research techniques & tips: 52 (49.52%) | 3 (42.85%) | 28 (40%) | 13 (56.53%) | 7 (70%) | 19 (43.18%) | 3 (60%) | 27 (52.94%) | 20 (54.05%)
how to use library resources & services: 53 (50.48%) | 3 (42.85%) | 26 (37.14%) | 8 (34.78%) | 7 (70%) | 16 (36.36%) | 3 (60%) | 25 (49.02%) | 19 (51.35%)
library resource info (database instruction/tips, journal availability, etc.): 53 (50.48%) | 3 (42.85%) | 22 (31.42%) | 8 (34.78%) | 6 (60%) | 9 (20.45%) | 2 (40%) | 23 (45.10%) | 16 (43.24%)
business news: 33 (31.43%) | 2 (28.57%) | 18 (25.71%) | 13 (56.52%) | 3 (30%) | 8 (18.18%) | 2 (40%) | 17 (33.33%) | 7 (18.92%)
library news (e.g., if the library wins an award): 49 (46.67%) | 3 (42.85%) | 37 (52.86%) | 12 (52.17%) | 5 (50%) | 19 (43.18%) | 3 (60%) | 24 (47.06%) | 7 (18.92%)
campus-wide info/events: 73 (69.52%) | 3 (42.85%) | 42 (60%) | 5 (21.74%) | 5 (50%) | 26 (59.09%) | 2 (40%) | 35 (68.63%) | 13 (35.14%)
interesting/fun websites & memes: 48 (45.71%) | 0 | 41 (58.57%) | 2 (8.70%) | 10 (100%) | 30 (68.18%) | 3 (60%) | 26 (50.98%) | 12 (32.43%)
other: 1 (0.95%) | 0 | 2 (2.86%) | 0 | 1 (10%) | 2 (4.55%) | 0 | 2 (3.92%) | 1 (2.70%)
discussion
historically, libraries have used social media as a marketing tool.13 with social media's ever-increasing popularity with young adults, academic libraries have actively established a presence on several platforms.14 our survey shows that our students follow this trend, using social media regularly and for a variety of activities. we were surprised that facebook turned out to be the most widely used platform among our students, as much has been written in the last few years about teens and young adults leaving it.15 a november 2016 survey, however, found that 65 percent of teens said they used facebook daily, a large increase from 59 percent in november 2014. though snapchat and instagram are preferred, teens continue to use facebook for its utility in scheduling events or keeping in touch regarding homework.16 students do seem receptive to following the library on different platforms and report wanting primarily library-related content from us, including more in-depth content such as research techniques and database instruction.
limitations and future work
findings from this study give insight into opportunities for libraries to reach university students through social media. we acknowledge that only limited generalizations can be made because of the way the survey was conducted. our internal recruitment methods led to a selection bias in our surveyed population, as advertisement of the survey took place either in the chosen libraries or on the purdue libraries' existing facebook page. because of this, our sample consists primarily of students who visit the library or already follow the library on facebook. we hope to alter this in future surveys by expanding our recruitment to other physical spaces across campus. in addition, we plan to add questions that first establish a better understanding of students' opinions of libraries being on social media before asking what social media they would like to see libraries use. this would potentially avoid leading students to an answer. further, we are concerned we took for granted students' understanding of library resources; that is, we may have made distinctions librarians understand, but students may not. in future studies, we plan to rephrase, and possibly combine, questions in a way that will be clear to people less familiar with library resources and services. we believe confusion with these questions created contradictory responses. for example, "research help through social media" received a low response rate, but "information on research techniques and tips" received a much higher response rate. additionally, a limitation of using a survey to collect behavior information is that respondents do not always report how they actually behave.
using methods such as focus groups, interviews, text mining, or usability studies could provide a more holistic view of student behavior. duplication of this study on a yearly or semi-yearly basis across all libraries could help us see how social media preferences change over time and across a larger sample of our population. this study aimed to provide a broad view of a large university's student body by surveying across different subject libraries. with the changes discussed, we think a revised survey could give us the detailed information we need to build a more effective social media strategy that reaches both library users and non-users.

conclusion
this study improved our understanding of the social media usage and preferences of purdue students. from these results, we intend to develop better communication channels, a clear social media presence, and a more cohesive message across the purdue libraries. under the direction of our new director of strategic communication, a social media committee was formed with representatives from each of the libraries to contribute content for social media. the committee will consider expanding the purdue libraries' social media presence to communication channels where students have said they are and would like us to be. as social media usage is ever-changing, we recommend repeated surveys such as this to better understand where on social media students want to see their libraries and what information they want to receive from them.

references
1 alfred hermida, tell everyone: why we share and why it matters (toronto: doubleday canada, 2014), 1.
2 laurie charnigo and paula barnett-ellis, "checking out facebook.com: the impact of a digital trend on academic libraries," information technology and libraries 26, no. 1 (march 2007): 23–34, https://doi.org/10.6017/ital.v26i1.3286.
3 stephen bell, lorcan dempsey, and barbara fister, new roles for the road ahead: essays commissioned for the acrl's 75th anniversary (chicago: association of college and research libraries, 2015).
4 amanda harrison et al., "social media use in academic libraries: a phenomenological study," journal of academic librarianship 43, no. 3 (may 1, 2017): 248–56, https://doi.org/10.1016/j.acalib.2017.02.014.
5 "social media fact sheet," pew research center, january 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
6 online computer library center, sharing, privacy and trust in our networked world: a report to the oclc membership (dublin, ohio: oclc, 2007), https://eric.ed.gov/?id=ed532599.
7 ruth sara connell, "academic libraries, facebook and myspace, and student outreach: a survey of student opinion," portal: libraries and the academy 9, no. 1 (january 8, 2009): 25–36, https://doi.org/10.1353/pla.0.0036.
8 elizabeth brookbank, "so much social media, so little time: using student feedback to guide academic library social media strategy," journal of electronic resources librarianship 27, no. 4 (2015): 232–47, https://doi.org/10.1080/1941126x.2015.1092344.
9 besiki stvilia and leila gibradze, "examining undergraduate students' priorities for academic library services and social media communication," journal of academic librarianship 43, no. 3 (may 1, 2017): 257–62, https://doi.org/10.1016/j.acalib.2017.02.013.
10 brookbank, "so much social media, so little time."
11 stvilia and gibradze, "examining undergraduate students' priorities."
12 qzone and renren are chinese social media platforms.
13 curtis r. rogers, "social media, libraries, and web 2.0: how american libraries are using new tools for public relations and to attract new users," south carolina state library, may 22, 2009, http://dc.statelibrary.sc.gov/bitstream/handle/10827/6738/scsl_social_media_libraries_2009-5.pdf?sequence=1; jakob harnesk and marie-madeleine salmon, "social media usage in libraries in europe—survey findings," linkedin slideshare slideshow presentation, august 10, 2010, https://www.slideshare.net/jhoussiere/social-media-usage-in-libraries-in-europe-survey-teaser.
14 "social media fact sheet."
15 daniel miller, "facebook's so uncool, but it's morphing into a different beast," the conversation, 2013, http://theconversation.com/facebooks-so-uncool-but-its-morphing-into-a-different-beast-21548; ryan bradley, "understanding facebook's lost generation of teens," fast company, june 16, 2014, https://www.fastcompany.com/3031259/these-kids-today; nico lang, "why teens are leaving facebook: it's 'meaningless,'" washington post, february 21, 2015, https://www.washingtonpost.com/news/the-intersect/wp/2015/02/21/why-teens-are-leaving-facebook-its-meaningless/?utm_term=.1f9dd4903662.
16 alison mccarthy, "survey finds us teens upped daily facebook usage in 2016," emarketer, january 28, 2017, https://www.emarketer.com/article/survey-finds-us-teens-upped-daily-facebook-usage-2016/1015053.
persistent urls and citations offered for digital objects by digital libraries
nicholas homenda
information technology and libraries | june 2021
https://doi.org/10.6017/ital.v40i2.12987

abstract
as libraries, archives, and museums make unique digital collections openly available via digital library platforms, they expose these resources to users who may wish to cite them. often several urls are available for a single digital object, depending on which route a user took to find it, but the chosen citation url should be the one most likely to persist over time. catalyzed by recent digital collections migration initiatives at indiana university libraries, this study investigates the prevalence of persistent urls for digital objects at peer institutions and examines the ways their platforms instruct users to cite them. this study reviewed institutional websites from the digital library federation's (dlf) published list of 195 members and identified representative digital objects from unique digital collections navigable from each institution's main web page in order to determine persistent url formats and citation options. findings indicate an equal split between offering and not offering discernible persistent urls, with four major methods used: handle, doi, ark, and purl. significant variation in labeling persistent urls and in including them in item-specific citations uncovered areas where the user experience could be improved for more reliable citation of these unique resources.

introduction
libraries, archives, and museums often make their unique digital collections openly available in digital library services and in different contexts, such as digital library aggregators like the digital public library of america (dpla, https://dp.la/) and hathitrust digital library (https://www.hathitrust.org/). as a result, there can be many urls available that point to digital objects within these collections. take, for example, image collections online (http://dlib.indiana.edu/collections/images) at indiana university (iu), a service launched in 2007 featuring open access iu image collections. users discover images on the site through searching and browsing, and its collections are also shared with dpla.
the following urls exist for the digital object shown in figure 1, an image from the building a nation: indiana limestone photograph collection:
• the url as it appears in the browser in image collections online: https://webapp1.dlib.indiana.edu/images/item.htm?id=http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446
• the persistent url on that page ("bookmark this page at"): http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446
• the url pasted from the browser for the image in dpla: https://dp.la/item/eb83ff0a6ae507e2ba441634f7eb0f18?q=indiana%20limestone

nicholas homenda (nhomenda@indiana.edu) is digital initiatives librarian, indiana university bloomington. © 2021.

as a digital library or collection manager, which url would you prefer to see cited for this object?

figure 1. an example of a digital object with multiple urls. mcmillan mill, ilco id in2288_1. courtesy, indiana geological and water survey, indiana university, bloomington, indiana. retrieved from image collections online at http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446.

citation instructions given to authors in major style guides explicitly mention using the best possible form of a resource's url: "[i]t is important to choose the version of the url that is most likely to continue to point to the source cited."1 of the three urls above, the second is a purl, or persistent url (https://archive.org/services/purl/), which is why both image collections online and dpla instruct users to bookmark or cite it. other common methods for issuing and maintaining persistent urls include digital object identifiers (doi, https://www.doi.org/), handles (http://handle.net/), and archival resource keys (ark, https://n2t.net/e/ark_ids.html). all of those have been around since the late 1990s to early 2000s.

at indiana university libraries, recent efforts have focused on migrating digital collections to new digital library platforms, mainly based on the open source samvera repository software (https://samvera.org/). as part of these efforts, we wanted to survey how peer institutions were employing persistent, citable urls for digital objects to determine if a prevailing approach had emerged since indiana university libraries' previous generation of digital library services were developed in the early- to mid-2000s.
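a persistent url is an indirection layer: the resolver answers a request with an http redirect to the object's current, platform-specific location, so the cited url can outlive platform migrations. a minimal sketch of observing this with python's requests library (an illustration, not part of the study; whether this particular purl still resolves depends on the hosting institution):

import requests

# request the purl cited above and list the redirect hops the resolver issues
purl = "http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446"
response = requests.get(purl, allow_redirects=True, timeout=30)

for hop in response.history:               # intermediate 3xx responses, in order
    print(hop.status_code, hop.url)
print(response.status_code, response.url)  # the final, platform-specific url

because users cite only the purl, the institution can later repoint it at a new platform without breaking any existing citation.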
besides having the capability of creating and reliably serving these urls, our digital library platforms need to make these urls easily accessible to users, preferably along with some assertion that the urls should be used when citing digital objects and collections instead of the many non-persistent urls also directing to those same digital objects and collections. although libraries, archives, and museums have digitized and made digital objects in digital collections openly accessible for decades using several methods for providing persistent, citable urls, how do institutions now present digital object urls to people who encounter, use, and cite them? by examining digital collections within a large population of digital library institutions' websites, this study aims to discover:
1. what methods of url persistence are being employed for digital objects by digital library institutions?
2. how do these institutions' websites instruct users to cite these digital objects?

literature review
the study of digital objects in the literature often takes a philosophical perspective in attempting to define them. moreover, practical accounts of digital object use and reuse note the challenges associated with infrastructure, retrieval, and provenance. much of the literature about common methods of persistent url resolution comes from individuals and entities who developed and maintain these standards, as well as overviews of the persistent url resolution methods available. finally, several studies have investigated the problem of "link rot" by tracking the availability of web-hosted resources over time.

allison notes the generations of philosophical thought that it took to recognize common characteristics of physical objects and the difficulty in understanding an authentic version of a digital object, especially with different computer hardware and software changing the way digital objects appear.2 hui also investigates the philosophical history of physical objects to begin to define digital objects through his methods of datafication of objects and objectification of data, noting that digital objects can be approached in three phases (objects, data, and networks) in order to define them.3 lynch is also concerned with determining the authenticity of digital objects and the challenges inherent in the digital realm. in describing digital objects, he creates a hierarchy with raw data at the bottom, elevated to interactive experiential works at the top which elicit the fullest emotional connection contributing to the authentic experience of the work.4

the literature often examines digital objects from the practitioner's perspective, such as the publishing industry's difficulty in repurposing digital objects for new publishing products. publishers in benoit and hussey's 2011 case study note the tension between managers and technical staff concerning assumptions about what their computer system could automatically do with their digital objects; their digital objects always require some human labor and intervention to be accurately described and retrievable later.5 dappert et al. note the need to describe a digital object's environment in order to be able to reproduce it in their work with the premis data dictionary for preservation metadata (https://www.loc.gov/standards/premis/).6 strubulis et al.
provide a model for digital object provenance using inference and resource description framework (rdf) triples (https://w3.org/rdf/), since storing full provenance information for complex digital objects, such as the large amount of mars rover data they offer as an example, would be cost prohibitive.7

in 2001, arms describes the landscape of persistent uniform resource names (urn) of handles, purls, and dois near the latter's inception.8 recent work by koster explains the persistent identifier methods most in use today and examines current infrastructure practices for maintaining them.9 the persistent link resolution method most prominently featured in the literature is the digital object identifier (doi). beginning in 1999, those behind developing and implementing doi have explained its inception, development, and trajectory, continuing with paskin's deep explanation in 2002 of the reasons why doi exist and the technology behind the service.10 discipline-specific research notes the utility of doi. sidman and davidson, and weissberg, studied doi for the purposes of automating the supply chain in the publishing industry.11 derisi, kennison, and twyman, on behalf of the public library of science (plos), announced their 2003 decision to broadly implement doi, followed by additional discipline-specific encouragement of the practice by skiba in nursing education and neumann and brase in molecular design.12

the archival resource key (ark) is an alternative permanent link resolution scheme. since 2001, the open-source ark identifier has offered a self-hosted solution for providing persistent access to digital objects, their metadata, and a maintenance commitment.13 recently, duraspace working groups have planned for further development and expansion of ark with the arks in the open project (https://wiki.lyrasis.org/display/arks/arks+in+the+open+project).

persistent urls (purls) have been used to provide persistent access to digital objects for nearly 20 years, and their use in the library community is well documented. shafer, weibel, and jul anticipate uniform resource names becoming a web standard and offer purls as an intermediate step to aid in urn development.14 shafer also explained how oclc uses purls and alternate routing methods (arms) to properly direct global users to oclc resources.15 purls are also used to provide persistent access to government information and were seen by the cendi persistent identification task group as essential to their early efforts to implement the federal enterprise architecture (fea) and a theoretical federal persistent identification resolver.16

digital objects and collections should ideally be accessible via urls that work beyond the life of any one platform, lest the materials be subjected to "link rot," or the process of decay when previously working links no longer correctly resolve. ducut et al. investigated 1994–2006 medline abstracts for the presence of persistent link resolution services such as handle, purl, doi, and webcite and found 20% of the links were inaccessible in 2008.17 mcmurry et al. investigated link rot in life sciences data and suggested practices for formatting links for increased persistence and approaches for versioning.18 the topic of link rot has been examined as early as 2003, in markwell and brooks's "broken links: just how rapidly do science education hyperlinks go extinct," cited by multiple link rot studies. ironically, this article is no longer accessible at the cited url.19
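the link-rot studies cited above all rest on the same mechanical test: request each cited url and record whether it still resolves. a minimal sketch of such a check (an illustration, not the instrument used in any of the studies cited):

import requests

def check_link(url: str, timeout: int = 15) -> str:
    """classify a cited url as alive, broken, or unreachable."""
    try:
        # some servers reject head requests, so retry with get on error codes
        r = requests.head(url, allow_redirects=True, timeout=timeout)
        if r.status_code >= 400:
            r = requests.get(url, allow_redirects=True, timeout=timeout)
        return "alive" if r.status_code < 400 else f"broken ({r.status_code})"
    except requests.RequestException:
        return "unreachable"

cited = [
    # the markwell and brooks url from endnote 19, reported non-functional
    "http://www-class.unl.edu/biochem/url/broken_links.html",
    "https://www.doi.org/",
]
for url in cited:
    print(check_link(url), url)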
methodology
this study sought a set of digital objects within library institutions' digital collections websites. to locate examples of publicly accessible digital objects in digital collections, this study collected institutional websites from the digital library federation's (dlf) published list of 195 members as of august 2019.20 subsequent investigation aimed to find one representative digital object from unique digital collections navigable from each institution's main web page. this study aimed to locate digital collections that met the following criteria:
1. collections are openly available.
2. collections are in a repository service, as opposed to highlighted content visible on an informational web page or blog.
3. collections are gathered within a site or service that contains multiple collections, as opposed to individual digital project websites, when possible.
4. collections are unique to an institution, as opposed to duplicated or licensed content.
these criteria were developed in an effort to find unique, publicly accessible digital objects within each institution's digital collections. to be sure, users search for and discover materials in a variety of ways and in numerous services, but studying the information-seeking behavior of users looking for digital objects or digital collections is outside the scope of this study. ultimately, digital collections indexed by search engines or available in aggregator services like dpla often contain links to collections and objects in their institutionally hosted platforms. users who discover these materials are likely to be directed to the sites this study investigated.

for the purposes of this study, at least one digital collection was investigated from each dlf institution. multiple sites for an institution were investigated when more than one publicly accessible site or service met the above criteria. when digital collections at an institution were delivered only through the library catalog discovery service, reasonable attempts were made to delimit discoverable digital collections content. in total, 183 digital collections were identified for this study.

once digital collections were located, subsequent investigation aimed to locate individual digital objects within them. while digital objects represent diverse materials available in a variety of formats, for ease of comparing approaches between institutions, a mixture of individual digital images, multipage digital items, and audiovisual materials was examined. objects for this study were primarily available in websites containing a variety of collections and format types with common display characteristics despite format differences, and no additional efforts were made to locate equal or proportional digital object formats at each institution. one representative digital object was identified per digital collection, totaling 183 digital objects.
once a digital object was located at an institution, the object's unique identifier, format, persistent url, persistent url label, method of link resolution (if identifiable), and citation were collected, with particular focus on the object's persistent url, if available. commonly used persistent url types can be identified by their url components, as seen in table 1; however, any means of persistence was collected if clearly identified. after examining initial results, the object's provided citation, if available, was added to the list of data collected, since many digital collection platforms provide recommended citations for individual objects.

table 1. commonly used persistent url methods and corresponding url components
persistent url type | url component
archival resource key (ark) | ark:/
digital object identifier (doi) | doi.org/ (or doi:)
handle | hdl.handle.net
persistent url (purl) | purl.
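because the components in table 1 are substrings of the url itself, the method behind a persistent url can often be identified mechanically. a minimal sketch of such a classifier (an illustration of table 1, not the study's actual collection instrument; the handle url below is a hypothetical example):

def classify_persistent_url(url: str) -> str:
    """guess the persistence method from the url components in table 1."""
    u = url.lower()
    if "ark:/" in u:
        return "ark"
    if "doi.org/" in u or u.startswith("doi:"):
        return "doi"
    if "hdl.handle.net" in u:
        return "handle"
    if "purl." in u:
        return "purl"
    return "unknown"  # persistence may still be claimed without a telltale pattern

print(classify_persistent_url("http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446"))  # purl
print(classify_persistent_url("https://hdl.handle.net/1234/example-object"))  # handle (hypothetical)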
results
most institutions have a single digital collection site or service that met the selection criteria for this study. some appear to have multiple digital collection repositories, often separated by digital object format or library department, and many institutions have collections that are only publicly accessible through discrete project websites, such as digital exhibits or focused digital humanities research projects. out of 195 dlf member institutions, 171 had publicly accessible digital collections. of these 171 institutions, 153 had digital collections services/sites that adhered to the criteria of this study, while 21 had only project-focused digital collections sites. since several institutions had more than one digital collection platform accessible via their main institutional website, a population of 183 digital collections was investigated. one representative digital object from each collection was gathered, consisting of 107 digital images, 73 multipage items, and 3 audiovisual items (totaling 183).

table 2. number of instances of digital collection platforms identified
platform | number | percentage of total (183)
custom or unidentifiable | 53 | 29%
contentdm | 46 | 25%
islandora | 19 | 10%
dspace | 11 | 6%
samvera | 11 | 6%
omeka | 10 | 5%
internet archive | 7 | 4%
digital commons | 6 | 3%
fedora custom | 4 | 2%
luna | 3 | 2%
xtf | 3 | 2%
artstor | 2 | 1%
iiif server | 2 | 1%
primo | 2 | 1%
aspace | 1 | 1%
elevator | 1 | 1%
knowvation | 1 | 1%
veridian | 1 | 1%

as seen in table 2, almost a third of the digital collection platforms encountered appear to be custom-developed or customized so as not to reveal the software platform upon which they were based. of the platform-based services encountered where software was identifiable, 17 different platforms were used, and the top five were contentdm, islandora, dspace, samvera (hyrax, avalon, curation concerns, etc.), and omeka.

table 3. occurrence of persistent links in surveyed digital collections, method of link persistence, and persistent link labels
persistent links? | number | percentage of total (183)
no/unknown | 93 | 51%
yes/persistence claimed | 90 | 49%

persistent link method | number | percentage of total (90)
unknown | 33 | 37%
handle | 27 | 30%
ark | 19 | 21%
doi | 6 | 7%
purl | 5 | 6%

persistent link label | number | percentage of total (90)
other (a) | 24 | 26.7%
permalink | 22 | 24.4%
identifier | 13 | 14.4%
[no label given] | 10 | 11.1%
permanent link | 7 | 7.8%
uri | 5 | 5.6%
persistent link | 3 | 3.3%
handle | 2 | 2.2%
link to the book | 2 | 2.2%
persistent url | 2 | 2.2%
(a) twenty-four other persistent link labels were reported,21 each occurring only once.

as seen in table 3, the numbers of digital objects with and without publicly accessible persistent (or seemingly persistent) links were nearly equal. among the digital objects with persistent links, the largest share (37 percent) claimed persistence without a discernible resolution method, with the rest divided between handle, ark, doi, and purl. these objects also had 33 different labels for these links in the public-facing interface. the top five labels were: permalink (22), identifier (13), permanent link (7), uri (5), and persistent link (3).

as seen in table 4, the majority of digital objects surveyed had a unique item identifier in their publicly viewable item record. the majority did not offer a citation in the item's publicly viewable record. among items that offered citations, the majority contained a link to the item, and three offered downloadable citation formats only, such as endnote, zotero, and mendeley.

table 4. various digital object characteristics surveyed
unique item identifier in item record | number | percentage of total (183)
yes | 132 | 72%
no | 51 | 28%

citation in item record | number | percentage of total (183)
yes | 65 | 36%
no | 118 | 64%

citations containing links to item | number | percentage of total (65)
yes | 39 | 60%
downloadable citation format only | 3 | 5%
no | 23 | 35%

discussion
since proper citation practice dictates choosing the url most likely to provide continuing access to a resource, it follows that providing persistent urls to resources such as digital objects or digital collections is also a good practice. it is encouraging to see a large number of the institutions surveyed providing urls that persist (or claim to persist). providing persistent access to a unique digital resource implies a level of commitment to maintaining its url into the future, requiring policies, technology, and labor resources, further augmented by costs associated with registering certain types of identifiers like doi.22 it is likely that institutions not providing persistent (or not obviously persistent) urls are either internally committing to preserving their objects, collections, and services through means not known to end users; are constrained by technological limitations of their digital collection platforms; hope to develop or adopt new digital library services that offer these capabilities; or lack the resources to offer persistent urls.

the four commonly used methods of persistent link resolution—doi, handle, ark, and purl—have been used for nearly 20 years, and it is not surprising that alternative observable methods were seldom encountered in this study. handles were the most common persistent url method, which seems related to the digital library platform used by an institution.
dspace distributions are pre-bundled with handle server software, for example, and 12 out of 27 platforms serving digital objects with handles were based on dspace (https://duraspace.org/dspace/). when choosing to implement or upgrade a digital library platform, institutions often consider several available options. choosing a platform that offers the ability to easily create and maintain persistent urls might be less burdensome than making urls persist via independent or alternative means.

thirty-three digital objects offered links that had labels implying some sort of persistence but lacked information describing the methods used or url components consistent with commonly used methods, as seen in table 1. persistence might be achieved in these cases through some combination of url rewriting, locally implemented solutions, or nonpublic persistent urls. it would benefit users, increasingly aware of the need to cite digital objects using persistent links, for digital object platforms that offer persistent linking to explicitly state that fact and ideally offer some evidence of the resolution method used. researchers will be looking for citable persistent links that offer some cues signifying their persistence, whether clearly indicated language on the website or a url pattern consistent with the four major methods commonly used.

the amount of variation in labeling persistent links was surprising. commonly used digital library software platforms have default ways of labeling these fields. nearly all of the "reference url" labels encountered are in contentdm sites, for example. since the concept of offering a persistent link to a digital object is not uncommon, perhaps there can be a more consistent approach to choosing the label for this content.

when a researcher finds a digital object in an institutional digital library service, they might want to cite that object. accurately citing resources in all formats is an essential research skill, and digital library platforms often try to aid users by providing dynamically generated or pre-populated citations based on unique metadata associated with that object. it was somewhat surprising to encounter these types of citation helpers that did not include persistent links. since a digital object's preferred persistent link is often different from the url visible in the browser, efforts should be made to make citations available containing persistent links.

there are institutions with digital collections that were not examined in this study due to a number of factors. first, this study examined the 195 institutions that were members of the digital library federation, and there are 2,828 four-year postsecondary institutions in the united states as of 2018.23 additional study could expand perceptions about persistent links for digital objects when looking beyond the dlf member institutions, which are predominantly four-year postsecondary institutions but also include museums, public libraries, and other cultural heritage organizations. an alternative approach to collecting this data would be to conduct user testing focused on finding and citing digital objects from a number of institutions.
this approach was not used, however, since the initial goal of this study was to see how peer digital library institutions have employed persistent links and citations across a broad yet contained spectrum. as one librarian with extensive digital library experience, my approach to locating these platforms and resources is subject to subconscious bias i may have accumulated over my professional career, but i would hope that my experience makes me more able to locate these platforms and materials than the average user. digital library platforms are numerous, and often institutions have several of them with varying degrees of public visibility or connectivity to their institution's main library website. this study's findings for any particular institution are not as authoritative as self-reported information from the institution itself. while a survey aimed at collecting direct responses from institutions might have yielded more accuracy, a potentially low response rate would also make it difficult to truly know what methods of persistent linking peer institutions are employing, especially with the majority of these resources being openly findable and accessible. still, further study with self-reported information could shed more light on institutions' decisions to provide certain methods of persistent links to objects within their chosen digital collection platforms. moreover, it is possible that some digital object formats are more likely to have persistent urls than others. newer formats such as three-dimensional digital objects, commonly cited resources like data sets, and scholarship held in institutional repositories could be available in digital library services similar to those surveyed in this study with different persistent url characteristics. additional study could aim to survey populations of digital objects by format across multiple institutions to investigate any correlation between persistent urls and object format.

conclusion
unique digital collections at digital library institutions are made openly accessible to the public in a variety of ways, including digital library software platforms and digital library aggregator services. regardless of how users find these materials, best practices require users to cite the urls for these materials that are most likely to continue to provide access to them. persistent urls are a common way to ensure cited urls to digital objects remain accessible. commonly used methods of issuing and maintaining persistent urls can be identified in digital object records within the digital collection platforms available at these institutions. this study identified characteristics of these digital objects, their platforms, the prevalence of persistent urls in their records, and the way these urls are presented to users. findings indicate that dlf member institutions are split evenly between providing and not providing publicly discernible persistent urls, with wide variation in how these urls are presented and explained to users. decisions made in developing and maintaining digital collection platforms, and the types of urls made available to users, impact which urls users cite and the possibility of others encountering these resources through those citations.
this study was also prompted by digital collection migrations at indiana university, and these findings provide us interesting examples of persistent url usage at other institutions and ways to improve the user experience in digital collection platforms.

endnotes
1 the chicago manual of style online (chicago: university of chicago press, 2017), ch. 14, sec. 7.
2 arthur allison et al., "digital identity matters," journal of the american society for information science & technology 56, no. 4 (2005): 364–72, https://doi.org/10.1002/asi.20112.
3 yuk hui, "what is a digital object?" metaphilosophy 43, no. 4 (2012): 380–95, https://doi.org/10.1111/j.1467-9973.2012.01761.x.
4 clifford lynch, "authenticity and integrity in the digital environment: an exploratory analysis of the central role of trust," council on library and information resources (clir), 2000, https://www.clir.org/pubs/reports/pub92/lynch/.
5 g. benoit and lisa hussey, "repurposing digital objects: case studies across the publishing industry," journal of the american society for information science & technology 62, no. 2 (2011): 363–74, https://doi.org/10.1002/asi.21465.
6 angela dappert et al., "describing and preserving digital object environments," new review of information networking 18, no. 2 (2013): 106–73, https://doi.org/10.1080/13614576.2013.842494.
7 christos strubulis et al., "a case study on propagating and updating provenance information using the cidoc crm," international journal on digital libraries 15, no. 1 (2014): 27–51, https://doi.org/10.1007/s00799-014-0125-z.
8 william y. arms, "uniform resource names: handles, purls, and digital object identifiers," communications of the acm 44, no. 5 (2001): 68, https://doi.org/10.1145/374308.375358.
9 lukas koster, "persistent identifiers for heritage objects," code4lib journal 47 (2020), https://journal.code4lib.org/articles/14978.
10 albert w. simmonds, "the digital object identifier (doi)," publishing research quarterly 15, no. 2 (1999): 10, https://doi.org/10.1007/s12109-999-0022-2; norman paskin, "digital object identifiers," information services & use 22, no. 2/3 (2002): 97, https://doi.org/10.3233/isu-2002-222-309.
11 david sidman and tom davidson, "a practical guide to automating the digital supply chain with the digital object identifier (doi)," publishing research quarterly 17, no. 2 (2001): 9, https://doi.org/10.1007/s12109-001-0019-y; andy weissberg, "the identification of digital book content," publishing research quarterly 24, no. 4 (2008): 255–60, https://doi.org/10.1007/s12109-008-9093-8.
12 susanne derisi, rebecca kennison, and nick twyman, "the what and whys of dois," plos biology 1, no. 2 (2003): 133–34, https://doi.org/10.1371/journal.pbio.0000057; diane j. skiba, "digital object identifiers: are they important to me?," nursing education perspectives 30, no. 6 (2009): 394–95, https://doi.org/10.1016/j.lookout.2008.06.012; janna neumann and jan brase, "datacite and doi names for research data," journal of computer-aided molecular design 28, no. 10 (2014): 1035–41, https://doi.org/10.1007/s10822-014-9776-5.
13 john kunze, "towards electronic persistence using ark identifiers," california digital library, 2003, https://escholarship.org/uc/item/3bg2w3vs.
14 keith e. shafer, stuart l. weibel, and erik jul, "the purl project," journal of library administration 34, no. 1–2 (2001): 123, https://doi.org/10.1300/j111v34n01_19.
15 keith e. shafer, "arms, oclc internet services, and purls," journal of library administration 34, no. 3–4 (2001): 385, https://doi.org/10.1300/j111v34n03_19.
16 cendi persistent identification task group, "persistent identification: a key component of an e-government infrastructure," new review of information networking 10, no. 1 (2004): 97–106, https://doi.org/10.1080/13614570412331312021.
17 erick ducut, fang liu, and paul fontelo, "an update on uniform resource locator (url) decay in medline abstracts and measures for its mitigation," bmc medical informatics & decision making 8, no. 1 (2008): 1–8, https://doi.org/10.1186/1472-6947-8-23.
18 julie a. mcmurry et al., "identifiers for the 21st century: how to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data," plos biology 15, no. 6 (2017): 1–18, https://doi.org/10.1371/journal.pbio.2001414.
19 john markwell and david brooks, "broken links: just how rapidly do science education hyperlinks go extinct?" (2003), cited by many and previously available from http://www-class.unl.edu/biochem/url/broken_links.html [currently non-functional].
20 "our member institutions," digital library federation (2020), https://www.diglib.org/about/members/.
21 twenty-four labels used only once: archival resource key; ark; bookmark this page at; citable link; citable link to this page; citable uri; copy; copy and paste this url; digital object url; doi; identifier (hdl); item; link; local identifier; permanent url; permanently link to this resource; persistent link to this item; persistent link to this record; please use this identifier to cite or link to this item; related resources; resource identifier; share; share link/location; to cite or link to this item, use this identifier.
22 one of the frequently asked questions (https://www.doi.org/faq.html) states that doi registration fees vary.
23 national center for education statistics, "table 317.10. degree-granting postsecondary institutions, by control and level of institution: selected years, 1949–50 through 2017–18," in digest of education statistics, 2018, https://nces.ed.gov/programs/digest/d18/tables/dt18_317.10.asp.
are ivy league library website homepages accessible?
wenfan yang, bin zhao, yan quan liu, and arlene bielefield
information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.11577

wenfan yang (youngwf@126.com) is a master's student in the school of management, tianjin university of technology, china. bin zhao (andy.zh@126.com) is professor in the school of management, tianjin university of technology, china. yan quan liu (liuy1@southernct.edu) is professor in information and library science at southern connecticut state university and special hired professor of tianjin university of technology. arlene bielefield (bielefielda1@southernct.edu) is professor in information and library science at southern connecticut state university. copyright © 2020.

abstract
as a doorway for users seeking information, library websites should be accessible to all, including those who are visually or physically impaired and those with reading or learning disabilities. in conjunction with an earlier study, this paper presents a comparative evaluation of ivy league university library homepages with regard to the americans with disabilities act (ada) mandates. data results from wave and achecker evaluations indicate that although the error of missing form labels still occurs on these websites, other known accessibility errors and issues have improved significantly from five years ago.

introduction
an academic library is "a library that is an integral part of a college, university, or other institution of postsecondary education, administered to meet the information and research needs of its students, faculty, and staff."1 people living with physical disabilities face barriers whenever they enter a library. many blind and visually impaired persons need assistance when visiting a library to do research. in such cases, searching the collection catalog, periodical indexes, and other bibliographic references is frequently conducted by a librarian or the person accompanying that individual to the library. thus, professionals in these institutions can advance the use of academic libraries for the visually impaired, physically disabled, hearing impaired, and people with learning disabilities.

library websites are libraries' virtual front doors for all users pursuing information from libraries. fichter stated that the power of the website is in its popularization.2 access by everyone regardless of disability is an essential reason for its popularization. whether users are students, parents, senior citizens, or elected officials navigating the library website to find resources or sign up for computer courses at the library, the website can be either a liberating or a limiting experience.3 according to the web accessibility initiative (https://www.w3.org/wai/), website accessibility means that people with disabilities can use the websites. more specifically, website accessibility means that people with disabilities can perceive, understand, navigate, and interact with websites and that they can contribute to the websites. incorporating accessibility into website design enables people with disabilities to enjoy the benefits of websites to the same extent as anyone else in their community.
this study evaluated the current state of accessibility of the american ivy league universities' library websites using guidelines established by the americans with disabilities act (ada) for those who are visually or physically impaired or who have reading or learning disabilities. section 508 of the rehabilitation act and the web content accessibility guidelines (wcag) by the world wide web consortium (w3c) provide guidelines for website developers which define what makes a website accessible to those with physical, sensory, or cognitive disabilities. since a broad array of disabilities are recognized under the ada, websites seeking to be compliant with the ada should use the act's technical criteria for website design. this study used two common accessibility evaluation tools—wave and achecker—for both section 508 and wcag version 2.0 level aa.

among universities in the united states, the eight ivy league universities—brown, columbia, cornell, dartmouth, harvard, princeton, university of pennsylvania, and yale—all have a long and distinguished history, strict academic requirements, high-quality teaching, and high-caliber students. because of their good reputations, they are expected to lead by example, not only in terms of academic philosophy and campus atmosphere, but also in the accessibility of their various websites. of course, any library website, whether an urban public library or a university library, should be accessible to everyone. hopefully, this study of their accessibility can enlighten other universities on how to better develop and maintain library websites so that individuals with disabilities can enjoy the same level of access to academic knowledge as everyone else.

literature review
in 1999, schmetzke reported that emerging awareness about the need for accessible website design had not yet manifested itself in the actual design of library websites. for example, at the fourteen four-year campuses within the university of wisconsin system, only 13 percent of the libraries' top-level pages (homepages plus the next layer of library pages linked to them) were free of accessibility problems.4 has this situation changed in the last twenty years? to answer this question, a number of authors have suggested various methods for evaluating software/hardware for accessibility and usability.5 included in the process of compiling data is "involving the user at each step of the design process. involvement typically takes the form of an interview and observation of the user engaged with the software/hardware."6

providenti & zai conducted a study in 2007 focused on providing an update on the implementation of website accessibility guidelines on kentucky academic library websites. they tested the academic library homepages of bachelor-degree-granting institutions in kentucky for accessibility compliance using watchfire's webxact accessibility tester and the w3c's html validator. the results showed that from 2003 to 2007, the number of library homepages complying with basic accessibility guidelines was increasing.7

billingham conducted research on edith cowan university (ecu) library websites.
the websites were tested twice, in october 2012 and june 2013, using automated testing tools such as code validators and color analysis programs, with findings that 11 percent of the wcag 2.0 level a to level aa guidelines were passed in the first test. additionally, there was a small increase in the percentage of wcag 2.0 guidelines passed by all pages tested in the second test.8

while quite a few research studies focus on library website accessibility rather than university websites, the conclusions diverge. tatiana & jeremy (2014) tested 509 webpages at a large public university in the northeastern united states using wave (http://wave.webaim.org) and cynthia says (http://www.cynthiasays.com). the results indicated that 51 percent of those webpages passed automated website accessibility tests for section 508 compliance with cynthia says. however, when using wave for wcag priority 1 compliance, which is a more rigorous evaluation level, only 35 percent passed the test.9

maatta smith reported that not one of the websites of 127 us members of the urban library council (ulc) was without errors or alerts, with the average number of errors being 27.10 such results were similar to those of liu.11, 12 they also found that about half (58 of 127) of the urban public libraries provided no information specifically for individuals with disabilities. of the 127 websites, some were confusing in using a variety of verbiage to suggest services for individuals with disabilities. sixty-six of them provide some information about services within the library for individuals with disabilities. the depth of the information varied, but in all instances contact information was included for additional assistance.

liu, bielefield, and mckay examined 122 library homepages of ulc members and reported on three main aspects. first, only seven of them presented as error free when tested for compliance with the 508 standards; the highest percentage of errors occurred in accessibility sections 508(a) and 508(n). second, the number of issues was dependent on the population served: libraries serving larger populations tend to have more issues with accessibility than those serving smaller ones. third, the most common errors were missing label and contrast errors, while the highest number of alerts was related to the device-dependent event handler, which means that a keyboard or mouse is a necessary piece of equipment to initiate a desired transaction.12

although they were interested in overall website accessibility, theofanos and redish focused their research on the visually impaired website user. the authors investigated and revealed six reasons to bridge the gap between accessibility and usability:
1. disabilities affect more people than you may think. worldwide, 750 million people have a disability, and three of every ten families are touched by a disability. in the united states, one in five have some kind of disability, and one in ten have a severe disability. that's approximately 54 million americans.
2. it is good business. according to the president's committee on the employment of people with disabilities, the discretionary income of people with disabilities is $175 billion.
3. the number of people with disabilities and income to spend is likely to increase.
the likelihood of having a disability increases with age, and the overall population is aging.
4. the website plays an important role and has significant benefits for people with disabilities.
5. improving accessibility enhances usability for all users.
6. it is morally the right thing to do.13

lazar, dudley-sponaugle, and greenidge validated that most blind users are just as impatient as most sighted users: they want to get the information they need as quickly as possible, and they don't want to listen to every word on the page, just as sighted users do not read every word.14 similarly, foley found that using automated validation tools did not ensure complete accessibility; students with low vision found many of the pages hard to use even though they were validated.15

outcomes of all the research revealed that most university library websites have developed a policy on website accessibility, but the policies of most universities had deficiencies.16 library staff
in 2018, achecker accessibility checker was used for both section 508 and wcag 2.0 aa. the access board published new requirements for information and communication technology covered by section 508 of the rehabilitation act (https://www.access-board.gov/guidelines-andstandards/communications-and-it/about-the-ict-refresh) on january 18, 2017. the latest wcag 2.0 guidelines were updated on september 5, 2013 (https://www.w3.org/tr/wcag2ict/). while the wave development team indicated that they have updated the indicators in wave regarding wcag 2.0, the current indicators regarding section 508 refer to the previous technical standards for section 508, not the updated 2017 ones. according to achecker.ca, the versions of the section 508 standards and wcag 2.0 aa guidelines used were published on march 12, 2004 and june 19, 2006, respectively, with neither being the latest versions. this study centered on three research questions: https://www.access-board.gov/guidelines-and-standards/communications-and-it/about-the-ict-refresh https://www.access-board.gov/guidelines-and-standards/communications-and-it/about-the-ict-refresh https://www.w3.org/tr/wcag2ict/ information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 5 1. are the library websites of the eight ivy league universities ada compliant? 2. are there easily identified issues that present barriers to access for the visually impaired on the ivy league university library homepages? 3. what should ivy league libraries do to achieve ada compliance and to maintain it? table 1. investigated websites of ivy league university libraries. library website address brown university library https://library.brown.edu columbia university libraries http://library.columbia.edu cornell university library https://www.library.cornell.edu dartmouth library https://www.library.dartmouth.edu harvard library http://library.harvard.edu princeton university library http://library.princeton.edu penn libraries http://www.library.upenn.edu yale university library https://web.library.yale.edu results & discussion all five evaluation categories employed by wave for section 508 standards, as shown in figure 1, were examined, with a more in-depth review of the homepage of the university of pennsylvania library. similar results in numbers of the five categories are presented in the library homepages of brown university, columbia university, and cornell university. interestingly, wave indicates more errors and alerts on the homepage of yale university. figure 1. wave results for section 508 standards. in order to determine the accuracy of the results, the team also used achecker to reevaluate these homepages in the year 2018. known problems as the category in achecker are as serious as errors in wave. they have been identified with certainty as accessibility barriers by the website information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 6 evaluators and need to be fixed. likely problems are problems that could be barriers which require a human to decide whether there is a need to fix them. achecker cannot identify potential problems and requires a human to confirm if identified problems need remediation. figure 2 shows the numbers for each category as detected by achecker on june 18, 2018, on the eig ht ivy league university libraries’ homepages. the library homepage of the university of pennsylvania was found to contain the most, which was the same as the result from wave. 
however, among the seven remaining libraries' homepages, achecker found the harvard library homepage to display the same number of problems as the university of pennsylvania's.

figure 2. achecker results for section 508 standards.

there was significant improvement between 2014 and 2018

the wave results for section 508 standards signify a significant shift in the accessibility of these websites between 2014 and 2018. among the five wave detection categories on the eight library homepages, the totals of errors and alerts decreased during this period: the total number of errors fell from 36 in 2014 to 11 in 2018, and the number of alerts decreased from 141 to 14. figure 3 shows the number of errors on each library homepage, and figure 4 shows the number of alerts; both show a downward trend from 2014 to 2018. features, structural elements, and html/aria, however, were all on the rise when comparing the two years' data sets. in table 2, values that decreased from 2014 to 2018 (shaded green in the original) and values that increased (shaded yellow) can both be found, though all three category totals rose. these results revealed that errors and alerts, the most common problems related to access, had been better controlled during these years, while the other categories had not improved.

figure 3. change of errors from 2014 to 2018.

figure 4. change of alerts from 2014 to 2018.

table 2. changes of features, structural elements, and html/aria between 2014 and 2018 (values are 2014 / 2018 counts).

library | features | structural elements | html/aria
total | 108 / 191 | 184 / 233 | 24 / 89
brown university library | 13 / 15 | 6 / 13 | 0 / 1
columbia university libraries | 12 / 13 | 23 / 14 | 17 / 0
cornell university library | 5 / 6 | 20 / 18 | 0 / 4
dartmouth library | 10 / 8 | 15 / 27 | 0 / 23
harvard library | 20 / 20 | 14 / 24 | 0 / 4
princeton university library | 15 / 31 | 45 / 24 | 0 / 3
penn libraries | 12 / 90 | 29 / 104 | 7 / 50
yale university library | 21 / 8 | 32 / 9 | 0 / 4

missing form labels were the top error against the ada

the data used in the analysis below were all test data collected in 2018. all errors appearing in the results were collected and analyzed. figure 5 shows the number of errors identified for the specific requirements contained in section 508 of the rehabilitation act, as evaluated by wave.

figure 5. occurrences of each specific error per section 508 standard.

the term error refers to accessibility errors that need to be fixed. missing form label was the most frequent error type. only two types of errors occurred on the ivy league university libraries' homepages, and they did not appear on every homepage: some homepages had several errors while others had none. for example, linked image missing alternative text occurred twice on the harvard library homepage. table 3 shows the distribution of errors across the eight homepages.

table 3. distribution of errors in eight homepages (0 indicates no errors of that type).

library | missing form label | linked image missing alternative text
brown university library | 0 | 0
columbia university libraries | 1 | 0
cornell university library | 0 | 0
dartmouth library | 3 | 0
harvard library | 0 | 2
princeton university library | 0 | 0
penn libraries | 1 | 0
yale university library | 4 | 0

missing form label is listed in section 508 (n) and means there is a form control without a corresponding label.
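the markup pattern behind this error, and two common remediations, can be sketched with invented fragments (these are hypothetical examples, not code from any of the eight homepages studied):

```python
# invented markup for illustration; requires: pip install beautifulsoup4
from bs4 import BeautifulSoup

flagged = """
<form action="/search">
  <input type="text" name="q">            <!-- no label: flagged -->
</form>"""

remediated = """
<form action="/search">
  <label for="q">search the catalog</label>
  <input type="text" name="q" id="q">     <!-- explicit <label for> -->
  <input type="search" name="q2"
         aria-label="search the catalog"> <!-- non-visual accessible name -->
</form>"""

for name, markup in (("flagged", flagged), ("remediated", remediated)):
    soup = BeautifulSoup(markup, "html.parser")
    labeled_ids = {lab.get("for") for lab in soup.find_all("label")}
    for ctl in soup.find_all("input"):
        ok = ctl.get("id") in labeled_ids or ctl.get("aria-label")
        print(f"{name}: input name={ctl.get('name')!r} ->",
              "labeled" if ok else "missing form label")
```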
a properly associated text label matters because, without one, the function or purpose of a form control may not be presented to screen reader users. linked image missing alternative text occurred only on the harvard library homepage among the eight. it indicates that an image without alternative text results in an empty link: if an image within a link does not provide alternative text, a screen reader has no content to present to the user regarding the function of the link. these accessibility issues may be easy fixes and considered minor by some; however, if they are not detected, they are major barriers for persons living with low vision or blindness, leaving users at a disadvantage because they lack critical information to successfully fulfill their needs. examples of such error icons in wave are displayed in figures 6 and 7.

figure 6. missing form label icon from yale university library homepage.

figure 7. linked image missing alternative text icon from harvard library homepage.

a total of eleven errors were located on the homepages of the eight ivy league libraries; figure 8 illustrates the number of errors that occurred on each library homepage. the average number of errors per homepage was 1.375. the yale university library homepage had the most errors, with a total of four. the library homepages of brown university, cornell university, and princeton university performed best, with zero errors.

figure 8. the total of errors in ivy league libraries' homepages.

six alerts appear among ada requirements

the issues that alerts identify are also significant for website accessibility. figure 9 shows the six different kinds of alerts that were identified based on the specific requirements contained in section 508 of the rehabilitation act.

figure 9. occurrences of each specific alert per section 508 standard.

the noscript element was the most frequently encountered alert issue. alerts that wave reports need close scrutiny, because they likely represent an end-user accessibility issue. the noscript alert relates to the 508 (l) requirement and means a noscript element is present; its content is rendered only when javascript is disabled. because almost all users of screen readers and other assistive technologies have javascript enabled, noscript cannot be used to provide an accessible version of inaccessible scripted content. skipped heading level ranked second in number. the importance of headings is in their provision of document structure and facilitation of keyboard navigation for users of assistive technology; these users may be confused or may experience difficulty navigating when heading levels are skipped. examples of the icons wave uses for these alerts are shown in figures 10 and 11.

figure 10. noscript element icon from cornell university library homepage.

figure 11. skipped heading level icon from dartmouth library homepage.
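the skipped-heading condition shown in figure 11 can likewise be checked mechanically. a minimal sketch, again illustrative rather than wave's own rule set, collects the heading levels in document order and flags any jump of more than one level:

```python
# illustrative skipped-heading-level check; requires beautifulsoup4
from bs4 import BeautifulSoup

def skipped_headings(html: str):
    """yield (previous, current) level pairs wherever a level is skipped."""
    soup = BeautifulSoup(html, "html.parser")
    levels = [int(h.name[1]) for h in
              soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])]
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:        # e.g., an h2 followed directly by an h4
            yield prev, cur

sample = "<h1>library</h1><h2>hours</h2><h4>holiday hours</h4>"
print(list(skipped_headings(sample)))    # -> [(2, 4)]
```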
a total of fourteen alert problems were detected; figure 12 illustrates the number of alerts that occurred on each library homepage. on average, there were 1.75 alerts per homepage across the eight websites. the library homepages of yale university and the university of pennsylvania had the most alerts, with four on each site. only the brown university library homepage had zero alerts.

figure 12. the total of alerts in ivy league libraries' homepages.

linked image with alternative text was the most frequently found feature issue

features, as a category of issues, indicate conditions of accessibility that probably need to be improved and usually require further verification and manual fixing: when a feature is detected on a website, further manual verification is required to confirm its accessibility. figure 13 shows the number of features identified based on the specific requirements contained in section 508 of the rehabilitation act.

figure 13. occurrences of specific features per section 508 standard.

linked image with alternative text, which is a 508 (a) requirement, was the most frequently encountered feature issue. it means alternative text is present for an image within a link; by including appropriate alternative text on an image within a link, the function and purpose of the link and the content of the image are available to screen reader users even when images are unavailable. another frequently occurring feature was form label, which means a form label is present and associated with a form control; a properly associated form label is presented to a screen reader user when the form control is accessed. the evaluation steps were the same ones used for errors and alerts. example icons of features evaluated by wave are displayed as figures 14 and 15.

figure 14. linked image with alternative text icon from brown university library homepage.

figure 15. form label icon from penn libraries homepage.

this study also ranked the number of features detected by wave on the eight ivy league library homepages. figure 16 displays the number of features that occurred on each library homepage. in total, 191 features were detected by wave on the eight homepages. the homepage of the university of pennsylvania library was found to have 90 features, by far the most of all the libraries. no library was entirely free of features according to the wave measurement using section 508 standards.

figure 16. the total of features in ivy league libraries' homepages.

table 4a. comparison between wave & achecker section 508 standards on brown and columbia's library homepages (values are april / june 2019 counts; blank cells indicate none detected).

508 standard | brown wave | brown achecker | columbia wave | columbia achecker
total | 33 / 29 | 47 / 47 | 28 / 29 | 79 / 83
a | 9 / 9 | 9 / 9 | 12 / 13 | 12 / 14
c | | 14 / 14 | | 26 / 28
d | | 8 / 8 | | 14 / 14
j | | 8 / 8 | | 14 / 14
l | | 6 / 6 | | 12 / 12
n | 1 / 1 | 1 / 1 | 1 / 1 |
o | 23 / 19 | 1 / 1 | 15 / 15 | 1 / 1
b, e, f, g, h, i, k, m, p | | | |

table 4b. comparison between wave & achecker section 508 standards on cornell and dartmouth's library homepages (same layout as table 4a).
508 standard | cornell wave | cornell achecker | dartmouth wave | dartmouth achecker
total | 30 / 29 | 107 / 106 | 59 / 68 | 65 / 67
a | 2 / 2 | 2 / 2 | 8 / 8 | 10 / 11
c | | 36 / 36 | | 22 / 23
d | | 32 / 32 | | 9 / 9
j | | 33 / 32 | | 9 / 9
l | | 3 / 3 | | 7 / 7
n | 7 / 7 | | 23 / 29 | 8 / 8
o | 21 / 20 | 1 / 1 | 28 / 31 |
b, e, f, g, h, i, k, m, p | | | |

table 4c. comparison between wave & achecker section 508 standards on harvard and princeton's library homepages (same layout as table 4a).

508 standard | harvard wave | harvard achecker | princeton wave | princeton achecker
total | 51 / 51 | 139 / 139 | 57 / 61 | 74 / 74
a | 20 / 20 | 29 / 29 | 25 / 25 | 20 / 20
c | | 43 / 43 | | 32 / 32
d | | 32 / 32 | | 10 / 10
j | | 34 / 34 | | 10 / 10
l | | | | 1 / 1
n | 5 / 5 | | 3 / 7 |
o | 26 / 26 | 1 / 1 | 29 / 29 | 1 / 1
b, e, f, g, h, i, k, m, p | | | |

table 4d. comparison between wave & achecker section 508 standards on pennsylvania and yale's library homepages (same layout as table 4a).

508 standard | penn wave | penn achecker | yale wave | yale achecker
total | 253 / 249 | 129 / 139 | 28 / 29 | 84 / 85
a | 40 / 37 | 14 / 19 | 6 / 7 | 4 / 5
c | | 82 / 87 | | 28 / 28
d | | 11 / 11 | | 21 / 21
g | | | | 1 / 1
j | | 11 / 11 | | 21 / 21
l | 1 / 1 | 9 / 9 | 3 / 3 | 4 / 4
m | 3 / 2 | | |
n | 103 / 104 | 1 / 1 | 8 / 8 | 4 / 4
o | 106 / 105 | 1 / 1 | 11 / 11 | 1 / 1
b, e, f, h, i, k, p | | | |

a few 508 standards deviate from comparison between two evaluators

to determine whether the wave tool missed some specific requirements in section 508, the authors comparatively examined the eight university homepages using both wave and achecker, site by site, in april and again in june 2019. there are sixteen principles in section 508, arranged from a to p. tables 4a–4d indicate the issues found for these section 508 requirements on the eight universities' homepages. except for requirement g, for which achecker shows one issue on the yale library homepage, neither wave nor achecker found any issue during our examination for the seven requirements below (b, e, f, h, i, k, and p):

b. equivalent alternatives for any multimedia presentation shall be synchronized with the presentation;
e. redundant text links shall be provided for each active region of a server-side image map;
f. client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape;
h. markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers;
i. frames shall be titled with text that facilitates frame identification and navigation;
k. a text-only page, with equivalent information or functionality, shall be provided to make a website comply with the provisions of this part, when compliance cannot be accomplished in any other way. the content of the text-only page shall be updated whenever the primary page changes;
p. when a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required.

the results tabulated in tables 4a–4d indicate that these seven section 508 requirements are perhaps not problematic for these websites.

conclusions

based on the results, this study determined that the eight ivy league universities' homepages exhibited some issues with accessibility for people with disabilities. considerable effort is necessary to ensure their websites are ready to meet the challenges and future needs of web accessibility.
users with visual impairments can navigate a website only when it is designed to be accessible with assistive technology. while each institution presented both general and comprehensive coverage of services for users with disabilities, it would have been more practical and efficient if specific links were posted on the homepage. according to the american foundation for the blind (https://www.afb.org), "usability" is a way of describing how easy a website is to understand and use; accessibility refers to how easily a website can be used, understood, and accessed by people with disabilities.

this study has concluded that expertise and specialized training and skill are still needed in this area. principles of accessible website design must be introduced and taught, underscoring that design matters for people with disabilities just as it does in the physical environment. as highlighted earlier through the evaluation tool wave, most of the problems detected can be fixed with the solutions provided. frequent review is critical, and websites should be assessed for accessibility compliance at a minimum on a yearly basis. there is much to be done if accessibility is to be realized for everyone.

limitations

the authors recognize that this study, using free website accessibility testing tools, has certain limitations. as wave remarks in its help pages, the aim for website developers is not to get rid of all identified problem categories, only the errors that need to be fixed, but to determine whether a website is accessible. at the time of writing, neither wave nor achecker had been updated with the latest wcag 2.1 aa rules. while wcag 2.1 is expected to provide new guidelines for making websites even more accessible, more careful and comprehensive studies against the wcag 2.1 aa rules could further assist university library professionals and their website developers in providing those with disabilities with accessible websites. moreover, while it is effective to conduct these machine-generated evaluations, it is equally important that researchers check the issues manually, applying human analysis to determine the major issues with content.

endnotes

1 joan m. reitz, odlis: online dictionary for library and information science (westport, ct: libraries unlimited, 2004), 1–2.
2 darlene fichter, "making your website accessible," online searcher 37, no. 4 (2013): 73–76.
3 fichter, "making your website accessible," 74.
4 axel schmetzke, web page accessibility on university of wisconsin campuses: a comparative study (stevens point, wi, 2019).
5 jeffrey rubin and dana chisnell, handbook of usability testing: how to plan, design, and conduct effective tests (idaho: wiley, 2008), 6–11.
6 alan foley, "exploring the design, development and use of websites through accessibility and usability studies," journal of educational multimedia and hypermedia 20, no. 4 (2011): 361–85, http://www.editlib.org/p/37621/.
7 michael providenti and robert zai iii, "web accessibility at kentucky's academic libraries," library hi tech 25, no. 4 (2007): 478–93, https://doi.org/10.1108/07378830710840446.
8 lisa billingham, "improving academic library website accessibility for people with disabilities," library management 35, no. 8/9 (2014): 565–81, https://doi.org/10.1108/lm-11-2013-0107.
9 tatiana i. solovieva and jeremy m. bock, "monitoring for accessibility and university websites: meeting the needs of people with disabilities," journal of postsecondary education and disability 27, no. 2 (2014): 113–27, http://search.proquest.com/docview/1651856804?accountid=9744.
10 stephanie l. maatta smith, "web accessibility assessment of urban public library websites," public library quarterly 33, no. 3 (2014): 187–204, https://doi.org/10.1080/01616846.2014.937207.
11 yan quan liu, arlene bielefeld, and peter mckay, "are urban public libraries' websites accessible to americans with disabilities?," universal access in the information society 18, no. 1 (2019): 191–206, https://doi.org/10.1007/s10209-017-0571-7.
12 liu, bielefeld, and mckay, "are urban public library websites accessible."
13 mary frances theofanos and j. redish, "bridging the gap: between accessibility and usability," interactions 10, no. 6 (2003): 36–51, https://doi.org/10.1145/947226.947227.
14 jonathan lazar, a. dudley-sponaugle, and k. d. greenidge, "improving web accessibility: a study of webmaster perceptions," computers in human behavior 20, no. 2 (2004): 269–88, https://doi.org/10.1016/j.chb.2003.10.018.
15 foley, "exploring the design," 365.
16 david a. bradbard, cara peters, and yoana caneva, "web accessibility policies at land-grant universities," internet & higher education 13, no. 4 (2010): 258–66, https://doi.org/10.1016/j.iheduc.2010.05.007.
17 mary cassner, charlene maxey-harris, and toni anaya, "differently able: a review of academic library websites for people with disabilities," behavioral & social sciences librarian 30, no. 1 (2011): 33–51, https://doi.org/10.1080/01639269.2011.548722.
18 liu, bielefeld, and mckay, "are urban public library websites accessible," 195.

product ownership of a legacy institutional repository: a case study on revitalizing an aging service

mikala narlock and don brower
information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13241

mikala narlock (mnarlock@nd.edu) is digital collections strategy librarian, university of notre dame. don brower (dbrower@nd.edu) is digital projects lead, university of notre dame. © 2021.

abstract

many academic libraries have developed and/or purchased digital systems over the years, including digital collection platforms, institutional repositories, and other online tools on which users depend.
at hesburgh libraries, as at other institutions, some of these systems have aged without strong guidance, resulting in stale services and technology. this case study will explore the lengthy process of stewarding an aging service that satisfies critical external needs. starting with a brief literature review and institutional context, the authors will examine how the current product owners have embraced the role of maintainers, charting a future direction by defining a clear vision for the service, articulating firm boundaries, and prioritizing small changes. the authors will conclude by reflecting on lessons learned and discussing potential future work, both at the institutional and professional level.

introduction

our home-grown institutional repository (ir) began almost a decade ago with enthusiasm and promise, driven by an eagerness to meet as many use cases as possible. over time, the code grew unwieldy, personnel transitioned into new roles, and new priorities emerged, leaving few individuals to manage the repository, allocate resources, articulate priorities, or advocate for user needs. this in turn left the system underutilized and undervalued. in mid-2019, two product owners (pos) at hesburgh libraries, university of notre dame were named to oversee the service and tasked with determining how the service should continue, if at all. the pos began by evaluating the service, current commitments, and benefits, and identifying potential on-campus adopters of the service. after agreeing the service should continue, the pos started the lengthy process of turning the metaphorical ship, prioritizing modest adjustments that would have large payoffs.1

selected literature review

since the 2003 seminal article by clifford lynch, much has been authored on the topic of institutional repositories as academic libraries and archives have flocked to create their own.2 a complete literature review is beyond the scope of this case study: institutional repositories have contended and continue to contend with a wide variety of challenges, including legal, ethical, and socio-technical ones.3 while the lessons presented in this case study can apply to a wide variety of legacy services, a brief overview of some of the literature surrounding irs is crucial to understanding the challenges the authors were presented with as product owners.
broadly defined "as systems and service models designed to collect, organize, store, share, and preserve an institution's digital information or knowledge assets worthy of such investment," libraries and archives flocked to build the "essential infrastructure for scholarship in the digital age."4 operating under the assumption that faculty members would flock to the service to deposit their works, irs were promised to solve many problems, including supporting open access publishing and digital asset management.5 as articulated by dorothea salo, however, the field of dreams model ("build it and they will come") was insufficient, as repositories often failed to meet changing user needs and expectations while heavily employing library jargon that was foreign to faculty members.6 moreover, as identified by kim, some irs struggle even to be known to their users, while also grappling with concerns of trust.7 other problems that have plagued repositories include limited adoption rates, restricted resources to support digitization of analog materials for faculty who work in both analog and digital media, failing support from fellow library colleagues, and inconsistent and incomplete metadata.8 salo warned more than a decade ago that high-level library administrative support would be necessary to empower repository managers to enact lasting and substantive change, and recent studies echo these concerns.9

libraries have slowly started to serve faculty on their terms, such as by creating automated processes for populating irs, streamlining content deposits, experimenting with metadata harvesting features to provide increased access, and building more tools that integrate directly with the research lifecycle.10 however, these new technologies and services may be out of reach for many institutions. in addition to limited resources, some institutions are grappling with a legacy system that is incompatible with newer code, leaving them in a feature desert, reliant on aging technology and cumbersome deposit processes.11 moreover, even at an institution where resources might be more readily available for licensing or purchasing newer technology, early forks of open-source code or otherwise deprecated components might make migration to newer platforms extremely difficult, if not impossible, without extensive infrastructure improvements.

lastly, as libraries grappled with some of the issues mentioned above and options for repositories continued to proliferate, many institutions struggled to clearly articulate boundaries around their digital library holdings. confusion between digital collections, scholarly content, e-resources, and other digital materials resulted in some institutions having too many options for storing content, leaving internal and external stakeholders confused as to where to discover and distribute materials; conversely, other institutions have few options, and a wide variety of content is pigeonholed imperfectly into a single repository.12 in both situations, developing repositories with vague content scopes can be exceedingly difficult, as a restrictive scope can stifle development, while an overly inclusive approach results in too many use cases and competing stakeholder interests to effectively prioritize feature development.
local context

our institutional repository at the university of notre dame, managed by hesburgh libraries employees, suffered from many problems that affected our locally built code: limited adoption and awareness on campus; aging technology that made adding new features a monumental, if not impossible, task; and an overly broad scope (with a simultaneous proliferation of other digital collection tools). while the detailed history of this repository is beyond the scope of this paper, a brief overview of its development provides critical context. additionally, the technical details and implementation particulars will not be discussed, as this case study transcends specific software frustrations and will resonate with many institutions regardless.

in 2012, after a failed attempt to launch a repository in the early 2000s, consortial development of our ir began in an open-source community. in 2014, an early implementation of the product was envisioned as a unified digital library service that would support many different stakeholders. this included a plan for a single location for researchers to share their scholarly work, research outputs, and research data, as well as for the university libraries to provide access to digitized rare and archival holdings. as development continued on the homegrown service, features were implemented to serve the numerous purposes mentioned above. this included components of an institutional repository, such as a self-deposit interface, customizable access levels, and a proof-of-concept researcher profile system. over time, support for browsing digital collections was added, namely the development of the work type "collection," which allowed curators to create a landing page for their collection and customize it with a representative image. development continued in a somewhat sporadic fashion, often aligning at the intersection of "what is easy?" and "what is needed?" as technical staff continued growing the open-source code. as content was added to the system, stemming from special collections, various campus partners, and electronic thesis and dissertation (etd) deposits, additional use cases emerged and were added to the scope of the repository. the system quickly grew cumbersome and difficult to work with.

in short, the repository struggled with the challenges of many open-source technologies. the struggle was compounded by decreasing resources, an overly inclusive scope, limited adoption—both with external faculty as well as library faculty and staff—and consortial development that introduced features extraneous to local campus needs. while our repository did many different things, it failed to do any one well. after falling short of meeting the expectations for digital collections, particularly with regards to browsing and displaying objects, the library applied for, and received, a three-year mellon grant.13 this grant, a collaboration with the snite museum of art, university of notre dame, was initially sought to improve upon the existing repository and to build the infrastructure necessary to support the online display of collective cultural heritage materials and facilitate serendipitous discovery for patrons.
however, soon into the grant, it became clear that creating an entirely new system for digital collections would be not only easier to build and maintain, but also better suited to meet the specific needs of digital collections as articulated by campus partners.

first things first: what is our ir?

around the same time this shift was announced, two individuals were appointed to serve as product owners (pos) of the repository. while exact duties vary between institutions, pos are responsible for liaising with users, managing the product backlog, directing development, communicating with a wide variety of stakeholders, resolving issue tickets, and guiding the overall direction of the product.14 the pos were tasked with making this amorphous, oft-critiqued service usable while dealing with uncertain resources and competing institutional priorities. with the change in grant objectives mentioned above, namely the desire to develop a new repository instead of contending with the legacy code, the option was presented to retire the repository and direct users to other systems that could sufficiently meet their needs, such as discipline-specific repositories, general-purpose repositories, or even online cloud storage. the pos recognized that continuing the system due solely to sunk costs was a fallacy: if the service was too cumbersome to maintain with even nominal use, the return on investment would be abysmal and would ultimately prevent the library from investing resources more appropriately.

in order to evaluate the service, the pos considered active commitments and ongoing partnerships tied to the service. in particular, several centers and departments on campus had utilized the system to capture citations and demonstrate their impact. additionally, after conversations with library liaisons, it became apparent that there was great value in providing the campus with a discipline-agnostic repository that allows deposition of, provides access to, and preserves scholarly outputs that might otherwise be lost. while the pos recognized that faculty adoption or even awareness of the service was limited, they realized there were several campus-specific features that were useful to local champions, including flexible access controls at the record and file levels, as well as a customized etd workflow that served the graduate school, internal technical services, and the students and faculty required to interact with the system. acknowledging that the system and related services were still critical, the pos prioritized making sure the system remained useful: maintaining the legacy repository would cost valuable time and resources and would need to overcome the resentment that many internal stakeholders had developed over the years.

after deciding the system was worth maintaining, it was necessary to explicitly narrow the scope of the service, which had broadened over time in an ad hoc manner: as other services were turned off, leaving various digital content to find a new location, our institutional repository was often leveraged to host the content, even when support for the needs of niche content was poor at best. when considering the future of the repository, several key use cases emerged, including the etd support provided to the graduate school as mentioned above.
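the record- and file-level access controls noted above can be pictured with a small, hypothetical model; the level names and the rule below are illustrative only and do not describe the actual notre dame implementation:

```python
# hypothetical sketch of record- and file-level access controls;
# the level names and the combining rule are illustrative, not the
# actual system's.
from dataclasses import dataclass, field

LEVELS = {"public": 0, "campus": 1, "restricted": 2}  # least to most strict

@dataclass
class FileAsset:
    name: str
    access: str = "public"

@dataclass
class Record:
    title: str
    access: str = "public"                 # record-level default
    files: list = field(default_factory=list)

    def effective_access(self, f: FileAsset) -> str:
        # a file can be more restricted than its record, never less
        return max(self.access, f.access, key=LEVELS.get)

etd = Record("sample dissertation", access="campus",
             files=[FileAsset("dissertation.pdf"),
                    FileAsset("appendix.pdf", access="restricted")])
for f in etd.files:
    print(f.name, "->", etd.effective_access(f))
# dissertation.pdf -> campus
# appendix.pdf -> restricted
```

in a model like this, a record can be visible broadly while individual files remain more restricted, which is one way to read the flexibility that local champions valued.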
while the service had done many things acceptably, its strength was in the support for scholarship: the customized access levels, self-deposit interface, and robust preservation capabilities were frequently lauded as the highlights of the service to internal and external stakeholders. these considerations, combined with the eventual migration of digitized rare and unique materials to the new mellon-funded platform, resulted in rebranding and redefining the service as exclusively focused on scholarly outputs. with the goal of best supporting the teaching and research mission of the university, the directional force became how to (re)build the service as a trusted, useful, and integral repository for campus scholars to provide access to their research outputs.

mission (and vision) critical

operating under the guiding principles of usefulness, usability, and transparency, the first task after redefining and rearticulating the scope of the service was to keep the service operational. however, recognizing that maintenance alone, while critical, would not lead to an enhanced reputation on campus, it was important to continue charting a forward direction. the product owners were given the freedom to articulate their ideal mission statement. to complement the vision of the repository as both trusted and integral, the pos further defined the mission statement in three key areas: to increase the impact of campus researchers, scholars, and departments; to advance new research by facilitating access to scholarship in all forms; and to serve as active campus partners in the research lifecycle.

while these statements are far from innovative or revolutionary, crafting them was essential for moving the service forward. in fact, these sentences were carefully crafted over the course of a month, during which time the product owners drafted the language, compared it with that of peer and aspirational peer institutions, and solicited feedback from trusted internal colleagues before sharing it more broadly. this time-consuming process was critical for success: with the knowledge that these words would serve as the foundation for prioritizing feature requests and advocating for resources, the pos wanted to establish both the repository and themselves in their new role. this clarity in mission was also important for grappling with the legacy emotional and mental frustrations that lingered toward the system, as the pos now had a strong, unified foundation from which to advocate for resources and the service as a whole. relatedly, these mission and vision statements provided critical and consistent talking points, which were leveraged in presentations to internal stakeholders, provided to librarians as messaging for liaison faculty, and useful in short communications to teaching professors, research faculty, and department administrators.

clear and present boundaries

in rebranding the repository, it also became clear that firm boundaries would be instrumental in attaining success. in addition to narrowly focusing feature development on supporting research and scholarly outputs, the pos also scaled back goals for adoption, intentionally excluded digital collection features, and identified features that were patently unattainable in the short term.
the repository was often seen as a failure locally due to limited adoption and an incomplete record of the academic outputs of campus, reflecting concerns about irs more generally.15 combatting this narrative required a clear articulation and acceptance of the fact that the institutional repository, regardless of how seamlessly integrated or easy to use, would never be absolutely comprehensive or the authoritative record of our researchers and scholars. with limited resources and a technical infrastructure into which it is difficult to incorporate automatic harvesting mechanisms, any effort to make the repository comprehensive would be impractical, unrealistic, and a waste of limited resources. instead, by focusing efforts on making the repository useful and refraining from making it yet another requirement for an already overwhelmed faculty member or graduate student, the service could be improved to meet the unique needs of campus faculty, serving as a more viable option for those who need it.16 similarly, because there was less concern with filling the repository and increasing usage statistics, and more concern with what the patron needs, the pos have been able to develop robust partnerships with stakeholders, leading to champions in research centers, labs, departments, and other administrative units across campus. this has helped scholars demonstrate the impact of their work, which in turn led to more partnerships with other campus centers, as champions began to advocate for the service to colleagues facing similar challenges across the university. in this way, decreasing the effort to fill the repository has actually increased holdings and driven more traffic to the site: by focusing on useful offerings and decreasing the burden on ourselves to create a comprehensive research repository, the pos have been able to prove the value of a discipline-agnostic approach to internal and external stakeholders.

an additional, and extremely beneficial, boundary was intentionally excluding library-owned digital collections from the repository's collecting and feature-development scope. the pos received little pushback from internal users on this change: the repository had been the de facto scholarly and research repository for nearly five years, and it was patently clear that supporting digital collections had been more of an afterthought, with limited features built to support curators and users in creating and interacting with rare and archival materials. in fact, internal colleagues supported this change wholeheartedly, as the pos volunteered to continue providing access to the extant digital content in the ir as the mellon grant-funded site was built. while this direction had already been understood by individuals across the organization, it was helpful to clearly articulate the new boundaries in open forums for internal stakeholders, communication through a library-wide listserv, and repetition in smaller meetings. by articulating this new boundary clearly and repeating it frequently in different methods of communication, the pos had the authority to reject feature requests that were explicitly in support of rare and archival materials.
with a clear focus on collecting and providing access to scholarly and research outputs, niche metadata fields, advanced browsing features, and robust collection landing pages were identified as unnecessary, as they were scoped for the mellon-funded platform, and internal colleagues quickly embraced this boundary.

the final, crucial boundary, also related to feature requests, was to clearly define requests that were impossible to accommodate in the current technical infrastructure. as mentioned earlier, the pos focused first on maintenance: by updating code, critically evaluating the service and existing commitments, and charting a future direction, the pos could more effectively steward the project. this also meant revisiting previous feature requests, and even technical promises, in order to set more reasonable expectations of what the service would, and would not, be able to support in the coming years. with limited resources, advanced features such as research profiles—a frequent request from internal allies—were beyond the current capabilities of the aging technical stack. moreover, a feature-rich repository would be essentially useless if users' basic expectations were left unmet: a cumbersome deposit interface, limited upload support, and confusing language throughout the site were more pressing issues, as they prevented users from engaging with the site for any amount of time. by resolving these limitations and generating awareness of the repository, the pos could better serve not only current campus partners but also future users, as an increase in adoption and use would lead to more resources to develop advanced features. instead of planning a new outfit for the proverbial patient, it was more important to stop the bleeding.

by adopting firm boundaries, the pos were able to scope developer work, prioritize maintenance and modest feature development, and even deny implementation of previously requested features that were no longer relevant to the repository or would be unattainable in the coming years. the pos could explicitly drop support for unused services, allow other services to limp along, and improve existing strengths. this further helped to clarify messaging about the service and garner more support from campus partners; instead of a malleable system that fits too many roles in a limited capacity, the pos could clearly state how the repository offers support and win users from across campus.

small changes, big rewards

the last critical component of rebranding and revitalizing the institutional repository was the conscious decision to implement incremental improvements instead of large, sweeping changes. in particular, there were known frustrations with the service that were easy to start working on while the product owners expanded the user base and sought additional user feedback. small changes to the user interface, including the addition of use metrics and color-coded access tags, received immediate attention and positive feedback from key stakeholders. additionally, over the numerous years of development, many projects to improve the repository had stalled for various reasons. by either prioritizing the work necessary to complete a project or accepting the sunk costs and clearing the backlog for other projects, the technical development team could build momentum, completing projects and clearing mental space for new, exciting endeavors.
with limited resources on hand, maximizing the return on investment also included an emphasis on securing and keeping internal and external champions. due to the limited outreach conducted early in the system's existence, as well as the mediocre service offerings, many campus users were unaware of the tool, and a few were using the repository in a somewhat limited fashion. in order to build support for the service, it was critical that key users of the repository received targeted support and outreach efforts. a primary example of this was an imaging facility on campus: this unit provided a critical service to campus, yet had difficulty showing the impact of its work, as many faculty members did not cite the team in publications. the facility slowly began collecting citations manually, but still struggled to publicly advertise its capabilities and show the fruits of its labor. it solved this problem by loading citation records into the repository, which became the single location where any interested faculty, staff, and students could look to see the full output of the center. while the facility was using the repository in a somewhat different manner than anticipated, it found the system useful and was actively directing other campus centers and institutes to the repository for similar support. in conversations with the facility, it became clear that a few modest changes would streamline workflows and alleviate some cumbersome burdens. with this concentrated outreach and a minimal amount of development, the repository secured a champion that continues to advocate for the service to colleagues across campus.

lastly, prioritizing maintenance and paying down technical debt was critical for moving the repository forward. many software dependencies had fallen behind by several major version updates, making it difficult to add new features or consider potential migration paths to future technical solutions. while the amount of technical debt to be paid was substantial, by prioritizing a small amount of maintenance every month, the development team quickly caught up, thereby improving the overall performance of the site and providing the product owners with the flexibility to consider future technical implementations and key features to continue recruiting users.

lessons learned and future work

moving forward, the product owners are embracing the role of maintainers. in specific reference to repositories, that includes "repairing, caring for and documenting a wide variety of knowledge systems beyond irs to facilitate access and optimize user experience."17 the work of critically evaluating commitments, establishing clear boundaries, and reaffirming the mission of the repository is useful on a recurring basis and will need to continue as the repository ages. maintaining the technical infrastructure as appropriate and conducting user experience testing to improve the service will be critical to ensuring the long-term success of the repository and the information contained therein.

beyond the stewardship and small improvements required for maintaining the service, there is the opportunity to reconsider the role of the institutional repository, both at the local level and within the academic community. by prioritizing usefulness over comprehensiveness, the product owners made great strides in making the service accessible to patrons and actually usable.
when considering the future of repositories, specifically through a lens of usefulness, it is critical to consider how future work will best serve faculty needs without overburdening librarians. adding pos who examine how a service will be used, and what will promote the mission of the library, reframes a repository from being a piece of technology to being a source of interconnections. scholarship usually requires a level of technology different from what most campus it departments can provide: research does not usually just deal in urls, it requires dois and persistent identifiers; files are not just backed up, but are preserved (an active process that requires consideration of how computing will change over the coming decades). not only is a library a place to go to look for data, but it is also a place that can help publish and deposit items, providing valuable services to connect researchers to tools and platforms that facilitate research. this is an area of service that libraries and repositories can provide.

in the relationship between libraries and technologies, innovation and maintenance, one clear challenge was the amount of emotional labor necessary to revitalize a service. the pos spent a large portion of time apologizing for previous failures, managing expectations by scaling back previous promises, and grappling with the current technical shortcomings of the service. while this is, at least in part, the role of the pos, the phenomenon of controlling expectations and handling the emotional debt that comes with broken promises and failed technologies is not localized to hesburgh libraries. in libraries especially, this work tends to fall to women, who are forced to be the middle ground between technology and patron-facing librarians.18 while embracing the term "product owner" has helped to make visible and valuable the labor invested, especially that which might otherwise be overlooked, libraries writ large still need to contend with the gender divide plaguing the seeming dichotomy between innovation and maintenance.19 in fact, as libraries continue to build new technologies and support innovative research, the role of product owners in managing legacy technologies will be crucial for success, as will embracing a culture of care and empathy. while beyond the scope of this case study, discussions of the gender roles often present in library technology need to continue, especially as academic libraries embrace scrum methodology, project management, and product ownership.

conclusion

in this case study, the product owners of a legacy institutional repository described methods for revitalizing a service. for the institutional repository managed by hesburgh libraries, there has been a noticeable increase in usage in the past six months: more deposits, higher access counts, and more support tickets tracked. it appears the efforts of the product owners are showing results. this increased usage is one more piece of evidence that a repository is more than software and more than technology: by allowing the product owners oversight of the mission and ultimate direction of the service, not to mention the freedom to engage with users on behalf of the development team, the system is in a much better position than in previous years.
despite these improvements, there is still room for growth as the pos guide the overall mission and development of the institutional repository as both a service and a system. similarly, as more institutions contend with legacy digital technology, using pos and the methods described above may prove beneficial. there is additional work to be done, such as investigating more thoroughly the role of the repository—indeed the concept of the repository—and discussions of gender norms in technology.

endnotes

1 this article is based on a presentation by don brower and mikala narlock: "what to do when your repository enters middle age" (online presentation, samvera connect 2020, october 28, 2020), https://doi.org/10.7274/r0-e32v-2h81.
2 clifford lynch, "institutional repositories: essential infrastructure for scholarship in the digital age," portal: libraries and the academy 3 (april 1, 2003): 327–36, https://doi.org/10.1353/pla.2003.0039.
3 soohyung joo, darra hofman, and youngseek kim, "investigation of challenges in academic institutional repositories: a survey of academic librarians," library hi tech 37, no. 3 (january 1, 2019): 525–48, https://doi.org/10.1108/lht-12-2017-0266.
4 j. j. branin, "institutional repositories," in encyclopedia of library and information science, ed. m. a. drake (boca raton, fl: taylor & francis group, 2005): 237–48; lynch, "institutional repositories."
5 raym crow, "the case for institutional repositories: a sparc position paper," arl bimonthly report 223, august 2002: 7; lynch, "institutional repositories."
6 dorothea salo, "innkeeper at the roach motel," december 11, 2007, https://minds.wisconsin.edu/handle/1793/22088.
7 jihyun kim, "motivations of faculty self-archiving in institutional repositories," journal of academic librarianship 37, no. 3 (may 1, 2011): 246–54, https://doi.org/10.1016/j.acalib.2011.02.017; deborah e. keil, "research data needs from academic libraries: the perspective of a faculty researcher," journal of library administration 54, no. 3 (april 3, 2014): 233–40, https://doi.org/10.1080/01930826.2014.915168.
8 trevor owens, "the theory and craft of digital preservation," lis scholarship archive, july 15, 2017, https://doi.org/10.31229/osf.io/5cpjt.
9 e.g., joo, hofman, and kim, "investigation of challenges in academic institutional repositories."
10 sarah hare and jenny hoops, "furthering open: tips for crafting an ir deposit service," october 26, 2018, https://scholarworks.iu.edu/dspace/handle/2022/22547; james powell, martin klein, and herbert van de sompel, "autoload: a pipeline for expanding the holdings of an institutional repository enabled by resourcesync," code4lib journal, no. 36 (april 20, 2017), https://journal.code4lib.org/articles/12427; carly dearborn, amy barton, and neal harmeyer, "the purdue university research repository: hubzero customization for dataset publication and digital preservation," oclc systems & services, february 1, 2014, https://docs.lib.purdue.edu/lib_fsdocs/62.
11 clifford lynch, "updating the agenda for academic libraries and scholarly communications," college & research libraries 78, no. 2 (february 2017): 126–30, https://doi.org/10.5860/crl.78.2.126.
12 lynch, "updating the agenda," 128.
13 diane walker, "hesburgh/snite mellon grant," october 31, 2018, https://doi.org/10.17605/osf.io/cusmx.
14 hrafnhildur sif sverrisdottir, helgi thor ingason, and haukur ingi jonasson, "the role of the product owner in scrum-comparison between theory and practices," in "selected papers from the 27th ipma (international project management association) world congress, dubrovnik, croatia, 2013," special issue, procedia—social and behavioral sciences 119 (march 19, 2014): 257–67, https://doi.org/10.1016/j.sbspro.2014.03.030.
15 salo, "innkeeper."
16 carolyn ten holter, "the repository, the researcher, and the ref: 'it's just compliance, compliance, compliance'," journal of academic librarianship 46, no. 1 (january 1, 2020): 102079, https://doi.org/10.1016/j.acalib.2019.102079.
17 don brower et al., "on institutional repositories, 'beyond the repository services,' their content, maintainers, and stakeholders," against the grain 32, no. 1, https://against-the-grain.com/2020/04/v321-atg-special-report-on-institutional-repositories-beyond-the-repository-services-their-content-maintainers-and-stakeholders/.
18 bethany nowviskie, "on capacity and care," october 4, 2015, http://nowviskie.org/2015/on-capacity-and-care/; ruth kitchin tillman, "who's the one left saying sorry? gender/tech/librarianship," april 6, 2018, https://ruthtillman.com/post/whos-the-one-left-saying-sorry-gender-tech-librarianship/.
19 dale askey and jennifer askey, "one library, two cultures" (library juice press, 2017), https://macsphere.mcmaster.ca/handle/11375/22281; rafia mirza and maura seale, "dudes code, ladies coordinate: gendered labor in digital scholarship," october 22, 2017, https://osf.io/hj3ks/.
lita president's message: a framework for member success

emily morton-owens

information technology and libraries | march 2020
https://doi.org/10.6017/ital.v39i1.12105

emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the assistant university librarian for digital library development & systems at the university of pennsylvania libraries.

this column represents my final venue to reflect on our potential merger with alcts and llama before the vote. after a busy midwinter meeting with lots of intense discussions about the steering committee on organizational effectiveness (scoe)'s recommendations, the divisions, the merger, ala finances, and more, my thoughts keep turning in a particularly wonkish direction: towards our organization.

so many of the challenges before us hinge on one particular dilemma. for those of us who are most involved in ala and lita, the organization (our committees, offices, processes, bylaws, etc.) may be familiar and supportive. but for new members looking for a foothold, or library workers who don't see themselves in our association, our organization may look like a barrier. moreover, many of our financial challenges are connected to our organization. the organization must evolve, but we must achieve this without losing what makes us loyal members.

while ala and lita have specific audiences of library workers and technologists, we have a lot in common with other membership organizations. one of the responsibilities of the lita vice-president is attendance at a workshop put on by the american society of association executives, where we learn how to steward an organization. representatives from many different groups attended this workshop, where i had a chance to discuss challenges with leaders from medical and manufacturing associations, and i learned that these challenges are often orthogonal to the subject matter at hand.
everyone was dealing with the need to balance membership cost and value, how to give members a voice while allowing for agile decision-making, and how to put on events that are great for attendees without becoming the only way to get value from membership.

hearkening back even further, i worked as a library-school intern at a library with a long run of german- and french-language serials that i retrospectively cataloged. one batch that has always stuck in my mind is the planning materials for international congresses held in the early 20th century by the international societies for horticulture and botany. these events were massive undertakings held at multi-year intervals, gradually planned by international mail. interested parties would receive a lavish printed prospectus, with registration and travel arrangements starting several years in advance. the most interesting documents pertained to the events planned for the mid to late 1930s in europe. these events were cancelled or fell short of intentions because of pre-world war ii political pressures. the congress schedules did not resume until 1950 or later, with some radical changes—for example, german was no longer used as the language of science, and the geographic distribution of events increased significantly in the later 20th century. when i first encountered this material, i was intrigued by how the war affected science. looking back now, i see a dual case study in organizations weathering a crisis whose magnitude we can only imagine, and then reinventing themselves on the other side. both of these organizations still exist and continue to meet, by the way—and i can't help but feel that reinvention is the key to survival.

our organizational framework is a key part of the challenge for both ala and lita. i have no doubt that members remain excited about our key issues for advocacy, our subjects for continuing education, and our opportunities for networking. but we have concerns about how we make those things happen. in lita, for example, continuing education requires a massive effort on the part of both member volunteers and staff to organize. we need to brainstorm relevant topics, recruit qualified instructors, schedule and promote the events, and finally run the sessions, collect feedback, and arrange payment for the instructors. this takes the time of the same people we'd like to have creating newsletters and booking conference speakers. meanwhile, right across the hall at ala headquarters, we have staff from alcts and llama doing the same things. these inefficiencies hit at the heart of our financial problems.

at the ala level, scoe has proposed ideas like a single set of dues structures for divisions, and a single set of policies and procedures for all round tables. these changes would reduce the overhead required to operate these groups as unique entities, a financial benefit, while also making it easier for members to afford, join, and move between them, a membership benefit. that framework also offers us an opportunity to improve our associations. members have been asking how the association can act more responsively on issues of diversity, equity, and inclusion—for example, how can we have incident response that is proactive and sensitive to member needs while recognizing the complexities of navigating that space as a member-based organization.
this is a chance to live up to our aspirations as a community. the actions lita has taken to extend all forms of participation to members who can only participate remotely/online are a way to make us more accessible to library workers regardless of finances or home circumstances. bylaws and policies may not be the most glamorous part of associations, but they are the levers we can employ to change the character of our community.

coming back to core, we can observe elements of the plan that are responding to both threats and opportunities. members of alcts, llama, and lita know that financial pressures are a major impetus for the merger effort. but, in the hope of achieving a positive reinvention, the merger planning steering committee put most of its emphasis on the opportunity side. the diagram of intersecting interests for core's six proposed sections (https://core.ala.org/core-overlap/) is a demonstration of the new frontiers of collaboration that core will offer members. the proposed structure of core retains committees while also offering a more nimble way to instantiate interest groups.

moreover, the process of creating core reflects the kind of transparent process we want to see in the future. the steering committee and the communications sub-committee crossed not just the three divisions but also different levels of experience and types of prior participation in the divisions. the communications group answered freeform questions, held twitter amas, and held numerous forums to collect ideas and feelings about the project. zoom meetings and twitter are not new media, but the sustained effort that went into soliciting and responding to feedback through these channels is a new mode for our divisions.

the lita board recently issued a statement (https://litablog.org/2020/02/news-regarding-the-future-of-lita-after-the-core-vote/) explaining that if the core vote does not succeed, we don't see a viable financial path forward and will be spending the latter half of 2020 and the beginning of 2021 working toward an orderly dissolution of lita. it is tempting to approach this crossroads from a place of disappointment or fear. we cannot yet say precisely what it will be like to be a member of core. but when i look at the organizational structure core offers us, i feel hopeful about it being a framework in which members will find their home and flourish. the new division includes what we need for a rich member experience coupled with a streamlined structure that makes it easier to be involved in the ways and to the extent that make sense for you. in fifty years, perhaps a future member of core will be writing a letter to their members: looking back at this moment of technological and organizational disruption and reflecting on how we reinvented our organization at the moment it needed it most.

using dpla and the wikimedia foundation to increase usage of digitized resources

dominic byrd-mcdevitt and john dewees

information technology and libraries | march 2022
https://doi.org/10.6017/ital.v41i1.13659

dominic byrd-mcdevitt (dominic@dp.la) is data fellow, digital public library of america.
john dewees (john.dewees@toledolibrary.org) is supervisor, digitization services, toledo lucas county public library. © 2022.

abstract

the digital public library of america has created a process by which rights-free or openly licensed resources that have already been harvested can be copied over into wikimedia commons, thus creating a simple path for including those digital collections materials in wikipedia articles. by meeting internet users where they already are, rather than relying on them to navigate to individual digital libraries, the access and usage of digital assets is dramatically increased, in particular among user groups that might otherwise not have a reason to interact with such digitized resources.

introduction

a dpla-sponsored webinar given by dominic byrd-mcdevitt, dpla data fellow, and sandra fauconnier, glam-wiki specialist at the wikimedia foundation, on april 21, 2020, entitled "dpla intro to wikimedia: increased discoverability and use" introduced a workflow by which records harvested by the digital public library of america (dpla) could be automatically copied over into wikimedia commons with their accompanying metadata.1 the major benefit of this migration is the ease with which assets can then be added to wikipedia articles, exposing resources to a large audience of general internet users who might otherwise have no reason to interact with a given repository's resources. the gains from making digitized resources available in wikipedia articles are substantial, providing incredibly high usage statistics while requiring very little time commitment to execute the work.

this dpla project, launched in early 2020, was a result of grant funding provided by the alfred p. sloan foundation and ongoing consultation from the wikimedia foundation. dpla's interest in designing this system stemmed from an exploration of new ways to increase usage of materials. while previous bulk uploads to wikimedia commons by cultural institutions have required technical expertise and steep learning curves in navigating the wikimedia community, this project was designed to reduce these barriers by taking advantage of dpla's role as an aggregator (more information is available at https://commons.wikimedia.org/wiki/commons:partnerships).

with the workflow developed by dpla's technology team in mid-2020, an authorized bot account on wikimedia commons (user:dpla_bot, https://commons.wikimedia.org/wiki/user:dpla_bot) uploads assets from dpla institutions. using data provided by contributing institutions, dpla applies filters to identify eligible items from participating institutions, then for each of these generates wiki markup from descriptive metadata and downloads media files to a server. these files are uploaded by a script that interacts with wikimedia's api using the pywikibot framework (https://www.mediawiki.org/wiki/manual:pywikibot). by centralizing all of the dpla network's wikimedia commons uploads, dpla was able to upload over 2.25 million files (or 2.5 tb of total storage) from 780,000 items in under a year and a half, becoming the largest single contribution to wikimedia commons ever (by more than quadruple the previous record).2,3
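to make the upload step concrete, here is a minimal pywikibot sketch in the spirit of the workflow just described. it is not dpla's production code: the site configuration, the file title, and the generated wikitext are assumed to come from earlier pipeline stages.

```python
# minimal sketch of the commons upload step; not dpla's production code.
# assumes a bot account is already configured in pywikibot's user-config.py.
import pywikibot

site = pywikibot.Site("commons", "commons")  # wikimedia commons

def upload_asset(local_path, file_title, wikitext, dpla_id):
    """upload one downloaded media file together with its generated wiki markup."""
    page = pywikibot.FilePage(site, "File:" + file_title)
    return page.upload(
        local_path,
        text=wikitext,                           # the file description page body
        comment="Uploading DPLA ID " + dpla_id,  # edit summary
        ignore_warnings=False,                   # stop on duplicates, bad titles, etc.
    )
```

running such a script under a dedicated, approved bot account is what makes centralized bulk uploads at this scale manageable for the aggregator rather than for each institution.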
this approach to the problem provides a simple on-ramp to participation in wikipedia for dpla institutions—especially the many that would otherwise lack the resources or expertise to do so—by requiring of them only those tasks that need their local knowledge, such as describing their own collections prior to aggregation and then making editorial decisions on wikipedia about them once uploaded. this project required a chain of partnerships between separate organizations, as well as a variety of metadata and technical requirements that needed to be satisfied: records of digitized resources are created by an organization locally and are then harvested by dpla. the eligible records in dpla are then copied over into wikimedia commons. once images are in wikimedia commons it is a straightforward process to embed the images in wikipedia articles, thus achieving the goal of expanded use of and access to digitized resources.

john dewees, supervisor of digitization services at the toledo lucas county public library (tlcpl), was in attendance at the april 21, 2020 webinar and subsequently met with dominic byrd-mcdevitt on april 30, 2020 to discuss the feasibility of using tlcpl collections as a pilot project for this workflow. the copying of records from tlcpl's repository into wikimedia commons was actually started in the course of that first conversation on april 30. a map from page 96 of the book geography of ohio (see figures 1 and 2), previously digitized by dewees, will be used to illustrate how records move through the various tools and platforms discussed.4 tlcpl makes digitized resources available through ohio memory, a shared contentdm instance for libraries, archives, and museums in ohio maintained by the state library of ohio and the ohio history connection.

figure 1. digitized image of geography of ohio, page 96, as seen in ohio memory.

figure 2. record metadata for geography of ohio, page 96, as seen in ohio memory.

dpla harvest

dpla is a discovery portal that aggregates records of digitized resources from over 4,000 libraries, archives, and museums around the united states. this creates a single search interface allowing millions of digital records to be searched simultaneously without having to navigate a wide variety of different digital libraries. the aggregation of these records is accomplished by working with two different types of partners: content hubs and service hubs.5 content hubs are either organizations large enough to contribute to dpla directly, such as the library of congress or harvard library, or large digital libraries that work with partner institutions of their own, such as hathitrust or the internet archive. service hubs, on the other hand, act as mediators between the national aggregation service and individual organizations in states (such as ohio and its service hub, the ohio digital network) or regions (such as utah, idaho, and nevada, which have collectively formed a service hub in the mountain west digital library). service hubs ensure that the technical and metadata requirements for harvesting into dpla are satisfied and act as consultants and facilitators to prospective contributors.
as dpla has grown over time, the metadata requirements and possibilities have also evolved and have varied depending on which service hub a contributing organization is working with. the ohio digital network (odn) is the service hub for our example page from ohio memory. odn's metadata requirements for contributors in march 2021 included a title and a standardized rights statement in the metadata application profile for the contributing collection. more information on the dpla harvest process for the ohio digital network is available at https://ohiodigitalnetwork.org/contributors/getting-started. the nature of these requirements has also evolved since odn's first harvest in march 2018. initially, the standardized rights statement was required to be one of the options from rightsstatements.org, but through the work of dpla and odn, creative commons licenses and the cc0 public domain dedication can now be utilized as well. standardized rights statements must be formatted as machine-readable uris rather than textual descriptions. finally, the technical backend that supports the harvest of a digital collection is an oai-pmh feed (a minimal sketch of such a harvest follows at the end of this section). other hubs operate in very different ways—such as some that actually host all their contributors' collections in a single domain—but in all cases the end result is a data set that dpla can harvest and ingest. figures 3 and 4 illustrate this process, showing geography of ohio represented as a record in dpla (available at https://dp.la/item/aaba7b3295ff6973b6fd1e23e33cde14) with associated metadata.

figure 3. geography of ohio as seen in dpla, specifically focusing on the thumbnail, link to the original record, and initial metadata fields.

figure 4. geography of ohio as seen in dpla, specifically focusing on the remaining metadata fields harvested.

this process achieves the first level of aggregation: harvesting thumbnail images (full-sized images suitable for research are not harvested in this process) and metadata from local digital repositories and making them available for a unified search experience in dpla. dpla's aggregation currently contains over 42 million items, the majority of them carrying standardized rights uris; 18 million items have rights compatible with upload to wikimedia commons (as can be seen at https://dp.la/search?rightscategory=%22unlimited%20re-use%22). once dpla has access to the records, the code authored by dpla staff can be used to integrate the resources into wikimedia commons.
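as a rough illustration of the kind of feed a service hub exposes, the sketch below pages through an oai-pmh endpoint. the endpoint url is a placeholder, and dpla's real ingestion system handles sets, deleted records, and oai error responses that are omitted here.

```python
# rough sketch of paging through an oai-pmh feed of the kind dpla harvests;
# the endpoint is a placeholder, and set selection, deleted-record handling,
# and oai error codes are omitted.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
ENDPOINT = "https://repository.example.org/oai"  # hypothetical feed

def list_records(metadata_prefix="oai_dc"):
    """yield every <record> element, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        response = requests.get(ENDPOINT, params=params, timeout=60)
        root = ET.fromstring(response.content)
        for record in root.iter(OAI + "record"):
            yield record
        token = root.find(OAI + "ListRecords/" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # an empty or absent token ends the list
        params = {"verb": "ListRecords", "resumptionToken": token.text}
```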
wikimedia commons harvest

wikimedia commons is part of the larger network of services and tools under the umbrella of the wikimedia foundation. a wide variety of tools is available, such as wikidata, a portal for open structured data; wikipedia, a collaboratively edited open encyclopedia; and wikimedia commons. this last portal uses the same software platform that powers wikipedia to create an open file and media server that can interoperate with the other tools. wikimedia commons is capable of hosting digital still images, audio files, and video files. anyone can contribute to this open repository so long as the work is in the public domain or openly licensed. users may either release a work for which they own the rights under an open license at upload time or may upload any other works by providing evidence in the metadata that the work is out of copyright or openly licensed (more information on copyright and licensing in wikimedia commons is available at https://commons.wikimedia.org/wiki/commons:licensing).

with this in mind, in order for records in dpla to be eligible for harvest into wikimedia commons, they first must have one of the five specific standardized rights statements available at the following links:6

• http://rightsstatements.org/vocab/noc-us/1.0/
• https://creativecommons.org/publicdomain/zero/1.0/
• https://creativecommons.org/publicdomain/mark/1.0/
• https://creativecommons.org/licenses/by/4.0/
• https://creativecommons.org/licenses/by-sa/4.0/

the uris above indicate the most recent version of each of the associated copyright descriptions or licenses, though being published under the most recent version is not a requirement for harvest into wikimedia commons. while standardized rights statements are not a requirement for contributing to dpla generally, they are a requirement for wikimedia commons upload so that the software has a machine-readable way to determine the compatibility of rights. though it is a non-profit educational resource, wikimedia commons does not accept media under fair use or materials licensed only for noncommercial/educational use, in order to ensure its users may reuse the media for any purpose. as a result, one thing to keep in mind is that while a given organization may include in its gift or accession agreement a statement that digitized versions of physical resources may be shared through channels decided by the organization, this does not necessarily extend to wikimedia commons users outside the organization, because of the requirement that materials be reusable with little restriction beyond attribution and, depending on the standardized rights statement, the need to share alike.

dpla locates the asset to upload by using urls explicitly provided by the service hub; the urls can be provided in one of two ways. one is to provide the iiif manifest url (via the iiif presentation api), from which the dpla-developed software queries the manifest for the list of assets, which are listed by the presentation api in the form of iiif image api urls. the other way the media location can be identified is by providing a list of direct urls to the media in the field dpla calls mediamaster during the initial harvest process. unlike the iiif manifest url, this is a multivalued field that can accommodate a list of urls. the reason for this approach is to allow any institution to contribute assets via the pipeline, regardless of whether they have implemented iiif in their repository or not. not all organizations have adopted the iiif suite of apis, so it is important to provide more than one avenue for wikimedia commons harvest. however, providing a iiif manifest when and if it becomes available has benefits over the mediamaster field: the manifest will always be true when queried, whereas the mediamaster values are only accurate to the last harvest, which may be a month or more out of date. (a sketch of this eligibility and media-location logic follows below.)
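in outline, the eligibility check and media-location choice might look like the following sketch. the record field names (rights, iiif_manifest, mediamaster) are stand-ins for dpla's internal schema, the rights uris are written in their canonical casing, and the manifest walk assumes a iiif presentation api v2-style response.

```python
# sketch of the eligibility and media-location logic; record field names
# are stand-ins for dpla's internal schema, not its real one.
import requests

# uri stems of the five compatible rights statements; matching on the stem
# means any version (1.0, 4.0, ...) qualifies, as the article notes.
ALLOWED_RIGHTS_STEMS = (
    "http://rightsstatements.org/vocab/NoC-US/",
    "https://creativecommons.org/publicdomain/zero/",
    "https://creativecommons.org/publicdomain/mark/",
    "https://creativecommons.org/licenses/by/",
    "https://creativecommons.org/licenses/by-sa/",
)

def is_eligible(record):
    """true when the record carries a machine-readable compatible rights uri."""
    rights = record.get("rights", "")
    return any(rights.startswith(stem) for stem in ALLOWED_RIGHTS_STEMS)

def media_urls(record):
    """prefer the iiif manifest when present, else fall back to mediamaster."""
    if record.get("iiif_manifest"):
        return urls_from_manifest(record["iiif_manifest"])  # always current
    return record.get("mediamaster", [])  # accurate only to the last harvest

def urls_from_manifest(manifest_url):
    """walk a iiif presentation api v2-style manifest for image api urls."""
    manifest = requests.get(manifest_url, timeout=60).json()
    return [
        image["resource"]["@id"]
        for sequence in manifest.get("sequences", [])
        for canvas in sequence.get("canvases", [])
        for image in canvas.get("images", [])
    ]
```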
figure 5. the dashboard developed by dpla displaying, for pine river library, the percent of records that have open rights statements and the percent of files with media access.

a dashboard has been developed for dpla content hub and service hub administrators to analyze how many records in a given collection conform to the standardized rights statement and iiif api requirements (see figure 5). harvest of a collection into wikimedia commons from dpla necessitates that all eligible records in the collection be harvested into wikimedia commons; it is not possible for a participating institution to hand-curate which of the eligible items will be included. that is, all records in a given collection with the aforementioned standardized rights statements will be harvested into wikimedia commons. an additional signed agreement or memorandum of understanding has not been required between dpla and participating organizations due to the open nature of the works being transferred. since the works have been identified as in the public domain or openly licensed, users can already freely use the resources for any purpose they want, so long as it conforms to the appropriate creative commons license.

resource presentation in wikimedia commons

each portion of the migration process presents the resource in different ways. the original instance of geography of ohio is made available in contentdm as a complex digital object: multiple images (or more specifically in this case, pages) associated with a single metadata record. dpla presents this resource only in terms of its metadata along with a thumbnail image of the resource itself; to view the contents of the resource the user is directed back to the original repository for full access to the digital object. the migration process into wikimedia commons actually copies the image assets themselves along with the metadata; in this example, both the image assets and the metadata are drawn from contentdm. wikimedia commons is not able to accommodate complex digital objects, and any that are imported via this process are broken out into discrete simple digital objects in wikimedia commons, for example, page 96 of geography of ohio (see figures 6, 7, and 8; view page 96 in wikimedia commons at https://commons.wikimedia.org/wiki/file:geography_of_ohio_-_dpla_-_aaba7b3295ff6973b6fd1e23e33cde14_(page_96).jpg).

figure 6. geography of ohio, page 96, as seen in wikimedia commons, with a focus on the file name, image, and viewing options.

figure 7. geography of ohio, page 96, as seen in wikimedia commons, with a focus on the record metadata.

figure 8. geography of ohio, page 96, as seen in wikimedia commons, with a focus on the derivative images created from the original record and administrative metadata.

the filename is programmatically generated and embeds a great deal of information; the following example illustrates its components, and the sketch after this list assembles them in code.

example filename: file:geography of ohio - dpla - aaba7b3295ff6973b6fd1e23e33cde14 (page 96) (cropped).jpg

1. the prefix for all items in wikimedia commons, "file:"
2. the title of the work, in this case "geography of ohio", followed by a hyphen
3. the source of the digital object, universally "dpla" for this project, followed by a hyphen
4. the unique identifier assigned by dpla, in this case "aaba7b3295ff6973b6fd1e23e33cde14"
5. in the case of complex objects, the page number, in this case "(page 96)"
6. if the file was cropped using wikimedia commons' built-in image editing tool, "(cropped)" is included between the page number and file extension to indicate the image is a derivative of an original
7. the file format extension, in this case ".jpg"

even if the complex object being imported is not actually a book, the individual item records in wikimedia commons still use the "(page x)" nomenclature to differentiate the individual objects.
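assembled in code, the naming convention reads roughly as follows; this is a reconstruction of the scheme just described, not dpla's implementation, which may differ in details such as character escaping.

```python
# reconstruction of the seven-part filename scheme described above;
# dpla's own code may differ in details such as character escaping.
def commons_filename(title, dpla_id, page=None, cropped=False, ext="jpg"):
    """build a commons file title following the naming convention."""
    name = title + " - DPLA - " + dpla_id
    if page is not None:
        name += " (page {})".format(page)  # complex objects get page numbers
    if cropped:
        name += " (cropped)"               # derivative made with the crop tool
    return "File:" + name + "." + ext

# worked example from the article:
# commons_filename("Geography of Ohio",
#                  "aaba7b3295ff6973b6fd1e23e33cde14", page=96)
# -> 'File:Geography of Ohio - DPLA - aaba7b3295ff6973b6fd1e23e33cde14 (page 96).jpg'
```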
the summary section of the wikimedia commons record displays how the metadata is crosswalked into this environment. the dublin core creator, title, description, and date fields are copied verbatim from the local metadata application profile (map). to identify the contributing institution, and to differentiate between similarly named institutions, dpla maintains a json file mapping all dpla institutions to their wikidata identifiers.7 this document also indicates which hubs/institutions are participating in the project at any given time through a true/false field that is toggled when an institution authorizes upload. this enables distinct category pages for each contributing institution and allows analytics to be tracked and provided to dpla, relevant hubs, and contributing institutions. the source/photographer field is one of the most important, as it ensures that attribution for the contributing institution is clear. the field contains a narrative description of how dpla facilitated making this resource available in wikimedia commons. it also makes available information on the original contributing institution, with links to the record as it is originally displayed (in ohio memory in this case) as well as in dpla. proper attribution of items was a topic that came up continuously when discussing this project with other organizations, so it should be reassuring to know that credit and direct links back to resources are enabled in this workflow. (a rough sketch of such a crosswalk follows below.)
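a hypothetical sketch of that crosswalk, using the generic commons {{Information}} template, might look like the following; dpla's actual markup, templates, and institution/wikidata lookup will differ.

```python
# hypothetical sketch of generating a file description page from dublin
# core-style metadata using the generic {{Information}} template; dpla's
# actual markup, templates, and institution/wikidata lookup will differ.
PAGE_TEMPLATE = """== {{{{int:filedesc}}}} ==
{{{{Information
|description = {description}
|date        = {date}
|source      = {source}
|author      = {creator}
|permission  = {rights_uri}
}}}}
"""

def build_wikitext(metadata, original_url, dpla_url, institution):
    """copy creator/description/date verbatim and build the attribution field."""
    source = ("Digitized by {inst}. Original record: {orig} "
              "DPLA record: {dpla}".format(inst=institution,
                                           orig=original_url, dpla=dpla_url))
    return PAGE_TEMPLATE.format(
        description=metadata.get("description", ""),
        date=metadata.get("date", ""),
        source=source,
        creator=metadata.get("creator", ""),
        rights_uri=metadata.get("rights", ""),  # machine-readable uri
    )
```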
the permission and standardized rights statement fields leverage the aforementioned uris to provide the user with information on the copyright status of the work as well as concrete information on how exactly they are able to use it responsibly for their own purposes. finally, an interesting aspect of this record is the links provided to derivative images; in this case we can see that the map displayed on this book page has two cropped derivative images.

use in wikipedia articles

all of the work described above is in service of one goal: to enable higher usage and exposure of digitized resources in wikipedia articles. while it is possible to do this work manually, inserting images into articles without being a dpla contributor or even having a digital repository to speak of, the automated process is a clear advantage, especially for large collections. for the map on page 96 of geography of ohio, we can see that the map of limestone distribution in ohio has been included in an image gallery in the limestone wikipedia article (https://en.wikipedia.org/wiki/limestone; see figures 9 and 10).

figure 9. the wikipedia article on limestone displaying the introduction and one image (but not the worked-example image).

figure 10. the image gallery in the limestone wikipedia article, with the map from geography of ohio included and seen at the bottom right.

figure 11. the source view editing option for the limestone article in wikipedia, allowing direct editing of the wikitext.

once images are in wikimedia commons, embedding them in wikipedia articles is a simple process. one option for wikipedia editors is to use a what-you-see-is-what-you-get (wysiwyg) html editor that should be familiar to most users. alternately, there is also a source view editing option which uses the custom markup called wikitext (https://en.wikipedia.org/wiki/help:wikitext) to format pages in wikipedia (see figure 11). source view editing allows more precision when inserting images into wikipedia articles and makes it easier to understand how they will ultimately be displayed in the article. the way in which different page elements flow around one another in articles can be surprising when using the wysiwyg editing option, as images assumed to show up where you placed the cursor can ultimately be placed in very different locations than expected.8
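for readers unfamiliar with wikitext, the sketch below builds the two standard forms of image markup an editor would paste into an article: an inline thumbnail and a gallery block like the one in the limestone article. the helper functions and the example caption are illustrative, not part of any dpla tooling.

```python
# illustrative helpers for the two standard wikitext image forms; these
# produce ordinary mediawiki markup and are not dpla tooling.
def thumb_markup(file_title, caption):
    """an inline thumbnail, placed where the markup appears in the article."""
    return "[[{}|thumb|{}]]".format(file_title, caption)

def gallery_markup(entries):
    """a <gallery> block like the one in the limestone article."""
    lines = ["<gallery>"]
    for file_title, caption in entries:
        # gallery entries omit the surrounding [[ ]] brackets
        lines.append("{}|{}".format(file_title, caption))
    lines.append("</gallery>")
    return "\n".join(lines)

# e.g., thumb_markup(
#     "File:Geography of Ohio - DPLA - aaba7b3295ff6973b6fd1e23e33cde14 (page 96).jpg",
#     "limestone distribution in ohio, 1923")
```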
usage analytics

analytics tools are available that allow organizations to track the articles containing their assets, showing what image was embedded in an article and how many views the article received. tlcpl's initial ingest added a total of 129,725 discrete image assets to wikimedia commons. from that pool, images were added to a total of 227 wikipedia articles between may 2020 and february 2021 (see figure 12). in that time period the articles had a total of 11.7 million page views (see figure 13).9 in february 2021 alone, the 227 enriched articles received 1.87 million page views. by comparison, the total number of records tlcpl had available in ohio memory was 129,395 in february 2021, and those records received 26,602 unique page views. the major strength of this project is to display locally created digitized resources where researchers already are on the open web and take advantage of that much wider level of exposure.10

figure 12. a graph displaying the cumulative total number of articles with inserted images from tlcpl resources from may 2020 to february 2021.

figure 13. a graph displaying the monthly total number of page views of wikipedia articles with inserted images from tlcpl resources from may 2020 to february 2021.

there is a valid discussion to be had about the comparison of these metrics, as comparing page views to unique page views is not a one-to-one match, but by any measurement this audience is an order of magnitude beyond what might conventionally be available. ultimately, one of the most exciting metrics for an organization looking to implement this work is the amount of time it actually took to execute the project. since tlcpl's records already satisfied the requirements to be copied over to wikimedia commons, the actual import process was able to begin during the april 30, 2020 zoom call between tlcpl and dpla staff that was set up to discuss the project; from the perspective of the contributing organization, this process takes essentially no time or effort. once the process is started, staff at the contributing institution are informed when the records have finished being copied. the actual work of locating images for inclusion in articles and inserting them took roughly an hour of work a week, or roughly ten minutes per article, sometimes less: approximately 38 hours of work was spent identifying images and inserting them into articles between may 2020 and february 2021 (227 articles at about ten minutes apiece works out to roughly 38 hours). while not of central concern to the project or its usage, the editorial work is also interesting and involves enough creativity and problem solving to be an enjoyable activity. because the resources in wikimedia commons are available to be used by anyone (as in, anyone with a device and an internet connection), this makes it a wonderful opportunity for interns or volunteers to contribute. volunteers could work on the editorial portion of this project remotely with no real barriers. while all the effort described here to get tlcpl digitized images into wikipedia has used previously existing articles, this work could make an excellent partnership opportunity with schools to write and create whole new articles for which there is an abundance of already digitized resources to support them.

conclusion

the work of remediating and writing metadata to participate in large consortial efforts such as dpla is always going to be a major undertaking, but projects like this, which leverage automation and partnerships, show just how powerful these relationships can be. making locally digitized resources available through dpla, copying them over to wikimedia commons, and then embedding those images into wikipedia articles is an excellent opportunity to meet users where they already are—online. this work provides exceptionally high usage statistics and is fertile ground for outreach and programming opportunities to get partners, volunteers, and interns involved with making those digitized resources available in wikipedia.
acknowledgements

special thanks to jen johnson, library consultant at the state library of ohio, and virginia dressler, digital projects librarian at kent state university, for their support in enabling this work and article.

endnotes

1 this presentation is available on youtube at https://youtu.be/0bsoksybcbi. information on all past dpla webinars and programming can be found at https://pro.dp.la/events/workshops.

2 the entire collection of all resources contributed to wikimedia commons via dpla can be found at https://commons.wikimedia.org/wiki/category:media_contributed_by_the_digital_public_library_of_america.

3 statistics related to contributor totals were created from a wikimedia database query published at https://quarry.wmflabs.org/query/51256.

4 geography of ohio was published as part of a series of bulletins by the ohio state geological survey. the book was authored by roderick peattie, an assistant professor of geology at ohio state university, in 1923. this item was digitized by the toledo lucas county public library and uploaded as part of public domain day 2019. the digitized version of this book is available through ohio memory at https://ohiomemory.org/digital/collection/p16007coll33/id/115214.

5 more information, including a complete list of content hubs and service hubs and their geographic distribution, is available on the dpla website at https://pro.dp.la/hubs/our-hubs.

6 as shared by dominic byrd-mcdevitt in a webinar on march 18, 2021 entitled "dpla + wikimedia: one year in + ten million views," available at https://www.youtube.com/watch?v=jloj0gvvsnu.

7 the json file is available for view on dpla's github page at https://github.com/dpla/ingestion3/blob/develop/src/main/resources/wiki/institutions_v2.json.

8 for more information on step-by-step instructions for adding images into wikipedia articles after harvest, see the blog post at https://johndewees.com/2021/03/18/adding-images-to-wikipedia-articles-via-dpla/.

9 all statistics on wikipedia page views are drawn from the baglama 2 utility available at https://glamtools.toolforge.org/baglama2/#gid=430.

10 up-to-date statistics and data are available at the digitization statistics dashboard created to communicate about major projects in digitization services at the toledo lucas county public library, available at https://docs.google.com/spreadsheets/d/1jv0zzt6h_jl1tq8v2zdxmf5ygn0dfbnhqbffifcepzm/edit?usp=sharing.
digital resource sharing and library consortia in italy

tommaso giordano

information technology and libraries | june 2000

tommaso giordano (giordano@datacomm.iue.it) is deputy director of the library at the european university institute, florence.

interlibrary cooperation in italy is a fairly recent and not very widespread practice. attention to the topic was aroused in the eighties with the italian library network project. more recently, under the impetus toward technological innovation, there has been renewed (and more pragmatic) interest in cooperation in all library sectors. sharing electronic resources is the theme of greatest interest today in university libraries, where various initiatives are aimed at setting up consortia to purchase licenses and run digital products. a number of projects in hand are described, and emerging trends analyzed.

the state of progress and the details of implementation in various countries of initiatives to share digital information resources obviously depend, apart from current investment policies to develop the information society, on many factors of a historical, social, and cultural nature that have determined the evolution and consolidation of cooperation practices specific to each context. before going to the heart of the specific subject of this article, in order to foster an understanding of the environment in which the trends and problems that we shall be considering are set, i feel it best to give a quick (and necessarily summary) sketch of the library cooperation position in italy.

the word "cooperation" became established in the language of italian librarians only toward the mid-'70s, when in the sector of public libraries, which were transferred in those years from central government to local authorities, the "territorial library systems" were taking shape: this was a form of cooperation provided for and encouraged by regional laws that brought together groups of small and medium-sized libraries, often around a system centre supplying shared services. a few years later, in the wake of the new information technologies and in line with ongoing trends in the most advanced countries, in italy, too, the term "cooperation" became increasingly associated with the concept of computerized library networks. the decisive impulse in this direction came from a project of the national library service (sbn), the national network of italian libraries, then in a gestation stage, which also had the merit of speeding up the opening of the italian librarianship profession to experiences underway in the most advanced countries.1
in the '80s, cooperation, together with automation, was the dominant theme at conferences and in italian professional literature. however, the heat of the debate had no satisfactory counterpart in terms of practical implementation, because of both resistance attributable to a noninnovative administrative culture and the polarization of the bulk of the investments around a single major project (the sbn network), the technical and organizational choices of which were shared by only part of the libraries, while others remained completely outside this programme. many librarians, while recognizing the progress over the last fifteen or twenty years (including the possibility of accessing the collective catalogue of sbn libraries through the internet), maintain that results obtained in the area of cooperation are well below expectations, or the energy involved. i am touching here on one of the most sensitive, controversial points in the ongoing professional debate, which i do not wish to dwell on except to note the split that came in italian libraries following the vicissitudes of a project that ought, instead, to have united them and stimulated large-scale cooperation.2

i shall now seek to summarize the cooperation position in italy in relation to the subject of this article. very schematically (and arbitrarily) i have grouped the experiences i feel most significant under three heads: the sbn network, territorial library systems, and sectoral cooperation.

sbn brings together some eight hundred large, medium-sized, and small libraries (national, local-authority, university, and research-institute). the programme, funded by the central government, supports cooperation in the following main sectors:

• hardware sharing,
• development and maintenance of library software packages,
• network administration,
• shared cataloguing, and
• interlibrary loans.

the sbn is a star network with its central node consisting of a database (the so-called "index") containing the collective catalogue of the participating libraries (currently some four million relevant bibliographic titles and 7.5 million locations). to the index are linked the thirty-seven local systems, single libraries or multiple libraries, that apply the computerized procedures developed by the sbn programme. thus the sbn is a closed network in which only those libraries take part that agree to adopt the automation systems distributed by the central institute for the union catalogue (iccu), the central office coordinating the programme.

from the organizational viewpoint, the sbn can be regarded as a de facto consortium (i.e., not in the legal sense of the term), even if the management bodies, participation structures, and funding mechanisms differ considerably from consortia that have been set up in other countries. in fact, libraries join the sbn through an agreement among state, regions, and universities, and the governing bodies represent not the libraries but their parent institutions. participating libraries receive the services free, and funding for developing the systems and network administration comes from the central government, which coordinates the technical level of the project through iccu.3 currently, ideas are moving toward evolving the sbn into an open network system and reorganizing its management bodies: if this provision becomes a reality, the sbn will have the potential to take on an important role in developing digital cooperation.
the territorial library systems, developed especially in the central and northern regions, consist of small groups of public libraries cooperating in one or more sectors of activity such as:

• sharing computer systems,
• cataloguing,
• centralized management of purchases,
• interlibrary loans, and
• professional training and other activities.

the library systems are based on conventions and formal or informal agreements between local institutions (the municipalities) and receive support from the provincial and regional administrations. in more recent years some systems (e.g., abano terme, in the veneto) have formed themselves into formal, legal consortia. the most advanced experiences in this sector, for example the libraries in the valseriana (an industrial valley in lombardy), which have been operating on the basis of an informal consortium for some twenty years now, have reached a high level of efficiency comparable with the most developed european situations and may rightly be regarded as reference models for the organization of cooperation. however, given their limited size, they are unlikely to achieve economies of scale in the digital context unless they develop broader alliances. it is not unlikely that these consortia, given their capacity to work together, will in the near future develop broader forms of cooperation suited to tackling current technological challenges.

sectoral cooperation (cooperation by area of specialization) is meeting today with steadily increasing interest, though it did not fare very well in the past. among the rare initiatives embarked upon by university and research libraries in this direction, particular importance in our context attaches to the national coordination of architectural libraries (cnba), started some twenty years ago, which became an association in 1991. the cnba has various projects on its programme and can be regarded as an established reference point for cooperation among architectural libraries. we should also mention one of the "oldest" cooperation projects among research libraries: the italian periodicals catalogue promoted by the national research council (cnr), recently made available online by the university of bologna.4

to complete this sketch, at least a mention should be made of the participation of italian libraries in the european commission's technical programme in favor of libraries. this programme, which since 1991 has mobilized the world of libraries in the european union, not only favors and guides the adoption of technologies in libraries in accordance with preset objectives, but also has the aim of encouraging cooperation among libraries in the various countries. the programme, the latest edition of which includes not just libraries but also archives and museums, has secured significant participation from many italian libraries. over and above the validity of the projects already carried out or under way (important as that is), this programme has been very valuable to italian libraries in terms of exchanges of experience and of opening up professional horizons, especially as regards cooperation practice.5

digital cooperation

recently, following the expansion of electronic publishing, university libraries have been displaying renewed interest in cooperation activities, with particular reference to acquiring licenses and sharing electronic resources. this movement is at present in full swing and is giving rise to manifold cooperation initiatives.
to get an idea of the trends under way, one may leaf through a session on database networking in italian universities in the proceedings of the aib congress at genoa.6 on that occasion a group of universities presented a "draft proposal of agreement on access to electronic information." the document is divided into two parts, the first defining the purposes and object of university cooperation in the sphere of electronic information. the second part indicates operational objectives for cooperation in acquiring electronic information and proposes a model contract for purchasing licenses, to which member universities are to adhere. the content of this second part coincides with the recommendations and understandings signed by associations, consortia, and groups of libraries in other countries, and largely follows the indications and recommendations issued by the european bureau of library information and documentation associations (eblida), the organization that brings together the library associations of the various european countries; by the international coalition of library consortia (icolc); and by other library organizations.

there is no point here in listing all the initiatives under way in italian libraries, in part because most of them are only just started or in the experimental stage. i shall mention a few only to bring out the trends that seem, from my point of view, to be emerging.

development of digital collections

at the moment initiatives in this sector are much fewer and less substantial than in other industrialized countries. among them the biblioteca telematica italiana stands out: in it, fourteen italian and two foreign universities digitize, archive, and put online works in italian. the project is based on a consortium, the italian interuniversity library center for the italian telematic library (cibit), supported by funds from the national research council (cnr) and made up of the fourteen italian and two foreign universities that have signed the agreement. technical support is provided by the cnr institute for computer linguistics, located in pisa.7

in this context we must also note, especially for the consequences it may have for the future growth of digital collections, an agreement between the national central library in florence and the publishers' and authors' associations aimed at accomplishing the national legal depository for electronic publishing project, which also provides for production of a section of the italian national bibliography to be called bni-documenti elettronici. the publishers who have signed the agreement undertake to supply a copy of their electronic products to the national central library in florence. the latter undertakes to guarantee conservation of the electronic products deposited, and to make them accessible to the public in accordance with the agreements reached.

description of electronic resources

in this area the bulk of the initiatives are still in an embryonic stage. in the sector of periodicals index production (i.e., tocs), mention should be made of the economic social science periodicals (essper) project, a cooperation project on italian economics periodicals launched by the libero istituto universitario carlo cattaneo (castellanza, varese) to which some forty libraries are contributing.9 recently the project has been extended to italian legal journals.
essper is a cooperative programme based on an informal agreement among the libraries, each of which undertakes to supply in good time the tocs of the periodical titles they have undertaken to monitor. the programme does not benefit from any outside funds, being supported entirely by the participating libraries, which have recently been endeavouring to evolve into a more structured form of cooperation.

administration of electronic resources and licenses

in this sphere there have been numerous initiatives recently, particularly by university libraries. one may note, first, a certain activism by university data-processing consortia (big computing centres created at the start of the computer era to support applications in scientific and then university and library administration areas). the interuniversity consortium for automation (cilea) in milan, which has for some time been operating in the area of library systems and electronic information distribution (especially in the biomedical sector), has extended its activities by offering services to nonmembers of the consortium too. recently cilea, in connection with a broader programme, cdl-cilea digital library, has been negotiating with a number of major publishers the distribution of electronic journals and online bibliographic services on the basis of needs expressed by the libraries in the consortium. caspur (the university computing consortium in rome) is working on several projects, among them shared management of electronic resources on cd-rom in a network among five universities of the centre-south. caspur, too, has opened its services to libraries not in the consortium and is negotiating with a number of major publishers the licenses for establishing a mirror site for electronic periodicals. the university of genoa, through csita, its computing services centre, has concluded an agreement with an italian distributor of electronic services to enable multisite license-sharing for biomedical databases by institutions operating on the territory of liguria. very recently the universities of florence, bologna, modena, genoa, and venice and the european university institute in florence have initiated a pilot project (cipe) for shared administration of electronic periodicals, and have begun negotiations with a number of publishers.

let us now seek to draw some conclusions from this initial, brief consideration of current initiatives:

• initiatives in the area of digital cooperation are coming mainly from the world of university and research-institute libraries.
• no projects are big enough to achieve economies of scale, with most initiatives in hand having a very limited number of partners and often being experimental in nature.
• projects under way do not provide for the formation of proper consortia, most likely because the legal form of the consortium is hard to set up in italy because of the burdens involved, especially the complexity and length of the decision-making processes needed to constitute such an organization.
• librarians prefer decentralized forms of cooperation, partly because, shaken by experiences of the past, they fear losing autonomy and efficiency and finding themselves caught up in the bureaucracy of centralized organizations.
"however, there can also be a correlation between the amount of autonomy that the individual institution retains and the ability of the consortium to achieve goals as a group". this observation by allen and hirshon obviously holds for italy too . jo it is no coincidence , in fact, that university computing consortia, who have centralized staff and funds available, are able to carry out more incisiv e actions in this sector. • except for the biblioteca telematica italiana, no initiatives seem to have been incentivized by ad hoc government programmes or funds. • a part of the cooperation projects concerns sharing of databases on cd-roms. the traditional italian resistance to online materials would seem to be due partly to the still inadequate network infrastructures in our country; improvements in this sector might bring a quick turnaround here. • some initiatives in hand have been inspired more by suppliers than by librarians : the risk is to cooperate in distributing a particular product, not to enhance libraries' bargaining power. without wishing to deny anything to the suppliers, who today play an essential part in terms of professional information too, i feel that keeping the roles clearly separate may help to develop clear, upright and mutually advantageous cooperation. • some major project s are being led by universit y computing consortia that have begun to take an interest in the library sector. the university computing consortia would indeed have some of the requirements to play a first-rank role in this sphere if they can manage to bring themselves into their most natural position, i.e., to operate as agents of libraries rather than as distributors of services on behalf of the commercial suppliers. moreover, it ought to be clear that th e computing consortia should act as partners with the library consortia and not as substitutes for them, otherwise the libraries risk limiting their autonomy of decision . • some attention is turning toward university electronic publishing , though at the present stage it does not seem there are practical projects for cooperation in this area. • finally, one has to not e low initiative by libraries (compared with other countries) in developing content and in storing digital collections. th e analysis i have rapidly summarized here is the basis for an initative which has in recent months been stimulating the debate on digital cooperation in italy. i am referring to the italian national forum on electronic information resources (infer), a coordination group initially promoted by the europ ean university institut e, the university of florence, and a number of universities in the centre-north, which is today extending beyond the sphere of university and research libraries. the forum's chief mission is to coop erate to promote efficient use of electronic information reso urce s and facilitate access by the public. to this end it encourages libraries to set up consortia and other typ es of agreement on acquisition and management of electronic resources and access to them . infer's objectives can be summarized as follows: • to act as a reference and linkage point and develop initiatives to promote activities and programmes in the area of library e lectro nic resource sharing; • to enhance awar eness both at institutional political levels (ministries, universities, local authorities, etc.) 
and among librarians and end users;
• to facilitate dialogue and mutual collaboration between libraries and all others in the knowledge production and distribution chain, to help them all (authors, publishers, intermediaries, end users) to take advantage of the opportunities offered by the information society; and
• to maintain contacts with similar initiatives under way in other countries.

infer has immediately embarked on a rich programme of activities which is giving appreciable results, especially in terms of raising awareness of the problem and coordinating initiatives in the area. we shall here briefly mention some of the actions in hand that seem to us most important.

dissemination of information. infer has developed a web site where, as well as information on the forum's activities, important information and documents can be found relating to the consortia, the negotiations and licenses, and in general the digital resource-sharing programmes in italy and around the world.11 a discussion list for infer members has also been activated.

seminars and workshops. this activity is aimed at further exploration of themes of particular interest (e.g., legal aspects of license contracts, or programmes under way in other countries).

data collection. the two main programmes coming under this heading are: (a) monitoring of italian cooperation initiatives under way in the digital sector; and (b) collecting data on acquisitions of electronic information resources in university libraries. this information will enable the libraries to have a more exact picture of the situation, so as to assess their bargaining power and achieve the necessary support to adopt the most appropriate strategies.

indications and recommendations. as well as translating and distributing documents from the most important associations operating in this area (such as eblida, icolc, and ifla), infer is developing a model license for the italian consortia.

infer was set up in may 1999 and currently has some forty members, most of them representatives of university library systems, university computing consortia or research libraries, or university professors. one of infer's aspirations is to persuade decision-makers to develop a programme of incentives on a national scale for the creation of library consortia.

critical factors

as to the delay we note in terms of shared management of electronic resources, weight clearly attaches to the fact that cooperation is not well established, nor are the national structures that ought to have supported it. it would be all too easy, and perhaps also more fun, to attribute this situation to the so-called individualism of italians and to abandon inquiry into the structural limitations that may have determined it. first of all, except in very few cases, libraries have no administrative autonomy, or only very little, and hardly any decision-making powers. this factor favors interference in decision-making processes, complicates them, slows down procedures, and strips librarians of their responsibility. one of the reasons why the sbn has not managed to generate cooperation is to be sought in the mechanisms for joining and participating in the programme.
in other words, many libraries have joined the sbn following decisions taken from above, at the political and administrative levels, and not on the basis of an autonomous, weighted assessment of attitudes, needs, and alternatives. these experiences have augmented libraries' reluctance to embark on centrally steered national programmes. on the other hand, the low administrative autonomy they have prevents them from implementing truly effective alternative solutions, i.e., ones able to realize economies of scale.

another factor is the administrative fragmentation of libraries. the big universities have fifty or so libraries each (often one per department). some universities have an office coordinating the libraries, but only in very few cases does this structure have the powers and the necessary support to coordinate; more often it acts as a mediation office with no real administrative powers. in short, the result is that since (perhaps also because of a misunderstood sense of departmental autonomy) there is no decision-making centre for libraries in each university, decisional processes prove slow and cumbersome. clearly, all this brings many problems in establishing understandings and cooperative programmes with other libraries and weakens the universities in negotiating licenses. this position, while objectively favoring suppliers in the short term, in the long term risks facing them with difficulties given an increasingly impoverished, uncertain market because of the fragmentation and the limited capacity of possible purchasers.

another limit is the insufficient awareness, especially on the academic side, of the challenges of electronic information. in early 1999 the french daily le monde published an extensive feature on scientific publishing, showing how current publishing production mechanisms, while assuring a few big publishers of ample profit margins, are suffocating libraries and universities under the continuous rises in prices for scientific journals.12 the argument, immediately taken up by the spanish el pais and other european newspapers, met with very little response in italy. clearly, in italy today, the conditions do not exist to embark on initiatives like the incisive open letter to publishers sent by the kommission des deutschen bibliotheksinstituts für erwerbung und bestandsentwicklung in germany, supported by similar swiss, austrian, and dutch organizations.13

the lack of an adequate national policy in the area of electronic information is probably the direct consequence of the problems i have just mentioned. in this context, however praiseworthy the initiatives, they tend, in the absence of reference points and practical support, to break up or fritter away. under the ministry for universities there are no leadership or action bodies in the area of academic information, like the joint information systems committee in britain that stimulates programmes aimed at developing and utilizing information technologies in university and research libraries. these observations are valid for the state libraries and public libraries, too, where the central (ministry for cultural affairs) and regional authorities could play a more effective part in promoting digital cooperation.

conclusions

the picture i have presented is not very rosy.
however, it does reveal considerable elements of vitality and greater awareness of the problems emerging, starting with a few representatives of academic sectors who might be able to wield influence and bring about a turnaround. at the moment, the consortium movement to share electronic resources chiefly involves university libraries, but a few initiatives by public libraries are starting to appear, especially in the multimedia products sector. no specific lines of action are yet emerging at the level of the national authorities, especially the ministry for education and research and the ministry of cultural activities, on which the national libraries and many research libraries depend. it is likely that in the near future the entry of these agencies may be able to modify the current scenario and considerably influence the approach to cooperation. from this viewpoint, the impression is that a few consortium initiatives that have been flourishing in recent months on the part of both libraries and suppliers have the principal aim of proposing cooperation models to guide future choices. in conclusion, we are only at the outset, and the game is still waiting to be played.

references and notes

1. michel boissel, "l'organisation automatisee de la bibliotheque de l'institut universitaire europeen de florence," bulletin des bibliotheques de france 24, no. 5 (1979): 231-39. for an overall picture of the debate, see: la cooperazione: il servizio bibliotecario nazionale: atti del 30th congresso dell'associazione italiana biblioteche, giardini naxos, november 21-24, 1982 (messina: universita di messina, 1986).

2. tommaso giordano, "biblioteche tra conservazione e innovazione," in giornate lincee sulle biblioteche pubbliche statali, roma, january 21-22, 1993 (roma: accademia nazionale dei lincei, 1994): 57-65. for the most recent developments in the debate, see the articles by antonio scolari, "a proposito di sbn," giovanna mazzola merola, "lo studio sull'evoluzione del servizio bibliotecario nazionale," and claudio leombroni, "sbn: un bilancio per il futuro," bollettino aib 37, no. 4 (1997): 437-66.

3. further information on sbn can be found at www.iccu.sbn.it/sbn.htm, accessed oct. 27, 1999, where the collective catalogue of participating libraries is also accessible.

4. catalogo italiano dei periodici (acnp), www.cib.unibo.it/cataloghi/infoacnp.htm, accessed sept. 19, 1999.

5. there is a considerable literature on the european commission's "libraries programme"; for a summary of projects in the programme, see telematics for libraries: synopses of projects (luxembourg: office for official publications of the european communities, 1998). updated information on the latest version of the programme can be found at www.echo.lu/digicult, accessed oct. 26, 1999. on italian participation in the programme see: ministero per i beni culturali e ambientali, l'osservatorio dei programmi internazionali delle biblioteche 1995-1998 (roma: mbac, 1999).

6. associazione italiana biblioteche (aib), xliv congresso nazionale aib, genova, 1998: www.aib.it/aib/congr/co98univ.htm, accessed oct. 27, 1999.

7. more information about cibit can be found at www.ilc.pi.cnr.it/pesystem/19.htm, accessed may 19, 2000.

8. progetto eden: deposito legale editoria elettronica nazionale, www.bncf.firenze.sbn.it/progetti.html, accessed sept. 29, 1999.
9. more information about essper may be found at www.liuc.it/biblio/essper/default.htm, accessed may 19, 2000.

10. barbara mcfadden allen and arnold hirshon, "hanging together to avoid hanging separately: opportunities for academic libraries and consortia," information technology and libraries 17, no. 1 (1998): 37-44.

11. the infer web page can be found on the universita di roma 1 site, www.uniroma1.it/infer, accessed may 19, 1999.

12. le monde, 22 jan. 1999: a whole page is devoted to this topic. see especially the article titled "les journaux scientifiques menaces par la concurrence d'internet," accessed feb. 4, 1999, www.lemonde.fr/nvtechno/branche/journo/index.html. the point was taken up again by el pais, 27 jan. 1999; see the article titled "las revistas cientificas, amenazadas por internet."

13. the letter, signed by werner reinhardt, dbi president, is available at www.ub.uni-siegen.de/pub/misc/offener_brief-engl.pdf, accessed feb. 4, 1999.

development of a gold-standard pashto dataset and a segmentation app

yan han and marek rychlik

information technology and libraries | march 2021
https://doi.org/10.6017/ital.v40i1.12553

yan han (yhan@email.arizona.edu) is full librarian at the university of arizona libraries. marek rychlik (rychlik@math.arizona.edu) is full professor at the department of mathematics, university of arizona. © 2021.

abstract

the article aims to introduce a gold-standard pashto dataset and a segmentation app. the pashto dataset consists of 300 line images and corresponding pashto text from three selected books. a line image is simply an image consisting of one text line from a scanned page. to our knowledge, this is one of the first open access datasets which directly maps line images to their corresponding text in the pashto language. we also introduce the development of a segmentation app using textbox expanding algorithms, a different approach to ocr segmentation. the authors discuss the steps to build a pashto dataset and develop our unique approach to segmentation. the article starts with the nature of the pashto alphabet and its unique diacritics, which require special considerations for segmentation. needs for datasets and a few available pashto datasets are reviewed. criteria for selection of data sources are discussed, and three books were selected by our language specialist from the afghan digital repository. the authors review previous segmentation methods and introduce a new approach to segmentation for pashto content. the segmentation app and results are discussed to show readers how to adjust variables for different books. our unique segmentation approach uses an expanding textbox method which performs very well given the nature of the pashto scripts. the app can also be used for persian and other languages using the arabic writing system. the dataset can be used for ocr training, ocr testing, and machine learning applications related to content in pashto.

background

the ocr technology for printed modern latin scripts is a largely solved problem, as both character and word accuracies typically reach greater than 95%. the most well-known commercial ocr systems include abbyy, omnipage, and adobe acrobat's ocr engine (licensed from iris), while open-source systems include tesseract, ocropus, and kraken.
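to make the open-source baseline concrete, the following is a minimal sketch (ours, not part of the authors' pipeline) of running tesseract from python through the pytesseract wrapper; the pashto language pack code ("pus") and the file name are assumptions for illustration.

```python
# minimal sketch: ocr a scanned page image with tesseract via pytesseract.
# assumes tesseract is installed with the pashto language pack ("pus");
# the file name is a placeholder, not one of the authors' scans.
from PIL import Image
import pytesseract

page = Image.open("page_scan.png")
text = pytesseract.image_to_string(page, lang="pus")  # "fas" would select persian
print(text)
```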
ocr technology for other languages and scripts, including arabic scripts and traditional chinese, is still not satisfactory, despite the fact that ocr research on these languages has been ongoing since the 1980s. an east asian librarian wrote to the author in 2019:

i am just back from the annual aas (association for asian studies) and ceal (council on east asian libraries) meetings. this year, prof. peter bol of harvard hosted a 2-day digital tech expo there to promote digital humanities . . . i spent 1 day on the dh sessions, where scholars constantly mentioned chinese ocr as a conspicuous and serious block on their path to assessing "digitized" textual collections. if you and your team succeed, it will surely help the eas scholarly community a lot.1

sturgeon, who has directed the chinese text project since 2005, stated that ocr of premodern chinese texts presents challenges distinct from ocr of modern documents and premodern documents in other languages, because training data is typically not available, and a natural approach to improving accuracy is to train using data extracted from real images of text in the same historical writing style.2 sturgeon utilized imperfect ocr software and also allowed users to manually key in corresponding text via a crowdsourcing approach, gradually improving the quality of transcriptions.3

in 2018, the authors received a grant award from the national endowment for the humanities (neh) to develop ocr and a software prototype for an open-source global language databank for pashto and traditional chinese. activities included fundamental research and software implementation of new ocr technology for the two languages. for the past two years, we have been engaged in all aspects of ocr research in pashto, persian, and chinese scripts, including assessing current technology and systems, reviewing and building datasets, and researching and implementing segmentation algorithms and machine learning models involving neural networks.

languages, scripts, and writing systems

people around the world read, write, and speak a handful of major languages. of those, reading and writing are accomplished through the use of several types of scripts: latin, chinese, arabic, and devanagari. languages and scripts are very complex topics in regard to origin, structure, and use; they evolve by influencing and being influenced by each other. a script is defined as "a collection of letters and other written signs used to represent textual information in one or more writing systems," where a writing system is a common communication method that allows people to exchange information through a medium such as paper.4 the first requirement in a writing system is letters or other written signs. a writing system can use an alphabet, a syllabary, or a logography. specifically, the latin and arabic writing systems use alphabets, where an alphabet is a standardized set of letters and combinations of letters make words. another approach is to use logograms: chinese characters (including japanese kanji and korean hanja) are logograms. in the alphabetic and syllabic systems, individual characters represent sounds only, while in the logographic system each logogram represents a word or a phrase.
one script, such as latin or arabic, may be used for several different languages, while some languages use several scripts. latin script is used in western europe, most of eastern europe, and across north and south america. arabic script was adopted across the west asian, middle eastern, and north african regions. in contrast, japanese uses three scripts: the hiragana and katakana syllabaries and the kanji logograms.

the next critical feature in a writing system is the order in which to read and write. a writing system has two directions: horizontal and vertical. almost all writing systems are written from top to bottom (ttb); bottom-to-top (btt) writing systems do exist but are rare. the traditional scripts of the philippines, the tagalog (baybayin), hanunóo, buhid, and tagbanwa, in limited use today, are written from btt.5 within the ttb method, four possibilities exist:

1. left to right (ltr) first and ttb: writing a horizontal line starting from the top left of a page, continuing to the right, and returning to the next line, all the way from top to bottom. the latin writing system uses this variation; the current chinese writing system uses this order as well.
2. right to left (rtl) first and ttb: writing a horizontal line starting from the top right of a page, continuing to the left, and returning to the next line, all the way from top to bottom. arabic writing systems, such as arabic, persian, and pashto, use this order.
3. ttb first and rtl: writing a vertical line starting from the top right of a page, continuing to the bottom, and returning to the next line, all the way from right to left. this method was widely used in traditional chinese (before the 1950s) and traditional japanese materials for thousands of years. it is still used in chinese calligraphy and occasionally can be found in materials published in chinese.
4. ttb first and ltr: rarely used by any writing system; one example is the manchu script.6

the nature of the scripts and the writing systems may require different algorithms and considerations when we deal with ocr technology, including preparing datasets, segmentation, and performing ocr in computer vision.

pashto

pashto (پښتو), alternatively spelled pushto, pukhto, or pakhto, and historically afghani (افغاني), is one of the two official languages of afghanistan (the other is dari/farsi/persian). it is also spoken as a regional language in pakistan; in all, pashto is spoken by 40 to 60 million people in afghanistan and pakistan.7 the arabic script writing system is used for writing the arabic, persian, and pashto languages in a cursive style. arabic, persian, and pashto are entirely different languages, though they use almost the same alphabets within the same writing system. the pashto alphabet is a modified form of the arabic alphabet. it consists of 45 letters and four diacritic marks and includes all 28 letters from the arabic alphabet; it also includes all 32 letters from the persian alphabet, of which 28 are from the arabic alphabet.
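as a quick illustration of the "modified arabic alphabet" point (an illustrative aside of ours, not from the authors), the snippet below inspects a few pashto-specific letters with python's standard unicodedata module, showing that they are encoded as extensions inside the unicode arabic block; the sample letters are examples, not the full set.

```python
# illustrative aside: pashto-specific letters live in the unicode arabic
# block, alongside the letters shared with arabic and persian.
import unicodedata

for ch in ["ټ", "ځ", "څ", "ډ", "ړ", "ږ", "ښ", "ڼ"]:  # example pashto letters
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
```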
the romanization of pashto is covered by several standards, including the american library association/library of congress (ala-lc) romanization, bgn/pcgn, din 31635, iso 233, and arabtex. details of the romanization of pashto letters, with their initial, medial, and final forms and the ala-lc rules, are available at the library of congress's website.8

the need for datasets

the authors are currently engaged in ocr research and have applied machine learning (ml) models and methods such as convolutional neural networks (cnn) and recurrent neural networks (rnn). the advance of ml models and methods has achieved great improvements in many fields; for instance, the most well-known event in ml occurred when an ai program named alphago defeated the world go champion in 2015. the open-source ocr systems tesseract and ocropus released versions using rnn models in 2014 and 2018. these models and methods rely heavily on datasets for training, improvement, and evaluation; similarly, alphago used datasets for training and evaluation. good and comprehensive datasets are critical to the success of an ml model and/or method. the most well-known dataset is the mnist database, which contains a training set of 60,000 images and a test set of 10,000 images (28 × 28 pixels) of handwritten digits (0–9). the dataset is widely used for training and testing in ml as the gold-standard dataset for ml techniques and pattern recognition.
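to show the mechanics of such a train/test split, here is a minimal sketch that loads mnist with torchvision; the library choice is ours (one common option among several), not the authors'.

```python
# minimal sketch: loading the mnist train/test splits with torchvision.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # 28x28 grayscale image -> tensor in [0, 1]
train_set = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
test_set = datasets.MNIST(root="data", train=False, download=True, transform=to_tensor)

print(len(train_set), len(test_set))  # 60000 10000
image, label = train_set[0]
print(image.shape, label)             # torch.Size([1, 28, 28]) and a digit 0-9
```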
related datasets

currently, few pashto datasets are available as open access. while other pashto datasets are mentioned in the literature, we have not found one that provides a one-to-one mapping of line images to texts. a search on github returns one result, a raw text dataset containing content in pashto scraped from the web. however, this dataset is of little use for training ml models for ocr, because it has no corresponding images. the computer science department of the national university of computer and emerging sciences (nuces) peshawar campus has been working on pashto ocr since 2006, and its research has created a pashto image-to-ligature dataset titled the fast-nu dataset, containing 4,000 images of 1,000 unique ligatures in a variety of font sizes.9 the creators of this dataset kindly sent us the pashto image-to-ligature dataset. a recent paper discussed the use of deep learning architectures for ocr in pashto with the development of a bigger dataset based on the fast-nu dataset, including contour, negative, and rotated images.10 ali developed a database recording pashto digits from 25 male and 25 female native pashto speakers for automatic speech recognition; unfortunately, the authors had difficulties in downloading this dataset.11 khan et al. designed a database encompassing a total of 4,488 images (102 distinguishing samples for the 44 pashto letters). this approach is very close to that of the fast-nu dataset,12 though we are not sure how similar they are, as we have not found a way to download and evaluate the dataset. another article describes offline pashto ocr using ml which tested more than 5,000 images in its dataset.13 the article describes its "extraction of lines containing pashto content," but these "lines containing pashto content" have no specific resource or link to check.

rawan and han compiled a pashto–english dictionary, which is openly accessible through its website and an android app.14 in the past decade of working with afghan materials, han and rawan found several existing pashto language dictionaries online but encountered several issues related to standardized spelling, pronunciation, romanization/transliteration, and limited content. this improved dictionary contains over 12,000 entries of pashto words; each entry has a pashto word and corresponding english meanings. the pashto–english dictionary was created with the following objectives in mind: a) standardized spelling and vocabulary, b) standard pronunciation, and c) standardized romanization with the ala-lc romanization scheme. other published pashto dictionaries use either one of the above or a combination of a few romanization systems. this dataset is available for noncommercial use upon reasonable request.

two datasets in different languages (arabic and persian) were produced by the open islamicate texts initiative, available on github (https://github.com/openiti/ocr_gs_data).15 both the arabic and persian datasets have scans of original premodern books and corresponding texts.16 for example, the persian datasets came from page images of three persian books; these pages were segmented into separate line images, and the line images were transcribed with corresponding persian texts.

building a pashto dataset

our dataset creation methodology consists of three phases:

• the first is to select pashto publications from our large digital afghan collections. the focus was to have a language specialist select publications varying in fonts, original quality, and publication years.
• the second phase is to use our segmentation app to produce line images from page images of the selected titles. because of the nature of the pashto alphabet, we took a different segmentation approach involving expanding textboxes. this approach produced positive outcomes.
• the final phase is to generate gold-standard text from the corresponding line images, involving human key-in and final review. we originally hoped that ocr-generated text could increase productivity; unfortunately, the text produced by the current open-source ocr system tesseract 4.x was not useful. a persian ph.d. student was hired to complete the one-to-one key-in. finally, the author and his colleague reviewed the dataset.

data source

rawan and han at the university of arizona libraries have been collaborating with the afghanistan centre at kabul university (acku), the de facto national library of afghanistan. the purpose of the 13-year-long collaboration is to preserve and provide open access to afghanistan's unique materials from the acku's physical collections. initially funded by a grant of $350,000 from the national endowment for the humanities (neh) for the period of 2008 to 2012, the project digitized 200,000 pages of materials from the modern period. the project continues to receive support from the university of arizona and the acku. the acku's permanent collection is the most extensive in the region covering a time of war and social upheaval in the country, with most of the documents in the principal languages of pashto, dari (persian), and english, in a variety of formats such as monographs, series, reports, yearbooks, videos, and newspapers.
in addition, rawan and han also pursued related afghan scholars' collections, including those of ludwig w. adamec and m. mobin shorish. a repository (www.afghandata.org) containing these unique materials, dating from the 1950s to the present, has been openly accessible. the repository has grown from the initial 200,000 pages to 2 million and is the biggest digital repository in the world covering afghanistan and its region, with more than 200,000 active users viewing 400,000 pages per year. the wealth of the materials in terms of content, formats, and sources of information makes them undoubtedly the ultimate source of information for the study of afghanistan and its region. from a data scientist's point of view, the repository is a treasure trove for big data and ml purposes because it consists of a diversity of content from many sources in a variety of formats and document layouts.

selection

the selected books, published in 1986, 2002, and 2006, respectively, vary in fonts, printing, and digitization quality. ms. rawan, a language specialist, selected ten pashto books from the afghan digital repository. due to the limited funding available, only three books were used as the source for the dataset; more titles can be added if additional funding becomes available.

1. " کرګر اکبر محمد څیړونکی / څیره اوفلسفی عرفانی روښان دبایزید کی حالنامه په " (mystic and philosophic profile of bayazid roshan as reflected in halnama), published in 2006 and digitized at 400 dpi in grayscale (www.doi.org/10.2458/azu_acku_bp189_5_bay29_pay23_1385).
2. "لیکنه او څیړنه معصومه رضاء سید" (women in life), published in 2002 and digitized at 600 dpi in black and white (www.doi.org/10.2458/azu_acku_bp173_4_ray62_1381).
3. "… تعلمیم القران او دینیات" (teaching the qur'an and theology), published in 1986 with lower-quality printing in a different pashto font, digitized at 600 dpi in black and white (www.doi.org/10.2458/azu_acku_bp45_tay67_1365).

image processing algorithms and the segmentation app

in a general traditional ocr system, the recognition workflow consists of the following stages: preprocessing, document layout analysis, page segmentation, classification, and postprocessing. in our research, we refer to segmentation as the process of partitioning a digital image into one or more information blocks; the segmentation app is used during the preprocessing and segmentation stages. sturgeon's paper discussed major methods for preprocessing and character segmentation.17 multiple papers discuss various methods for arabic or pashto text segmentation, including:

• horizontal projection;18
• baseline;19
• template matching;20
• contour analysis;21
• zoning;22 and
• a combination of one or more of the above methods, such as contour analysis and template matching.23

these methods have certain issues when dealing with letters with dots on the top or the bottom, and with the diacritics specific to pashto scripts, as the pashto alphabet contains more letters and diacritics than its counterparts in arabic and persian. in addition, noise from original low-quality printing and digitization creates additional barriers. ullah et al.24 briefly mentioned text area detection and segmentation with the detection and removal of diacritics; their segmentation proceeds progressively from line segmentation using horizontal projection, to word, and then to character level.
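as an illustration of the projection-based family of methods just cited (our sketch, not code from any of the cited papers), the following finds candidate text lines by summing ink pixels across each image row and cutting at blank rows; real systems add smoothing and thresholds to cope with dots and diacritics, which is exactly where these methods struggle on pashto.

```python
# illustrative sketch of horizontal-projection line segmentation:
# sum the ink per image row, then split the page at blank rows.
import numpy as np

def find_text_lines(binary_page: np.ndarray, min_height: int = 5):
    """binary_page: 2-d array, 1 = ink, 0 = background.
    returns (top, bottom) row intervals for candidate text lines."""
    row_ink = binary_page.sum(axis=1)      # horizontal projection profile
    in_line, start, lines = False, 0, []
    for y, ink in enumerate(row_ink):
        if ink > 0 and not in_line:        # a line begins at the first inked row
            in_line, start = True, y
        elif ink == 0 and in_line:         # a line ends at the next blank row
            in_line = False
            if y - start >= min_height:    # drop specks shorter than min_height
                lines.append((start, y))
    if in_line:
        lines.append((start, len(row_ink)))
    return lines
```

isolated dots and diacritics above or below a line keep the profile from reaching zero between lines (or split one line in two), which motivates the expanding-textbox alternative described next.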
the letters with dots and diacritics (e.g., څ, ټ, ښ) are sensitive to noise randomly appearing in page images. our method has proven successful in getting accurate character and line segments, with the benefits of simplicity and program efficiency. a detailed discussion of the method is beyond the scope of this paper and shall appear in another paper. the author and a postdoctoral researcher created the code to identify pashto/persian text lines from page images, where the page images are our digitization master files. our method takes a different approach from the segmentation methods above: algorithms and specific properties related to the characteristics of pashto letters have been implemented. we call it the "expanding textbox" method; it calculates the overlapping ratio of one textbox with the others and merges them based on a confidence level controlled by users. the confidence level of the overlapping ratio is controlled by properties such as textbox, overlaptype, overlapthreshold, maxdiacriticssize, and minlineheight. to achieve segmentation, the app is also a specific image processing program that contains common preprocessing algorithms such as binarization.

all commercial and open-source ocr systems give users few choices in page segmentation. we believe that the availability of flexible adjustments unique to the pashto/persian/arabic alphabets allows users to achieve accurate results, based on analysis of our largest collections of pashto materials. our huge collections of printed materials, spanning the period from the 1950s to the 2010s, were published by governments, non-profit organizations, local companies, and individuals, in a diverse range of fonts and printing quality. the app has unique features that allow users to adjust several variables to ensure accurate segmentation. segmentation parameters such as vertical expansion and horizontal expansion (see fig. 1) can be adjusted to expand the line vertically and/or horizontally. our experiments show that typically vertical expansion is set to −0.15 and horizontal expansion is set to 5 for most of the page images from our collections; however, both variables are subject to change if lines are not segmented correctly. figures 2 and 3 show a real-life example of the different vertical expansion values (set to −0.20) needed to get all of the correct lines. users can adjust these variables to achieve the desired output if diacritics and lines are not recognized correctly.

the app was programmed in matlab; it can run within matlab or run independently if packaged with matlab. the app can be exported to other platforms and run in batch mode if needed. the app has a simple gui (see fig. 1) providing a preview of expanded ligatures, expanded diacritics, lines of text, and binarized image windows. this allows users to adjust segmentation variables and verify results before outputting. figure 4 demonstrates an example of a lines-of-text preview. when satisfied, users can output these lines as images (one image per line from a page image). these line images are ready for ocr or manual transcription.
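the paper defers the full algorithm to later work, so the following is only our guess at the core merge step, sketched from the description above; the property names (overlapthreshold and the expansion values) echo the ones the authors mention, but the formulas and defaults are hypothetical.

```python
# hypothetical sketch of the expanding-textbox merge step described above:
# grow each detected box, then merge boxes whose overlap ratio clears a
# user-controlled threshold. formulas and defaults are our guesses.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextBox:
    x: int  # left edge (pixels)
    y: int  # top edge (pixels)
    w: int  # width
    h: int  # height

def expand(b: TextBox, horizontal: int = 5, vertical: float = -0.15) -> TextBox:
    # horizontal expansion in pixels; vertical expansion as a fraction of height
    dy = int(b.h * vertical)
    return TextBox(b.x - horizontal, b.y - dy, b.w + 2 * horizontal, b.h + 2 * dy)

def overlap_ratio(a: TextBox, b: TextBox) -> float:
    # intersection area over the smaller box's area
    ix = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    smaller = min(a.w * a.h, b.w * b.h)
    return (ix * iy) / smaller if smaller else 0.0

def maybe_merge(a: TextBox, b: TextBox, overlapthreshold: float = 0.3) -> Optional[TextBox]:
    # merge two boxes (e.g., a diacritic into its line) if they overlap enough
    if overlap_ratio(expand(a), expand(b)) >= overlapthreshold:
        x0, y0 = min(a.x, b.x), min(a.y, b.y)
        x1 = max(a.x + a.w, b.x + b.w)
        y1 = max(a.y + a.h, b.y + b.h)
        return TextBox(x0, y0, x1 - x0, y1 - y0)
    return None
```

repeatedly applying such a merge until no two boxes qualify would grow diacritic boxes into their parent line boxes, which is consistent with, though not guaranteed to match, the behavior the authors describe.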
figure 1. expanded diacritics (highlighted in red) and the app gui.

figure 2. vertical expansion set to −0.15, missing two lines.

figure 3. vertical expansion set to −0.20, producing correct results.

figure 4. text lines identified (lines in green).

finalizing the dataset

to build a truly 100% accurate dataset, we had a language specialist key in, verify, and double-check the corresponding pashto texts. a ph.d. student from the school of middle eastern and north african studies (with persian language fluency) was hired to complete this task. initially, we tried to ocr these line images using the open-source system tesseract 4.x in the hope that its output would speed up the key-in process; unfortunately, the majority of the ocr results from these line images were not usable. to ensure that the dataset has the gold-standard one-to-one mapping of a line image to a line of text, the ph.d. student keyed in pashto texts line by line by viewing every individual line image. figure 5 shows a sample line image and its text. finally, ms. rawan and the author han reviewed these line images and their corresponding texts. the dataset is organized in a hierarchical structure consisting of directories, where each directory contains line subdirectories which hold line images and their texts. the dataset is openly available at github (https://github.com/yhan818/pashto-dataset).

figure 5. sample line image and corresponding text.
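as a usage note for the published dataset, a sketch like the following pairs each line image with its transcription after cloning the repository; the directory layout and file extensions below are assumptions on our part, so the patterns would need adjusting to the repository's actual structure.

```python
# sketch: pair line images with their transcriptions after cloning
# https://github.com/yhan818/pashto-dataset. the layout and the ".png"/".txt"
# extensions are assumptions; adjust them to the repository's real structure.
from pathlib import Path

root = Path("pashto-dataset")
pairs = []
for img in sorted(root.rglob("*.png")):
    txt = img.with_suffix(".txt")  # assumed: the text sits beside its image
    if txt.exists():
        pairs.append((img, txt.read_text(encoding="utf-8").strip()))

print(f"{len(pairs)} line image/text pairs loaded")
```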
discussion

the nature of the scripts and the writing systems may require different algorithms and considerations when we deal with ocr technology, including preparing datasets, segmentation, and performing ocr in computer vision. in our research on specific languages, we have tested this app with documents in pashto, persian/dari, and arabic with successful results. our textbox extension method should work for any language using the arabic writing system, beyond the scripts above. during our research, we became clearly aware of the following limitations of ocr technology, techniques, and systems:

1) lack of high accuracy in segmentation:
a) while it is true that the character/word accuracy of ocr on latin scripts can exceed 95%, one should not assume that the accuracy of a document after ocr will be at the same level. depending on the nature of a document, segmentation accuracy varies: ocring documents with a simple layout (e.g., a monograph without columns and tables) generally reaches high accuracy, while ocring documents with complex layouts (e.g., newspapers and scientific articles) generates poor results.
b) we have tested multiple popular commercial and open-source ocr systems specifically in the area of segmentation. on several samples, every ocr system failed completely; in other words, the text output of every ocr system was nonsense. in some cases, only abbyy recognized columns correctly; the remaining systems unexpectedly transposed text columns, which means potential indexing and searching errors, even though the character and word accuracy reached 95%.
c) we argue that segmentation accuracy shall be added as one of the most important evaluation criteria (a sketch of one possible measure follows this list).

2) to date almost all ocr technology and systems are limited to text only:
a) missing information in other formats: we agree that plain text in the writing systems is the most commonly used and a very important communication method. however, almost all materials contain information in other formats (e.g., illustrations, figures, tables, and formulas) that may be very difficult to describe in text. an individual page from a monograph, journal article, or newspaper may contain information in other formats beyond text, such as a table, mathematical formula, figure, picture, or drawing. one company, mathpix, recently started to provide ocr on simple tables and mathematical formulas for a fee.
b) missing semantic information: in addition, current ocr simply outputs plain text, ignoring existing semantic information (e.g., bold highlighted text and section/subsection headings in different fonts and sizes). ocr refers to character recognition, which limits its own scope in theory; in current practice, semantic information is totally ignored by every ocr system, and scant research has been carried out on it.
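the segmentation-accuracy criterion proposed in 1c could be scored, for example, by matching predicted line boxes against gold-standard boxes; the intersection-over-union matching below is a common generic choice on our part, not a measure the authors define.

```python
# sketch: score line segmentation against a gold standard by counting
# reference boxes matched by some predicted box with IoU >= 0.5.
# boxes are (x, y, w, h); the 0.5 cutoff is a conventional choice,
# not one taken from this article.
def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def segmentation_accuracy(predicted, reference, threshold=0.5):
    matched = sum(
        1 for ref in reference
        if any(iou(ref, pred) >= threshold for pred in predicted)
    )
    return matched / len(reference) if reference else 1.0
```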
conclusion and future work

so far, we have created a pashto dataset containing 300 line images and their corresponding text from three pashto monographs published in 1986, 2002, and 2006, respectively. the dataset is openly available as a gold-standard pashto dataset from real books; when future funding is received, we will add more data to it. the segmentation app produces accurate line images from page images for pashto and persian content, and it will work for other languages using the arabic writing system. potential users of our prototype software will find it relatively easy to modify, with little knowledge of the underlying technology, in other programming languages such as java; in addition, researchers who understand linear algebra, on which matlab is based, can modify the code for their needs. we are also using this dataset to train and evaluate our current ocr algorithms with rnn and other ml models. an initial report of our research and results can be found at arxiv.org.25 the authors will report and update future research results and available datasets via conferences and formal publications.

the authors would like to thank the national endowment for the humanities for its grant (pr 263939-19) to our project, development of image-to-text conversion for pashto and traditional chinese. the authors would like to thank riaz ahmad and saeeda naz for providing the nuces fast ligature dataset. the authors would also like to thank atifa rawan, sayyed m. vazirizade, and sharam parastesch for their valuable contributions: ms. rawan selected the sample pashto manuscripts and reviewed the lines, dr. vazirizade worked on segmentation algorithms and code, and ph.d. student sharam parastesch keyed in and verified the dataset.

glossary of terms

• alphabet: an alphabet is a standardized set of letters and symbols. the most popular are the latin alphabet (a–z) and the arabic alphabet.
• classification: in machine learning, classification is the assignment of a sample to one or more classes by supervised learning from examples.
• dataset: a set of data. a dataset can be in a variety of forms or formats (e.g., text, images, audio, video, 3d objects, gps data, machine learning data), from one table, to a collection of images of handwritten digits (e.g., the mnist database), to a collection of metadata of a digital repository (e.g., the arxiv dataset).
• document layout analysis: in ocr technology, document layout analysis is the process of identifying the layout and categorizing the information blocks in a digital image of a document. the goal is to separate each information block from the others and arrange the blocks in the correct reading order.
• language: a structured system of communication; the system of linguistic signs or symbols considered in the abstract (as opposed to speech).
• left to right (ltr) first and top to bottom (ttb) writing direction: writing a horizontal line starting from the top left of a page, continuing to the right, and returning to the next line, all the way from top to bottom. the latin writing system uses this writing direction; the current chinese writing system uses this order as well.
• logogram: a written character that represents a word or phrase. the most popular are chinese (simplified and traditional) characters, kanji (japanese), and hanja (korean).
• optical character recognition (ocr): conversion of an image consisting of text (printed or handwritten) into digital text.
• page segmentation: the segmentation process for a scanned page in a digital image file format.
• right to left (rtl) first and top to bottom (ttb) writing direction: writing a horizontal line starting from the top right of a page, continuing to the left, and returning to the next line, all the way from top to bottom. the arabic writing systems, such as arabic, persian, and pashto, use this order.
• script: "[a] collection of letters and other written signs used to represent textual information in one or more writing systems."26
• segmentation: the process of partitioning a digital image of a document into multiple segments, where each segment consists of a set of pixels. it aims at separating the digital image into one or more information blocks, where each information block contains logical information separated from the others. these information blocks shall be arranged in the correct reading order. (see document layout analysis.)
• textbox: in an ocr system, a textbox is a box, identified by (x, y) coordinates in the source code, that contains one or more characters.
• top to bottom (ttb) first and left to right (ltr) writing direction: writing a vertical line starting from the top left of a page, continuing to the bottom, and returning to the next line, all the way from left to right. this method is rarely used by any writing system.
• top to bottom (ttb) first and right to left (rtl) writing direction: writing a vertical line starting from the top right of a page, continuing to the bottom, and returning to the next line, all the way from right to left.
this method was widely used in traditional chinese (before the 1950s) and traditional japanese materials for thousands of years. it is still used in chinese calligraphy and occasionally can be found in materials published in chinese.
• writing system: a common communication method that allows people to exchange information through a medium such as paper.

endnotes

1 lu gan, email message to author, march 25, 2019.

2 donald sturgeon, "large-scale optical character recognition of pre-modern chinese texts," international journal of buddhist thought and culture 28, no. 2 (2018): 11–44, https://dsturgeon.net/papers/large-scale-chinese-ocr.pdf.

3 donald sturgeon, "digitizing premodern text with the chinese text project," journal of chinese history 4, no. 2 (2020): 486–98, https://doi.org/10.1017/jch.2020.19.

4 "glossary of unicode terms," the unicode consortium, last updated may 20, 2020, http://www.unicode.org/glossary/.

5 the unicode standard version 13.0—core specification: chapter 17: indonesia and oceania (the unicode consortium: mountain view, ca, 2020), https://www.unicode.org/versions/unicode13.0.0/ch17.pdf#g26723.

6 britta-maria gruber and wolfgang kirsch, "writing manchu on a western computer (an interim report)," saksaha: a journal of manchu studies 3 (1998), https://doi.org/10.3998/saksaha.13401746.0003.008.

7 herbert penzl, a grammar of pashto: a descriptive study of the dialect of kandahar, afghanistan (new york: ishi press, 2009).

8 library of congress, pushto romanization tables (2013), https://www.loc.gov/catdir/cpso/romanization/pushto.pdf.

9 riaz ahmad et al., "robust optical recognition of cursive pashto script using scale, rotation and location invariant approach," plos one 10, no. 9 (september 14, 2015): e0133648, https://doi.org/10.1371/journal.pone.0133648.

10 shizza zahoor et al., "deep optical character recognition: a case of pashto language," journal of electronic imaging 29, no. 2 (march 4, 2020), https://doi.org/10.1117/1.jei.29.2.023002.

11 zakir ali et al., "database development and automatic speech recognition of isolated pashto spoken digits using mfcc and k-nn," international journal of speech technology 18, no. 2 (june 2015): 271–75, https://doi.org/10.1007/s10772-014-9267-z.

12 sulaiman khan et al., "knn and ann-based recognition of handwritten pashto letters using zoning features," international journal of advanced computer science and applications 9, no. 10 (2018), https://doi.org/10.14569/ijacsa.2018.091069.

13 sultan ullah et al., "offline pashto ocr using machine learning," in 2019 7th international electrical engineering congress (ieecon) (hua hin, thailand, 2019): 1–4, https://doi.org/10.1109/ieecon45304.2019.8938859.

14 atifa rawan and yan han, the pashto-english dictionary (2014), http://www.pashtoenglish.org.
15 open islamicate texts initiative, open islamicate texts initiative (openiti): creating the digital infrastructure for the study of the premodern islamicate world (2016), https://iti-corpus.github.io/.

16 matthew thomas miller, maxim g. romanov, and sarah bowen savant, "digitizing the textual heritage of the premodern islamicate world: principles and plans," international journal of middle east studies 50, no. 1 (february 2018): 103–9, https://doi.org/10.1017/s0020743817000964.

17 sturgeon, "large-scale optical character recognition of pre-modern chinese texts," 11–44.

18 mohamed attia and mohamed el-mahallawy, "histogram-based lines and words decomposition for arabic omni font-written ocr systems; enhancements and evaluation," in computer analysis of images and patterns, ed. walter g. kropatsch, martin kampel, and allan hanbury, vol. 4673, lecture notes in computer science (berlin, heidelberg: springer, 2007), 522–30, https://doi.org/10.1007/978-3-540-74272-2_65; mahmoud a. a. mousa, mohammed s. sayed, and mahmoud i. abdalla, "arabic character segmentation using projection based approach with profile's amplitude filter," arxiv:1707.00800 [cs], july 3, 2017, http://arxiv.org/abs/1707.00800.

19 atallah al-shatnawi and khairuddin omar, "methods of arabic language baseline detection—the state of art," international journal of computer science and network security 8, no. 10 (october 2008); tarik abu-ain et al., "a novel baseline detection method of handwritten arabic-script documents based on sub-words," in soft computing applications and intelligent systems, ed. shahrul azman noah et al., communications in computer and information science 378 (berlin, heidelberg: springer, 2013), 67–77, https://doi.org/10.1007/978-3-642-40567-9_6; saeeda naz et al., "challenges in baseline detection of arabic script based languages," in intelligent systems for science and information, ed. liming chen, supriya kapoor, and rahul bhatia, studies in computational intelligence 542 (springer international publishing, 2014): 181–96, https://doi.org/10.1007/978-3-319-04702-7_11.

20 majid ziaratban and karim faez, "a novel two-stage algorithm for baseline estimation and correction in farsi and arabic handwritten text line," in 2008 19th international conference on pattern recognition (tampa, fl: ieee, 2008): 1–5, https://doi.org/10.1109/icpr.2008.4761822.
21 safwan wshah, zhixin shi, and venu govindaraju, "segmentation of arabic handwriting based on both contour and skeleton segmentation," in 2009 10th international conference on document analysis and recognition (barcelona, spain: ieee, 2009): 793–97, https://doi.org/10.1109/icdar.2009.152; yusra osman, "segmentation algorithm for arabic handwritten text based on contour analysis," in 2013 international conference on computing, electrical and electronic engineering (icceee) (khartoum, sudan: ieee, 2013): 447–52, https://doi.org/10.1109/icceee.2013.6633980.

22 khan et al., "knn and ann-based recognition of handwritten pashto letters using zoning features."

23 abdelhay zoizou, arsalane zarghili, and ilham chaker, "a new hybrid method for arabic multifont text segmentation, and a reference corpus construction," journal of king saud university—computer and information sciences 32, no. 5 (june 2020): 576–82, https://doi.org/10.1016/j.jksuci.2018.07.003.

24 ullah, "offline pashto ocr using machine learning."

25 marek rychlik et al., "development of a new image-to-text conversion system for pashto, farsi and traditional chinese," arxiv:2005.08650 [cs], may 8, 2020, http://arxiv.org/abs/2005.08650.

26 "glossary of unicode terms," http://www.unicode.org/glossary/.

use of language-learning apps as a tool for foreign language acquisition by academic libraries employees

kathia ibacache

information technology and libraries | september 2019

kathia ibacache (kathia.ibacache@colorado.edu) is the romance languages librarian at the university of colorado boulder.

abstract

language-learning apps are becoming prominent tools for self-learners.
this article investigates whether librarians and employees of academic libraries have used them and whether the content of these language-learning apps supports the foreign language knowledge needed to fulfill library-related tasks. the research is based on a survey sent to librarians and employees of the university libraries of the university of colorado boulder (ucb), two professional library organizations, and randomly selected employees of 74 university libraries around the united states. the results reveal that librarians and employees of academic libraries have used language-learning apps. however, there is an unmet need for language-learning apps that cover broader content, including reading comprehension and other foreign language skills suitable for academic library work. introduction the age of social media and the advances in mobile technologies have changed the manner in which we connect, socialize, and learn. as humans are curious and adaptive beings, the moment mobile technologies provided apps to learn a foreign language, it was natural that self-regulated learners would immerse themselves in them. the practical nature of language-learning apps as an informal educational tool may attract self-learners such as librarians and employees of academic libraries to use this technology to advance foreign language knowledge usable in the workplace. the academic library employs a wide spectrum of specialists, from employees offering research consultations, reference help, and instruction to others specializing in cataloging, archives, acquisitions, and user experience. regardless of the type of library work, employees who use a foreign language possess an appealing skill, as knowing a foreign language heightens their desirability and strengthens their job performance. in many instances, librarians and employees of academic libraries may be required to have reading knowledge of a foreign language. for these employees, therefore, acquiring knowledge of a foreign language might be paramount to delivering optimal job performance. this study aims to answer the following questions: 1) are librarians and employees of academic libraries using language-learning apps to support foreign language needs in their workplace? and 2) are language-learning apps addressing the needs of librarians and employees of academic libraries? for purposes of this article, mobile language apps are those accessed through a website and apps downloaded onto portable smartphones, tablets, desktops, and laptops. background mobile-assisted language learning (mall) has a user-centered essence that resonates with users in the age of social media. librarians and employees of academic libraries who need a foreign language to fulfill work responsibilities are a target group that can benefit from using language-learning apps. these apps offer flexibility of time and space and an adaptability suited to the changeable environments favored by self-learners. kukulska-hulme states that it is customary to have access to learning resources through mobile devices.1 for those working in academic libraries, language-learning apps may present an opportunity to pursue a foreign language in a way that accommodates their self-learning style, time availability, space, and choice of device.
considering the features of language-learning apps, some have a more personal quality, where the device interacts with one user, while other apps emulate social media characteristics, connecting a wide array of users. for instance, users learning a language through the hellotalk app can communicate with native speakers all around the world. through this app, language learners can send voice notes, send corrections to faulty grammar, and use the built-in translator feature. therefore, language-learning apps may provide self-learners a vehicle not only to communicate remotely but also to interact using basic conversational skills in a given language. in the case of those working in academic libraries, this human connectedness among users may not be as relevant as the interactive nature of the device, its mobility, the convenience of virtual learning, and the flexibility of the mobile technology. kukulska-hulme notes that the ubiquity of mobile learning is affecting the manner in which one learns.2 although there is abundant literature on mobile language technologies and their usefulness in students’ language learning at different school levels, including higher education, scholarship regarding the use of language-learning apps by professionals is scarce.3 broadbent refers to self-regulated learners as those who plan their learning through goals and activities.4 the author concurs that to engage in organized language learning through a language-learning app, one should have some level of organizational learning or, at a minimum, enough motivation to engage in self-learning. in this context, some scholars believe that the level of self-management of learning will determine the level of learning success.5 moreover, learners who possess significant personal learning initiative (pli) have the foundation to accomplish learning outcomes and overcome difficulties.6 pli may be one factor affecting learners’ motivation to learn a language in a virtual environment, away from the formal classroom setting. this learning initiative may play a significant role in the learning process, as it may influence the level of engagement and the learning outcome. in terms of learning outcomes, language software developers may also play a role by adapting and broadening content based on learning styles and considering the elements that would provide a meaningful user experience. in this sense, bachore conveys that there is a need to address language-learning styles when using mobile devices.7 bachore also notes that as interest in mobile language learning increases, so do the different manners in which mobile devices are used to implement language learning and instruction.8 similarly, louhab refers to context dimensions as the parameters in mobile learning that consider learners’ individuality in terms of where the learning takes place, individual personal qualities and learning needs, and the features of their mobile device.9 bradley also suggests that learning is part of a dialogue between learners and their devices within a sociocultural context where thinking and learning occur.10 in addition,
bradley infers that users are considered when creating learning activities and when improving them.11 for these reasons, some researchers address the need to focus on accessibility and on developing content designed for different types of users, including differently abled learners.12 furthermore, adaptation to the learner’s style may be considered a pivotal quality of language-learning apps as software developers try to bridge the gap between formal instruction and a learner-oriented mobile learning platform. undoubtedly, the technological gap, which includes the cost of the device, interactivity, screen size, and input capabilities, among others, matters when centering on implementing language learning supported by mobile technologies. however, learning style is only one aspect of the equation; a learner’s need is another. for example, the needs of a learner who seeks to acquire familiarity with a foreign language because of an upcoming vacation may be substantially distinct from the needs of professionals such as academic librarians, who may need reading, writing, or even speaking proficiency in a given language. a user-centered approach in language-learning software design may advance the adequacy of these apps, connecting them with a much wider set of learning needs. when referring to mobile apps for language learning, godwin-jones asserts that while the capability of devices is relevant, software development is paramount to the educational process.13 therefore, language-learning software developers may consider creating learning activities that target basic foreign language-learning needs as well as more tailored ones suitable for people who require different content. kukulska-hulme refers to “design for learning” as creating structured activities for language learning.14 although language-learning apps appear to rely on learning activities built on basic foreign language learning needs, their developers should rely more on learners’ evaluative insights to advance software development that meets learners’ specific needs. although mobile technologies will continue to evolve, their mobile nature will likely keep the focus on user experience, satisfying those who prefer the freedom of informal learning. methodology instrument the author used a 26-question qualtrics survey approved by the institutional review board at the university of colorado boulder (ucb). the survey was open for eight weeks and received 199 total responses; however, the number of responses to each question varied. the data collected was both quantitative and qualitative in nature, seeking to capture respondents’ perspectives as well as measurable data that could be used for statistics. the survey consisted of twelve general questions for all respondents who reported working in an academic library, then branched into either nine questions for respondents who had used a language-learning app or five questions for those who had not. the respondents answered via text fields, standard single- and multiple-choice questions, and a single-answer likert matrix table. qualtrics provided a statistical report, which the author used to analyze the data and create the figures. participants the survey was distributed through an email to librarians and employees of ucb’s university libraries.
the author also identified 74 university libraries in the united states from a list of members of the association of research libraries, and distributed the survey via email to ten randomly selected library employees from each of these libraries.15 the recipients included catalogers, subject specialists, archivists, and others working in metadata, acquisition, reference, and circulation. in addition, the survey was distributed to the listservs of two library organizations: the seminar on the acquisition of latin american library materials (salalm) and reforma, the national association to promote library and information services to latinos and the spanish speaking. these organizations were chosen due to their connection with foreign languages. results use of foreign language at work of the respondents, 172 identified as employees of academic libraries (66 percent). of these, a significant percentage reported using a foreign language in their library work. the respondents belonged to different generational groups; however, most were in the age groups of 30-39 and 40-49 years old. the respondents performed a variety of duties within the categories presented. due to incomplete survey results, varying numbers of responses were collected for each question. of 110 respondents, 82 identified their gender as female. in addition, of 105 respondents, 62 percent reported being subject specialists, 56 percent worked in reference, 54 percent identified as instruction librarians, 30 percent worked in cataloging and metadata, 30 percent worked in acquisition, 10 percent worked in circulation, 2 percent worked in archiving, and 23 percent reported doing “other” types of library work. figure 1. age of respondents (n=109): 20-29 years old, 9.17 percent; 30-39 years old, 29.36 percent; 40-49 years old, 30.28 percent; 50-59 years old, 12.84 percent; 60 years or older, 18.35 percent. figure 2. foreign language skills respondents used at work (multiple responses allowed, n=106): reading, 102; writing, 65; speaking, 49; listening, 49. as shown in figure 2, respondents used different foreign language skills at work; however, reading was used with significantly more frequency. when asked, “how often do you use a foreign language at work?” 38 respondents out of 105 used it daily, 29 used it weekly, and 21 used it monthly. in addition, table 1 shows that a large percentage of respondents noted that knowing a foreign language helped them with collection development tasks and reference services. however, the respondents who chose “other” stated in a text field that knowing a foreign language helped them with translation tasks, building management, creating a welcoming environment, attending to foreign guests, communicating with vendors, researching, processing, and having a broader perspective of the world emphasizing empathy. these respondents also expressed that knowing a foreign language helped them to work with materials in other languages, to contribute to digital humanities projects, and to offer library tours and outreach to the community.
table 1. types of librarian work benefiting from knowledge of a foreign language (multiple responses allowed, n=104), as percent expressing benefit:
collection development, 61.5
reference, 57.6
communication, 56.7
instruction, 41.3
cataloging and metadata, 41.3
acquisition, 40.3
other, 19.2
figure 3. languages respondents studied using an app (multiple responses allowed, n=51): spanish, 22; french, 13; portuguese, 13; german, 9; italian, 8; japanese, 5; other, 26. as shown in figure 3, spanish was the most prominent language studied, while thirteen of the 51 respondents studied french and thirteen studied portuguese. additionally, respondents stated in the text field “other” that they have also used these apps to study english, mandarin, arabic, malay, hebrew, swahili, korean, navajo, turkish, russian, greek, polish, welsh, indonesian, thai, and tamil. regardless, apps were not the sole means for language acquisition. some respondents specified using books, news articles, pimsleur cds, television shows, internet radio, conversations with family members and native speakers, formal instruction, websites, dictionaries, online tutorials, audio tapes, online laboratories, flashcards, podcasts, movies, and youtube videos. over a third of 49 respondents used a language-learning app for 30 hours or more, and less than a quarter used one between 11 and 30 hours. concerning the device preferred to access the apps, most of the respondents utilized a smartphone (63.27 percent), followed by a laptop (16.33 percent) and a tablet (14.29 percent). table 2 shows the elements of language-learning apps that 48 respondents found most satisfactory. they selected “learning in own time and space” as the most desired element, followed by “vocabulary” and “translation exercises.” participants were less captivated by “pronunciation capability” (29.1 percent) and “dictionary function” (16.6 percent).
table 2. most satisfactory aspects of language-learning apps (multiple responses allowed, n=48), as percent finding satisfactory:
learning in own time and space, 64.5
vocabulary, 56.2
translation exercises, 56.2
making mistakes without feeling embarrassed, 54.1
responsive touch screen, 52
self-testing, 52
reading and writing exercises, 43.7
game-like features, 37.5
voice recognition capability, 37.5
comfortable text entry, 37.5
grammar and verb conjugation exercises, 35.4
pronunciation capability, 29.1
dictionary function, 16.6
figure 4. most unsatisfactory elements of language-learning apps (n=30): content, 13; flexibility/interface, 10; grammar, 5; payment, 2. conversely, 30 respondents described unsatisfactory elements on the survey. these elements were grouped into the categories shown in figure 4. the elements were: payment restrictions, lack of grammatical explanations, monocentric content focused on travel, vocabulary-centric content (although opinions were varied on this issue), and poor interface. respondents also mentioned a lack of flexibility that inhibited learners from reviewing earlier lessons or moving forward as desired, unfriendly interfaces, and limited scope. other respondents alluded to technical issues with keyboard diacritics, non-intuitive software, and repetitive exercises. while these elements relate to the language apps themselves, one respondent mentioned missing human interaction and another reported the lack of a system to prompt learners to be accountable for their own learning process. figure 5. reasons participants had not used a language-learning app (multiple responses allowed, n=53): other, 54.71 percent; lack of time, 37.73 percent; prefer traditional setting, 32.07 percent; screen too small, 1.88 percent. figure 5 shows that time restriction (i.e., availability of time to use the app) was the most prevalent specific reason why respondents had not used a language-learning app.
however, a larger percentage of respondents answered “other” to expand on the reasons they had not tried this technology. the explanations provided included: missing competent content for work; already having sufficient proficiency; preferring books, dictionaries, google translate, and podcasts; lacking interest; and having different priorities. similarly, when asked whether they would use a language-learning app if given an opportunity, a large percentage of 52 respondents answered “maybe” (65.38 percent). however, when 51 respondents answered the question “what elements facilitated your language learning?,” 66.6 percent responded that they preferred having an instructor, 54.9 percent liked being part of a classroom, and 41.1 percent liked language activities with classmates. discussion library employee use of language-learning apps the data revealed that a large number of respondents used a foreign language in their library work, reporting that reading and writing were the most needed skills. however, only about half of the respondents had used a language-learning app. therefore, there appears to be interest in language-learning apps, but use is not widespread at this time. overall, respondents felt language-learning apps did not offer a curriculum that supported foreign language enhancement for the workplace, especially the academic library workplace. this factor may be one reason why respondents stopped using the apps and why this technology was not utilized more extensively. interestingly, the majority of the respondents were in their thirties and forties. one may surmise that young millennials in their twenties would be more inclined to use language-learning apps; however, the data showed a slight lead by respondents in their forties. this information may corroborate the author’s inference that generational distinctions among employees of academic libraries do not limit the ability to seek, and even prefer, learning another language through apps. moreover, a pew research center study showed that generations older than millennials have welcomed technology, and gen xers even had a 10 percent lead over millennials in tablet ownership.16 referring to the device used to interact with the language app, most respondents preferred a smartphone; only a smaller fraction preferred a tablet, laptop, or desktop. this data may attest to the mobility of language-learning apps preferred by self-learners and the notion that language learning may happen outside the classroom setting. however, while smartphones provide ubiquity and a sense of independence, so can tablets. therefore, what is it about smartphones that ignites preference from a user experience perspective? is it their ability to make calls, portability, fast processors, wi-fi signal, or cellular connectivity that makes a difference? since tablets can also be considered portable, and their larger screens and web surfing capabilities are desirable assets, is it the “when and where” that determines the device? while not all respondents reported using an app to learn a language, those who did expressed satisfaction with learning in their own space and time and with translation exercises. nevertheless, it is intriguing that few respondents deemed important the ability of the software to help learners with the phonetic aspect of the language.
this diminished interest in pronunciation may be connected with the type of language learning needed in the academic library profession. as respondents indicated, language-learning apps tend to focus on conversational skills rather than reading and text comprehension. in addition to those respondents who used an app to learn a new language, one respondent reported reinforcing skills in a language already acquired. a compelling matter to consider is the frequency with which respondents utilize a foreign language in their work. about a third of the respondents used a foreign language at work on a daily basis, and approximately a quarter used it weekly. this finding reveals that foreign language plays a significant role in academic library work. since the respondents fulfilled different responsibilities in their library work, one may deduce that foreign language is utilized in a variety of settings other than strictly desk tasks. in fact, as stated before, respondents reported using a foreign language for multiple tasks, including communicating with vendors and foreign guests as well as providing a welcoming environment, among others. even though 59 respondents stated that knowing a foreign language helped them with communication, respondents appeared to be more concerned with reading comprehension and vocabulary. it is likely reading comprehension was ranked high in importance since library jobs that require foreign language knowledge tend to utilize reading comprehension skills widely. nonetheless, the author wonders whether subject specialists utilize more skills related to listening and communication in a foreign language, especially those librarians who provide instruction. it is therefore curious that they did not prioritize these skills; perhaps this topic could be the subject of future research. notwithstanding these results, language-learning apps appear to center on content that improves listening and basic communication instead of reading comprehension. therefore, the question remains as to whether mobile language apps have enough capabilities to provide a relevant learning experience to librarians and staff working in academic libraries. are language-learning apps responding to the language needs of employees working in academic libraries? the survey results indicate that language-learning apps are not sufficiently meeting respondents’ foreign language needs. qualitative data showed that there may be several elements affecting the compatibility of language-learning apps with the needs of employees working in academic libraries; however, the findings were not conclusive due to the limited number of responses. when respondents were asked to identify the unsatisfactory elements of these apps, 65.9 percent of 47 respondents found an issue with language-learning apps, while 23 percent of those respondents answered “none.” according to respondents, the main problems with the apps were a lack of content and scope suitable for employees of academic libraries, along with shortcomings in flexibility and grammar. perhaps mobile language-app developers assume that some learners still use a formal classroom setting for foreign language acquisition and therefore leave more advanced curricula to that setting. it is also possible that developers deem a market centered on travel and basic conversation more dominant; this may explain why these apps do not address foreign language needs at the professional level.
finally, these academic library employees appear to perceive a need for these apps to explore and offer a curriculum and learning activities that benefit those seeking deeper knowledge of a language. conclusion mobile language learning has changed the approach to language acquisition. its mobility, portability, and ubiquity have established a manner of instruction that provides a sense of freedom and self-management that suits self-learners. moreover, as app technology has progressed, features have been added to devices that facilitate a more meaningful user experience with language-learning apps. employees of academic libraries who have used foreign language-learning apps are cognizant of the language-learning activities that would support their foreign language needs for work, such as reading comprehension and vocabulary. however, language-learning apps appear to be marketed toward conversational needs, providing exercises that focus on travel more than lessons that center on reading comprehension and deeper areas of language knowledge. this indicates a lack of language-learning content that would be more appropriate for those working in academic libraries. finally, academic library employees who require a foreign language in their work are a target group that may benefit from mobile language learning. presently, this target group feels language-learning apps are too basic to cover broader, professional needs. therefore, as language-learning app developers consider serving wider groups of people, it would be beneficial for these apps to expand their lesson structure and content to address the needs of academic library professionals. endnotes 1 agnes kukulska-hulme, “will mobile learning change language learning?,” recall 21, no. 2 (2009): 157, https://doi.org/10.1017/s0958344009000202. 2 ibid., 158. 3 see florence martin and jeffrey ertzberger, “here and now mobile learning: an experimental study on the use of mobile technology,” computers & education 68 (2013): 76-85, https://doi.org/10.1016/j.compedu.2013.04.021; houston heflin, jennifer shewmaker, and jessica nguyen, “impact of mobile technology on student attitudes, engagement, and learning,” computers & education 107 (2017): 91-99, https://doi.org/10.1016/j.compedu.2017.01.006; yoon jung kim, “the effects of mobile-assisted language learning (mall) on korean college students’ english-listening performance and english-listening anxiety,” studies in linguistics, no. 48 (2018): 277-98, https://doi.org/10.15242/heaig.h1217424; jack burston, “the reality of mall: still on the fringes,” calico journal 31, no. 1 (2014): 103-25, https://www.jstor.org/stable/calicojournal.31.1.103. 4 jaclyn broadbent, “comparing online and blended learner’s self-regulated learning strategies and academic performance,” internet and higher education 33 (2017): 24, https://doi.org/10.1016/j.iheduc.2017.01.004. 5 rui-ting huang and chung-long yu, “exploring the impact of self-management of learning and personal learning initiative on mobile language learning: a moderated mediation model,” australasian journal of educational technology 35, no. 3 (2019): 118, https://doi.org/10.14742/ajet.4188. 6 ibid., 121. 7 mebratu mulato bachore, “language through mobile technologies: an opportunity for language learners and teachers,” journal of education and practice 6, no. 31 (2015): 51, https://files.eric.ed.gov/fulltext/ej1083417.pdf. 8 ibid., 50.
9 fatima ezzahraa louhab, ayoub bahnasse, and mohamed talea, “considering mobile device constraints and context-awareness in adaptive mobile learning for flipped classroom,” education and information technologies 23, no. 6 (2018): 2608, https://doi.org/10.1007/s10639-018-9733-3. 10 linda bradley, “the mobile language learner: use of technology in language learning,” journal of universal computer science 21, no. 10 (2015): 1270, http://jucs.org/jucs_21_10/the_mobile_language_learner/jucs_21_10_1269_1282_bradley.pdf. 11 ibid. 12 tanya elias, “universal instructional design principles for mobile learning,” the international review of research in open and distance learning 12, no. 2 (2011): 149, https://doi.org/10.19173/irrodl.v12i2.965. 13 robert godwin-jones, “emerging technologies: mobile apps for language learning,” language learning & technology 15, no. 2 (2011): 3, http://dx.doi.org/10125/44244. 14 kukulska-hulme, “will mobile learning change language learning?,” 158. 15 “membership: list of arl members,” association of research libraries, accessed april 5, 2019, https://www.arl.org/membership/list-of-arl-members. 16 jingjing jiang, “millennials stand out for their technology use,” pew research center (2018), https://www.pewresearch.org/fact-tank/2018/05/02/millennials-stand-out-for-their-technology-use-but-older-generations-also-embrace-digital-life/. editorial board thoughts: reinvesting in our traditional personnel through knowledge sharing and training mark dehmlow information technology and libraries | december 2017 mark dehmlow (mdehmlow@nd.edu) is a member of the ital editorial board and director of library information technology, hesburgh library, university of notre dame, south bend, indiana. lately i have been giving a lot of thought to how those of us in technology positions can extend our impact throughout our organizations. with finite budgets and time and relatively low personnel turnover, i have realized that the solution goes beyond merely finding ways that technology can optimize workflows through automation. i have been working in academic library technology for nearly 20 years, and when i began my career, virtually all areas of technology required specialized staff – from supporting general computer applications to managing the technical infrastructure that underlay our core systems.
these days, technology is still a specialty, but the function of technicians has become more focused on providing infrastructure, and much of the general application support we used to provide has become ubiquitous, an expectation of almost all library positions. managing email, creating specialized formulas for data analysis, navigating operating systems, even developing basic databases – these are now regular parts of library work. the trend of technological infusion will continue, but instead of general technical tasks, almost all new library positions will require deeper technical skills. this is due, in part, to the function of knowledge work becoming more specialized as libraries focus on the areas where they can create the most value, and those new domains require more technical expertise to be effective. perhaps the most striking example of this evolution is in the transition of catalogers to metadata specialists. the days of working with a single metadata format (marc) in a single, tabular interface (catalog) are quickly slipping away, being replaced by metadata structured in multiple complex schemes, expressed in formats like xml and json. instead of acquiring data from oclc, libraries need to work with web-based apis to harvest metadata. and the tools for manipulation require basic programming skills in languages like python, or working with open source applications that look little like the integrated library systems we are used to. working with these tools can enable metadata experts to customize metadata at scale, but it requires new knowledge and even new ways of thinking about metadata and metadata manipulation. cataloging isn’t the only position undergoing change in academic libraries, either. acquisitions is pushing toward greater automation and patron-driven selection. the catalog is becoming more like a bookstore, and the discovery landscape includes a panoply of resources that are purchased only at the point a user clicks on a link to a resource. acquisitions is also occurring at a larger scale, requiring the ability to work with thousands of items in a batch, to select based on the qualities of what libraries want to make available, to analyze usage trends, and to load, update, and remove metadata from our discovery environment as quickly as possible. the tools to accomplish this are similar to those for metadata. beyond technical services, we’re beginning to see the role of the subject selector transition from building broad disciplinary collections toward a focus on curation of specialized collections requiring digitization and digital curation. the tools to accomplish this are digital asset management systems and web-based digital exhibition tools, which are specialized content management systems. subject selectors are transforming into digital content creators and managers. technologically driven change regularly outpaces generational personnel turnover in libraries, and given that technological change continues to grow exponentially, it is clear we need a flexible workforce and an organizational commitment to training and professional growth. while organizations are rewriting positions to include technical skills, we will always have a preponderance of staff who started their careers in libraries with depreciating skillsets. merely directing staff to webinars, conferences, and self-driven development isn’t enough.
multi-day workshops are great as long as there are opportunities to apply learning upon returning to work. to guarantee skill retention, sustained training needs to be directed toward the specific skills needed now and based in actual work, not just theoretical exercises. the challenge, then, becomes how to implement such a program and identify who can provide the necessary training. how can specialization be disseminated to non-specialists? many libraries have some of the needed resources close at hand, even if staffing is thin and technical resources scarce. it requires thinking a bit pragmatically to reuse the resources libraries do have, and for technologists to evolve with demands as well, transitioning our roles from technology experts alone to a hybrid of practitioners, teachers, and enablers. teaching is, itself, a specialty, and many it professionals are unlikely to have developed that skillset. most libraries, though, have staff who do have experience and expertise in training and pedagogy. evolving toward in-sourced technology development will undoubtedly require it staff to first learn effective teaching methods and basic curriculum development. they will need a framework to take a set of specific skills and build ad hoc courses with medium-range learning objectives. teaching can occur in the context of actual work scenarios so that learning is put to practical use as part of that training and skills retention is improved. libraries can become labs for cross-training and knowledge sharing by leveraging our teachers and technologists in interdisciplinary partnerships and collaboration, with a focus on internal growth, so that library organizations can meet continuously changing demands. once staff have been trained in new technical areas, there is another opportunity for it professionals to extend their impact: dividing technology-driven projects into the parts that require deep technical work and the parts that require transferable technical skills. if technologists start looking at ways to implement technical solutions in componentized ways instead of as end-to-end solutions, they have the opportunity to empower newly trained staff to contribute in practical ways by building solution foundations and then delegating configurable application inputs. as an example – developing a full application stack requires considerable programming skill, but learning to create and update extensible stylesheets to transform xml-based metadata is a teachable skill. it professionals could develop applications that take a configuration file and an xsl file as inputs, while staff with xslt training modify the configuration to include parameters for connecting to apis or loading xml. trained staff could then modify the xsl to transform data to their specifications without having to pass the task back to the it professional.
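to make that division of labor concrete, the following is a minimal sketch (in python, one of the languages mentioned above) of the kind of componentized application described here: the it professional maintains the runner, while trained staff edit a configuration file and a stylesheet. the file names, configuration keys, and api endpoint are hypothetical, not a reference to any particular system.

# sketch of a componentized metadata pipeline: staff edit config.json and
# transform.xsl; the runner below stays untouched. all names are hypothetical.
import json
import requests                # third-party: pip install requests
from lxml import etree         # third-party: pip install lxml

def run(config_path="config.json"):
    with open(config_path) as f:
        config = json.load(f)
    # staff-editable: which api to call and with what parameters
    response = requests.get(config["api_url"], params=config.get("params", {}))
    response.raise_for_status()
    source = etree.fromstring(response.content)
    # staff-editable: the xslt that reshapes the harvested xml
    transform = etree.XSLT(etree.parse(config["xsl_file"]))
    result = transform(source)
    with open(config["output_file"], "wb") as out:
        out.write(etree.tostring(result, pretty_print=True))

if __name__ == "__main__":
    run()

the point of the design is that changing an api parameter or a metadata mapping becomes a staff-level edit to config.json or transform.xsl rather than a programming task handed back to the developer.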
moving toward more holistic technology capability in libraries will require all personnel to be committed to evolving to meet the emerging needs of our organizations – it professionals included. for decades, technologists have been in the privileged position of having the necessary skills to advance the profession’s digital future, but it will be important for technologists in libraries to integrate the many valuable skills other personnel can offer so that they also can evolve in ways that best support our organizations – leveraging foundational library skills to enhance overall organizational capacity to accomplish tasks that increasingly require technical expertise. i won’t pretend it will be easy. it will require libraries to prioritize organizationally led training, even amidst the flurry of demands around us, but i think it is also critical to the future of the profession, and the old adage that winter pays for summer feels apropos here. technologists will need to be open to incorporating foundational library skills, to collaborating and learning from other library specialists, to thinking of their positions more broadly, and, for those who live in ivory towers (you know who you are), to eliminating the silos they’ve built and collaborating, cooperating, and engaging. technologists are an important part of library ecosystems with what we contribute operationally, but i think we can have a greater impact if we propagate our knowledge in an effort to increase the profession’s overall technology capacity and become agents to support knowledge workers’ future skill development. letter from the editor kenneth j. varnum information technology and libraries | december 2018 https://doi.org/10.6017/ital.v37i4.10852 as 2018 draws to a close, so does our celebration of information technology and libraries’ 50th anniversary. in the final “ital at 50” column, editorial board member steven bowers takes a look at the 1990s. much as for steven, for me this decade was where my career direction and interests crystallized around the then-newfangled “world wide web.” taking a look at the topics covered in ital over those ten years, it’s clear that plus ça change, plus c’est la même chose: the more things change, the more they stay the same. we were exploring then questions of how the burgeoning internet would allow libraries to provide new services and be more efficient and helpful in improving existing ones. user experience, distributed data and the challenges it causes, who has access to technology and who does not… all topics as vibrant and concerning then as they are now. with the end of our look back at the last 50 years, we are taking the opportunity to start something new in 2019. there will be a new quarterly column, “public libraries leading the way,” to highlight a technology-based innovation from a public library perspective. topics we are interested in include the following, but proposals on any other technology topic are welcome.
• virtual and augmented reality
• artificial intelligence
• big data
• internet of things
• 3-d printing and makerspaces
• robotics
• drones
• geographic information systems and mapping
• diversity, equity, and inclusion and technology
• privacy and cyber-security
• library analytics and data-driven services
• anything else related to public libraries and innovations in technology
columns will be in the 1,000-1,500 word range and may include illustrations. these will not be research articles, but are meant to share practical experience with technology development or uses within the library. if you are interested in contributing a column, please submit a brief summary of your idea.
i’m grateful to the ital editorial board, and especially to ida joiner and laurie willis, for their guidance in shaping this concept. regardless of whether you work in a public, or any other, library, i’m always happy to talk with you about how your experience and knowledge could be published as an article in ital. get in touch with me at varnum@umich.edu. kenneth j. varnum, editor varnum@umich.edu december 2018 tending to an overgrown garden: weeding and rebuilding a libguides v2 system rebecca hyams information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12163 rebecca hyams (rhyams@bmcc.cuny.edu) is web and systems librarian, borough of manhattan community college/cuny. © 2020. abstract in 2019, the borough of manhattan community college’s library undertook a massive cleanup and reconfiguration of the content and guides contained in its libguides v2 system, which had been allowed to grow out of control over several years as no one was in charge of its maintenance. this article follows the process of identifying issues, getting departmental buy-in, and doing all of the necessary cleanup work for links and guides. the aim of the project was to make the guides easier for students to use and understand and easier for librarians to maintain. at the same time, work was done to improve the look and feel of the guides and to implement the built-in a-z database list, both of which are also discussed. introduction in early 2019, the a. philip randolph library at the borough of manhattan community college (bmcc), part of the city university of new york (cuny) system, hired a new web and systems librarian. the position itself was new to the library, though some of its functions had previously been performed by a staff member who had left more than a year prior. it quickly became apparent to the newest member of the library’s faculty that, while someone had at one point managed the website, the same could not really be said for the library’s libguides system and the mass of content contained within. the library’s libguides system was first implemented in january 2013, and over time the system came to be used primarily by instruction librarians to serve their teaching efforts. not long after bmcc implemented libguides, springshare announced libguides version 2 (v2), a new version of the system that included several enhancements and features not present in the earlier version.1 these features included the ability to mix content types in a single box (in the earlier version, for example, boxes could have either rich text or links but not both), a centrally managed asset library, and an automatically generated a-z database list designed to make it easy to manage a public-facing display. bmcc moved to libguides v2 around early 2015, but few of those who worked with the system took advantage of the newer features for quite some time, if at all. at the time the web and systems librarian came aboard, the bmcc libguides system contained over 400 public guides and an unwieldy asset library filled with duplicates and broken widgets and links. many of the guides essentially duplicated others, with only the name of the classroom instructor differing.
there were, for example, 69 separate guides just for english 101, some of which had not been updated in three or four years. there were no local guidelines for creating or maintaining guides, and in theory, each librarian was responsible for their own. however, it was apparent that in practice no one was actively managing the guides or their related assets, as the lists of both were overwhelming. the creators of existing guides were primarily reference and instruction librarians whose other responsibilities meant there was little time to do guide upkeep, and because there was no single person in charge of the guides, there was no one to ensure any maintenance took place. in addition to the unwieldy guide list and asset library, the bmcc library was also effectively maintaining two separate a-z database lists: one on the library’s website that was a homegrown sql database built by a previous staff member, and another running on libguides to provide links to databases via the guides. the lists were not in sync with one another, and several of the librarians were unaware that the libguides version of the list even existed, leading to links to databases appearing both on the database list and as link assets. and, while the libguides a-z list was not linked from the library’s website, it was still accessible from points within libguides, meaning that patrons could encounter an incorrect list that was not being maintained. getting started before any work could be done on our system, there needed to be buy-in from the rest of the library faculty. with the library director in agreement, agenda items were added to department meetings between march and may 2019 for discussion and department approval. the various aspects of the project were pitched to emphasize the following goals:
• removing outdated material, broken links, etc.
• streamlining where information could be found
• decluttering guides to make everything easier to use and understand for students
• improving the infrastructure to make maintenance and new guide creation easier and more manageable
• standardizing layouts and content
the aim of all of this was to increase guide usability and accessibility and to make the guides a more consistent resource for our students. for the sake of transparency (as well as to have a demo of some of the aesthetic changes discussed in more detail below), a project guide was created and shared with the rest of the library department, providing preliminary data as well as detailed updates as tasks were completed.2 process the database list while the libguides a-z database list, a feature built into v2 of the platform, contained information about our databases, it was essentially serving only to provide links to databases when creating guide content. there was some indication, in the form of a dormant a-z database “guide,” that someone had tried to create a list in libguides by manually adding assets to a guide. that had been a common practice in libguides v1 sites, where the built-in list was not yet a part of the system, but the built-in list itself was never properly put into use. the links on our website all pointed to a homegrown list which, while powered by an sql database, was essentially a manual list. because of its design, it had proved impossible for anyone in the library to update without extensive web programming knowledge.
it seemed a no-brainer to work on the database list first. this way we had both the infrastructure to update database-related content on the guides and a single, up-to-date list of resources with enhanced functionality that could benefit the library’s users almost immediately.3 to begin, the two lists were compared to find any discrepancies, of which there were many. as the e-resources librarian was on leave at the time, the library director was consulted to determine which of the databases missing from the libguides list were active subscriptions (and which of the ones missing from the homegrown list had previously been cancelled, so they could be removed). once the database list reflected current holdings, the metadata entries for the databases on the libguides side were updated to include resource type, related subjects, and related icons. these updates would enhance the functionality of the libguides list, as it could be filtered or searched using that additional information, something that was missing from the homegrown list. in addition to updating content and adding useful metadata, some slight visual changes were made to improve the look and usability of the list using custom css. most of this was done because, as the list was being worked on, several librarians (of those who were even aware of it in the first place) mentioned that one reason they disliked the libguides list was the font size and spacing, which they felt were too small and hard to read. with the list updated, it was presented at the march 2019 department meeting and quickly won over all in attendance, especially when it was pointed out that the list could be very easily maintained because it required no special coding knowledge. while the homegrown list would remain live on the server for the rest of the semester (so as not to disrupt any classes that may have been using it), it was agreed that the web and systems librarian could go ahead with switching all of the links pointing to the homegrown list to point to the springshare list instead. the asset library because of how guides were typically created over the years since adopting libguides (many appeared to have been copied from another existing guide each time), the asset library had grown immense and unmanageable. for example, there were 149 separate links to our “databases by subject” page on the library’s website, the overwhelming majority of which were only used once. there were also 145 separate widgets for the same embedded scribd-hosted keyword worksheet, which was in fact broken and displayed no content. this is to say nothing of the broken-link report that no one had reviewed in quite some time. tackling the cleanup of duplicates and the fixing of broken links and embeds was a large piece of the invisible work taken on behind the scenes to make maintaining the guides easier in the future. in order to analyze the data, the asset library report was exported to an excel file to make it easier to identify issues that needed correction. to start this process, we requested that springshare technical support wipe out all assets (other than documents) that were not mapped to anything and were just cluttering up the asset library (this ended up being just under 2,000 assets).4 most of those items had been removed from the guides they were originally included on but were never removed from the asset library; they served no real function other than to clutter up the backend.
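as a sketch of what that spreadsheet analysis can look like, the short python script below flags unmapped and duplicated link assets in an exported report. the file name and column names (“url”, “mapping count”) are hypothetical; a real export may label its columns differently.

import pandas as pd  # third-party: pip install pandas openpyxl

# read the exported asset report (hypothetical file and column names)
assets = pd.read_excel("asset_report.xlsx")

# assets mapped to no guide at all: candidates for bulk removal
unmapped = assets[assets["mapping count"] == 0]
print(f"{len(unmapped)} assets are not used on any guide")

# the same url saved as many separate link assets: candidates to standardize
duplicates = (assets.groupby("url").size()
              .reset_index(name="copies")
              .query("copies > 1")
              .sort_values("copies", ascending=False))
print(duplicates.head(10))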
the guide authors had given the web and systems librarian permission to remove anything broken that could not be easily fixed. this included the aforementioned broken worksheet (and other similar items), as well as an assortment of youtube video embeds where the video had since been taken down, resulting in a “this video is unavailable” error message. it was felt that since those were already not working and seriously hurt the reliability of our guides to our users, no further permission was needed. then came the much more tedious task of standardizing (where possible) which assets were in use. this involved going into guides listed as containing known-duplicate assets, replacing them with a single, designated asset, and then removing the resulting unmapped items.5 it was decided that while many of the guides would likely be deleted after the spring semester, only assets appearing on currently active guides would be standardized. in hindsight, since many of the links that were fixed were on guides that would soon be deleted, it would have been better to wait until the guides could be deleted first. however, doing at least some of this work in advance helped find other issues, including instances where our proxy prefix was included directly in the url (an issue as we were also in the process of changing our ezproxy hosting) and where custom descriptions or link names were unclear. “books from the catalog” assets had their own issues that also needed to be addressed. with a pending migration of the library’s ils, it was already apparent that the links to any books in the library’s catalog would need updating so they could have a shot at continuing to function post-migration.6 we had been told at the time that the library’s primo instance would remain through the migration (though this changed during the migration process), so we felt it important to ensure that all links were pointing to primo, as some had been pointing to the soon-to-be decommissioned opac. for consistency, the urls were structured as isbn searches instead of ones relying on internal system numbers that would soon change. however, it became obvious very early on that some of the links to books were either pointing to materials that were no longer in the library’s collection or pointing to a previously decommissioned opac server, both of which resulted in errors. because the domain of the previously decommissioned opac server had been whitelisted in the link checker report settings, these items had not appeared on the broken link list. using the filtered list of “books from the catalog” assets, all titles were checked, which allowed the web and systems librarian to remove items that were no longer in the collection and make other adjustments as needed. as a result of the asset cleanup process, the asset library went from an unwieldy total of more than 5,000 items to just over 2,000 items. it also simplified the process of reusing assets in new guides, as there was now only one choice per item, and made it much easier to find and fix broken links and embeds. the guides the cleanup of the guides themselves was by far the most complex task. before starting the guide cleanup work itself, the web and systems librarian performed a content analysis to identify and recommend guides for deletion and guides that could be converted into general subject area guides.
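a triage like this can also be scripted. the sketch below, with hypothetical file and column names, flags guides matching the “dead guide” thresholds described below (no update in more than two years and no view in at least one year).

from datetime import datetime, timedelta
import pandas as pd  # third-party: pip install pandas

# hypothetical export of the guides list with update and view dates
guides = pd.read_csv("guides_report.csv",
                     parse_dates=["last_updated", "last_viewed"])
today = datetime.now()

# flag guides not updated in two years and not viewed in one year
stale = guides[(guides["last_updated"] < today - timedelta(days=730))
               & (guides["last_viewed"] < today - timedelta(days=365))]
stale.to_csv("deletion_candidates.csv", index=False)
print(f"{len(stale)} guides flagged for review")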
because a common practice was to create a “custom” guide for each class that came in for a library instruction session, there was an overrepresentation of guides for the classes that had regular sessions: english 101 (english composition), english 201 (introduction to literature), speech 100 (public speaking), and introduction to critical thinking. those four courses accounted for 187 guides, or over 40 percent of the total number in our system. the majority of them had not been updated directly in over three years, and in some cases, were designed for instructors who no longer taught at the college. perhaps more telling was that the content of these guides differed more across the librarians who created them than across the courses they were designed for. this meant that while there might be three or four different iterations of the english 101 guide, the guides created by the same librarian for different introductory courses were essentially the same. before the arrival of the web and systems librarian, one of the other librarians had been occasionally maintaining guide groups for “current courses” and “past courses,” but it was unclear if anyone was still actively maintaining these groupings, as guides for current instructors were sometimes under “past courses” and vice versa. because these groups did not actually hide the guides from view on the master list of guides and appeared to be unnecessary work, it was decided to remove the groupings. instead, the web and systems librarian would plan to revisit the guides on a regular basis to unpublish or remove anything for courses that were no longer taught. however, since the philosophy behind the guides was to move from “custom” guides for each instructor’s section to a general guide for the course as a whole in the overwhelming majority of cases, the need for maintaining these groupings was essentially eliminated anyway. in may 2019, a preliminary list of guides to be deleted was presented to the librarians at the monthly department meeting. the list was broken down as:
• duplicates to be deleted: this portion consisted primarily of course guides like those mentioned above, where multiple guides existed for the same course, most of which used the exact same content.
• guides to be “merged”: while merging guides is not actually possible in the libguides platform, there were cases where we had two or three guides for the same course. they could be condensed into a single guide, with the rest deleted.
• guides to convert to subject area guides: these were guides that were essentially already structured as a subject guide but were titled for a specific course, and in many cases, a guide for the subject area did not already exist (for example, a course-specific guide for business would become the business subject area guide).
• dead guides: these were guides that had not been updated in more than two years and had not been viewed in at least one year.
librarians were given an opportunity in the department meeting to comment on the list, as well as to contact the web and systems librarian with any comments. additionally, as some of the classroom faculty on campus had connections to specific guides, the library director also sent out a message to classroom faculty to let them know of our general plan to revamp the guides and that many would be removed over the summer.
librarians were given an opportunity in the department meeting to comment on the list, as well as to contact the web and systems librarian with any comments afterward. additionally, because some of the classroom faculty on campus had connections to specific guides, the library director also sent a message to classroom faculty letting them know of our general plan to revamp the guides and that many would be removed over the summer.

surprisingly, there were few objections among either the librarians or the classroom faculty once they understood the rationale and process. of the few classroom faculty members who did respond to the library director's message, most were more concerned with content or specific links they felt strongly about than with the guides themselves. in those cases, we noted the content requests to make sure they appeared on the new guides. most of these instructors were satisfied once we further explained our process and, where needed, assured them that the content they requested would be worked into the new guide. only one instructor who responded, whose assignment was related to a grant they had received, made a strong case for keeping a separate guide for their sections of english 101.

with project approval out of the way, it was time to begin removing all of the to-be-deleted guides and start revamping those that would be kept. the goal was to complete the project by the start of the fall semester so that faculty and students would come back to a new (and, hopefully, much improved) set of guides.

removing debris

to be cautious, a few preliminary steps were taken before the guides selected for deletion were removed. for starters, the selected guides had their status changed to "unpublished," meaning that they no longer appeared on the public-facing list of guides. this gave everyone a chance to say something if a guide they were actively using suddenly went "missing." these unpublished guides were then downloaded using the libguides html backup feature and saved to the department's network share drive. while the html backup output is not a full representation of the guide (the file generated displays as a single page and is missing any formatting or images that were included in the guide), it does include all of a guide's content, meaning that a link or block of text can be retrieved from the backup in those moments of "i know i had this on my guide before but...."

because of the somewhat haphazard nature of our guides, deleting unwanted ones turned out to present interesting and unexpected challenges. over the years, some of the librarians had, from time to time, reused individual boxes between guides, but there was no consistency to the practice. while there was a repository guide for reusable content, not everyone used it, or used it consistently. thankfully, libguides runs a pre-delete check, which proved invaluable in this process, as it showed whether any of the boxes displayed on one guide were reused on any others. in most cases where boxes were reused, they were reused on guides that were also on the "to be deleted" list, but not always. the check let us find the other guides and make copies of the boxes that would otherwise have been deleted. if a box was reused on multiple guides that were being kept, it was copied to the reusable content guide and then remapped from there.
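retrieving a half-remembered link or block of text from the saved html backups described above amounts to a simple text search. the following is a minimal sketch, assuming the backups live in a local folder named backups/; the folder name and the search phrase are assumptions for illustration.

```python
from pathlib import Path

def search_backups(phrase: str, folder: str = "backups") -> None:
    """case-insensitive search across saved libguides html backup files."""
    needle = phrase.lower()
    for path in Path(folder).glob("*.html"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if needle in text:
            print(f"{path.name}: contains {phrase!r}")

# example: hunting for a link a librarian remembers having added
search_backups("purdue owl")
```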
cosmetic improvements

in conjunction with the work being done to improve the content of our guides, the web and systems librarian felt it was the perfect opportunity to update the guide templates and overall aesthetics to make the guides more visually appealing, especially since little had been done in this area system-wide apart from setting the default color scheme.

using the project guide as an initial sandbox, several changes were put into motion that would eventually be worked into new templates and pushed out to all of the reworked guides. the first, and perhaps biggest, change was the move from tab navigation to side navigation (an option first made available with the release of libguides v2). while several usability studies have debated using one over the other, in this case side navigation was chosen both for the streamlined nature of the layout as a whole (by default there is only one full content column) and because enabling box-level navigation could serve as a quick index for anyone looking for specific content on a page.7 side navigation also avoided the issue of long lists of tabs spilling into a second row, which further complicated page navigation.

several changes to the look and feel of the guides were also put into place, many of them coming from suggestions in various libguides style or best-practice guides or from more general web usability guidelines.8 perhaps most importantly, all of the font sizes were increased for improved readability, especially on box titles and headers, to better facilitate visual scanning. the default fonts were also replaced with two commonly used fonts from the google fonts library, roboto (for headings and titles) and open sans (for body text). additionally, the navigation color scheme was changed because the orange of the college's blue-and-orange color scheme regularly failed accessibility contrast checks and was described by some colleagues as "harsh on the eyes." instead, two analogous lighter shades of blue (one of which was taken from the college's branding documentation) were selected for the navigation and box titles respectively, both of which allowed the text in those areas to be changed from white to black (again, for improved readability). figure 1 shows a typical "before" guide navigation design, and figure 2 shows a typical "after" design.

figure 1. a sample of guide navigation and content frequently found on guides before the start of cleanup

figure 2. navigation and content after revisions

additionally, the web and systems librarian took this opportunity to go through the remaining guides to ensure they were all consistent. most of this work fell in the area of text styling, or rather, undoing text styling. it was clear from several of the guides that, over the years, librarians had not been happy with the default font sizes or styles, which led to a lot of customizing using the built-in wysiwyg text editor. not only did this create a nightmare in the code itself (the wysiwyg editor adds a lot of extraneous tags and style markup), but it also meant that the changes coming from the new stylesheet were not applied universally, as any properties assigned on a page overrode the global css. there was also the issue of paragraph text (<p> elements) that was sometimes styled as fake headings (made larger or bolder to look like headings, but not using the proper heading tags), which needed to be corrected for consistency and accessibility purposes.
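the accessibility contrast checks mentioned above implement the wcag 2.x contrast-ratio formula, which is easy to reproduce. the following is a minimal sketch; the two hex colors are placeholders, not the college's actual palette.

```python
def relative_luminance(hex_color: str) -> float:
    """wcag relative luminance of an srgb hex color like '#ff6600'."""
    h = hex_color.lstrip("#")
    channels = []
    for i in (0, 2, 4):
        c = int(h[i:i + 2], 16) / 255
        channels.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: str, bg: str) -> float:
    """wcag contrast ratio between two colors (from 1:1 up to 21:1)."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# placeholder colors: white text on an orange navigation bar
ratio = contrast_ratio("#ffffff", "#ff6600")
print(f"{ratio:.2f}:1 -- passes aa for normal text: {ratio >= 4.5}")
```

with these placeholder values the ratio comes out near 3:1, below the 4.5:1 wcag aa threshold for normal-size text, which is the kind of failure that motivated the switch away from orange.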
replanting and sprucing up

with the overwhelming majority of the guides (and their associated assets) deleted, it was finally time to rework the remaining guides into clear, easy-to-use resources that would benefit our students. at this point the guides fell into three categories:

• guides that just needed to be pruned and updated.
• guides that should be combined into a single subject area guide.
• guides that should be created to fill an unmet need.

pruning and updating tasks were generally the least arduous, as many of the guides included content that was also housed on discrete guides (citations, resource evaluation, etc.). instead of duplicating, for example, citation formats on every guide, those pages were replaced with navigation-level links out to the existing citation guide. this was also the point where we could do more extensive quality control, such as switching to a single content column, which further emphasized the extraneous information on many of our guides. infographics, videos, and long blocks of links or text were scrutinized to determine whether they enhanced students' understanding of the core content or merely provided clutter that made it more difficult to find the important information.9

in some cases, going from guide to guide made it apparent that there were guides for multiple courses in a subject area where the resources were basically identical. this was most noticeable in the criminal justice and health education subject areas. in these cases, it made little sense to keep separate course guides when the content was basically the same across them. to remedy this duplication, one of the course guides for each subject was transformed into the subject area guide, and resources were added to ensure it covered the same materials the separate course guides had covered. the remaining course guides were then marked for future deletion, as they were no longer needed.

lastly, subject areas without guides were identified so that work could be done later to create them. as we had discussed moving toward using the "automagic" integration of guide content into our blackboard learning management system (lms), this step will be key in ensuring that all subject areas have at least some resources students can use. however, as of this time we have yet to finish creating these additional guides, and several subject areas (including computer science, nursing, and gender studies) have no guides at all.

next steps

now that the work to clean and update our libguides is done, the most important next step is coming up with a workflow to ensure that the guides stay relevant and useful. the web and systems librarian mostly left the guides alone for the fall 2019 semester to allow their colleagues time to use them and report back any issues. to the web and systems librarian's surprise, few issues were reported, but that does not mean there is no room for future improvement. as a department, it is clear that we need a formal plan for maintaining the guides, including update frequency, content review, and guidelines for when guides should be added or deleted.
additionally, immediately following the conclusion of this cleanup project, the library's website was forced into a server migration and full rebuild for reasons outside the scope of this article. as a result, however, changes were made to the look and feel of pages on the library's site that will need to be carried through into our guides and associated springshare platforms. while most of this work is relatively simple, mimicking changes developed in wordpress so they work properly on external services will take time and effort.

conclusion

overall, while this project was a massive undertaking (done almost entirely by a single person), the end result, at least on the surface, has made our guides much easier to use and understand. there are obviously several things that, were the project to be done over, should be done differently, mostly involving the cleaning of the asset library. however, it is now much easier to refer students to guides for their courses, and feelings about the guides among the library faculty have become much more positive.

endnotes

1 "libguides: the next generation!," springshare blog (blog), june 26, 2013, https://blog.springshare.com/2013/06/26/libguides-the-next-generation/.

2 the guide can be viewed at https://bmcc.libguides.com/guidecleanup.

3 though the author only learned of the project undertaken at unc a few years ago, after they had already finished this project, a similar project was outlined here: sarah joy arnold, "out with the old, in with the new: migrating to libguides a-z database list," journal of electronic resources librarianship 29, no. 2 (april 2017): 117–20, https://doi.org/10.1080/1941126x.2017.1304769.

4 because there was no way to view the documents before a bulk deletion, documents were manually reviewed and deleted as needed.

5 it was only long after this process that springshare promoted that they could do this on the backend by request.

6 however, it turned out that due to the differences in url structure between classic primo and primo ve, this change was completely unnecessary, as the urls did actually need to be changed again post-migration. at least they were consistent, which meant a system-wide find-and-replace could take care of most of the links.

7 several studies have been done since the rollout of libguides v2, including: sarah thorngate and allison hoden, "exploratory usability testing of user interface options in libguides 2," college and research libraries 78, no. 6 (2017): 844–61, https://doi.org/10.5860/crl.78.6.844; kate conerton and cheryl goldenstein, "making libguides work: student interviews and usability tests," internet reference services quarterly 22, no. 1 (january 2017): 43–54, https://doi.org/10.1080/10875301.2017.1290002.

8 of the many guides the author consulted, the following were the most informative: stephanie jacobs, "best practices for libguides at usf," https://guides.lib.usf.edu/c.php?g=388525&p=2635904; jesse martinez, "libguides standards and best practices," https://libguides.bc.edu/guidestandards/getting-started; carrie williams, "best practices for building guides & accessibility tips," https://training.springshare.com/libguides/best-practices-accessibility/video.

9 there is a very detailed discussion of cognitive overload in libguides in jennifer j. little, "cognitive load theory and library research guides," internet reference services quarterly 15, no. 1 (march 1, 2010): 53–63, https://doi.org/10.1080/10875300903530199.
lita president's message

facing what's next, together

emily morton-owens

information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.12383

emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the acting associate university librarian for library technology services at the university of pennsylvania libraries.

when i wrote my march editorial, i was optimistically picturing some of the changes that we are now seeing for lita, while being scarcely able to imagine how the world and our profession would need to adapt quickly to the impacts on library services as a result of covid-19. it is a momentous and exciting change for us to turn the page on lita and become core, yet this suddenly pales in comparison to the challenges we face as professionals and community members.

libraries' rapid operational changes show how important the ingenuity and dedication of technology staff are to our libraries. since states began to shut down, our listserv, lita-l, has hosted discussions on topics like how to provide person-to-person reference and computer assistance remotely, how to make computer labs safe for re-occupancy, how to create virtual reading lists to share with patrons, and how to support students with limited internet access. there has been an explosion in practical problem-solving (ils experts reconfiguring our systems with new user account settings and due dates), ingenuity (repurposing 3d printers and conservation materials to make masks), and advocacy (for controlled digital lending).

sometimes the expense of library technologies feels heavy, but these tools have the ability to scale services in crucial ways: making them available to more people at the same time, available to people who can only take advantage after hours, and available across distances. technologists are focused on risk, resilience, and sustainability, which makes us adaptable when the ground rules change. our websites communicate about our new service models and community resources; ill systems regenerate around increased digital delivery; reservation systems for laptops now allocate the use of study seating. our library technology tools bridge past practices, what we can do now, and what we'll do next.

one of our values as ala members is sustainability. (we even chose this as the theme for lita's 2020 team of emerging leaders.) sustainability isn't about predicting the future and making firm plans for it; it's about planning for an uncertain future, getting into a resilient mindset, and including the community in decision-making. although the current crisis isn't climate-related per se, this way of thinking is relevant to helping libraries serve their communities. we will need this agile mindset as we confront new financial realities.
our libraries and ala itself are facing difficult budget challenges, layoffs, reorganizations, and fundamental conversations about the vital importance of the services we provide. my favorite example from my own library of a covid-19 response is one where management, technical services, and it innovated together. our leadership negotiated an opportunity for us to gain access to digitized, copyrighted material from hathitrust that corresponds to print materials currently locked away in our library building. thanks to decades of careful effort by our technical services team, we had accurate data to match our print records with records for the digital versions. our it team had processes for loading the new links into our catalog almost instantaneously. the result was a swift and massive bolstering of our digital access precisely when our users needed it most. this collaboration perfectly illustrates how natural our merger with alcts and llama is.

as threats to our profession and the ways we've done things in the past gather around us, i am heartened by the strengths and opportunities of core. it is energizing to be surrounded by the talent of our three organizations working together. i hope more of our members experience that over the summer and fall, as we convene working groups and hold events together, including a unique social hour at ala virtual and an online fall forum.

i close out my year serving as the penultimate lita president in a world with more sadness and uncertainty than we could have foreseen. we are facing new expectations and new pressures, especially financial ones. as professionals and community members, we are animated by our sense of purpose. while lita has been transformed by our vote to continue as core, the support and inspiration we provide each other in our association will carry on.

articles

near-field communication (nfc): an alternative to rfid in libraries

neeraj kumar singh

information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.11811

neeraj kumar singh (neerajkumar78ster@gmail.com), phd, is deputy librarian, panjab university, chandigarh, india.

abstract

libraries are the central agencies for the dissemination of knowledge. every library aspires to provide maximum opportunities to its users and ensure optimum utilization of available resources. hence, libraries have been seeking technological aids to improve their services. near-field communication (nfc) is a type of radio-frequency technology that allows electronic devices (such as computers, mobile phones, and tags) to exchange information wirelessly across a small distance. the aim of this paper is to explore nfc technology and its applications in the modern era. the paper discusses the potential use of nfc in the advancement of the traditional library management system.

introduction

similar to other identification technologies such as radio-frequency identification (rfid), barcodes, and qr codes, near-field communication (nfc) is a short-range (4–10 cm) wireless communication technology. nfc is based on the existing 13.56 mhz rfid contactless card standards, which have been established for several years and are used for payment, ticketing, electronic passports, and access control, among many other applications. data rates range from 106 to 424 kilobits per second.
a few nfc devices are already capable of supporting up to 848 kilobits per second, a rate now being considered for inclusion in the nfc forum specifications.1 compared to other wireless communication technologies, nfc is designed for proximity or short-range communication, which provides a dedicated read zone and some inherent security. its 13.56 mhz frequency places it within the ism band, which is available worldwide. it supports bi-directional communication, meaning that data can be exchanged in both directions, with a typical range of 4–10 cm depending on the antenna geometry and the output power.2 nfc is convenient and fast: the action is automatically triggered when a phone comes within 10 cm of an nfc tag, and the user gets instant access to the content on the mobile device without a single click.3

rfid and nfc technologies are similar in that both use radio waves, and both exchange data between electronic devices in active mode as well as in passive mode. in active mode, outgoing signals come directly from the power source, whereas in passive mode the signals use the reflected energy received from the active signal. in rfid technology, the radio waves can send information to receivers up to hundreds of meters away, depending on the frequency of the band used by the tag. if provided with a high amount of power, these signals can also be sent over extreme distances; in the case of airport radar, for example, traffic is typically controlled within a radius of 100 kilometers of the airport below an elevation of 25,000 feet. rfid is also used very often in tracking animals and vehicles.

in contrast, items like passports and payment cards should not be capable of long-distance transmission because of the threat of theft of personal information or funds. nfc is designed to meet this need. nfc tags are very small, so they fit on the inner side of devices and products (inside luggage, purses, and packs, as well as inside wallets and clothing) and can be tracked. nfc technology has added security features that make it much more secure than the previously popular rfid equivalent, and it is difficult to steal information stored in it. nfc has a short working range compared to other wireless technologies, so it can be widely used for payments, ticketing, and service admittance, and has thus proved to be a safer technology. it is because of this security feature that the technology is used in cellular phones to turn them into wallets.4

both rfid and nfc can operate in active and passive communication modes to exchange data between electronic devices. the main differences between nfc and rfid are:

• though both rfid and nfc use radio frequencies for communication, nfc can be said to be an extension of rfid technology. rfid has been in use for more than a decade, while nfc emerged on the scene more recently.
• rfid has a wider range, whereas nfc has limited communication and operates only at close proximity, typically a range of a few centimeters.
• rfid can function at many frequencies and many standards are in use, but nfc requires a fixed frequency of 13.56 mhz, along with other fixed technical specifications, to function properly.
• rfid technology can be used for applications such as item tracking, automated toll collection on roads, and vehicle movement that require wide-area signals. nfc is appropriate for applications that carry data that needs to be kept secure, such as mobile payments and access controls, which involve sensitive information.
• rfid operates over long distances while exchanging data wirelessly, so it is not secure for applications that store personalized data; rfid-based items are susceptible to various fraud attacks such as data corruption. nfc's short working range considerably reduces the risk of data theft, eavesdropping, and "man in the middle" attacks.
• nfc has the capability to communicate both ways and is thus suitable for advanced interactions such as card emulation and peer-to-peer sharing.
• a number of rfid tags can be scanned simultaneously, while only a single nfc tag can be scanned at a time.

how nfc works

the extended functionality of a traditional rfid system led to the nfc forum, which has defined three operating modes for nfc devices: tag reader/writer mode, peer-to-peer mode, and card emulation mode (see figure 1). the nfc forum technical specifications for the different operating modes are based on iso/iec 18092 nfc ip-1, jis x 6319-4, and iso/iec 14443. these specifications must be used to derive the full benefit from the capabilities of nfc technology. contactless smart card standards are referred to as nfc-a, nfc-b, and nfc-f in nfc forum specifications.5

figure 1. nfc operation modes6

reader/writer mode

in reader/writer mode (see figure 2), an nfc-enabled device is capable of reading nfc forum-mandated tag types, such as a tag embedded in an nfc smart poster. this mode allows nfc-enabled devices to read the information that is stored on nfc tags embedded in smart posters and displays. since these tags are relatively inexpensive, they provide a great marketing tool for companies.

figure 2. reader mode7

the reader/writer mode on the radio frequency interface is compliant with the nfc-a, nfc-b, and nfc-f schemes. examples of its use include reading timetables, tapping for special offers, and updating frequent flyer points.8

peer-to-peer mode

in peer-to-peer mode (see figure 3), both devices must be nfc-enabled in order to communicate with each other to exchange information and share files. the users of nfc-enabled devices can thus quickly share information and other files with a touch; as an example, users can exchange data such as digital photos or virtual business cards via bluetooth or wifi.

figure 3. peer-to-peer mode9

peer-to-peer mode is based on the nfc forum's logical link control protocol specification and is standardized on the iso/iec 18092 standard.

card-emulation mode

in card-emulation mode (see figure 4), an nfc device behaves like a contactless smart card so that users can perform transactions such as purchases, ticketing, and transit access control with just a touch. an nfc device may have the ability to emulate more than one card. in card-emulation mode, an nfc-enabled device communicates with an external reader much like a traditional contactless smart card. this allows contactless payments and ticketing by nfc-enabled devices without changing the existing infrastructure.

figure 4. card-emulation mode
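to make reader/writer mode concrete, the following is a minimal sketch using the open-source nfcpy library, assuming a usb contactless reader supported by nfcpy; it waits for a tag to enter the read zone and prints any ndef records found. this is an illustration of the mode, not part of the original article.

```python
import nfc  # open-source nfcpy library

def on_connect(tag):
    """called by nfcpy when a tag enters the read zone."""
    if tag.ndef is not None:
        for record in tag.ndef.records:
            print(record)  # e.g., an ndef uri or text record
    else:
        print("tag present but carries no ndef message")
    return True  # keep the tag connected until it is removed

# assumes a usb reader supported by nfcpy
with nfc.ContactlessFrontend("usb") as clf:
    clf.connect(rdwr={"on-connect": on_connect})
```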
by adding nfc to a contactless infrastructure, one can enable two-way communications. in the air transport sector, this could simplify many operations, such as updating seat information while boarding or adding frequent flyer points while making a payment.10

nfc standards and specifications

the nfc specifications are defined by an industry organization called the nfc forum, which has nearly 200 member companies. the nfc forum was formed in 2004 with the objective of advancing the use of nfc technology by educating the market about nfc and developing specifications to ensure interoperability among devices and services. the nfc forum members work together in task forces and working groups. as noted earlier, nfc technology is based on existing 13.56 mhz rfid standards and includes several protocols, such as iso 14443 type a and type b, and jis x 6319-4 (also a japanese industrial standard known as sony felica). the iso 15693 standard, an additional 13.56 mhz protocol established in the market, is being integrated into the nfc specification by an nfc forum task force; smartphones in the market already support the iso 15693 protocol.11 these nfc specifications, and especially the specifications for the extended nfc functionalities, are in turn standardized by international standards organizations such as iso/iec, ecma, and etsi.12

initially, the rfid standards iso/iec 14443 a, iso/iec 14443 b, and jis x6319-4 were also pronounced nfc standards by different companies working in the field, such as nxp, infineon, and sony. the first nfc standard was ecma 340, based on the air interface of iso/iec 14443 a and jis x6319-4; ecma 340 was adopted as iso/iec standard 18092. at the same time, major credit card companies like europay, mastercard, and visa introduced the emvco payment standard, which is based on iso/iec 14443 a and iso/iec 14443 b. these groups harmonised the over-the-air interfaces within the nfc forum; they are named nfc-a (iso/iec 14443 a based), nfc-b (iso/iec 14443 b based), and nfc-f (felica based).13

nfc tags

an nfc tag is a small microchip embedded in a sticker or wristband that can be read by mobile devices within range. information regarding the item is stored in these microchips.14 an nfc tag has the capability to send the information stored on it to nfc-enabled mobile phones. nfc tags can also trigger various actions, such as changing the settings of a handset or even launching a website.15 tag memory capacity varies by the type of tag; for example, a tag may store a phone number or a url.16 the most common use of the nfc tag function on an object is mobile wallet payment processing, where the user swipes or flicks a mobile phone on an nfc tag to make a payment. google's version of this system is google wallet.17

figure 5. a quick overview of the tag types18
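the payload stored on such a tag is an ndef message, which can be built in a few lines. the following is a minimal sketch using the open-source ndeflib package; the url is a placeholder, and the encoded bytes are what a writer application would burn onto the tag.

```python
import ndef  # open-source ndeflib package

# a smart-poster-style payload: a single uri record (placeholder url)
record = ndef.UriRecord("https://library.example.edu/hours")

# encode the ndef message to the raw bytes written onto the tag
octets = b"".join(ndef.message_encoder([record]))
print(f"{len(octets)} bytes to write:", octets.hex())

# decoding works the same way in reverse, e.g. after reading a tag
for decoded in ndef.message_decoder(octets):
    print("decoded:", decoded.iri)
```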
applications of nfc

since it emerged as a standard technology in 2003, nfc has been implemented across multiple platforms in various ways. the primary driving force behind nfc is its application in the commercial sector, where implementation focuses on areas such as sales and marketing, but many new and interesting applications are also emerging in fields such as education and healthcare. all of these may impact libraries, librarians, and library users, either by prompting adaptations to existing collections and services or by inspiring innovation in our profession.19

• mobile payment: customers with nfc-enabled smartphones can link them with their bank accounts and pay by simply tapping their phones to an nfc-enabled point of sale.20
• access and authentication: "keyless access" to restricted areas, cars, and other vehicles. one can imagine other potential uses of nfc in the future, with devices in the home being controlled by it.21
• transportation and ticketing: nfc-enabled phones can connect with an nfc-enabled kiosk to download a ticket, or the ticket can be sent directly to an nfc-enabled phone over the air (ota). the phone can then tap a reader to redeem that ticket and gain access.22
• mobile marketing: nfc tags can be embedded into indoor and outdoor signage. upon tapping their smartphone on an nfc-enabled smart poster, the customer can read a consumer review, visit a website, or even view a movie trailer.
• healthcare: nfc medical cards and bracelet tags can store relevant, up-to-date patient information such as health history, allergies, and infectious diseases.
• gaming: nfc technology bridges physical and digital games. players can tap each other's phones together to earn extra points, receive access to a new level, or get clues.23
• inventory tracking, smart packaging, and shelf labels: nfc-tagged objects could provide a wide variety of information in different use environments. nfc-enabled smartphones can be used to tap the tags to access book reviews and information about a book's author, and to recommend the book to other readers. users could check out a book or add it to a wish list to check out at a later date. indeed, with nfc, library records and metadata could theoretically be stored on and retrieved from library physical holdings themselves, allowing a patron to tap a book or resource borrowed from the library to recall its title, author, and due date.24

applications of nfc in libraries: introducing the smart library

some libraries are beginning to use nfc technology as an alternative to rfid. yusof et al. proposed a newly developed application called the smart library, or "s-library," that adopts nfc technology.25 in the s-library, library users can perform many library transactions just by using their mobile smartphones with integrated nfc technology. users are required to download and install an app on a compatible mobile phone; the app provides relevant and easy-to-use library functionality such as searching, borrowing, returning, and viewing transaction records. in this model, the app is integrated with the library management software (lms): the app must be installed on the mobile device, and the device requires an internet connection to reach the lms.

the s-library provides five major functions to the user: scan, search, borrow, return, and transaction history. in the scanning function, users can access the information of a book by simply touching their mobile phone to the nfc tag on the book. as soon as the phone touches the book, information regarding its title, author, contents, synopsis, etc. is automatically displayed on the screen of the mobile device.
users can search for books by entering keywords such as book title, author name, or year. through the borrowing function, the app allows users to check out books of interest: the user just needs to touch their mobile phone to the nfc-tagged book to borrow it, and the transaction is automatically stored in the lms database. the returning process is similar: the user selects the return function in the menu and touches the mobile device to the book, and the return transaction is automatically performed and stored in the lms database. however, it should be ensured that the book is physically returned to the library through the library's nfc-enabled book drop system, and only then should the transaction be updated in the lms. the user can check the due date for the current transaction as well as their transaction history; the transaction history function allows the user to view the list of books that have been borrowed over time and their status.26

data transmission for nfc technology can be up to 848 kilobits per second, whereas the data transmission rate with rfid technology is 484 kilobits per second. taking advantage of this high data rate, the response time for the s-library is very fast. this is a huge improvement over rfid technology, and especially over barcode technology, where the data transmission rate is variable, inconsistent, and dependent upon the quality of the barcodes. the second key advantage of the s-library is that the time taken to read a tag (the communication time between a reader and an nfc-enabled device) is very short. the third advantage of nfc is its usability in comparison to the other two technologies: nfc technology is human-centric because it is intuitive and fast, and the user is able to use it anywhere, anytime, using their mobile phone, whereas with rfid and barcode technology usability is item-centric, as the person has to go to a specific device located in the library.27

most of the shortcomings of rfid and barcode technology are overcome by the s-library. with barcode technology, the quality of barcodes, printing clarity, print contrast ratio, and the low level of security were all challenges. rfid technology has many drawbacks, such as a lack of common rfid standards, security vulnerabilities, and reader and tag collision, which happens when multiple tags are energized by the rfid tag reader simultaneously and reflect their respective signals back to the reader at the same time. because nfc is touch based, it presents a viable alternative for library users that overcomes these weaknesses of the older technologies. yusof et al. found many advantages to the s-library: faster book borrowing; time saved for the user as well as the library staff; a connection that can be initialized in less than a second; no configuration required on the mobile device; and higher usability ratings and security.28 however, there are also some limitations. first, device compatibility is an issue, because the s-library presently supports only the android platform. second, as the s-library application only supports up to a 10-centimeter range, coverage is an issue.
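to illustrate the borrow flow described above, here is a deliberately simplified, hypothetical sketch of the server-side transaction. the s-library's actual schema and api are not published in this article, so every name here (tables, fields, functions) is an assumption for illustration only.

```python
import sqlite3
from datetime import date, timedelta

# hypothetical schema standing in for the lms database
db = sqlite3.connect(":memory:")
db.executescript("""
    create table books(tag_uid text primary key, title text, available integer);
    create table loans(tag_uid text, patron_id text, due text);
""")
db.execute("insert into books values ('04a1b2c3', 'sample title', 1)")

def borrow(tag_uid: str, patron_id: str, days: int = 21) -> str:
    """record a loan when a patron taps an nfc-tagged book."""
    row = db.execute(
        "select title, available from books where tag_uid = ?", (tag_uid,)
    ).fetchone()
    if row is None or not row[1]:
        return "book unknown or already on loan"
    due = (date.today() + timedelta(days=days)).isoformat()
    db.execute("update books set available = 0 where tag_uid = ?", (tag_uid,))
    db.execute("insert into loans values (?, ?, ?)", (tag_uid, patron_id, due))
    db.commit()
    return f"'{row[0]}' due {due}"

print(borrow("04a1b2c3", "patron-42"))
```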
mobile payments

nfc technology can be used for several library payment functions, such as paying library fines, purchasing tickets to library events, or donating to the library. users may also be able to use their digital wallets to pay for photocopying, printing, scanning, and the like. keeping the requirements of nfc technology in mind, libraries will have to enquire about the possibility of adding nfc payment capabilities to existing hardware and consider it when purchasing new machines. already, bibliotheca's smartserv 1000 self-serve kiosk, introduced in september 2013, includes nfc as a payment option; other library automation companies would also be worth monitoring for nfc integration.29

library access and authentication

nfc-enabled devices can be used to access the library and to authenticate users. these capabilities suggest that nfc technology may play an important role in the next generation of identity management systems. of particular interest in this context are several applications of nfc in two-factor authentication, which generally combines a traditional password or other digital credential with a physical, nfc-enabled component. for example, an authentication system could require the user to type in a fixed password in addition to tapping an nfc-enabled phone, identity card, or ring on the device they are logging in to. ibm has demonstrated a two-factor authentication method for mobile payment in which a user first types in a password and then taps an nfc-enabled credit card, issued by their bank, to their nfc-enabled smartphone. (a small sketch of this kind of two-factor check appears at the end of this section.) libraries could investigate similar access and authentication applications for nfc, both for internal use (staff badges and keys) and for public services. particularly if nfc mobile payment finally gains consumer traction, library patrons may begin to expect that they can use their nfc-enabled mobile devices to replace not just their credit cards but also their library cards. already, d-tech's rfid air self-check unit allows library patrons to log into their user accounts by tapping their nfc-enabled phone on the kiosk; the patron then uses the kiosk's rfid reader to check out their library materials and receives a receipt via email or sms. beyond its application in circulation, nfc authentication can be applied to streamline access to other services and resources of the library.30

nfc-enabled devices could be used to reserve library spaces, classrooms, auditoriums or community halls, digital media labs, meeting rooms, and the like. library users could use nfc authentication to access digital library resources, such as databases, e-journals, e-book collections, and other digital collections. nfc might allow libraries of all kinds to provide more convenient access and authentication options to users, though privacy and security considerations would certainly need to be addressed. nfc access and authentication will certainly have an impact on academic libraries. at universities where nfc access systems are deployed, student identification cards can be replaced with nfc-enabled mobile phones for after-hours services such as library building entry, wifi access, and printing, copying, and scanning services. the inconvenience of multiple logins can be eliminated, though libraries will have to take responsibility for protecting student information and library resources with added security.31
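as an illustration of the two-factor idea above (a knowledge factor plus a possession factor), the following is a minimal, hypothetical sketch: it checks a password hash and the uid of a tapped tag against what is registered for an account. real deployments rely on secure elements and challenge-response protocols rather than static uids, so treat this purely as a conceptual outline.

```python
import hashlib
import hmac

# hypothetical account record: salted password hash plus a registered tag uid
SALT = b"per-account-salt"
ACCOUNT = {
    "password_hash": hashlib.sha256(SALT + b"correct horse").hexdigest(),
    "tag_uid": "04a1b2c3d4",
}

def verify_login(password: str, tapped_uid: str) -> bool:
    """both factors must match: the typed password and the tapped nfc token."""
    supplied = hashlib.sha256(SALT + password.encode()).hexdigest()
    knows = hmac.compare_digest(supplied, ACCOUNT["password_hash"])
    has = hmac.compare_digest(tapped_uid, ACCOUNT["tag_uid"])
    return knows and has

print(verify_login("correct horse", "04a1b2c3d4"))  # True: both factors match
print(verify_login("correct horse", "deadbeef00"))  # False: token not registered
```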
promotion of library services

librarians can borrow ideas from commercial implementations of nfc-based marketing to enhance promotion of library resources, services, and events. as a first step, as kane and schneidewind suggested, nfc tags can complement several promotional uses of qr codes that have already been piloted or implemented in libraries.32 for promotional use, libraries can easily embed nfc tags in their new book displays that link to the bestseller list or current acquisitions lists in the library catalog or digital collections. similarly, if the reference book collection is tagged with nfc tags, it could be linked to the relevant digital collections of databases or e-books. nfc tags can be placed on library building doors or on library promotional material to share information such as library hours, opening days, schedules of events, membership rules, or floor plans for the building. as an example, at the renison university college library in ontario, canada, visitors can tap an nfc-enabled "library smartcard" to retrieve a digital brochure of library services in a variety of formats, including pdf, epub, and mp3.33

to promote outreach programs and events, libraries can take advantage of nfc's interactive capabilities instead of merely sharing links. as an example, libraries could use nfc tags on their event posters so that users can scan them to register for an event, save the event to their personal calendar, join the friends of the library program, or even download a library app. to send a text message to a librarian, a user can tap a smart poster promoting a virtual reference service. nfc-enabled promotional materials can engage users with library content even when they are outside the library building itself. a brilliantly creative example was created by the field museum of chicago, which used nfc-enabled outdoor smart posters throughout the city to promote an exhibit on the 1893 world's fair. the event posters depicted a personage from 1893 inviting the viewer to "see what they saw"; users could tap their nfc-enabled mobile device on the smart poster (or read a qr code) to download an app from the field museum that included 360° images of the fair as well as videos highlighting items in the exhibition.34

inventory control

the smart packaging use case brings forward a very important question for libraries that use rfid for inventory control: can existing rfid tags and infrastructure be leveraged to provide additional services to patrons with nfc-enabled mobile devices? the concept is not new; walsh envisioned using library rfid tags to store book recommendations or other digital information, which users could then access with a conveniently located rfid reader.35 what nfc brings to walsh's vision is that a dedicated rfid reader may no longer be necessary; a patron could use their own nfc-enabled smartphone to read a tag rather than taking it to a special location to be read. indeed, with nfc, library records and metadata could theoretically be stored on and retrieved from library physical holdings themselves, allowing a patron to tap a book or resource borrowed from the library to recall its title, author, and due date.
an exciting and immediate use for nfc in libraries is self-checkout: a patron browsing the stacks could tap an nfc-tagged book with their nfc-enabled phone to check it out without visiting the circulation desk or waiting in line.36

smart packaging

a sector close to librarians' hearts is publishing, and several publishers have started testing smart packaging for books, using embedded nfc tags to share additional content with readers such as book reviews and reading lists. with digital extras, the concept of smart packaging has significant implications for libraries as a new opportunity to connect physical collections to digital media. one can envision that in the future, when a user taps an nfc-enabled library book, they will get access to relevant digital information such as bibliographic information in a variety of citation formats, editorial reviews, the author's biography, a projected rating for the book, and links to other similar information.

borrowing and returning books

one of a library's key functions is circulating physical books from its collections. due to the low cost of barcode technology, many libraries around the world use it for circulation management. however, barcode technology has several constraints: it requires a line of sight to the barcode, it does not provide security for the library collection, it does not offer any benefit for collection management, and it makes it challenging for libraries to satisfy the increasing demands of their users (for example, reserving books that are checked out or checking their transaction history). this leads to the need to implement a new technology to improve library circulation management, inventory, and the security of library collections. librarians are known as early adopters of technology and have started using rfid to provide circulation services in a more effective and efficient manner, to secure library collections, and to satisfy the increasing demands of users; for example, putting tags in books allows multiple books to be issued together by placing a stack of books near a reader.

recommendations

according to mchugh and yarmey, the implementation of nfc has been slow and unsteady, and they do not foresee an immediate implementation in libraries.37 however, they recommend that librarians learn about and prepare for nfc. they recommend, for example, that librarians:

• follow the progress of research and scholarship on nfc and the commercial progress of nfc technology, to better anticipate its adoption in your community;
• experiment with nfc technology and develop prototype applications for nfc use in the library;
• offer an informational workshop on nfc for users and library colleagues;
• enquire with the rfid vendor about tag compatibility with nfc and rewriting the tags;
• monitor the progress of security and privacy aspects of nfc technology, educate users about these issues, and develop or update your library security policy;
• allow patrons to "opt in" to any nfc services at your library, providing other modes of communication where possible;
• develop and share best practices for nfc implementations; and
• support research on nfc in libraries via planning grants, research forums, and conference sessions.
conclusions

beyond the potential benefits of nfc, librarians should also be aware of and prepared for the privacy and security concerns that accompany the technology. user privacy is of the utmost concern. nfc involves users' mobile devices generating, collecting, storing, and sharing a significant amount of personal data. several of these functions, particularly mobile payment, necessitate the exchange of highly confidential data, including but not limited to a user's financial accounts and purchase history. spam may also be a concern: unwanted content (e.g., advertisements, coupons, or adware) may be sent to users' mobile devices without their consent. librarians should therefore use special caution when considering the implementation of nfc for library promotions or services.

security is a significant concern and an active area of research, as many nfc implementations involve the exchange of sensitive financial or otherwise personal data. an important concept in nfc security, particularly in the context of mobile payment, is the idea of a tamper-proof "secure element" as a basic protection for sensitive or confidential data such as account information and credentials for authentication.38 outside of continued standardization, the most effective measures for protecting nfc data transmissions are data encryption and the establishment of a secure channel between the sending and receiving devices (e.g., using a key agreement protocol and/or via ssl). for security concerns, as with privacy concerns, librarians have a crucial role to play in user education. there are important steps that individual users can and should take to protect their devices, such as setting a lock code, knowing how to remotely wipe a stolen phone, and installing and regularly updating antivirus software. however, many users are unaware of the vulnerability of their mobile devices and often fail to enact even basic protections.
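to make the key-agreement remark above concrete, the following is a minimal sketch using the python cryptography package: two parties derive a shared session key via ecdh and protect a message with aes-gcm. real nfc secure channels follow standardized protocols that also authenticate the peers; this only illustrates the underlying idea.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# each side generates an ephemeral key pair and exchanges public keys
reader_key = ec.generate_private_key(ec.SECP256R1())
phone_key = ec.generate_private_key(ec.SECP256R1())

# ecdh: both sides compute the same shared secret
shared = reader_key.exchange(ec.ECDH(), phone_key.public_key())

# derive a symmetric session key from the shared secret
session_key = HKDF(
    algorithm=hashes.SHA256(), length=32, salt=None, info=b"nfc session"
).derive(shared)

# encrypt a record so an eavesdropper on the air interface learns nothing
nonce = os.urandom(12)
ciphertext = AESGCM(session_key).encrypt(nonce, b"patron credential", None)
print(ciphertext.hex())
```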
by empowering objects and people to communicate with each other at a different level and establishing a "touch to share" paradigm, nfc technology has the potential to transform the information environment surrounding our libraries and fundamentally alter the ways in which library patrons interact with information.

endnotes

1 doaa abdel-gaber and abdel-aleem ali, "near-field communication technology and its impact in smart university and digital library: comprehensive study," journal of library and information sciences 3, no. 2 (december 2015): 43–77, https://doi.org/10.15640/jlis.v3n2a4.

2 "nfc technology: discover what nfc is, and how to use it," accessed march 17, 2019, https://www.unitag.io/nfc/what-is-nfc.

3 apuroop kalapala, "analysis of near field communication (nfc) and other short range mobile communication technologies" (project report, indian institute of technology, roorkee, 2013), accessed march 19, 2019, https://idrbt.ac.in/assets/alumni/pt-2013/apuroop%20kalapala_analysis%20of%20near%20field%20communication%20(nfc)%20and%20other%20short%20range%20mobile%20communication%20technologies_2013.pdf.

4 ed, "near field communication vs radio frequency identification," accessed march 10, 2019, http://www.nfcnearfieldcommunication.org/radio-frequency.html.

5 "what it does," nfc forum, accessed march 12, 2019, https://nfc-forum.org/what-is-nfc/what-it-does.

6 josé bravo et al., "m-health: lessons learned by m-experiences," sensors 18, 1569 (2018): 1–27, https://doi.org/10.3390/s18051569.

7 vedat coskun, busra ozdenizci, and kerem ok, "the survey on near field communication," sensors 15, no. 6 (2015): 13348–405, https://doi.org/10.3390/s150613348.

8 coskun, ozdenizci, and ok, "the survey on near field communication," 13352.

9 coskun, ozdenizci, and ok, "the survey on near field communication."

10 "how nfc works?," cnrfid, accessed january 12, 2019, http://www.centrenational-rfid.com/how-nfc-works-article-133-gb-ruid-202.html.

11 coskun, ozdenizci, and ok, "the survey on near field communication," 13352.

12 c. ruth, "nfc forum calls for breakthrough solutions for annual competition," accessed march 21, 2019, https://nfc-forum.org/newsroom/nfc-forum-calls-for-breakthrough-solutions-for-annual-competition/.

13 m. roland, "near field communication (nfc) technology and measurements," accessed may 12, 2019, https://cdn.rohdeschwarz.com/pws/dl_downloads/dl_application/application_notes/1ma182/1ma182_5e_nfc_white_paper.pdf.

14 roland, "near field communication (nfc) technology and measurements."

15 "what is a near field communication tag (nfc tag)?," techopedia, accessed may 27, 2019, https://www.techopedia.com/definition/28812/near-field-communication-tag-nfc-tag.

16 "what is meant by the nfc tag?," quora, accessed july 12, 2019, https://www.quora.com/what-is-meant-by-the-nfc-tag.

17 s. profis, "everything you need to know about nfc and mobile payments," accessed june 27, 2019, https://www.cnet.com/how-to/how-nfc-works-and-mobile-payments/.

18 "the 5 nfc tag types," accessed march 24, 2019, https://www.dummies.com/consumer-electronics/5-nfc-tag-types/.

19 abdel-gaber and ali, "near-field communication technology and its impact in smart university and digital library," 64–71.

20 iviane ramos de luna et al., "nfc technology acceptance for mobile payments: a brazilian perspective," review of business management 19, no. 63 (2017): 82–103, https://doi.org/10.7819/rbgn.v0i0.2315.
21 rajiv, "applications and future of near field communication," accessed march 14, 2019, https://www.rfpage.com/applications-near-field-communication-future/.

22 "nfc in public transport," nfc forum, accessed april 12, 2019, http://www.smart-ticketing.org/downloads/papers/nfc_in_public_transport.pdf.

23 "gaming applications with rfid and nfc technology," smarttech, accessed may 14, 2019, https://www.smarttec.com/en/applications/gaming.

24 sheli mchugh and kristen yarmey, "near field communication: recent developments and library implications," synthesis lectures on emerging trends in librarianship 1, no. 1 (march 2014): 1–93.

25 m. k. yusof et al., "adoption of near field communication in s-library application for information science," new library world 116, no. 11/12 (2015): 728–47, https://doi.org/10.1108/nlw-02-2015-0014.

26 yusof et al., "adoption of near field communication," 734–36.

27 yusof et al., "adoption of near field communication," 744.

28 yusof et al., "adoption of near field communication," 745.

29 abdel-gaber and ali, "near-field communication technology and its impact in smart university and digital library," 64.

30 mchugh and yarmey, "near field communication," 27.

31 mchugh and yarmey, "near field communication," 734.

32 danielle kane and jeff schneidewind, "qr codes as finding aides: linking electronic and print library resources," public services quarterly 7, no. 3–4 (2011): 111–24, https://doi.org/10.1080/15228959.2011.623599.

33 mchugh and yarmey, "near field communication," 31.

34 mchugh and yarmey, "near field communication," 31.

35 andrew walsh, "blurring the boundaries between our physical and electronic libraries: location-aware technologies, qr codes and rfid tags," the electronic library 29, no. 4 (2011): 429–37, https://doi.org/10.1108/02640471111156713.

36 projes roy and shailendra kumar, "application of rfid in shaheed rajguru college of applied sciences for women library, university of delhi, india: challenges and future prospects," qualitative and quantitative methods in libraries 5, no. 1 (2016): 117–30, http://www.qqml-journal.net/index.php/qqml/article/view/310.

37 mchugh and yarmey, "near field communication," 61–62.

38 garima jain and sanjeet dahiya, "nfc: advantages, limits and future scope," international journal on cybernetics & informatics 4, no. 4 (2015): 1–12, https://doi.org/10.5121/ijci.2015.4401.
editorial board thoughts: content and functionality: know when to buy ’em, know when to code ’em (with apologies to kenny rogers)

kenneth j. varnum

information technology and libraries | september 2017
https://doi.org/10.6017/ital.v36i3.10087

kenneth j. varnum (varnum@umich.edu), a member of the ital editorial board, is senior program manager for discovery, delivery, and library analytics at the university of michigan library, ann arbor, mi.

we in library technology live in interesting times, though not in the sense of that apocryphal curse. no, these are interesting times in the best possible way. where once there was a paucity of choice in interfaces and content, we have arrived at a time when a range of competing and valid choices exists for just about any particular technology need. data and functionality of actual utility to libraries are increasingly available not just through proprietary interfaces, but also through apis (application programming interfaces) that are ready to be consumed by locally developed applications. this has expanded the opportunity for libraries to respond more thoughtfully and strategically to local needs and circumstances than ever before. libraries are faced with an actual, rather than hypothetical, choice between building or buying fundamental user interfaces and systems. as the internet has evolved and coding has become more central to the skillset of many libraries, the capability of libraries to seriously consider building their own interfaces has grown. how does a technologically capable library make the decision to buy a complete system or build its own interface to existing data? the decision can be guided by a range of criteria that help define the library’s need for a locally managed solution. we’ll start by discussing the technological capabilities needed to take on almost any development project, then define three criteria, and finally discuss the circumstances in which a build solution might be appropriate. the goal is to outline a process for deciding when it makes more sense to buy both the interface and the content, to build one or the other locally, or to build both.

criterion 0: what are the short- and long-term technological capabilities of the library?

clearly, the first point of consideration is whether the institution has the capacity to manage application development and user research. the short-term answer may be no, but the long-term answer, one based on the library’s strategic direction, may be that these skills are needed to meet the library’s goals or strategic vision. one project may not be enough to tip the scales, but if the library is continually deciding if the immediate project under discussion is the one to change the balance, then perhaps the answer is that it’s time to invest in new skillsets and capabilities. there are actually several skillsets needed to undertake development projects. individuals with coding skills are needed to adapt existing open-source software to the library’s needs — it is a rare
open-source project that does exactly what a library needs it to do, with connectors to all the same data sources and library management tools already perfectly configured by somebody else — but that is not sufficient. a library also needs people with user interface and user research skills to ensure that the application meets at least the critical needs of its own user community, and does so with language and cues that match user expectations. even if there is not a permanent capability on the library’s staff, development can take place through contract services. if this is the option selected, a library would do well to make sure that staff are sufficiently trained to make minor updates to interfaces and applications, or that a longer-term arrangement is made for ongoing maintenance and updates.

criterion 1: what is the need to customize interactions to local situations?

most, but not all, applications offer opportunities to match interface features and functionality with local user needs. the more interactive and core to the library’s service model the tool is, the more likely the tool is to benefit from customization. for example, a proxy server (technology that allows an authenticated user to access licensed content as if she were in the physical library or on a defined campus network) has little or no user interface. there is little need to customize the tool to meet user needs, beyond ensuring the list of online resources and urls subject to being proxied is up to date. there really aren’t any particularly useful apis to consume and reproduce elsewhere, and there are easier ways to build an a-z list of licensed content than harvesting the proxy server’s configuration lists. in contrast, the link resolver (technology that takes a citation formatted according to the openurl standard and returns a list of appropriate full-text destinations to which the library has licensed access) may well be worth bringing in-house. some vendors offer their software to be run locally, while others provide api access to the metadata. at my institution, we used the api serials solutions makes available for its 360 link product to build our own interface using the open-source umlaut software (see https://mgetit.lib.umich.edu/). why go to the trouble of recreating an interface? for several reasons, some of which (understanding user behaviors and maintaining control over user data to the extent practical) i’ll touch on in the following two sections. the main reason centered on providing a user interface consistent with the rest of our web presence, offering integrations with our document delivery service, a way to contact our online chat service, and a way to report problem links directly to the library when the full-text links provided by the system do not work. while these features are generally available through vendor interfaces, the user experience is hard to make consistent with other services we offer.
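to make the “build your own front end on a vendor api” idea concrete, here is a minimal sketch in python of sending an openurl-style citation to a link-resolver api and collecting the full-text targets for rendering in a local interface. the endpoint url and the json response shape are invented placeholders, not the actual 360 link api; only the openurl (z39.88-2004) key/value pairs are standard.

    import requests

    RESOLVER_API = "https://resolver.example.edu/api/openurl"  # hypothetical endpoint

    def full_text_links(doi):
        # describe the citation with standard openurl 1.0 key/value pairs
        params = {
            "url_ver": "Z39.88-2004",
            "rft_id": f"info:doi/{doi}",
        }
        resp = requests.get(RESOLVER_API, params=params, timeout=10)
        resp.raise_for_status()
        # assume the api answers with json listing candidate full-text targets
        return [t["url"] for t in resp.json().get("targets", [])]

a local application built this way can wrap the returned targets in the library’s own templates, add a “report a problem” form, and log no more about the user than it chooses to.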
criterion 2: what are the needs for integration with other systems from different providers?

integrations can run in two directions: from the system under consideration to existing library or campus/community tools, and from those environmental tools to the library. when thinking about the buy-or-build decision, understanding the scope of these integrations up front is important. if all of the tools or services that need to consume information from or provide information to your system rely on well-defined standards that are broadly implemented, this criterion may be a wash; there may not be an inherent advantage to building or buying based on data exchange. if, however, the other systems are themselves tricky to work with, relying on inputs or providing outputs in a non-standard or idiosyncratic way, this situation may swing the pendulum toward building the system yourself so you can manage those connections directly. for example, many course management systems on academic campuses can consume and provide data using the lti [learning tools interoperability] standard for data exchange. many traditional library applications do as well, so if a library using an lti-compliant system needs to provide course reserves reading lists to the course management system, this is a ready-made way to make that information available (a sketch of such an exchange follows below). at the other extreme, bringing registrar’s data into a library catalog (to know who is in what courses, to provide those patrons with an appropriate reference librarian contact for a particular subject, or access to a reading list through a course reserves system) may only be possible through customized applications that read non-standard data. in this case, to provide the desired level of service to the campus, the library may need to build local applications.
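as an illustration of the ready-made exchange lti provides, the following python dictionary sketches the core form parameters of an lti 1.1 “basic launch” that a course management system posts to a tool. the identifiers and course names are invented; a real launch is also signed with oauth 1.0 before it is sent.

    # core parameters of a hypothetical lti 1.1 basic launch
    launch_params = {
        "lti_message_type": "basic-lti-launch-request",
        "lti_version": "LTI-1p0",
        "resource_link_id": "reserves-bio101",   # placement of the tool in the course
        "context_id": "bio101-fall",             # the course itself
        "context_title": "introductory biology",
        "user_id": "opaque-student-42",
        "roles": "Learner",
    }
    # a course-reserves tool receiving this launch can map context_id to a
    # reading list without needing any custom registrar feed.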
criterion 3: who manages confidentiality or privacy of user interactions?

a final, and increasingly significant, criterion to consider is where the library believes responsibility for patron data and information-seeking behavior to reside. notwithstanding contractual or licensing obligations taken on by library vendors, the risk of inadvertent exposure or intentional sharing of user interactions is always present. one advantage of building local systems to interact with vendor systems (link resolvers, discovery platforms, etc.) is that the vendor does not have access to the end-user’s ip address or any other personally identifying information. the vendor only sees a request coming from the library’s application; all requests are equal and undifferentiated. of course, once users access the target item they are seeking (an online journal, database, etc.), that particular vendor’s site has access to that information. for libraries concerned about user privacy, the risk of exposure is somewhat mitigated by managing the discovery or access layer in-house, deciding to maintain a level of user information that suits that particular library’s comfort level, and potentially minimizing the single point of failure for breaches. at the same time, such a decision puts more responsibility on the library or its parent information technology organization to protect data from exposure. some libraries feel they can handle this responsibility, either by careful protection of the data or by not collecting and storing it in the first place, in a way that library vendors cannot.

concluding thoughts

making the buy-or-build decision is not straightforward; the criteria described here are not the only ones a library might wish to consider, but they are common ones with the greatest ramifications. putting the decision process into a framework can help a library make consistent decisions over time, enabling it to focus on the projects and systems that are most important to the library and its community (a campus, a town, or a company).

the open access citation advantage: does it exist and what does it mean for libraries?

colby lewis

information technology and libraries | september 2018
https://doi.org/10.6017/ital.v37i3.10604

colby lewis (colbyllewis@gmail.com), a second-year master of science in information student at the university of michigan school of information, is winner of the 2018 lita/ex libris student writing award.

abstract

the last literature review of research on the existence of an open access citation advantage (oaca) was published in 2011 by philip m. davis and william h. walters. this paper reexamines the conclusions reached by davis and walters by providing a critical review of oaca literature that has been published since 2011 and explores how increases in open access publication trends could serve as a leveraging tool for libraries against the high costs of journal subscriptions.

introduction

since 2001, when the term “open access” was first used in the context of scholarly literature, the debate over whether there is a citation advantage (ca) caused by making articles open access (oa) has plagued scholars and publishers alike.1 to date, there is still no conclusive answer to the question, or at least not one that the premier publishing companies have deemed worthy of acknowledging. there have been many empirical studies, but far fewer with randomized controls. the reasons for this range from limited data access to the numerous potential “methodological pitfalls” or confounding variables that might skew the data in favor of one argument or another. the most recent literature review of articles that explored the existence (or lack thereof) of an open access citation advantage (oaca) was published in 2011 by philip m. davis and william h. walters. in that review, davis and walters ultimately concluded that “while free access leads to greater readership, its overall impact on citations is still under investigation. the large access-citation effects found in many early studies appear to be artifacts of improper analysis and not the result of a causal relationship.”2 this paper seeks to reexamine the conclusions reached by davis and walters in 2011 by providing a critical review of oaca literature published since their review.3 this paper will examine the methods and conclusions provoking such criticisms and whether these criticisms are addressed in the studies. i will begin by identifying some of the top confounders in oaca studies, in particular the potential for self-archiving bias. i will then examine articles from july 2011, when davis and walters published their findings, to july 2017. there will be a few exceptions to this time frame, but the studies cited in tables 1 and 2 are entirely from this period. in addition to reviewing oaca studies since davis and walters’ 2011 study, i will explore the implications of an oaca on the future of publishing and the role of librarians in the subscription process.
as antelman points out in her association of college and research libraries conference paper, “leveraging the growth of open access in library collection decision making,” it is the responsibility of libraries to use the newest data and technology available to them in the interest of best serving their patrons and advancing scholarship.4 in connecting oaca studies and the potential bargaining power an oaca could bring libraries, i assess the current roles that universities and university libraries play in promoting (or not) oa publications and the implications of an oaca for researchers, universities, and libraries, and i provide suggestions on how recent research could influence the present trajectory. i conclude by summarizing what my findings tell us about the existence (or lack thereof) of an oaca, and what these findings imply for the future of library journal subscriptions and the publish-or-perish model for tenure. lastly, i will suggest some alternative metrics to citations that could be used by libraries in determining future journal subscriptions and general collection management.

self-archiving bias and why it doesn’t matter

the idea of a self-archiving bias is based upon the concept that, if faced with a choice, authors will always opt to make their best work more widely available. effectively, when open access is not mandated, these articles may be specifically chosen to be made open access to increase readership and, hypothetically, citations.5 this biased selection method has the potential to confound the results of oaca studies because of the intuitive notion that an author’s best work is much more likely to be cited than any of their other work. this effect is amplified by making the work available oa, and it prevents studies in which articles were self-archived from convincingly claiming that the citation advantage those articles received was due to oa and not to their inherent quality and consequent likelihood of being cited anyway. in a 2010 study, gargouri et al. determined that articles by authors whose institutions mandated self-archiving (such as in an institutional repository [ir]) saw an oaca just as great for articles that were mandated to be oa as for articles that were self-selected to be oa.6 this by no means proves a causal relationship between oa and ca, but it does counter the notion that self-archived articles are an uncontrollable confounder that automatically compromises the legitimacy of oaca studies.7 ottaviani affirms this conclusion in a 2016 study in which he writes, “in the long run better articles gain more citations than expected by being made oa, adding weight to the results reported by gargouri et al.”8 in short, claiming that articles self-selected for self-archiving irreparably confound oaca studies ignores the fact that these authors have accounted for the likelihood that articles of higher quality will inherently be cited more. as gargouri et al. put it, “the oa advantage [to self-archived articles] is a quality advantage, rather than a quality bias” (italics in original).9

gold versus green and their effect on oaca analyses

many critics of oaca studies have argued that such studies do not distinguish between gold oa, green oa, and hybrid (subscription journals that offer the option for authors to opt in to gold oa) journals in their sample pool, thus skewing the results of their studies.
in fact, there are many acknowledged subcategories of oa, but for the purposes of this paper, i will primarily focus on gold, green, and hybrid oa. figure 1, provided by elsevier as a guide for their clients, distinguishes between gold and green oa.10 while the chart provided applies specifically to those looking to publish with elsevier, it highlights the overarching differences between gold oa and green oa. a comprehensive list of oa journals is available through the directory of open access journals (doaj) website (https://doaj.org/).

figure 1. elsevier explains to potential clients their options for publishing oa with elsevier and the differences between publishing with gold oa versus green oa.

the argument that not distinguishing between gold oa and green oa in oaca studies distorts study results primarily stems from the potential for skew in green oa journals. green oa journals allow authors to self-archive their articles after publication, but the articles are often not made fully oa until an embargo period has passed. this problem was addressed in a recent study conducted by science-metrix and 1science, who manually checked and coded approximately 8,100 top-level domains (tlds).11 it is important to note that this study was made available as a white paper on the 1science website and has not been published in a peer-reviewed journal. additionally, 1science is a company built on providing oa solutions to libraries, which means they have a vested interest in proving the existence of an oaca. however, just as publishers such as elsevier have a vested interest in a substantial oaca not existing, this should not prevent us from examining their data. for their study, 1science did not distinguish hybrid journals as a distinct journal category. critics, such as the editorial director of journals policy for oxford university press, david crotty, were quick to fixate on this lack of distinction as a means of discrediting the study.12 employees of elsevier were similarly inclined to criticize the study, declaring that it, “like many others [studies] on this topic, does not appear to be randomized and controlled.”13 however, archambault et al., acknowledging that their study “does not examine the overlap between green and gold,” have provided an extremely comprehensive sample pool, examining 3,350,910 oa papers published between 2007 and 2009 in 12,000 journals.14 this paper examines the notion that “the advantage of oa is partly due to citations having a chance to arrive sooner . . . and concludes that the purported head start of oa papers is actually contrary to observed data.”15

in a more recent study published in february 2018, piwowar et al. examine the prevalence of oa and average relative citation (arc) based on three sample groups of one hundred thousand articles each: “(1) all journal articles assigned a crossref doi, (2) recent journal articles indexed in web of science, and (3) articles viewed by users of unpaywall, an open-source browser extension that lets users find oa articles using oadoi.”16 unlike the 1science study, piwowar et al. had a twofold purpose: to examine the prevalence of oa articles available on the web and whether an oaca exists based on their sample findings.
i do not include their results in my literature review because of the dual focus of their study, although i do compare their results with those of archambault et al. and analyze the implications of their findings.

bronze: neither gold nor green

in their article, piwowar et al. introduce a new category of oa publication: bronze. if gold oa refers to complete open access at the time of publication, and green oa refers to articles published in a paywalled journal but ultimately made oa either after an embargo period or via an ir, bronze oa refers to oa articles that somehow don’t fit into either of these categories. piwowar et al. define bronze oa articles as “free to read on the publisher page, but without any clearly identifiable license.”17 however, as crotty points out in a scholarly kitchen article reflecting on the preprint version of piwowar et al.’s article, “bronze” already exists as an oa category, but has simply been called “public access.”18 while coining “bronze” as a new term for “public access” is helpful in connecting it to oa terms such as “green” and “gold,” it is not quite the new phenomenon it is touted to be.

arc as an indication of an oaca

both archambault et al. and piwowar et al. provide the arc as a means of establishing a paper’s impact on the larger research community.19 within their arc analyses, archambault et al. distinguish between non-oa and oa, within which they differentiate between gold and green oa (figure 2). piwowar et al. group papers by closed (non-oa) and oa, with the following oa subcategories: bronze, hybrid, gold, and green oa (figure 3). an arc of 1.0 is the expected number of citations an article will receive “based on documents published in the same year and [national science foundation (nsf)] specialty.”20 based on this standard, an article with an arc above or below 1.0 has a citation impact that percentage above or below the expected citation impact of like articles. for example, an article with an arc of 1.23 has received 23 percent more citations than expected for articles of similar content and quality. this scale can be incredibly useful in determining the presence of a citation advantage, and it can enable researchers to determine overall ca patterns.
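read as a formula, the definition above amounts to a simple ratio; this latex rendering is my reconstruction from the prose, not notation taken from either study:

    \mathrm{arc}_i = \frac{c_i}{\bar{c}_{\text{year, nsf specialty}}}

where c_i is the number of citations received by article i and the denominator is the mean citation count of documents published in the same year and nsf specialty. an arc of 1.23 therefore means c_i sits 23 percent above that mean.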
figure 2. research impact of paywalled (not oa) versus open access (oa) papers “computed by science-metrix and 1science using oaindx and the web of science.” archambault et al., “research impact of paywalled versus open access papers,” white paper, science-metrix and 1science, 2016, http://www.1science.com/1numbr/.

critics’ fixation on whether the 1science study was “randomized and controlled” ignores the fact that the authors do not claim causation. rather, their findings suggest the existence of an oaca when comparing oa (in all forms) and non-oa (in any form) articles (see figure 2). the authors ultimately conclude that “in all these fields, fostering open access (without distinguishing between gold and green) is always a better research impact maximization strategy than relying on strictly paywalled papers.”21 unlike archambault et al., piwowar et al. found that gold oa articles had a significantly lower arc, and that the average arc of all oa balances out to 1.18 because of the high arcs of bronze (1.22), hybrid (1.31), and green (1.33). however, both studies found that non-oa (referred to by piwowar et al. as “closed”) articles had an arc below 1.0, suggesting a definitive correlation between oa (without specifying type) and an increase in citations.

figure 3. “average relative citations of different access types of a random sample of web of science (wos) articles and reviews with a digital object identifier (doi) published between 2009 and 2015.” heather piwowar et al., “the state of oa: a large-scale analysis of the prevalence and impact of open access articles,” peerj, february 13, 2018, https://doi.org/10.7717/peerj.4375.

six years and what has changed in oaca research

between july 2011 and the publication of piwowar et al.’s work in february 2018, nine new oaca studies have been published in peer-reviewed journals. of these, five only look at the oaca in one field, such as cytology or dentistry. the other four are multidisciplinary studies, two of which are repository-specific and only use articles from deep blue and academia.edu, respectively. this is important to note because of critics’ earlier stated objections to the use of studies that are not randomized controlled studies. however, the deep blue study can still be considered a randomized controlled sample group because the authors are not self-selecting articles to upload to the repository as they are with academia.edu. rather, articles were made accessible through deep blue “via blanket licensing agreements between the publishers and the [university of michigan] library.”22 some of the field-specific studies use sample sizes that may not reflect a general oaca, but rather one only for that field, and in certain cases, only for a single journal.

field-specific studies

between july 2011 and july 2017, five field-specific studies were conducted to determine whether an oaca existed in those fields. i summarize the scope and conclusions of these studies in table 1. as you can see from the table, the article sample size varied vastly between studies, but that can likely be accounted for by considering the specific fields studied, since there are only five major cytopathology journals and nearly fifty major ecology journals. piwowar et al. acknowledge this in their study, noting that the nsf assigns all science journals “exactly one ‘discipline’ (a high-level categorization) and exactly one ‘specialty’ (a finer-grained categorization).”23 the more deeply nested in an nsf discipline a subject is, the more specialized the field becomes and the fewer journals there are on the subject. this alone is reason not to extrapolate from the results of these studies and project their results on the existence of oaca across all fields.

only two of these studies, those focused on an oaca in dentistry and ecology, can be considered truly randomized controlled studies. both the cytopathology and marine ecology studies chose a specific set of journals from which to draw their entire sample pool. while the dentistry and ecology studies can be considered randomized controlled in nature, they still only reflect the occurrence (or lack thereof) of an oaca in those specific fields. it would be irresponsible to allow the results from studies in a single field of a single discipline to represent oaca trends across all disciplines. therefore, it is surprising that elsevier employees use the dentistry study to make such a claim.
hersh and plume write, “another recent study by hua et al (2016) looking at citations of open access articles in dentistry found no evidence to suggest that open access articles receive significantly more citations than non-open access articles.”24 the key phrase missing from the end of this analysis is in dentistry. one might question whether a claim about multidisciplinary oaca can effectively be extrapolated from a single-field analysis. the authors do, two sentences later, qualify their earlier statement by saying, “in dentistry at least, the type of article you publish seems to make a difference but not oa status.”25 that is indeed what this study seems to show, and is therefore a logical claim to make. likewise, the three empirical studies in table 1 show that, for those respective fields, oa status does correlate to a citation advantage. in the case of the ecology study, the authors are confident enough in their randomized controlled methodology to claim causation.26 the ecology study is the most recently published oaca study, and its authors were able to learn from similar past studies about the necessary controls and potential confounders in oaca studies. with this knowledge, tang et al. determined that:

by comparing oa and non-oa articles within hybrid journals, our estimate of the citation advantage of oa articles sets controls for many factors that could confound other comparisons. numerous studies have compared articles published in oa journals to those in non-oa journals, but such comparison between different journals could not rule out the impacts of potentially confounding factors such as publication time (speed) and quality and impact (rank) of the journal. these factors are effectively controlled with our focus on hybrid journals, thereby providing robust and general estimates of citation advantages on which to base publication decisions.27

summary of key field-specific studies

author | study design | content | number of articles | controls | results, interpretation, and conclusion
clements 2017 | empirical | 3 hybrid-oa marine ecology journals | all articles published in these journals between 2009 and 2012; specific number not provided | jif; article type; self-citations | “on average, open access articles received more peer-citations than non-open access articles.” oaca found.
frisch et al. 2014 | empirical | 5 cytopathology journals; 1 oa and 4 non-oa | 314 articles published between 2007 and 2011 | jif; author frequency; publisher neutrality | “overall, the averages of both cpp and q values were higher for oa cytopathology journal (cytojournal) than traditional non-oa journals.” oaca found.
gaulé and maystre 2011 | empirical | 1 major biology journal | 4,388 articles published between 2004 and 2006 | last author characteristics; article quality | “we find no evidence for a causal effect of open access on citations. however, a quantitatively small causal effect cannot be statistically ruled out.” oaca not found.
hua et al. 2016 | randomized controlled | articles randomly selected from the pubmed database, not specific dentistry journals | 908 articles published in 2013 | randomized article selection; exclusion of articles unrelated to dentistry; multidatabase search to determine oa status | “in the present study, there was no evidence to support the existence of oa ‘citation advantage’, or the idea that oa increases the citation of citable articles.” oaca not found.
tang et al. 2017 | randomized controlled | 46 hybrid-oa ecology journals | 3,534 articles published between 2009 and 2013 | gni of author country; randomized article pairing; article length | “overall, oa articles received significantly more citations than non-oa articles, and the citation advantage averaged approximately one citation per article per year and increased cumulatively over time after publication.” oaca found.

table 1. scope, controls, and results of field-specific oaca studies since 2011. based on a chart in stephan mertens, “open access: unlimited web based literature searching,” deutsches ärzteblatt international 106, no. 43 (2009): 711. jif, journal impact factor; cpp, citations per publication; q, q-value (see frisch, nora k., romil nathan, yasin k. ahmed, and vinod b. shidham, “authors attain comparable or slightly higher rates of citation publishing in an open access journal (cytojournal) compared to traditional cytopathology journals—a five year (2007–2011) experience,” cytojournal 11, no. 10 (april 2014), https://doi.org/10.4103/1742-6413.131739, for the specific equation used).

summary of key multidisciplinary studies

author | study design | content | number of articles | controls | results, interpretation, and conclusion
mccabe and snyder 2014 | empirical | 100 journals in ecology, botany, and multidisciplinary science | all articles published in these journals between 1996 and 2005; specific number not provided | jif; journal founding year | “we found that open access only provided a significant increase for those volumes made openly accessible via the narrow channel of their own websites rather than the broader pubmed central platform.” oaca found.
niyazov et al. 2016 | empirical | unspecified number of journals across 23 academic divisions | 31,216 articles published between 2009 and 2012 | field; jif; publication vs. upload date | “we find a substantial increase in citations associated with posting an article to academia.edu. . . . we find that a typical article that is also posted to academia.edu has 49% more citations than one that is only available elsewhere online through a non-academia.edu venue.” oaca found for academia.edu.
ottaviani 2016 | randomized controlled | unspecified number of journals with blanket licensing agreements between the publishers and the university of michigan library | 93,745 articles published between 1990 and 2013 | self-selection | “even though effects found here are more modest than reported elsewhere, given the conservative treatments of the data and when viewed in conjunction with other oaca studies already done, the results lend support to the existence of a real, measurable, open access citation advantage with a lower bound of approximately 20%.” oaca found.
sotudeh et al. 2015 | empirical | 633 apc-funded oa journals published by springer and elsevier | 995,508 articles published between 2007 and 2011 | journals that adopted oa policies after 2007; journals with non-article-processing-charge oa policies | “the apc oa papers are, also, revealed to outperform the ta ones in their citation impacts in all the annual comparisons. this finding supports the previous results confirming the citation advantage of oa papers.” oaca found.

table 2. scope, controls, and results of multidisciplinary oaca studies since 2011.
jif, journal impact factor; apc, article processing charge; ta, toll access.

based on the randomized controlled methodology that tang et al. found hybrid journals to provide, it is possible that this study may serve as an ideal model for future larger oaca studies across multiple disciplines. however, more field-specific hybrid journal studies will have to be conducted before determining whether this model would be the most accurate method for measuring oaca across multiple disciplines in a single study.

multidisciplinary studies

the multidisciplinary oaca studies conducted since 2011 include a single randomized controlled study and three empirical studies (table 2). all these studies found an oaca; in the case of niyazov et al., an oaca was found specifically for articles posted to academia.edu. i included this study because it is an important contribution to the premise that a relationship exists between self-selection and oaca. niyazov et al. highlight this point in the section “sources of selection bias in academia.edu citations,” explaining that “even if academia.edu users were not systematically different than non-users, there might be a systematic difference between the papers they choose to post and those they do not. as [many] . . . have hypothesized, users may be more likely to post their most promising, ‘highest quality’ articles to the site, and not post articles they believe will be of more limited interest.”28 to underscore this point, i refer to gargouri et al., who stated that “the oa advantage [to self-archived articles] is a quality advantage, rather than a quality bias” (italics in original).29 again, it is unsurprising that articles of higher caliber are cited more and that making such articles more readily available increases the number of citations they would likely already receive. similar to my conclusion in the field-specific study section, we simply need more randomized controlled studies, such as ottaviani’s, to determine the nature and extent of the relationship between oa and ca across multiple disciplines.

conclusions

critics of some of the most recent studies, specifically archambault et al. and ottaviani, have argued that authors of oaca studies are too quick to claim causation. while a claim of causation does indeed require strict adherence to statistical methodology and control of potential confounders, few of the authors i have examined actually claim causation. they recognize that the empirical nature of their studies is not enough to prove causation; rather, it provides insight into the correlation between open access and a citation advantage. in all their conclusions, these authors acknowledge that further studies are needed to prove a causal relationship between oa and ca. the recent work published by piwowar et al. provides a potential model for replication by other researchers, and ottaviani offers a replicable method for other large research institutions with non-self-selecting institutional repositories. alternatively, field-specific studies conducted in the style of tang et al. across all fields would serve to provide a wider array of evidence for the occurrence of field-specific oaca and therefore of a more widespread oaca. recent developments in oa search engines have created alternative routes to many of the same articles offered by subscriptions, but at a fraction (if any) of the cost.
antelman proposed that libraries use an oa-adjusted cost per download (oa-adj cpd), a metric that “subtracts the downloads that could be met by oa copies of articles within subscription journals,” as a tool for negotiating the price of journal subscriptions.30 by calculating an oa-adj cpd, libraries could potentially leverage their ability to access journal articles through means other than traditional subscription bundles to save money and encourage oa publication. while antelman suggests using oa-adj cpd as a leveraging tool when making deals with publishers for journal subscriptions, i suggest that libraries use the data-gathering methods of piwowar et al. to determine whether enough articles from a specific journal can be found oa via unpaywall. by using metrics such as those collected by piwowar et al. through unpaywall, the potential confounding variable of articles found through illegitimate means (such as sci-hub) is alleviated. instead, piwowar et al.’s metrics focus on tracking the percentage of material searched by library patrons that can be found oa through the unpaywall browser extension. according to unpaywall’s “libraries user guide” page, libraries “can integrate unpaywall into their sfx, 360 link, or primo link resolvers, so library users can read oa copies in cases where there’s no subscription access. over 1000 libraries worldwide are using this now.”31 ideally, scholars will also be more willing to publish papers oa, and institutions will be more supportive of providing the necessary costs for making publications oa. though the publish-or-perish model still reigns in academia, there is great potential in encouraging tenured professors to publish oa by supplementing the costs through institutional grants and other incentives wrapped into a tenure agreement. perhaps through this model, as gargouri et al. have suggested, the longstanding publish-or-perish doctrine will give way to an era of “self-archive to flourish.”32
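antelman’s metric can be read as a simple ratio; the following latex formalization is my reconstruction from her description rather than notation from her paper:

    \text{oa-adj cpd} = \frac{\text{annual subscription cost}}{d_{\text{total}} - d_{\text{oa}}}

where d_total is a journal’s total downloads and d_oa is the portion of those downloads that could have been met by oa copies. as a journal’s oa share grows, the adjusted cost per download rises, weakening the case for renewal.

the unpaywall-based data gathering suggested above can likewise be sketched in a few lines of python against unpaywall’s public rest api (the v2 endpoint takes a doi and a contact email); the doi list and the 50 percent threshold below are invented for illustration.

    import requests

    EMAIL = "collections@library.example.edu"  # unpaywall asks callers to identify themselves

    def oa_share(dois):
        # fraction of a sample of articles that unpaywall reports as oa
        oa_count = 0
        for doi in dois:
            resp = requests.get(
                f"https://api.unpaywall.org/v2/{doi}",
                params={"email": EMAIL},
                timeout=10,
            )
            if resp.ok and resp.json().get("is_oa"):
                oa_count += 1
        return oa_count / len(dois)

    sample_dois = ["10.1234/journal.0001", "10.1234/journal.0002"]  # hypothetical dois
    if oa_share(sample_dois) > 0.5:
        print("over half the sample is oa; factor this into renewal negotiations")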
bibliography

antelman, kristin. “leveraging the growth of open access in library collection decision making.” acrl 2017 proceedings: at the helm, leading the transformation, march 22–25, baltimore, maryland, ed. dawn m. mueller (chicago: association of college and research libraries, 2017), 411–22. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/leveragingthegrowthofopenaccess.pdf.

archambault, éric, grégoire côté, brooke struck, and matthieu voorons. “research impact of paywalled versus open access papers.” white paper, science-metrix and 1science, 2016. http://www.1science.com/1numbr/.

calver, michael c., and j. stuart bradley. “patterns of citations of open access and non-open access conservation biology journal papers and book chapters.” conservation biology 24, no. 3 (may 2010): 872–80. https://doi.org/10.1111/j.1523-1739.2010.01509.x.

chua, s. k., ahmad m. qureshi, vijay krishnan, dinker r. pai, laila b. kamal, sharmilla gunasegaran, m. z. afzal, lahri ambawatta, j. y. gan, p. y. kew, et al. “the impact factor of an open access journal does not contribute to an article’s citations” [version 1; referees: 2 approved]. f1000 research 6 (2017): 208. https://doi.org/10.12688/f1000research.10892.1.

clarivate analytics. “incites journal citation reports.” dataset updated september 9, 2017. https://jcr.incites.thomsonreuters.com/.

clements, jeff c. “open access articles receive more citations in hybrid marine ecology journals.” facets 2 (january 2017): 1–14. https://doi.org/10.1139/facets-2016-0032.

crotty, david. “study suggests publisher public access outpacing open access; gold oa decreases citation performance.” scholarly kitchen, october 4, 2017. https://scholarlykitchen.sspnet.org/2017/10/04/study-suggests-publisher-public-access-outpacing-open-access-gold-oa-decreases-citation-performance/.

crotty, david. “when bad science wins, or ‘i’ll see it when i believe it.’” scholarly kitchen, august 31, 2016. https://scholarlykitchen.sspnet.org/2016/08/31/when-bad-science-wins-or-ill-see-it-when-i-believe-it/.

davis, philip m. “open access, readership, citations: a randomized controlled trial of scientific journal publishing.” faseb journal 25, no. 7 (july 2011): 2129–34. https://doi.org/10.1096/fj.11-183988.

davis, philip m., and william h. walters. “the impact of free access to the scientific literature: a review of recent research.” journal of the medical library association 99, no. 3 (july 2011): 208–17. https://doi.org/10.3163/1536-5050.99.3.008.

elsevier. “your guide to publishing open access with elsevier.” amsterdam, netherlands: elsevier, 2015. https://www.elsevier.com/__data/assets/pdf_file/0020/181433/openaccessbooklet_may.pdf.

evans, james a., and jacob reimer. “open access and global participation in science.” science 323, no. 5917 (february 2009): 1025. https://doi.org/10.1126/science.1154562.

eysenbach, gunther. “citation advantage of open access articles.” plos biology 4, no. 5 (may 2006): e157. https://doi.org/10.1371/journal.pbio.0040157.

fisher, tim. “top-level domain (tld).” lifewire, july 30, 2017. https://www.lifewire.com/top-level-domain-tld-2626029.

frisch, nora k., romil nathan, yasin k. ahmed, and vinod b. shidham. “authors attain comparable or slightly higher rates of citation publishing in an open access journal (cytojournal) compared to traditional cytopathology journals—a five year (2007–2011) experience.” cytojournal 11, no. 10 (april 2014). https://doi.org/10.4103/1742-6413.131739.

gaulé, patrick, and nicolas maystre. “getting cited: does open access help?” research policy 40, no. 10 (december 2011): 1332–38. https://doi.org/10.1016/j.respol.2011.05.025.

gargouri, yassine, chawki hajjem, vincent larivière, yves gingras, les carr, tim brody, and stevan harnad. “self-selected or mandated, open access increases citation impact for higher quality research.” plos one 5, no. 10 (october 2010). https://doi.org/10.1371/journal.pone.0013636.

hajjem, chawki, stevan harnad, and yves gingras. “ten-year cross-disciplinary comparison of the growth of open access and how it increases research citation impact.” ieee data engineering bulletin 28, no. 4 (december 2005): 39–46.

hall, martin. “green or gold? open access after finch.” insights 25, no. 3 (november 2012): 235–40. https://doi.org/10.1629/2048-7754.25.3.235.
hersh, gemma, and andrew plume. “citation metrics and open access: what do we know?” elsevier connect, september 14, 2016. https://www.elsevier.com/connect/citation-metrics-and-open-access-what-do-we-know.

houghton, john, and alma swan. “planting the green seeds for a golden harvest: comments and clarifications on ‘going for gold.’” d-lib magazine 19, no. 1/2 (january/february 2013). https://doi.org/10.1045/january2013-houghton.

hua, fang, heyuan sun, tanya walsh, helen worthington, and anne-marie glenny. “open access to journal articles in dentistry: prevalence and citation.” journal of dentistry 47 (april 2016): 41–48. https://doi.org/10.1016/j.jdent.2016.02.005.

internet corporation for assigned names and numbers. “list of top-level domains.” last updated september 13, 2018. https://www.icann.org/resources/pages/tlds-2012-02-25-en.

jump, paul. “open access papers ‘gain more traffic and citations.’” times higher education, july 30, 2014. https://www.timeshighereducation.com/home/open-access-papers-gain-more-traffic-and-citations/2014850.article.

mccabe, mark j., and christopher m. snyder. “identifying the effect of open access on citations using a panel of science journals.” economic inquiry 52, no. 4 (october 2014): 1284–1300. https://doi.org/10.1111/ecin.12064.

mccabe, mark j., and christopher m. snyder. “does online availability increase citations? theory and evidence from a panel of economics and business journals.” review of economics and statistics 97, no. 1 (march 2015): 144–65. https://doi.org/10.1162/rest_a_00437.

mertens, stephan. “open access: unlimited web based literature searching.” deutsches ärzteblatt international 106, no. 43 (2009): 710–12. https://doi.org/10.3238/arztebl.2009.0710.

moed, henk. “does open access publishing increase citation or download rates?” research trends 28 (may 2012). https://www.researchtrends.com/issue28-may-2012/does-open-access-publishing-increase-citation-or-download-rates/.

niyazov, yuri, carl vogel, richard price, ben lund, david judd, adnan akil, michael mortonson, josh schwartzman, and max shron. “open access meets discoverability: citations to articles posted to academia.edu.” plos one 11, no. 2 (february 2016): e0148257. https://doi.org/10.1371/journal.pone.0148257.

ottaviani, jim. “the post-embargo open access citation advantage: it exists (probably), it’s modest (usually), and the rich get richer (of course).” plos one 11, no.
8 (august 2016): e0159614. https://doi.org/10.1371/journal.pone.0159614.

pinfield, stephen, jennifer salter, and peter a. bath. “a ‘gold-centric’ implementation of open access: hybrid journals, the ‘total cost of publication,’ and policy development in the uk and beyond.” journal of the association for information science and technology 68, no. 9 (september 2017): 2248–63. https://doi.org/10.1002/asi.23742.

piwowar, heather, jason priem, vincent larivière, juan pablo alperin, lisa matthias, bree norlander, ashley farley, jevin west, and stefanie haustein. “the state of oa: a large-scale analysis of the prevalence and impact of open access articles.” peerj (february 13, 2018): 6:e4375. https://doi.org/10.7717/peerj.4375.

research information network. “nature communications: citation analysis.” press release, 2014. https://www.nature.com/press_releases/ncomms-report2014.pdf.

riera, m., and e. aibar. “¿favorece la publicación en abierto el impacto de los artículos científicos? un estudio empírico en el ámbito de la medicina intensiva” [does open access publishing increase the impact of scientific articles? an empirical study in the field of intensive care medicine]. medicina intensiva 37, no. 4 (may 2013): 232–40. http://doi.org/10.1016/j.medin.2012.04.002.

sotudeh, hajar, zahra ghasempour, and maryam yaghtin. “the citation advantage of author-pays model: the case of springer and elsevier oa journals.” scientometrics 104 (june 2015): 581–608. https://doi.org/10.1007/s11192-015-1607-5.

swan, alma, and john houghton. “going for gold? the costs and benefits of gold open access for uk research institutions: further economic modelling.” report to the uk open access implementation group, june 2012. http://wiki.lib.sun.ac.za/images/d/d3/report-to-the-uk-open-access-implementation-group-final.pdf.

tang, min, james d. bever, and fei-hai yu. “open access increases citations of papers in ecology.” ecosphere 8, no. 7 (july 2017): 1–9. https://doi.org/10.1002/ecs2.1887.

unpaywall. “libraries user guide.” accessed september 13, 2018. https://unpaywall.org/user-guides/libraries.

wray, k. brad. “no new evidence for a citation benefit for author-pay open access publications in the social sciences and humanities.” scientometrics 106 (january 2016): 1031–35. https://doi.org/10.1007/s11192-016-1833-5.
endnotes

1 elsevier, “your guide to publishing open access with elsevier” (amsterdam, netherlands: elsevier, 2015), 2, https://www.elsevier.com/__data/assets/pdf_file/0020/181433/openaccessbooklet_may.pdf.
2 philip m. davis and william h. walters, “the impact of free access to the scientific literature: a review of recent research,” journal of the medical library association 99, no. 3 (july 2011): 213, https://doi.org/10.3163/1536-5050.99.3.008.
3 davis and walters, “the impact of free access,” 208.
4 kristin antelman, “leveraging the growth of open access in library collection decision making,” acrl 2017 proceedings: at the helm, leading the transformation, march 22–25, baltimore, maryland, ed. dawn m. mueller (chicago: association of college and research libraries, 2017): 411, 413, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/leveragingthegrowthofopenaccess.pdf.
5 research information network, “nature communications: citation analysis,” press release, 2014, https://www.nature.com/press_releases/ncomms-report2014.pdf.
6 gargouri et al., “self-selected or mandated, open access increases citation impact for higher quality research,” plos one 5, no. 10 (october 2010): 17, https://doi.org/10.1371/journal.pone.0013636.
7 david crotty, “when bad science wins, or ‘i’ll see it when i believe it’,” scholarly kitchen, august 31, 2016, https://scholarlykitchen.sspnet.org/2016/08/31/when-bad-science-wins-or-ill-see-it-when-i-believe-it/.
8 jim ottaviani, “the post-embargo open access citation advantage: it exists (probably), it’s modest (usually), and the rich get richer (of course),” plos one 11, no. 8 (august 2016): 9, https://doi.org/10.1371/journal.pone.0159614.
9 gargouri et al., “self-selected or mandated,” 18.
10 elsevier, “your guide to publishing,” 2.
11 top-level domain (tld) refers to the last string of letters in an internet domain name (i.e., the tld of www.google.com is .com). for more information on tlds, see tim fisher, “top-level domain (tld),” lifewire, july 30, 2017, https://www.lifewire.com/top-level-domain-tld-2626029. for a full list of tlds, see “list of top-level domains,” internet corporation for assigned names and numbers, last updated september 13, 2018, https://www.icann.org/resources/pages/tlds-2012-02-25-en.
12 crotty, “when bad science wins.”
13 hersh and plume, “citation metrics and open access: what do we know?,” elsevier connect, september 14, 2016, https://www.elsevier.com/connect/citation-metrics-and-open-access-what-do-we-know.
14 archambault et al., “research impact of paywalled versus open access papers,” white paper, science-metrix and 1science, 2016, http://www.1science.com/1numbr/.
15 archambault et al., “research impact.”
16 heather piwowar et al., “the state of oa: a large-scale analysis of the prevalence and impact of open access articles,” peerj, february 13, 2018, https://doi.org/10.7717/peerj.4375.
17 piwowar et al., “the state of oa,” 5.
18 david crotty, “study suggests publisher public access outpacing open access; gold oa decreases citation performance,” scholarly kitchen, october 4, 2017, https://scholarlykitchen.sspnet.org/2017/10/04/study-suggests-publisher-public-access-outpacing-open-access-gold-oa-decreases-citation-performance/.
19 archambault et al., “research impact”; piwowar et al., “the state of oa,” 15.
20 piwowar et al., “the state of oa,” 9–10.
21 archambault et al., “research impact.”
22 ottaviani, “the post-embargo open access citation advantage,” 2.
23 piwowar et al., “the state of oa,” 9.
24 hersh and plume, “citation metrics and open access.”
25 hersh and plume, “citation metrics and open access.”
26 tang et al., “open access increases citations of papers in ecology,” ecosphere 8, no. 7 (july 2017): 8, https://doi.org/10.1002/ecs2.1887.
27 tang et al., “open access increases citations,” 7. tang et al. list the following as examples of the “numerous studies” as quoted above, which i did not include in the quote for the purpose of brevity: (antelman 2004, hajjem et al. 2005, eysenbach 2006, evans and reimer 2009, calver and bradley 2010, riera and aibar 2013, clements 2017).
28 yuri niyazov et al., “open access meets discoverability: citations to articles posted to academia.edu,” plos one 11, no. 2 (february 2016): e0148257, https://doi.org/10.1371/journal.pone.0148257.
29 gargouri et al., “self-selected or mandated,” 18.
30 antelman, “leveraging the growth,” 414.
31 “libraries user guide,” unpaywall, accessed september 13, 2018, https://unpaywall.org/user-guides/libraries.
32 gargouri et al., “self-selected or mandated,” 20.
of the people, for the people: digital literature resource knowledge recommendation based on user cognition

wen lou, hui wang, and jiangen he

information technology and libraries | september 2018

wen lou (wlou@infor.ecnu.edu.cn) is an assistant professor in the faculty of economics and management, east china normal university. hui wang (1830233606@qq.com) is a graduate student in the faculty of economics and management, east china normal university. jiangen he (jiangen.he@drexel.edu) is a doctoral student in the college of computing and informatics, drexel university.

abstract

we attempt to improve user satisfaction with retrieval results and visual appearance by employing users' own information. user feedback on digital platforms has been shown to be one type of user cognition. by constructing a digital literature resource organization model based on user cognition, our proposal improves both the content and presentation of retrieval systems. this paper takes powell's city of books as an example to describe the construction process of a knowledge network. the model consists of two parts. in the unstructured-data part, synopses and reviews were recorded as representatives of user cognition; to build the resource category, linguistic and semantic analyses were used to identify the concepts and the relationships among them. in the structural-data part, the metadata of each book were linked to one another by informetrics relationships, and the semantic resource was constructed to assist with building the knowledge network. we built a mock-up to compare the new category and knowledge-recommendation system with the current retrieval system. thirty-nine subjects examined our mock-up and valued the improvements we made in retrieval and appearance. user feedback indicated that knowledge recommendation based on user cognition is a promising approach, and there are many more possible research objects for digital resource knowledge recommendation based on user cognition.

introduction

the concept of user cognition originates in cognitive psychology, which principally explores the human cognition process through information-processing methods.1 the concept characterizes a process in which a user obtains unknown information and knowledge through acquired information. as information-science workers, we may explore the psychological activities of users by analyzing their cognitive processes when they are using information services.2 a knowledge-recommendation service based on user cognition has become essential since it emphasizes facilitating collaboration between humans and computers and promotes the participation of users, which ultimately improves user satisfaction.
a knowledge-recommendation system is based on a combination of information organization, a retrieval system, and knowledge visualization.3 however, when exploring digital online literature resources, it is difficult to quickly and precisely find what we want because of problems with information organization and retrieval. most search results display only a one-by-one list view. thus, adding visualization techniques to an interface could improve user satisfaction. furthermore, the retrieval system and visualizations rely on information organization: only if information is well organized can the retrieval system and visualization be useful. therefore, we attempt to improve retrieval efficiency by proposing a digital literature resource organization model based on user cognition that improves both the content and presentation of retrieval systems. taking powell's city of books as an example, this paper proposes user feedback as first-hand user information. we will focus on (1) resource organization based on user cognition and (2) new formats for search results based on knowledge recommendations. we will purposefully employ data from users' own information and give knowledge back to users, in accordance with the phrase "of the people, for the people."

related work

user cognition and measurement

user cognition usually consists of a series of processes, including feeling, noticing, temporary memory, learning, thinking, and long-term memory.4 feeling and noticing are at an inferior level, while learning, thinking, and memory are comparatively superior. researchers have so far tried to identify user-cognition processes by analyzing user needs. there are four levels of user needs according to ma and yang5 (see figure 1). in turn, user interests normally reflect potential user needs. users who retrieve information on their own show feeling needs. users who give feedback show expression needs. users who ask questions show knowledge needs, which is the highest level.

the methods to quantify user cognition require visible and measurable variables. existing studies have commonly used website log analysis or user surveys. website log analysis has been proven to be a solid data source to record and analyze both user interests and information needs.6 user surveys, including online questionnaires and face-to-face interviews, have been widely used to comprehend user feelings and user satisfaction.7 user surveys generally measure two kinds of relationship: between users and digital services, and between users and the digital community.8 with a survey, we can make the most of statistics and assessment studies to analyze user satisfaction with an array of standards and systems of existing service platforms, service environments, service quality, and service personnel, which provides references and suggestions for future study of user-experience quality, platform elements, interaction processes, and more.9 however, neither log data nor surveys can obtain first-hand user information in real-life settings. eye tracking and the concept-map method can be used to understand user behavior in the course of user testing,10 but these approaches are difficult to scale to a large group of users. therefore, linguistic-oriented review analysis has become an increasingly important method.
user content, including reviews and tags, can be analyzed through text mining and has become a valuable data source for learning user preferences for products and services in electronic commerce and digital libraries.11 this type of data has been called "more than words."12

figure 1. understanding user cognition by analyzing user needs.

user-oriented knowledge service model

the user-oriented service model includes user demand, user cognition, and user information behavior. a service model based on user demand chiefly concentrates on the motives, habits, regularities, and purposes of user demand to identify the model of user demand so that the appropriate service is adopted.13 service models based on user cognition attach importance to the process of user cognition, the influences that users face,14 and the change in library information services under the effects of a series of cognitive processes (such as feeling, receiving, memorizing, and thinking).15 a service model based on user information behavior focuses on interactive behavior in the process of library information services that users participate in, such as interactions with academic librarians, knowledge platforms,16 and others. studies have paid more attention to the early stages of the user-oriented service model, which analyze information habits and user behaviors.17 studies have also proposed frameworks of knowledge services, design innovations,18 or personalized systems and frames of the knowledge service model, but they have not succeeded in implementing them or performing user testing.

knowledge service system construction

most studies of knowledge service system construction are in business areas. numerous studies have explored knowledge-innovation systems for product services.19 cheung et al. proposed a knowledge system to improve customer service.20 vitharana, jain, and zahedi composed a knowledge repository to enhance the knowledge-analysis skills of business consultants.21 from the angle of user demand, zhou analyzed the elements of service-platform construction and found that crucial platforms should serve knowledge service system construction.22 scholars have proposed basic models for knowledge management and knowledge sharing, but they did not simulate their applications.23 knowledge management from the library-science perspective is very different from that in the business area. library knowledge management usually refers to a digital library, especially a personal digital library.24 others explore and attempt to construct personalized knowledge service systems,25 while fewer studies base system designs on the results of documented user surveys. we rarely see a user-feedback study combined with the method of using users' own knowledge. users themselves know what they desire. if user-oriented studies separate the system design from user-needs analysis, or the other way around, the studies may miss their purpose. therefore, we propose a resource-organization method based on users' own knowledge to close the distance between the users and the system.

resource-organization model based on user cognition

there are normally two ways to construct a category system. one method gathers experts to determine categories and assign content to them; the category system comes first and the content second.
the other method is to derive a category tree from the content itself, as we propose in this paper. in this way, the content takes priority over the categorization system. in this paper, we focus on this second way to organize resources and index content. resource organization requires a series of steps, including information processing, extraction, and organization. figure 2 shows the resource-organization model based on user cognition. this model fits digital resources that carry comments and reviews.

the model has two interrelated parts: one for indexing the content and the other for knowledge recommendations. for the first part, the model integrates all the comments and reviews of all literature in an area or in the whole resource. the core concepts and the relationships among them are extracted through natural language processing. the relationships between concepts are either subordination or correlation. a triple consists of two core concepts and their relationship, and the triple set includes all triples. next, all books are indexed by taxonomy in the new category system. however, the indexing of each book is not based on the traditional method, which is to manually determine each category by reading the literature; we use a method based on the books' content. while extracting the core concepts from all books, we also extract the core concepts from every individual book by the same semantic-analysis methods and build up triples for that book. the triples of the book are then matched against the triple set of the new category system. once a triple in a single book yields a maximum matching value, the core concepts in the triple set are indexed as the keywords of the book. a few examples of the matching process are discussed in the empirical study (in the section "indexing books").

the first part works on comments and reviews, which are unstructured data. the second part makes use of structural data in the bibliography to build a semantic network. structural data, including titles, keywords, authors, and publishers, are stored separately. we calculate the informetrics relationships among the entities. the relationships can be among different entities, such as between one author and another or between an author and a publisher. two entities and their relationship then compose a triple. the components in triples are linked to each other, which makes them semantic resources. furthermore, the keywords in the structural data are not the original keywords from before the new category system but the modified keywords. finally, the re-indexed resources (books in the new category) and semantic resources (the triples from structural data) are both used to build the knowledge network.

figure 2. resource-organization model based on user cognition.

why is it important to use both unstructured data and structural data? the reason is to capture the entire content of a literature resource. neither kind can fully represent the whole semantics of a literature resource: structural data lacks subjective content, and unstructured data lacks basic information. thus, a full semantic network can be built using both kinds of data.
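to make the matching step concrete, here is a minimal sketch of the triple-matching idea in python. the data structures and the helper index_book are hypothetical illustrations under our reading of the model, not the authors' implementation; the paper does not publish code.

```python
from collections import Counter
from typing import NamedTuple

class Triple(NamedTuple):
    # two core concepts and the relationship extracted between them
    concept_a: str
    relation: str          # e.g., "direct part-whole" or "related"
    concept_b: str

def index_book(book_triples, category_triples, min_matches=1):
    """assign a book to the parent classes whose triple sets overlap
    most with the book's own triples (hypothetical helper)."""
    matches = Counter()
    for parent, triples in category_triples.items():
        matches[parent] = len(set(book_triples) & triples)
    return [parent for parent, n in matches.most_common() if n >= min_matches]

# toy usage echoing the paper's later example: most triples of the book
# match the parent class "ingredient", so it is indexed there first.
categories = {
    "ingredient": {Triple("grain", "direct part-whole", "ingredient"),
                   Triple("nut", "direct part-whole", "ingredient")},
    "natural": {Triple("vegetarian", "related", "health")},
}
book = [Triple("grain", "direct part-whole", "ingredient"),
        Triple("nut", "direct part-whole", "ingredient"),
        Triple("vegetarian", "related", "health")]
print(index_book(book, categories))   # ['ingredient', 'natural']
```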
resource-organization experiment

object selection

located in portland, oregon, powell's city of books (hereafter referred to as "book city") is one of the largest bookstores in the united states, with 200 million books in its inventory. book city caught our eye for four reasons. (1) the comments and reviews of books on book city's website are well constructed and plentiful. the national geographic channel named it one of the ten best bookstores in the world;26 atlantis books, pendulo, and munro's books are also on the list, but among these bookstores only book city and munro's books have indexed the information in comments and reviews. since user reviews are fundamental to this study, we restricted ourselves to bookstores that provide user reviews. (2) we excluded libraries because literature resources are already well organized in libraries, so it might not be necessary to reorganize them according to user cognition; this topic could be addressed in a future study. (3) book city is a typical online bookstore that also has a physical bookstore. unlike amazon, book city, indigo, barnes & noble, and munro's books have physical bookstores, but they all have technological limitations in retrieval-system and taxonomy construction compared to amazon; it is therefore worthwhile to investigate these bookstores' online systems and optimize them. (4) the location was geographically convenient for the researchers. the authors are more familiar with book city than with other bookstores, and we planned to conduct face-to-face interviews for the user study, which is doable only if the authors can reach the bookstore and the users who live there. in all, we chose book city as a representative object.

data collection and processing

on december 22, 2015, we randomly selected the field "cooking and food" and downloaded bibliographic data for 462 new and old books, including title, picture, synopsis and review, isbn, publication date, author, and keywords. in our previous work we described how metadata for all kinds of literature can be categorized into one of three types: structural data, semistructural data, and unstructured data27 (see table 1). title, isbn, date, publisher, and author are classified as structural data. titles can be seen as structural or unstructured data depending on the need; in this paper, titles are treated as indivisible entities so that they retain their original meanings. keywords are considered semistructural data for two reasons: (1) normally one book is indexed with multiple keywords, which are natural language; and (2) keywords are separated by punctuation, and each keyword can exist individually with its own meaning. however, in the current category system, the keywords are the names of categories and subcategories. since we are about to reorganize the category system, the current keywords are not included in the following steps. we use the field "synopsis and review" in the downloaded bibliographic records as the source of user cognition. synopses and reviews are classified as unstructured data. all synopses and reviews of a single book are first merged into one paragraph, since some books have more than one review. structural data are stored for constructing the knowledge network. unstructured data are part-of-speech tagged and word segmented with the stanford segmenter. all of the books' metadata are stored in the three defined data types in separate fields, each linked by the isbn as the primary key.
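a minimal sketch of one possible storage layout for the three data types, keyed by isbn, follows. the table and column names are assumptions for illustration; the paper does not specify its storage technology.

```python
import sqlite3

# hypothetical schema: one table per data type, linked by isbn.
conn = sqlite3.connect("book_city.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS structural (          -- structural data
    isbn      TEXT PRIMARY KEY,
    title     TEXT,                              -- kept as an indivisible entity
    pub_date  TEXT,
    publisher TEXT,
    author    TEXT
);
CREATE TABLE IF NOT EXISTS keywords (            -- semistructural data
    isbn    TEXT REFERENCES structural(isbn),
    keyword TEXT                                 -- one row per keyword
);
CREATE TABLE IF NOT EXISTS reviews (             -- unstructured data
    isbn TEXT REFERENCES structural(isbn),
    text TEXT                                    -- all synopses/reviews merged into one paragraph
);
""")
conn.commit()
```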
category organization

first, the frequencies of words in all books are calculated after word segmentation so that core concepts can be identified by word frequency. in total, 29,370 distinct words appeared 43,675 times after excluding stop words. the 206 words in the sample that occurred more than 105 times appeared 34,944 times; this subset was defined as the core words according to the pareto principle.

table 1. data sample.

field                 content                                                      data type
title                 a modern way to eat: 200+ satisfying vegetarian recipes     structural data
isbn                  9781607748038                                                structural data
date                  04/21/2015                                                   structural data
publisher             ten speed press                                              structural data
author                anna jones                                                   structural data
kwds                  cooking and food-vegetarian and natural                      semistructural data
synopsis and review   a beautifully photographed and modern vegetarian cookbook    unstructured data
                      packed with quick, healthy, and fresh recipes that explore
                      the full breadth of vegetarian ingredients—grains, nuts,
                      seeds, and seasonal vegetables—from jamie oliver's
                      london-based food stylist and writer anna jones. how we
                      want to eat is changing. more and more people cook without
                      meat several nights a week and are constantly seeking to . . .

we are inspired by zhang et al., who described a linguistic keyword-extraction method that defines multiple kinds of relationships among words.28 the relationships include the direct relationship, indirect relationship, part-whole relationship, and related relationship.

• direct relationship. two core words are related directly to each other.
• indirect relationship. two core words are related but linked through another word as a medium.
• part-whole relationship. the "is a" relation: one core word belongs to the other. it is the most common relationship in context.
• related relationship. two core words have no direct or indirect relationship, but both appear in a larger context.

the first two relationships can be combined with the second two; for instance, a part-whole relationship can be either direct or indirect. for this study, we combined every two core words into pairs for analysis. for example, the sentence "a picnic is a great escape from our day-to-day and a chance to turn a meal into something more festive and memorable" yields several core-word pairs, including "picnic" and "meal," "picnic" and "festive," and "meal" and "festive." for "picnic" and "meal," there is an obvious part-whole relationship in this context. we observed all of their co-occurrences across all books and determined their overall relationship to be a direct part-whole relationship, because 67 percent of their co-occurrences were part-whole relationships, 80 percent were direct relationships, and the rest were related relationships. this is the case when two core words are in the same sentence. for two words in different sentences but within one context, we define the words' relationship as a sentence relationship. for example, "ingredient" and "meat" in one review in table 1 have an indirect relationship because they are connected by other core concepts between them; therefore, the relationship between "ingredient" and "meat" is an indirect part-whole one in this context. for other cases, two concepts are either related, if they appear in the same context, or not related, if they do not appear in the same review. thus, all pairs of concepts are calculated and stored as semantic triples.
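a minimal sketch of the frequency cutoff and the majority vote over observed pair relationships, assuming tokenized review text; the function names and the voting rule are illustrative assumptions based on our reading of the examples above, not the authors' code.

```python
from collections import Counter
from itertools import combinations

def core_words(token_lists, min_count=105):
    """keep words frequent enough to count as core concepts
    (the paper's cutoff of >105 occurrences yielded 206 words)."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    return {w for w, n in counts.items() if n > min_count}

def core_pairs(sentence_tokens, core):
    """candidate pairs: every two core words in the same sentence."""
    present = sorted(set(w for w in sentence_tokens if w in core))
    return list(combinations(present, 2))

def overall_relationship(observations):
    """observations: (kind, form) labels collected per context, e.g.,
    ('part-whole', 'direct'); the dominant labels win, mirroring the
    'picnic'/'meal' example (67% part-whole, 80% direct)."""
    kinds = Counter(kind for kind, _ in observations)
    forms = Counter(form for _, form in observations)
    return forms.most_common(1)[0][0] + " " + kinds.most_common(1)[0][0]

obs = [("part-whole", "direct")] * 8 + [("related", "direct")] * 2
print(overall_relationship(obs))   # "direct part-whole"
```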
figure 3. parts of a modified category in "cooking and food" based on user cognition.

the next step is to build up a category tree (figure 3). a direct part-whole relationship becomes the relationship between a parent class and a child class; an indirect part-whole relationship becomes the relationship between a parent class and a grandchild class; and a related relationship becomes the relationship between sibling classes.

compared to the modified category system (figure 3), the current hierarchical category system (figure 4) has two major issues. first, some category names are duplicated: for example, the child class "by ingredient" contains "fruit," "fruits and vegetables," and "fruits, vegetables, and nuts." second, there are categories without semantic meaning, such as "oversized books." these two problems produce disorderly indexing and recall many irrelevant results. for example, if you type a single word in the search box, the system first asks you to refine your search, but the refinement options confuse parent and child classes: searching for "diet" books, the system suggests refining from five subcategories of "diet and nutrition" under three different parent classes. the modified category system avoids duplicated keywords, and its hierarchy, based on users' comments, retains semantic meaning.

figure 4. parts of the current category system in "cooking and food."

indexing books

we found that the list of keywords was confusing due to the inefficiency of the previous category system, so it is necessary to re-index the keywords of each book based on the modified category system. we follow a data-oriented indexing process. the method to detect the core concepts of each book is the same as that used for all books in the category-organization step. taking the book a modern way to eat as an example, triples extracted from the book include "grain-direct part whole-ingredient," "nut-direct part whole-ingredient," "vegetarian-related-health," and so on. using all triples of the book to match against the triple set from all books, we index the book to categories by the best-matching parent class. in this case, five of the nine triples of a modern way to eat match the parent class "ingredient," another two match "natural" and "technique," and the other two cannot be matched with the triple set. a modern way to eat is therefore indexed with "cooking and food-ingredient," "cooking and food-natural," and "cooking and food-technique."

semantic-resource construction

the semantic resource is constructed from the structural data prepared at the beginning. the informetrics method (specifically, co-word analysis) is used to extract the precise relationships within the bibliographic data, as we previously proposed.29 we combine all structural data and compute co-word matrices between each title, publisher, date, author, and keyword. for example, the author "anna jones" co-occurred with many keywords to varying degrees: with the keyword "natural" four times and with "person" seven times. according to qiu and lou, these precise relationships need to be divided by thresholds and formatted as literal words.30 therefore, among the degrees of all relationships between "anna jones" and other keywords, the relationship between "anna jones" and "natural" is highly correlated, and the relationship between "anna jones" and "person" is extremely correlated. triples are composed of two concepts and their relationship, and a semantic resource that can be used for knowledge retrieval is finally constructed.
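a minimal sketch of the co-word counting and threshold labeling, assuming bibliographic records with author and keyword fields; the numeric cutoffs are illustrative assumptions chosen to reproduce the "anna jones" example, not the authors' published thresholds.

```python
from collections import Counter

def coword_counts(records):
    """records: dicts like {'author': 'anna jones',
    'keywords': ['natural', 'person']}; counts author-keyword pairs
    across the whole bibliography."""
    counts = Counter()
    for rec in records:
        for kw in rec["keywords"]:
            counts[(rec["author"], kw)] += 1
    return counts

def label(degree):
    """format a co-occurrence degree as a literal relationship word."""
    if degree >= 7:
        return "extremely correlated"   # e.g., anna jones / person (7)
    if degree >= 4:
        return "highly correlated"      # e.g., anna jones / natural (4)
    if degree >= 1:
        return "related"
    return "unrelated"

print(label(7), "|", label(4))
# triples like ("anna jones", label(n), keyword) feed the semantic resource
```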
figure 5. an example of the knowledge network.

once the semantic resource is ready, the knowledge network is presentable. we adopted d3.js to display the knowledge network (figure 5). the net view automatically exhibits several books related to an author, william davis, who is placed in a conspicuous position on the screen. the force-directed map re-forms when users drag any book with the mouse, making that book the noticeable center of the other books. the network can connect with the database and the website.
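as a minimal sketch, the triples could be serialized into the nodes/links json shape that a d3.js force layout typically consumes; the field names and file name below are assumptions for illustration, not taken from the paper.

```python
import json

def to_d3_graph(triples):
    """triples: iterable of (concept_a, relation, concept_b) tuples."""
    nodes, links = {}, []
    for a, rel, b in triples:
        nodes.setdefault(a, {"id": a})
        nodes.setdefault(b, {"id": b})
        links.append({"source": a, "target": b, "relation": rel})
    return {"nodes": list(nodes.values()), "links": links}

graph = to_d3_graph([
    ("william davis", "author of", "wheat belly"),
    ("wheat belly", "related", "good to the grain"),
])
with open("network.json", "w") as f:
    json.dump(graph, f, indent=2)   # read by the d3.js front end
```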
user-experience study on knowledge display and recommendation

there are two common ways to evaluate a retrieval system. one is to test statistical results, such as recall and precision; the other is a user study. since our aim is "of the people, for the people," we chose to conduct two user-experience studies rather than a statistical evaluation. in this way, we can learn what users suggest and how they comment on our approach.

user-experience study design

in february 2016, with the help of friends, we recruited volunteers by posting fliers in portland, oregon. fifty volunteers contacted us, and thirty-nine responses were received by the end of march 2016 because the other eleven volunteers were not able to enroll in the electronic test. since we needed to test the feasibility of both the new indexing category and the knowledge recommendation, we set up the user study in two parts: a comparison of simple retrieval and the knowledge recommendation. first, we requested permission to use the data source and website frame from book city. however, we could not construct a new website for book city due to intellectual-property issues; therefore, we constructed a practical mock-up to guide users through a simulated retrieval experiment. following user-experience design procedure, we chose mockingbot (https://mockingbot.com) as the mock-up builder. mockingbot allows demo users to experience a vivid system that would be developed later, and the mock-up supports tags that link to other pages, so subjects could click on the mock-up just as they would on a real website. the demo was expected to help us (1) examine whether our changes would meet users' satisfaction and (2) gather information for a better design. we then performed face-to-face, user-guided interviews, first gathering experience with the previous retrieval system and then comparing it with our results. we concurrently recorded the answers and scores of users' feedback. in the following sections, we describe the interview process and present the feedback results.

study 1: comparison of simple retrieval

first, subjects were asked to search for books written by "michael pollan" at powells.com (figure 6); all subjects used the search box based on their instincts. then they were asked to find a new hardcover copy of a book named cooked: a natural history of transformation. we paid attention to the ways that subjects located the target. only five of them used keyboard shortcuts to find the target, and thirteen subjects stated concerns about the absence of refinement options. furthermore, we noticed that six subjects swept (moused over) the refinement area and then decided to continue eye screening. in the meantime, we recorded the time they spent looking for the item. after they found the target, all subjects gave us a score from one to ten representing their satisfaction with the current retrieval system.

figure 6. screenshot of retrieval results in the current system.

in the comparison experiment, we placed our mock-up in front of the subjects and conducted the same test as above. in the mock-up, we used the basic frame of the retrieval system but reframed the refinement area. in the new refinement area (figure 7), we added an optional box with refinement keywords in the left column to narrow the search scope. the logic of the refined keywords comes from the indexing category, as mentioned in the section on indexing books. "michael pollan" was indexed in six categories: "biographies," "children's books," "cooking and food," "engineering manufactures," "hobby and leisure," and "gardening." thus, when subjects clicked the "cooking and food" category, they could refine the results to only twelve books rather than the seventy books in the current system, obtaining accurate retrieval results faster. after the subjects completed their tasks, they gave us a score from one to ten representing their satisfaction with the modified retrieval system.

figure 7. refinement results in the modified category-system mock-up.

study 2: knowledge recommendation

in this experiment, we conducted two tests of two knowledge-visualization functions: one tested preferences for the net view, and the other tested preferences for the individual recommendation. for the net view, we guided subjects to search for "william davis" in the mock-up and reminded them to click the net-view button after the system returned a list view. the subjects could then see the net-view results in figure 5. we recorded the scores that they gave for the net view. for the recommendation on individual books, we adopted multiple layers of associated retrieval results for every book: users could click on one book, and another related book would show in a new tab window. we asked subjects to conduct a new search for "william davis," after which they could browse the website and freely click on any book. once they clicked on davis's book wheat belly: lose the wheat, lose the weight, and find your path back to health, the first recommendation results popped up (figure 8). recommendation results about wheat in the field of "grain and bread" appeared, including good to the grain: baking with whole grain flours and bread bakers apprentice: mastering the art of extraordinary bread, along with others about health and losing weight, such as paleo lunches and breakfasts on the go. all of these related books appeared because the first book is about both wheat and a healthy diet. a new window showing relevant authors and titles would pop up if the mouse glided over any picture. we asked the subjects about their thoughts on the new recommendation format and recorded the scores.

figure 8. an example of knowledge recommendation.
users' feedback

knowledge organization and retrieval received a positive response (tables 2 and 3). first, subjects complained about the inefficiency of the current retrieval system: it took a long time to find one book without using shortcut keys (ctrl-f). three-quarters of them were not satisfied with the original search style because of the search time. in contrast, 67 percent of the subjects gave a score of eight points or more for the refined search results of our new system; only two thought it was useless, and these were the two users who took fewer than ten seconds to find the exact result. second, 67 percent and 74 percent of the subjects, respectively, thought the knowledge recommendation and the net view were useful, giving them six points or more. however, five subjects gave scores of one point because they maintained that it was not necessary to build a new viewer system.

table 2. the time to find the exact result in the current system.

answers                  # of users
fewer than 10 seconds    2
10 to 30 seconds         4
30 seconds to 1 minute   12
more than 1 minute       21

table 3. statistics of quantitative questions in the questionnaire.

score                                    10   9   8   7   6   5   4   3   2   1   total
satisfied with original results           0   0   0   0   1   9  14   9   4   2   39
preference of refined results             2  10  14   6   5   0   0   0   0   0   37
preference of results in net view         1   8  10   6   4   1   2   3   1   3   39
preference of knowledge recommendation    3   6   4   8   5   6   0   3   1   2   38

during the interviews, subjects who gave scores of more than eight points spoke positively about the vivid visualization of the retrieval results, using words such as "innovative" and "creative." for instance, user 11 said, "bravo changes for powell, that'd be the most innovative experience for the locals." among the subjects who gave scores of more than six points, the comments were mostly "interesting idea." for instance, user 17 commented, "this is an interesting idea to explore my knowledge. i had no idea powell could do such an improvement." some users offered suggestions to improve the system. for example, user 12 suggested that the test was not comprehensive enough to confidently assess whether the modified category system was better than the previous one. user 25 (possibly a professional) was very concerned about recall efficiency, since the system might use many matching algorithms.

discussion and conclusion

in this paper, a digital literature resource organization model based on user cognition is proposed. this model aims to let users exert their subjective initiative. we noticed a significant difference between the previous category system and the new system based on user cognition; our aim, "of the people, for the people," was fulfilled. taking powell's city of books as an example, we described how to construct a knowledge network based on user cognition. the user-experience study showed that this network provides an optimized presentation of digital-resource knowledge recommendation and knowledge retrieval. although user cognition includes many other processes of user behavior, we used only literal expression; it proved to be a feasible and effective way to reveal users' cognition. we also find that there is much more room for the construction of digital-resource knowledge recommendation based on user cognition.
for one, this paper took only the familiar book city as its study object and books as its experimental objects, and it found favorable effects, which indicates that digital-resource knowledge linking can be applied to physical libraries and bookstores or to other types of literature. even though libraries have well-developed taxonomy systems, those systems can be compared with or combined with new ideas. for another, users appreciate visual effects and user-facing functions, and the results show promise for actualizing improvements to book city's website or even other digital platforms. the remaining concerns, to be addressed in the next study, are how to optimize the retrieval algorithm and reduce time costs.

acknowledgements

we thank carolyn mckay and powell's city of books for their great help with the questionnaire networking, and all participants for their feedback. this work was supported by the national social science foundation of china [grant number 17ctq025].

references and notes

1 peter carruthers, stephen stich, and michael siegal, the cognitive basis of science (cambridge: cambridge university press, 2002).

2 sophie monchaux et al., “query strategies during information searching: effects of prior domain knowledge and complexity of the information problems to be solved,” information processing and management 51, no. 5 (2015): 557–69, https://doi.org/10.1016/j.ipm.2015.05.004.

3 hoill jung and kyungyong chung, “knowledge-based dietary nutrition recommendation for obese management,” information technology and management 17, no. 1 (2016): 29–42, https://doi.org/10.1007/s10799-015-0218-4.

4 dandan ma, liren gan, and yonghua cen, “research on influence of individual cognitive preferences upon their acceptance for knowledge classification recommendation service,” journal of the china society for scientific and technical information 33, no. 7 (2014): 712–29.

5 haiqun ma and zhihe yang, “study on the cognitive model of information searchers from the perspective of neuro-language programming,” journal of library science in china 37, no. 3 (2011): 38–47.

6 paul gooding, “exploring the information behaviour of users of welsh newspapers online through web log analysis,” journal of documentation 72, no. 2 (2016): 232–46, https://doi.org/10.1108/jd-10-2014-0149.

7 munmun de choudhury and scott counts, “identifying relevant social media content: leveraging information diversity and user cognition,” in ht ’11: proceedings of the 22nd acm conference on hypertext and hypermedia (new york: acm, 2011), 161–70, https://doi.org/10.1145/1995966.1995990; carol tenopir et al., “academic users’ interactions with sciencedirect in search tasks: affective and cognitive behaviors,” information processing and management 44, no. 1 (2008): 105–21, https://doi.org/10.1016/j.ipm.2006.10.007.

8 young han bae, jong woo jun, and michelle hough, “uses and gratifications of digital signage and relationships with user interface,” journal of international consumer marketing 28, no. 5 (2016): 323–31, https://doi.org/10.1080/08961530.2016.1189372.
9 claude sicotte et al., “analysing user satisfaction with the system in use prior to the implementation of a new electronic inpatient record,” in proceedings of the 12th world congress on health (medical) informatics: building sustainable health systems (amsterdam: ios press, 2007), 1779–84; zhenzheng qian et al., “satiindicator: leveraging user reviews to evaluate user satisfaction of sourceforge projects,” in proceedings—international computer software and applications conference 1 (2016): 93–102, https://doi.org/10.1109/compsac.2016.183.

10 christina merten and cristina conati, “eye-tracking to model and adapt to user meta-cognition in intelligent learning environments,” in proceedings of the 11th international conference on intelligent user interfaces—iui ’06 (new york: acm, 2006), 39–46, https://doi.org/10.1145/1111449.1111465; weidong zhao, ran wu, and haitao liu, “paper recommendation based on the knowledge gap between a researcher’s background knowledge and research target,” information processing & management 52, no. 5 (2016): 976–88, https://doi.org/10.1016/j.ipm.2016.04.004.

11 haoran xie et al., “incorporating sentiment into tag-based user profiles and resource profiles for personalized search in folksonomy,” information processing and management 52, no. 1 (2016): 61–72, https://doi.org/10.1016/j.ipm.2015.03.001; francisco villarroel ordenes et al., “analyzing customer experience feedback using text mining: a linguistics-based approach,” journal of service research 17, no. 3 (2014): 278–95, https://doi.org/10.1177/1094670514524625; yujong hwang and jaeseok jeong, “electronic commerce and online consumer behavior research: a literature review,” information development 32, no. 3 (2016): 377–88, https://doi.org/10.1177/0266666914551071.

12 stephan ludwig et al., “more than words: the influence of affective content and linguistic style matches in online reviews on conversion rates,” journal of marketing 77, no. 1 (2012): 1–52, https://doi.org/10.1509/jm.11.0560.

13 jun yang and yinglong wang, “a new framework based on cognitive psychology for knowledge discovery,” journal of software 8, no. 1 (2013): 47–54.

14 alan baddeley, “on applying cognitive psychology,” british journal of psychology 104, no. 4 (2013): 443–56, https://doi.org/10.1111/bjop.12049.

15 aidan moran, “cognitive psychology in sport: progress and prospects,” psychology of sport and exercise 10, no. 4 (2009): 420–26, https://doi.org/10.1016/j.psychsport.2009.02.010.

16 john van de pas, “a framework for public information services in the twenty-first century,” new library world 114, no. 1/2 (2013): 67–79, https://doi.org/10.1108/03074801311291974.

17 enrique frias-martinez, sherry y. chen, and xiaohui liu, “evaluation of a personalized digital library based on cognitive styles: adaptivity vs. adaptability,” international journal of information management 29, no. 1 (2009): 48–56, https://doi.org/10.1016/j.ijinfomgt.2008.01.012.
18 shing lee chung et al., “an integrated framework for managing knowledge-intensive service innovation,” international journal of services technology and management 13, no. 1/2 (2010): 20, https://doi.org/10.1504/ijstm.2010.029669.

19 koteshwar chirumalla, “managing knowledge for product-service system innovation: the role of web 2.0 technologies,” research-technology management 56, no. 2 (2013): 45–53, https://doi.org/10.5437/08956308x5602045; koteshwar chirumalla et al., “knowledge-sharing network for product-service system development: is it a typical?,” in international conference on industrial product-service systems (2013): 109–14; fumiya akasaka et al., “development of a knowledge-based design support system for product-service systems,” computers in industry 63, no. 4 (2012): 309–18, https://doi.org/10.1016/j.compind.2012.02.009.

20 c. f. cheung et al., “a multi-perspective knowledge-based system for customer service management,” expert systems with applications 24, no. 4 (2003): 457–70, https://doi.org/10.1016/s0957-4174(02)00193-8.

21 padmal vitharana, hemant jain, and fatemeh zahedi, “a knowledge based component/service repository to enhance analysts’ domain knowledge for requirements analysis,” information and management 49, no. 1 (2012): 24–35, https://doi.org/10.1016/j.im.2011.12.004.

22 baihai zhou, “the construction of library interdisciplinary knowledge sharing service system,” in 2014 11th international conference on service systems and service management (icsssm), june 25–27, 2014, https://doi.org/10.1109/icsssm.2014.6874033.

23 rusli abdullah, zeti darleena eri, and amir mohamed talib, “a model of knowledge management system for facilitating knowledge as a service (kaas) in cloud computing environment,” 2011 international conference on research and innovation in information systems, november 23–24, 2011, 1–4, https://doi.org/10.1109/icriis.2011.6125691.

24 alan smeaton and jamie callan, “personalisation and recommender systems in digital libraries,” international journal on digital libraries 5, no. 4 (2005): 299–308, https://doi.org/10.1007/s00799-004-0100-1.

25 yanwen wu et al., “research on personalized knowledge service system in community e-learning,” lecture notes in computer science (berlin: springer, 2006), https://doi.org/10.1007/11736639_17; shu-chen kao and chienhsing wu, “pikipdl: a personalized information and knowledge integration platform for dl service,” library hi tech 30, no. 3 (2012): 490–512, https://doi.org/10.1108/07378831211266627.

26 national geographic, destinations of a lifetime: 225 of the world’s most amazing places (washington, d.c.: national geographic society, 2016).
27 wen lou and junping qiu, “semantic information retrieval research based on co-occurrence analysis,” online information review 38, no. 1 (january 8, 2014): 4–23, https://doi.org/10.1108/oir-11-2012-0203; junping qiu and wen lou, “constructing an information science resource ontology based on the chinese social science citation index,” aslib journal of information management 66, no. 2 (march 10, 2014): 202–18, https://doi.org/10.1108/ajim-10-2013-0114; fan yu, junping qiu, and wen lou, “library resources semantization based on resource ontology,” electronic library 32, no. 3 (2014): 322–40, https://doi.org/10.1108/el-05-2012-0056.

28 lei zhang et al., “extracting and ranking product features in opinion documents,” in international conference on computational linguistics (2010): 1462–70.

29 lou and qiu, “semantic information retrieval research,” 4; qiu and lou, “constructing an information science resource ontology,” 202; yu, qiu, and lou, “library resources semantization,” 322.

30 qiu and lou, “constructing an information science resource ontology,” 202.

automated storage & retrieval system: from storage to service

justin kovalcik and mike villalobos

information technology and libraries | december 2019

justin kovalcik (jdkovalcik@gmail.com) is director of library information technology, csun oviatt library. mike villalobos (mike.villalobos@csun.edu) is guest services supervisor, csun oviatt library.

abstract

the california state university, northridge (csun) oviatt library was the first library in the world to integrate an automated storage and retrieval system (as/rs) into its operations. the as/rs continues to provide efficient space management for the library, but added value has been identified in materials security and inventory as well as customer service. the concept of library as space, paired with improved services and efficiencies, has resulted in the as/rs becoming a critical component of library operations and future strategy. staffing, service, and security opportunities, paired with support and maintenance challenges, enable the library to offer a unique critique and assessment of an as/rs.
introduction

“space is a premium” is a phrase not unique to libraries; however, because of the inclusive and open environment libraries promote, their floor space is especially attractive to those within and outside the building’s traditional walls. in many libraries, the majority of floor space is used to house the collection. in the past, as collections grew, floor space became increasingly limited. faced with expanding expectations and demands, libraries struggled to balance transforming space for new services with adding materials to a growing collection. in addition to management activities like weeding, solutions such as offsite storage and compact shelving rose in popularity as methods to create library space in the absence of new building construction. years later, as collections move away from print and physical materials, libraries are beginning to reexamine their buildings’ space and envision new features and services. “now that so many library holdings are accessible digitally, academic libraries have the opportunity to make use of their physical space in new and innovative ways.”1

the csun oviatt library took a novel approach and launched the world’s first automated storage and retrieval system (as/rs) in 1991 as a storage solution to resolve its building space limitations. the project was a california state university (csu) system chancellor’s office initiative that began in 1989 and cost more than $2 million to implement. the original concept “came from the warehousing industry, where it had been used by business enterprises for years.”2 by storing physical materials in the as/rs, the csun oviatt library is able to create space within the library for new activities and services. “instead of simply storing information materials, the library space can and should evolve to meet current academic needs by transforming into an environment that encourages collaborative work.”3

unfortunately, as the first stewards of an as/rs, csun made decisions that led to mismanagement and neglect, leaving the as/rs facing many challenges in becoming a stable and reliable component of the library. however, recent efforts to resolve these issues have resulted in updates to the system, its management, and its functionality. whereas in the past low-use materials were placed in the as/rs to create space for new materials, now materials are moved into the as/rs to create space for patrons, secure collections, and improve customer service. as part of this critical review, the functionality and maintenance of the as/rs are examined along with its historical and current management.

background

csun is the second-largest member of the twenty-three-campus csu system. the diverse university community includes over 38,000 students and more than 4,000 employees.4 consisting of nine colleges offering 60 baccalaureate degrees, 41 master’s degrees, 28 credentials in education, and various extended learning and special programs, csun provides a diverse community with numerous opportunities for scholarly success.5

the csun oviatt library’s as/rs is an imposing and impressive area of the library that routinely attracts onlookers and has become part of the campus tour.
the as/rs is housed in the library’s east wing and occupies an area 8,000 square feet and 40 feet high, arranged into six aisles. the 13,260 steel bins, each 2 feet by 4 feet, in heights of 6, 10, 12, 15, and 18 inches, are stored on both sides of the aisles, enabling the as/rs to store an estimated 1.2 million items.6 each aisle has a storage retrieval machine (srm) that performs automatic, semiautomatic, and manual “picks” and “deposits” of the bins.7

the as/rs was assessed in 2014 as responsibilities, support, and expectations of the system shifted and previous configurations were no longer viable. discontinued and failing equipment, unsupported server software, inconsistent training and use, and decreased local support and management were identified as impediments to greater involvement in library projects and operations. campus provided funding in 2015 to update the server software as well as major hardware components on three of the six aisles. divided into two phases, the server software upgrade was completed in may 2017, followed by the hardware upgrade in january 2019.8

literature review

the continued growth of student, faculty, and academic programs, along with evolving expectations and needs since the late 1980s, has required the library to analyze its services and examine the building’s physical space and storage capacity. in the late 1980s, identifying space for a growing print collection was the main factor in implementing the as/rs; in the mid-2010s, creating space within the library for new services depended on a stable and reliable as/rs. “the conventional way of solving the space problem by adding new buildings and off-site storage facilities was untenable.”9 a benefit of an as/rs, as creaghe and davis predicted in 1986, was that “[with] the probable slow transition from books to electronic media, an aaf [automated access facility] may postpone the need for future library construction indefinitely.”10

the as/rs has enabled the library to create space by removing physical materials while enhancing customer service, material security, and inventory control. “the role of the library as service has been evolving in lockstep with user needs. the current transformative process that takes place in academia has a powerful impact on at least two functional areas of the library: library as space and library as collection.”11 in addition, the “increased security the aaf . . . offers will save patrons time that would be spent looking for books on the open shelves that may be in use in the library, on the waiting shelves, misplaced, or missing.”12

in subsequent years, library services have evolved to include computer labs with multiple high-use printers/scanners/copiers, instructional spaces, individual and group study spaces, makerspaces, etc., in addition to campus entities that require large amounts of physical space within the library. “it is well-known that academic libraries have storage problems. traditional remedies for this situation—used in libraries across the nation—include off-site storage for less used volumes, as well as, more recently, innovative compact shelving. these solutions help, but each has its disadvantages, and both are far from ideal. . . .
when the eastern michigan university library had the opportunity to move into a new building, we saw that an as/rs system would enable us to gain open space for activities such as computer labs, training rooms, a cafe, meeting rooms, and seating for students studying.”13 the as/rs provides all the space advantages of off-site storage and compact shelving while adding much more value, mitigating the time delays of off-site storage and the confusion of accessing and using compact shelving.

staffing & usage

1991–1994

following the 80/20 principle, low-use items were initially selected for storage in the as/rs. “when the storage policy was being developed in [the] 1990s, the 80/20 principle was firmly espoused by librarians. . . . thus, by moving lower-use materials to as/rs, the library could still ensure that more than 80% of the use of the materials occurs on volumes available in the open stacks.”14 low-use items were identified if one of the following three conditions was met: (1) the item’s last circulation date was more than five years ago; (2) the item was a non-circulating periodical; or (3) the item was not designed to leave an area and received little patron usage, such as the reference collection. a minimal version of this selection rule is sketched below.
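this is a hypothetical sketch of that selection rule, assuming item records carry a last-circulation date and a few flags; the field and function names are illustrative assumptions, not the library's actual software.

```python
from datetime import date, timedelta

FIVE_YEARS = timedelta(days=5 * 365)

def is_low_use(item, today=None):
    """apply the 1991 storage policy: last circulation more than five
    years ago, a non-circulating periodical, or low-use reference stock."""
    today = today or date.today()
    last = item.get("last_circulated")
    if last and today - last > FIVE_YEARS:
        return True
    if item.get("non_circulating_periodical"):
        return True
    return bool(item.get("low_use_reference"))

# e.g., an item last charged out in 1984 qualifies for as/rs storage
print(is_low_use({"last_circulated": date(1984, 6, 1)},
                 today=date(1991, 1, 1)))   # True
```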
the circulation members also received additional training for first-tier troubleshooting of as/rs operations such as bin alignments, emergency stops, and inventory audits. the as/rs repair technician remained in the systems department; however, as/rs troubleshooting responsibility was shared among the systems support specialists, and dedicated as/rs support was lost. the administrative tasks of scheduling preventive maintenance services (pms), resolving as/rs hardware/equipment issues with the vendor, and maintaining the server software remained with the head of the systems department. without a dedicated department providing oversight for the as/rs, issues and problems began to occur frequently. circulation had neither the training nor the resources available to master procedures or enforce quality control measures. similarly, the systems department became increasingly removed from daily operations. many issues were not reported at all and became viewed as system quirks that required workarounds, or were viewed as limitations of the system. for issues that were reported, troubleshooting had to start all over again, and systems relied on circulation staff being able to replicate the issue in order to demonstrate the problem. systems personnel retained little knowledge of daily operations, and troubleshooting became more complex and problematic as different operators had different levels of knowledge and skill to accompany their unique procedures. mid-2000s–2015 these issues became further exacerbated when areas outside of circulation were given full access to the as/rs in the mid-2000s. employees from different departments of the library began entering and accessing the as/rs area and operated the as/rs based on knowledge and skills they learned informally. student assistants from these other departments also began accessing the area and performing tasks on behalf of their informally trained supervisors. further, without access control, employees as well as students ventured into the “pit” area of the as/rs where the srms move and end-of-aisle operations occur. this area contains many hazards and is unsafe without proper training. during this period, the special collections and archives (sc/a) department loaded thousands of uncataloged, high-use items into the as/rs that required specialized service from circulation. these items were categorized as “non-library of congress,” and inventory records were entered into the as/rs software manually by various library employees. in addition, paper copies were created and maintained as an independent inventory by sc/a. over the years, the sc/a paper inventory copies were found to be insufficiently labeled, misidentified, or missing. therefore, the as/rs software inventory database and the sc/a paper copy inventory contained conflicts that could not be reconciled. to resolve this situation, an audit of sc/a materials was completed in spring 2019 to locate inventory that was thought to be missing. all bound journals and current periodicals were eventually loaded into the as/rs as well, causing other departments and areas to rely on the as/rs more heavily. departments such as interlibrary loan and reserves, as well as patrons, began requesting materials stored in the as/rs more routinely and frequently. the as/rs transformed from a storage space with limited usage to an active area with simultaneous usage requests of different types throughout the day.
without a dedicated staff to organize, troubleshoot, and provide quality control, there was an abundance of errors that led to long waits for materials, interdepartmental conflicts, and unresolved errors. high-use materials from sc/a, as well as currently received periodicals from the main collection, were the catalysts that drove and eventually warranted change in the as/rs usage model from storage to service. the inclusion of these materials created new primary customers identified as internal library departments: sc/a and interlibrary loan (ill). with over 4,000 materials contained in the as/rs, sc/a requires prompt service for processing archival material into the as/rs and filling specialized patron requests for these materials. in addition, ill processes over 500 periodical requests per month that utilize and depend on as/rs services. the additional storage and requests created an uptick in overall as/rs utilization that carried over into circulation desk operations as well. 2015–present the move from storage to service was not only inevitable due to an evolving as/rs inventory, but was necessary in order to regain quality control and manage the library-wide projects that involved the as/rs. the increased usage of and reliance on the as/rs required that the system be well maintained and managed. administration of the as/rs remains within systems, and circulation student employees continue to provide supervised assistance to the as/rs. the crucial change, identified and emerging within circulation, was the need for a dedicated operations and project manager. an as/rs lead position was created with responsibility for the daily operations and management of the system and service. however, this was not a complete return to the original staffing concept of the early 1990s. the concept for this new position focuses on project management and system operations rather than the original sole attention to system operations. the as/rs lead is the point of contact for all library projects that utilize the as/rs, for relaying as/rs issues or concerns to systems, and for daily as/rs usage. this shift was necessary due to the increased demand and reliance on the system that changed its charge from storage to service. customer service the library noted over time that the as/rs could be used as a tool in weeding and other collection shift projects to create space and aid in reorganizing materials. as more high-use materials were loaded into the as/rs, the indirect advantages of the as/rs became more apparent. patrons request materials stored within the as/rs through the library's website and pick up the materials at the circulation desk. there is no need for patrons to navigate the library, successfully use the classification system, and search shelves to locate an item that may or may not be there. as kirsch notes, “the ability to request items electronically and pick them up within minutes eliminates the user's frustration at searching the aisles and floors of an unfamiliar library.”15 the vast majority of library patrons are csun students who commute and must make the best use of their time while on campus. housing items in the as/rs creates the opportunity to have hundreds of thousands of items all picked up and returned to one central location. this makes it far easier for library patrons, especially users with mobility challenges, to engage with a plethora of library materials.
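the request path described above can be pictured as a short chain of states with a scan at each handoff. the following is a minimal illustrative sketch only; the state names, barcodes, and identifiers are assumptions for illustration and do not describe csun's actual as/rs software.

```python
# illustrative sketch only: a simplified model of the patron request flow
# (website request -> srm bin retrieval -> operator pick -> circulation desk
# pickup). all names and values are hypothetical.

from dataclasses import dataclass, field
from enum import Enum, auto


class RequestState(Enum):
    SUBMITTED = auto()      # patron placed the request on the library website
    BIN_RETRIEVED = auto()  # the srm delivered the bin to the end of the aisle
    PICKED = auto()         # an operator scanned the item out of the bin
    READY = auto()          # item is waiting at the circulation desk


@dataclass
class PagingRequest:
    item_barcode: str
    bin_id: str
    state: RequestState = RequestState.SUBMITTED
    scans: list = field(default_factory=list)  # audit trail of scans

    def advance(self, next_state: RequestState, scanned_by: str) -> None:
        # each transition records a scan, mirroring the multiple-scan
        # accountability the article attributes to as/rs workflows
        self.scans.append((next_state.name, scanned_by))
        self.state = next_state


req = PagingRequest(item_barcode="30112000123456", bin_id="aisle3-bin0412")
req.advance(RequestState.BIN_RETRIEVED, scanned_by="srm-3")
req.advance(RequestState.PICKED, scanned_by="operator-1")
req.advance(RequestState.READY, scanned_by="circ-desk")
print(req.state.name, req.scans)
```

the audit trail of scans is the point: as discussed under security and inventory below, every movement of an item is attributable to a specific operator or machine.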
patrons' time allotted for library research and/or enjoyment becomes more productive, as their desired materials are delivered within minutes of arriving in the building. as heinrich and willis state, “the provision of the nimble, just-in-time collection becomes paramount, and the demand for as/rs increases exponentially.”16 as/rs items are more readily available than shelved items on the floor, as it takes minutes to have as/rs items returned and made available once again. “they may be lost, stolen, misshelved, or simply still on their way back to the shelves from circulation—we actually have no way of knowing where they are without a lengthy manual search process, which may take days. . . . unlike books on the open shelves, returned storage books are immediately and easily ‘reshelved’ and quickly available again.”17 another advantage is that there is no need to keep materials in call-number order, with its unpleasant reality of missing and misshelved items. items in the as/rs are assigned bin locations that can only be accessed by an operator- or user-initiated request. the workflow required to remove a material from the as/rs involves multiple scans and procedures that provide a level of accountability that does not exist for items stored on open shelves. further, users are assured of an item's availability within the system. storing materials in the as/rs ensures that items are always checked out when they leave the library and are not sitting unaccounted for in library offices and processing areas. it also avoids the patron frustration of misshelved, recently checked-out, or missing items. security the decision to follow the 80/20 principle and place low-use items in the as/rs meant high-use items remained freely available to library patrons on the open shelves of each floor. this resulted in high-use items being available for patron browsing and checkout, as well as for patron misuse and theft. the sole means of securing these high-use items involved tattle-tape and installing security gates at the main entrance. therefore, the development of policies and procedures for the enforcement of these gates was also required. beyond the inherent cost, the maintenance, and the issue of ensuring items are sensitized and desensitized correctly, gate enforcement became another issue that rested upon the circulation department. assuming theft would occur by exiting through the gates at the main entrance of the library, enforcement is limited to actions that may be performed by library employees. touching, impeding, following, detaining, or searching library patrons are restricted actions reserved for campus authorities such as the police, not library employees. rather than attempting to enforce a security mechanism for which we have no authority, the as/rs provides an alternative for securing high-use and valuable materials. storing items in the as/rs eliminates the possibility of theft or damage by visitors and places control and accountability on the internal use of materials. “there would be far fewer instances of mutilation and fewer missing items.”18 further, access to the as/rs area was restricted to circulation and systems employees, with limited exceptions. individual logins also provide control and accountability, as each operator is required to use a personal account rather than a departmental account to perform actions on the as/rs. materials stored in the as/rs are, “more significantly . . .
safer from theft and vandalism.”19 inventory conducting a full inventory of a library collection is time consuming, expensive, and often inaccurate by the time of completion. missing or lost items, shelf reading projects, in-process items, etc. create overhead for library employees and generate frustration for patrons searching for an item. massive, library-wide projects such as collection shifts and weeding are common endeavors undertaken to create space, remove outdated materials, and improve collection efficiency. however, actions taken on an open-shelves collection are time consuming, costly, and inefficient, and they affect patron activities. these projects typically require months of work involving multiple departments. items stored within the as/rs do not experience these challenges because the system is managed by a full-time employee throughout the year and not on a project basis. the system is capable of performing inventory audits and does not affect public services. therefore, while the cost of an item on an open shelf is $0.079, the cost of storing the same item in the as/rs is $0.02.20 routine and spot audits ensure an accurate inventory, confirm the capacity level of the system, and establish best management of the bins. as/rs inventory audits are highly accurate and much more efficient than shelf reading, with little impact on patron services. “while this takes some staff time, it is far less time-consuming than shelf reading or searching for misshelved books.”21 storing materials in the as/rs is more efficient than on open shelves; however, bin management is essential in ensuring bins are configured in the best arrangement to achieve optimal efficiency. the size and configuration of bins directly affect storage capacity. the type of storage, random or dedicated, also influences the capacity, efficiency, and accessibility of items. the 13,260 steel bins in the as/rs range in height from 6 to 18 inches. the most commonly used bins are the 10- and 12-inch bins; however, there is a finite number of these bin heights. unfortunately, the smallest and largest bins are rarely used due to material sizes and weight capacity; therefore, optimal as/rs capacity is unattainable, and the number of materials eligible for loading is limited by the number of bins available. the library also determined that dedicated, rather than random, bin storage aided in locating specialized materials, reduced loading and retrieval errors, and enhanced accessibility by arranging highly used bins in reachable locations. in the event that an srm breaks down and an aisle becomes nonfunctional for retrieving bins, strategically placing the highest-use and specialized materials in bins that can be manually pulled is a proactive strategy. however, this requires dedicated bins with an accurate and known inventory, arranged in accessible locations. lessons learned disasters & security in 1994, the as/rs proved to be a much more stable and secure environment than the open stacks when it successfully endured a magnitude 6.9 earthquake. the reshelving of more than 300,000 items required a crew of more than thirty personnel over a year to complete. many items were destroyed from the impact of falling to the floor and being buried underneath hundreds of other items.
the as/rs, in contrast, contained over 800,000 items and successfully sustained the brunt of the earthquake's impact with no damage to any of the stored items. unfortunately, the materials that had been loaded into the as/rs in 1991 were low-use items that were viewed as one step from weeding. therefore, high-use items stored on open shelves were damaged and required the long process of recovery and reconstruction: identifying and cataloging damaged and undamaged materials, disposing of those damaged, renovating the area, and purchasing new items. the low-use items stored in the as/rs, by contrast, required only that a few bins that had slightly shifted be pushed back fully into their slots. as/rs items have proven to be more secure from misplacement, theft, and physical damage from earthquakes than items on open shelves. maintenance, support, and modernization the csun oviatt library has received two major updates to the as/rs since it was installed in 1991. in 2011, the as/rs received updates for communication and positioning components. the second major update occurred in two phases between 2016 and 2018 and focused on software and equipment. in phase one, server and client-side software was updated from the original software created in 1989. in phase two, half the srms received new motors, drives, and controllers. due to the many years of reliance on preventive maintenance (pm) visits and avoidance of modernization, our vendors were unable to provide support for the as/rs software and had difficulty locating equipment that had become obsolete. preventive maintenance visits were used to maintain the status quo and are not a long-term strategy for maintaining a large investment and critical component of business operations. creaghe and davis note that, “current industrial facility managers report that with a proper aaf [automated access facility] maintenance program, it is realistic to expect the system to be up 95-98 percent of the time.”22 pm service is essential for long-term as/rs success; however, preventive maintenance alone cannot modernize the system or ensure that equipment and software do not become obsolete. maintenance is not the same as support; rather, maintenance is one aspect of support. support includes points of contact who are available for troubleshooting, spare supplies on hand for quick repairs, a life-cycle strategy for major components, and long-term planning and budgeting. kirsch attested the following describing eastern michigan university's strategy: “although the dean is proud and excited about this technology, he acknowledges that just like any computerized technology, when it's down, it's down. to avoid system problems, emu bought a twenty-year supply of major spare parts and employs the equivalent of one-and-a-half full-time workers to care for its automated storage and retrieval system.”23 a system that relies solely on preventive maintenance will quickly become obsolete and require large and expensive projects in the future if the system is to continue functioning. further, modernization provides an avenue for new features and functions that increase functionality and efficiency. networking the csun oviatt library receives on average three to four visits a year, along with multiple emails and phone calls, from different libraries requesting information about the as/rs. these conversations aid the library by presenting different perspectives on the as/rs and force the library to review current practices.
the library has learned through speaking with many different libraries that the needs, design, and configuration of an as/rs can be as unique as the libraries inquiring. the csun oviatt library, for example, is much different from the three other csu system libraries that have an as/rs. because our system was outdated, it has been difficult to form or establish meaningful groups or share information, as the systems are all different from each other. as more conversations occur and systems become more modern and standard, there is potential for knowledge sharing as well as group lobbying efforts for features and pricing. buy in user confidence in any system is required in order for that system to be successful. convincing a user base to accept moving materials from readily available open shelves into steel bins housed within inaccessible 40-foot-high aisles will be difficult if the system is consistently down. therefore, the better the as/rs is managed and supported, the more reliable and dependable the system will be, and the more likely user confidence will grow. informing stakeholders of long-term planning and welcoming feedback demonstrates that the system is being supported and managed with an ongoing strategy that is part of future library operations. similarly, administrators need confirmation that large investments and mission-critical services are stable, reliable, and efficient. creating a new line item in the budget for as/rs support and equipment life-cycles requires justification along with a firm understanding of the system. in addition, staffing and organizational responsibilities must also be reviewed in order to establish an environment that is successful and efficient. continuous assessment of the as/rs (downtime, projects involved, services and efficiencies provided, etc.) helps illustrate the importance and impact of the system on library operations as a whole. recording usage and statistics unfortunately, usage statistics were not recorded for the as/rs prior to june 2017. therefore, data is unavailable to analyze previous system usage, maintenance, downtime, or project involvement. data-driven decisions require the collection of statistics for system analysis and assessment. following the server software and hardware updates, efforts have been taken to record project statistics, inventory audits, and srm faults, as well as public and internal paging requests. conclusion the as/rs remains, as heinrich & willis described it, “a time-tested innovation.”24 through lessons learned and objective assessment, the library is positioning the as/rs to be a critical component for future development and strategy. by expanding the role of the as/rs to include functions beyond low-use storage, the library discovered efficiencies in material security, customer service, inventory accountability, and strategic planning. the csun oviatt library has learned, experienced, and adjusted its perception, treatment, and usage of the as/rs over the past thirty years. factors such as access to the area, staffing, and inventory auditing are easily overlooked, while potential functions such as material security and customer service may not be identified without ongoing analysis and assessment. critical review, free of a limited or biased perception, has enabled the library to realize the greater functionality the as/rs is able to provide.
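the unit-cost comparison cited in the inventory section can be made concrete with a few lines of arithmetic. the item counts and annual student budgets below are the figures reported in note 20; the function itself is only an illustrative sketch of the calculation, not code the library uses.

```python
# a worked version of the unit-cost comparison from note 20:
# unit cost = total fixed and variable costs / number of units managed

def cost_per_item(annual_budget: float, item_count: int) -> float:
    # annual management cost per material per year
    return annual_budget / item_count

open_shelves = cost_per_item(annual_budget=31_500, item_count=400_000)
asrs = cost_per_item(annual_budget=18_000, item_count=900_000)

print(f"open shelves: ${open_shelves:.3f} per item per year")  # ~$0.079
print(f"as/rs:        ${asrs:.3f} per item per year")          # ~$0.020
```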
notes 1 shira atkinson and kirsten lee, “design and implementation of a study room reservation system: lessons from a pilot program using google calendar,” college & research libraries 79, no. 7 (2018): 916–30, https://doi.org/10.5860/crl.79.7.916. 2 helen heinrich and eric willis, “automated storage and retrieval system: a time-tested innovation,” library management 35, no. 6/7 (august 5, 2014): 444-53, https://doi.org/10.1108/lm-09-2013-0086. 3 atkinson and lee, “design and implementation of a study room reservation system,” 916–30. 4 “about csun,” california state university, northridge, february 2, 2019, https://www.csun.edu/about-csun. 5 “colleges,” california state university, northridge, may 8, 2019, https://www.csun.edu/academic-affairs/colleges. 6 estimated as/rs capacity was calculated by determining the average size and weight of an item for each size of bin along with the most common bin layout. the average item was then used to determine how many could be stored along the width and length (and if appropriate height) of the bin and then multiplied. many factors affect the overall capacity, including: bin layout (with or without dividers), stored item type (book, box, records, etc.), weight of the items, and operator determination of full, partial, or empty bin designation. the as/rs mini-loaders have a weight limit of 450 pounds, including the weight of the bin. 7 “automated storage and retrieval system (as/rs),” csun oviatt library, https://library.csun.edu/about/asrs. 8 “automated storage and retrieval system (as/rs),” csun oviatt library, https://library.csun.edu/about/asrs. 9 heinrich and willis, “automated storage and retrieval system,” 444-53. 10 norma s. creaghe and douglas a. davis, “hard copy in transition: an automated storage and retrieval facility for low-use library materials,” college & research libraries 47, no. 5 (september 1986): 495-99, https://doi.org/10.5860/crl_47_05_495. 11 heinrich and willis, “automated storage and retrieval system,” 444-53. 12 creaghe and davis, “hard copy in transition,” 495-99. 13 linda shirato, sarah cogan, and sandra yee, “the impact of an automated storage and retrieval system on public services,” reference services review 29, no. 3 (september 2001): 253-61, https://doi.org/10.1108/eum0000000006545. 14 heinrich and willis, “automated storage and retrieval system,” 444-53. 15 sarah e. kirsch, “automated storage and retrieval—the next generation: how northridge's success is spurring a revolution in library storage and circulation,” paper presented at the acrl 9th national conference, detroit, michigan, april 8-11, 1999, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/pdf/kirsch99.pdf. 16 heinrich and willis, “automated storage and retrieval system,” 444-53. 17 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61. 18 kirsch, “automated storage and retrieval.” 19 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61.
20 cost of material management was calculated by removing building operational costs (lighting, hvac, carpet, accessibility/open hours, etc.) and focusing on the management of the material instead. the management of materials (or unit cost) is determined by dividing the total amount of fixed and variable costs by the total number of units; $31,500 in annual shelving student budget divided by 400,000 items equals $0.079 per material per year in open shelves; $18,000 in annual as/rs student budget divided by 900,000 items equals $0.02 per material per year in the as/rs. 21 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61. 22 creaghe and davis, “hard copy in transition,” 495-99. 23 kirsch, “automated storage and retrieval.” 24 heinrich and willis, “automated storage and retrieval system,” 444-53. the benefits of enterprise architecture for library technology management: an exploratory case study sam searle information technology and libraries | december 2018 27 sam searle (samantha.searle@griffith.edu.au) is manager, library technology services, griffith university, brisbane, australia. abstract this case study describes how librarians and enterprise architects at an australian university worked together to document key components of the library's “as-is” enterprise architecture (ea). the article covers the rationale for conducting this activity, how the work was scoped, the processes used, and the outputs delivered. the author discusses the short-term benefits of undertaking this work, with practical examples of how outputs from this process are being used to better plan future library system replacements, upgrades, and enhancements. longer-term benefits may also accrue in the future as the results of this architecture work inform the library's it planning and strategic procurement. this article has implications for practice for library technology specialists, as it validates views from other practitioners on the benefits for libraries of adopting enterprise architecture methods and for librarians of working alongside enterprise architects within their organizations. introduction griffith university is a large comprehensive university with multiple campuses located across the south east queensland region in australia. library and information technology operations are highly converged and from 1989 to 2017 were offered within a single division of information services. scalable, sustainable, and cost-effective it is seen as a key strategic enabler of the university's core business in education and research. “information management and integration” and “foundation technology” are two of four key areas outlined in the griffith digital strategy 2020, which highlights enterprise-wide decision-making and proactive moves to take advantage of as-a-service models for delivering applications.1 from late 2016 through to early 2018, library and learning services (“the library”) and it architecture and strategy (itas) worked iteratively to document key components of the library's “as-is” enterprise architecture (ea).
around fifty staff members have participated in the process at different points. the process has been very positive for all involved and has led to a number of benefits for the library in terms of improved planning, decision-making, and strategic communication. as manager, library technology services, the author was well placed to act as a participant-as-observer with the objective of sharing these experiences with other library practitioners. the author actively participated in the processes described here and has been able to informally discuss the benefits of this work with the architects and some of the library staff members who were most involved. literature review enterprise architecture (ea) emerged over twenty years ago and is now a well-established it discipline. as in other disciplines such as project management and change management, there are a number of best-practice frameworks in common use, including the open group architecture framework (togaf).2 a global federation of member professional associations has been in place since 2011, with aims including the formalization of standards and promotion of the value of ea.3 educational qualifications, certifications, and professional development pathways for enterprise architects are available within universities and the private training sector. according to the international higher education technology association educause, ea is relatively new within universities but is growing in importance. as a set of practices, “ea provides an overarching strategic and design perspective on it activities, clarifying how systems, services, and data flows work together in support of business processes and institutional mission.”4 yet despite this growing interest in our parent organizations, individual academic libraries applying ea principles and methods are notably absent from the scholarly literature and library practitioner information-sharing channels. the fullest account to date of the experience and impacts of enterprise architecture practice in a library context is a case study from the canada institute for scientific and technical information (cisti). at the time of the case study's writing in 2008, cisti was already well underway in its adoption of ea methods in an effort to address the challenges of “legacy, isolated, duplicated, and ineffective information systems” and to “reduce complexity, to encourage and enable collaborations, and, finally, to rein in the beast of technology.”5 the author of this case study concludes that while getting started in ea was complex and resource-intensive, this was more than justified at cisti by the improvements in technology capability, strategic planning, and services to library users. broader whole-of-government agendas are a driver for ea adoption in non-university research libraries. the national library of finland's ea efforts were guided by a national information society policy and the ea architecture design method for finnish government.6 a 2009 review of the it infrastructure at the u.s. library of congress (lc) argued lc was lagging behind other federal agencies in adoption of government-recommended ea frameworks.
the impact of this included: inadequate linking of it to the lc mission; potential system interoperability problems; difficulties assessing and managing the impact of changes; poor management of it security; and technical risk due to non-adherence to industry standards and lack of future planning.7 a follow-up review in 2015 noted that lc had since developed an architecture, but that it had still fallen short by not gathering data from management and validating the work with stakeholders.8 there is little discussion in the literature about the ea process as a collaborative effort. in their 2016 discussion of emerging roles for librarians, parker and mckay proposed ea as a new area for librarians themselves to consider moving into, rather than as a source of productive partnerships.9 they argued that there are many similarities in the skillsets and practices of enterprise architects and information professionals (in particular, systems librarians and corporate information managers). areas of crossover identified included: managing risks, for example, related to intellectual property and data retention; structured and standardized approaches to (meta)data and information; technical skills such as systems analysis, database design, and vendor management; and understanding and application of information standards and internal information flows. while not a research library, state archives and records nsw has, within a broader information management context, promoted the benefits to records managers of working with enterprise architects, including improved program visibility, strategic assistance with business case development, and the embedding of recordkeeping requirements within the organization's overall enterprise architecture.10 getting started: context and planning library technology services context in 2015–16, the awareness of enterprise and solution architecture expanded significantly within griffith university's library technology services (lts) team. in 2015, some members of the team participated in activities led by external consultants to document griffith's overall enterprise architecture at a high level. in 2016, the author became a member of the university's solution architecture board (sab). lts submitted several smaller solution architectures to this group for discussion and approval, and team members found this process useful in identifying alternative ways to do things that we may not have otherwise considered. as a small team looking after a portfolio of high-use applications, lts was seeking to align itself as much as possible with university-wide it governance and strategy. these broader approaches included aggressively seeking to move services to cloud hosting, standardizing methods for transferring data between systems, complying with emerging requirements for greater it security, and participating in large-scale disaster recovery planning exercises. the author also needed to improve communication with senior it stakeholders. there was little understanding outside of the library of the scale and complexity involved in delivering online library services to a community of over 50,000 people. in a resource-scarce environment, it was increasingly important to make business cases not just in formal project documents but also opportunistically in less formal situations (the “elevator pitch”).
existing systems were hindering the library in making progress toward an improved online student experience and more efficient use of staff resources. a complex ecosystem of more than a dozen library applications had developed over time. the library had selected these at different times based on requirements for specific library functions rather than alignment with an overall architectural strategy. our situation mirrored that described at cisti: “a complex and ‘siloed’ legacy infrastructure with significant vendor lock-in” combined with “reactionary” projects that “extended or redesigned [existing infrastructure] to meet purported needs, without consideration for the complexity that was being added to overcomplicated systems.”11 complex data flows between local systems and third-party providers that were critical to library services were not always well documented. while lts staff members were extremely experienced, much of their knowledge was tacit. as in many libraries, staff could be observed sharing knowledge in informal, organic ways focused on the tasks at hand, but less effort was spent on capturing knowledge systematically. building a more explicit shared understanding of the library's application portfolio would help address risks associated with staff succession. improved internal documentation would also address emerging requirements for team members to both develop their own understanding in new areas (upskilling) and become more flexible in taking up broader roles and responsibilities across the team (cross-skilling). there was also a sense that the time was right to take stock and evaluate the current state of affairs before embarking on any major changes. the team was supporting several applications, including the library management system and the interlibrary loans system, that were end-of-life. we needed to make decisions, and these needed to not only address our current issues but also provide a firm platform for the future. it was in this context that in 2016 library technology services approached the information technology architecture and solutions group for assistance. information technology architecture and solutions context in 2014, griffith university embarked on a new approach to enterprise architecture. the chief technology officer was given a mandate by the senior leadership of the university to ensure that it architecture was managed within an architecture governance framework, and the information services ea team was tasked with developing and maintaining an ea and providing services to support the development of solution architectures for projects and operational activities. two new boards were established to provide governance: the information and technology architecture board (itab) would control architectural standards and business technology roadmaps, while the solution architecture board (sab) would “support the development and implementation of solution architecture that is effective, sustainable and consistent with architectural standards and approaches.” project teams and operational areas were explicitly given responsibility to engage with these boards when undertaking the procurement and implementation of it systems.
sets of architectural, information, and integration principles were developed, which promoted integration mechanisms that minimized business impact and were future-proof, loosely coupled, reusable, and shared.12 our enterprise architects saw their primary role as maximizing the value of the university's total investment in it by promoting standards and frameworks that could potentially improve consistency and reduce duplication across the whole organization. in order to do this, they would need to work with and through other business units. from the architects' perspective, a collaboration with the library offered an opportunity to exercise skillsets and frameworks that were in place but still relatively new. griffith was still maturing in this area and attempting to move from the hiring of consultants as the norm to building more internal capability. working with the library would be a good learning experience for a junior architect, who was on a temporary work placement from another part of information services as a professional development opportunity. she could build her skills in a friendly environment before embarking on other engagements with potentially less open client groups. determining scope in a statement of architecture work once the two teams had decided that the process could have benefits on both sides, the next step was to jointly develop a statement of architecture work outlining what the process would include and how we would work together. a formal document was eventually endorsed at the director level, but prior to that, the librarians and the architects had a number of useful informal conversations in which we discussed our expectations, as well as the amount of time that we could reasonably contribute to the process. in developing the statement of work, the two teams agreed to focus on the current “as-is” environment and on assessment of the maturity of the applications already in use (see figure 1). this would help us immediately with developing business cases and roadmaps, without necessarily committing either team to the much greater effort required to identify an ideal “to-be” (i.e., future) state to work towards. figure 1. overview of the architecture statement of work. full size version available at https://doi.org/10.6084/m9.figshare.6667427. the open group architecture framework (togaf) supports the development of enterprise architectures through four subdomains: business architecture, data architecture, application architecture, and technology architecture.13 the work that we decided to pursue maps to two of these areas: data architecture, which “describes the structure of an organization's logical and physical data assets and data management resources,” and application architecture, which “provides a blueprint for the individual applications to be deployed, their interactions, and their relationships to the core business processes of the organization.” enterprise architecture process and outputs once the architecture statement of work had been agreed on, the two teams embarked on the process of working together over an extended period. while the elapsed time from approval of the statement of work through to endorsement of the architecture outputs by the solution architecture board was approximately fourteen months, the bulk of the work was undertaken within the first six months.
following an intense period of information gathering involving large numbers of staff, a smaller subset of people then worked iteratively to refine the outputs for final approval. several times architecture activities had to be placed on hold in favor of essential ongoing operational work and higher-priority projects, such as a major upgrade of the institutional repository. the process involved four main activities, which are described in more detail in the following sections. data asset and application inventory the first activity consisted of a series of three workshops to review information held about library systems in the ea management system, orbus software's iserver. this is the tool used by the griffith ea team to develop and store architectural models, and to produce artifacts such as architecture diagrams (in microsoft visio format) and documentation (in microsoft word, excel, and powerpoint formats).14 the architects guided a group of librarians who use and support library systems through a process of mapping the types of data held against an existing list of enterprise data entities. in this context, a data entity is a grouping of data elements that is discrete and meaningful within a particular business context. for library staff, meaningful data entities included all the data relating to a person, to items and metadata within a library collection, and to particular business processes such as purchasing. we also identified the systems into which data were entered (system of entry), the systems that were considered the “source of truth” (system of record), and the systems that made use of data downstream from those systems of record (reference systems). the main output of this process was a workbook (figure 2) showing a range of relationships: between systems and data entities; between internal systems; and between internal systems and external systems. the first two columns in the worksheet contain a list of all the data entities and sub-entities stored in library systems (as expressed in the enterprise architecture). along the top of the worksheet is a list of all the products in our portfolio, along with a range of systems they are integrated with. each of the orange arrows in this spreadsheet represents the flow of data from one system to another. the workbook in this raw form is messy, and the data within it is not meant to be widely consumed in this format. the workbook's main role is as the data source for the application communication diagram that is described in a later section. as a result of this data asset inventory, the management system used by our architects now contains a far more comprehensive and up-to-date view of the library's architectural components than before: • the data entities better reflect library content. for example, while iserver already had a collection item data entity, we were able to add new data entity subtypes for bibliographic records, authority records, and holdings records. • library systems are now captured in ways that make more sense to us. workshopping with the architects led to the breakdown of several applications into more granular architectural components.
for example, the library management system is now represented not just as a single system, but rather as a set of interconnected modules that support different business functions, such as cataloguing and circulation. similarly, our reading lists solution was broken down into its two main components: one for managing reading lists and one for managing digitized content. this granularity has enabled us to build a clearer picture of how systems (and modules within systems) interface with each other. figure 2. part of the data asset and application inventory worksheet. full size version available at https://doi.org/10.6084/m9.figshare.6667430. • the wide range of technical interfaces we have with third parties, such as publishers and other libraries, is now explicitly expressed. feedback from the architects suggested that the library was very unusual compared to other parts of the organization in terms of the number of critical external systems and services that we use as part of our service provision. previously iserver did not contain a full picture of these critical services, including: o the web-based purchasing tools that we use to interact with publishers, such as ebsco's gobi;15 o the library links program that we use to provide easier access to scholarly content via google scholar;16 and o various harvesting processes that enable us to share metadata with content aggregators, such as the national library of australia's trove service and the australian national data service's research data australia portal.17 application maturity survey the second activity was an application maturity assessment. this involved forty-four staff members from all areas of the library with different viewpoints (technical, non-technical, and management) answering a series of questions in a spreadsheet format. the survey contained questions about: • how often a system was used; • how easy it was to use; • how well it supported the business processes that each person carried out; • how well it performed, for example, in terms of response times; • how quickly changes/enhancements were implemented in the product; • how easily the system could be integrated with other systems; • the level of compliance with industry standards; and • overall supportability (including vendor support). as different respondents were assigned multiple systems depending on their level of support and/or use, the final number of responses to the survey was 144, relating to eleven different systems. the outputs of this process were a summary table and a series of four graphs. the summary table (see figure 3) presents aggregated scores on a scale of one (low) to five (high) for each application, as well as recommended technical and management strategies. it is interesting, and somewhat disheartening, to note that scores for the business criticality of the applications are generally much higher than the scores for fitness. there is also some variation in the strategies required; some systems need to be replaced, but there are others where the issues seem to be less technical.
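the mapping from survey scores to recommended strategies can be illustrated with a minimal sketch. business fit and technical fit (each scored one to five) place an application in one of the four management-strategy quadrants named in figure 3; the 3.0 midpoint threshold and the assignment of strategies to the two off-diagonal quadrants are assumptions here, since the article does not publish the exact cut-offs used by the architects' tooling.

```python
# a minimal sketch of the quadrant logic behind figures 5-7. thresholds and
# off-diagonal quadrant assignments are assumptions, not the tool's real rules.

MIDPOINT = 3.0  # assumed midpoint on the one-to-five scale

def management_strategy(business_fit: float, technical_fit: float) -> str:
    if business_fit >= MIDPOINT and technical_fit >= MIDPOINT:
        return "optimise"               # good fit on both axes: keep improving
    if business_fit >= MIDPOINT:
        return "technology refresh"     # business likes it, technology lags
    if technical_fit >= MIDPOINT:
        return "implementation review"  # sound technology, poor business fit
    return "replace"                    # poor on both axes: candidate to retire

# the end-of-life system discussed below would land in the replace quadrant
print(management_strategy(business_fit=2.1, technical_fit=1.8))  # replace
print(management_strategy(business_fit=4.2, technical_fit=4.5))  # optimise
```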
the third row of the table shows a product that is scored as highly business-critical and perfectly suited to the job from a technical perspective, yet the product still scores much more poorly for business fit, which could indicate that something has gone wrong in the way that this product has been implemented. figure 3. table summarizing the results of the application maturity assessment [product names redacted]. applications are rated on a scale of one to five, and one of four management strategies (technology refresh—not shown here, optimise, implementation review, or replace) is recommended. full size version available at https://doi.org/10.6084/m9.figshare.6667433. figure 4. two of the four graph types produced from the application maturity survey results, for a product [name redacted] that is performing well. full size version available at https://doi.org/10.6084/m9.figshare.6667436. figures 4 and 5 show the four graph types produced automatically from the survey results. on the left in figure 4 is a view displaying the business criticality, business fit, and technical fit for an individual application (shown in pink) as compared to the overall portfolio (shown in blue). on the right is a graph showing scores for the range of measures covered by the survey. this particular product is doing well; technical and business fit are high in the graph on the left, and most measures are above average in the graph on the right. figure 5 shows the remaining two graphs for the same product. the graph on the left plots the scores for business criticality and application suitability (fitness for purpose) to produce a recommended technical strategy. the graph on the right plots the scores for business fit and technical fit to produce a recommended management strategy. in both graphs, it is possible to see how the specific application is performing (the red square) compared to the portfolio overall (the blue diamond). placement within the quadrant with the green optimize label is preferred, as in this case. figure 5. the remaining two graph types from the application maturity survey results, for a system [product name redacted] that is performing well. the specific system's location is shown by the red square, while the blue diamond maps the average for all systems in the application portfolio. full size version available at https://doi.org/10.6084/m9.figshare.6667442. figures 6 and 7 present the same set of graphs for an end-of-life system. in figure 6 the graph on the left shows that the product is very business-critical but that its scores for technical fit and business fit (the lower corners of the pink triangle) are lower than the average across all applications (the lower corners of the blue triangle). the graph on the right shows that supportability and the time to market for changes and enhancements (the least prominent “points” in the pink polygon) are below the portfolio average (shown in blue along the same axes), while scores for the other measures (criticality, standards compliance, information quality, and performance) were more in line with the portfolio average. figure 6.
the first and second (of four) graphs for a system [product name redacted] that is end-of-life. full size version available at https://doi.org/10.6084/m9.figshare.6667478. in figure 7, this application is placed well within the quadrant suggesting replacement. figure 7. the third and final graphs for a system [product name redacted] that is end-of-life. the placement of the red square within the replace quadrant indicates that this product is a high candidate for decommissioning. this is a marked difference from the portfolio as a whole (the blue diamond), which could be reviewed for possible implementation improvements. full size version available at https://doi.org/10.6084/m9.figshare.6667484. the graphs are also useful for highlighting anomalies. figure 8 shows a product that is assessed as better-than-average in the portfolio on most measures. however, the survey results quite clearly show that information quality is a major issue. figure 8. graph from application maturity survey showing a specific area of concern (data quality) for an otherwise well-performing application [product name redacted]. full size version available at https://doi.org/10.6084/m9.figshare.6667487. this type of finding will help library technology services to target our continuous improvement efforts and work through our relationships with user groups and vendors to get a better result. application communication diagram the third major activity was the production of an application communication diagram (see figure 9). this is a visual representation of all of the information that was collated through the workshops using the workbook described above. figure 9. application communication diagram [simplified view]. full size version available at https://doi.org/10.6084/m9.figshare.6667490. the diagram includes a number of things to note. • key applications that make up the library ecosystem. an example of this is the large blue box on the top left. this represents the intota product suite from proquest, which contains multiple components, including our link resolver, discovery layer, and electronic resource manager. • physical technology. self-checkout machines appear as the small green box mid-right. • other internal systems that connect to library system components. examples of these are throughout and include: corporate systems, such as peoplesoft for human resources and finances; identity management systems like metadirectory and ping federate; the learning management system blackboard; and research systems, including the research information management system and the researcher profiles system. • external systems that connect to our systems. these are mostly gathered into the large grey box bottom right. • actors who access the systems. this includes administrators, staff, students, and the general public. actors are identified using a small person icon. • interfaces between components. each line in the diagram represents a unique connection into another system or interface. captions on these lines indicate the nature of the connection, e.g.
manual data entry, z39.50 search, export scripts, and lookup lists. the production of this diagram has been an iterative process that has taken place over a long time period. the number of components involved in the diagram is quite large, so it is worth noting that the version presented here has actually been simplified. the architects' tools can present information in different ways, and this particular “view” was chosen to balance the need for detail and accuracy with the need to communicate meaningfully with a variety of stakeholders. production of interactive visualizations in the fourth and final work package, the data entity and application inventory spreadsheet was used as a data source to provide an interactive visualization (see figure 10). a member of the architecture team converted the workbook (see figure 2) from microsoft excel .xls into a .csv file. he developed a php script to query the file and return a json object based on the parameters that were passed. the data-driven documents javascript library (d3.js) was used to produce a force graph that uses shapes, colors, and lines to visually present the spreadsheet information in a more interactive way.18 this tool enables navigation through the library's network of data entities (shown as orange squares) and applications (shown as blue dots). in the example being displayed, the data entity “bibliographic records—marc” has been selected. it is possible to see both in the visualization and in the popup box on the left how marc records are captured, stored, and used across our entire ecosystem of applications. this visualization was very much an experiment, and its long-term value is something we are still discussing. in the short term, other outputs have proven to be more useful for planning purposes. figure 10. interactive visualization of library architecture, showing relationships between a single data subentity (bibliographic records—marc) and various applications. full size version available at https://doi.org/10.6084/m9.figshare.6667493. discussion the process described above was not without its challenges, including establishing a common language. enterprise architecture and libraries are both fertile breeding grounds for jargon and acronyms. there was also a disconnect in our understandings of who our users were, with the architects tending to concentrate on internal users, while the librarians were keen to include the perspectives of the academic staff and students who make up our core client base. these were minor challenges, and the experience of working with the enterprise architects was overall an interesting and positive one for the library. our collaboration validated mckay and parker's view that there is much crossover in the skillsets and mindsets of librarians and enterprise architects.19 both groups tended to work in systematic and analytical ways, which was helpful in removing some of the more emotive aspects that might have arisen through a more judgmental “assessment” process. the enterprise architects' job was to promote conformance with standards that are aspirational in many respects for the library.
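to make the visualization pipeline described above concrete: the following is an analogous sketch in python (the griffith team used a php script) of converting the inventory workbook, exported as csv, into the node/link json object that a d3.js force graph consumes. the column names are assumptions for illustration; the actual workbook layout is the one shown in figure 2.

```python
# analogous sketch of the conversion step: read the inventory workbook as csv
# and emit a node/link json object. column names are assumed, not griffith's.

import csv
import json

nodes, links, seen = [], [], set()

def add_node(name: str, kind: str) -> None:
    # data entities become one node type, applications the other
    if name not in seen:
        seen.add(name)
        nodes.append({"id": name, "kind": kind})

with open("inventory_workbook.csv", newline="") as f:
    for row in csv.DictReader(f):  # assumed columns: data_entity, system, role
        add_node(row["data_entity"], "data_entity")
        add_node(row["system"], "application")
        # role records whether the system is the system of entry, the system
        # of record, or a downstream reference system for that entity
        links.append({"source": row["data_entity"],
                      "target": row["system"],
                      "role": row["role"]})

print(json.dumps({"nodes": nodes, "links": links}, indent=2))
```

a d3 force layout consumes node and link arrays of roughly this shape, which is why the intermediate json step exists at all.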
discussion

the process described above was not without its challenges, including establishing a common language. enterprise architecture and libraries are both fertile breeding grounds for jargon and acronyms. there was also a disconnect in our understandings of who our users were, with the architects tending to concentrate on internal users, while the librarians were keen to include the perspectives of the academic staff and students who make up our core client base. these were minor challenges, and the experience of working with the enterprise architects was overall an interesting and positive one for the library. our collaboration validated mckay and parker's view that there is much crossover in the skillsets and mindsets of librarians and enterprise architects.19 both groups tended to work in systematic and analytical ways, which was helpful in removing some of the more emotive aspects that might have arisen through a more judgmental "assessment" process. the enterprise architects' job was to promote conformance with standards that are, in many respects, aspirational for the library. however, the collaborative nature of the process and the immediate usefulness of its outputs helped us to approach this as an opportunity to improve our internal practices as well as the services that we offer to library customers. the architects observed in return that library staff were very open-minded about the process; this had not necessarily always been their experience with other groups in the university. one reason for this may have been lts's efforts to communicate early with other library staff. before embarking on this work, we sent emails and provided verbal updates to all participants and their supervisors. these communications were clear about both the time commitment needed for workshops and surveys and the benefits we hoped to achieve.

short-term impacts in the library domain

the level of awareness and understanding in library technology services about ea concepts and methods is much higher than it was previously. our capacity to self-identify architectural issues is better as a result, and this is enabling us to be proactive rather than reactive. a recent example is a request from our solution architecture board (sab) to seek an exemption from our it advisory board (itab) for our proposed use of the niso circulation interchange protocol (ncip) to support interlibrary loan. while ncip is a niso standard that is widely used in libraries, it is not one of the integration mechanisms incorporated into the architecture standards. as a result of this request, we plan to develop a document for these it governance groups about all the library-specific data transfer protocols that we use: not just ncip, but also z39.50, the open archives initiative protocol for metadata harvesting (oai-pmh), the edifact standard for transferring purchasing information, and possibly others. it is in our interests to educate these important governance groups about integration methods commonly used in the library environment, since these are not well understood outside of our team. the baseline as-is application architecture diagram gives us a much better grasp of the complexity we are faced with. understanding this complexity is a prerequisite to controlling it. the diagram, and the process worked through to populate it, makes it easier to identify manual processes that should be automated and integrations that might be done more efficiently or effectively. for example, like most libraries, we still have many scheduled batch processes that we could potentially replace in the future with web services that provide real-time updates. the iserver platform is now an important source of data to support our decision-making, in terms of arriving at broad recommendations for replacing, reimplementing, or optimizing our systems as well as highlighting specific areas of concern. importantly, the process produced relative results, so that we can see across our application portfolio which systems are underperforming compared to others. this makes it easier to determine where the team should be putting its efforts and highlights areas where firmer approaches to vendor management could be applied. a practical example of this was our decision in late 2017 to review (and ultimately unbundle and replace) an e-journal statistics module that was underperforming compared to other modules within the same suite.
the outputs from this process are also helping library technology services communicate, both within our own team and with other stakeholders. the results of the application maturity assessment were included as part of a business case seeking project funding to upgrade our library management system and replace our interlibrary loans system. that funding bid was successful. while it is possible that the business case would have been approved regardless, a recommendation from the architects that the system needed to be replaced was likely more persuasive than the same recommendation coming solely from a library perspective. in our organizational context, enterprise architects are trusted by very senior executives; they are perceived as neutral and objective, and the processes that they use are understood to be systematic and data-driven.

longer-term impacts in an enterprise context

there are a number of longer-term impacts that may arise from this work. seeing the library's applications in a broader enterprise context is likely to lead to more questioning of the status quo and to a desire to investigate new ways to do things. in large organizations like universities, available enterprise systems can offer better functionality and more standardized ways of operating than library systems. financial systems are an obvious example, as are business intelligence tools. the canned and custom reports and dashboards within library systems meet a narrow set of requirements but fall short, for increasingly complex analytics, of enterprise data warehousing, emerging "data lake" technologies for less structured data, and sophisticated reporting tools. an enterprise approach also highlights where the same process is being done across different systems. for example, oai-pmh harvesting is a feature of multiple systems at griffith. traditionally each system provides its own feeds. our data repository, publications repository, and researcher profile system all provide oai-pmh harvesting endpoints for sending metadata to different aggregators. an alternative solution to explore could be to harvest all publications data from multiple systems into our corporate data warehouse (particularly if this evolved to provide more linked data functionality) and provide a single oai-pmh endpoint that could then be managed as a single service, as sketched below.
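for readers less familiar with the protocol, the sketch below shows the harvesting loop that such a consolidated service would support; the endpoint url is hypothetical, while the verb, metadataPrefix, and resumptionToken parameters come from the oai-pmh 2.0 specification.

```python
# minimal oai-pmh harvest loop against a hypothetical single endpoint
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # oai-pmh xml namespace
ENDPOINT = "https://warehouse.example.edu/oai"  # hypothetical consolidated service

params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
while True:
    root = ET.fromstring(requests.get(ENDPOINT, params=params).content)
    for record in root.iter(f"{OAI}record"):
        # an aggregator would store the metadata; here we just print identifiers
        print(record.find(f"{OAI}header").findtext(f"{OAI}identifier"))
    token = root.findtext(f"{OAI}ListRecords/{OAI}resumptionToken")
    if not token:  # an absent or empty token means the list is complete
        break
    params = {"verb": "ListRecords", "resumptionToken": token}
```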
the ea process has further raised our already high level of concern with the current library systems market. there has been a move in recent years towards larger, highly integrated "black box" solutions. while there have been some moves towards openness, for example through the development of apis, these are often rhetorical rather than practical. the pricing structures for products mean that we continue to pay for functionality that would not be required if we could integrate library applications with non-library enterprise tools in smarter ways. at griffith, the products that scored most highly in our maturity assessment in terms of business and technical fit were the less expensive, lightweight, browser-based, cloud-native tools designed to do one or two things really well. this suggests that strategies around a more loosely coupled microservices approach, such as that being developed through the folio open source library software initiative, will be worth exploring in future.20

conclusion

there are few documented examples of librarians working closely with enterprise architects in higher education or elsewhere. the goal of this case study is to encourage other librarians to learn more about architects' work practices and to seek opportunities to apply ea methods in the library systems space, for the benefit not just of the library but of the organization as a whole. as a single-institution case study, the applicability of this work may be limited in other environments. griffith has a long tradition of highly converged library and it operations; other organizations may face more structural barriers to entry if the library and it areas are not as naturally cooperative. a further obvious limitation relates to resourcing. the author of the cisti case study cautions that getting started in ea can be complex and resource-intensive. few libraries are likely to be in the position of cisti in having dedicated library architects, so working with others will be required. in many universities, work of this nature is outsourced to specialist consultants because of a lack of in-house expertise. at griffith university, we conducted this exercise entirely with in-house staff. a downside of this was that, despite our best efforts at the scoping stage, competing priorities in both areas meant that this work took far longer than we expected. in theory, external consultants could have guided the library through similar activities to produce similar outputs, and probably in a shorter timeframe. however, we would observe that the process has been just as important as the outputs; the knowledge, skills, and relationships that have been built will continue into the future. at cisti, investments in ea were assessed by the library as justified by the improvements in technology capability, strategic planning, and services to library users. the griffith experience validates this perspective. it is also important to note that ea work can and should be done in an iterative way. our experience suggests that some outputs can be delivered earlier than others and useful insights can be gleaned even from drafts. our local "ecosystem" of library applications, enterprise applications, and integrations between these different components must respond to changes in technologies; legal and regulatory frameworks; institutional policies and procedures; and other factors. it is therefore unrealistic to expect outputs from a process like this to remain current for long. assuming that the library's data and application architecture will always be a work in progress, it will continue to be worth the effort involved to build and maintain positive working relationships with the enterprise architects, who now have a deeper understanding of who we are and what we do.

acknowledgements

thank you to anna pegg, associate it architect; jolyon suthers, senior enterprise architect; colin morris, solution consultant; the library technology services team; all our library and learning services colleagues who participated in this initiative; and joanna richardson, library strategy advisor, for support and feedback during the writing of this article.
this work was previously presented at theta (the higher education technology agenda) 2017, auckland, new zealand.

references

1 griffith university, "griffith digital strategy 2020," 2016, https://www.griffith.edu.au/__data/assets/pdf_file/0026/365561/griffithuniversity-digital-strategy.pdf.
2 the open group, "togaf®, an open group standard," accessed june 4, 2018, http://www.opengroup.org/subjectareas/enterprise/togaf.
3 federation of enterprise architecture professional associations, "a common perspective on enterprise architecture," 2013, http://feapo.org/wp-content/uploads/2013/11/common-perspectives-on-enterprise-architecture-v15.pdf.
4 judith pirani, "manage today's it complexities with an enterprise architecture practice," educause review, february 16, 2017, https://er.educause.edu/blogs/2017/2/manage-todays-it-complexities-with-an-enterprise-architecture-practice.
5 stephen kevin anthony, "implementing service oriented architecture at the canada institute for scientific and technical information," the serials librarian 55, no. 1–2 (july 3, 2008): 235–53, https://doi.org/10.1080/03615260801970907.
6 kristiina hormia-poutanen, "the finnish national digital library: national library of finland developing a national infrastructure in collaboration with libraries, archives and museums," accessed march 24, 2018, http://travesia.mcu.es/portalnb/jspui/bitstream/10421/6683/1/fndl.pdf.
7 karl w. schornagel, "information technology strategic planning: a well-developed framework essential to support the library's current and future it needs. report no. 2008-pa-105," may 2, 2009, https://web.archive.org/web/20090502092325/https://www.loc.gov/about/oig/reports/2009/final%20it%20strategic%20planning%20report%20mar%202009.pdf.
8 joel willemssen, "information technology: library of congress needs to implement recommendations to address management," december 2, 2015, https://www.gao.gov/assets/680/673955.pdf.
9 rebecca parker and dana mckay, "it's the end of the world as we know it . . . or is it? looking beyond the new librarianship paradigm," in marketing and outreach for the academic library, ed. bradford lee eden (lanham, md: rowman and littlefield, 2016): 81–106.
10 new south wales state archives and records authority, "recordkeeping in brief 59—an introduction to enterprise architecture for records managers," 2011, https://web.archive.org/web/20120502184420/https://www.records.nsw.gov.au/recordkeeping/government-recordkeeping-manual/guidance/recordkeeping-in-brief/recordkeeping-in-brief-59-an-introduction-to-enterprise-architecture-for-records-managers.
11 anthony, "implementing service oriented architecture," 236–37.
12 jolyon suthers, "information and technology architecture," 2016, accessed april 6, 2018, https://www.caudit.edu.au/system/files/media%20library/resources%20and%20files/communities/enterprise%20architecture/ea2016%20joylon%20suthers%20caudit%20ea%20symposium%202016%20-%20it%20architecture%20v2_0.pdf.
13 the open group, "togaf® 9.1," 2011, 2018, http://pubs.opengroup.org/architecture/togaf9-doc/arch/index.html: part 1, introduction, section 2: core concepts.
14 orbus software, "iserver for enterprise architecture," accessed march 26, 2018, https://www.orbussoftware.com/enterprise-architecture/capabilities/.
15 ebsco, "gobi®," accessed june 5, 2018, https://gobi.ebsco.com/gobi.
16 google scholar, "google scholar support for libraries," accessed june 5, 2018, https://scholar.google.com/intl/en/scholar/libraries.html.
17 national library of australia, "trove," accessed june 5, 2018, https://trove.nla.gov.au/; australian national data service, "research data australia," accessed june 5, 2018, https://researchdata.ands.org.au/.
18 mike bostock, "d3.js—data-driven documents," accessed april 3, 2018, https://d3js.org/.
19 parker and mckay, "it's the end of the world," 88.
20 marshall breeding, "five key technology trends for 2018," computers in libraries 37, no. 10 (december 2017), http://www.infotoday.com/cilmag/dec17/breeding--five-key-technology-trends-for-2018.shtml.
information technology and libraries at 50: the 1990s in review

steven k. bowers
information technology and libraries | december 2018

steven k. bowers (sbowers@wayne.edu) is executive director, detroit area library network (dalnet).

i played some computer games — stored on data cassette tapes — in the 1980s. that was entertaining, but i never imagined the greater hold that computers would have on the world by the mid-1990s. i can remember getting my first email account in 1993 and looking at information on rudimentary web pages in 1996. i remember my work shifting from an electric typewriter to a bulky personal computer with dial-up internet access. eventually, this new computing technology became a prevalent part of my everyday life. this shift to a computer-driven reality had a major effect on libraries too. i was amazed by the end of the 1990s to be doing research on a university library catalog system connected with other institutions of higher education throughout the region, wondering at the expanded access to, and reach of, information. in my mind, due to computers and the internet, libraries were really connected at that time more than they had ever been. as i prepared this review of what we were writing about in ital in the 1990s, i had some fond memories of the advent of personal computers in my daily life and in the libraries i had access to. as we take a look back, i think it is interesting to see what we were doing then and how it is connected to what we are still working on today. along with the eventual disruption that the internet was to libraries, computers and online access also had the effect of greatly changing how libraries constructed our core research tools, especially the catalog.
prior to the 1990s, libraries had begun automation projects to move their catalogs to computer-based terminals, creating connections and access that were not previously possible with a card catalog. if we are still complaining about the design and function of the online public access catalog (opac) today, in the early 1990s we were discussing what its design and function should be, in a positive and optimistic way. in some ways it seems hard to recall the discussions of how to format data and display it to users. in other ways it seems like we are still having the same discussions, but our work has become more complex as we continue to restructure library data to become more open and accessible. while we were contemplating the design of online library catalogs, libraries were also discussing the implementation of networking and other information technology infrastructures. nevins and learn examined the changes in hardware, software, and telecommunications at the time and predicted a more affordable cost model, with distributed personal computers connected through networks enhancing library automation cooperation.1 they expanded the discussion to include consideration of copyright and intellectual property, security, authorization, and a need for information literacy in the form of user navigation, all key to what we are doing today. beyond catalogs, there was the real adoption of the internet itself. by the early 1990s there was growing enthusiasm for accessing and exploring the internet.2 this created a need for libraries to learn about the internet and instruct others on how to use it. as late as 1997, however, even search engines were still being introduced and defined, and using the internet or searching the world wide web was still a new concept that was not fully understood by many people. at their most basic, search engines were simply defined as indexing and abstracting databases for the web.3 it is interesting that library catalogs were developed separately from the development of search engines, and we are still trying to get our metadata out of our closed systems and open to the rest of the web. in 1991, kibirige examined the potential impact of this new connectivity on library automation. he posited that "one of the most significant change agents that will pervade all other trends is the establishment and regular use of high-speed, fiber optic communication highways."4 his article in ital provides a prescient overview of much of what has played out in technology, not just in libraries. he noted the need for disconnected devices to become tools to access full-text information remotely.5 perhaps most important, he noted the need for librarians to become experts in non-library technology, to keep pace with developments outside of the profession. this admonition is still important to keep in mind today. at the time, however, libraries were working on the basics of converting records from online bibliographic utility systems running on mainframes to a more useful format for access on a personal computer, let alone thinking about transforming library metadata into linked data that can be accessed by the rest of the internet. so we keep moving forward. later in the decade, libraries began to think about the library catalog as a "one stop shop" for information. in 1997, caswell wrote about new work to integrate local content, digital materials, and electronic resources, all into one search interface.
initially the discussion was more technical in nature, but caswell provided an early concept for providing a single access point to all of the content that the library has, print and electronic, which was a step forward from just listing the books in the catalog.6 at the time we were still far away from our current concept of a full discovery system with access to millions of electronic resources that may well surpass the print collections of a library. eventually more discussion developed around the importance of user experience and usability for the design of catalogs and websites. catalogs were examined in parallel with the structure of library metadata, and both were seen as important to the retrievability of library materials. human-machine interaction was starting to be examined on the staff side of systems, and this would eventually become part of examining public interface usability as well. outlining an agenda for redesigning online catalogs, buckland summarized this new technological development work for libraries by noting that "sooner or later we need to rethink and redesign what is done so that it is not a mechanization of paper but fully exploits the capabilities of the new technology."7 more exciting, by the end of the 1990s we were seeing usability studies for specific populations and those with accessibility difficulties. systems were in wide enough use that libraries began to examine their usefulness to more audiences. beyond our systems, the technology of our actual collections was changing. new network connectivity combined with new hardware led to new formats for library resources, specifically digital and electronic resources. in 1992, geraci and langschied summarized these changes, stating that "what is new for the 1990s is the complication of a greater variety of electronic format, software, hardware, and network decisions to consider."8 they also expanded the conversation to include data in all forms, and data sets of various kinds, well beyond traditional library materials. this is an important evolution, as libraries worked to shift their operations, identities, and curatorial practices. geraci and langschied defined data by type, including social data, scientific data, and humanities data. they called most importantly for libraries to include access to this varied data to continue the role of libraries providing access to information, as they cautioned that information seekers were already beginning to bypass libraries and look for such information from other sources. libraries were beginning to lose ground as the gatekeepers of information and needed to shift to providing online access and open data themselves. the early 1990s were an exciting time for preservation, as discussion was moving from converting materials to microforms to digitization. in 1990, lesk compared the two formats and had hope for a promising digital future.9 thank goodness he was on target for sharing resources and creating economical digital copies, even if he did not completely predict the eventual shift to reliance on electronic resources that many research libraries have now made. lesk also noted the importance of text recognition, optical character recognition (ocr), and text formatting in ascii. others focused on digital file formats and the planning and execution of creating digital collections. digitization practices were developing and the need to formalize practice was becoming evident.
the same year, lynn outlined the relationship between digital resources and their original media, highlighting preservation, capture, storage, access, and distribution.10 by the late 1990s there were more targeted discussions about the benefits of digitizing resources to provide not only remote access, but access to archival materials specifically. in 1996, alden provided a good primer on everything to consider when doing digitization projects within budget constraints.11 by the mid-1990s, karen hunter was excited to extol the promises of the dissemination of information electronically, calling the high performance computing and high speed networking applications act of 1993 "[a] formidable vision and goal. real-time access to everything and a laser printer in every house. the 1990s equivalent to a chicken in every pot."12 hunter's article is a good overview of where libraries were at working with electronic publications and online access in the early 1990s. halcyon enssle's piece on moving reserves to online access opened with a great summary of where much of library access was headed: "the virtual library, libraries without walls, the invisible user . . . these are some of the terms getting used to describe the library of the future . . . ."13 eventually, by the end of the decade, we even learned to start tracking how our new online libraries were being used, applying our knowledge of print resource usage to our new online collections. in 1995, laverna saunders had already developed a new definition of what a library was, and how the transformation of libraries from physical warehouses to providers of access to online content would affect workflows in libraries. as defined by saunders, "the virtual library is a metaphor for the networked library, consisting of electronic and digital resources, both local and remote."14 not a bad definition more than 20 years later. saunders asked pertinent questions, such as which resources would be best in print vs. online, what print materials should be retained, and which resources and collections libraries should digitize themselves. the broader view provided was that these changes would affect not just collections but the entire operation of libraries. there would still be work to do in libraries, but changes in the work were necessary to address shifting technology and the composition of collections. by the end of the decade there was new work to assess use of electronic resources, extend virtual reference services, and stretch information literacy to include technology instruction. in 1998, kopp wrote about the promising future of library collaborations. consortia were well established in prior decades and they were seeing a resurgence. kopp noted that just as consortia had been built around support for new shared utilities in the 1970s and 1980s, in the 1990s they were finding a new purpose in the new networking of the internet and the possibilities of greater connectivity and collaborations in the online environment.15 beyond cataloging and automation technology, it is interesting to note that even in the new online environment that was forming in the 1990s, many consortia formed at the time to share print resources. this may have been conversely related to libraries shifting from complete print collections to online holdings that many may have felt were more ephemeral, or maybe money was spent on new technological infrastructures and less on library materials.
resource sharing of print materials is still an important part of libraries working together to provide access to information, and since the time that kopp wrote about consortia and growing networked collaborations, there has also been a growing development of sharing electronic resources. a large part of the work of many consortia today revolves around purchasing of electronic resources, but in the late 1990s libraries were just beginning to get into purchasing commercial electronic resources.16 there were lots of ital articles in the 1990s looking at the future of libraries and technology, and some specific articles dedicated to prognostication. in 1991, looking into the future, kenneth e. dowlin shared a vision for public libraries in 2001. he predicted that libraries would still exist, but it is noteworthy that at the time the future existence of libraries was questioned by many. dowlin did predict change for libraries, including the confluence of new media formats, computing, and, yes, still books. he stated what time has now confirmed: "the public wants them all."17 he had lots of other interesting ideas as well; his article is worth a second look. another fun take on the future was a special section on science fiction from 1994 considering future possibilities in information technology and access. in one piece, david brin noted, "nobody predicted that the home computer would displace the mega-machine and go on to replace the rifle over the fireplace as freedom's great emancipator, liberating common citizens as no other technology has since the invention of the plow."18 an interesting observation, even if the computer has now been replaced by phones in our pockets or other fantastic wearable technologies. by the end of the 1990s, libraries had been greatly transformed by technology. many libraries had automated, workflows continued to adjust in all areas of library work, and most libraries had at least partially incorporated elements of using the internet along with providing computer access to library users. some libraries were already moving through the change from print to electronic library resources. specific web applications and websites were also being developed and used for and by libraries. these eventually have matured into smarter systems that can provide better access to our collections and smarter assessment of our resource usage, for both print and electronic materials. as a whole, the 1990s are an exciting time to review when looking at the intersection of information technology and libraries. as information dissemination moved to an online environment, within and outside of the profession, the future existence of libraries began to be questioned. as we now know, libraries still play an important role in providing access to information.

notes

1 kate nevins and larry l. learn, "linked systems: issues and opportunities (or confronting a brave new world)," information technology and libraries 10, no. 2 (1991): 115.
2 constance l. foster, cynthia etkin, and elaine e. moore, "the net results: enthusiasm for exploring the internet," information technology and libraries 12, no. 4 (1993): 433-6.
3 scott nicholson, "indexing and abstracting on the world wide web: an examination of six web databases," information technology and libraries 16, no. 2 (1997): 73-81.
4 harry m.
kibirige, “information communication highways in the 1990s: an analysis of their potential impact on library automation,” information technology and libraries 10, no. 3 (1991): 172. 5 kibirige, “information communication highways in the 1990s,” 175. 6 jerry v. caswell, “building an integrated user interface to electronic resources,” information technology and libraries 16, no. 2 (1997): 63-72. 7 michael k. buckland, “agenda for online catalog designers,” information technology and libraries 11, no. 2 (1992): 162. 8 diane geraci and linda langschied, “mainstreaming data: challenges to libraries,” information technology and libraries 11, no. 1 (1992): 10. 9 michael lesk, “image formats for preservation and access,” information technology and libraries 9, no. 4 (1990): 300-308. 10 m. stuart lynn, “digital imagery, preservation, and access--preservation and access technology: the relationship between digital and other media conversion processes: a structured glossary of technical terms,” information technology and libraries 9, no. 4 (1990): 309-336. 11 susan alden, “digital imaging on a shoestring: a primer for librarians,” information technology and libraries 15, no. 4 (1996): 247-50. 12 karen a. hunter, “issues and experiments in electronic publishing and dissemination,” information technology and libraries 13, no. 2 (1994): 127. 13 halcyon r. enssle, “reserve on-line: bringing reserve into the electronic age,” information technology and libraries 13, no. 3 (1994): 197. 14 laverna m. saunders, “transforming acquisitions to support virtual libraries,” information technology and libraries 14, no. 1 (1995): 41. 15 james j. kopp, “library consortia and information technology: the past, the present, the promise,” information technology and libraries 17, no. 1 (1998): 7-12. 16 international coalition of library consortia, “guidelines for statistical measures of usage of web-based indexed, abstracted, and full text resources,” information technology and libraries 17, no. 4 (1998): 219-21; charles t. townley and leigh murray, “use-based criteria the 1990s in review| bowers 14 https://doi.org/10.6017/ital.v37i4.10821 for selecting and retaining electronic information: a case study,” information technology and libraries 18, no. 1 (1999): 32-9. 17 kenneth e. dowlin, “public libraries in 2001,” information technology and libraries 10, no. 4 (1991): 317. 18 david brin, “the good and the bad: outlines of tomorrow,” information technology and libraries 13, no. 1 (1994): 54. evaluating the impact of the long-s upon 18th-century encyclopedia britannica automatic subject metadata generation results articles evaluating the impact of the long-s upon 18th-century encyclopedia britannica automatic subject metadata generation results sam grabus information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12235 sam grabus (smg383@drexel.edu) is an information science phd candidate at drexel university’s college of computing and informatics, and research assistant at drexel’s metadata research center. this article is the 2020 winner of the lita/ex libris student writing award. © 2020. abstract this research compares automatic subject metadata generation when the pre-1800s long-s character is corrected to a standard < s >. the test environment includes entries from the third edition of the encyclopedia britannica, and the hive automatic subject indexing tool. 
a comparative study of metadata generated before and after correction of the long-s demonstrated an average of 26.51 percent potentially relevant terms per entry omitted from results if the long-s is not corrected. results confirm that correcting the long-s increases the availability of terms that can be used for creating quality metadata records. a relationship is also demonstrated between shorter entries and an increase in omitted terms when the long-s is not corrected.

introduction

the creation of subject metadata for individual documents is long known to support standardized resource discovery and analysis by identifying and connecting resources with similar aboutness.1 in order to address the challenges of scale, automatic or semi-automatic indexing is frequently employed for the generation of subject metadata, particularly for academic articles, where the abstract and title can be used as surrogates in place of indexing the full text. when automatically generating subject metadata for historical humanities full texts that do not have an abstract, anachronistic typographical challenges may arise. one key challenge is that presented by the historical "long-s" < ſ >. in order to account for these idiosyncrasies, there is a need to understand the impact that they have upon the automatic subject indexing output. addressing this challenge will help librarians and information professionals to determine whether or not they will need to correct the long-s when automatically generating subject metadata for full-text pre-1800s documents. the problem of the long-s in optical character recognition (ocr) for digital manuscript images has been discussed for decades.2 many scholars have researched methods for correcting the long-s through the use of rule-based algorithms or dictionaries.3 while the problem of the long-s is well known in the digital humanities community, automatic subject metadata generation for a large corpus of pre-1800s documents is rare, as is research about the application and evaluation of existing automatic subject metadata generation tools on 18th-century documents in real-world information environments. the impact of the long-s upon automatic subject metadata generation results for pre-1800s texts has not been extensively explored. the research presented in this paper addresses this need. the paper reports results from basic statistical analysis and visualization of helping interdisciplinary vocabulary engineering (hive) tool automatic subject indexing results, before and after the correction of the historical long-s in the 3rd edition of the encyclopedia britannica. background work was conducted over the summer and fall of 2019, and the research presented was conducted during winter 2020. the work was motivated by current work on the "developing the data set of nineteenth-century knowledge" project, a national endowment for the humanities collaborative project between temple university's digital scholarship center and drexel university's metadata research center.
the grant is part of a larger project, temple university's "19th-century knowledge project," which is digitizing four historical editions of the encyclopedia britannica.4 the next section of this paper presents background covering the historical encyclopedia britannica data, the automatic subject metadata generation tool used for this project, a brief background of "the long-s problem," and the distribution of encyclopedia entry lengths in the 3rd edition. the background section will be followed by research objectives and the method supporting the analysis. next, the results are presented, demonstrating the prevalence of terms omitted from the automatic subject metadata generation results if the long-s is not corrected to a standard small < s > character, as well as the impact of encyclopedia entry length upon these results. the results are followed by a contextual discussion, and a conclusion that highlights key findings and identifies future research.

background

indexing for the 19th-century knowledge project

the 19th-century knowledge project, an neh-funded initiative at temple university, is fully digitizing four historical editions of the encyclopedia britannica (the 3rd, 7th, 9th, and 11th). the long-term goal of the project is to analyze the evolving conceptualization of knowledge across the 19th century.5 the 3rd edition of the encyclopedia britannica (1797) is the earliest edition being digitized for this project. the 3rd edition consists of 18 volumes, with a total of 14,579 pages, and individual entries ranging from four to over 150,000 words. for each individual entry, researchers at temple have created individual tei-xml files from the ocr output. in order to enrich accessibility and analysis across this digital collection, the knowledge project will be adding controlled vocabulary subject headings into the tei headers of each encyclopedia entry xml file. considering the size of this corpus, both in terms of entry length and number of entries, automatic subject metadata generation will be required for the creation of this metadata. the knowledge project will employ controlled vocabularies to replace or complement naturally extracted keywords for this process. using controlled vocabularies adheres to metadata semantic interoperability best practices, ensures representation consistency, and helps to bypass linguistic idiosyncrasies of these 18th- and 19th-century primary source materials.6 we selected two versions of the library of congress subject headings (lcsh) as the controlled vocabularies for this project. lcsh was selected due to its relational thesaurus structure, multidisciplinary nature, and continued prevalence in digital collections due to its expressiveness and status as the largest general indexing vocabulary.7 in addition to the headings from the 2018 edition of lcsh, headings from the 1910 lcsh are also implemented in order to provide a more multi-faceted representation, using temporally relevant terms that may have been removed from the contemporary lcsh. the tool applied for this process is hive, a vocabulary server and automatic indexing application.8 hive allows the user to upload a digital text or url and select one or more controlled vocabularies; it then performs automatic subject indexing by mapping naturally extracted keywords to the available controlled vocabulary terms.
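as a rough illustration of this extract-then-map pattern (a sketch of the general approach, not of hive's internal implementation), the example below extracts candidate keywords with the third-party rake-nltk package and keeps only those that match a toy vocabulary; the lcsh_sample set is a hypothetical stand-in for a real controlled vocabulary.

```python
# extract keywords, then keep only those that map to a controlled vocabulary
from rake_nltk import Rake  # assumes rake-nltk and its nltk data are installed

lcsh_sample = {"sugar", "yeast", "distillation"}  # hypothetical lcsh slice

rake = Rake()  # rake uses stopwords as phrase delimiters
rake.extract_keywords_from_text(
    "rum is distilled from sugar cane juice or molasses with yeast."
)
keywords = rake.get_ranked_phrases()  # lowercased candidate phrases

subject_terms = [kw for kw in keywords if kw in lcsh_sample]
print(subject_terms)  # the terms a hive-style indexer could return
```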
hive was initially launched as an imls linked open vocabulary and indexing demonstration project in 2009. since that time, hive has been further developed, with the addition of more controlled vocabularies, user interface options, and the rake keyword extraction algorithm. the rake keyword extraction algorithm was selected for this project after a comparison of topic relevance precision scores for three keyword extraction algorithms.9

the long-s problem

early in our metadata generation efforts, we discovered that the 3rd edition of the encyclopedia britannica employs the historical long-s. originating in early roman cursive script, the long-s was used in typesetting up through the 18th century, both with and without a left crossbar. by the end of the 18th century, the long-s fell out of use with printers.10 as outlined by lexicographers of the 17th and 18th centuries, the rules for using the long-s were frequently vague, complicated, inconsistent over time, and varied according to language (english, french, spanish, or italian).11 these rules specified where in a word the long-s should be used instead of a short < s >, whether it is capitalized, where it may be used in proximity to apostrophes, hyphens, and the letters < f >, < b >, < h >, and < k >; and whether it is used as part of a compound word or abbreviation.12 this is further complicated by the inclusion of the half-crossbar, which occasionally results in two consequences: (a) the long-s may be interpreted by ocr as an < f >, and (b) < b > and < f > may be interpreted by ocr as a long-s. figure 1 shows an example from the 3rd edition entry on russia, in which the original text specifies "of" (line 1 in top figure), yet the ocr output has interpreted the character as a long-s. the long-s may also occasionally be interpreted by the ocr as a lowercase < l >, such as the "univerlity of dublin" in the 3rd edition entry on robinson (the most rev sir richard). these complications and inconsistencies are challenges when developing python rules for correcting the long-s in an automated way, and even preexisting scripts will need to be adapted for individual use with a particular corpus.

figure 1. example from the 3rd edition entry on russia, comparing the original use of a letter < f > in "of" to the ocr output of the same passage, which mistakenly interprets the character as a long-s.

despite the transition away from the long-s towards the end of the 18th century, the 3rd edition of the encyclopedia britannica (published in 1797) implements the long-s throughout, with approximately 100,594 instances of the long-s in the ocr output. when performing metadata generation with the hive tool on the ocr output for an entry, the long-s is most often interpreted by the automatic metadata generation tool as an < f >, which can result in (a) inaccurate keyword extraction (e.g., russians → ruffians), and (b) essential topics becoming unidentifiable when mapping extracted keywords to controlled vocabulary terms, so that hive omits them from the results.
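a minimal sketch of one such correction rule, assuming the ocr output preserves the ſ character (u+017f) and that a reference wordlist is available to arbitrate the "of"-style cases where the character actually stands for an < f >:

```python
# normalize long-s tokens: prefer < s >, fall back to < f > when only the
# f-variant is a known word (the "of" case); the wordlist is an assumption
import re

LONG_S = "\u017f"  # ſ

def normalize_long_s(text, wordlist):
    def fix(match):
        token = match.group(0)
        s_variant = token.replace(LONG_S, "s")
        f_variant = token.replace(LONG_S, "f")
        if s_variant.lower() in wordlist:
            return s_variant
        if f_variant.lower() in wordlist:
            return f_variant
        return s_variant  # historically the most common reading

    return re.sub(rf"\w*{LONG_S}\w*", fix, text)

vocab = {"russians", "sugar", "of"}
print(normalize_long_s("ruſſians boil ſugar", vocab))  # russians boil sugar
print(normalize_long_s("oſ", vocab))                   # of
```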
figure 2 provides a truncated view of long-s words in the 3rd edition entry on rum, which are subsequently removed from the pool of automatically extracted keywords when performing the automatic subject indexing sequence in hive. because keyword extraction algorithms are largely dependent upon term frequencies, automatic subject indexing for an entry on rum may be substantially hindered when meaningful and frequently occurring words such as sugar and yeast are removed.

figure 2. examples of the long-s in the 3rd edition encyclopedia britannica entry on rum.

using this example entry, the automatic subject indexing results were compared using python, to determine which terms only appear when the long-s has been corrected to the standard < s >. the comparison showed that 16 total terms no longer appeared in the results when the long-s was not corrected: ten terms using the 2018 lcsh, and six terms using the 1910 lcsh. these omitted results included the terms sugar and yeast. the next section will discuss the encyclopedia entry word count for this corpus, and the possible impact that this may have upon automatic subject indexing between corrected and uncorrected long-s instances.

encyclopedia entry lengths

consistent with other encyclopedia britannica editions in the 18th and 19th centuries, the encyclopedia entries in the 3rd edition vary substantially in length. a convenience sample of 3,849 3rd edition entries ranging in length from 2 to 202,848 words demonstrated an arithmetic mean of 826.60 and a median word count of 71. as shown in figure 3, this indicates a significant skew towards shorter entry lengths. for the vast majority of encyclopedia entries in this corpus, a low total word count may amplify the impact of the long-s upon automatic subject indexing results, given the importance of term availability and frequency for keyword extraction algorithms.

figure 3. scatterplot of word count for a convenience sample of 3,849 3rd edition encyclopedia britannica entries.

large-scale metadata generation requires time, labor, and resources, and it becomes more costly when accounting for the complications of correcting the long-s for a particular corpus. library and information professionals working with digital humanities resources will need to understand the impact of correcting, or not correcting, the long-s in the corpus before designating resources and developing a protocol for generating the automatic or semi-automatic metadata for full-text resources. this includes understanding whether or not the length of each individual document will affect the degree of long-s impact upon the results. this challenge, and the issues reviewed above, are addressed in the research presented below.

objectives

the overriding goal of this work is to determine the prevalence of omitted terms in automatic subject indexing results when the long-s is not corrected in the 3rd edition entries of the encyclopedia britannica. research questions:
1. what is the average number of terms that are omitted from automatic subject indexing results when the long-s is not corrected to a standard < s >?
2. how does the encyclopedia entry length affect the number of terms that are omitted when the long-s is not corrected to a standard < s >?
this analysis will approach these goals by performing a comparative analysis of automatic subject indexing results to determine the number of terms that are omitted from the results when the long-s is not corrected to a standard letter < s >. basic descriptive statistics are generated to determine central tendency.
the quantity of terms omitted is then compared with encyclopedia entry word counts. these objectives were shaped by collaboration between drexel university's metadata research center and temple university's digital scholarship center. the next section of this paper will report on the methods and steps taken to address these objectives.

methods

we approached this research by performing a comparative analysis of subject metadata generated both before and after the correction of the historical long-s in the 3rd edition of the encyclopedia britannica. the hive tool was used to automatically generate the subject metadata. descriptive statistics were applied, and visualizations produced from the results were also examined to identify trends.

figure 4. the 30 3rd edition encyclopedia britannica entries randomly selected for this study, sorted in ascending order by their word counts.

the protocol for performing this research involved the following steps:
1. compile a sample for testing:
1.1. a random sample of 30 encyclopedia entries was identified from a convenience sample of entries that comprise the letter s volumes of the 3rd edition. the entries range in length from 6 to 6,114 words. the median word count for entries in this sample is 99 words.
1.2. the sample of terms selected for this study and their respective word counts are visualized in figure 4.
1.3. for each entry, the long-s terms in the original xml file were extracted to a list.
2. perform the automatic subject indexing sequence upon the entries to generate lists of terms:
2.1. using the 2018 and 1910 versions of the lcsh.
2.2. with fixed maximum subject heading results set to 40: 20 maximum terms returned with the 2018 lcsh, and 20 maximum terms returned with the 1910 lcsh.
2.3. before long-s correction and after long-s correction, using the oxygen xml editor tei-to-txt transformation.
3. perform an outer join on python data frames, between terms generated when the long-s has been corrected and terms generated when the long-s has not been corrected. the resulting left outer join list displays terms that are omitted from the automatic indexing results if the long-s is not corrected to a standard small < s > (a minimal sketch of this join appears after this list). the quantity of terms omitted is recorded for comparison.
4. analysis: descriptive statistics were generated to determine central tendency for the number and percentage of words omitted when the long-s is not corrected. the quantity of terms omitted is also visualized in a continuous scatterplot with the corresponding word counts, to demonstrate that the quantity of terms omitted when the long-s is not corrected seems to relate to the length of the document being automatically classified.
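step 3 amounts to a few lines of pandas; the two term lists below are illustrative stand-ins for a single entry's hive outputs.

```python
# left-only rows of an outer join = terms lost when the long-s is uncorrected
import pandas as pd

corrected = pd.DataFrame({"term": ["sugar", "yeast", "rum", "molasses"]})
uncorrected = pd.DataFrame({"term": ["rum", "molasses"]})

merged = corrected.merge(uncorrected, on="term", how="outer", indicator=True)
omitted = merged.loc[merged["_merge"] == "left_only", "term"]

print(list(omitted))  # terms omitted for this entry
print(len(omitted))   # the per-entry count recorded for step 4
```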
results

the results report the prevalence of omitted terms when the long-s is not corrected to a standard < s >, as well as a visualization of the number of terms omitted as they relate to the encyclopedia entry length. for each of the 30 sample entries automatically indexed with hive, a fixed maximum of 40 terms was returned: a maximum of 20 terms using the 2018 lcsh, and a maximum of 20 terms using the 1910 lcsh. as seen in table 1, central tendency is measured using the arithmetic mean and median, along with the standard deviation and range. the average number of terms omitted from an entry's results is 6.73, and the average percentage of terms omitted from an entry's results is 26.51 percent, with the 2018 and 1910 editions of lcsh performing at similar rates. the full results are displayed in appendix a.

table 1. measures of centrality, standard deviation, range, and percentage for quantity of terms omitted when the long-s is not corrected to a standard < s >, rounded to the hundredth. for each entry, a maximum of 40 terms were returned: 20 using 2018 lcsh and 20 using 1910 lcsh. the total results returned vary according to the entry length. these totals are reported in appendix b. (n = 30 entries.)

                                      both vocabularies   2018 lcsh   1910 lcsh
average, terms omitted                6.73                3.67        3.07
median, terms omitted                 5                   3           2
standard deviation                    6.53                3.84        3.17
range, terms omitted                  0-24                0-13        0-11
average percentage, omitted terms     26.51%              27.51%      24.28%
median percentage, omitted terms      22.36%              20.00%      19.09%

for each entry in the sample, the results in appendix a display the total words omitted when the long-s is not corrected, the number of 2018 lcsh terms omitted, the number of 1910 lcsh terms omitted, and the encyclopedia entry word count. figure 5 visualizes the total number of terms omitted for each entry when the long-s is not corrected, demonstrating an increase in terms omitted for entries with lower word counts. these results are broken down by vocabulary in figure 6, demonstrating that both vocabularies used to generate these results indicate a significant increase in omitted terms for shorter entries.

figure 5. number of automatic subject indexing terms that are omitted when the long-s is not corrected to a standard < s >, as compared by encyclopedia entry word count.

figure 6. number of automatic subject indexing terms that are omitted when the long-s is not corrected to a standard < s >, as compared by encyclopedia entry word count, separated by controlled vocabulary version.

discussion

the analysis above presents measures of centrality for the quantity of terms omitted if the long-s is not corrected to a standard < s > prior to automatic subject indexing using hive, as well as a visualization representing the relationship between encyclopedia entry word count and number of terms omitted. although researchers have identified challenges with the long-s and have focused a great deal on the technologies and methods used to correct it, there is still limited work looking at the results of not correcting the long-s character when performing an automatic subject indexing sequence. this research demonstrated an average of 6.73 potentially relevant terms omitted from automatic indexing results when the long-s is not corrected, accounting for an average of 26.51 percent of the total results, with an approximately equal distribution of omitted terms across the two controlled vocabulary versions used. when the quantity of terms omitted is visualized using a continuous scatterplot, the results also demonstrated a significant increase in omitted terms for shorter entries, with longer entries less affected. these results reflect the impact of term frequency and total word count in keyword extraction and automatic subject indexing, with longer documents having a greater pool of total terms from which to identify key terms.
considering the complexity and similarity of the typographical characters in the original manuscript, the ocr process for this corpus occasionally mistakes the letters < s >, < f >, < r >, and < l > for one another. as a result, an occasional long-s word in this study did not originally contain an < s > (e.g., sor instead of for). correction of these long-s ocr errors requires the development of a dictionary-based script.
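the project's dictionary-based script is not reproduced in the paper; the following is only an illustrative sketch of the approach, with a stand-in wordlist and a hypothetical confusion set covering the s/f substitution noted above:

    from itertools import product

    WORDLIST = {"some", "for", "sort", "last", "fast"}  # stand-in dictionary
    # hypothetical ocr confusion set: each character maps to its candidate
    # readings, original reading first so correct words pass through unchanged
    CONFUSIONS = {"ſ": ["s", "f"], "s": ["s", "f"], "f": ["f", "s"]}

    def candidates(token):
        """yield every spelling obtained by resolving each ambiguous character."""
        options = [CONFUSIONS.get(ch, [ch]) for ch in token]
        for combo in product(*options):
            yield "".join(combo)

    def correct(token):
        """return the first candidate found in the dictionary, else the token."""
        for cand in candidates(token):
            if cand in WORDLIST:
                return cand
        return token

    print(correct("ſome"))  # long-s rendered "ſome" -> "some"
    print(correct("sor"))   # the s-for-f error noted above -> "for"

a real script would need a period dictionary and frequency-based tie-breaking, but candidate generation against a wordlist is the core of any dictionary-based pass.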
an additional complication of this research is that the corrected ocr output for the encyclopedia entries still contains a few errors not related to the long-s, which prevent the mapping of a term to any controlled vocabulary term (e.g., in the entry on sepulchre, the ocr output for the term palestine was palestinc). these results are specific to this particular corpus of 3rd edition encyclopedia britannica entries, but it is very likely that testing another set of pre-1800s documents containing the long-s would also illustrate that, for best results with any algorithm or tool, the long-s needs to be corrected. the results are also specific to the two versions of the lcsh used, the 1910 lcsh and the 2018 lcsh, which are available in the hive tool. the 1910 version is key for the time period being studied, and the more contemporary 2018 version has supported additional analysis of the impact of the long-s. both of these vocabularies are important to the larger 19th-century knowledge project. it should be noted that while the lcsh is updated weekly, we were limited to what is available via the hive tool, and any discrepancies that may be found with the 2020 lcsh will very likely have a minimal effect upon metadata generation results. the 2020 lcsh will be incorporated into hive soon and can be explored in future research.

conclusion and next steps

the objective of this research was to determine the impact of correcting the long-s in pre-1800s documents when performing an automatic metadata generation sequence using keyword extraction and controlled vocabulary mapping. this was accomplished by performing an automatic subject indexing sequence using the hive tool, followed by a basic statistical analysis to determine the quantity of terms omitted from the results when the long-s is not corrected to a standard < s >. the number of omitted terms was also compared with the encyclopedia entry word count and visualized to demonstrate a significant increase in omitted terms for shorter encyclopedia entries. the study was conclusive in confirming that the correction of the long-s is a critical part of our workflow. the significance of this research is that it demonstrates the necessity of correcting the long-s prior to performing automatic subject indexing on historical documents. beyond the correction of the long-s, the larger next steps for this project are to continue to explore automatic metadata generation for this corpus. these next steps include the comparison of results using contemporary vs. historical vocabularies, streamlining a protocol for bulk classification procedures, and the integration of terms into the tei-xml headers. the research presented here can inform other digital humanities and even science-oriented projects, where researchers may not be aware of the impact of the long-s on automatic metadata generation, not only for subjects but also for named entities, particularly when automatic approaches with controlled vocabularies are desired.

acknowledgements

the author thanks dr. jane greenberg and dr. peter logan for their guidance. the author acknowledges the support of neh grant #haa-261228-18.

appendix a

    entry term                total words omitted   2018 lcsh terms omitted   1910 lcsh terms omitted   entry word count
    sardis                    24                    13                        11                        381
    suction                   24                    13                        11                        38
    stylites, pillar saints   19                    13                        6                         199
    shadwell                  14                    10                        4                         211
    salicornia                13                    6                         7                         254
    sepulchre                 11                    3                         8                         348
    sitta nuthatch            9                     5                         4                         620
    sprat                     9                     3                         6                         475
    serapis                   8                     5                         3                         587
    strada                    8                     1                         7                         189
    shoad                     7                     4                         3                         463
    sign                      7                     5                         2                         68
    shooting                  6                     3                         3                         6114
    strata                    6                     3                         3                         2920
    stewartia                 5                     4                         1                         72
    subclavian                5                     3                         2                         20
    schweinfurt               4                     2                         2                         84
    scroll                    4                     2                         2                         45
    spalatro                  4                     3                         1                         99
    special                   4                     3                         1                         24
    samogitia                 3                     2                         1                         112
    shakespeare               3                     0                         3                         3855
    sinapism                  2                     1                         1                         25
    sect                      1                     1                         0                         20
    severino                  1                     1                         0                         38
    shaddock                  1                     1                         0                         6
    scarlet                   0                     0                         0                         65
    shallop, shalloop         0                     0                         0                         42
    soldanella                0                     0                         0                         56
    spoletto                  0                     0                         0                         99

appendix b (n = 30 entries)

                            average terms returned    median terms returned
    corrected               24.77 / 40 possible       28 / 40 possible
    uncorrected             26.47 / 40 possible       29 / 40 possible
    2018 lcsh corrected     14.10 / 20 possible       19 / 20 possible
    2018 lcsh uncorrected   13.47 / 20 possible       18.5 / 20 possible
    1910 lcsh corrected     11.27 / 20 possible       11 / 20 possible
    1910 lcsh uncorrected   10.13 / 20 possible       9 / 20 possible

endnotes

1 liz woolcott, "understanding metadata: what is metadata, and what is it for?," routledge (november 17, 2017), https://doi.org/10.1080/01639374.2017.1358232; koraljka golub et al., "a framework for evaluating automatic indexing or classification in the context of retrieval," journal of the association for information science and technology 67, no. 1 (2016), https://doi.org/10.1002/asi.23600; lynne c. howarth, "metadata and bibliographic control: soul-mates or two solitudes?," cataloging & classification quarterly 40, no. 3-4 (2005), https://doi.org/10.1300/j104v40n03_03.
2 a. belaid et al., "automatic indexing and reformulation of ancient dictionaries" (paper presented at the first international workshop on document image analysis for libraries, palo alto, ca, 2004), https://doi.org/10.1109/dial.2004.1263264.
3 beatrice alex et al., "digitised historical text: does it have to be mediocre?" (paper presented at konvens 2012 (lthist 2012 workshop), vienna, september 21, 2012); ted underwood, "a half-decent ocr normalizer for english texts after 1700," the stone and the shell, december 10, 2013, https://tedunderwood.com/2013/12/10/a-half-decent-ocr-normalizer-for-english-texts-after-1700/.
4 "nineteenth-century knowledge project" (github repository), 2020, https://tu-plogan.github.io/.
5 "nineteenth-century knowledge project."
6 marcia lei zeng and lois mai chan, "metadata interoperability and standardization: a study of methodology, part ii," d-lib magazine 12, no. 6 (2006); g. bueno-de-la-fuente, d. rodríguez mateos, and j.
greenberg, "chapter 10: automatic text indexing with skos vocabularies in hive" (elsevier ltd, 2016); sheila bair and sharon carlson, "where keywords fail: using metadata to facilitate digital humanities scholarship," journal of library metadata 8, no. 3 (2008), https://doi.org/10.1080/19386380802398503.
7 john walsh, "the use of library of congress subject headings in digital collections," library review 60, no. 4 (2011), https://doi.org/10.1108/00242531111127875.
8 jane greenberg et al., "hive: helping interdisciplinary vocabulary engineering," bulletin of the american society for information science and technology 37, no. 4 (2011), https://doi.org/10.1002/bult.2011.1720370407.
9 sam grabus et al., "representing aboutness: automatically indexing 19th-century encyclopedia britannica entries," nasko 7 (2019), pp. 138-48, https://doi.org/10.7152/nasko.v7i1.15635.
10 karen attar, "s and long s," in oxford companion to the book, eds. michael felix suarez and h. r. woudhuysen (oxford: oxford university press, 2010); ingrid tieken-boon van ostade, "spelling systems," in an introduction to late modern english (edinburgh university press, 2009).
11 andrew west, "the rules for long-s," tugboat 32, no. 1 (2011).
12 attar, "s and long s."

articles

collaboration and integration: embedding library resources in canvas

jennifer l. murray and daniel e. feinberg

information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.11863

jennifer l. murray (jennifer.murray@unf.edu) is associate dean, university of north florida. daniel e. feinberg (daniel.feinberg@unf.edu) is online learning librarian, university of north florida.

abstract

the university of north florida (unf) transitioned to canvas as its learning management system (lms) in summer 2017. this implementation brought opportunities for a more user-friendly learning environment for students. working with students in in-person, hybrid, and online courses brought about the need for the library to have a place in the canvas lms; otherwise, students needed to remember how to access and locate library resources and services outside of canvas. during this time, the thomas g. carpenter library's online presence was enhanced, yet it was still not visible in canvas. it became apparent that the library needed to be integrated into canvas courses, enabling students to transition easily between their coursework and the resources and services that support their studies. in addition, librarians who worked with students looked for ways for students to easily find library resources and services online.
after much discussion, it became clear to the online learning librarian (oll) and the director of technical services and library systems (library director) that the library needed to explore ways to integrate more with canvas.

introduction

online learning is not a new concept at unf. in fact, in-person, hybrid, and online courses have used online learning in some capacity since distance learning took hold in higher education. unf transitioned to canvas as its learning management system (lms) in summer 2017, replacing blackboard. this change, which affected all of unf's online instruction and student learning, brought new benefits and challenges and allowed for a more secure system for students taking in-person, hybrid, and distance learning courses. while this change occurred, unf's library went through many changes in its virtual presence. students, specifically those whose classes utilized canvas, needed a user-friendly way to use the library website and its resources virtually. in response, the library's resources transitioned to a greater online presence. ultimately, however, many students needed to use resources that they did not realize were available electronically from the library. through instruction and research consultations (both in-person and virtual), students needed to be directed back to the library homepage to access resources; the reality was that unless library instruction or professors pointed out library resources, students instead turned to google or other easy-to-find online resources to which they had previously been exposed.

how the project originated

by spring 2018, a growing number of unf courses were being converted to online or hybrid courses. as students used canvas more, librarians started receiving feedback from in-person and online sessions that students had difficulty accessing the library's resources while in canvas. the lack of library visibility in canvas caused the librarians to truly acknowledge that this was a problem.
students had to open a new browser window to access the library and then go back to canvas to complete their assignments, which involved multiple steps. this caused frustration among students, who had to remember the library url while also getting used to navigating their new courses in canvas. librarians consistently spent large amounts of time during library instruction sessions and research consultations showing students how to navigate to the library website. in effect, more time was spent guiding students to library resources such as programmatic or course-specific springshare-hosted libguides (also known as library guides) or the library homepage. rather than focusing on how to use library resources and become more information literate, students spent more time simply locating the library website to reach the unf library's online resources. together, the oll and library director talked about possibilities in canvas that would benefit all students who attended unf, both in person and online. canvas is located in unf's mywings, a portal where all students go for coursework, email, and resources that support their academic studies at unf. it became apparent that, if possible, there needed to be a quicker way for students to access unf library resources.

literature review

with the advent of online learning, it became obvious that students needed to have library access within their online learning management system. for campuses such as unf, this meant within canvas; for unf students who are distance or online students only, this was especially true. farkas noted that librarians had worked to determine the best ways to provide library materials and services and to embed librarians in the lms.1 over the last fifteen years, the lms has become more important in supporting the growth of online learning. pomerantz noted that the lms has become critical to instruction and online learning: approximately 99 percent of institutions adopt an lms, and 88 percent of faculty utilize an lms platform.2 this "puts it in the category with cars and cellphones as among the most widely adopted technologies in the united states."3 library guides integrated into an lms increased their visibility, but this did not guarantee that faculty and students would utilize them. that is why it was critical to continuously collaborate and communicate with faculty, students, and librarians to bring attention to the resources that could assist students. farkas noted that librarians at the ohio state university found that, no matter how the library was integrated into a university's lms, library usage there depended decidedly on whether the professor promoted the library to their students.4 the reality libraries faced was that without visibility in an lms, students who were online/distance learners needed to remember or find the library's website. while this seemed inconsequential, it caused students to use google or other resources instead of their university or college's library discovery tool or library databases. farkas noted that shank and dewald's seminal article described a university's lms as having two levels, macro and micro. when there was one way to access the library in the lms, it was termed macro; this single pathway allowed for less maintenance, since there was a single way to access the library from the lms.5 the university of kentucky embedded the library by adding a library tab in blackboard. other institutions like portland state university, ohio state university, and oakland state university developed library widgets to make the library more accessible.6 the addition of library and research guides in library instruction was critical to increase visibility for students and, furthermore, to make sure students had easier access to the library through their lms. getting librarians access to the lms at their institutions is an ongoing issue.7 unf librarians wanted to determine best practices to decide how the library could integrate into canvas; therefore, research was needed to see what other university libraries were doing. the librarians at unf discovered no obvious preference in the examples found in their research for how to get the library into canvas. davis observed that "claiming a small amount of real estate in the lms template . . . is an easy way to put the library in students' periphery." simply having a library link or page added to each course was "the digital equivalent for students of walking past the library on the way to class."8 however, much seemed to depend on how an lms was used at a university and the technical expertise available.
thompson and davis noted that "lms technology has added another layer of complexity to the puzzle. as technology evolves to address changes in learning design, student and faculty attitudes, expectations, perceptions will continue to be a critical piece of the course integration puzzle."9 looking at other institutions revealed a variety of ways in which canvas and the library were integrated: embedded springshare libguides, modules of quizzes or tutorials, online minicourses, and embedded librarians in lms courses.10 penn state university examined its method of adding library content into canvas. it already had a specific way of putting library guides in canvas, but it was not in a highly visible location for students to easily access. when faculty put guides in their courses, with the collaboration of librarians, the guides were used; however, many of the faculty did not use these librarians or resources. a student survey and user studies were used to learn how to get students and faculty to use the guides and content more. penn state worked with its comm 190 instructors to administer a survey that offered extra credit, to ensure getting responses.11 "general findings included: 53 percent of students did not know what a course guide was; 41 percent of students had used a guide for a class project; 60 percent accessed a course guide via the lms; and 37 percent of students used course guides on their own."12 many students were interested in doing their library research within canvas itself. it should be noted that the guides needed to be in a prominent place in canvas, yet not overwhelm the course content; course-specific guides needed an introductory statement describing what the guide was about. the release of springshare's lti tool became an optimal time at which technical solutions allowed penn state's library guides to be embedded smoothly into canvas.13 the learning tools interoperability (lti) tool allows online software to be integrated into canvas. in effect, when professors want to add a tool to their course, it provides a more seamless and controlled avenue; in the case of library guides, it creates a way in which guides can be embedded into the lms with few problems. another example of a library integration into a campus lms was at the utah state university (usu) merrill-cazier library. usu looked for a way to maximize the effectiveness of springshare's library guides when it assessed the design and reach of library guides within its lms.14 usu took a unique approach and built an lti tool that automatically pulled in the most appropriate library guide when the "research help" link in canvas was activated by a professor. the library also saw this as an opportune time to redesign its subject guides and ensure there were guides for all disciplines, and it provided usage data to subject librarians to help determine where there might be opportunities to interact with classes and provide more library instruction. overall, the study and the feedback received from students helped usu find ways to improve how librarians used and thought about library guides, and expanded their reach based on usage data.15
this ability to add library guides to canvas provided students a way to access library materials or the library without having to leave the online classroom. many libraries have conducted focus groups and usability studies that provided valuable feedback on faculty's and students' knowledge and understanding of guides, on ways to improve the information shared, and on how guides assisted students with their coursework and faculty in their online teaching. research indicated that exploring and implementing the integration of library guides into an lms led to a need for more consistently designed guides.16 the literature also indicated the importance of a strong relationship with the department that manages the lms: integrations were made much easier when a relationship was established, and it sometimes led to discovering additional opportunities to integrate more with the lms. penn state saw an increase of over 200,000 hits to its course guides, believed to be a result of the lti integration.17 this, however, did not guarantee that students benefited from the course guides, just as library statistics do not prove resources are being used despite page hits. in addition, faculty were able to turn off the lti course guide tool, which reduced the chances of student usage or awareness of the course guide. based on feedback from students and faculty, it did not matter where the course guides were, since they could be ignored anyway. a penn state blog was developed by the teaching and learning with technology unit to give instructors a venue in which they could become aware of the online services that librarians provide.18 "although automatic integration allows specialized library resources to be targeted at all lms courses, that does not mean that they'll be accessed. it is important then to build ongoing relationships with stakeholders, providing not just information that such integrations exist, but also reasons why to use them."19 however, not all universities and colleges decided to integrate the library strictly through a library guide or a link to the library in their lms. karplus noted that students spent more time online rather than going to the physical academic library, and discussed how the digital world combined with academic library resources had two benefits: it made online research a more normal occurrence, and students were more comfortable accessing online resources.20 while using blackboard, st. mary's college's goal was to incorporate library information literacy modules into existing courses. using the blackboard lms, students were able to access all components of their courses, including assigned readings; this became their academic environment. therefore, information literacy modules, tutorials, and outside internet resources could be added to the lms.21 tutorials combined with pre- and post-testing gave faculty instant feedback. librarians were also able to participate in blackboard through discussion boards and work with students.22 there was a constant need to update the modules and the information added to blackboard. librarians having access to the blackboard site allowed students to use the library resources more readily.
"the site can be the focal point for many librarians in one location thus ensuring a consistent, collaborative instructional program."23 overall, the goal of integrating campus librarians into an lms was to get students to use the library in order to be more successful in their academic endeavors.

developing a plan of action

initially, the oll and library director brainstormed possible integration ideas, ranging from adding a library global navigation button to the canvas ui to adding a link to the library in the canvas help menu. at the same time, they researched what other libraries had done. after brainstorming, it was realized that additional conversations needed to occur within the library and with unf's online learning support team, a part of the center for instruction and research technology (cirt), the group that manages canvas. the discussion to integrate the library and canvas was a complex matter. unf administrators asked for a proposal to be written so it could be brought to the library, online learning support team, and information technology services (its) stakeholders for discussion and approval. that proposal, along with much-needed discussion, was critical in order to determine the possibilities and the actions that needed to be taken. it was also important to recognize what would best serve the faculty and students. when brainstorming discussions started with unf's online learning support team, it was important for the library to determine what options were available to embed the library in canvas. the library had a strong relationship with unf's online learning support team and its administrators, which made this an easy process to pursue. what the oll and library director initially wanted was to add a simple link to the global navigation in canvas that would take all users to the library homepage. however, it became apparent that this was not possible, because that space is limited and many departments on campus would like greater visibility in canvas. the next option, which was easier to implement, was to add a link to the library homepage under the help menu in canvas. although this menu link was added, it was so hidden in canvas that the oll and library director felt it would never be found by faculty, let alone students. cirt administrators advised the oll and library director of other available possibilities. after researching options, the library recommended creating access to library resources and services using a springshare lti tool for library guides, which cirt agreed to. library guides, or libguides, are library webpages built with springshare software. using the lti tool seemed like a great possibility, since it would allow for more of a presence in addition to the help link to the library homepage. after approval from library administration and initial discussions with its, the project moved forward.

implementation

the project took about a year to complete, from the time discussions began internally in the library to the time the integration went live (see figure 1).

figure 1. project timeline
the idea of a seamless entryway to the library seemed good based on observations and feedback from students, but the oll and library director started by completing an environmental scan of what other institutions had done, to get ideas on ways the unf library could integrate into canvas. they learned that there were a variety of approaches: integration of the library at the global navigation level, at the course level, or through an added link to the library under the help menu in canvas. it became clear that an integration into canvas was an obvious progression that would not only strengthen online learning but also give students the ability to benefit from the resources the library subscribed to and enhance their curricular needs. conversations then occurred with unf's online learning support team to discuss integration options further. after much discussion, a decision was made to pursue an added link to the library website under the canvas help menu and a new lti tool at the course level. since canvas was used in so many courses, it was determined that university-wide campus committee agreement was needed on how to go about adding library guides to canvas courses. librarians were also approached at this time for their input and feedback. the goal seemed obvious to the librarians: when approached, buy-in to support students in canvas by way of the help button and the lti tool integration seemed more than straightforward. for the librarians, the goal was to solve the problem of making sure that students could easily access library materials. overall, the library faculty's preference for the implementation was to embed the library website under the canvas help menu while also having the student resources libguide inside all canvas courses using the springshare lti tool. after all internal approvals were obtained, the link to the library was seamlessly added under the canvas help menu. the springshare lti tool required more work and discussion before it could be implemented. after approval was granted by the unf online learning support team and the campus its security team, the integration began. configuration options for the lti tool were explored, and the systems and digital technologies librarian worked closely with the unf online learning support team and springshare to set up the libapp lti tool. the first step was to configure springshare's automagic lti tool to automatically add libguides to courses in canvas. this included adding an lti tool name and description, which appeared in canvas during setup and in the course navigation. it was also decided to set the student resources libguide as the default guide for all courses, based on feedback from across campus; instructors could request to use a different libguide for their course. to enable this, two parameters had to be set in the automagic lti tool to enable libguide matching between canvas and libguides (a sketch of the matching logic follows the list):

• lti parameter name: for unf, this was set to the "context" label, to select the course code field in canvas.
• libguides metadata name: this was set to the appropriate value to identify the metadata field used in libguides.
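springshare's automagic tool is a hosted service, so its internals are not public; purely as an illustration of the matching that these two parameters enable, the sketch below uses the standard lti launch parameter context_label (the canvas course code) and hypothetical guide ids:

    # illustrative matching logic only, not springshare's implementation
    GUIDE_BY_COURSE_CODE = {   # libguide custom metadata value -> guide id
        "eng1101": 10234,
        "bio2020": 10567,
    }
    DEFAULT_GUIDE = 10001      # the student resources guide

    def guide_for_launch(lti_params):
        """choose which libguide to embed for a canvas lti launch."""
        course_code = lti_params.get("context_label", "").lower()
        return GUIDE_BY_COURSE_CODE.get(course_code, DEFAULT_GUIDE)

    print(guide_for_launch({"context_label": "ENG1101"}))  # course guide: 10234
    print(guide_for_launch({"context_label": "HIS3000"}))  # default: 10001

the fallback-to-default behavior mirrors the setup described here: every course gets the student resources guide unless a guide's custom metadata matches the course code.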
if an instructor decided to change the default guide to another guide, these two parameters would need to be entered into a specific libguide's custom metadata so that canvas could link to the designated guide to display in a course. the change had to be made in the libguide itself, so it was handled by the systems and digital technologies librarian. not many instructors had requested this yet, but if the option were utilized, the library would also have to ensure the link carried over each semester by updating the metadata in the guide to the new course code. after the configuration was completed on the springshare side, the next step was to set up the integration in the canvas test environment. an external application had to be installed in canvas to allow the springshare lti tool to run. after it was tested, the application was applied across all courses and set to display by default, which the majority of faculty preferred; faculty who did not want to use the integration had the ability to manually turn it off in canvas. during the implementation setup, a few minor issues were encountered. after seeing what the student resources guide looked like in canvas, it became clear that the header and footer were not needed and just cluttered the guide; both were removed in the lti setup options to ensure a cleaner-looking guide. since the libguides were being embedded into another system (canvas), the formatting of the guides had to be adjusted. the other issue encountered was trying to add available springshare widgets, such as the library hours or room booking content, to the guide using the automagic lti tool. while this was not successful, it was determined that the additional options were not needed. once the integration was set up in the canvas test environment, demonstrations were held and input was gathered from stakeholders through campus-wide meetings with faculty. it was critical to determine whether faculty would utilize libguides in their canvas courses. an overview of the integration and its benefits was given to the campus technology committee and distance learning committee faculty, and a demonstration was also given so that these faculty committees could see what the integration would look like in their courses. overall, the feedback obtained from the faculty was very positive. the preference was for an opt-out configuration, where the library guides would automatically display in canvas courses. many faculty members were excited about the integration and looked forward to having it in their courses. after demos took place and final setup was completed based on feedback, the integration was set up in the canvas production environment and announced via newsletters, emails, and social media. as of the fall 2019 semester, the library's student resources guide was integrated into all courses in canvas (see figure 2).

figure 2. student resources library guide in canvas

benefits of the integration

students are dependent on their campus lms to complete their coursework and support their studies and, in the case of unf, to have easier access to the online campus. the libguide integration not only streamlined their way to library resources but also promoted library usage among students who may not have known how to get to the resources available to them.
for faculty, it should be noted that they were able to replace the default student libguide with a more specific subject or course guide. either way, the integration brought more awareness to resources and services that supported curricular needs. the springshare chat widget in the guide also gave students the ability to communicate directly with a librarian from within canvas. this integration not only increased the library's visibility in the online environment but also provided all students, whether in-person, hybrid, or online, with direct access to the resources they needed for their coursework.

challenges of the integration

although there were many benefits to integrating the library into canvas, there were many challenges in making the integration happen. there were many more stakeholders than expected: library administration, canvas administrators, library faculty, and teaching faculty committees all needed to provide input before the project could take place. since the project grew organically, stakeholders were brought in as the project unfolded. once the project received approval from the library and cirt administrators, its administrators had to give the final approval in order to proceed with the integration of library guides. the process to implement the integration took some time to figure out. in addition, getting buy-in from the teaching faculty was key, as the navigation options in their canvas courses would be impacted; making sure the faculty understood how the integration would assist their students was important, as the goal was to help their students succeed with their coursework. a concern was whether faculty would tell their students or, conversely, whether students would find the link to the libguide on their own. determining how the news of the library and canvas integration would be communicated to the unf community was the final step. the library director, oll, and cirt administrators needed to determine the best communication routes: in effect, emails, unf updates/newsletters, and word of mouth from teaching faculty. it was crucial that students be aware of these tools. this meant that, going forward, unf would depend on word of mouth or students' curiosity about the canvas navigation bars themselves.

discussion and next steps

integrating the library with unf's learning management system, canvas, took much planning and collaboration, which was key to creating a more user-friendly learning environment for students. in reflecting on what went well and what did not, the unf librarians learned several important lessons that will help improve the implementation of future projects. to begin, it is important to identify and involve stakeholders early on, so they can provide feedback along the way. getting buy-in from the teaching faculty is also key, since the integration affects the navigation options in their canvas courses. initially, the oll and library director did not realize how many groups of teaching faculty and departments would need to approve this canvas change and implementation; it was important to have them understand the importance of the integration and how it could assist their students with their coursework. considering the content of the library guides was important because of the impact it would have on canvas courses.
for example, at the unf library, some students thought that the librarian's picture on the default guide was in fact their professor and began to contact her, which caused much confusion for students and professors alike. along the way, communication is critical, so that everyone is kept informed as the integration progresses. communicating at the appropriate times, and ensuring all information about configuration options is gathered before starting conversations with stakeholders, is important too. finally, investigating the ability to retrieve usage statistics from day one would be extremely beneficial and would provide data to assess how often the library guides are being used in the lms and by whom; this information would help determine next steps and explore other potential integration opportunities. at unf, the librarians were not able to implement statistics as part of the integration, which has made it more difficult to assess the usage of the library guides in canvas. now that the integration has been completed, ensuring that it continues to meet the needs of faculty and students will be important. feedback will need to be gathered from stakeholders to find out whether they find the integration useful, whether any issues are being encountered, and whether they have recommendations for ways to enhance the integration. usage statistics will also be gathered as soon as they are available. this will provide information on which instructors are using the library guides in their courses and which are not. for those who are using them, it will be an opportunity to target those courses for instruction; for those who are not, it will be an opportunity to find out why and to make sure they are aware of the benefits of using the guides in their courses. exploring other integration possibilities, especially as the technology continues to evolve, will be important to ensure the library continues to reach students. while the natural progression of the unf integration would be to embed librarians in the canvas platform, others have faced challenges. "according to the ithaka s & r library survey 2013 by long and schonfeld, 80–90 percent of academic library directors perceive their librarians' primary role as contributing to student learning while only 45–55 percent of faculty agree librarians contribute to student learning."24 even though this is a challenge, faculty collaboration with librarians is crucial for the embedded librarian role. without a requirement of embedded librarianship, marketing the librarians and what they can do for students will be essential for their role to be successful.25 at unf, conversations will have to be held to determine what other integrations would be of interest and possible at the university. the unf library will also be looking to improve the design and layout of library guides. now that their visibility has increased, it will be important to standardize them and ensure they all have a consistent look and feel, which will make it easier for students to find the information and resources they are looking for.
conclusion

in today's rapidly changing technological world, it is critical to make resources available regardless of where students are physically located. integrating the library's libguides into canvas not only brought more visibility to the library, its resources, and its services, but also brought the library to where the students were engaged with the university. as noted by farkas, "positioning the library at the heart of the virtual campus seems as important as positioning the library as the heart of the physical campus."26 providing resources to students at their point of need enabled them to easily access the information they needed to succeed in their courses. it also allowed faculty to integrate the library resources most beneficial to their courses, enhancing their teaching as well as the educational needs of their students. the unf library will continue to look at how library resources are used and how best to serve the online community going forward. it will be important to explore ways to enhance existing services with existing technology, but also to look ahead and determine what may be possible down the road with new and upcoming technologies. in addition, assessing how the library connects to online learners and gathering feedback from students and faculty will be critical to contributing to the success of students.

endnotes

1 meredith gorran farkas, "libraries in the learning management system," american libraries tips and trends (summer 2015), https://acrl.ala.org/is/wp-content/uploads/2014/05/summer2015.pdf.
2 jeffrey pomerantz et al., "foundations for a next generation digital learning environment: faculty, students, and the lms" (jan 12, 2018): 1–4.
3 pomerantz et al., "foundations for a next generation digital learning environment."
4 farkas, "libraries in the learning management system."
5 farkas, "libraries in the learning management system."
6 farkas, "libraries in the learning management system."
7 farkas, "libraries in the learning management system."
8 robin camille davis, "the lms and the library," behavioral & social sciences librarian 36, no. 1 (jan 2, 2017): 31–5, https://doi.org/10.1080/01639269.2017.1387740.
9 liz thompson and davis vess, "a bellwether for academic library services in the future: a review of user-centered library integrations with learning management systems," virginia libraries 62, no. 1 (2017): 1–10, https://doi.org/10.21061/valib.v62i1.1472.
10 davis, "the lms and the library."
11 amanda clossen and linda klimczyk, "chapter 2: tell us a story: canvas integration strategy," library technology reports 54, no. 5 (2018): 7–10, https://doi.org/10.5860/ltr.54n5.
12 clossen and klimczyk, "chapter 2," 8.
13 clossen and klimczyk, "chapter 2," 8.
14 britt fagerheim et al., "extending our reach," reference & user services quarterly 56, no. 3 (2017): 180–8, https://doi.org/10.5860/rusq.56n3.180.
15 fagerheim et al., "extending our reach," 187.
16 fagerheim et al., "extending our reach," 188.
17 amanda clossen, "chapter 7: ongoing implementation: outreach to stakeholders," library technology reports 54, no. 5 (2018): 28.
18 clossen, "chapter 7," 29.
19 clossen, "chapter 7," 29.
20 susan s.
karplus, "integrating academic library resources and learning management systems: the library blackboard site," education libraries 29, no. 1 (2006): 5, https://doi.org/10.26443/el.v29i1.219.
21 karplus, "integrating academic library resources and learning management systems."
22 karplus, "integrating academic library resources and learning management systems."
23 karplus, "integrating academic library resources and learning management systems."
24 beth e. tumbleson, "collaborating in research: embedded librarianship in the learning management system," reference librarian 57, no. 3 (jul 2016): 224–34, https://doi.org/10.1080/02763877.2015.1134376.
25 tumbleson, "collaborating in research."
26 farkas, "libraries in the learning management system."

lita president's message: updates from the 2019 ala midwinter meeting

bohyun kim

information technology and libraries | march 2019

bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, rhode island.

in this president's message, i would like to provide some updates from the 2019 ala midwinter meeting held in seattle, washington. first, as many of you know, the potential merger of lita with alcts and llama has been temporarily put on hold, due to an initial timeline that was rather ambitious and the lack of time required to deliberate on and resolve some issues in the transition plan to meet that timeline.1 these updates were also shared at the lita town hall during the midwinter meeting, where many lita members spent time discussing topics such as the draft mission and vision statements for the new division, what makes people feel at home in a division, in which areas lita should redouble its focus, and which activities lita may be able to set aside without losing its identity. valuable feedback and thoughts were provided by town hall participants; in that feedback, many emphasized the importance of building and retaining a community of library technologists around lita values, programming, resources, advocacy, service activities, and networking opportunities. the merger-related discussion is to resume this spring, and the leadership of lita, alcts, and llama will make every effort to ensure the best future for the three divisions at this time of great flux and change. second, lita is looking into introducing some changes to the lita forum. in the feedback and thoughts gathered at the lita town hall, the lita forum was also mentioned as one of the valuable lita offerings to its members.
the origin of the lita forum goes back to lita's first national conference, held in baltimore in 1983.2 since then, the lita forum has become a cherished venue for many library technologists: a place where they meet other like-minded people in the field, learn from one another, share ideas and experience, and look for more ways in which technology can be utilized to better serve libraries and library patrons. initially, the steering committee hoped that all three divisions would participate in putting together the lita forum, with a wider range of content encompassing the interests not only of lita members but also of those in alcts and llama, in a virtual format in order to engage more members who cannot easily travel, to be held some time in spring 2020. at the time this idea was conceived, more than a year ago, it was assumed that all preparations for the member vote regarding the merger would have been nearly completed by the time of the midwinter meeting. however, the steering committee unfortunately ran out of time for that preparation, and merger planning took up almost the entirety of the time that the leadership and staff of the three divisions had available. this resulted in an unfortunate delay in proper forum planning. with the merger conversation on hold at this point and the new timeline for the merger likely to be set back by at least a year, the changed circumstances for forum planning had to be reviewed. after a lively and thoughtful discussion at the midwinter meeting, the lita board decided that, considering how much work remains to be done on merger planning, it may not be practical or feasible for the next lita forum to be the first virtual and joint one. however, there was a lot of interest in and excitement about trying a virtual format, since it would allow lita to reach and serve the needs of more lita members than the traditional in-person meeting. it was also pointed out that the virtual format may provide an opportunity for lita to experiment with different and more unconventional conference program formats, which could be a welcome change for lita members. the lita board, however, also acknowledged the value of a physical conference, where people get to meet one another in person, which cannot be easily transferred to a virtual conference. if the virtual conference experiment takes place and is successful, lita may hold its forum alternating every year between the two formats, virtual and physical. planning for and running a fully virtual conference at the scale of a multi-day national forum will require additional time and careful consideration, since it will be the first time the lita forum planning committee and the lita office attempt this. logistics management is likely to be quite different for a virtual conference, and attendee expectations and the user experience will also differ significantly in a virtual conference compared with a physical one. as the first step of this investigation, the lita forum planning committee will explore what the ideal lita virtual forum may look like in terms of programming formats and participant experience. the lita office and the finance advisory committee will also look into the financial side of running the lita forum in a virtual format. at this time, it is not yet determined when the next lita forum will be held and whether it will be a virtual or a physical one.
once these investigations are completed, however, the lita board should be able to decide on the most appropriate path towards the next lita forum. stay tuned for the exciting changes that may be coming to the lita forum. third, i would like to mention that lita issued a statement regarding the incidents of aggressive behavior, racism, and harassment reported at the 2019 ala midwinter meeting.3 along with the statement, the lita board has decided to commit funds to provide an online bystander/allyship training, which we hope will equip lita members with tools that empower active and effective allyship, recognize and undo oppressive behaviors and systems, and promote the practice of cultural humility, thereby collectively increasing our collaborative capacity. the lita statement and the board decision were received positively by many lita members. other ala divisions such as alcts, alsc, asgcla, llama, united, and yalsa have already expressed interest in working together with lita on this, and the lita board is looking into a few options to choose from. more information about the training will be provided soon. lastly, i am thrilled to announce that the lita president's program at the upcoming ala annual conference in washington, d.c. in june will feature meredith broussard, a data journalist and the author of artificial unintelligence: how computers misunderstand the world, as the speaker. in her book, broussard delves into the many problems surrounding techno-chauvinism, which displays blind optimism about technology and an abundant lack of caution about how new technologies will be used. she further details how this simplistic worldview, which prioritizes building new things and efficient code above social conventions and human interactions, often misinterprets a complex social issue as a technical problem and results in a reckless disregard for public safety and the public good. reviewing the early history of computing and digital technology, broussard observes: "we have a small, elite group of men who tend to overestimate their mathematical abilities, who have systematically excluded women and people of color in favor of machines for centuries, who tend to want to make science fiction real, who have little regard for social convention, who don't believe that social norms or rules apply to them, who have unused piles of government money sitting around, and who have adopted the ideological rhetoric of far-right libertarian anarcho-capitalists. what could possibly go wrong?"4 i invite all of you to come to this program for more insight and a deeper understanding of what the recent technology innovation involving artificial intelligence (ai) and big data means to our everyday life and where it may be headed. the program information is available in the ala 2019 annual conference scheduler at https://www.eventscribe.com/2019/alaannual/fspopup.asp?mode=presinfo&presentationid=519109.

endnotes

1 the official announcement can be found at the lita blog. see bohyun kim, "update on new division discussions," lita blog, january 26, 2019, https://litablog.org/2019/01/update-on-new-division-discussions/.
2 stephen r. salmon, "lita's first twenty-five years: a brief history," library information technology association (lita), september 28, 2006, http://www.ala.org/lita/about/history/1st25years.
3 "lita's statement in response to incidents at ala midwinter 2019," lita blog, february 4, 2019, https://litablog.org/2019/02/litas-statement-in-response-to-incidents-at-ala-midwinter-2019/.
4 meredith broussard, artificial unintelligence: how computers misunderstand the world (cambridge, massachusetts: the mit press, 2018), p. 85.

tutorial

the internet, the world wide web, library web browsers, and library web servers

jian-zhong (joe) zhou

information technology and libraries | march 2000

joe zhou (joezhou@udel.edu) is associate librarian at the university of delaware library, newark.

this article first examines the difference between two very familiar and sometimes synonymous terms, the internet and the web. the article then explains the relationship between the web's protocol http and other high-level internet protocols, such as telnet and ftp, and provides a brief history of web development. next, the article analyzes the mechanism by which a web browser (client) "talks" to a web server on the internet. finally, the article studies the market growth for web browsers and web servers between 1993 and 1999. two statistical sources were used in the web market analysis: a survey conducted by the university of delaware libraries for the 122 members of the association of research libraries, and data for the entire web industry from different web survey agencies.

many librarians are now dealing with the internet and the web on a daily basis. while the web is sometimes synonymous with the internet in many people's minds, the two terms are quite distinct, and they refer to different but related concepts in the modern computerized telecommunication system. the internet is nothing more than many small computer networks that have been wired together and allow electronic information to be sent from one network to the next around the world. a piece of data from beijing, china may traverse more than a dozen networks while making its way to washington, d.c. we can compare the internet to the great wall of china, which was built in the qin dynasty around the third century b.c. by connecting many existing short defense walls built by previous feudal states. the great wall not only served as a national defense system for ancient china, but also as a fast military communication system. a border alarm was raised by means of smoke signals by day, and beacon fires at night, ignited by burning a mixture of wolf dung, sulfur, and saltpeter. the alarm signal could be relayed over many beacon-fire towers from the western end of the great wall to the eastern end (4,500 miles away) within a day; this was considered light speed two thousand years ago. however, while the great wall transferred the message in a linear mode, the internet is a multidimensional network. the web is a late-comer to the internet, one of the many types of high-level data exchange protocols on the internet. before the web, there was telnet, the traditional command-driven style of interaction. there was ftp, a file transfer protocol useful for retrieving information from large file archives. there was usenet, a communal bulletin board and news system. there was also e-mail for individual information exchange, and e-mail lists for one-to-many broadcasts.
in addition, there was gopher, a campus-wide information system shared among universities and research institutions, and wais, a powerful search and retrieval system developed by thinking machines, inc. in 1990 tim berners-lee and robert cailliau at cern (www.cern.ch), the european laboratory for particle physics, created a new information system called "world wide web" (www). designed to help the cern scientists with the increasingly confusing task of exchanging information on the internet, the web system was to act as a unifying force, a system that would seamlessly bind all file protocols into a single point of access. instead of having to invoke different programs to retrieve information via various protocols, users would be able to use a single program, called a "browser," and allow it to handle all the details of retrieving and displaying information. in december 1993 www received the ima award, and in 1995 berners-lee and cailliau received the association for computing machinery (acm) software system award for its development. the web is best known for its ability to combine text with graphics and other multimedia on the internet. in addition, the web has some other key features that make it stand out from earlier internet information exchange protocols. since the web is a late-comer to the internet, it has to be backward compatible with other communications protocols in addition to its native language, hypertext transfer protocol (http). among the foreign languages spoken by web browsers are telnet, ftp, and the other high-level communication protocols mentioned earlier. this support for foreign protocols lets people use a single piece of software, the web browser, to access information without worrying about shifting from protocol to protocol and software incompatibility. despite the different high-level protocols, including http for the web, there is one thing in common for all parts of the internet: tcp/ip, the lower level of the internet protocol. tcp/ip is responsible for establishing the connection between two computers on the internet and guarantees that the data can be sent and received intact. the format and content of the data are left for high-level communication protocols to manage, among which the web is the best known. at the tcp/ip level all computers "are created equal." two computers establish a connection and start to communicate. in reality, however, most conversations are asymmetric. the end user's machine (the client) usually sends a short request for information, and the remote machine (the server) answers with a long-winded response. the medium is the internet. the common language on the internet can be the web or any other high-level protocol. on the web, the client is the web browser; it handles the user's request for a document. the first web browser, ncsa mosaic, developed by the national center for supercomputing applications (ncsa) at the university of illinois at urbana-champaign, was released in mid-november 1993 for unix, windows, and macintosh platforms. version 3.0 of ncsa mosaic is available at www.ncsa.uiuc.edu/sdg/software/mosaic. both source code and binaries are free for academic use. mosaic lost market share to netscape after its key developer left ncsa and joined netscape.
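the request/response asymmetry described above is easy to see in miniature. the sketch below, a minimal modern illustration rather than anything from the original tutorial, speaks http/1.0 over a raw tcp connection using python's standard socket library; the host name is a placeholder:

```python
import socket

# open a tcp connection (the lower-level protocol), then speak http
# (the high-level protocol) over it.
host = "example.com"  # placeholder host, not a server from the survey
with socket.create_connection((host, 80)) as sock:
    # the client's short request: a request line plus one header
    sock.sendall(f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))

    # the server's long-winded response: status line, headers, document
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.decode("latin-1")[:200])  # first 200 characters of the reply
```

a few dozen bytes go out; the entire document comes back. a graphical browser does essentially this before rendering what it receives.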
even after mosaic introduced an innovative 32-bit version in early 1997, which could perform feats that other major browsers had not even thought of back then, mosaic remained out of the major browsers' market. the two most widely used browsers today are microsoft's internet explorer (ie) and netscape's navigator (part of the netscape communicator suite). recent web browser surveys conducted by different internet survey companies such as www.zonaresearch.com/browserstudy, www.psrinc.com/trends.htm, and www.statmarket.com all indicate that ie is the market leader with more than 60 percent market share, leaving navigator with between 35 percent and 40 percent. in 1995 ie had only a 1 percent share versus navigator's more than 90 percent, an unimaginable rise critics have attributed to microsoft's strategy of bundling the browser with its near-monopoly windows operating system. however, a survey conducted in december 1998 by the university of delaware library of the 122 members of the association of research libraries (arl) showed that netscape still remained the market leader among big academic libraries. more than 90 percent of arl libraries supported netscape, and about 50 percent also supported ie. most arl libraries supported both browsers, and unlike the browser industry surveys mentioned earlier, in which only one product can be picked as the primary browser, the sum of the percentages for the arl survey was greater than 100 percent. the main function of the web browser is to request a document available from a specific server through the internet using the information in the document's url. the server on a remote machine returns the document, usually physically stored on one of the server's disks. with the use of the common gateway interface (cgi), the documents do not have to be static. rather, they can be synthesized at the point of being requested by cgi scripts running on the server's side of the connection. in some database-driven web servers that form the core of today's e-commerce, the documents provided may never exist as physical files but are generated as needed from database records. the web server can be run on almost any computer, and server software is available for almost all operating systems, such as unix, windows 95/98/nt, macintosh, and os/2. according to the university of delaware library's 1998 survey of internet web servers among arl member libraries, more than 32 percent of arl libraries chose apache as their web server software, followed by the netscape series at 29.32 percent, ncsa httpd at 11.28 percent, and microsoft internet information server (iis) at 7.52 percent. in july 1999 the author checked the netcraft survey at www.netcraft.com/survey. the top three web server software programs for more than 6.5 million web sites were apache (56.35 percent), microsoft-iis (22.33 percent), and netscape (5.65 percent). the netcraft survey also provides historical market share information for major web servers since august 1995. ncsa httpd was the first web server software released, about the same time as the release of mosaic in 1993. however, it slipped from the number-one position, with more than 90 percent market share in 1993 and almost 60 percent in 1995, to less than 1 percent in july 1999. although it is no longer supported by ncsa, httpd remains a popular choice for web servers due to its small size, fast performance, and solid collection of features. the "inertia effect" of the existing sites (if it runs well, why bother to change?)
will likely keep ncsa httpd on the major web server software list for some time. ncsa httpd is free, but available only for the unix platform. it is available from http://hoohoo.ncsa.uiuc.edu. however, when the author visited the site in july 1999, the following message appeared on the main page: "the ncsa httpd is no longer under development. it is an unsupported product. we recommend that you check out the apache server, instead of installing our server." most people who use only web browsers may have heard of apache only as an indian nation or a military helicopter, not as the most popular web server software, with more than 50 percent market share. it was first introduced as a set of fixes or "patches" to the ncsa httpd. apache 1.0 was released in december 1995 as open-source server software by a group of webmasters who named themselves the apache group. open-source means the source code is available and freely distributed, and it is the key to apache's attractiveness and popularity. the apache group members were ncsa users who decided to coordinate development work on the server software after ncsa stopped. in july 1999 the apache group announced that it was establishing a more formal organization called the apache software foundation (asf). in the future, the asf (www.apache.org) will monitor development of the free software, but it will remain a not-for-profit foundation. apache is high-end, enterprise-level server software and can be run on os/2, unix (including linux), and windows platforms, but a mac version is still not available. the netscape series includes netscape-enterprise, netscape-fasttrack, netscape-commerce, and netscape-communication. enterprise is a high-end, enterprise-level server, while fasttrack serves as an entry-level server for small workgroups. netscape supports both the unix and the windows nt platforms. the other major commercial web server, microsoft internet information server (iis), as of 1999, is only available for the windows platform. however, one advantage of iis over netscape is that it can be downloaded for free as part of the windows option pack. in addition, iis can handle ms office documents very well. while both the microsoft and netscape brand names are well recognized by millions of end users, a name alone does not necessarily equate to large market share, nor does a deep pocket. apache remains the top web server despite intense competition. one of the keys to apache's success, in addition to its outstanding performance, lies in the open-source code movement and active user support on a wide basis. the web server of choice for the macintosh platform is webstar. however, due to the limitations of the operating system's networking software, the performance of macintosh-based servers has not been great. webstar can be downloaded as a free evaluation release from www.starnine.com/webstar. the web server market is dynamic and competition intense. there were more than sixty web server products on the top list (of web servers with more than one thousand web sites) as of july 1999, and newcomers are being added frequently.

acknowledgments

the author thanks peter liu, head of the systems department at the university of delaware library, for providing the web survey data of arl libraries.
after this article was submitted, the survey data was published by arl in 1999 as spec kit 246: web page development and management. the author also wants to thank his dear wife min yang for her technical assistance. min is webmaster and system administrator for the web site of the a. i. dupont nemours foundation and hospital for children, http://kidshealth.org.

editorial board thoughts column

getting to yes: stakeholder buy-in for implementing emerging technologies in your library

ida joiner

information technology and libraries | september 2018

ida a. joiner (ida.joiner@gmail.com), a member of lita and the ital editorial board, is the librarian at the universal academy school in irving, texas. she is the author of "emerging library technologies: it's not just for geeks" (elsevier, august 2018).

have you ever wanted to implement new technologies in your library or resource center, such as drones, robotics, artificial intelligence, augmented/virtual/mixed reality, 3d printing, wearable technology, and others, and presented your suggestions to your stakeholders (board members, directors, managers, and other decision makers) only to be rejected based on "there isn't enough money in the budget," or "no one is going to use the technology," or "we like things the way that they are"? then this column is for you. i am very passionate about emerging technologies, how they are and will be used in libraries/resource centers, and how librarians will be able to assist those who will be affected by these technologies. i recently published a book introducing emerging technologies in libraries. i came up with suggestions on how doing your research, including the questions below and those on the accompanying checklist, will prepare you to meet with your stakeholders and improve the likelihood of your emerging technology proposal being approved.

1. who are your stakeholders, and how can you include them early in the process? determine who your stakeholders are, what their areas of expertise are, and how they can support your emerging technology projects. the most critical piece of getting your stakeholders on board to support your technology initiatives is addressing the question "what's in it for them?" this will get their attention and increase your odds of getting to "yes" on your technology initiatives.

2. what are the costs? research what your costs will be and create a budget. find innovative ways to fund your initiatives by researching grants, strategic partnerships with others who might be interested in partnering with you, and other funding opportunities.

3. what are the risks? identify any potential risks so that you are prepared to discuss how you will mitigate them when you meet with your stakeholders. some potential risks that you might want to address are budget cost overruns; staffing issues such as a key person resigning or going on maternity or sick leave; and policies in place to deter patrons from trying to use the technology for criminal means.

4. what are the timeline and key milestones? address the timeline for when you want or need to implement these technologies. have you planned for key milestones and possible delays, such as funds not being available?
you need to have a detailed timeline, from your first kickoff meeting with your initiative's team, to your stakeholder meeting where you present your proposal, to getting signoff on the project.

5. what training will you offer? perform a needs assessment to determine who will need to be trained, what training you will offer, what your training costs will be, and who will pay for them. once you have all of this in place, you will select the trainer(s) and the training model (such as "train the trainer") that you will use.

6. how will you market your technology initiatives? will you rely on social media to market your technology initiatives? will you collaborate with your marketing department to develop your message through press releases, websites, blogs, e-newsletters, flyers, and other media outlets? you will need to meet with your marketing and publications experts to plan how you will market your emerging technology initiatives, along with your costs and who will pay them.

7. who is your audience and how can you engage them? this is one of the most important areas to address in the proposal you present to your stakeholders. without our patrons, there is no library. you will need to determine who your audience is and how you can utilize the emerging technologies to assist them. are they k-12 students, adults who will be displaced by these technologies, technology novices who want to learn more about these technologies, or university faculty and/or students who want to use the technology for their projects? you can address all of these potential audiences in your proposal to your stakeholders.

these are just a few tips on how to get stakeholder buy-in for implementing emerging technologies in your library. feel free to share some of your own successes in getting stakeholders on board to implement emerging technologies in your library or resource center.

emerging technology stakeholder buy-in questionnaire

i have included the questions below that you should work through when you are considering getting your stakeholders on board to implement new emerging technologies in your library. if you address all of these, you have a very good chance of getting your stakeholders on board to support your initiatives.

1. what technologies do you want to implement in your library/resource center, and why do you want them?
2. who are your stakeholders and what are their backgrounds?
3. why should your stakeholders support your technology initiatives?
4. what is your budget for your new technology initiatives?
5. what training is needed to support these initiatives, who will provide the training, what are the costs, and who will pay for the training?
6. how will you market these technology initiatives, what are the costs, and who will pay for them?
7. did you perform a cost-benefit analysis for these technology initiatives?
8. are there legal fees? if so, what are they, and who will pay for them?
9. what are the risks?
10. what is the return on investment (roi)?
11. what strategic partnerships can you establish?
12. what is your timeline for implementing these technology initiatives?

articles

"good night, good day, good luck": applying topic modeling to chat reference transcripts

megan ozeran and piper martin

information technology and libraries | june 2019

megan ozeran (mozeran@illinois.edu) is data analytics & visualization librarian, university of illinois library.
piper martin (pm13@illinois.edu) is reference services & instruction librarian, university of illinois library.

abstract

this article presents the results of a pilot project that tested the application of algorithmic topic modeling to chat reference conversations. the outcomes for this project included determining if this method could be used to identify the most common chat topics in a semester and whether these topics could inform library services beyond chat reference training. after reviewing the literature, four topic modeling algorithms were successfully implemented using python code: (1) lda, (2) phrase-lda, (3) dmm, and (4) nmf. analysis of the top ten topics from each algorithm indicated that lda, phrase-lda, and nmf show the most promise for future analysis on larger sets of data (from three or more semesters) and for examining different facets of the data (fall versus spring semester, different times of day, just the patron side of the conversation).

introduction

the library at the university of illinois at urbana-champaign has included chat reference services since the spring of 2001.1 today, this service is extensively used by library patrons, resulting in thousands of conversations each semester. while in-person reference edges out chat for the largest number of interactions at the main library information desk over the most recent four years, chat includes a higher number of complex questions that incorporate teaching or strategizing.2 since the initial implementation of chat, the library has continually assessed and improved chat reference by evaluating the software, measuring the effectiveness and value of the service, and providing staff training.3 for several years, librarians at the university of illinois have used chat transcripts for training graduate assistants and new employees and chat statistics for determining staffing. unlike other forms of reference interactions, chat offers a textual record of the conversation, so librarians have used this unique opportunity in a couple of different ways. in a training exercise, students read through actual transcripts and are guided in recognizing both well-developed and less-than-ideal interactions. they are then asked to think about ways those chat conversations could have been improved and to share strategies for doing so. graduate assistant supervisors also use chat transcripts to evaluate the performance of individual graduate assistants, checking for appropriate levels of helpfulness and for adherence to the library's chat policies. finally, part of the library's assessment strategy looks at chat interaction numbers, such as chats per hour, the duration of each conversation, and the level of complexity of each conversation, to help make decisions about optimal chat staffing levels. however, prior to the project described here, the library had not yet analyzed the chat reference conversations on a large scale to understand the range and consistency of topics being discussed. while these uses of chat data have been successful, such a large body of information from patrons about the library and its collections and services seemed underutilized. in an environment of growing data-informed decision-making, both within the broader library community and at the university of illinois in particular, it was now an opportune time to implement this kind of large-scale topic analysis.
if common themes emerged from the chat interactions beyond simply the most frequently asked questions, these themes could inform the library's reference services beyond just training for chat reference. for example, patterns in the number of citation questions could indicate the best times to offer a citation management tool workshop; multiple inquiries about a new resource or tool might prompt planning a new workshop; and repeated confusion regarding a service or policy may signal a need to bolster the online guides or faq. since the number of chat transcripts was so large, automating analysis through a programming language such as python seemed the best course of action. this article presents the results of a pilot project that tested the application of algorithmic topic modeling to chat conversations. the outcomes for this project included (1) determining if this method could be used to identify the most common chat reference topics in a semester; and (2) determining whether this information could be used to inform reference services beyond just training for chat, such as improving faqs, workshops, the library website, or other instruction.

literature review

chat reference services are well established in academic libraries, and there are abundant examples in the literature exploring these services. however, there is a lack of research on ways to employ automated methods to analyze chat reference. numerous articles approach chat analysis via traditional qualitative methods, where research teams hand-code chat themes, topics, or question categories.4 schiller employed a tool called qda miner to partially automate the otherwise human-driven coding process, using the software to automatically generate clusters of manually created codes.5 only one paper appeared to explicitly address the issue primarily by using algorithmic analysis methods. in addition to conducting sentiment analysis, kohler applied three topic modeling algorithms to chat reference conversations at rockhurst university.6 kohler identified the algorithm of non-negative matrix factorization (nmf) as the "winning topic extractor" based on how evenly it distributed the topic clusters across all the chat conversations.7 the other algorithms kohler tested, latent dirichlet allocation (lda) and latent semantic analysis (lsa), had much more skewed distributions of topics. the most common topic identified by lda appeared in so many of the chat conversations that it was essentially meaningless as a category. lda is one of the most well-established topic modeling algorithms, but as kohler found, it does not work very well with short texts like chat conversations. to supplement the lack of library research in this area, non-library research that has applied topic modeling to short texts was also reviewed. interestingly, although the nmf algorithm worked well for kohler's analysis of library chat conversations, there was little mention of nmf in the non-library literature. on the other hand, it was not surprising that lda was one of the most commonly discussed algorithms, either as an example of what doesn't work or as a basis upon which a modified algorithm was created to perform better for short texts.8 another common algorithm was biterm topic modeling (btm).
proposed by cheng et al., btm takes pairs of words (biterms), rather than individual words, as the unit on which to base topics.9 by creating biterms, the researchers increased the number of items to sort into topics, thus mitigating a common problem with analyzing short texts. a final commonly used algorithm was the dirichlet mixture model (dmm).10 a key feature of dmm for analyzing short texts is that it assumes each text (in this project, each chat conversation) is associated with only one topic. while longer texts like articles or books likely encompass many topics, it is plausible that a chat conversation could be summarized in one topic.

methodology

at the time of this project (spring 2018), the library was using locally developed chat software called iwonder. the chat widget is embedded on the library homepage, on the "ask a librarian" page, in libguides, and within the library's interface for its licensed ebsco databases. the chat service was available 87 hours per week at the time the data was collected. during the day, chat service is provided by a mix of librarians, library staff, and graduate assistants, most of whom are scheduled at the main library's information desk. subject-specific libraries, including the engineering library, the agricultural and life sciences library, and the social sciences, health, and education library, also contribute many hours of chat reference from their respective locations. the evening and weekend shifts are all covered by graduate assistants from the university of illinois school of information sciences. the authors decided that one semester of chat transcripts would be the most appropriate corpus with which to work for this pilot project because it would encompass a substantive and meaningful (but also manageable) number of conversations. in preparation, institutional review board approval was received, and a graduate student completing a degree in information management from the school of information sciences was selected to assist with this project through the school's practicum program. this practicum student is an experienced programmer, and his presence on the team allowed the project to proceed more quickly than if the authors had pursued the project without his expertise. to begin the project, all chat conversations from the spring 2017 semester were obtained by querying the local server using mysql workbench, limiting the query to chat logs between the dates 1/17/2017 and 5/12/2017 (inclusive). because each line of a chat conversation was saved as a separate line in the database, this meant retrieving approximately 90,000 lines of data. the actual text of the chat conversations was unstructured (by its nature), but the text was saved with related metadata. for instance, each chat conversation was automatically given a unique identifier, so the individual lines could be grouped into conversations and put in order by their timestamp. the 90,000 lines represented almost 6,000 individual conversations. the chat logs were cleaned using a combination of openrefine (primarily for ascii character cleanup) and python code to remove personally identifiable information (pii) and to make the data easier to analyze.11 by default, the chat software did not collect any information about patrons, but sometimes patrons volunteered pii because they thought it was needed to answer their questions. therefore, part of the cleaning process involved removing as much of this patron pii as possible, replacing it with the word "removed" to denote the change.
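the project's actual cleaning code lives in the github repository cited later in this section; purely as a sketch of the pii step just described, volunteered emails and phone numbers can be caught with regular expressions (the patterns here are illustrative assumptions, not the project's own rules):

```python
import re

# illustrative patterns only; the real cleaning rules were developed
# iteratively against the actual logs and are in the project repository.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub_pii(line: str) -> str:
    """replace volunteered emails and phone numbers with 'removed'."""
    line = EMAIL_RE.sub("removed", line)
    return PHONE_RE.sub("removed", line)

print(scrub_pii("sure, email me at jdoe@illinois.edu or call 217-555-0123"))
# -> sure, email me at removed or call removed
```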
in addition, library staff usernames were scrubbed by replacing each username with a generic "staff###", where "###" was a unique (incremented) number assigned to each original username. this maintained the ability to track a single staff member across multiple conversations, if desired, without identifying the actual person. another important part of the data cleaning was to remove urls, because these would be unnecessary in identifying topics, and they significantly increased the number of unique "words" that the analysis algorithms identified. the urls were nearly always saved within an html tag, so most urls were easily identified for removal. the data cleaning process has been described here in a linear fashion for ease of understanding, but over the course of the project it was actually an iterative process, as more cleaning issues were discovered during analysis. based on the analyses performed in the related literature, the practicum student wrote code to test five topic modeling algorithms: (1) latent dirichlet allocation (lda), (2) phrase-lda (lda applied to phrases instead of words), (3) biterm topic modeling (btm), (4) dirichlet mixture modeling (dmm), and (5) non-negative matrix factorization (nmf). ultimately, the processing power and time required to implement btm meant that this algorithm could not be implemented for this project. however, the other four models, lda, phrase-lda, dmm, and nmf, were all successfully implemented. all code related to this project, including the cleaning and analysis, is available on github (https://github.com/mozeran/uiuc-chat-log-analysis).

results

outputs of the lda, phrase-lda, dmm, and nmf modeling algorithms are shown in tables 1 through 4. after removing common stop words, the remaining words were put into lowercase and stemmed before the topic modeling algorithms were applied. the objective of the stemming process was to convert singular and plural versions of a word to a hybrid form so that they are treated as the same word. thus, many words ending in "y" are shown ending in "i". for instance, "library" and "libraries" would both be converted to "librari" and thus be treated as the same word. the phrase "easi search" refers to "easy search," the all-in-one search box on the library homepage. the word "ugl" refers to the undergraduate library (ugl). the word "remov" showed up in the topic lists surprisingly frequently, probably because patron pii was replaced with the word "removed." since explicitly denoting the removal of pii is unlikely to be of import, it makes sense in the future to simply remove the pii without replacement.
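before turning to the tables, a compressed sketch of the pipeline this section describes may help: stem the cleaned conversations, build a document-term matrix, then fit lda (on raw counts) and nmf (on tf-idf weights) and print the top words per topic. nltk and scikit-learn are assumed, and the four toy conversations are invented stand-ins for the real transcripts; the project's own implementation is in the github repository above:

```python
import re
from nltk.stem.porter import PorterStemmer
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# invented stand-ins for cleaned conversations (one string per conversation)
conversations = [
    "looking for a journal article in a database for my research",
    "can i renew a book on course reserve at the undergraduate library",
    "the full text link gives an error when i am off campus on the vpn",
    "good night and thanks for the chat have a good day",
]

stemmer = PorterStemmer()

def stem_tokens(text):
    # lowercase and stem, so "library"/"libraries" both become "librari";
    # the real pipeline also removed stop words before this step
    return [stemmer.stem(t) for t in re.findall(r"[a-z]+", text.lower())]

n_topics, n_top = 2, 5  # the project reported 10 topics with 10 words each

count_vec = CountVectorizer(tokenizer=stem_tokens)
counts = count_vec.fit_transform(conversations)   # lda works on raw counts
tfidf_vec = TfidfVectorizer(tokenizer=stem_tokens)
tfidf = tfidf_vec.fit_transform(conversations)    # nmf pairs with tf-idf

fitted = {
    "lda": (LatentDirichletAllocation(n_components=n_topics,
                                      random_state=0).fit(counts), count_vec),
    "nmf": (NMF(n_components=n_topics, random_state=0).fit(tfidf), tfidf_vec),
}

for name, (model, vec) in fitted.items():
    vocab = vec.get_feature_names_out()
    for k, weights in enumerate(model.components_):
        top = " ".join(vocab[i] for i in weights.argsort()[::-1][:n_top])
        print(f"{name} topic {k + 1}: {top}")
```

the printed lines have the same shape as the rows of tables 1 and 4: a bare list of stemmed words per topic, with the semantic interpretation left entirely to the reader.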
table 1: lda (top 10 words in each topic)
topic 1: music map laptop remov find ok one also may score
topic 2: look search find help databas thank use articl research would
topic 3: book librari thank help check look remov reserv would els
topic 4: help use student find articl librari hi look tri question
topic 5: request librari account item thank ok get help loan number
topic 6: thank chat good know one night go okay think hi
topic 7: thank look librari remov help would contact inform find like
topic 8: search articl databas click thank journal help page ok find
topic 9: articl thank journal access look help remov full link find
topic 10: access tri link thank use work get campu remov let

table 2: phrase-lda (top 10 phrases in each topic)
topic 1: interlibrari loan, lose chat, chat servic, lower level, chat open, writer workshop, spring break, studi room, call ugl, add chat
topic 2: good night, great day, good day, good luck, drop menu, sound good, nice day, ye great, remov thank welcom, make sens
topic 3: anyth els, tri find, abl find, find anyth, feel free, ll tri, social scienc, tri access, ll back, abl access
topic 4: easi search, academ search, find articl, search box, tri search, databas subject, search bar, search term, databas search, search databas
topic 5: graduat student, grad student, peer review, undergrad student, illinoi undergrad, scholarli sourc, univers illinoi, undergradu student, primari sourc, googl scholar
topic 6: main librari, librari catalog, librari account, librari homepag, call number, librari websit, netid password, main stack, creat account, borrow id
topic 7: page remov, click link, open new tab, link remov, send link, remov click, left side, remov link, page click, error messag
topic 8: give one moment, contact inform, moment pleas, faculti staff, give minut, pleas contact, email address, staff member, faculti member, unit state
topic 9: full text, journal articl, access articl, find articl, databas journal, light blue, articl titl, titl articl, journal databas, found articl
topic 10: request book, request item, check book, doubl check, print copi, cours reserv, copi avail, physic copi, book avail, copi past

table 3: dmm (top 10 words in each topic)
topic 1: work open chat way onlin say specif avail day sourc
topic 2: check titl research much onlin avail day text sourc say
topic 3: pleas sourc day onlin titl found right hello may take
topic 4: chat also copi pleas think onlin undergrad sourc work way
topic 5: pleas sorri found item chat way right open work time
topic 6: found also right much think could research undergrad sorri way
topic 7: contact hello account sorri could ask titl moment may think
topic 8: copi onlin sorri ask think say right also much sourc
topic 9: much research way may right think open take hello result
topic 10: abl avail also titl catalog pleas say campu onlin take

table 4: nmf (top 10 words in each topic)
topic 1: request take titl today moment way item may place say
topic 2: specif start type journal topic research tab way subject result
topic 3: ugl today ask wonder call may contact peopl someon talk
topic 4: sourc univers scholarli research servic resourc tell illinoi guid librarian
topic 5: account log set vpn us password id say campu problem
topic 6: main locat undergradu call tab review two circul ugl number
topic 7: reserv class time undergradu cours websit show im titl onlin
topic 8: text full troubl problem still pdf websit onlin send moment
topic 9: chat night hey yeah oh well time tonight take yep
topic 10: unfortun uiuc onlin wonder version graduat print seem way grad

discussion

interpreting the results of a topic model can be a bit of a guessing game. none of these algorithms look at the semantic meaning of words, so the resulting topics are not based on semantics. each algorithm simply employs a different method of mathematically determining the likelihood that words are related to each other. when this likelihood is high enough (as defined by the algorithm), the words are listed within the same topic. identifying topics mathematically is much quicker than a person hand-coding conversations. however, automatic classification also means that the resulting topics could make absolutely no sense to people, who understand the semantic meaning of the words within a topic. this lack of coherent meaning is most apparent in the results of the dmm model (table 3). for instance, the words that comprise topic 1 are the following: "work open chat way online say specify available day source." it is difficult to imagine what overarching concept links all, or even most, of these words. only a few words appear to have any significance at all: "open" could refer to open access, or to the library's open hours; "online" may refer to finding resources online, or to the fact that a student is taking online classes; and "source" is likely some reference to a research resource. these words barely relate to each other semantically, and the remaining seven words don't provide much clarification. thus, it appears that dmm is not a particularly good topic modeling algorithm for library chat reference. the results seen from the lda model (table 1) appear slightly more comprehensible. in topic 2, for instance, the words are as follows: "look search find help database thank use article research would." while not all the words relate to each other, a common theme could emerge from the words look, search, find, database, article, and research. it's possible that this topic identified chat conversations where a patron needed help finding research articles. even topic 6, at first glance a silly list of words, makes some sense: "thank chat good know one night go okay think hi." greetings and sign-offs probably comprised a good number of the total words in the corpus, so it is understandable that a "greetings" topic could be mathematically identified. overall, lda appears to have potential in topic modeling chat reference, but it probably needs to be further tweaked. when applying the lda model to phrases (table 2), the coherence increases within the phrases, but the topics are not always as coherent. topic 1 includes the following phrases: "interlibrary loan, lose chat, chat service, lower level, chat open, writer workshop, spring break, study room, call ugl, add chat." each phrase, individually, makes perfect sense in the context of this library; as a collection, however, the phrases don't comprise one coherent topic. four of the phrases explicitly mention chat services (an interesting meta-topic), while the rest appear completely unrelated. on the other hand, topic 10 does show more semantic relation between the phrases: "request book, request item, check book, double check, print copy, course reserve, copy available, physical copy, book available, copy past." it seems pretty clear that this topic refers to books, whether on reserve, being requested, or checked for availability.
with the wide difference in topic coherence, the phrase-lda algorithm is not perfect for topic modeling chat reference, but further exploration is warranted. the final algorithm, nmf (table 4), is also imperfect. it is possible to distill each topic into an actual semantic concept, but there is almost always at least one word that makes it a little less clear. topic 5 probably provides the best coherence: "account log set vpn use password id say campus problem." it seems clear this topic refers to identity verification, likely for off-campus use of library resources. the other topics given by the algorithm have more confusing elements, such as topic 1, where the relatively meaningless words may, way, and say all appear. it's interesting that kohler found nmf to work very well, while the results above are not nearly as coherent as those identified in her implementation.12 this is a perfect example of how the tuning of many different parameters can affect the ultimate results of each topic modeling algorithm. this is why the authors think it is worth continuing to explore how to improve the implementation of the lda, phrase-lda, and nmf algorithms for chat conversations, as well as sharing the original code for others to test and revise. it will take many different projects at many different libraries before an optimum topic model implementation is found for chat reference.

next steps

for the most part, the more coherent results from the lda and nmf topic modeling algorithms support anecdotal understanding of the primary themes in chat conversations. currently, two members of the research & information services unit, the department responsible for scheduling the chat reference service at the main library, are examining the model outputs to determine whether any of the results are strong enough at this stage to suggest changes to services or resources. they will also share the results with the chat coordinators at other libraries on campus in case the results indicate changes for them. additionally, results will be shared with the library's web working group, since repeated questions about the same services or locations may suggest the need to display them in a more prominent place on the library website or provide a more discoverable online path to them. since this was a pilot project that used a fairly small data set, it is anticipated that years of transcripts, along with improved topic model implementation, will reveal even more significant and robust themes. with the encouraging results of this pilot project, there is much to continue to explore.13 one future question is whether there are differences between fall and spring semesters. if some topics arise more frequently in one semester than the other, perhaps the library needs to offer more workshops during that semester. alternatively, perhaps support materials should be created (such as handouts or online guides) that emphasize the related services and place them more prominently, while withdrawing or de-emphasizing them in the other semester. another area for further analysis is how the topics that emerge in the late-night chat interactions compare to those from other times of day. this will help the library design more relevant training materials for the graduate assistants who staff those shifts, or potentially change who is staffing the shifts.
also of interest is comparing the text written by the chat operators versus the chat users, as this would further spotlight the terminology that patrons use. if patrons are using significantly different terms from staff, then modifying the language of the library's website may reduce confusion. there are also improvements to make to the data cleaning process, such as better identifying when to remove stop words and when to remove punctuation. these steps weren't perfectly aligned, which is why, for example, the "ll" that appears in topic 3 of the phrase-lda results (table 2) is most likely a mutation of contractions like "i'll," "we'll," and "you'll." generating "ll" as a word from multiple different contractions not only created a meaningless word, but since "ll" occurred more frequently than any unique contraction, it was potentially treated as more important by the topic modeling algorithms.

conclusion

this project has demonstrated that topic modeling is one possible way to employ automated methods to analyze chat reference, with mixed success. the library will continue to improve chat reference analysis based on this project experience. the authors hope that other libraries will use the lessons from this project and the code in github as a starting point to employ similar analysis for their own chat reference. in fact, a related project at the university of northern iowa library is evidence of growing interest in topic modeling of chat reference transcripts.14 considering how frequently patrons use chat reference, it is important for libraries to explore and embrace whatever methods will allow them to assess and improve such services.

acknowledgements

the authors wish to acknowledge the research and publication committee of the university of illinois at urbana-champaign library, which provided support for the completion of this research. many thanks are owed to xinyu tian, our practicum student, for the extensive work he did in identifying relevant literature and developing the project code.

notes

1 jo kibbee, david ward, and wei ma, "virtual service, real data: results of a pilot study," reference services review 30, no. 1 (mar. 1, 2002): 25–36, https://doi.org/10.1108/00907320210416519.

2 the library uses the read scale (reference effort assessment data scale), which allows reference transactions to be translated into a numerical scale that takes into account the effort, skills, knowledge, teaching moment, techniques, and tools used by the staff in the transaction. see readscale.org for more information.

3 david ward and m. kathleen kern, "combining im and vendor-based chat: a report from the frontlines of an integrated service," portal: libraries and the academy 6, no. 4 (oct. 2006): 417–29, https://doi.org/10.1353/pla.2006.0058; joann jacoby et al., "the value of chat reference services: a pilot study," portal: libraries and the academy 16, no. 1 (jan. 2016): 109–29, https://doi.org/10.1353/pla.2016.0013; david ward, "using virtual reference transcripts for staff training," reference services review 31, no. 1 (2003): 46–56, https://doi.org/10.1108/00907320310460915.

4 robin brown, "lifting the veil: analyzing collaborative virtual reference transcripts to demonstrate value and make recommendations for practice," reference & user services quarterly 57, no.
1 (fall 2017): 42–47, https://doi.org/10.5860/rusq.57.1.6441; maryvon côté, svetlana kochkina, and tara mawhinney, "do you want to chat? reevaluating organization of virtual reference service at an academic library," reference & user services quarterly 56, no. 1 (fall 2016): 36–46, https://doi.org/10.5860/rusq.56n1.36; donna goda and corinne bisshop, "frequency and content of chat questions by time of semester at the university of central florida: implications for training, staffing and marketing," public services quarterly 4, no. 4 (dec. 2008): 291–316, https://doi.org/10.1080/15228950802285593; kelsey keyes and ellie dworak, "staffing chat reference with undergraduate student assistants at an academic library: a standards-based assessment," the journal of academic librarianship 43, no. 6 (2017): 469–78, https://doi.org/10.1016/j.acalib.2017.09.001; michael mungin, "stats don't tell the whole story: using qualitative data analysis of chat reference transcripts to assess and improve services," journal of library & information services in distance learning 11, no. 1–2 (jan. 2017): 25–36, https://doi.org/10.1080/1533290x.2016.1223965.

5 shu z. schiller, "chat for chat: mediated learning in online chat virtual reference service," computers in human behavior 65 (dec. 2016): 651–65, https://doi.org/10.1016/j.chb.2016.06.053.

6 ellie kohler, "what do your library chats say?: how to analyze webchat transcripts for sentiment and topic extraction," in brick & click libraries conference proceedings (brick & click, maryville, mo: northwest missouri state university, 2017), 138–48, https://www.nwmissouri.edu/library/brickandclick/presentations/eproceedings.pdf.

7 kohler, 141.

8 for example: guan-bin chen and hung-yu kao, "re-organized topic modeling for microblogging data," in proceedings of the ase bigdata & socialinformatics 2015, ase bd&si '15 (new york, ny: acm, 2015), 35:1–35:8, https://doi.org/10.1145/2818869.2818875.

9 x. cheng et al., "btm: topic modeling over short texts," ieee transactions on knowledge and data engineering 26, no. 12 (dec. 2014): 2928–41, https://doi.org/10.1109/tkde.2014.2313872.

10 for example: chenliang li et al., "topic modeling for short texts with auxiliary word embeddings," in proceedings of the 39th international acm sigir conference on research and development in information retrieval (acm press, 2016), 165–74, https://doi.org/10.1145/2911451.2911499.

11 we used the python packages gensim, langid, nltk, numpy, pandas, re, sklearn, and stop_words for data cleaning and analysis.

12 kohler, "what do your library chats say?"

13 the library implemented new chat reference software after this project was completed, so analysis of chat conversations that took place after the spring 2018 semester will require a reworking of the data collection and cleaning processes.

14 hyunseung koh and mark fienup, "library chat analysis: a navigation tool" (poster, dec. 5, 2018), https://libraryassessment.org/wp-content/uploads/2018/11/58-kohfienuplibrarychatanalysis.pdf.

electronic library for scientific journals: consortium project in brazil

rosaly favero krzyzanowski and rosane taruhn

information technology and libraries | june 2000
making information available for the acquisition and transmission of human knowledge is the focal point of this paper, which describes the creation of a consortium for the university and research institute libraries in the state of sao paulo, brazil. through sharing and cooperation, the project will facilitate information access and minimize the acquisition costs of international scientific periodicals, consequently increasing user satisfaction. to underscore the advantages of this procedure, the objectives, management, and implementation stages of the project are detailed, as submitted to the research support foundation of the state of sao paulo (fapesp).

production, organization, and acquisition of knowledge

in 1851, predicting the imminent growth in information, which in fact exploded in volume one hundred years later, joseph henry of the smithsonian institution voiced his opinion that the progress of mankind is based on research, study, and investigation, which generate wisdom, knowledge, or, simply, information. he stated that for practically every item of interest there is some record of knowledge pertinent to it, "and unless this mass of information be properly arranged, and the means furnished by which its content may be ascertained, literature as well as science will be overwhelmed by their own unwieldy bulk. the pile will begin to totter under its own weight, and all the additions we may heap upon it will tend to add to the extension of the base, without increasing the elevation and dignity of the edifice."1 at the threshold of the twenty-first century, these words become more self-evident by the day. there are enormous archives of knowledge from which people extract parts, allowing them to advance and progress in science, technology, and the humanities. until some decades back, recovery from these archives was essentially a manual task consisting of written work and organization. today's technologies provide auxiliary tools to transmit this knowledge. although information is a cultural and social asset, it now is purchased at high prices. making these enormous archives available in a clear and organized manner by using the proper technology is currently the greatest challenge for all those involved in knowledge management: the production, organization, and transmission of information.

the advent and implications of electronic publications

among the major contributions of the industrial era are the evolution and growth of information publishing and printing facilities that use tools to record, store, and distribute information. in the last ten years, the first steps were taken toward the storage and reproduction of sounds and images in new multimedia formats. technological advances also have brought new possibilities in accessing and disseminating information. electronic publishing has been particularly effective in accelerating access and contributing to the generation of additional knowledge; consequently, an exponential increase in data has taken place, most notably in the second half of the twentieth century. current journals numbered about 10,000 at the beginning of the century; by the year 2000 the number had reached an estimated 1 million.2 as a result, specialized literature has been warning about a possible crisis in the traditional system of scientific publications on paper.
in addition to the difficulty of financing the publication of these works, the prices of subscriptions to scientific periodicals on paper have been rising every year. at times, this makes it impracticable to update collections in all libraries, which interferes substantially in development. on the other hand, access to electronic scientific publications via the internet is proving to be an alternative for maintaining these collections at lower cost. it also provides greater agility in publishing and distributing the periodical, and in the final user's accessing of the information. due to this, it is important that institutions that wish to support and promote research developed by their scientific communities facilitate access to these publications on electronic media. to paraphrase line, we can say that although publishers are still uncertain as to all the aspects of transmitting information electronically, because authors and institutions will be increasingly able to distribute their works on the web without the direct involvement of publishers, there is an escalation in electronic publications being published by scientific publishers.3

rosaly favero krzyzanowski is technical director of the integrated library system of the university of sao paulo (sibi/usp), brazil. rosane taruhn is director of the development and maintenance of holdings service of the technical department of the university of sao paulo (sibi/usp), brazil.

figure 1. infrastructure resources (physical, human, financial, and electronic) for consortium formation

line also says that one of the reasons for the growth in the number of electronic publications is "that it is technically possible to make them [journals] accessible in this way, and in fact easy and cheap, since nearly all text goes through a digital version on the way to publication. secondly, journal publishers believe that electronic versions provide a second market in addition to that for their printed versions, or at least in an expanded market, since many users will be the same."4 it is important to point out that the scientific periodical, be it paper or electronic, must ensure market value and academic community receptivity, have a staff qualified for scientific publishing, be consistent in publishing release dates, comply with international standards, and use established distribution and sales mechanisms.5 line goes further: "electronic publication as an 'extra' to printed publication has few added costs of journal publication other than those of printing, and publishers are not going to want to make less money from electronic journals than they do from printed ones. while printed journals once acquired can be used and reused without extra cost, each access to an electronic article has to be paid for. and although the costs of storage and binding may be saved, these are offset by the costs of printing out."6 he then notes that this technology demands an active equipment and telecommunication infrastructure. another point he addresses is the need for users to master the search strategies required to efficiently recover information, thus reducing the time spent and costs.
in turn, saunders points out that libraries should be receptive to this transition:

libraries, through their development, formation, and maintenance policies, should be receptive to this transition by accommodating the different means of communication to the different user needs and striving for a new balance. these policies should certainly stress the cooperation and sharing of remote access to the information demanded. budget estimates should, therefore, foresee, in addition to the subscriptions to electronic titles with complete texts, other possible items like licensing rates for multi-user remote access and the right to copy articles on electronic media to paper, depending on the contracts made with the publishers or their agents.7

electronic publication consortiums

catering to mutual interests by setting up a library consortium to select, acquire, maintain, and preserve electronic information is one means of reducing or sharing costs as well as expanding the universe of information available to users and ensuring a successful outcome. resources (physical, human, financial, and electronic) are combined for the common good, in this case the consortium, as shown in figure 1, which was extracted and adapted from the oclc institute.8 the consortium presupposes the invigoration of cooperative activities among member libraries by promoting the central administration of electronic publication databases as part of a shared library system visible to all and replete with access facilities. in addition to putting in place simplified, reciprocal lending programs and spurring the cooperative development and sharing of collections, the consortium has the objective of implementing information distribution by electronic means, provided that copyright and fair use rights are complied with.9 on the other hand, "the research library community is committed to working with publishers and database producers to develop model agreements that deploy licenses that do not contract around fair use or other copyright provisions. in this way, one seeks to ensure the library practices being disseminated, especially interlibrary lending."10 experience shows that acquiring publications through consortia has brought great benefits and has equally favored institutions of different sizes that would not be able to afford single subscriptions, whether on paper or in electronic format. north american and european universities have been opting for this type of alliance to augment investment cost-benefit. important examples of these consortia currently operative are:

• washington research library consortium, washington, d.c., www.wrlc.org;
the location has a favorable information infrastructure, particularly the electronic network of the academic network of sao paulo (ansp), thanks to the support of the research support foundation of the state of sao paulo (fapesp).11 growing user demand for direct, convenient access to information in the state of sao paulo was also a factor in the choice of location. the final decision was to compose the consortium of five universities in the state of sao paulo (universidade de sao paulo [usp], universidade estadual paulista [unesp], universidade de campinas [unicamp], universidade federal de sao carlos [ufscar], and universidade federal de sao paulo [unifesp]) as well as the latin american and caribbean center for health science information (bireme). the consortium's goal was to give the member institutions' entire scientific community (10,492 faculty and researchers) rapid access to the complete, updated texts of the elsevier science scientific journals. this publishing house, an umbrella for north holland, pergamon press, butterworth-heinemann, and excerpta medica, presently publishes electronic versions of its journals. selection of the member institutions that would serve as a pilot group for this project was based on prior experience with the cooperative work in preparing the unibibli collective catalog cd-rom, which, using bireme/opas/oms technology, consolidates the collections of three of these universities. that project was initially funded by fapesp; since its fourth edition, the cd-rom has been published with funds provided by the universities themselves, by means of a signed agreement. moreover, the choice of elsevier science, which its premier ranking in the global publishing market alone would justify, is also due to the fact that consortium member institutions maintain paper subscriptions to a great number (606) of this publisher's titles. already fully available on electronic media, these titles are the components of a representative collection initiating the building of the international scientific publications electronic library in the state of sao paulo. furthermore, the majority of the titles are covered by the institute for scientific information's web of science site, which has been at the disposal of researchers and libraries in the state of sao paulo since 1998.

consortium objectives

the consortium was formed to contribute to the development of research through the acquisition of electronic publications for the state of sao paulo's scientific community. delivered over the ansp network, the electronic library, in addition to augmenting and speeding up access to current scientific information in all the member institutions, will:
• increase the cost-benefit of each subscription;
• promote the rational use of funds;
• ensure continuous subscription to these periodicals;
• increase the universe of publications available to users through collection sharing;
• guarantee local storage of the information acquired, thus ensuring the collection's maintenance and its continued use by present and future researchers; and
• develop the technical capabilities of the personnel of the state of sao paulo institutions in operating and using electronic publication databases.
initially, the project will not interfere with the current process of acquiring periodicals on paper or with the distribution of collections among member institutions.
however, as electronic collection utilization becomes predominant, duplicate paper subscriptions may be eliminated so as to allow new subscriptions to be made available to the consortium at no additional cost.

implementation of the electronic library for international scientific publications

implementation of this project includes the following stages already achieved:
• constitution of the consortium by the six member institutions; and
• setup of an administrative board.
the following stages are in progress:
• purchase of hardware (a central server) and management software; and
• estimates for the installation of the operating system.

figure 2. reference database and full-text interconnectivity to optimize information access (bireme and fapesp servers connecting users in consortium institutions to the web of science [8,000 titles], current contents connect [ccc, 9,000 titles], scielo [scientific electronic library online, 100 titles], and the international scientific periodical electronic library [606 titles])

and the following stages are planned:
• training of qualified personnel and maintenance of the infrastructure built up;
• acquisition and implementation of the electronic library on the central server; and
• permanent assessment of utilization.
the pilot project proposes that the central server, for storage and availability of the electronic scientific periodical collections on the ansp network, be located at fapesp in order to facilitate development of an electronic data bank. in the future, this bank should, in addition to the collection planned for the project, include international collections of other publishing houses, the scielo collection of brazilian scientific journals (a fapesp/bireme project), and the web of science and current contents connect reference databases (see figure 2).

consortium management

the electronic library will be administered by the consortium's administrative board, made up of a general coordinator, an operations coordinator, and the directors and coordinators of the library systems and central libraries of member institutions, as well as consultants recommended by fapesp. the administrative board shall be in charge of the implementation, operation, dissemination, and assessment of electronic library utilization. it is also charged with supervising the training of qualified personnel in order to guarantee the success of the project. an agreement was signed specifying the consortium's objective, its constitution, the manner in which it shall be executed, and the obligations established for consortium members. shortly, a contract for the use of elsevier science electronic publications shall be signed by fapesp and the provider.
the agreement's documents and use license were drawn up in compliance with the principles for licensing electronic resources recommended by the american library association, published in final version at the 1997 american library association annual conference.12

retrieval system and information use evaluation

research on electronic media suggests that the use of a single software program offering different strategies and forms of interaction for searching the collections requires an evaluation of the efficiency of individual search strategies. this evaluation is critical for the preparation of guidelines that orient the choice of systems and of proper training programs.13 for the electronic library, measuring not only the amount of file use but also the efficacy and efficiency of its information access systems and of the training offered to its users is an imperative task. in the project described, evaluation shall be made through indicators that demonstrate use of the electronic library and of the collections on paper, per journal title, subject researched, user institution, and number of accesses per day, as well as user satisfaction with the service provided (interface, response time, text copies), among other factors to be studied.

final remarks

the way in which users read electronic media is a code that goes far beyond the written word, because sound and image are increasingly being added. in this first generation of electronic publications, fapesp supported the availability of the web of science and of scielo and the creation of the international scientific publications electronic library in the state of sao paulo. the possible introduction of current contents connect will trigger an extraordinary leap in research development, facilitating access to scientific information and the acquisition and transmission of human knowledge, as well as enhancing the cooperative and sharing enterprise of the member libraries.

references and notes
1. annual report of the board of regents of the smithsonian institution ... during the year 1851 (washington, d.c., 1852), 22.
2. leo wieers, "a vision of the library of the future," in developing the library of the future: the tilburg experience, h. geleijnse and c. grootaers, eds. (tilburg, the netherlands: tilburg univ., 1994), 1-11.
3. m. b. line, "the case for retaining printed lis journals," ifla journal 24, no. 1 (oct./nov. 1998): 15-19.
4. ibid.
5. r. f. krzyzanowski, "administracao de revistas cientificas," in reuniao anual da sociedade de pesquisa odontologica, aguas de sao pedro, 14, 1997 (lecture).
6. line, "the case for retaining printed lis journals."
7. l. m. saunders, "transforming acquisitions to support virtual libraries," information technology and libraries 14, no. 1 (mar. 1995): 41-46.
8. oclc institute, oclc institute seminar: information technology trends for the global library community, 1997, ohio (dublin, ohio: oclc institute/the andrew w. mellon foundation/fundacao getulio vargas/bibliodata library network, 1997).
9. a definition of fair use is the "legal use of information: permission to reproduce texts for the purposes of teaching, study, commentary or other specific social purposes." found in j. s. d. o'connor, "intellectual property: an association of research libraries statement of principles," accessed july 28, 1999, http://arl.cni.org/scomm/copyright/principles.html.
10. statement of current perspective and preferred practices for the selection and purchase of electronic information, icolc statement on electronic information, accessed july 2, 1998, www.library.yale.edu/consortia/statement.html.
11. r. f. krzyzanowski and others, biblioteca eletronica de publicacoes cientificas internacionais para as universidades e institutos de pesquisa do estado de sao paulo, sao paulo, 1998 (project presented to fapesp, fundacao de amparo a pesquisa do estado de sao paulo).
12. b. e. c. schottlaender, "the development of national principles to guide librarians in licensing electronic resources," library acquisitions: practice and theory 22, no. 1 (spring 1998): 49-54.
13. w. s. lang and m. grigsby, "statistics for measuring the efficiency of electronic information retrieval," journal of the american society for information science 47, no. 2 (feb. 1996): 159-66.

virtual reality as a tool for student orientation in distance education programs: a study of new library and information science students
sandra valenti, brady lund, and ting wang
information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.11937

dr. sandra valenti (svalenti@emporia.edu) is assistant professor, school of library and information management, emporia state university. brady lund (blund2@g.emporia.edu) is a doctoral student of library and information management at emporia state university. ting wang (twang2@emporia.edu) is a doctoral student of library and information management, emporia state university.

abstract

virtual reality (vr) has emerged as a popular technology for gaming and learning, with its uses for teaching presently being investigated in a variety of educational settings. however, one area where the effect of this technology on students has not been examined in detail is as a tool for new student orientation in colleges and universities. this study investigates this effect using an experimental methodology and a population of new master of library science (mls) students entering a library and information science (lis) program. the results indicate that students who received a vr orientation expressed more optimistic views about the technology, saw greater improvement in scores on an assessment of knowledge about their program and chosen profession, and saw a small decrease in program anxiety compared to those who received the same information as standard text and links. the majority of students also indicated a willingness to use vr technology for learning for long periods of time (25 minutes or more). the researchers concluded that vr may be a useful tool for increasing student engagement, as described by game engagement theory.

literature review

computer-assisted instruction (cai) has, for many years, been considered an effective method of instructional delivery that improves student engagement and outcomes.1 new technologies, such as the learning management system (lms), online video, laptops and tablets, word processors, spreadsheets, and presentation platforms, have all significantly altered how knowledge is transferred to and measured in students. when adopted by instructors, these technologies can improve the quality of student learning and work and the evaluation of that work.
empirical research has shown that learning technologies do indeed contribute to better learning than a lecture alone.2 positive reactions to the adoption of new learning technologies among student populations have been shown across all grade levels, from pre-k through postgraduate education.3 research in the fields of instructional design technology (idt) and information science (is) has shown that the novelty of a new learning technology provides short-term improvement in outcomes.4 this supports the broader hypothesis that engagement increases retention of knowledge. these findings suggest that, at least in the short term, instructors could anticipate improvement in knowledge retention through the use of a new technology like virtual reality. when used in sustained instructional efforts, many learning technologies show some promise for improving the attainment of learning outcomes.5 this is why interest in learning technology has grown so significantly in the past two decades and why the job outlook for instructional designers is increasing faster than the national average.6 a large proportion of instructional technologies are not truly "adopted" by instructors but rather are used only in one-off sessions and then discarded.7 there seem to be some common factors among those technologies that are adopted and used regularly by instructors:
1. practicality, or the amount of work the new technology requires versus the perceived value of said technology;
2. affordability, or the cost of a new technology versus the perceived value of said technology; and
3. stability, or the likelihood of the product to be continuously supported and updated by its manufacturer (e.g., a product like microsoft office has a higher likelihood of ongoing maintenance).8
as noted by lund and scribner, only recently, with the introduction of free vr development programs and inexpensive viewers/headsets like google cardboard, has vr met these criteria.9 it is finally practical to use vr as a learning tool for classrooms with large numbers of students. "virtual reality is the computer-created counterpart to actual reality. through a video headset, computer programs present a visual world that can, pixel-perfectly, replicate the real world—or show a completely unreal one."10 virtual reality is distinct from augmented reality, which augments a real-world, real-time image (e.g., viewed through a camera on a mobile device) with computer-generated information such as images, text, videos, animation, and sound.11 the focus of the present study is virtual reality only, not the related augmented (or mixed) reality technology. an important contribution to the study of virtual reality in library and information science (lis) is varnum's beyond reality.12 this short introductory book covers both theoretical and practical considerations for the use of virtual, augmented, and mixed reality in a variety of library contexts. while the book describes how vr can be utilized in a variety of library education (for non-lis majors) contexts, it does not include an example of how virtual reality may be used for library school education. nor does it investigate in significant detail the use of virtual reality for a virtual orientation to an academic program. these are the gaps that the following study attempts to address.
the present study may be viewed through the framework of game engagement theory, as described by whitton.13 game engagement theory suggests that five major learning engagement factors exist and that using gaming activities may improve how well learning activities address these factors. the factors are:
• challenge, the motivation to undertake the activity;
• control, the level of choice within the activity;
• immersion, the extent to which an individual is absorbed in the activity;
• interest, an individual's interest in the subject matter; and
• purpose, the perceived value of the outcome of the activity.
it has been suggested by several researchers, including dede, that immersive experiences like vr touch on similar factors of engagement.14

emporia state university's school of library and information management

the setting for this study is emporia (ks) state university's school of library and information management (esu slim). esu slim, founded in 1902, is the oldest library school west of the mississippi river. compared to other lis education programs, esu slim is unique in that it offers a hybrid course delivery format. the six core courses in the mlis degree program are online, with two in-person class weekends for each course. each class weekend is eleven hours, from 6 to 9 p.m. friday and 9 a.m. to 5 p.m. saturday, at one of nine distance education locations scattered throughout the western half of the united states. due to this course delivery format, the student population of esu slim may skew slightly older and include more individuals who are employed full time than residential master's programs. esu slim uses a cohort system, with a new group of students beginning annually at each of the eight distance locations as well as the main emporia, kansas, campus. before each new cohort begins its first course, a one-day, in-person student orientation is offered on the campus at which the cohort will attend classes. the purpose of this experimental study is to examine how well vr technology can support or satisfy the role of the in-person student orientation by emulating the experience and information students receive during this informational session.

methods

this study used a pre-test/post-test experimental design. depending on the state in which they resided, students were assigned either to the experimental or the control group. the experimental group received a cardboard vr headset (similar to google cardboard) and a set of instructions on how to use it. they were instructed to utilize this headset to view an interactive experience that introduced elements of library service and library education as a form of new student orientation. students in the control group received a set of links that contained the same information as the vr experience, but in a more static (non-immersive, non-interactive) setting. participants for this study were library school students from four states: south dakota, idaho, nevada, and oregon. these students were all enrolled in a mixed-delivery program in lis. for each core course in the program, students attend two intensive, in-person, weekend class sessions. the rest of the course content is delivered via a learning management system.
for this study, the researchers were particularly interested in understanding the role of vr orientation for distance education students, as these students do not have access to the physical university campus and thus miss out on information that in-person interaction with faculty and the library environment might provide. this also seemed a worthwhile population to study given that a large portion of lis programs have adopted the distance education (online or mixed-delivery) format. in march 2019, a sample of this population was asked to complete a short survey to indicate their interest in virtual reality for new student orientation and the extent to which acquiring information via this medium might relieve their anxiety and increase their success in the program. sixty-one percent of students indicated at least some elevated level of anxiety about their first mls course, while 55 percent agreed that knowing more about the program's faculty and course structure and purpose would decrease that anxiety. students were also asked to indicate the most pressing information needs they had about the program. these needs are displayed in table 1 below. this information was used to guide the design of the vr content for this study.

table 1. information needs expressed by new mls students (number of respondents, out of 55)
information about esu's curriculum: 50
what courses professors normally teach: 42
information about information access: 41
information about librarianship in general: 39
professors' research interests: 35
information about esu's faculty: 27
to see who they are via a video introduction: 25
information about esu's library: 24
why they teach for esu's mls program: 23
a little personal information about faculty: 20
information about my regional director: 14
to which associations do faculty belong: 13
information about esu's physical spaces: 5
information about esu's archives: 4

these students were also asked to indicate, on a five-point likert scale, the extent to which they would like to use vr to virtually "meet" faculty, learn more about the program's format, see program spaces, and learn about library services. the findings for this question are displayed in figure 1.

figure 1. new mls students' reception to using vr as an orientation tool (frequency of respondents, from strongly agree to strongly disagree, for each category of vr use: "meeting" faculty, learning about the program format, seeing the classrooms, and learning about library services)

based on the largely positive response toward using vr for new student orientation, the researchers progressed to the experimental phase of the study. a vr experience was developed using veer vr (veer.tv), a completely free and intuitive vr-creation platform. within this platform, creators are able to upload images captured using a 360-degree vr camera (we used a samsung gear 360 camera) and drag and drop interactive elements, including text boxes, videos, audio, and transitions to new images. thus, it was possible to create a vr experience within the setting of an academic library where users could navigate throughout the building, virtually meet faculty, and learn about fundamental concepts in librarianship. for this phase of the study, a set of research questions was defined, a hypothesis created, and independent and dependent variables identified.
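as an aside, likert tabulations like the one summarized in figure 1 can be produced with a few lines of pandas. the following is a minimal sketch (not the authors' code); the column names and responses are entirely hypothetical.

```python
# a minimal sketch of tabulating likert responses per survey item with pandas;
# the items and sample responses below are hypothetical.
import pandas as pd

LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# each row is one respondent; each column is one "i'd like to use vr to ..." item
responses = pd.DataFrame({
    "meet_faculty":     ["agree", "strongly agree", "neutral", "agree"],
    "learn_program":    ["agree", "agree", "agree", "neutral"],
    "see_classrooms":   ["strongly agree", "agree", "disagree", "agree"],
    "library_services": ["neutral", "agree", "agree", "strongly agree"],
})

# frequency of each response level per item, in fixed likert order
counts = (
    responses
    .apply(lambda col: col.value_counts())  # count levels within each item
    .reindex(LEVELS)                        # impose the likert ordering
    .fillna(0)
    .astype(int)
)
print(counts)
```

each column of the resulting frame corresponds to one survey item, so the frame can be passed directly to a plotting routine to reproduce a grouped bar chart in the style of figure 1.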
research questions
1. will vr improve students' knowledge of topics related to their library school and basic library topics, relative to those without a vr experience?
2. will vr reduce students' anxiety about their library program, relative to those without a vr experience?
3. will students' perceptions of the usefulness of vr be significantly different based on whether or not they utilized the vr experience?

hypothesis
use of vr will improve students' knowledge of topics related to library schools and librarianship, reduce their anxiety, and result in a more positive perspective toward vr technology.

variables
independent variable: whether a student viewed the vr experience for a virtual orientation or viewed the web links for an online orientation.
dependent variables: change in students' scores on a post-test assessment of orientation knowledge compared to their pre-test scores; change in students' anxiety levels; and change in students' perceptions of vr.

experimental phase
the experimental phase of the study was conducted in august 2019. twenty-nine students agreed to participate in this study. the age and gender characteristics of this population are as follows: fourteen under age 35, eleven age 35-44, and four age 45 or older; nine male, seventeen female, and three fluid or transgender. thirty-three percent of the students who agreed to participate were in the control group, while 67 percent were in the experimental group. all participants in the study received a free vr headset, which was theirs to keep. funding for these vr headsets was provided by a generous grant from a benefactor at the researchers' university. participants in the control group were encouraged to use the vr headset after they had completed their participation in the study. both groups received instructions with their viewer directing them to complete a pre-test survey embedded within a module of their learning management system account. following the pre-test, the experimental group was instructed to use the vr experience created by the researchers to learn about their library school, its faculty, and library concepts. the control group was instructed to use links provided in the module to experience the same content, but without the vr experience. following the experience, both groups were instructed to complete a post-test survey in the module, as well as a follow-up survey that asked how long they interacted with the content, how the experience affected their program anxiety, and for any additional comments. once the data was collected for all participants, the researchers conducted a series of analyses, including an analysis of covariance (ancova) for post-test scores among the control and experimental groups and an ancova for program anxiety following the experimental treatment.15

results

figure 2 displays the amount of time participants in the experimental group spent using the vr experience.
nearly 60 percent of participants spent more than 25 minutes using the virtual reality experience. this finding may seem remarkable, given that the average attention span of students is generally no more than a handful of minutes, but it aligns with that of geri, winer, and zaks, who found that engagement with interactive video lengthens the attention span of users, and it supports the premise of engagement theory as discussed in the literature review.16 only 10 percent of individuals assigned to the experimental group decided not to use the headset. additionally, about one-third of participants in both the experimental and control groups indicated that they used the vr headset to view other content after they completed the study.

figure 2. amount of time experimental group participants spent in the vr experience

table 2 shows responses to likert questions about the participants' post-test perspectives of vr. participants in the vr group generally had more favorable perspectives on their experience than participants in the control group. participants in the control group, however, were a bit more optimistic about the idea that vr has promising uses for education and librarianship (though both groups expressed optimistic perspectives on these questions). there was some indication that participants would be willing to use vr for student orientation again, as both groups responded favorably to the idea that vr orientation information is appropriate and negatively to the idea that it would be better to get information from other sources. tables 3 and 4 display the ancova for pre-test/post-test score change among groups and the change in anxiety among the groups, respectively. post-test scores for the experimental group (17.23 correct out of 20 questions, or 86 percent) and the control group (17.38/20, or 87 percent) were virtually identical; however, the pre-test scores differed (the experimental group, at 72 percent, scored worse on the pre-test than the control group, at 78 percent), so the change in scores was actually greater for the experimental group. as shown in table 3, though, this difference in score change was not found to be statistically significant, f(1, 20) = .641, p = .4, r = .01. that is, no significant difference was found as to whether vr improves scores compared to links. it can be concluded, however, that the links and the vr experience together did improve scores from the pre-test to the post-test, with ancova values of f(1, 20) = 7.6, p < .01, r = .47.
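the analysis reported here is the standard pre-test-as-covariate ancova. the following is a minimal sketch (not the authors' analysis code) of how such a model might be fit in python with pandas and statsmodels; the scores are entirely hypothetical.

```python
# a minimal ancova sketch: post-test score modeled on group, with the
# pre-test score as covariate. the data below are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "group":    ["vr"] * 8 + ["links"] * 8,          # experimental vs control
    "pretest":  [14, 15, 13, 16, 14, 15, 13, 15,
                 16, 15, 16, 15, 16, 15, 16, 16],
    "posttest": [17, 18, 16, 18, 17, 17, 16, 18,
                 18, 17, 17, 18, 17, 17, 18, 17],
})

# classic ancova: group as factor, pretest as covariate
model = ols("posttest ~ C(group) + pretest", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # f- and p-values per term
```

the anova table produced by anova_lm gives the per-term f- and p-values in the same form as tables 3 and 4.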
table 2. post-test perspectives of vr for experimental and control groups
(level of agreement on a five-point likert scale: 1, strongly disagree; 5, strongly agree; values shown as control [text-links] / experimental [vr])
the instructions were easy to understand and follow: 3 / 3.38
the viewer/text-links were fun to use: 3 / 3.63
the vr/text-links content was engaging: 3 / 3.13
i would recommend continuing vr/text-links use: 2.67 / 3
i felt better informed about the topics presented: 2.5 / 3.11
the information given was helpful: 2.5 / 3.38
i feel more connected to the school than before: 2.5 / 2.88
virtual reality is just a fad: 2 / 2.88
there are exciting uses for vr in education: 4 / 3.5
there are exciting uses for vr in librarianship: 4 / 3.5
using vr is too time consuming: 2 / 3
i'd rather get information in formats other than vr: 2.5 / 2.89
vr orientation information is appropriate: 4 / 3.38

table 3. ancova for pre-test/post-test change in scores
(source / degrees of freedom / f-value / p-value)
pretest: 1 / .135 / .7
group: 1 / .641 / .4
error: 18
total: 19
corrected total: 20

though the vr group generally reported less anxiety on a five-point likert scale following the experiment than the control group (both groups showed some reduction), this difference was not statistically significant at p < .05 (though it was significant at p < .1). it is worth noting that few students indicated prior experience with vr before this study, so it may simply have been the unfamiliar technology, not the nature of the content, that kept anxiety from dropping as far as anticipated. at the same time, it is worth noting, as bawden and robinson did, that information overload, which could certainly be the product of immersive vr orientations, is connected to information anxiety.17 thus, it may be better, in the design of vr orientations, to keep the amount of new information to a minimum, only introducing broad concepts and allowing more freedom and flexibility for the user.

table 4. ancova for anxiety following the orientation experience
(source / sum of squares / df / mean square / f / sig.)
between groups: 3.219 / 1 / 3.219 / 3.449 / .079
within groups: 17.733 / 19 / .933
total: 20.952 / 20

discussion

participants in this study expressed willingness to use vr for extended periods of time (over 25 minutes) and demonstrated strong levels of engagement. based on this finding, it seems possible that a well-designed vr orientation could be a suitable substitute for the in-person orientation for distance students. this is a significant finding, given that the majority of existing research on orientation for distance education students focuses on the design of online course modules or video streaming for orientation, which are not nearly as immersive and dynamic as physical presence in the environment.18 vr much more closely emulates physical presence than videos and text that are neither interactive nor immersive. those participants who were in the experimental (vr) group expressed more favorable perspectives toward the technology, which suggests that experience with the technology increases comfort with and interest in it. this aligns with the findings of theng et al., among others, who found that users of vr were more likely to accept the technology after usage.19 additionally, participants stated interest in using vr for other purposes; one-third of participants had already utilized the technology to explore other apps suggested by the researchers.
the findings of this study align with game engagement theory in several of its key aspects. vr is shown to have garnered the interest of the students who participated in the study, as indicated in table 2, aligning with the factor of interest. participants could see the purpose of the experience and were able to take control of it to ensure that they interacted with the information necessary to satisfy that purpose. this stands in contrast to the control group, which had to follow links and read text in a sequential order with little control or creativity involved. accordingly, greater improvement in scores was observed for the experimental group. even though the improvement was not statistically significant, this could likely be explained by the relatively small sample size; with a larger number of participants, the statistical strength of the differences between the two study groups may have been more pronounced. this is one limitation of the present study. in addition to a small participant group, several other limitations exist. participants came from only a small sample of states, all in the western half of the united states; a less homogeneous sample may have produced more robust results. some vr headsets arrived late due to delays in distributing them, giving those students less opportunity to review the content than they otherwise may have had. finally, the researchers were not able to easily troubleshoot problems with accessing the vr experience for distance students. while every effort was made to help all participants figure out how to use the technology, several students opted to discontinue participation when the technology gave them trouble. this also led to a smaller study sample than initially anticipated.

conclusion

the findings of this study may have several important implications for library professionals who are considering using vr technology for library orientations or instruction. this study found vr to have a positive effect on students' interest, to slightly increase scores, and to reduce anxiety among them. while there is no indication from this study whether vr would produce positive effects over a sustained period of time (e.g., every class session over the course of a semester), in limited usage it appears to at least draw students' attention more than traditional online teaching options like static text and links. the same vr experience developed to introduce students to basic concepts within librarianship and the library could be used for undergraduate and graduate students in all majors during library orientation sessions. this may make the library a more memorable component of students' early university experiences, as opposed to lecture information that students are likely to forget. library professionals may consider these factors when deciding whether to opt for more traditional methods of instruction and orientation or to experiment with a more innovative method of teaching like virtual reality.

endnotes

1 jennifer j. vogel et al., "using virtual reality with and without gaming attributes for academic achievement," journal of research on technology in education 39, no. 1 (2006): 105-18, https://doi.org/10.1080/15391523.2006.10782475.
2 yigal rosen, "the effects of an animation-based on-line learning environment on transfer of knowledge and on motivation for science and technology learning," journal of educational computing research 40, no. 4 (2009): 451-67, https://doi.org/10.2190/ec.40.4.d; elisha chambers, efficacy of educational technology in elementary and secondary classrooms: a meta-analysis of the research literature from 1992-2002 (carbondale, il: southern illinois university at carbondale, 2002).

3 elisha chambers, "efficacy of educational technology in elementary and secondary classrooms: a meta-analysis of the research literature from 1992-2002," phd diss., southern illinois university at carbondale, 2002.

4 jason m. harley et al., "comparing virtual and location-based augmented reality mobile learning: emotions and learning outcomes," educational technology research and development 64, no. 3 (2016): 359-88, https://doi.org/10.1007/s11423-015-9420-7; jocelyn parong and richard e. mayer, "learning science in immersive virtual reality," journal of educational psychology 110, no. 6 (2018): 785-95, https://doi.org/10.1037/edu0000241; paul legris, john ingham, and pierre collerette, "why do people use information technology? a critical review of the technology acceptance model," information and management 40, no. 3 (2003): 191-204, https://doi.org/10.1016/s0378-7206(01)00143-4.

5 zaid khot et al., "the relative effectiveness of computer-based and traditional resources for education in anatomy," anatomical sciences education 6, no. 4 (2013): 211-15, https://doi.org/10.1002/ase.1355; michael j. robertson and james g. jones, "exploring academic library users' preferences of delivery methods for library instruction," reference & user services quarterly 48, no. 3 (2011): 259-69.

6 joshua kim, "instructional designers by the numbers," inside higher ed (2015), https://www.insidehighered.com/blogs/technology-and-learning/instructional-designers-numbers.

7 elena olmos-raya et al., "mobile virtual reality as an educational platform: a pilot study on the impact of immersion and positive emotion induction in the learning process," eurasia journal of mathematics science and technology education 14, no. 6 (2018): 2045-57, https://doi.org/10.29333/ejmste/85874.

8 brady d. lund and shari scribner, "developing virtual reality experiences for archival collections: case study of the may massee collection at emporia state university," the american archivist, https://doi.org/10.17723/aarc-82-02-07.

9 lund and scribner, "developing virtual reality experiences for archival collections."

10 kenneth j. varnum, "preface," in kenneth j. varnum, ed., beyond reality: augmented, virtual, and mixed reality in the library (chicago: ala editions, 2019): x.

11 brady d. lund and daniel a. agbaji, "augmented reality for browsing physical collections in academic libraries," public services quarterly 14, no. 3 (2018): 275-82, https://doi.org/10.1080/15228959.2018.1487812.

12 kenneth j. varnum, ed., beyond reality: augmented, virtual, and mixed reality in the library (chicago: ala editions, 2019).

13 nicola whitton, "game engagement theory and adult learning," simulation and gaming 42, no. 5 (2011): 596-609, https://doi.org/10.1177/1046878110378587.
14 chris dede, "immersive interfaces for engagement and learning," science 323, no. 5910 (2009): 66-69, https://doi.org/10.1126/science.1167311.

15 pat dugard and john todman, "analysis of pre-test-post-test control group designs in educational research," educational psychology 15, no. 2 (1995): 181-98, https://doi.org/10.1080/0144341950150207.

16 nitza geri, amir winer, and beni zaks, "challenging the six-minute myth of online video lectures: can interactivity expand the attention span of learners?," online journal of applied knowledge management 5, no. 1 (2017): 101-11.

17 david bawden and lyn robinson, "the dark side of information: overload, anxiety and other paradoxes and pathologies," journal of information science 35, no. 2 (2009): 180-91, https://doi.org/10.1177/0165551508095781.

18 moon-heum cho, "online student orientation in higher education: a developmental study," educational technology research and development 60, no. 6 (2012): 1051-69, https://doi.org/10.1007/s11423-012-9271-4; karmen crowther and alan wallace, "delivering video-streamed library orientation on the web: technology for the educational setting," college and research libraries news 62, no. 3 (2001): 280-85.

19 yin-leng theng et al., "mixed reality systems for learning: a pilot study understanding user perceptions and acceptance," international conference on virtual reality (2007): 728-37, https://doi.org/10.1007/978-3-540-73335-5_79.

migration of a research library's ict-based services to a cloud platform
francis jayakanth, ananda t. byrappa, and filbert minj
information technology and libraries | march 2022 https://doi.org/10.6017/ital.v41i1.13537

francis jayakanth (francis@iisc.ac.in) is scientific officer, jrd tata memorial library, indian institute of science. ananda t. byrappa (anandtb@iisc.ac.in) is librarian, jrd tata memorial library, indian institute of science. filbert minj (filbert@iisc.ac.in) is principal research scientist, supercomputer education and research centre, indian institute of science. © 2022.

abstract

libraries have been at the forefront in adopting emerging technologies to manage their operations and provide information services to the user communities they serve. with the emergence of cloud computing (cc) technology, libraries are exploring and adopting cc service models to make their own services more efficient, reliable, secure, scalable, and cost-effective.
in this article, the authors share their experience migrating some of the library's locally hosted ict-based services onto the microsoft azure cloud platform. the migration of services to a cloud platform has helped the library significantly reduce the downtime of its services due to power, network, or system outages.

introduction

established in 1909, the indian institute of science is a leading advanced education and research institution in the sciences and engineering. since its inception, the institute has balanced an emphasis on pursuing basic knowledge with applying its research findings for industrial and societal benefit. the institute, which started with just two departments (general and applied chemistry and electrical technology), now has over 40 departments spread across six divisions: biological sciences, chemical sciences, electrical sciences, interdisciplinary research, mechanical sciences, and physical and mathematical sciences. the institute's jrd tata memorial library (https://library.iisc.ac.in) celebrated its centenary in 2011. established in 1911, the library was one of the earliest central facilities created by the institute to support teaching and research. the library offers both conventional and contemporary services to its users. the library's traditional services include reference, referral, cataloguing and classification, circulation, interlibrary loan, document delivery, weekly display of recent periodicals and books, and photocopying. some of the library's current information and communications technology (ict)-based services include digital repository services for the institution's research publications and theses and dissertations, a faculty profiling system, a web-based online public access catalogue (web opac), and shibboleth-based federated access to the library's subscribed online resources. the library also facilitates information literacy services such as library orientations, workshops, seminars, demonstrations, invited talks, training sessions on subscribed resources, trial access to new products and services, and author workshops on the research publishing process. until 2018, the library used its on-premises it infrastructure to provide these ict-based services. the library had dedicated computer servers for its email, institutional repository, library website, integrated library management system (lms), and online journal publishing system. the institution's faculty profiles system is part of the indian research information system (https://irins.org/irins/), a web-based research information management service provided by the information and library network (inflibnet) centre. the library's in-house servers were ageing, and some were beginning to fail. managing the in-house servers with limited human resources was also increasingly challenging. as a result, the library contemplated moving some of its services to a cloud platform.
even smaller libraries had begun migrating their services to cloud platforms almost a decade ago.1 around 2016, the institution established the digits (digital campus and information technology services) office to conceive, plan, and create a best-in-class information technology and networking system and to support operational excellence through agile it and networking services. to date, the digits office has, among other projects, successfully migrated more than 70 departmental email servers to a centrally managed, cloud-based microsoft office 365 suite and developed and migrated the institute's main website (https://www.iisc.ac.in) and more than 150 websites and 10 web portals of institution departments, centres, and other facilities to the microsoft azure platform. the digits office also creates and maintains virtual machines (vms) on the microsoft azure cloud platform for the institution's departments and offices. migration of locally hosted it infrastructure to a cloud platform offers several benefits to an organization. these benefits include setting up virtual offices accessible from anywhere and at any time, avoiding capital investment in computing infrastructure, taking advantage of the cloud platform's elastic computing resources, avoiding the necessity of a dedicated it team, and, most importantly, minimizing downtime and loss of productivity and data. moreover, a cloud platform offers easy scalability, redundancy, and security; achieving these features with traditional in-house hosting of computing infrastructure would be cost prohibitive.2 the library has configured three vms on the azure platform and has moved some of its ict-based services to the cloud platform. migrating these services has helped the library significantly reduce the downtime of its computer servers.

cloud computing and its service models

cloud computing (cc) refers to computer hardware and software provided as a service by another company. the only requirement for accessing a cloud computing service is a device with access to the internet. some leading cc service providers include amazon web services (https://aws.amazon.com/what-is-aws/), microsoft azure (https://azure.microsoft.com/en-in/), and google cloud (https://cloud.google.com). there are three service models in cloud computing: software as a service (saas), platform as a service (paas), and infrastructure as a service (iaas). in the saas model, service providers host software applications on their cloud platforms. examples of the saas model include google apps (https://workspace.google.com/) and microsoft office 365 (https://www.microsoft.com/en-in/microsoft-365). clients opting for the saas model need not worry about installing, setting up, running, and maintaining the applications; the service providers do that for them. paas provides a computing platform comprising an operating system, database, programming environment, and application programming interface.
examples of the paas model include amazon elastic beanstalk (https://aws.amazon.com/elasticbeanstalk/), windows azure (https://azure.microsoft.com/en-in/), and google compute engine (https://cloud.google.com/compute). in the iaas service model, clients can obtain computing infrastructure, virtual machines, networking, and storage components on demand, delivered over the internet. examples of the iaas model include google compute engine, amazon ec2, and microsoft azure. in coordination with the digits office, the library initially provisioned three vms on the microsoft azure cloud platform to migrate some of its it-based services. table 1 shows the initial hardware configuration of each of the three vms.

table 1. vm types and their system configurations along with the cost
vm1 (ir services): standard f4s_v2; 4 vcpus, 8 gb ram; 140 usd/month; os disk 400 gb (ssd); secondary storage 600 gb (ssd); storage cost 114 usd/month; os centos 7.x
vm2 (ilms): standard d4s_v3; 4 vcpus, 16 gb ram; 148 usd/month; os disk 300 gb (ssd); secondary storage 200 gb (ssd); storage cost 57 usd/month; os centos 7.x
vm3 (website): standard f4s; 4 vcpus, 8 gb ram; 140 usd/month; os disk 300 gb (ssd); secondary storage 200 gb (ssd); storage cost 57 usd/month; os ubuntu 18.x
(vm types and costs are as of 2018 and subject to change over time.)

a virtual machine (vm) is an on-demand, scalable computing resource available on cc platforms. vms give better control over the computing environment without the purchase of any underlying physical hardware. the microsoft azure platform offers various vm options, each optimized for different workloads. for example, the d-series azure vms provide a combination of vcpus (virtual cpus), memory, and temporary storage that meets the requirements of most production workloads. categories in the d-series of vms include the ds-series, dds-series, and das-series. the f-series vms feature a higher cpu-to-memory ratio, are equipped with 2 gb of ram and 16 gb of local solid-state drive (ssd) per cpu core, and are optimized for compute-intensive workloads. f-series vms are costlier than the corresponding d-series vms (https://docs.microsoft.com/en-us/azure/virtual-machines/sizes). for secondary storage, apart from standard hard disk drives (hdds), vms support azure premium ssds and ultra disk storage, depending on regional availability. the premium ssds are designed to support intensive input/output workloads and are priced almost three times higher than the standard hdds. the standard os disk capacity of an azure vm is 30 gb, and it can be increased to the desired capacity. apart from the os disk, one can also attach a required amount of secondary disk storage. the cost of the additional disk storage (both os and data) is independent of vm pricing and depends on storage type and capacity. a vcpu refers to a virtual central processing unit; a vm treats each vcpu as a single physical core.
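provisioning a vm like those in table 1 can be scripted. the following is a minimal sketch (not the library's provisioning code) using the azure python sdk (azure-identity and azure-mgmt-compute); the subscription id, resource group, nic, region, image sku, and credentials are hypothetical placeholders, and a virtual network and network interface are assumed to already exist.

```python
# a minimal sketch of creating a vm comparable to vm1 in table 1 with the
# azure python sdk; all names and ids below are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "library-rg"           # placeholder resource group
NIC_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
    "/providers/Microsoft.Network/networkInterfaces/vm1-nic"
)

compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# vm1 (ir services): standard f4s_v2 running centos 7, as in table 1
poller = compute.virtual_machines.begin_create_or_update(
    RESOURCE_GROUP,
    "vm1-ir-services",
    {
        "location": "centralindia",  # placeholder region
        "hardware_profile": {"vm_size": "Standard_F4s_v2"},
        "storage_profile": {
            "image_reference": {  # a centos 7 marketplace image; sku may vary
                "publisher": "OpenLogic",
                "offer": "CentOS",
                "sku": "7.5",
                "version": "latest",
            },
            "os_disk": {"create_option": "FromImage", "disk_size_gb": 400},
        },
        "os_profile": {
            "computer_name": "vm1",
            "admin_username": "libadmin",
            "admin_password": "<strong-password>",  # placeholder
        },
        "network_profile": {"network_interfaces": [{"id": NIC_ID}]},
    },
)
vm = poller.result()  # block until provisioning completes
print(f"provisioned vm: {vm.name}")
```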
migration of the library's ict-based services to the microsoft azure cloud platform

libraries have always been at the forefront of adopting emerging technologies, and this is true of cc technology as well. in an interview at the american library association annual meeting in anaheim, california, in june 2012, clifford lynch traced 30 years of interactions between libraries and new technologies.3 with the evolution of cc technologies, libraries have been using cc's saas and iaas service models since 2009 to host their websites, library management systems (lms), and digital repositories; libraries have used cc mainly for saas and iaas services.4 as a first step, during mid-2017, the digits office began migrating all of the 70+ individual departmental email servers, including the library's, to a centrally managed, cloud-based mailing solution using office 365 (now microsoft 365) exchange online. after the successful migration of all the email servers, the library shut down its own email server. next, the library decided to migrate some of its locally hosted ict-based services to the cloud platform in a phased manner.

planning the migration process: a single vm or independent vms for each application

before undertaking the migration process, libraries need to consider what types of projects are good candidates for the cloud and what types are not.5 in the first phase of the cloud migration, the library decided to migrate the following services: (1) institutional repository services, (2) the library management system, and (3) the library website. before the cloud migration, the library used three independent on-premises servers to host these services. a sun fire computer server with an intel xeon processor, 4 gb of ram, and 2 tb of secondary storage hosted the institutional repository service for research publications (using eprints software) and the electronic theses and dissertations service (using dspace software). the libsys library management system was hosted on an ibm server with an intel xeon cpu e5-2620 v2 @ 2.10 ghz, 16 gb of ram, and 1 tb of secondary storage. the library website was hosted on an ibm thinkserver ts150 with an intel xeon cpu e3-1225 v5 @ 3.30 ghz, 8 gb of ram, and 1 tb of secondary storage. all three computer servers had been in use for nearly 10 years and were long overdue for replacement. as the library contemplated upgrading its ict infrastructure, the possibility of provisioning vms on the azure cloud platform through the digits office was a stimulus. next, the library had to decide whether to go with a single vm with a robust hardware configuration to host all three applications or to provision independent vms for each service. based on the experience gained from hosting two ir services on a single server, the library decided to go again with a single vm with a robust hardware configuration to host the ir services. the lms is a critical application for managing all library functions; therefore, the library decided to host it on a separate vm. a third vm is used to host the library website. as the library eventually plans to move its other ict-based applications to the azure platform, it could migrate and distribute those applications across the existing three vms based on the utilization of and load on each vm. initially, the library opted for two vms in the f-series and one in the d-series, with premium ssds for all three vms.
after observing performance and price for about three months, the library moved one of the two f-series vms (vm3) to the d-series and downgraded the secondary storage disks of all three vms, as well as the os disk of vm3, to standard disk drives. the data disk on vm3, hosting the library website, was dropped, as the os disk capacity was more than adequate to run the service. table 2 shows the revised vm types and their configurations.

in april 2020, most of our students and faculty members started working off campus because of the onset of the covid-19 pandemic. to facilitate seamless access to licensed online resources from off campus, the library set up federated access through shibboleth sso.6 the library provisioned a new virtual machine (vm4) with the system configuration indicated in table 2.

table 2. revised vm types and their system configurations along with the cost

virtual machine (vm) | vm type¹ | vcpus & ram (gb) | cost / month (usd)² | os disk (gb) & type | secondary storage (gb) & type | storage cost / month (usd)³ | os
vm1 (ir services) | standard f4s_v2 | 4, 8 | 147 | 400 (ssd) | 600 (hdd) | 39 | centos 7.x
vm2 (ilms) | standard d4s_v3 | 4, 16 | 148 | 300 (ssd) | 200 (hdd) | 21 | centos 7.x
vm3 (website) | standard d2s_v3 | 4, 8 | 81 | 300 (hdd) | none | nil | ubuntu 18.x
vm4 (idp server) | standard f2s_v2 | 2, 4 | 71 | 300 (hdd) | none | nil | ubuntu 18.x

¹ as of 2019 and subject to change with time. ² cost as prevalent in 2019. ³ cost as prevalent in 2019.

the cloud migration process

cloud migration is the process of moving applications and data from an organization's on-premises computers to a cloud platform. before undertaking the migration, the requisite software applications must be installed and configured on the vms. then, the data corresponding to each application must be backed up on the on-premises system and moved to the corresponding vms. coordination with the campus network support team is essential to ensure the vms are accessible on the internet with all the security measures in place. every application migrating to the cloud platform has to go through this cloud migration process. in the following sections, the authors briefly describe the process used to migrate three of the library's ict-based services to the azure cloud. the library completed the entire migration process in about three months.
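as an illustration of the generic back-up-and-move step just described, the following is a minimal python sketch for a mariadb/mysql-backed application; the library itself used tools such as winscp for the transfer (see the eprints steps below). the hostnames, database name, paths, user names, and key file are illustrative placeholders.

```python
# hypothetical sketch: dump an application database on the on-premises server,
# then copy the dump to the azure vm over sftp with paramiko.
import subprocess
import paramiko

# back up the application database (mariadb/mysql) on the on-premises system
subprocess.run(
    ["mysqldump", "--single-transaction", "-u", "backup", "-p<password>",
     "eprints_db", "--result-file=/backups/eprints_db.sql"],
    check=True,
)

# move the dump (and, similarly, the full-text files) to the vm
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("vm1.example.ac.in", username="libadmin",
            key_filename="/home/libadmin/.ssh/id_rsa")
sftp = ssh.open_sftp()
sftp.put("/backups/eprints_db.sql", "/data/restore/eprints_db.sql")
sftp.close()
ssh.close()
```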
migration process for the research publications repository

eprints@iisc, the institute's institutional repository (ir) for research publications, was established in 2002 and holds nearly 55,000 publications; it is one of the earliest repositories in this part of the world.7 the ir runs on eprints (https://www.eprints.org/uk/), a leading open-source digital repository platform. developed at the university of southampton over more than 20 years, eprints has provided stable, innovative repository services across academia and beyond; it is stable, flexible, reliable software, well suited to maintaining institutional repositories. before the migration, the library had hosted the publications repository on an on-premises server for almost 17 years.

eprints depends on other software, including the apache web server with mod_perl (https://httpd.apache.org/), the mysql/mariadb (https://mariadb.org/) relational database management system, the perl programming language, and several perl modules. eprints bundles many of the required perl modules, but some must be installed separately, depending on the underlying operating system (os). for the eprints@iisc repository, installed on a vm running centos, the library installed the apache web server with mod_perl, the mariadb relational database management system, and a few missing perl modules. after installing all the dependent software, the library followed the steps listed below to migrate the publications repository to the vm on the azure cloud platform:

1. installed the latest version of the eprints software (3.4.1) on the vm and incorporated all the local customizations made at the configuration and code levels.
2. created a new repository, retaining the existing repository name.
3. created a new mariadb database and assigned the appropriate grant permissions to it.
4. because the database structure had changed in the latest version of eprints, executed the scripts built into the eprints software to update the eprints database structure.
5. imported customized, institute-specific subject headings to override the default ones.
6. moved the database and full-text backup files to the vm using winscp, a free, open-source ftp client (https://winscp.net/eng/docs/introduction).
7. restored the backups, comprising the eprints mysql database and full-text files, to the corresponding locations on the vm's file system and uncompressed the database and full-text files.
8. imported the mysql database into the new mariadb database on the vm.
9. regenerated, on the vm, all the static pages of the repository, the abstracts of all the records, and the browse views for year, author, document type, and subject categories.
10. enabled hypertext transfer protocol secure (https) for the login and account-creation links.
11. installed and configured postfix (http://www.postfix.org/), a free and open-source mail transfer agent that routes and delivers electronic mail.
12. coordinated with the institute's network support team to make the necessary changes in the dns entries to reflect the vm's new public ips and to enable the vm to send and receive emails.
13. created crontab entries on the vm to run the cron jobs. a cron job is a time-based job scheduler in unix-like operating systems. the cron jobs include updating the latest records added to the repository, displaying the latest count of records in the repository, and updating the browse views of the repository.

migration process for the electronic theses and dissertations repository

established in 2005, etd@iisc, the institute's electronic theses and dissertations repository, is one of the earliest etd repositories in this part of the world.8 the library uses dspace software (https://duraspace.org/dspace/) to maintain the etd repository, which to date holds nearly 6,000 of the institute's etds. before the migration, the library had hosted the etd repository on an on-premises server for almost 13 years.
dspace depends on several third-party software applications and tools, including the java jdk, apache maven (https://maven.apache.org/), apache ant (https://ant.apache.org/), the postgresql (https://www.postgresql.org/) or oracle (https://www.oracle.com/in/database) relational database management system, and the apache tomcat servlet engine (http://tomcat.apache.org/). the library followed the steps listed below to migrate the etd repository to the vm on the azure cloud platform:

1. installed all the dspace-dependent software packages and the latest version of the dspace software (version 6.3).
2. configured the dspace software to incorporate the native customizations.
3. created communities and collections to reflect the divisions and the corresponding departments and centres of the institute, using a java-based script.
4. set the access permissions for each collection of the repository based on the users and groups belonging to the collection.
5. modified the metadata of the etds from the on-premises version using a script and imported the modified metadata into the latest version of dspace.
6. copied the etd items, comprising pdf files, from the on-premises server to the vm.
7. enabled hypertext transfer protocol secure (https) for the etd site.
8. customized the default etd site for a better look and feel.
9. created crontab entries on the vm to run cron jobs that take incremental backups and display the etd count on the landing page.

the user registration system of the new version of dspace was modified so that only people with an institute email id can register with etd@iisc. in addition, the registration process captures the registrant's department and division, which helps automate the process of assigning the registrant to a specific collection of the repository; a user can therefore submit an etd only to the designated collection.

migration process for the libsys library management system

the library has been using libsys (https://www.libsys.co.in/), a commercial lms, for over 25 years. libsys depends on several other software applications, including the wildfly application server (https://www.wildfly.org/), the java jdk, and the mysql (mariadb) relational database management system. the steps involved in migrating libsys to the cloud (vm2) are listed below:

1. installed all the libsys-dependent software components and the latest version of the libsys software on the vm.
2. restored the mariadb database backup.
3. made the required changes in the libsys configuration files.
4. installed and configured the postfix mail transfer agent for email communication.
5. as the libsys service and the opac run on nonstandard ports, coordinated with the network support team to open the required communication ports on the vm (a sketch of this step follows the list).
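opening a nonstandard port of this kind is typically done on the vm's network security group (nsg). the following is a hedged sketch using the azure python sdk (azure-mgmt-network), not the campus team's actual procedure; the resource group, nsg name, rule name, priority, and port number are illustrative placeholders.

```python
# hypothetical sketch: allow inbound tcp traffic to a nonstandard opac port.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

network = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

network.security_rules.begin_create_or_update(
    "library-rg",   # hypothetical resource group
    "vm2-nsg",      # hypothetical network security group on vm2
    "allow-opac",   # hypothetical rule name
    {
        "protocol": "Tcp",
        "direction": "Inbound",
        "access": "Allow",
        "priority": 310,
        "source_address_prefix": "*",
        "source_port_range": "*",
        "destination_address_prefix": "*",
        "destination_port_range": "8380",  # hypothetical nonstandard opac port
    },
).result()
```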
migration process for the library website

the library uses drupal (https://www.drupal.org/), a content management system, to host its website. the steps involved in the migration process are listed below:

1. installed all the drupal-dependent software, including the apache web server, mariadb, and php (https://www.php.net/), on the vm.
2. installed one of the latest versions of drupal using its web-based installer.
3. installed the required drupal plugins and the drupal theme.
4. restored the drupal database backup on the vm.
5. installed and configured the postfix mail transfer agent for email communication.

after completing the migration processes, the library coordinated with the network support team to make changes in the domain name system (dns) to enable access to all three vms on the internet.

monitoring the azure virtual machines

azure monitor (am) for vms includes a set of performance charts that target several key performance indicators to determine how well a virtual machine is performing. the charts show resource utilization over a period of time, making it possible to identify bottlenecks and anomalies, or to switch to a perspective listing each vm and view resource utilization based on the selected metric. while there are numerous elements to consider when dealing with performance, am monitors key operating system performance indicators related to processor, memory, network adapter, and disk utilization. performance monitoring complements the health monitoring feature and helps expose issues that indicate a possible system component failure; it also supports tuning and optimization to achieve efficiency, as well as capacity planning (https://docs.microsoft.com/en-us/azure/azure-monitor/insights/vminsights-performance). am is accessible only to the cloud administrator.

based on the am charts, the library's inference has been that the ir server (vm1), which hosts the publications and etd repositories, needs capacity planning: cpu utilization quite frequently reaches maximum capacity. the library therefore plans to move the etd repository to an independent vm. the utilization of the ilms server (vm2), on the other hand, is less than optimal.
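the cpu numbers behind such charts can also be pulled programmatically, which is convenient for the kind of capacity planning described above. the following is a hedged sketch using the azure python sdk (azure-mgmt-monitor); the subscription id, resource group, vm name, and time span are illustrative placeholders.

```python
# hypothetical sketch: list hourly cpu utilization for vm1 and flag hours
# that approach saturation.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

monitor = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")
vm_id = ("/subscriptions/<subscription-id>/resourceGroups/library-rg"
         "/providers/Microsoft.Compute/virtualMachines/vm1-ir-services")

response = monitor.metrics.list(
    vm_id,
    timespan="2021-06-01T00:00:00Z/2021-06-08T00:00:00Z",
    interval="PT1H",                    # one data point per hour
    metricnames="Percentage CPU",
    aggregation="Average,Maximum",
)
for metric in response.value:
    for series in metric.timeseries:
        for point in series.data:
            if point.maximum is not None and point.maximum > 90:
                print(point.time_stamp, point.average, point.maximum)
```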
given vm2's spare capacity, the library decided to migrate publication of the institution's journal of the indian institute of science (jiisc) from on-premises hosting to the ilms server (vm2) on the azure cloud. for hosting jiisc on the azure cloud platform, the library uses open journal systems (ojs) (https://pkp.sfu.ca/ojs/), open-source software for the management of peer-reviewed academic journals. ojs depends on other software and tools, including the apache web server, mysql or postgresql, and php. the library used virtual hosting, a method of hosting multiple domain names on a single server, to host multiple sites on a single vm (vm2).

benefits observed of migrating to the cloud

moving some of the library's ict-based services to the azure cloud platform has resulted in the following benefits:

1. service reliability has improved significantly, as the reliance on ageing in-house servers has been done away with.
2. cloud migration has made the library's computing infrastructure more flexible: it can be scaled up or down as the library requires.
3. operational logs and usage metrics are easy to obtain.
4. alert rules can be set based on vm metrics.
5. the cloud hosting company's managed services include periodic backups, ensuring that data is secure.
6. users can now move quickly between the library and home (or any other location) and access all their research.
7. another significant benefit of cloud computing for library users is the ease of sharing information. libraries provide collaborative spaces within the building, but patrons can also use collaborative online spaces; onedrive, for example, provides access to online storage and allows sharing of files and folders among approved users.9

lessons learned during and after the migration process

initially, the library opted for two vms in azure's f-series and one in the d-series, with ssd storage devices for all three vms. as stated above, the pricing of an azure vm depends on the hardware configuration of the specific vm series. for example, an f-series vm with a particular hardware configuration and ssd storage costs more than its d- or b-series counterparts with the same hardware configuration and a standard hdd. libraries should therefore have a clear understanding of the vm types and the corresponding costs. after observing the vms' performance and cost for a few months, the library moved one of the two f-series vms to the d-series and switched to standard hard disk drives for all three vms. the changeover did not result in any performance degradation, but the cost of the secondary storage came down to about one-third.

the library maintains two institutional repositories, one for research publications, using the eprints application, and the other for theses and dissertations, using the dspace application, and it decided to migrate both repositories to a single vm. this decision turned out not to be a prudent one: the vm usage metrics reveal that the dspace application often utilizes nearly 100% of cpu capacity, which leads to the freezing of the tomcat server running dspace, often resulting in downtime for the etd service. the library is checking whether the issue lies in the tomcat server configuration and contemplates either upgrading the hardware configuration or setting up the two repository applications on two separate vms. the graph shown in figure 1 is a screenshot of azure's metric monitoring of vm1, which runs the eprints and dspace applications. the peaks in the graph represent the cpu usage of the dspace application; it is evident from the graph that the dspace application's cpu usage quite frequently reaches 100%, which eventually leads to the freezing of the tomcat web server.

figure 1. screenshot representing the cpu usage of the eprints and dspace applications.

the library's professional staff administered and managed the on-premises ict-based services; therefore, the library did not encounter any technical challenges during or after the cloud migration process. other libraries that intend to migrate their services to a cloud platform should ensure that the staff entrusted with the migration are comfortable working at the command prompt, especially in the linux operating system environment.
conclusions

it has been more than two years since some of the library's ict-based services were migrated to the microsoft azure platform. to date, the library has not experienced a single instance of its servers being down because of a power outage, a network issue, or a crash of the vms themselves. there have, however, been issues with specific services, such as the apache tomcat servlet engine or the apache web server crashing and leaving the corresponding application unresponsive. such behaviour can occur when system resources, especially cpu and ram, are used to their capacity; restarting the specific service brings the corresponding application back up. it is therefore essential to keep track of the ram and cpu usage of the vms and to upgrade them if the situation warrants it. the rapid elasticity characteristic of cloud computing enables organizations to configure optimal computing resources based on actual requirements.

based on the library's initial experience running ict-based services on a cloud platform, the authors suggest that deploying two different institutional repository software platforms, such as eprints and dspace, on a single vm may not be a good idea. the tomcat instance powering the dspace site runs with high cpu usage, often swinging up to 100% and at times going beyond it; this high cpu usage can eventually cause the instance to freeze, making the corresponding service inaccessible.

working with vms demands some degree of familiarity with the command prompt. library staff who are not comfortable working at the command prompt will require additional training to get used to the vm environment; in our case, training was not necessary, as the authors had adequate experience working in the linux operating system. library staff managing the cloud infrastructure also need to coordinate with the organization's networking and email support staff to make the vms accessible on the internet, enable email communications, and enforce security measures.

acknowledgements

the authors would like to thank the editor and the referees for their insightful comments and suggestions.

endnotes

1 robin r. hartman, “life in the cloud: a worldshare management services case study,” journal of web librarianship 6, no. 3 (2012): 176–85, https://doi.org/10.1080/19322909.2012.702612.

2 erik t. mitchell, “cloud computing and your library,” journal of web librarianship 4, no. 1 (2010): 83–86, https://doi.org/10.1080/19322900903565259.

3 clifford lynch, elke greifeneder, and michael seadle, “interactions between libraries and technology over the past 30 years,” library hi tech 30, no. 4 (2012): 565–78, https://doi.org/10.1108/07378831211285059.

4 yan han, “iaas cloud computing services for libraries: cloud storage and virtual machines,” oclc systems and services 29, no. 2 (2013): 87–100, https://doi.org/10.1108/10650751311319296.

5 denis galvin and mang sun, “avoiding the death zone: choosing and running a library project in the cloud,” library hi tech 30, no. 3 (2012): 418–27, https://doi.org/10.1108/07378831211266564.

6 francis jayakanth, ananda t. byrappa, and raja vishvanathan, “off-campus access to licensed online resources through shibboleth,” information technology and libraries 40, no. 2 (2021), https://doi.org/10.6017/ital.v40i2.12589.
7 francis jayakanth et al., “eprints@iisc: india’s first and fastest-growing institutional repository,” oclc systems and services: international digital library perspectives 24, no. 1 (2008): 59–70, https://doi.org/10.1108/10650750810847260.

8 jobish pitchet, filbert minj, and tarikere basappa rajashekar, “etd@iisc: a dspace-based etdms and oai compliant theses repository service of indian institute of science,” in etd 2005: evolution through discovery, 8th international symposium on electronic theses and dissertations, 28–30 september 2005 (sydney, australia: the university of new south wales).

9 tom ipri, “where the cloud meets the commons,” journal of web librarianship 5, no. 2 (2011): 132–41, https://doi.org/10.1080/19322909.2011.573295.

article

bridging the gap: using linked data to improve discoverability and diversity in digital collections

jason boczar, bonita pollock, xiying mi, and amanda yeslibas

information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.13063

jason boczar (jboczar@usf.edu) is digital scholarship and publishing librarian, university of south florida. bonita pollock (pollockb1@usf.edu) is associate director of collections and discovery, university of south florida. xiying mi (xmi@usf.edu) is digital initiative metadata librarian, university of south florida. amanda yeslibas (ayesilbas@usf.edu) is e-resource librarian, university of south florida. © 2021.

abstract

the year of covid-19, 2020, brought unique experiences to everyone, in daily as well as professional life. facing many challenges of division in all aspects (social distancing, political and social divisions, remote work environments), university of south florida libraries took the lead in exploring how to overcome these various separations by providing access to its high-quality information sources to its local community and beyond. this paper shares insights on using linked data technology to provide easy access to digital cultural heritage collections, not only for scholarly communities but also for underrepresented user groups. the authors present the challenges of this special moment in history, discuss possible solutions, and propose future work to further the effort.

introduction

we are living in a time of division. many of us are adjusting to a new reality of working separated from our colleagues and the institutions that formerly brought us together physically and socially due to covid-19.
even if we can work in the same physical locale, we are careful and distant with each other. our expressions are covered by masks, and we take pains with hygiene that might formerly have felt offensive. but the largest divisions and challenges being faced in the united states go beyond our physical separation. the nation has been rocked and confronted by racial inequality highlighted by black lives matter, a divisive presidential campaign, income inequality exacerbated by covid-19, the continued reckoning with the #metoo movement, and the wildfires burning the west coast. it feels like we are burning, both literally and metaphorically, as a country.

adding fuel to this fire is the consumption of unreliable information. ironically, even as our divisions become more extreme, we are increasingly more connected and tuned into news via the internet. sadly, fact-checking and sources are few and far between on the social media platforms where many are getting their information. the pew foundation report the future of truth and misinformation online warns that we are on the verge of a very serious threat to the democratic process due to the prevalence of false information. lee rainie, director of the pew research center’s internet and technology project, warns, “a key tactic of the new anti-truthers is not so much to get people to believe in false information. it’s to create enough doubt that people will give up trying to find the truth, and distrust the institutions trying to give them the truth.”1

libraries and other cultural institutions have moved very quickly to address and educate their populations and the community at large, trying to give a voice to the oppressed and provide reliable sources of information. the university of south florida (usf) libraries reacted by expanding antiracism holdings. usf’s purchases were informed by work at other institutions, such as the university of minnesota’s antiracism reading lists, which has in turn grown into a rich resource that includes other valuable resources like the mapping prejudice project and a link to the umbra search.2 the triad black lives matter protest collection at the university of north carolina greensboro is another example of a cultural institution reacting swiftly to document, preserve, and educate.3

these new pages and lists being generated by libraries and cultural institutions seem to be curated by hand, using tools that require human intervention to create them and keep them up to date. this is also a challenge the usf libraries faced when constructing their new african american experience in florida portal, a resource that leverages already existing digital collections at usf to promote social justice. another key challenge is linking new digital collections and tools to already established collections and holdings. beyond the new content being created in reaction to current movements, there is already a wealth of information established in rich archives of material, especially regarding african american history. digital collections need to be discoverable by a wide audience to achieve resource-sharing and educational purposes. this is a challenge many digital collections struggle with, because they are often siloed from library and archival holdings even within their own institutions. all the good information in the world is not useful if it is not findable.
an example of a powerful discovery tool that is difficult to find and use is the umbra search (https://www.umbrasearch.org/) linked to the university of minnesota’s anti-racism reading list. umbra search is a tool that aggregates content from more than 1,000 libraries, archives, and museums.4 it is also supported by high-profile grants from the institute of museum and library services, the doris duke charitable foundation, and the council on library and information resources. however, the website is difficult to find in a web search. umbra search was named after the society of umbra, a collective of black poets from the 1960s, but the terms umbra and society of umbra do not return useful results for finding the portal, nor do broader searches of african american history; the portal is difficult to find through basic web searches. one of the few chances for a user to find the site is if they come upon the human-made link in the university of minnesota anti-racism reading list.

despite enthusiasm from libraries and other cultural institutions, new purchases and curated content are not going to reach the world as fully as hoped. until libraries adopt open data formats instead of locking away content in closed records like marc, library and digital content will remain siloed from the internet; the library catalog and digital platforms are even siloed from each other. we make records and enter metadata that is fit for library use but not shareable to the web. as karen coyle asked in her lita keynote address a decade ago, the question is how can libraries move from being “on the web” to being “of the web”?5 the suggested answer, and the answer the usf libraries are researching, is with linked data.

literature review

the literature on linked data for libraries and cultural heritage resources reflects an implementation that is “gradual and uneven.” as national libraries across the world and the library of congress develop standards and practices, academic libraries are still trying to understand their role in implementation and identify their expertise.6 in 2006 tim berners-lee, the creator of the semantic web concept, outlined four rules of linked data:

1. use uris as names for things.
2. use http uris so that people can look up those names.
3. when someone looks up a uri, provide useful information, using the standards (rdf, sparql).
4. include links to other uris so that they can discover more things.7

it was not too long after this that large national libraries began exploring linked data and experimenting with uses. in 2010 the british library presented its prototype of linked data. this move was made in accordance with the uk government’s commitment to transparency and accountability, along with users’ expectation that the library would keep up with cutting-edge trends.8 today the british library has released the british national bibliography as linked data, rather than the entire catalog, because it is authoritative and better maintained than the catalog.9 the national libraries of europe, spurred on by government edicts and europeana (https://www.europeana.eu/en), are leading the progress in the implementation of linked data.
national libraries are uniquely suited to the development and promotion of new technologies because of their place under the government, their proximity to policy making, their role bridging communication between interested parties, and their ability to turn projects into sustainable services.10 a 2018 survey of all european national libraries found that 15 had implemented linked data, two had taken steps toward implementation, and three intended to implement it. even national libraries that were unable to implement linked data were contributing to the linked open data cloud by providing their data as datasets to the world.11

part of the difficulty with earlier implementation of linked data by libraries and cultural heritage institutions was the lack of a “killer example” that libraries could emulate.12 the relatively recent success of european national libraries might provide those examples. many other factors have slowed the implementation of linked data. a survey of norwegian libraries in 2009 found a considerable gap in the semantic web literature between the research undertaken in the technological field and the research in the socio-technical field. implementing linked data requires reorganization of the staff, commitment of resources, education throughout the library, and buy-in from the leadership to make it strategically important.13 the survey of european national libraries cited exactly the same limiting factors in 2018.14

outside of european national libraries, the implementation of linked data has been much slower. many academic institutions have taken on projects that tend to languish in a prototype or proof-of-concept phase.15 the library-centric talis group of the united kingdom “embraced a vision of developing an infrastructure based on semantic web technologies” in 2006, but abandoned semantic web-related business activities in 2012.16 it has been suggested that it is premature to wholly commit to linked data, but that it should be used for spin-off projects within an organization for experimentation and skill development.17 linked data is also still proving technologically challenging for cultural heritage aggregators: if many human resources are needed to facilitate linked data, it will remain an obstacle for them. a study has shown that automated interpretation of ontologies is hampered by a lack of inter-ontology relations, so cross-domain applications will not be able to use these ontologies without human intervention.18

aiding in the development and awareness of linked data practices for libraries is the creation and implementation of bibframe by the library of congress. the library of congress’s announcement in july 2018 that bibframe would be the replacement for marc definitively shows that the future of library records is focused on linking out and integrating into the web.19 the new rda (resource description and access) cataloging standards made it clear that marc is no longer the best encoding language for making library resources available on the web.20 while rda has adapted the cataloging rules to a variety of new library environments, the marc encoding format remains difficult for computers to interpret and apply logic algorithms to.
in response, the library of congress commissioned the consulting agency zepheira to create a framework that would integrate with the web and be flexible enough to work with various open formats and technologies, as well as able to adapt to change. using the principles and technologies of the open web, the bibframe vocabulary is made of “resource description framework (rdf) properties, classes, and relationships between and among them.”21 eric miller, the ceo of zepheira, says bibframe “works as a bridge between the description component and open web discovery. it is agnostic with regards to which web discovery tool is employed,” and though we cannot predict every technology and application, bibframe can “rely on the ubiquity and understanding of uris and the simple descriptive power of rdf.”22 the implementation of linked data in the cultural heritage sphere has been erratic but seems to be moving forward. it is important to pursue, though, because bringing local data out of the “deep web” and making them open and universally accessible “means offering minority cultures a democratic opportunity for visibility.”23

linked data

linked data is one way to increase the access and discoverability of critical digital cultural heritage collections. also referred to as semantic web technologies, linked data follows the w3c resource description framework (rdf) standards.24 according to tim berners-lee, the semantic web will bring structure and well-defined meaning to web content, allowing computers to perform more automated processes.25 by providing structure and meaning to digital content, information can be more readily and easily shared between institutions. this provides an opportunity for digital cultural heritage collections of underrepresented populations to get more exposure on the web. following is a brief overview of linked data to illustrate how semantic web technologies function.

linked data is created by forming semantic triples. each rdf triple contains uniform resource identifiers, or uris. these identifiers allow computers (machines) to “understand” and interpret the metadata. each rdf triple consists of three parts: a subject, a predicate, and an object. the subject defines what the metadata rdf triple is about, while the object contains information about the subject, which is further defined by the relationship link in the predicate.

figure 1. example of a linked data rdf triple describing william shakespeare’s authorship of hamlet.

for example, in figure 1, “william shakespeare wrote hamlet” is a triple. the subject and predicate of the triple are written as uris containing the identifier information, and the object of the triple is a literal piece of information. the subject of the triple, william shakespeare, has an identifier which in this example links to the library of congress name authority file for william shakespeare. the predicate of the rdf triple describes the relationship between the subject and object; it also typically identifies the metadata schema being used. in this example, dublin core is the metadata schema, so “wrote” would be identified by the dublin core creator field. the object of this semantic triple, hamlet, is a literal. literals are text that is not linked, because it has no uri. subjects and predicates always have uris, which allow the computer to make links; the object may have a uri or be a literal.
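a short sketch may make this concrete. the following python fragment builds the figure 1 triple with the rdflib package, following the article's own modeling, in which the predicate "wrote" is expressed with the dublin core creator property and the object "hamlet" remains a literal; the library of congress name authority uri is the kind of identifier figure 1 describes.

```python
# minimal rdflib sketch of the figure 1 triple: the subject and predicate are
# uris, the object is a literal.
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")  # dublin core elements

g = Graph()
shakespeare = URIRef("http://id.loc.gov/authorities/names/n78095332")  # loc naf
g.add((shakespeare, DC.creator, Literal("Hamlet")))

print(g.serialize(format="turtle"))
```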
together these uris, along with the literal, tell the computer everything it needs to know about this piece of metadata, making it self-contained. rdf triples, with their uris, are stored in a triple-store graph database, which functions differently from a typical relational database. relational databases rely on table headers to define the metadata stored inside them, and moving data between relational databases can be complex because tables must be redefined every time data is moved. graph databases don't need tables, since all the defining information is already stored in each triple; this allows for bidirectional flow of information between pieces of metadata and makes transferring data simpler and more efficient.26 information in a triple-store database is then retrieved using sparql, a query language developed for linked data.

because linked data is stored as self-contained triples, machines have all the information needed to process the data and perform advanced reasoning and logic programming. this leads to better search functionality and lends itself well to artificial intelligence (ai) technologies, which many of today's modern websites use to enhance their displays and provide greater functionality for their users. the internet is an excellent avenue for libraries to un-silo their collections and make them globally accessible. once library collections are on the web, advanced keyword search functionalities and ai machine-learning algorithms can be developed to automate metadata-creation workflows and enhance the search and retrieval of library resources. the use of linked data metadata in these machine-learning functions adds a layer of semantic understanding to the data being processed and analyzed for patron discovery. ai technology can also be used to create advanced graphical displays that make connections for patrons between various resources on a research topic.

sharing digital cultural heritage data with other institutions often involves transferring data, which is considered one of the greatest difficulties in sharing digital collections. for example, if one institutional repository uses dublin core to store its metadata for a certain cultural heritage collection and another repository uses mods/mets, there must first be a data conversion before the two repositories can share information. dublin core and mods/mets are two completely different schemas with different fields and metadata standards; they are incompatible with each other and must be crosswalked into a common schema, which typically results in some data loss during the transformation process. this makes combining two collections from different institutions into one shared web portal difficult.

linked data allows institutions to share collections more easily. because linked data triples are self-contained, there is no need to crosswalk metadata stored in triples from one schema into another when transferring data: the uris contained in the rdf triples allow the computer to identify the metadata schema and process the metadata. rdf triples can be harvested from one linked data system and easily placed into another repository or web portal, and a variety of schemas can all be stored together in one graph database. storing metadata in this way increases the interoperability of digital cultural heritage collections.
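as noted above, information in a triple store is retrieved with sparql. the following hedged sketch continues the figure 1 example, using rdflib's in-memory store as a stand-in for a triple-store database; a production system would run the same query against a persistent store.

```python
# minimal sketch: store the figure 1 triple, then retrieve it with sparql.
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")
g = Graph()
g.add((URIRef("http://id.loc.gov/authorities/names/n78095332"),
       DC.creator, Literal("Hamlet")))

query = """
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?work WHERE {
        <http://id.loc.gov/authorities/names/n78095332> dc:creator ?work .
    }
"""
for row in g.query(query):
    print(row.work)  # -> Hamlet
```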
collections stored in triple-store databases have sparql endpoints that make harvesting the metadata in a collection more efficient. libraries can easily share metadata on important collections, increasing exposure and providing greater access for a wider audience. philip schreur, author of “bridging the worlds of marc and linked data,” sums this concept up nicely: “the shift to the web has become an inextricable part of our day-to-day lives. by moving our carefully curated metadata to the web, libraries can offer a much-needed source of stable and reliable data to the rapidly growing world of web discovery.”27

linked data also makes it easier to harvest metadata and import collections into larger cultural heritage repositories like the digital public library of america (dpla), which uses linked data to “empower people to learn, grow, and contribute to a diverse and better-functioning society by maximizing access to our shared history, culture, and knowledge.”28 europeana, the european cultural heritage database, uses semantic web technologies to support its mission, which is to “empower the cultural heritage sector in its digital transformation.”29 using linked data to transfer data into these national repositories is more efficient, and there is less loss of data because the triples do not have to be transformed into another schema. this increases access to many cultural heritage collections that might not otherwise be seen.

one of the big advantages of linked data is the ability to create connections, via the web, with other cultural heritage collections worldwide. incorporating triples harvested from other collections into local datasets enables libraries to display a vast amount of information about cultural heritage collections in their web portals; libraries can thus provide much richer displays and give users access to a greater variety of resources. linked data also allows web developers to use uris to implement advanced search technologies, creating a multifaceted search environment for patrons. current research indicates that semantic web technologies make the creation of advanced logic and reasoning functionalities possible. according to liyang yu in the book introduction to the semantic web and semantic web services, “the semantic web is an extension of the current web. it is constructed by linking current web pages to a structured data set that indicates the semantics of this linked page. a smart agent, which is able to understand this structured data set, will then be able to conduct intelligent actions and make educated decisions on a global scale.”30

many digital cultural heritage collections in libraries live in siloed resources and are therefore accessible only to a small population of users. linked data helps to break down traditional library silos in these collections: by using linked data, an institution can expand the interoperability of a collection and make it more easily accessible. many institutions are starting to incorporate linked data technologies into digital collections, increasing their ability to share collections and allowing a greater audience to access critical cultural heritage collections of underrepresented populations.
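harvesting from a sparql endpoint, as discussed above, can be sketched in a few lines. the example below queries wikidata's public endpoint with the sparqlwrapper package; the particular query (a handful of writers) is only illustrative of pulling uris and labels from someone else's linked data into a local workflow.

```python
# hypothetical sketch: harvest uris and labels from a remote sparql endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
endpoint.setQuery("""
    SELECT ?item ?itemLabel WHERE {
        ?item wdt:P31 wd:Q5 .        # instance of: human
        ?item wdt:P106 wd:Q36180 .   # occupation: writer
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT 5
""")
endpoint.setReturnFormat(JSON)

for binding in endpoint.query().convert()["results"]["bindings"]:
    print(binding["item"]["value"], "-", binding["itemLabel"]["value"])
```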
in the article “bridging the worlds of marc and linked data,” the authors state, “the shift to linked data within this closed world of library resources will bring tremendous advantages to discovery both within a single resource … as well as across all the resources in your collections, and even across all of our collective collections. but there are other advantages to moving to linked data. through the use of linked data, we can connect to other trusted sources on the web.… we can also take advantage of a truly international web environment and reuse metadata created by other national libraries.”31

university of south florida libraries practice

university of south florida libraries digital collections house rich collections, varying from cultural heritage objects to natural science and environmental history materials to collections related to underrepresented populations. most of the collections are unique to usf and have significant research and educational value. the library is eager to share the collections as widely as possible and hopes they can be used at both the document and the data level. linked data creates a “web of data” instead of a “web of documents,” which is the key to bringing structure and meaning to web content, allowing computers to better understand the data. however, collections are mostly born at the document level; therefore, the first problem librarians need to solve is how to transform documents into data.

for example, usf libraries digital collections include a beautiful natural history collection called the audubon florida tavernier research papers, an image collection of rookeries, birds, people, bodies of water, and man-made structures. the varied images come from decades of research and are a testament to the interconnectedness of bird-population health and human interaction with the environment; they reflect the focus of audubon’s work in research and conservation efforts, both for wildlife and for the habitat that supports it.32 this was the first collection the authors selected for experimenting with linked data at usf libraries, and the lessons learned from working with it were applied to later work. when the collection was received into the digital platform, it was carefully analyzed to determine how to pull the data out of the documents as much as possible. the authors designed a metadata schema combining mods and darwin core (abbreviated dwc, an extension of dublin core for biodiversity informatics) to pull out and properly store the data.

figure 2. american kestrel.

figure 3. american kestrel metadata.
thus, the document is dissembled into a few pieces of data which are properly placed into metadata fields where they can be indexed and searched. having data alone is not sufficient to meet linked data requirements. the first of the four rules of linked data is to name things using uris.33 to add uris to the data, the authors needed to standardize the data and reconcile it against widely-used authorities such as library of congress subject headings, wikidata, and the getty thesaurus of geographic names. standardized data tremendously increases the percentage of data reconciliation, which will lead to more links with related data once published. figure 4. amenaprkitch khachkar. figure 4 shows an example from the armenia heritage and social memory program. this is a visual materials collection with photos and 3d digital models. it was created by the digital heritage and humanities collection team at the library. the collection brings together comprehensive information and interactive 3d visualization of the fundamentals of armenian identity, such as their architectures, languages, arts, etc.34 when preparing the metadata for the items in this collection, the authors devoted extra effort to adding geographic location metadata. this effort serves two purposes: one is to respectfully and honestly include the information in the collection; and the second is to provide future reference to the location of each item as the physical items are in danger and could disappear or be ruined. the authors employed the getty thesaurus of geographic names because it supports a hierarchical location structure. the location names at each level can be reconciled and have their own uris. the authors also paid extra attention on the subject headings. figure 5 shows how the authors used library of congress subject headings, local subject headings assigned by the researchers, and the getty art and architecture thesaurus for this collection. in the data reconciliation stage, the metadata can be compared against both library of congress subject headings authority files and the getty aat vocabularies so that as many uris as possible can be fetched and added to the metadata. the focus information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 10 on geographic names and subject headings is to standardize the data and use controlled vocabularies as much as possible. once moving to the linked data world, the data will be ready to be added with uris. therefore, the data can be linked easily and widely. figure 5. amenaprkitch khachkar metadata. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 11 one of the goals of building linked data is to make sense out of data and to generate new knowledge. as the librarians explored how to bring together multiple usf digital collections to highlight african american history and culture, three collections seemed particularly appropriate: • an african american sheet music collection from the early 20th century (https://digital.lib.usf.edu/sheet-music-aa) • the “narratives of formerly enslaved floridians” collection from 1930s (https://digital.lib.usf.edu/fl-slavenarratives) • the “otis r. anthony african american oral history collection” from 19781979(https://digital.lib.usf.edu/ohp-otisanthony) these collections are all oral expressions of african american life in the us. they span the first three-quarters of the 20th century around the time of the civil rights movement. 
creating linked data out of these collections will help shed light on the lives of african americans through the 20th century and how they related to the civil rights movement. with semantic web technology support, these collections can be turned into machine-actionable datasets to assist research and education activities on racism and anti-racism and to piece into a holistic knowledge base.

usf libraries began partnering with dpla in 2018. dpla leverages linked data technology to increase the discoverability of the collections contributed to it, employing javascript object notation for linked data (json-ld) as the serialization for its data, which is in rdf/xml format. json-ld has a method of identifying data with iris, which effectively avoids data ambiguity, an important consideration given the fairly large amount of data dpla holds. json-ld also provides computational analysis in support of semantics services, which enriches the metadata and, as a result, makes searching more effective.35 in the 18 months since usf began contributing selected digital collections to dpla, usf materials have received more than 12,000 views. it is exciting to see the increase in the usage of the collections, and it is our hope that they will be accessed by more diverse user groups.

usf libraries are exploring ways to scale up the project and eventually transition all existing digital collections metadata to linked data. one possible way of achieving this goal is through metadata standardization. a pilot project at usf libraries is processing one medium-size image collection of 998 items whose original metadata is in mods/mets xml files. we first decided to use the dpla metadata application profile as the data model; if the pilot project is successful, we will apply this model to all of our linked data transformation processes. in our pilot, we examine the fields in our mods/mets metadata, identify those that will be meaningful in the new metadata schema, and transport the metadata in those fields to excel files. the next step is to use openrefine to reconcile the data in these excel files and fetch uris for exactly matching terms; during this step, we employ reconciliation services from the library of congress, the getty tgn, and wikidata. after all the metadata is reconciled, we transform the excel file into triples: the column headers of the excel file become the predicates, and the metadata, along with its uris, becomes the objects of the triples. these triples are then stored in an apache jena triple-store database so that we can design sparql queries to facilitate search. the final step will be designing a user-friendly interface to further optimize the user experience. to make the workflow as scalable as possible, we are testing two processes: first, creating a universal metadata application profile that applies to most, if not all, of the collections; and second, fetching uris only for exactly matching terms during the reconciliation process. both processes aim to reduce human interaction with the metadata so that the process is more affordable for the library.
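the pilot's reconcile-then-triple steps can be sketched as follows, with two loud assumptions: the reconciliation here calls the public id.loc.gov suggest service directly (the pilot itself uses openrefine's reconciliation services), and the csv layout and dublin core property mapping are illustrative, not the library's actual application profile.

```python
# hypothetical sketch of the pilot workflow: reconcile each subject term,
# keep a uri only on an exact label match, and emit triples in which the
# spreadsheet columns become predicates and the cells become objects.
import csv
import requests
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")

def reconcile_lcsh(term):
    """return an lcsh uri for an exactly matching label, else None."""
    resp = requests.get("https://id.loc.gov/authorities/subjects/suggest",
                        params={"q": term}, timeout=30)
    resp.raise_for_status()
    _, labels, _, uris = resp.json()  # opensearch suggest response format
    for label, uri in zip(labels, uris):
        if label.lower() == term.strip().lower():  # exact matches only
            return uri
    return None

g = Graph()
with open("collection.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):          # one resource per spreadsheet row
        item = URIRef(row["item_uri"])
        g.add((item, DC.title, Literal(row["title"])))
        uri = reconcile_lcsh(row["subject"])
        # reconciled terms become linked uris; unmatched terms stay literals
        g.add((item, DC.subject,
               URIRef(uri) if uri else Literal(row["subject"])))

g.serialize("collection.ttl", format="turtle")  # ready to load into apache jena
```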
conclusion and future work

linked data can help collection discoverability. in the past six months, usf has seen an increase in materials going online, and the usf special collections department has rapidly created digital exhibits to showcase its materials. if the trend in remote work continues, there is reason to believe that digital materials will be increasingly present and, given enough time and expertise, libraries can leverage linked data to better support current and new collections. the worldwide societal impact of covid-19 sheds light on the importance of technologies such as linked data that can help increase discoverability. when items are created and shared online, either directly related to covid-19 or as a result of its impact, linked data can help connect those resources. for instance, new covid-19 research is being developed and published daily. the publications office of the european union datathon entry “covid-19 data as linked data” states that “[t]he benefit of having covid-19 data as linked data comes from the ability to link and explore independent sources. for example, covid-19 sources often do not include other regional or mobility data. then, even the simplest thing, having the countries not as a label but as their uri of wikidata and dbpedia, brings rich possibilities for analysis by exploring and correlating geographic, demographic, relief, and mobility data.”36 the more institutions that contribute to this, the greater the discoverability and impact of the data.

in 2020 there was an increase in black lives matter awareness across the country, and this affects higher education. usf libraries are not the only ones engaged in addressing racial disparities: many institutions have been doing this work for years, while others are just beginning to focus on it. no matter whether it’s a new digital collection or one that’s been around for decades, the question remains: how do people find these resources? perhaps linked data technologies can help solve that problem. linked data can accentuate the human effort put forth to create those collections, and it is a way to assist humans and computers in finding interconnected materials around the internet.

usf libraries faced many obstacles implementing linked data. there is a technological barrier that takes well-trained staff to surmount, i.e., creating a linked data triple-store database and having linked data interact correctly on webpages. there is a time commitment necessary to create the triples and sparql queries, and sparql queries themselves vary from relatively simple to incredibly complicated. the authors also had to overcome the stumbling block of understanding how linked data works on a theoretical level. taking all of these considerations into account, creating linked data for a digital collection is not for the faint of heart: a cost/benefit analysis must be undertaken, and the authors must continue to assess the need for linked data.

at usf, the authors have taken the first steps in converting digital collections into linked data, moving from the theoretical basis of linked data to the practical side, where the elements that make up linked data start coming together. the work to create triples, sparql queries, and uris has begun, and full implementation has started. our linked data group has learned the fundamentals of linked data; the next, and current, step is to develop workflows for converting existing metadata into appropriate linked data. the group meets regularly and has created a triple-store database and converted data into linked data.
in 2020 there has been an increase in black lives matter awareness across the country, and this affects higher education. usf libraries are not the only ones engaged in addressing racial disparities: many institutions have been doing this work for years, and others are beginning to focus on this area. no matter whether it is a new digital collection or one that has been around for decades, the question remains: how do people find these resources? perhaps linked data technologies can help solve that problem. linked data can accentuate the human effort put forth to create those collections, and it assists humans and computers in finding interconnected materials around the internet.

usf libraries faced many obstacles implementing linked data. there is a technological barrier that takes well-trained staff to surmount, i.e., creating a linked data triple-store database and having linked data interact correctly on webpages. there is a time commitment necessary to create the triples and sparql queries, and sparql queries themselves vary from relatively simple to incredibly complicated. the authors also had the stumbling block of understanding how linked data worked together on a theoretical level. taking all of these considerations into account, creating linked data for a digital collection is not for the faint of heart. a cost/benefit analysis must be undertaken, and the authors of this paper must continue to assess the need for linked data.

at usf, the authors have taken the first steps in converting digital collections into linked data. we’ve moved from understanding the theoretical basis of linked data into the practical side, where the elements that make up linked data start coming together. the work to create triples, sparql queries, and uris has begun, and full implementation has started. our linked data group has learned the fundamentals of linked data. the next, and current, step is to develop workflows for converting existing metadata into appropriate linked data. the group meets regularly and has created a triple-store database and converted data into linked data. while the process is slow moving due to group members’ other commitments, progress is being made by identifying the most relevant collections we would like to transform and moving forward from there. we’ve located the collections we want to work on, taking an iterative approach to creating linked data as we go.

with linked data, there is a lot to consider. how do you start up a linked data program at your institution? how will you get the required expertise to create appropriate and high-quality linked data? how will your institution crosswalk existing data into triples format? is it worth the investment? these questions may be difficult to answer, but they must be addressed. the usf libraries will continue pursuing linked data in meaningful ways and showcasing linked data’s importance. linked data can help highlight all collections, but most importantly those of marginalized groups, which is a priority of the linked data group.

endnotes

1 peter perl, “what is the future of truth?” pew trust magazine, february 4, 2019, https://www.pewtrusts.org/en/trust/archive/winter-2019/what-is-the-future-of-truth.
2 “anti-racism reading lists,” university of minnesota library, accessed september 24, 2020, https://libguides.umn.edu/antiracismreadinglists.
3 “triad black lives matter protest collection,” unc greensboro digital collections, accessed december 9, 2020, http://libcdm1.uncg.edu/cdm/blm.
4 “umbra search african american history,” umbra search, accessed december 10, 2020, https://www.umbrasearch.org/.
5 karen coyle, “on the web, of the web” (keynote at lita, october 1, 2011), https://kcoyle.net/presentations/lita2011.html.
6 donna ellen frederick, “disruption or revolution? the reinvention of cataloguing (data deluge column),” library hi tech news 34, no. 7 (2017): 6–11, https://doi.org/10.1108/lhtn-07-2017-0051.
7 tim berners-lee, “linked data,” w3, last updated june 18, 2009, https://www.w3.org/designissues/linkeddata.html.
8 neil wilson, “linked data prototyping at the british library” (paper presentation, talis linked data and libraries event, 2010).
9 diane rasmussen pennington and laura cagnazzo, “connecting the silos: implementations and perceptions of linked data across european libraries,” journal of documentation 75, no. 3 (2019): 643–66, https://doi.org/10.1108/jd-07-2018-0117.
10 jane hagerlid, “the role of the national library as a catalyst for an open access agenda: the experience in sweden,” interlending and document supply 39, no. 2 (2011): 115–18, https://doi.org/10.1108/02641611111138923.
11 pennington and cagnazzo, “connecting the silos,” 643–66.
12 gillian byrne and lisa goddard, “the strongest link: libraries and linked data,” d-lib magazine 16, no. 11/12 (2010): 2, https://doi.org/10.1045/november2010-byrne.
13 bendik bygstad, gheorghita ghinea, and geir-tore klæboe, “organisational challenges of the semantic web in digital libraries: a norwegian case study,” online information review 33, no. 5 (2009): 973–85, https://doi.org/10.1108/14684520911001945.
14 pennington and cagnazzo, “connecting the silos,” 643–66.
15 heather lea moulaison and anthony j. million, “the disruptive qualities of linked data in the library environment: analysis and recommendations,” cataloging & classification quarterly 52, no. 4 (2014): 367–87, https://doi.org/10.1080/01639374.2014.880981.
16 marshall breeding, “linked data: the next big wave or another tech fad?” computers in libraries 33, no. 3 (2013): 20–22.
17 moulaison and million, “the disruptive qualities of linked data,” 369.
18 nuno freire and sjors de valk, “automated interpretability of linked data ontologies: an evaluation within the cultural heritage domain” (workshop, ieee conference on big data, 2019).
19 “bibframe update forum at the ala annual conference 2018” (washington, dc: library of congress, july 2018), https://www.loc.gov/bibframe/news/bibframe-update-an2018.html.
20 jacquie samples and ian bigelow, “marc to bibframe: converting the pcc to linked data,” cataloging & classification quarterly 58, no. 3–4 (2020): 404.
21 oliver pesch, “using bibframe and library linked data to solve real problems: an interview with eric miller of zepheira,” the serials librarian 71, no. 1 (2016): 2.
22 pesch, 2.
23 gianfranco crupi, “beyond the pillars of hercules: linked data and cultural heritage,” italian journal of library, archives & information science 4, no. 1 (2013): 25–49, http://dx.doi.org/10.4403/jlis.it-8587.
24 “resource description framework (rdf),” w3c, february 25, 2014, https://www.w3.org/rdf/.
25 tim berners-lee, james hendler, and ora lassila, “the semantic web,” scientific american 284, no. 5 (2001): 34–43, https://www.jstor.org/stable/26059207.
26 dean allemang and james hendler, “semantic web application architecture,” in semantic web for the working ontologist: effective modeling in rdfs and owl (saint louis: elsevier science, 2011): 54–55.
27 philip e. schreur and amy j. carlson, “bridging the worlds of marc and linked data: transition, transformation, accountability,” serials librarian 78, no. 1–4 (2020), https://doi.org/10.1080/0361526x.2020.1716584.
28 “about us,” dpla: digital public library of america, accessed december 11, 2020, https://dp.la/about.
29 “about us,” europeana, accessed december 11, 2020, https://www.europeana.eu/en/about-us.
30 liyang yu, “search engines in both traditional and semantic web environments,” in introduction to semantic web and semantic web services (boca raton: chapman & hall/crc, 2007): 36.
31 schreur and carlson, “bridging the worlds of marc and linked data.”
32 “audubon florida tavernier research papers,” university of south florida libraries digital collections, accessed november 30, 2020, https://lib.usf.edu/?a64/.
33 berners-lee, “linked data,” https://www.w3.org/designissues/linkeddata.html.
34 “the armenian heritage and social memory program,” university of south florida libraries digital collections, accessed november 30, 2020, https://digital.lib.usf.edu/armenianheritage/.
35 erik t. mitchell, “three case studies in linked open data,” library technology reports 49, no. 5 (2013): 26–43.
36 “covid-19 data as linked data,” publications office of the european union, accessed december 11, 2020, https://op.europa.eu/en/web/eudatathon/covid-19-linked-data.

public libraries leading the way
delivering automated materials handling for staff and patrons
carole williams
information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13xxx
carole williams (williamscar@ccpl.org) is amh and self-service coordinator, charleston county public library. © 2021.
“you’ve made libraries cool again!” “wow, super techie—we’re fascinated by the book return!” with enthusiastic comments like these from our visitors, the staff at charleston county public library (ccpl) knew that we were delivering patron engagement while providing an effective book return system. thanks to county residents overwhelmingly approving a referendum to build five new libraries and renovate thirteen others, from june 2019–november 2020 charleston county opened four new library branches, each with an automated materials handler (amh), and moved our support and administrative staff into a renovated support services building that now houses a 32-chute amh central sorter with smart transit check-in technology. (side note: yes, i know what you’re thinking and yes, we did, we opened two of the new branches during the pandemic—definitely fodder for another article.)

the branch amhs have interior and exterior return windows and sit along a glass wall so patrons can watch their items ride the conveyor belts and drop into sorting bins. the staff side has an inductor for items being returned from or sent to other branches, so there is almost always something for the public to watch (see figure 1). men, women, children, young and old enjoy watching the amh and asking questions. some patrons bring their out-of-town guests (even a nun visiting from ireland) to see the amh in action. this spontaneous interaction bolsters our connection with visitors and subconsciously reinforces the concept of “library as safe exploration.” a frequent question is “how does this work?” our explanation of tags and coding is the perfect opportunity to suggest books, point out games, and promote upcoming classes. we follow a roving customer service model. because an amh is an efficient tool that checks in items and deposits them in pre-determined bins for easy shelving, we have freed up hours of staff time that can now be spent in the stacks, helping patrons find items and answering questions as needed.

delivering an excellent amh experience for staff has been more complicated. as befitting a port county, we went full steam ahead with new technology, new locations, and increased services. this required all staff to simultaneously learn new systems and change many of our in-house procedures while continuing with daily operations. every detail, from how to sort for shelving to labeling shipments, needed to be re-examined. the biggest changes came with bringing the central sorter online.

some of the changes were technical. for example, we use rfid tags as an identification number and place a matching barcode on each item. rfid is excellent technology; tagging all our items has completely changed and streamlined our process. most items come pre-processed, and the amhs are set to read only the rfid. an unintended but useful consequence is that we have become more aware of vendor processing errors where tags and barcodes don’t match. (side note: we are working on some system-wide solutions to locate discrepancies between barcode numbers and rfid tags in the ils; rfid is another topic entirely, so stay tuned.)

figure 1. children returning books to the automated materials handler at a branch of the charleston county public library.
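the discrepancy check mentioned in the side note above can begin as something very small. the sketch below is purely illustrative and assumes a csv item export from the ils with hypothetical field names; it is not ccpl's actual system.

```python
# hedged sketch of a tag/barcode discrepancy report. it assumes a csv item
# export from the ils with one row per item; field names are hypothetical.
import csv

def find_mismatches(export_path):
    """yield items whose rfid tag value does not match the printed barcode."""
    with open(export_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["rfid_tag"].strip() != row["barcode"].strip():
                yield row["item_id"], row["rfid_tag"], row["barcode"]

for item_id, tag, barcode in find_mismatches("item_export.csv"):
    print(f"item {item_id}: rfid {tag} does not match barcode {barcode}")
```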
another benefit of the amh: as our library collections acquisitions and technical services (lcats) department processed new orders, they realized they could now send those items out daily instead of waiting to accumulate enough individual branch materials for a separate shipment—a win for patrons (new materials every day) and lcats (storage space). the unexpected twist: our adult, ya, and children’s librarians are accustomed to receiving new materials separately from returns so they can familiarize themselves with the titles before the items are shelved. with the central sorter, new items go out daily, mixed in with the rest of the daily shipment. spine tape makes it easy for circulation staff to separate the new adult items, but we still needed a solution for children and ya. after several sort changes and many discussions, we went old school, recycling used paper into book flags. the flagging doesn’t cause a problem with the amh, is quick for technical services to place in each new book, and is easy for circulation to spot and put aside at the receiving location.

some of the changes were electrical. only the four new branch locations and the support services facility have an amh, while the other fourteen branches check in items by hand. we added a tote check-in server (tcs) system to the central sorter. this feature creates a manifest of the items in each crate. branches can now receive the contents of each crate by entering a 4-digit barcode instead of scanning individual items. an unanticipated vulnerability of our new internet-dependent system was its reliance on electricity. the coast has frequent thunderstorms that can cause power outages and flooding. if the power is out, there is no way to sort or receive the items in delivery. luckily this doesn’t happen often, and so far, power has been restored quickly.

some of the changes were physical. our delivery drivers also process the shipment when they return each day. in their previous workflow, most of the shipment was delivered to the downstream libraries. the parts of the shipment that they did process had printed routing slips placed in each item, so staff could all be sorting the shipment at the same time. now their department has become logistics, which is a more encompassing title and better covers the wider variety of tasks the staff have added to their day. in addition to delivery and mail duties, logistics also manages and maintains the amh and tcs equipment, troubleshoots problems that arise, scans barcodes, and processes an average of 3,000 items daily with the amh. most of the shipment is now coming through the central sorter—staff handle an average of 157 crates each weekday, moving items from support services and to/from branches. we have electric forklifts that hold three crates at a time to help with the increase in physically shifting the crates. now one person inducts the shipment while others scan and stack the crates on the loading dock. this procedure is much faster than the previous paper-slip method, and processing is usually finished in a couple of hours.

other changes were mental and emotional. new locations, renovations, technologies, and procedures can be exciting, but can also lead to change fatigue. fortunately, everyone retained their job this past year, but in order to operate a new branch built in a previously unserved community, we had to reassign staff from locations closed for renovation.
ccpl’s vision is for our library to be the path to our cultural heritage, a door to resources of the present, and a bridge to opportunities in the future. we are doers, creators, servers, and teammates, not only to the community, but to our coworkers. we are all in for our shared vision, but whew . . . some days we all experience mental eye rolling and collective sighs of “another change?” our director’s mantra is “we are the calm.” and it is true. by fall we will have three of the renovated branches reopened, three more under renovation, and another staffing shift. with some grace and encouragement to one another, we will handle whatever comes next.

application level security in a public library: a case study
richard thomchick and tonia san nicolas-rocca
information technology and libraries | december 2018
richard thomchick (richardt@vmware.com) is mlis, san josé state university. tonia san nicolas-rocca (tonia.sannicolas-rocca@sjsu.edu) is assistant professor in the school of information at san josé state university.

abstract

libraries have historically made great efforts to ensure the confidentiality of patron personally identifiable information (pii), but the rapid, widespread adoption of information technology and the internet has given rise to new privacy and security challenges. hypertext transport protocol secure (https) is a form of hypertext transport protocol (http) that enables secure communication over the public internet and provides a deterministic way to guarantee data confidentiality so that attackers cannot eavesdrop on communications. https has been used to protect sensitive information exchanges, but security exploits such as passive and active attacks have exposed the need to implement https in a more rigorous and pervasive manner. this report is intended to shed light on the state of https implementation in libraries, and to suggest ways in which libraries can evaluate and improve application security so that they can better protect the confidentiality of pii about library patrons.

introduction

patron privacy is fundamental to the practice of librarianship in the united states (u.s.). libraries have historically made great efforts to ensure the confidentiality of personally identifiable information (pii), but the rapid, widespread adoption of information technology and the internet has given rise to new privacy and security challenges. the usa patriot act, the rollback of the federal communications commission rules prohibiting internet service providers from selling customer browsing histories without the customer’s permission, along with electronic surveillance efforts by the national security agency (nsa) and other government agencies, have further intensified privacy concerns about sensitive information that is transmitted over the public internet when patrons interact with electronic library resources through online systems such as an online public access catalog (opac).1 hypertext transport protocol secure (https) is a form of hypertext transport protocol (http) that enables secure communication over the public internet and provides a deterministic way to guarantee data confidentiality so that attackers cannot eavesdrop on communications. https has been used to protect sensitive information exchanges (i.e., e-commerce transactions, user authentication, etc.).
in practice, however, security exploits such as man-in-the-middle attacks have demonstrated the relative ease with which an attacker can transparently eavesdrop on or hijack http traffic by targeting gaps in https implementation. there is little or no evidence in the literature that libraries are aware of the associated vulnerabilities, threats, or risks, or that researchers have evaluated the use of https in library web applications. this report is intended to shed light on the state of https implementation in libraries, and to suggest ways in which libraries can evaluate and improve application security so that they can better protect the confidentiality of pii about library patrons. the structure of this paper is as follows. first, we review the literature on privacy as it pertains to librarianship and cybersecurity. we then describe the testing and research methods used to evaluate https implementation. a discussion of the results of the findings is presented. finally, we explain the limitations and suggest future research directions.

literature review

the research begins with a survey of the literature on the topic of confidentiality as it pertains to patron privacy; the impact of information technology on libraries; and the use of https as a security control to protect the confidentiality of patron data when it is transmitted over the public internet. while there is ample literature on the topic of patron privacy, there appears to be a lack of empirical studies that measure the use of https to protect the privacy of data transmitted to and from patrons when they use library web applications.2

the primal importance of patron privacy

patron privacy has long been one of the most important principles of the library profession in the u.s. as early as 1939, the code of ethics for librarians explicitly stated, “it is the librarian’s obligation to treat as confidential any private information obtained through contact with library patrons.”3 the concept of privacy as applied to personal and circulation data in library records began to appear in the library literature not long after the passage of the u.s.
privacy act of 1974.4 today, the american library association (ala) regards privacy as “fundamental to the ethics and practice of librarianship,” and has formally adopted a policy regarding the confidentiality of personally identifiable information (pii) about library users, which asserts, “confidentiality exists when a library is in possession of personally identifiable information about users and keeps that information private on their behalf.”5 this policy affirms language from the ala code of ethics, and states that “confidentiality extends to information sought or received and resources consulted, borrowed, acquired or transmitted including database search records, reference questions and interviews, circulation records, interlibrary loan records, information about materials downloaded or placed on ‘hold’ or ‘reserve,’ and other personally identifiable information about uses of library materials, programs, facilities, or services.”6 with the advent of new technologies used in libraries to support information discovery, more challenges arise to protect patron privacy.7

the impact of information technology on patron privacy

researchers have studied the impact of information technology on patron privacy for several decades. early research by harter and machovec discussed the data privacy challenges arising from the use of automated systems in the library, and the associated ethical considerations for librarians who create, view, modify, and use patron records.8 fouty addressed issues regarding the privacy of patron data contained in library databases, arguing that online patron records provide more information about individual library users, more quickly, than traditional paper-based files.9 agnew and miller presented a hypothetical case involving the transmission of an obscene email from a library computer, and an ensuing fbi inquiry, as a method of examining privacy issues that arise from patron internet use at the library.10 in addition, merry pointed to the potential for violations of patron privacy brought about by tracking of personal information attached to electronic text supplied by publishers.11 the consensus from the literature, as articulated by fifarek, is that technology has given rise to new privacy challenges, and that the adoption of technology in the library has outpaced efforts to maintain patron privacy.12 this sentiment was echoed and amplified by john berry, former ala president, who commented that there are “deeper issues that arise from the impact of converting information to digitized, online formats” and critiqued the library profession for having “not built protections for such fundamental rights as those to free expression, privacy, and freedom.”13 ala affirmed these findings and validated much of the prevailing research in a report from the library information technology association, which concluded, “user records have also expanded beyond the standard lists of library cardholders and circulation records as libraries begin to use electronic communication methods such as electronic mail for reference services, and as they provide access to computer, web and printing use.”14 in more recent years, library systems have made increasing use of network communication protocols such as http, and the focus of the literature has shifted towards internet technologies in response to the growth of trends such as cloud computing and web 2.0.
mavodza characterizes the relevance of cloud computing as “unavoidable” and expounds on the ways in which software as a service (saas), platform as a service (paas), infrastructure as a service (iaas), and other cloud computing models “bring to the forefront considerations about . . . information security [and] privacy . . . that the librarian has to be knowledgeable about.”15 levy and bérard caution that next-generation library systems and web-based solutions are “a breakthrough but need careful scrutiny” of security, privacy, and related issues such as data provenance (i.e., where the information is physically stored, which can potentially affect security and privacy compliance requirements).16

protecting patron privacy in the “library 2.0” era

“library 2.0” is an approach to librarianship that emphasizes engagement and multidirectional interaction with library patrons. although this model is “broader than just online communication and collaboration” and “encompasses both physical and virtual spaces,” there can be no doubt that “library 2.0 is rooted in the global web 2.0 discussion,” and that libraries have made increasing use of web 2.0 technologies to engage patrons.17 the library 2.0 model disrupts many traditional practices for protecting privacy, such as limited tracking of user activity, short-term data retention policies, and anonymous browsing of physical materials. instead, as zimmer states, “the norms of web 2.0 promote the open sharing of information—often personal information—and the design of many library 2.0 services capitalize on access to patron information and might require additional tracking, collection, and aggregation of patron activities.”18 as ala cautioned in their study on privacy and confidentiality, “libraries that provide materials over websites controlled by the library must determine the appropriate use of any data describing user activity logged or gathered by the web server software.”19 the dilemma facing libraries in the library 2.0 era, then, is how to appropriately leverage user information while maintaining patron privacy.
many library systems require users to validate their identity through the use of a username, password, pin code, or another unique identifier for access to their library circulation records and other personal information.20 however, several studies suggest the authentication process itself spawns a trail of personally identifiable information about library patrons that must be kept confidential.21 there is discussion in the literature about the value of using https and ssl certificates to protect patron privacy and build a high level of trust with users, and general awareness about the importance of encrypting communications that involve sensitive information, such as “payment for fines and fees via the opac” or when “patrons are required to enter personal details such as addresses, phone numbers, usernames, and/or passwords.”22 however, as breeding observed, many opacs and other library automation software products “don’t use ssl by default, even when processing these personalization features.”23 these observations call library privacy practices into question, and are concerning since “hackers have identified library ilss as vulnerable, especially when libraries do not enforce strict system security protocols.”24 one of the challenges facing libraries is the perception that “a library’s basic website and online catalog functions don’t need enhanced security.”25 in fact, one of the most common complaints against https implementation in libraries has been: “we don’t serve any sensitive information.”26 these beliefs may be based on the historical practice of using https selectively to secure “sensitive” information and operations such as user authentication. but in recent years, it has become clear that selective https implementation is not an adequate defense. the electronic frontier foundation (eff) cautions, “some site operators provide only the login page over https, on the theory that only the user’s password is sensitive. these sites’ users are vulnerable to passive and active attacks.”27 passive attacks do not alter systems or data. during a passive attack, a hacker will attempt to listen in on communications over a network. eavesdropping is an example of a passive attack.28 active attacks alter systems or data. during this type of attack, a hacker will attempt to break into a system to make changes to transmitted or stored data, or introduce data into the system. examples of active attacks include man-in-the-middle, impersonation, and session hijacking.29

http exploits

web servers typically generate unique session token ids for authenticated users and transmit them to the browser, where they are cached in the form of cookies.
session hijacking is a type of attack that “compromises the session token by stealing or predicting a valid session token to gain unauthorized access to the web server,” often by using a network sniffer to capture a valid session id that can be used to gain access to the server.30 session hijacking is not a new problem, but the release of the firesheep attack kit in 2010 increased awareness about the inherent insecurity of http and the need for persistent https.31 in the wake of firesheep’s release and several major security breaches, senator charles schumer, in a letter to yahoo!, twitter, and amazon, characterized http as a “welcome mat for would-be hackers” and urged the technology industry to implement better security as quickly as possible.32 these and other events prompted several major site operators, including google, facebook, paypal, and twitter, to switch from partial to pervasive https. today these sites transmit virtually all web application traffic over https. security researchers from these companies, as well as from several standards organizations such as the electronic frontier foundation (eff), internet engineering task force (ietf), and open web application security project, have shared their experiences and recommendations to help other website operators implement https effectively.33 these include encrypting the entire session, avoiding mixed content, configuring cookies correctly, using valid ssl certificates, and enabling hsts to enforce https.

testing techniques used to evaluate https implementation

there is little or no evidence in the literature that libraries are aware of the associated vulnerabilities, threats, or risks, or that researchers have evaluated the use of https in library web applications. however, there are many methods that libraries can use to evaluate https and ssl/tls implementation, including automated software tools and heuristic evaluations. these methods can be combined for deeper analysis.

automated software tools

among the most widely used automated analysis software tools is ssl server test from qualys ssl labs. this online service “performs a deep analysis of the configuration of any ssl web server on the public internet” and provides a visual summary as well as detailed information about authentication (certification and certificate chains) and configuration (protocols, key strength, cipher suites, and protocol details).34 users can optionally post the results to a central “board” that acts as a clearinghouse for identifying “insecure” and “trusted” sites. another popular tool is sslscan, a command-line application that, as the name implies, quickly “queries ssl services, such as https, in order to determine the ciphers that are supported.”35 however, these tools are limited in that they only report specific types of data and do not provide a holistic view of https implementation.

heuristic evaluations

in addition to automated software tools, librarians can also use heuristic evaluations to manually inspect the gray areas of https implementation, either to validate the results of automated software or to examine aspects not included in the functionality of these tools. one example is httpsnow, a service that lets users report and view information about how websites use https. httpsnow enables this activity by providing heuristics that non-technical audiences can use to derive a relatively accurate assessment of https deployment on any particular website or application.
the project documentation includes descriptions of, and guidance for identifying, http-related vulnerabilities such as use of http during authenticated user sessions, presence of mixed content (instances in which content on a webpage is transmitted via https while other content elements are transmitted via http), insecure cookie configurations, and use of invalid ssl certificates.

research methodology

a combination of heuristic and automated methods was used to evaluate https implementation in a public library web application to determine how many security vulnerabilities exist in the application and assess the potential privacy risks to the library’s patrons.

research location

this research project was conducted at a public library in the western us that we will call west coast public library (wcpl). this library was established in 1908 and employs ninety staff and approximately forty volunteers. in addition, it has approximately 91,000 cardholders. as part of its operations, wcpl runs a public-facing website and an integrated library system (ils) that includes an opac with personalization for authenticated users.

test

to conduct the test, a valid wcpl library patron account was created and used to authenticate one of the authors for access to account information and personalized features of wcpl’s opac. next, the google chrome web browser was used to visit wcpl’s public-facing website. a valid patron name, library card number, and eight-digit pin number were then used to gain access to online account information. several tasks were performed to evaluate https usage. a sample search query for the keyword “recipes” was performed in the opac while logged in. the description pages for two of the resources listed in the search engine result page (one printed resource and one electronic resource) were clicked on and viewed. the electronic resource was added to the online account’s “book cart” and the book cart page was viewed. during these activities, httpsnow heuristics were applied to individual webpages and to the user session as a whole. the web browser’s url address window was inspected to determine whether some or all pages were transmitted via http or https. the url icon in the browser’s address bar was clicked on to view a list of the cookies that the application set in the browser. each cookie was inspected for the text, “send for: encrypted connections only,” which indicates that the cookie is secure. individual webpages were checked for the presence of mixed (encrypted and unencrypted) content. information about individual ssl certificates was inspected to determine their validity and encryption key length. all domain and subdomain names encountered during these activities were documented. the google chrome web browser was then used to access the qualys ssl server test tool. each domain name encountered was submitted. test results were then examined to determine whether any authentication or configuration flaws exist in wcpl’s web applications.
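several of the manual heuristics described above can also be scripted. the sketch below is an illustration, not the instrument used in this study; it uses python's requests library against a placeholder hostname to test three of the same properties: whether plain http redirects to https, whether an hsts header is present, and whether cookies carry the secure flag.

```python
# hedged sketch: scripted versions of three heuristics from the methodology.
# the hostname is a placeholder, not wcpl's real domain.
import requests

host = "www.example.org"

# 1. does the plain-http homepage redirect visitors to https?
r = requests.get(f"http://{host}/", allow_redirects=False, timeout=10)
redirects = r.status_code in (301, 302, 307, 308) and \
    r.headers.get("Location", "").startswith("https://")
print("http redirects to https:", redirects)

# 2. does the https site send a strict-transport-security (hsts) header?
r = requests.get(f"https://{host}/", timeout=10)
print("hsts present:", "Strict-Transport-Security" in r.headers)

# 3. are cookies flagged secure, i.e., sent over encrypted connections only?
for cookie in r.cookies:
    print(f"cookie {cookie.name}: secure={cookie.secure}")
```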
results and discussion

given the recommendations suggested by several organizations (e.g., eff, ietf, owasp), we evaluated wcpl’s web application to determine how many security vulnerabilities exist in the application and assess the potential privacy risks to the library’s patrons. the results of the tests, as discussed below, suggest that wcpl’s web application possesses a number of vulnerabilities that could potentially be exploited by attackers and compromise the confidentiality of pii about library patrons. this is not surprising given the lack of research on https implementation, as well as the general consensus in the literature that technology adoption has outpaced efforts to maintain patron privacy. based on the results of these tests, wcpl’s website and ils span several domains. some of these domains appear to be operated by wcpl, while others appear to be part of a hosted environment operated by the ils vendor. based on this information, it is reasonable to conclude that wcpl’s ils utilizes a “hybrid cloud” model. in addition, random use of https was observed in the opac interface during the testing process. this is discussed in the following sections.

use of http during authenticated user sessions

library patrons use wcpl’s website and opac to access and search for books and other material available through the library. given the results of the tests, wcpl does not use https pervasively across its entire web application. during the test, we found that wcpl’s website is transmitted via http by default. this was after manually entering the url with an “https” prefix, which resulted in a redirect to the unencrypted “http” page. we continued to test wcpl’s website and opac by performing a query using the search bar located on the patron account page. we found that wcpl’s opac transmits some pages over http and others over https. for example, when a search query is performed in the search bar located on the patron account page, the search engine results page is sometimes served over https, and sometimes over http (see figure 1). this behavior is not limited to specific pages; rather, it appears to be random. this security flaw leaves library patrons vulnerable to passive and active attacks that exploit gaps in https implementation, allowing an attacker to eavesdrop on and hijack a user session, providing the attacker with access to private information.

figure 1. results of the library’s use of https.

presence of mixed content

when a library patron visits a webpage served over https, the connection with the web server is encrypted and, therefore, safeguarded from attack. if an https webpage includes content retrieved via http, the webpage is only partially encrypted, leaving the unencrypted content vulnerable to attackers. analysis of wcpl’s website did not reveal any explicit use of mixed content on the public-facing portion of the site. test results, however, detected unencrypted content sources on some pages of the library’s online catalog. this, unfortunately, puts patron privacy at risk, as attackers can intercept the http resources when an https webpage loads content such as an image, iframe, or font over http. this compromises the security of what is perceived to be a secure site by enabling an attacker to exploit an insecure css file or javascript function, leading to disclosure of sensitive data, malicious website redirects, man-in-the-middle attacks, phishing, and other active attacks.36

insecure cookie management

cookies are small text files, sent from a web server and stored on user computers via web browsers. cookies can be divided into two categories: session and persistent. persistent cookies are stored on the user’s hard drive until they are erased or expire.
unlike persistent cookies, session cookies are stored in memory and erased once the user closes their browser. provided that computer settings allow for it, cookies are created when a user visits a website. cookies can be set up such that communication is limited to encrypted channels, and they can be used to remember login credentials and previous information entered into forms, such as name, mailing address, email address, and the like. cookies can also be used to monitor the number of times a user visits a website, the pages a user visits, and the amount of time spent on a webpage. the results of the tests suggest that wcpl’s cookie policies are inconsistent. we found two types of cookies present. within one domain, the web application uses a jsession cookie that is configured to send for “secure connections only.” this indicates that the session id cookie is encrypted during transmission. another domain uses an asp.net session id that is configured to send for any connection, which means the session id could be transmitted in an unencrypted format. cookies transmitted in an unencrypted format could be intercepted by an attacker in order to eavesdrop on or hijack user sessions. this leaves user privacy vulnerable given the type of information contained within cookies.

flawed encryption protocol support

transport layer security (tls) is a protocol designed to provide secure communication over the web. websites using tls, therefore, provide a secure communication path between their web servers and web browsers, preventing eavesdropping, hijacking, and other active attacks. this study employed the ssl server test from qualys ssl labs to perform an analysis of wcpl’s web applications. results of the qualys test (see figure 2) indicate that the site does not support tls 1.2, which means the server may be vulnerable to passive and active attacks, thereby providing hackers with access to data passed between a web server and a web browser accessing the server. in addition, the application’s server platform supports ssl 2.0, which is insecure because it is subject to a number of passive and active attacks leading to loss of confidentiality, privacy, and integrity.

figure 2. qualys scanning service results.

the vulnerabilities discovered during the testing process may be a result of uncoordinated security. this is concerning because it is a by-product of the cloud computing approach used to operate wcpl’s ils. while libraries may have acclimated to the challenge of coordinating security measures across a distributed application, they now face the added complexity of coordinating security measures with their vendors, who themselves may also utilize additional cloud-based offerings from third parties. as cloud technology adoption increases and cloud-based infrastructures become more complex and distributed, attackers will likely attempt to find and exploit systems with inconsistent or uneven security measures, and libraries will need to work closely with information technology vendors to ensure tight coordination of security measures. unencrypted communication using http affects the privacy, security, and integrity of patron data.
passive attacks such as eavesdropping, and active attacks such as hijacking, man-in-the-middle, and phishing, can reveal patron login credentials, search history, identity, and other sensitive information that, according to ala, should be kept private and confidential. given the results of the testing done in this study, it is clear that wcpl needs to revisit and strengthen its web application security measures by, according to organizations within the security community, using https pervasively across the entire web application, avoiding mixed content, configuring cookies so they are limited to encrypted communication, using valid ssl certificates, and enabling hsts to enforce https. implementing improvements to https will mitigate attacks by strengthening the integrity of wcpl’s web applications, which, in turn, will help protect the privacy and confidentiality of library patrons.
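to make those controls concrete, the following is a minimal sketch in python with flask (an assumed stack for illustration only; the article does not disclose wcpl's actual platform) showing where hsts and secure cookie settings live at the application level. in production, these headers are often set at the web server or load balancer instead.

```python
# hedged sketch in flask (an assumed stack, for illustration only) of the
# application-level versions of two recommended controls: hsts and secure,
# httponly session cookies.
from flask import Flask, make_response

app = Flask(__name__)

@app.after_request
def add_hsts(response):
    # instruct browsers to use https for all requests for one year
    response.headers["Strict-Transport-Security"] = (
        "max-age=31536000; includeSubDomains"
    )
    return response

@app.route("/login")
def login():
    response = make_response("logged in")
    # secure: send only over encrypted connections; httponly: hide from
    # page scripts, blunting session hijacking via injected javascript
    response.set_cookie("sessionid", "opaque-token", secure=True, httponly=True)
    return response
```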
limitations and future research

this research was performed at a public library in the western u.s. therefore, future research is needed to study the implementation of https to increase patron privacy at other public libraries, libraries in other parts of the u.s., and libraries in other countries. it would also be valuable to conduct similar research at libraries of different types, including academic, law, medical, and other types of special libraries. ssl server test from qualys ssl labs and httpsnow were used to evaluate the use of https at wcpl; the use of other evaluation techniques may generate different results. while a major limitation of this study is the evaluation of a single public library and its implementation of https to ensure patron privacy, a next phase of research should further investigate the policies in place that are used to safeguard patron privacy. these include security education, training, and awareness programs, as well as access controls. furthermore, library 2.0 and cloud computing are fundamental to libraries, but create risks that could impact the ability to keep patron pii safeguarded. as such, future research should evaluate the impact library 2.0 and cloud computing applications have on maintaining the confidentiality of patron information.

conclusion

the library profession has long been a staunch defender of privacy rights, and the literature reviewed indicates strong awareness and concern about the rapid pace of information technology and its impact on the confidentiality of personally identifiable information about library patrons. much work has been done to educate librarians and patrons about the risks facing them and the measures they can take to protect themselves. however, the research and experimentation presented in this report strongly suggest that there is a need for wcpl and other libraries to reassess and strengthen their https implementations. https is not a panacea for mitigating web application risks, but it can help libraries give patrons the assurance of knowing they take security and privacy seriously, and that reasonable steps are being taken to protect them. finally, this report concludes that further research on library application security should be conducted to assess the overall state of application security in public, academic, and special libraries, with the long-term objective of enabling ala and other professional institutions to develop policies and best practices to guide the secure adoption of library 2.0 and cloud computing technologies within a socially connected world.

references

1 jon brodkin, “president trump delivers final blow to web browsing privacy rules,” ars technica (april 3, 2017), https://arstechnica.com/tech-policy/2017/04/trumps-signature-makes-it-official-isp-privacy-rules-are-dead/.
2 shayna pekala, “privacy and user experience in 21st century library discovery,” information technology and libraries 36, no. 2 (2017): 48–58, https://doi.org/10.6017/ital.v36i2.9817.
3 american library association, “history of the code of ethics: 1939 code of ethics for librarians,” accessed may 11, 2018, http://www.ala.org/template.cfm?section=history1&template=/contentmanagement/contentdisplay.cfm&contentid=8875.
4 joyce crooks, “civil liberties, libraries, and computers,” library journal 101, no. 3 (1976): 482–87; stephen harter and charles c. busha, “libraries and privacy legislation,” library journal 101, no. 3 (1976): 475–81; kathleen g. fouty, “online patron records and privacy: service vs. security,” journal of academic librarianship 19, no. 5 (1993): 289–93, https://doi.org/10.1016/0099-1333(93)90024-y.
5 “code of ethics of the american library association,” american library association, amended january 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics; “privacy: an interpretation of the library bill of rights,” american library association, amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy.
6 american library association, “privacy: an interpretation of the library bill of rights,” amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy.
7 pekala, “privacy and user,” 48–58.
8 harter and busha, “libraries and privacy legislation,” 475–81; george s. machovec, “data security and privacy in the age of automated library systems,” information intelligence, online libraries, and microcomputers 6, no. 1 (1988).
9 fouty, “online patron records and privacy,” 289–93.
10 grace j. agnew and rex miller, “how do you manage?,” library journal 121, no. 2 (1996): 54.
11 lois k. merry, “hey, look who took this out!—privacy in the electronic library,” journal of interlibrary loan, document delivery & information supply 6, no. 4 (1996): 35–44, https://doi.org/10.1300/j110v06n04_04.
12 aimee fifarek, “technology and privacy in the academic library,” online information review 26, no. 6 (2002): 366–74, https://doi.org/10.1108/14684520210452691.
13 john n. berry iii, “digital democracy: not yet!,” library journal 125, no. 1 (2000): 6.
14 american library association, “appendix—privacy and confidentiality in the electronic environment,” september 28, 2006, http://www.ala.org/lita/involve/taskforces/dissolved/privacy/appendix.
15 judith mavodza, “the impact of cloud computing on the future of academic library practices and services,” new library world 114, no. 3/4 (2012): 132–41, https://doi.org/10.1108/03074801311304041.
16 richard levy, “library in the cloud with diamonds: a critical evaluation of the future of library management systems,” library hi tech news 30, no. 3 (2013): 9–13, https://doi.org/10.1108/lhtn-11-2012-0071; raymond bérard, “next generation library systems: new opportunities and threats,” bibliothek, forschung und praxis 37, no. 1 (2013): 52–58, https://doi.org/10.1515/bfp-2013-0008.
17 michael stephens, “the hyperlinked library: a ttw white paper,” accessed may 13, 2018, http://tametheweb.com/2011/02/21/hyperlinkedlibrary2011/; michael zimmer, “patron privacy in the ‘2.0’ era,” journal of information ethics 22, no. 1 (2013): 44–59, https://doi.org/10.3172/jie.22.1.44.
18 zimmer, “patron privacy in the ‘2.0’ era,” 44.
19 “the american library association’s task force on privacy and confidentiality in the electronic environment,” american library association, final report july 7, 2000, http://www.ala.org/lita/about/taskforces/dissolved/privacy.
20 library information technology association (lita), accessed may 11, 2018, http://www.ala.org/lita/.
21 library information technology association (lita), accessed may 11, 2018, http://www.ala.org/lita/; pam dixon, “ethical issues implicit in library authentication and access management: risks and best practices,” journal of library administration 47, no. 3 (2008): 141–62, https://doi.org/10.1080/01930820802186480; eric p. delozier, “anonymity and authenticity in the cloud: issues and applications,” oclc systems and services: international digital library perspectives 29, no. 2 (2012): 65–77, https://doi.org/10.1108/10650751311319278.
22 marshall breeding, “building trust through secure web sites,” computers in libraries 25, no. 6 (2006), 24.
23 breeding, “building trust,” 25.
24 barbara swatt engstrom et al., “evaluating patron privacy on your ils: how to protect the confidentiality of your patron information,” aall spectrum 10, no. 6 (2006): 4–19.
25 breeding, “building trust,” 26.
26 tj lamana, “the state of https in libraries,” intellectual freedom blog, the office for intellectual freedom of the american library association (2017), https://www.oif.ala.org/oif/?p=11883.
27 chris palmer and yan zhu, “how to deploy https correctly,” electronic frontier foundation, updated february 9, 2017, https://www.eff.org/https-everywhere/deploying-https.
28 computer security resource center, “glossary,” national institute of standards and technology, accessed may 12, 2018, https://csrc.nist.gov/glossary/?term=491#alphaindexdiv.
29 computer security resource center, “glossary,” national institute of standards and technology, accessed may 12, 2018, https://csrc.nist.gov/glossary/?term=2817.
30 open web application security project, “session hijacking attack,” last modified august 14, 2014, https://www.owasp.org/index.php/session_hijacking_attack; open web application security project, “session management cheat sheet,” last modified september 11, 2017, https://www.owasp.org/index.php/session_management_cheat_sheet.
31 eric butler, “firesheep” (2010), http://codebutler.com/firesheep/; audrey watters, “zuckerberg’s page hacked, now facebook to offer ‘always on’ https,” accessed may 16, 2018, https://readwrite.com/2011/01/26/zuckerbergs_facebook_page_hacked_and_now_facebook/.
32 info security magazine, “senator schumer: current internet security ‘welcome mat for would-be hackers’” (march 2, 2011), http://www.infosecurity-magazine.com/view/16328/senator-schumer-current-internet-security-welcome-mat-for-wouldbe-hackers/.
33 palmer and zhu, “how to deploy https correctly”; internet engineering task force, “recommendations for secure use of transport layer security (tls) and datagram transport layer security (dtls)” (may 2015), https://tools.ietf.org/html/bcp195; open web application security project, “session management cheat sheet,” last modified september 11, 2017, https://www.owasp.org/index.php/session_management_cheat_sheet.
34 qualys ssl labs, “ssl/tls deployment best practices,” accessed may 18, 2018, https://www.ssllabs.com/projects/best-practices/.
35 sourceforge, “sslscan—fast ssl scanner,” last updated april 24, 2013, http://sourceforge.net/projects/sslscan/.
36 palmer and zhu, “how to deploy https correctly.”

user experience testing in the open textbook adaptation workflow: a case study
camille thomas, kimberly vardeman, and jingjing wu
information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.12039
camille thomas (cthomas5@fsu.edu) is scholarly communications librarian, florida state university. kimberly vardeman (kimberly.vardeman@ttu.edu) is user experience librarian, texas tech university. jingjing wu (jingjing.wu@ttu.edu) is web librarian, texas tech university. © 2021.

abstract

as library publishers and open education programs grow, it is imperative that we integrate practices in our workflows that prioritize and include end users. although there is information available on best practices for user testing and accessibility compliance, more can be done to give insight into the library publishing context. this study examines the user and accessibility testing workflow during the modification of an existing open textbook using pressbooks at texas tech university.

introduction

as library publishers and open education programs grow, there is an opportunity to integrate into our workflows practices that prioritize and include end users. although there is information available on best practices for user testing and accessibility compliance, more can be done to give insight into the library publishing context. there are currently no case studies that examine the user and accessibility testing workflow during the modification of an existing open textbook. this study examines user experience testing as a method to improve oer interfaces, learning experience, and accessibility during the oer production process using pressbooks at a large research university.
literature review

user experience (ux) is a “momentary, primarily evaluative feeling (good–bad) while interacting with a product or service” that can go beyond simple usability evaluations to consider “qualities such as meaning, affect and value.”1 ux evaluations are generally applied to library websites, spaces, and interfaces and are not currently a common element in library publishing workflows. open educational resources (oer) are defined as teaching, learning, and research resources that reside under an intellectual property license that permits their free use and repurposing by others.2 whitfield and robinson make a distinction between teaching vs. learning resources, instructional vs. interface usability, and ease of modification for creators.3 this select literature review considers usability testing of e-books, oer workflows, and accessibility evaluations and how they apply to local contexts.

along with incentives for instructors to engage with oer, the ability to adapt oer is often highlighted as a benefit. walz shares common workflows for oer production, including broad steps for design and development during creation of original oer.4 in the case of reuse, the design stage in walz’s workflow includes review, redesign, redevelopment, and adoption. open university models for transforming oer include the integrity model, in which the new oer remains close to the original material; the essence model, in which material is transformed by reducing some features and adding new activities for interactivity; and the remix model, in which content is redesigned to be optimal for web viewing.5 student participation in oer production is often seen in open pedagogy, but these cases look at student frustrations and feedback with the objective of experiential learning, not for usability or evaluation.6 now that oer production has grown in scale, librarians and advocates seek the most effective and sustainable workflows.

figure 1. illustrations of two oer lifecycles. this work was adapted by camille thomas from an original by anita walz (2015) under a cc by 4.0 international license.7

in his workflow and framework analysis, meinke recommends the inclusion of more discrete steps and believes each institution’s ideal workflow will be based on local context.8 usability testing is a discrete workflow step that gives us human-centered insight about how users are affected by interfaces and how they value systems.9 libraries favor collections-based assessment that measures how many end users are using digital items, without prioritizing who users are or how and why they use resources.10 demonstrating and assessing value is essential for scholar buy-in and content recruitment, for example, which are central to all types of open resources.
in the case of educational materials, lack of engagement and breakdowns in learning can be attributed to barriers and marginalization of learners.11 additionally, critiques of oer include assumptions that access to information equates to meaningful access to knowledge, but without context there is no guarantee that there will be meaningful transference or learning.12 harley believes defining a target audience and considering reasons for use and nonuse of resources in specific educational contexts beyond measuring anecdotal demand (e.g., website page views or survey responses, which harley does not see as indicators of value but rather of popularity) may address challenges to effectively measuring outcomes for content that is freely available on the web.13 meaningful evaluation of learning resources requires deep understanding of contextualized educator and student needs, not just content knowledge.14 to address these barriers and assumptions, openstax, a leading oer publisher, has ux experts on staff, but this model is exceptional and rare at a university or library. many universities and libraries publishing oer do not have full-time personnel dedicated to review. some library user experience departments have hired content strategists for auditing, analyzing, defining, and designing website content, content-related projects, and overall content strategy.15 currently, oer work is rarely included in the scope of library user experience departments. however, limited literature does show the use of ux research methods in library publishing contexts.

libraries and support units with few resources can also perform user testing.16 user experience practitioners have established that a low number of test participants—three to five—is enough to identify a majority of usability issues.17 borsci et al. suggest the important aspect is not securing a high-volume sample, but rather finding behavior diversity to make reliable evaluations.18 the number of users required to find diverse behavior can depend on what is being tested. following this standard, the consistent inclusion of user evaluations in oer workflows will not necessarily require large amounts of funding, participants, resources, or time. oer, in particular, are well suited to cumulative, early, frequent, and specific user testing. with their open copyright licensing and availability, oer are an example of the mutable digital content needed for collaboration, cumulative production, and support of networked communities.19
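the three-to-five participant figure cited above can be made concrete with nielsen and landauer’s problem-discovery model, in which the share of usability problems found by n testers is 1 − (1 − λ)^n, where λ is the chance that a single tester uncovers a given problem. a minimal sketch in python, assuming their reported average of λ ≈ 0.31 rather than a rate measured locally:

```python
# problem-discovery model from nielsen and landauer (1993):
#   found(n) = 1 - (1 - lam) ** n
# lam is the probability that a single tester uncovers a given
# problem; 0.31 is the average reported in their study and is
# used here only as an illustrative assumption.

def proportion_found(n: int, lam: float = 0.31) -> float:
    """expected share of usability problems uncovered by n testers."""
    return 1 - (1 - lam) ** n

if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} testers: {proportion_found(n):.0%} of problems found")
    # with lam = 0.31, five testers surface roughly 84% of problems,
    # which is why small rounds of three to six participants are
    # considered sufficient for most formative studies.
```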
several studies assert that complete information behavior analysis should be carried out before or during development, not after.20 meinke concludes his workflow analysis by encouraging iterative release in oer production workflows, which aligns with lean and iterative “guerilla” approaches used in libraries to sustainably improve usability.21 iteration is a process of development that involves cyclical inquiry or repetitive testing, providing multiple opportunities for people to revisit ideas or deliverables in order to improve an evolving design until the end goal is reached.22 in the context of design, it is a method of creating high-quality deliverables at a faster rate.23 a cyclical approach also reflects walz’s as well as blakiston and mayden’s workflow visualizations.24

walz asserts that incentives for instructors and the quality of the resources are key factors in advancing adoption, adaptation, and creation of oer.25 harley uncovered disconnects between what faculty say they need in undergraduate education and what those who produce digital educational resources imagine is an ideal state.26 influence on faculty resource use, including oer, varied by discipline, teaching style and philosophy, and digital literacy levels, with personal preferences having the most influence on decision-making. in the evaluation or tracking stage found in most oer production workflows, we can see the impact of the quality assurance stage. the study by woodward et al. on student voice in evaluating textbooks found that incorporating multiple stakeholders into the process resulted in deeper exploration of students’ expectations when learning. students ranked one oer and one conventional textbook the highest based on content, design, pedagogy, and cost. multiauthored options ranked higher, and texts with examples were seen as more beneficial for distance learners.27 meinke believes that unless discrete parts of the development process are identified, it is not useful to signal to others to contribute to a project.28

an example of an oer production workflow containing usability considerations is the content, openness, reuse & repurpose, and evidence (corre) framework by the university of leicester (see fig. 2).29 the openness phase of the corre 2.0 framework includes “transformation for usability,” which is assigned to the oer evaluator, editorial board, or head of department.30 versions of the corre workflow were adapted by the university of bath and the university of derby in the united kingdom for their institutional contexts. for example, the university of derby assigned “transformation for usability” to a developer. by building usability in as a discrete step in oer production workflows, publishers and collaborators can make improvements on pain points, make changes in context, and create clear guidelines for partnerships based on local needs. betz and hall’s study supports considerations for how user microinteractions, or individual actions within a process, can be improved to make them scalable and commonplace in library workflows.31 this can include publishing workflows.
for example, a study of oer on mobile devices in brazil found problems related to performance, text legibility, and trigger actions in small objects.32 other guidelines for oer and usability include using high-quality multimedia source material, advanced support from educational technologists, and balancing high and low technology in order to avoid assumptions about learners’ internet connections or devices.33 although usability testing alone is an important part of evaluating a website or product, because the user experience is multifaceted, it is also important to ensure that the product is accessible, meets user needs, and has an appealing design.34

figure 2. corre framework for oer development at the university of leicester.35

accessibility studies also encourage integrating user interactions into the creation workflow. accessibility impacts usability, findability, and holistic user experience.36 creators and supporting advocates have relied on universal design, web standards, and ada compliance when creating accessible digital content, emphasizing that accessible content for those with disabilities means more accessible content for all users.37 areas addressed can include text style and formatting, linking, and alternative text considerations, as detailed in the bccampus open education accessibility toolkit.38 for example, kourbetis and boukouras drew from universal design to create a bilingual oer for students with hearing impairments in greece, incorporating contextual considerations for vernacular languages and other local user needs.39 early efforts toward accessible oer, such as a 2010 framework, prompted critiques from members of the accessibility community and impeded adoption.40 while guides based in universal design offer a starting place and consistent reference, oer advocates could create workflows that support the adaptive changes seen in inclusive design. universal design and web standards are fixed, while inclusive design seeks to support adaptive changes as needs evolve and does not treat non-normative users as a homogenous, segregated group.41 treviranus et al. go on to state that compliance is not achieved by providing a set of rules, guidelines, or regulations to be followed.42 beyond lack of awareness of accessibility best practices, librarians and creators tend to have little control over customizing proprietary digital content platforms to add local context.43 the flexible learning for open education (floe) project, for example, aims to integrate automatic and unconscious inclusive oer design through open source tools, but many institutions may not be able to develop such tools to incorporate local contexts.44

both librarians and e-resources vendors have been interested in the features and usability of e-books to fine-tune their collection development strategies as well as improve the user experience of their platforms. literature shows that most studies about e-books have focused on features or the interface design of e-book reading applications.
the recent academic student ebook experience survey showed that three-quarters of survey respondents considered it extremely important or important for e-books to have page numbers for citation purposes.45 this survey and other studies suggest that search, navigation, zoom, annotation, and mobile compatibility, as well as offline availability including downloading, printing, and saving, were the most expected features.46 other features, such as citation generation and emailing, were mentioned or tested in some research.47 while using e-books and using e-textbooks may involve the same functionality, the purpose, devices, and user types differ because knowledge transfer is needed in learning. jardina and chaparro evaluated eight e-textbook applications with four tasks: bookmarking, note-taking, note locating, and word searching.48 they found that the interfaces to these common features varied across the different applications. standardization, or at least following general web conventions when designing these interfaces, may reduce distractions that keep students from learning. the e-textbook user interface can be critical to the future success of e-textbook adoption.

although limited research on the usability of e-textbooks or open textbooks has been conducted, a considerable number of findings from studies on e-books are relevant and applicable to e-textbook projects. the usability evaluation methods and results from e-book or e-textbook applications can be borrowed when working to understand oer user needs. libraries can apply these e-book usability evaluations to the basic infrastructure of oer, but leverage the local contexts of students, instructors, and institutional culture when adapting the material. the more normalized usability, prototyping, and collaboration are in oer production workflows, the richer the resources and community investment. this approach can address diverse and evolving oer user needs, locally and sustainably, as they arise. our study contributes to the literature by examining the impacts of integrating usability testing in an inaugural oer adaptation project at a large research university in the united states.

case study

the project to adapt the open textbook college success, published by the university of minnesota libraries, for use in the raiderready first-year seminar course was brought to the texas tech university libraries’ scholarly publishing office by the head of the outreach department in march 2018. the program was deciding between a commercial textbook and the adapted open textbook. the course was offered during each fall semester and had an average enrollment of over 1,600 since 2016. an initial meeting took place, and regular weekly meetings were set up afterward to review edits and ensure communication within the original 30-day timeline. it was the first oer production project for the libraries’ publishing program, which had previously focused on open access journals and materials in the repository. originally, we sought to use open monograph press because we had a local instance already installed. however, a platform with more robust formatting capabilities was needed in order to reach the desired product within the timeline. we decided to use the pressbooks pro (a wordpress plugin) sandbox for one project through our open textbook network membership. a rough draft of edits to the original work was already completed.
we used a mediated service model, in which librarians performed the formatting, quality assurance, and publishing. this was in contrast to self-service models, in which creators work independently and consult with support specialists. the digital publishing librarian and scholarly communication librarian formatted the edits, with html/css customization and platform troubleshooting from the web librarian. other library staff involved in the project included communications & marketing (cover design), the user experience librarian, and the electronic resources librarian (cataloging in the catalog). campus stakeholders and partners included the libraries, the raiderready program, editors, copytech (printers affiliated with the university), the campus bookstore, and the worldwide elearning unit. program partners were enthusiastic about usability and accessibility testing for the textbook. the initial testing took place in the middle of the adaptation project timeline, once initial content was formatted and ready for testing. the bccampus accessibility toolkit and the pressbooks user guide were used as primary guides throughout the process. the scholarly publishing librarian and the user experience librarian met to develop the testing method and identify users who would reflect the audience using the textbook. a second round of tests was conducted a year after the initial project, when the editors made updates to the text. while the resulting changes were minor, this further testing allowed us to seek more feedback on the most recent version of the textbook and apply some lessons learned from the first round of testing.

we did not use personas or identify user needs beforehand. we planned to recruit first-year students and students who took the raiderready course in a previous semester. however, we decided instead to recruit from existing pools of student volunteers for library usability tests in order to get three to six students in a short amount of time. for the second round of testing, we planned to recruit on-campus students, distance students, and students with diverse abilities. we recruited from newly established pools of volunteers for distance students as well as existing volunteer pools. during the first iteration, we requested that worldwide elearning, texas tech university’s distance learning unit on campus, test the textbook pilot content in pdf and epub formats using screen reader software.

the user experience librarian conducted a first round of four usability tests in march 2018 and a second round of two usability tests in april 2019. a sample test script from steve krug provided a solid foundation for conducting our own tests.49 in each test, participants were asked to answer two pretest questions, complete four tasks, and answer four posttest questions (see appendix for script). tasks included finding the textbook, exploring the textbook itself, locating activities for a specific chapter, and searching for the student code of conduct. participants were instructed to think aloud as they worked through the tasks. the think-aloud protocol is a commonly used test method, where participants are asked to verbalize their thoughts and actions as they perform tasks on a website.
the observation tasks are set beforehand, and the facilitator follows a script to ensure consistency among testers.50 the combination of observing and recording participants’ comments and reactions provides insight into a site’s usability. testers were invited to comment on their experience at the end of the session. each usability test was recorded using morae software to track on-screen activity such as mouse movements and clicks, typing, and the verbal comments of the facilitator and participant. we conducted tests using a laptop running windows 10 with a 15.6-inch display. in the first round of testing, we also showed students the book on an ipad mini, both in adobe reader and ibooks. while we asked them to briefly view the textbooks, we did not ask them to complete specific tasks while using the tablet.

limitations

the biggest limitation was that we did not test with users using a screen reader or other assistive technology. the user experience librarian built a pool of on-campus students who volunteered to participate in user research in 2018, and a pool of distance student users was established in 2019. however, a pool of other types of non-normative learners had not yet been established for either round of testing. another limitation of the study was that we primarily tested on campus servers, so we did not have data on rural or distance learner experiences with the textbook until the second round of testing. in addition, we used only a few devices: a windows 10 laptop for formal testing and an ipad that students briefly viewed afterward. we also did not have an educational technologist as a partner throughout the process.

results

once testing was complete, the scholarly communications librarian and the user experience librarian analyzed the notes and identified areas of common concern and confusion among participants. all participants were familiar with online textbooks from other courses. participants cited cost as a major consideration when deciding between purchasing print or electronic texts. more than one participant said that electronic textbooks can be cheaper but can be more frustrating to use. participants had more experience viewing textbooks on laptops. the ability to download texts for reading on a phone was not always available due to publisher restrictions.

content and navigation

participants liked pictures and visuals that break up the blocks of text. however, one participant expressed a dislike for too many slideshows or other media. another in the first round of testing liked that there were not “too many” links that brought you out of the textbook, stating it was “annoying to split screen in order to see text plus activity/homework assignment.” in the second round of testing, one participant felt the lack of interactive content was best for first-year students compared to videos and activities in textbooks for advanced courses. that participant also thought the simpler language of the text was more welcoming to first-year students. a participant said an ipad would be better than a laptop for viewing this book, because scrolling was easier. several users did not find features such as bookmarked sections in the in-browser pdf viewers or adobe reader. participants who did not see or use the table of contents (toc) continually relied on scrolling through pages to locate content.
only one participant, unprompted, used the ctrl+f shortcut to keyword search the text. a few other participants viewed the toc, then entered the desired page number in the top toolbar navigation field. most of them expected the code of conduct information in one of the tasks to be in the front matter. the emphasis on content reflects blakiston and mayden’s experience that without a content strategy, it becomes difficult to search and to demonstrate credibility, and it is a challenge to create a coherent, user-centered information architecture.51 all participants navigated to the toc several times to complete tasks, making it a relevant feature. in the second round of testing, one participant preferred the statements and questions at the beginning of the first chapter to the learning objectives typically listed in textbooks.

discovery and access

participants took varied approaches to finding textbooks. one would get links from the professor via email or the syllabus. others would use the campus bookstore for purchases. one would use the student information system (raiderlink) to locate information about the textbook. potential access points to make the raiderready textbook discoverable included the institutional repository, the open textbook library, and the local library catalog. the open textbook library was ruled out mostly because the campus-specific adaptations did not make the book substantially more useful to the public than the original college success. thinktech, the institutional repository, was the most viable option and allowed for permanent linking, which worked well with the access points student users mentioned. in the second round of testing, one participant searched for the textbook via the library catalog/discovery system, google search, and the raiderready department website. the course description on the department website listed an open textbook, but the user pointed out that it was not actually linked there.

discussion

user testing changed our actions during the project. interactions with students did not occur during any other stage of the adaptation process before the resource was adopted in the course. many insights from the testing were indicative of self-reported preferences, such as requesting more visuals, preferring print for reading and exercises, and auditory screen reading. we also learned ways that cost impacted how students used textbooks. for example, when we followed up on a participant’s comment and asked if they liked to highlight books, the student responded that they try not to mark their books because they want to resell them. testing also helped us observe actual behaviors among similar users in a way oer toolkits and guidelines alone did not. we learned more about how oer fits into the culture of learning and resources at texas tech university and how that may differ from other institutions. for a visual representation of our workflow, we adapted billy meinke’s oer production workflow (targeted to creators) because it was an openly available, editable workflow with comprehensive discrete steps. similar to the corre framework adaptations, meinke’s workflow was adapted by others, including the southern alberta institute of technology (sait), lansing community college, and the university of houston, to fit their institutional contexts.52 our process did not include an external peer review process; instead, review was done by the editors.
priming and preproduction phases in the workflow were relatively quick, occurring in the first two weeks. the bulk of time—about four weeks—was spent in the development phase. the quality assurance and publishing phases occurred over about two weeks, with most of the time spent on finalizing edits and formatting. the first round of user testing took about two hours total, and redux (revisiting the prototype and implementing changes), along with the format finalization, took about two weeks. finalization for formatting and redux changes in the first iteration of the text involved pressbooks troubleshooting. the original timeline for the project was 30 days, but the actual time for the project was 60 days. the second round of user testing took about two hours total and occurred at the halfway point within a new 60-day deadline for an updated version of the textbook. we acknowledge that even though the actual time spent with users in the first round was limited to two hours, the process also required time for drafting recruitment emails, communicating with volunteers, scheduling testing, and debriefing after sessions.

figure 3 shows our workflow diagram, including a new quality assurance phase (see fig. 4 for detail) based on our case study. it includes prototyping (content and format draft), user testing, and implementing user feedback on the oer prototype.

figure 3. discrete production workflow including quality assurance phase. this workflow is an adaptation of a workflow by billy meinke and the university of hawai’i at manoa under a cc by license.

figure 4. quality assurance phase with user testing.

we addressed several suggestions participants made during testing to improve the textbook’s navigation functionality. we were able to address requests for a linked toc in the second round of testing. in the first round of testing, formatting was tailored to the print pdf format because the editors wanted a print version to be available. we were able to create a linked toc in the digital pdf format, but not the printable pdf format. based on the available documentation, we were not aware in the first round of testing that the toc could be changed, but we were able to successfully troubleshoot this issue in the second round of tests. we were not able to do any customization on the search feature, which was built in. for customization, pressbooks allows styling through css files for three formats (pdf, web page, epub) separately. we customized them for look and feel. many of the requests were constrained by our working knowledge of pressbooks and its technical limits, so we added a “tips for pdf readers and printing” section in the front matter of the textbook during the first round. it is important to note that although these were not major changes to the interface, they gave us insight for iterative changes. upon reflection, it would have been preferable to involve someone with pressbooks developer experience at the outset.
because we had not led a similar project before, nor worked with the software previously, we were more limited in the changes we could make as a result of testing than we expected. however, after this experience, we know what areas to test and are better prepared to effect actual changes. we made chapters available separately in the institutional repository to cut down on excessive scrolling, because scrolling through an entire textbook slowed students’ ability to study and quickly reference the text. also, the editors requested digital as well as print access to the textbook through the campus bookstore. the raiderready textbook was also added to the library’s local records in the catalog. we did not make a web version available through pressbooks. a web version was not a priority because of the institution-specific customization and because the editors did not request one. usage statistics from the repository between march 2018 and february 2019 peaked during midterms and at the end of semesters in which the class was taught. chapters 5, 6, 7, and 8 had the most downloads—the last chapters of the book, likely the chapters students were tested on for the final—with the majority of downloads (1,368) taking place during october 2018. this indicates that the option to download individual chapters appealed to students.

accessibility

testing the textbook with screen readers confirmed the need for an epub format of the text. hyde gives the following guidance for educators using pressbooks: “pdf accessibility is known to be somewhat limited, and in particular some files are not able to be interpreted by screen readers. the pdf conversion tool that pressbooks uses does support pdf tagging, which improves screen reader compatibility, but often the best solution is to offer web and e-book versions of a textbook as well as the pdf, so readers can find the format that best suits their needs.”53 for pdfs, issues included missing alt tags, unset headings, and tables and images lacking tags. adding alt tags was planned early, after they were lost when the wxr (wordpress extended rss) file—a wordpress export file in xml format—was uploaded to pressbooks; the loss of the alt tags was confirmed during testing at the midpoint of the process (an automated check like the sketch below could catch such a regression earlier). however, due to deadlines and pressbooks functionality, we were not able to address more of the tagging issues. epubs worked much better in tests with screen readers, apple devices, and e-readers. editors preferred that a pdf be used as the primary version and wanted an epub for screen readers upon request. our partners’ preference was likely based on the common use of pdfs, but it did not comply with the principles of universal or inclusive design. regarding e-book accessibility, pressbooks documentation says, “ebook accessibility is somewhat dictated by the file format standards, which focus on font sizes and screen readers, and improvements are also being made with dynamic content. the international digital publishing forum has a checklist to prompt ebook creators on accessibility functions they can incorporate while creating their content.”54 we made a decision to include multiple formats to take multiple types of use into consideration. in the first round of changes, we included an epub alongside the pdf in the repository, so users with disabilities would not have to self-identify by making a request in order to gain access.
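because the alt tags disappeared silently during the wxr upload described above, an automated audit can flag such regressions before user testing begins. below is a minimal sketch in python, assuming the book can be exported as a folder of xhtml chapter files (the directory name is hypothetical); it relies on the widely used beautifulsoup html parser to report img elements whose alt attribute is missing or empty:

```python
# minimal alt-text audit for an exported textbook, assuming the
# chapters exist as xhtml files on disk; the directory name below
# is hypothetical, not part of the project described in this study.
from pathlib import Path

from bs4 import BeautifulSoup  # pip install beautifulsoup4

def audit_alt_text(export_dir: str) -> list[tuple[str, str]]:
    """return (filename, image source) pairs for images lacking alt text."""
    problems = []
    for page in sorted(Path(export_dir).glob("*.xhtml")):
        soup = BeautifulSoup(page.read_text(encoding="utf-8"), "html.parser")
        for img in soup.find_all("img"):
            alt = img.get("alt")
            if alt is None or not alt.strip():
                problems.append((page.name, img.get("src", "?")))
    return problems

if __name__ == "__main__":
    for filename, src in audit_alt_text("college-success-export"):
        print(f"{filename}: image {src} has no alt text")
```

running a check of this kind after every export would confirm a tag loss immediately rather than at the testing midpoint.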
upon learning more about inclusive design after the pilot, we realized we had been treating users as a homogeneous group and segregating the more accessible version. in the second round, when we realized the epub was not available in separate chapters as the pdf version was, we made it available by chapter as well. we recommend that evaluating oer according to the international digital publishing forum checklist be incorporated into the qa part of the workflow.

conclusion

there is room for future research on iterative testing for oer, testing with more emphasis on mobile devices, testing with deeper investigation into microinteractions concerning accessibility, and testing in workflows that use other publishing platforms. as the creators of the floe project suggest, many more customizations can be made to points of user interaction if the software platform for adaptation is open source. future research may also examine regional and cultural influences on learning and interface preferences. one change that may support future adaptation projects at texas tech university would be modifying internal guidelines to take into consideration previous testing and local context. we also recommend keeping detailed documentation, particularly of steps for changes that are not included in existing guides on oer production. creating a memorandum of understanding with partners that clearly outlined responsibilities could have prevented some of the misunderstandings that occurred. for example, when stakeholders discussed producing print copies of the textbook, it wasn’t clear what the library’s role was. with a short timeline and more work involved than expected, the library was in a position of overpromising and underdelivering. it was apparent that the workflows themselves needed to be open and adaptable to support resources, communities, and processes in local contexts. it was important throughout the process to be aware of our partners’ priorities (e.g., instructional preferences, cost to students, departmental buy-in), because we had to balance these priorities with user feedback.

we recommend having specific roles for content strategists, educational technologists, and developers in workflows during oer production. the work of creating workflows, assigning roles, and creating standards for oer content currently falls on librarians, instructional designers, and creators. as librarians seek the most sustainable workflows, it will be beneficial to emphasize investing in the quality assurance stages of oer production and evenly distributing responsibilities. this can be done through collaborative partnerships or by hiring additional positions. if other institutions were to scale the practices from our case study, ideally, librarians would take responsibility for adding roles or formalized work to the scope of either ux or oer departments so that it becomes normalized in oer workflows. we recommend working with editors to advocate for one textbook format that addresses a variety of learning needs. we plan to use these experiences, along with existing resources, to include inclusive and user-friendly recommendations in policies and guidelines for oer adaptation. conducting user testing did challenge assumptions held by librarians and editing instructors about student use of oer.
while we referred to toolkits, guidelines, and best practices, internal testing allowed us to make improvements to several specific microinteractions students encountered while using the text. it was very feasible to incorporate testing into the workflow. we were able to directly observe user information behavior from members of the community that the resource was intended to serve.

appendix: usability test method

pretest questions

1. what is your academic classification? (undergraduate, graduate, faculty)
2. have you ever used an e-textbook or a digital textbook in one of your classes? (if yes, ask for course details.)

tasks to observe

1. imagine you needed to get a copy of the digital textbook raider ready: unmasking the possibilities of college success. how would you go about finding it? it will help us if you think out loud as you go along—tell us what you’re looking at, what you’re trying to do, what you’re thinking.
2. [if the tester is unable to locate the digital textbook, the moderator will open it.] please take a couple of minutes to look at this textbook. explore and click on a link or two.
3. for the next task, imagine an instructor asked you to locate the chapter activities for chapter 1. could you show us how you would locate those?
4. for the final task, could you find the student code of conduct?

posttest questions

1. what were your impressions of this resource?
2. what did you like? dislike? what would you change?
3. how easy or difficult was it to find what you wanted? please explain.
4. is there anything else about your experience using this textbook today that you’d like to tell us?

endnotes

1 sonya betz and robyn hall, “self-archiving with ease in an institutional repository: microinteractions and the user experience,” information technology and libraries 34, no. 3 (september 21, 2015): 44–45, https://doi.org/10.6017/ital.v34i3.5900.

2 anita r. walz, “open and editable: exploring library engagement in open educational resource adoption, adaptation and authoring,” virginia libraries 61 (january 2015): 23, http://hdl.handle.net/10919/52377.

3 stephen whitfield and zoe robinson, “open educational resources: the challenges of ‘usability’ and copyright clearance,” planet 25, no. 1 (2012): 52, https://doi.org/10.11120/plan.2012.00250051.

4 walz, “open and editable,” 24.

5 andy lane, “from pillar to post: exploring the issues involved in repurposing distance learning materials for use as open educational resources” (working paper, uk open university, december 2006), accessed august 1, 2018, http://kn.open.ac.uk/public/document.cfm?docid=9724.

6 andy arana et al., eds., “open logic project,” university of calgary faculty of arts and the campus alberta oer initiative, accessed april 26, 2019, http://openlogicproject.org/; robin derosa, the open anthology of earlier american literature (public commons publishing, 2015), https://openamlit.pressbooks.com/; timothy robbins, “case study: expanding the open anthology of earlier american literature,” in a guide to making open textbooks with students, ed.
elizabeth mays (the rebus community for open textbook creation, 2017), https://press.rebus.community/makingopentextbookswithstudents/chapter/case-study-expanding-open-anthology-of-earlier-american-literature/.

7 walz, “open and editable,” 24.

8 billy meinke, “discovering oer production workflows,” uh oer (blog), university of hawai’i, december 23, 2016, https://oer.hawaii.edu/discovering-oer-production-workflows/.

9 betz and hall, “self-archiving with ease,” 44.

10 beth st. jean et al., “unheard voices: institutional repository end-users,” college & research libraries 72, no. 1 (january 2011): 23, https://doi.org/10.5860/crl-71r1.

11 jutta treviranus et al., “an introduction to the floe project,” in international conference on universal access in human-computer interaction, universal access to information and knowledge, ed. constantine stephanidis and margherita antona, uahci 2014 (june 2014), lecture notes in computer science 8514: 454, https://doi.org/10.1007/978-3-319-07440-5_42.

12 sarah crissinger, “a critical take on oer practices: interrogating commercialization, colonialism, and content,” in the library with the lead pipe, october 21, 2015, http://www.inthelibrarywiththeleadpipe.org/2015/a-critical-take-on-oer-practices-interrogating-commercialization-colonialism-and-content/; diane harley, “why understanding the use and users of open education matters,” in opening up education: the collective advancement of education through open technology, open
content, and open knowledge, ed. toru iiyoshi and m.s. vijay kumar (cambridge, ma: the mit press, 2008), 197–212.

13 harley, “why understanding,” 208.

14 tom carey and gerard l. hanley, “extending the impact of open educational resources through alignment with pedagogical content knowledge and institutional strategy: lessons learned from the merlot community experience,” in opening up education: the collective advancement of education through open technology, open content, and open knowledge, ed. toru iiyoshi and m.s. vijay kumar (cambridge, ma: the mit press, 2008), 238.

15 rebecca blakiston and shoshana mayden, “how we hired a content strategist (and why you should too),” journal of web librarianship 9, no. 4 (2015): 202–6, https://doi.org/10.1080/19322909.2015.1105730; “our team,” openstax, rice university, accessed december 9, 2019, https://openstax.org/team.

16 maria nuccilli, elliot polak, and alex binno, “start with an hour a week: enhancing usability at wayne state university libraries,” weave: journal of library user experience 1, no. 8 (2018), https://doi.org/10.3998/weave.12535642.0001.803.

17 jakob nielsen and thomas k. landauer, “a mathematical model of the finding of usability problems,” in proceedings of the interact ’93 and chi ’93 conference on human factors in computing systems (may 1993): 211–12, https://doi.org/10.1145/169059.169166.

18 simone borsci et al., “reviewing and extending the five-user assumption: a grounded procedure for interaction evaluation,” acm transactions on computer-human interaction 20, no. 5, article 29 (november 2013): 18–19, http://delivery.acm.org/10.1145/2510000/2506210/a29-borsci.pdf.

19 treviranus et al., “floe project,” 454.

20 laura icela gonzález-pérez, maría-soledad ramírez-montoya, and francisco j. garcía-peñalvo, “user experience in institutional repositories: a systematic literature review,” international journal of human capital and information technology professionals 9, no. 1 (january–march 2018): 79, 84, https://doi.org/10.4018/ijhcitp.2018010105; betz and hall, “self-archiving with ease,” 45; st. jean et al., “unheard voices,” 23, 36–37, 40.

21 meinke, “discovering oer production workflows”; nuccilli, polak, and binno, “start with an hour.”

22 steven d. eppinger, murthy v. nukala, and daniel e. whitney, “generalised models of design iteration using signal flow graphs,” research in engineering design 9, no. 2 (1997): 112; helen timperley et al., teacher professional learning and development (wellington, new zealand: ministry of education, 2007), http://www.oecd.org/education/school/48727127.pdf.

23 eppinger, nukala, and whitney, “design iteration,” 112–13.

24 walz, “open and editable,” 23; blakiston and mayden, “how we hired a content strategist,” 203.

25 walz, “open and editable,” 28.

26 harley, “why understanding,” 201–6.

27 scott woodward, adam lloyd, and royce kimmons, “student voice in textbook evaluation: comparing open and restricted textbooks,” international review of research in open and distributed learning 18, no.
6 (september 2017): 150–63, https://doi.org/10.19173/irrodl.v18i6.3170.

28 meinke, “discovering oer production workflows.”

29 samuel k. nikoi et al., “corre: a framework for evaluating and transforming teaching materials into open educational resources,” open learning: the journal of open, distance and e-learning 26, no. 3 (2011): 194–99, https://doi.org/10.1080/02680513.2011.611681.

30 “corre 2.0,” institute of learning innovation, university of leicester, accessed april 25, 2019, https://www2.le.ac.uk/departments/beyond-distance-research-alliance/projects/ostrich/corre-2.0.

31 betz and hall, “self-archiving with ease,” 45–46.

32 andré constantino da silva et al., “portability and usability of open educational resources on mobile devices: a study in the context of brazilian educational portals and android-based devices” (paper, international conference on mobile learning 2014, madrid, spain, february 28–march 2, 2014), 198, https://eric.ed.gov/?id=ed557248.

33 sarah morehouse, “oer bootcamp 3-3: oers and usability,” youtube video, 3:16, march 2, 2018, https://www.youtube.com/watch?v=cncxbcs-2gm.

34 krista godfrey, “creating a culture of usability,” weave: journal of library user experience 1, no. 3 (2015), https://doi.org/10.3998/weave.12535642.0001.301; peter morville, “user experience design,” semantic studios, june 21, 2004, http://semanticstudios.com/user_experience_design/.

35 meinke, “discovering oer production workflows.”

36 cynthia ng, “a practical guide to improving web accessibility,” weave: journal of library user experience 1, no. 7 (2017), https://doi.org/10.3998/weave.12535642.0001.701; whitney quesenbery, “usable accessibility: making web sites work well for people with disabilities,” ux matters, february 23, 2009, http://www.uxmatters.com/mt/archives/2009/02/usable-accessibility-making-web-sites-work-well-for-people-with-disabilities.php.

37 ng, “improving web accessibility.”

38 amanda coolidge et al., accessibility toolkit, 2nd edition (victoria, b.c.: bccampus, 2018), 1–71, https://opentextbc.ca/accessibilitytoolkit/.

39 vassilis kourbetis and konstantinos boukouras, “accessible open educational resources for students with disabilities in greece,” in universal access in human-computer interaction, universal access to information and knowledge, ed. constantine stephanidis and margherita antona, uahci 2014 (june 2014), lecture notes in computer science 8514: 349–57, https://doi.org/10.1007/978-3-319-07440-5_32.

40 treviranus et al., “floe project,” 455–56.

41 treviranus et al., 456–57.

42 treviranus et al., 456–57.
43 ng, “improving web accessibility”; treviranus et al., “floe project,” 460–61.

44 treviranus et al., “floe project,” 461.

45 2018 academic student ebook experience survey report (library journal research, 2018): 6, accessed may 3, 2019, https://mediasource.formstack.com/forms/2018_academic_student_ebook_experience_survey_report.

46 michael gorrell, “the ebook user experience in an integrated research platform,” against the grain 23, no. 5 (december 2014): 38; robert slater, “why aren’t e-books gaining more ground in academic libraries? e-book use and perceptions: a review of published literature and research,” journal of web librarianship 4, no. 4 (2010): 305–31; joelle thomas and galadriel chilton, “library e-book platforms are broken: let’s fix them,” academic e-books: publishers, librarians, and users (2016): 249–62; christina mune and ann agee, “ebook showdown: evaluating academic ebook platforms from a user perspective,” in creating sustainable community: the proceedings of the acrl 2015 conference (2015): 25–28; laura muir and graeme hawes, “the case for e-book literacy: undergraduate students’ experience with e-books for course work,” the journal of academic librarianship 39, no. 3 (2013): 260–74; esta tovstiadi, natalia tingle, and gabrielle wiersma, “academic e-book usability from the student’s perspective,” evidence based library and information practice 13, no. 4 (2018): 70–87.

47 erin dorris cassidy, michelle martinez, and lisa shen, “not in love, or not in the know? graduate student and faculty use (and non-use) of e-books,” the journal of academic librarianship 38, no. 6 (2012): 326–32; gorrell, “the ebook user experience,” 36–40.

48 jo r. jardina and barbara s. chaparro, “investigating the usability of e-textbooks using the technique for human error assessment,” journal of usability studies 10, no. 4 (2015): 140–59.

49 steve krug, rocket surgery made easy (berkeley, ca: new riders, 2010), 146–53.

50 danielle a. becker and lauren yannotta, “modeling a library website redesign process: developing a user-centered website through usability testing,” information technology and libraries 32, no. 1 (march 2013): 9–10.

51 blakiston and mayden, “how we hired a content strategist,” 194.

52 jessica norman, sait oer workflow, may 2019, accessed july 14, 2020, https://docs.google.com/drawings/d/1xvjpu9s4bb32k3gblnvw4uy1ely9rtxnr8bkfdm5-yk/; regina gong, oer production workflow, accessed july 14, 2020, http://libguides.lcc.edu/oer/adopt; ariana e. santiago, oer adoption workflow visual overview, april 2019, accessed july 14, 2020, https://docs.google.com/drawings/d/1czqhpgpqyrr46vm5iytoemyqj-s1zr0p-m-lj16rtto/; meinke, “discovering oer production workflows.”

53 zoe wake hyde, “accessibility and universal design,” in pressbooks for edu guide (pressbooks.com, 2016), https://www.publiconsulting.com/wordpress/eduguide/.

54 hyde.
article

digitization of libraries, archives, and museums in russia

heesop kim and nadezhda maltceva

information technology and libraries | december 2022
https://doi.org/10.6017/ital.v41i4.13783

heesop kim (heesop@knu.ac.kr) is professor, kyungpook national university. nadezhda maltceva (nadyamaltceva7@gmail.com) is graduate student, kyungpook national university. © 2022.

abstract

this paper discusses the digitization of cultural heritage in russian libraries, archives, and museums. to achieve the research goals, both quantitative and qualitative research methodologies were adopted: the current status of legislative principles related to digitization was analyzed through a literature review, and the circumstances of the latest digitization projects were analyzed through literature and website reviews. the results showed that these institutions seem quite successful in providing a wide range of services for users to access the digital collections. however, the main constraints on digitization within libraries, archives, and museums in russia are connected with the scale of the work, the dispersal of rare books throughout the country, and low levels of document usage.

introduction

culture is one of the most important aspects of human activity. libraries, archives, and museums (lams) in the russian federation store some of the richest cultural and historical heritage collections, some of which can be classified as world cultural treasures. as is true in other countries, lams in russia are engaging with the problems of digitizing their unique cultural treasures. in this regard, these repositories are implementing digital technologies to improve their work on the digitization, preservation, indexing, search, and access of cultural heritage more effectively and efficiently. information technologies can be used to preserve national knowledge and experience.1 the digitization of cultural heritage is one of the changes that has occurred at the present stage of the global information society. researchers have made many attempts to define the concept of digital culture, which is considered to be a phenomenon that manifests itself through art, creativity, and self-realization by implementing information technologies.2 the need for digitization of unique cultural heritage has caused the rapid development of digital libraries, archives, and museums, described collectively as digital lams, the multidisciplinary institutions that change the way people retrieve and access information. researchers and specialists involved in the digitization of information resources in lams work together to preserve the cultural heritage of the russian federation using modern information technologies. as pronina noted, the digitization of cultural heritage began to develop actively in many countries, including russia, around the same time.3 many researchers have analyzed digitization issues in russia.
for example, lopatina and neretin discussed the modernization of the system of cultural information resources and the history of preserving digital cultural heritage in russia.4 astakhova pointed out the problem of the digitization of cultural heritage and the transformation of art objects into 3d models.5 miroshnichenko et al. discussed the problem of organizing digital documents in the state archives and pointed out the issues of providing digitized archival documents for wide access through open electronic resources.6

despite a long history of improvements in digitization policies and programs, issues still exist in the major cultural repositories, and russia's level and scope of digitization research still lag behind many european countries.7 therefore, three primary research questions guide this study:

1. what is the policy to regulate the digitization of cultural heritage in russia?
2. what is the status of the digitization of cultural heritage in russia?
3. what are the constraints related to digitization in russia?

in addition, there is not enough research that fully reflects the current digitization practices in lams in russia. by analyzing this matter, the authors hope to present the state of cultural heritage digitization in russia and uncover problems and limitations in this field.

benefits of digitization in a cultural heritage repository

before answering the key research questions, it is worth exploring the ultimate benefits of digitization in cultural heritage repositories. digitization refers to converting an analogue source into a digital version.8 the collections of cultural heritage repositories comprise not only born-digital materials but also many resources that were not originally created in digital form and have since been digitized.

digitization involves three major stages.9 the first stage is related to preparing objects for digitization and the actual process of digitizing them. the second stage is concerned with the processing required to make the materials easily accessible to users; this involves a number of editorial and processing activities including cataloguing, indexing, compression, and storage, as well as applying appropriate standards for text and multimedia file formats to meet the needs of online digital lams. the third stage includes the preservation and maintenance of the digitized collections and the services built upon them.10

the benefits of digitization are improved access and preservation. items, once digitized, can be used by many people from different places simultaneously at any point in time. unlike printed or analogue collections, digitized collections are not damaged by heavy and frequent usage, which helps in the preservation of information. according to ifla's guidelines, several benefits come from having digitized materials. organizations digitize

1. to increase access, where there is high demand from users and the library or archive desires to improve access to a specific collection;
2. to improve services to an expanding user group by providing enhanced access to the institution's resources with respect to education and lifelong learning;
3. to reduce the handling and use of fragile or heavily used original material and create a backup copy for endangered material such as brittle books or documents;
4. to give the institution opportunities for the development of its technical infrastructure and staff skill capacity;
5. to develop collaborative resources, sharing partnerships with other institutions to create virtual collections and increase worldwide access;
6. to seek partnerships with other institutions to capitalize on the economic advantages of a shared approach; and
7. to take advantage of financial opportunities, for example the likelihood of securing funding to implement a program, or of a particular project being able to generate significant income.11

while digitization has benefits, there are also some problems. the most obvious one is related to the quality of the digitized objects: in the course of digitizing, we may lose some important aspects of the original document. another problem is related to access management; proper mechanisms need to be put in place to determine the authenticity of materials, as well as to control unauthorized access and use.

the success of digitization projects depends not only on technology but also on project planning. since digitization is a relatively new process, institutions may concentrate on technology before deciding on a project's purpose. however, technology should never drive digitization projects; instead, user needs should be determined first, and only then should a technology appropriate to those needs be selected to meet a project's objectives. the best practices for planning a digitization project can be summarized as follows: determine the copyright status of the materials; identify the intended audience of the materials; determine whether it is technically feasible to capture the information; insist on the highest quality of technical work that the institution can afford; factor in costs and capabilities for long-term maintenance of the digitized images; cultivate a high level of staff involvement; write a project plan, budget, timeline, and other planning documents; budget time for staff training; and plan a workflow based upon the results of scanning and cataloging a representative sample of material.12

policies regulating digitization of cultural heritage in russia

digitization policy should be developed early, to guide both the selection of materials and the management of the resulting digital objects. this policy should formulate the goals of the digitization project, identify materials, set selection criteria, define the means of access to digitized collections, set standards for image and metadata capture and for preservation of the original materials, and state the institutional commitment to the long-term preservation of digital content.13

as stated by russian law, the cultural heritage of the peoples of the russian federation includes material and spiritual values created in the past, as well as monuments and historical and cultural territories and objects significant for preserving and developing the identity of the russian federation and all its peoples and their contribution to world civilization.14 the decree of the president of the russian federation "on approval of the fundamentals of state cultural policy" extended the term cultural heritage by including documents, books, photos, art objects, and other cultural treasures that represent the knowledge and ideas of people throughout the centuries.
the government has emphasized the role of the information environment and modern technologies by addressing them at the legislative level. in the presidential decree "on approval of the fundamentals of state cultural policy," the concept of the information environment is separately distinguished, defined as the set of mass media, radio and television broadcasting, and the internet; the textual and visual information materials disseminated through them; and the creation of digital archives, libraries, and digitized museum collections.15

another important part of government policy is to provide open access to cultural heritage objects. the problem of access was confirmed in the state program culture of russia (2012–2018), which stipulated the need to provide access to cultural heritage in digital forms as well as to create and support resources that provide access to cultural heritage objects on the internet and in the national electronic library, one of the main digital repositories in the country.16 access to digital cultural heritage was also considered in the state program information society (2011–2020). the subprogram information environment ensured equal access to the media environment, including objects of digital cultural heritage. the program aimed to reduce the gap in access to cultural heritage objects across the different regions of the russian federation.17

the digitization of cultural heritage and the creation of digital archives are among the hallmarks of innovative change in the cultural sphere of the information society. the law "on archival affairs" notes that a significant part of the information resources of the archives has historical and cultural value and should be considered part of the digital cultural heritage collection, the digitization of which is required.18

with regard to libraries, on january 22, 2020, the state duma of the russian federation adopted the draft law "on amendments to the federal law on librarianship," improving the procedure for state registration of rare books. rare books are defined as handwritten books or printed publications that have outstanding spiritual or material value; have special historical, scientific, or cultural significance; and for which a special regime of accounting, storage, and use has been established. the draft law aims to ensure the legal protection of rare books by improving the system of protection for items of the national library; it reflects the criteria for classifying valuable documents as rare books and fixes the main stages of their registration.

in the case of museums, a federal law from 1996 aimed to establish the national catalog of the russian federation museum collections. at first this national catalog was created for inventory purposes, and then it was transformed into an online database to ensure open access to russia's cultural heritage (http://kremlin.ru/events/administration/21027). annual reports "on the state of culture in the russian federation" reflect the overall situation and changes in libraries, archives, and museums. some researchers have emphasized the need to develop a unified regulatory framework for cultural heritage preservation practices.
particularly, shapovalova stressed that the leader in this discussion should be the government, which plays a crucial role in the legal regulation of cultural heritage policy and is responsible for the development of initiatives.19 however, lialkova and naumov criticized russian policy for discussing the digitization of only a few cultural objects while neither defining the legal status of such objects nor covering objects originally created in digital form.20 kozlova considered the issues of russian digital culture within the framework of the obligatory library copies system.21 since 1994, the national library of russia has accepted electronic media according to the federal law "on the obligatory copy of documents," which established the legal deposit system; the bibliographic records of deposited electronic media are available online in the electronic catalog "russian electronic editions." acquisitions librarians use this catalog as a national bibliographic resource for adding electronic editions to their collections. dzhigo addressed issues of digital preservation of cultural heritage and also paid attention to the federal legal deposit law.22 yumasheva dealt with the content of the russian normative methods regulating the process of digital copying of historical and cultural heritage in russian libraries and museums.23 kruglikova considered theoretical and practical issues of legislation for the preservation and popularization of cultural heritage in the modern world.24 shapovalova suggested introducing the term digital cultural heritage objects at the legislative level, to codify the concept of preserving cultural heritage and to provide virtual access to such objects on a bigger scale.25

a review of the literature reveals various studies that discuss cultural heritage preservation using modern technologies, and the majority of researchers have identified issues in this field. digitization practices are carried out mainly by the state libraries, archives, and museums, which seek to preserve cultural heritage objects in a methodologically and legislatively sound way; less development is seen in smaller local lams. researchers stress the value of preserving cultural materials and the need to analyze and improve legislative procedures. to this day the government recognizes the importance of digital preservation; however, the term "digital cultural heritage" is not mentioned in legislation and the legal status of digitized objects is not defined. in addition, legislative documents do not cover the regulation of objects originally created in digital format. moreover, we can see a large gap between the accumulation of materials and the degree of their use, despite the fact that the government seems to support open access to digital cultural heritage objects.

digitization projects of cultural heritage in russia

to analyze the circumstances of the latest projects related to digitization, we investigated the relevant websites from may 2021 to june 2022. in this study, we chose a few representative institutions, including some national projects, based on their reputation, authority, and the scope of their collections, and collected data on their digitization practices and current projects. the list of institutions is shown in table 1.
as shown in table 1, the authors selected the russian national library, national electronic library, russian state library, and presidential library as the largest and most well-known libraries in russia. among the archives chosen for the analysis, the archival fonds was selected because it unites the archives in russia in one system, and the national digital archive was selected because its main goal is to preserve and archive key russian digital resources. as for the museums, the state hermitage museum, the state russian museum, and the state museum of fine arts named after a. s. pushkin were chosen because they hold the richest collections of russian cultural heritage and play a vital role in replenishing the national catalogue of the russian federation museum collections, whose main goal is to unite museums across the country.

by analyzing the websites of these selected libraries, archives, and museums, we can gain insight into what projects have been undertaken to preserve cultural heritage and what the main drawbacks of this field are. however, some institutions do not share the latest information on digitized items. in the case of libraries and archives, the numbers are fairly public on the websites, but it is difficult to establish exactly when the objects were digitized, and not all museums share information about recently digitized objects at all. in such cases, manually counting the digitized materials available on the website is the only way to quantify digitization practices, and that is the method the authors used. indeed, this is one of the limitations of this work: some institutions do not disclose the exact size of their digitized collections; for some, digitized copies could not be counted manually due to the huge amounts of data; and some websites may not be up to date.

table 1. institutions responsible for digitization of cultural heritage in russian lams

libraries

russian national library (http://nlr.ru/eng/ra2403/digital-library) — 650,000 scanned copies. as of the beginning of 2019, the digital library included scanned copies of books, magazines, newspapers, music publications, graphic materials, maps, audio recordings, and more. the scanned materials include items from the national library of russia and from partner libraries, publishing organizations, authors, and readers.

national electronic library (https://rusneb.ru) — 1,700,000 digitized books.26 the nel project was designed to provide internet users with access to digitized documents from russian libraries, museums, and archives. nel combines rare books and manuscripts, periodicals, and sheet music collected from all major russian libraries.

russian state library (https://www.rsl.ru) — 1,500,000 documents. this is the largest public library in russia; the digital collection contains copies of valuable and most-requested publications, as well as documents originally created in electronic form. the electronic catalog contains information on more than 21 million publications, 1.5 million of which have been digitized.

presidential library (https://www.prlib.ru/en) — 1,000,000 units. the presidential library is a nationwide electronic repository of digital copies of the most important documents of the history of russia. the volume of the presidential library collections is more than a million storage units, including digital copies of books and journals, archival documents, audio and video recordings, photographs, films, dissertation abstracts, and other materials.

archives

archival fonds of russia (central fonds catalog) (https://cfc.rusarchives.ru/cfc-search/) — 959,576 archival fonds.27 annually, the volume of documents of the archival fonds of the russian federation increases by an average of 1.7 million units. as of december 13, 2020, the central fonds catalog included 959,576 items from 13 federal archives and 2,225 state and municipal archives of the russian federation.

national digital archive (https://ruarxive.org) — 282 websites.28 the purpose of this initiative is to find and preserve websites and other digital materials of high public value that are at risk of destruction. the nda project collects official accounts on social networks, official websites of government bodies and political parties, and historical data. not many websites have been collected in comparison with other countries' initiatives; however, unlike the internet archive, the nda project makes a complete copy of everything that is on a site, including archived channels on twitter, instagram, and telegram.

museums

national catalogue of the russian federation museum collections (https://goskatalog.ru/portal/#/) — 23,193,078 units. the catalog is an electronic database containing basic information about each museum item and each museum collection included in the museum fonds of the russian federation. according to the latest statistics (2020), over 23 million units were recorded in the national museum catalog. however, the total number of museum objects across russia is more than 84 million.

state hermitage museum (https://www.hermitagemuseum.org) — 400,000 units. the state hermitage museum is the second-largest museum in the world. the hermitage exposition is gradually moving online; this process is slow and very laborious. the entire collection has not been digitized, but the website already contains 400,000 exhibits (approximately one tenth of the entire collection). the online collection includes paintings, sculptures, numismatics, archaeological finds, and other exhibits.

state russian museum (https://www.rusmuseum.ru/collections/) — 3,682 units (the number of digitized collections was manually counted on the website). this is the world's largest museum of russian art. the collection of the museum has about 400,000 exhibits and covers all historical periods of russian art. at the moment only a small part of the collection is available on the museum website in digitized form. however, the museum maintains the virtual state russian museum branch project, whose main goal is to give free online access to digital and printed materials from other institutions.

state museum of fine arts named after a. s. pushkin (https://pushkinmuseum.art) — 334,000 units. as of march 1, 2019, the museum's database contained information on 670,000 museum items, 334,000 (49%) of which have images. in total there are about 683,000 images in the database (not counting special photography), with a volume of about 35 tb.

figure 1. screenshots of the websites of some of the institutions listed in table 1: russian national library, national electronic library, russian national digital archive, and state hermitage museum.

a further analysis of russian museums shows that 2,773 state and municipal museums hold more than 84 million items, but only a few are displayed in digital form. biryukova et al. reviewed the interdisciplinary approach to preserving cultural heritage and creating virtual museums.29 povroznik also analyzed virtual museums that preserve the cultural heritage of the russian federation and concluded that virtual museums and their resources need to be studied, developed, and improved further.30 kondratyev et al. considered the issues of digital heritage preservation from the security, integrity, and accessibility perspectives and analyzed the concept of a smart museum.31 lapteva and pikov described the experience of students of the institute for the humanities of siberian federal university working with the state russian museum and the state hermitage museum, the leading russian museums that play an important role in the country's digitization practices.32 the authors noted that implementing modern information technologies in museums creates a comfortable infrastructure for the audience by preserving and representing cultural heritage in interactive contexts.

findings

digitization in russian libraries

creating a digital collection has become a normal library activity in russia.33 within the framework of the main directions of development of activities to preserve russian library collections from 2011 to 2020, one of the main programs of the national library is the digitization of rare books. rare books, according to the federal law "on amendments to the federal law on librarianship," include handwritten books or printed publications that have outstanding material value or special historical, scientific, and/or cultural significance.34 thus, the law elevated the book to the same level of protection as other objects of cultural heritage at the national level. the website of the register of rare books (https://knpam.rusneb.ru), hosted by the russian state library, became part of the national library collection preservation program developed in 2001. from 2001 to 2009, the subprogram rare books of the russian federation was created to provide a regulatory framework and methodological support for all areas of library activities related to the preservation of library collections. this program includes not only libraries but also other institutions such as museums, archives, and scientific and educational institutions.
however, in order to implement the state registration of rare books, it is necessary to further develop regulatory documents that can govern the procedures for referencing and registering rare books. another initiative for book preservation is the federal project digital culture, designed to provide citizens with wide access to the country's unique cultural heritage. it was expected that ten to twenty libraries from different russian regions would take part in the digitization project, each offering at least 50 documents from their collections. however, the problems of this program relate to the scale of the work as well as the dispersal of rare books throughout the country. as the 2011–2020 library preservation report emphasizes, many of these rare books remain unknown to the wider scholarly community. approximately half of the valuable collections available in the country's repositories are not described as integral objects of cultural and historical heritage. the russian state library noted that the main problems associated with rare books include the comprehensive work needed to identify and record rare and valuable books; ensuring the safety and security of the books; the special equipment required to copy valuable materials; and the need for proper storage as the most important condition for preservation.

another main center of digitization is the digital library of the national library of russia (nlr, https://nlr.ru). the digital library is an open and accessible information resource that includes over 650,000 digitized copies of books, magazines, newspapers, music publications, graphic materials, maps, plans, atlases, and audio recordings. the digitized materials include items from the holdings of the national library of russia, partner libraries, publishing organizations, authors, and even readers. at present the digital collection of the library includes various collections such as landmarks of the nlr, rare books, rossica, maps, and manuscripts.

hosted by russia's national library, the national electronic library (nel, https://rusneb.ru/) was launched in 2004 to create an electronic library sponsored by the russian federal ministry of culture. the nel is a service that searches the full text of scanned books and magazines that have been processed using optical character recognition and converted into text, stored in a digital database available through the internet and mobile applications. one of the main tasks of the nel is the integration of the libraries of the russian federation into a single information network. as of june 15, 2022, the nel collection had a total of 5 million artifacts, including electronic copies of books, educational and periodical literature, dissertations and abstracts, monographs, patents, sheet music, and visual and cartographic publications. the russian state library became the main operator of the national electronic library project in 2014. since 2015, the national library of russia has expanded its digitization program, and the site now publishes a list of publications that require digitization. readers vote for publications directly on the site by clicking the vote for digitalization button.
for example, as of november 2020, a list of 1,998 publications on a variety of topics ranging from physics and mathematical literature to psychology and music was available for voting.

digitization in russian archives

archives have historical, scientific, social, and cultural significance, and they are an essential part of the process of preserving russian cultural heritage. digitization projects in russia began as an element of the digital cataloging of the largest archives from the 1980s to the 1990s. initially, the main purpose of digitization was to create digital copies to ensure the preservation of original archival documents and to reduce the handling of rare or fragile originals in the reading room. since then, digitization has become an integral part of creating digital archives in russia.35 currently, one of the main goals of digitizing archival documents is to provide legal entities and individuals with open access to archival documents of the russian federation.

the main archival center is the archives fond of the russian federation (http://archives.ru/af.shtml). the archives fond has more than 609 million items dating from the early eleventh century to the present and performs important functions to preserve historical memory, replenish information resources, and provide access to the public. the main task of digitization is to preserve russia's cultural and historical heritage. each year, the total volume of archives across russia increases by an average of 1.7 million items. despite the relatively small amount of digitization equipment, we can still see progress: in 2015, 8,750 documents were digitized, while in 2019 the annual total had reached 27,518 documents. this increase shows that digital copy production is directly related to equipment acquisition. however, researchers found that the level of use of these documents was not high: for example, there were 18,155 document views in 2015 and only 19,417 in 2019, an increase far smaller than the growth of the digitized collection. therefore, it is necessary not only to promote the services of the archival agencies but also to increase the demand for archival documents.

a portal was created under the auspices of the archives fond of the russian federation (http://www.rusarchives.ru) to promote archival services to users and to organize all archives throughout russia. the portal collects the internet information resources of russian archives and publishes archival directories and regulations. its establishment was an important breakthrough in organizing access to the documents of the archives fond of the russian federation. since 2012, the website has operated the central catalog software complex, which provides information on the composition of federal and regional digitized fonds. as reported by the federal archival agency, 32 virtual exhibition projects are posted on its official website and portals. the website provides information about online archive projects, including virtual exhibitions, digital collections, and inter-archive projects. users can search for materials by three publication types: virtual exhibition, document collection, and inter-archive project. the projects cover four subjects: the great patriotic war, statehood of russia, the soviet era, and space exploration. the federal archival agency's main website also provides five catalogs and databases that guide users through digitized collections.
this list includes the central stock catalog (http://cfc.rusarchives.ru/cfc-search), the state register of unique documents (https://unikdoc.rusarchives.ru/), guides to russian archives (https://guides.rusarchives.ru/), electronic inventories of federal archives (https://rusarchives.ru/elektronnye-opisi-federalnyh-arhivov), the database of declassified cases and documents of the federal archives (http://unsecret.rusarchives.ru/), and the database on the places of storage of documents on personnel (http://ls.rusarchives.ru/). as of january 1, 2022, 859 documents were included in the state register of unique documents of the archival fund of the russian federation, and a total of more than 98,000 documents are stored in the database. a project to digitize documents from the soviet era is still in progress, and new collections of digitized copies of archival documents stored in federal archives across russia will be displayed on the website in the future (http://sovdoc.rusarchives.ru/#main).

one of the major drawbacks of the digitization process in russia is that archival agencies and cultural heritage materials are scattered throughout the country. to develop digital archiving initiatives in different regions of russia, the culture of russia (2012–2018) program was developed. archives of the constituent entities of the russian federation can take part in this program and get funding from the regional budget to digitize collections as part of the regional program for the development of archival affairs.36

despite some improvements and ongoing projects, there are still no initiatives for the long-term preservation of born-digital materials and no requirements for mandatory long-term preservation of information. however, the national digital archive (https://ruarxive.org) was created to find and preserve websites and other digital materials that have high public value and are at risk of destruction. this initiative promotes the general idea of archiving modern digital heritage and consists of many projects. the main one is preserved government, which aims to preserve official materials in the following areas: official accounts on social networks; official sites of government managers, officials, and political parties; historical documents; and especially databases. future plans include developing tools that will help collect digital materials faster and more efficiently and better systematize what has already been collected.

digitization in russian museums

the active introduction of information technology into museums began at the end of the twentieth century. a new area of study, museum informatics, has emerged in russian higher-education institutions; it focuses on museum work and modern information technology to develop and improve museum activities.37 museums have developed many digitization projects to preserve their collections and give free and easy access to cultural heritage items. the modern russian museum system consists of about 2,773 museums, although the exact number is not known. since the 1970s, the rationale for russian museum digitization practices has been quite similar to that of many other countries: information and collection management are needed to ensure that museum objects are inventoried and properly preserved.
the museums plan to create electronic collections, open valuable collections to the public, create a state catalog of the museum collection of the russian federation (https://goskatalog.ru/portal/#/), and integrate works from all museums in russia. as of 2020, more than 23 million museum items were registered in the national catalog of the museum collection. the catalog is planned to be complete by 2026, when metadata and images of the museums' collections will be included in the register and posted online. digitization of museum collections is an important process that has recently received stable support from the government.

the national information society (2011–2020) program includes a project to create a new virtual museum based on the collections of the country's largest national museums. the term "virtual museum" is used to characterize various projects linked to digital technology in virtual and museum space.38 it can be represented by a collection of works of art on the internet and the publication of a museum's electronic expositions. currently, about 300 virtual museums are available across the country (https://www.culture.ru/museums). the most-visited museums are the state hermitage museum in st. petersburg (https://www.hermitagemuseum.org/), the state tretyakov gallery (https://www.tretyakovgallery.ru), and the state russian museum (http://en.rusmuseum.ru). these museums offer users a wide range of activities, including the use of modern technology. for example, in 2003 the state russian museum (the world's largest museum of russian art) started to implement the russian museum: virtual branch project, opening virtual branches in museums, universities, cultural centers, and institutions of additional education around the country. thanks to computer technology and digitization, thousands of russian residents in near and far places have access to the value of russian culture, russia's historical and artistic past, and the richest collection of russian art.

international business machines (ibm) collaborated with the hermitage museum to make it one of the most technologically advanced museums in the world. ibm built the state hermitage museum website in 1997, later called the "world's best online museum" by national geographic traveler.39 the hermitage has unique experience in developing digitization programs and uploading collections to websites. currently, the museum holds more than 3 million items, and the online archives presented on its website allow easy searching and let users create their own collections on the website. in 2020, the hermitage released a documentary feature film in virtual reality (vr) format, "vr—hermitage: immersion in history with konstantin khabenskiy" (https://www.khabenskiy.com/filmography-vr-hermitage-immersion-in-history-with-konstantin-khabenskiy/).
visitors can tour the hermitage's history in vr, following its most important events from the eighteenth century to the present. the pushkin museum, the largest museum of european art in moscow, offers another example of the use of vr technology. the joy of museums offers virtual tours of more than 60,000 museums and historic sites around the world, including the pushkin museum (https://joyofmuseums.com/museums/russian-federation/moscow-museums/pushkin-museum/).

virtual museums can display electronic versions of exhibits for longer than physical exhibitions, which are limited by place and time, and can record information about past exhibits, including electronic collections of exhibits as well as data on exhibition dates and concepts. for example, the website of the state tretyakov gallery contains a virtual archive of past exhibitions. the virtual museum therefore has considerable research potential and is actively contributing to the preservation of cultural heritage. digital copies of original cultural and artistic works form an electronic archive of great value from two perspectives. on the one hand, they preserve rarities for future generations, give users broad access to the rarest and most historically significant artworks, and enable research. on the other hand, they create opportunities for commercial use of artifacts, additional sponsorship, and investment proposals for museums.

conclusions and further study

the two most obvious benefits of digitization are improved access and preservation, through which libraries, archives, and museums can represent russian culture and introduce rare and unique cultural heritage artifacts to future generations. in this work, we have addressed some legislative principles and outlined major digitization projects. the general problems of digitization in russia relate to the scale of the work, the low level of use of digitized documents, and the nationwide dispersal of rare books. in the case of libraries, one of the problems of digitization is the uneven distribution of rare books throughout the country: the most important materials are concentrated in the largest federal library, while many rare books are housed in central libraries in various parts of the russian federation. work with rare books (book monuments) should be planned as a long-term activity performed at different levels. in the case of archives and museums, one of the major drawbacks of digitization is the dispersal of national archives and cultural heritage materials. based on this preliminary study, there are several further research topics that can enhance understanding of the digitization of cultural heritage in russia. in particular, since digitization is a complex process that requires both management and technology, future research needs to be divided into three aspects: management, technology, and content.
endnotes

1 g. a. kruglikova, "use of information technologies in preservation and popularization of cultural heritage," advances in social science, education and humanities research 437 (2020): 446–50.

2 g. m. shapovalova, "digital culture and digital heritage—doctrinal definitions in the field of culture at the stage of development of modern russian legislation" [in russian], the territory of new opportunities: the herald of vladivostok state university of economics and service 10, no. 4 (2018): 81–89.

3 l. a. pronina, "information technologies preserving cultural heritage," analytics of cultural studies, 2008, https://cyberleninka.ru/article/n/informatsionnye-tehnologii-v-sohranenii-kulturnogo-naslediya/viewer.

4 n. v. lopatina and o. p. neretin, "preservation of digital cultural heritage in a single electronic knowledge space," bulletin mguki 5, no. 85 (2018): 74–80.

5 y. s. astakhova, "cultural heritage in the digital age. human in digital reality: technological risks," materials of the v international scientific and practical conference (2020): 204–6.

6 m. a. miroshnichenko, y. v. shevchenko, and r. s. ohrimenko, "preservation of the historical heritage of state archives by digitalizing archive documents" [in russian], vestnik akademii znaniy 37, no. 2 (2020): 188–94.

7 inna kizhner et al., "accessing russian culture online: the scope of digitization in museums across russia," digital scholarship in the humanities 19 (2019): 350–67, https://doi.org/10.1093/llc/fqy035.

8 s. d. lee, digital imaging: a practical handbook (new york: neal-schuman publishers, inc., 2001).

9 s. tanner and b. robinson, "the higher education digitisation service (heds): access in the future, preserving the past," serials 11 (1998): 127–31; g. a. young, "technical advisory service for images (tasi)," 2003, http://www.jiscmail.ac.uk/files/newsletter/issue3_03/; "preservation services," harvard library, https://preservation.library.harvard.edu/digitization.

10 g. g. chowdhury and s. chowdhury, introduction to digital libraries (london: facet publishing, 2003), https://doi.org/10.1016/b978-1-84334-599-2.50006-4.

11 j. mcilwaine et al., "guidelines for digitization projects for collections and holdings in the public domain, particularly those held by libraries and archives" (draft) (unesco, march 2002), 6–7, https://www.ifla.org/wp-content/uploads/2019/05/assets/preservation-and-conservation/publications/digitization-projects-guidelines.pdf.

12 m. note, managing image collections: a practical guide (oxford: chandos publishing, 2011).

13 mcilwaine et al., "guidelines," 51–52.

14 fundamentals of the legislation of the russian federation on culture, http://www.consultant.ru/document/cons_doc_law_1870/068694c3b5a06683b5e5a2d480bb399b9a7e3dcc/.

15 decree of the president of the russian federation of december 24, 2014 no. 808, on approval of the fundamentals of state cultural policy, http://kremlin.ru/acts/bank/39208.
16 v. zvereva, "state propaganda and popular culture in the russian-speaking internet," in freedom of expression in russia's new mediasphere, ed. mariëlle wijermars and katja lehtisaari (abingdon, oxon: routledge, 2020), 225–47, https://doi.org/10.4324/9780429437205-12.

17 s. l. yablochnikov, m. n. mahiboroda, and o. v. pochekaeva, "information aspects in the field of modern public administration and law," in 2020 international conference on engineering management of communication and technology (emctech), 1–5; u. chimittsyrenova, "a research proposal information society: copyright (presumption of access to the digital cultural heritage)," colloquium journal, no. 11-3 (2017): 22–24.

18 g. m. shapovalova, "information society: from digital archives to digital cultural heritage," international research journal 5, no. 47 (2016): 177–81.

19 g. m. shapovalova, "the global information society changing the world: the copyright or the presumption of access to digital cultural heritage," society: politics, economics, law, 2016.

20 s. b. lialkova and v. b. naumov, "the development of regulation of the protection of cultural heritage in the digital age: the experience of the european union" [in russian], informatsionnoe obshchestvo 1 (2020): 29–41.

21 e. kozlova, "russia's digital cultural heritage in the legal deposit system," slavic & east european information resources 12, no. 2-3 (2011): 188–91.

22 a. a. dzhigo, "preserving russia's digital cultural heritage: acquisition of electronic documents in russian libraries and information centers," slavic & east european information resources 14, no. 2-3 (2013): 219–23.

23 y. y. yumasheva, "digitizing russian cultural heritage: normative and methodical regulation," bulletin of the ural federal university humanitarian sciences 3, no. 117 (2013): 2–7.

24 g. a. kruglikova, "use of information technologies in preservation and popularization of cultural heritage," advances in social science, education and humanities research 437 (2020): 446–50.

25 g. m. shapovalova, "the concept of digital cultural heritage and its genesis: theoretical and legal analysis" [in russian], the territory of new opportunities: the herald of vladivostok state university of economics and service 9, no. 4 (2017): 159–68.

26 a. annenkov, "national electronic library of russia: it's not yet on fire, but the time to save it is now" [in russian], http://d-russia.ru/nacionalnaya-elektronnaya-biblioteka-rossii-eshhyo-ne-gorela-no-spasat-uzhe-pora.html.

27 saa dictionary of archives terminology. a "fonds" is the entire body of records of an organization, family, or individual that have been created and accumulated as the result of an organic process reflecting the functions of the creator.
28 airtable, https://airtable.com/shro1hise7wgurxg5/tblhdxawiv5avtn7y.

29 m. v. biryukova et al., "interdisciplinary aspects of digital preservation of cultural heritage in russia" [in russian], european journal of science and theology 13, no. 4 (2017): 149–60.

30 n. povroznik, "virtual museums and cultural heritage: challenges and solutions," https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf.

31 d. v. kondratyev et al., "problems of preservation of digital cultural heritage in the context of information security," history and archives (2013): 36–51.

32 m. a. lapteva and n. o. pikov, "visualization technology in museum: from the experience of sibfu collaboration with the museums of russia," journal of siberian federal university humanities & social sciences 7, no. 9 (2016): 1674–81.

33 g. a. evstigneeva, the ideology of digitization of library collections on the example of the russian national public library for science and technology, library collections: problems and solutions, 2014, http://www.gpntb.ru/ntb/ntb/2014/3/ntb_3_8_2014.pdf.

34 main directions of development of activities for the preservation of library collections in the russian federation for 2011–2020, https://kp.rsl.ru/assets/files/documents/main-directions.pdf.

35 g. m. shapovalova, "the concept of digital cultural heritage," 159–68.

36 o. a. kolchenko and e. a. bryukhanova, "the main directions of archiving informatization in the context of electronic society development," vestnik tomskogo gosudarstvennogo universiteta—tomsk state university journal 443 (2019): 114–18.
37 g. p. nesgovorova, "modern information, communication and digital technologies in the preservation of cultural and scientific heritage and the development of museums: problems of intellectualization and quality of informatics systems" (2006): 153–61, https://www.iis.nsk.su/files/articles/sbor_kas_13_nesgovorova.pdf.

38 n. g. povroznik, "virtual museum: preservation and representation of historical and cultural heritage," perm university bulletin 4, no. 31 (2015): 2013–21.

39 the preservation of culture through technology, https://www.ibm.com/ibm/history/ibm100/us/en/icons/preservation/.

harvesting information from a library data warehouse

siew-phek t. su and ashwin needamangala

information technology and libraries | march 2000

data warehousing technology has been defined by john ladley as "a set of methods, techniques, and tools that are leveraged together and used to produce a vehicle that delivers data to end users on an integrated platform."1 this concept has been applied increasingly by industries worldwide to develop data warehouses for decision support and knowledge discovery. in the academic sector, several universities have developed data warehouses containing the universities' financial, payroll, personnel, budget, and student data.2 these data warehouses across all industries and academia have met with varying degrees of success. data warehousing technology and its related issues have been widely discussed and published.3 little has been done, however, on the application of this cutting-edge technology in the library environment using library data.

motivation of project

daniel boorstin, the former librarian of congress, mentions that "for most of western history, interpretation has far outrun data."4 however, he points out "that modern tendency is quite the contrary, as we see data outrun meaning." his insights tie directly to many large organizations that have long been rich in data but poor in information and knowledge. library managers increasingly find it important to obtain a comprehensive and integrated view of the library's operations and the services it provides. this view is helpful for making decisions about current operations and about how to improve them. due to financial and human constraints on library support, library managers increasingly encounter the need to justify everything they do, for example, the library's operating budget. the most frustrating problem they face is knowing that the information needed is available somewhere in the ocean of data but there is no easy way to obtain it. for example, it is not easy to ascertain whether the materials of a certain subject area, which consumed substantial financial resources for their acquisition and processing, are frequently used (i.e., have a high rate of circulation), seldom used, or not used at all, or whether they satisfy users' needs.
as another example, an analysis of the methods of acquisition (firm order vs. approval plan) together with the circulation rate could be used as a factor in deciding the best method of acquiring certain types of material. such information can play a pivotal role in performing collection development and library management more efficiently and effectively. unfortunately, the data needed to make these types of decisions are often scattered in different files maintained by a large centralized system, such as notis, that does not provide a general querying facility, or by different file/data management or application systems. this situation makes it very difficult and time-consuming to extract useful information. this is precisely where data warehousing technology comes in.

the goal of this research and development work is to apply data warehousing and data mining technologies in the development of a library decision support system (ldss) to aid the library management's decision making. the first phase of this work is to establish a data warehouse by importing selected data from separately maintained files presently used in the george a. smathers libraries of the university of florida into a relational database system (microsoft access). data stored in the existing files were extracted, cleansed, aggregated, and transformed into the relational representation suitable for processing by the relational database management system. a graphical user interface (gui) was developed to allow decision makers to query the data warehouse's contents using either predefined queries or ad hoc queries. the second phase is to apply data mining techniques to the library data warehouse for knowledge discovery. this paper covers the first phase of this research and development work.

our goal is to develop a general methodology and inexpensive software tools that can be used by different functional units of a library to import data from different data sources and to tailor different warehouses to meet their local decision needs. to meet this objective, we do not have to use a very large centralized database management system to establish a single very large data warehouse to support different uses.

local environment

the university of florida libraries has a collection of more than two million titles, comprising over three million volumes. it shares a notis-based integrated system with nine other state university system (sus) libraries for acquiring, processing, circulating, and accessing its collection. all ten sus libraries are under the consortium umbrella of the florida center for library automation (fcla).

siew-phek t. su (pheksu@mail.uflib.ufl.edu) is associate chair of the central bibliographic services section, resource services department, university of florida libraries, and ashwin needamangala (nsashwin@grove.ufl.edu) is a graduate student at the electrical and computer engineering department, university of florida.
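as a concrete illustration of the kind of cross-file question the ldss warehouse is meant to answer, here is a minimal sketch that compares circulation by acquisition method (firm order vs. approval plan). it is a toy under stated assumptions: the paper's actual warehouse used microsoft access, and the table names, column names, and sample rows below are invented for the example, with sqlite standing in for access as the relational engine.

```python
import sqlite3

# a minimal stand-in for the ldss warehouse; the real system used
# microsoft access, and these table/column names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (bib_key TEXT PRIMARY KEY, acq_method TEXT, price REAL);
CREATE TABLE items  (bib_key TEXT, charges INTEGER);
""")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [("AAA1111", "firm", 42.50),
                 ("AAA2222", "approval", 35.00),
                 ("AAA3333", "approval", 58.25)])
con.executemany("INSERT INTO items VALUES (?, ?)",
                [("AAA1111", 7), ("AAA2222", 0), ("AAA3333", 3)])

# cross-file question: how does circulation compare between
# firm-order and approval-plan acquisitions?
for method, titles, avg_charges in con.execute("""
        SELECT o.acq_method, COUNT(*) AS titles, AVG(i.charges) AS avg_charges
        FROM orders o JOIN items i ON i.bib_key = o.bib_key
        GROUP BY o.acq_method"""):
    print(f"{method:9s} titles={titles} avg charges={avg_charges:.1f}")
```

a join-and-group query of this shape, written against the warehouse's real order and item tables, is the sort of predefined query the ldss gui could offer a decision maker.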
library data sources
the university of florida libraries' online database, luis, stores a wealth of data, such as bibliographic data (author, title, subject, publisher information), acquisitions data (price, order information, fund assignment), circulation data (charge-out and browse information, withdrawn and inventory information), and owning location data (where an item is shelved). these voluminous data are stored in separate files. the notis system as used by the university of florida does not provide a general querying facility for accessing data across different files. extracting any information needed by a decision maker has to be done by writing an application program to access and manipulate these files. this is a tedious task, since many application programs would have to be written to meet the different information needs. the challenge of this project is to develop a general methodology and tools for extracting useful data and metadata from these disjointed files, and to bring them into a warehouse maintained by a database management system such as microsoft access. the selection of access and pc hardware for this project was motivated by cost considerations. we envision multiple special-purpose warehouses established on multiple pc systems to provide decision support to different library units. the library decision support system (ldss) was developed with the capability of handling and analyzing an established data warehouse. for testing our methodology and software system, we established a warehouse based on twenty thousand monograph titles acquired from our major monograph vendor. these titles were published by domestic u.s. publishers and have a high percentage of dlc/dlc records (titles cataloged by the library of congress). they were acquired by firm order and approval plan; the publication coverage is the calendar years 1996-1997. analysis is only on the first item record (a future project will include all copy holdings). although the size of the test data used is small, it is sufficient to test our general methodology and the functionality of our software system.
fcla db2 tables and key list
most of the data from the twenty-thousand-title domain that go into the ldss warehouse are obtained from the db2 tables maintained by fcla. fcla developed and maintains the database of a system called ad hoc report request over the web (arrow) to facilitate querying and generating reports on acquisitions activities. the data are stored in db2 tables.5 for our research and development purpose, we needed db2 tables for only the twenty thousand titles that we identified as our initial project domain. these titles all have an identifiable 035 field in the bibliographic records (zybp1996, zybcip1996, zybp1997 or zybpcip1997). we used the batchbam program developed by gary strawn of northwestern university library to extract and list the unique bibliographic record numbers in separate files for fcla to pick up.6 using the unique bibliographic record numbers, fcla extracted the db2 tables from the arrow database and exported the data to text files. these text files were then transferred to our system using the file transfer protocol (ftp) and inserted as tables into the ldss warehouse.
bibliographic and item records extraction
fcla collects and stores complete acquisitions data from the order records as db2 tables. however, only brief bibliographic data and no item record data are available.
bibliographic and item record data are essential for inclusion in the ldss warehouse in order to create a viable integrated system capable of performing cross-file analysis and querying for the relationships among different types of data. because these required data do not exist in any computer-readable form, we designed a method to obtain them. using the identical notis key lists to extract the targeted twenty thousand bibliographic and item records, we applied a screen scraping technique to scrape the data from the screen and saved them in a flat file. we then wrote a program in microsoft visual basic to clean the scraped data and saved them as text-delimited files suitable for importing into the ldss warehouse.
screen scraping concept
screen scraping is a process used to capture data from a host application. it is conventionally a three-part process:
• displaying the host screen or data to be scraped.
• finding the data to be captured.
• capturing the data to a pc or host file, or using it in another windows application.
in other words, we can capture particular data on the screen by providing the corresponding screen coordinates to the screen scraping program. numerous commercial applications for screen scraping are available on the market. however, we used an approach slightly different from the conventional one. although we had to capture only certain fields from the notis screen, there were other factors to take into consideration:
• the location of the various fields with respect to the screen coordinates changes from record to record. this makes it impossible to lock a particular field to a fixed screen coordinate.
• the data present on the screen are dynamic because we are working on a "live" database where data are frequently modified. for accurate query results, all the data, especially the item record data where the circulation transactions are housed, need to be captured within a specified time interval so that the data are uniform. this makes the time taken for capturing the data extremely important.
• most of the fields present on the screen needed to be captured.
taking the above factors into consideration, we decided to capture the entire screen instead of scraping only certain parts of it. this made the process both simpler and faster. the unnecessary fields were filtered out during the cleanup process.
system architecture
the architecture of the ldss is shown in figure 1 and is followed by a discussion of its components' functions.
notis
notis (northwestern online totally integrated system) was developed at the northwestern university library and introduced in 1970. since its inception, notis has undergone many versions. the university of florida libraries is one of the earliest users of notis. fcla has made many local modifications to the notis system since the uf libraries started using it; as a result, uf notis differs from the rest of the notis world in many respects. notis can be broken down into four subsystems: acquisitions, cataloging, circulation, and the online public access catalog (opac). at the university of florida libraries, the notis system runs on an ibm 370 mainframe computer running the os/390 operating system.
host explorer
host explorer is a software program that provides a tcp/ip link to the mainframe computer. it is a terminal emulation program supporting ibm mainframe, as/400, and vax hosts. host explorer delivers an enhanced user environment for windows nt platforms, windows 95, and windows 3.x desktops. exact tn3270e, tn5250, vt420/320/220/101/100/52, wyse 50/60, and ansi-bbs display is extended to leverage the wealth of the windows desktop. it also supports all tcp/ip-based tn3270 and tn3270e gateways.
[figure 1. ldss architecture and its components: notis, host explorer, data cleansing and extraction, fcla db2 tables, the warehouse, and the graphical user interface.]
the host explorer program is used as the terminal emulation program in ldss. it also provides vba-compatible basic scripting tools for complete desktop macro development. users can run these macros directly or attach them to keyboard keys, toolbar buttons, and screen hotspots for additional productivity. the function of host explorer in ldss is very simple: it has to "visit" all screens in the notis system corresponding to each notis number present in the batchbam file and capture all the data on the screens. to do this, we wrote a macro that read the notis numbers one at a time from the batchbam file and input each number into the command string of host explorer. the macro essentially performed the following functions:
• read the notis numbers from the batchbam file.
• inserted the notis number into the command string of host explorer.
• toggled the screen capture option in host explorer so that data are scraped from the screen only at necessary times.
• saved all the scraped data into a flat file.
after the macro has been executed, all the data scraped from the notis screens reside in a flat file. the data present in this file have to be cleansed in order to make them suitable for insertion into the library warehouse. a visual basic program was written to perform this function; the details of this program are given in the next section.
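a rough rendering of that control flow in python follows (the original macro was written in host explorer's vba-compatible basic; the emulator object and its send_command/read_screen methods are hypothetical stand-ins, not host explorer's actual scripting api):

# sketch of the scraping loop described above; `emulator` is a
# hypothetical stand-in for a 3270 terminal-emulation session.
def scrape_notis_records(emulator, batchbam_path, output_path):
    with open(batchbam_path) as keys, open(output_path, "w") as out:
        for line in keys:
            notis_number = line.strip()
            if not notis_number:
                continue
            # insert the notis number into the emulator's command string
            emulator.send_command(notis_number)
            # capture the entire screen; unwanted fields are filtered
            # out later, during the cleansing step
            out.write(emulator.read_screen() + "\n")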
data cleansing and extraction
this component of ldss is written in the visual basic programming language. its main function is to cleanse the data that have been scraped from the notis screens. the visual basic code saves the cleansed data in a text-delimited format recognized by microsoft access; this file is then imported into the library warehouse maintained by microsoft access. the detailed working of the code that performs the cleansing operation is discussed below. the notis screen that comes up for each notis number has several parts that are critical to the working of the code:
• the notis number, present in the top right of the screen (in this case, akr9234).
• the field numbers that have to be extracted, for example 010:: and 035::.
• the delimiters. the "|" symbol is used as the delimiter throughout this code. for example, in the 260 field of a bibliographic record, "|a" delimits the place of publication, "|b" the name of the publisher, and "|c" the date of publication.
we shall now go step by step through the cleansing process. initially we have the flat file containing all the data that have been scraped from the notis screens.
• the entire list of notis numbers from the batchbam file is read into an array called bam_number$.
• the file containing the scraped data is read into a single string called bibrecord$.
• this string is then parsed using the notis numbers from the bam_number$ array.
• we now have a string that contains a single notis record. this string is called single_record$.
• the program runs in a loop until all the records have been read.
• each string is broken down into several smaller strings based on the field numbers. each of these smaller strings contains data pertaining to the corresponding field number.
• a considerable amount of the data present on the notis screen is unnecessary from the point of view of our project. we need only certain fields from the notis screen, and even from these fields we need the data only from certain delimiters. therefore, we scan each of these smaller strings for a set of delimiters predefined for each individual field; the data present in the other delimiters are discarded.
• the data collected from the various fields and their corresponding delimiters are assigned to corresponding variables. some variables contain data from more than one delimiter concatenated together, for the following reason: certain fields are present in the database only for informational purposes and will never be used as a criterion field in any query. since these fields will never be queried upon, they do not need to be cleansed as rigorously as the other fields, and we can afford to leave their data as concatenated strings. for example, the catalog_source field, which has data from "|a" and "|c", is of the form "|a dlc |c dlc", while the lang_code field, which has data from "|a" and "|h", is of the form "|a eng |h rus"; the latter we split into two fields, lang_code_1 containing "eng" and lang_code_2 containing "rus".
• the data collected from the various fields are saved in a flat file in the text-delimited format, which microsoft access recognizes.
a screen dump of the text-delimited file, the end result of the cleansing operation, is shown in figure 2. [figure 2. a text-delimited file.] the flat file can now be imported into the library warehouse.
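the same parsing logic can be sketched compactly; a minimal python illustration, assuming a per-field keep-list of subfield delimiters along the lines of the 035 and 260 examples above (the keep-list itself is an illustrative assumption):

import csv
import re

# assumed keep-list: which subfield delimiters to retain per field tag
KEEP = {"035": ["|a"], "260": ["|a", "|b", "|c"]}

def subfield(field_text, delim):
    # return the text after `delim` up to the next "|" (or "" if absent)
    _, found, rest = field_text.partition(delim)
    return rest.split("|", 1)[0].strip() if found else ""

def cleanse(scraped_path, bam_numbers, out_path):
    raw = open(scraped_path).read()
    # parse the single scraped string into per-record chunks, using the
    # notis numbers from the batchbam list as separators
    pat = "(" + "|".join(map(re.escape, bam_numbers)) + ")"
    parts = re.split(pat, raw)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for notis_number, body in zip(parts[1::2], parts[2::2]):
            row = [notis_number]
            for tag, delims in KEEP.items():
                # isolate one field (everything after e.g. "260::" up to
                # the next field tag), then keep only wanted subfields
                m = re.search(re.escape(tag) + r"::(.*?)(?=\d{3}::|\Z)",
                              body, re.S)
                field = m.group(1) if m else ""
                row.extend(subfield(field, d) for d in delims)
            writer.writerow(row)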
0~ "rks6472", "ybp1996 0507--clrrr done"," a dlc i c dlc ", "1 : i a eng "," i h rus", "hull", "891. 73/ 44 "aks6493", "ybp1996 0507--clarr done"," a dlc i c dlc ","hull", "hull", "hull"," 001. 4/225/ 028563 i ~f "ajx7554", "ybp1996 05 08--clarr done"," a uk i c uk ","hull", "hull", "e-uk---", "362. 1 / 068 12 2 o",' "akb3478", "ybp1996 05 08--clarr done"," a dlc c dlc ","hull", "hull", "e-fr---", "843/. 7 12 2 o", "t " "akc6442","ybp19960508--clarr done","a dlc c dlc ","1 : la eng ","lh ger","e-fr---","194 12 "ake9837", "ybp1996 0508--clarr done"," a dlc c dlc ","hull", "hull", "e-gr---", "883/. 01 12 20",' "akk9486", "ybp1996 0508--clarr done", "a dlc c dlc ","hull", "hull", "e-uk---", "822/. 052309 12 ~% l'akl2258", "ybp1996 05 08--clarr done"," a dlc c dlc ","hull", "hull", "e-xr---", "929. 4/2/ 08992401 1• "akm2455", "ybp1996 05 08--clarr done"," a dlc c dlc ","hull", "hull", "e-gx---", "943. 086 12 2 o",' "akm4649", "ybp1996 0508--clarr done"," a dlc c dlc ","hull", "hull", "hull", "863/ .64 i 2 20", "hu] ' "akh0246","ybp19960508--clarr done","a dlc c dlc ","hull","hull","n-us--la e-uk-en","700/. "akh181 o", "ybp1996 05 08--clarr done"," a dlc c dlc ","hull" ,"hull", "e-uk---", "305. 6/2 042/ 0903.: "akh3749","ybp19960508--clarr done","a dlc c dlc ","hull","hull","f-ke--la f-so --","327.{ "akq727 4", "ybp1996 05 08--clarr done"," a dlc c dlc ","hull", "hull", "hull", "355. 4/2 12 2 o", "hu] "akq9180", "y.bp1996 0508--clarr done", "a dlc c dlc ","hull", "hull", "n-us---", "23 0/. 93/ 09 12 2,f "akr 0424", "ybp1996 05 08--clarr done"," a dlc c dlc ","hull", "hull", "n-us-mi", "331 . 88/1292/ 097' "rkr1411", "ybp1996 05 08--clarr done"," a cl i c cl ","hull", "hull", "n-us---", "3 05. 896/ 073 12 2 o' "akr1846", "ybp1996 05 08--clarr done"," a dlc i c dlc ","hull", "hull", "e-uk-ni", "hull", "hull", "x, "akr2169", "ybp1996jt5 08--clarr done"," a dlc i c dlc ","hull", "hull", "n-us-sc", "323. 1/196073/ 091 "akr2245" ,"ybp19960508--c .larr d.one" ," a dlc i c dlc ","hull", "hull", "hull", "306 .4/6 i 2 20", "hu1 "akr2255", "ybp1996 05 08--clarr done"," a dlc i c dlc ","hull", "hull", "hull", "3 03. 48/2 12 2 o", "2r "akr226 o", "ybp1996 0508--clarr done"," a dlc i c dlc ","hull", "hull", "n-us-", "3 03. 48/2 12 2 o", "akr2281", "ybp1996 05 08--clarr done"," a dlc i c dlc ","hull", "hull", "t-----i a r------", "333. , · "akr2287", "ybp1996 05 08--clarr done"," a dlc i c dlc ","hull", "hull", "hull", "57 4. 5/262 12 2 o", "t "rkr2357", "ybp1996 05 08--clarr done"," a dlc i c dlc ","hull", "hull", "e------", "361 . 6/1 / 094 12 l "akr2358", "ybp1996 0508--clarr done"," a dlc i c dlc ","hull", "hull" ,"hull", "333. 7/2/01 12 20" ,' ¥' "akr2371", "ybp1996 05 08--clarr done"," a dlc i c dlc ","hull", "hull", "e------", "3 07. 72/ 094 12 211 "akr2386", "ybp1996 05 08--clarr done", "dlc i c dlci", "hull" ,/'hull", "e-uk---", "hull", "hull", "xu, "rkr25 03", "ybp1996 05 08--clarr done"," a dlc i c dlc ","hull", "hull", "hull", "575. 1 / 09 12 2 o", "hl 'i-r---· ---------·----figure 2. a text-delimited file the warehouse. the querying option activates ldss's querying facility that provides wizards to guide the formulations of different types of queries, as discussed later in this article . the last option, report generating, is for the user to specify the report to be generated. i data mining tool a very important component of loss is the data mining tool for discovering association rules that specify the interrelationships of data stored in the warehouse. 
data mining tool
a very important component of ldss is the data mining tool for discovering association rules that specify the interrelationships of data stored in the warehouse. many data mining tools are now available in the commercial world. for our project, we are investigating the use of a neural-network-based data mining tool developed by li-min fu of the university of florida.7 the tool allows the discovery of association rules based on a set of training data provided to it. this part of our research and development work is still in progress. the existing gui and report generation facilities will be expanded to include the use of this mining tool.
library warehouse
fcla exports the data existing in the db2 tables into text files. as a first step toward creating the database, these text files are transferred using ftp and form separate relational tables in the library warehouse. the data scraped from the bibliographic and item record screens result in the formation of two more tables.
characteristics
data in the warehouse are snapshots of the original data files. only a subset of the data contents in these files is extracted for querying and analysis, since not all the data are useful for a particular decision-making situation. data are filtered as they pass from the operational environment to the data warehouse environment. this filtering process is particularly necessary when a pc system, which has limited secondary storage and main memory space, is used. once extracted and stored in the warehouse, data are not updateable; they form a read-only database. however, different snapshots of the original files can be imported into the warehouse for querying and analysis, and the results of the analyses of different snapshots can then be compared.
structure
data warehouses have a distinct structure: there are summarization and detail structures that demarcate a data warehouse. the structure of the library data warehouse is shown in figure 3. its components are:
• notis and db2 tables. bibliographic and circulation data are obtained from notis through the screen scraping process and imported into the warehouse. fcla maintains acquisitions data in the form of db2 tables; these are also imported into the warehouse after conversion to a suitable format.
• warehouse. the warehouse consists of several relational tables connected by means of relationships. the universal relation approach could have been used to implement the warehouse as a single table, on the argument that all the collected data fall under the same domain. but let us examine why this approach would not have been suitable. the different data collected for import into the warehouse were bibliographic data, circulation data, order data, and pay data. if all these data were incorporated into one single table with many attributes, it would not be of any exceptional use, since each set of attributes has its own unique meaning when grouped together as a bibliographic table, circulation table, and so on. for example, grouping the circulation data and the pay data together in a single table would not make sense. however, the pay data and the circulation data are related through the bib_key.
hence, our use of the conventional approach of having several tables connected by means of relationships is more appropriate.
[figure 3. structure of the library data warehouse: data flow from notis (via screen scraping) and the fcla db2 tables (via import) into the warehouse tables ufbib, ufpay, ufinv, ufcirc, and uford, from which bibliographic, circulation, and pay data views are derived for the user.]
• views. a view in sql terminology is a single table derived from other tables. these other tables could be base tables or previously defined views. a view does not necessarily exist in physical form; it is considered a virtual table, in contrast to base tables, which are actually stored in the database. in the context of ldss, views can be implemented by means of the ad hoc query wizard. the user can define a query/view using the wizard and save it for future use; the user can then define a query on this query/view.
• summarization. the process of implementing views falls under the process of summarization. summarization provides the user with views, which make it easier for users to query the data of interest to them.
as explained above, the specific warehouse we established consists of five tables. a table name including "_wh" indicates that the table contains current detailed data of the warehouse. current detailed data represent the most recent snapshot of data taken from the notis system. the summarized views are derived from the current detailed data of the warehouse. since current detailed data are the basic data of the application, only the current detailed data tables are shown in appendix a.
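as a compressed sketch of these five tables and their bib_key relationships (column lists are abbreviated from appendix a, and sqlite stands in here for microsoft access):

import sqlite3

conn = sqlite3.connect("ldss_warehouse.db")
conn.executescript("""
create table ufbib_wh  (bib_key text primary key, catalog_source text,
                        lang_code_1 text, dewey_num text, subject_1 text);
create table ufcirc_wh (bib_key text references ufbib_wh, charges integer,
                        browses integer, last_use text);
create table uford_wh  (ord_num text, bib_num text references ufbib_wh,
                        vendor_code text, fund_key text, est_price real);
create table ufpay_wh  (inv_key text, ord_num text,
                        bib_key text references ufbib_wh,
                        paid_amt real, fund_key text);
create table ufinv_wh  (inv_key text, vend_inv_num text, inv_tot real);
""")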
decision support by querying the warehouse
the warehouse contains a set of integrated relational tables whose contents are linked by the common primary key, the bib_key (biblio_key). the data stored across these tables can be traversed by matching the key values associated with their tuples or records. decision makers can issue all sorts of sql-type queries to retrieve useful information from the warehouse. two general types of queries can be distinguished: predefined queries and ad hoc queries. the former refers to queries that are frequently used by decision makers for accessing information from different snapshots of data imported into the warehouse. the latter refers to queries that are exploratory in nature: a decision maker suspects that there is some relationship between different types of data and issues a query to verify the existence of such a relationship. alternatively, data mining tools can be applied to analyze the data contents of the warehouse and discover rules of their relationships (or associations).
predefined queries
below are some sample queries posed in english. their corresponding sql queries can be processed using ldss.
1. number and percentage of approval titles circulated and noncirculated.
2. number and percentage of firm order titles circulated and noncirculated.
3. amount of financial resources spent on acquiring noncirculated titles.
4. number and percentage of dlc/dlc cataloging records in circulated and noncirculated titles.
5. number and percentage of "shared" cataloging records in circulated and noncirculated titles.
6. numbers of original and "shared" cataloging records of noncirculated titles.
7. identify the broad subject areas of circulated and noncirculated titles.
8. identify titles that have been circulated "n" number of times and by subjects.
9. number of circulated titles without the 505 field.
each of the above english queries can be realized by a number of sql queries. we shall use the first two english queries and their corresponding sql queries to explain how the data warehouse contents and the querying facility of microsoft access can be used to support decision making. the results of the sql queries are also given. the first english query can be divided into two parts (see figure 4), each realized by a number of sql queries.
sample query outputs
query 1: number and percentage of approval titles circulated and noncirculated.
result: total approval titles 1172; circulated 980 (83.76%); noncirculated 192 (16.24%).
similarly, we can translate the second english query into a number of sql queries; the result is given below.
query 2: number and percentage of firm order titles circulated and noncirculated.
result: total firm order titles 1829; circulated 1302 (71.18%); noncirculated 527 (28.82%).
report generation
the results of the two predefined english queries can be presented to users in the form of a report:
total titles        3001
approval            1172 (39%)
  circulated         980 (83.76%)
  noncirculated      192 (16.24%)
firm order          1829 (61%)
  circulated        1302 (71.18%)
  noncirculated      527 (28.82%)
from the above report, we can ascertain that, though 39 percent of the titles were purchased through the approval plan and 61 percent through firm orders, the approval titles have a higher rate of circulation (83.76 percent) than the firm order titles (71.18 percent). it is important to note that the result of the above queries is taken from only one snapshot of the circulation data; analysis of several snapshots is needed in order to compare the results and arrive at reliable information. we now present a report on the financial resources spent on acquiring and processing noncirculated titles. to generate this report, we need the output of queries four and five listed earlier in this article. the corresponding outputs are shown below.
query 4: number and percentage of dlc/dlc cataloging records in circulated and noncirculated titles.
result: total dlc/dlc records 2852; circulated 2179 (76.40%); noncirculated 673 (23.60%).
query 5: number and percentage of "shared" cataloging records in circulated and noncirculated titles.
result: total "shared" records 149; circulated 100 (67.11%); noncirculated 49 (32.89%).
to come up with the financial resources, we need to consider several factors that contribute to the amount spent. for the sake of simplicity, we consider only the following:
1. the cost of cataloging each item with a dlc/dlc record.
2. the cost of cataloging each item with a shared record.
3. the average price of noncirculated books.
4. the average pages of noncirculated books.
5. the value of shelf space per centimeter.
because the values of the above factors differ from institution to institution and might change with more efficient workflows and better equipment, users are required to fill in the values for factors 1, 2, and 5; ldss can compute factors 3 and 4.
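the computation behind the report that follows is straightforward; a sketch in python, using the user-supplied values for factors 1, 2, and 5 and the ldss-computed values for factors 3 and 4 that appear in the report below:

# factors 1, 2, and 5 are user-supplied; 3 and 4 are computed by ldss
# (hard-coded here to the figures reported in the financial report)
dlc_cost = 10.00           # factor 1: cataloging cost per dlc/dlc title
shared_cost = 20.00        # factor 2: cataloging cost per "shared" title
shelf_cost_per_cm = 0.10   # factor 5
avg_price = 48.00          # factor 3
avg_cm = 3                 # factor 4: 288 pages is roughly 3 cm

noncirc_dlc, noncirc_shared = 673, 49
noncirc_total = noncirc_dlc + noncirc_shared     # 722

total = (noncirc_dlc * dlc_cost                  # 6,730.00
         + noncirc_shared * shared_cost          # 980.00
         + noncirc_total * avg_price             # 34,656.00
         + noncirc_total * avg_cm * shelf_cost_per_cm)  # 216.60
print(f"grand total: ${total:,.2f}")             # $42,582.60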
the financial report, taking into consideration the values of the above factors, could be as shown below:
processing cost of each dlc title = $10.00          673 x $10.00 = $6,730.00
processing cost of each shared title = $20.00        49 x $20.00 = $980.00
average price paid per noncirculated item = $48.00  722 x $48.00 = $34,656.00
average size of book = 288 pages = 3 cm
average cost of 1 cm of shelf space = $0.10         722 x $0.30 = $216.60
grand total = $42,582.60
again, it is important to point out that several snapshots of the circulation data have to be taken to track and compare the different analyses before deriving reliable information.
figure 4 shows the first english query divided into two parts, each realized by a number of sql queries.
approval titles circulated
sql query to retrieve the distinct bibliographic keys of all the approval titles:
select distinct bibscreen.bib_key
from bibscreen right join payl on bibscreen.bib_key = payl.bib_num
where payl.fund_key like "*07*";
sql query to count the number of approval titles that have been circulated:
select count(appr_title.bib_key) as countofbib_key
from (bibscreen inner join appr_title on bibscreen.bib_key = appr_title.bib_key)
     inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key
where itemscreen.charges > 0
order by count(appr_title.bib_key);
sql query to calculate the percentage:
select cnt_appr_title_circ.countofbib_key,
       int([cnt_appr_title_circ]![countofbib_key]*100/count([bibscreen]![bib_key])) as percent_apprcirc
from bibscreen, cnt_appr_title_circ
group by cnt_appr_title_circ.countofbib_key;
approval titles noncirculated
sql query for counting the number of approval titles that have not been circulated:
select distinct count(appr_title.bib_key) as countofbib_key
from (appr_title inner join bibscreen on appr_title.bib_key = bibscreen.bib_key)
     inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key
where itemscreen.charges = 0;
sql query to calculate the percentage:
select cnt_appr_title_noncirc.countofbib_key,
       int([cnt_appr_title_noncirc]![countofbib_key]*100/count([bibscreen]![bib_key])) as percent_appr_noncirc
from bibscreen, cnt_appr_title_noncirc
group by cnt_appr_title_noncirc.countofbib_key;
[figure 4. example of an english query divided into two parts.]
ad hoc queries
alternately, if the user wishes to issue a query that has not been predefined, the ad hoc query wizard can be used. the following example illustrates its use. assume the sample query is: how many circulated titles in the english subject area cost more than $35? we now take you on a walk-through of the ad hoc query wizard, from the first step until the output is obtained. figure 4 depicts step 1 of the ad hoc query wizard. the sample query mentioned above requires the following fields:
• biblio_key, for a count of all the titles that satisfy the given condition.
• charges, to specify the criterion "circulated title".
• fund_key, to specify all titles under the "english" subject area.
• paid_amt, to specify all titles that cost more than $35.
step 2 of the ad hoc query wizard (figure 5) allows the user to specify criteria and thereby narrow the search domain. step 3 (figure 6) allows the user to specify any mathematical operations or aggregation functions to be performed. step 4 (figure 7) displays the user-defined query in sql form and allows the user to save the query for future reuse. the output of the query is shown in figure 8: the number of circulated titles in the english subject area that cost more than $35. alternatively, the user might wish to obtain a listing of these 33 titles; figure 9 shows the listing.
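for illustration, the sql behind this sample query could plausibly look like the following sketch; the bibscreen, itemscreen, and payl tables are assumed to be loaded, the fund-key pattern that marks the english subject area is an assumption (the article does not give it), and access would use * wildcards where standard sql uses %:

import sqlite3

conn = sqlite3.connect("ldss_warehouse.db")
sql = """
select count(distinct bibscreen.bib_key)
from bibscreen
     inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key
     inner join payl on bibscreen.bib_key = payl.bib_num
where itemscreen.charges > 0        -- circulated at least once
  and payl.fund_key like '%eng%'    -- hypothetical english-fund pattern
  and payl.paid_amt > 35;
"""
print(conn.execute(sql).fetchone()[0])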
[figures 4-7. ad hoc query wizard, steps 1-4. figure 8. query output. figure 9. listing of query output.]
conclusion
in this article, we presented the design and development of a library decision support system based on data warehousing and data mining concepts and techniques. we described the functions of the components of ldss. the screen scraping and data cleansing and extraction processes were described in detail, as was the process of importing data stored in luis as separate data files into the library data warehouse. the data contents of the warehouse can provide a very rich information source to aid library management in decision making. using the implemented system, a decision maker can use the gui to establish the warehouse and to activate the querying facility provided by microsoft access to explore the warehouse contents. many types of queries can be formulated and issued against the database. experimental results indicate that the system is effective and can provide pertinent information for aiding library management in making decisions. we have fully tested the implemented system using a small sample database. our ongoing work includes the expansion of the database size and the inclusion of a data mining component for association rule discovery. extensions of the existing gui and report generation facilities to accommodate data mining needs are expected.
acknowledgments
we would like to thank professor stanley su for his support and advice on the technical aspects of this project. we would also like to thank donna alsbury for providing us with the db2 data, and daniel cromwell for loading the db2 files, along with nancy williams and tim hartigan, for their helpful comments and valuable discussions on this project.
references and notes
1. john ladley, "operational data stores: building an effective strategy," data warehouse: practical advice from the experts (englewood cliffs, n.j.: prentice hall, 1997).
2. information on harvard university's adapt project, accessed march 8, 2000, www.adapt.harvard.edu/; information on the arizona state university data administration and institutional analysis warehouse, accessed march 8, 2000, www.asu.edu/data_admin/wh-1.html; information on the university of minnesota clarity project, accessed march 8, 2000, www.clarity.umn.edu/; information on the uc san diego darwin project, accessed march 8, 2000, www.act.ucsd.edu/dw/darwin.html; information on university of wisconsin-madison infoaccess, accessed march 8, 2000, http://wiscinfo.doit.wisc.edu/infoaccess/; information on the university of nebraska data warehouse, nulook, accessed march 8, 2000, www.nulook.uneb.edu/.
3. ramon barquin and herbert edelstein, eds., building, using, and managing the data warehouse (englewood cliffs, n.j.: prentice hall, 1997); ramon barquin and herbert edelstein, eds., planning and designing the data warehouse (upper saddle river, n.j.: prentice hall, 1996); joyce bischoff and ted alexander, data warehouse: practical advice from the experts (englewood cliffs, n.j.: prentice hall, 1997); jeff byard and donovan schneider, "the ins and outs (and everything in between) of data warehousing," acm sigmod 1996 tutorial notes, may 1996, accessed march 8, 2000, www.redbrick.com/products/white/pdf/sigmod96.pdf; surajit chaudhuri and umesh dayal, "an overview of data warehousing and olap technology," acm sigmod record 26(1), march 1997, accessed march 8, 2000, www.acm.org/sigmod/record/issues/9703/chaudhuri.ps; b. devlin, data warehouse: from architecture to implementation (reading, mass.: addison-wesley, 1997); u. fayyad and others, eds., advances in knowledge discovery and data mining (cambridge, mass.: mit press, 1996); joachim hammer, "data warehousing overview, terminology, and research issues," accessed march 8, 2000, www.cise.ufl.edu/~jhammer/classes/wh-seminar/overview/index.htm; w. h. inmon, building the data warehouse (new york: john wiley, 1996); ralph kimball, "dangerous preconceptions," accessed march 8, 2000, www.dbmsmag.com/9608d05.html; ralph kimball, the data warehouse toolkit (new york: john wiley, 1996); ralph kimball, "mastering data extraction," dbms magazine, june 1996 (provides an overview of the process of extracting, cleaning, and loading data), accessed march 8, 2000, www.dbmsmag.com/9606d05.html; alberto mendelzon, "bibliography on data warehousing and olap," accessed march 8, 2000, www.cs.toronto.edu/~mendel/dwbib.html.
4. daniel j. boorstin, "the age of negative discovery," cleopatra's nose: essays on the unexpected (new york: random house, 1994).
5. information on the arrow system, accessed march 8, 2000, www.fcla.edu/system/intro_arrow.html.
6. gary strawn, "batchbaming," accessed march 8, 2000, http://web.uflib.ufl.edu/rs/rsd/batchbam.html.
7. li-min fu, "domrul: learning the domain rules," accessed march 8, 2000, www.cise.ufl.edu/~fu/domrul.html.
appendix a. warehouse data tables (attribute and domain; reconstructed from the flattened three-column layout)
ufcirc_wh: bib_key text(50); status text(20); enum/chron text(20); midspine text(20); temp_locatn text(20); pieces number; charges number; last_use date/time; browses number; value text(20); invnt_date date/time; created date/time.
uford_wh: id autonumber; ord_num text(20); ord_div number; process_unit text(20); bib_num text(20); order_date date/time; mod_date date/time; vendor_code text(20); vndadr_order text(20); vndadr_claim text(20); vndadr_return text(20); vend_title_num text(20); ord_unit text(20); rcv_unit text(20); ord_scope text(20); pur_ord_prod text(20); action_int number; libspec1 text(20); libspec2 text(20); vend_note text(20); approv_stat text(20); ref text(20); copyctl_num number; medium text(20); piece_cnt number; div_note text(20); acr_stat text(20); rel_stat text(20); lst_date date/time; action_date text(20); libspec3 text(20); libspec4 text(20); encumb_units number; currency text(20); est_price number; encumb_outs number; fund_key text(20); fiscal_year text(20); copies number; xpay_method text(20); vol_isu_date text(20); title_author text(20); db2_timestamp date/time.
ufpay_wh: inv_key text(20); ord_num text(20); ord_div number; process_unit text(20); bib_key text(20); ord_seq_num number; inv_seq_num number; status text(20); create_date date/time; lst_update date/time; currency text(20); paid_amt number; usd_amt number; fund_key text(20); exp_class text(20); fiscal_year text(20); copies number; type_pay text(10); text text(20); db2_timestamp date/time; source text(20).
ufinv_wh: inv_key text(20); create_date date/time; mod_date date/time; vend_adr_code text(20); vend_code text(20); action_date text(20); vend_inv_date date/time; approver_id text(20); vend_inv_num text(20); inv_tot number; calc_tot_pymts number; calc_net_tot_pymts number; currency text(20); discount_percent number; vouch_note text(20); official_vend text(20); process_unit text(20); internal_note text(20); db2_timestamp text(20).
ufbib_wh: bib_key text(20); system_control_num text(50); catalog_source text(20); lang_code_1 text(20); lang_code_2 text(20); geo_code text(20); dewey_num text(20); edition text(20); pagination text(20); size text(20); series_440 text(20); series_490 text(20); content text(20); subject_1 text(20); subject_2 text(20); subject_3 text(20); authors_1 text(20); authors_2 text(20); authors_3 text(20); series text(20).

managing metadata for philatelic materials
megan ozeran
information technology and libraries | september 2017 | doi:10.6017/ital.v36i3.10022
abstract
stamp collectors frequently donate their stamps to cultural heritage institutions. as digitization becomes more prevalent for other kinds of materials, it is worth exploring how cultural heritage institutions are digitizing their philatelic materials. this paper begins with a review of the literature about the purpose of metadata, current metadata standards, and metadata that are relevant to philatelists. the paper then examines the digital philatelic collections of four large cultural heritage institutions, discussing the metadata standards and elements employed by these institutions. the paper concludes with a recommendation to create international standards that describe metadata management explicitly for philatelic materials.
introduction
postage stamps have existed since great britain introduced them in 1840 as a way to prepay postage.
historian and professor winthrop boggs (1955) points out that postage stamps have been collected by individuals since 1841, just a few months after the first stamps were issued (5). to describe this collecting and research, the term philately was coined by a french stamp collector, georges herpin, who "combined two greek words philos (friend, amateur) and atelia (free, exempt from any charge or tax, franked)" (boggs 1955, 7). thus postage stamps and related materials, such as the envelopes to which they have been affixed, are considered philatelic materials. in the united states, numerous societies have formed around philately, such as the american philatelic society, the postal history society, the precancel stamp society, and the sacramento philatelic society (in northern california). the definitive united states authority on stamps and stamp collecting for nearly 150 years has been the scott postage stamp catalogue, first created by john walter scott in 1867 (boggs 1955, 6). the scott catalogue "lists nearly all the postage stamps issued by every country of the world" (american philatelic society 2016). philately is a massively popular hobby, and cultural heritage institutions have amassed large collections of postage stamps through collectors' donations. in this paper, i examine how cultural heritage institutions apply metadata to postage stamps in their digital collections. libraries, archives, and museums have obtained specialized collections of stamps over the decades, and they have used various ways to describe these collections, such as creating finding aids. only recently have institutions begun to digitize their stamp collections and make them available for online review, as digitization in general has become more common in cultural heritage institutions. megan ozeran (megan.ozeran@gmail.com), a recent mlis degree graduate from san jose state university school of information, is winner of the 2017 lita/ex libris student writing award.
problem statement
textual materials have received much attention with regard to digitization, including the creation and implementation of metadata standards and schemas. philatelic materials are not like textual materials, and are not even like photographic materials, which have also received some digitization attention. in fact, very little literature currently exists describing how metadata is or should be applied to philatelic materials, even though digital collections of these materials already exist. therefore, the goal of this paper is to examine exactly how metadata is applied to digital collections of philatelic materials. several related questions drove the research about this topic: as institutions digitize stamp collections, what metadata schema(s) are they using to do so? are current metadata standards and schemas appropriate for these collections, or have institutions created localized versions? what metadata elements are most crucial in describing philatelic materials to enhance access in a digital collection?
literature review
while there is abundant literature regarding the use of metadata for library, archives, and museum collections, there is a dearth of literature that specifically discusses the use of metadata for philatelic materials.
indeed, there is no literature at all that analyzes best practices for philatelic metadata, despite the fact that several large institutions have already created digital stamp collections. even among the many metadata standards that have been created, very few specify guidelines for philatelic collections. it is clear that philatelic collections have not been highlighted in the discussions of digitization over the last few decades, so best practices must be inferred from the more general discussions that have taken place.
the purpose and quality of metadata
when considering why metadata is important to digital collections (of any type), it is crucial to remember, as david bade (2008) puts it, "users of the library do not need bibliographic records at all. . . . what they want is to find what they are looking for" (125). in other words, the descriptive metadata in a digital record is important only to the extent that it facilitates the discovery of materials that are useful to a researcher. as arms and arms (2004) point out, "most searching and browsing is done by the end users themselves. information discovery services can no longer assume that users are trained in the nuances of cataloging standards and complex search syntaxes" (236). echoing these sentiments, chan and zeng (2006) write, "users should not have to know or understand the methods used to describe and represent the contents of the digital collection" (under "introduction"). when creating digital records, then, institutions need to consider how the creation, display, and organization of metadata (especially within the search system) make it easier or more difficult for those end users to effectively search the digital collection. how effective metadata is in facilitating user research ultimately depends on the quality of that metadata. bade (2007) notes that information systems are essentially a way for an institution to communicate with researchers, and that this communication is effective only if metadata creators understand what the end users are looking for in the content and style of communication (3-4).
such standards are helpful to catalogers and digitizers because they define rules for how to include content, how represent content, and/or what the allowable content values are (chan and zeng 2006, under “metadata schema”). unfortunately, very few current metadata standards even mention philatelic materials, despite their unique nature. the only standard that appears to do so with any real purpose is the canadian rules for archival description (rad), created by the bureau of canadian archivists in 1990, and revised in 2008. thirteen chapters comprise the first part of the rad, and these chapters describe the standards for a variety of media. philatelic materials are given their own focus in chapter 12, which discusses general rules for philatelic description as well as specifics for each of nine areas of description: title and statement of responsibility, edition, issue data, dates of creation and publication, physical description, publisher’s series, archival description, note, and standard number. the rad therefore provides a decent set of guidelines for describing philatelic materials. the encoded archival description tag library created by the society of american archivists (ead3, updated in 2015) mentions philatelic materials only in passing. there is no specific section discussing how to properly apply descriptive metadata to philatelic materials. the single mention of such materials in the entire ead3 documentation is in the discussion of the tag, where it is noted that “jurisdictional and denominational data for philatelic records” (257) may be recorded. other standards don’t appear to mention philatelic materials at all, so implementers of those standards must extrapolate based on the general information provided. for example, describing archives: a content standard (dacs), also published by the society of american archivists (2013), does not discuss philatelic materials in any way. it does note, “different media of course require different rules to describe their particular characteristics…” (xvii), but the recommendations for specific content standards for different media listed in appendix b still leave out philately (141142). institutions using dacs for philatelic materials need to determine how to localize the standard. although marc similarly does not include specific guidelines for philatelic materials, peter roberts (2007) suggests ways to effectively use it for cataloging philatelic materials. for managing metadata for philatelic materials | ozeran | doi:10.6017/ital.v36i3.10022 10 instance, in the marc 655 field he suggests using the getty art and architecture thesaurus terms to describe the form of the materials and the library of congress subject headings to describe the subjects (genres) of the materials (86-87). in similar ways, most standards could potentially be applied to philatelic materials if an institution were to provide additional local rules for how to best implement the standard. the metadata that philatelists want there are actually a good number of resources for determining what metadata is important to philatelic researchers. boggs (1955) suggests that a philatelist may want to “study the methods of production; the origin, selection, and the subject matter of designs; their relation to the social, political and economic history of the country of issue; the history of the postal service which issued them” (1-2). these few initial research suggestions can provide some insight into what metadata elements would be most useful in a digital record. 
david straight (1994) suggests the most basic crucial items are the date and country of issue for an item (75). roberts (2007) provides significant background about philatelic materials and research, and indicates multiple metadata elements that will be helpful for researchers. he reiterates that dates are extremely useful and are often identified on the materials themselves; when specific dates are not visible, a stamp itself may provide evidence of an approximate year based on when the stamp was issued (75). he notes that many of the postal markings also "indicate the time and place of origin, route, destination, and mode of transportation" (78), which will also be of interest to philatelic researchers. if any information is available about the original collector, dealer, or exhibitor of the stamp before it was acquired by a cultural heritage institution, this may also be of great interest to a researcher (81). roberts also suggests that finding aids for philatelic collections are more crucial places for description than specific item records, and that controlled-vocabulary subject terms are important in these descriptions (86). because the scott postage stamp catalogue is the leading united states authority on stamps, it can also suggest the metadata elements that primarily concern philatelic researchers. each listing includes a unique scott number, paper color, variety (e.g., perforation differences), basic information, denomination, color of the stamp, year of issue, value used/unused, any changes in the basic set information, and the total value of the set (scott publishing co. 2014, 14a). the scott catalogue also describes a variety of additional components that researchers may be interested in, including the type of paper used, any watermarks, inks used, separation type, printing process used, luminescence, and gum condition (19a-25a). one additional interesting source for deciding what metadata is important to researchers (aside from directly surveying them, of course) is a piece of software created to help philatelists catalog their own private collections. stampmanage is available in united states and international versions, and it is largely based on the scott postage stamp catalogue in creating the full listing of stamps that may be available to a collector. it includes a wide variety of metadata elements for cataloging stamps, such as the scott number, country of origin, date of issue, location of issue, type of stamp, denomination, condition, color, brief description, presence and type of perforations, category, plate block size, mint sheet size, paper type, presence and type of watermark, gum type, and so forth (liberty street software 2016). as a product sold to stamp collectors, stampmanage is likely to have a confident grasp of all the metadata that could possibly be important to its customers.
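the elements that straight, roberts, the scott catalogue, and stampmanage converge on can be sketched as a simple record structure; this is a minimal illustration in python, with field names of my own choosing rather than any published schema:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StampRecord:
    # the two most basic crucial items per straight (1994)
    country_of_issue: str
    date_of_issue: str
    scott_number: Optional[str] = None   # the de facto u.s. identifier
    denomination: Optional[str] = None
    color: Optional[str] = None
    perforation: Optional[str] = None
    watermark: Optional[str] = None
    paper_type: Optional[str] = None
    gum_condition: Optional[str] = None
    postal_markings: list = field(default_factory=list)
    provenance: Optional[str] = None     # prior collector/dealer/exhibitor

# example: the penny black, great britain's first postage stamp
record = StampRecord(country_of_issue="great britain",
                     date_of_issue="1840",
                     scott_number="1",
                     denomination="one penny",
                     color="black")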
the information provided by these institutions sheds light on the current state of metadata and metadata schemas for philatelic collections. note that there are other institutions with online collections of postage stamps that are not discussed in this paper (e.g., the swedish postal museum, https://digitaltmuseum.se/owners/s-pm). due to my own language limitations, this paper is limited to analysis of online collections that are described in english. additional research into institutions with non-english displays would support greater analysis of how cultural heritage institutions are currently creating and providing philatelic metadata. results smithsonian national postal museum in the united states, the largest publicly accessible digital collection of philatelic materials is from the smithsonian national postal museum. i discussed the metadata for this collection with elizabeth heydt, collections manager at the museum. ms. heydt stated that the stamps are primarily identified “by their country and their scott number” (e. heydt, pers. comm., october 5, 2016). for digital collections, the smithsonian national postal museum uses a gallery systems database called the museum system, which includes the getty art and architecture thesaurus as an embedded thesaurus. ms. heydt noted that aside from this embedded thesaurus, they “do not use any additional, formalized data standards such as the dublin core, mods,” or the like. of note, the museum system does allege compliance with “standards including spectrum, cco, cdwa, dacs, chin, lido, xmp, and other international standards” (gallery systems 2015, 4). the end user interface that pulls data from the museum system is called arago, which has “an internal structure that built on the scott catalogue system and some internal choices for grouping and classifying objects for the philatelic and the postal history collections.” users can search and browse the entire digital collection through arago, but ms. heydt did note that arago “is in stasis right now as we are in the planning stages for an updated version sometime in the near future.” based on an example record (http://arago.si.edu/record_145471_img_1.html), the descriptive metadata currently available for end users include a title, scott number, detailed description (including keywords), date of issue, medium, museum id (a unique identifier), and place of origin. digital images of the stamps are also included. a set of “breadcrumb” links at the top of the page also allow a user to browse each level of the digital collection, from an individual stamp record up to the entire museum collection as a whole. managing metadata for philatelic materials | ozeran | doi:10.6017/ital.v36i3.10022 12 library and archives canada i discussed the library and archives canada (lac) online philatelic collection with james bone, archivist at the lac. he explained that the philatelic collection has had a complicated history: our philatelic collection largely began with the dissolution of the national postal museum … in 1989 and the subsequent division and transfer of its collection to the canadian postal museum for artifacts/objects at the former canadian museum of civilization (now the canadian museum of history) and to the canadian postal archives at the former national archives (which was merged with the national library in the mid-2000s to create library and archives canada). 
as a side note, both the canadian postal museum and the canadian postal archives are themselves now defunct – although lac still acquires philatelic records and records related to philately and postal administration, these functions are no longer handled by a dedicated section but rather by archivists within our government records branch and our private records branch (the latter being me). (j. bone, pers. comm., october 11, 2016)

regarding the collection's metadata, mr. bone confirmed that the archival records at the lac all conform to the rad standard (discussed in the literature review above), and that philatelic materials are all given "at least a minimum level of useful file level or item level description for philatelic records based on chapter 12 of rad," the chapter that specifically discusses philatelic materials. unfortunately, to his knowledge, the online database for these records does not support a common harvesting protocol such as oai-pmh that enables "external metadata harvesting or querying," so the system is not searchable outside of the lac website.

mr. bone also pointed out that there are fields visible on the back end of the lac online database that are not visible to end users, and the most notable of these omissions is the scott number (the number assigned to every stamp by the scott catalogue). he wrote that it seemed "bizarre" to not have the scott number visible, "as that's definitely an access point that i would expect philatelic researchers to use to narrow down a result set to the postage stamp issue of interest." however, it appears this invisibility was a decision consciously made by the lac, based on mr. bone's review of an internal lac standards document. based on an example record (http://collectionscanada.gc.ca/pam_archives/index.php?fuseaction=genitem.displayitem&lang=eng&rec_nbr=2184475), the following fields are available for end users to view: title, place of origin, denomination, date of issue, title of the collection of which it is a part, extent of item, language, access conditions, terms of use, mikan number (a unique identifier), itemlev number (deprecated), and any additional relevant information such as previous exhibitions of the physical item.

the postal museum

the postal museum in london is set to open its physical doors in 2017, but much of the collection is already available for browsing and searching online. stuart aitken, curator, philately, explained to me that the online collection uses the general international standard archival description, second edition, as the primary metadata schema, but the online collection also includes "non isad(g) fields for certain extra-specific data for our archive material, including philatelic material" (s. aitken, pers. comm., december 1, 2016). based on my own review of the isad(g) standards document (international council on archives 1999) and an example record from the postal museum's online collection (http://catalogue.postalmuseum.org/collections/getrecord/gb813_p_150_06_02_011_01_001#current), it appears nearly all the fields are based on the isad(g) standards. these fields include information such as date, level of description, extent of item, language, description, and conditions for access and reproduction. only the field for "philatelic number" appears to be extra.
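as a concrete, entirely hypothetical illustration of the field set just described, a minimal isad(g)-style item record with the postal museum's one extra philatelic field might be serialized as a simple mapping. the field names follow the summary above; every value is invented for illustration and does not describe a real catalogue record:

```python
# a hypothetical sketch of an isad(g)-derived item record as described above.
# field names follow the article's summary of the postal museum's display;
# all values are invented and do not describe a real catalogue record.
record = {
    "reference_code": "p/150/06/02/011/01/001",  # unique identifier for the item
    "date": "1953",
    "level_of_description": "item",
    "extent_of_item": "1 postage stamp",
    "language": "english",
    "description": "definitive issue, 2d value, mint condition",
    "conditions_of_access": "open",
    "conditions_of_reproduction": "contact the archive for permissions",
    "philatelic_number": "pn-0042",  # the one non-isad(g) field noted above
}
```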
there may be additional non-isad(g) fields that are not included in the postal museum example record above, but are included in other records when the extra information is available and relevant. each digital record also allows end users to submit tags for help with identification and search. no tags were already submitted on the example record reviewed above, but this is likely because the online collection is still rather new. of note, digital records are created at each archival level, from the broadest collection category down to the individual item (similar to the smithsonian national postal museum collection). to provide an additional way to browse the collection, a sidebar in each digital record shows where it exists in the hierarchy of collections and provides links to each broader collection of which the current record is a part.

the british museum

i reached out to the british museum to discuss the application of metadata to their online records for postage stamps, but at the time of this writing i have not received any response. however, some information can be gleaned from examining the website. unlike the other institutions reviewed in this paper, the british museum's online collection includes a wide variety of objects. postage stamps are therefore identified in the online collection by specifying "postagestamp" in the "object type" field, which likely uses a controlled vocabulary. based on an example record (http://www.britishmuseum.org/research/collection_online/collection_object_details.aspx?objectid=1102502&partid=1&searchtext=postage+stamp&page=1), each record for a postage stamp lists the museum number (a unique identifier), denomination, description, date issued, country of origin, materials, dimensions, acquisition name and date, department, and registration number (which appears to be the same as the museum number). digital images of the stamps are occasionally included. the collection website notes that the british museum is "continuing every day to improve the information recorded in it [the digital collection] and changes are being fed through on a regular basis. in many cases it does not yet represent the best available knowledge about the objects" (trustees of the british museum 2016a, under "about these records"). therefore, end users are encouraged to read the information in any given record with care, and to provide feedback if they have any additional information or corrections about an object.

the online collection is also offered in machine-readable format, via linked data and sparql, to encourage wider accessibility and use. the website advises,

the use of the w3c open data standard, rdf, allows the museum's collection data to join and relate to a growing body of linked data published by other organisations around the world interested in promoting accessibility and collaboration. the data has also been organised using the cidoc crm (conceptual reference model) crucial for harmonising with other cultural heritage data. the cidoc crm represents british museum's data completely and, unlike other standards that fit data into a common set of data fields, all of the meaning contained in the museum's source data is retained. (trustees of the british museum 2016b)

each digital object has rdf and html resources, as well as a sparql endpoint with an html user interface.
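for readers who have not used such an endpoint, the sketch below shows what a minimal query against a sparql service of this kind could look like. the endpoint url and the label-matching pattern are assumptions for illustration only; the museum's own documentation governs the real service:

```python
# a minimal sketch of querying a sparql endpoint such as the one described
# above. the endpoint url, the label-matching pattern, and the result fields
# are illustrative assumptions, not documented details of the museum's service.
import requests  # third-party http library, assumed installed

ENDPOINT = "http://collection.britishmuseum.org/sparql"  # assumed endpoint url
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?object ?label WHERE {
  ?object rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "postage stamp"))
} LIMIT 10
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},  # standard sparql protocol
)
for row in resp.json()["results"]["bindings"]:
    print(row["object"]["value"], "|", row["label"]["value"])
```

any endpoint that implements the standard sparql protocol can be queried this way, which is precisely the kind of machine access the discussion below notes the other three collections lack.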
discussion

the information from the four institutions above provides a starting point for examining best practices for philatelic metadata. in the following discussion, i will review the information in light of the research questions: important metadata elements, the standards that were implemented, and whether the standards that currently exist have been sufficient.

as explained in the literature review above, relevant metadata are crucial for enhancing end user research of digital records. this suggests that similarity of metadata across collections of the same type will improve users' ability to conduct their research. unfortunately, there are only a few descriptive metadata fields used across all four of the institutions reviewed in this paper. these fields include a title (sometimes used very loosely), the date of issue, the place of issue, a description, and a unique identifier. these fields certainly seem to be the absolute minimum necessary for identifying (and searching for) a postage stamp, since they are among the fields discussed in the literature review as being important to philatelic researchers. other fields that are included in some but not all of the above collections, such as stamp denomination and access conditions, are nonetheless quite relevant to online collections of postage stamps.

interestingly, although the scott catalogue is recognized as a premier stamp catalogue, only one institution (the smithsonian national postal museum) currently uses the scott identification number as part of the standard philatelic metadata. as noted above, the library and archives canada does include the scott number in the behind-the-scenes metadata, but it does not display the scott number to end users. the postal museum and the british museum don't use the scott number at all. it appears that only the smithsonian believes the scott number is useful to end users, either for search or identification purposes.

of the four institutions, it appears that only the british museum uses metadata standards that increase the accessibility of the online collection beyond its own website. the implementation of rdf for linked data creates an open collection that is machine-readable beyond the internal database used by the museum. the smithsonian national postal museum, library and archives canada, and the postal museum do not appear to use any similar metadata standard for data harvesting or transmission, which means that these collections can only be searched from within their respective websites.

the most important thing to note in reviewing the online collections for these four institutions is the fact that each institution uses different standards to apply metadata in a different way. frankly, this is not a surprise. as discussed in the literature review above, although metadata standards exist for a variety of materials, philatelic materials are simply not considered. only the canadian rules for archival description explicitly include information about philatelic materials; accordingly, the library and archives canada utilizes these rules when creating its online records of postage stamps. no similar standard exists in the united states or internationally, leaving individual institutions with the task of deciding what generic metadata standard to use as a jumping off point, and then modifying it to meet local needs.
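one way to picture the common denominator identified above is as a crosswalk from the five shared fields to a generic scheme such as dublin core. the mapping below is my own illustration, not a published crosswalk; institutions would have to agree on these choices themselves:

```python
# a hedged illustration of the five fields shared by all four collections,
# mapped to dublin core elements. the mapping itself is invented for this
# sketch; it is not an established or published crosswalk.
COMMON_TO_DUBLIN_CORE = {
    "title": "dc:title",
    "date of issue": "dc:date",
    "place of issue": "dc:coverage",  # dcterms:spatial would be another candidate
    "description": "dc:description",
    "unique identifier": "dc:identifier",
}

def to_dublin_core(local_record: dict) -> dict:
    """re-key a local philatelic record into dublin core terms, keeping only
    the shared minimum fields and dropping anything institution-specific."""
    return {
        COMMON_TO_DUBLIN_CORE[field]: value
        for field, value in local_record.items()
        if field in COMMON_TO_DUBLIN_CORE
    }
```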
as described above, the smithsonian national postal museum uses the metadata schema that comes with its collection management software, and has created an end-user interface based on internal metadata decisions. the postal museum based its metadata primarily on isad(g), an international metadata standard with no specific suggestions for philatelic materials. i was unable to confirm the base metadata schema the british museum employs, although it is clear they use rdf to make the collection's digital records more widely available. each institution appears to be using a different base metadata standard, essentially requiring them to reinvent the wheel upon deciding to digitize philatelic materials. this is what happens when there is no single, unified standard available for the type of material being described.

conclusion

as this paper has shown, metadata standards are sorely lacking when it comes to philatelic materials. other kinds of materials have received special consideration because more and more institutions decided it would be important to digitize them, so various groups came together to create standards that provide some guidance. it is time for this to happen for philatelic materials as well. there aren't many cultural heritage institutions that currently manage digital collections of philatelic materials, so this is an opportunity for those who plan to digitize their collections to consider what has been done and what makes sense to pursue. it is clear that philatelic digitization is still nascent, but as with other kinds of materials, it is likely that more and more institutions will attempt digitization projects. it is hoped that this paper can serve as a jumping off point for institutions to discuss the creation of international metadata standards specifically for philatelic materials.

acknowledgements

many thanks are owed to the people who took time out of their very busy lives to respond to the unrefined inquiries of an mlis grad student: stuart aitken (curator, philately, the postal museum); james bone (archivist, private archives branch, library and archives canada); and elizabeth heydt (collections manager, smithsonian national postal museum). their expertise and responsiveness are immensely appreciated.

references

aape (american association of philatelic exhibitors). 2016a. "aape join/renew your membership." http://www.aape.org/join_the_aape.asp.
–––––. 2016b. "exhibits online." http://www.aape.org/join_the_aape.asp.
american philatelic society. 2016. "stamp catalogs: your guide to the hobby." accessed december 8. http://stamps.org/how-to-read-a-catalog.
arms, caroline r., and william y. arms. 2004. "mixed content and mixed metadata: information discovery in a messy world." in metadata in practice, edited by diane i. hillman and elaine l. westbrooks, 223-37. chicago, il: ala editions.
bade, david. 2007. "structures, standards, and the people who make them meaningful." paper presented at the 2nd meeting of the library of congress working group on the future of bibliographic control, chicago, il, may 9, 2007. https://www.loc.gov/bibliographicfuture/meetings/docs/bade-may9-2007.pdf.
bade, david. 2008. "the perfect bibliographic record: platonic ideal, rhetorical strategy or nonsense?" cataloging & classification quarterly 46 (1): 109-33. https://doi.org/10.1080/01639370802183081.
boggs, winthrop s. 1955. the foundations of philately. princeton, nj: d. van nostrand company.
bruce, thomas r., and diane i. hillmann. 2004. "the continuum of metadata quality: defining, expressing, exploiting." in metadata in practice, edited by diane i. hillman and elaine l. westbrooks, 238-56. chicago, il: ala editions.
bureau of canadian archivists. 2008. rules for archival description. rev. ed. ottawa, canada: canadian council of archives. http://www.cdncouncilarchives.ca/archdesrules.html.
cc:da (american library association committee on cataloging: description and access). 2010. "task force on metadata: final report." american library association. https://www.libraries.psu.edu/tas/jca/ccda/tf-meta6.html.
chan, lois m., and marcia l. zeng. 2006. "metadata interoperability and standardization – a study of methodology part i: achieving interoperability at the schema level." d-lib magazine 12 (6). https://doi.org/10.1045/june2006-chan.
gallery systems. 2015. "tms: the museum system." http://go.gallerysystems.com/abouttms.html.
international council on archives. 1999. isad(g): general international standard archival description. 2nd ed. stockholm, sweden: international council on archives. http://www.icacds.org.uk/eng/isad(g).pdf.
liberty street software. 2016. "stampmanage: the best way to catalog your stamp collection." http://www.libertystreet.com/stamp-collecting-software.htm.
roberts, peter j. 2007. "philatelic materials in archival collections: their appraisal, preservation, and description." the american archivist 70 (1): 70-92. https://doi.org/10.17723/aarc.70.1.w3742751w5344275.
scott publishing co. 2014. scott 2015 standard postage stamp catalogue. vol. 3, countries of the world, g-i. sidney, oh: scott publishing co.
society of american archivists. 2013. describing archives: a content standard. 2nd ed. chicago, il: society of american archivists. http://files.archivists.org/pubs/dacs2e-2013_v0315.pdf.
society of american archivists. 2015. encoded archival description tag library, version ead3. chicago, il: society of american archivists. http://www2.archivists.org/sites/all/files/taglibrary-versionead3.pdf.
straight, david. 1994. "adding value to stamp and coin collections." library journal 119 (10): 75-78. accessed december 8, 2016. http://libaccess.sjlibrary.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=ulh&an=9406157617&site=ehost-live&scope=site.
trustees of the british museum. 2016a. "about the collection database online." accessed december 8. http://www.britishmuseum.org/research/collection_online/about_the_database.aspx.
–––––. 2016b. "british museum semantic web collection online." accessed december 8. http://collection.britishmuseum.org/.

letter from the core president: leadership and infrastructure and futures…oh my!

christopher cronin, information technology and libraries | december 2020. https://doi.org/10.6017/ital.v39i4.13027. christopher cronin (cjc2260@columbia.edu) is core president and associate university librarian for collections, columbia university. © 2020.

i am so pleased to be able to welcome all ital subscribers to core: leadership, infrastructure, futures! this issue marks the first of ital since the election of core's inaugural leadership. a merger of what was formerly three separate ala divisions—the association for library collections & technical services (alcts), the library & information technology association (lita), and the library leadership & management association (llama)—core is an experiment of sorts.
it is, in fact, multiple experiments in unification, in collaboration, in compromise, in survival. while initially born out of a sheer fight or flight response to financial imperatives and the need for organizational effectiveness, developing core as a concept and as a model for an enduring professional association very quickly became the real motivation for those of us deeply embedded in its planning. core is very deliberately not an all-caps acronym representing a single subset of practitioner within the library profession. it is instead an assertion of our collective position at the center of our profession. it is a place where all those working in libraries, archives, museums, historical societies—information and cultural heritage broadly—will find reward and value in membership and a professional home. all organizations need effective leaders, strong infrastructure, and a vision for the future. and that is what core strives to build with and for its members.

while i welcome ital's readers into core, i also welcome core's membership into ital. no longer publications of their former divisions, all three journals within core have an opportunity to reconsider their mandates. as with all things, audience matters. ital's readership has now expanded dramatically, and those new readers must be invited into ital's world just as much as ital has been invited into theirs. as we embark on this first year of the new division, we do so with a sense of altogether newness more than of a mere refresh, and a sense of still becoming more than a sense of having always been. and who doesn't want to reinvent themselves every once in a while? start over. move away from the bits that aren't working so well, prop up those other bits that we know deserve more, and venture into some previously uncharted territory. how will being part of this effort, and of an expanded division, reframe ital's mandate?

the importance of information technology has never been more apparent. it is not lost on me that we do this work in core during a year of unprecedented tumult. in 2020, a murderous global pandemic was met with unrelenting political strife, pervasive distribution of misinformation and untruths, devastating weather disasters, record-setting unemployment, heightened attention on an array of omnipresent social justice issues, and a racial reckoning that demands we look both inward and outward for real change. individually and collectively, we grieve so many losses—loss of life, loss of income, loss of savings, loss of homes, loss of dignity, loss of certainty, loss of control, loss of physical contact. and throughout all of these challenges, what have we relied on more this year than technology? technology kept us productive and engaged. it provided a focal point for communication and connection. it provided venues for advocacy, expression, inspiration, and, as a
counterpoint to that pervasive distribution of misinformation, it provided mechanisms to amplify the voices of the oppressed and marginalized. for some, but unfortunately not all, technology also kept us employed. and as the physical doors of our organizations closed, technology provided us with new ways to invite our users in, to continue to meet their information needs, and to exceed all of our expectations for what was possible even with closed physical doors.

and yet our reliance on and celebration of technology in this moment has also placed another critical spotlight on the devastating impact of digital poverty on those who continue to lack access, and by extension also a spotlight on our privilege. in her parting words to you in the final issue of ital as a lita journal (https://doi.org/10.6017/ital.v39i3.12687), evviva weinraub lajoie, the last president of lita, wrote:

we may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, "what are you going to do to change this?" balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. i believe that together, we can make core stand up to that challenge.

i believe we will do this, too, and with a spirit of reinvention that is guided by principles and values that don't just inspire membership but also improve our professional lives and experience in tangible ways. it was a privilege to have served as the final president of alcts and such a humbling and daunting responsibility to now transition into serving as core's first. it is a responsibility i do not take lightly, particularly in this moment when so much is demanded of us. as we strive for equity and inclusion, we do so knowing that we are only as strong as every member's ability to bring their whole selves to this work. we must work together to make our professional home everything we need it to be and to help those who need us. it is yours, it is theirs, it is ours.

public libraries leading the way: libraryvpn, a new tool to protect patron privacy

chuck mcandrew, information technology and libraries | june 2020. https://doi.org/10.6017/ital.v39i2.12391. chuck mcandrew (chuck.mcandrew@leblibrary.com) is information technology librarian, lebanon (nh) public libraries.

due to increased public awareness of online surveillance, a rise in massive data breaches, and spikes in identity theft, there is a high demand for privacy enhancing services. vpn (virtual private network) services are a proven way to protect online security and privacy. vpns' effectiveness and ease of use have led to a boom in vpn service providers globally. vpns protect privacy and security by offering an encrypted tunnel from the user's device to the vpn provider. vpns ensure that no one who is on the same network as the user can learn anything about their traffic except that they are connecting to a vpn. this prevents surveillance of data from any source, including commercial snooping such as your isp trying to monetize your browsing habits by selling your data, malicious snooping such as a fake wifi hotspot in an airport hoping to steal your data, or government-level surveillance that can target political activists and reporters in repressive countries.

some people might ask why we need a vpn as https becomes more ubiquitous and provides end-to-end encryption for your web traffic. https will encrypt the content that goes over the network, but metadata such as the site you are connecting to, how long you are there, and where you go next are all unprotected. additionally, some very important network protocols, such as dns, are unencrypted and anyone can see them. a vpn eliminates all of those issues.
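the dns point above is easy to demonstrate. the toy sketch below is not part of libraryvpn; it assumes the third-party dnspython package and simply shows that the hostname being looked up is readable in the raw bytes a stub resolver would send over the network:

```python
# a toy illustration of the cleartext-dns point made above. this is not
# libraryvpn code; it assumes the third-party dnspython package is installed.
import dns.message

# build a standard dns query for an a record, exactly as a stub resolver would
query = dns.message.make_query("example.org", "A")

# serialize it to the wire format that would be sent over udp port 53
wire_bytes = query.to_wire()

# the hostname labels are plainly visible in the unencrypted payload, so
# anyone on the same network segment can see what is being looked up
print(b"example" in wire_bytes)  # prints: True
```

inside a vpn tunnel, those same bytes are wrapped in an encrypted envelope, so an on-path observer sees only traffic to the vpn provider.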
however, there are two major problems with current vpn offerings. first, all reliable vpn solutions require a paid subscription. this puts them out of reach of economically vulnerable populations who often have no access to the internet in their homes. in order to access online services, they may rely on public internet connections such as those provided by restaurants, coffee shops, and libraries. using publicly accessible networks without the security benefits of a vpn puts people's security and privacy at great risk. this risk could be eliminated by providing free access to a high-quality vpn service.

the second problem is that using a vpn requires people to place their trust in whatever vpn company they use. some (especially free solutions) have proven not to be worthy of that trust by containing malware or leaking and even outright selling customer data. companies that abuse customer data are taking advantage of vulnerable populations who are unable to afford more expensive solutions or who do not have the knowledge to protect themselves. together, these two problems create a situation where having security and privacy is only available to those who can afford it and have the knowledge to protect themselves.

libraries are ideally positioned to help with this situation. libraries work to provide privacy and security to people every day. this can mean teaching classes, making privacy resources available, and even advocating for privacy-friendly laws.

libraries are also located in almost every community in the united states and enjoy a high level of trust from the public. librarians can be thought of as being a physical vpn. people who come into libraries know that what they read and information that they seek out will be protected by the library. in fact, libraries have helped to get laws protecting the library records of patrons in all 50 states of the usa. people know that when a library offers a service to their community it isn't because they want to sell their information or show them advertisements. with libraries, our patrons are not the product.

libraries also already provide many online services to all members of their community, regardless of financial circumstances. examples include access to online databases, language learning software, and online access to periodicals such as the new york times or consumer reports. many of these services would cost too much for individual patrons to access individually. by pooling their resources, communities are able to make more services available to all of their citizens.

to help address the above issues, the lebanon public libraries, in partnership with the westchester (new york) library system, the leap encryption access project (https://leap.se/), and tj lamanna (emerging technology librarian from cherry hill public library and library freedom institute graduate) started the libraryvpn project. this project will allow libraries to offer a vpn to their patrons. patrons will be able to download the libraryvpn application on a device of their choosing and connect to their library's vpn server from wherever they are.
libraryvpn was first conceived a number of years ago, but the real start of the project was when it received an imls national leadership grant (lg-36-19-0071-19) in 2019. this grant was to develop integrations between leap's existing vpn solution and integrated library systems using sip2, which will allow library patrons to sign in to libraryvpn using their library card. this grant also included development of a windows client (there was already a mac and linux client) and alpha testing at the lebanon public libraries and westchester library system.

we are currently working on moving into the testing phase of the software, and planning phase two of this project. phase two of libraryvpn will involve expanding our testing to up to 12 libraries and conducting end-user testing with patrons and library staff. we have submitted an application for imls funding for phase two and are actively looking for libraries that are excited about protecting patron privacy and would like to help us beta test this software. if you work for a library that would be interested in participating, you can reach us via email at libraryvpn@riseup.net or @libraryvpn on twitter. if you would like to help out with this project in another way, we would love to have more help. please reach out.

we currently are thinking about three deployment models for libraries in phase two. first would be an on-premises deployment. this would be for larger library systems with their own servers and it staff. libraryvpn is free and open source software and can be deployed by anyone. since it uses sip2 to connect to your ils, it should work with any ils that supports the sip2 protocol (a hedged sketch of what such an exchange can look like appears at the end of this piece). this deployment model has the advantage of not requiring any hosting fees but does require the library system to have staff that can deploy and manage public-facing services. drawbacks to this approach would include higher bandwidth use and dealing with abuse complaints. phase 2 testing should give us better data about how much of an issue this will be, but our experience hosting a tor exit node at the lebanon public libraries suggests that it won't be too bad to deal with.

our second deployment model would be cloud hosting. if a library has it staff who can deploy services to the cloud, they could host their own libraryvpn service without needing their own hardware. however, when deploying to the cloud, there will be ongoing costs for running the servers and bandwidth used. figuring out how much bandwidth an average user will consume is part of the data we are hoping to get from our phase 2 testing so we can offer guidelines to libraries who choose to deploy their own libraryvpn service.

finally, we are looking at a hosted version of libraryvpn. we anticipate that smaller systems that do not have dedicated servers or it staff will be interested in this option. in this case, there would be ongoing hosting and support costs, but managing the service would not be any more complicated than subscribing to any other service the library hosts for their patrons.

libraryvpn is a new project that is pushing library services outside of the library to where the library is. we want to make sure that all of our patrons are protected, not just those with the financial ability and technical know-how to get their own vpn service. as librarians, we understand that privacy and intellectual freedom are joined, and we want to maximize both.
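as promised above, here is a minimal, hypothetical sketch of the kind of sip2 patron status request an ils integration sends to verify a library card. this is not libraryvpn's actual code; the institution id, barcode, and pin are invented, and a real deployment would send the message to the ils over a socket and parse the response:

```python
# a hypothetical sketch of a sip2 patron status request (message code 23),
# the kind of exchange used to let patrons sign in with a library card.
# this is not libraryvpn's implementation; all field values are invented.
from datetime import datetime

def sip2_checksum(message: str) -> str:
    # sip2 checksum: 16-bit two's complement of the sum of the message bytes,
    # rendered as four uppercase hex digits
    return format((-sum(message.encode("ascii"))) & 0xFFFF, "04X")

def patron_status_request(patron_barcode: str, pin: str, institution: str) -> str:
    timestamp = datetime.now().strftime("%Y%m%d    %H%M%S")  # 18-char sip2 date
    # 23 = patron status request, 001 = english; AO/AA/AC/AD are sip2 fields
    # (institution id, patron id, terminal password, patron password)
    body = f"23001{timestamp}AO{institution}|AA{patron_barcode}|AC|AD{pin}|AY1AZ"
    return body + sip2_checksum(body) + "\r"

# invented example values; a real ils would answer with a 24 response message
print(patron_status_request("21234001234567", "9999", "LPL"))
```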
as the american library association's code of ethics says, "we protect each library user's right to privacy and confidentiality" (http://www.ala.org/tools/ethics).

public libraries leading the way: we can do it for free! using freeware for online patron engagement

karin suni and christopher a. brown, information technology and libraries | march 2021. https://doi.org/10.6017/ital.v40i1.13257. karin suni (sunik@freelibrary.org) is curator, theatre collection, the free library of philadelphia. christopher a. brown (brownc@freelibrary.org) is curator, children's literature research collection, the free library of philadelphia. © 2021. "public libraries leading the way" is a regular column spotlighting technology in public libraries.

in the early weeks of the pandemic, the special collections division of the free library of philadelphia (https://freelibrary.org/) responded to the library's call for fun and interactive online engagement. initially staff members released games and buzzfeed-inspired lists via various social media accounts to amuse patrons, distract from the lockdown, and provide educational programming. as the list of activities grew, we realized this content needed a more substantial home; the return on investment of time for the development and production of an online game to be released once on social media was not sufficient. activities and passive programming that took hours to create could easily fall victim to social media's algorithms and be quickly buried in a patron's feed. the free library's official blog was an insufficient option because it promoted all library programming, and our goal was to highlight the value of our division and the materials housed within it. we resolved these issues by creating an online repository solely with freeware systems (https://bit.ly/funwithflpspeccoll). the repository provides a stable landing page wherein the special collections division content builds meaningful connections with patrons of all ages. this model can be readily adapted and is a valuable tool for library workers promoting their own online engagement.

repository framework

it was clear that our division could not add to the burden of an overworked it staff by requesting support for digital engagement. we needed to seek external alternatives that would interest patrons and could be managed with limited training. before we began our search, we brainstormed a list of requirements:

• an inexpensive and user-friendly hosting platform
• a pleasing look and easy navigation
• the ability to be updated frequently and easily
• the flexibility to adapt and expand as our requirements change

our search led us to the google suite of products, specifically google sites and google drawings. google sites and google drawings integrated perfectly with each other, and we appreciated their usability and relative simplicity. once we selected the software, we knew we needed a list of best practices to guide the repository's creation:

● to establish a visual connection with our official website, the repository would primarily use the free library's branded color scheme.
● all thumbnails created would be square, allowing us to reuse the image as promotional material on different social media accounts.
● all members of the division can create content, but the ability to update and edit the repository would remain limited to ensure consistency.

these guidelines have proven effective. the color scheme and thumbnail rules formed a framework wherein we could work productively without "reinventing the wheel." limiting administrative abilities has allowed us to maintain a controlled vocabulary within the repository, better unifying the content.

repository software

the google suite, specifically google sites, is advantageous for library workers looking to create professional-looking content quickly. it is free with a google account, and built-in templates allow users to build a fully functional website within a few hours with little-to-no design experience. as with all freeware, google sites has quirks. the foremost is that while there are options for customization, these options are finite. there are a limited number of layout, header, and font designs, meaning that anyone using the software must temper their vision to fit within the confines of the program. google drawings is far more flexible, in part because it is a much simpler program. users familiar with software like powerpoint or ms paint have the ability to design images for headers, thumbnails, etc. two drawbacks we encountered with this freeware are the restrictions on image upload size (a consideration for our division given the archival files used in our digital collections) and the limited ability to create word art. for our division, the advantages of these software products outweigh their limitations.

content framework

the repository houses programming devised primarily with freeware. an early discovery was a suite of activities from flippity (https://www.flippity.net). designed for educational use, flippity provides templates for a variety of online activities including memory games, matching games, and board games. our primary focus has been on the first two, although we continue to explore new aspects of this suite as templates are added. flippity works with google sheets and can integrate images from google drawings.

jigsaw planet (https://jigsawplanet.net/) has been used extensively by libraries and museums during the pandemic. it allows creators to easily turn images into puzzles that are played online, either on the site itself or through embedding the puzzle. the site allows registered users to access leaderboards, and it allows creators to track how many times puzzles have been played. in addition to the ease of use, the major benefit of jigsaw planet is that the patron can customize their experience by changing the number of pieces to fit their preferred level of difficulty.

the desire for audio and video content has surged over the last several months, and we have sought to meet that need through the use of a variety of software. in regard to video, youtube is not a new tool, but the majority of our pre-pandemic programs were not filmed. with the shift to crowdcast and zoom, we now have a library of online lectures and other events that have been uploaded to youtube and can be viewed repeatedly and at any time. with a dedicated home for this content, we have been inspired to seek out older videos of special collections programming across multiple channels and link them to the repository.
one of the newest additions to our offerings has been the podcast story search from special collections (http://bit.ly/flpstorysearch), which explores stories based on, inspired by, or connected to material artifacts. the podcast is recorded and edited using zencastr and audacity and is posted on anchor, which also distributes it to major listening apps.

in recent weeks, our division has added images, blog posts, and additional content for current and past exhibitions. this is the first formal exhibition compilation since the special collections division began in 2015, and we are delighted that it is available for the public to explore. the material is arranged using templates and tools available in google sites, allowing patrons to view image carousels, exhibition tags, and past programs. the inclusion of this material marks a shift away from the repository functioning as a response to the need for pandemic-related content to a living history of our division and our work promoting the special collections of the free library.

accessibility

accessibility and equity of access lie at the core of library service. sadly, we were not initially focused on this point, and our content was not fully accessible, e.g., text was presented in thumbnails only, which limited the use of screen readers to relay information. as the content expanded, we sought to make the space as inclusive as the freeware limits allowed. alternative text was added to images and information was not limited within thumbnails. this is an ongoing process, but one that is necessary to reach as many patrons as possible.

analytics

site visits and other statistics for a library's online presence are always important, but especially so during the pandemic when restricted physical access has driven more patrons to online resources. our plan for capturing this information was two-pronged. first, we used bit.ly to create customized, trackable links for our content. these are used within the repository and on social media and in other online promotions. this has proven to increase repository traffic while providing information on how patrons discover our content. the statistics generated from bit.ly are only available for 30 days for free accounts, albeit in a rolling 30-day window. knowing this, we transcribe the statistics monthly into a spreadsheet to maintain a consistent account of patron access (a sketch of automating that transcription appears below). our second prong is google analytics, a freeware option that only tracks data within the repository. google analytics connects a single google account to google sites, but the integration is seamless and the data remains available indefinitely. this provides a visual breakdown of statistics, including maps and graphs that are easily shared with other stakeholders. by using both tools we are able to surmise who is visiting the repository, where they are finding the links, and which sections are popular with our patrons.
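the monthly transcription described above could plausibly be scripted. the sketch below assumes bitly's v4 api ("GET /v4/bitlinks/{bitlink}/clicks"); the token and link values are placeholders, and bitly's current documentation governs the real endpoint and response shape:

```python
# a hedged sketch of automating the monthly bit.ly transcription described
# above, assuming bitly's v4 api. the token is a placeholder; verify the
# endpoint and response shape against bitly's current documentation.
import csv
import requests  # third-party http library, assumed installed

TOKEN = "YOUR-BITLY-ACCESS-TOKEN"  # placeholder credential
BITLINK = "bit.ly/funwithflpspeccoll"  # the repository link used above

resp = requests.get(
    f"https://api-ssl.bitly.com/v4/bitlinks/{BITLINK}/clicks",
    params={"unit": "day", "units": 30},  # the rolling 30-day window noted above
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()

# append each day's click count to a running spreadsheet file
with open("bitly_stats.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for entry in resp.json().get("link_clicks", []):
        writer.writerow([entry["date"], entry["clicks"]])
```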
conclusion

the special collections repository was created in response to a growing need for online patron engagement during the early weeks of the pandemic. our division strove to engage the public with fun, educational programming and activities primarily using freeware. this has proven to be successful with the general public and members of our division. the statistics from the site have both informed content creation and engendered a better appreciation for the repository from our administration. as we move forward, the repository is evolving into a comprehensive collection of what the special collections division does and how we meet the need for patron engagement online and in person. it is a framework that can be used by library workers across a multitude of areas and specialties, housing activities from story times and passive programming to book clubs and lectures.

primo new user interface: usability testing and local customizations implemented in response

blake lee galbreath, corey johnson, and erin hvizdak, information technology and libraries | june 2018. https://doi.org/10.6017/ital.v37i2.10191. blake lee galbreath (blake.galbreath@wsu.edu) is core services librarian, corey johnson (coreyj@wsu.edu) is instruction and assessment librarian, and erin hvizdak (erin.hvizdak@wsu.edu) is reference and instruction librarian, washington state university.

abstract

washington state university was the first library system of its 39-member consortium to migrate to primo new user interface. following this migration, we conducted a usability study in july 2017 to better understand how our users fared when the new user interface deviated significantly from the classic interface. from this study, we learned that users had little difficulty using basic and advanced search, signing into and out of primo, and navigating their account. in other areas, where the difference between the two interfaces was more pronounced, study participants experienced more difficulty. finally, we present customizations implemented at washington state university to the design of the interface to help alleviate the observed issues.

introduction

a july 2017 usability study by washington state university (wsu) libraries was the final segment of a six-month process for migrating to the new user interface of ex libris primo called primo new ui. wsu libraries assembled a working group in december 2016 to plan for the migration from the classic interface to primo new ui and met bi-weekly through may 2017. to start, the primo new ui working group attempted to answer some baseline questions: what can and cannot be customized in the new interface? how, and according to what timeline, should we introduce the new interface to our library patrons? what methods could be used to assess the new interface? this working group customized the look and feel of the new interface to conform to wsu branding and then released a beta version of primo new ui in march, leaving the older interface (primo classic) as the primary means of access to primo but allowing users to enter and test the beta version of the new interface. in early may (at the start of the summer semester), the prominence of the old and new interfaces was reversed, making primo new ui the default interface but leaving the possibility of continued access to primo classic. the older interface was removed from public access in mid-august, just prior to the start of the fall semester. the public had the opportunity to work with the beta version from march to may and then another two months of experience with the production release by the time the usability study took place in july 2017. the remainder of this paper will focus on the details of this usability study.
research questions

primo new ui was the name given to the new front end of the primo discovery layer, which was made available to customers in august 2016. according to ex libris, "its design is based on user studies and feedback to address the different needs of different types of users."1 we were primarily interested in understanding the usability of the essential functionalities of primo new ui, especially where the design of the new interface deviated significantly from the classic interface (taking local customizations into account). for example, we noted that the new interface introduced the following differences to the user (this ordinal list corresponds to the number labels in figure 1):

1. basic search tabs were expressed as drop-downs.
2. the advanced search link was less prominent than it was with our customized shape and color in the classic interface.
3. main menu items were located in a separate area from the sign in and my account links.
4. my favorites and help/chat icons were located together and in a new section of the top navigation bar.
5. sign in and my account links were hidden beneath a "guest" label.
6. facet values were no longer associated with checkboxes or underlining upon hover.
7. availability statuses were expressed through colored text.

figure 1. basic search screen in primo new ui.

we also observed a fundamental change in the structure of the record in primo new ui: the horizontally oriented and tabbed structure of the classic record (see figure 2) was converted to a vertically oriented and non-tabbed structure in the new interface (see figure 3). additionally, the tabbed structure of the classic interface opened in a frame of the brief results area, while the same information was displayed on the full display page of the new interface. the options displayed in these areas are known as get it and view it (although we locally branded our sections availability and request options and access options, respectively). therefore, we were eager to see how this change in layout might affect a participant's ability to find get it and view it information on the full display page.

taking the above observations into account, we formulated the following questions:

1. will the participant be able to find and use the basic search functionality?
2. will the participant be able to understand the availability information of the brief results?
3. will the participant be able to find and use the sign in and sign out features?
4. will the participant be able to understand the behavior of the facets?
5. will the participant be able to find and use the actions menu? (see the "send to" boxed area in figure 3.)
6. will the participant be able to navigate the get it and view it areas of the full display page? (see the "availability and request options" boxed area in figure 3.)
7. will the participant be able to navigate the my account area?
8. will the participant be able to find and use the help/chat and my favorites icons?
9. will the participant be able to find and use the advanced search functionality?
10. will the participant be able to find and use the main menu items? (see figure 1, number 3.)

figure 2. horizontally oriented and tabbed layout of primo classic.

literature review

2012 witnessed a flurry of studies involving primo classic.
majors compared the experiences of users within the following discovery interfaces: encore synergy, summon, worldcat local, primo central, and ebsco discovery service. the study used undergraduate students enrolled at the university of colorado and focused on common undergraduate searching activities. each interface was tested by five or six participants who also completed an exit survey. observations specific to the primo interface noted that users had difficulty finding and using existing features, such as email and e-shelf, and difficulty connecting their failed searches to interlibrary loan functionality.2

figure 3. vertically oriented and non-tabbed layout of primo new ui.

comeaux noted issues relating to terminology and the display of services during usability testing carried out at tulane university. twenty people, including undergraduates, graduates, and faculty members, participated in this study, which tested five typical information-seeking scenarios. the study found several problems related to terminology. for example, participants did not fully understand the meaning of the expand my results functionality.3 participants also did not understand that the display text "no full-text" could be used to order an item via interlibrary loan.4 the study also concluded that the mixed presentation of differing resource types (e.g., books, articles, reviews) was confusing for patrons who were attempting known-item searches.5

jarrett documented a usability study conducted at flinders university library. the aims of the study were to determine user perceptions regarding the usability of the discovery layer, the relevance of the information retrieved, and the user experiences of this search interface compared to other interfaces.6 the usability portion of the study scored the participants' completion of tasks in the primo discovery layer as difficult, confusing, neutral, or straightforward. scores indicated that participants had difficulty determining different editions of a book, locating a local thesis, and placing an item on hold. the investigators also observed that students had issues signing into primo and distinguishing between journals and journal articles.7

nichols et al. conducted a usability test on a newly implemented primo instance at the university of vermont libraries in 2012. their research questions were designed to understand primo's design, functionality, and layout.8 the majority of the participants were undergraduate students. similar to comeaux, confusion occurred when participants had to find specific or relevant records within longer sets of results.9 nichols et al. also noticed that test subjects had difficulty navigating and finding information in the primo tabbed structure. like jarrett, nichols et al. noted that participants had difficulty distinguishing between journals and articles.10 similar to majors, participants in nichols et al. had difficulty finding certain primo functionality, such as email, the e-shelf, and the feature to open items in a new window.11 the investigators concluded that these tools were difficult to find because they were buried too deep in the interface.

the university of kansas libraries conducted two usability studies on primo.
the first study took place during the 2012–13 academic year and involved 27 participants, including undergraduate, graduate, and professional students, who performed four to five main tasks in two separate sessions. similar to other studies, participants experienced great difficulty using the save to e-shelf and email citation tools.12 kliewer et al. conducted the second usability study in 2016, which focused primarily on student satisfaction with the primo discovery tool. thirty undergraduates participated in this study that collected both qualitative and quantitative data. in contrast to most usability studies of discovery services, this study allowed participants to explore primo with open-ended searches to more closely mimic natural searching strategies. results of the study indicated that the participants preferred basic search to advanced search, used facets (but not enough to maximize their searching potential), rarely moved beyond the first page of search results, and experienced difficulties using the link resolver. in response to the latter, a primo working group clarified language on the link resolver page to better differentiate between links to articles and links to journals.13

brett, lierman, and turner conducted a usability study at the university of houston libraries focusing primarily on undergraduate students. users were able to complete the assigned tasks, but the majority did not do so in the most efficient manner. that is, the participants did not take full advantage of primo functionality, such as facets, holds, and recalls. additionally, some participants exhibited difficulty deciphering among the terms journals, journal articles, and newspaper articles. another difficulty participants experienced was knowing what further steps to take once they had successfully found an item in the results list. for example, participants had trouble locating stacks guides, finding request features, and using call numbers. the researchers concluded that many of the issues witnessed in this usability study could be mitigated via library instruction.14

usability testing of primo new ui has recently begun to gain a foothold in academic libraries. in addition to conducting usability testing on primo classic in april 2015 (5 participants, 5–6 tasks), researchers at boston university carried out both pre- and post-launch testing of the new interface in december 2016 and april 2017, respectively. pre-launch testing with five student participants identified issues with "labelling, locating links to online services, availability statement links in full results, [and] my favorites."15 after completing fixes, post-launch testing with four students (2 infrequent users, 2 frequent) found that they were able to easily complete tasks, use filters, save results, and find links to online resources. usage statistics for the new interface, compared to classic, also showed an increased use of facets after fixes, and an increase in the use of some features but a decrease in the use of others, providing information on what features warranted further examination.16

california state university (csu) libraries conducted usability studies on primo new ui with 24 participants (undergraduate students, graduate students, and faculty) across five csu campuses.
five standard tasks were required: find a specific book, find a specific film, find a peer-reviewed journal article, find an item in the csu network not owned locally, and find a newspaper article. each campus added additional questions based on local needs. participants were overwhelmingly positive about the interface look and feel, ease of use, and speed of the system. the success rate for each task varied across the campuses, with participants having greater success on simple tasks such as finding a specific or known item and mixed results on more difficult tasks including using scopes, understanding icons and elements of the frbr record, and facets. steps were taken to relabel and rearrange the scopes and facets so that they were more meaningful to users, and frbr icons were replaced. the authors concluded that primo is an ideal solution to incorporate both global changes and local preference because of its customizability.17

university of washington libraries conducted usability studies on the classic and new primo interfaces. the primo new ui study observed 12 participants. each 60-minute session included an orientation, pre- and post-tests, tasks, and follow-up questions. difficulties were noted with terminology, the site logo, the inability to select multiple facets, unclear navigation, volume requesting, advanced search logic, the pin location in item details, and the date facet. a/b testing with 12 participants (from both the new and classic ui studies) revealed the need to fix the sign-in prompt for my favorites, enable libraries to add custom actions to the actions menu, add a sort option for favorites in the new interface, add the ability to rearrange elements on a single item page, and add zotero support. overall, participants preferred the new interface. generally, participants easily completed basic tasks, such as known-item searches, searches for course reserves, and open searches, but had more difficulty with article subject searching, audio/visual subject searching, and print-volume searching, which was consistent from the classic to the new interfaces for student participants.18

method

we conducted a diagnostic usability evaluation of primo new ui using eight participants, whom we recruited from the wsu faculty, staff, and student populations. in the end, we received a skewed distribution among the categories: three members of staff and five students (two undergraduate students and three graduate students). the initial composition of the participants comprised a greater number of undergraduate students, but substitution created the final makeup. all the study participants had some exposure to primo classic in the past. we recruited participants by hanging flyers around the libraries of our pullman campus and the adjoining student commons area. we offered the participants $15 in exchange for their time, which we advertised as being a maximum of one hour. the usability test was designed by a team of three library staff, one from systems (it) and two from research services (reference/instruction). two of us were present at each session, one to read the tasks aloud and the other to document the session. we used camtasia to record each session so that we would have the ability to return to it later if we needed to verify our notes or other specifics of the session. we stored the recordings on a secured share of the internal library drive. we received an institutional review board certificate of exemption (irb #16190) to conduct this study.
this usability test comprised eleven tasks (see appendix a) to test the research questions described above. the tasks were drafted in consultation with the ex libris set of recommendations for conducting primo usability testing.19 each investigator drew their own conclusions as to the participants’ successes and failures. we then met as a group to form a consensus regarding task success and failure (see appendix b). we met to discuss the patterns that emerged and to formulate remedies to problems we perceived as hindering student success.

results

for each of the ten research questions below, consult appendix b to see details regarding the associated tasks and how each participant approached and completed each task.

task set(s) related to research question 1: will the participant be able to find and use the basic search functionality?

this was one of the easier tasks for the participants to complete. some participants did not follow the task literally to find their favorite book or movie, but rather completed a search for an item or topic of interest to them. all the participants completed this task successfully.

task set(s) related to research question 2: will the participant be able to understand the availability information of the brief results?

the majority of the participants understood that the availability text and its color represented important access information. however, there were instances where the color of the availability status was in conflict with its text. this led at least one participant to evaluate the availability of a resource incorrectly.

task set(s) related to research question 3: will the participant be able to find and use the sign in and sign out features?

the participants all successfully completed this task. participants used multiple methods to sign in: the guest link in the top navigation bar, the sign in link from the ellipsis main menu item, and the get it sign in link on the full display page. all participants signed out via the user link in the top navigation bar.

task set(s) related to research question 4: will the participant be able to understand the behavior of the facets?

almost all of the participants were able to select the articles facet without issue. one person, however, misunderstood the include behavior of the facets. instead of using the include behavior, this participant used the exclude behavior to remove all facets other than the articles facet. only two participants attempted to use the print books facet to complete the task, “from the list of results, find a print book that you would need to order from another library.” instead, the other 75 percent simply scanned the list of results to find the same information. five out of the eight participants attempted to find the peer-reviewed facet when completing the task to choose any peer-reviewed article from a results list: three were successful, while one selected the newspaper articles facet, and another selected the reviews facet.

task set(s) related to research question 5: will the participant be able to find and use the actions menu?

the tasks related to the actions menu (copy a citation and email a record) were some of the most difficult for the participants: two were successful, three had some difficulty, and three were unsuccessful.
of those who experienced difficulty, one seemed not to understand the task fully; this participant found and copied the citation, but then spent additional time looking for a “clipboard.” the other two participants were both distracted by competing areas of interest: the citations section of the full display and the section headings of the full display. of those who were unsuccessful, one suffered from a technical issue that ex libris needs to resolve (the functionality to expand the list of action items failed), one did not seem to understand what a citation was when they found it, and another could not find the email functionality. this last subject continued searching in the ellipsis area of the main menu, in the my account area, and in the facets, but ultimately never found the email icon in the scrolling section of the actions menu.

task set(s) related to research question 6: will the participant be able to navigate the get it and view it areas of the full display page?

three participants experienced substantial difficulty in completing this set of tasks. these participants were distracted by the styled show libraries and stack chart buttons on the full display page, which were competing for attention with the requesting options.

task set(s) related to research question 7: will the participant be able to navigate the my account area?

all of the participants completed this task successfully. four participants located the back-arrow icon to exit the my account area, while the other four participants used alternate methods: using the library logo, selecting the new search button, and signing out of primo.

task set(s) related to research question 8: will the participant be able to find and use the help/chat and my favorites icons?

participants encountered very little difficulty in finding a way to procure help and chat with a librarian, with one exception. participant 2 immediately navigated to and opened our help/chat icon, but then moved away from this service because it opened in a new tab. this same participant, along with three others, had a more difficult time finding and deciding to use the pin this item icon than did the three participants who completed the same task with ease. the remaining participant failed to complete this task because they could not find the my favorites area of primo.

task set(s) related to research question 9: will the participant be able to find and use the advanced search functionality?

one participant had more trouble finding the advanced search functionality than the other seven. another experienced a technical difficulty, in which the primo screen froze during the experiment, and we had to begin the task anew. the remaining six people easily finished the tasks.

task set(s) related to research question 10: will the participant be able to find and use the main menu items?

the majority of the participants completed this task with ease, navigating to the databases link in the main menu items. one participant, however, was confused by the term database but was able to succeed once we provided a brief definition of the term. the remaining two participants were further confused by the term and instead entered general search terms into the primo search bar. these two participants failed to find the list of databases.
discussion

study participants completed four of our task sets with relative ease: using basic search (see research question 1 above), signing into and out of primo (see research question 3 above), navigating their my account area (see research question 7 above), and using advanced search (see research question 9 above). there was one exception: one participant experienced minor trouble finding the advanced search link, checking first among the drop-down options on our basic search page. subsequently, and unrelated to this study, wsu elected to eliminate the first set of drop-down options from our primo landing page. further testing might tell us whether this reduction in the number of drop-down options has made the advanced search link more prominent for users. also, the ease with which participants were able to use items located underneath the “guest” label contradicted our expectations. we predicted that this opacity would cause issues for users, but it did not seem to deter them. from this, we concluded that the placement of the sign in options in the upper right corner is sufficient to maintain continuity.

participants encountered a moderate degree of difficulty completing two task sets: determining availability statuses and navigating the get it area of the full display page. concerning availability, participants were quick to understand that statuses such as “check holdings” relayed that the item was not available. the participants were also keen to notice that green availability statuses implied access while non-green availability statuses implied non-access. however, per the design of the new interface, certain non-green links became green after opening the full display page of primo. this was a significant deviation from the classic interface, where colors indicating availability status did not change. this design element misled one participant. of note, we did not observe participants experiencing issues with the converted format of the get it and view it areas (see figures 2 and 3) per se. however, we did notice that three of our participants were unnecessarily distracted by the show libraries link when trying to find resource sharing options because wsu had previously styled the show libraries links with color and shape. therefore, our local branding in this area impeded usability and led us to rethink the hierarchy of actions on the full display page. similar to comments made by demars, study participants also remarked that the layout of the full display was cluttered and difficult to read.20 we therefore took steps to make this page more readable for the viewer.

study participants displayed the greatest difficulty completing the remaining four task sets: selecting a main menu item, refining a search via the facets, using the actions menu, and navigating the my favorites functionality. however, web design was not necessarily the culprit in all four areas. three participants experienced difficulty finding the databases link (a main menu item). after further discussion, it became apparent that this trouble related not to usability but to information literacy—they did not understand the term databases. therefore, like majors and comeaux,21 we recognize the recurring issue of library jargon, and like brett, lierman, and turner,22 we believe that this issue would best be mitigated via library instruction.
in agreement with the literature, two participants selected the incorrect facet because they had difficulty distinguishing among the terms articles, newspaper articles, reviews, and peer-reviewed.23 further, one of these participants experienced even more difficulty because they did not understand the inherent functionality of the facet values. that is, this participant did not grasp that the facet value links performed an inclusion process by default. on the contrary, this person believed that they would have had to exclude all unwanted facet values to arrive at the wanted facet value. the change in facet behavior between the classic and new interfaces likely caused this confusion. in primo classic, wsu had installed a local customization that provided checkboxes and underlining upon hover for each facet value. the new interface did not provide either of these cues to the user. additionally, we observed, similar to kliewer et al. and brett, lierman, and turner, that participants often preferred to scan the results list over refining their search via faceting.24 this finding also matches a 2014 ex libris user study indicating that users are easily confused by too many interface options and thus tend to ignore them.25

regarding the actions menu, the majority of the participants attempted to find the email icon in the correct section of the full display page (i.e., the “send to” section). however, because of a technical issue in the design of the new interface, the email icon was not always present for the participant to find. for others, it was difficult to reach the icon even when it was present, as participants had to click the right arrow three to four times to navigate past all the citation manager icons. this observed difficulty in finding existing functionalities in primo echoes that cited by majors and nichols et al.26 participants also experienced significant difficulty distinguishing between the similarly named functionalities of the citation icon and the citations section of the full display page. as a result of this observed difficulty, we concluded that differentiating sections of the page with distinct naming conventions would be beneficial to users.

like the results reported by boston university, our study participants encountered significant issues when trying to save items into their my favorites list.27 we noticed that participants had difficulty making connections between the icons named keep this item/remove this item and the my favorites area. during testing, it was clear that many of the participants were drawn to the pin icon for the correctly anticipated functionality but then were confused that the tooltips did not include any language resembling “my favorites.” from this last observation, we surmised that providing continuity in language between these icons and the my favorites area would increase usability for our library patrons. pepitone reported problems with the placement of the my favorites pin icon,28 but we observed this to be less of a problem than the actual terminology used to name the pin icon.

beyond success and failure, a 2014 ex libris user study suggested that academic level and discipline play a key role in user behavior.29 however, we were unable to draw meaningful conclusions among user groups because of our small and homogeneous participant pool.
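as an aside, the kind of classic-interface facet cue described above can be approximated with a few lines of css. the sketch below is purely illustrative: the .facet-value class is a hypothetical placeholder, not primo’s actual markup, which varies by version and local configuration.

/* illustrative sketch only: restore clickability cues for facet values.
   .facet-value is a hypothetical placeholder, not primo’s real selector. */
.facet-value:hover {
  text-decoration: underline; /* signal on hover that facet values are clickable */
}
.facet-value::before {
  content: "☐ "; /* checkbox-style glyph, approximating the classic-ui customization */
}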
decisions made in response to usability results

declined to change

facets. although one participant did not understand the inclusion mechanism of the facet values, we declined to investigate a customization in this area. according to the primo august 2017 release notes, ex libris plans to make considerable changes to the faceting functionality.30 therefore, we decided to wait until after this release to reassess whether customization was warranted.

implemented a change

labels

citations. we observed confusion between the citation icon of the actions menu and the section of the full display page labeled “citations.” to differentiate between the two items, we changed the actions menu icon text to “cite this item” (see figure 4) and the heading for the citations section to “references cited” (see figure 5).

figure 4. cite this item icon of the actions menu.
figure 5. references cited section of the full display page.

my favorites. there was a mismatch among the tooltip texts of the my favorites icons. we changed the tooltip language for the “keep this item” pin to read “add to my favorites” (see figure 6) and the tooltip language for the “unpin this item” pin to read “remove from my favorites” (see figure 7).

figure 6. add to my favorites language for my favorites tooltip.
figure 7. remove from my favorites language for my favorites tooltip.

availability statuses. per the design of the new interface, certain non-green links became green after opening the full display page of primo new ui. we implemented css code to retain the non-green coloring of the availability statuses after opening the full display. in this case, “check holdings” remains orange (see figure 8).

figure 8. availability status color of brief display, before and after opening the full display.

link removal

full display page headings. there was confusion as to the function of the headings on the full display page. these are anchor tags, but patrons clicked on them as if they were functional links. no patrons used the headings successfully. therefore, we hid the headings section via css (see figure 9).

figure 9. removal of headings on full display page.

links to other institutions. we observed participants attempting to use the links to other institutions to place resource sharing requests. therefore, we removed the hyperlinking functionality of the links in the list via css (see figure 10).

figure 10. neutralization of links to other institutions.

prioritized the emphasis of certain functionalities

request options and show libraries buttons. it is usually more important to be able to place a request than to find the names of other institutions that own an item. however, the show libraries button was originally styled with crimson coloring, which drew unwarranted attention, while the requesting links were not styled at all. therefore, we added styling to the resource-sharing links and removed styling from the show libraries button via css (see figure 11).

figure 11. resource sharing link styled in crimson; show libraries button with styling removed.

e-mail icon. we observed that the e-mail icon of the actions menu was difficult to find. therefore, we decreased the number of icons and moved the emailing functionality to the left side of the actions menu (see figure 12).

figure 12. email icon prioritized over citation manager icons.
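to give a concrete sense of the css work described in this section, the following sketch shows the general shape of such overrides. every selector name below is a hypothetical placeholder rather than primo’s actual class name (which varies by version and local package), and the color values are approximations, not wsu’s production code.

/* keep the “check holdings” availability status orange after the full display opens */
.full-view .availability-status.check-holdings {
  color: #e65100 !important; /* orange; overrides the interface’s green restyling */
}

/* hide the non-functional anchor-tag headings on the full display page */
.full-view .section-headings {
  display: none;
}

/* neutralize the links to other institutions so they no longer look or act like links */
.institutions-list a {
  pointer-events: none; /* clicks pass through; the link can no longer be followed */
  text-decoration: none;
  color: inherit;
}

/* emphasize resource-sharing links and de-emphasize the show libraries button */
.resource-sharing-link {
  color: #981e32; /* approximate crimson */
  font-weight: bold;
}
.show-libraries-button {
  background: none;
  color: inherit;
}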
contrast and separation

full display page sections. participants noted that the information on the full display page tended to run together. to remedy this, we created higher contrast between the foreground and background of the page sections via css. we also styled the section titles and dividers with color, among other edits (see figure 13).

figure 13. separated sections of full display page (see figure 3 to compare to the new ui default full display page design).

conclusion

while providing one of the first studies on primo new ui, we acknowledge several limitations. previous studies on primo had larger study populations than this one (which had eight participants). however, we adhered to nielsen’s finding that usability studies uncover most design deficiencies with five or more participants.31 additionally, the scope of this study was limited to the usability of the desktop view. we recommend further studies that concentrate on accessibility compliance and that test the interface on mobile devices.

regarding the study design, the question arose as to whether the participants’ difficulties reflected poor design functionality or a misunderstanding of library terminology (as noted by majors and comeaux).32 the researchers did not carry out pre-tests or an assessment of participants’ level of existing knowledge. this limitation is almost always unavoidable, however, as a task list will always risk not fitting the skills or knowledge of every participant. the low use of some features might also have been a consequence of the study design: while not using the facets may reflect that participants were unaware of them, it could also stem from the fact that they never had to scroll past the first few items to find the needed resource. users might have felt a greater need to use the facets had we assigned more difficult discovery tasks. the study also contained an investigative bias in that the researchers were part of the working group that developed the customized interface and then tested those customizations. this bias could have been reduced had the study used researchers who were not part of the group that made the customizations.

despite these limitations, there are still key findings of note. tasks that participants completed with the greatest ease mapped to those we assume they do most often, such as basic searching for materials and accessing account information. tasks beyond these basics proved more difficult. this raises the question of whether difficulties were really a function of the interface design or whether they reflected ongoing literacy issues. therefore, it is crucial that designers work with public services and instruction librarians to identify areas where users might be well served by making certain functionalities more user-friendly and by creating educational and training opportunities to increase awareness of these functionalities.33 bringing diverse perspectives into the study is also crucial so that researchers can discover and be more conscious of commonalities in design and literacy needs, particularly regarding advanced tasks.

appendix a: usability tasks

note: search it is the local branding for primo at washington state university.

1) please search for your favorite book or movie.
a) is this item available for you to read or watch?
b) how do you know that this item is or isn’t available for you to read or watch?
2) please sign in to search it.
3) please perform a search for “causes of world war ii” (do not include quotation marks).
a) limit your search results to articles.
b) for any of the records in your search results list:
i) find the citation for any item and copy it to the clipboard.
ii) email this record to yourself.
4) please perform a search for “actor’s choice monologues” (do not include quotation marks).
a) from the list of results, find a print book that you would need to order from another library.
5) please perform a search for a print book with isbn 0582493498.
a) this book is checked out. how would you get a copy of it?
b) pretend that this book is not checked out. please show us the information from this record that you would use to find this item on the shelves.
6) please navigate to your library account (from within search it).
a) pretend that you have forgotten how many items you have checked out. please show us how you would find out how many items you currently have checked out.
b) exit your library account area.
7) please navigate to advanced search.
a) perform any search on this page.
8) please show us where you would go to find help and/or chat with a librarian.
9) please perform a search using the keywords “gender and media.”
a) add any source to your my favorites list. then open my favorites and click on the title of the source you just added.
b) return to your list of results. choose any peer-reviewed article that has the full text available. click on the link that will access the full text.
10) please find a database that might be of interest to you (e.g., jstor).
11) please sign out of search it and close your browser.

appendix b: usability results

note: search it is the local branding for primo at washington state university.

research question 1: will the participant be able to find and use the basic search functionality?
associated task(s): 1. please search for your favorite book or movie.

participant | successful? | commentary
1 | yes | searches for “the truman show” from the beginning.
2 | yes | searches for “pet sematary” from the beginning.
3 | yes | searches for “additive manufacturing” from the beginning.
4 | yes | signs in first, navigates to new search, searches for “pzt sensor design.”
5 | yes | searches for “the notebook” from the beginning.
6 | yes | searches for “das leben der anderen” from the beginning.
7 | yes | searches for “legally blonde” from the beginning.
8 | yes | searches for “jurassic park” from the beginning.

research question 2: will the participant be able to understand the availability information of the brief results?
associated task(s): 1b. how do you know that this item is or isn’t available for you to read or watch? 4a. from the list of results, find a print book that you would need to order from another library.

participant | successful? | commentary
1 | yes | differentiates between green and orange text; uses the “check holdings” availability status. clicks on “availability and request option” heading and then clicks on the resource sharing link.
2 | yes, with difficulty | says that green “check holdings” status indicates ability to read the book. selects book with “check holdings” status and locates resource sharing link.
3 | yes, with difficulty | unclear. initially goes to a record with online access; redoes search, eventually locates resource sharing link.
4 | yes | says the record for the item reads “in place” and the availability indicator = 1. the record for the item reads “check holdings.”
5 | yes | says that status is indicated by statement “available at holland/terrell libraries.” the record for the item reads “check holdings.”
6 | yes | says that status is indicated by statement “available at holland/terrell libraries” and “item in place.” clicks on “check holdings”; says that orange color denotes fact that we don’t have it.
7 | yes | hovers over “check holdings” status, and then notes that “availability” statement reads “did not match any physical resources.” the record for the item reads “check holdings.”
8 | yes | says that status is indicated by statement “available at holland/terrell libraries.” says the record for the item reads “check holdings.”

research question 3: will the participant be able to find and use the sign in and sign out features?
associated task(s): 2. please sign into search it. 11. please sign out of search it and close your browser.

participant | successful? | commentary
1 | yes | navigates to “guest” link, signs in.
2 | yes | navigates to ellipsis, signs in. navigates to “user” link, signs out.
3 | yes | navigates to “guest” link, signs in. navigates to “user” link, signs out.
4 | yes | n/a—already signed in. navigates to “user” link, signs out.
5 | yes | navigates to “guest” link, signs in. navigates to “user” link, signs out.
6 | yes | navigates to “guest” link, signs in. navigates to “user” link, signs out.
7 | yes | uses sign in link from full display page. navigates to “user” link, signs out.
8 | yes | navigates to “guest” link, signs in. navigates to “user” link, signs out.

research question 4: will the participant be able to understand the behavior of the facets?
associated task(s): 3a. limit your search results to articles. 4a. from the list of results, find a print book that you would need to order from another library. 9b. return to your list of results. choose any peer-reviewed article that has the full text available.

participant | successful? | commentary
1 | yes | selects articles facet. n/a—does not use facets (however, participant investigates the library and type facets, returns to results lists).
2 | yes | selects articles facet. n/a—does not use facets.
3 | no | uses “exclude” property to remove everything but articles. uses “exclude” property to remove everything but print books. looks in facet type for articles; selects newspaper articles instead.
4 | yes, with difficulty | selects articles facet. selects print books facet. selects articles under type facet, clicks on “full-text available” status, selects peer-reviewed articles facet.
5 | no | selects articles facet. n/a—does not use facets. screen freezes (technical issue) and participant is forced to redo search. n/a—does not use facets. when further prompted to find only peer-reviewed articles, participant searches pre-filter area and then selects reviews facet.
6 | yes | selects articles facet. clicks on “check holdings.” participant hovers over “online access” text and then selects peer-reviewed facet.
7 | yes | looks in drop-down scope, then moves to articles facet. n/a—does not use facets. n/a—does not use facets.
8 | yes | hovers over peer-reviewed articles facet, and then selects articles facet. n/a—does not use facets. selects peer-reviewed facet.
research question 5: will the participant be able to find and use the actions menu?
associated task(s): 3.b.i. for any of the records in your search results list, find the citation for any item and copy it to the clipboard. 3.b.ii. for any of the records in your search results list, email this record to yourself.

participant | successful? | commentary
1 | yes | briefly looks at citation icon, scrolls to bottom of page and looks at citations area, returns to citation icon. scrolls to bottom of page, returns to actions area, scrolls with arrow to find email icon, emails to self.
2 | no | initially clicks on citation manager icon (easybib), then clicks on citation icon and copies to clipboard. could not find email icon (technical issue with search it), although further discussion reveals that participant expects to see email function within “send to” heading.
3 | no | opens full display page of item, scrolls to bottom of page. clicks on the citation icon but doesn’t see what they are looking for. finds email icon and emails to self.
4 | no | opens full display page of item, clicks on the citation icon, double-clicks to highlight citation. could not find email icon. searches in ellipsis. attempts the keep this item pin. navigates to my account. searches in facets.
5 | yes, with difficulty | finds citation icon, but then leaves the area via citations heading and winds up at web of science homepage. hovers over “cited in this” language. finds the copy functionality. attempts send to heading twice, looks through actions icons, scrolls to right, finds email icon.
6 | yes | finds citation icon, copies to clipboard. scrolls down page, returns to actions menu, scrolls to email icon, emails record to self.
7 | yes, with difficulty | copies citation from the brief result, and then spends some time trying to find “the clipboard.” navigates to the email icon.
8 | yes, with difficulty | scrolls to bottom of full display page, clicks on citing this link, clicks on title to record, and then copies first 3 lines of record. scrolls until finds email icon, but then moves to send to heading, and then back to email icon, and sends.

research question 6: will the participant be able to navigate the get it and view it areas of the full display page?
associated task(s): 5.a. this book is checked out. how would you get a copy of it? 5.b. please show us the information from this record that you would use to find this item on the shelves. 9.b. click on the link that will access the full text.

participant | successful? | commentary
1 | yes | clicks on “check holdings” availability status, clicks on availability and request options heading, clicks on request summit item link. refers to call number in alma iframe. clicks “full-text available” status, clicks database name.
2 | yes | opens record, locates resource sharing link. refers to call number; opens stack chart to find call number. clicks on title, clicks database name.
3 | yes | locates request option. locates call number in record. clicks “full-text available” status, clicks database name.
4 | yes, with difficulty | clicks on show libraries button, then finds request option after searching page. locates call number in record. clicks “full-text available” status but does not click on database name.
5 | yes, with difficulty | moves to stack chart button, then to show libraries button, and then to availability and request options heading; clicks on stack chart, clicks on show libraries, moves into first library listed and back out, and finally to ill link. finds call number on full display page.
6 | yes | finds request summit option. identifies call number and stack chart as means to find book. clicks on database name.
7 | yes, with difficulty | looks at status statement, scrolls to bottom of page, then show libraries button, then request summit option. identifies call number and stack chart as means to find book. attempts to use “full-text available” link, then clicks on database name.
8 | yes | finds summit request option. identifies call number and stack chart as means to find book. attempts to use “full-text available” link, then clicks on database name.

research question 7: will the participant be able to navigate their my account area?
associated task(s): 6. please navigate to your library account (from within search it). 6a. pretend that you have forgotten how many items you have checked out. please show us how you would find out how many items you currently have checked out. 6b. exit your library account area.

participant | successful? | commentary
1 | yes | navigates to my account from “user” link. navigates to loans tab. uses back arrow icon.
2 | yes | navigates to my account from “user” link. navigates to loans tab. uses back arrow icon.
3 | yes | navigates to my account from main menu ellipsis. navigates to loans. uses back arrow icon.
4 | yes | navigates to my account from main menu ellipsis. navigates to loans. uses back arrow icon.
5 | yes | navigates to my account from “user” link. navigates to loans. signs out of search it.
6 | yes | navigates to my account from “user” link. navigates to loans. uses search it logo to exit.
7 | yes | navigates to my account from “user” link. navigates to loans. uses new search button to exit.
8 | yes | navigates to my account from “user” link. navigates to loans. uses search it logo to exit.

research question 8: will the participant be able to find and use the help/chat and my favorites icons?
associated task(s): 8. please show us where you would go to find help and/or chat with a librarian. 9.a. add any source to your my favorites list. then, open my favorites and click on the title of the source you just added.

participant | successful? | commentary
1 | yes, with difficulty | navigates to help/chat icon. navigates to keep this item pin, hesitates, navigates to ellipsis, returns to and clicks on pin. moves to my favorites via animation. clicks on title.
2 | yes, with difficulty | initially navigates to help/chat icon, but thinks it is the wrong button because chat is not directly available within search it. navigates to keep this item pin, hesitates, looks around, selects pin. moves to my favorites via animation. clicks on title.
3 | yes, with difficulty | navigates to help/chat icon. navigates to ellipsis, actions menu, and tags section. finds keep this item pin.
4 | no | navigates to help/chat icon. navigates to ellipsis, keep this item pin, my account, and facets; quits search.
5 | yes, with difficulty | navigates to help/chat icon. adds keep this item pin after investigating 12 other icons. moves to my favorites via animation. clicks on title.
6 | yes | navigates to help/chat icon. adds keep this item pin and moves to my favorites via animation. clicks on title.
7 | yes | navigates to help/chat icon. checks actions menu, adds keep this item pin and moves to my favorites via animation. clicks on title.
8 | yes | navigates to help/chat icon. adds keep this item pin and moves to my favorites via animation. clicks on title.

research question 9: will the participant be able to find and use the advanced search functionality?
associated task(s): 7. please navigate to advanced search. 7a. perform any search on this page.

participant | successful? | commentary
1 | yes | navigates to advanced search. performs search.
2 | yes | navigates to advanced search. performs search.
3 | yes, with difficulty | navigates to basic search drop-down, then to new search, then to advanced search. has trouble inserting cursor into search box.
4 | yes, with difficulty | navigates to advanced search. builds complex search, then search it freezes and we have to restart the search tool.
5 | yes | navigates to advanced search. performs search.
6 | yes | navigates to advanced search. performs search.
7 | yes | navigates to advanced search. performs search.
8 | yes | navigates to advanced search. performs search.

research question 10: will the participant be able to find and use the main menu items?
associated task(s): 10. please find a database that might be of interest to you (e.g., jstor).

participant | successful? | commentary
1 | yes | navigates to “databases” link of main menu.
2 | yes | navigates to “databases” link of main menu.
3 | no | types query “stretchable electronics” into search box, but unsure how to find a database in the results lists.
4 | no | types query “reinforced concrete” into search box, but unsure how to find a database in the results lists.
5 | yes, with difficulty | is confused by term database. enters “ieee” in search box.
6 | yes | navigates to “databases” link of main menu.
7 | yes | searches within drop-down scopes, then facets, then moves to “databases” link of main menu.
8 | yes | navigates to “databases” link of main menu.

1 “frequently asked questions,” ex libris knowledge center, accessed august 28, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/050new_primo_user_interface/010frequently_asked_questions.
2 rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (2012): 186–207, https://doi.org/10.1353/lib.2012.0029.
3 david comeaux, “usability testing of a web-scale discovery system at an academic library,” college & undergraduate libraries 19, no. 2–4 (2012): 199, https://doi.org/10.1080/10691316.2012.695671.
4 comeaux, “usability testing,” 202.
5 comeaux, “usability testing,” 196–97.
6 kylie jarrett, “findit@flinders: user experiences of the primo discovery search solution,” australian academic & research libraries 43, no. 4 (2012): 280, https://doi.org/10.1080/00048623.2012.10722288.
7 jarrett, “findit@flinders,” 287.
8 aaron nichols et al., “kicking the tires: a usability study of the primo discovery tool,” journal of web librarianship 8, no. 2 (2014): 174, https://doi.org/10.1080/19322909.2014.903133.
9 nichols, “kicking the tires,” 181.
10 nichols, “kicking the tires,” 184.
11 nichols, “kicking the tires,” 184–85.
12 scott hanrath and miloche kottman, “use and usability of a discovery tool in an academic library,” journal of web librarianship 9, no. 1 (2015): 9, https://doi.org/10.1080/19322909.2014.983259.
13 greta kliewer et al., “using primo for undergraduate research: a usability study,” library hi tech 34, no. 4 (2016): 576, https://doi.org/10.1108/lht-05-2016-0052.
14 kelsey brett, ashley lierman, and cherie turner, “lessons learned: a primo usability study,” information technology & libraries 35, no. 1 (2016): 21, https://doi.org/10.6017/ital.v35i1.8965.
15 cece cai, april crockett, and michael ward, “our experience with primo new ui,” ex libris users of north america conference 2017, accessed november 4, 2017, http://documents.el-una.org/1467/1/caicrockettward_051017_445pm.pdf.
16 cai, crockett, and ward, “our experience with primo new ui.”
17 j. michael demars, “discovering our users: a multi-campus usability study of primo” (paper presented, international federation of library associations and institutions world library and information conference 2017, warsaw, poland, august 14, 2017), 11, http://library.ifla.org/1810/1/s10-2017-demars-en.pdf.
18 anne m. pepitone, “a tale of two uis: usability studies of two primo user interfaces” (slideshow presentation, primo day 2017: migrating to the new ui, june 12, 2017), https://www.orbiscascade.org/primo-day-2017-schedule/.
19 “primo usability guidelines and test script,” ex libris knowledge center, accessed october 28, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/new_primo_user_interface/primo_usability_guidelines_and_test_script.
20 demars, “discovering our users,” 9.
21 majors, “comparative user experiences,” 190; comeaux, “usability testing,” 198–204.
22 brett, lierman, and turner, “lessons learned,” 21.
23 jarrett, “findit@flinders,” 287; nichols, “kicking the tires,” 184; brett, lierman, and turner, “lessons learned,” 20–21.
24 kliewer et al., “using primo for undergraduate research,” 571–72; brett, lierman, and turner, “lessons learned,” 17.
25 miri botzer, “delivering the experience that users expect: core principles for designing library discovery services,” white paper, november 25, 2015, 10, http://docplayer.net/10248265-delivering-the-experience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzer-primo-product-manager-ex-libris.html.
26 majors, “comparative user experiences,” 194; nichols et al., “kicking the tires,” 184–85.
27 cai, crockett, and ward, “our experience with primo new ui,” 28–29.
28 pepitone, “a tale of two uis,” 29.
29 botzer, “delivering the experience,” 4–5; christine stohn, “how do users search and discover? findings from ex libris user research,” library technology guides, may 5, 2015, 7–8, https://librarytechnology.org/document/20650.
30 “primo august 2017 highlights,” ex libris knowledge center, accessed november 2, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/highlights/027primo_august_2017_highlights.
31 jakob nielsen, “how many test users in a usability study?,” nielsen norman group, june 4, 2012, https://www.nngroup.com/articles/how-many-test-users/.
32 majors, “comparative user experiences,” 190; comeaux, “usability testing,” 200–204.
33 brett, lierman, and turner, “lessons learned,” 21.
the impact of covid-19 on the use of academic library resources

ruth sara connell, lisa c. wallis, and david comeaux
information technology and libraries | june 2021
https://doi.org/10.6017/ital.v40i2.12629

ruth sara connell (ruth.connell@valpo.edu) is professor of library science & director of systems, valparaiso university. lisa c. wallis (l-wallis@neiu.edu) is associate dean of libraries and e-resources & systems librarian, northeastern illinois university. david comeaux (davidcomeaux@lsu.edu) is systems and discovery librarian, louisiana state university. © 2021.

abstract

the covid-19 pandemic has greatly impacted higher education, including academic libraries. this paper compares the use of library resources (including interlibrary loan, website and discovery tool pageviews, database use, patron interactions, etc.) at three university libraries before and after the pandemic. the latter parts of the 2019 and 2020 spring semesters are the time frames of focus, although two control time frames from earlier in those semesters are used to determine how the semesters differed when the coronavirus was not a factor. the institutions experienced similar patterns of use across many metrics.

introduction

the year 2020 will be remembered as the year of the novel coronavirus (covid-19). around the world, hundreds of thousands of people died from the disease, schools and businesses shut their doors, wearing masks in public places became commonplace, and unemployment soared.1 everyone and everything changed in ways large and small, and libraries were no exception.

this study measures changes in use of library resources during time frames of covid-19 closures at three different institutions: louisiana state university, northeastern illinois university, and valparaiso university. these three universities vary in size (large to medium), control (two public, one private), basic carnegie classification (doctoral-very high research, master’s, and doctoral/professional), and setting (two primarily non-residential and one highly residential).2 despite their differences, these institutions experienced similar patterns of use across many categories.

key findings:
• the pandemic affected the three institutions studied on a continuum, with the least impact at the largest school and the biggest drop in use at the smallest school.
• use of all three libraries’ websites, as well as the discovery tools/catalogs and major databases, decreased during the covid time frame.
• all three libraries experienced an increase in virtual communication.

background

louisiana state university

louisiana state university (lsu) is the flagship institution of louisiana and is one of only 22 prestigious universities nationwide holding land-grant, sea-grant, and space-grant status.
the main campus is in baton rouge, has the carnegie basic classification “doctoral universities: very high research activity,” is primarily non-residential, and has a student full-time equivalent (fte) of about 30,000.3 lsu library is the university’s main library and a center of campus life for both students and faculty, and it houses approximately three million physical items (print books, media, and serials).4 before the onset of covid-19, lsu library was open 24/5 (24 hours sunday–thursday) plus weekend hours, with a 24/7 schedule during finals. as of july 30, 2020, it had 100 employees, of whom 39 were librarians.

lsu library staff, like all nonessential lsu staff, last reported to work in person on monday, march 16. the previous day, the state’s governor issued a statewide stay-at-home order, restricting events and closing venues. that monday, library it employees hurriedly assisted other staff in preparing to work remotely. near the end of the day, the university’s president sent an email asking all employees to work from home until further notice. classes were canceled for the week to allow instructors to prepare for remote teaching. the following week, march 23, was the beginning of spring break. classes resumed online-only on march 30. the libraries remained closed through the duration of the spring semester.

despite the closure, the libraries continued to serve patrons. librarians continued to assist students through email and zoom-based sessions. the catalog and discovery systems remained available, and staff continued to fill article and book chapter requests through interlibrary loan and document delivery.

northeastern illinois university

northeastern illinois university (neiu) is a public, comprehensive, multicultural university located on the north side of chicago. it has an enrollment of approximately 7,000 students in undergraduate and graduate programs among three colleges: arts and sciences, education, and business. while currently classified in the “master’s colleges and universities: larger programs” category, neiu is undergoing a major enrollment shift due to the state of illinois’ failure to provide a budget during the period 2015–2017 and, now, the covid crisis.5 the spring 2019 fte was 4,644 (83% undergraduate), while in spring 2020, fte was 4,404 (80% undergraduate). the campus is primarily a commuter campus.

the neiu libraries offer library services at three locations in chicago. altogether, the three libraries house approximately 500,000 physical books, media, and serials.6 in spring 2020, the neiu libraries employed 12 people in positions requiring an mls—including the dean and associate dean of libraries—and 18 staff. the main library is typically open 92 hours per week.

neiu’s spring break typically falls at the same point in the semester every year, with the spring 2020 break scheduled to begin on saturday, march 14. the week prior to spring break, neiu’s president announced that the break would be extended by an extra week to allow instructors to move instruction to alternatives to face-to-face teaching. the neiu libraries closed their doors on wednesday, march 18 at 6 p.m., and library faculty and staff began working from home.
the libraries were able to offer continued reference and instruction by chat, email, phone, and google meet and to fill article and book chapter requests electronically. no physical materials were available, as the statewide delivery system supporting borrowing among academic libraries stopped its services on march 16. alternative instruction resumed on monday, march 30 for all students.

valparaiso university

valparaiso university (valpo) is a private, not-for-profit, highly residential, four-year university in northwest indiana. its carnegie basic classification is “doctoral/professional universities,” although it serves a largely undergraduate student population (90% of fte in spring 2020). the graduate programs serve fewer than 500 students. during the time frame of this study, valpo’s fte ranged from 3,449 (spring 2019) to 3,147 (spring 2020).7

there is one library on campus. the christopher center for library and information resources has approximately 450,000 items in its print collection.8 during the fall and spring semesters (excluding breaks), the library is open 113 hours per week. as of fall 2020, the library employed 19 people: 10 librarians (including the dean) and nine staff members. this is four fewer positions than before the pandemic; due to covid-19 funding cuts, three staff members were laid off and one open librarian position was eliminated.

valpo is unusual in that it has a two-week spring break, which always falls on the first full two weeks in march. in 2020, spring break coincided with the burgeoning covid-19 crisis. during the second week of spring break, campus administration announced that campus would remain open but classes would go online immediately following the break, starting monday, march 16. on march 16, the christopher center library moved to reduced hours (open 67.5 hours per week). as the fallout from the pandemic rapidly unfolded, hours were further reduced four days later. at the close of business on tuesday, march 24, in accordance with the state of indiana’s stay-at-home order, the physical building closed, although library faculty and staff continued to work from home.

literature review

preparing and responding to pandemics and other disasters

libraries have long understood the need to prepare for disasters and have chronicled their struggles with disasters both natural and man-made. library collections have been lost due to fires and floods, and libraries have been forced to drastically alter their service models in response. the literature on this topic includes surveys of libraries regarding their emergency preparedness, advice on preparing disaster recovery plans, and recovering or replacing lost collections.9

one particularly prescient article describes the work done at the university of minnesota’s school of public health.10 two librarians joined a task force to prepare the school to function through an influenza pandemic. they focused on two scenarios. one was the onset of a pandemic mid-semester, forcing schools to close for weeks or even through the duration of the semester. the other was an even longer (9- to 18-month) school closure. both scenarios included implementing social distancing practices now familiar to us all. the task force provided many recommendations to enable continuity of online teaching in the event of a pandemic, but none dealt specifically with libraries.
resource use during building disruptions

previous studies have examined the impact on the use of library resources during building disruptions, generally due to renovations. in the studies reviewed, these libraries moved to temporary locations and still had some physical space available to students. this differs from the complete closure experienced by most libraries due to covid-19. typically, library services and resources are used less when normal building operations are disrupted, but there are exceptions.

in 1999, the library at eastern illinois university closed for 31 months. library services were relocated elsewhere. overall, the library experienced a “sharp drop in the use of library resources and services,” although one bright spot of growth was in interlibrary loan use, which went up by 16%.11 however, the authors note that this increase may be due to patrons placing requests for items owned by eiu that were difficult to access.

mcneese state university in louisiana started a multiyear library renovation in 2012, during which the library’s personnel, services, and a small subset of the collection were relocated to a ballroom in the student union. a 2016 article discusses effects on library services halfway through the disruption. in reviewing the literature, the author found, “the longer students and faculty are not allowed access to the library building, the more usage statistics such as circulation, interlibrary loan (ill), and instruction decrease.”12 mcneese experienced a 22% decrease in interlibrary loan requests (borrowing and lending), a 51% decrease in the use of e-books, and a 62% decline in reference transactions. the author noted that “nearly every library service experienced a precipitous decline.”13

pepperdine university bucked the trend of sharp decreases during their 2016–2017 renovation. they experienced a dramatic increase in the number of interlibrary loan requests, both for borrowing (33%) and lending (375%), likely related to their decision to join rapid-ill at the beginning of the renovation. conversely, there was a slight decrease (10%) in borrowing from the california statewide consortium. pepperdine’s e-book usage remained fairly steady and actually increased 3% during the disruption. as expected, their in-person questions decreased while in a temporary facility, although chat and email reference questions increased by 30%.14

covid-19 impact

though the coronavirus was first discovered to have reached the united states in january 2020, it was not until late february that american colleges and universities began to implement travel restrictions for students and staff and to develop plans for potential closure.15 by early march 2020 the us department of education had developed guidelines for the coronavirus’ impact on distance learning and financial aid.16 the topic was also the subject of numerous articles on higher education websites such as the chronicle of higher education and inside higher ed.
in march 2020, when the impact of covid-19 began to reverberate around the united states, virtually all libraries closed.17 the american library association (ala) conducted a survey of libraries of all types in may and reported that the majority of academic libraries had already lost funding, or anticipated losing funding within the next year, for staff, new hires, professional development, print collections, programs, and services.18 the press release for the ala survey stated that “survey respondents shared leaps in the use of digital content, online learning, and virtual programs.”19 however, a review of the survey instrument reveals that no questions were asked about use of these resources, so it is unclear where this assertion comes from.20 the survey included opportunities to leave comments, so it may be that the leaps in use mentioned in the ala press release are anecdotal. although libraries have never faced a global pandemic in modern times, previous research indicates that library building disruptions generally reduce use rather than increase it, as the press release described.

academic libraries were among the last facilities to close on many campuses, as they were viewed as essential for students.21 however, as early as the week of march 9, 7% of academic libraries reported that they had stopped circulating physical materials. in addition, building and face-to-face reference desk staffing began to be scaled back, with 28% of reporting libraries doing no face-to-face reference by the end of that week.22 complete closure of academic library physical facilities became the norm by mid-march, with vocal advocacy for that measure expressed by both the american library association and the association of college and research libraries.23

apart from library instruction, the coronavirus did not so much force academic libraries to move online as to temporarily suspend physical and in-person services. by 2020, provision of online services—including book, article, and media circulation and email and chat reference—was nothing new for libraries. recent years have seen increased migration to cloud-based systems, eliminating the need for library work to be done using specific client software on library staff computers. academic libraries and vendors alike promoted the ability of their institutions and computer systems to handle the covid-19 crisis with minimal disruption.24

the international coalition of library consortia (icolc) issued a statement on march 13, 2020, asking vendors to lift many of the usual licensing restrictions and open access to the 391 million students affected by school and library shutdowns.25 publishers and vendors quickly began to remove paywalls between users and their online collections, either for free without library mediation or upon library request. icolc followed up by starting a comprehensive list of these materials on march 16, 2020.26 despite increased access to online collections, library resource use was disrupted. no matter how many services can be offered online, plenty of students and faculty still use traditional items such as print books.
given the circumstances of covid-19, with sudden and lasting limits on access to physical materials and space, academic libraries began to promote online equivalents of these, such as the hathitrust emergency temporary access service (etas), which penn state reported offered “reading access to more than 48% of the libraries’ print collections.”27 upon seeing the collected mass of print books students returned prior to leaving campus due to covid-19, librarian nora dimmock of brown university’s john jay library identified the need to move “more intentionally” to e-books over print books in future purchasing decisions.28

methodology

in order to determine whether the covid-19 pandemic affected use of library systems and resources, the researchers compared usage statistics from a covid-affected time frame in 2020 to the same time frame in 2019. these will be called the covid 2019 and covid 2020 time frames (or covid time frames collectively). because other factors could influence use, such as differences in student fte between the two years, the researchers also pulled data from control time frames earlier in the spring semesters. these control time frames were unaffected by covid in both 2019 and 2020. these will be called the control 2019 and control 2020 time frames (or control time frames collectively).

by including data from the control time frames, we were able to determine whether there were trends affecting usage differences before the pandemic hit. as an example, lsu’s catalog pageviews were down 5% from the previous year during the first part of the semester, which was unaffected by covid-19. for the latter part of the semester, which was affected by covid-19, catalog use fell 25% as compared to the previous year. the control time frame comparison shows that catalog use was already in decline before the pandemic; therefore, some of the 25% decline is likely due to factors other than covid-19. the baseline-factored difference is −20% (−5% + x = −25%; x = −20%). figures 2, 4, and 6 illustrate the percentage differences in covid time frames from the determined baselines.

each of the three researchers determined the control and covid time frames for their own institution. because the academic calendars vary widely between the three institutions, the time frames of study also differ. the absolutes for each institution were:

• the 2019 and 2020 control time frames were the same number of days
• the 2019 and 2020 covid time frames were the same number of days (but the control time frames were not the same length as the covid time frames)
• if a time frame from one year contained a special calendar event (e.g., spring break, mardi gras holiday, etc.), the researchers made sure the corresponding time frame from the other year also included that event.
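the baseline-factored arithmetic above is easy to restate as a small helper. the sketch below is illustrative only, not the authors’ code; the sample figures are the lsu catalog pageviews reported in table 2.

```python
def pct_change(before: float, after: float) -> float:
    # percentage change between the same time frame in two years
    return (after - before) / before * 100

def baseline_factored_change(ctl_2019, ctl_2020, cov_2019, cov_2020):
    # covid-period change minus control-period change: the portion of the
    # change not already explained by the pre-pandemic trend
    return pct_change(cov_2019, cov_2020) - pct_change(ctl_2019, ctl_2020)

# lsu catalog pageviews (table 2): control ~-5%, covid -25%, net ~-20%
print(round(baseline_factored_change(83760, 79838, 54662, 41000)))  # -20
```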
table 1. dates of institutions’ control and covid comparison time frames (excluding ebsco and proquest)

| institution | control time frame 2019 | control time frame 2020 | covid time frame 2019 | covid time frame 2020 |
| --- | --- | --- | --- | --- |
| lsu | monday, january 14 through friday, march 8 | monday, january 13 through friday, march 6 | monday, march 18 through saturday, april 27 | monday, march 23 through saturday, may 2 |
| neiu | monday, january 7 through friday, march 15 | monday, january 6 through friday, march 13 | saturday, march 16 through sunday, may 5 | saturday, march 14 through sunday, may 3 |
| valpo | wednesday, january 9 through friday, march 1 | wednesday, january 8 through friday, february 28 | monday, march 18 through tuesday, may 14 | monday, march 16 through tuesday, may 12 |

the primary concern was to ensure that each institution was comparing like time frames to like time frames within its own academic calendar. the variability of institutional calendars means that the data cannot be compared between institutions.

for the major database platforms, the control and covid time frames differ from the other areas measured due to limitations of the statistical platforms. ebsco and proquest platforms allow reports to be run on full months only; partial-month statistics cannot be pulled. for that reason, for the major database platforms alone, the control time frame is january through february, and the covid time frame is april through may. because 2020 was a leap year, february 2020 had one more day than february 2019. this means that the 2019 control time frame had 59 days while the 2020 control time frame had 60 days. the extra day could account for an increase in use of approximately 2%.

when evaluating use of database platforms, there are several metrics from which to choose. the authors chose full-text downloads (the counter metric known as total item requests) because it is the same across reports. there are different search metrics in platform and database reports, so the authors chose to focus on the consistency of the full-text download metric. when considering e-resource use during covid-19, another factor to consider is that many publishers removed some or all of their paywalls as the pandemic began. according to counter, the non-profit organization that regulates standards for usage data:

as a result of this open content, usage may appear to go down during this period. this is because many of your users will be working from home, outside of your institution’s ip range and not authenticated. this means that the publishers are unable to attribute this usage to your institution.29

the journal platforms (wiley, sage, etc.) were more likely to be affected by this consideration than ebsco and proquest, who did not remove paywalls for databases of interest to academic audiences.

the researchers determined which metrics to harvest. the following were selected:

• main website usage
• catalog usage
• discovery tool usage
• libguides usage (database a–z lists and guide views)
• interlibrary loan article requests received (for both lending and borrowing)
• sfx document delivery clickthroughs
• ebsco and proquest total item requests (full-text downloads)
• patron interactions (chat questions, research consultations, ask-a-librarian)

the three institutions use different products for discovery tools, chat reference, and interlibrary loan, so reporting mechanisms also vary.
however, in general a combination of google analytics pageviews and vendors’ own reporting systems was used to pull the data. neiu libraries migrated from illiad to tipasa in june 2019, so it was not possible to gather comparable daily statistics for interlibrary loan request comparisons. since neiu did not have interlibrary loan data available that compared 2019 to 2020, the sfx report on document delivery clickthroughs was used as a proxy measure. when the full text of an item is not available in a database, the docdel clickthroughs indicate that the user went to the sfx menu and then to the tipasa interlibrary loan request page.

because of various factors, not all data points are available for the three institutions. for example, neiu is part of the carli consortium. a carli committee decided a few years ago that google analytics would not be used in the shared catalog in order to protect users’ privacy, so the catalog usage data point is missing for neiu. neiu and valpo have chat service data; lsu now has chat but did not during the time frames under investigation. when data are not available, they are missing from the tables and results.

results

louisiana state university

lsu’s data present a mixed picture. while the use of libguides rose, use of the libraries’ website, discovery system, and catalog all declined during the covid-enforced closure. while downloads through ebsco increased during the covid time frame by 30%, that is less than the increase during the control time frame (37%), resulting in a baseline-factored decrease. interlibrary loan requests and requests for help from patrons rose considerably.

table 2. lsu: use of library resources during control and covid time frames, 2019 and 2020

| measure of use | 2019 control | 2020 control | control period change | 2019 covid | 2020 covid | covid period change |
| --- | --- | --- | --- | --- | --- | --- |
| ill borrowing & docdel | 3,872 | 3,774 | -3% ▼ | 2,619 | 3,525 | 35% ▲ |
| ill lending | 2,692 | 2,419 | -10% ▼ | 1,845 | 2,884 | 56% ▲ |
| catalog pageviews | 83,760 | 79,838 | -5% ▼ | 54,662 | 41,000 | -25% ▼ |
| discovery tool pageviews | 432,070 | 407,832 | -6% ▼ | 349,558 | 291,093 | -17% ▼ |
| ebsco total item requests | 59,892 | 81,804 | 37% ▲ | 49,819 | 64,817 | 30% ▲ |
| proquest total item requests | 2,783 | 5,575 | 100% ▲ | 6,002 | 2,859 | -52% ▼ |
| main website pageviews | 164,329 | 151,090 | -8% ▼ | 103,340 | 81,404 | -21% ▼ |
| databases a–z views | 27,977 | 26,037 | -7% ▼ | 20,893 | 19,624 | -6% ▼ |
| libguides views | 127,191 | 123,783 | -3% ▼ | 12,163 | 18,925 | 56% ▲ |
| ask-a-librarian tickets | 35 | 88 | 151% ▲ | 17 | 108 | 535% ▲ |

figure 1. lsu: change from 2019 to 2020 in both control and covid time frames (in %).

figure 2. lsu: percentage of differences in covid time frames from determined baselines.

northeastern illinois university

use of some of neiu’s resources dropped dramatically during the covid 2020 time frame. however, use of some resources was already lower overall in 2020 than in 2019, even in the control time frame. illinois has experienced a tumultuous few years, with state universities starting new fiscal years without budgets in 2015 and 2016.
this has led to a steady decrease in enrollment at public regional universities, as students sought to attend more stable institutions out of state.30 neiu was hit particularly hard, experiencing a 25% drop in enrollment between fall 2015 and fall 2019.31 so it is likely that some of the decrease in 2020 was due to lower enrollment. areas that saw growth in the covid 2020 time frame were chats, research consultations, and interlibrary loan clickthroughs from sfx (see table 3).

table 3. neiu: use of library resources during control and covid time frames, 2019 and 2020

| measure of use | 2019 control | 2020 control | control time frame change | 2019 covid | 2020 covid | covid time frame change |
| --- | --- | --- | --- | --- | --- | --- |
| sfx docdel clickthroughs | 1,864 | 2,728 | 46% ▲ | 924 | 1,249 | 35% ▲ |
| discovery tool pageviews | 52,435 | 50,310 | -4% ▼ | 30,362 | 23,729 | -22% ▼ |
| ebsco total item requests | 14,525 | 19,704 | 36% ▲ | 15,785 | 15,452 | -2% ▼ |
| proquest total item requests | 7,003 | 5,842 | -17% ▼ | 6,895 | 4,898 | -29% ▼ |
| main website pageviews | 47,085 | 43,215 | -8% ▼ | 28,938 | 15,624 | -46% ▼ |
| databases a–z views | 11,475 | 11,980 | 4% ▲ | 8,687 | 7,743 | -11% ▼ |
| libguides views | 7,473 | 7,081 | -5% ▼ | 3,317 | 3,282 | -1% ▼ |
| chats | 363 | 306 | -16% ▼ | 182 | 213 | 17% ▲ |
| research consultations | 162 | 149 | -8% ▼ | 59 | 77 | 31% ▲ |

figure 3. neiu: change from 2019 to 2020 in both control and covid time frames (in %).

figure 4. neiu: percentage of differences in covid time frames from determined baselines.

valparaiso university

overall, usage of valparaiso university’s library resources dropped dramatically during the part of the 2020 spring semester affected by covid-19 (see table 4). reductions occurred in all areas except chat assistance. considering that usage was up in most categories during the pre-covid-19 part of the semester compared to the previous year, the decline during the latter part of the semester is even more significant. the two exceptions to this pattern of covid-related net reduction were chat questions and interlibrary loan lending requests received (see figure 6).

table 4. valpo: use of library resources during control and covid time frames, 2019 and 2020

| measure of use | 2019 control | 2020 control | control time frame change | 2019 covid | 2020 covid | covid time frame change |
| --- | --- | --- | --- | --- | --- | --- |
| ill borrowing & docdel | 564 | 668 | 18% ▲ | 721 | 622 | -14% ▼ |
| ill lending | 341 | 260 | -24% ▼ | 373 | 303 | -19% ▼ |
| catalog pageviews | 18,951 | 16,231 | -14% ▼ | 26,865 | 6,234 | -77% ▼ |
| discovery tool pageviews | 20,020 | 21,369 | 7% ▲ | 32,818 | 28,534 | -13% ▼ |
| ebsco total item requests | 7,355 | 7,324 | -0% ▼ | 10,871 | 10,159 | -7% ▼ |
| proquest total item requests | 8,770 | 10,844 | 24% ▲ | 12,325 | 10,915 | -11% ▼ |
| main website pageviews | 27,334 | 31,919 | 17% ▲ | 29,771 | 21,336 | -28% ▼ |
| databases a–z views | 3,959 | 4,242 | 7% ▲ | 5,044 | 4,526 | -10% ▼ |
| libguides views | 15,395 | 18,029 | 17% ▲ | 13,374 | 12,980 | -3% ▼ |
| chats | 26 | 32 | 23% ▲ | 20 | 37 | 85% ▲ |

figure 5. valpo: change from 2019 to 2020 in both control and covid time frames (in %).

figure 6. valpo: percentage of differences in covid time frames from determined baselines.
table 5. three institutions: net pandemic change in resource use (in %)

| measure of use | lsu | neiu | valpo |
| --- | --- | --- | --- |
| ill borrowing & docdel | 37% ▲ | | -32% ▼ |
| ill lending | 66% ▲ | | 5% ▲ |
| sfx docdel clickthroughs | | -11% ▼ | |
| catalog pageviews | -20% ▼ | | -62% ▼ |
| discovery tool pageviews | -11% ▼ | -18% ▼ | -20% ▼ |
| ebsco total item requests | -6% ▼ | -38% ▼ | -6% ▼ |
| proquest total item requests | -153% ▼ | -12% ▼ | -35% ▼ |
| main website pageviews | -13% ▼ | -38% ▼ | -45% ▼ |
| databases a–z views | 1% ▲ | -15% ▼ | -17% ▼ |
| libguides views | 58% ▲ | 4% ▲ | -20% ▼ |
| chats | | 33% ▲ | 62% ▲ |
| research consultations | | 39% ▲ | |
| ask-a-librarian tickets | 384% ▲ | | |

discussion

louisiana state university

usage of the libraries’ website, discovery system, and catalog all declined during the covid-enforced closure. use of the main library site has been steadily declining since 2012. lsu library’s main website saw reduced traffic in both 2020 time frames, but this reduction was particularly dramatic during the covid-19 pandemic. catalog use has also been declining but experienced a sharp decline of 25% during the covid closure. catalog use regularly drops when the library is closed, at least partially owing to the fact that library staff are heavy catalog users. also, the catalog is largely used by patrons seeking print items and, with the closure of the library building, patrons were less likely to search for print items. therefore, a drop could be expected, but such an extreme reduction was a surprise. usage of the main discovery system (eds) followed a similar trajectory. as most discovery searches begin on the library website’s home page, a decline in discovery use would logically follow a decline in website usage. eds pageviews declined between 2019 and 2020 during the control time frame by 6% but declined much more sharply during the covid time frame.

interlibrary loan requests (borrowing/document delivery and lending) rose considerably during the closure. one factor in the borrowing/document delivery increase was that document delivery was opened up to undergraduates during the covid 2020 time frame; this service was not available to that patron group during the control time frames and the covid 2019 time frame. downloads through ebsco, typically lsu’s busiest platform, increased in both the control time frame (37%) and covid time frame (30%), resulting in a baseline-factored decrease of 6% (rounded to the nearest percent). proquest use was less straightforward to interpret. while there was a dramatic decrease in downloads between 2019 and 2020, that seems to owe more to unusually high usage in the 2019 covid time frame than to a precipitous drop in the 2020 covid time frame.

lsu uses springshare’s popular libguides product, both for its list of databases (databases a–z) and as a platform for research guides. usage of the databases a–z page was minimally impacted in the covid time frame. however, overall guide usage rose significantly, at 56%. at least some of this is due to the creation of a covid-specific research guide. this guide included descriptions of and links to various electronic resources that vendors had made freely available during the crisis. that page alone had nearly 1,700 views in the covid time frame. but other pages with no clear connection to covid also had significant increases. this may reflect students feeling the need to consult electronic guides to compensate for lack of in-person access to librarians.
the covid closure did have an enormous impact on usage of lsu’s patron support system. this system, labeled “ask us!”, handled 108 tickets in the 2020 covid time frame, compared to 17 in 2019. while this system’s use was trending upward in the control time frame as well, the jump in use during the covid-19 closure was notable. while many of the tickets were similar to questions asked when the libraries were open, many others were clearly related to the library’s closure. for example, several inquiries were in reference to alternate access to print items (such as scanning and delivering electronic versions of book chapters). there were also inquiries about when the building might reopen and when patrons might be able to access print items that had been placed on hold.

northeastern illinois university

the library’s website is part of the university’s drupal content management system, which restricts the library’s ability to design and structure content. for that reason, much of the high-impact library content is stored on third-party sites, like libguides or worldcat discovery. while the library directs users to start their research on the main website, users typically immediately follow a link that leads them to one of those third-party resources. usage of the library website and of the resources linked from the website both dropped.

sfx document delivery clickthroughs, which pass users through to neiu’s interlibrary loan request pages, saw increases overall in 2020 compared to 2019. this is not surprising, given that neiu was forced to cancel two “big deal” packages at the end of 2019.

the neiu libraries’ chat service was one of the resources that experienced increased usage during the covid 2020 time frame. an informal review of questions coming in during the early covid closure showed that most users were asking about building hours or about returning or checking out materials. although the library website was immediately updated with covid-related closure information, the chat button is easily spotted and readily available, whereas the hours and materials information required clicking additional links. research consultations also increased. as the physical reference desk was no longer available, students and faculty were directed to use email or to set up google meet appointments with subject librarians for those questions where they might usually have visited the desk. an increase of 31% in these interactions demonstrates that users still needed librarian assistance with research and course-related questions in the time of covid.

valparaiso university

when evaluating valpo’s interlibrary loan demand, only article requests (for both borrowing/document delivery and lending) were considered; loans were excluded. demand from valpo patrons fell 32% during the time frame affected by covid-19. despite the lack of access to the print collection once the pandemic hit, net lending demand from electronic resources rose slightly.

catalog use was in decline before covid hit. this is likely due to the increased reliance on the library’s discovery tool, which includes records for all materials held in the catalog. however, the difference in use seen during the pandemic is striking; net use fell 62%. because patrons use the catalog to access information about the library’s physical collection and the library was closed, this precipitous decline is not surprising.
overall discovery tool (summon) use also fell during the 2020 covid time frame (a net 20%). valparaiso university subscribes to approximately 60 ebsco databases and 30 proquest databases, making them valpo’s most popular database vendors. both saw declines, proquest more than ebsco.

the library’s main website has slightly more than 100 pages, and although it serves as a starting point to reach many of the library’s other resources on different platforms (databases, libguides, etc.), it still receives considerable use: 147,125 pageviews in the 2019 calendar year. high-use pages include hours, the library directory, departments such as interlibrary loan and archives & special collections, and the listing of liaison librarians. the main website experienced a net 45% drop during the time frame affected by covid-19.

valpo uses libguides for two primary purposes: to organize and share approximately 200 databases using the product’s a–z database list, and to deliver subject-specific and instruction content. the a–z lists are heavily used by students and faculty across campus. however, during the pandemic, net usage of the databases fell 17%. the net decrease in libguides views was approximately the same, at −20%.

seemingly, patrons had an increased need for chat reference during the pandemic, although valpo receives a relatively small number of chat questions. during the first part of the spring 2020 semester, unaffected by covid, valpo received six more questions than over the same time frame in 2019. during the covid-affected part of the 2020 semester, usage jumped from 20 chats to 37 chats. again, these numbers are small, so caution should be used in interpretation, but considering the baseline change from the control time frame, there was a net 62% jump in usage of chat reference during the covid time frame.

commonalities and trends

while the study did not set out to compare usage among the three institutions, some clear patterns did emerge. use of all three libraries’ websites, as well as of the discovery tools/catalogs and major databases, decreased during the covid time frame. a number of explanations could apply. for instance, regarding library website usage, libraries’ public computers are often set to open to the library’s home page whenever a new browser is launched. when those computers are not in use, such as when library buildings are closed, students do not automatically start their research on those pages. even though students did continue to interact with librarians and library staff through virtual methods, in-person reference encounters were not possible during covid. many patron interactions begin with librarians demonstrating how to start research at the library home page, moving on to find databases or conduct searches in the discovery tool or catalog. without direct librarian guidance, it is not surprising that students do not start their research at library tools. whether it is because they do not realize library resources are available off campus or because they have become fond of free web search tools such as google scholar, most public services librarians would expect a decrease in library resource usage when students are not on campus.

with the exception of lsu’s libguides usage, the results of this study do not support the ala assertion that covid led to “leaps in the use of digital content” by library patrons.
it is not surprising to see that all three libraries experienced an increase in virtual communication methods, as chat, email, and online meetings were the only means of student-librarian communication once campuses closed. no longer could students catch library staff in the stacks or at a service desk to ask quick—and often uncounted—questions. instead, interactions were more easily measured through the virtual trails they left behind. it would be difficult to determine whether overall student-librarian interaction increased or decreased during the covid time frame as compared to “normal” times. the data only show that students did continue to seek help from libraries even when the buildings were closed.

individual institutions noted some within-time-frame usage patterns. for instance, valpo found that the drop in usage was more dramatic during the first part of the covid 2020 time frame and improved during the latter part. initially, students were probably in sink-or-swim mode, with school assignments having a lower priority in their lives. after the first month or so, students and faculty may have been able to start thinking about research needs more.

valparaiso university fared the worst of the three institutions, with net decreases in eight of the ten areas studied (80%) (see table 5). neiu had decreases in six of nine areas (67%), and lsu, the largest institution, broke even, with five of ten areas (50%) showing decreases. valpo has a carnegie classification of highly residential, while lsu and neiu are primarily nonresidential. it could be that the residential nature of valpo’s campus more negatively impacted use when students were away from campus. lsu, a state flagship, has a higher percentage of graduate students than valpo or neiu. it seems likely that graduate students would be more determined than undergraduates to continue research activities during the closure, which may explain the smaller decrease. this could be an area of further research.

conclusion

as the pandemic continues and universities plan for altered learning environments into the fall, will there be a rebound in library system and resource usage, or will the dramatic dip seen during the immediate covid time frame continue? nearly every day there are multiple webinars for academic libraries, their administrators, and their staff members to share their stories, compare their experiences, and help guide each other in operating in the new normal. librarians have had time to adjust typical pedagogical practices, learn new virtual technologies, and develop outreach plans to ensure continued library instruction in remote and online environments. other changes to library practice may include an even greater shift in acquisitions from print to electronic resources. services for distance students might become a greater point of emphasis. faculty, too, have had time to reevaluate their syllabi and identify support needs for themselves and their students as courses go online. as with so many areas of life these days, the outcomes of this work remain uncertain.

endnotes

1 “covid-19 map,” johns hopkins coronavirus resource center, accessed july 31, 2020, https://coronavirus.jhu.edu/map.html.
2 the carnegie classification of institutions of higher education, 2018 ed. (bloomington: indiana university center for postsecondary research, 2018), https://carnegieclassifications.iu.edu/.

3 the carnegie classification.

4 “library collection by material type: fiscal year 2018,” national center for education statistics, accessed august 21, 2020, https://nces.ed.gov/ipeds/datacenter/institutionbyname.aspx?gotoreportid=1.

5 the carnegie classification.

6 “library collection by material type.”

7 “enrollment data by semester,” valparaiso university office of institutional effectiveness, accessed august 11, 2020, https://www.valpo.edu/institutional-effectiveness/institutional-research/enrollment-data/.

8 “library collection by material type.”

9 s. d. allen iske and linda g. lengfellner, “fire, water & books: disaster preparedness for academic libraries,” professional safety 60, no. 10 (october 2015), https://search.proquest.com/docview/1735009821?pq-origsite=summon; daryl l. superio, stephen b. alayon, and mary grace h. oliveros, “disaster management practices of academic libraries in panay island, philippines: lessons from typhoon haiyan,” information development 35, no. 1 (january 2019): 51–66, https://doi.org/10.1177/0266666917725905; andy corrigan, “disaster: response and recovery at a major research library in new orleans,” library management 29, no. 4/5 (may 2008): 293–306, https://doi.org/10.1108/01435120810869084.

10 lisa mcguire, “planning for a pandemic influenza outbreak: roles for librarian liaisons in emergency delivery of educational programs,” medical reference services quarterly 26, no. 4 (december 2007): 1–13, https://doi.org/10.1300/j115v26n04_01.

11 bradley p. tolppanen and marlene slough, “providing circulation services in a temporary location,” journal of access services 1, no. 4 (may 24, 2004): 125, https://doi.org/10.1300/j204v01n04_10.

12 walter m. fontane, “assessing library services during a renovation,” journal of access services 13, no. 4 (october 1, 2016): 223, https://doi.org/10.1080/15367967.2016.1250643.

13 fontane, 223.

14 marc vinyard et al., “a pop-up service point and repurposed study spaces: maintaining market share during a renovation,” journal of library administration 58, no. 5 (july 4, 2018): 449–67, https://doi.org/10.1080/01930826.2018.1468193.

15 amy harmon, “inside the race to contain america’s first coronavirus case,” new york times, february 5, 2020, sec. us, https://www.nytimes.com/2020/02/05/us/corona-virus-washington-state.html; karin fischer, “colleges brace for more-widespread outbreak of coronavirus,” chronicle of higher education, february 26, 2020, http://www.chronicle.com/article/colleges-brace-for/248123.
16 “guidance for interruptions of study related to coronavirus (covid-19),” us department of education office of postsecondary education, updated june 16, 2020, https://ifap.ed.gov/electronic-announcements/030520guidance4interruptionsrelated2coronaviruscovid19.

17 “libraries respond: covid-19 survey results (may 2020),” american library association, http://www.ala.org/tools/covid/libraries-respond-covid-19-survey.

18 “libraries respond: covid-19 survey,” appendix 1: academic library financial aggregate tables.

19 “survey: libraries examine phased building re-opening, prepare summer programs,” american library association news and press center, june 3, 2020, http://www.ala.org/news/press-releases/2020/06/survey-libraries-examine-phased-building-re-opening-prepare-summer-programs.

20 “libraries respond: covid-19 survey” (instrument), american library association, may 2020, http://www.ala.org/pla/sites/ala.org.pla/files/content/advocacy/covid-19/libraries-respond-covid-19-survey-may-2020_5-12-20.pdf.

21 karin fischer, “as coronavirus spreads, colleges make limited allowances for support staff,” chronicle of higher education, march 23, 2020, http://www.chronicle.com/article/as-coronavirus-spreads/248304.

22 christine wolff-eisenberg and lisa janicke hinchliffe, “academic library strategies shift to closure and restriction,” ithaka s+r (blog), march 15, 2020, https://sr.ithaka.org/blog/academic-library-strategies-shift-to-closure-and-restriction/.

23 fischer, “as coronavirus spreads”; colleen flaherty, “librarians advocate closing campus libraries during coronavirus pandemic,” inside higher ed, march 19, 2020, https://www.insidehighered.com/news/2020/03/19/librarians-advocate-closing-campus-libraries-during-coronavirus-pandemic.

24 “cloud-based library platform keeps california community college libraries operational during covid-19 crisis—system is now live at 110 california community colleges,” library technology guides, current news service and archive, 2020, https://librarytechnology.org/pr/25004; “springshare responds to remarkable shift to online library services,” library technology guides, current news service and archive, april 29, 2020, https://librarytechnology.org/pr/25116.
25 “statement on the global covid-19 pandemic and its impact on library services and resources,” icolc coordinating committee, march 13, 2020, https://icolc.net/statement/statement-global-covid-19-pandemic-and-its-impact-library-services-and-resources.

26 “icolc covid19 complimentary expanded access specifics” (google doc), accessed may 8, 2020, https://docs.google.com/spreadsheets/d/1xiinlf9p00to-5lgki3v4s413iujycm5qjokug19a_y/edit?usp=sharing.

27 “libraries without walls: even wider access to digital resources during pandemic,” penn state news, april 8, 2020, https://news.psu.edu/story/614577/2020/04/08/research/libraries-without-walls-even-wider-access-digital-resources-during.

28 emilija sagaityte and cate ryan, “historic pandemic poses lasting impact on libraries, scholarship,” brown daily herald, march 31, 2020, https://www.browndailyherald.com/2020/03/31/historic-pandemic-poses-lasting-impact-on-libraries-scholarship/.

29 “message to libraries about counter usage during the covid-19 pandemic,” project counter, june 12, 2020, https://www.projectcounter.org/message-to-libraries-about-counter-usage-during-the-covid-19-pandemic/.

30 “outmigration numbers increase, with evidence of tie to budget impasse,” illinois board of higher education, march 12, 2019, http://www.ibhe.org/pressreleases/2019.03.12-ibhe-outmigration-numbers-for-web.htm.

31 “fall enrollments: five-year trend data,” northeastern illinois university, accessed june 3, 2020, https://www.neiu.edu/sites/neiu.edu/files/documents/ysun2/fall%202019%20data%20digest%205-year%20enrollment.pdf.
communications

creating and managing a repository of past exam papers

mariya maistrovskaya and rachel wang

information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11837

mariya maistrovskaya (mariya.maistrovskaya@utoronto.ca) is digital publishing librarian, university of toronto. rachel wang (rachel.wang@utoronto.ca) is application programmer analyst, university of toronto.

abstract

exam period can be a stressful time for students, and having examples of past papers to help prepare for the tests can be extremely helpful. it is possible that past exams are already shared on your campus—by professors in their specific courses, via student unions or groups, or between individual students. in this article, we will go over the workflows and infrastructure to support the systematic collection of, provision of access to, and repository management of past exam papers. we will discuss platform-agnostic considerations of opt-in versus opt-out submission, access restriction, discovery, retention schedules, and more. finally, we will share the university of toronto setup, including a dedicated instance of dspace, batch metadata creation and ingest scripts, and our submission and retention workflows that take into account the varying needs of stakeholders across our three campuses.

background

the university of toronto (u of t) is the largest academic institution in canada.
it spans three campuses and serves more than 90,000 students through its 700 undergraduate and 200 graduate programs.1 the university of toronto structure is the product of its rich history and is thus largely decentralized. as a result, the management of undergraduate exams is carried out individually by each major faculty at the downtown (st. george) campus, and centrally at the university of toronto mississauga (utm) campus and the university of toronto scarborough (utsc) campus.

the faculty of arts and science (fas) at the st. george campus has traditionally made exams from its departments available to students. in the pre-internet era, students were able to consult print and bound exams in departmental and college libraries’ reference collections. with the rise of online technologies, the fas registrar’s office seized the opportunity to make access to past exams more equitable for students and worked with the university of toronto libraries (utl) information technology services (its) to digitize exams and make them available online. they were initially shared electronically via the gopher protocol and later via docutek eres, one of the first available course e-reserves systems. after the utl became an early adopter of the dspace (https://duraspace.org/dspace/) open source platform for its institutional repository in 2003, the utl its created a separate instance of dspace to serve as a repository of old exams. the repository makes the last three years of exams from the fas, utm, and utsc available online in pdf. about 5,500 exam papers are available to students with a u of t login at any given time.

discussed below are some of the considerations in establishing and maintaining a repository of old exams on campus, along with practical recommendations and shared workflows from the utl.

considerations in establishing a repository of old exams

if you are looking to establish a repository of old exams, these are some of the considerations to take into account when planning a new service or evaluating an existing one.

the source of old exams

depending on the level of centralization on your campus, exams may be administered by individual academic departments or submitted by instructors/admins into a single location and managed centrally. the stakeholders involved in this process may include the office of the registrar, campus it, departmental admins or libraries, etc. establishing a relationship with such stakeholders is key to getting access to the files. when arranging to receive electronic files, consider whether they could be accompanied by existing metadata. alternatively, if the university archives or records management already receive copies of campus exams, you may be able to obtain them there. print versions will need to be digitized for online access—later in this article we will share metadata creation strategies for this scenario. it is also possible that exams may be collected in less formal ways, for example, via exam drives by student unions and groups.

the utl works closely with the fas registrar’s office to receive a batch of exams annually. the utl receives a copy of print fas exams, which are digitized by the its staff. the utl also receives exams from two u of t campuses, utm and utsc, that arrive in electronic format via the campus libraries.
the u of t engineering society and the faculty of law each maintain their individual exam repositories, and the arts and science student union maintains a bank of term tests donated by students.

content hosting and management

one of the key questions to answer is which campus department or unit will be responsible for hosting the exams, managing content collection, processing and uploads, and providing technical and user support. these responsibilities may be within the purview of a single unit or may be shared between stakeholders. here are some examples of the tasks to consider:

1. collecting exams from faculty or receiving them from a central location
2. managing restrictions (exams that will not be made available online)
3. digitizing exams received in print
4. creating metadata or converting metadata received with the files
5. uploading exams to the online repository
6. removing exams from the online repository
7. providing technical support and maintenance (e.g., platform upgrades, troubleshooting)
8. providing user support (e.g., assistance with locating exams)

at u of t, tasks 1–2 are taken care of by registrar offices at fas and utm and by the library at utsc. tasks 3–8 are performed centrally by the utl its, with the exception of digitization services for exams received from the utm and utsc campuses. further details and considerations related to the content management system and processing pipelines are outlined in the “infrastructure and workflows” section below.

collection scope

depending on the sources of your exams, you may need to establish scope rules for what gets included in the collection. for example:

• will you include only final exams, or will term tests also be included?
• will solutions be posted with the exams?
• will additional materials, such as course syllabi, also be included?

at the utl, only final exams are included in the repository, and no answers are supplied.

exam retention

making old exams available online is always a balancing act between the interests of students, who want access to past test questions, and the interests of instructors, who may have a limited pool of questions to draw from or who may teach different course content over time and want to ensure that the questions continue to be relevant. at the utl, in consultation with campus partners, the balance was achieved by posting only the three most recent years of exams in the repository. as soon as a new batch is received, the utl removes the batch of exams that is more than three years old.

opt-in versus opt-out approach

where exam collection is driven centrally, by a registrar’s office for example, that office may require that all past exams be made available to students. similarly to the retention considerations, the needs of instructors who draw questions from a limited pool can be accommodated via opt-outs, individual exam restrictions, and ad hoc take-down requests. an alternative approach to exam collection would be an opt-in model where faculty choose to submit exam questions on their own schedule.

at the utl, the fas and the utm campus both operate under the opt-out model. the utl receives all exam questions in regular batches unless they have been restricted by instructors’ requests. occasional withdrawal requests from instructors require approval from the registrar’s office.
conversely, the utsc campus operates under the opt-in model, where individual departments submit their exams to the library. while this model provides the most flexibility, the volume of exams received from this campus is consequently relatively small.

repository access

when making old exams available online, one of the things to consider is who will have access to them. will the exams be available only to students of the respective academic department, to all students, or to the general public? will access be possible on campus as well as off campus? if the decision is made to restrict access, is there an existing authorization infrastructure in place that the repository could take advantage of, such as an institutional single sign-on or the library’s proxy access? at the utl, access to the old exams repository is provided through ezproxy in the same fashion as subscription resources made available via the library.

discoverability and promotion

how will students find out about the exams available in the repository? will the repository be advertised via the library’s website, promoted by course instructors, or linked with the other course materials? considering the challenge of promoting a resource like this along with a variety of other library resources, it will be preferable to make it known to students via the same channels through which they receive other course information. for many institutions this would be via their learning management system or their course information system. at u of t, the old exams repository is linked from the library website. previously, the link was embedded in the university’s learning management system course template. with a recent transition to a new learning management engine, such exposure is yet to be reestablished.

infrastructure and workflows

minimum cms requirements

a repository of old exams does not require a specific content management system (cms) or an off-the-shelf platform. your institution may already have all the components in place to make it happen. here are the minimum requirements you will want to see in such a system:

• file upload by staff (preferably in batch)
• file download by end users
• basic descriptive metadata
• a search / browse interface
• access control / authentication (if you choose to restrict access)

the utl uses a stand-alone instance of dspace for its old exams repository. dspace is open-source software for digital repositories used across the globe, primarily in academic institutions. the utl chose this platform since it was already running an instance of dspace for its institutional repository (ir) and had the infrastructure and expertise on site. however, this is not a solution we would recommend to an institution with no existing dspace experience. while dspace is an open source platform, maintaining it locally requires significant staff expertise that may not be warranted considering that a collection of exams would use only a fraction of its robust functionality. if you do consider using dspace, a hosted solution may be preferable in a situation where local it resources and expertise are limited.
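as a concrete illustration of the proxy-based access control described under “repository access” above, making a repository login-restricted through ezproxy amounts to a short database stanza in the proxy configuration. the hostnames below are placeholders, and the directive set should be checked against your ezproxy documentation.

```
Title Old Exams Repository
URL https://exams.library.example.edu
HJ exams.library.example.edu
DJ library.example.edu
```

with a stanza like this in place, links to the repository are routed through the proxy, so off-campus users authenticate the same way they do for subscription resources.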
distributing past exams via an existing digital repository

an institution that already maintains a digital repository may consider adding exams as a collection to the existing infrastructure. when choosing to do so, it is important to consider whether the exams use case may be different from your ir use case, and whether the new collection will fit in the existing mission and policies. differences may include the following:

• access level. ir missions tend to revolve around providing openly accessible materials, whereas exams may need to be restricted. will your repository allow selective access restrictions to the exams collection?
• longevity. ir materials are usually intended to be kept long term, whereas exams may be on a retention schedule. for that reason, it also does not make sense to assign permanent identifiers to exams as many repositories do for their other materials.
• file types and metadata. unlike the variety of research outputs and metadata usually captured in an ir, exams have uniform metadata and a uniform object type. this makes them suitable for batch transformations and uploads.

batch metadata creation options

because of the uniform object type, exams are well suited to batch processing, transformations, and uploads. at utl, metadata is created from the filenames of scanned pdf files by a python script.2 the script breaks up the filename into dublin core metadata fields based on the pattern shown in figure 1. see figure 2 for a snippet of the script populating dublin core metadata fields.

figure 1. file-naming pattern for metadata creation at utl.

figure 2. a screenshot of the utl script generating dublin core metadata from filenames.

once metadata is generated, the second python script (figure 3) packages the pdf and metadata file into a dspace simple archive (dsa), which is the format that dspace accepts for batch ingests.

figure 3. a screenshot of the utl script packaging a pdf and metadata into a dspace simple archive.

the dsa then gets batch uploaded into the respective campus and exam-period collections (figure 4) using the dspace native batch import functionality. figure 5 shows what an individual exam record looks like in the repository. after a new batch is uploaded, collections older than three years are removed from the repository. the utl’s exams processing scripts are openly available on github under an apache license 2.0 (https://github.com/utlib/dspace-exams-ingest-scripts/).

figure 4. a screenshot of collections in the utl’s old exams repository.

figure 5. a screenshot of a record in the utl’s old exams repository.
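to make the filename-to-metadata and packaging steps concrete, here is a minimal sketch of the same idea. it is not the utl’s script (that code lives in the github repository above); the filename pattern, field choices, and helper names are illustrative assumptions only.

```python
import pathlib
from xml.sax.saxutils import escape

def dublin_core_xml(pdf: pathlib.Path) -> str:
    # derive dublin core fields from a filename such as utsc_bioa01_final_2019.pdf
    # (hypothetical pattern: campus_course_examtype_year)
    campus, course, exam_type, year = pdf.stem.split("_")
    fields = [
        ("title", "none", f"{course.upper()} {exam_type} examination, {year}"),
        ("date", "issued", year),
        ("description", "none", f"{campus.upper()} {exam_type} exam paper"),
    ]
    rows = "\n".join(
        f'  <dcvalue element="{el}" qualifier="{q}">{escape(val)}</dcvalue>'
        for el, q, val in fields
    )
    return f"<dublin_core>\n{rows}\n</dublin_core>\n"

def write_simple_archive_item(pdf: pathlib.Path, batch_dir: pathlib.Path) -> None:
    # one item per directory: dublin_core.xml, a contents file listing the
    # bitstreams, and the pdf itself -- the layout dspace expects for batch import
    item_dir = batch_dir / f"item_{pdf.stem}"
    item_dir.mkdir(parents=True, exist_ok=True)
    (item_dir / "dublin_core.xml").write_text(dublin_core_xml(pdf))
    (item_dir / "contents").write_text(pdf.name + "\n")
    (item_dir / pdf.name).write_bytes(pdf.read_bytes())
```

a batch laid out this way can then be ingested with dspace’s standard import tool, along the lines of `dspace import --add --eperson admin@example.edu --collection 123456789/10 --source exam_batch --mapfile exam_batch.map` (the account and collection handle are placeholders).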
conclusion

having access to examples of past exam questions can be extremely helpful to students preparing for upcoming tests. it is possible that old exams are already being shared on your campus in official or unofficial ways, in print or electronically. facilitating online sharing of electronic copies means that all students, on and off campus, will have equitable access to these valuable resources. we hope that the considerations and workflows outlined in this article will help institutions establish such services locally.

acknowledgements

the authors would like to acknowledge the utl librarians and staff who contributed to the setup and maintenance of the old exams repository over the years: marlene van ballegooie, metadata technologies manager, who operated the filename-to-dublin core metadata crosswalk; sean xiao zhao, former applications programmer analyst, who converted it into python; and sian meikle, associate chief librarian for digital strategies and technology, who was at the inception of the original exam-sharing service and provided valuable historical context and feedback on this article.

endnotes

1 university of toronto, "quick facts," accessed november 4, 2019, https://www.utoronto.ca/about-u-of-t/quick-facts.

2 university of toronto libraries, "exam metadata generation and ingest for dspace," github repository, last modified september 20, 2019, https://github.com/utlib/dspace-exams-ingest-scripts/.

editorial board thoughts: events in the life of ital
sharon farnel
information technology and libraries | june 2018

sharon farnel (sharon.farnel@ualberta.ca) is metadata coordinator, university of alberta libraries.

at the end of june 2018, i will be ending my time on the ital editorial board. during my term i have had the opportunity to write several "from the board" pieces and have very much enjoyed the freedom to explore a library technology topic of choice. this time around i would like to examine ital as seen through crossref's event data service.

crossref launched its event data service in beta in 2017; production service was announced in late march of this year. event data is "an open data service that registers online activity (specifically, events) associated with crossref metadata. event data will collect and store a record of any activity surrounding a research work from a defined set of web sources. the data will be made available as part of our metadata search service or via our metadata api and normalised across a diverse set of sources. data will be open, audit-able and replicable."1 using dois as a basis, event data captures information on discussions, citations, references, and other actions on wikipedia, twitter, and other services.

i thought it might be interesting to see what the crossref event data might say about ital. i used the event data api2 to pull event data using the prefix for all ojs journals hosted by boston college (10.6017). i then used openrefine3 to filter out all non-ital records and then began further examining the data. the data was gathered on may 9, 2018.
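a pull of this kind can be scripted directly against the api. the sketch below is illustrative rather than the workflow actually used for this column; the obj-id.prefix filter, the mailto parameter, and cursor-based paging are as described in the event data user guide cited in the references, and the api may have changed since.

import requests

# page through all events whose object doi carries the 10.6017 prefix,
# then tally events by source (wikipedia, twitter, the lens, wordpress, ...)
url = "https://api.eventdata.crossref.org/v1/events"
params = {"mailto": "you@example.edu", "obj-id.prefix": "10.6017", "rows": 1000}
events = []
while True:
    message = requests.get(url, params=params, timeout=60).json()["message"]
    events.extend(message["events"])
    cursor = message.get("next-cursor")
    if not cursor or not message["events"]:
        break
    params["cursor"] = cursor

counts = {}
for event in events:
    counts[event["source_id"]] = counts.get(event["source_id"], 0) + 1
print(sorted(counts.items(), key=lambda item: -item[1]))

as noted above, a prefix-level pull returns events for every ojs journal hosted by boston college, so the results still need to be filtered down to ital dois (done here with openrefine).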
in total, 313 events were captured. of these, 193 events were from wikipedia, 110 from twitter, and 5 each from the lens (patent citations) and wordpress blogs. the 313 events are associated with 38 ital articles, the earliest from 1973 (volume 6, number 1, from ital's digitized archive) and the most recent from 2018 (volume 37, number 1).

the greatest number of events (126) is associated with an article from volume 25, number 1 (2006) on rfid in libraries.4 the other articles are associated with varying numbers of discrete events, from one to 24.

looking more closely at the events associated with the 2006 article on rfid, all 126 events are references in wikipedia. these represent references in the english- and japanese-language wikipedia articles on radio frequency identification. other references from wikipedia are in articles on open access, fast (faceted application of subject terminology), library 2.0, biblioteca 2.0, and others.

what about that article from 1973? it was written by j. j. dimsdale and titled "file structure for an on-line catalog of one million titles." the abstract provides a tantalizing glimpse into the content: "a description is given of the file organization and design of an on-line catalog suitable for automation of a library of one million books. a method of virtual hash addressing allows rapid search of the indexes to the catalog file. storage of textual material in a compressed form allows considerable reduction in storage costs."5

there are only four events associated with this 1973 article, but interestingly all are from the lens,6 a global patent search database. these are a set of related patents, by mayers and whiting, for data compression apparatus and methods.7

there are 110 events associated with twitter, with tweets from 15 different users. the largest number of events, 21, begins with aaron tay,8 a librarian and blogger from singapore management university, tweeting about a 2016 ital article9 on user expectations of library discovery products, which was then retweeted 20 times. the two next most-tweeted articles (17 tweets/retweets each) discuss privacy and user experience in library discovery10 and "reference rot" in etd (electronic theses and dissertations) repositories.11

what value can such a brief examination of this small set of data from a very new service provide to ital authors or the editorial board? it can certainly provide a glimpse of who might be accessing ital articles, and how, and perhaps provide some hints as to ways to increase the reach of the journal. this kind of data is not a replacement for download counts or bibliographic citation patterns, but it can complement them and add another layer to our understanding of the place of ital in the library technology community and beyond. as ital continues to thrive and as services like event data continue to improve, i look forward to seeing what story this data continues to tell!

references

the event data used for this analysis can be found at https://bit.ly/2kgdjcm.

1 madeleine watson, "event data: open for your interpretation," crossref blog, february 25, 2016, https://www.crossref.org/blog/event-data-open-for-your-interpretation/.

2 crossref, event data user guide, https://www.eventdata.crossref.org/guide/.

3 openrefine, http://openrefine.org/.

4 jay singh, navjit brar, and carmen fong, "the state of rfid applications in libraries," information technology and libraries 25, no. 1 (2006), https://doi.org/10.6017/ital.v25i1.3326.

5 j. j. dimsdale, "file structure for an on-line catalog of one million titles," information technology and libraries 6, no. 1 (1973), https://doi.org/10.6017/ital.v6i1.5760.

6 the lens, https://www.lens.org/.

7 clay mayers and douglas whiting,
data compression apparatus and method using matching string searching and huffman encoding, us patent 5532694, filed july 7, 1995, and issued july 2, 1996.

8 aaron tay, https://twitter.com/aarontay.

9 irina trapido, "library discovery products: discovering user expectations through failure analysis," information technology and libraries 35, no. 3 (2016), https://doi.org/10.6017/ital.v35i3.9190.

10 shayna pekala, "privacy and user experience in 21st century library discovery," information technology and libraries 36, no. 2 (2017), https://doi.org/10.6017/ital.v36i2.9817.

11 mia massicotte and kathleen botter, "reference rot in the repository: a case study of electronic theses and dissertations (etds) in an academic library," information technology and libraries 36, no. 1 (2017), https://doi.org/10.6017/ital.v36i1.9598.

editorial: how do you know whence they will come?
dan marmion
information technology and libraries 19, no. 1 (march 2000)

as i write this, i am putting my affairs in order at western michigan university, in preparation for a move to a new position at the university of notre dame libraries beginning in april. at each university my responsibilities include overseeing both the online catalog and the libraries' web presence. i mention this only because i find it interesting, and indicative of an issue with which the library profession in general is grappling, that librarians in both institutions are engaged in discussions regarding the relationship between the two.

in talking to librarians at those places and others, from some i hear sentiment for making one or the other the "primary" access point. thus i've heard arguments that "the online catalog represents our collection, so we should use it as our main access mechanism." other librarians state that "the online catalog is fine for searching for books in our collection, but there is so much more to find, and so many more options for finding it, that we should use our web pages to link everything together." my hunch is that probably we can all agree that there are things an online catalog can do better than a web site, and things a web site can do better than the online catalog. as far as that goes, have we ever had a primary access point (thanks to karen coyle for this thought)?

but that's not what i want to talk about today. the debate over a primary access point contains an invalid implicit assumption and asks the wrong question. the implicit assumption is that we can and should control how our patrons come into our systems. the question we should be asking ourselves is not "what is our primary access method?" but rather "how can we ensure that our users, local and remote, will find an avenue that enables them to meet their informational needs?" since at this time i'm more familiar with wmu than notre dame, i'll draw some examples from the former.
we have "subject guides to resources" on our web site. these consist of pages put together by subject specialists that point to recommended sources, both print and electronic, local and remote, on given subjects. students can use them to begin researching topics in a large number of subject areas. the catch is that the students have to be browsing around the web site. if they happen to start out in the online catalog, they will never encounter these gateways, because the only reference to them is on the web site. on the other hand, a student who stays strictly with the web site is quite possibly going to miss a valuable resource in our library if he/she doesn't consult the online catalog, because we obviously can't list everything we own on the web site. (also, obviously, the web site doesn't provide the patron with status information.) this is why we have to ask ourselves the correct question mentioned above.

what is the solution? unfortunately i'm not any smarter than everyone else, so i don't have the answer (although i do know some folks who can help us with it: check out www.lita.org/committe/toptech/mainpage.htm). my guess is that we'll have to work it out as a profession, possibly in collaboration with our online system vendors, and that the solution will be neither quick nor simple nor easy. there are some ad hoc moves we can make, of course, such as putting links to the gateways into the catalog, and stressing on our web pages that the patron really needs to do a catalog search.

the bottom line is that we have a dilemma: we can't control how people come into our electronic systems, so we can't have a "primary access point." if we try, we do harm to those who, for whatever reason, reach us via some other avenue. we need to make sure that we provide equal opportunity for all.

dan marmion (dmarmion@nd.edu) is associate director of information systems and access at notre dame university, notre dame, indiana.
editorial board thoughts: digital faculty development
cinthya ippoliti
information technology and libraries | june 2019

cinthya ippoliti (cinthya.ippoliti@ucdenver.edu) is director, auraria library, colorado.

the role of libraries within faculty development is not a new concept. librarians have offered workshops and consultations for faculty on everything from designing effective research assignments to scholarly impact and open educational resources. in recent months, however, both acrl and educause have highlighted new expectations for faculty to develop skills in supporting students within a digital environment.

as part of acrl's "keeping up with…" series, katelyn handler and lauren hays1 discuss the rise of faculty learning communities that cover topics such as universal design, instructional design, and assessment. effective teaching has also recently become the focus of many institutions' efforts to increase student success and retention, and faculty play a central role in students' academic experience. in addition, the educause horizon report echoes these sentiments, positing that "the role of full-time faculty and adjuncts alike includes being key stakeholders in the adoption and scaling of digital solutions; as such, faculty need to be included in the evaluation, planning, and implementation of any teaching and learning initiative."2 finally, maha bali and autumm caines mention that "when offering workshops and evidence-based approaches, educational development centers make decisions on behalf of educators based on what has worked in the past for the majority."3 they call for a new model that blends digital pedagogy, identity, networks, and scholarship, where the experience is focused on "participants negotiating multiple online contexts through various online tools that span open and more private spaces to create a networked learning experience and an ongoing institutionally based online community."4

so how does the library fit into this context? what we are talking about here goes far beyond merely providing access to tools and materials for faculty. it requires a deep tripartite partnership among educators, the library, and the centers for faculty development, as each partner brings something unique to the table that cannot be covered by one area alone. the interesting element here is a dichotomy: this type of engagement spans both in-person and virtual environments, as faculty use both to teach and to connect with colleagues as part of their own development. the lines between these two worlds blur, and it is experience and connectivity that are at the center of the interactions rather than the tools themselves.

while librarians may not be able to provide direct support for instructional technologies, they can certainly inform efforts to integrate open and critical pedagogy and scholarship into faculty development programming and into the curriculum. libraries can take the lead in providing the theoretical foundation and application for these efforts, while the specifics of tools and approaches can be covered by other entities. bali and caines also observe that bringing together disparate teaching philosophies and skill sets under this broader umbrella of digital support and pedagogy can help provide professional development opportunities for faculty, especially adjuncts, who may not have the ability to participate otherwise.
this opportunity can act as a powerful catalyst, influencing their teaching by implementing, and therefore modeling, a best-practices approach, so that they are thinking about bringing students together in a similar fashion even if they are not teaching exclusively online, but especially if they are.5

open pedagogy can accomplish this in a variety of ways. bronwyn hegarty defines eight attributes that constitute open pedagogy: (1) participatory technologies; (2) people, openness, and trust; (3) innovation and creativity; (4) sharing ideas and resources; (5) connected community; (6) learner generated; (7) reflective practice; and (8) peer review.6 these elements are applicable to faculty development practices as well as pedagogical ones. just as faculty might interact with one another in this manner, so can they collaborate with their students using these methods. by being able to change the course materials and think about the ways in which those activities shape their learning, students can view the act of repurposing information as a way to help them define and achieve their learning goals. this highlights the fact that an environment where this is possible must exist as a starting point, and it also underlines the importance of the instructor's role in fostering that environment. having a cohort of colleagues, for both instructors and students, can "facilitate student access to existing knowledge, and empower them to critique it, dismantle it, and create new knowledge."7 this interaction emphasizes a two-way experience where both students and instructors can learn from one another. this is very much in keeping with the theme of digital content: by the very nature of these types of activities, the tools and methods must lend themselves to being manipulated and repurposed, and this can only occur in a digital environment.

finally, in a recent posting on the open oregon blog, silvia lin hanick and amy hofer discuss how open pedagogy can also influence how librarians interact with faculty and students. specifically, they state that "open education is simultaneously content and practice"8 and that by integrating these practices into the classroom, students learn about issues such as intellectual property and the value of information by acting "like practitioners"9 who take on "a disciplinary perspective and engage with a community of practice."10 this is a potentially pivotal element to take into consideration when analyzing the landscape of library-related instruction, because it frees the librarian from feeling as if everything rests on that one-time instructional opportunity. the development of a community of practitioners that includes the students, faculty, and the librarian has the potential to provide learning opportunities all along the way. including the librarian in this model makes sense not only as a way to signal the critical role the librarian plays in the classroom, but also as a way to stress that thinking about, and practicing, library-related activities is (or should be) as much a part of the course as any other exercise.

references

1 katelyn handler and lauren hays, "keeping up with…faculty development," association of college and research libraries, last modified 2019, http://www.ala.org/acrl/publications/keeping_up_with/faculty_development.
2 "horizon report," educause, last modified 2019, https://library.educause.edu//media/files/library/2019/2/2019horizonreportpreview.pdf.

3 maha bali and autumm caines, "a call for promoting ownership, equity, and agency in faculty development via connected learning," international journal of educational technology in higher education 15, no. 1 (2018): 3.

4 bali and caines, "a call for promoting ownership, equity, and agency in faculty development," 9.

5 ibid., 3.

6 bronwyn hegarty, "attributes of open pedagogy: a model for using open educational resources," last modified 2015, https://upload.wikimedia.org/wikipedia/commons/c/ca/ed_tech_hegarty_2015_article_attributes_of_open_pedagogy.pdf.

7 kris shaffer, "the critical textbook," last modified 2014, http://hybridpedagogy.org/critical-textbook/.

8 silvia lin hanick and amy hofer, "opening the framework: connecting open education practices and information literacy," open oregon, last modified 2017, http://openoregon.org/opening-the-framework/.

9 "opening the framework."

10 "opening the framework."

letter from the editor
kenneth j. varnum
information technology and libraries | march 2019
https://doi.org/10.6017/ital.v38i1.10992

the current (march 2019) issue of information technology and libraries sees the first of what i know will be many exciting contributions to our new "public libraries leading the way" column. this feature (announced in december 2018) shines a spotlight on technology-based innovations from the public library perspective. the first column, "the democratization of artificial intelligence: one library's approach," by thomas finley of the frisco (texas) public library, discusses how his library has developed a teaching and technology lending program around artificial intelligence, creating kits that community members can take home and use to explore artificial intelligence through a practical, hands-on approach. if you have a public library perspective on technology that you would like to share in a conversational, 1,000-1,500-word column, submit a proposal. full details and a link to the proposal submission form can be found on the lita blog. i look forward to hearing your ideas.

in addition to the premiere column in this series, the current issue includes the lita president's column from bohyun kim, updating us on the 2019 ala midwinter meeting, particularly on the status of the proposed alcts/llama/lita merger, and our regular editorial board thoughts column, contributed this quarter by kevin ford, on the importance of user stories in successful technology projects. articles in this issue cover these topics: improving sitewide navigation; improving the display of hathitrust records in primo; using linked data to create a geographic discovery system; measuring information system project success; a systematic approach towards web preservation; and determining textbook cost, formats, and licensing.

i hope you enjoy reading the issue, whether you explore just one article or read it "cover to cover." as always, if you want to share the research or practical experience you have gained as an article in ital, get in touch with me at varnum@umich.edu.

sincerely,
kenneth j. varnum, editor
march 2019

pal: toward a recommendation system for manuscripts
scott ziegler and richard shrake
information technology and libraries | september 2018

scott ziegler (sziegler1@lsu.edu) is head of digital programs and services, louisiana state university libraries.
prior to this position, ziegler was the head of digital scholarship and technology, american philosophical society. richard shrake (shraker13@gmail.com) is a library technology consultant based in burlington, vermont.

abstract

book-recommendation systems are increasingly common, from amazon to public library interfaces. however, for archives and special collections, such automated assistance has been rare. this is partly due to the complexity of descriptions (finding aids describing whole collections) and partly due to the complexity of the collections themselves (what is this collection about, and how is it related to another collection?). the american philosophical society library is using circulation data collected through the collection-management software package aeon to automate recommendations. in our system, which we're calling pal (people also liked), recommendations are offered in two ways: based on interests ("you're interested in x, other people interested in x looked at these collections") and on specific requests ("you've looked at y, other people who looked at y also looked at these collections"). this article will discuss the development of pal and plans for the system. we will also discuss ongoing concerns and issues, how patron privacy is protected, and the possibility of generalizing beyond any specific software solution.

introduction

the american philosophical society library (aps) is an independent research library in philadelphia. founded in 1743, the library houses a wide variety of material in early american history, history of science, and native american linguistics. the majority of the library's holdings are manuscripts, with a large amount of audio material, maps, and graphics, nearly all of which are described in finding aids created using encoded archival description (ead) standards. like similar institutions, the aps has long struggled to find new ways to help library users discover material relevant to their research. in addition to traditional in-person, email, and phone reference, the aps has spent years creating search and browse interfaces, subject guides, and web exhibitions to promote the collections.1

as part of these ongoing efforts to connect users with collections, the aps is working on an automated recommendation system to reuse circulation data gathered through aeon. developed by atlas systems, aeon is "request and workflow management software specifically designed for special collections libraries and archives," and it enables the aps to gather statistics both on the use of our manuscript collections and on aspects of the library's users.2 the automated recommendation system, which we're calling pal, for "people also liked," is an ongoing effort. this article presents a snapshot of current work.

literature review

the benefits of recommendations in library opacs have long been recognized.
writing in 2008 about the library recommendation system bibtip, itself started in the early 2000s, mönnich and spiering observe that "library services are well suited for the adoption of recommendation systems, especially services that support the user in search of literature in the catalog." by 2011, oclc research and the information school at the university of sheffield had begun exploring a recommendation system for oclc's worldcat.3

recommendations for library opacs commonly fall into one of two categories: content-based or collaborative filtering. content-based recommendations pair specific users with library items based on the metadata of the item and what is known about the user. for example, if a user indicates in some way that they enjoy mystery novels, items identified as mystery novels might be recommended to them. collaborative filtering combines users in some way and creates recommendations for one user based on the preferences of another user.

there can be a dark side to recommendations. the algorithms that determine which users are similar, and thus which recommendations to make, are often not understood. writing about algorithms in library discovery systems broadly, reidsma points out that "in librarianship over the past few decades, the profession has had to grapple with the perception that computers are better at finding relevant information than people."4 the algorithms doing the finding, however, often carry the same hidden biases as their programmers. reidsma encourages a broader understanding of algorithms in general and a deeper understanding of recommendation algorithms in particular.

the history of recommendation systems in libraries has informed the ongoing development of pal. we use both the content-based and the collaborative filtering approach to offer recommendations to users; for the purposes of communicating them to nontechnical patrons, we refer to them as "interest-based" and "request-based," respectively. furthermore, we are cautious about the role algorithms play in determining which recommendations users see. our help text reinforces the continued importance of working directly with in-house experts, and we promote pal as one tool among the many offered by the library.

we are not aware of any literature on the development of recommendation tools for archives or special-collections libraries. the nature of the material held in these institutions presents special challenges. for example, unlike book collections, many manuscript and archival collections are described in aggregate: one description might refer to many letters. these issues are discussed in detail below.
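before turning to the implementation, a toy illustration of the interest-based idea may be useful. the following python sketch is ours, not part of the pal codebase (pal does the equivalent work with sql and xslt, as the next sections describe); it assumes the registration data has been reduced to (username, interest) pairs and the circulation log to (username, call number) pairs, and it produces, per interest, each collection's request count and distinct-user count, the two figures pal displays (see figure 3 below).

from collections import defaultdict

def interest_recommendations(interests, requests):
    # interests: (username, interest) pairs from registration data
    # requests:  (username, call_number) pairs from circulation data
    user_interests = defaultdict(set)
    for username, interest in interests:
        user_interests[username].add(interest)
    request_counts = defaultdict(int)   # (interest, call_number) -> # of requests
    request_users = defaultdict(set)    # (interest, call_number) -> requesting users
    for username, call_number in requests:
        for interest in user_interests.get(username, ()):
            request_counts[(interest, call_number)] += 1
            request_users[(interest, call_number)].add(username)
    # usernames are used only for counting and never appear in the output
    table = defaultdict(list)
    for (interest, call_number), count in request_counts.items():
        table[interest].append(
            (call_number, count, len(request_users[(interest, call_number)])))
    for rows in table.values():
        rows.sort(key=lambda row: -row[1])
    return table

for example, interest_recommendations([("u1", "ea-colhis")], [("u1", "mss.b.f826")]) returns {"ea-colhis": [("mss.b.f826", 1, 1)]}: one request by one user.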
putting data to use: recommendations based on interests and requests

the use of aeon allows the aps to gather and store data, including both data that users supply through the registration form and data concerning which collections are requested. pal uses both types of data to create recommendations.

interest-based recommendations

the first type of recommendation uses self-identified research-interest data that researchers supply when creating an aeon account. when registering, a user has the option to select from a list of sixty-four topics grouped into seven broad categories (figure 1). the aps selected these interests based on suggestions from researchers as well as categories common in the field of academic history. upon signing in, a registered user sees a list of links (figure 2); each link leads to a full-page view of collection recommendations (figure 3). these recommendations follow the model, "you're interested in x, other people interested in x looked at these collections."

request-based recommendations

using the circulation data that aeon collects, we are able to automate recommendations in pal based on request information. upon clicking a request link in a finding aid, the user is presented with a list of recommendations in the sidebar in aeon (figure 4). each link opens the finding aid for the collection listed.

figure 1. list of interests a user sees when registering for the first time. a user can also revisit this list to modify their choices at any point by following links through the aeon interface. the selected interests generate recommendations.

figure 2. list of links appearing on the right-hand sidebar, based on interests that users select.

figure 3. recommended collections, based on interest, showing collection name (with a link to the finding aid), call number, number of requests, and number of users who have requested from the collection. the user sees this list after clicking an option from the sidebar, as shown in figure 2.

figure 4. request-based recommendation links appearing on the right-hand sidebar after a patron requests an item from a finding aid.

the process

currently, the data that drives these two functions is obtained from a semidynamic process via daily, automated sql query exports. usernames are employed to tie together requests and interests but are subsequently purged from the data before the results are presented to users and staff. this section explains the process in detail and presents code snippets where available. all code is available on github.5

interest-based recommendations

for interest-based recommendations, we employ two queries. the first query pulls every collection requested by a user for each topic in which that user has expressed an interest. the second aggregates the data for every user in the system. the queries get their data from the microsoft sql database that aeon uses to store data, via a microsoft access intermediary. because of the number of interest options in the registration form, and the character length of some of them ("early america colonial history," for example), we encode the interests in shortened form: "early america colonial history" becomes "ea-colhist," so as not to run into character limits in the database.

the first query gathers research topics for all users who are not staff (user status is 'researcher') and for whom at least one research topic is chosen ('researchtopics' is not null).
the data is exported into an xml file that we call "aeonmssreg":

select aeondata.dbo.users.researchtopics, aeondata.dbo.transactions.callnumber, aeondata.dbo.transactions.location
from aeondata.dbo.transactions
inner join aeondata.dbo.users
  on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
  and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
where (((aeondata.dbo.users.researchtopics) is not null)
  and ((aeondata.dbo.transactions.callnumber) like 'mss%' or (aeondata.dbo.transactions.callnumber) like 'aps.%')
  and ((aeondata.dbo.users.status)='researcher'))
for xml raw ('aeonmssreq'), root ('dataroot'), elements;

the second query combines all data for all users and exports an xml file, "aeonmssusers":

select distinct aeondata.dbo.users.researchtopics, aeondata.dbo.transactions.callnumber, aeondata.dbo.transactions.location, aeondata.dbo.transactions.username
from aeondata.dbo.transactions
inner join aeondata.dbo.users
  on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
  and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
where (((aeondata.dbo.users.researchtopics) is not null)
  and ((aeondata.dbo.transactions.callnumber) like 'mss%' or (aeondata.dbo.transactions.callnumber) like 'aps.%')
  and ((aeondata.dbo.users.status)='researcher'))
for xml raw ('aeonmssusers'), root ('dataroot'), elements;

each query produces an xml file. these files are parsed using xsl stylesheets into subsets for each research interest; the stylesheets also generate counts of users requesting a collection and of total requests for a collection by users sharing an interest. the stylesheet for the topic "early america colonial history," for example, pulls from the xml file "aeonmssreg," and this process is repeated for each interest. (the stylesheets themselves are available in the github repository cited above.)

the data from the query, as modified with xslt, is presented as html that we insert into aeon templates. this html includes the collection name (linked to the finding aid), call number, number of requests, and number of users in a table. see figure 3 for how this appears to the user. the following shows how the xsl output is wrapped in html (abbreviated here; the full files are in the github repository):

<p>the collections most frequently requested from researchers who expressed an interest in <xsl:value-of select="…"/> are listed below with links to each collection's finding aid and the number of times each collection has been requested.</p>
<table>
  <tr><th>collection</th><th>call number</th><th># of requests</th><th># of users</th></tr>
  …
</table>

to ensure a user only sees the links that match the interests they have selected, we use javascript to determine the expressed interests of the current user and display the corresponding links to the html pages in a sidebar. this approach works well, but we must account for two quirks. the first is that many interests in the database do not conform to the current list of options, because many users predate our current registration form and wrote in free-form interests. the second is that aeon stores the research information as an array rather than in a separate table, so we must account for the fact that the aeon database contains an array of values that includes both controlled and uncontrolled vocabulary.

first, we set the array as a variable so we can look for a value that matches our controlled vocabulary, and separate the array into individual values for manipulation:

// use var message to check for presence of controlled list of topics
var message = "<#user field='researchtopics'>";
// use var values to separate topics that are collected in one string
var values = "<#user field='researchtopics'>".split(",");

we also create variables to generate the html entries and link-outs once we have extracted our research topics (string contents abbreviated here):

var open = "<li><a href='…";
var middle = "…'>";
var close = "</a></li>";

next we set a conditional to determine if one of our controlled vocabulary terms appears in the array:

// determine if user has an interest topic from the controlled list
if ((message.indexOf("ea-colhis") > -1) || (message.indexOf("ea-amrev") > -1) ||
    (message.indexOf("ea-earlynat") > -1) || (message.indexOf("ea-antebellum") > -1) || …

if the array contains a value from our controlled vocabulary, we generate a link and translate our internal code back into a human-friendly research topic ("ea-colhist," for example, becomes once again "early american colonial history"):

for (var i = 0; i < values.length; ++i) {
  if (values[i]=="ea-colhis"){
    document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america colonial history" + close);}
  else if (values[i]=="ea-amrev"){
    document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america american revolution" + close);}
  else if (values[i]=="ea-earlynat"){
    document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america early national" + close);}
  else if (values[i]=="ea-antebellum"){
    document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america antebellum" + close);}
  …

see figure 2 for how this appears to the user. users only see the links that correspond to their stated interests. if the array does not contain a value from our controlled vocabulary, we display the research-topic interests associated with the user account, note that we don't currently have a recommendation, and provide a link to update the research topics for the account (markup abbreviated here):

else {
  document.getElementById("notopic").innerHTML =
    "<p>you expressed interest in:</p>" +
    "<p><#user field='researchtopics'></p>" +
    "<p>we are unable to provide a specific collection recommendation for you. " +
    "please visit our <a href='…'>user profile page</a> to select from our list of research topics.</p>";
}

request-based recommendations

in addition to interest-based recommendations, pal supplies recommendations based on past requests a user has made. this section details how these recommendations are generated. aeon allows users to request materials directly from a finding aid (see figure 6). to generate our request-based recommendations, we employ a query depicting the call number and user of every request in the system and export the results to an xml file called "aeonlikecollections":

select subquery.callnumber, subquery.username,
  iif(right(subquery.trimlocation,1)='.', left(subquery.trimlocation, len(subquery.trimlocation)-1), subquery.trimlocation) as finallocation
from (
  select distinct aeondata.dbo.transactions.callnumber, aeondata.dbo.transactions.username,
    iif(charindex(':',[location])>0, left([location], charindex(':',[location])-1), [location]) as trimlocation
  from aeondata.dbo.transactions
  inner join aeondata.dbo.users
    on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
    and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
  where (((aeondata.dbo.transactions.callnumber) like 'mss%' or (aeondata.dbo.transactions.callnumber) like 'aps.%')
    and ((aeondata.dbo.transactions.location) is not null)
    and ((aeondata.dbo.users.status)='researcher'))
) subquery
order by subquery.callnumber
for xml raw ('aeonlikecollections'), root ('dataroot'), elements;

we then process the "aeonlikecollections" file through a series of xslt stylesheets, creating lists of every other collection that every user of the current collection has requested. the stylesheets first remove collections that have been requested only once, then count the number of times each collection has been requested, sort on the collection name and username, and re-sort to combine groups of requested collections with the users who have requested each collection. finally, we create a new xml file organized by our collection groupings. the following snippet shows data from a populated xml file generated by these stylesheets (element markup abbreviated here):

<collection id="mss.497.3.b63c">
  <callnumber>mss.497.3.b63c</callnumber>
  <title>american council of learned societies</title>
  …
  <count>94</count>
</collection>
<collection id="mss.ms.coll.200">
  <callnumber>mss.ms.coll.200</callnumber>
  <title>miscellaneous manuscripts collection</title>
  …
  <count>92</count>
</collection>

we use javascript to determine the call number of the user's current request and display the list of other collections that users who have requested the current collection have also requested. see figure 4 for how these links appear to the user.

all of the exports and processing are handled automatically through a daily scheduled task. the only personally identifiable data contained in these processes is usernames, which are used for counting purposes; they are removed from the final products through the xslt processing on an internal administrative server, are never stored in the aeon web directory, and are never available for other library users or staff to see.
we’re calling the two biggest pitfalls the “bias toward well-described collections” and the “problem of aboutness.” information technology and libraries | september 2018 94 the bias toward well-described collections the bias toward well-described collections is best understood by examining how the aps integrates aeon into our finding aids. we offer request links at every available level of description: collection, series, folder, and item. if a patron spends all day in our reading room and looks at the entirety of an item-level collection, they could have made between twenty and one hundred individual requests from that collection. for our statistics, each request will be counted as that collection being used. figure 6 shows a collection described at the item level; each item can be individually requested, giving the impression that this collection is very heavily used even if it is only one patron doing all the requesting. figure 6. finding aid of collection described at the item level. a patron making their way through this collection could make as many as one hundred individual requests. for collections described at the collection level, however, the patron has only one link to click to see the entire collection. for pal, however, it looks like that collection was only used once, as shown in figure 7. a patron sitting all day in our reading room looking at a collection with little description might use the collection more heavily than a patron clicking select items in a well-described collection. however, when we review the numbers, all we see is that the well-described collections get more clicks. pal: toward a recommendation system for manuscripts | ziegler and shrake 95 https://doi.org/10.6017/ital.v37i3.10357 figure 7. screenshot of finding aid with only collection-level description. this collection has only one request link, the “special request” link at the top right. a patron looking through the entirety of this collection will only log a single request from the point of view of our statistics. the problem of aboutness when we speak of the problem of aboutness, we draw attention to the fact that manuscript collections can be about many different things. one researcher might come to a collection for one reason, another researcher for another reason. a good example at the aps library is the william parker foulke papers.6 this collection contains approximately three thousand items and represents a wide variety of the interests of the eponymous mr. foulke. he discovered the first full dinosaur skeleton, promoted prison reform, worked toward abolition, and championed arctic exploration. a patron looking at this collection could be interested in any of these topics, or others. pal, however, isn’t able to account for these nuances. if a researcher interested in prison reform requests items from the foulke papers, they’ll see the same suggestion as a researcher who came to the collection for arctic exploration. what to do about this identifying these pitfalls is a good first step to avoiding them, but it’s only a first step. there are technical solutions, and we’ll continue to explore them. for example, the bias toward welldescribed collections is mitigated by showing both the number of requests and the number of users who have requested from a collection (see figure 3). we hope that by presenting both numbers, we move a little toward overcoming this bias. however, we’re also interested in the nontechnical approaches to these issues. 
as mentioned in the introduction, the aps relies heavily on traditional reference service, both remote and in-house. nontechnical solutions acknowledge the shortcomings of any constructed solution and injects a healthy amount of humility into our work. additionally, the subject guides, search tools, and web exhibitions all form an ecosystem of discovery and access to supplement pal. future steps using data outside of aeon we have begun exploring options for using the recommendation data outside of aeon. one early prototype surfaces a link in our primary search interface. for example, searching for the william information technology and libraries | september 2018 96 parker foulke papers shows a link of what people who requested from this collection also looked at. see figures 8 and 9. generalizing for other repositories there are ways to integrate the use of aeon with ead finding aids. the systems that the aps has developed to collect data for automated recommendations takes advantage of our infrastructure. we’d like for other repositories to be able to use pal. it is our hope that an institution using aeon in a different way will help us generalize this system. generalizing beyond aeon pal is currently configured to pull data out of the microsoft sql database used by aeon. however, all the manipulation is done outside of aeon and is therefore generalizable to data collected in other ways. because archives and special collections have long-held statistics in different types of systems, we hope to be able to generalize beyond the aeon use case if there is any interest in this from other repositories. integrating pal into aeon conversations with atlas staff about pal have been positive, and there is interest in building many of the features into future releases of aeon. as of this writing, an open uservoice forum topic is taking votes and comments about this integration.7 figure 8. a link in the search returns that leads to recommendations based on finding aid search. clicking on the link “pal recommendations: patrons who used henry howard houston, ii papers also used these collections” will open an html page with a list of links to finding aids. pal: toward a recommendation system for manuscripts | ziegler and shrake 97 https://doi.org/10.6017/ital.v37i3.10357 figure 9. html link of recommended finding aids based on search. conclusion the aps is trying to add to the already robust options for users to find relevant manuscript collections. in addition to traditional reference, web exhibitions, and online search and browse tools, we have started reusing circulation data and self-identified user interests to automate recommendations. this new system fits within the ecosystem of tools we already supply. this is a snapshot of where the pal recommendation project is as of this writing, and we hope to work with other special collections libraries and archives to continue to grow the tool. if you are interested, we hope you reach out. endnotes 1 “subject guides and bibliographies,” american philosophical society, accessed february 27, 2018, https://amphilsoc.org/library/guides; “exhibitions,” american philosophical society, accessed february 27, 2018, https://amphilsoc.org/library/exhibit; “galleries,” american philosophical society, accessed february 27, 2018, https://diglib.amphilsoc.org/galleries. 2 “aeon,” atlas systems, accessed february 27, 2018, https://www.atlas-sys.com/aeon/. 
3 michael mönnich and marcus spiering, "adding value to the library catalog by implementing a recommendation system," d-lib magazine 14, no. 5/6 (2008), https://doi.org/10.1045/may2008-monnich.

4 matthew reidsma, "algorithmic bias in library discovery systems," matthew reidsma (blog), march 11, 2016, https://matthew.reidsrow.com/articles/173.

5 "americanphilosophicalsociety/pal," american philosophical society, last modified september 11, 2017, https://github.com/americanphilosophicalsociety/pal.

6 "william parker foulke papers, 1840–1865," american philosophical society, accessed february 27, 2018, https://search.amphilsoc.org/collections/view?docid=ead/mss.b.f826-ead.xml.

7 "recommendation system to suggest items to researchers based on users with the same research topic," atlas systems, accessed february 27, 2018, https://uservoice.atlas-sys.com/forums/568075-aeon-ideas/suggestions/18893335-recommendation-system-to-suggest-items-to-research.

letter from the editor
kenneth j. varnum
information technology and libraries | september 2018
https://doi.org/10.6017/ital.v37i3.10747

this september 2018 issue of ital continues our celebration of the journal's 50th anniversary with a column by former editorial board member mark dehmlow, who highlights the technological changes beginning to stir the library world in the 1980s. the seeds of change planted in the 1970s are germinating, but the explosive growth of the 1990s is still a few years away.

in addition to peer-reviewed articles on recommender systems, big data processing and storage, finding vendor accessibility documentation, using gis to find specific books on a shelf, and a recommender system for archival manuscripts, we are also publishing the student paper by this year's winner of the ex libris/lita student writing award, "the open access citation advantage: does it exist and what does it mean for libraries?," by colby lewis at the university of michigan school of information. this insightful paper impressed the competition's judges (as ital's editor, i was one of them), and i am very pleased to include ms. lewis' work here.

this issue also marks my fourth as editor.
with one year under my belt, i am finding a rhythm for the publication process and starting to see the increased flow of articles from outside traditional academic library spaces that i wrote about in december 2017. as always, if you have an idea for a potential ital article, please do get in touch. we on the editorial board look forward to working with you.

sincerely,
kenneth j. varnum, editor
varnum@umich.edu
september 2018

taking the long way around: improving the display of hathitrust records in the primo discovery system
jason alden bengtson and jason coleman
information technology and libraries | march 2019

jason bengtson (jbengtson@ksu.edu) is head of it services for kansas state university libraries. jason coleman (coleman@ksu.edu) is head of library user services for kansas state university libraries.

abstract

as with any shared format for serializing data, primo's pnx records have limits on the types of data which they pass along from the source records and into the primo tool. as a result of these limitations, pnx records do not currently have a provision for harvesting and transferring rights information about hathitrust holdings that the kansas state university (ksu) library system indexes through primo. this created a problem, since primo was defaulting to indicate that all hathitrust materials were available to ksu libraries (k-state libraries) patrons, when only a limited portion of them actually were. this disconnect was infuriating some library users and creating difficulties for the public services librarians. there was a library-wide discussion about removing hathitrust holdings from primo altogether, but it was decided that such a solution was an overreaction. as a consequence, the library it department began a crash program to attempt to find a solution to the problem. the result was an application called hathigenius.

introduction

many information professionals will be aware of primo, the web-scale discovery tool provided by ex libris. web-scale discovery services are designed to provide indexing and searching user experiences not only for the library's holdings (as with a traditional online public access catalog), but also for many of a library's licensed and open-access holdings. primo offers a variety of useful features for search and discovery, taking in data from manifold sources and serializing them into a common format for indexing within the tool. however, such applications are still relatively young, and the technologies powering them have not fully matured. the combination of this lack of maturity and deliberately closed architecture between vendors leads to several problems for the user. one of the most frustrating is errors in identifying full-text access availability.

as with any shared format for serializing data, primo's pnx (primo normalized xml) records have limits on the types of data they pass from the source records into the primo tool. as a result of these limitations, pnx records do not currently have a provision for harvesting and transferring rights information about hathitrust holdings that the k-state libraries system indexes through primo. this created a problem in the k-state libraries' implementation, since primo was defaulting to indicate that all hathitrust materials were available to k-state libraries patrons, when only a limited portion of them actually were.
this disconnect was infuriating some library users and creating difficulties for the public services librarians. there was a library-wide discussion about removing hathitrust holdings from primo altogether, but it was decided that such a solution was an overreaction. as a consequence, the library it services department began a crash program to attempt to find a solution to the problem.

hathitrust's digital library as a collection in primo central

hathitrust was established in 2008 as a collaboration among several research libraries that were interested in preserving digital content. as of the beginning of march 2018, the collaborative's digital library contained more than sixteen million items, approximately 37 percent of which were in the public domain.1 ex libris' primo central index (pci), which serves as primo's built-in index of articles from various database providers, includes metadata for the vast majority of the items in hathitrust's digital library, providing inline frames within the original primo user interface to directly display the full-text content of those items that the library has access to. libraries subscribing to primo choose whether or not to make these records available to their users.

k-state libraries, like many other primo central clients, elected to activate hathitrust in its instance of primo, which it has branded with the name search it. the unmodified version of primo central identified all records from hathitrust's digital library as available online, regardless of the actual level of access provided to users. users who discovered a record for an item from hathitrust's digital library were presented with a conspicuous message indicating that full text was available, along with two links named view it and details. an example of the appearance of these search results is shown in figure 1. after clicking the "view it" tab, the center window would display the item's homepage from hathitrust's digital library inside an iframe. public domain items would display the title page of the item and present users with an interface containing numerous visual indicators that they were viewing an ebook (see figure 2 for an example). items with copyright restrictions would display a message indicating that the item is not available online (see figure 3 for an example).

figure 1. two books from hathitrust as they appeared in search it prior to implementation of hathigenius.

figure 2. hathitrust result for an item in the public domain.

figure 3. hathitrust's homepage for an item that is not in the public domain.

despite the intentions evident in the design of the primo interface, the availability of hathitrust records was not being accurately reflected in the list of returns. the size of the indices underlying web-scale discovery systems and the number of configurations and settings that must be maintained locally introduce a variety of failure points that can intercede when patrons attempt to access subscribed resources.2 one of the failure points identified by sunshine and carter is inaccurate knowledgebase information. the scope of inaccurate information about hathitrust items in the primo central index constituted a particularly egregious example of this type of failure.
patron reaction to misinformation about access to hathitrust

between the time hathitrust's digital library was activated in search it and the time the hathigenius application was installed, at least thirty patrons contacted k-state libraries to ask why they were unable to access a book in hathitrust when search it had indicated that full text was available for the book. many of these patrons expressed frustration at frequently encountering this error (for an example, see figure 4).

1:08 26389957777759601093088133: i find it misleading that the search it function often finds a book i am interested in, but sometimes says it is available online; however, oftentimes it takes me to the hathi trust webpage for the book where i am told it is not available online. is this because our library has had to give up their subscription to this service?
1:08 me: hi!
1:09 me: that is definitely frustrating and we are trying to find a way to correct it.
1:10 me: it does not have to do with our subscription, but rather the metadata we receive from hathitrust and its compatibility (or rather, incompatibility) with search it
1:11 26389957777759601093088133: okay, so i guess i better ask for the book i am seeking (the emperor's mirror) through ill.
1:11 me: that'd probably be your best bet, but let me take a look one moment
1:14 me: yes, ill does look best. please note that the ill department will be closed after today until january
1:14 26389957777759601093088133: got it. thanks. i hope the hathi trust issue is resolved soon. (i have seen this problem all semester and finally got so frustrated to ask about it.)
1:15 26389957777759601093088133: have a happy holiday!
1:15 me: you as well! and yes, i hope we can figure it out asap
1:15 me: (it's frustrating for us, too!)
1:20 26389957777759601093088133 has left the conversation

figure 4. chat transcript revealing frustration with inaccurate information about availability of items in hathitrust.

staff reaction to misinformation about access to hathitrust

reference staff at k-state libraries use a ticketing system to report electronic resource access problems to a team of librarians who troubleshoot the underlying issues. shortly after the hathitrust library was activated in search it, reference staff submitted several tickets about problems with access to items in that collection. members of the troubleshooting team responded quickly and informed the reporting librarians that the problem was beyond their control. this message was slow to reach the entirety of the reference staff and was not always understood as being applicable to the full range of access problems our patrons were experiencing. samples and healy note that this type of decentralization and reactive orientation is common in electronic resource troubleshooting.3 like them, k-state libraries recognized a need to develop best practices to obviate confusion. we also found ourselves pining for a tool such as that described by collins and murray that could automatically verify access for a large set of links.4 the extent of displeasure with the situation was so severe that some librarians stated they were loath to promote search it to students while several million records were so conspicuously inaccurate.
technical challenges

the k-state libraries it department wanted to fix the situation in order to provide accurate expectations to its users, but doing so presented severe technical challenges, the most significant of which stemmed from the lack of rights information in the pnx record in primo. without more accurate information on availability, user satisfaction seemed destined to remain low. research into patron use of discovery layers predicted this unsurprising dissatisfaction. oclc's 2009 research into what patrons want from discovery systems led the researchers to conclude that "a seamless, easy flow from discovery through delivery is critical to end users. this point may seem obvious, but it is important to remember that for many end users, without the delivery of something he or she wants or needs, discovery alone is a waste of time."5 a later usability study reported: "some participants spent considerable time looking around for features they hoped or presumed existed that would support their path toward task completion."6

additionally, the perceived need to customize discovery layers so that they reflect the needs of a particular research library is hardly new, or exclusive to k-state libraries. the same issue was confronted by catalogers at east carolina university, as well as catalogers at unc chapel hill.7 nonetheless, the challenge posed by discovery layers comes with opportunity, as james madison university discovered when their ebsco discovery service widget netted almost twice the usage of their previous library catalog widget, and as the university of colorado discovered when they observed users attempting to use the discovery layer search box in "google-like" ways that could potentially aid discovery layer creators (as well as library it departments) in both design and in setting expectations.8

as previously noted, primo's results display is driven by pnx records (see figure 5 for an example). the single most fundamental challenge was finding a way to get to holdings rights information despite that data not being present in the pnx records or, consequently, in the search results that showed up in the presentation layer. there was no immediate option to create a solution that leveraged server-side resources, where the data itself resided and was transformed, since k-state libraries subscribes to primo as a hosted service, and ex libris provided no direct server-side access to k-state libraries. some alternative way had to be found to locate the rights data for individual records and populate it into the primo interface. upon assessing the situation, the assistant director, it (ad) decided that one potential approach would be to independently query the hathitrust bibliographic application programming interface (api) for rights information. this approach solved a number of fundamental problems, but also posed its own questions and challenges:

1. some server-side component would still be needed for part of the query. where would that live, and how could it be made to communicate with the javascript k-state libraries had injected into its primo instance?
2. how could hathitrust object identifiers best be isolated from primo and then used to launch an api query?
3. how could those responses be kept appropriately "pinned" to their corresponding entries on the primo page?
4. how would the hathitrust bibliographic api perform under load from search it queries?
answering these questions would require significant research into the hathitrust bibliographic api documentation, and extensive experimentation.

figure 5. a portion of the pnx record for http://hdl.handle.net/2027/uc1.32106011231518 (the second item shown in figure 1).

building the application

of these four questions, the first was easily the most challenging: where would the server-side component live and how would it work? the k-state libraries it services department had, in the past, made a number of significant modifications to the appearance and functionality of the primo application by adding javascript to the static html tiles used in the primo interface. however, generally speaking, javascript cannot successfully request data from outside of the domain of the web document it occupies. requesting data from an api across domains requires the mediation of a server-side appliance. the ad constructed one for this purpose using the php programming language. this script would serve as an intermediary between the javascript in primo and the hathitrust api. the appliance accepted data from the primo javascript in the form of http variables (encoded in the url of the get request to the php appliance), then used those values to query the hathitrust api.

however, since this server-side appliance did not reside in the same domain as k-state libraries' primo instance, the problem of getting the returned api data from the php appliance to the javascript still remained. this problem was solved by treating the php appliance as a javascript file for purposes of the application. while javascript cannot load data from another domain, a web document may load actual javascript files from anywhere on the web. hathigenius takes advantage of this fact by calling the php appliance programmatically as a javascript file, with a javascript object notation (json) version of the identifiers of any hathitrust entries encoded as part of the url used to call the file. the php script runs the queries against the api and returns a javascript file consisting of a single variable containing the json data that encodes the availability information for the hathitrust entries as supplied by the bibliographic api, essentially appearing to the browser as a standard javascript file.

the second and third problems were intrinsically interrelated, and essentially boiled down to finding a unique identifier to use in an api query from the hathitrust entries. the most effective way to handle these queries was to use the "htid" identifier, which was largely unique to hathitrust entries, could be easily extracted from any entries that contained it, and would form the basis of the php script's request to the hathitrust restful api to obtain rights information. in the process of harvesting the htid, hathigenius also copies the id of the object in the webpage that serves as the entry in the list of primo returns containing that htid. as the data is moved back and forth for processing, the htids, and later the resultant json data, remain paired to the object id for the entry in the list of returns. when hathigenius receives the results of the api query, it can then easily rewrite those entries to reflect the rights data it obtained.
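to make that cross-domain technique concrete, the following is a minimal, hypothetical sketch of the dynamic-script (jsonp-style) pattern the article describes. the endpoint url, variable name, and function names here are illustrative assumptions, not the authors' actual code.

    // hypothetical sketch of the jsonp-style pattern described above; the
    // endpoint url, variable name, and function names are assumptions.
    function requestHathiRights(htidToEntryId) {
      // encode the harvested htids (keyed to page element ids) into the url
      var payload = encodeURIComponent(JSON.stringify(Object.keys(htidToEntryId)));
      var script = document.createElement('script');
      // the php appliance answers as though it were a javascript file, e.g.:
      //   var hathiRights = {"mdp.39015012345678": "available", ...};
      script.src = 'https://libapps.example.edu/hathigenius.php?ids=' + payload;
      script.onload = function () {
        var rights = window.hathiRights || {};
        for (var htid in rights) {
          // rewrite the matching entry in the list of returns
          var entry = document.getElementById(htidToEntryId[htid]);
          if (entry) {
            entry.textContent = rights[htid];
          }
        }
      };
      document.body.appendChild(script); // loads cross-domain like any js file
    }

a modern replacement for this pattern would typically use cors with fetch(), but the script-injection approach sketched here matches the constraint the authors faced: a hosted primo instance customizable only through injected javascript.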
the fourth question has been fully answered with time. to this point, well over a year after hathigenius was activated in production, library it has not observed any failure of the api to deliver the requested results in testing, and no issues to that effect have been reported by users. log data indicates that, even under heavy load, the api is performing to expectations.

further modifications

originally, the hathigenius application supplied definitive states of available or unavailable for each entry. however, some experimentation showed this approach to be less than optimal. since the bibliographic api cannot be queried by kansas state university as a specific user, but rather was being queried for general access rights, the possibility still existed for false negatives in the future if kansas state university's level of access to hathitrust changed. the data returned from the api queries, when drilled down, consisted only of the usrightsstring property from the api, which corresponded to open access availability and did not account for any additional items that might become available to the library by license in the future. after the application had been active for a short time, to mitigate this potential issue, the "not available" state (an application of the "exlresultstatusnotavailable" class to the hathitrust entry) was "softened" into an application of the "exlresultstatusmaybeavailable" class and verbiage asking users to check the "view it" tab for availability.

a few weeks after deployment, it received a ticket indicating hathigenius was failing to work properly. the source of the problem proved to be the detailed bibliographic pages for items in a search results list, which were linked from the search entries. these pages used a different class and object structure than the search results pages in primo, requiring that an additional module be built into hathigenius to account for them. once the new module was added to the application and put into place, the problem was resolved.

a second issue presented itself some weeks later, when a few false negatives were reported. at first, the assistant director assumed that licensing had changed, creating a disparity between the access information from the usrightsstring property and the library's actual holdings. however, upon investigation it was clear that hathigenius was dropping some of the calls to the hathitrust bibliographic api. the api itself was performing as expected under load, however, and the failure proved to be coming from an unexpected source. the php script used by hathigenius to interface with the api was employing the curl module, which, in turn, was using its own, less secure certificate bundle to establish a secure sockets layer (ssl) connection to the hathitrust server. once the script was refactored to employ the simpler file_get_contents function, which relied upon the server's main ssl certificate, the problem was fully resolved.

hathigenius also had a limited vulnerability to bad actors. while the internal script's destination hardwiring prevented hathigenius from being used as a generic tool to anonymously query apis, the library did encounter a situation in which a (probably inadvertently) malicious bot repeatedly pinged the script, causing it to use up system resources until it interrupted other services on the host machine. modifications were added to the script to provide a simple check that requests originated from primo.
additionally, restrictions were placed on the script so that excessive resource use would cause it to be intermittently deactivated. while not perfect solutions, these measures have prevented a repeat of the earlier incident. k-state libraries has recently finished work on its version of the new primo user interface (primo new ui), which was moved into production this year. the new interface has a completely different client-side structure, requiring a very different approach to integrating hathigenius.9

appearance of hathitrust results in primo after hathigenius

when hathigenius does not find a usrights property for an item, we configured primo to display a yellow dot and the text "please check availability with the view it tab" (see figure 6 for an example). as noted earlier, we originally considered this preferable to displaying a red dot and the text "not available online," because there might be instances in which the item is actually available in full view through hathitrust despite the absence of usrights in the record.

figure 6. two books for which hathigenius found no usrights in hathitrust.

when hathigenius finds usrights, we configured primo to display a green dot and the text "available online" (see figure 7 for an example).

figure 7. a book for which hathigenius found usrights.

patron response

since the beginning of 2017, the reference staff at k-state libraries have received no reports of patrons encountering situations in the original user interface in which primo indicates that full text is available but hathitrust is only providing a preview. however, a small number of patrons (at least four) expressed confusion at seeing a result in primo and then discovering that the full text was not available. some of those patrons noted that they saw the text "please check availability with the view it tab" and inferred that this was meant to state that the full text was available. others indicated that they never considered that we would include results for books that we do not own. these responses add to the body of literature documenting user expectations that everything should be available in full text in an online library and that systems should be easy to use.10

internal response

in order to gauge the feelings of k-state libraries' staff who regularly assist patrons with reference questions, the authors crafted a brief survey (included in appendix a). respondents were asked to indicate whether they had noticed a positive change following implementation of hathigenius, a negative change, or no change at all. they were also invited to share comments. the survey was distributed to thirty individuals. twelve (40 percent) of those thirty responded to the survey. the survey responses indicated a great deal of ambivalence among reference staff toward the change, with four individuals (33 percent) indicating they had not noticed a difference, and another four (33 percent) indicating that they had noticed a difference but that it had not improved the quality of search results. only two respondents (17 percent) revealed that they had noticed an improvement in the quality of the search results. one respondent (8 percent) indicated that they felt the hathitrust results had gotten noticeably worse since the introduction of hathigenius, although they did not elaborate on this in the survey question that invited further comment. the remaining respondent stated that they did not have an opinion.
four comments were left by respondents, including one that indicated displeasure with the new, softer verbiage for hathitrust "negatives," and one that claimed the problem of false positives persisted, despite such feedback not being seen by the authors through any of the statistical modalities currently used for recording reference transactions. one respondent praised hathigenius, while another related broad displeasure with the decision to include hathitrust records in search it at all. that individual claimed that almost none of the results from hathitrust were available and stated that the hope engendered by the presence of the hathitrust results, and the corresponding suggestion to check the view it tab, was always dashed, to the detriment of patron satisfaction.

the new ui

as previously mentioned, in late 2018, k-state libraries adopted the primo new ui created by ex libris. this new user interface was built in angular and changed many aspects of how hathigenius had to be integrated into primo. the k-state libraries it department completed a refactoring (reworking application code to change how an application works internally, but not what it does) of hathigenius to integrate it with the new ui and released it into production in september 2018. as an interesting aside, the it department did not initially prioritize the reintegration of hathigenius, due to the ambivalence of the response to the application evidenced by the survey conducted for this paper. however, shortly after search it was switched over to the new ui, complaints about the hathitrust results again displaying inaccurate availability information began to come in to the it department via both email and tickets from reference staff. as the stridence of the response increased, the project was reprioritized and the work completed.

future directions

as previously mentioned, hathigenius currently uses the very rough "usrightsstring" property value from the hathitrust bibliographic api. however, the api also delivers much more granular rights data for digital objects. a future version of the app may inspect these more granular rights codes and compare them to rights data from k-state libraries in order to provide more definitive access determinations for hathitrust results in primo should the licensing of hathitrust holdings change. similarly, since the htid technically resolves only to the volume level, a future version may additionally harvest the hathitrust record number, which appears to be extractable from the primo entries.

based on feedback from the survey, the "soft negative" verbiage used in hathigenius was replaced with a firmer negative. this decision proved especially sagacious given that, once the early issues with certificates and communication with the hathitrust bibliographic api were sorted out, the accuracy of the tool seemed to be fully satisfactory. another problem with the "soft negative" was that it asked users to click on the view it tab, when many users simply ignore the tabs and links in the search results and instead click on the item title, as found in a usability study on primo conducted by the university of houston libraries.11 it is also worth noting the one survey respondent who is apparently not seeing an improvement in hathitrust accuracy.
if the continued difficulties they have indicated can be documented and replicated, the it department can examine those complaints to investigate where the tool may be failing.

discussion

one interesting feature of this experience is the seeming disconnect between library reference support staff and users in terms of the perceived efficacy of the tool. this disconnect is all the more curious given the negative reaction displayed by reference support staff when hathigenius became temporarily unavailable upon introduction of the primo new ui. part of this perceived disconnect may be a result of the fact that staff were given a survey instrument, while the reactions of users have been inferred largely from null results (a lack of complaints to, or requests for assistance from, service point staff). however, given the dramatic drop in user complaints compared to the ambivalent reaction to the tool by most of the survey respondents, it appears that the staff had a much less enthusiastic response to the intervention than patrons. a few possible explanations occur to the authors, including a general dislike for the discovery layer among reference librarians, a general disinclination toward a technological solution by some respondents, or an initial perception by at least part of the reference staff that the problem was not significant. as noted by fagan et al., the pivot toward discovery layers has not been a comfortable one for many librarians.12 until further research can be conducted on this, and on reactions to similar customization interventions, these possibilities remain speculation.

one particular feature of note with hathigenius is the use of what one of the authors refers to as "sideways development" to solve problems that seem to be intractable within a proprietary, or open source, web-based tool. while not a new methodology in and of itself, the author has mainly encountered this type of design in ad hoc creations, rather than as a systematic approach to problem-solving. instead of relying upon the capabilities of primo, this type of customization made its own query to a relevant api and blended that external data with the data available from primo seamlessly within the application's presentation layer in order to solve a known problem. the solution created in this fashion was portable, and unaffected by most updates to primo itself. even the transition to the new ui required changes only to the "hooks" and timing used by the javascript, rather than any substantial rewrite of the core engines of the application. this methodology has been used repeatedly by k-state libraries it services to solve problems where other interventions would have necessitated the creation of specialized modules or the rewriting of source code, both of which would be substantially affected by updates to the product itself, and which would have been difficult to improve or version without downtime to the affected product. similar solutions have seen tools independently query an application's database in order to inject the data back into the application's presentation layer, bypassing the core functionality of the application.

conclusion

reactions at this point from users, and at least some library staff, have been positive. while not a perfect tool, hathigenius has improved the user experience, removing a point of frustration and an area of disconnect between the library and its users.
the application itself is fully replicable by other institutions (as is the general model of sideways development), allowing them to improve the utility of their primo instances. as with many possible customizations to discovery layers, hathigenius provides fertile ground for additional work, research, and refinement as libraries struggle to find the most effective ways to implement discovery tools within their own environments. beyond hathigenius itself, the sideways development method provides a powerful tool for libraries to improve the tools they use by integrating additional functionality at the presentation layer. tackling the problem of inaccurate full-text links in discovery layers is only one application of this approach, but it is an important one. as libraries continue to strive to improve the results and usability of their search offerings, the ability to add local customizations and improvements will be an essential feature for vendors to consider.

appendix a. feedback survey

q1. in january 2017, the library began applying a tool (called hathigenius) to the hathitrust results in primo in order to eliminate the problem of "false positives." in other words, primo would report that all of the hathitrust results it returned were available online as full text, when many were not. we would like your feedback about the impact of this change from your perspective.

q2. which of the following statements best describes your opinion about the impact of hathigenius?
o i haven't noticed a difference.
o i feel that search it's presentation of hathitrust results has become noticeably better since hathigenius was implemented.
o i feel that search it's presentation of hathitrust results has become noticeably worse since hathigenius was implemented.
o i have noticed a difference, but i feel that search it's presentation of hathitrust results is about the same quality as it was before hathigenius was implemented.
o no opinion.

q3. please share any comments you have about hathigenius or any ideas you have for improving the display of hathitrust's records in search it.

references

1 hathitrust digital library, "welcome to hathitrust!" accessed march 4, 2018, https://www.hathitrust.org/about.
2 sunshine carter and stacie traill, "essential skills and knowledge for troubleshooting e-resources access issues in a web-scale discovery environment," journal of electronic resources librarianship 29, no. 1 (2017): 7, https://doi.org/10.1080/1941126x.2017.1270096.
3 jacquie samples and ciara healy, "making it look easy: maintaining the magic of access," serials review 40, no. 2 (2014): 114, https://doi.org/10.1080/00987913.2014.929483.
4 maria collins and william t. murray, "seesau: university of georgia's electronic journal verification system," serials review 35, no. 2 (2009): 80, https://doi.org/10.1080/00987913.2009.10765216.
5 karen calhoun, diane cellentani, and oclc, eds., online catalogs: what users and librarians want: an oclc report (dublin, ohio: oclc, 2009): 20, https://www.oclc.org/content/dam/oclc/reports/onlinecatalogs/fullreport.pdf.
6 rice majors, "comparative user experiences of next-generation catalogue interfaces," library trends 61, no. 1 (summer 2012): 191, https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1132&context=library.
7 marlena barber, christopher holden, and janet l. mayo, "customizing an open source discovery layer at east carolina university libraries: the cataloger's role in developing a replacement for a traditional online catalog," library resources & technical services 60, no. 3 (july 2016): 184, https://journals.ala.org/index.php/lrts/article/view/6039; benjamin pennell and jill sexton, "implementing a real-time suggestion service in a library discovery layer," code4lib journal, no. 10 (june 2010): 5, https://journal.code4lib.org/articles/3022.
8 jody condit fagan et al., "usability test results for a discovery tool in an academic library," information technology and libraries 31, no. 1 (march 2012): 99, https://doi.org/10.6017/ital.v31i1.1855.
9 dan moore and nathan mealey, "consortial-based customizations for new primo ui," code4lib journal, no. 34 (october 25, 2016), http://journal.code4lib.org/articles/11948.
10 lesley m. moyo, "electronic libraries and the emergence of new service paradigms," the electronic library 22, no. 3 (2004): 221, https://www.emeraldinsight.com/doi/full/10.1108/02640470410541615.
11 kelsey brett, ashley lierman, and cherie turner, "lessons learned: a primo usability study," information technology and libraries 35, no. 1 (march 2016): 20, https://ejournals.bc.edu/ojs/index.php/ital/article/view/8965.
12 fagan et al., "usability test results for a discovery tool in an academic library," 84.

pearls dan marmion information technology and libraries | march 2000

ed. note: "pearls" is a new section that will appear in these pages from time to time. it will be ital's own version of the "top technology trends" topic begun by pat ensor. these pearls might be gleaned from a variety of places, but most often will come from discussion lists on the net. our first pearl, from thomas dowling, appeared on web4lib on august 19, 1999 under the subject "pixel sizes for web pages." he is responding to a query that asked if web site developers should assume the standard monitor resolution is 640x480 pixels, or something else.

from: thomas dowling
to: multiple recipients of list
sent: thu, 19 aug 1999 06:07:08 -0700 (pdt)
subject: [web4lib] pixel sizes for web pages

you may want to consult the web4lib archive for comments from the last few merry-go-rounds on this topic.

monitor size in inches is different from monitor size in pixels, which is different from window size in pixels, which is different from the rendered size of a browser's default font. not only are these four measurements different, they operate almost wholly independently of each other. so a statement like "i have trouble reading text at 600x800" puts the blame in the wrong place.

html inherently has no sense of screen or window dimensions. many web designers will argue that the only aspects of a page with fixed pixel dimensions should be inline images; such designers typically restrain their use of images so that no single image or horizontal chain of images is wider than, say, 550px (with obvious exceptions for sites like image archives where the main purpose of a page is to display a larger image). outside of images, find ways to express measurements relative to window size (percentages) or relative to text size (ems). users detest horizontal scrolling.
in my experience, users with higher screen resolutions and/or larger monitors are less likely to run any application full screen; average window size on a 1280x1024 19" or 21" monitor is very likely to be less than 800px wide. (the browser window i currently have open is 587px wide and 737px high.)

i applaud your decision to support web access for the visually impaired. since that entails much, much more than monitor resolution, i trust the people actually writing your pages are familiar with the web content accessibility guidelines.

it is actually possible to design web sites that are equally usable, even equally beautiful, under a wide range of viewing conditions. failing to accomplish that completely is understandable; failing to identify it as a goal is not. my recommendations to your committee would be a) find a starting point that isn't tied up in presentational nitpicking; b) find a design that looks attractive anywhere from 550 to 1550 pixels wide; c) crank up both your workstations' resolution and font size; and d) continue to run your browsers in windows that are approximately 600 to 640 pixels wide.

thomas dowling
ohiolink, ohio library and information network
tdowling@ohiolink.edu

public libraries leading the way: vr hackfest chris markman, m ryan hess, dan lou, and anh nguyen information technology and libraries | december 2019

chris markman (chris.markman@cityofpaloalto.org) is senior librarian, information technology & collections, palo alto city public library. m ryan hess (ryan.hess@cityofpaloalto.org) is library services manager, digital initiatives, information technology & collections, palo alto city public library. dan lou (dan.lou@cityofpaloalto.org) is senior librarian, information technology & collections, palo alto city public library. anh nguyen (anh.nguyen@cityofpaloalto.org) is library specialist, information technology & collections, palo alto city public library.

we built the future of the internet…today! the elibrary team at the palo alto city library held a vr hackfest weaving together multiple emerging technologies into a single workshop. during the event, participants had hands-on experience building vr scenes, which were loaded onto a raspberry pi and published online using the distributed web. throughout the day, participants discussed how these technologies might change our lives, for good and for ill. and afterward, an exhibit showcasing the participants' vr scenes was placed at our mitchell park branch to stir further conversation.

multiple emerging technologies explored

the workshop was largely focused on a-frame, a framework for publishing 3d scenes to the web (https://aframe.io/). however, we also integrated a number of other technologies, including a raspberry pi, qr codes, a twitter bot, and the inter-planetary file system (ipfs), which is a distributed web technology.

virtual reality built with a-frame code

in the vr hackfest, participants first learned how to use a-frame code to render 3d scenes that can be experienced through a web browser or vr headset. a-frame is a new framework that web publishers and 3d designers can use to design web sites, games, and 3d art. a-frame is an extension of html, the code used to build web pages. anyone who is familiar with html will pick up a-frame very quickly, but it is simple enough even for beginners.
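the raw a-frame snippet that accompanied this example (figure 1, below) does not survive in the extracted text. as a stand-in, here is a minimal reconstruction, a sketch assuming the original resembled a-frame's standard starter scene; the release number in the script url is an assumption, not necessarily the version used at the hackfest:

    <!-- minimal a-frame scene: renders a blue box in the center of the view -->
    <html>
      <head>
        <!-- load the a-frame library; this release number is an assumption -->
        <script src="https://aframe.io/releases/0.9.2/aframe.min.js"></script>
      </head>
      <body>
        <a-scene>
          <!-- a primitive 3d object; editing these attribute values changes
               its shape, size, color, and location -->
          <a-box color="blue" position="0 1.6 -3"></a-box>
        </a-scene>
      </body>
    </html>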
figure 1. try this code example! https://tinyurl.com/ipfsvr02.

save the above code as an html file and open it with a webvr-compatible browser like chrome, and you will then see a blue cube in the center of your screen. by changing the values of a few parameters, novice coders can easily change the shape, size, color, and location of primitive 3d objects, add 3d backgrounds, and more. advanced users can also insert javascript code to make the 3d scenes more interesting. for example, in the workshop, we provided javascript that animated a 3d robot head (see figure 1) pre-loaded into the codepen (https://codepen.io) interface for quicker editing and iteration.

the inter-planetary file system (ipfs)

the collection of 3d scenes created in the vr hackfest was published to the internet using the inter-planetary file system (ipfs), an open source distributed web technology originally created in palo alto by protocol labs in 2014 and now actively improved by a global network of software developers. ipfs allows anyone to publish to the internet without a server, through a peer-to-peer network that can also work seamlessly with the regular internet through http "gateways". in november 2019, brave browser (https://brave.com) became the first browser to offer seamless ipfs integration, capable of spawning its own background process or daemon that can upload and download ipfs content on the fly without the need for an http gateway or a separate browser extension.

unlike p2p technologies such as bittorrent, ipfs is best suited to distributing small files available for long periods of time rather than the quick distribution of large files over a short period of time. this is an oversimplification of what is really happening behind the scenes (part of the magic involves content-addressable storage and asynchronous communication methods based on pub/sub messaging, to name a few), but the ability to share and publish 3d environments and 3d objects in a way that can instantly scale to meet demand could have far-reaching consequences for future technologies like augmented reality.

figure 2. workshop attendees.

ipfs can load content much faster and more securely (through features like automated cryptographic hash checking), and it allows people to publish directly to the internet without the need of a third-party host. google, facebook, and amazon web services need not apply. the same technology has already been used to overcome censorship efforts by governments, but like any technology it has its downsides. content on ipfs is essentially permanent, allowing free speech to flourish, but it could also make undesirable content, like hate speech or child pornography, all but impossible to control.
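to make the "gateway" mechanism mentioned above concrete, here is a small, hypothetical sketch of retrieving ipfs-published content through a public http gateway; the content identifier (cid) below is a placeholder, not a real identifier from the hackfest:

    // fetch a published file through a public ipfs http gateway; the cid is
    // a placeholder, not a real identifier from the hackfest.
    const cid = '<content-identifier-of-a-published-scene>';

    fetch('https://ipfs.io/ipfs/' + cid + '/index.html')
      .then(function (response) { return response.text(); })
      .then(function (html) {
        // the same cid always resolves to the same bytes (content
        // addressing), so any node or gateway holding the data can serve it
        console.log(html.slice(0, 200));
      })
      .catch(function (err) { console.error('gateway request failed:', err); });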
toward 21st century literacy

like our other technology programs, the vr hackfest was designed to engage customers around new forms of literacy, particularly around understanding code and thinking critically about emerging communication technologies. in 2019, we are already seeing how technologies like machine learning and social media are impacting social relations, politics, and the economy. it is no longer enough to know how to read and write the code that underlies the web. true literacy must also include understanding how these technologies interface with each other and how they impact people and society.

figure 3. the free-standing exhibit.

to this end, the vr hackfest sought to take participants on a journey, both technological and sociological. once the initial practice with the code was completed, we moved on to a discussion of the consequences of using these technologies. with the distributed web, for example, we explored questions like:

• what are the implications of permanent content on the web which no one can take down?
• what power do gatekeepers like the government and private companies have over our online speech?
• what does a 3d web look like, and how will that change how we communicate, tell stories, and learn?

after the workshop ended, we continued the conversation with the public through an exhibit placed at our mitchell park branch (see figure 3). in this exhibit, we showcased the vr scenes participants had created and introduced the technologies underlying them. but we also asked people to reflect on the future of the internet and to share their thoughts by posting on the exhibit itself.

public comments reflected the current discourse around the internet. responses (see figure 5) were generally positive: most of our customers mentioned better download speeds or other efficiency increases, but a few also highlighted online privacy and safety improvements. we recorded an equal number of pessimistic and technical responses to the same question; these often demonstrated either knowledge of similar technology (e.g., "how is this different than napster?") or displeasure with the current state of the world wide web (e.g., "less human connections" or "more spyware and less freedom").

outcomes

one surprise outcome was that our project reached the attention of the developers of ipfs, who happen to live a few blocks away from the library. after reading about the exhibit online, their whole team visited our team at the library. in fact, one member of their team turned out to be a former child customer of our library!

the workshop itself, which was featured as a summer reading program activity, also brought in record numbers. originally open to 20 participants and later expanded to 30, the workshop grew a waitlist that more than quadrupled our initial room capacity. clearly, people were interested in learning about these two emerging technologies.

we also want to take a moment to highlight the number of design iterations this project went through before making its way into the public eye. the free-standing vr hackfest exhibit was originally conceived as a wall-mounted computer kiosk that encouraged users to publish a short message directly to the web with ipfs, but this raised too many privacy concerns, and ultimately our building design does not make mounting a computer on the wall an easy task. our workshop also initially focused much more on command line skills working directly with ipfs, but user testing with library staff showed learning a-frame was more than enough.

figure 4. building the exhibit.
figure 5. exhibit responses (a bar chart tallying post-it notes by category: optimistic, pessimistic, technical, spam, and illegible).

figure 6. visit from protocol labs co-founders.

the vr hackfest was also a win because it combined so many different skills into a single project. we were not only working with open source tools and highlighting new technologies, but also building an experience for workshop attendees and showcasing their work to thousands of people.

future work

our immediate plans include re-use of the exhibit frame for future public technology showcases and offering another round of vr hackfest workshops, perhaps in a smaller group so participants have the chance to view their work while wearing a vr headset.

figure 7. 3d mock-up.

beyond this, we also think libraries have the opportunity to harness the distributed web for digital collections, potentially undercutting the cost of alternative content delivery networks or file hosting services. through this project we have already tested things like embedding ipfs links in marc records and building a 3d object library. essentially, all the pieces of the "future web" are already here, and it is just a matter of time before all modern web browsers offer native support for these new technologies. in general, our project demonstrated the popularity of 21st-century literacy programs. but it also demonstrated the significant technical difficulties of conducting cutting-edge technology workshops in public libraries. clearly, the demand is there, and our library will continue to strive to re-imagine library services.

topic modeling as a tool for analyzing library chat transcripts hyunseung koh and mark fienup information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13333

hyunseung koh (hyunseung.koh@uni.edu) is an assessment librarian and assistant professor of library services, university of northern iowa. mark fienup (mark.fienup@uni.edu) is an associate professor in the computer science department, university of northern iowa. © 2021.

abstract

library chat services are an increasingly important communication channel to connect patrons to library resources and services. analysis of chat transcripts could provide librarians with insights into improving services. unfortunately, chat transcripts consist of unstructured text data, making it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools. as a stepping-stone toward a more sophisticated chat transcript analysis tool, this study investigated the application of different types of topic modeling techniques to analyze one academic library's chat reference data collected from april 10, 2015, to may 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. in this study, topic accuracy and interpretability—the quality of topic outcomes—were quantitatively measured with topic coherence metrics.
additionally, qualitative accuracy and interpretability were measured by the librarian author of this paper, based on a subjective judgment of whether topics aligned with frequently asked questions or easily inferable themes in academic library contexts. this study found that, in the human qualitative evaluation, probabilistic latent semantic analysis (plsa) produced more accurate and interpretable topics, a result that did not necessarily align with the findings of the quantitative evaluation using all three types of topic coherence metrics. interestingly, the commonly used latent dirichlet allocation (lda) technique did not necessarily perform better than plsa. also, the semi-supervised techniques using human-curated anchor words, correlation explanation (corex) and guided lda (guidedlda), did not necessarily perform better than the unsupervised dirichlet multinomial mixture (dmm) technique. last, the study found that using the entire transcript, including both sides of the interaction between the library patron and the librarian, performed better across the different techniques in increasing the quality of topic outcomes than using only the initial question asked by the library patron.

introduction

with the rise of online education, library chat services are an increasingly important tool for student learning.1 library chat services have the potential to support student learning, especially for distance learners who have less opportunity to come and learn about library and research skills in person. in addition, unlike traditional in-person reference services, whose use has declined drastically, library chat services have become an important communication channel that connects patrons to library resources, services, and spaces.2 quantitative and qualitative analysis of chat transactions could provide librarians with insights into improving the quality of these resources, services, and spaces. for example, in order to maximize patrons' satisfaction, librarians could identify or evaluate quantitative and qualitative patterns in chat reference data (e.g., the busiest days and times for nondirectional, research-focused questions) and develop a better staffing plan for assigning librarians or student employees to the most appropriate days and times. furthermore, these insights could be used to help demonstrate library value by showing external stakeholders how successfully library chat services support students' needs, which is increasingly in demand in higher education.3

in practice, it is burdensome for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing chat software tools, such as libraryh3lp, questionpoint, springshare's libchat, and liveperson.4 currently, in order to obtain rich and hidden insights from large volumes of chat transcripts, librarians need to conduct manual qualitative analysis of unstructured text data, which requires a lot of time and effort. in an age when library patrons' information needs have been changing, the lack of chat analysis tools that handle large volumes of transcripts hinders librarians' ability to respond to patrons' wants and needs in a timely manner.5 in particular, small and medium-sized academic libraries have seen a shortage of librarians and need to hire and train student employees, so librarians' capabilities for real-time, quick, and easy analysis and assessment will become critical in helping them take appropriate actions to best meet user needs.6

as part of an effort to develop a quick and easy analysis tool for large volumes of chat transcripts, this study applied topic modeling, a statistical technique "for learning the latent structure in document collections" or "a type of statistical model for finding hidden topical patterns of words."7 we compared the outcomes of different types of topic modeling techniques and attempted to propose the topic modeling techniques that would be most appropriate in the context of chat reference transcript data.

literature review

to identify the research methods that would best facilitate analyzing a vast amount of chat transcripts, this section first introduces the literature on research methods used in analyzing chat transcript data in library settings and nonlibrary settings. it then discusses the different types of topic modeling techniques that have high potential for quick and easy analysis of chat transcripts, along with their strengths and weaknesses.

chat transcript analysis methods in library settings

in analyzing library chat transcripts, which are one major data source of library chat service research, researchers have used variants of quantitative and qualitative research methods.8 coding-based content analysis with or without predefined categories is one type of qualitative method.9 the other type of qualitative research method is conversation or language usage analysis, but it is not a dominant type of research method compared to coding-based qualitative content analysis.10 the most common quantitative methods are simple descriptive count- or frequency-based analyses that accompany qualitative coding-based content analyses.11 in some recent research, advanced quantitative research methods, such as cluster analysis and topic modeling techniques, have been used, but they have not yet been fully explored with a wide range of techniques.12

chat transcript analysis methods in nonlibrary settings

as shown in table 1, researchers in nonlibrary settings have also used a range of research methods in analyzing chat data from diverse technology platforms and contexts, ranging from qualitative manual coding methods to data mining and machine learning techniques. topic modeling is one of these chat analysis methods, but again, it seems that it has not yet been fully explored in chat analyses in nonlibrary settings, even though it has been used in a wide range of contexts.13
in an age when library patrons' information needs have been changing, the lack of chat analysis tools that handle large volumes of transcripts hinders librarians’ ability to respond to patrons’ wants and needs in a timely manner.5 in particular, small and medium-sized academic libraries have seen a shortage of librarians and need to hire and train student employees , so librarians’ capabilities for real-time quick and easy analysis and assessment will become critical in helping them take appropriate actions to best meet user needs.6 as part of an effort to develop a quick and easy analysis tool for large volumes of chat transcripts, this study applied topic modeling, which is a statistical technique “for learning the latent structure in document collections” or “a type of statistical model for finding hidden topical patterns of words.”7 we compared outcomes of different types of topic modeling techniques and attempted to propose topic modeling techniques that would be most appropriate in the context of chat reference transcript data. literature review to identify the most appropriate research methods that would facilitate analyzing a vast amount of chat transcripts, this section first introduces literature in relation to research methods used in analyzing chat transcript data in library settings and nonlibrary settings. it follows by discussing different types of topic modeling techniques that have high potential for quick and easy analysis of chat transcripts and their strengths and weaknesses. chat transcript analysis methods in library settings in analyzing library chat transcripts, which are one major data source of library chat service research, researchers have used variants of quantitative and qualitative research methods.8 coding-based content analysis with or without predefined categories is one type of qualitative method.9 the other type of qualitative research method is conversation or language usage analysis but it is not a dominant type of research method, as compared to coding-based qualitative content analysis.10 the most common quantitative methods are simple descriptive countor frequencybased analyses that are accompanied by qualitative coding-based content analyses.11 in some recent research, advanced quantitative research methods, such as cluster analysis and topic modeling techniques, have been used, but they have not been fully explored yet with a wide range of techniques.12 chat transcript analysis methods in nonlibrary settings as shown in table 1, researchers in nonlibrary settings also used research methods in analyzing chat data from diverse technology platforms or contexts, ranging from qualitative manual coding methods to data mining and machine learning techniques. topic modeling techniques are one of the chat analysis methods, but again, it seems that they have not been fully explored yet in chat analyses in nonlibrary settings, even though they have been used in a wide range of contexts.13 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 3 table 1. 
chat transcript analysis applications in non-library settings disciplines platforms/sources of chat transcript data chat transcript analysis methods/tools/techniques education chat rooms and text chat14 qualitative content analysis health social media15 qualitative & quantitative content analysis business in-game chat features and chatbots16 a spell-checker, readability scores, the number of spelling and grammatical errors, linguistic inquiry and word count (liwc) program, logistic regression analysis, decision tree, support vector machine (svm) criminology instant messengers, internet relay chat (irc) channels, internet-based chat logs, and social media17 liwc program, cluster analysis, latent dirichlet allocation (lda) topic modeling techniques and their strengths and weaknesses as a quantitative and statistical method appropriate for analyzing a vast amount of chat transcript data, researchers from both library and nonlibrary settings used topic modeling. as shown in table 2, conventional topic modeling techniques include latent semantic analysis, probabilistic latent semantic analysis, and latent dirichlet allocation, each of which has its unique strengths and weaknesses.18 in order to overcome weaknesses of the conventional techniques, researchers have developed alternative techniques. for example, dirichlet multinomial mixture (dmm) has been proposed to overcome data sparsity problems in short texts.19 as another example, correlation explanation (corex) has been proposed to avoid time and effort to identify topics and their structure ahead of time.20 last, guided lda (guidedlda) has been proposed to improve performance of infrequently occurring topics.21 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 4 table 2. strengths and weaknesses of conventional topic modeling techniques acronym definitions strengths weaknesses latent semantic analysis lsa a document is represented as a vector of numbers found by applying dimensionality reduction (specifically, truncated svd) to summarize the frequencies of cooccurring words across documents. can deal with polysemy (multiple meanings) to some extent. is hard to obtain and to determine the optimal number of topics. probabilistic latent semantic analysis plsa a document is represented as vectors, but these vectors have nonnegative entries summing to 1 such that each component (topic) represents the relative prominence of some probabilistic mixture of words in the corpus. topics in a document are “probabilistic instead of the heuristic geometric distances.”22 can deal with polysemy issues; provides easy interpretation terms of word, document, and topic probabilities. has over-fitting problems. latent dirichlet allocation lda a bayesian extension of plsa that adds assumptions about the relative probability of observing different document's distributions over topics. prevents overfitting problems; provides a fully bayesian probabilistic interpretation. does not show relationships among topics. data, preprocessing, analysis, and evaluation this section first introduces the data used for this study. next, it explains the procedures of each stage starting from preprocessing to analyzing chat transcript data using different types of conventional and alternative topic modeling techniques. last, it discusses quantitative and qualitative evaluation in terms of the quality of topic outcomes across different types of topic technique. 
for more details, including python scripts, please visit our github page at https://github.com/mfienup/uni-library-chat-study.

data

this study collected the university of northern iowa's rod library chat reference data dated from april 10, 2015, to may 31, 2019 (irb #18-0225). this raw chat data was downloaded from libchat in the form of an excel spreadsheet containing 9,942 english chat transcripts, with each transcript as a separate row.

preprocessing

as the first step, this study removed unnecessary components of each chat transcript using a custom python script. the components removed were timestamps, patron and librarian identifiers, http tags (e.g., urls), and non-ascii characters. next, it processed the resulting text using python's natural language toolkit (https://www.nltk.org/) and its wordnetlemmatizer function (https://www.nltk.org/_modules/nltk/stem/wordnet.html) to normalize words for further analyses. as the final step, it prepared four types of data sets to identify which type would produce better topic outcomes:

• question-only: consists of only the initial question asked by the library patron in each chat transcript. only the latter 10.7% of the chats recorded in the excel spreadsheet contained an initial-question column entry. the remaining chats were assumed to contain their initial question in the patron's first response if it was longer than a trivial welcome message.
• whole-chat: consists of the whole chat transcript from the library patron and the librarians.
• whole-chat with nouns and adjectives: consists of only nouns and adjectives as parts of speech (pos) from the whole chat transcript.
• whole-chat with nouns, adjectives, and verbs: consists of only nouns, adjectives, and verbs as pos from the whole chat transcript.

the first two data sets were prepared to examine whether the first question initiated by each patron or the whole chat transcript would help produce better topic outcomes. the last two data sets were prepared to examine which parts of speech retained would help produce better topic outcomes. a minimal sketch of the preprocessing stage appears below.
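the following is a hedged illustration of the cleanup and lemmatization steps just described, not the authors' actual script (which lives in their github repository); the regular expressions and the toy input are assumptions:

    # illustrative sketch of the preprocessing described above; the regular
    # expressions and input format are assumptions, not the authors' script.
    import re
    from nltk.stem import WordNetLemmatizer  # requires the wordnet corpus

    lemmatizer = WordNetLemmatizer()

    def clean_transcript(raw_text):
        """strip timestamps, urls, and non-ascii characters, then lemmatize."""
        text = re.sub(r"\b\d{1,2}:\d{2}\b", " ", raw_text)    # timestamps like 1:08
        text = re.sub(r"https?://\S+", " ", text)              # http tags / urls
        text = text.encode("ascii", errors="ignore").decode()  # non-ascii characters
        tokens = re.findall(r"[a-z]+", text.lower())           # simple word tokens
        return [lemmatizer.lemmatize(tok) for tok in tokens]

    # example usage on a toy chat line
    print(clean_transcript("1:08 are the books on reserve? see https://example.edu"))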
table 3. conventional topic modeling techniques and their sources

technique | programming language | implementation source | version used in the study
latent semantic analysis | python | https://pypi.org/project/gensim/ | 3.8.1
probabilistic latent semantic analysis | python | https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html | 0.21.3
latent dirichlet allocation (with sklearn) | python | https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html | 0.21.3
latent dirichlet allocation (with pymallet) | python | https://github.com/mimno/pymallet | dated february 26, 2019

also, before analyzing chat transcript data using lsa and plsa, this study performed a term frequency–inverse document frequency (tf–idf) transformation. tf–idf is a measure of how important a word is to a document (i.e., a single chat transcript) compared to its relevance in the collection of all documents.

data analysis with alternative topic modeling techniques

in addition to conventional topic modeling techniques, this study analyzed chat reference data using three alternative techniques: dirichlet multinomial mixture (dmm), anchored correlation explanation (corex), and guided lda (guidedlda), as shown in table 4. this study selected dmm as an alternative unsupervised topic modeling technique that has been developed for short texts. it selected anchored corex and guided lda (guidedlda) as semi-supervised topic modeling techniques that require human-curated sets of words, called anchors or seeds, which nudge topic models toward including the suggested anchors. this choice is based on the assumption that human-curated anchors would help produce better-quality topics than the unsupervised techniques alone. for example, the three words “interlibrary,” “loan,” and “request,” or the two words “article” and “database,” are possible anchor words in the context of library chat transcripts. such anchor words can appear anywhere within a chat in any order.
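as a minimal sketch of how anchors feed into the two semi-supervised techniques, the following code follows the public apis of the corextopic and guidedlda packages cited in table 4; docs (the preprocessed transcripts) and the two example anchor sets are assumptions for illustration, not the authors’ exact scripts.

from sklearn.feature_extraction.text import CountVectorizer
from corextopic import corextopic as ct
import guidedlda

anchors = [["interlibrary", "loan", "request"], ["article", "database"]]

vec = CountVectorizer(binary=True)        # corex expects binary counts
doc_word = vec.fit_transform(docs)        # `docs` is assumed
words = list(vec.get_feature_names())     # get_feature_names_out() on newer scikit-learn

# anchored corex: anchor_strength controls how strongly anchors pull topics
corex = ct.Corex(n_hidden=15, seed=1)
corex.fit(doc_word, words=words, anchors=anchors, anchor_strength=3)

# guidedlda: seeds are given as a {word_id: topic_id} map plus a confidence,
# e.g., the 0.75 used in this study
word2id = vec.vocabulary_
seed_topics = {word2id[w]: t for t, ws in enumerate(anchors)
               for w in ws if w in word2id}
glda = guidedlda.GuidedLDA(n_topics=15, n_iter=100, random_state=7)
glda.fit(doc_word, seed_topics=seed_topics, seed_confidence=0.75)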
table 4. alternative topic modeling techniques and their sources

unsupervised vs. semi-supervised | technique | programming language | implementation source | version used in the study
unsupervised | dirichlet multinomial mixture (dmm) | java | https://github.com/qiang2100/sttm | 9/27/2019
semi-supervised | anchored correlation explanation (corex) | python | https://github.com/gregversteeg/corex_topic | 1/21/2020
semi-supervised | guided lda using collapsed gibbs sampling | python | https://guidedlda.readthedocs.io/en/latest/ | 10/5/2017

given that a known set of anchor words associated with academic library chats seems unavailable in the literature, this study decided to obtain a list of the most meaningful anchor words by combining the outcomes of the unsupervised techniques with a human’s follow-up curation, as follows:

step 1. execute the unsupervised topic modeling techniques
step 2. combine the resulting topics from all unsupervised topic modeling techniques
step 3. identify a list of all possible pairs of words (bi-occurrences), e.g., 28 pairs of words if each topic has 8 words, and all possible combinations of tri-occurrences of words
step 4. identify the most common bi-occurrences and tri-occurrences of words across all topics by sorting in descending order of frequency (a sketch of steps 3 and 4 appears after table 5)
step 5. select a set of anchors from these bi-occurrences and tri-occurrences of words by a human’s judgment

in selecting the set of anchor words, the librarian author of this paper judged whether the combinations of words in each row from step 4 were aligned with frequently asked questions or easily inferable themes in academic library contexts. as shown in table 5, the set of “interlibrary,” “loan,” and “request” was selected as anchor words aligned with one frequently asked question about interlibrary loan requests, whereas the set of “access,” “librarian,” and “research” was not selected because multiple themes, such as access to resources and asking librarians for research help, can be inferred from it. additionally, the set of “hour,” “time,” and “today” was selected over the set of “time,” “tomorrow,” and “tonight” as better or clearer anchor words.

table 5. examples of anchor words that were selected and not selected

examples of tri-occurrences of words (note: strikethrough in the original table denotes a set of words that was not selected as anchor words)
1 interlibrary loan request
2 hour time today
3 time tomorrow tonight
4 time today tomorrow
5 floor librarian research
6 access librarian research
7 camera digital hub
8 digital hub medium
9 access article journal
10 access article database
11 access account campus
12 research source topic
13 paper research topic
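a minimal sketch of steps 3 and 4, assuming all_topics is the pooled list of eight-word topics from step 2; the human curation in step 5 happens outside the code.

from itertools import combinations
from collections import Counter

pair_counts, triple_counts = Counter(), Counter()
for topic in all_topics:                    # `all_topics` is assumed (step 2)
    # frozensets make the counts order-independent, since anchor words
    # can appear anywhere within a chat in any order
    pair_counts.update(frozenset(p) for p in combinations(topic, 2))    # 28 pairs per 8-word topic
    triple_counts.update(frozenset(t) for t in combinations(topic, 3))

# the most frequent pairs and triples become candidate anchor sets (step 4)
print(pair_counts.most_common(10))
print(triple_counts.most_common(10))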
quantitative evaluation with topic coherence metrics

comparing the quality of topic outcomes across various topic modeling techniques is tricky. purely statistical and quantitative evaluation techniques, such as held-out log-likelihood measures, have proven to be unaligned with human intuition or judgment with respect to topic interpretability and coherency.23 thus, this study adopted three topic coherence metrics: tc-pmi (normalized pointwise mutual information), tc-lcp (normalized log conditional probability), and tc-nz (number of topic word pairs never observed together in the corpus), introduced by boyd-graber, mimno, and newman; bouma; and lau, newman, and baldwin.24 these three metrics are based on the likelihood that two words that co-occur in a topic would also co-occur within the corpus.

to apply the three topic coherence metrics, the study counted term co-occurrences with a binarized choice (e.g., does a transcript contain both words?) instead of a sliding window of fixed size (e.g., do two words appear within a fixed window of 10 consecutive words?). this decision was made because each chat transcript is relatively short, and a fixed window size seemed inconsistent across the different types of data sets, which retained different parts of speech. in terms of the other decision to be made in applying the three metrics, this study chose a training corpus of all the chat transcripts instead of an external corpus, such as the entire collection of english wikipedia articles, which has little in common with average library chat transcripts.
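a minimal sketch of the binarized tc-pmi (npmi) computation under these two decisions; docs (the preprocessed transcripts) and topics (lists of top topic words) are assumptions for illustration, and a small epsilon guards against log(0) for pairs never observed together.

import math
from itertools import combinations

doc_sets = [set(d.split()) for d in docs]   # `docs` is assumed
D = len(doc_sets)
eps = 1e-12

def p(*ws):
    # binarized counting: the fraction of transcripts containing all words in ws
    return sum(all(w in s for w in ws) for s in doc_sets) / D

def npmi(wi, wj):
    pij = p(wi, wj)
    return math.log((pij + eps) / (p(wi) * p(wj) + eps)) / -math.log(pij + eps)

def tc_pmi(topic):
    # average npmi over all word pairs in one topic
    pairs = list(combinations(topic, 2))
    return sum(npmi(wi, wj) for wi, wj in pairs) / len(pairs)

scores = [tc_pmi(t) for t in topics]        # larger values = more coherent topics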
qualitative evaluation with human judgment

in addition to quantitative evaluation with topic coherence metrics, qualitative accuracy and interpretability were judged by the librarian author of this paper based on whether topics were aligned with frequently asked questions or easily inferable themes in academic library contexts. for example, “find or access book or article” was inferred, from a set of words in topic 1 on lsa in table 6, as an accurate and easily interpretable theme. from a set of words in topic 3 on lda, “reserve study room” and “check out laptop computer” were inferred as two separable, easily interpretable themes. from a set of words in topic 15 on corex with nine anchors, no theme was inferred as an easily interpretable theme. (see table 10 in the results section for all themes inferred from table 6.)

table 6. examples of topics found by topic modeling techniques (top 15 topics with eight words per topic; parenthetical additions are explanations or descriptions and not part of the topic)

latent semantic analysis (lsa)
topic 1. article book search find access link will check
topic 2. renew book article room reserve search journal check
topic 3. room renew reserve book study scheduler loan online
topic 4. renew request loan interlibrary search room review peer
topic 5. loan floor renew access interlibrary request log book
topic 6. book open print request search loan renew interlibrary
topic 7. print floor open printer color hour research pm
topic 8. open hour print search review close peer floor
topic 9. print access renew research book loan librarian open
topic 10. floor article open book renew print locate database
topic 11. article book attach file print database floor check
topic 12. check book desk laptop answer print shortly open
topic 13. answer desk shortly place room database circulation pick
topic 14. review peer search reserve log access campus database
topic 15. database file attach collection access journal research reserve

probabilistic latent semantic analysis (plsa)
topic 1. collection special youth contact email number archive department
topic 2. book title hold online check pick number reserve
topic 3. room reserve study scheduler reservation group rodscheduler (software) space
topic 4. search bar click type journal onesearch (a discovery tool) result homepage
topic 5. request loan interlibrary link illiad (system) submit inter instruction
topic 6. renew online account book today number circulation item
topic 7. access link log campus click work online sign
topic 8. article journal attach file title access google scholar
topic 9. research librarian paper appointment consultation source topic question
topic 10. open hour today close pm tomorrow midnight tonight
topic 11. check answer place shortly desk laptop student long
topic 12. print color printer computer printing mobile release black
topic 13. floor locate desk stack main fourth number section
topic 14. database az subject ebsco (database) list business topic access
topic 15. review peer journal topic sociology study article result

latent dirichlet allocation (lda) with sklearn
topic 1. file attach cite citation link article author pdf
topic 2. check book renew student item today time member
topic 3. room reserve computer laptop study check reservation desk
topic 4. book request loan interlibrary check title online copy
topic 5. search article database review result type google bar
topic 6. student class access iowa course university college fall
topic 7. research librarian source paper topic good appointment specific
topic 8. email contact chat good librarian work question address
topic 9. open hour today check pick hold desk close
topic 10. link access click log work campus sign database
topic 11. floor locate desk main art music circulation section
topic 12. medium digital check video hub desk rent camera
topic 13. article journal access title online link education amp
topic 14. print printer color card scan document charge job
topic 15. answer check place collection shortly special question number

dirichlet multinomial mixture (dmm)
topic 1. room reserve how will study check floor what
topic 2. request loan book interlibrary how article will link
topic 3. article access find journal link how search full
topic 4. book how find check what online link will
topic 5. article find attach file what how will link
topic 6. how check open today desk hour will what
topic 7. find article what search how research source database
topic 8. how print will cite printer link what citation
topic 9. search article find how review will database journal
topic 10. book find floor how will where call number
topic 11. book check how renew will today request what
topic 12. research how librarian find what article will email
topic 13. find how will contact collection what special email
topic 14. access article link log how campus database work
topic 15. article find will search what link book how
anchored correlation explanation (corex) with nine anchor words
topic 1. request loan interlibrary illiad (system) form submit inter fill
topic 2. study reserve room scheduler hub medium equipment digital
topic 3. search review peer bar result type onesearch (a discovery tool) homepage
topic 4. today open hour pm assist close window midnight
topic 5. locate floor main where third fourth desk stack
topic 6. print printer color printing black white mobile release
topic 7. number collection special call phone youth archive xxx
topic 8. research librarian appointment consultation paper set xxx transfer
topic 9. access database journal article campus full az text
topic 10. email will contact work when good who student
topic 11. education read school class professor amp teacher child
topic 12. topic source cite write apa start citation recommend
topic 13. find attach file google what scholar title specific
topic 14. click log link left side catid button hand
topic 15. shortly place answer check cedar fall iowa northern

guidedlda with nine anchor words and confidence 0.75
topic 1. book request loan interlibrary will how check link
topic 2. room reserve how check will desk study medium
topic 3. search article find how will database book review
topic 4. book check how renew today will hour open
topic 5. book floor find how check where call locate
topic 6. print how computer will printer color desk student
topic 7. contact collection will find email special how check
topic 8. research librarian find how what will email article
topic 9. article access link how log click database find
topic 10. article find how access what link attach file
topic 11. find chat copy how good online what will
topic 12. article find file attach what journal will work
topic 13. how check book answer place shortly what find
topic 14. book how find what sport link video textbook
topic 15. how cite what find citation author article source

results

this section first introduces which topic modeling techniques, as well as which type of data set, performed best on each of the three topic coherence metrics. it then introduces which technique was best according to human qualitative judgment.

quantitative evaluation with topic coherence metrics

given that for the topic coherence metric tc-pmi larger values mean more coherent topics, table 7 and its corresponding figure 1 show that corex with anchor words on the whole-chat data set performed best on tc-pmi. tf–idf & plsa on the whole-chat data set performed better than lda on the whole-chat data set. given that for the topic coherence metric tc-lcp larger values likewise mean more coherent topics, table 8 and its corresponding figure 2 show that dmm on the whole-chat data set performed best on tc-lcp. tf–idf & plsa on the whole-chat data set again performed better than lda (sklearn), although lda (pymallet) on the whole-chat data set performed better than tf–idf & plsa.
given that for the topic coherence metric tc-nz smaller values mean more coherent topics, table 9 and its corresponding figure 3 show that tf–idf & plsa, lda, and lda (pymallet) on the whole-chat data set performed best on tc-nz.

table 7. tc-pmi comparison of topic modeling techniques on the four types of data sets (top 15 topics with eight words per topic)

topic modeling technique | whole-chat | whole-chat (noun, adjective, verb) | whole-chat (noun, adjective) | question-only
tf–idf & lsa | -0.066 | -0.061 | -0.063 | -0.429
tf–idf & plsa | 0.508 | 0.321 | 0.494 | -0.122
lda (sklearn) | 0.378 | 0.261 | 0.099 | -0.995
lda (pymallet) | 0.218 | 0.262 | 0.271 | -0.091
dmm | 0.136 | 0.22 | 0.285 | 0.109
corex without anchor words | 0.47 | 0.497 | 0.396 | -0.584
corex with nine anchor words | 0.522 | 0.534 | 0.558 | -0.401
guidedlda with nine anchor words and confidence 0.75 | 0.133 | 0.216 | 0.262 | 0.069

figure 1. tc-pmi comparison of topic modeling techniques on the four types of data sets.

table 8. tc-lcp comparison of topic modeling techniques on the four types of data sets (top 15 topics with eight words per topic)

topic modeling technique | whole-chat | whole-chat (noun, adjective, verb) | whole-chat (noun, adjective) | question-only
tf–idf & lsa | -1.114 | -1.124 | -1.204 | -1.675
tf–idf & plsa | -0.751 | -0.793 | -0.893 | -1.956
lda (sklearn) | -0.789 | -0.979 | -1.263 | -2.827
lda (pymallet) | -0.637 | -0.767 | -0.918 | -1.626
dmm | -0.546 | -0.645 | -0.731 | -1.159
corex without anchor words | -0.868 | -0.853 | -1.062 | -2.618
corex with nine anchor words | -0.82 | -0.791 | -0.884 | -2.348
guidedlda with nine anchor words and confidence 0.75 | -0.637 | -0.686 | -0.792 | -1.143

figure 2. tc-lcp comparison of topic modeling techniques on the four types of data sets.

table 9. tc-nz comparison of topic modeling techniques on the four types of data sets (top 15 topics with eight words per topic)

topic modeling technique | whole-chat | whole-chat (noun, adjective, verb) | whole-chat (noun, adjective) | question-only
tf–idf & lsa | 0.267 | 0.267 | 0.333 | 1.8
tf–idf & plsa | 0 | 0 | 0.067 | 3.8
lda (sklearn) | 0 | 0.467 | 1.2 | 7.067
lda (pymallet) | 0 | 0.133 | 0.267 | 1.8
dmm | 0.067 | 0 | 0 | 0.267
corex without anchor words | 0.333 | 0.067 | 0.6 | 7.067
corex with nine anchor words | 0.133 | 0 | 0.133 | 5.267
guidedlda with nine anchor words and confidence 0.75 | 0.2 | 0.067 | 0 | 0.133

figure 3. tc-nz comparison of topic modeling techniques on all four data sets.

last, tables 7 to 9 and their corresponding figures 1 to 3 clearly show that the whole-chat data set with all parts of speech was generally the best data set across all the techniques.

qualitative evaluation with human judgment

as shown in table 10, all techniques had relatively high accuracy and interpretability for straightforward topics or themes (shown in italicized text in the original table), such as “interlibrary loan,” “technology,” “hours,” and “room reservations,” where one keyword could represent a whole theme. however, for less straightforward topics or themes, plsa performed better than the other techniques.
in other words, plsa had the highest number of topics that align clearly with frequently asked questions or easily inferable themes in academic library contexts. plsa also had a lower number of unrelated or multiple themes within one topic, whereas the other techniques had higher numbers. as an example, topic 8 on dmm shows that “print” and “citation” can be inferred as two unrelated themes within one topic.

table 10. examples of themes qualitatively inferred from a list of words (a topic) identified by each topic modeling technique (note: in the original table, italics denotes straightforward themes and strikethrough denotes themes with no interpretability or unrelated, multiple themes within one topic)

latent semantic analysis (lsa)
topic 1. find or access book or article
topic 2. renew book or article; reserve a room; search journal
topic 3. renew book online; reserve room; loan
topic 4. renew; interlibrary loan; search; room
topic 5. renew book; interlibrary loan; floor
topic 6. renew; interlibrary loan; print book; search
topic 7. print color; floor; hours; research
topic 8. hours; print; search; peer review; floor
topic 9. print; renew book; librarian; open hours
topic 10. renew book and article; print; floor and locate; database
topic 11. print; database; floor
topic 12. check out book or laptop; print; open
topic 13. circulation desk; room; database
topic 14. not clear
topic 15. not clear

probabilistic latent semantic analysis (plsa)
topic 1. contact information of special collection and youth
topic 2. not clear
topic 3. room reservation
topic 4. journal search and onesearch
topic 5. interlibrary loan request
topic 6. how to renew book online
topic 7. working from off campus (not clear)
topic 8. journal article via google scholar
topic 9. appointment with librarians for research consultations
topic 10. open hours
topic 11. not clear
topic 12. printing
topic 13. stack on the fourth floor
topic 14. databases a-z for business including ebsco
topic 15. peer reviewed journals for sociology

latent dirichlet allocation (lda) with sklearn
topic 1. not clear
topic 2. not clear
topic 3. reserve study room; check out laptop computer
topic 4. interlibrary loan online
topic 5. search article via databases
topic 6. not clear
topic 7. appointment with research librarians
topic 8. contact librarian via email
topic 9. open hours
topic 10. database access from off campus
topic 11. floor for art and music circulation desk
topic 12. rent camera
topic 13. access journal article
topic 14. printing and charge
topic 15. special collection

dirichlet multinomial mixture (dmm)
topic 1. reserve study room and floor
topic 2. interlibrary loan
topic 3. search and access article
topic 4. find book online
topic 5. find article (not clear)
topic 6. open hours
topic 7. find article and database
topic 8. print; citation
topic 9. find article & database
topic 10. find book with call number
topic 11. renew book (not clear)
topic 12. email librarians for research help
topic 13. special collection (not clear)
topic 14. access article/database from on campus
topic 15. find article (not clear)
anchored correlation explanation (corex) with nine anchor words
topic 1. interlibrary loan
topic 2. reserve study room; equipment
topic 3. peer-reviewed and onesearch
topic 4. open hours
topic 5. floor location
topic 6. printing
topic 7. special collection and phone number
topic 8. research consultation appointment
topic 9. access database a-z
topic 10. not clear
topic 11. not clear
topic 12. apa citations
topic 13. google scholar (not clear)
topic 14. log in
topic 15. not clear

guidedlda with nine anchor words and confidence 0.75
topic 1. interlibrary loan
topic 2. reserve study room & medium
topic 3. search and find article; databases
topic 4. renew book; hours
topic 5. find book with call number
topic 6. printing
topic 7. special collection
topic 8. email to research librarian
topic 9. access article and databases
topic 10. access article; attach file (not clear)
topic 11. not clear
topic 12. find article and journal; file attach (not clear)
topic 13. not clear
topic 14. find book, video, and textbook about sport
topic 15. citation

discussion

given that different topic modeling techniques performed best depending on the topic coherence metric, it is not possible to conclude firmly that one technique is better than the others. interestingly, the commonly used lda technique, tested with both sklearn and pymallet in this study, did not consistently outperform tf–idf & plsa. in addition, the semi-supervised techniques, anchored correlation explanation (corex) and guided lda (guidedlda), did not necessarily outperform dirichlet multinomial mixture (dmm), an unsupervised technique. last, by human qualitative judgment, plsa performed best, which aligns with the findings on tc-nz. this might imply that tc-nz is a more appropriate metric than the others for measuring topic coherence in the context of academic library chat transcripts.

in terms of the different types of data sets, all three whole-chat data sets significantly outperformed the question-only data set. at the outset of the study, it was conjectured that the initial question of each chat transaction might concentrate the essence of the chat, thereby leading to better performance. clearly this was not the case, possibly because the rest of a chat transcript reinforces a topic by standardizing the vocabulary of the chat’s initial question. it was somewhat interesting that varying the parts of speech (pos) retained in the three whole-chat data sets had little effect on the topic modeling analyses. this might imply that topic modeling techniques are sensitive enough to differentiate across parts of speech, leading to good performance regardless of the type of data set.

conclusion

this study clearly showed that conventional techniques should also be examined, to avoid errors stemming from the assumption that newly developed techniques such as lda will always outperform them regardless of context.
also, both the quantitative and qualitative evaluations indicate that unsupervised techniques should be weighted equally with semi-supervised techniques that involve human intervention. as a future study, like other similar research, it would be meaningful to compare human qualitative judgment with the scores of each metric more rigorously, along with more librarians’ input, to confirm (or disconfirm) our preliminary conclusion that tc-nz is the most appropriate topic coherence metric in the context of library chat transcripts.25 it would also be interesting to investigate and examine semi-supervised techniques with different types of anchoring approaches, such as tandem anchoring.26 last, in order to overcome limitations of this study, it would be valuable to collect more and more diverse chat reference data and compare topic outputs across different types of institutions (e.g., teaching versus research institutions).

acknowledgments

this project was made possible in part by the institute of museum and library services [national leadership grants for libraries, lg-34-19-0074-19].

endnotes

1 christina m. desai and stephanie j. graves, “cyberspace or face-to-face: the teachable moment and changing reference mediums,” reference & user services quarterly 47, no. 3 (spring 2008): 242–55, https://www.jstor.org/stable/20864890; megan oakleaf and amy vanscoy, “instructional strategies for digital reference: methods to facilitate student learning,” reference & user services quarterly 49, no. 4 (summer 2010): 380–90, https://www.jstor.org/stable/20865299; shu z. schiller, “chat for chat: mediated learning in online chat virtual reference service,” computers in human behavior 65 (july 2016): 651–65, https://doi.org/10.1016/j.chb.2016.06.053; mila semeshkina, “five major trends in online education to watch out for in 2021,” forbes, february 2, 2021, https://www.forbes.com/sites/forbesbusinesscouncil/2021/02/02/five-major-trends-in-online-education-to-watch-out-for-in-2021/?sh=3261272521eb.

2 maryvon côté, svetlana kochkina, and tara mawhinney, “do you want to chat? reevaluating organization of virtual reference service at an academic library,” reference and user services quarterly 56, no. 1 (fall 2016): 36–46, https://www.jstor.org/stable/90009882; sarah lemire, lorelei rutledge, and amy brunvand, “taking a fresh look: reviewing and classifying reference statistics for data-driven decision making,” reference & user services quarterly 55, no. 3 (spring 2016): 230–38, https://www.jstor.org/stable/refuseserq.55.3.230; b. jane scales, lipi turner-rahman, and feng hao, “a holistic look at reference statistics: whither librarians?,” evidence based library and information practice 10, no. 4 (december 2015): 173–85, https://doi.org/10.18438/b8x01h.
3 pamela j. howard, “can academic library instant message transcripts provide documentation of undergraduate student success?,” journal of web librarianship 13, no. 1 (february 2019): 61–87, https://doi.org/10.1080/19322909.2018.1555504.

4 côté and kochkina, “do you want to chat?”; sharon q. yang and heather a. dalal, “delivering virtual reference services on the web: an investigation into the current practice by academic libraries,” journal of academic librarianship 41, no. 1 (november 2015): 68–86, https://doi.org/10.1016/j.acalib.2014.10.003.

5 feifei liu, “how information-seeking behavior has changed in 22 years,” nn/g nielsen norman group, january 26, 2020, https://www.nngroup.com/articles/information-seeking-behavior-changes/; amanda spink and jannica heinström, eds., new directions in information behavior (bingley, uk: emerald group publishing limited, 2011).

6 kathryn barrett and amy greenberg, “student-staffed virtual reference services: how to meet the training challenge,” journal of library & information services in distance learning 12, no. 3–4 (august 2018): 101–229, https://doi.org/10.1080/1533290x.2018.1498620; robin canuel et al., “developing and assessing a graduate student reference service,” reference services review 47, no. 4 (november 2019): 527–43, https://doi.org/10.1108/rsr-06-2019-0041.

7 bhagyashree vyankatrao barde and anant madhavrao bainwad, “an overview of topic modeling methods and tools,” in proceedings of international conference on intelligent computing and control systems, 2018, 745–50, https://doi.org/10.1109/iccons.2017.8250563; jordan boyd-graber, david mimno, and david newman, “care and feeding of topic models: problems, diagnostics, and improvements,” in handbook of mixed membership models and their applications, eds. edoardo m. airoldi et al. (new york: crc press, 2014), 225–54.

8 miriam l. matteson, jennifer salamon, and lindy brewster, “a systematic review of research on live chat service,” reference & user services quarterly 51, no. 2 (winter 2011): 172–89, https://www.jstor.org/stable/refuseserq.51.2.172.

9 kate fuller and nancy h. dryden, “chat reference analysis to determine accuracy and staffing needs at one academic library,” internet reference services quarterly 20, no. 3–4 (december 2015): 163–81, https://doi.org/10.1080/10875301.2015.1106999; sarah passonneau and dan coffey, “the role of synchronous virtual reference in teaching and learning: a grounded theory analysis of instant messaging transcripts,” college & research libraries 72, no. 3 (2011): 276–95, https://doi.org/10.5860/crl-102rl.

10 paula r. dempsey, “‘are you a computer?’ opening exchanges in virtual reference shape the potential for teaching,” college & research libraries 77, no. 4 (2016): 455–68, https://doi.org/10.5860/crl.77.4.455; jennifer waugh, “formality in chat reference: perceptions of 17- to 25-year-old university students,” evidence based library and information practice 8, no. 1 (2013): 19–34, https://doi.org/10.18438/b8ws48.
11 robin brown, “lifting the veil: analyzing collaborative virtual reference transcripts to demonstrate value and make recommendations for practice,” reference & user services quarterly 57, no. 1 (fall 2017): 42–47, https://www.jstor.org/stable/90014866; sarah maximiek, elizabeth brown, and erin rushton, “coding into the great unknown: analyzing instant messaging session transcripts to identify user behaviors and measure quality of service,” college & research libraries 71, no. 4 (2010): 361–73, https://doi.org/10.5860/crl-48r1.

12 christopher brousseau, justin johnson, and curtis thacker, “machine learning based chat analysis,” code4lib journal 50 (february 2021), https://journal.code4lib.org/articles/15660; ellie kohler, “what do your library chats say?: how to analyze webchat transcripts for sentiment and topic extraction,” in brick & click libraries conference proceedings (maryville, mo: northwest missouri state university, 2017), 138–48, https://files.eric.ed.gov/fulltext/ed578189.pdf; megan ozeran and piper martin, “‘good night, good day, good luck,’” information technology and libraries 38, no. 2 (june 2019): 49–57, https://doi.org/10.6017/ital.v38i2.10921; thomas stieve and niamh wallace, “chatting while you work: understanding chat reference user needs based on chat reference origin,” reference services review 46, no. 4 (november 2018): 587–99, https://doi.org/10.1108/rsr-09-2017-0033; nadaleen tempelman-kluit and alexa pearce, “invoking the user from data to design,” college & research libraries 75, no. 5 (2014): 616–40, https://doi.org/10.5860/crl.75.5.616.

13 jordan boyd-graber, yuening hu, and david mimno, “applications of topic models,” foundations and trends in information retrieval 11, no. 2–3 (2017): 143–296, https://mimno.infosci.cornell.edu/papers/2017_fntir_tm_applications.pdf.

14 ewa m. golonka, medha tare, and carrie bonilla, “peer interaction in text chat: qualitative analysis of chat transcripts,” language learning & technology 21, no. 2 (june 2017): 157–78, http://hdl.handle.net/10125/44616; laura d. kassner and kate m. cassada, “chat it up: backchanneling to promote reflective practice among in-service teachers,” journal of digital learning in teacher education 33, no. 4 (august 2017): 160–68, https://doi.org/10.1080/21532974.2017.1357512.

15 eradah o. hamad et al., “toward a mixed-methods research approach to content analysis in the digital age: the combined content-analysis model and its applications to health care twitter feeds,” journal of medical internet research 18, no. 3 (march 2016): e60, https://doi.org/10.2196/jmir.5391; janet richardson et al., “tweet if you want to be sustainable: a thematic analysis of a twitter chat to discuss sustainability in nurse education,” journal of advanced nursing 72, no. 5 (january 2016): 1086–96, https://doi.org/10.1111/jan.12900.
16 shuyuan mary ho et al., “computer-mediated deception: strategies revealed by language-action cues in spontaneous communication,” journal of management information systems 33, no. 2 (october 2016): 393–420, https://doi.org/10.1080/07421222.2016.1205924; mina park, milam aiken, and laura salvador, “how do humans interact with chatbots?: an analysis of transcripts,” international journal of management & information technology 14 (2018): 3338–50, https://doi.org/10.24297/ijmit.v14i0.7921.

17 abdur rahman m. a. basher and benjamin c. m. fung, “analyzing topics and authors in chat logs for crime investigation,” knowledge and information systems 39, no. 2 (march 2014): 351–81, https://doi.org/10.1007/s10115-013-0617-y; michelle drouin et al., “linguistic analysis of chat transcripts from child predator undercover sex stings,” journal of forensic psychiatry & psychology 28, no. 4 (february 2017): 437–57, https://doi.org/10.1080/14789949.2017.1291707; da kuang, p. jeffrey brantingham, and andrea l. bertozzi, “crime topic modeling,” crime science 6, no. 12 (december 2017): 1–12, https://doi.org/10.1186/s40163-017-0074-0; md waliur rahman miah, john yearwood, and siddhivinayak kulkarni, “constructing an inter-post similarity measure to differentiate the psychological stages in offensive chats,” journal of the association for information science and technology 66, no. 5 (january 2015): 1065–81, https://doi.org/10.1002/asi.23247.
18 charu c. aggarwal and chengxiang zhai, eds., mining text data (new york: springer, 2012); rubayyi alghamdi and khalid alfalqi, “a survey of topic modeling in text mining,” international journal of advanced computer science and applications 6, no. 1 (2015): 146–53, https://doi.org/10.14569/ijacsa.2015.060121; leticia h. anaya, “comparing latent dirichlet allocation and latent semantic analysis as classifiers” (phd diss., university of north texas, 2011); barde and bainwad, “an overview of topic modeling”; david m. blei, “topic modeling and digital humanities,” journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-and-digital-humanities-by-david-m-blei/; tse-hsun chen, stephen w. thomas, and ahmed e. hassan, “a survey on the use of topic models when mining software repositories,” empirical software engineering 21, no. 5 (september 2016): 1843–919, https://doi.org/10.1007/s10664-015-9402-8; elisabeth günther and thorsten quandt, “word counts and topic models: automated text analysis methods for digital journalism research,” digital journalism 4, no. 1 (october 2016): 75–88, https://doi.org/10.1080/21670811.2015.1093270; gabe ignatow and rada mihalcea, an introduction to text mining: research design, data collection, and analysis (new york: sage, 2017); stefan jansen, hands-on machine learning for algorithmic trading: design and implement investment strategies based on smart algorithms that learn from data using python (birmingham: packt publishing limited, 2018); lin liu et al., “an overview of topic modeling and its current applications in bioinformatics,” springerplus 5, no. 1608 (september 2016): 1–22, https://doi.org/10.1186/s40064-016-3252-8; john w. mohr and petko bogdanov, “introduction—topic models: what they are and why they matter,” poetics 41, no. 6 (december 2013): 545–69, https://doi.org/10.1016/j.poetic.2013.10.001; gerard salton, anita wong, and chung-shu yang, “a vector space model for automatic indexing,” communications of the acm 18, no. 11 (november 1975): 613–20, https://doi.org/10.1145/361219.361220; jianhua yin and jianyong wang, “a dirichlet multinomial mixture model-based approach for short text clustering,” in proceedings of the twentieth acm sigkdd international conference on knowledge discovery and data mining (new york: acm, 2014), 233–42, https://doi.org/10.1145/2623330.2623715; hongjiao xu et al., “exploring similarity between academic paper and patent based on latent semantic analysis and vector space model,” in proceedings of the twelfth international conference on fuzzy systems and knowledge discovery (new york: ieee, 2015), 801–5, https://doi.org/10.1109/fskd.2015.7382045; chengxiang zhai, statistical language models for information retrieval (williston, vt: morgan & claypool publishers, 2018).

19 neha agarwal, geeta sikka, and lalit kumar awasthi, “evaluation of web service clustering using dirichlet multinomial mixture model based approach for dimensionality reduction in service representation,” information processing & management 57, no. 4 (july 2020), https://doi.org/10.1016/j.ipm.2020.102238; chenliang li et al., “topic modeling for short texts with auxiliary word embeddings,” in proceedings of the thirty-ninth international acm sigir conference on research and development in information retrieval (new york: acm, 2016), 165–74, https://doi.org/10.1145/2911451.2911499; jipeng qiang et al., “short text topic modeling techniques, applications, and performance: a survey,” ieee transactions on knowledge and data engineering 14, no. 8 (april 2019): 1–17, https://doi.org/10.1109/tkde.2020.2992485.
20 ryan j. gallagher et al., “anchored correlation explanation: topic modeling with minimal domain knowledge,” transactions of the association for computational linguistics 5 (december 2017): 529–42, https://doi.org/10.1162/tacl_a_00078.

21 jagadeesh jagarlamudi, hal daumé iii, and raghavendra udupa, “incorporating lexical priors into topic models,” in proceedings of the thirteenth conference of the european chapter of the association for computational linguistics (stroudsburg, pa: acl, 2012), 204–13, https://www.aclweb.org/anthology/e12-1021; olivier toubia et al., “extracting features of entertainment products: a guided latent dirichlet allocation approach informed by the psychology of media consumption,” journal of marketing research 56, no. 1 (december 2019): 18–36, https://doi.org/10.1177/0022243718820559.

22 nan zhang and baojun ma, “constructing a methodology toward policy analysts for understanding online public opinions: a probabilistic topic modeling approach,” in electronic government and electronic participation, eds. efthimios tambouris et al. (amsterdam, netherlands: ios press bv, 2015): 72–9, https://doi.org/10.3233/978-1-61499-570-8-72.

23 jonathan chang et al., “reading tea leaves: how humans interpret topic models,” in proceedings of the twenty-second international conference on neural information processing systems (new york: acm, 2009), 288–96, https://dl.acm.org/doi/10.5555/2984093.2984126.

24 gerlof bouma, “normalized (pointwise) mutual information in collocation extraction,” in proceedings of the international conference of the german society for computational linguistics and language technology (tübingen, germany: gunter narr verlag, 2009), 43–53; boyd-graber, mimno, and newman, “care and feeding of topic models,” in handbook of mixed membership models and their applications, eds. edoardo m. airoldi, david m. blei, elena a. erosheva, and stephen e. fienberg (boca raton: crc press, 2014), 225–54; jey han lau, david newman, and timothy baldwin, “machine reading tea leaves: automatically evaluating topic coherence and topic model quality,” in proceedings of the fourteenth conference of the european chapter of the association for computational linguistics (stroudsburg, pa: acl, 2014), 530–39, https://doi.org/10.3115/v1/e14-1056.

25 lau, newman, and baldwin, “machine reading tea leaves”; david newman et al., “automatic evaluation of topic coherence,” in proceedings of human language technologies: the 2010 annual conference of the north american chapter of the association for computational linguistics (new york: acm, 2010), 100–108, https://dl.acm.org/doi/10.5555/1857999.1858011.

26 jeffrey lund et al., “tandem anchoring: a multiword anchor approach for interactive topic modeling,” in proceedings of the fifty-fifth annual meeting of the association for computational linguistics (stroudsburg, pa: acl, 2017), 896–905, https://doi.org/10.18653/v1/p17-1083.
consortia building: a handshake and a smile, island style

patricia j. cutright

information technology and libraries; june 2000; 19, no. 2: 90.

patricia j. cutright (cutright@eou.edu) is library director of the pierce library at eastern oregon university.

in the evaluation of consortia and what constitutes these entities, the discussion runs the gamut. from small, loosely knit groups who are interested in cooperation for the sake of improving services to large membership-driven organizations addressing multiple interests, all recognize the benefits of partnerships. the federated states of micronesia are located in the western pacific ocean and cover 3.2 million square miles. throughout this scattering of small islands exists an enthusiastic library community of staff and users that has changed the outlook of libraries since 1991. motivated by the collaborative efforts of this group, a project has unfolded over the past year that will further enhance library services through staff training and education while utilizing innovative technology. in assessing the library needs of the region this group crafted the document "the federated states of micronesia library services plan, 1999-2003," which coalesces the concepts, goals, and priorities put forward by a broad-based contingency of librarians. the compilation of the plan and its implementation demonstrate an understanding of the issues and exhibit the ingenuity, creativity, and willingness to solve problems on a grand scale addressing the needs of all libraries in this vast pacific region.

the basic philosophy inherent in librarianship is the concept of sharing. the dissemination of information through material exchange and interlibrary communication has enriched societies for centuries. there are few institutions other than libraries that are better equipped or suited for such cooperation and collaborative endeavors. with service as the lifeblood that runs through its inky veins, the library has the potential to be the driving force in any community toward partnerships that afford mutual benefit for all. the examination of the literature exposes a wide range of perceptions as to the definition of what is a consortium.
the term "consortia" conjur es up impressions that span the spectrum from highly or ganized, membership-driv en groups to loosely knit cadres focusing on impro ving services to their patrons however they can make it happen. in kopp 's pap er "library consortia and patricia j. cutright (cutright@eou .edu} is library director of the pierce library at eastern oregon university. 90 information technology and libraries i june 2000 information technology : th e past, the present, th e promise" he presents information from a study conduct ed by ruth patrick on academic library consortia. in that study she identified four general types of consortia : • large consortia concerned primarily with computerized large-scale technical processing; • small consortia conc erned with user services and everyday probl ems ; • limited-purpose consortia cooperating with respect to limited special subject areas; • limited-purpose con sorti a concerned primarily with interlibrary loan or reference; and network operations.i with this distinction in mind , this paper will focus on th e second category typifying a small , less structured organization. whil e on a visiting assis tantship in the federated states of micronesia (fsm), i worked with a partnership of libraries that believe in order for cooperation to succeed, results for the patron must be the goal-not equity between libraries or some magical balance between resources lent by one library and resources received from a noth er library.2 unified effort s to provide service to the p a tron is the key. the libraries on a small, rem ote island situated in the western pacific ocean exhibit this grassroots effort that define s the true meaning of consortia-demonstrating collaboration , cooperation , and partnerships. it is a multi type library cooperative that not only encompasses interaction among libraries but also betwe en agencies as well as governments. the librarians on the island of pohnpei, micron esia, and all the islands throughout the federated states of micronesia have embraced this consortia) attitud e whil e achieving much through these collaborative efforts : • the joint work done on crafting the library services plan, 1999-2003 for the libraries throu ghout the federated states of micronesia • initiating successful grant-writing efforts which target national goals and priorities • implementing a collaborative library automation project which is d esigned to evolve into a national union catalog • the implementation of a viable resource-sharing and document delivery service for the nation i background and socioeconomic overview micron esia, a name m eaning " tiny islands ," comprise s som e 2,200 volcanic and coral islands spread throughout reproduced with permission of the copyright owner. further reproduction prohibited without permission. 3.2 million square miles of pacific ocean. lying west of hawaii, east of the philippines, south of japan and north of australia, the total land mass of all these tropical islands is fewer than 1,200 square miles with a population base estimated at no more than 111,500.3 a locationunique region, but nonetheless still plagued with all the problems associated with any geographically remote, economically depressed area found anywhere in the united states or elsewhere in the world. the federated states of micronesia is a small-island, developing nation that is aligned with the united states through a compact of free association, making it eligible for many u.s. federal programs. 
the economic base is centered around fisheries and marine-related industries, tourism, agriculture, and small-scale manufacturing. the average per capita income in 1996 was $1,657 for the four states of the fsm: kosrae, pohnpei, yap, and chuuk. thirteen major languages exist in the country, with english as the primary second language. the 607 different islands, atolls, and islets dot an immense expanse of ocean; this geographic condition presents challenges in implementing and enhancing library services and technology. 4 despite the extreme geographic and economic conditions, the college of micronesia-fsm national campus in collaboration with the librarians throughout the states have been successful in implementing nationwide projects. these endeavors have resulted in technical infrastructure and the foundation for information technology instruction supported through awards from the u.s. department of education, the title iii program, and the national science foundation. i collaboration: building bridges that cross the oceans the libraries in micronesia have shown an ongoing commitment to librarianship and cooperation since the establishment of the pacific islands association of libraries and archives (piala) in 1991. the organization is a micronesia-based regional association committed to fostering awareness and encouraging cooperation and resource sharing among libraries, archives, museums, and related institutions. piala was formed to address the needs of pacific islands librarians and archivists, with a special focus on micronesia; it is responsible for the common-thread cohesiveness shared by the librarians over the past eight years. the organization has grown to become an effective champion of the needs of libraries and librarians in the pacific region.s when piala was established, the most pressing areas of concern within the region were development of resource-sharing tools and networks among the libraries, archives, museums, and related institutions of the pacific islands. the development of continuing education programs and the promotion of technology and telecommunications applications throughout the region were areas targeted for attention. those concerns have changed little since the group's inception. building upon that original premise, in january 1999 a group of interested parties from throughout the federated states of micronesia met to draft a document they envisioned would lay the groundwork for library planning over the next five years. this strategic plan encompasses all library activity-services, staffing, and the impact technology will have on libraries in the region. the document, "the federated states of micronesia library services plan, 1999-2003," coalesces the concepts, goals, and priorities put forward by a broad-based contingent. in this meeting, the group addressed basic issues of library and museum service, barriers and solutions to improve service delivery, and additional funding and training resources for libraries and museums.6 the compilation of the plan crafted at the gathering demonstrated a thorough understanding of the issues that face the librarians of the vast region. it exhibits the ingenuity, creativity, and willingness to problem-solve on a grand scale in a way that addresses the needs of all libraries in the pacific region. the goals set forward by the writing session group illustrate the concerns impacting library populations throughout the fsm. 
the fsm has now established six major goals to carry out its responsibilities and meet the need for overall improvement in and delivery of library services:

1. establish or enhance electronic linkages between and among libraries, archives, and museums in the fsm.
2. enhance basic services delivery and promote improvement of infrastructure and facilities.
3. develop and deliver training programs for library staff and users of the libraries.
4. promote public education and awareness of libraries as information systems and sources for lifelong learning.
5. develop local and nationwide partnerships for the establishment and enhancement of libraries, museums, and archives.
6. improve quality of information access for all segments of the fsm population and extend access to information to underserved segments of the population.

priorities

the following are general priorities for the fsm library services plan. the priorities represent needs for the overall improvement of the libraries, museums, and archives. they are based on the fact that libraries, museums, and archives development is currently in its infancy in the fsm. specific priorities will change from year to year as programs are developed.
the priority need is to establish basic electronic linkages for all libraries, followed by extending access to electronic information to all users.7

shifting into action

with the drafting of this five-year plan, the librarians stated emphatically the need and desire to move ahead with haste and determination. as the plan was conceptualized and documented, a small cadre of librarians from the college of micronesia-fsm national campus, the public library, and the high school library crafted two successful grant proposals which addressed:

• a cooperative library automation project designed to evolve into a national union catalog (goal 1; priorities 3, 5);
• the installation of internet services that would link the college of micronesia-fsm campuses, the public library, and the high school library (goals 1, 2, 6; priorities 1, 2, 3, 5);
• the development and delivery of training programs for library staff and users of the libraries (goals 3, 4, 6; priority 2); and
• the implementation of a viable resource-sharing and document delivery service for the nation (goals 1, 2, 5, 6; priorities 3, 4, 5).

over the past year the awarding of grant funds has shifted the library community into high gear with the design and implementation of project activities that will fulfill the targeted needs.

the automation project and internet connectivity

a collaborative request submitted by the bailey olter high school (bohs) library and the pohnpei public library provided the funding necessary to computerize the manual card catalog system at bohs and upgrade the dated automated library system at pohnpei public library. since the college of micronesia-fsm campuses are automated, it was important for the high school library and the public library to install like systems to achieve a networkable automated system, facilitating the development of a union catalog for all the libraries' holdings. this migration to an automated system promoted cooperation and resource sharing for the island libraries, opening a wealth of information for all island residents. the project entailed purchasing a turnkey cataloging and circulation system that will facilitate the cataloging and processing of new acquisitions for each library as well as the conversion of approximately five thousand volumes of material already owned by the public and high school libraries. through internet connectivity, which was integral to the project, the system would also serve as public access to the many holdings of the libraries for students, faculty, and town patrons through a union catalog to be established in the future.

the development and delivery of training programs for library staff and users is linked to the implementation of a viable resource-sharing and document delivery service for the nation. as stated earlier, the librarians of the federated states of micronesia accepted the challenge facing them in ramping up for the twenty-first century. their prior experience laid the groundwork necessary to implement the training programs needed to bring the library community the requisite knowledge and skills. a survey administered during the writing session indicated that few public and school librarians have significant training in or use of electronic linkages or information technologies, nor are they actively using such technologies at present.
of the fourteen public and school librarians in the four states of micronesia, none hold a master's degree from an accredited library school or library media specialist certification. an exception is the library staff at the com-fsm national campus, where two-thirds of the librarians hold professional credentials. significant, sustained effort is needed for effective training in the understanding and use of information systems throughout the nation. where training has occurred, it has often been infrequent and short, with little support for ensuring implementation at the work site. additionally, there are often no formal systems for getting answers to questions when problems do arise.

in addressing the information needs of this population, it is apparent that education is the key component for continued improvement of library services. this concern is evident in a paper by daniel barron, which states that only 54 percent of librarians and 19 percent of staff in libraries serving communities considered to be rural (i.e., 25,000 people or fewer) have an ala-mls.8 and dowlin poses even more perplexing questions: "how can a staff with such an educational deficit be expected to accomplish all that will be demanded to enable their libraries to go beyond being a warehouse of popular reading materials? how can we expect them to change from pointers and retrievers to organizers and facilitators?"9

micronesia is no different than any other state or country in wanting its population to have access to qualified staff, current resources, and services. it recognizes that many of its libraries are inadequately staffed and that many others have staff who are seriously undereducated to meet the expanded information needs of the people in their communities. if these libraries are to seize the opportunities suggested by the developing positive view, develop services to support this view, and market such a view to a wider range of citizens in their communities, they must invest in the intellectual capital of their staffs.

in order to carry out this charge, the following activities were designed to address the educational and training needs of the librarians in the fsm. as outlined in a recently funded institute of museum and library services (imls) national leadership grant, preparation has begun with the following activities, which will address the staffing and technology concerns described in fsm libraries:

1. recruit and hire an outreach services librarian to survey training needs, coordinate and plan training, and deliver or arrange for needed training.
2. develop a skills profile for all library, museum, and archival staff positions.
3. identify a training contact or coordinator for each state.
4. develop and provide periodic updates to operational manuals for school and public libraries, museums, and archives.
5. recruit local students and assist them in seeking out scholarships for professional training off island.
6. design and implement programs to provide continuous training and on-site support in new technological developments and information systems (provided on-site and virtually).
7. establish a summer training institute offering training based on needs as determined by the outreach services librarian in collaboration with state coordinators, recruiting on- and off-island expertise as instructors.
8. design and develop programs for orientation and training of users of information systems (provided on-site and virtually).
develop and implement a "train the trainer" program, which will have representation from all four states, that will ensure continuity and sustainability of the project for the years to come. 10 the primary requisite to initiating this project is the recruitment and hiring of the outreach services librarian who will then begin the activities as listed. a beginning cadre of librarians gleaned from the summer institute will become the trainers of the future, perpetuating a learning environment enhanced with advanced technology. breakthroughs in distance education, aided with advances in telecommunications, will significantly impact this project. on-site training will be imperative for the initial cadre of summer institute attendees to provide sound teaching skills and a firm understanding of the material at hand. follow-up training will be presented on each island by the trainer either on location or virtually with available technology. products such as web course in a box, webct, or nicenet will be analyzed for appropriate utilization as teaching tools. these products will take advantage of newly established internet connections on each island and, more importantly, will provide the interactive element that distinguishes this learning methodology from the "talking head" or traditional correspondence course approach. a web site designed for this project will provide valuable information and connectivity for not only the pacific library community but anyone worldwide who may be interested in innovative methods of serving remote populations. using computer conferencing and virtual communities technology, a video conferencing system such as 8 x 8 consortia building i cutright 93 reproduced with permission of the copyright owner. further reproduction prohibited without permission. technologies will be used, which will allow face-to-face interaction with trainer and student in an intra-island situation (interisland telephone rates are too expensive for regular use as a teaching tool). to enhance the learning experience and information retrieval component for these librarians and the population they serve, the project also incorporates implementation of a viable resource-sharing, document delivery system capitalizing on a shared union catalog and using a service such as research library group's ariel product. with library budgets reflecting the critical economic climate of the nation, it becomes even more crucial for collaborative collection development and resource sharing to satisfy the needs of the library user. to maintain cost-effective communication and build a sense of community among the librarians, the messaging software icq has been installed on all participant hardware and utilized for group meetings, question and answer, and general correspondence. since icq operates as part of the internet, this package allows low-cost communication with maximum benefit in connecting the group. this technology will also be used as the primary mechanism for communication with an outside advisor who will provide expertise in the area of outreach services for rural populations. the realm of outreach services in libraries has always presented unique challenges that can now benefit greatly from current and emerging technologies. the definition of "outreach" is truly a matter of perspective, with the more traditional sense relating to a specific library servicing its own user or patron. 
but current practice regards "outreach" as an extension of services to all users, whether they be registered patrons, colleagues, or peers. micronesia is a country where the proverbial phrase "the haves and the have-nots" is amplified. the recent (and ongoing) installation of internet services in the region has made possible many basic changes, but there still exists the reality that some of the sites proposed for services have nothing more than a common analog line and rudimentary services. as an example of the realities that exist, only 38 percent of the approximately 180 public schools in the fsm have access to reliable sources of electricity. another challenge for these libraries is the climate and environment, which has a significant impact on library facilities, equipment, and holdings. the fsm lies in the tropics, with temperatures ranging daily from 85 to 95 degrees fahrenheit and humidity normally 85 percent or higher.11 the high salt content in the ocean air wreaks havoc upon electrical equipment, and the favorable environs inside a library often entice everything from termites in the wooden bookcases to nesting ants in keyboards. from these examples it is apparent that the problems that trouble these libraries are not going to be solved with the magic bullet of technology. this reality creates the need for varying strategies and different approaches to address the training requirements of the library staff.

summary

the fsm library group, in particular the pohnpeian librarians, has accomplished much in the past year. the flurry of activity that enveloped the libraries on pohnpei was spurred by the collaborative writing session in january 1999. a week-long "meeting of the minds" from libraries throughout micronesia produced the blueprint that will map the future of libraries and library service for years to come. these librarians stated their primary issues in delivering library services and came to a consensus on activities needed to address those issues. the "federated states of micronesia library services plan, 1999-2003" was crafted as a working document, a strategic plan for improving library services in the pacific region, and a commitment to achievement through collaboration.

while in micronesia i observed the impact that the unification of ideas can have on the citizens of a community. in my fourteen-year tenure at eastern oregon university i have been exposed to the benefits of the "consortium attitude" that comes from cooperation and partnerships. time and again the university demonstrates the positive effects of what is referred to as the "politics of entanglement." shepard describes the overriding philosophy that has been the recipe for success:

the politics are really quite simple. we maintain an intricate pattern of relationships, any one of which might seem inconsequential. yet there is strength in the whole that is largely unaffected if a single relationship wanes. rather than mindlessly guarding turf, we seek to involve larger outside entities and, in the ensnaring, to turn potential competitors into helpful partners.12

just as eastern oregon university has discovered, the libraries of the federated states of micronesia are learning the merits of entanglement.

references and notes

1. james j. kopp, "library consortia and information technology: the past, the present, the promise," information technology and libraries 17 (mar. 1998): 7-12.
jan ison, "rural public libraries in multi-type library cooperatives," library trends 44 (summer 1995): 29-52. 3. pacific islands association of libraries and archives, www.uog.edu/rfk/piala.html, accessed june 6, 2000. 4. division of education, department of health, education and social affairs, federated states of micronesia, "federated reproduced with permission of the copyright owner. further reproduction prohibited without permission. states of micronesia, library services plan 1999-2003" (march 3, 1999): 2. 5. pacific islands association of libraries and archives, www.uog.edu/rfk/piala.html, accessed june 6, 2000. 6. division of education and others, "library services plan," 4. 7. ibid, 6. 8. daniel d. barron, "staffing rural pubic libraries: the need to invest in intellectual capital," library trends 44 (summer 1995): 77-88. the mit from gutenberg to the global information infrastructure access to information in the networked world christine l. borgman considers digital libraries from a social rather than a technical perspective. digital libraries and electronic publishing series 340 pp. $42 now in paperback remediation understanding new media jay david bolter and richard grusin " clearly written and not overly technical, this book will interest general readers, students, and scholars engaged with current trends in technology." choice 307 pp., 102 illus. $17.95 paper 9. k. e. dowlin, "the neographic library: a 30-year perspective on public libraries," in libraries and the future: essays oil the library ill the twenty-first century, f. w. lancaster, ed. (new york: haworth pr., 1993). 10. patricia j. cutright and jean thoulag, college of micronesia-fsm national campus, "institute of museums and library services, national leadership grant" (mar. 19, 1999). 11. division of education and others, "library services plan," 2. 12. w. bruce shepard, "spinning interin;titutional webs," aahe bulletin 49 (feb. 1997): 3-6. the intellectual foundation of information organization elaine svenonius "provides sound guidance to future developers of search engines and retrieval systems. the work is original, building on the foundations of information science and librarianship of the past 150 years." dr. barbara 8. tillett, director. ils program, library of congress digital libraries and electronic publishing series 264 pp. $37 now in paperback information ecologies using technology with heart bonnie a. nardi and vicki l. o'day "a new and refreshing perspective on our technologically dependent society." daily telegraph 246 pp. $15.95 paper to order call 800-356-0343 (us & canada) or 617-625-8569. prices subject to change without notice. http:/ /mitpress.mit.edu consortia building i cutright 95 letter from the editor: farewell 2020 letter from the editor farewell 2020 kenneth j. varnum information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.13051 i don’t think i’ve ever been so ready to see a year in the rear-view mirror as i am with 2020. this year is one i’d just as soon not repeat, although i nurture a small flame of hope. hope that as a society what we have experienced this year will exert a positive influence on the future. hope that we recall the critical importance of facts and evidence. hope that we don’t drop the effort to be better members of our local, national, and global communities and treat everyone equitably. 
hope that as a global populace we continue to get into "good trouble" and push back against institutionalized policies and practices of racism and discrimination, and strive to be better.

despite the myriad challenges this year has brought, it is welcome to see so many libraries continuing to serve their communities, adapting to pandemic restrictions, and providing new and modified access to books and digital information. and equally gratifying, from my perspective as ital's editor, is that so many library technologists continue to generously share what they have learned through submissions to this journal.

along those lines, i'm extending my annual invitation to our public library colleagues to propose a contribution to our quarterly column, "public libraries leading the way" (https://ejournals.bc.edu/index.php/ital/pllw). items in this series highlight a technology-based innovation from a public library perspective. topics we are interested in could include any way that technologies have helped you provide or innovate service to your communities during the pandemic, but could touch on any novel, interesting, or promising use of technology in a public library setting. columns should be in the 1,000-1,500 word range and may include illustrations. these are not intended to be research articles; rather, public libraries leading the way columns are meant to share practical experience with technology development or uses within the library. if you are interested in contributing a column, please submit a brief summary of your idea.

wishing you the best for 2021,
kenneth j. varnum, editor
varnum@umich.edu
december 2020

letter from the editor: a blank page

kenneth j. varnum

information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.12405

nothing is as daunting as a blank page, particularly now. as i sat down to write this issue's letter, i was struck by how much fundamental uncertainty is in our lives, so much trauma. a blank page can emphasize our concerns about whether the old familiar will return at all, or whether a new, better normal will emerge. at the same time, a blank page can be liberating at a time when so much of our social, professional, and personal lives needs to be reconceptualized and reactivated in new, healthier, more respectful and inclusive ways.

we are collectively faced with two important societal ailments. the first is the literal disease of the covid-19 pandemic that has been with us for only months. the other is the centuries-long festering disease of racial injustice, discrimination, and inequality that typifies (particularly, but not uniquely) american society. while some of us may be in better positions to help heal one or the other of these two ailments, we can all do something about both, as different as they are. lend emotional support to those in need of it, take part in rallies if your personal health and circumstances allow, and advocate for change to government officials at all levels from local to national. learn about the issues and explore ways you can make a difference on either or both fronts. i hope i am not being foolish or naive when i say i believe the blank page before us as a society will be liberating: an opportunity to shift ourselves toward a better, more equitable, more just path.
* * *

to rephrase humphrey bogart's rick blaine in casablanca, "it doesn't take much to see that the problems of three little library association divisions don't amount to a hill of beans in this crazy world." but despite the small global impact of our collective decision, i am glad our alcts, llama, and lita colleagues chose a united future as core: leadership, infrastructure, futures (https://core.ala.org/). watch for more information about what the merged division means for our three divisions and this journal in the months to come.

sincerely,
kenneth j. varnum, editor
varnum@umich.edu
june 2020

web content strategy in practice within academic libraries

courtney mcdonald and heidi burkhardt

information technology and libraries | march 2021
https://doi.org/10.6017/ital.v40i1.12453

courtney mcdonald (crmcdonald@colorado.edu) is associate professor and user experience librarian, university of colorado boulder. heidi burkhardt (heidisb@umich.edu) is web project manager and content strategist, university of michigan. © 2021.

abstract

web content strategy is a relatively new area of practice in industry, in higher education, and, correspondingly, within academic and research libraries. the authors conducted a web-based survey of academic and research library professionals in order to identify present trends in this area of professional practice by academic librarians and to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries. this article presents the findings of that survey. based on analysis of the results, we propose a web content strategy maturity model specific to academic libraries.

introduction

our previous article traced the history of library adoption of web content management systems (cms), the evolution of those systems and their use in day-to-day library operations, and the corresponding challenges as libraries have attempted to manage increasingly prolific content creation workflows across multiple, divergent cms platforms.1 these challenges include inconsistencies in voice and a lack of sufficient or dedicated resources for library website management, resulting in the absence of shared strategic vision and organizational unity regarding the purpose and function of the library website. we concluded that a productive solution to these challenges lay in the inherently user-centered practice of web content strategy, defined as "an emerging discipline that brings together concepts from user experience design, information architecture, marketing, and technical writing."2 we further noted that organizational support for web content management and governance strategies for library-authored web content had been rarely addressed in the library literature, despite the growing importance of this area of expertise to the successful provision of support and services: "libraries must proactively embrace and employ best practices in content strategy . . . to fully realize the promise of content management systems through embracing an ethos of library-authored content."3

we now investigate the current state of practice and philosophy around the creation, editing, management, and evaluation of library-authored web content. to what degree, if at all, does web content strategy factor into the actions, policies, and practices of academic libraries and academic librarians today?
does a suitable measure for estimating the maturity of web content strategy practice for academic libraries exist?

background

maturity models

maturity models are one useful mechanism for consistently measuring and assessing an organization's current level of achievement in a particular area, as well as providing a path to guide future growth and improvement: "maturity levels represent a staged path for an organization's performance and process improvement efforts based on predefined sets of practice areas. . . . each maturity level builds on the previous maturity levels by adding new functionality or rigor."4 the initial work on maturity models emerged from carnegie mellon institute (cmi), focused on contract software development.5 since that time, cmi founded the cmmi institute, which has expanded the scope of maturity models into other disciplines. many such models, tailored to a variety of specific industries or specializations, have since been developed based on the cmmi institute approach, in which stages are defined as:

• maturity level 1: initial (unpredictable and reactive);
• maturity level 2: managed (planning, performance, measurement and control occur on the project level);
• maturity level 3: defined (proactive, rather than reactive, with organization-wide standards);
• maturity level 4: quantitatively managed (data-driven with shared, predictable, quantitative performance improvement objectives that align to meet the needs of internal and external stakeholders); and
• maturity level 5: optimizing (stable, flexible, agile, responsive, and focused on continuous improvement).6

application of maturity models within user experience work in libraries

thus far, discussion of maturity models in the library literature relevant to web librarianship has primarily centered on user experience (ux) work. in their 2020 paper "user experience methods and maturity in academic libraries," young, chao, and chandler noted, ". . . several different ux maturity models have been advanced in recent years," reviewing approximately a half-dozen approaches with varying emphases and numbers of stages.7

in 2013, coral sheldon-hess developed the following five-stage model, based on the aforementioned cmmi framework, for assessing the maturity of ux practice in library organizations:

1 – decisions are made based on staff's preferences, management's pet projects. user experience [of patrons] is rarely discussed.

2 – some effort is made toward improving the user experience. decisions are based on staff's gut feelings about patrons' needs, perhaps combined with anecdotes from service points.

3 – the organization cares about user experience; one or two ux champions bring up users' needs regularly. decisions are made based on established usability principles and studies from other organizations, with occasional usability testing.
4 – user experience is a primary motivator; most staff are comfortable with ux principles. users are consulted regularly, not just for major decisions, but in an ongoing attempt at improvement.

5 – user experience is so ingrained that staff consider the usability of all of their work products, including internal communications. staff are actively considerate, not only toward users but toward their coworkers.8

as an indicator of overall ux maturity within an organization, sheldon-hess focuses on "consideration" in interactions not only between library staff and library patrons, but also between library staff: "when an organization is well and truly steeped in ux, with total awareness of and buy-in on user-centered thinking, its staff enact those principles, whether they're facing patrons or not."9

in 2017, macdonald conducted a series of semi-structured interviews with 16 ux librarians to investigate, among other things, "the organizational aspects of ux librarianship across various library contexts."10 macdonald proposes a five-stage model, broadly similar in concept to the cmmi institute structure and to sheldon-hess's model. most compelling, however, were these three major findings, taken from macdonald's list:

• some (but not all) ux librarian positions were created as part of purposeful and strategic efforts to be more self-aware;
• the biggest challenges to doing ux are navigating the complex library culture, balancing competing responsibilities, and finding ways to more efficiently employ ux methods; and
• the level of co-worker awareness of ux librarianship is driven by the extent to which ux work is visible and by the individual ux librarian's ability to effectively communicate their role and value.11

based on analysis of the results of their 2020 survey of library ux professionals, in which they asked respondents to self-diagnose their organizations, young, chao, and chandler presented, for use in libraries, their adaptation of the nielsen norman group's eight-stage scale of ux maturity:

• stage 1: hostility toward usability / stage 2: developer-centered ux—apathy or hostility to ux practice; lack of resources and staff for ux.
• stage 3: skunkworks ux—ad hoc ux practices within the organization; ux is practiced, but unofficially and without dedicated resources or staff; leadership does not fully understand or support ux.12
• stage 4: dedicated ux budget—leadership beginning to understand and support ux; dedicated ux budget; ux is assigned fully or partly to a permanent position.
• stage 5: managed usability—the ux lead or ux group collaborates with units across the organization and contributes ux data meaningfully to organizational and strategic decision-making.
• stage 6: systematic user-centered design process—ux research data is regularly included in projects and decision-making; a wide variety of methods are practiced regularly by multiple departments.
• stage 7: integrated user-centered design / stage 8: user-driven organization—ux is practiced throughout the organization; decisions are made and resources are allocated only with ux insights as a guide.13

young et al.'s findings supported macdonald's, underscoring the importance of shared organizational understandings, priorities, and culture related to ux activities and personnel:

ux maturity in libraries is related to four key factors: the number of ux methods currently in use; the level of support from leadership in the form of strategic alignment, budget, and personnel; the extent of collaboration throughout the organization; and the degree to which organizational decisions are influenced by ux research. when one or more of these four connected factors advances, so too does ux maturity.14
these findings are consistent with larger patterns in the management of library-authored web content identified in the earlier cited literature review:

inconsistent processes, disconnects between units, varying constituent goals, and vague or ineffective wcm governance structures are recurrent themes throughout the literature . . . web content governance issues often signal a lack of coordination, or even of unity, across an organization.15

assessing the maturity of content strategy practice in libraries

we consider kristina halvorson's definition of content strategy, offered in content strategy for the web, as the authoritative definition. halvorson states: "content strategy is the practice of planning for the creation, delivery, and governance of useful, usable content."16 this definition can be divided into five elements:

1. planning: intentionality and alignment, setting goals, discovery and auditing, connecting to a strategic plan or vision
2. creation: roles, responsibilities, and workflows for content creation; attention to content structure; writing or otherwise developing content in its respective format
3. delivery: findability of content within the site and more broadly (i.e., search engine optimization), use of distinct communication channels
4. governance: maintenance and lifecycle management of content through coordinated process and decision making; policies and procedures; measurement and evaluation through analysis of usage data, testing, and other means
5. useful/usable (hereafter referred to as ux): relevant, current, clear, concise, and in context

jones discusses the application of content strategy–specific maturity models as a potential tool for content strategists: "the[se] model[s] can help your company identify your current level of content operations, . . . decide whether that level will support your content vision and strategy . . . [and] help you plan to get to the next level of content operations."17 three examples of maturity models developed for use by content strategy industry professionals map industry-specific terms, tools, and actions to the level-based structure put forward by the cmmi institute (see table 1).
table 1. comparative table of content strategy maturity models

content strategy, inc.18 [2016]:
1. ad hoc: inconsistent quality, lack of uniform practice, little or no opportunity to understand customer needs
2. rudimentary: movement toward structure, unified process and voice; can be derailed by timelines, resistance
3. organized & repeatable: strong leadership, uniform process and voice has become routine, integration of user-focused data collection
4. managed & sustainable: larger buy-in across organization, can sustain changes in leadership, increased number and sophistication of methods
5. optimized: close alignment to strategic objectives, integration across the organization, leadership within and outside the organization

jones (gathercontent)19 [2018]:
1. chaotic: no formal content operations, only ad hoc approaches
2. piloting: trying content operations in certain areas, such as for a blog
3. scaling: expanding formal content operations across business functions
4. sustaining: solidifying and optimizing content operations across business functions
5. thriving: sustaining while also innovating and seeing return on investment (roi)

randolph (kapost)20 [2020]:
1. reactive: chaotic, siloed, lacking clarity, chronically behind
2. siloed: struggles to collaborate, poorly defined and inconsistently measured goals
3. mobilizing: varying collaboration, content is centralized but not necessarily accessible, defined strategy sometimes impacted by ad hoc requests
4. integrating: effective collaboration across multiple teams, capability for proactive steps, still struggle to prove roi
5. optimizing: cross-functional collaboration results in seamless customer messaging and experiences, consistently measured roi contributes to planning

while these models have some utility for content strategy practitioners in higher education, including those in academic and research libraries, their emphasis on commercial standards for assessing success (e.g., business goals, centrally managed marketing) limits their direct application in the academic environment. the 2017 blog post by tracy playle, "ten pillars for getting the most of your content: how is your university doing?," presented ten concepts paired with questions that could be used by higher education content professionals to reflect on their current state of practice.21 this model was developed for use by a consultancy, and the "pillars" ("strategy and vision," "risk tolerance and creativity," "training and professional development") are more broadly conceived than typical maturity models. thus, this approach seems more appropriate as a personal or management planning tool than as a model for evaluating maturity across library organizations.

methods

following review and approval by the researchers' institutional review boards, a web-based survey collecting information about existing workflows for web content, basic organizational information, and familiarity with concepts related to web content strategy was distributed to 208 professionals in april 2020. the survey was available for four weeks. participants were drawn from academic and research libraries across north america, providing their own opinions as well as information on behalf of their library organization. (see appendix a: institution list.)
the sample group (n=208) was composed of north american academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding non-academic member institutions): the association of research libraries, the big ten academic alliance, the greater western library alliance, and/or the oberlin group. some libraries are members of multiple groups. details are supplied below in table 3.

we identified individuals (n=165) based on their professional responsibilities and expertise using the following order and process:

1. individual job title contains some combination of the following words and/or phrases: content strategy, content specialist, content strategist, web content, web communications, digital communications, digital content
2. head of web department or department email
3. head of ux department or department email
4. head of it or department email

for institutions where a specific named individual could not be identified through a review of the organizational website, we identified a general email (e.g., libraries@state.edu) as the contact (n=43). a mailing list was created in mailchimp, and two campaigns were created: one for named individuals, and one for general contacts. only one response was requested per institution. (see appendix b: recruitment emails.)

the 165 named individuals, identified as described above, received a personalized email inviting them to participate in the study. the recruitment email explained the purpose of the study, advised potential participants of possible risks and their ability to withdraw at any time, and included a link to the survey. a separate email was sent to the 43 general contacts on the same day, explaining the purpose of the study and requesting that the recipient forward the communication to the appropriate person in the organization. this email also included information advising potential participants of possible risks and their ability to withdraw at any time, and a link to the survey. data was recorded directly by participants using qualtrics. the bulk of survey data does not include any personal information; we did not collect the names of institutions as part of our data collection, so identifying information is limited to information about institutional memberships.

for the group of named individuals, one email bounce was recorded. the open rate for personalized emails sent to named individuals was approximately 62% (88 of 142 successfully delivered emails were opened) and the survey link was followed 66 times. the general email group had a 51% open rate (n=22) with 11 clicks of the survey link. with recruitment occurring in april 2020, most individuals and institutions were at the height of switching to remote operations in light of the covid-19 pandemic. despite this, our open rates were considerably higher than the average open rates reported by mailchimp.22 as discussed below, we achieved our minimum response rate goal of 20%.
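the tiered identification process described above is essentially a keyword match over job titles with ordered departmental fallbacks. the following is a minimal sketch of that logic, not the authors' actual tooling; the staff-directory structure and helper names are hypothetical, and only the keyword list and fallback order come from the article.

```python
# sketch of the tiered contact-identification process described above;
# the data structures and helper names are hypothetical.
CONTENT_KEYWORDS = (
    "content strategy", "content specialist", "content strategist",
    "web content", "web communications", "digital communications",
    "digital content",
)
DEPT_FALLBACKS = ("web", "ux", "it")  # tiers 2-4: department head or dept email

def identify_contact(staff, dept_heads, general_email):
    """staff: list of (title, email) pairs; dept_heads: dict dept -> email."""
    # tier 1: job title contains a content-focused keyword
    for title, email in staff:
        if any(kw in title.lower() for kw in CONTENT_KEYWORDS):
            return email, "named individual"
    # tiers 2-4: head of web, then ux, then it department
    for dept in DEPT_FALLBACKS:
        if dept in dept_heads:
            return dept_heads[dept], "named individual"
    # fallback: a general institutional email (n=43 in the study)
    return general_email, "general contact"
```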
table 2. survey question topics and response count (question | topic | category | response count)

1 | consent | — | 43
2 | organizational memberships | demographic | 40
3 | approx. # full-time employees | demographic | 41
4 | cms products used | infrastructure/organizational structure | 41
5 | primary cms | infrastructure/organizational structure | 39
6 | number of site editors | infrastructure/organizational structure | 39
7 | describe responsibility for content | infrastructure/organizational structure | 39
8 | existence of position(s) with primary duties of web content | infrastructure/organizational structure | 39
9 | titles of such positions, if any | infrastructure/organizational structure | 24
10 | familiar with web content strategy | content strategy practices | 36
11 | definition of web content strategy | content strategy practices | 32
12 | policies or documentation | content strategy practices | 35
13 | methods | content strategy practices | 37
14 | willing to be contacted | — | 37
15 | name | — | 27
16 | email | — | 26

the survey included 16 questions; question topics and response counts are noted in table 2. informed consent was obtained as part of the first survey question. (see appendix c: survey questions and appendix d: informed consent document.) most questions were multiple-choice or short answer (i.e., a number). two questions required longer-form responses. information collected fell into the following three categories:

• demographics (estimated total number of employees; institutional memberships; estimated number of employees with website editing privileges)
• infrastructure and organizational structure (content management systems used to manage library-authored web content; system used to host the primary public-facing website; distribution of responsibility for website content; titles of positions, if any, whose primary responsibilities focus on web content)
• web content strategy practices (familiarity with; personal definition; presence or absence of policy or documentation; evaluation methods regularly used)

upon completion of the survey questions, participants had the option to indicate that they would be willing to be contacted for an individual interview as part of planned future research on this topic. twenty-seven individuals (63%) opted in and provided us with their contact information.

findings

in sum, 43 responses were received, resulting in a response rate of 20.67%. because we did not collect names of individuals or institutions and used an anonymous link for our survey, we cannot determine the ultimate response rate by contact group (named individuals or general email).

demographic information

the bulk of responses came from association of research libraries members, but within-group response rates show that the proportion of responses from each group was relatively balanced within the overall 20% response rate.

table 3. distribution of survey contacts, responses, and response rates by group (some libraries are members of multiple groups)23

organization | member libraries contacted | responses | share of total responses (%) | group response rate (%)
association of research libraries | 117 | 26 | 50.98 | 22.22
big ten academic alliance | 15 | 5 | 9.80 | 33.33
greater western library alliance | 38 | 8 | 15.69 | 21.05
oberlin group | 80 | 12 | 23.53 | 15.00
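the rate columns in table 3 follow directly from the contact and response counts. a small sketch of the arithmetic, with the data literals transcribed from table 3 and all names illustrative:

```python
# recompute table 3's rate columns from the raw counts; the data is
# transcribed from table 3, variable names are illustrative.
groups = {
    "association of research libraries": (117, 26),
    "big ten academic alliance": (15, 5),
    "greater western library alliance": (38, 8),
    "oberlin group": (80, 12),
}
total = sum(resp for _, resp in groups.values())  # 51 group memberships
for name, (contacted, resp) in groups.items():
    share = 100 * resp / total      # share of total responses
    rate = 100 * resp / contacted   # within-group response rate
    print(f"{name}: share {share:.2f}%, response rate {rate:.2f}%")
# note: the 51 memberships exceed the 43 unique responses because some
# libraries belong to multiple groups.
```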
infrastructure & organizational structure

content management systems

a variety of content management systems are used to manage library-authored web content (see table 4); libguides, wordpress, omeka, and drupal were most commonly used across the group. other systems mentioned as write-in responses included acquia drupal, cascade, fedora-based systems, archivesspace, google sites, and "wiki and blog." one response stated, "most pages are just non-cms for the website." write-in responses for "other" and "proprietary system hosted by institution" were carried forward within the survey from question 3 to question 4, and are available in full in appendix e: other content management systems mentioned by respondents.

table 4. cms products used to manage library-authored web content

q3: cms products used | percentage (%) | count
libguides | 28.06 | 39
wordpress | 18.71 | 26
omeka | 15.11 | 21
drupal | 13.67 | 19
other | 9.35 | 13
sharepoint | 7.19 | 10
proprietary system hosted by institution | 7.19 | 10
adobe experience manager | 0.72 | 1
total | 100 | 139

for their primary library website, just under half of respondents relied on drupal (n=17, 43.59%). slightly fewer selected the specific system, whether the institution's proprietary system or some other option, that they had shared as a write-in answer for the previous question; in total, just under 36% (n=14). despite the widespread use reported in the previous question, only two respondents indicated that their primary website was hosted in libguides. (see table 5.)

table 5. cms used to host primary library website

q4: primary website cms | percentage (%) | count
drupal | 43.59 | 17
other (write-in answers) | 20.51 | 8
wordpress | 15.38 | 6
proprietary system hosted by institution (write-in answers) | 15.38 | 6
libguides | 5.13 | 2

dedicated positions, position titles, and organizational workflows

almost two-thirds of respondents (n=24, 61.5%) indicated there were position(s) within their library whose primary duties were focused on the creation, management, and/or editing of web content. a total of 52 position titles were shared (the full list of position titles can be found in appendix f). terms and phrases most commonly occurring across this set were web (15), librarian (15), user experience (10), and digital (8). explicitly content-focused terms appeared more rarely: content (6), communication/communications (5), and editor (1).

table 6. frequency of terms and phrases in free-text descriptions of website content management, grouped by the authors into concepts (concept, total count, and constituent terms with counts)

collaborative (29): group (7), team (6), distributed (5), committee (3), stakeholder (3), representative (2), cross-departmental (1), decentralized (1), inclusive (1)
assigned roles (18): admin* (6), manager (5), editor/s (4), developer (3), product owner (2)
locus of control (13): their own (7), review (3), oversight (3), permission (1)
support (5): training (2), guidance (2), consulting (1)
libguides (14)

most respondents described collaborative workflows for web content management, in which a group of representatives or delegates collectively stewards website content (see table 6 for a summary and appendix f for full-text responses). collaborative concepts appeared 29 times, including terms like group (7), team (6), distributed (5), and committee (3). within this set, decentralized, inclusive, and cross-departmental each appeared once. similarly, within terms related to locus of control, the phrase "their own" appeared seven times. specifically assigned roles or responsibilities were mentioned 18 times, including terms like admin/administrator (6), manager (5), and editor/s or editorial (4). respondents discussed support structures such as training, guidance, or consulting five times. libguides were mentioned 14 times.
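the concept counts in table 6 amount to a keyword tally over the free-text workflow descriptions. below is a minimal sketch of that kind of tally, assuming the responses are available as a list of strings; the concept-to-term mapping is abridged from table 6, and the function name and approach are illustrative rather than the authors' actual coding procedure.

```python
# tally concept mentions in free-text workflow descriptions, in the
# spirit of table 6; term lists abridged, names illustrative.
CONCEPTS = {
    "collaborative": ["group", "team", "distributed", "committee",
                      "stakeholder", "representative"],
    # substring match on "admin" also catches "administrator" (table 6's admin*)
    "assigned roles": ["admin", "manager", "editor", "developer",
                       "product owner"],
    "locus of control": ["their own", "review", "oversight", "permission"],
    "support": ["training", "guidance", "consulting"],
}

def concept_counts(responses):
    counts = {concept: 0 for concept in CONCEPTS}
    for text in responses:
        text = text.lower()
        for concept, terms in CONCEPTS.items():
            counts[concept] += sum(text.count(term) for term in terms)
    return counts

# example: concept_counts(["a distributed team reviews their own pages"])
```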
over 60% of respondents indicated that 20 or fewer employees had editing privileges on the library website (see table 7). three respondents commented "too many" when citing the number or range: "too many! i think about five, but there could be more"; "too many, about 12"; "too many to count, maybe 20+."

table 7. distribution of the number of employees with website editing privileges

response | percentage (%) | count
less than five | 23.08 | 9
5–10 | 20.51 | 8
11–20 | 17.95 | 7
21–99 | 23.08 | 9
100–199 | 10.26 | 4
200+ | 2.56 | 1

the greatest variation in practice regarding how many employees had website editing privileges occurs in institutions with more than 100 total employees, where institutions reported within every available range (see table 8).

table 8. comparison of number of total employees (rows) and of number of employees with editing privileges (columns)

total employees | less than 5 | 5–10 | 11–20 | 21–99 | 100–199 | 200+
4–10 | 2 | — | — | — | — | —
11–25 | 3 | 1 | — | — | — | —
26–50 | — | 2 | 2 | — | — | —
51–99 | 1 | 1 | 4 | 1 | — | —
100+ | 3 | 4 | 2 | 8 | 4 | 1

web content strategy practices

almost all respondents (n=36, 83%) reported that they were familiar with the concept of web content strategy. conversely, only 20% (n=7) reported that their library had either a documented web content strategy or a web content governance policy. respondents were asked, optionally, to provide a definition of web content strategy in their own words, and we received 32 responses (see appendix g: definitions of web content strategy).

we analyzed the free-text definitions of content strategy based on the five elements of halvorson's previously cited definition: planning, creation, delivery, governance, and ux. we first individually rated the definitions, then we determined a mutually agreed rating for each. across the set, responses most commonly addressed concepts or activities related to planning and ux, and least commonly mentioned concepts or activities related to delivery (see table 9).

table 9. occurrence of content strategy elements in free-text definitions

element | count | percentage (%)
plan (intentional, strategic, brand, style, best practices) | 29 | 91
creation (workflows, structure, writing) | 20 | 63
delivery (findability, channels) | 13 | 41
governance (maintenance, lifecycle, measurement/evaluation) | 16 | 50
ux (needs of the user, relevant, current, clear, concise, in context) | 19 | 59.38

responses were scored on each of the five elements as follows: zero points, concept not mentioned; one point, some coverage of the concept; two points, thorough coverage of the concept. representative examples are provided in table 10. a perfect score for any individual definition would be 10. the median score across the group was four, and the average score was 3.4. we consider scores of three or less to indicate a basic level of practice, scores from four to seven an intermediate level of practice, and scores of eight or above an advanced level of practice.
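a compact sketch of this rubric, assuming each consensus rating is recorded as 0/1/2 per element; the element names follow halvorson's five elements, and everything else, including the sample data, is illustrative.

```python
# score a free-text definition under the 0/1/2-per-element rubric above;
# element names follow halvorson's definition, sample data illustrative.
from statistics import mean, median

ELEMENTS = ("planning", "creation", "delivery", "governance", "ux")

def definition_score(ratings):
    """ratings: dict element -> 0 (absent), 1 (some), 2 (thorough coverage)."""
    assert set(ratings) == set(ELEMENTS)
    assert all(r in (0, 1, 2) for r in ratings.values())
    return sum(ratings.values())  # a perfect score is 10

# the fourth definition in table 10 rates 2, 1, 0, 1, 2 -> total 6
example = dict(zip(ELEMENTS, (2, 1, 0, 1, 2)))
scores = [definition_score(example)]
print(scores[0], mean(scores), median(scores))  # group median was 4, mean 3.4
```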
of the 33 responses to the free-text definition question, one respondent failed to include any data, 14 responses were classed as basic, 17 responses as intermediate, and none were advanced.

table 10. example showing scoring of four representative free-text definitions provided by respondents (each definition scored 0–2 on plan, creation, delivery, governance, and ux)

"intentional and coordinated vision for content on the website." | plan 1, creation 0, delivery 0, governance 0, ux 0 | total 1

"an overarching method of bringing user experience best practices together on the website including heuristics, information architecture, and writing for the web." | plan 1, creation 1, delivery 0, governance 0, ux 1 | total 3

"strategies for management of content over its entire lifecycle to ensure it is accurate, timely, usable, accessible, appropriate, findable, and well-organized." | plan 1, creation 0, delivery 1, governance 1, ux 1 | total 4

"the process of creating and enacting a vision for the organization and display of web content so that it is user friendly, accurate, up-to-date, and effective in its message. web content strategy often involves considering the thoughts and needs of many stakeholders, and creating one cohesive voice to represent them all." | plan 2, creation 1, delivery 0, governance 1, ux 2 | total 6

respondents reported most frequent use of practices associated with web librarianship and user experience work: analysis of usage data (n=36) and usability testing (n=28) (see fig. 1). content-specific methods were less commonly used overall.

[figure 1. frequency of reported usage of analysis and evaluation methods]

the five other responses mainly clarified or qualified the selections, although some added additional information, for example:

at this time, all library websites use a standard template, so they have the same look and feel. beyond that everything else is "catch as catch can" because we do not have a web services librarian, nor are we likely to get that dedicated position any time soon, given the recent covid-19 financial upheaval.

brand guidelines, accessibility guidance, and personal responsibility were also mentioned.
discussion

the targeted recruitment methodology and survey, representing a combination of demographic and practice-based questions, aspired to collect data suitable to generate a snapshot of how web content strategy work is being undertaken in academic libraries at this time, as well as the depth and breadth of that practice.

we were struck by several contrasts in findings: first and foremost, the 80–20 inversion across responses related to knowledge of web content strategy versus its practice. this was particularly notable in combination with respondents' reports that, in nearly two-thirds of organizations, one or multiple positions exist with primary duties focused on the creation, management, and/or editing of web content.

the influence of ux thinking and methods in academic libraries is visible in the frequency of respondents' reported use of general and established ux practices for maintaining the primary website (e.g., usability testing). the other four elements of halvorson's definition were less thoroughly covered, both in provided definitions of web content strategy and in methods reported. some respondents mentioned use of methods such as content audits or inventories and style guides, but many fewer reported reliance on review checklists, content calendars, and readability scores.

in reviewing the self-reported definitions of content strategy for evidence of each of the five elements of halvorson's previously discussed definition, trends in findings suggest higher levels of maturity in the elements of planning, creation, and ux, and lower levels in the elements of delivery and governance. nearly all respondents (91%) referenced the element of planning. almost two-thirds mentioned concepts or practices related to creation, and approximately 60% of respondents referenced usability of content or a focus on the user in some capacity. only half made mention of governance (including maintenance and evaluation), and even fewer (41%) referenced delivery, whether considering content channels or findability; in fact, no single definition touched on both. overall, the results of the analysis of provided definitions (discussed in the previous section) suggest that at present, web content strategy as a community of practice in academic libraries is operating at, or just above, a basic level.

proposed maturity model

from these findings, and referencing the structure of the cmmi institute five-stage maturity model, the authors propose the following content strategy maturity model for academic libraries. as previously noted in our findings, we assess the web content strategy community of practice in academic libraries as operating at, or just above, a basic level. to align the proposed maturity model with the definition scores, we applied the 10-point rating scale for provided definitions to the five levels by assigning two points per level, so a score of one or two would be equivalent to level 1, a score of three or four equivalent to level 2, and so on (table 11).
table 11. comparison of maturity model with definition rating scale and maturity assessment

definition score | maturity model level | assessment
1 | level 1 | basic
2 | level 1 | basic
3 | level 2 | basic
4 | level 2 | intermediate
5 | level 3 | intermediate
6 | level 3 | intermediate
7 | level 4 | intermediate
8 | level 4 | advanced
9 | level 5 | advanced
10 | level 5 | advanced

content strategy maturity model for academic libraries

level 1: ad hoc
• no planning or governance
• creation and delivery are reactive, distributed, and potentially chaotic
• no or minimal consideration of ux

level 2: establishing
• some planning and evidence of strategy, such as use of content audits and creation of a style guide; may be localized within specific groups or units
• basic coordination of content creation workflows
• delivery workflows not explicitly addressed, or remain haphazard
• no or minimal organization-wide governance structures or documentation in place; may be localized within specific groups or units
• evidence of active consideration of ux in creation and structure of content

level 3: scaling
• intentional and proactive planning coordinated across multiple units
• basic content creation workflows in place across the organization
• delivery considered, but may not be consistent or strategic
• ad hoc evaluation through usage data and usability testing; organization-wide governance documents and workflows may be at a foundational level
• consideration of ux is integral to the process of creating useful, usable content
• web content creation and maintenance is assigned at least partly to a permanent position with some level of authority and responsibility for the primary website

level 4: sustaining
• alignment in planning, able to respond to organizational priorities; style guidelines and best practices widely accepted
• established and accepted workflows for content creation are coordinated through a person, department, team, or other governing body
• delivery includes strategic and consistent use of channels, as well as consideration of findability
• regular and strategic evaluation occurs; proactive maintenance and retirement practices in place; managed through established governance documents and workflows
• web content strategy explicitly assigned partly or fully to a permanent position

level 5: thriving
• full lifecycle of content (planning, creation, delivery, maintenance, retirement) managed in coordination across all library-authored web content platforms
• governance established and accepted throughout the organization, including documented policies, procedures, and accountability
• basic understanding of content strategy concepts and importance across the organization
• overall stable, flexible, agile, responsive, user-centered, and focused on continuous improvement

as previously mentioned, the median score across the group was four, and the average score was 3.4; these measures suggest that the majority of survey respondents' organizational web content strategy maturity levels would currently stand at level 2 or 3, with a few at level 1.
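since table 11 simply assigns two definition-score points per level, the mapping can be expressed in a line each; a small sketch, with the band thresholds taken from table 11 and the function names illustrative:

```python
import math

def maturity_level(score):
    """map a 1-10 definition score to a model level per table 11 (two points per level)."""
    assert 1 <= score <= 10
    return math.ceil(score / 2)

def assessment(score):
    """basic (1-3), intermediate (4-7), advanced (8-10), per table 11."""
    return "basic" if score <= 3 else "intermediate" if score <= 7 else "advanced"

# the group's median definition score of 4 maps to level 2, intermediate
print(maturity_level(4), assessment(4))
```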
we have proposed a measure for self-estimating the maturity of web content strategy practice in academic libraries. our content strategy maturity model for academic libraries, while grounded both in industry best practices and in evidence from practitioners in academic libraries, is nonetheless a work in progress. we intend to further develop and strengthen the model through follow-up interviews with practitioners, drawing on those survey respondents who opted in to being contacted. interviewees will be invited to discuss their work within and outside the frame of the proposed maturity model, and to provide feedback on the model itself, with the ultimate goal of enabling a better understanding of web content strategy practice in academic libraries and the needs of its community of practice.

endnotes

1 courtney mcdonald and heidi burkhardt, “library-authored web content and the need for content strategy,” information technology and libraries 38, no. 3 (september 15, 2019): 8–21, https://doi.org/10.6017/ital.v38i3.11015.
2 mcdonald and burkhardt, 14.
3 mcdonald and burkhardt, 16.
4 “cmmi levels of capability and performance,” sec. maturity levels, cmmi institute llc, accessed may 28, 2020, https://cmmiinstitute.com/learning/appraisals/levels.
5 “about cmmi institute,” cmmi institute llc, accessed may 28, 2020, https://cmmiinstitute.com/company.
6 “cmmi levels of capability and performance,” sec. maturity levels.
7 scott w. h. young, zoe chao, and adam chandler, “user experience methods and maturity in academic libraries,” information technology and libraries 39, no. 1 (march 16, 2020): 2, https://doi.org/10.6017/ital.v39i1.11787.
8 coral sheldon-hess, “ux, consideration, and a cmmi-based model,” para. 6, july 25, 2013, http://www.sheldon-hess.org/coral/2013/07/ux-consideration-cmmi/.
9 sheldon-hess, “ux, consideration, and a cmmi-based model,” para. 2, http://www.sheldon-hess.org/coral/2013/07/ux-consideration-cmmi/.
10 craig m. macdonald, “‘it takes a village’: on ux librarianship and building ux capacity in libraries,” journal of library administration 57, no. 2 (february 17, 2017): 196, https://doi.org/10.1080/01930826.2016.1232942.
11 macdonald, 212.
12 skunk works is trademarked by lockheed martin corporation, but is informally used to describe an experimental, sometimes secret, research and development group focused on agile innovation.
13 young, chao, and chandler, “user experience methods and maturity in academic libraries,” 19.
14 young, chao, and chandler, 23.
15 mcdonald and burkhardt, “library-authored web content and the need for content strategy,” 15–16.
16 kristina halvorson, content strategy for the web, 2nd ed. (berkeley, ca: new riders, 2012), 28.
17 colleen jones, “a content operations maturity model,” sec. a maturity model for content operations, gathercontent (blog), november 30, 2018, https://gathercontent.com/blog/content-operations-model-of-maturity.
18 “understanding the content maturity model,” content strategy inc. (blog), march 2016, https://www.contentstrategyinc.com/understanding-content-maturity-model/.
19 jones, “a content operations maturity model,” sec. a maturity model for content operations.
20 zoë randolph, “where do you fall on the content operations maturity model?,” sec. the content operations maturity model, kapost blog, april 20, 2020, https://kapost.com/b/content-operations-maturity-model/.
21 tracy playle, “ten pillars for getting the most of your content: how is your university doing?,” pickle jar communications (blog), september 29, 2017, http://www.picklejarcommunications.com/2017/09/29/content-strategy-benchmarking/.
22 “email marketing benchmarks by industry,” mailchimp, accessed june 15, 2020, https://mailchimp.com/resources/email-marketing-benchmarks/.
23 some libraries are members of multiple groups.

appendices

appendix a: institution list
appendix b: recruitment emails
appendix c: survey questions
appendix d: informed consent document
appendix e: other content management systems mentioned by respondents
appendix f: organizational responsibility for content; and position titles
appendix g: definitions of web content strategy

appendix a: institution list

institution | membership(s)
agnes scott college | oberlin group
alabama | arl
alberta | arl
albion college | oberlin group
alma college | oberlin group
amherst college | oberlin group
arizona | arl, gwla
arizona state | arl, gwla
arkansas | gwla
auburn | arl
augustana college | oberlin group
austin college | oberlin group
bard college | oberlin group
barnard college | oberlin group
bates college | oberlin group
baylor | gwla
beloit college | oberlin group
berea college | oberlin group
boston | arl
boston college | arl
boston public library | arl
bowdoin college | oberlin group
brigham young | arl, gwla
british columbia | arl
brown | arl
bryn mawr college | oberlin group
bucknell university | oberlin group
calgary | arl
california, berkeley | arl
california, davis | arl
california, irvine | arl
california, los angeles | arl
california, riverside | arl
california, san diego | arl
california, santa barbara | arl
carleton college | oberlin group
case western reserve | arl
chicago | arl, btaa
cincinnati | arl
claremont colleges | gwla, oberlin group
clark university | oberlin group
coe college | oberlin group
colby college | oberlin group
colgate university | oberlin group
college of the holy cross | oberlin group
college of wooster | oberlin group
colorado | arl, gwla
colorado college | oberlin group
colorado state | arl, gwla
columbia | arl
connecticut | arl
connecticut college | oberlin group
cornell | arl
dartmouth | arl
davidson college | oberlin group
delaware | arl, gwla
denison university | oberlin group
denver | gwla
depauw university | oberlin group
dickinson college | oberlin group
drew university | oberlin group
duke | arl
earlham college | oberlin group
eckerd college | oberlin group
emory | arl
florida | arl
florida state | arl
franklin & marshall college | oberlin group
furman university | oberlin group
george washington | arl
georgetown | arl
georgia | arl
georgia tech | arl
gettysburg college | oberlin group
grinnell college | oberlin group
guelph | arl
gustavus adolphus college | oberlin group
hamilton college | oberlin group
harvard | arl
haverford college | oberlin group
hawaii | arl
hope college | oberlin group
houston | arl, gwla
howard | arl
illinois, chicago | arl, gwla
illinois, urbana | arl, btaa
indiana | arl, btaa
iowa | arl, btaa
iowa state | arl, gwla
johns hopkins | arl
kalamazoo college | oberlin group
kansas | arl, gwla
kansas state | gwla
kent state | arl
kentucky | arl
kenyon college | oberlin group
knox college | oberlin group
lafayette college | oberlin group
lake forest college | oberlin group
laval | arl
lawrence university | oberlin group
library of congress | arl
louisiana state | arl
louisville | arl
macalester college | oberlin group
manhattan college | oberlin group
manitoba | arl
maryland | arl, btaa
massachusetts | arl
mcgill | arl
mcmaster | arl
miami | arl
michigan | arl, btaa
michigan state | arl, btaa
middlebury college | oberlin group
mills college | oberlin group
minnesota | arl, btaa
missouri | arl, gwla
mit | arl
morehouse/spelman colleges (auc) | oberlin group
mount holyoke college | oberlin group
nebraska | arl, btaa
nevada las vegas | gwla
new mexico | arl, gwla
new york | arl
north carolina | arl
north carolina state | arl
northwestern | arl, btaa
notre dame | arl
oberlin college | oberlin group
occidental college | oberlin group
ohio | arl
ohio state | arl, btaa
ohio wesleyan university | oberlin group
oklahoma | arl, gwla
oklahoma state | arl, gwla
oregon | arl, gwla
oregon state | gwla
ottawa | arl
pennsylvania | arl
pennsylvania state | arl, btaa
pittsburgh | arl
princeton | arl
purdue | arl, btaa
queen's | arl
randolph-macon college | oberlin group
reed college | oberlin group
rhodes college | oberlin group
rice | arl, gwla
rochester | arl
rollins college | oberlin group
rutgers | arl, btaa
sarah lawrence college | oberlin group
saskatchewan | arl
sewanee: the university of the south | oberlin group
simmons university | oberlin group
simon fraser | arl
skidmore college | oberlin group
smith college | oberlin group
south carolina | arl
southern california | arl, gwla
southern illinois | arl, gwla
southern methodist | gwla
st. john's university / college of st. benedict | oberlin group
st. lawrence university | oberlin group
st. olaf college | oberlin group
suny-albany | arl
suny-buffalo | arl
suny-stony brook | arl
swarthmore college | oberlin group
syracuse | arl
temple | arl
tennessee | arl
texas | arl, gwla
texas a&m | arl, gwla
texas state | gwla
texas tech | arl, gwla
toronto | arl
trinity college | oberlin group
trinity university | oberlin group
tulane | arl
union college | oberlin group
utah | arl, gwla
utah state | gwla
vanderbilt | arl
vassar college | oberlin group
virginia | arl
virginia commonwealth | arl
virginia tech | arl
wabash college | oberlin group
washington | arl, gwla
washington and lee university | oberlin group
washington state | arl, gwla
washington u.-st. louis | arl, gwla
waterloo | arl
wayne state | arl, gwla
wellesley college | oberlin group
wesleyan university | oberlin group
west virginia | gwla
western | arl
wheaton college | oberlin group
whitman college | oberlin group
whittier college | oberlin group
willamette university | oberlin group
williams college | oberlin group
wisconsin | arl, btaa
wyoming | gwla
yale | arl
york | arl

appendix b: recruitment emails

recruitment email: named recipients

this message is intended for *|mmerge6|*

dear *|fname|*,

we are writing today to ask for your participation in a research project, “content strategy in practice within academic libraries” (cu boulder irb protocol #18-0670), led by co-investigators courtney mcdonald and heidi burkhardt (university of michigan). we have provided the information below as a downloadable pdf should you wish to keep it for your records.

the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, greater western library alliance, and/or the oberlin group.

if you opt to participate, we expect that you will be in this research study for the duration of the time it takes to complete our web-based survey. you will not be paid to be in this study. whether or not you take part in this research is your choice. you can leave the research at any time and it will not be held against you. we expect about 210 people, representing their institutions, in the entire study internationally. this survey will be available over a four-week period in the spring of 2020, through friday, may 1.

** confidentiality -----------------------------------------------------------
information obtained about you for this study will be kept confidential to the extent allowed by law. research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections.
the information from this research may be published for scientific purposes; however, your identity will not be given out.

** questions -----------------------------------------------------------
if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735-3702 or irbadmin@colorado.edu if:
* your questions, concerns, or complaints are not being answered by the research team.
* you cannot reach the research team.
* you want to talk to someone besides the research team.
* you have questions about your rights as a research subject.
* you want to get information or provide input about this research.

thank you for your consideration,
courtney mcdonald, crmcdonald@colorado.edu
heidi burkhardt, heidisb@umich.edu

============================================================
not interested in participating? you can ** unsubscribe from this list (*|unsub|*).
this email was sent to *|email|* (mailto:*|email|*)
why did i get this? (*|about_list|*) unsubscribe from this list (*|unsub|*) update subscription preferences (*|update_profile|*)

recruitment email: named recipients

dear library colleague,

we are writing today to ask for your participation in a research project, “content strategy in practice within academic libraries” (cu boulder irb protocol #18-0670), led by co-investigators courtney mcdonald and heidi burkhardt (university of michigan). our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, greater western library alliance, and/or the oberlin group.

we ask that you forward this message to the person in your organization whose role includes oversight of your public web site. we are only requesting a response from one person at each institution contacted. thank you for your assistance in routing this request. we have provided the information below as a downloadable pdf should you wish to keep it for your records.

the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. if someone within your library opts to participate, we expect that person will be in this research study for the duration of the time it takes to complete our web-based survey. the participant will not be paid to be in this study. whether or not someone in your library takes part in this research is an individual choice. the participant can leave the research at any time and it will not be held against them. we expect about 210 people, representing their institutions, in the entire study internationally. this survey will be available over a four-week period in the spring of 2020, through friday, may 1.

** confidentiality -----------------------------------------------------------
information obtained about you for this study will be kept confidential to the extent allowed by law.
research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out.

** questions -----------------------------------------------------------
if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735-3702 or irbadmin@colorado.edu if:
* your questions, concerns, or complaints are not being answered by the research team.
* you cannot reach the research team.
* you want to talk to someone besides the research team.
* you have questions about your rights as a research subject.
* you want to get information or provide input about this research.

thank you for your consideration,
courtney mcdonald, crmcdonald@colorado.edu
heidi burkhardt, heidisb@umich.edu

============================================================
not interested in participating? you can ** unsubscribe from this list (*|unsub|*).
this email was sent to *|email|* (mailto:*|email|*)
why did i get this? (*|about_list|*) unsubscribe from this list (*|unsub|*) update subscription preferences (*|update_profile|*)

appendix c: survey questions

web content strategy methods and maturity

start of block: introduction

q1 web content strategy methods and maturity in academic libraries (cu boulder irb protocol #20-0581)
purpose of the study: the purpose of the study is to gather feedback from practitioners on the proposed content strategy maturity model for academic libraries, and to further enhance our understanding of web content strategy practice in academic libraries and the needs of its community of practice.

q2 please make a selection below, in lieu of your signature, to document that you have read and understand the consent form, and voluntarily agree to take part in this research.
o yes, i consent to take part in this research. (1)
o no, i do not grant my consent to take part in this research. (2)
skip to: end of survey if q2 = no, i do not grant my consent to take part in this research.

end of block: introduction

start of block: demographic information

q3 estimated total number of employees (fte) at your library organization:
o less than five (12)
o 5-10 (13)
o 11-20 (14)
o 21-99 (15)
o 100-199 (16)
o 200+ (17)

q4 estimated number of employees with editing privileges within your primary library website:
o less than five (12)
o 5-10 (13)
o 11-20 (14)
o 21-99 (15)
o 100-199 (16)
o 200+ (17)

q5 does your library have a documented web content strategy and/or a web content governance policy?
o no (1)
o yes (2)

q6 are there position(s) within your library whose primary duties are focused on creation, management, and/or editing of web content?
o no (1)
o yes, including myself (2)
o yes, not including myself (3)

end of block: demographic information

start of block: web content strategy

q7 please indicate the degree to which each of the five elements of content strategy is currently in practice at your library.

q8 creation: employ editorial workflows, consider content structure, support writing.
this is currently in practice at my institution. (1): definitely true (48) / somewhat true (49) / somewhat false (50) / definitely false (51)

q9 delivery: consider findability, discoverability, and search engine optimization, plus choice of content platform or channels.
this is currently in practice at my institution. (1): definitely true (48) / somewhat true (49) / somewhat false (50) / definitely false (51)

q10 governance: support maintenance and lifecycle of content, as well as measurement and evaluation.
this is currently in practice at my institution. (1): definitely true (31) / somewhat true (32) / somewhat false (33) / definitely false (34)

q11 planning: use an intentional and strategic approach, including brand, style, and writing best practices.
this is currently in practice at my institution. (1): definitely true (31) / somewhat true (32) / somewhat false (33) / definitely false (34)

q12 user experience: consider needs of the user to produce relevant, current, clear, concise, and in-context content.
this is currently in practice at my institution. (1): definitely true (31) / somewhat true (32) / somewhat false (33) / definitely false (34)

q13 please rank the elements of content strategy (as defined above) in order of their priority based on your observations of practice in your library.
• ______ creation (1)
• ______ delivery (2)
• ______ governance (3)
• ______ planning (4)
• ______ user experience (5)

q14 how would you assess the content strategy maturity of your organization?
o basic (1)
o intermediate (2)
o advanced (3)

end of block: web content strategy

start of block: thank you!

q15 your name: ________________________________________________________________

q16 thank you very much for your willingness to be interviewed as part of our research study. prior to continuing on to finalize your survey submission, please sign up for an interview time: [link] (this link will open in a new window in order to allow you to finalize and submit your survey response after scheduling an appointment.) please contact courtney mcdonald, crmcdonald@colorado.edu, if you experience any difficulty in registering or if there is not a time available that works for your schedule.

end of block: thank you!
appendix d: informed consent document

permission to take part in a human research study

title of research study: content strategy in practice within academic libraries
irb protocol number: 18-0670
investigators: courtney mcdonald and heidi burkhardt

purpose of the study
the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, and/or greater western library alliance. we expect that you will be in this research study for the duration of the time it takes to complete our web-based survey. we expect about 210 people, representing their institutions, in the entire study internationally.

explanation of procedures
we are directly contacting each library to request that the appropriate individual(s) complete a web-based survey. this survey will be available over a four-week period in the spring of 2020.

voluntary participation and withdrawal
whether or not you take part in this research is your choice. you can leave the research at any time and it will not be held against you. the person in charge of the research study can remove you from the research study without your approval. possible reasons for removal include an incomplete survey submission.

confidentiality
information obtained about you for this study will be kept confidential to the extent allowed by law. research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out.

payment for participation
you will not be paid to be in this study.

contact for future studies
we would like to keep your contact information on file so we can notify you if we have future research studies we think you may be interested in. this information will be used by only the principal investigator of this study and only for this purpose. you can opt in to provide your contact information at the end of the online survey.

questions
if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735-3702 or irbadmin@colorado.edu if:
• your questions, concerns, or complaints are not being answered by the research team.
• you cannot reach the research team.
• you want to talk to someone besides the research team.
• you have questions about your rights as a research subject.
• you want to get information or provide input about this research.
signatures
in lieu of your signature, your acknowledgement of this statement in the online survey documents your permission to take part in this research.

appendix e: other content management systems mentioned by respondents

question #4: which of the following content management systems does your library use to manage library-authored web content?

write-in responses for ‘proprietary system hosted by institution’
• xxxxxxxxxxx
• archivesspace
• pressbooks
• preservica
• hippo cms
• siteleaf
• cascade
• dotcms
• terminal four
• acquia drupal
• fedora based digital collections system built in house

write-in responses for ‘other’
• wiki and blog
• we draft content in google docs & also use gather content for auditing.
• google sites
• cascade
• ebsco stacks
• modx
• islandora and online journal system
• contentful
• we also have some in-house-built tools such as for room booking; some of these are quite old and we would like to upgrade or improve them when we can. (very few people can make edits in these tools.)
• cascade
• the majority of the library website (and university website) is managed by a locally developed cms; however, the university is in the process of migrating to the acquia drupal cms.
• blacklight, vivo, fedora
• most pages are just non-cms for the website

appendix f: organizational responsibility for content; and position titles

question 6: please explain how your organization distributes responsibility for content hosted in your content management system(s). if different parties (individuals, departments, collaborative groups) are responsible for managing content in different platforms, please describe.

• we have one primary website manager who oversees the management of the website, including content strategy and editing, and 2 editors who assist with small editing tasks.
• we have content editors that edit content for individual libraries and collections. there is a content creator network managed by library communications. they provide trainings and guidance for content editors and act as reviewers, but not every single thing gets reviewed.
• we have a team of developers and product owners who are responsible for managing web content.
• we currently have a very distributed model, where virtually any library staff member or student assistant can request a drupal account and then make changes to existing content or develop new pages. we have a cross-departmental team that oversees the libraries' web interfaces and makes decisions about library homepage content, the menu navigation, overall ia, etc. we have web content guidelines to help staff as they develop new content. we have identified functional and technical owners for each of our cmss and have slightly different processes for managing content in those cmss. our general approach, however, is very inclusive (for better or worse ;) ); lots of staff have access to creating and editing content. we are, however, moving to a less distributed content model for drupal in particular. moving forward, we'll have a small team responsible for editing and developing new content. this is to ensure that content is more consistent and user-centered.
we attempted to identify funding for a full-time content manager but were unsuccessful, so this team will attempt to fill the role of a full-time content manager.
• ux is the product owner and admin. if staff want content added to the website, they send a request to ux, we structure and edit content in a google doc, and then ux posts to the website.
• there's no method for how or why responsibility is distributed. it ends up being something like, someone wants to add some content, they get editing access, they can now edit anything for as long as they're at the library. we are a super decentralized and informal library.
• the primary content managers are the xxxxxx librarian and the xxxxxx. other individuals (primarily librarians) that are interested in editing their content have access on our development server. their edits are vetted by the xxxxxx and/or the xxxxxx librarian before being moved into production.
• the xxxxxx department (6 staff) manages content and helps staff throughout the organization create and maintain content. ux staff sometimes teach others how to manage content, and sometimes do it for them. if design or content is complex, usually ux staff do the work. many staff don't maintain any content beyond their staff pages. subject specialists and instruction librarians maintain [libguides-like] content, but we don't use libguides. branch library staff maintain most of the content for their library pages.
• in addition, the xxxxxx manages the catalog. the xxxxxx department manages special web projects. and the xxxxxx department manages social media, publications, and news.
• a web content team made up of two administrators and librarians from xxxxxx and xxxxxx makes executive-level decisions about web content.
• the xxxxxx team (xxxxxx) provides oversight and consulting for online user interfaces, chaired by a xxxxxx position which is new and is not yet filled.
• for the public website, content editing is distributed to many groups and teams throughout the libraries.
• the xxxxxx team manages the main portions of the site including the homepage, news, maps, calendars, etc. the research librarians and subject liaisons manage the research guides. the xxxxxx provides guidance regarding overall responsibilities and style guidelines.
• site structure and top-level pages for our main website reside with xxxxxx. page content is generally distributed to the departments closest to the services described by the pages.
• right now editing of pages is distributed to those individuals who have the closest relationship to the pages being edited, with a significantly smaller number of people having administrative access to all of the libraries' websites.
• primary website is co-managed by xxxxxx team (4 people) and xxxxxx team (3 people). xxxxxx team creates timely content about news/events/initiatives while xxxxxx team manages content on evergreen topics.
• research librarians and staff manage libguides content, which is in sore need of an inventory and pruning.
• primarily me, plus two colleagues who serve with me as a web editorial board
• one librarian manages the content and makes changes based on requests from other library staff
• my role (xxxxxx) is xxxxxx. we also have a web content creator in our xxxxxx.
i chair our xxxxxx group (xxxxxx), which has representatives from each division in the library, and they are the primary stewards of supporting library-authored web content. our "speciality" platforms (libguides, omeka, and wordpress for microsites) all have service leads, but content is managed by the respective stakeholders. the lead for libguides is a xxxxxx [group] member due to its scope and scale. in our primary website, we are currently structured around drupal organic groups for content management, with xxxxxx [group] having broad editing access. in our new website, all content management will go through the xxxxxx, with communications for support and dynamic content (homepage, news, events) management.
• management is somewhat in flux right now. we recently migrated our main web site to acquia drupal; there is a very new small committee consisting of xxxxxx, and three representatives from elsewhere in the library. for libguides, all reference, instruction, and subject librarians can edit their own guides; the xxxxxx has tended to have final oversight but i don't know if this has ever been formally delegated.
• librarians manage their own libguides subject guides; several members of xxxxxx can make administrative changes to coding, certificates, etc. on the entire site; there are individuals in different departments who control their own pages/libguides. there is a group within the library that administers wordpress for the institution. other content systems are administered by individuals within the library.
• librarians are responsible for their own libguides. the xxxxxx department manages changes to most content, although some staff do manage their own wordpress content. they tend not to want to.
• individuals. mainly one person authors content. the other individual has created some research guides.
• individuals in different positions and departments within the library are assigned roles based on the type of content they frequently need to edit.
• for instance, xxxxxx staff have the ability to create and edit exhibition content in drupal. xxxxxx staff and xxxxxx staff have the ability to create and edit equipment content. the event coordinator and librarians and staff involved in instruction are allowed to create and edit event and workshop listings.
• only the communication coordinator is permitted to create news items that occupy real estate on the home page and various service point home pages.
• as for general content, the primary internal stakeholders for that content typically create and edit that content, but if any staff notice a typo or factual error they are encouraged to correct them on their own, although they can also submit a request to the it department if they are not comfortable doing so.
• subject-specific content is hosted in libguides and is maintained by subject liaison librarians. other content in libguides, software tutorials or information related to electronic resources for example, is created and maintained by appropriate specialists.
• the drupal site when launched had internal stakeholders explicitly defined for each page, and only staff from the appropriate group could edit that content (e.g., if xxxxxx was tagged as the only stakeholder for a page about xxxxxx policies, then only staff from the xxxxxx department with editing privileges could edit that page).
this system was abandoned after about two years, as it was considered too much overhead to maintain, and also the introduction of a content revisioning module that kept a history of edits alleviated fears of malicious editing.
• individuals are assigned pages to keep content updated. the xxxxxx is responsible for coordinating with those staff and offers training to make sure content gets updated.
• individual liaison librarians are responsible for their own libguides. i and the "xxxxxx" are the primary editors of the wordpress site, although 4 others have editing access (an employee who writes and posts news articles, the liaison librarian who spearheaded our new video tutorials, and two who work in special collections to update finding aids on that site, which is still on wordpress and i would consider under the main libraries web page, but is part of a multisite installation).
• in omeka and libguides, librarians are pretty self-sufficient and responsible for all of their own content. the three or four digital projects faculty and staff who work with omeka manage it internally alongside one of our developers. our omeka instance is relatively small-scale.
• i (xxxxxx) oversee our libguides environment. while i am in the process of creating and implementing formal libguides content and structure guidelines, as of now it's a bit of a free-for-all, with everyone responsible for the content pertaining to their own liaison department(s). content is made available to patrons via automatically populating legacy landing pages (we've had libguides for a decade and i've been with the institution not yet a year).
• as the xxxxxx, i am ultimately responsible for almost all of the content in our wordpress environment. that said, i try to distribute content upkeep responsibilities to the relevant department for each piece of the site. managers and committee chairs provide me with what they want on the web, and as needed (and in consultation with them) i review/rewrite it for the web (plain language), develop information architecture, design the front-end, and accessibly publish the content. there are only a few faculty and staff at my library who are comfortable with wordpress, but one of my long-term goals is to empower more folks to enact their own minor edits (e.g., updating hours, lending policies, etc.) while i oversee large-scale content creation, overall architecture, and strategy. we have a blog portion of our wordpress site which is not managed by anyone in particular, but i tend to clean it up if things go awry.
• generally all of our web authors *can* publish to most parts of the site. (a very few content types (mostly featured images that display on home pages) can be edited only by admins and a small number of super-users.) however, the great majority of people who can post content very rarely do (and some never do). some edit or post only to specific blogs, some only to their own guides or to very specific pages or suites of pages (e.g., liaison librarians to their own guides; thesis assistant to thesis pages). our small group in xxxxxx reviews new and updated pages and edits for in-house style and usability guidelines, and also trains and works collaboratively with web authors to create more usable content and reduce duplication, but given the large number of authors (with varied priorities, skills, and preferences) and pages, we have trouble keeping up.
we also more actively manage content on home pages.
• for the main website and intranet, we have areas broken apart by unit area. we use workbench access to determine who can edit which pages. libguides is managed by committee, but most of the librarians have access. proprietary systems have separate accounts for those who need access.
• for libguides, librarians can create content as they like, though there is a group that provides some (light) oversight. for the main library website, most content is overseen by departments (in practice, one person each from a handful of "areas", such as the branches, access services, etc.).
• dotcms is primarily managed in systems (2 staff), with delegates from admin and outreach allowed to make limited changes to achieve their goals. libguides is used by all librarians and several staff, with six people given admin privileges. wordpress is used only in special collections.
• xxxxxx dept manages major public-facing platforms (drupal, wordpress, and shares libguides responsibilities with xxxxxx dept). xxxxxx manages omeka. within platforms, responsibilities are largely managed by department, with individuals assigned content duties & permissions as needed.
• different units maintain their content; one unit has overall management and checks for uniformity, needed updates, and broken links.
• developers/communications office oversees some aspects; library management, research and collections librarians, and key staff edit other pieces.
• currently, content is maintained by the xxxxxx librarian in coordination with content stakeholders from around the organization. we are in the process of migrating our site from drupal to omniupdate. once that is complete, we will develop a new model for content responsibilities.
• content is provided by department/services.
• 5 librarians manage the libguides

question 9: titles of positions in your organization whose primary duties involve creation, management and/or editing of web content:

• head of web services; developer; web designer; user experience librarian
• user experience librarian, lead librarian for discovery systems, digital technologies development librarian, lead librarian for software development. and we have titles that are university system it titles that don't mean a whole lot, such as technology support specialist and business and technology applications analyst.
• web content specialist
• user experience strategist, user experience designer, user experience student assistants, director of marketing communications and events
• sr. ux specialist
• web support consultant; coordinator, web services & library technology
• editor & content strategist in library communications
• web manager
• discovery & systems librarian
• head of library systems and technology
• web services and data librarian
• communications manager
• web content and user experience specialist
• metadata and discovery systems librarian, systems analyst, outreach librarian
• digital services librarian; manager, communication services; communication specialist
• (1) web project manager and content strategist, (2) web content creator
• web services librarian
• web developer ii
• sr. software engineer, program director for digital services
• user experience librarian
• digital initiatives & scholarly communication librarian; senior library associate in digital scholarship and services
• web services and usability librarian
• senior library specialist - web content
• web developer, software development librarian

appendix g: definitions of web content strategy

question 11: in your own words, please define web content strategy.

• a cohesive plan to create an overall strategy for web content that includes tone, terminology, structure, and deployment to best communicate the institution's message and enable the user. for the next question, the true answer is sort of. we have the start of a style guide. we also have the university's branding policies. we also have a web governance committee that is university-wide, of which i'm a part. however, we don't have a complete strategy and it is certainly not documented. so you pick.
• planning, development, and management of web content. two particularly important parts of web content strategy for academic library websites: 1. keeping content up to date and unpublishing outdated content. 2. building consensus for the creation and maintenance of a web style guide and ensuring that content across the large website adheres to the style guide.
• strategies for management of content over its entire lifecycle to ensure it is accurate, timely, usable, accessible, appropriate, findable, and well-organized.
• a system of workflows, training, and governance that supports the entire lifecycle of content, including creation, maintenance, and updating of content across all communications channels (e.g., websites, social media, signage).
• a comprehensive, coordinated, planned approach to content across the site, including components such as style guides, accessibility, information architecture, discoverability, seo.
• not terribly familiar with the concept in a formal sense, but think of it related to how the institution considers the intersection of content made available by the institution, the management and governance of issues such as branding/identity, accessibility, design, marketing, etc.
• intentional and coordinated vision for content on the website
• content strategy is the planning for the lifecycle of content. it includes creating, editing, reviewing, and deleting content. we also use a content strategy framework to determine each of the following for the content on our websites: audience, page goal, value proposition, validation, and measurement strategy.
• website targets the community to ensure they can find what they need
• the process of creating and enacting a vision for the organization and display of web content so that it is user friendly, accurate, up-to-date, and effective in its message. web content strategy often involves considering the thoughts and needs of many stakeholders, and creating one cohesive voice to represent them all.
• web content strategy is the planning, design, delivery, and governance plan for a website. this responsibility is guided by the library website management working group.
• a web content strategy is a cohesive approach to managing and editing online content. an effective strategy takes into account web accessibility standards and endeavors to produce and maintain consistent, reliable, user-centered content.
an effective content strategy evolves to meet the needs of online users and involves regular user testing and reviews of web traffic/analytics.
• web content strategy is the theory and practice of creating, managing, and publishing web content according to evidence-based best practices for usability and readability
• making sure your content aligns with both your business goals and your audience needs.
• a plan to oversee the life cycle of useful, usable content from its creation through maintenance and ultimately removal.
• web content strategy is the overarching strategy for how you develop and disseminate web content. ideally, it would be structured and user tested to ensure that the content you are spending time developing is meeting the needs of your library and your community.
• a web content strategy guides the full lifecycle of web content, including creation, maintenance, assessment, and retirement. it also sets guiding principles, makes responsibility and authority clear, and documents workflows.
• an overarching method of bringing user experience best practices together on the website, including: heuristics, information architecture, and writing for the web
• planning and management of online content
• a defined strategy for creating and delivering effective content to a defined audience at the right time.
• in the most basic sense, web content strategy is matching the content, services, and functionality of web properties with the organizational strategic goals.
• web content strategy can include guidelines, processes, and/or approaches to making your website(s) usable, sustainable, and findable. it's a big-picture or higher-level way of thinking about your site(s), rather than page by page or function by function.
• deliberate structures and practices to plan, deliver, and evaluate web content.
• producing content that will be useful to users and easy for them to access
• tying content to user behavior/user experience?
• web content strategy is the thoughtful planning and construction of website content to meet users' needs.
• n/a
• cohesive planning, development, and management of web content, to engage and support library users.
• working with teams and thinking strategically and holistically about the usability, functions, services, information, etc. provided on the website to best meet the needs of the site's users, as well as incorporating the marketing/promotional perspectives offered by the website.
• planning and managing web content
• web content strategy is the idea that all written and visual information on a certain site would conform to or align with the goals for that site.
• ensuring that the most accurate and appropriate words, images, and other assets are presented to patrons at the point of need, while using web assets to tell stories patrons might not know they want to know.
letter from the editor
improving ital's peer review
kenneth j. varnum
information technology and libraries | june 2021
https://doi.org/10.6017/ital.v40i2.13573

over the past several months, ital has enrolled almost 30 reviewers to the journal's new review panel. increasing the pool of reviewers for the journal supports the editorial board's desire to provide equitable treatment of submitted articles by having two independent reviewers provide double-blind consideration of each article, a practice that has now been in effect for articles submitted after may 1, 2021. i am grateful to the individuals (listed on the editorial team page) who volunteered, attended an orientation session, and have begun contributing to the work of the journal.

* * * * * *

in this issue

in the editorial section of this issue, we have a column by incoming core president margaret heller. her essay, "making room for change through rest," highlights the need for each of us to recharge after a collectively challenging year. this inaugurates what we plan to be an occasional feature, the "core leadership column," to which we invite contributions from members of core leadership. it is joined by two other regular items: our editorial board thoughts essay by michael p. sauers, "do space's virtual interview lab: using simple technology to serve the public in a time of crisis," and william yarbrough's public libraries leading the way column, "service barometers: using lending kiosks to locate patrons."

an interesting and diverse set of peer-reviewed articles rounds out the issue:

1. the impact of covid-19 on the use of academic library resources / ruth sara connell, lisa c. wallis, and david comeaux
2. emergency remote library instruction and tech tools: a matter of equity during a pandemic / kathia ibacache, amanda rybin, and eric vance
3. off-campus access to licensed online resources through shibboleth / francis jayakanth, anand t. byrappa, and raja visvanathan
4. a framework for measuring relevancy in discovery environments / blake l. galbreath, alex merrill, and corey m. johnson
5. beyond viaf: wikidata as a complementary tool for authority control in libraries / carlo bianchini, stefano bargioni, and camillo carlo pellizzari di san girolamo
6. algorithmic literacy and the role for libraries / michael ridley and danica pawlick-potts
7. persistent urls and citations offered for digital objects by digital libraries / nicholas homenda
kenneth j. varnum, editor
varnum@umich.edu
june 2021

article
off-campus access to licensed online resources through shibboleth
francis jayakanth, ananda t. byrappa, and raja visvanathan
information technology and libraries | june 2021
https://doi.org/10.6017/ital.v40i2.12589

abstract
institutions of advanced education and research, through their libraries, invest substantially in licensed online resources. only authorized users of an institution are entitled to access licensed online resources. seamless on-campus access to licensed resources happens mostly through internet protocol (ip) address authentication. increasingly, licensed online resources are accessed by authorized users from off-campus locations as well. libraries will, therefore, need to ensure seamless off-campus access for authorized users. libraries have been using various technologies, including proxy servers, virtual private network (vpn) servers, or single sign-on, to facilitate seamless off-campus access to licensed resources. in this paper, the authors share their experience in setting up a shibboleth-based single sign-on (sso) access management system at the jrd tata memorial library, indian institute of science, to enable authorized users of the institute to seamlessly access licensed online resources from off-campus locations.

introduction
the internet has both necessitated and offered options for libraries to enable remote access to an organization's licensed online content: journals, e-books, technical standards, bibliographical and full-text databases, and more. in the absence of such an option for remote access, faculty, students, and researchers have limited and constrained access to the licensed online content from off-campus locations. as scholarly resources transitioned from print to online in the mid-1990s, libraries and their vendors had to start identifying user affiliations in order to grant access to licensed online resources to the authorized users of an institution. the ip address was an obvious mechanism to do that. allowing or denying access to online resources based on a user's ip address was simple, it worked, and, in the absence of practical alternatives, it became the universal means of authentication for gaining access to licensed library content.1 to facilitate seamless access to licensed online resources from off-campus sites, libraries have been using various technologies, including proxy servers, vpn servers, remote desktop gateways, federated identity management, or a combination of these technologies.
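as a toy illustration of the ip-range authentication model described in the introduction (our sketch, not code from the article; the network ranges below are invented documentation-only addresses):

import ipaddress

# hypothetical campus ranges registered with a publisher (invented for illustration)
LICENSED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_on_campus(client_ip: str) -> bool:
    """return True if the request originates from a licensed campus range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in LICENSED_NETWORKS)

print(is_on_campus("203.0.113.42"))  # -> True: on campus, access granted
print(is_on_campus("192.0.2.10"))    # -> False: off campus, needs vpn/proxy/sso

the simplicity of this check is exactly why ip authentication became universal, and its blindness to who is behind an off-campus address is why the technologies discussed next exist.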
in our institute, the on-campus ip-based access to the licensed content is supplemented by vpn technology for off-campus access. the covid-19 pandemic has required academic and scientific staff to work from home, which demands smooth and seamless access to the organization's licensed content. the sudden surge in demand for seamless off-campus access to the licensed online resources had an impact on the institute's vpn server. also, not all authorized users of the institute are entitled to get vpn access. to mitigate the situation, the library therefore had to explore a secure, reliable, and cost-effective solution that would facilitate seamless off-campus access to all the licensed online resources for all the authorized users of the institute. after exploring the possibilities, the library decided to implement a single sign-on solution based on shibboleth. shibboleth software implements the security assertion markup language (saml) protocol, separating the functions of authentication (undertaken by the library or university, which knows its community of end users) and authorization (undertaken by the resource provider, which knows which libraries have licenses for their users to access the resource in question).2

francis jayakanth (francis@iisc.ac.in) is scientific officer, j.r.d. tata memorial library, indian institute of science. ananda t. byrappa (anandtb@iisc.ac.in) is librarian, j.r.d. tata memorial library, indian institute of science. raja visvanathan (raja@inflibnet.ac.in) is scientist c (computer science), inflibnet centre, gandhinagar, india. © 2021.

about the indian institute of science (iisc)
the indian institute of science (iisc, or "the institute") was established in 1909 by a visionary partnership between the industrialist jamsetji nusserwanji tata, the maharaja of mysore, and the government of india. over the 109 years since its establishment, iisc has become the premier institute for advanced scientific and technological research and education in india. since its inception, the institute has laid a balanced emphasis on the pursuit of fundamental knowledge in science and engineering, and the application of its research findings for industrial and social benefit. during 2017–18, the institute initiated the practice of undergoing international peer academic reviews over a 5-year cycle. each year, a small team of invited international experts reviews a set of departments. the experts spend 3 to 4 days at the institute. during this period, they interact closely with the faculty and students of these departments and tour the facilities, aiming to assess the academic work against international benchmarks. iisc has topped the ministry of human resource development (mhrd), government of india's nirf (national institutional ranking framework) rankings not only in the universities category but also overall among all ranked institutions. times higher education has placed iisc at the 8th position in its small university rankings (that is, among universities with fewer than 5,000 students), at the 13th position in its ranking of universities in the emerging economies, and in the range 91–100 in its world reputation rankings. in the qs world university rankings, iisc is ranked 170.
in the same ranking system, on the metric of citations per faculty, iisc is placed in second position. iisc publishes about 3,000 papers per year in scopus and web of science indexed journals and conferences and, each year, the institute awards around 400 phd degrees.

about the iisc library
jrd tata memorial library (https://www.library.iisc.ac.in), popularly known as the indian institute of science library, is one of the best science and technology libraries in india. started in 1911 as one of the first three departments in the institute, it has become a precious national resource center in the field of science and technology. the library receives an annual grant of 10–12% of the total budget of the institute. the library spends about 95% of its budget on periodical subscriptions, which is unparalleled in this part of the globe. with a collection of nearly 500,000 volumes of books, periodicals, technical reports, and standards, the jrd tata memorial library is one of the finest in the country. currently, it subscribes to over 13,000 current periodicals. the library also maintains the iisc's research publications repository, eprints@iisc (http://eprints.iisc.ac.in), and its theses and dissertations repository, etd@iisc (https://etd.iisc.ac.in).

off-campus access to licensed online resources
in a typical research library, licensed scholarly resources comprise research databases, electronic journals, e-books, standards, and more. a library licenses these resources through publishers/vendors. these license agreements limit access to the resources to the authorized users of an institute. in our case, authorized users include faculty members, enrolled students, current staff, contractual staff, and walk-in users to the library. seamless access to the licensed resources from on-campus sites is predominantly ip-address authenticated, which is a simple and efficient model for users physically located on the institute campus. these users expect a similar experience while accessing licensed online resources from off-campus locations. the challenge for libraries, therefore, is to ensure that such off-campus access is secure, seamless, and restricted to authorized users of an institute. libraries have been using various technologies, including proxy servers, vpn servers, and single sign-on, to facilitate seamless off-campus access to licensed resources. our institute has been using vpn technology to enable off-campus access to licensed online resources. a virtual private network (vpn) is a service offered by many organizations to their members to enable them to remotely connect to the organization's private network. a vpn extends a private network across a public network and allows users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network. applications running across a vpn may therefore benefit from the functionality, security, and management of the private network. encryption is a common, although not an inherent, part of a vpn connection.3 in our institute, faculty members and students are provided access to the vpn service when their institute email address is created.
users follow four steps to use a vpn client to get connected to the campus network:
• install vpn client software on their computer system. cisco anyconnect (https://www.cisco.com/c/en/us/products/collateral/security/anyconnect-secure-mobility-client/at-a-glance-c45-578609.html) is one such client.
• start the vpn client software every time there is a need to connect to the private network.
• enter the address of the institute's vpn server, and click connect in the anyconnect window.
• log in to the vpn server using their institutional email credentials.

an authorized user of the institute can use any of the ip-authenticated network services, including the licensed online resources, after a successful login to the vpn server. the vpn technology has been serving the purpose well, but the service is, by default, available only to the institute's faculty and students. other categories of employees, such as project assistants, project associates, research assistants, post-doctoral fellows, and others, who constitute a good percentage of iisc staff, are provided vpn access on a case-by-case basis. during the covid-19 lockdown, the library received several enquiries about accessing the online resources from off-campus sites. recognizing the importance of the situation, the library quickly assessed the various possibilities for facilitating seamless off-campus access to the subscribed online resources apart from the vpn-based access. federated access through a shibboleth identity provider (idp) service emerged as a possible solution for facilitating seamless off-campus access for the entire institute community.

federated access
federated access is a model for access control in which authentication and authorization are separated and handled by different parties. if a user wishes to access a resource controlled by a service provider (sp), the user logs in via an identity provider (idp). more complex forms of federated access involve the use of attributes (information about the user passed from the idp to the sp, which can be used to make access decisions) and can include extra services such as trust federations and discovery services (where the user selects which idp to use to connect to the sp).4 examples of this federated access model include shibboleth and openathens. shibboleth is open-source software that offers single sign-on infrastructure. openathens is a commercial product delivered as a cloud-based solution. it supports many of the same standards as shibboleth. so, an institution could pay to join the openathens federation, which would provide technical support to set up, integrate, and operationalize federated access using openathens. we decided to go with shibboleth for the following reasons:
• to avoid the recurring cost associated with the openathens solution.
• the existence of a shibboleth-based infed federation in the country. infed manages the trust between the participating institutions and publishers (http://infed.inflibnet.ac.in/).
• infed is part of the edugain inter-federation, which enables our users to gain access to the resources of federations of other countries.

what is shibboleth?
shibboleth is a standards-based, open-source software package for web single sign-on across or within organizational boundaries. it allows sites to make informed authorization decisions for individual access of protected online resources in a privacy-preserving manner. the shibboleth software implements widely used federated identity standards, principally the oasis security assertion markup language (saml), to provide a federated single sign-on and attribute exchange framework. a user authenticates with their organizational credentials, and the organization (or identity provider) passes the minimal identity information necessary to the service provider to enable an authorization decision. shibboleth also provides extended privacy functionality, allowing a user and their home site to control the attributes released to each application (https://www.shibboleth.net/index/). shibboleth has two major components: (1) an identity provider (idp), and (2) a service provider (sp). the idp supplies the required authorizations and attributes about the users to the service providers (for example, publishers). the service providers make use of the information about the users sent by the idp to decide whether to allow or deny access to their resources.

interaction between a shibboleth identity provider and service provider
when a user attempts to access licensed content on the service provider's platform, the service provider generates an authentication request and then directs the request and the user to the user's idp server. the idp prompts for the login credentials. in our setup, the idp server communicates the login credentials to the institute's active directory (ad) using the secure lightweight directory access protocol (ldap). ad is a directory service provided by microsoft. in a directory service, objects (such as a user, a group, a computer, a printer, or a shared folder) are arranged in a hierarchical manner, facilitating easy access to the objects. organizations primarily use ad to perform authentication and authorization. once the authenticity of a user is verified, ad helps in determining if a user is authorized to use a specific resource or service. access is granted to a user only if the user checks out on both counts. the ad authenticates a user, and the response is sent back to the idp along with the required attributes. the idp then releases only the required set of attributes to the service provider. based on the idp attributes, which represent a user's entitlements, the sp grants access to the resource. figure 1 illustrates the functioning of the two components of shibboleth.

figure 1. a shibboleth workflow involving a user, identity provider, and service provider.
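as an illustration of the credential check in figure 1, the sketch below shows how an idp-style bind against active directory can be performed with the third-party python ldap3 package. the host name, base dn, and attribute names are hypothetical; a production shibboleth idp performs this step internally through its ldap configuration rather than through custom code.

```python
# illustrative sketch of the credential check an idp performs against
# active directory over ldap, using the third-party ldap3 package.
# the server, base dn, and attribute names below are hypothetical.
from ldap3 import Server, Connection, SUBTREE

def authenticate(username: str, password: str):
    server = Server("ldaps://ad.example.edu")   # hypothetical AD host
    user_dn = f"{username}@example.edu"         # bind with userPrincipalName
    conn = Connection(server, user=user_dn, password=password)
    if not conn.bind():                         # wrong credentials: bind fails
        return None
    # fetch the attributes the idp would release to the service provider
    conn.search(
        search_base="dc=example,dc=edu",
        search_filter=f"(userPrincipalName={user_dn})",
        search_scope=SUBTREE,
        attributes=["displayName", "eduPersonEntitlement"],
    )
    entry = conn.entries[0] if conn.entries else None
    conn.unbind()
    return entry  # attributes that inform the authorization decision
```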
identity federation
the interaction between a service provider and identity provider happens based on mutual trust. the trust is established by providing idp metadata, such as encryption keys and the idp url, that the sp uses to send and request information from the idp. the exchange of metadata between idp and sp can be informal if an institution licenses online resources from only a few publishers. however, research libraries license content from hundreds of sps. therefore, the role of federations is significant. in the absence of a federation, each identity provider and service provider must individually communicate with each other about their existence and configuration, as illustrated in figure 2.

figure 2. individual communication between idps and sps.

a federation is merely a list of metadata entries aggregated from its member idps and sps. our institute is a member of infed (information and library network access management federation). infed was established as a centralized agency to coordinate with member institutions in the process of implementing user authentication and access control mechanisms across all member institutions. infed manages the trust relationship between the idps and sps (publishers) in india. therefore, individual idps that intend to facilitate access to subscribed online resources through shibboleth will share their metadata with infed. infed, in turn, will share the metadata of the idps with the respective service providers, as illustrated in figure 3. other regions have their own federations. for example, in the us, incommon (https://www.incommon.org/) serves as the federation, and in the uk, it is the uk access management federation (http://www.ukfederation.org.uk/).

figure 3. role of a federation as a trust manager between idps and sps.

how does one gain access to shibboleth-enabled resources?
a federation manages the trust between identity providers and service providers. the sps enable shibboleth-based access to subscribed resources for the idps based on the metadata shared by a federation. once the sps allow access, users can access such resources by using the institutional login option via the athens/shibboleth link found on the service provider's platform. alternatively, a library can create a simple html page listing all the shibboleth-enabled licensed resources, as shown in figure 4.

figure 4. partial screenshot of shibboleth-enabled resources of our institute.

each of the links in figure 4 is a wayfless url. a wayfless url is specific to an institution (idp), and it enables users of that institution to gain federated access to a service or resource in a way that bypasses the where are you from (wayf), or institutional login (discovery service), steps on the sp's platform. since the institutional login or discovery service step can be confusing to end users, wayfless links to the resources facilitate an improved end-user experience in accessing licensed resources. a user needs only to follow a link from the list of resources. the link will take the user to the sp. the sp will redirect the user to the idp server for authentication. after successful authentication, the user will gain access to the resource. there are two ways to get a wayfless url for a service: (1) the service provider can share the url, or (2) one can make use of a wayfless url generator service like wugen (https://wugen.ukfederation.org.uk/wugen/login.xhtml).
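a wayfless url of the kind shown in figure 4 can also be assembled programmatically. the sketch below follows the common shibboleth sp session-initiator pattern, in which the idp entityid and the target resource are passed as query parameters; the entity ids and urls are hypothetical, and some publishers expose differently named login endpoints.

```python
# sketch of building an sp-side wayfless url using the common shibboleth
# sp session-initiator pattern. entity ids and urls are hypothetical.
from urllib.parse import urlencode

IDP_ENTITY_ID = "https://idp.example.edu/idp/shibboleth"  # hypothetical idp

def wayfless_url(sp_login: str, resource: str) -> str:
    """append the idp entityID and the final resource as the target."""
    query = urlencode({"entityID": IDP_ENTITY_ID, "target": resource})
    return f"{sp_login}?{query}"

print(wayfless_url(
    "https://journals.example.com/Shibboleth.sso/Login",  # hypothetical sp
    "https://journals.example.com/issue/current",
))
```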
benefits of shibboleth-based access
shibboleth-based single sign-on can effectively address several requirements of libraries in ensuring secure and seamless on-campus and off-campus access to subscribed online resources. there are other benefits of shibboleth-based sso:
1. it is open-source software that provides single sign-on infrastructure.
2. it enables organizations to use their existing user authentication mechanism to facilitate seamless access to licensed online resources.
3. being a single sign-on system, it eliminates the need for end users to have individual credentials for each online resource.
4. it uses the security assertion markup language (saml) to securely transfer information about the process of authentication and authorization.
5. it is used by most of the publishers, who facilitate shibboleth-based access through shibboleth federations.
6. it requires a formal federation as a trusted interface between the institutions as identity providers (idps) and publishers as service providers (sps), thereby ensuring the use of uniform standards and protocols while transmitting attributes of authorized users to publishers. inflibnet's access management federation, infed, plays this role (https://parichay.inflibnet.ac.in/objectives.php).

idp server configuration
we installed the shibboleth idp software version 3.3.2 on a virtual machine on the azure platform. the vm is configured with two virtual cpus, 4 gb of ram, a 300 gib os disk (standard hdd), and ubuntu linux version 18.04.4 lts. coordination with the organization's network support team is essential. the network support team handles the domain name service resolution of the idp server, facilitates the idp server's communication with the organization's active directory, and opens non-standard communication ports on the idp server.

shibboleth idp usage statistics
the infed team has developed a beta version of a usage analysis tool called infedstat to analyse the use of federated access to licensed resources. we have implemented the tool on the idp server. figure 5 shows a redacted screenshot of the infedstat dashboard. it shows
• date-wise usage details of logged-in users along with ip address, time logged in, and the publishers' platforms accessed,
• the number of times the publishers' platforms were accessed during a specific period,
• the number of times users logged in during a specific period,
• unique users for a specific period, and
• unique publishers accessed during a specific period.

figure 5. idp usage dashboard.
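the inner workings of infedstat are not described here, but counts like those on the dashboard can in principle be derived from the idp's own audit log. the sketch below assumes a pipe-delimited idp-audit.log, with the sp entityid and username at assumed column positions; the field order is configurable per deployment, so the positions are illustrative only, and this is not the infedstat tool itself.

```python
# rough sketch of deriving usage counts from a pipe-delimited shibboleth
# idp audit log. the column positions are assumptions -- the field order
# is configurable per deployment -- and this is not the infedstat tool.
from collections import Counter

SP_COL, USER_COL = 3, 8  # assumed positions of sp entityID and username

logins_per_sp: Counter = Counter()
unique_users: set = set()

with open("idp-audit.log", encoding="utf-8") as log:
    for line in log:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= max(SP_COL, USER_COL):
            continue  # skip malformed or truncated lines
        logins_per_sp[fields[SP_COL]] += 1
        unique_users.add(fields[USER_COL])

print("unique users:", len(unique_users))
for sp, count in logins_per_sp.most_common(5):
    print(sp, count)
```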
conclusions
the implementation of federated access to subscribed online resources has ensured that all the authorized users of the institute can access almost all the licensed resources from wherever they are. the counter 5 usage analysis of subscribed resources for the period of january 2020 to october 2020 indicates that usage of online resources has increased by nearly 20 percent over the same period last year. the enhanced use could be partly because of the ease of accessing online resources facilitated by federated access. to assess the reasons for the enhanced usage of online resources, the library is planning to conduct a survey to understand how convenient and useful federated access to online resources has been, especially while off campus. federated access through single sign-on is useful not just for accessing licensed online resources. a typical research library offers various other services to its users, including an institutional repository service, learning management system, online catalogue, etc. the library intends to integrate such services with sso, thereby freeing the end users from service-specific credentials.

endnotes
1 thomas dowling, "we have outgrown ip authentication," journal of electronic resources librarianship 32, no. 1 (2020): 39–46, https://doi.org/10.1080/1941126x.2019.1709738.
2 john paschoud, "shibboleth and saml: at last, a viable global standard for resource access management," new review of information networking 10, no. 2 (2004): 147–60, https://doi.org/10.1080/13614570500053874.
3 andrew g. mason, ed., cisco secure virtual private network (cisco press, 2001): 7, https://www.ciscopress.com/store/cisco-secure-virtual-private-networks-9781587050336.
4 masha garibyan, simon mcleish, and john paschoud, "current access management technologies," in access and identity management for libraries: controlling access to online information (london, uk: facet publishing, 2014): 31–38.

applying gamification to the library orientation
a study of interactive user experience and engagement preferences
karen nourse reed and a. miller
information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12209

karen nourse reed (karen.reed@mtsu.edu) is associate professor, middle tennessee state university. a. miller (a.miller@mtsu.edu) is associate professor, middle tennessee state university. © 2020.

abstract
by providing an overview of library services as well as the building layout, the library orientation can help newcomers make optimal use of the library. the benefits of this outreach can be curtailed, however, by the significant staffing required to offer in-person tours. one academic library overcame this issue by turning to user experience research and gamification to provide an individualized online library orientation for four specific user groups: undergraduate students, graduate students, faculty, and community members. the library surveyed 167 users to investigate preferences regarding orientation format, as well as likelihood of future library use as a result of the gamified orientation format. results demonstrated a preference for the gamified experience among undergraduate students as compared to the other surveyed groups.
introduction
background
newcomers to the academic campus can be a bit overwhelmed by their unfamiliar environment: there are faces to learn, services and processes to navigate, and an unexplored landscape of academic buildings to traverse. whether one is an incoming student or a recently hired employee of the university, all need to become quickly oriented to their surroundings to ensure productivity. in the midst of this transition, the academic library may or may not be on the list of immediate inquiries; however, the library is an important place to start. newcomers would be wise to familiarize themselves with the building and its services so that they can make optimal use of its offerings. two studies found that students who used the library received better grades and had higher retention rates.1 another study regarding university employees revealed that untenured faculty made less use of the library than tenured faculty, a problem attributed to lack of familiarity with the library.2 researchers have also found that faculty will often express interest in different library services without realizing that these services are in fact available.3 it is safe to say that libraries cannot always rely on newcomers to discover the physical and electronic services on their own; they need to be shown these items in order to mitigate the risk of unawareness. in consideration of these issues, the walker library at middle tennessee state university (mtsu) recognized that more could be done to welcome its new arrivals to campus. the public university enrolls approximately 21,000 students, the majority of whom are undergraduates. however, with a carnegie classification of doctoral/professional and over one hundred graduate degree programs, there was a strong need for specialized research among the university's graduate students and faculty. other groups needed to use the library too: non-faculty employees on campus as well as community users who frequently used walker library for its specialized and general collections. the authors realized that when new members of these different groups arrived on campus, few opportunities were available for acclimation to the library's services or building layout. limited orientation experiences were conducted within library instruction classes, but these sessions primarily taught research skills and targeted freshman general-education classes as well as select upper-division and graduate classes. in short, it appeared that students, employees, and visitors to the university would largely have to discover the library's services on their own through a search on the library website or an exploration of the physical library. it was very likely that, in doing so, the newcomers might miss out on valuable services and information. as mtsu librarians, the authors felt strongly that library orientations were important for everyone at the university so that they might make optimal use of the library's offerings.
the authors based this opinion on their knowledge of the relevant scholarly literature as well as their own anecdotal experiences with students and faculty.4 the authors defined the library orientation differently from library instruction: in their view, an orientation should acquaint users with the services and physical spaces of the library, as compared to instruction that would teach users how to use the library's electronic resources such as databases. the desired new approach would structure orientations in response to the different needs of the library's users. for example, the authors found that undergraduates typically had distinct library interests compared to faculty. it was recognized that library orientations were time-consuming for everyone: library patrons at mtsu often did not want to take the time for a physical tour, nor did the library have the staffing to accommodate large-scale requests. the authors turned to the gamification trend, and specifically interactive storytelling, as a solution. interactive storytelling has previous applications in librarianship as a means of creating an immersive and self-guided user experience.5 however, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. to overcome this gap, the authors developed an online, interactive, game-like experience via storytelling software to orient four different groups of users to the library's services. these groups were undergraduate students, graduate students, faculty members (which included both faculty and staff at the university), and community members (i.e., visitors to the university or alumni); see figure 1 for an illustration of each group's game avatars. these groups were invited to participate in the gamified experience called libgo (short for library game orientation). after playing libgo, participants gave feedback through an online survey. this paper will give a brief explanation of the creation of the game, as well as describe the results of research conducted to understand the impact of the gamified experience across the four user groups.

figure 1. libgo players were allowed to self-select their user group upon entering the game. each of the four user groups was assigned an avatar and followed a logic path specified for that group.

literature review
traditional orientation
searches for literature on library orientation yield very broad and yet limited details about users of the traditional library orientation method.
it is important to note that the terms "library tour" and "library orientation" can be somewhat vague, because this terminology is not interchangeable, yet it is frequently treated as such in the literature.6 these terms are often included among library instruction materials, which predominately influence undergraduate students.7 kylie bailin, benjamin jahre, and sarah morris define orientation as "any attempt to reduce library anxiety by introducing students to what a college/university library is, what it contains, and where to find information while also showing how helpful librarians can be."8 their book is a culmination of case studies of academic library orientation in various forms worldwide, where the common theme across most chapters is the need to assess, revise, and change library orientation models as needed, especially in response to feedback, staff demands, and the evolving trend of libraries and technology.9 furthermore, the majority of these studies are undergraduate-focused, and often freshman-focused, while only a few studies are geared towards graduate students. other traditional orientation problems discussed in the literature include students lacking intrinsic motivation to attend library orientation, the library staff time required to execute the orientation, and lack of attendance.10 additionally, among librarians there seems to be consensus that traditional library tours are the least effective means of orientation, yet they are the most highly used, with attention predominately focused on the undergraduate population alone.11 in 1997, pixey anne mosely described the traditional guided library tour as ineffective, and documented the trend of libraries discontinuing it in favor of more active learning options.12 her study surveyed 44 students who took a redesigned library tour, all of whom were undergraduates (with freshmen as the target population). although mosely's study only addressed one group of library users, it does attempt to answer a question on library perception, whereby 93 percent of surveyed students indicated feeling more comfortable in using the library after the more active learning approach.13 a comparison study by marcus and beck looked at traditional versus treasure hunt orientations, and ultimately discovered that perception of the traditional method is limited by the selective user population and lack of effective measurements. they cited the need for continued study of alternative approaches to academic library orientation.14 a study by kenneth burhanna, tammy eschedor voelker, and julie gedeon looked at the traditional library tour from the physical and virtual perspectives. confronted with a lack of access to the physical library, these researchers at kent state university decided to add an online option for the required traditional freshman library tour.15 their study compared the efficacy of learning and affective outcomes between face-to-face library tours and online library tours. of the 3,610 students who took the required library tour assignment, 3,567 chose the online tour method and 63 opted or were required to take the in-person, librarian-led tour. surveys were later sent to a random list of 250 students who did not take the in-person tour and to the 63 students who did take the in-person tour.
of the 46 usable responses, all but one were undergraduates, and 39 (85 percent) of them were freshmen.16 this is a small sample size, with a ratio of slightly greater than 2:1 for online versus in-person tour participation. although results showed that an instructor's recommendation on format selection was the strongest influencing factor, convenience was also significant for those who selected the online option (81.5 percent). in contrast, only 18.5 percent of the students who took the face-to-face tour rated it as convenient. the authors found that regardless of tour type, students were more comfortable using the library (85 percent) and more likely to use library resources (80 percent) after having taken a library tour. interestingly, students who took the online tour seemed slightly more likely to visit the physical library than those who took the in-person tour. ultimately, the analysis of both tours showed this method of library orientation encourages library resource use, and the "online tour seems to perform as well, if not slightly better than the in-person tour."17

gamification use in libraries
an alternative format to the traditional method is gamification. gamification has become a familiar trend within academic libraries in recent years, and most often refers to the use of a technology-based game delivery within an instructional setting. some users find gamified library instruction to be more enjoyable than traditional methods. for these people, gamification can potentially increase student engagement as well as retention of information.18 the goal of gamification is to create a simplified reality with a defined user experience. kyle felker and eric phetteplace emphasized the importance of user interaction over "specific mechanics or technologies" in thinking about the gamification design process.19 proponents of gamification of library instructional content indicate that it connects to the broader mission of library discovery and exploration as exemplified through collaboration and the stimulation of learning.20 additional benefits of gamification are its teaching, outreach, and engagement functions.21 many researchers have documented specific applications of online gaming as a means of imparting library instruction. mary j. broussard and jessica urick oberlin described the work of librarians at lycoming college in developing an online game as one approach to teaching about plagiarism.22 melissa mallon offered summaries of nine games produced for higher education, several of which were specifically created for use by academic libraries.23 many of the online library games reviewed used flash, or required players to download the game before playing. by contrast, j. long detailed an initiative at miami university to integrate gamification into library instruction, a project which utilized twine.24 twine is an in-browser method and therefore avoids the problem of requiring users to download additional software prior to playing the game. other libraries have used online gamification specifically as a tool for library orientations.
although researchers have demonstrated that the library orientation is an important practice in establishing positive first impressions of the library and counteracting library anxiety among new users, the differences between in-person and online delivery formats are unclear.25 several successful instances have been documented in which the orientation was moved to an online game format. nancy o'hanlon, karen diaz, and fred roecker described a collaboration at ohio state university libraries between librarians and the office of first year experience; for this project, they created a game to orient all new students to the library prior to arrival on campus.26 the game was called "head hunt," and was cited among those games listed in the article by mallon.27 anna-lise smith and lesli baker reported on the "get a clue" game at utah valley university, which oriented new students over two semesters.28 another orientation game, developed at california state university-fresno, was noteworthy for its placement in the university's learning management system (lms).29 in reviewing the literature regarding online library gamification efforts, there appear to be several best practices. several studies cite initial student assessment to understand student knowledge and/or perceptions of the content, followed by an iterative design process with a team of librarians and computer programmers.30 felker and phetteplace reinforced the need for this iterative process of prototyping, testing, deployment, and assessment as one key to success; however, they also stated that the most prevalent reason for failure is that the games are not fun for users.31 librarians are information experts, and are not necessarily trained in fun game design. some libraries have solved this problem by partnering with or hiring professional designers; however, for many under-resourced libraries, this is not an option.32 taking advantage of open-source tools, as well as the documented trial-and-error practices of others, can be helpful to newcomers who wish to break into new library engagement methods utilizing gamification. as the literature has shown, a traditional library tour may have a place in the list of library services, but for whom and at what cost are questions with limited answers in studies done to date. gamification has offered an alternative perspective, but with narrow accounts of its success in the online storytelling format and for users outside of the heavily studied freshman group. across all literature of library orientation studies, there is little reference to other library user populations such as faculty, staff, community users, distance students, or students not formally part of a class that requires library orientation.

development of the library game orientation (libgo)
libgo was developed by the authors with not only a consideration for the walker library user experience, but also specific attention to the differing needs of the multiple user groups served by the library. this user-focused concern led to exploring creative methodologies such as user experience research and human-centered design thinking, a process of overlapping phases that produces a creative and meaningful solution in a non-linear way.
the three pillars of design information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 6 thinking are inspiration, ideation, and iteration.33 defining the problem and empathizing with the users (inspiration) led into the ideation phase, whereby the authors created lowand high-fidelity prototypes. the prototypes were tested and improved (iteration) through the use of beta testing in which playtesters interacted with the gamified orientation. the authors were novice developers of the gamified orientation, and this entailed a learning curve for not only the design thinking mindset but also the technical achievability. the development started with design thinking conversations and quickly turned to low-fidelity prototypes designed on paper. the development soon advanced to the actual coding so that the authors could get early designs tested before launching the final version. prior to deployment on the library’s website, libgo underwent a series of playtesting by library faculty, staff, and student employees. this testing was invaluable and led to such improvements as streamlining of processes and less ambiguity of text. libgo was developed with the twine open-source software (https://twinery.org), a product which is primarily used for telling interactive, non-linear stories with html. twine was an excellent application for this project as it allowed the creation of an online and interactive “choose your own adventure” styled library orientation game, in which users could explore the library based upon their selection of one of multiple available plot directions. with a modest learning curve and as an open source software, twine is highly accessible for those who are not accustomed to coding. for those who know html, css, javascript, variables, and conditional logic, twine’s capabilities can be extended. the library’s interactive orientation adventure requires users to select one of the four available personas: undergraduate student, graduate student, faculty, or community member. users subsequently follow that persona through a non-linear series of places, resources and points of interest built with the html output of using twee (twine’s programming language). see figure 2 for an example point of interest page and figure 3 for an example of a user’s final score after completing the gamified experience. once the twine story went through several iterations of design and testing, the html file was placed on the library’s website for the gamified orientation to be implemented with actual users. https://twinery.org/ information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 7 figure 2. this instructional page within libgo explains how to reserve different library spaces online. upon reading this content, the user will progress by clicking on one of the hypertext lines in blue font at the bottom. figure 3. based upon the displayed avatar, this libgo page is representative of a graduate student’s completion of libgo. the page indicates the player’s final score and gives additional options to return to the home page or complete the survey. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 8 purpose of study libgo utilized the common "choose your own adventure" format whereby players progress through a storyline based upon their selection of one of multiple available plot directions. 
purpose of study
libgo utilized the common "choose your own adventure" format, whereby players progress through a storyline based upon their selection of one of multiple available plot directions. although the literature suggests that other technology-based methods are an engaging and instructive mode of content delivery, little prior research exists regarding this specific approach to library outreach. furthermore, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. the researchers wanted to understand the potential of interactive storytelling as a means to educate a range of users on library services as well as make the library more approachable from a user perspective. the study was designed to understand the user experience of each of the four groups. the researchers hoped to discern which users, if any, found the gamified experience to be a helpful method of orientation to the library's physical and electronic services. another area of inquiry was to determine whether this might be an effective delivery method by which to target certain segments of the campus for outreach. finally, the study intended to determine whether this method of orientation might incline participants toward future use of the library.

methodology
overview
the authors selected an embedded mixed methods design approach in which quantitative and qualitative data were collected concurrently through the same assessment instrument.34 the survey instrument primarily collected quantitative data; however, a qualitative open-response question was embedded at the end of the survey: this question gathered additional data by which to answer the research questions. each data set (one quantitative and one qualitative) was analyzed separately for each participant group, and then the groups were compared to develop a richer understanding of participant behavior.

research questions
the data collection and subsequent analysis attempted to answer the following questions:
1. which group(s) of library users prefer to be oriented to library services and resources through the interactive storytelling format, as compared to other formats?
2. which group(s) of library users are more likely to use library services and resources after participating in the interactive storytelling format of orientation?
3. what are user impressions of libgo, and are there any differences in impression based on the characteristics of the unique user group?

participants
participants for the study were recruited in person and via the library website. in-person recruitment entailed the distribution of flyers and use of signage to recruit participants to play libgo in a library computer lab during a one-day event. online recruitment lasted approximately ten weeks and simply involved the placement of a link to libgo on the home page of the library's website. a total of 167 responses were gathered through both methods, and participants were distributed as shown in table 1.

table 1. composition of study's participants
group 1: undergraduate students, 55 responses
group 2: graduate students, 62 responses
group 3: faculty, 13 responses
group 4: staff, 28 responses
group 5: community members, 9 responses
total: 167 responses

for the purposes of statistical data analysis, groups 3 and 4 were combined to produce a single group of 41 university employee respondents; also, group 5's data was not included in the statistical analysis due to the low number of participants. qualitative data for all groups, however, was included in the non-statistical analysis.
survey instrument
a survey with twelve total questions was developed for this study and was administered online through qualtrics. after playing libgo, participants were asked to voluntarily complete the survey; if they agreed, they were redirected to the survey's website. before answering any survey questions, the instrument administered an informed consent statement to participants. all aspects of the research, including the survey instrument, were approved through the university's institutional review board (protocol number 18-1293). the first part of the survey (see appendix a) consisted of ten questions, each with a ten-point likert-scaled response. the first five questions were each designed to measure a preference construct, and the next five questions each measured a likelihood construct. the preference construct referred to the participant's preference for a library orientation: did they prefer libgo's online interactive storytelling format, or did they prefer another format such as in-person talks? the likelihood construct referred to the participant's self-perceived likelihood of more readily engaging with the library in the future (both in person and online) after playing libgo. the second part of the survey gathered the participant's self-reported affiliation (see table 1 for the list of possible group affiliations) as well as offered participants an open-ended response area for optional qualitative feedback.

data collection
the study's data was collected in two stages. in stage one, libgo was unveiled to library visitors during a special campus-wide week of student programming events. on the library's designated event day, the researchers held a drop-in event at one of the library's computer labs (see figure 4 for an example of event advertisement). library visitors were offered a prize bag and snacks if they agreed to play libgo and complete the survey. during the three-hour-long drop-in session, 58 individual responses were collected: the vast majority of these came from undergraduate students (51 responses), with additional responses from graduate students (n = 4), university staff employees (n = 2), and one community member. community members were defined as anyone not currently directly affiliated with the university; this group may have included prospective students or alumni. stage two began the day after the library drop-in event, and simply involved the placement of a link to libgo on the home page of the library's website. any visitor to the library's website could click on the advertisement to be taken to libgo. this link remained active on the library website for ten weeks, at which point the final data was gathered. a total of 167 responses were gathered during both stages, and participants were distributed as previously shown in table 1.

figure 4. example of student libgo event advertisement

results
quantitative findings
statistical analysis of each of the ten quantitative questions required the use of one-way anova in spss. a post hoc test (hochberg's gt2) was run in each instance to account for the different sample sizes. for all statistical analysis, only the data from undergraduates, graduate students, and university employees (a group which combined both faculty and staff results) were utilized. a listing of mean comparisons by group, for each of the ten survey questions, may be found in table 2.
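for readers who want to reproduce this kind of analysis outside spss, the sketch below runs a one-way anova with scipy and computes omega squared using the standard formula. the scores are invented placeholders, not the study's data, and hochberg's gt2 post hoc test, which has no direct scipy equivalent, is omitted.

```python
# rough python analogue of the per-question analysis: a one-way anova
# plus omega squared. the ratings below are invented placeholders, not
# the study's data; hochberg's gt2 post hoc test is omitted.
from scipy.stats import f_oneway

undergrad = [8, 9, 7, 10, 8, 9, 6, 8]   # placeholder 10-point ratings
graduate  = [7, 6, 8, 5, 7, 6, 9, 7]
employees = [7, 8, 6, 7, 5, 8, 7, 6]
groups = [undergrad, graduate, employees]

f_stat, p_value = f_oneway(*groups)

# omega squared: (ss_between - df_between * ms_within) / (ss_total + ms_within)
n_total = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n_total
ss_b = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_w = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
df_b, df_w = len(groups) - 1, n_total - len(groups)
ms_w = ss_w / df_w
omega_sq = (ss_b - df_b * ms_w) / (ss_b + ss_w + ms_w)

print(f"f({df_b},{df_w}) = {f_stat:.3f}, p = {p_value:.3f}, omega^2 = {omega_sq:.2f}")
```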
the analysis of the one-way anovas yielded statistically significant results for three of the ten individual questions in the first part of the survey: questions 2, 3, and 6 (see table 3).

table 2. descriptive statistics for survey results (10-point scale, with 10 as most likely; means for undergraduate students, graduate students, and university employees, respectively)
1. in considering the different ways to learn about walker library, do you find this library orientation game to be more or less preferable as compared to other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)? 7.02, 6.39, 6.02
2. in your opinion, was the library orientation game a useful way to get introduced to the library's services and resources? 8.13, 6.94, 7.12
3. if your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)? 7.38, 5.94, 5.98
4. please indicate your level of agreement with the following statement: "as compared to playing the game, i would have preferred to learn about the library's resources and services by my own exploration of the library website." 6.11, 6.50, 5.88
5. please indicate your level of agreement with the following statement: "as compared to playing the game, i would have preferred to learn about the library's resources and services through an in-person orientation tour." 6.11, 5.08, 5.76
6. after playing this orientation game, are you more or less likely to visit walker library in person? 8.27, 6.94, 6.90
7. after playing this library orientation game, are you more or less likely to use the walker library website to find out about the library (such as hours of operation, where to go to get different materials/services, etc.)? 7.82, 6.97, 7.20
8. after playing this library orientation game, are you more or less likely to seek help from a librarian at walker library? 6.95, 6.58, 6.63
9. after playing this library orientation game, are you more or less likely to use the library's online resources (such as databases, journals, e-books)? 7.67, 7.15, 6.90
10. after playing this library orientation game, are you more or less likely to attend a library workshop, training, or event? 6.96, 6.73, 6.24

table 3. overall statistically significant group differences
question 2: df = 2, f = 3.714, p = .027, ω² = .03
question 3: df = 2, f = 4.508, p = .012, ω² = .04
question 6: df = 2, f = 7.178, p = .001, ω² = .07

question 2 asked, "in your opinion, was the library orientation game a useful way to get introduced to the library's services and resources?" the one-way anova found that there was a statistically significant difference between groups (f(2,155) = 3.714, p = .027, ω² = .03). the post hoc comparison using the hochberg's gt2 test revealed that undergraduates were statistically significantly more likely to prefer libgo in this manner (m = 8.13, sd = 1.94, p = .031) as compared to the graduate students (m = 6.94, sd = 2.72). there was no statistically significant difference between undergraduates and the university employees (p = .145).
according to criteria suggested by roger kirk, the effect size of .03 indicates a small effect in perceived usefulness of libgo as an introduction among undergraduates.35 question 3 asked, "if your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?" the one-way anova found that there was a statistically significant difference between groups (f(2,155) = 4.508, p = .012, ω² = .04). the post hoc comparison using the hochberg's gt2 test found that undergraduates were statistically significantly more likely to prefer libgo over other orientation options (m = 7.38, sd = 2.49, p = .021) as compared to graduate students (m = 5.94, sd = 3.06). there was no statistically significant difference between undergraduates and university employees (p = .053). the effect size of .04 indicates a small effect regarding undergraduate preference for libgo versus other orientation options. question 6 asked, "after playing this library orientation game, are you more or less likely to visit walker library in person?" the one-way anova found that there was a statistically significant difference between groups (f(2,155) = 7.178, p = .001, ω² = .07). the post hoc comparison using the hochberg's gt2 test revealed that undergraduates were statistically significantly more likely to visit the library after playing libgo (m = 8.27, sd = 2.09, p = .003) as compared to graduate students (m = 6.94, sd = 2.20). additionally, the test found that undergraduates were statistically significantly more likely to visit the library after playing libgo (p = .007) as compared to university employees (m = 6.90, sd = 2.08). according to criteria suggested by kirk, the effect size of .07 indicates a medium effect regarding undergraduate potential to visit the library in person after playing libgo.36 in addition to testing each individual survey question, tests were run to understand possible group differences by construct (preference and likelihood). the preference construct was an aggregate of survey questions 1–5, and the likelihood construct was an aggregate of survey questions 6–10. for both constructs, the one-way anova found results which were not statistically significant. in all, the quantitative findings indicated three areas in which the experience of playing libgo was more helpful for the surveyed undergraduates than for the other surveyed groups (i.e., graduate students or university employees). at this point, the analysis turned to the qualitative data so as to better understand participant views of libgo.

qualitative findings
analysis of the qualitative results was limited to the data collected in the survey's final question. question 12 was an open-response area, and was intentionally prefaced with a vague prompt: "do you have any final thoughts for the library (suggestions, additions, modification, comments, criticisms, praise, etc.)?" of the 167 total survey responses, 67 individuals chose to answer this question. preliminary analysis showed that the feedback derived from this question covered a spectrum of topics, ranging from remarks on the libgo experience itself to broader concerns regarding other library services. open coding strategies were utilized to interpret the content of participant responses. under this methodology, the responses were evaluated for general themes and then coded and grouped
at this point, the analysis turned to the qualitative data so as to better understand participant views of libgo.

qualitative findings

analysis of the qualitative results was limited to the data collected in the survey's final question. question 12 was an open-response area and was intentionally prefaced with a vague prompt: "do you have any final thoughts for the library (suggestions, additions, modification, comments, criticisms, praise, etc.)?" of the 167 total survey responses, 67 individuals chose to answer this question. preliminary analysis showed that the feedback derived from this question covered a spectrum of topics, ranging from remarks on the libgo experience itself to broader concerns regarding other library services. open coding strategies were utilized to interpret the content of participant responses. under this methodology, the responses were evaluated for general themes and then coded and grouped under a constant comparative approach.37 nvivo 12 software was used to code all 67 participant responses. initial coding yielded eight open codes, but these were later consolidated into six final codes (see table 4). one code (libgo improvement tip) was rather nuanced and yielded five axial codes (see table 5). axial codes denoted secondary concerns that fell under a larger category of interest. although some participants gave longer feedback addressing multiple concerns, care was taken to assign each distinct concern to a specific code. because some comments addressed multiple concerns, the total number of concerns (n = 76) is greater than the total number of individuals responding to the prompt (n = 67).

table 4. distribution of qualitative codes by user group

code | undergraduate | graduate | faculty | staff | community member | total # concerns
positive feedback | 7 | 7 | 1 | 4 | 2 | 21
negative feedback | 1 | 2 | 0 | 3 | 0 | 6
in-person tour preference | 2 | 3 | 0 | 1 | 0 | 6
libgo improvement tip | 5 | 11 | 1 | 3 | 3 | 23
library services feedback | 2 | 4 | 3 | 0 | 0 | 9
library building feedback | 1 | 7 | 1 | 2 | 0 | 11
total | 18 | 34 | 6 | 13 | 5 | 76

discussion of qualitative themes

positive feedback (21 separate concerns). affirmative comments regarding libgo were primarily split between undergraduate and graduate students, with a small number of comments coming from the other groups. although all groups stated that the game was helpful, one undergraduate wrote "i wish i would've received this orientation at the very beginning of the year!" a graduate student declared "this was a creative way to engage students, and i think it should be included on the website for fun." both community members commented on the utility of libgo in providing an orientation without having to physically come to the library; for example, "interactive without having to actually attend the library in person which i liked." additionally, a community member pointed out the instructional capability of libgo, writing "i think i learned more from the game than walking around in the library."

negative feedback (6 separate concerns). unfavorable comments regarding libgo primarily challenged the orientation's characterization as a "game" in terms of its lack of fun. one graduate student wrote a comment representative of this concern, stating, "the game didn't really seem like a game at all." a particularly searing comment came from a university staff member who wrote, "calling this collection of web pages an 'interactive game' is a stretch, which is a generous way of stating it."

in-person tour preference (6 separate concerns). a small number of concerns indicated a preference for in-person orientations over online ones. one undergraduate cited the ability to ask questions during an in-person tour as an advantage of that delivery medium. a graduate student mentioned their desire for kinesthetic learning over an online approach, writing, "i prefer hands on exploration of the library."

libgo improvement tip (23 separate concerns). suggested improvements to libgo were the largest area of qualitative feedback and produced five axial themes (subthemes); see table 5 for a breakdown of the five axial themes by group.

table 5. libgo improvement tip axial codes by user group

axial code | undergraduate | graduate | faculty | staff | community member | total # concerns
design | 4 | 3 | 0 | 0 | 1 | 8
user experience | 1 | 2 | 1 | 0 | 1 | 5
tech issue | 0 | 1 | 0 | 1 | 0 | 2
content | 0 | 5 | 0 | 0 | 1 | 6
didn't understand purpose | 0 | 0 | 0 | 2 | 0 | 2
total | 5 | 11 | 1 | 3 | 3 | 23
1. design issues were the largest cited area of improvement, and the most commonly mentioned design problem was the inability of the user to go back to previously seen content. although this functionality did in fact exist, it was apparently not intuitive to users; design modifications in future iterations are therefore critical. other users made suggestions regarding the color scheme and the ability to magnify image sizes.
2. user experience was another area of feedback and primarily included suggestions on how to make libgo a more fun experience. one graduate student offered a role-playing game alternative. another graduate student expressed an interest in a game with side missions, in addition to the overall goals, where tokens could be earned for completed missions; the student justified these changes by stating "i feel that incorporating these types of idea will make the game more enjoyable." in suggesting similar improvements, one undergraduate stated that libgo "felt more like a quiz than a game."
3. technology issues primarily addressed two related problems: images not loading and broken links. images not loading could depend on many factors, including the user's browser settings, internet traffic (volume) delaying load time, or broken image links, among others. broken links could be the root issue, since the images used in libgo were taken from other areas of the library website. this method of gathering content exposed a design vulnerability of relying on existing image locations (controlled by non-libgo developers) rather than on images hosted exclusively for libgo (a sketch of one way to audit such hotlinked images appears after this list).
4. content issues were raised exclusively by graduate students. one student felt that libgo placed an emphasis on physical spaces in the library and did not give a deep enough treatment to library services. another graduate student asked for "an interactive map to click on so that we physically see the areas" of the library, thus making the interaction more user-friendly with a visual.
5. didn't understand purpose is a subtheme where improvement is needed and is based on two comments made by the two university staff members. one wrote that "an online tour would have been better and just as informative," although libgo was designed to be not only an online tour of the library but also an orientation to the library's services. the other staff member wrote, "i read the rules but it was still unclear what the objective was." in all, it is clear that libgo's purpose was confusing for some.
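for the hotlinking vulnerability described in item 3, a periodic link audit is one plausible mitigation. the sketch below is illustrative only and is not part of the study; the urls are placeholders, not actual libgo assets.

```python
# a minimal sketch (not part of the study) for auditing hotlinked image urls;
# the urls below are placeholders, not actual libgo assets.
import requests

image_urls = [
    "https://library.example.edu/images/floor-map.png",
    "https://library.example.edu/images/service-desk.jpg",
]

for url in image_urls:
    try:
        # a head request checks availability without downloading the image
        resp = requests.head(url, allow_redirects=True, timeout=10)
        status = "ok" if resp.status_code == 200 else f"broken ({resp.status_code})"
    except requests.RequestException as exc:
        status = f"unreachable ({exc.__class__.__name__})"
    print(f"{url}: {status}")
```

hosting copies of the images alongside libgo, as the authors imply, would remove the dependency on externally controlled image locations entirely.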
library services feedback (9 separate concerns). several participants took the opportunity to provide feedback on general library services rather than on libgo itself. undergraduates simply gave general positive feedback about the value of the library, but many graduate students gave recommendations regarding specific electronic resource improvements. additionally, one graduate student wrote, "i think it is critical to meet with new graduate students before they start their program," something the library used to do but had not pursued in recent years. although these comments did not directly pertain to libgo, the authors accepted all of them as valuable feedback to the library.

library building feedback (11 separate concerns). this was another theme in which graduate students dominated the comments. feedback ranged from requests for microwave access and additional study tables to better temperature control in the building. several participants asked for greater enforcement of quiet zones. as with the library services feedback, the authors took these comments as helpful to the overall library rather than to libgo.

discussion

the results of this study indicated that some groups of library visitors received the gamified library orientation experience better than other groups. undergraduate students showed the greatest appreciation for a library orientation via libgo. specifically, they demonstrated a statistically significant difference over the other groups in supporting libgo's usefulness as an orientation tool, a preference for libgo over other orientation formats, and a likelihood of future use of the physical library after playing libgo. these very encouraging results provide evidence for the efficacy of alternative means of library orientation. the qualitative results provided additional helpful insight regarding user impressions from each of the five surveyed groups. this feedback demonstrated that a variety of groups benefited from the experience of playing libgo, including some community members who appreciated libgo as a means of becoming acclimated to the library without having to enter the building. a virtual orientation format was not ideal for a few players who indicated a preference for a face-to-face orientation due to the ability to ask questions. many people identified areas of improvement for libgo. graduate students in particular offered a disproportionate number of suggestions as compared to the other groups. while they provided a great deal of helpful feedback, it is possible that graduate students were so distracted by the perceived problems that they could not fully take in the experience or gain value from libgo's orientation purpose. it is also very likely that libgo simply was not very fun for these players: several players noted that it did not feel like a game but rather a collection of content. the review of literature indicated that this amusement issue is a common pitfall of educational games. although the authors tried to design an enjoyable orientation experience, it is possible that more work is needed to satisfy user expectations. the mixed-methods design of this study was instrumental in providing a richer understanding of user perceptions. while the statistical analysis of participant survey responses was very helpful in identifying clear trends between groups, the qualitative analysis helped the authors draw valuable conclusions. specifically, the open-response data demonstrated that additional groups such as graduate students and community members appreciated the experience of playing libgo; this information was not readily apparent through the statistical analysis. additionally, the qualitative analysis demonstrated that many groups had concerns regarding areas of improvement that may have impaired their user experience. these important findings could help guide future directions of the research.
in all, the authors concluded this phase of the research feeling satisfied that libgo showed great promise for library orientation delivery but could benefit from continued development and future user assessment. although undergraduate students seemed most receptive overall to a virtual orientation experience, other groups appeared to have benefited from the resource as well.

study limitations

a primary limitation of this study was its small sample size. although the entire university campus was targeted for participation, the number of respondents was far too small to generalize the results. despite this limitation, however, the study's population reflected many different groups of library patrons on campus. the findings are therefore valuable as a means of stimulating future discussion regarding the value of alternative library orientation methods utilizing gamification. another limitation is that the authors did not pre-assess the targeted groups for their prior knowledge of walker library services and building layout, nor for their interest in learning about these topics. it is possible that various groups did not see the value in learning about the library for a variety of reasons. faculty members, in particular, may have considered their prior knowledge adequate for navigating the electronic holdings or building layout without recognizing the value of the many other services offered physically and electronically by the library. all groups may have experienced a level of "library anxiety" that prevented them from being motivated to learn more about the library.38 it is difficult to understand the range of covariate factors without a pre-assessment. finally, there was qualitative evidence supporting the limitation that libgo did not properly convey its stated purpose: orientation to the library rather than instruction in research skills. without understanding libgo's focus on library orientation, users could have been confused or disappointed by the experience. although care was taken to make this purpose explicit, some users indicated their confusion in the qualitative data. this observed problem points to a design flaw that undoubtedly had some bearing on the study's results.

conclusion & future research

convinced of the importance of the library orientation, the authors sought to move this traditional in-person experience to a virtual one. the quantitative results indicated that the gamified orientation experience was useful to undergraduate students in its intended purpose of acclimating users to the library, as well as in encouraging their future use of the physical library. at a time in which physical traffic to the library has shown a marked decline, new outreach strategies should be considered.39 the results were also helpful in showing that this particular iteration of the gamified orientation was preferred over other delivery methods by undergraduate students, as compared to other groups, to a statistically significant level. this is an important finding because it demonstrates that a diversified outreach strategy is necessary: different groups of library patrons desire their orientation information in different formats. the next logical question to ask, however, is: why did the other groups examined through the statistical data analysis (graduate students and faculty) not appreciate the gamified orientation to the same level as undergraduates?
the answers to this question are complicated and may be explained in part by the qualitative analysis. based upon those findings, it is possible that the game did not appeal to these groups on the basis of fun or enjoyment; this concern was specifically mentioned by graduate students. faculty and staff members provided less qualitative feedback; it is therefore difficult to speculate as to their exact reasons for disengagement with libgo. with this concern in mind, the authors would like to concentrate their next iteration of research on the specific library orientation needs of graduate students and faculty. both groups present different, but critical, needs for outreach. graduate students were the largest group of survey respondents, presumably indicating a high level of interest in learning more about the library. many graduate programs at mtsu are delivered partially or entirely online; as a result, these students may be less likely to come to campus. due to graduate students' relatively infrequent visits to campus, a virtual library orientation could be even more meaningful for them in meeting their need for library services information. faculty are another important group to target because, if they lack a full understanding of the library's offerings, they are unlikely to create assignments that fully utilize the library's services. although it is possible that faculty prefer an in-person orientation, many new faculty have indicated limited availability for such events. a virtual orientation seems conducive to busy schedules. however, it is possible that the issue is simply a matter of marketing: faculty may not know that a virtual option is available, nor do they necessarily understand all that the library has to offer. in all, future research should begin with a survey to understand what both groups already know about the library, as well as the library services they desire. another necessary step in future research would be the expansion of the development team to include computer programmers. although the authors feel that libgo holds great promise as a virtual orientation tool, more needs to be done to enhance the user's enjoyment of the experience. twine is user-friendly software that other librarians could pick up without having to be computer programmers; however, programmers (professional or student) could bring design expertise to the project. future iterations of this project should incorporate the skills of multiple groups, including expertise in libraries, user research, visual design, interaction design, programming, and marketing, as well as testers from each type of intended audience. collectively, this group will have the greatest impact on improving the user experience and ultimately the usefulness of a gamified orientation experience. this experience with gamification, and specifically interactive storytelling, was valuable for walker library. these results should encourage other libraries seeking an alternate delivery method for orientations. the authors hope to build upon the lessons learned from this mixed-methods research study of libgo to find the correct outreach medium for their range of library users.

acknowledgments

special thanks to our beta playtesters and student assistants who worked the libgo event, which was funded, in part, by mt engage and walker library at middle tennessee state university.
appendix a: survey instrument

[the survey instrument appears as page images in the original publication and is not reproduced here.]

endnotes

1 sandra calemme mccarthy, "at issue: exploring library usage by online learners with student success," community college enterprise 23, no. 2 (january 2017): 27–31; angie thorpe et al., "the impact of the academic library on student success: connecting the dots," portal: libraries and the academy 16, no. 2 (2016): 373–92, https://doi.org/10.1353/pla.2016.0027.
2 steven ovadia, "how does tenure status impact library usage: a study of laguardia community college," journal of academic librarianship 35, no. 4 (january 2009): 332–40, https://doi.org/10.1016/j.acalib.2009.04.022.
3 chris leeder and steven lonn, "faculty usage of library tools in a learning management system," college & research libraries 75, no. 5 (september 2014): 641–63, https://doi.org/10.5860/crl.75.5.641.
4 kyle felker and eric phetteplace, "gamification in libraries: the state of the art," reference and user services quarterly 54, no. 2 (2014): 19–23, https://doi.org/10.5860/rusq.54n2.19; nancy o'hanlon, karen diaz, and fred roecker, "a game-based multimedia approach to library orientation" (paper, 35th national loex library instruction conference, san diego, may 2007), https://commons.emich.edu/loexconf2007/19/; leila june rod-welch, "let's get oriented: getting intimate with the library, small group sessions for library orientation" (paper, association of college and research libraries conference, baltimore, march 2017), http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/letsgetoriented.pdf.
5 kelly czarnecki, "chapter 4: digital storytelling in different library settings," library technology reports, no. 7 (2009): 20–30; rebecca j. morris, "creating, viewing, and assessing: fluid roles of the student self in digital storytelling," school libraries worldwide, no. 2 (2013): 54–68.
6 sandra marcus and sheila beck, "a library adventure: comparing a treasure hunt with a traditional freshman orientation tour," college & research libraries 64, no. 1 (january 2003): 23–44, https://doi.org/10.5860/crl.64.1.23.
7 lori oling and michelle mach, "tour trends in academic arl libraries," college & research libraries 63, no. 1 (january 2002): 13–23, https://doi.org/10.5860/crl.63.1.13.
8 kylie bailin, benjamin jahre, and sarah morriss, "planning academic library orientations: case studies from around the world" (oxford, uk: chandos publishing, 2018): xvi.
9 bailin, jahre, and morriss, "planning academic library orientations."
10 marcus and beck, "a library adventure"; a. carolyn miller, "the round robin library tour," journal of academic librarianship 6, no. 4 (1980): 215–18; michael simmons, "evaluation of library tours," edrs, ed 331513 (1990): 1–24.
11 marcus and beck, "a library adventure"; oling and mach, "tour trends"; rod-welch, "let's get oriented."
12 pixey anne mosley, "assessing the comfort level impact and perceptual value of library tours," research strategies 15, no. 4 (1997): 261–70, https://doi.org/10.1016/s0734-3310(97)90013-6.
13 mosley, "assessing the comfort level impact and perceptual value of library tours."
14 marcus and beck, "a library adventure," 27.
15 kenneth j. burhanna, tammy j. eschedor voelker, and jule a. gedeon, "virtually the same: comparing the effectiveness of online versus in-person library tours," public services quarterly 4, no. 4 (2008): 317–38, https://doi.org/10.1080/15228950802461616.
16 burhanna, voelker, and gedeon, "virtually the same," 326.
17 burhanna, voelker, and gedeon, "virtually the same," 329.
18 felker and phetteplace, "gamification in libraries."
19 felker and phetteplace, "gamification in libraries," 20.
20 felker and phetteplace, "gamification in libraries."
21 felker and phetteplace, "gamification in libraries"; o'hanlon et al., "a game-based multimedia approach."
22 mary j. broussard and jessica urick oberlin, "using online games to fight plagiarism: a spoonful of sugar helps the medicine go down," indiana libraries 30, no. 1 (january 2011): 28–39.
23 melissa mallon, "gaming and gamification," public services quarterly 9, no. 3 (2013): 210–21, https://doi.org/10.1080/15228959.2013.815502.
24 j. long, "chapter 21: gaming library instruction: using interactive play to promote research as a process," distributed learning (january 1, 2017): 385–401, https://doi.org/10.1016/b978-0-08-100598-9.00021-0.
25 rod-welch, "let's get oriented."
26 o'hanlon et al., "a game-based multimedia approach."
27 mallon, "gaming and gamification."
28 anna-lise smith and lesli baker, "getting a clue: creating student detectives and dragon slayers in your library," reference services review 39, no. 4 (november 2011): 628–42, https://doi.org/10.1108/00907321111186659.
29 monica fusich et al., "hml-iq: fresno state's online library orientation game," college & research libraries news 72, no. 11 (december 2011): 626–30, https://doi.org/10.5860/crln.72.11.8667.
30 broussard and oberlin, "using online games"; fusich et al., "hml-iq"; o'hanlon et al., "a game-based multimedia approach."
31 felker and phetteplace, "gamification in libraries."
32 felker and phetteplace, "gamification in libraries"; fusich et al., "hml-iq."
33 "design thinking for libraries: a toolkit for patron-centered design," ideo (2015), http://designthinkingforlibraries.com.
34 john w. creswell and vicki l. plano clark, designing and conducting mixed methods research (thousand oaks, ca: sage publications, 2007).
35 roger kirk, "practical significance: a concept whose time has come," educational and psychological measurement, no. 5 (1996).
36 kirk, "practical significance."
37 sandra mathison, encyclopedia of evaluation (thousand oaks, ca: sage, 2005), https://doi.org/10.4135/9781412950558.
38 rod-welch, "let's get oriented."
39 felker and phetteplace, "gamification in libraries."

letter to the editor

ann kucera

information technology and libraries | june 2018

https://doi.org/10.6017/ital.v37i2.10407

dear editorial board,

regarding "halfway home: user centered design and library websites" in the march 2018 issue of information technology and libraries (ital), i thought there were some interesting points. i think, however, that your assertion that user-centered design automatically eliminates anything from a website that your main user group did not expressly ask for is faulty. when someone brings up the fact that user-centered design is not statistically significant, i interpret that as a misunderstanding of what user-centered design is. our academic library websites are not research projects, so why would we gather statistically significant information about them? our academic library websites are (or should be) helpful to students and faculty and constantly changing to meet their needs. if librarians perpetuate a misunderstanding of user-centered design, my fear is that this misunderstanding could lead to stagnation and a refusal to change our technology and user interfaces in a rapidly changing environment, doing our patrons and ourselves a disservice. user-centered design is a set of tools to help us gather information about users and their needs. the information gathered informs the design but does not dictate it, and it needs to be part of an iterative process. the web design team at your institution demonstrated user-centered design when they added floor maps back into the website after a group of users pointed out that their absence was causing problems for the main users at your institution.
while valuable experience from librarians and other staff is critical to take into account, it is sometimes difficult to determine which pieces of the puzzle provide comfort to those who work at the library and which pieces assist students in their studies. i applaud your willingness to "clear the slate" and reduce the amount of information you were maintaining on your website. i'm guessing you may have removed dozens of links from your website, yet you only mentioned adding one category of information back into the design. i would say your user-centered design process is working quite well.

ann kucera
systems librarian
central michigan university

navigation design and library terminology: findings from a user-centered usability study on a library website

communication

isabel vargas ochoa

information technology and libraries | december 2020

https://doi.org/10.6017/ital.v39i4.12123

isabel vargas ochoa (ivargas2@csustan.edu) is web services librarian, california state university, stanislaus. © 2020.

abstract

the university library at california state university, stanislaus is not only undergoing a library building renovation, but a website redesign as well. the library conducted a user-centered usability study to collect data in order to best lead the library website "renovation." a prototype was created to assess an audience-based navigation design, homepage content framework, and heading terminology. the usability study consisted of 38 student participants. it was determined that a topic-based navigation design will be implemented instead of an audience-based navigation, a search-all search box will be integrated, and the headings and menu links will be modified to avoid ambiguous library terminology. further research on different navigation and content designs, and on usability design approaches, will be explored in future studies.

introduction

the university library at california state university, stanislaus is currently undergoing a much anticipated and necessary redesign of the library website. website redesigns are a crucial part of website maintenance, keeping sites current with modern technology and accessibility standards. "if librarians are expected to be excellent communicators at the reference desk and in the classroom, then the library website should complement the work of a librarian."1 in this case, a library website prototype was created, using a springshare llc product, libguides cms, as the testing subject for our user-centered usability study. the usability study was completed with 38 student participants belonging to different academic years and areas of study. the library website prototype tested was designed using a user-based design framework and an audience-based navigation. this study found issues reported by users related to the navigation design and to ambiguous library terminology. an audience-based navigation was chosen in order to organize and group the information and services offered in a way that makes them most accessible to users. however, an audience-based navigation will directly affect users and their search behaviors.2 the prototype, like the current library website, did not have a search-all search box during the study. a catalog search box was utilized to test whether or not the catalog was enough for student participants to find information. this also forced the participants to utilize the menu navigation.
literature review

the design and approach of usability studies, preferences for types of search boxes, navigation design, and library terminology evolve over time in parallel with technology changes. most recent usability studies use screen and audio recording tools as opposed to written observation notes. participants in recent studies are also more accustomed to navigating websites, as opposed to participants in usability studies twenty years ago. regardless, it is crucial to compare the results from previous usability studies to analyze differences and similarities. different types of usability studies include user-centered usability studies and heuristic usability studies. this study chose a user-centered approach because of the library's desire to collect data and feedback from student users. the way in which the usability study is presented to participants also shapes its results. website usability studies are meant to test the website, although participants may unconsciously believe they are being tested. in tidal's library website case study (2012), researchers assured the participants that "the web site was being tested and not the participants themselves."3 this unconscious belief may also affect the data collected from the participants and "influence user behavior, including number of times students might attempt to find a resource or complete a given task."4

the features tested were the navigation design and homepage elements. the navigation design in the prototype was developed to test an audience-based navigation design (see figure 1). an audience-based navigation design organizes the navigation content by audience type.5 that is to say, the user will begin their search by identifying themselves first. although this design can organize content in a more efficient manner, especially for organizations that have specific, known audiences, critics argue that this design forces users to identify themselves before searching for information, thus taking them out of their task mindset.6 for this usability study, i wanted to test this navigation design and compare the results to our current navigation design, which is a topic-based navigation design. a topic-based navigation design presents topics as navigation content.7 this design is our current library website navigation design (see figure 2). the contrast between the two schemes is sketched after the figure captions below.

figure 1. screenshot of the audience-based navigation design developed for the library website prototype.

figure 2. screenshot of the current content-based navigation design in the library website.
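to make the distinction concrete, here is a small illustrative sketch (with invented labels; these are not the library's actual menu items) of how the same links might be grouped under each scheme.

```python
# hypothetical sketches of the two navigation schemes discussed above;
# all labels are invented for illustration and are not the library's menus.
audience_based = {
    "students": ["find articles", "request a book", "study spaces"],
    "faculty": ["course reserves", "instruction requests"],
    "community": ["visitor access", "library hours"],
}

topic_based = {
    "find materials": ["find articles", "request a book", "course reserves"],
    "services": ["instruction requests", "study spaces"],
    "about": ["library hours", "visitor access"],
}

# the audience-based scheme requires a self-identification step first;
# the topic-based scheme lets users start directly from their task.
```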
designing the navigation and homepage also means choosing accessible terms that are relevant to all users. unfortunately, over the course of many decades, library terminology has been a hindrance for student users. terms such as "catalog," "reference," and "research guides" are still difficult for users to understand. as conrad states (2019), "students are not predisposed to think of a 'research guide' as a useful tool to help them get started."8 a research guide isn't necessarily a self-explanatory term; in many ways, the phrase is ambiguous. augustine's case study in 2002 had similar difficulties. students' "lack of understanding and awareness of library resources impacted their ability more than the organization of the site did."9 it's unsettling to know that our own terminology has been deterring users from accessing library resources for decades. librarians use library terminology to such an extent that it's part of our everyday language, but what is common knowledge to us may be completely alien to our very own audience.

not only should libraries be aware of confusing library terms, but content should also not overwhelm the user with an abundance of information. most students who visit the library are looking for something specific and easy to find. it's important for librarians to condense the information on guides or website pages so as not to frustrate users or make them search elsewhere, like google. "students scan . . . rather than [read] material."10 this has also been noted in our crazy egg statistics: heatmaps of our website's pages show that users are not scrolling to the bottom of the pages. this also applies to the use of large images, or unnecessarily flashy or colorful content that covers most of the desktop or mobile screen. these images should be reduced in size so that users can find information swiftly. for this reason, any large design element on the homepage should also be duplicated in menu links, in case large flashy content is ignored.11

the search box is another fundamental element i analyzed. in this case study, our search box was the catalog search box for ex libris primo. if a page, particularly the homepage, has two search boxes (search-all and catalog search), the user can be confused. search boxes are primarily placed at the center of the page. depending on how these search boxes are labeled and identified, users may not know which one to use. students approach library search boxes as if searching google.12 in our case, neither the current website nor the prototype has a general search-all box. we have a catalog search box placed at the top center of the homepage for both sites. if we were to add a general search-all box, it would be placed away from the catalog search box and preferably in the header, where it is visible on all pages.

methodology

the usability study was conducted by the author, the web services librarian at california state university, stanislaus, who also worked with a computer science instructor in order to recruit participants. not only is the university library redesigning its website, but the university library building is also undergoing a physical renovation. due to this project, the library has relocated to the library annex, a collection of modular buildings providing library services to the campus community. the usability study was conducted in a quiet study room in one of these modular sites. i reserved this study area and borrowed eight laptops for the sessions. the usability study employed two different methods to get students to participate. the first offered an extra credit incentive, arranged through collaboration with the computer science instructor. this instructor was teaching a course on human-centered design for websites; she offered her students extra credit, since several of her learning objectives centered on website design and usability studies. the second approach was an informal one.
this informal approach involved scouting students who were already at the library annex during the scheduled usability study sessions. this enabled students to participate without having to sign up or remember to participate. the students were recruited in person during the usability sessions and through flyers posted in study rooms on the days of the study. an incentive of snacks for students to take home was also included. i created questions and seven tasks to be handed out to the participants during the study. the tasks were created to test the navigation design of the main menu and the content on the homepage. i also added a task to test the research skills of the student. after these tasks, students were asked to rate the ease of access, answer questions about their experience navigating the prototype, and provide feedback. all students were given the same tasks; however, if the student was taking the human-centered design course, they were also given specific web design questions for feedback (see appendices a and b). the tasks were piloted before the study with three library student workers, who provided feedback on how to better word the tasks for students. the following are the final seven tasks used for the usability study:

1. find research help on citing legal documents (a california statute) in apa style citation.
2. find the library hours during spring break.
3. find information on the library study spaces hours and location.
4. you're a student at the stan state campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form.
5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form.
6. for your history class, you need to find information on the university's history in the university archives and special collections. find information on the university archives and special collections.
7. find any article on salmon migration in portland, oregon. you need to print it, email it to yourself, and you also need the article cited.

the usability study sessions took place from 11 am to 2 pm on february 10, 12, and 14, 2020. these days and times were chosen because the snack incentive would attract students during the lunch hour and i wanted to accommodate the start and end times of the human-centered design course on mondays, wednesdays, and fridays. the total time it took for students to complete the seven tasks averaged 15 minutes. in total, there were 38 student participants. the students' experiences were recorded anonymously. i asked students to provide their academic year and major. students ranged from freshman (5), sophomore (2), junior (12), senior (17), graduate (1), and unknown (1). areas of study included computer science (16), criminal justice (2), business (2), psychology (3), communications (1), sociology (1), english (3), nursing (1), spanish (1), biology (3), geology (1), history (2), math (1), gender studies (1), and undeclared (1). the subject tested was the library website prototype created and executed using a springshare llc product, libguides cms. the tools i used were eight laptops and a screen recording tool, snagit. snagit is a recording tool made accessible through a campus subscription. the laptops were borrowed from the library for the duration of the sessions.
during the sessions, students navigated and completed the tasks on their own with no direct interference, including no direct observation. i aimed to create a space where my presence didn't directly influence or intimidate their experience with the website. my findings were based solely on their written responses and screen recordings. i explained to the students that their screen-recorded videos would not be linked to their identities, even though they had to sign in to the laptops using their campus student ids. i did, however, occasionally walk around the tables in the room in case a student was navigating the current website or using a separate site to complete the tasks. once the students completed the tasks and answered the questions, i collected the handouts and the screen-capture videos by copying them to a flash drive.

limitations

during the usability study sessions, there were two technical issues that hindered the initial process. on the first day, there were difficulties accessing the campus wi-fi in the room as well as difficulties accessing the snagit video recording application. this limitation affected some of the students' experiences and feedback. these issues were resolved and were not present on the second and third days of the study.

results and observations

the results and observations collected from this study mirror results from the studies conducted by azadbakht and swanson.13 i found that students searched the catalog search box for library collections, citations, and other library terms they didn't understand, even though it was a catalog search box with the keywords "find articles, books, and other materials" labeled in the search bar. another finding was that the navigation design can detrimentally affect a user's experience with the website; mixed reviews were received on the audience-based navigation design. the study also found that students are adept at finding research materials. for example, most students knew how to search, find, print, email, and cite an article. students in general are also familiar with book requests, ill accounts, and filling out book request webforms. this indicates that, in terms of utilizing library services, students are well aware of how to find, request, and acquire resources using the website on their own. what was most difficult for students was interpreting library terminology. this was explicitly shown in their attempts to complete tasks 1 and 6: finding how to cite a legal document in apa style and finding information on special collections and the university archives. the following results and observations are divided into three categories: written responses, video recording observations, and data collected. data was collected based on observations from the video recordings and the written responses, and was then input into eight separate charts.

written responses observations

comments from both the human-centered design students and the other student participants included mixed reviews of the navigation layout, an overall positive outlook on the page layout design, suggestions to add a search-all "search bar," and frustrations with tasks 1 and 6.

video recording observations

the ex libris primo search box was constantly mistaken for a search-all search box. this occurred during students' searches for tasks 1 and 6: citation help and university archives, respectively.
students also used the research guides search box in libguides as a search-all search box. students found the citation style guides easily because of this feature; however, on the proposed new website it was difficult to find citation help. students were also using research guides to complete other tasks, such as task 6. a search bar for the entire website was continuously mentioned as a solution by student participants.

tasks 2 and 3, regarding library hours and study spaces, were easily completed. tasks 4 and 5 were also easily accessible. after completing task 4 (book request form), it was easier for participants to complete task 5 (thesis submission form), because both tasks required students to search the top main navigation menu. to complete task 4, several students immediately signed in to their ill account or logged in to primo for csu+, which was expected, as signing in to these accounts is an alternate way to request a book. an additional observation regarding task 4 is that confusion around the library term "call number" was resolved by adding an image reference pointing to the call number in the catalog. the call number image reference was opened several times for assistance. most students completed task 7 (find a research article), but not all students used the catalog search box on the homepage to complete it. several students searched the top main navigation and clicked on the "research help" link. others utilized research guides and the research guides search box on the homepage. one notable pattern emerged among computer science students: most were quicker to give up on a task than non-computer science students, and some did not scroll down when browsing pages. these students failed to complete several tasks because they left pages in less than ten seconds without scrolling down.

data collected

figure 3. ease of navigation (overall).

figure 3 illustrates the overall ease of navigation ratings from all student participants. students were asked to rate the ease of access of the website (see appendices a and b). other than the keywords "ease of navigation (1 difficult; 10 easy)," students were given the freedom to define what "easy" and "difficult" meant to them individually. the mean ease of access rating for all student participants was 7.7. the lowest rating was 3 and the highest was 10.

figure 4. ease of navigation (computer science major).

figure 4 illustrates the ease of access ratings based on whether or not the student was a computer science major. the lowest ease of access ratings were from computer science majors; overall, non-computer science majors gave higher ratings than computer science majors.

figure 5. ease of navigation (human-centered design).

figure 5 illustrates the ease of access ratings based on whether the student was taking the human-centered design course. the human-centered design students' learning outcomes include website user-interface design and an assignment on how to create a usability study.
similar to the patterns found in figure 4, human-centered design students had lower ease of access ratings.

figure 6. tasks – status of completion.

figure 6 illustrates whether each task was completed or not. completion was determined by analyzing whether the student found the page(s) that provided the solution to the task; a task was marked as not completed if the student was unable to find those pages. "not applicable" was recorded if the student did not use the website prototype (e.g., followed a link that led elsewhere or opted to use google search instead). most students completed tasks 2, 3, 4, 5, and 7. the task most often left incomplete was task 1, which 64 percent of student participants did not complete. task 6 had a middling completion rate of 63 percent. 86 percent of students completed tasks 2 and 4, and 90 percent of students completed tasks 3, 5, and 7. it is evident that task 1 was a difficult task to complete, regardless of the student's area of study. task 1 required students to find apa legal citation help; the terms "apa legal citation" confused users. likewise, for task 6 (special collections), students did not understand what "collections" referred to or where to search for them.

figure 7. tasks – number of clicks (complete).

figure 7 illustrates how many clicks students needed to complete each task. the clicks were separated into three categories: 1-2 clicks, 3-5 clicks, and more than 6 clicks. this figure only illustrates data collected from tasks that were completed. the count of clicks began at the website prototype's homepage or at the main menu navigation in the prototype's header, when it was evident that the student was starting a new task. tasks 2 and 3 were completed in 1-2 clicks, whereas tasks 1, 4, 5, 6, and 7 required an average of 3-5 clicks. based on experience helping students find articles at the librarians' research help desk, task 7 (find research articles) was expected to require 6+ clicks. task 1 may show a pattern of needing a high number of clicks because it was generally a difficult task to complete.

figure 8. tasks – number of clicks (did not complete).

figure 8 illustrates how many clicks a student participant made before deciding to skip the task or believing they had completed it. this figure only illustrates data from tasks that were not completed, using the same three click categories and the same starting points for counting clicks. tasks 1 and 6 show the clearest patterns in this figure: task 1 (citation help) was generally skipped after more than 6 clicks, and task 6 (special collections) was generally skipped after 3-6+ clicks.
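the bucketing used for figures 7 through 10 is simple enough to express directly. the sketch below is illustrative only, not the author's code; it assumes per-task click counts and durations extracted from the screen recordings, and the handling of the boundary values (e.g., treating anything over 5 clicks as "6+") is an assumption.

```python
# a minimal sketch, not the author's code: bucketing click counts and task
# durations into the categories used for figures 7-10. boundary handling
# (anything over 5 clicks counted as "6+") is an assumption.
def click_bucket(clicks: int) -> str:
    if clicks <= 2:
        return "1-2 clicks"
    if clicks <= 5:
        return "3-5 clicks"
    return "6+ clicks"

def duration_bucket(minutes: float) -> str:
    if minutes <= 1:
        return "0-1 minutes"
    if minutes <= 3:
        return "1-3 minutes"
    return "3+ minutes"

# example: tally one task's completed attempts into click buckets
attempts = [2, 4, 4, 7, 3, 1]  # hypothetical click counts for one task
tally = {}
for clicks in attempts:
    bucket = click_bucket(clicks)
    tally[bucket] = tally.get(bucket, 0) + 1
print(tally)  # e.g., {'1-2 clicks': 2, '3-5 clicks': 3, '6+ clicks': 1}
```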
figure 9 illustrates the duration needed to complete each task. the duration was separated into three categories: 0-1 minutes, 1-3 minutes, or more than 3 minutes. this figure only illustrates data for tasks that were completed. timing began when the student started a new task, which was determined by observing that the student started to use the main menu navigation or directed their screen back to the website prototype's homepage. there are parallels between the number of clicks and the duration of tasks. for tasks 2, 3, and 5, the duration to complete the task was less than 1 minute. task 5 was similar to task 4 (both are forms, linked once on the website), but the duration for task 5 may have averaged lower because task 5 came after task 4; having already completed a form may have influenced the students' behavior when searching for forms. tasks 1, 6, and 7 averaged 1-3 minutes to complete.

figure 9. tasks – question duration (complete).

figure 10. tasks – question duration (did not complete).

figure 10 illustrates the duration of each task that wasn't completed, using the same three categories and the same method of determining when a task began. similar to the observations for figure 7, there are parallels between the number of clicks and the duration of tasks. for task 1, the time before students skipped the task varied, though most students who didn't complete it gave up after more than 3 minutes. for task 6, the average duration before skipping the task was 1-3 minutes.

conclusion and recommendations

this study was primarily designed to test the user-centered study approach and the navigational redesign of the library website. the results, however, provided the library with a variety of outcomes. based on suggestions and comments on the website prototype's navigation design, menus, and page content, several elements will be integrated to help lead the redesign of the library's website. students found that the navigation design of the website was clear and simple, but also that it required some "getting used to." because of this, and in light of the navigation design literature, it is recommended to implement a topic-based navigation menu rather than an audience-based navigation. our findings also highlighted the effects of the use of library terms; to make menu links exceptionally user-friendly, it is recommended to utilize clear and common terminology. student participants also voiced that a search-all search box for the website was necessary. this will enable users to access information efficiently. library website developers should also map more than one link to a specific page, especially if the only link to the page is on an image or slideshow. the user-centered usability approach for this case study worked well in collaboration with campus faculty and as an informal recruitment method. it provided relevant and much-needed data and feedback for the university library. in terms of future usability studies, a heuristic approach may be effective.
a heuristic study approach would enable moderators to gather feedback and analysis from library web development experts.14 moreover, the usability study could be conducted over a semester-long period and include focus groups to acquire consistent feedback.15 overall, website usability studies are evolving and require constant improvement and research.

appendix a

major: ___________ year (freshman, sophomore, etc.): ______________

link to site: url
please do not use url

please complete the following situations. for some of these, you don't need to actually submit/send, but pretend as if you are.

1. find research help on citing legal documents (a california statute) in apa style citation.
2. find the library hours during spring break.
3. find information on the library study spaces hours and location.
4. you're a student at the stan state campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form.
5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form.
6. for your history class, you need to find information on the university's history in the university archives and special collections. find information on the university archives and special collections.
7. find any article on salmon migration in portland, oregon. you need to print it, e-mail it to yourself, and you also need the article cited.

complete the following questions.

1. rate the ease of access of the website (1 = really difficult to navigate, 10 = easy to navigate)
1 2 3 4 5 6 7 8 9 10
2. did you ever feel frustrated or confused? if so, during what question?
3. do you think the website provides enough information to answer the above questions? why or why not?

appendix b

cs 3500

major: ___________ year (freshman, sophomore, etc.): ______________

link to site: url
please do not use url

please complete the following situations. for some of these, you don't need to actually submit/send, but pretend as if you are.

1. find research help on citing legal documents (a california statute) in apa style citation.
2. find the library hours during spring break.
3. find information on the library study spaces hours and location.
4. you're a student at the stan state campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form.
5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form.
6. for your history class, you need to find information on the university's history in the university archives and special collections. find information on the university archives and special collections.
7. find any article on salmon migration in portland, oregon. you need to print it, e-mail it to yourself, and you also need the article cited.

then, complete the following questions.

1. rate the ease of access of the website (1 = really difficult to navigate, 10 = easy to navigate)
1 2 3 4 5 6 7 8 9 10
2. what did you think of the overall web design?
3. what would you change about the design? please be specific.
4. what did you like about the design? please be specific.
endnotes

1 mark aaron polger, "student preferences in library website vocabulary," library philosophy and practice, no. 1 (june 2011): 81, https://digitalcommons.unl.edu/libphilprac/618/.
2 jakob nielsen, "is navigation useful?," nn/g nielsen norman group, https://www.nngroup.com/articles/is-navigation-useful/.
3 junior tidal, "creating a user-centered library homepage: a case study," oclc systems & services: international digital library perspectives 28, no. 2 (may 2012): 95, https://doi.org/10.1108/10650751211236631.
4 suzanna conrad and christy stevens, "'am i on the library website?': a libguides usability study," information technology and libraries 38, no. 3 (september 2019): 73, https://doi.org/10.6017/ital.v38i3.10977.
5 eric rogers, "designing a web-based desktop that's easy to navigate," computers in libraries 20, no. 4 (april 2000): 36, proquest.
6 katie sherwin, "audience-based navigation: 5 reasons to avoid it," nn/g nielsen norman group, https://www.nngroup.com/articles/audience-based-navigation/.
7 rogers, "designing a web-based desktop that's easy to navigate," 36.
8 conrad, "'am i on the library website?': a libguides usability study," 71.
9 susan augustine and courtney greene, "discovering how students search a library web site: a usability case study," college & research libraries 63, no. 4 (july 2002): 358, https://doi.org/10.5860/crl.63.4.354.
10 conrad, "'am i on the library website?': a libguides usability study," 70.
11 kate a. pittsley and sara memmott, "improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides," information technology and libraries 31, no. 3 (september 2012): 54, https://doi.org/10.6017/ital.v31i3.1880.
12 elena azadbakht, john blair, and lisa jones, "everyone's invited: a website usability study involving multiple library stakeholders," information technology and libraries 36, no. 4 (december 2017): 43, https://doi.org/10.6017/ital.v36i4.9959.
13 azadbakht, "everyone's invited," 43; troy a. swanson and jeremy green, "why we are not google: lessons from a library web site usability study," the journal of academic librarianship 37, no. 3 (february 2011): 226, https://doi.org/10.1016/j.acalib.2011.02.014.
14 laura manzari and jeremiah trinidad-christensen, "user-centered design of a web site for library and information science students: heuristic evaluation and usability testing," information technology and libraries 25, no. 3 (september 2006): 164, https://doi.org/10.6017/ital.v25i3.3348.
15 tidal, "creating a user-centered library homepage: a case study," 97.
from dreamweaver to drupal: a university library website case study

jesi buell and mark sandford

information technology and libraries | june 2018

jesi buell (jbuell@colgate.edu) is instructional design and web librarian and mark sandford (msandford@colgate.edu) is systems librarian at colgate university, hamilton, new york.

abstract

in 2016, colgate university libraries began converting their static html website to the drupal platform. this article outlines the process librarians used to complete this project using only in-house resources and minimal funding. for libraries and similar institutions considering the move to a content management system, this case study can provide a starting point and highlight important issues.

introduction

the literature available on website design and usability is predominantly focused on business or marketing websites. what separates library websites from other informational or commercial websites is the complexity of their information architecture: they contain both intricate informational and transactional functions. website managers need to maintain congruity between many interrelated but disparate tools in a single interface and navigational system. libraries are also often challenged with finding individuals who possess the appropriate skills to build and maintain a secure, accessible, attractive, and easy-to-use website. in contrast to libraries, commercial companies employ teams of designers, developers, content managers, and specialists to triage internal and external issues. they can also spend months or years perfecting a website, and, of course, all these factors have great costs associated with them. given that many commercial websites need a team of highly skilled workers with copious time and funding, how can librarians be expected to give their patrons experiences similar to sites like google? this case study will outline how a small team of librarians completely overhauled their fragmented, dreamweaver-based website, moving to a more secure, organized, and appealing open-source platform with drupal within a tight timeline and at very little cost. a timeline of major milestones is included in the appendix.

goals and objectives

the first necessity for restructuring the colgate university libraries' website was building a team that had the skills and knowledge necessary to perform this task. the website overhaul was spearheaded by jesi buell, instructional design and web librarian, and mark sandford, systems librarian. buell has a user experience (ux) design and editing background, while sandford has systems, cataloging, and server experience. they were advised by web development committee (wdc) members cindy li, associate director of library technology and digital initiatives, and debbie krahmer, digital learning and media librarian.
together, the group understood trends in digital librarianship, the needs of the libraries' patrons, and website and catalog design and maintenance. the first thing the wdc did was outline its goals and objectives, and this documented the weaknesses the group wanted to address with a new website. the wdc identified four main improvements colgate libraries needed to make to the website:

improve design

colgate libraries' old website suffered from varied design and language use across pages and various tools (libguides, catalog, etc.). this led to an inconsistent and often frustrating user experience and detracted from the user's sense of a single, cohesive website. the wdc also wanted to improve and update the aesthetic quality of the website. while many of these changes could have been made with an overhaul of the existing site, the wdc would still have needed to address the underlying cause: responsibility for content was decentralized, and content creation relied too heavily on technical expertise with dreamweaver. further, the ad hoc nature of the content—the product of years of "fitting in" content without a holistic approach—meant that changes to visual style could not be accomplished by changing a single css file. there were far too many exceptions to make changes simply.

improve usability

the wdc needed to make sure all the webpages were responsive and accessible. a restructuring of layout and information architecture (ia) was also necessary to improve findability of resources. on the old site, some content was hidden behind several layers of links. with no platform to ensure or enforce accessibility standards, website managers had to trust that all content creators were conscious of best practices or, failing that, pages had to be re-edited to improve accessibility.

improve content creation and governance

a common source of library staff frustration was the authoring experience in dreamweaver. there was no way to track when a webpage was changed or see who had made those changes. situations occurred where content was deleted or changed in error, and no one knew until a patron discovered the mistake. staff could also mistakenly push out outdated versions of pages. it was not an ideal situation, and it was impossible for an individual (the web librarian) to monitor hundreds of pieces of content for daily changes to check for accuracy. the only other option would have been to narrow access to only those on the wdc, but that would mean everyone had to wait for the web librarian to push content live, which would also be frustrating. beyond the security and workflow issues, many of the library staff felt uncomfortable adding or editing content because dreamweaver requires some coding knowledge (html, css, javascript). therefore, the group wanted to install a content management system (cms) that provided a wysiwyg (what you see is what you get) content editor so that no coding knowledge would be needed.

unite disparate sites (website, blog, and database list) under one updated url on a single secure server

colgate libraries' website functionality suffered from what marshall breeding describes as "a fragmented user experience."1 the library website's main address was http://exlibris.colgate.edu.
however, different tools lived under other urls—one for a blog, another for the database list, and yet another for the mobile site librarians had to maintain because the main website was not responsive. additionally, some portions of the website had been set up on other servers because of various limitations in the windows .net environment and in-house skills. this was further complicated by the fact that most specialized interactivity or visual components had to be created from scratch by existing staff. the libraries' blog was on an externally hosted wordpress site, and the database a–z list was on a custom-coded php page. a unified domain would make usage statistics easier to track and analyze. additionally, it would eliminate the need for multiple credentials for the various external sites. custom code, be it in php, .net, or any other language, also needs to be regularly updated as new security vulnerabilities arise.2 moving to a well-maintained cms would help alleviate that burden.

by establishing goals and objectives, the wdc had identified that it wanted a cms to help with better governance, easier maintenance, and ways to disperse web maintenance responsibilities across library faculty. it was important to choose a cms platform that offered a wysiwyg editor so that content authoring did not require coding knowledge. additionally, the group wanted to update the site's aesthetic and navigational designs. the wdc also decided that this was the optimal time to introduce a discovery layer (since all these changes would arrive as one entirely new experience for colgate users) rather than make smaller, continual changes that would require users to keep readjusting how they used the website. the backend complexity of updating the website platform while implementing a discovery layer required abundant and detailed planning. however, while there was a lot of overlap in the preparatory work for the discovery layer and the cms, this article focuses primarily on the cms.

planning

after the wdc had detailed goals and objectives, and the proposal to update the libraries' website platform was accepted by library faculty, the group took several steps to plan the implementation. the first steps in planning dealt with analysis.

content analysis

the web librarian conducted a content analysis of the existing website. using microsoft excel to document the pages and the omni group's omnigraffle to organize the spreadsheet into a diagram, she cataloged each page and the navigation that connected that page to other pages. this can be extremely laborious but was necessary because some content had been inherited from past employees over the course of a decade, and no one knew exactly what content was live on the website. this visual representation allowed content creators to see redundancy in both content and navigation. it also made it easy for them to identify old content and to combine or reorder pages. (for a site of this size, part of the inventory can also be bootstrapped with a short script, as sketched below.)
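the inventory described in the article was compiled by hand; as a possible aid, a small crawl can pre-populate such a spreadsheet. a minimal sketch in python, assuming a reachable, mostly static html site (the starting url and output file name are placeholders, not details from the project):

# minimal same-site link inventory: one (page, link target, anchor text) row per link
# assumptions: the site is publicly crawlable and mostly static html; url is a placeholder
import csv
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START = "https://library.example.edu/"      # placeholder starting point
HOST = urlparse(START).netloc

seen, queue, rows = set(), [START], []
while queue:
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    try:
        resp = requests.get(url, timeout=10)
    except requests.RequestException:
        continue                             # skip unreachable pages
    if "html" not in resp.headers.get("Content-Type", ""):
        continue                             # ignore pdfs, images, etc.
    for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
        target = urljoin(url, a["href"]).split("#")[0]
        rows.append((url, target, a.get_text(strip=True)))
        if urlparse(target).netloc == HOST:  # follow internal links only
            queue.append(target)

with open("content_inventory.csv", "w", newline="") as f:
    csv.writer(f).writerows([("page", "links_to", "anchor_text")] + rows)

the resulting csv pairs each page with its outgoing links and anchor text, which is exactly the raw material needed to spot redundant navigation before diagramming it.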
needs analysis

the wdc wanted to make sure it considered more than the content creators' needs. the group surveyed colgate faculty, staff, and students to learn what they would like to see improved or changed. the web librarian conducted several ux studies with both students and faculty, and this elucidated several key areas in need of improvement.

peer analysis

peer analysis involves thoroughly investigating peer institutions' websites to analyze how they organize both their content and their site navigation. it also gives insight into what other services and tools they provide. it is important to choose institutions similar in size and academic focus. colgate university is a small liberal arts institution that serves only an undergraduate population, so the libraries would not seek to emulate a large university that serves graduate populations or distance learners. peer analysis is an excellent opportunity to see where a website falls short of other websites, as well as to borrow ideas from peers to customize for your specific patrons.

evaluating platforms

now that the group knew what the libraries had and what they wanted from the web presence, it was time to evaluate the available options. this involved evaluating cms products and discovery layer platforms. the wdc researched different cmss and listed positives and negatives. ultimately, the group determined that drupal best satisfied the majority of colgate's identified needs. a separate committee was formed to evaluate the major discovery-layer services, with the understanding that any option could be integrated into the main website as a search box.

budgeting

as free, open-source software, drupal does not require a subscription or licensing fee. campus it provided a virtual server for the website at no cost to the libraries. budgeting was organized by the associate director of library technology and digital initiatives and the university librarian. money was set aside in case a consultant or developer was needed, but the web and systems librarians were able to execute the conversion from dreamweaver to drupal without external support. if future development support is needed for specific projects, it can be budgeted for and purchased as needed. the last step was creating a timeline defining achievable goals, ownership (who oversees completing the goal and who needs to be involved with the work), and dates of completion.

timeline

the timeline was outlined as follows:

october 2015–january 2016

halfway through the fall 2015 semester, the wdc began to create a proposal for changes to be made to the website. this proposal would be submitted to the university librarian for consideration by december 1. in the meantime, the web librarian completed a content inventory, peer analysis, and ux studies. she also gathered faculty and staff feedback on the current website through suggestion-box commentary, one-on-one interviews, online questionnaires, and anecdotal stories. by the deadline for the proposal, this additional information was condensed and presented to the university librarian. after incorporating suggested changes made by the university librarian, the wdc was able to present both the proposal and the results from the various studies to the library faculty on january 4, 2016. at the end of the meeting, the faculty voted to move forward and adopt the proposed changes.

february 2016

february was spent meeting with stakeholders, both internal and external to the libraries, to gather concerns, necessary content, and ideas for improvements. the wdc members shared the responsibility of running these meetings.
all members of the following departments were interviewed: research and instruction, borrowing services, acquisitions, library administration, cataloging, government documents, information literacy, special collections and university archives, and the science library. the wdc also met with members from it and communications. it was vital that these sessions identify several components. first, what content was important to retain on the new site, and why? the act of justification made stakeholders evaluate whether the information was necessary and useful to the libraries' users. the wdc also asked the stakeholders to identify changes they wanted to see made to the website. the answers ranged from minor aesthetic tweaks to major navigational overhauls. last, it was important to understand how specific changes might impact workflows and functionality for tools outside colgate libraries' own website. for example, the wdc had to update information with the communications department so that the libraries' website would be findable in the university's app. all the answers the wdc received were compiled into a report, and the web librarian used this information to inform design decisions moving forward.

march 2016

while the associate director of library technology and digital initiatives coordinated demos from discovery layer vendors, the wdc also met to choose the final template from three options designed by the web librarian. the web and systems librarians also met to create a list of developers in case assistance was needed in the development of the drupal site. the wdc team researched potential developers and inquired about their pricing. the web librarian began to create wireframe templates of the different types of pages and page components (homepage, hours blocks, blogs, forms, etc.). she also began transferring existing content from the old website to the new website. this process, in addition to the development of new content identified by stakeholders, was to be completed by mid-summer. meanwhile, the systems librarian began to consolidate the external sites under drupal to the extent possible. while libguides lives externally to drupal and maintains its own url that the libraries' website links out to, he was able to bring the database a–z list, blog, and analytics into the drupal platform. this entailed setting up new content types in drupal to accommodate the various functional requirements for the a–z list and to assist in creating pages to search for and display database information.

april–may 2016

drupal allows for various models of permissions and authentication. by default, accounts can be created within the drupal system, with roles and permissions assigned to individuals as needed. the ldap (lightweight directory access protocol) module allows authentication to be tied to university accounts and includes the ability to tie drupal permissions to active directory roles and groups. connecting drupal to the university ldap server required the assistance of it infrastructure staff but was straightforward. it staff provided the connection information for the drupal module's configuration and created a resource account for the drupal module to use to connect to the ldap service. as currently implemented, the ldap module simply verifies credentials and, if a local drupal account does not exist, creates one for the user.
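drupal's ldap module handles this flow internally; purely as an illustration of the verify-then-provision logic just described, here is a sketch using the python ldap3 package, with a hypothetical directory host and dn layout standing in for the real configuration:

# illustration of the verify-then-provision flow (not drupal's actual code)
from ldap3 import Server, Connection

LDAP_URL = "ldaps://ldap.example.edu"              # hypothetical campus directory
USER_DN = "uid={},ou=people,dc=example,dc=edu"     # hypothetical dn layout

local_accounts = {}   # stands in for drupal's local account store

def login(username, password):
    # 1. verify the credentials by binding to the directory as the user
    conn = Connection(Server(LDAP_URL), user=USER_DN.format(username), password=password)
    if not conn.bind():
        return None                                 # directory rejected the credentials
    conn.unbind()
    # 2. on the first successful login, create a local account with no permissions yet
    if username not in local_accounts:
        local_accounts[username] = {"roles": []}    # roles are granted later, at onboarding
    return local_accounts[username]

the key design point mirrors the article's description: the directory only answers "are these credentials valid?", while accounts and permissions live locally and are granted after the fact.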
permissions for staff are added to accounts after account creation, as needed, as part of the onboarding process. permissions in drupal can be highly granular. since one of the goals of the migration to drupal was to simplify maintenance of the website, the wdc decided to begin with a relatively simple, permissive approach. currently, all library staff can edit any page. because of drupal's ability to track and revert changes easily, undoing a problematic edit is a simple procedure, and because all changes are tied to an individual login, problems can be addressed through training as needed. the wdc discussed a more fragmented approach that tied editing privileges to specific parts of the site but decided against it. the wdc team felt it was better to begin with a presumption of trustworthiness, expecting staff to make changes only to pages they were personally responsible for. additionally, trying to divide the site into logical pieces, and then accounting for the inevitable exceptions, would be complicated and time-consuming. the wdc reserved the right to begin restricting permissions in the future, but thus far this has proven unnecessary.

july–august 2016

as the libraries ramped up to the official launch, it was crucial to educate the library faculty and staff so they could become independent back-end content creators. both the web and systems librarians held multiple training sessions for library employees so that everyone felt comfortable both editing and generating content. at this point, the associate director of library technology and digital initiatives drafted a campus-wide email announcing the new website and discovery layer. it was sent out a month in advance of the official launch.

the new website launched in two parts. the soft launch occurred on august 1, 2016. the web and systems librarians set up a link to the new website on the old site so that users could choose between getting acclimated to the new website or using the tool they were used to in the frantic weeks leading up to the beginning of the semester. august 15, 2016, was the official launch. at this point, the http://exlibris.colgate.edu dreamweaver-based website was retired, and all traffic heading to the old url was redirected to the new drupal-based website at http://cul.colgate.edu. because drupal's url structure and information architecture differed from the old website's, the wdc decided that mapping every page on the old site to the new one would be too time-consuming. while it was acknowledged that this might cause some disruption (as it would break existing links), it seemed necessary for keeping the project moving forward. library staff updated all the external links they could, and the google search operator "inurl" allowed us to identify other sites outside the libraries' control that pointed to the old website; the wdc reached out to the maintainers of those few sites as appropriate. the biggest risk the libraries took by not redirecting all urls to the correct content was the potential to disrupt faculty who had bookmarked content or had direct urls in course materials. however, the wdc team received very few complaints about the new site, and most users agreed that the improvements far outweighed any temporary inconveniences. if nothing else, the simplified architecture made finding content easier, so direct links and bookmarks became far less important than they once were.
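one quick way to confirm that a retired address really forwards traffic is to request it without following redirects. a sketch using python's requests library; the urls are the ones named in the article, and the expected status code is an assumption about how the server-side redirect was configured:

# confirm that the retired address forwards to the new drupal site
import requests

r = requests.get("http://exlibris.colgate.edu/", allow_redirects=False, timeout=10)
print(r.status_code)                 # a permanent redirect would be 301
print(r.headers.get("Location"))     # expected to point at http://cul.colgate.edu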
implementation and future steps

by strictly following the timeline and working closely together, the web librarian and systems librarian were able to launch colgate libraries' new website in time for the 2016 fall semester. the wdc team was able to pull off this feat within eight months without spending any extra money. the timeline above gives only a high-level view of the steps the wdc took to accomplish this task. the librarians who worked on this project cannot overemphasize the complexity of this endeavor, especially with a small team. however, a website conversion is feasible with organization, time, and the online support the drupal community provides (especially the community of libraries on the drupal platform). it is also critical to have in-house personnel with technical (coding and server-side) knowledge, project management knowledge, and information architecture and design knowledge.

the response from incoming and returning students and faculty to the updated look and improved usability of the libraries' digital content was overwhelmingly positive. following best design practices, in january 2017 more ux testing was conducted with student and teaching faculty participants to gauge their reactions to the new website.3 users overwhelmingly found the new website to be both more aesthetically pleasing and more usable than the old one. on the back end, the libraries' content is now more secure, responsive, and accessible because the libraries are using a cms. library faculty and staff have been able to add or remove content they are responsible for, while the website maintains a consistent look and feel across all pages. governance has improved dramatically, as library staff can easily and quickly contribute to the website's content without administrative delays. moving forward, the wdc plans to investigate advanced drupal tools, implement an intranet, and better leverage google analytics. as with all library endeavors, improvement requires continued effort and attention.

appendix: detailed timeline

1. october 2015
a. began discussion with wdc to create proposal for website changes (web librarian)
2. november–december 2015
a. complete content inventory (web librarian)
b. complete peer analysis (web librarian)
c. complete ux studies (web librarian)
d. gather faculty and staff feedback on current website (web librarian)
3. december 1, 2015
a. submit proposal to change from dreamweaver to drupal to university librarian for consideration and approval (web librarian)
4. january 4, 2016
a. submit revised proposal to library faculty for consideration and approval (web librarian)
5. january 2016
a. set up test drupal site (systems librarian)
6. february 2016
a. complete meetings with departments to gather feedback on concerns, content, and ideas for improvements (library department meetings were split among wdc members)
7. march 2016
a. demo primo, ex libris, and summon for library faculty and staff consideration (associate director of library technology and digital initiatives)
b. from three options, choose template for our website (web librarian—approval by the wdc and then the library faculty)
c. create list of developers in case we need assistance (web librarian and systems librarian)
d. create wireframe templates for homepage (web librarian)
e. begin transferring content from old website to new website and create new content with other stakeholders—to be completed by mid-summer (web librarian)
f. begin consolidating multifarious external sites under drupal as much as possible (systems librarian)
8. april 2016
a. get drupal working with ldap (systems librarian)
b. agree on permissions and roles for back-end users (systems librarian—with approval by wdc)
c. agree on discovery layer choice (associate director of library technology and digital initiatives)
d. meet with outside stakeholders—communications, it, administration
9. may 2016
a. integrate discovery layer search (systems librarian)
10. july 2016
a. provide training for library faculty and staff as back-end content creators (web librarian)
b. prepare campus-wide email to announce new website and discovery layer with our new url (associate director of library technology and digital initiatives and web librarian)
11. august 1, 2016
a. set up a link on our old site (http://exlibris.colgate.edu) so that for two weeks users could choose between using the old interface or getting acclimated to the new website before the fall semester started (systems librarian)
12. august 15, 2016
a. official launch—retire the http://exlibris.colgate.edu dreamweaver-based website and redirect all traffic headed to the old url to the new drupal-based website at http://cul.colgate.edu (systems librarian)
13. september–october 2016
a. update and get approval from library faculty for a new web style guide and governance guide (web librarian)
14. january 2017
a. conduct ux studies of students and faculty to see how people are using both the new website and the new discovery layer; gather feedback and ideas for improvement (web librarian)

bibliography

breeding, marshall. "smarter libraries through technology: strategies for creating a unified web presence." smart libraries newsletter 36, no. 11 (november 2016): 1–2. general onefile (accessed august 3, 2017). http://go.galegroup.com/ps/i.do?p=itof&sw=w&v=2.1&it=r&id=gale%7ca471553487.
naudi, t. "nearly all websites have serious security vulnerabilities—new research shows." database and network journal 45, no. 4 (2015): 25. general onefile (accessed august 3, 2017). http://bi.galegroup.com/essentials/article/gale%7ca427422281.
raward, r. "academic library website design principles: development of a checklist." australian academic & research libraries 32, no. 2 (2001): 123–36. http://dx.doi.org/10.1080/00048623.2001.10755151.

1 marshall breeding, "smarter libraries through technology: strategies for creating a unified web presence," smart libraries newsletter 36, no. 11 (november 2016): 1–2, general onefile.
2 tamara naudi, "nearly all websites have serious security vulnerabilities—new research shows," database and network journal 45, no. 4 (2015): 25, general onefile.
3 roslyn raward, "academic library website design principles: development of a checklist," australian academic & research libraries 32, no. 2 (2001): 123–36.
harnessing the power of orcam

public libraries leading the way

mary howard

information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12637

mary howard (mhoward@sccl.lib.mi.us) is reference librarian, library for assistive media and talking books (lamtb), at the st. clair county library, port huron, michigan. © 2020.

library for assistive media and talking books (lamtb) services are located at the main branch of the st. clair county library system. lamtb provides resources and technologies for residents of all ages who have visual, physical, and/or reading limitations that prevent them from using traditional print materials. operating out of port huron, michigan, we encounter many instances where we need to provide assistance above and beyond what a basic library may offer. we host talking book services, which provide free players, cassettes, braille titles, and downloads to users who are vision- or mobility-impaired. we also have a large, stationary kurzweil reading machine that converts print to speech, video-enhanced magnifiers, and large-print books, and we provide home delivery service for patrons who are unable to travel to branches.

the library had been searching for a more technology-forward focus for our patrons. the state's talking books center in lansing set up an educational meeting at the library of michigan in 2018 to see a live demonstration of the orcam myeye reader. this was the innovation we were seeking, and i was thoroughly impressed with the compact and powerful design of the reader, the ease of use, and the stunningly accurate feedback provided by this ai reading-assistive device. users are able to read with minimal setup and total control.

orcam readers are lightweight, easily maneuverable assistive technology devices for users who are blind, visually impaired, or have a reading disability, including children, adults, and the elderly. the device automatically reads any printed text: newspapers, money, books, menus, labels on consumer products, text on screens or smartphones, etc. the orcam reader will repeat back any text immediately and is fit for all ages and abilities. orcam works with english, spanish, and french and can identify money and other business and household items. it can be attached to either the left or right temple of the user's glasses using a magnetic docking device, which places it near the ear, and users can easily adjust the volume and speed of the read text. letting a diverse group of users with different needs use the reader as they like is one of its more impressive offerings. changing most settings is normally just a matter of a finger swipe on the orcam device.
the mission of orcam is to develop a "portable, wearable visual system for blind and visually impaired persons, via the use of artificial computer intelligence and augmented reality." by offering these devices to our sight-, mobility-, or otherwise-impaired patrons, we open up the world of literacy, discovery, and education. some of our users are not able to read in any other fashion, and the orcam provides a much-needed boost to their learning profile.

we secured a grant from the institute of museum and library services (imls) for the purchase of the readers (cfda 45.310). we also worked with orcam to get lower pricing for these units: they normally retail for $3,500, but we were able to move this to the lower price point of $3,000. we were also awarded a $22,106 improving access to information grant from the library of michigan to fund the entire purchase; without this funding stream we would not have been able to secure the orcam. however, if you have veterans in your service area, please contact the company, since va health coverage is available for low-vision or legally blind veterans, who may qualify to receive an orcam device fully paid for by the va. please visit https://orcam.com/en/veterans for more information.

figure 1. close-up of the orcam device.

the grant was initially set to run from september 2019 to september 2020. we purchased six orcam readers for our library users, and they were planned to be rotated among our twelve branches throughout this grant cycle. however, due to the pandemic and out of safety concerns for staff and visitors, our library was closed from march 23 to june 15, and we were only able to offer the reader to the public at six branches. as of july 14, 2020, we are projecting that we may open to the public in september, but covid-19 issues could halt that. we have made arrangements with the grantor to extend the usage period for the orcam from september to december. this will make up for some of the lost time and open a path for the other six branches to have their turn offering the orcam to their patrons.

the interesting aspect of this is that we now have to take our technology profile even further by offering remote training to prospective orcam users. thankfully, the design and rugged housing of the reader make it easy to clean and maintain, but social distancing can prove intrusive for training: to set up a user, you need to be within a foot or two of them in order to get them used to how the orcam reads, and there is a lot of directing involved and close contact between user and instructor. we will work around this by providing distanced instruction, combining in-person and remote training. orcam also has a vast array of instructional videos that we will have cued up for users. we have had over 150 residents attend presentations, demonstrations, and talks on the orcam. i anticipate that this number will not be achieved for the second round; however, we may be more successful in our online presence, since we can add the instruction to our youtube page, offer segments on facebook and other social media, and provide film clips for our webpage. the situation has been difficult, but it has prompted lamtb services to think about how we should be working to provide better and more remote service to our users.
since we cover over 800 square miles in the county, becoming more adaptable in serving our patrons has become a paramount area of work for the library. the orcam will bring a new mode of remote training to our patrons, which will raise awareness of the reader and of how it can benefit users. the st. clair county library system would like to thank the institute of museum and library services for supporting this program. the views, findings, conclusions, or recommendations expressed in this article do not necessarily represent those of the institute of museum and library services.

expanding and improving our library's virtual chat service: discovering best practices when demand increases

parker fruehan and diana hellyar

information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13117

parker fruehan (fruehanp1@southernct.edu) is assistant librarian, hilton c. buley library, southern connecticut state university. diana hellyar (hellyard1@southernct.edu) is assistant librarian, hilton c. buley library, southern connecticut state university. © 2021.

abstract

with the onset of the covid-19 pandemic and the ensuing shutdown of the library building for several months, there was a sudden need to adjust how the hilton c. buley library at southern connecticut state university (scsu) delivered its services. overnight, the library's virtual chat service went from a convenient way to reach a librarian to the primary method by which library patrons contacted the library for help. in this article, the authors discuss what was learned during this time and how the service has been adjusted to meet user needs. best practices and future improvements are discussed.

background

the buley library started using springshare's libchat service in january 2015. the chat service was accessible as a button in the header of all the library webpages, and the wording would change depending on the availability of a librarian. at buley library, the chat service is staffed only by our faculty librarians. there were also chat buttons on various individual libguides, for either specific librarians or the general library chat. chat was monitored at the research & information desk by the librarian on duty. the first librarian of the day would log into the shared chat account on the reference desk computer. while each librarian had their own account, using a shared account meant that librarians could easily hand off a chat interaction during a shift change. while the reference desk was typically busy, librarians would receive only a small number of chats per day: between 2015 and 2019, the library saw an average of 250 chats per year. due to the low usage, there was little focus on libchat training for librarians. for more complicated questions, librarians would often recommend that chat users call, email, or schedule an in-person appointment. since libchat was only monitored while librarians were at the reference desk, it was easy to let it become a secondary mode of reference interaction, particularly if there was a surge of in-person reference questions at any given time.

due to the covid-19 pandemic, the library quickly shifted from mostly in-person to solely online services. suddenly, libchat was the virtual reference desk and the main mode of patron interaction.
despite this change in how the library interacted with the campus, there was only a slight increase in chat usage in the first two months of the closure. in april 2020, we started to explore our options with libchat in the hopes of increasing visibility and usage.

evaluating chat widget options

considering technical implementation

the publicly accessible chat interface is made available completely within a webpage, requiring no clients, external applications, or plugins to make it functional. springshare calls this component the libchat widget and provides a prepackaged set of website code necessary to create the chat interface. at the time of writing, springshare offers four widget types in its libchat product: in-page chat, button pop-out, slide-out tab, and floating.1 when the service is offline, the system replaces the chat interface with a link to library faqs and the option to submit a question for follow-up. at buley library, prior to the covid-19 pandemic shutdown, the button pop-out was the main widget type used to enter a chat session (see figure 1).

figure 1. previous library website header with chat pop-out button in upper right-hand corner.

the pop-out button works by opening a separate pop-up window with the chat interface. this allows the user to navigate to other pages in the previous window without disconnecting from the session. one challenge of the pop-up window method is that many web browsers block pop-up windows by default, requiring the user to recognize and override this setting. another option, used mainly on librarian profiles and subject guides, is the in-page chat, which embeds the chat interface directly on an existing webpage. many times, these chat widgets are connected to a particular user rather than to the queue monitored by all librarians. the user interacts with the chat operator in this dedicated section of the webpage; if the user navigates to a different page in the same window or tab, it will disconnect the chat session. these widget options are the easiest when considering the web design expertise and time commitment involved in implementation. both the button pop-out and in-page chat can be accomplished by a user having access to a what-you-see-is-what-you-get, or wysiwyg, editor on the webpage and the ability to copy and paste a few lines of html code. it does not require any custom coding.

solving seo issues in dspace-based digital repositories | formanek

information technology and libraries | march 2021

many elements like the one above could be added, too. see the documentation available at https://schema.org/. a free online tool for testing structured data is available from google at https://search.google.com/structured-data/testing-tool?hl=en.

reducing repository website size during transfer

the primary goal of this criterion is to reduce the size of the website as transferred over the network, which is accomplished by compressing parts of the website's code. this reduction can be achieved by enabling compression methods for html and other file formats when they are transmitted from the server to the client.
the tomcat webserver (which is an essential website component for dspace repositories) allows turning on gzip compression and so-called javascript minification. to enable gzip on the tomcat webserver, edit tomcat's configuration file "server.xml," located in its home directory. under the <Service> tag, edit the corresponding <Connector> tag so that it looks like the following example; the compression-related attributes are the additions, and the remaining attributes keep their existing values:

<Connector port="8080" protocol="HTTP/1.1"
    connectionTimeout="20000"
    redirectPort="8443"
    compression="on"
    compressionMinSize="2048"
    compressableMimeType="text/html,text/xml,text/css,text/javascript,application/javascript" />

the compressablemimetype attribute contains the formats you want to compress. important note: if you use https (and the corresponding port number 443 instead of 8080), you must set the options stated above on the corresponding connector (443), too; otherwise, compression will be enabled only for plain http (running on port number 8080).

javascript minification can be enabled in the "dspace.cfg" configuration file, usually located in the dspace home directory (/dspace/config/). change the key values from false to true in the following rows:

xmlui.theme.enableminification = true
xmlui.theme.enableconcatenation = true

setting a canonical link

this requirement deals with the presence of a canonical link used by search engines. "a canonical link is included in the html code of a webpage to indicate the original source of content. this markup is used to address seo problems with duplicate content which arise when different pages with different urls contain identical or nearly identical content."30 the problem with duplicated content can arise, for example, when a webpage is accessible with or without a www prefix in its url, or when a webpage is accessible via both http and https protocols. "for seo purposes, the canonical link shows google and other search engines which url corresponds to the original source of content and should be shown in search results. it is added as a meta tag to every url version of a given webpage and indicates the canonical url."31 after the necessary customization, insert a row like the following just below the <head> tag in dspace's page-structure.xsl file (/dspace/webapps/xmlui/themes/mirage/lib/xsl/core/page-structure.xsl); the hostname here is a placeholder, and the exact expression for the current page's uri may vary by theme version:

<link rel="canonical" href="{concat('https://repository.example.org/', /dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='request'][@qualifier='URI'])}"/>

https adoption

the adoption of https is required for secure data transfer. this criterion inspects whether https is enabled and what quality it displays. https is an essential component that supports website security for sites available via the internet. we pointed out the importance of https adoption in a dspace repository interface in our previous research papers.32 first, you should prepare a file called the certificate signing request, or csr, that will be used by the certificate authority of your choice to generate the ssl certificate. the process of https configuration on the tomcat webserver (used natively in dspace repositories) is widely described online (for example, at https://www.mulesoft.com/tcat/tomcat-ssl). second, you should configure a corresponding connector for https (port 443) in tomcat's configuration file. we strongly recommend following those instructions and using a dspace instance only with https; among other major security risks, dealing with plain http surely has a very negative impact on the final seo score. google and other search engines strongly prefer websites with https enabled.
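the csr is typically generated with the openssl command-line tool; purely as an alternative illustration of the same step, here is a sketch using the python cryptography package, where the key size, file names, and common name are assumptions rather than values from the article:

# generate a private key and a certificate signing request (csr) for the ca
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
csr = (x509.CertificateSigningRequestBuilder()
       .subject_name(x509.Name([
           x509.NameAttribute(NameOID.COMMON_NAME, "repository.example.org"),  # assumed hostname
       ]))
       .sign(key, hashes.SHA256()))

with open("server.key", "wb") as f:   # keep this file private
    f.write(key.private_bytes(serialization.Encoding.PEM,
                              serialization.PrivateFormat.TraditionalOpenSSL,
                              serialization.NoEncryption()))
with open("server.csr", "wb") as f:   # submit this file to the certificate authority
    f.write(csr.public_bytes(serialization.Encoding.PEM))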
discussion about the seo issues solving process

in the previous subsections, we have offered solutions to selected major seo issues that can be relatively easily resolved in systems based on dspace and its website technologies. however, in practice, it is unrealistic to expect a 100% optimization level and final solutions for all detected problems. therefore, we intentionally did not mark the second state of the system (shown in table 1) as fully optimized, but only as semi-optimized. some of the issues we detected remain unsolved despite all our efforts, for several reasons. one of the most important is the fact that dspace software, like many complex systems, cannot be easily modified without programming experience; therefore, resolving some complicated issues is beyond the scope of this article. another significant reason is that we lacked knowledge about some issues at the time of writing this paper and therefore could not solve them. this situation creates an opportunity for further research and proposals for solutions to unsolved issues in this specific area, which the professional public would certainly welcome.

taken together, it could be said that the changes we made helped to objectively increase the average seo score by 59 percent compared to the default installation. all the successfully performed actions improved the search results of our repository and rapidly increased its visibility. we suppose that all related seo actions can affect website traffic. most major issues discussed in this case study were resolved before november 10, 2019; therefore, we prepared an analysis of the repository traffic covering the 30-day periods before and after this date (one from october 11 until the change, the other from the change until december 10, 2019) and determined the impact of the performed seo actions on website traffic. the results are satisfactory, because the number of established relations has risen significantly. organic search (through google, for example) increased traffic by 47.67% (from 86 to 127 sessions). the number of so-called referral sessions (sessions initiated from social media and other referral sites) increased by 193.75% (from 16 to 47 sessions). users spent much more time on the website and viewed more pages on average (an increase of up to 159%). we view the significant traffic increase as proof that the seo changes we implemented helped to promote use of the digital repository's content.

in the next section, we want to compare the quantitative improvement of seo parameters, which we have been able to achieve to this point, with the results achieved in global testing of worldwide dspace-based repositories using the same set of tests. we can then easily compare the results gained in the local case study with the current state determined across worldwide dspace repositories.

testing seo parameters of worldwide dspace-based repositories

there are several thousand digital repositories around the world. most of them (41.1% according to roar registry data and over 39% according to the opendoar registry) are based on dspace software.33 therefore, we also focus our research exclusively on dspace-based repositories in this study.
as we have pointed out in the methodology, the second objective of this paper is to briefly describe the current state of seo parameters of worldwide dspace-based digital repositories. next, we will discuss the comparison of the results obtained from the case study with the exploration of worldwide repositories.

methodology

according to the facts stated above, we would like to know more details about the quality of seo parameters of worldwide repositories running dspace. we decided to use the two most authoritative registries of digital repositories: the registry of open access repositories (roar) and the directory of open access repositories (opendoar). roar is hosted at the university of southampton in the united kingdom and is available online at http://roar.eprints.org/. opendoar is available at https://v2.sherpa.ac.uk/opendoar/. both are quality-assured global directories of academic open access repositories. they "enable the identification, browsing and search for repositories, based on a range of features, such as location, software or type of material held."34

we decided to utilize the roar registry as the source for a sample list because it is possible to filter systems based on specific criteria. we applied these three filters on march 11, 2020: any country, any repository type, and dspace software. we downloaded the raw data in a text/csv file with 1,977 records, with a separate row for each repository. each row has a sequence number and includes many columns with additional information; only a few columns were necessary for our purpose—the columns marked "title" and "home_page"—and the other columns were removed. all changes in the list were performed using microsoft excel.

for further evaluation, we selected a random sample from this file. we used a sample-size online calculator (available at https://www.calculator.net/sample-size-calculator.html) and set the following values for the statistical parameters:

• population size: 1,977 (the total count of dspace repositories in roar)
• confidence level: 95%
• margin of error: 10%

a sample size of 92 was automatically calculated for these values of the statistical parameters (the short sketch at the end of this section reproduces the calculation). next, we used the random number generating function integrated in excel (randbetween(1,1977)) to generate 92 random numbers from the strictly defined range. each randomly generated number corresponds to the matching row number in the table of repositories downloaded from roar, so we could choose 92 dspace repositories for testing purposes. in this way, objectivity in the selection of the research sample was ensured. we also tested the sample for duplicate entries, to ensure that no repository was selected twice; we had to do so because the random generating function does not guarantee that only unique integer values will be generated. figure 1 shows the distribution histogram of the randomly generated values from 1 to 1,977.

figure 1. distribution histogram of randomly generated values.

then, we attempted to test each of the 92 selected repositories with the three audit tools. the results are discussed in the next section.
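as noted above, the calculator's result can be reproduced with the standard finite-population sample-size formula. the sketch below also draws the sample with python's random.sample, which, unlike repeated randbetween calls, cannot produce duplicates:

# sample size for a finite population, then a duplicate-free draw of row numbers
import random

N, z, p, e = 1977, 1.96, 0.5, 0.10   # population; z for 95% confidence; worst-case proportion; 10% margin
n0 = z**2 * p * (1 - p) / e**2        # ~96.04 for an unlimited population
n = round(n0 / (1 + (n0 - 1) / N))    # finite-population correction -> 92
print(n)

picks = random.sample(range(1, N + 1), n)   # 92 distinct row numbers; duplicates are impossible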
test results

table 2 shows a part of the table with results. this table does not contain any urls or titles, to ensure anonymity; however, we can provide this information upon request. only a second-level domain name is displayed in each row, along with the corresponding scores gained in the tests. the maximum value is 100 points in every case. the rows were sorted by the calculated average score from high to low. many rows are omitted due to the table's length (one row for each of the 92 repositories). the last tested repository has sequence number 65; the repositories with a higher sequence number have no score (n/a state) due to inaccessibility.

table 2. test results

seq. | domain (first and second level) | seo site checkup | seo checker | woorank | average
1 | econstor.eu | 76 | 65.9 | 69 | 70.30
2 | datadryad.org | 73 | 54.9 | 54 | 60.63
3 | edu.ar | 66 | 54.5 | 61 | 60.50
4 | cuni.cz | 60 | 52.8 | 65 | 59.27
5 | edu.co | 65 | 55.5 | 56 | 58.83
... | ... | ... | ... | ... | ...
65 | ac.cn | 36 | 21.7 | 33 | 33.23
66 | scholarportal.info | n/a | n/a | n/a | n/a
... | ... | ... | ... | ... | ...
89 | org.br | n/a | n/a | n/a | n/a
90 | mapfig.com | n/a | n/a | n/a | n/a
91 | edu.ec | n/a | n/a | n/a | n/a
92 | edu.co | n/a | n/a | n/a | n/a
average score | | 53.47 | 48.08 | 49.22 | 50.26
standard deviation | | 9.31 | 9.29 | 10.27 | 9.62
median | | 54 | 46.7 | 52 | 50.90
mode | | 52 | 40 | 54 | 48.67

the testing process started on march 11, 2020, and finished on april 6, 2020. it took a lot of time, because we were limited by the reuse restrictions (described above) in the audit tools' free accounts. these restrictions meant that only a few tests could be performed daily, even though we used several public ip addresses to speed up the overall testing process. among other things, we identified a startling problem related to nonfunctional repository urls: 31 of the 92 tested repositories were unavailable between march and april 2020 (shown with n/a status in table 2). on april 6, 2020, at the end of the testing period, we tried to test the unavailable systems once again. four of them had become functional, so the final number of actually tested repositories rose to 65 (out of 92). the remaining 27 (29.35 percent of the total) repositories were still offline or unavailable; it is possible that the urls stated in roar's records are out-of-date. n/a values were ignored in all calculations and had no impact on the final average score or the other statistical parameters (a short filtering sketch follows below); only the 65 fully functional dspace-based worldwide repositories were involved and used for testing purposes.
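as a small illustration of how the summary rows in table 2 are computed over reachable systems only, here is a sketch with python's statistics module; the score list is an illustrative excerpt, not the full data set:

# per-tool summary statistics, ignoring unreachable repositories (None = n/a)
import statistics

seo_site_checkup = [76, 73, 66, 60, 65, 52, 52, None, 36, None]  # illustrative excerpt only
scores = [s for s in seo_site_checkup if s is not None]          # drop n/a before computing

print(round(statistics.mean(scores), 2))
print(round(statistics.pstdev(scores), 2))   # population standard deviation, as reported
print(statistics.median(scores))
print(statistics.mode(scores))               # 52 here, matching the reported mode for this tool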
the underlying frequency distribution graph of the average score (the red dotted line in the previous figure) is available in figure 3.

figure 3. underlying frequency distribution graph of average score

based on the results presented in the previous figures, we can draw conclusions with a relatively high degree of reliability. a large part of the dspace-based repositories registered in roar (over 29%) were unavailable at the time of writing. this is alarming, because roar is still considered an authoritative registry for open access repositories and should not contain invalid data. the average scores of functional repositories gained during the testing period are very similar across the audit tools: 53.47 points in seo site checkup, 48.08 points in seo checker, and 49.22 points in woorank. the standard deviations of the population are comparable, too. finally, most of the tested repositories (19) gained a score in the interval from 55 to 60, as shown in figure 3; however, the average seo score of all tested dspace-based repositories was only 50.26 points out of 100 (data from march/april 2020), which corresponds to a relatively low level of search engine optimization in those systems.

results and discussion

the previous section provided a complete picture of the seo parameters of worldwide dspace-based digital repositories. now we can compare this data with the results gained during the case study described above. the situation is summarized in table 3; a small worked example of the improvement calculation follows at the end of this section.

table 3. comparison of fresh installation, semi-optimized installation, and average worldwide score

                                                  | seo site checkup | seo checker | woorank | total average score | calculated improvement (%)
fresh dspace installation                         | 58    | 50.1  | 32    | 46.7  | 100 (reference point)
semi-optimized state of the institutional
repository of the department of mediamatics
and cultural heritage                             | 81    | 78    | 59    | 74.66 | +59.87
average score of worldwide dspace-based
repositories                                      | 53.47 | 48.08 | 49.22 | 50.26 | +7.62

based on table 3, the fresh, non-optimized dspace installation obtained a slightly worse score than the worldwide average. although a few seo issues still remain in our semi-optimized dspace instance, the state of its seo parameters is much better than the scores gained in the other discussed cases. if we consider a fresh dspace installation as the reference point (100 percent), the improvement level is shown in the last column of table 3. the semi-optimized dspace offers an improvement of up to 59.87% compared to the fresh (non-optimized) dspace installation. there is no significant difference (up to 7.62%) in seo quality between the average worldwide repository and the non-optimized dspace instance; the two results are very similar.

as we mentioned at the beginning of our paper, a higher score in the tests is not the primary objective. the main goal is to improve the visibility and content searchability of digital repositories, as well as to improve their security and their promotion through social/new media.
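the improvement column of table 3 can be reproduced from the total average scores; a minimal python check (values taken from table 3):

```python
# total average scores from table 3
fresh = 46.7             # reference point (100 percent)
semi_optimized = 74.66
worldwide_average = 50.26

def improvement(score, reference=fresh):
    """percentage improvement relative to the fresh installation."""
    return round((score - reference) / reference * 100, 2)

print(improvement(semi_optimized))     # 59.87
print(improvement(worldwide_average))  # 7.62
```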
conclusion

this study presents a thorough investigation of digital repositories running dspace, the most popular software for this purpose. we have shown that a significant seo improvement of more than 59% can be achieved thanks to a few simple modifications of the dspace configuration and the associated application layers (the tomcat web server, etc.). some of these technical optimization steps can be performed in a relatively simple way, using the solving procedures described earlier and a wide theoretical background. we have publicly presented reports and explanations for the most common and major seo problems that dspace repositories usually face. this paper is one of the first academic studies to deal with seo issues related to digital repositories, especially those running dspace software.

we realize that we have not been able to solve all of the identified problems completely. the following seo issues remain unresolved:

• h1 heading tags test
• h1 coherence test
• seo friendly url test
• inline css test
• page cache test (server-side caching)
• cdn usage test
• image, javascript, css caching tests
• css minification test
• url canonicalization test

some of these could probably be solved more easily than others; the system urls, however, cannot easily be changed into a form that would be considered seo friendly. all of the above presents a great opportunity for further discussion and research in this field.

the current state of seo parameters of dspace repositories, as shown in the test results, is unsatisfactory. the results of our research indicate that there is only a small difference in seo quality between the average results obtained by global, worldwide dspace repositories and a non-optimized installation of dspace v6.3 (the difference is approximately 7% in the global repositories' favor). it seems that most of these systems are not currently optimized in terms of seo and other technical website parameters. the second major finding is that the metadata records stored in roar are not always accurate and may be incorrect or obsolete. to keep this finding objective, we must note that roar's storage suffered a major failure, which could have caused its harvesting service to fail. (more information about the failure is available at http://roar.eprints.org/.)

finally, we recommend periodically re-testing the level of search engine optimization of digital repositories. "search engine algorithms tend to change often, and new factors are added while outdated or not effective factors are excluded. this is why web developers must check the algorithm changes and adjust their websites in order to not only achieve but also maintain high ranking in search engines."35 we believe that our work will also contribute to the initiation of cooperation among other experts in order to resolve the remaining seo problems. ultimately, we hope that all presented efforts and recommendations will help repository administrators, users, scientists, researchers, teachers, students, and other members of the general public to find what they need in virtual spaces like digital repositories more quickly and efficiently.

endnotes

1 christos ziakis et al., "important factors for improving google search rank," future internet 11, no. 2 (january 2019): 2–3, https://doi.org/10.3390/fi11020032.
2 isidro f. aguillo et al., "comparing university rankings," scientometrics 85 (february 2010): 243–56, https://doi.org/10.1007/s11192-010-0190-z.

3 ahmad bakeri abu bakar and nur leyni, "webometric study of world class universities websites," qualitative and quantitative methods in libraries (july 2017): 105–15, http://qqml-journal.net/index.php/qqml/article/view/367; andreas giannakoulopoulos et al., "academic excellence, website quality, seo performance: is there a correlation?" future internet 11, no. 11 (november 2019): 242, https://doi.org/10.3390/fi11110242.

4 dwi budi santoso, "pemanfaatan teknologi search engine optimazion sebagai media untuk meningkatkan popularitas blog wordpress," dinamik 14, no. 2 (2009): 12–33, https://www.unisbank.ac.id/ojs/index.php/fti1/article/view/100; m. iskandar and d. komara, "application marketing strategy search engine optimization (seo)," iop conference series: materials science and engineering 407 (2018), https://iopscience.iop.org/article/10.1088/1757-899x/407/1/012011/pdf.

5 giannakoulopoulos et al., "academic excellence, website quality, seo performance."

6 giannakoulopoulos et al., "academic excellence, website quality, seo performance."

7 thomas abrahamson, "life and death on the internet: to web or not to web is no longer a question," journal of college admission 168 (2000): 6–11.

8 sukhpuneet kaur, kulwant kaur, and parminder kaur, "an empirical performance evaluation of universities website," international journal of computer applications 146, no. 15 (july 2016): 10–16, https://doi.org/10.5120/ijca2016910922.

9 "best seo software," g2, last modified 2020, https://www.g2.com/categories/seo.

10 "best seo software."

11 ziakis et al., "important factors for improving google search rank."

12 giannakoulopoulos et al., "academic excellence, website quality, seo performance."

13 joeran beel, bela gipp, and eric wilde, "academic search engine optimization (aseo): optimizing scholarly literature for google scholar & co.," journal of scholarly publishing 41, no. 2 (january 2010): 176–90, http://dx.doi.org/10.3138/jsp.41.2.176.

14 brian kelly, "majesticseo analysis of russell group university repositories," uk web focus (blog), august 29, 2012, http://ukwebfocus.wordpress.com/2012/08/29/majesticseo-analysis-of-russell-group-university-repositories/.

15 "opendoar statistics," jisc, last modified september 2020, https://v2.sherpa.ac.uk/view/repository_visualisations/1.html.

16 si ong quan, "44 best free seo tools (tried & tested)," last modified may 28, 2020, https://ahrefs.com/blog/free-seo-tools/; navneet kaushal, "top 15 most recommended seo tools," last modified september 2020, https://www.pagetraffic.com/blog/top-15-most-recommended-seo-tools/.
17 quan, "44 best free seo tools (tried & tested)"; kaushal, "top 15 most recommended seo tools."

18 "28 top seo site checkup tools," traffic radius, accessed march 29, 2020, https://trafficradius.com.au/seo-site-checkup-tools/.

19 "28 top seo site checkup tools," traffic radius.

20 chandan kumar, "13 online tools to analyse website seo for better search ranking," last modified april 11, 2020, https://geekflare.com/online-tool-to-analyze-seo/#seo-tester-online.

21 "28 top seo site checkup tools," traffic radius.

22 kumar, "13 online tools to analyse website seo for better search ranking."

23 "opendoar statistics," jisc.

24 "dspace 7.0 beta 4 release announcement," lyrasis, october 13, 2020, https://duraspace.org/dspace-7-0-beta-4-release-announcement/.

25 "the open graph protocol," ogp, accessed january 25, 2021, https://ogp.me/.

26 "welcome to schema.org," schema, accessed may 1, 2020, https://schema.org/.

27 "welcome to schema.org."

28 "understand how structured data works," google, accessed may 2, 2020, https://developers.google.com/search/docs/guides/intro-structured-data.

29 "understand how structured data works," google.

30 "canonical tag," seobility, accessed march 20, 2020, https://www.seobility.net/en/wiki/canonical_tag.

31 "canonical tag," seobility.

32 matus formanek and martin zaborsky, "web interface security vulnerabilities of european academic repositories," liber quarterly 27, no. 1 (february 2017): 45–57, http://doi.org/10.18352/lq.10178; matus formanek, vladimir filip, and erika sustekova, "the progress of web security level related to european open access lis repositories between 2016 and 2018," jlis.it 10, no. 2 (may 2019): 107–15, http://dx.doi.org/10.4403/jlis.it-12545.

33 "opendoar statistics," jisc.

34 "about opendoar," jisc, last modified september 2020, https://www.jisc.ac.uk/opendoar.

35 ziakis et al., "important factors for improving google search rank," 2.
an evolutive process to convert glossaries into ontologies

josé r. hilera, carmen pagés, j. javier martínez, j. antonio gutiérrez, and luis de-marcos

josé r. hilera (jose.hilera@uah.es) is professor, carmen pagés (carmina.pages@uah.es) is assistant professor, j. javier martínez (josej.martinez@uah.es) is professor, j. antonio gutiérrez (jantonio.gutierrez@uah.es) is assistant professor, and luis de-marcos (luis.demarcos@uah.es) is professor, department of computer science, faculty of librarianship and documentation, university of alcalá, madrid, spain.

this paper describes a method to generate ontologies from glossaries of terms. the proposed method presupposes an evolutionary life cycle based on successive transformations of the original glossary that lead to products of intermediate knowledge representation (dictionary, taxonomy, and thesaurus). these products are characterized by an increase in semantic expressiveness in comparison to the product obtained in the previous transformation, with the ontology as the end product. although this method has been applied to produce an ontology from the "ieee standard glossary of software engineering terminology," it could be applied to any glossary of any knowledge domain to generate an ontology that may be used to index or search for information resources and documents stored in libraries or on the semantic web.

from the point of view of their expressiveness or semantic richness, knowledge representation tools can be classified at four levels: at the basic level (level 0), to which dictionaries belong, tools include definitions of concepts without formal semantic primitives; at the taxonomies level (level 1), tools include a vocabulary, implicit or explicit, as well as descriptions of specialized relationships between concepts; at the thesauri level (level 2), tools further include lexical (synonymy, hyperonymy, etc.) and equivalence relationships; and at the reference models level (level 3), tools combine the previous relationships with other more complex relationships between concepts to completely represent a certain knowledge domain.1 ontologies belong at this last level.

according to the hierarchic classification above, knowledge representation tools of a particular level add semantic expressiveness to those in the lower levels, in such a way that a dictionary or glossary of terms might develop into a taxonomy or a thesaurus, and later into an ontology. there are a variety of comparative studies of these tools,2 as well as varying proposals for systematically generating ontologies from lower-level knowledge representation systems, especially from descriptor thesauri.3

this paper proposes a process for generating a terminological ontology from a dictionary of a specific knowledge domain.4 given the definition offered by neches et al. ("an ontology is an instrument that defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary"),5 it is evident that the ontology creation process will be easier if there is a vocabulary to be extended than if it is developed from scratch. if the developed ontology is based exclusively on the dictionary, the outcome will be limited by the richness of the definition of terms included in that dictionary. it would be what is normally called a "lightweight" ontology,6 which could later be converted into a "heavyweight" ontology by implementing, in the form of axioms, knowledge not contained in the dictionary. this paper describes the process of creating a lightweight ontology of the domain of software engineering, starting from the ieee standard glossary of software engineering terminology.7

■■ ontologies, the semantic web, and libraries

within the field of librarianship, ontologies are already being used as alternative tools to traditional controlled vocabularies. this may be observed particularly within the realm of digital libraries, although, as krause asserts, objections to their use have often been raised by the digital library community.8 one of the core objections is the difficulty of creating ontologies as compared to other vocabularies such as taxonomies or thesauri. nonetheless, the semantic richness of an ontology offers a wide range of possibilities concerning indexing and searching of library documents.

the term ontology (used in philosophy to refer to the "theory about existence") has been adopted by the artificial intelligence research community to define a categorization of a knowledge domain in a shared and agreed form, based on concepts and relationships, which may be formally represented in a computer-readable and usable format. the term has been widely employed since 2001, when berners-lee et al. envisaged the semantic web, which aims to turn the information stored on the web into knowledge by transforming the data stored in every webpage into a common scheme accepted in a specific domain.9 to accomplish that task, knowledge must be represented in an agreed-upon and reusable computer-readable format. to do this, machines will require access to structured collections of information and to formalisms based on mathematical logic that permit higher levels of automatic processing.
technologies for the semantic web have been developed by the world wide web consortium (w3c). the most relevant technologies are rdf (resource description framework),10 which defines a common data model to specify metadata, and owl (ontology web language),11 which is a new markup language for publishing and sharing data using web ontologies. more recently, the w3c has presented a proposal for a new rdf-based markup system that will be especially useful in the context of libraries. it is called skos (simple knowledge organization system), and it provides a model for expressing the basic structure and content of concept schemes, such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabularies.12

the emergence of the semantic web has created great interest within librarianship because of the new possibilities it offers in the areas of publication of bibliographical data and development of better indexes and better displays than those that we have now in ils opacs.13 for that reason, it is important to strive for semantic interoperability between the different vocabularies that may be used in libraries' indexing and search systems, and to have compatible vocabularies (dictionaries, taxonomies, thesauri, ontologies, etc.) based on a shared standard like rdf.

there are, at the present time, several proposals for using knowledge organization systems as alternatives to controlled vocabularies. for example, folksonomies, though originating within the web context, have been proposed by different authors for use within libraries "as a powerful, flexible tool for increasing the user-friendliness and interactivity of public library catalogs."14 authors argue that the best approach would be to create interoperable controlled vocabularies using shared and agreed-upon glossaries and dictionaries from different domains as a departure point, and then to complete evolutive processes aimed at semantic extension to create ontologies, which could then be combined with other ontologies used in information systems running in both conventional and digital libraries for indexing as well as for supporting document searches. there are examples of glossaries that have been transformed into ontologies, such as the cambridge healthtech institute's "pharmaceutical ontologies glossary and taxonomy" (http://www.genomicglossaries.com/content/ontologies.asp), which is an "evolving terminology for emerging technologies."
■■ ieee standard glossary of software engineering terminology

to demonstrate our proposed method, we will use a real glossary belonging to the computer science field, although it is possible to use any other. the glossary, available in electronic format (pdf), defines approximately 1,300 terms in the domain of software engineering (figure 1). topics include addressing; assembling, compiling, linking, loading; computer performance evaluation; configuration management; data types; errors, faults, and failures; evaluation techniques; instruction types; language types; libraries; microprogramming; operating systems; quality attributes; software documentation; software and system testing; software architecture; software development process; software development techniques; and software tools.15

figure 1. cover of the glossary document

in the glossary, entries are arranged alphabetically. an entry may consist of a single word, such as "software," a phrase, such as "test case," or an acronym, such as "cm." if a term has more than one definition, the definitions are numbered. in most cases, noun definitions are given first, followed by verb and adjective definitions as applicable. examples, notes, and illustrations have been added to clarify selected definitions. cross-references are used to show a term's relations with other terms in the dictionary: "contrast with" refers to a term with an opposite or substantially different meaning; "syn" refers to a synonymous term; "see also" refers to a related term; and "see" refers to a preferred term or to a term where the desired definition can be found. (these labels are regular enough to parse mechanically; a small sketch follows at the end of this section.) figure 2 shows an example of one of the definitions of the glossary terms.

high order language (hol). a programming language that requires little knowledge of the computer on which a program will run, can be translated into several different machine languages, allows symbolic naming of operations and addresses, provides features designed to facilitate expression of data structures and program logic, and usually results in several machine instructions for each program statement. examples include ada, cobol, fortran, algol, pascal. syn: high level language; higher order language; third generation language. contrast with: assembly language; fifth generation language; fourth generation language; machine language. note: specific languages are defined in p610.13

figure 2. example of term definition in the ieee glossary

note that definitions can also include examples associated with the described concept. in the resulting ontology, the examples were included as instances of the corresponding class. in figure 2, it can be seen that the definition refers to another glossary on programming languages (std 610.13), which is part of the series of dictionaries related to computer science ("ieee std 610," figure 3). other glossaries mentioned in references within term definitions are 610.1, 610.5, 610.7, 610.8, and 610.9. to avoid redundant definitions and possible inconsistencies, links must be implemented between ontologies developed from those glossaries that include common concepts. the ontology generation process presented in this paper is meant to allow for integration with other ontologies that will be developed in the future from the other glossaries.

610—standard dictionary of computer terminology
610.1—standard glossary of mathematics of computing terminology
610.2—standard glossary of computer applications terminology
610.3—standard glossary of modeling and simulation terminology
610.4—standard glossary of image processing terminology
610.5—standard glossary of data management terminology
610.6—standard glossary of computer graphics terminology
610.7—standard glossary of computer networking terminology
610.8—standard glossary of artificial intelligence terminology
610.9—standard glossary of computer security and privacy terminology
610.10—standard glossary of computer hardware terminology
610.11—standard glossary of theory of computation terminology
610.12—standard glossary of software engineering terminology
610.13—standard glossary of computer languages terminology

figure 3. ieee computer science glossaries

in addition to the explicit references to other terms within the glossary and to terms from other glossaries, the textual definition of a concept also has implicit references to other terms. for example, from the phrase "provides features designed to facilitate expression of data structures" included in the definition of the term high order language (figure 2), it is possible to determine that there is an implicit relationship between this term and the term data structure, also included in the glossary. these relationships have been considered in establishing the properties of the concepts in the developed ontology.
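as a hedged illustration of how the cross-reference labels could be extracted, here is a minimal python sketch; the abridged entry text and the regular expression are assumptions for illustration, not the authors' actual tooling:

```python
import re

# figure 2's entry, abridged; the labels are the glossary's own conventions
entry = ("high order language (hol). a programming language that requires "
         "little knowledge of the computer on which a program will run. "
         "syn: high level language; higher order language. "
         "contrast with: assembly language; machine language.")

def cross_references(text, label):
    """return the terms listed after a cross-reference label such as 'syn'."""
    match = re.search(label + r":\s*([^.]+)\.", text)
    return [t.strip() for t in match.group(1).split(";")] if match else []

print(cross_references(entry, "syn"))            # synonymous terms
print(cross_references(entry, "contrast with"))  # opposed terms
```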
■■ ontology development process

many ontology development methods presuppose a life cycle and suggest technologies to apply during the process of developing an ontology.16 the method described by noy and mcguinness is helpful when beginning this process for the first time.17 they establish a seven-step process:

1. determine the domain and scope of the ontology
2. consider reusing existing ontologies
3. enumerate important terms in the ontology
4. define the classes and the class hierarchy
5. define the properties of classes (slots)
6. define the facets of the slots
7. create instances

as outlined in the introduction, the ontology developed using our method is a terminological one. therefore we can ignore the first two steps in noy's and mcguinness' process, as the concepts of the ontology coincide with the terms of the glossary used. any ontology development process must take into account the basic stages of the life cycle, but the way of organizing the stages can be different in different methods. in our case, since the ontology has a terminological character, we have established an incremental development process that supposes the natural evolution of the glossary from its original format (dictionary or vocabulary format) into an ontology.

the proposed life cycle establishes a series of steps or phases that will result in intermediate knowledge representation tools, with the final product, the ontology, being the most semantically rich (figure 4). therefore this is a product-driven process, in which the aim of every step is to obtain an intermediate product useful on its own. the intermediate products and the final product are a dictionary, which has a formal and computer-processed structure, with the terms and their definitions in xml format; a taxonomy, which reflects the hierarchic relationships between the terms; a thesaurus, which includes other relationships between the terms (for example, the synonymy relationship); and, finally, the ontology, which will include the hierarchy, the basic relationships of the thesaurus, new and more complex semantic relationships, and restrictions in the form of axioms expressed using description logics.18 the following paragraphs describe the way each of these products is obtained.

figure 4. ontology development process
dictionary

the first step of the proposed development process consists of the creation of a dictionary in xml format with all the terms included in the ieee standard glossary of software engineering terminology and their related definitions. this activity is particularly mechanical and does not need human intervention, as it is basically a transformation of the glossary from its original format (pdf) into a format better suited to the development process. all formats considered for the dictionary are based on xml, and specifically on rdf and rdf schema. in the end, we decided to work with the standards daml+oil and owl,19 though we are not opposed to working with other languages, such as skos or xmi,20 in the future. (in the latter case, it would be possible to model the intermediate products and the ontology in uml graphic models stored in xml files.)21

in our project, the design and implementation of all products has been made using an ontology editor. we have used oiled (with the oilviz plugin) as editor, both because of its simplicity and because it allows exportation to owl and daml formats. however, with future maintenance and testing in mind, we decided to use protégé (with the owl plugin) in the last step of the process, because it is a more flexible environment with extensible modules that integrate more functionality, such as ontology annotation, evaluation, middleware services, query and inference, etc.

figure 5 shows the dictionary entry for "high order language," which appears in figure 2. note that the dictionary includes only owl:class (or daml:class) to mark the term; rdf:label to indicate the term name; and rdf:comment to provide the definition included in the original glossary.

figure 5. example of dictionary entry

since there are terms with different meanings (up to five in some cases) in the ieee glossary of software engineering terminology, during dictionary development we decided to create different concepts (classes) for the same term, associating a number with these concepts to differentiate them. for example, there are five different definitions for the term test, which is why there are five concepts (test1–test5), corresponding to the five meanings of the term: (1) an activity in which a system or component is executed under specified conditions, the results are observed or recorded, and an evaluation is made of some aspect of the system or component; (2) to conduct an activity as in (1); (3) a set of one or more test cases; (4) a set of one or more test procedures; (5) a set of one or more test cases and procedures.
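as a hedged sketch of the entry structure the text describes for figure 5 (a class plus its label and comment), here is one way to generate it with the python rdflib library; rdflib and the example namespace are assumptions for illustration (the authors worked with oiled and protégé), and the modern equivalents of the text's rdf:label/rdf:comment are rdfs:label/rdfs:comment:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# hypothetical namespace; the article does not state the ontology's uri base
SE = Namespace("http://example.org/se#")

g = Graph()
g.bind("owl", OWL)

definition = ("a programming language that requires little knowledge of the "
              "computer on which a program will run ...")

# one dictionary entry: a class, its term name, and its glossary definition
g.add((SE.HighOrderLanguage, RDF.type, OWL.Class))
g.add((SE.HighOrderLanguage, RDFS.label, Literal("high order language")))
g.add((SE.HighOrderLanguage, RDFS.comment, Literal(definition)))

# serializing to rdf/xml should yield markup of the same shape as figure 5
print(g.serialize(format="xml"))
```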
taxonomy

the proposed life cycle establishes a stage for the conversion of a dictionary into a taxonomy, understanding taxonomy as an instrument of concept categorization, that is, as a systematical classification in the traditional way. as gilchrist states, there is no consensus on the meaning of terms like taxonomy, thesaurus, or ontology.22 in addition, much work in the field of ontologies has been done without taking advantage of similar work performed in the fields of linguistics and library science.23 this situation is changing because of the increasing publication of works that relate the development of ontologies to the development of "classic" terminological tools (vocabularies, taxonomies, and thesauri). this paper emphasizes the importance and usefulness of the intermediate products created at each stage of the evolutive process from glossary to ontology. the end product of the initial stage is a dictionary expressed as xml.

the next stage in the evolutive process (figure 4) is the transformation of that dictionary into a taxonomy through the addition of hierarchical relationships between concepts. to do this, it is necessary to undertake a lexical-semantic analysis of the original glossary. this can be done in a semiautomatic way by applying natural language processing (nlp) techniques, such as those recommended by morales-del-castillo et al. for creating thesauri.24 the basic processing sequence in linguistic engineering comprises the following steps: (1) incorporate the original documents (in our case the dictionary obtained in the previous stage) into the information system; (2) identify the language in which they are written, distinguishing independent words; (3) "understand" the processed material at the appropriate level; (4) use this understanding to transform, search, or translate data; (5) produce the new media required to present the produced outcomes; and finally, (6) present the final outcome to human users by means of the most appropriate peripheral device (screen, speakers, printer, etc.).

an important aspect of this process is natural language comprehension. for that reason, several different kinds of programs are employed, including lemmatizers (which implement stemming algorithms to extract the lexeme or root of a word), morphologic analyzers (which glean sentence information from their constituent elements: morphemes, words, and parts of speech), syntactic analyzers (which group sentence constituents to extract elements larger than words), and semantic models (which represent language semantics in terms of concepts and their relations, using abstraction, logical reasoning, organization, and data structuring capabilities).

from the information in the software engineering dictionary and from a lexical analysis of it, it is possible to determine a hierarchical relationship when the name of a term contains the name of another one (for example, the term language and the terms programming language and hardware design language), or when expressions such as "is a" linked to the name of another term included in the glossary appear in the text of the term definition. (for example, when analyzing the definition of the term compiler, "(is) a computer program that translates programs expressed in a high order language into their machine language equivalent," it is possible to deduce that compiler is a subconcept of computer program, which is also included in the glossary.) in addition to the lexical or syntactic analysis, it is necessary for an expert in the domain to perform a semantic analysis to complete the development of the taxonomy.

the implementation of the hierarchical relationships among the concepts is made using rdfs:subclassof, regardless of whether the taxonomy is implemented in owl or daml format, since both languages specify this type of relationship in the same way. figure 6 shows an example of a hierarchical relationship included in the definition of the concept pictured in figure 5.

figure 6. example of taxonomy entry
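as a hedged sketch of the name-containment heuristic described above, here is a small python example using rdflib; the library choice, the example namespace, and the tiny term list are assumptions for illustration, and real candidates would still require expert review:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

SE = Namespace("http://example.org/se#")  # hypothetical base uri

terms = ["language", "programming language", "hardware design language"]

def to_class_name(term):
    """'programming language' -> 'ProgrammingLanguage'."""
    return "".join(w.capitalize() for w in term.split())

g = Graph()
for term in terms:
    for other in terms:
        # name-containment heuristic from the text: "programming language"
        # ends with " language", so it becomes a subclass candidate
        if term != other and term.endswith(" " + other):
            g.add((SE[to_class_name(term)], RDFS.subClassOf,
                   SE[to_class_name(other)]))

for s, _, o in g:
    print(s, "subClassOf", o)
```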
thesaurus

according to the international organization for standardization (iso), a thesaurus is "the vocabulary of a controlled indexing language, formally organized in order to make explicit the a priori relations between concepts (for example 'broader' and 'narrower')."25 this definition establishes the lexical units and the semantic relationships between these units as the elements that constitute a thesaurus. the following is a sample of the lexical units:

■■ descriptors (also called "preferred terms"): the terms used consistently when indexing to represent a concept that can be in documents or in queries to these documents. the iso standard introduces the option of adding a definition or an application note to every term to establish explicitly the chosen meaning. this note is identified by the abbreviation sn (scope note), as shown in figure 7.

■■ non-descriptors ("non-preferred terms"): the synonyms or quasi-synonyms of a preferred term. a nonpreferred term is not assigned to documents submitted to an indexing process, but is provided as an entry point in a thesaurus to point to the appropriate descriptor. usually the descriptors are written in capital letters and the nondescriptors in small letters.

■■ compound descriptors: the terms used to represent complex concepts and groups of descriptors, which allow for the structuring of large numbers of thesaurus descriptors into subsets called micro-thesauri.

in addition to lexical units, other fundamental elements of a thesaurus are the semantic relationships between these units. the most common relationships between lexical units are the following:

■■ equivalence: the relationship between the descriptors and the nondescriptors (synonymous and quasi-synonymous). iso establishes that the abbreviation uf (used for) precedes the nondescriptors linked to a descriptor, and the abbreviation use is used in the opposite case. for example, a thesaurus developed from the ieee glossary might include a descriptor "high order language" and an equivalence relationship with a nondescriptor "high level language" (figure 7).

■■ hierarchical: a relationship between two descriptors, where one of the descriptors has been defined as superior to the other one. there are no hierarchical relationships between nondescriptors, nor between nondescriptors and descriptors. a descriptor can have no lower descriptors or several of them, and no higher descriptors or several of them. according to the iso standard, hierarchy is expressed by means of the abbreviations bt (broader term), to indicate the generic or higher descriptors, and nt (narrower term), to indicate the specific or lower descriptors. the term at the head of the hierarchy to which a term belongs can be included, using the abbreviation tt (top term). figure 7 presents these hierarchical relationships.

■■ associative: a reciprocal relationship that is established between terms that are neither equivalent nor hierarchical, but are semantically or conceptually associated to such an extent that the link between them should be made explicit in the controlled vocabulary, on the grounds that it may suggest additional terms for use in indexing or retrieval. it is generally indicated by the abbreviation rt (related term). there are no associative relationships between nondescriptors and descriptors, or between descriptors already linked by a hierarchical relation. it is possible to establish associative relationships between descriptors belonging to the same or different categories. the associative relationships can be of very different types; for example, they can represent causality, instrumentation, location, similarity, origin, action, etc. figure 7 shows two associative relations, indicating that high order language relates to both assembly and machine languages.

high order language (descriptor)
  sn a programming language that...
  uf high level language (non-descriptor)
  uf third generation language (non-descriptor)
  tt language
  bt programming language
  nt object oriented language
  nt declarative language
  rt assembly language (contrast with)
  rt machine language (contrast with)

high level language
  use high order language

third generation language
  use high order language

figure 7. fragment of a thesaurus entry
the life cycle proposed in this paper (figure 4) includes a third step or phase that transforms the taxonomy obtained in the previous phase into a thesaurus through the incorporation of relationships between the concepts that complement the hierarchical relations included in the taxonomy. basically, we have to add two types of relationships, equivalence and associative, represented in standard thesauri with uf (and use) and rt respectively. we will continue using xml to implement this new product.

there are different ways of implementing a thesaurus using a language based on xml. for example, matthews et al. proposed a standard rdf format,26 whereas hall created an ontology in daml.27 in both cases, the authors modeled the general structure of a thesaurus from classes (rdf:class or daml:class) and properties (rdf:property or daml:objectproperty). in the first case they proposed five classes (thesaurusobject, concept, topconcept, term, and scopenote) and several properties to implement the relations, like hasscopenote (sn), isindicatedby, preferredterm, usedfor (uf), conceptrelation, broaderconcept (bt), narrowerconcept (nt), topofhierarchy (tt), and isrelatedto (rt).

recently the w3c has developed the skos specification, created to define knowledge organization schemes. in the case of thesauri, skos includes specific tags, such as skos:concept, skos:scopenote (sn), skos:broader (bt), skos:narrower (nt), skos:related (rt), etc., that are equivalent to those listed in the previous paragraph. the skos specification does not make any statement about the formal relationship between the class of skos concept schemes and the class of owl ontologies, which allows different design patterns to be explored for using skos in combination with owl.

although any of the above-mentioned formats could be used to implement the thesaurus, given that the end product of our process is to be an ontology, our proposal is that the product generated during this phase should have a format compatible with the final ontology and with the previous taxonomy. therefore a minimal number of changes will be carried out on the product created in the previous step, resulting in a knowledge representation tool similar to a thesaurus. that tool does not need to be modified during the following (final) phase of transformation into an ontology. nevertheless, if for some reason it is necessary to have the thesaurus in one of the other formats (such as skos), it is possible to apply a simple xslt transformation to the product. another option would be to integrate a thesaurus ontology, such as the one proposed by hall,28 with the ontology representing the ieee glossary.
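as a hedged sketch of what figure 7's entry would look like under the skos tags just listed, here is a python example using rdflib; the library choice and the example namespace are assumptions for illustration:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

SE = Namespace("http://example.org/se#")  # hypothetical base uri

g = Graph()
g.bind("skos", SKOS)

hol = SE.HighOrderLanguage
g.add((hol, SKOS.prefLabel, Literal("high order language")))   # descriptor
g.add((hol, SKOS.altLabel, Literal("high level language")))    # uf
g.add((hol, SKOS.broader, SE.ProgrammingLanguage))             # bt
g.add((hol, SKOS.narrower, SE.ObjectOrientedLanguage))         # nt
g.add((hol, SKOS.related, SE.AssemblyLanguage))                # rt
g.add((hol, SKOS.scopeNote,
       Literal("a programming language that ...")))            # sn

print(g.serialize(format="turtle"))
```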
in the thesaurus implementation carried out in our project, the following limitations have been considered:

■■ only the hierarchical relationships implemented in the taxonomy have been considered. these include relationships of type "is-a," that is, generalization relationships or type-subset relationships. relationships that can be included in the thesaurus marked with tt, bt, and nt, like relations of type "part of" (that is, partitive relationships), have not been considered. instead of considering them as hierarchical relationships, the final ontology includes the possibility of describing classes as a union of classes.

■■ the relationships of synonymy (uf and use) used to model the cross-references in the ieee glossary ("syn" and "see," respectively) were implemented as equivalent terms, that is, as equivalence axioms between classes (owl:equivalentclass or daml:sameclassas), with inverse properties to reflect the preference of the terms. (a small sketch follows after this list.)

■■ the rest of the associative relationships (rt) that were included in the thesaurus correspond to the cross-references of the type "contrast with" and "see also" that appear explicitly in the ieee glossary.

■■ neither compound descriptors nor groups of descriptors have been implemented, because there is no such structure in the glossary.
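as a hedged sketch of the equivalence axiom described in the second bullet, again with rdflib (an illustration, not the authors' tooling; the namespace is hypothetical, and the inverse "preference" properties the text mentions are not shown):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

SE = Namespace("http://example.org/se#")  # hypothetical base uri

g = Graph()
# "syn: high level language" in the glossary entry becomes an
# equivalence axiom between the two classes
g.add((SE.HighOrderLanguage, OWL.equivalentClass, SE.HighLevelLanguage))

print(g.serialize(format="turtle"))
```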
ontology

ding and foo state that "ontology promotes standardization and reusability of information representation through identifying common and shared knowledge. ontology adds values to traditional thesauri through deeper semantics in digital objects, both conceptually, relationally and machine understandably."29 this semantic richness may imply deeper hierarchical levels, richer relationships between concepts, the definition of axioms or inference rules, etc.

the final stage of the evolutive process is the transformation of the thesaurus created in the previous stage into an ontology. this is achieved through the addition of one or more of the basic elements of semantic complexity that differentiate ontologies from other knowledge representation standards (such as dictionaries, taxonomies, and thesauri). for example:

■■ semantic relationships between the concepts (classes) of the thesaurus have been added as properties or ontology slots.

■■ axioms of classes and axioms of properties. these are restriction rules that are declared to be satisfied by elements of the ontology. for example, disjunctive classes have been defined, and quantification restrictions (existential or universal) and cardinality restrictions on the relationships have been implemented as properties.

software based on techniques of linguistic analysis has been developed to facilitate the establishment of the properties and restrictions. this software analyzes the definition text for each of the more than 1,500 glossary terms (in thesaurus format), isolating those words that match the name of other glossary terms (or a word in the definition text of other glossary terms). the isolated words then become candidates for a relationship between the two terms. (figure 8 shows the candidate properties obtained from the software engineering glossary.)

accept access accomplish account achieve adapt add adjust advance affect aggregate aid allocate allow allow symbolic naming alter analyze apply approach approve arrangement arrive assign assigned by assume avoid await begin break bring broke down builds call called by can be can be input can be used as can operate in cannot be usedas carry out cause change characterize combine communicate compare comply comprise conduct conform consist constrain construct contain contains no contribute control convert copy correct correspond count create debugs decompiles decomposedinto decrease define degree delineate denote depend depict describe design designate detect determine develop development direct disable disassembles display distribute divide document employ enable encapsulate encounter ensure enter establish estimate establish evaluate examine exchange execute after execute in executes expand express express as extract facilitate fetch fill follow fulfil generate give give partial given constrain govern have have associated have met have no hold identify identify request ignore implement imply improve incapacitate include incorporate increase indicate inform initiate insert install intend interact with interprets interrelate investigate invokes is is a defect in is a form of is a method of is a mode of is a part is a part of is a sequence is a sequenceof is a technique is a techniqueof is a type is a type of is ability is activated by is adjusted by is applied to is based is called by is composed is contained is contained in is establish is established is executed after is executed by is incorrect is independent of is manifest is measured in is not is not subdivided in is part is part of is performed by is performed on is portion is process by is produce by is produce in is ratio is represented by is the output is the result of is translated by is type is used is used in isolate know link list load locate maintain make make up may be measure meet mix modify monitors move no contain no execute no relate no use not be connected not erase not fill not have not involve not involving not translate not use occur occur in occur in a operate operatewith optimize order output parses pas pass test perform permit permitexecute permit the execution pertaining place preclude predict prepare prescribe present present for prevent preventaccessto process produce produce no propose provide rank reads realize receive reconstruct records recovery refine reflect reformat relate relation release relocates remove repair replace represent request require reserve reside restore restructure result resume retain retest returncontrolto reviews satisfy schedule send server set share show shutdown specify store store in structure submission of supervise supports suppress suspend swap synchronize take terminate test there are no through throughout transfer transform translate transmit treat through understand update use use in use to utilize value verify work in writes

figure 8. candidate properties obtained from the linguistic analysis of the software engineering glossary
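as a hedged sketch of the word-isolation step that produced figure 8, here is a minimal python example; the tiny glossary excerpt is illustrative, and the naive plural handling stands in for the lemmatizers the taxonomy section mentions:

```python
import re

# tiny excerpt of the glossary as {term: definition}; illustrative only
glossary = {
    "compiler": "a computer program that translates programs expressed in "
                "a high order language into their machine language equivalent",
    "high order language": "a programming language that provides features "
                           "designed to facilitate expression of data "
                           "structures",
    "data structure": "a physical or logical relationship among data elements",
}

def candidate_relationships(glossary):
    """yield (term, other_term) when a definition mentions another term."""
    for term, definition in glossary.items():
        for other in glossary:
            # 's?' crudely matches plurals ("data structures"); a real run
            # would use lemmatization instead
            if other != term and re.search(
                    r"\b" + re.escape(other) + r"s?\b", definition):
                yield term, other

for term, other in candidate_relationships(glossary):
    print(f"{term} -> {other}")
```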
the user then has the option of creating relationships with the identified candidate words. the user must indicate, for every relationship to be created, the restriction type that it represents, as well as existential or universal quantification or cardinality (minimum or maximum). after confirming this information, the program updates the file containing the ontology (owl or daml), adding the property to the class that represents the processed term. figure 9 shows an example of the definition of two properties and their application to the class highorderlanguage: a property express with existential quantification over the class datastructure, to indicate that a language must represent at least one data structure; and a property translateto of universal type, to indicate that any high-level language is translated into machine language (machinelanguage).

figure 9. example of ontology entry
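as a hedged sketch of the two restrictions figure 9 describes, again using rdflib (an illustration, not the authors' tooling; the namespace is hypothetical):

```python
from rdflib import BNode, Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

SE = Namespace("http://example.org/se#")  # hypothetical base uri
g = Graph()

def restrict(cls, prop, kind, target):
    """attach an anonymous owl restriction (some/all values) to a class."""
    r = BNode()
    g.add((r, RDF.type, OWL.Restriction))
    g.add((r, OWL.onProperty, prop))
    g.add((r, kind, target))
    g.add((cls, RDFS.subClassOf, r))

# a high order language must express at least one data structure
# (existential quantification) ...
restrict(SE.HighOrderLanguage, SE.express,
         OWL.someValuesFrom, SE.DataStructure)
# ... and anything it translates to is machine language (universal)
restrict(SE.HighOrderLanguage, SE.translateTo,
         OWL.allValuesFrom, SE.MachineLanguage)

print(g.serialize(format="turtle"))
```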
■■ results, conclusions, and future work

the existence of ontologies of specific knowledge domains (software engineering in this case) facilitates the process of finding resources about the discipline on the semantic web and in digital libraries, as well as the reuse of learning objects of the same domain stored in repositories available on the web.30 when a new resource is indexed in a library catalog, a new record that conforms to the ontology's conceptual data model may be included; it will be necessary to assign its properties according to the concept definition included in the ontology. the user may later execute semantic queries: the search system will traverse the ontology to identify the concept in which the user was interested and launch a wider query including the resources indexed under that concept. ontologies like the one that has been "evolved" here may also be used in an open way to index and search for resources on the web. in that case, however, semantic search engines, such as swoogle (http://swoogle.umbc.edu/), are required in place of traditional syntactic search engines, such as google.

the creation of a complete ontology of a knowledge domain is a complex task. in the case of the domain presented in this paper, that of software engineering, although there have been initiatives toward ontology creation that have yielded publications by renowned authors in the field,31 a complete ontology has yet to be created and published. this paper has described a process for developing a modest but complete ontology from a glossary of terminology, in both owl and daml+oil formats, which is ready to use in the semantic web. as described at the opening of this article, our aim has been to create a lightweight ontology as a first version, which will later be improved by including more axioms and relationships that increase its semantic expressiveness. we have tried to make this first version as tailored as possible to the initial glossary, knowing that later versions will be improved by others who might take on the work. such improvements will increase the ontology's utility, but will make it a less-faithful representation of the ieee glossary from which it was derived.

the ontology we have developed includes 1,521 classes that correspond to the same number of concepts represented in the ieee glossary. (included in this number are the different meanings that the glossary assigns to each term.) we defined 324 properties or relationships between these classes.
these are based on a semiautomated linguistic analysis of the glossary content (for example, allow, convert, execute, operatewith, produces, translate, transform, utilize, workin, etc.), which will be refined in future versions. the authors' aim is to use this ontology, which we have called ontoglose (ontology glossary software engineering), to unify the vocabulary. ontoglose will be used in a more ambitious project, whose purpose is the development of a complete ontology of software engineering from the swebok guide.32

although this paper has focused on this ontology, the method that has been described may be used to generate an ontology from any dictionary. the flexibility that owl permits for ontology description, along with its compatibility with other rdf-based metadata languages, makes possible interoperability between ontologies, and between ontologies and other controlled vocabularies, and allows for the building of merged representations of multiple knowledge domains. these representations may eventually be used in libraries and repositories to index and search for any kind of resource, not only those related to the original field.

■■ acknowledgments

this research is co-funded by the spanish ministry of industry, tourism and commerce profit program (grant tsi-020100-2008-23). the authors also want to acknowledge support from the tifyc research group at the university of alcala.

references and notes

1. m. dörr et al., state of the art in content standards (amsterdam: ontoweb consortium, 2001).

2. d. soergel, "the rise of ontologies or the reinvention of classification," journal of the american society for information science 50, no. 12 (1999): 1119–20; a. gilchrist, "thesauri, taxonomies and ontologies—an etymological note," journal of documentation 59, no. 1 (2003): 7–18.

3. b. j. wielinga et al., "from thesaurus to ontology," proceedings of the 1st international conference on knowledge capture (new york: acm, 2001): 194–201; j. qin and s. paling, "converting a controlled vocabulary into an ontology: the case of gem," information research 6 (2001): 2.

4. according to van heijst, schereiber, and wielinga, ontologies can be classified as terminological ontologies, information ontologies, and knowledge modeling ontologies; terminological ontologies specify the terms that are used to represent knowledge in the domain of discourse, and they are in use principally to unify vocabulary in a certain domain. g. van heijst, a. t. schereiber, and b. j. wielinga, "using explicit ontologies in kbs development," international journal of human & computer studies 46, no. 2/3 (1996): 183–292.

5. r. neches et al., "enabling technology for knowledge sharing," ai magazine 12, no. 3 (1991): 36–56.

6. o. corcho, f. fernández-lópez, and a. gómez-pérez, "methodologies, tools and languages for buildings ontologies. where is their meeting point?" data & knowledge engineering 46, no. 1 (2003): 41–64.

7. institute of electrical and electronics engineers (ieee), ieee std 610.12-1990(r2002): ieee standard glossary of software engineering terminology (reaffirmed 2002) (new york: ieee, 2002).

8. j. krause, "semantic heterogeneity: comparing new semantic web approaches with those of digital libraries," library review 57, no. 3 (2008): 235–48.

9. t. berners-lee, j. hendler, and o. lassila, "the semantic web," scientific american 284, no. 5 (2001): 34–43.

10. world wide web consortium (w3c), resource description framework (rdf): concepts and abstract syntax, w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-concepts/ (accessed oct. 5, 2009).

11. world wide web consortium (w3c), web ontology language (owl), 2004, http://www.w3.org/2004/owl (accessed oct. 5, 2009).
12. world wide web consortium (w3c), skos simple knowledge organization system, 2009, http://www.w3.org/tr/2009/rec-skos-reference-20090818/ (accessed oct. 5, 2009).

13. m. m. yee, "can bibliographic data be put directly onto the semantic web?" information technology & libraries 28, no. 2 (2009): 55–80.

14. l. f. spiteri, "the structure and form of folksonomy tags: the road to the public library catalog," information technology & libraries 26, no. 3 (2007): 13–25.

15. corcho, fernández-lópez, and gómez-pérez, "methodologies, tools and languages for buildings ontologies."

16. ieee, ieee std 610.12-1990(r2002).

17. n. f. noy and d. l. mcguinness, "ontology development 101: a guide to creating your first ontology," 2001, stanford university, http://www-ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness.pdf (accessed sept. 10, 2010).

18. f. baader et al., the description logic handbook (cambridge: cambridge univ. pr., 2003).

19. world wide web consortium, daml+oil reference description, 2001, http://www.w3.org/tr/daml+oil-reference (accessed oct. 5, 2009); w3c, owl.

20. w3c, skos; object management group, xml metadata interchange (xmi), 2003, http://www.omg.org/technology/documents/formal/xmi.htm (accessed oct. 5, 2009).

21. uml (unified modeling language) is a standardized general-purpose modeling language (http://www.uml.org). nowadays, different uml plugins for ontology editors exist. these plugins allow working with uml graphic models. also, it is possible to create the uml models with a case tool, to export them to xml format, and to transform them to the ontology format (for example, owl) using an xslt sheet, such as the one published in d. gasevic, "umltoowl: converter from uml to owl," http://www.sfu.ca/~dgasevic/projects/umltoowl/ (accessed oct. 5, 2009).

22. gilchrist, "thesauri, taxonomies and ontologies."

23. soergel, "the rise of ontologies or the reinvention of classification."

24. j. m. morales-del-castillo et al., "a semantic model of selective dissemination of information for digital libraries," information technology & libraries 28, no. 1 (2009): 22–31.

25. international standards organization, iso 2788:1986 documentation—guidelines for the establishment and development of monolingual thesauri (geneve: international standards organization, 1986).

26. b. m. matthews, k. miller, and m. d. wilson, "a thesaurus interchange format in rdf," 2002, http://www.w3c.rl.ac.uk/swad/thes_links.htm (accessed feb. 10, 2009).

27. m. hall, "call thesaurus ontology in daml," dynamics research corporation, 2001, http://orlando.drc.com/daml/ontology/call-thesaurus (accessed oct. 5, 2009).
tails wagging dogs

a funny thing happened on the way to the form. in the past decade, many libraries believed they were developing or using automated systems to produce catalog cards, or order slips, or circulation control records. the trauma of aacr2 implementation has helped many to realize belatedly that they have, in fact, been building data bases. libraries must relate their own machine-readable records to each other in a new way as they face new applications. further methods of relating and using records from different libraries, and even different networks, are becoming necessities in our increasingly interdependent world.

a narrow view of the process of creating records has often resulted in the introduction of nonstandard practices that provide the required immediate result but create garbage in the data base, in effect letting the tails wag the dogs. for many years, john kountz and the tesla (technical standards for library automation) committee addressed this issue forcefully, but were as voices in the wilderness.

the problems created are the problems of success. the expectations libraries have developed have outstripped their practices. many libraries are only now seriously addressing the practices they have used to create data bases that already contain hundreds of thousands of records. precisely because of its success, the oclc system is a useful case in point. in general, oclc has adhered closely to marc standards. in call number and holding fields, national standards have been late in coming, and libraries have often improvised. meeting the procrustean needs of catalog cards has ofttimes blinded libraries to the long-term effects of their practices. multiple subfield codes to force call number "stacking" and omission of periods from lc call numbers are two examples of card-driven practice. not following the recommended oclc practice of fully updating the record at each use has created archive tapes requiring significant manual effort to properly reflect library holdings. variant branch cataloging practices create dilemmas. some malpractices have resulted from attempts to beat pricing algorithms. some, like retaining extraneous fields or accepting default options when they are incorrect, merely reflect laziness or shortsighted procedures.

while implementing systems in the present, libraries must keep a weather eye to the future. what new requirements will future systems place on records being created today? brian aveney
graphs in libraries: a primer

james e. powell, daniel alcazar, matthew hopkins, robert olendorf, tamara m. mcmahon, amber wu, and linn collins

james e. powell (jepowell@lanl.gov) is research technologist, daniel a. alcazar (dalcazar@lanl.gov) is professional librarian, matthew hopkins (mfhop@lanl.gov) is library professional, tamara m. mcmahon (tmcmahon@lanl.gov) is library technology professional, amber wu (amber.ponichtera@gmail.com) is graduate research assistant, and linn collins (linn@lanl.gov) is technical project manager, los alamos national laboratory, los alamos, new mexico. robert olendorf (olendorf@unm.edu) is data librarian for science and engineering, university of new mexico libraries, albuquerque, new mexico.

whenever librarians use semantic web services and standards for representing data, they also generate graphs, whether they intend to or not. graphs are a new data model for libraries and librarians, and they present new opportunities for library services. in this paper we introduce graph theory and explore its real and potential applications in the context of digital libraries. part 1 describes basic concepts in graph theory and how graph theory has been applied by information retrieval systems such as google. part 2 discusses practical applications of graph theory in digital library environments. some of the applications have been prototyped at the los alamos national laboratory research library, others have been described in peer-reviewed journals, and still others are speculative in nature. the paper is intended to serve as a high-level tutorial to graphs in libraries.

part 1. introduction to graph theory

complexity surrounds us, and in the twenty-first century, our attempts at organization and structure sometimes lead to more complexity. in layman's terms, complexity refers to problems and objects that have many distinct but interrelated issues or components. there also is an interdisciplinary field referred to as "complex systems," which investigates emergent properties, such as collective intelligence.1 emergent properties are an embodiment of the old adage "the whole is greater than the sum of its parts." these are behaviors or characteristics of a system "where the parts don't give a real sense of the whole."2 libraries reside at the nexus of these two definitions: they are creators and caretakers of complex data sets (metadata), and they are the source of explicit records of the complex and evolving intellectual and social relationships underlying the evolution of knowledge. digital libraries are complex systems.

patrons visit libraries hoping to find some order in complexity or to discover a path to new knowledge. instead, they become the integration point for a complex set of systems as they juggle resource discovery by interacting with multiple systems, either overtly or via federated search, and by contending with multiple vendor sites to retrieve articles of interest. contrast this with google's simple approach to content discovery: a user enters a few terms in a single box, and google returns a large list of results spanning the internet, placing the most relevant results at the top of this list. no one would suggest using google for all research needs, but its simplicity and recognized ability to answer routine searches is compelling. how, we wonder, can we bring a bit of google to the library world?

google harvests vast quantities of data from the web. this data aggregation is obviously complex. how does google make sense of it all so that it can offer searchers the most relevant results? answering this question requires understanding what google is doing, which requires a working knowledge of graph theory. we can then apply these lessons to library systems, make sense of voluminous bibliometric data, and give researchers tools that are as effective for them as google is for web surfers. just as web surfers want to know which sites are most relevant, researchers want to know which of the relevant results are the most reliable, the most influential, and of the highest quality. can quantitative metrics help answer these qualitative questions? the more deeply libraries and librarians can mine relationships between articles and authors and between subjects and institutions, the more reliable are their metrics. suppose some librarians want to compare the relative influence of two authors. they might first look at the authors' respective number of publications. but are those papers of equally high quality? they might next count all citations to those papers. but are the citing articles of high quality? deeper still, they might assign different weights to each citing article using its own number of citations. at each step, whether realizing it or not, they are applying graph theory. with deeper knowledge of this subject, librarians can embrace complexity and harness it for research tools of powerful simplicity.

■■ pagerank and the global giant graph

indexing the web is a massive challenge. the internet is a network of computer hardware resources so complex that no one really knows exactly how it is structured. in fact, researchers have resorted to conducting experiments to discern the structure and size of the internet and its potential vulnerability to attacks. representations of the data collected by these experiments are based on network science, also known as graph theory. this is not the same network that ties all the computers on the internet together, though at first glance it is a similar idea. network science is a technique for representing the relationships between components of a complex system.3 it uses graphs, which consist of nodes and edges, to represent these sets of relationships. generally speaking, a node is an actor or object of some sort, and an edge is a relationship or property.
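to make the node-and-edge vocabulary concrete, here is a minimal sketch in python using the open-source networkx library; the library choice and the node names are our own illustration, since the article names no particular software:

import networkx as nx

# a tiny undirected graph: nodes are actors, edges are relationships
g = nx.Graph()
g.add_edge("alice", "bob")    # a friendship edge between two person nodes
g.add_edge("alice", "carol")
g.add_edge("bob", "carol")

print(g.number_of_nodes(), g.number_of_edges())  # prints: 3 3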
in the case of the web, universal resource locators (urls) can be thought of as nodes, and connections between pages can be thought of as links or edges. this may sound familiar because the semantic web is largely built around the idea of graphs, where each pair of nodes with a connecting edge is referred to as a triple. in fact, tim berners-lee refers to the semantic web as the global giant graph—a place where statements of facts about things are published online and distinctly addressable, just as webpages are today.4 the semantic web differs from the traditional web in its use of ontologies that place meaning on the links and in the expectation that nodes are represented by universal resource identifiers (uris) or by literal (string, integer, etc.) values, as shown in figure 1, where the links in a web graph have meaning in the semantic web. semantic web data are a form of graph, so graph analysis techniques can be applied to semantic graphs, just as they are applied to representations of other complex systems, such as social networks, cellular metabolic networks, and ecological food webs.

figure 1. a traditional web graph is compared to a corresponding semantic web graph. notice that replacing traditional web links with semantic links facilitates a deeper understanding of how the resources are related.

herein lies the secret behind google's success: google builds a graph representation of the data it collects. these graphs play a large role in determining what users see in response to any given query. google uses a graph analysis technique called eigenvector centrality.5 in essence, google calculates the relative importance of a given webpage as a function of the importance of the pages that point to it. a simpler centrality measure is called degree centrality. degree centrality is simply a count of the number of edges a given node has. in a social network, degree centrality might tell you how many friends a given person has. if a person has edges representing friendship that connect him to seventeen other nodes, representing other people in the network, then his degree value is seventeen (see figure 2). if a person with seventeen friends has more friendship edges than any other person in the network, then he has the highest degree centrality for that network.

figure 2. friendship network

eigenvector centrality expands on degree centrality. consider a social network that represents the amount of influence a person has in a business context. if we want to analyze this aspect of the network, then it makes sense to consider the fact that some relationships are more influential than others. for example, a relationship with the president of the company is more significant than a relationship with a coworker, since it is a safe assumption that a direct relationship with the company leader will increase influence. so we assign weights to the edges based on who the edge connects to. google does something similar. all the webpages they track have centrality values, but google's weighting algorithm takes into account the relative importance of the pages that connect to a given resource. the weighting algorithm bases importance on the number of links pointing to a page, not the page's internal content, which makes it difficult for website authors to manipulate the system and climb the results ladder. so if a given webpage has only two edges, it may still rank higher than a more connected page if one of the pages that links to it has a large number of pages pointing to it (see figure 3). this weighted degree centrality measure is eigenvector centrality, and a higher eigenvector centrality score causes a page to show up closer to the top of a google results set. the user never sees a graph, but this graph-based approach to exploring a complex system (the web) works quite well for routine web searches.

figure 3. node 2 ranks higher than node 1 because node 3, which connects to node 2, has more incoming links than node 1. node 3 is deemed more important than node 9, which has no incoming links.
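both measures just described reduce to one-line computations. a minimal sketch, again assuming networkx and an invented link graph; eigenvector centrality weights a node's neighbors by their own importance, while degree centrality only counts edges:

import networkx as nx

# an invented web-like graph; "hub" is linked by several pages
web = nx.Graph()
web.add_edges_from([("hub", "p1"), ("hub", "p2"), ("hub", "p3"),
                    ("hub", "q"), ("q", "r")])

print(nx.degree_centrality(web))       # normalized count of edges per node
print(nx.eigenvector_centrality(web))  # "r" gains rank from its neighbor "q", which touches "hub"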
■■ graph theory

graph theory, also known as network science, has evolved tremendously in the last decade. for example, information scientists have discovered hubs in the web that connect large numbers of pages and, if removed, disconnect large portions of the network.6 biologists have begun to explore cellular processes, such as metabolism, by modeling these processes as networks and have even found in these networks evidence for the evolution of metabolic processes.7 chemists have used networks to model reactions in a step-wise fashion by "editing" graphs representing models of molecules and their reactivity,8 and they also have used graphs to better comprehend phase transition states, such as the freezing of water or the emergence of superconductivity when a material is cooled.9 economists have used graphs to model market trades and the effects of globalization.10 infectious disease specialists have used networks to model the spread of disease and to evaluate prospective vaccination plans.11 sociologists have modeled the complex interactions of people in communities.12 and in libraries, computer scientists have explored citation networks and coauthorship networks,13 and they have developed maps of science that integrate scientific papers, their topics, the journals in which they appear, and consumers' usage patterns to provide a new view of the pursuit of science.14

network science can make complexity more comprehensible by representing a subset of actors and relationships in a complex system as a graph. these graphs can then be explored visually and mathematically. graphs can be used to represent systems as they are, to extract subsets of these systems, or to discover wholly artificial collections of relationships between components of a speculative system. data also can be represented as graphs when they consist of "measurements that are either of or from a system conceptualized as a network."15 in short, graphs offer a continuum of techniques for comprehending complexity and are suitable either for a layman with casual interest in a topic or a serious researcher ferreting out discrete details. at the core of network science is the graph.

as stated earlier, a graph is a collection of nodes and the edges that connect some of those nodes, together representing a set of actors and relationships in a type of system. relationships can be unidirectional (e.g., in a social network, when the information flows from one person to another) or bidirectional (e.g., when the information flows back and forth between two individuals). relationships also can vary in significance and can be assigned a weight—for example, a person's relationship to his or her supervisor might be weighted more heavily than a person's relationship to his or her peers. a graph can consist of a single type of node (for subjects) and a single type of edge connecting those nodes (for predicates). these are called unipartite graphs. from the standpoint of graph theory, these are the easiest types of graphs to work with. graphs that represent two relationships (bipartite) or more are typically reduced to unipartite graphs in the process of exploring them because the vast majority of techniques for evaluating graphs were developed for graphs that address a single relationship between a set of nodes.

■■ global properties of graphs

there are other aspects of graphs to consider, sometimes referred to as "global graph properties."16 there are two basic classes of networks: homogeneous networks and inhomogeneous networks.17 these graphs exhibit characteristics that may not be comprehensible by close examination (e.g., by examining degree centrality, node clustering, or paths within a graph)18 but may be apparent, depending on the size and the way in which the graph is rendered, merely by looking at a visualization of the graph.

in homogeneous graphs, nodes have no significant difference between their number of connections. examples include random graphs, complete graphs, and small world networks. in random graphs there is an equal probability that any two nodes will be connected (see figure 4), while in complete graphs (see figure 5) all nodes are connected with one another. random graphs are often used as tools to evaluate networks that describe real systems. complete graphs might occur in social networks as subgraphs, e.g., in the case where a person has two friends who are also mutual friends. small world networks have numerous highly interconnected subgroups called clusters. these may be distributed throughout the network in a regular fashion, with a few random connections that connect the otherwise disconnected clusters. these random links have the effect of greatly reducing the path length between any two nodes and explain the oft-cited six degrees of separation that connect all people to one another. in social networks, agency is often described as the mechanism by which these random links get established. agency refers to the idea that multiple, often unpredictable actions on the part of individuals in a network result in unanticipated connections between people. examples of such actions are hobbies, past work experience, meeting someone new while on a trip to another country—pretty much anything that takes a person outside his or her normal social circles.

in the case of inhomogeneous graphs, not all nodes are created equal. one type, scale-free networks, is common in a variety of systems ranging from biological to technological (see figure 6). these exhibit a structure in which a few nodes play a central role in connecting many others. these hubs form as a result of preferential attachment, known colloquially as "the rich get richer." researchers became aware of scale-free networks as a result of analysis of the web when it was in its infancy. scale-free networks have been documented in biology, social networks, and technological networks. as a result, they are quite important in the field of information science. small world and scale-free networks are typical of complex systems that occur in nature or evolve because of emergent dynamic processes, in which a system self-organizes over time. small world networks provide fast, reliable communication between nodes, while scale-free networks are more fault tolerant, making them ideal for systems such as living cells, which are frequently challenged by the external environment.19

figure 4. a random graph. any given node has an equal probability of being linked to any other node

figure 5. a complete graph. all nodes are connected to all other nodes

figure 6. example of a scale-free coauthorship network. a few nodes have many links, and most nodes have few or a single link to another node
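these classes of graphs can be generated synthetically for experimentation. a minimal sketch using networkx's standard generators, with sizes chosen arbitrarily for illustration:

import networkx as nx

random_g = nx.erdos_renyi_graph(100, 0.05)            # random: equal edge probability
complete_g = nx.complete_graph(10)                    # complete: every node linked to every other
small_world_g = nx.connected_watts_strogatz_graph(100, 4, 0.1)  # clustered, a few random shortcuts
scale_free_g = nx.barabasi_albert_graph(100, 2)       # preferential attachment produces hubs

# the shortcuts keep paths short despite heavy local clustering
print(nx.average_shortest_path_length(small_world_g))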
■■ local properties of graphs

below the ten-thousand-foot system-level view of networks, graphs can be scrutinized more closely using many other techniques. we will now consider four broad categories of local characteristics that describe networks and how they are, or could be, applied in digital libraries: node centrality measures, paths between nodes, motifs, and clustering.

centrality measures make it possible to determine the importance of a given node in a network. degree centrality, in its simplest form, is simply a count of the number of edges connected to any given node in a network: a node with high degree centrality has many connections to other nodes compared to a typical node in the graph.

paths make it possible to explore the connections between nodes. an author who is two degrees removed from another author—in other words, the friend of a friend—has a path length of 2. researchers are often interested specifically in the shortest path between a given pair of nodes. many other types of paths can be explored depending on the type of network, but in libraries, paths that describe the flow of ideas or communication between people are most likely to be useful.

motifs are the fundamental recurring structures that make up the larger graph, and they often are called the building blocks of networks.20 a three-node feedback motif is a set of nodes where the edges between them form a triangle and the edges are directional. in other words, node a is connected to (and might convey some information to) node b; node b, in turn, has the same relationship with node c; and node c is connected to (and conveys information back to) node a. in digital libraries, for example, if similar papers exhibit the same pattern of connectivity to a group of subject or keyword categories, motifs will make it possible to readily identify the topical overlap between them.

collections of nodes that have a high degree of connectivity with each other are called clusters.21 in many complex systems, clusters are formed by preferential attachment. a group of highly clustered nodes that have low connectivity to the larger graph is known as a clique.

while there are other aspects of graphs that can be explored, these four—node centrality measures, paths between nodes, motifs, and clustering—are accessible to most users and are significant in graphs representing bibliographic metadata and textual content. this will become clearer in the examples that follow.

■■ quantitative evaluation of graphs

returning now to centrality measures, two of particular interest in digital libraries are degree centrality and betweenness centrality (or flow centrality). an interesting aspect of graphs is that, regardless of the data being represented, centrality measures and clustering characteristics often reveal important clues about the system that the data describes, whether it's coauthorship relationships or protein interactions in the cell of a living organism. often the clusters or nodes that exhibit a higher score in some centrality calculation are significant in some meaningful way compared to other nodes. recall that degree centrality refers to how many edges a given node has. degree centrality can vary significantly in strength depending on the relationships that are represented in the graph. consider a graph of citations between papers. while it may be obvious to humans that the most highly cited papers will have the highest degree centrality, computers have no idea what this means. it is still up to humans to lend a degree of comprehensibility to the raw calculation: in other words, to understand that a paper with high degree centrality is an important paper, at least among the papers the graph represents.

betweenness centrality exposes how integral a given node is to a network. basically, without getting into the mathematics, it measures how often a node falls on the shortest path between other nodes. thus, nodes with high betweenness centrality do not necessarily have a lot of edges, but they bridge disparate clusters. in an informational network, the nodes with high betweenness centrality are crucial to information flow, social connections, or collaborations. hubs are examples of nodes with high betweenness centrality. the removal of a hub causes large portions of a network to become detached. in figure 7, the node labeled "folkner, w.m." exhibits high betweenness centrality, since it connects two clusters together.

figure 7. paths in a coauthorship network
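the bridging behavior shown in figure 7 can be reproduced in a few lines. a minimal sketch with invented author names; the node that joins the two clusters scores highest on betweenness centrality:

import networkx as nx

g = nx.Graph()
g.add_edges_from([("a1", "a2"), ("a2", "a3"), ("a1", "a3"),  # first cluster
                  ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),  # second cluster
                  ("a3", "bridge"), ("bridge", "b1")])       # the connecting node

bc = nx.betweenness_centrality(g)
print(max(bc, key=bc.get))  # prints: bridge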
a cluster coefficient expresses whether a given node in a network is a member of a tightly interlinked collection of nodes, or clique. the cluster coefficient of an entire graph reveals the overall tendency for clustering in a graph, with higher cluster coefficients typical of small world graphs. in other types of graphs, clusters sometimes manifest as homophily; that is, nodes of a given type are highly interconnected with one another and have few connections with nodes of other types. in social networks, this is sometimes referred to as the "birds of a feather" effect. in a more current reference, the effect was explored as a function of the likelihood that someone would "unfriend" an acquaintance on the social networking site facebook.22 in some networks (such as the internet), clusters are connected by hubs, while in others, the hub is the primary connecting node of other nodes.

paths refer to the edges that connect nodes. the simplest case of a path is an edge that connects two nodes directly. path analysis addresses the set of edges that connect two nodes that are not, themselves, directly connected. the shortest path, as its name implies, refers to the route that uses the least number of edges to connect from node a to node b; it measures the number of edges, not the linear distance. walks and paths refer to a list of nodes between two nodes, with walks allowing repeat visits to nodes, and paths not allowing them. cycles refer to a path that connects a node through other nodes back to itself.

within graph visualization tools, the placement of nodes can vary from one layout to another. what matters is not the pictorial representation (though this can be useful), but the underlying relationships between nodes (the topology). along with clustering, paths help differentiate motifs, which are considered to be building blocks of some types of complex networks. since bibliographic metadata represents communication in one form or another, it is often most common to apply social network theory to graphs. but it is also possible to apply various centrality measures to graphs that are not social and to use these to discover significant nodes within those graphs. in part 2 we consider various unipartite and bipartite graphs that might be especially useful for examining digital library metadata.
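before turning to applications, note that the cluster coefficient and shortest-path measures described above are each a one-line computation. a minimal sketch on an invented graph, again assuming networkx:

import networkx as nx

g = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

print(nx.clustering(g, "a"))                 # 1.0: both of a's neighbors are themselves linked
print(nx.average_clustering(g))              # the whole graph's tendency to cluster
print(nx.shortest_path_length(g, "a", "d"))  # 2 edges: a -> c -> d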
part 2. graph theory applications in digital libraries

library systems, by virtue of the content they contain, are complex systems. fielded searches, faceted searches, and full-text searches all allow users to access aspects of the complex system. fielded searches leverage the explicit structure that has been encoded into the metadata describing the resources that users are ultimately trying to find (articles, books, etc.). full-text searches enable users to explore in a more free-form manner, subject of course to the availability of searchable text. often, full-text search means the user is searching titles, abstracts, and other content that summarizes a resource, rather than the actual full text of articles and books. even if the user is searching the full content, there are relationships and aspects of the content that are not readily discernible through a full-text search. furthermore, there is not one single, comprehensive digital library—many library systems live in the deep web, that is, they are databases that are not indexed by search engines like google, and so users must search each individually. but if more of these systems adopted semantic web standards, they could be explored as graphs, and relationships between different databases would be easier to discern and represent to the user.

many libraries have tried to emulate google by incorporating federated search engines with a single search box as an interface. this copies the form of google's search engine but not its underlying power. to do that, libraries must enhance full-text searches by drawing on relationships. a full-text search will (hopefully) find relevant papers on a given topic, but a researcher often wants to find the best papers on that topic. to meet that need, libraries must harness the information contained in relationships; otherwise each paper is stuck in a vacuum. cited references are one way to connect papers. for researchers and librarians alike, this is a familiar metric for assessing a paper's relative importance. the web of science and scopus are two databases that perform this function. looked at another way, citation counts are nothing more than degree centrality applied to a simple graph in which papers are nodes and references are edges. thus, in the framework of graph theory, citation analysis is just a small sliver of a world of possible relationships, many of which are unexplored. the following examples outline use case scenarios in which graph techniques are or could be applied to library data, such as bibliographic metadata, to help users find relationships and conduct research.
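the observation that a citation count is just degree centrality, while pagerank-style measures also weigh who is doing the citing, can be shown in a few lines; the paper names here are invented:

import networkx as nx

cites = nx.DiGraph()  # a directed edge x -> y means "x cites y"
cites.add_edges_from([("b", "a"), ("c", "a"),               # paper a is cited twice
                      ("d", "b"), ("e", "b"), ("f", "b")])  # paper b is cited three times

print(dict(cites.in_degree()))  # plain citation counts: in-degree on this graph
print(nx.pagerank(cites))       # paper a still scores well: its citer b is itself well cited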
■■ informational graphs intrinsic to digital library systems

there are multiple relationships represented within and between metadata contained in library systems that can be represented as graphs and explored using graph techniques. some of these, such as citation networks, are among the most well-studied informational networks. citation networks are valued because the data describing them is readily accessible and because scientists studying classes of networks have used them as surrogates for exploring scale-free networks. they are often evaluated as static networks (i.e., a snapshot in time) but some also have dynamic characteristics (e.g., they change and grow over time or they allow information-flow analysis). techniques such as pagerank can be used to evaluate information when the importance of a linking resource is as important as the number of links to a resource. multirelational networks can be developed to explore dynamic processes in research fields by using library data to provide the basic topological framework for some of the explorations.

coauthorship (collaboration) networks

coauthorship (collaboration) networks are typically small world networks in which cross- and interdisciplinary work provides the random links that connect various clusters (see figure 8). these graphs can be explored to determine which researchers are having the most influence in a given field; influence is a function of frequency of authorship. a prime example is the collaboration network graph for paul erdős, a highly productive mathematician. the popularity of his influence in academia has led to the creation of the erdős number, which is "defined as indicating the topological distance in the graph depicting the co-authorship relations."23 liu et al. proposed a node analysis measure that they called authorrank, which establishes weighted directed edges between authors. the author's authorrank value is a sum of the weighted edges connected to that author.24 these networks also can be used to explore how an idea spreads and what opportunities may exist for future collaborations, as well as many other existing and potential relationships.

figure 8. a coauthorship network

citation graphs

citation graphs more strongly resemble scale-free networks, in which early papers in a given field tend to accumulate more links. such hub papers can be cited hundreds or even thousands of times while most papers are cited far less often or not at all. many researchers have explored citation graphs, though the person often credited with first noting the network characteristics of citation patterns was derek j. de solla price in 1965.25 more recently, mark newman introduced the concept of what he calls "first mover advantage" to describe the preferential attachment observed in citation networks.26

subject–author (expertise) graphs

graphs that connect authors by subject areas can vary because of the granularity of subject headings (see figure 9). high-level subject headings tend to function as hubs, but more useful relationships are revealed by specific subject headings and author-provided keywords. the map of science merges publications and citations with actual end user usage patterns captured in library systems and deals, in part, with categories of scientific research.27 it clusters publications and visualizes them "as a journal network that outlines the relationships between various scientific domains." implicit in this model is the relationship of authors to subject areas.

figure 9. a subject–author graph for stephen hawking

institution–topic and nation–topic (expertise) graphs

from a commercial or geopolitical perspective, graphs that represent institutional or national expertise can reveal valuable information for scientists, elected officials, and investors, particularly in networks that represent the change in a given organization or region's contributions to a field over time. metadata for scientific papers typically includes enough information to generate nodes and edges describing this. the resulting graph can reveal unexpected details, such as national or institutional efforts to nurture expertise in a given field, and the results of those efforts. the visualization of this data may take the form of icons that vary in shape and size depending on various aspects of nodes in the institution–topic network. these visual representations can then be overlaid onto a map, with various visual aspects of the icons also affected by centrality measures applied to a given institution's contributions.28

■■ graphs as tools

graph representations can be used as tools to explore a variety of complex systems. even systems that do not initially appear to manifest networks of relationships can often be better understood when some aspect of the system is represented as a graph. this approach requires thinking about what aspects of information needs, discovery, or consumption might be represented or evaluated using networks. two interesting examples from other fields will illustrate the point. a 2009 paper in acta astronautica proposed that techniques to reduce the amount of space junk in orbit around the earth could be evaluated using graph theory techniques.29 the authors propose a dynamic multirelational network with three types of nodes: one to represent individual pieces of debris, a second to represent collections of debris that are the original object that the debris is a fragment of, and a third to represent conjunction events (near misses) between objects. another example of graphs being used as tools is the case of developing vaccination strategies to curtail the spread of an infectious disease.30 in this case, networks have been used to determine that one of the best strategies for curtailing the transmission of a disease is to identify and vaccinate hubs, rather than to conduct mass vaccination campaigns. in libraries, graphs as tools could be used to help researchers identify collaboration opportunities, to disambiguate author identities and aggregate related materials, to allow library staff to evaluate the academic contribution of a group of researchers (bibliometrics), and to explore geospatial and temporal aspects of information, including changes in research focus over time.

graphs for author name disambiguation

author name disambiguation is a long-standing problem in libraries. many resources have been devoted to manual and automatic name authority control, yet the problem remains unsolved. projects such as oclc viaf and efforts to establish unique author identifiers will no doubt improve the situation, but many problems remain.31 meanwhile, we have experimented with an approach to author name matching by generating multirelational graphs. authors are the primary nodes of interest, but relationships such as topic areas, titles, dates, and even soundex representations of names also are represented. as one would expect, phonetically similar names cluster around particular soundex representations. shared coauthorship patterns and shared topic areas can reveal that two different names are for the same author as, for example, when a person's name changes after marriage (see figure 10).

figure 10. two authors with similar names linked by subject nodes

graphs for title or citation deduplication

string edit distance involves counting the number of changes that would need to be made to one string to convert it to another, and it is one of the most common approaches to deduplicating titles, citations, and author names. multirelational graphs, in which titles are linked to authors, publication dates, and subjects, result in subgraphs that appear virtually identical when two title variants are represented. centrality measures can be applied to unipartite subgraphs of these networks to home in on areas where data duplication may exist.

temporal-topic graphs for analyzing the evolution of knowledge over time

a particularly active area of research in graph theory is the representation of dynamical systems as networks. a dynamical system is described as a complex system that changes over time.32 computer scientists have developed various strategies and technologies to cope with computation over time because it is typically so important to understanding data. allen's temporal intervals address machine reasoning over disparate means of recording the temporal aspects of events.33 another temporal computing concept that has applicability to graphs is from the memento project, which makes it possible for users to view prior versions of webpages.34 entities in the memento ontology can become predicates in triples, which in turn can become edges in graphs. using graphs, time can be represented as a relationship between objects or as a distinct object within a graph. nodes that connect through a temporal node may overlap, coincide, or co-occur. nodes that cluster around time represent something important about the objects.

genomic-document and protein-document networks

many people hoped that mapping the human genome would result in countless medical advances, but the process whereby genes manifest themselves in living organisms turned out to be much more complex—there wasn't just a simple mapping between genes and organism traits, there were other processes controlled by genes representing additional layers of complexity scientists had not anticipated. today biologists apply network science to these processes to reveal the missing pieces of this puzzle.35 just as the process itself is complex, the information needs of these researchers benefit from a more sophisticated approach. biologists often need to find papers that reference a given gene or protein sequence. and so, representing these relationships (e.g., article–gene) as graphs has the added benefit of making the digital library research data compatible with the methods that biologists already use to document what they know about these processes. although this is a specialized type of graph, a similar approach might be valuable to researchers in a number of scientific disciplines, including materials science, astrophysics, and environmental sciences.
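an article–gene network of the kind just described is a bipartite graph. a minimal sketch with invented identifiers, which also projects the graph down to a unipartite gene–gene graph of the sort discussed in part 1:

import networkx as nx
from networkx.algorithms import bipartite

b = nx.Graph()
b.add_edges_from([("article1", "brca1"), ("article1", "tp53"),  # articles link to the
                  ("article2", "tp53")])                        # genes they mention

genes = bipartite.projected_graph(b, ["brca1", "tp53"])  # genes co-mentioned in an article
print(list(genes.edges()))  # [('brca1', 'tp53')], via article1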
graphs of omission

one of the less obvious capabilities of network science is to make predictions about complex systems by looking for missing nodes in graphs.36 this has many applications: for example, identifying a hub in the metabolic processes of bacteria can yield new targets for antibiotics, but it is vital to know that interrupting the enzyme that serves as that hub will effectively kill the organism. making predictions about the evolution of research by identifying areas for cross-disciplinary collaboration or areas where little work has been done—enabling a researcher to leverage the first-mover advantage and thus advance his or her career—is a valuable service that libraries are well positioned to provide (see figure 11). machine-supplied suggestions offer another type of prediction. for example, providing the prompt "did you mean john smith and climate change?" can leverage real or predicted relationships between author and subject (see figure 12). graphs, in turn, can be used to create tools that will simplify an author–subject search.

figure 11. identifying areas for collaboration: a co-author graph with many simple motifs and few clusters might indicate a field ripe for collaboration

figure 12. find similar articles: a search for hv reynolds might prompt a suggestion for sd miller, who has a similar authorship pattern

viral concept detection

phase transition typically refers to a process in thermodynamics that describes the point at which a material changes from one state of matter to another (e.g., liquid to solid). phase transition also applies to the dispersal of a new idea. interestingly enough, graphs representing matter at the point of phase transition, and graphs representing the spread of a fad in a social network, exhibit the same recognizable pattern of change: suddenly there are links between many more nodes, there's a dramatic increase in clustering, and something called a giant component emerges.37 in a giant component, all of the nodes in that portion of the graph are interlinked, resulting in a complete graph like figure 5. this is not so different from what one observes when something "goes viral" on the internet. in a library, a dynamic graph showing the usage of new keywords for emerging subject areas would likely reflect a similar pattern.

■■ linked data graph examples

cross-collection graphs, or graphs that link data under your control to data published online, can be constructed by building links into the web of linked data.38 linked data refers to semantic graphs of statements that various organizations publish on the web. for example, geonames.org publishes millions of statements about geographic locations on the linked data web.39 as these graphs grow and evolve, opportunities emerge for using this data in combination with your own data in various ways. for example, it would be quite interesting to develop a network representation of library subject headings and their relationships to concepts in the encyclopedic linked data collection known as dbpedia.40 the resulting graph could be used in a variety of ways: for example, to evaluate the consistency of statements made about concepts, to establish semantic links between user-provided tags and concepts,41 or to function as the basis for an on-the-fly search expansion tool. a query-suggestion tool might look at user-entered terms and determine that some are hubs, then suggest related terms from nodes that connect to those hub nodes. remember, graphs need not be visible to be useful!

global subject resolution using dbpedia

although dbpedia appears to lag behind wikipedia in terms of completeness and scrutiny by domain experts, it offers one mechanism for unifying user-provided tags, author keywords, and library-assigned subject headings with a graph of known facts about a topic. links into and out of dbpedia's graphs on a given topic would enable serendipitous knowledge discovery through browsing these semantic graphs.

viaf linked author data

oclc's effort to convert tens of millions of identity records into graphs describing various attributes of authors promises to enhance exploration of digital library content on the author dimension.42 these authority records contain a wealth of information, linking name variations, basic genealogical data such as birth and death dates, associations with institutions, subject areas, and titles published by authors. although some rough edges need to be smoothed (one of the authors of this paper discovered that his own authorship data was linked with another author of the same name), iterative refinement of this data as it is actually used may enable crowd-sourced quality control that will more rapidly identify and resolve these problems.
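a minimal sketch of a cross-collection graph like those above, storing triples as predicate-labeled edges; the identifiers are illustrative placeholders, not actual dbpedia or lcsh uris:

import networkx as nx

triples = nx.MultiDiGraph()  # each (subject, predicate, object) triple becomes a labeled edge
triples.add_edge("local:record42", "lcsh:graph_theory", predicate="dc:subject")
triples.add_edge("lcsh:graph_theory", "dbpedia:Graph_theory", predicate="skos:closeMatch")

for s, o, data in triples.edges(data=True):
    print(s, data["predicate"], o)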
linked geographic data using geonames it is ironic that the use of networks to describe geographic aspects of the world is in its infancy, considering that many consider leonhard euler’s attempt to find a mathematical solution to the seven bridges of königsberg problem in 1735 to be the birth of the field.43 as some authors have pointed out, geometric evaluation of geographic relationships is actually a poor way to explore geographic relationships.44 graphs can be used to express arbitrary relationships between geographically separated objects, and it is perhaps no accident that our road and railway systems are in fact among the most familiar graphs that people encounter in the real word. a subway map is a graph where subway stations are nodes linked by railway. graphs can represent the relationships between topological features, the visibility of buildings in a city to one another, or the land, sea, and air transportation that links one country to another. geonames supplies a rich collection of geographic information that includes descriptions of geopolitical entities (cities, states, countries), geophysical features, and various names that have been ascribed to these objects. the geographic relationships in intellectual figure 12. find similar articles: a search for hv reynolds might prompt a suggestion for sd miller, who has a similar authorship pattern 168 information technology and libraries | december 2011 nov. 21, 2007, timbl’s blog, http://dig.csail.mit.edu/bread crumbs/node/215. 5. lawrence page et al., the pagerank citation ranking: bringing order to the web (1999), http://citeseerx.ist.psu.edu/ viewdoc/summary?doi=10.1.1.31.1768. 6. duncan s. callaway et al., “network robustness and fragility: percolation on random graphs,” physical review letters 85, no. 25 (2000): 5468–71. 7. adreas wagner and david a. fell, “the small world inside large metabolic networks,” proceedings of the royal society b: biological sciences 268, no. 1478 (2001): 1803–10. 8. gil benko, christopher flamm, and peter f. stadler, “a graph-based toy model of chemistry,” journal of chemical information and modeling 43, no. 4 (2003): 1085–93. 9. tad hogg, bernardo a. huberman, and colin p. williams, “phase transition and the search problem,” artificial intelligence 81 (1996): 1–15. 10. vladimir boginski, sergiy butenko, and panos m. pardalos, “mining market data: a network approach,” computers & operations research 33, no. 11 (2006): 3171–84. 11. zoltán dezső and albert-lászló barabási, “halting viruses in scale-free networks,” physical review e 65, no. 5 (2002), doi: 10.1103/physreve.65.055103. 12. hans noel and brendan nyhan, “the ‘unfriending’ problem: the consequences of homophily in friendship retention for causal estimates of social influence,” sept. 2010, http://arxiv.org/abs/1009.3243. 13. johan bollen et al., “toward alternative metrics of journal impact: a comparison of download and citation data,” information processing & management 41, no. 6 (2005): 1419–40; xiaoming liu et al., “co-authorship networks in the digital library research community,” information processing & management 41, no. 6 (2005): 1462–80. 14. johan bollen et al., “clickstream data yields highresolution maps of science,” ed. alan ruttenberg, plos one 4, no. 3 (3, 2009): e4803. 15. eric kolaczyk, statistical analysis of network data (new york; london: springer, 2009). 16. 
graphs also enable libraries to reach outside their own data to build connections with other data sets. heterogeneity, which makes relational database representations of arbitrary relationships difficult or impossible, becomes a trivial matter of adding additional nodes and edges to bridge collections. the linked data web defines simple requirements for establishing just such representations, and libraries are well-positioned to build these bridges.

■■ conclusion

for many centuries, libraries have served as repositories for the accumulated knowledge and creative products of civilization, and they contain mankind's best efforts at comprehending complexity. this knowledge includes scientific works that strive to understand various aspects of the physical world, many of which are complex and require the efforts of numerous researchers over time. since the advent of the dewey decimal system, librarians have worked on many fronts to make this knowledge discoverable and to assist in its evaluation. qualitative evaluation increasingly requires understanding a resource in a larger context. we suggest that this context is itself a complex system, which would benefit from the modeling and quantitative evaluation techniques that network science has to offer. we believe librarians are well positioned to leverage network science to explore and comprehend emergent properties of complex information environments. as motivation for this pursuit, we offer in closing this prescient quote from carl woese, which though focused on the discipline of biology, is equally applicable to the myriad complexities of modern life: "a society that permits biology to become an engineering discipline, that allows that science to slip into the role of changing the living world without trying to understand it, is a danger to itself."47

references

1. melanie mitchell, complexity: a guided tour (oxford; new york: oxford univ. pr., 2009).
2. carl woese, "a new biology for a new century," microbiology and molecular biology reviews (june 2004): 173–86, doi: 10.1128/mmbr.68.2.173-186.2004.
3. national research council (u.s.), network science (washington, d.c.: national academies pr., 2005).
4. tim berners-lee, "giant global graph," online posting, nov. 21, 2007, timbl's blog, http://dig.csail.mit.edu/breadcrumbs/node/215.
5. lawrence page et al., the pagerank citation ranking: bringing order to the web (1999), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1768.
6. duncan s. callaway et al., "network robustness and fragility: percolation on random graphs," physical review letters 85, no. 25 (2000): 5468–71.
7. andreas wagner and david a. fell, "the small world inside large metabolic networks," proceedings of the royal society b: biological sciences 268, no. 1478 (2001): 1803–10.
8. gil benko, christopher flamm, and peter f. stadler, "a graph-based toy model of chemistry," journal of chemical information and modeling 43, no. 4 (2003): 1085–93.
9. tad hogg, bernardo a. huberman, and colin p. williams, "phase transition and the search problem," artificial intelligence 81 (1996): 1–15.
10. vladimir boginski, sergiy butenko, and panos m. pardalos, "mining market data: a network approach," computers & operations research 33, no. 11 (2006): 3171–84.
11. zoltán dezső and albert-lászló barabási, "halting viruses in scale-free networks," physical review e 65, no. 5 (2002), doi: 10.1103/physreve.65.055103.
12. hans noel and brendan nyhan, "the 'unfriending' problem: the consequences of homophily in friendship retention for causal estimates of social influence," sept. 2010, http://arxiv.org/abs/1009.3243.
13. johan bollen et al., "toward alternative metrics of journal impact: a comparison of download and citation data," information processing & management 41, no. 6 (2005): 1419–40; xiaoming liu et al., "co-authorship networks in the digital library research community," information processing & management 41, no. 6 (2005): 1462–80.
14. johan bollen et al., "clickstream data yields high-resolution maps of science," ed. alan ruttenberg, plos one 4, no. 3 (2009): e4803.
15. eric kolaczyk, statistical analysis of network data (new york; london: springer, 2009).
16. alejandro cornejo and nancy lynch, "reliably detecting connectivity using local graph traits," csail technical reports mit-csail-tr-2010-043, 2010, http://hdl.handle.net/1721.1/58484 (accessed feb. 17, 2011).
17. réka albert, hawoong jeong, and albert-lászló barabási, "error and attack tolerance of complex networks," nature 406, no. 6794 (2000): 378–82.
18. m. e. j. newman, "scientific collaboration networks. ii. shortest paths, weighted networks, and centrality," physical review e 64, no. 1 (2001), doi: 10.1103/physreve.64.016132.
19. albert, jeong, and barabási, "error and attack tolerance."
20. r. milo, "network motifs: simple building blocks of complex networks," science 298, no. 5594 (2002): 824–27.
21. lawrence j. hubert, "some applications of graph theory to clustering," psychometrika 39, no. 3 (1974): 283–309.
22. noel and nyhan, "the 'unfriending' problem."
23. alexandru balaban and douglas klein, "co-authorship, rational erdős numbers, and resistance distances in graphs," scientometrics 55, no. 1 (2002): 59–70.
24. liu et al., "co-authorship networks in the digital library research community."
25. derek j. de solla price, "networks of scientific papers," science 149, no. 3683 (july 30, 1965): 510–15.
26. m. e. j. newman, "the first-mover advantage in scientific publication," epl (europhysics letters) 86, no. 6 (2009): 68001.
27. bollen et al., "clickstream data yields high-resolution maps of science."
28. chaomei chen, jasna kuljis, and ray j. paul, "visualizing latent domain knowledge," ieee transactions on systems, man and cybernetics, part c (applications and reviews) 31, no. 4 (nov. 2001): 518–29.
29. hugh g. lewis et al., "a new analysis of debris mitigation and removal using networks," acta astronautica 66, no. 1–2 (2010): 257–68.
30. dezső and barabási, "halting viruses in scale-free networks."
31. oclc, "viaf (the virtual international authority file) [oclc—activities]," http://www.oclc.org/research/activities/viaf/ (accessed feb. 17, 2011).
32. mitchell, complexity: a guided tour.
33. james f. allen, "toward a general theory of action and time," artificial intelligence 23, no. 2 (1984): 123–54.
34. herbert van de sompel et al., "memento: timemap api for web archives," http://www.mementoweb.org/events/ia201002/slides/memento_201002_timemap.pdf (accessed feb. 17, 2011).
35. hawoong jeong et al., "lethality and centrality in protein networks," nature 411 (may 3, 2001): 41–42.
36. aaron clauset, cristopher moore, and m. e. j. newman, "hierarchical structure and the prediction of missing links in networks," nature 453, no. 7191 (2008): 98–101.
37. m. e. j. newman, "the structure of scientific collaboration networks," proceedings of the national academy of sciences of the united states of america 98, no. 2 (jan. 16, 2001): 404–9.
38. chris bizer, richard cyganiak, and tom heath, how to publish linked data on the web? http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/linkeddatatutorial/ (accessed feb. 17, 2011).
39. geonames, http://www.geonames.org/ (accessed feb. 17, 2011).
40. dbpedia, http://dbpedia.org/ (accessed feb. 17, 2011).
41. alexandre passant and phillippe laublet, "meaning of a tag: a collaborative approach to bridge the gap between tagging and linked data," proceedings of the www 2008 workshop linked data on the web (ldow2008), beijing, apr. 2008, doi: 10.1.1.142.6915.
42. oclc, "viaf"; oclc homepage, http://www.oclc.org/us/en/default.htm (accessed feb. 17, 2011).
43. norman biggs, graph theory, 1736–1936 (oxford; new york: clarendon, 1986).
44. bin jiang, "small world modeling for complex geographic environments," in complex artificial environments (berlin; heidelberg: springer, 2006): 259–71, http://dx.doi.org/10.1007/3-540-29710-3_17.
45. gillian byrne and lisa goddard, "the strongest link: libraries and linked data," d-lib magazine 16, no. 11/12 (2010), http://www.dlib.org/dlib/november10/byrne/11byrne.html (accessed feb. 17, 2011).
46. daniel sui, "tobler's first law of geography: a big idea for a small world?" annals of the association of american geographers 94, no. 2 (2004): 269–77.
47. woese, "a new biology for a new century."
librarians and technology skill acquisition: issues and perspectives
debra a. riley-huff and julia m. rholes

debra a. riley-huff (rileyhuf@olemiss.edu) is web services librarian, university of mississippi libraries, university, mississippi. julia m. rholes (jrholes@olemiss.edu) is dean of libraries, university of mississippi libraries, university, mississippi.

libraries are increasingly searching for and employing librarians with significant technology skill sets. this article reports on a study conducted to determine how well prepared librarians are for their positions in academic libraries, how they acquired their skills, and how difficult they are to hire and retain. the examination entails a close look at ala-accredited lis program technology course offerings and dovetails with a dual survey designed to capture experiences and perspectives from practitioners, both library administrators and librarians who have significant technology roles.

a recent oclc report on research libraries, risk, and systemic change discusses what arl directors perceive as the highest risks to their libraries.1 the administrators reported several high risks in the area of human resources, including high-risk conditions in recruitment, training, and job pools. the oclc report notes that recruitment and retention are difficult due to the competitive environment and the reduction in the pool of qualified candidates. why precisely do administrators perceive that there is a scarcity of qualified candidates? changes in libraries, most of which have been brought on by the digital age, are reflected in the need for a stronger technological type of librarianship: not simply because technology is there to be taken advantage of, but because “information” by nature has found its dominion as the supreme commodity, perfectly transported as bits. it follows that if information is your profession, you are no longer working on paper. that lis is becoming an increasingly technology-driven profession is both recognized and documented. a noted trend, particularly in academic libraries, is a move away from simply redefining traditional or existing library roles and toward new and completely redesigned job profiles.2 this trend is confirmed by the actions of library administrators, who are increasingly seeking librarians with a wider range of information technology (it) skills to meet the demands of users who access information through technology.3 johnson states the need well:

we need an integrated understanding of human needs and their relationships to information systems and social structures. we need unifying principles that illuminate the role of information in both computation and cognition, in both communication and community. we need information professionals who can apply these principles to synthesize human-centered and technological perspectives.4

the questions then become: is there a scarcity of qualified individuals to fill these technology-driven librarian roles in our libraries, and if so, why? how are qualifications acquired, and what are they, besides a moving target? there appear to be two major convergent trends influencing this uncertain phenomenon. the first is what is perceived as a “lack of awareness” and consensus about what the core of lis needs to be or to become in order to offer real value in a constantly changing and competitive information landscape.5 the other trend centers on the role of lis education and the continuing questions regarding its direction, efficacy, and ability to prepare future librarians for the modern information professions of now and the future. while changes are apparent, it appears many lis programs are still operating on a two-track model of “traditional librarians and information managers,” and there are enough questions in this area to warrant further investigation and inquiry.6
■■ literature review
most of the literature pertaining to the readiness of librarians to work in increasingly technical environments centers on lis education. this certainly makes sense given the assumed qualifications the degree confers. scant literature focuses solely on the core of the librarian’s professional identity, workplace culture, and institutional historical perspectives related to qualifications; however, allusions to “redefining” lis are often found in the lis education literature. there is limited research on preprofessional or even professional in-service training, although calls for such research have been made repeatedly.
a key study on lis education is the 2000 kaliper report, issued when the impact of technology in libraries was clearly reaching saturation.7 the report is the product of an analysis project with the goal of examining new trends in lis education. it lists six trends, three of which are pertinent to the investigation of technology inclusion in lis programs: in 2000, lis programs were beginning to address a broader range of information problems and environments, programs were increasing it content in the curriculum, and several programs were beginning to offer specializations within the curriculum, though not ones with a heavy technology focus. in a widely cited curriculum study in 2004, markey completed a comprehensive examination of 55 ala-accredited lis programs, looking for change between the years 2000 and 2002.8 markey’s study revealed that while there were improvements in the number of it-related courses offered and required throughout programs, they were still limited overall, with the emphasis continuing to be on the core curriculum of foundations, reference, organization, and management. one of the important points markey makes is the considerable challenge involved in retraining or acquiring knowledgeable faculty to teach relevant it courses.
the focus on lis education issues came to the fore in 2004 when michael gorman released a pair of articles asserting that there was a crisis in lis education, namely an assault on lis by what gorman referred to as “information science,” “information studies,” and “information technology.”9 gorman’s papers sought to establish that there is a de facto competition between information science courses, which he characterized as courses with a computational focus, and lis courses, which compose core librarianship courses, those tending to be the more user-focused and organizational. gorman claimed lis faculty were being marginalized in favor of information science and made further claims regarding gender roles within the profession along the alleged lis/is split. gorman also noted that there was no consensus about how “librarianship” should be defined coming from either ala or the lis graduate programs. the articles were not without controversy, spurring a flurry of discussion in the library community that spawned several follow-up articles. dillon and norris rallied against the library-versus-information-science argument as a premise that has no bearing on the reality of what is happening in lis and does nothing but create yet another distracting disagreement over labels.10 others argued for the increasing inclusion of technology courses in lis education; as estabrook put it, “librarianship without a strong linkage to technology (and its capacity to extend our work) will become a mastodon. technology without reference to the core library principles of information organization and access is deracinated.”11
as the future of lis was being hotly debated, voices in the field were issuing warnings that obstacles were being encountered in finding qualified librarians with the requisite technology skills necessary to take on new roles in the library. in 2007, johnson made the case for new areas of emphasis in lis, including specializations such as geographic information systems, by pointing out that it is not so much granular training that is expected of lis education but a higher-level technology skill set that allows for the ability to move into these specializations, identify what is needed, assess problems, and make decisions.12 in 2006, neal noted that academic libraries had embarked on an unprecedented increase in filling librarian positions with professionals who do not have a master’s degree in library science.13 citing the association of research libraries annual salary statistics, he observed that among the variety of positions being filled by other professionals, a substantial number are going to those in technology fields such as systems and instructional technology. in the mid 2000s, suggestions that library schools needed to work more closely with computer science departments began coming up more often. obstacles to these types of partnerships were noted, as computer science departments failed to see the advantage offered by library science faculty and were wary of taking on a “softening” by the inclusion of what is perceived as a “soft science.”14 in response, most library schools have added courses in computing, but many still question their adequacy.
more recently there have been increasing calls from within lis for more research into lis education and professional practice. in 2006, a study by mckinney comparing proposed ala core competencies to what was actually being taught in ala-accredited curricula shed some light on what is currently offered in the core of lis education.15 the study found that the core competencies required most often in ala-accredited programs were “knowledge organization” or cataloging (94.6 percent), “professional ethics” (80.4 percent), “knowledge dissemination” or reference (73.2 percent), “knowledge inquiry” or research (66.1 percent), and “technical knowledge” or technology foundations (66.1 percent).16 these courses map well to ala core competencies, but the question in the digital age is whether one technology-related course, not even universally required, is adequate for a career in lis. the literature would seem to reflect that it is not. 2007 saw many calls for studies of lis education using methods that not only examined course curricula but also sought evidence of outcomes from those working in the field.17 an interest in studies reporting employers’ views, graduates’ workplace experiences, and, if possible, longitudinal data has been expressly requested.18 indications are that those in library work environments can play a vital role in shaping the future course of lis education and preprofessional training by providing targeted research, data, and evidence of where weaknesses are currently being experienced and what changes are driving new scenarios.
the most current literature points out both areas of technology deficiency and emerging opportunities in libraries. areas with an apparent need for immediate improvement are the continuing integration of third-party web 2.0 application programming interfaces (apis) and social networking platforms.19 debates about job titles and labels continue, but the actuality is that the number of adequately trained digital librarians has not kept up with demand.20 modern libraries require those in technology-related roles to have broad or specialized competencies in areas such as web development, database design, and management, paired with a good working knowledge of metadata and markup formats such as xml, marc, ead, rdf, and dublin core. educational technology (et) has been identified as an area of expected growth opportunity for libraries, and there have been suggestions that more lis programs should partner with et programs to improve lis technology offerings, skills, and preprofessional training.21 lis program change, including an apparent coalescing of information-technology-focused education, would appear to be demonstrated by the ischool or ifield caucus of ala-accredited programs; however, the literature is not clear on whether that is actually being evidenced. the ischools organization started as a collective in 2005 with the goal of advancing information science. ischools incorporate a multidisciplinary approach, and those with a library science focus are ala accredited.22 a 2009 study interestingly applied abbott’s theoretical framework from the chaos of disciplines to the ifield,23 with abstract yet relevant conclusions; abbott looks at change in a field through a sociological lens, looking for patterns of fractal distinction over time. the study concluded that traditional lis education remained at the heart of the ifield movement and that the real change has been one of locale, from libraries to location independence.24 hall’s 2009 study exploring the core of required courses across almost all ala-accredited programs reveals that the core curriculum is still principle-centered but is focusing less on reference and intermediary activities, with a definite shift toward research methods and information technology.25
■■ method
this research study was designed to capture a broad view of technology skill needs, skill availability, and skill acquisition in libraries, while still allowing for some areas of sharper focus on stakeholder perspectives. the four primary stakeholder groups in this study were identified as lis educators, lis students, working librarians, and library administrators. the research questions cover three main areas of technology skill acquisition and employment. one area is lis education and whether the status of technology course offerings has changed in recent years in response to market demands. the second area is the experience of librarians with significant technology roles with regard to job availability, readiness, and technology skill acquisition. the third area is the perception of library administrators regarding the availability and readiness of librarians with technology roles. to cover the research questions and provide a broad situational view, the research was triangulated and aimed at the three question areas. data collection was accomplished by examination of course catalogs and by surveys of both library administrators and technology librarians.
the lis educational data was obtained by inspecting course catalogs. course catalogs and website curriculum pages from all ala-accredited lis programs in the united states, canada, and puerto rico were examined in december 2009 for the inclusion of technology-related courses. the catalogs examined were for the 2009–10 academic year. spanish and french catalogs were translated. each available course description was reviewed, and those courses with a primary technology component were identified. in a secondary examination the selected courses were closely inspected for their exact technology focus, and the primary subject content was noted for each course. courses were then separated into categories by area of focus and tabulated.
a targeted survey identified practicing technology librarians’ perspectives on their level of preparation and continuing skill-level needs based on actual job demands. in this survey, a librarian with a significant technology role was defined as “any librarian whose job would very likely be considered ‘it’ if they were not in a library, whose job title contains words like ‘systems, digital, web, electronic, network, database, automation,’ and whose job involves maintaining and/or building various it infrastructures.” the survey was posted on various library and library technology electronic discussion lists in december 2009 and was available for two weeks.
library administrative perspectives were gained through a targeted survey aimed at those with an administrative role of department head or higher. the survey was designed to capture the reported experience library administrators have had with librarians in significant technology roles, primarily as it relates to skill levels, availability, hiring, and retention. this survey was posted to various library administrative and technology discussion lists in december 2009 and was also available for two weeks. both surveys included many similar questions to compare and contrast viewpoints. results were tabulated to form an overarching picture, and some relevant comparisons were made.
there are limitations and inherent issues with this type of research. catalog examinations, when completed by qualified librarians, can hold great accuracy; however, the introduction of bias or misinterpretation is always possible.26 when categorizing courses, the authors reviewed course descriptions three separate times to ensure accuracy. courses in doubt were reviewed again with knowledgeable colleagues to obtain a consensus. surveys designed to capture perspectives, views, and experiences are by nature highly subjective and provide data that is both qualitative and quantitative. tabulated data was given strictly simple numerical representation to provide a factual picture of what was reported. additionally, although the librarian survey was targeted at “those with significant technology roles,” the definition of “significant” seemed to vary in interpretation by the respondents; this is discussed in further detail in the findings. given the limitations of this type of research, the authors did not attempt to find definite correlations; however, trends and patterns are clearly revealed.
■■ catalog findings
course catalogs from all 57 ala-accredited programs in the united states, canada, and puerto rico were examined for the inclusion of technology-related courses. a total of 439 technology-related courses were offered across the 57 lis programs, including certificate program course offerings. the total number of technology-related courses offered by a program ranged from 2 to 20. the mean number of courses offered per program was 7.7, the median was 7, and the mode was 4. table 1 shows the distribution, matching the number of programs with the number of technology-related courses they offer.

table 1. number of technology-related courses being offered per program

  # of programs    # of courses offered
  1                2
  6                3
  8                4
  6                5
  7                6
  5                7
  5                8
  1                9
  6                10
  1                11
  3                12
  2                13
  2                14
  1                15
  1                17
  1                18
  1                20
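the summary statistics are easy to recheck from the distribution; a minimal sketch in python, with the counts simply transcribed from table 1:

    # rebuild the per-program course counts from table 1 and recompute the summary statistics
    from statistics import mean, median, mode

    dist = [(1, 2), (6, 3), (8, 4), (6, 5), (7, 6), (5, 7), (5, 8), (1, 9), (6, 10),
            (1, 11), (3, 12), (2, 13), (2, 14), (1, 15), (1, 17), (1, 18), (1, 20)]
    courses = [c for n, c in dist for _ in range(n)]

    print(len(courses), sum(courses))                               # 57 programs, 439 courses
    print(round(mean(courses), 1), median(courses), mode(courses))  # 7.7, 7, 4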
catalog course content descriptions were analyzed for a technology focus. the fifteen categories noted in table 2 were selected as representative of the technology-related courses offered. it is acknowledged that some course content may overlap, but each course was placed in only one category based on its primary content. note also the inclusion of “metadata markup,” which may arguably be considered description or cataloging; metadata was included because it is an integral part of many new digital services. the categories are presented in column 1, and the total number of courses offered is presented in column 2, with the number of advanced courses within each total broken out in parentheses. some programs offered more than one course in a given category; hence the percentage of programs offering at least one course is given in column 3.

table 2. course content description and number of courses offered across all programs. the number of advanced courses in the total is given in parentheses.

  course type as categorized by the course content description     # of courses offered    % of programs offering at least 1 course
  database design, development, and maintenance                    47 (7)                  70
  web architecture (web design, development, usability)            52 (11)                 68
  broad technology survey courses (basics of library
    technologies and overviews)                                    50                      65
  digital libraries                                                43 (4)                  61
  systems analysis, server management                              49 (6)                  60
  metadata markup (dc, ead, xml, rdf)                              43 (10)                 50
  digital imaging, audio and video production                      33 (5)                  47
  automation and integrated library systems                        21                      37
  networks                                                         32 (3)                  35
  human computer interaction                                       21 (4)                  29
  instructional technology                                         12                      21
  computer programming languages, open source technologies         12 (2)                  17
  web 2.0 (social networking, virtual reality, third-party apis)   11                      17
  user it management (microcomputers in libraries)                 6                       10
  geographic information systems                                   6 (1)                   8

an assessment of the course catalog facts reveals that there have been increases in the number of technology courses offered in lis programs, but is it enough? longitudinal data shows a significantly increased emphasis in the area of metadata: a 2008 study of lis courses covering internet or electronic resources and metadata schemas found only ten programs (17.5 percent) offering such courses, with only twelve metadata courses offered in total,27 while current results show 43 metadata courses offered and 50 percent of lis programs offering at least one course. the lack of a solid basis in web 2.0 applications and integration reported by aharony is confirmed by the current catalog data, with only 17 percent of programs offering a course.28 while at first glance it looks like many technology-related courses are currently being offered in lis programs, a closer inspection reveals cause for concern. many of these courses should be offered by 100 percent of lis programs, and advanced courses in many areas should be offered as well. even allowing for some overlap of content among these course descriptions, the percentages are still too low to conclude that lis graduates without preprofessional technology experience or education are really prepared to take on serious technology roles in academic libraries.
■■ perspectives on job availability, readiness, and skill acquisition
as previously noted in the method, two surveys were administered to collect participant viewpoint data pertinent to the study. responses were carefully checked to determine whether they met the criteria for inclusion in the study. no attempt was made to disqualify respondents based solely on job title. it did appear that a significant number of non-target subjects initially replied to the librarian survey but quit the survey at the technology-related questions. final inclusion was based on either an it-related job title or on whether the respondent answered the technology questions, regardless of job title. tables 3–5 report demographic response data.

table 3. response data

                              administrative survey    librarian survey
  total responses             185                      382
  total usable (qualified)    146                      227

table 4. respondents’ institutions by size

  institution size      administrative survey    librarian survey
  under 5,000           37                       72
  5,000–10,000          25                       31
  10,000–15,000         18                       28
  15,000–20,000         11                       20
  20,000–25,000         13                       21
  25,000–30,000         16                       13
  30,000–35,000         4                        11
  35,000–40,000         5                        9
  more than 40,000      12                       21
  unknown               5                        1

table 5. respondent type

  administrative survey: position           # of responses
  dean, director, university librarian      46
  department head                           71
  manager or other leadership role          29

  librarian survey: general area of work    # of responses
  public services                           48
  systems                                   42
  web services                              32
  reporting dual roles                      31
  digital librarian                         29
  electronic resources librarian            28
  emerging/instructional technologies       18
  administrative                            10
  metadata/cataloger                        9
  technical services                        7
  distance education librarian              4

■■ perspectives on job and candidate availability
a 2009 study by mathews and pardue asked the question, “what skills do librarians need in today’s world?”29 they sought to answer it by performing a content analysis, spread over five months, of randomly selected jobs from ala’s joblist. what they found in the area of technology was a significant need for web development, project management, systems development, and systems applications. further, they suggest that some librarians are using a substantial subset of professional it skills. this article’s literature review points out assertions that some technology-related librarian positions are difficult to fill and may in fact be filled by non-mls professionals. in the associated surveys the authors sought to capture data related to actual job availability, search experiences, and perspectives from both library administration and librarians. note that both mls librarians and a few professional library it staff completed the survey; the distinction is made where appropriate.
the survey asked library administrators whether they had hired for a professional technology position in the past five years. 146 responses were received, and 100 respondents indicated that they had conducted such a search, with the total number of searches reported at 167. of these searches, 22 did not meet the criteria for inclusion due to missing data such as job title. the total reported number of librarian/professional-level technology positions posted for hire by these respondents was thus 145, with some respondents reporting multiple searches for the same or different positions. respondents conducting searches reported between 1 and 5 searches in total, with the average being 1.45 per respondent. the respondents were also asked to provide the position title for each search, the difficulty encountered in conducting the search, and the success rate. job titles were divided into categories to ascertain how many positions in each category had a relevant search conducted. each search was then assigned a point value based on the difficulty rating, and the classifications were then averaged by difficulty. some respondents were unsure of difficulty ratings because the searches happened before their arrival at their current library, and those searches were excluded. position classifications with fewer than five searches were excluded from averaging and are marked “na” in table 6. the difficulty rubric is as follows: 1 = easy; 2 = not too bad, pretty straightforward search; 3 = a bit tough, the search was protracted; 4 = very difficult, required more than one search; 5 = unable to fill the position. almost all levels of difficulty were reported for many classifications, but the overall average hiring difficulty rating was 2.48.

table 6. administrative report on positions open, searches, and difficulty of search (n = 145)

  position classification                          searches    search difficulty
  systems/automation librarian                     40          2.78
  digital librarian                                32          2.6
  emerging & instructional technology librarian    15          2.53
  web services/development librarian               33          2.51
  electronic resources librarian                   22          1.95
  database manager                                 1           na
  network librarian/professional                   1           na

a comparable set of questions was posed in the librarian survey. we asked librarians to report professional-level technology positions they had held in the past five years, along with any current job searches. 164 responses were received from people indicating that they had held such a position or were searching for one, with the total number of positions/searches reported at 316, some respondents reporting multiple positions. respondents reported having held between one and five different positions, with the average being 1.92 jobs per respondent (see table 7). the respondents were also asked to give the position title for each position held or applied for, as well as the difficulty encountered in obtaining the position. as in the administrative report, job titles were divided into categories to ascertain how many positions fell in each classification. each position classification was then assigned a point value based on how the respondents rated the difficulty of those particular searches, and the classifications were then averaged by difficulty using the same scale applied in the administrative survey. again, almost all levels of difficulty were reported for many classifications, but the overall average difficulty rating was 1.9.

table 7. librarian report on positions held or current searches and difficulty (n = 316)

  position classification                          # of positions/searches    search difficulty
  administrative                                   8                          3
  technical services                               17                         2.11
  public services                                  57                         2.1
  systems/automation librarian                     76                         1.89
  web services/development librarian               38                         1.89
  electronic resources librarian                   39                         1.87
  digital librarian                                41                         1.8
  metadata/cataloger                               13                         1.77
  distance education librarian                     6                          1.66
  emerging & instructional technology librarian    21                         1.61
  reporting dual roles                             30                         na
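the per-classification averages in tables 6 and 7 follow directly from the rubric; a minimal sketch in python of the grouping-and-averaging step (the records below are invented placeholders, not the survey data):

    # group reported searches by position classification and average the 1-5 difficulty ratings
    from collections import defaultdict

    searches = [                                  # (classification, difficulty rating)
        ("systems/automation librarian", 3),
        ("systems/automation librarian", 2),
        ("digital librarian", 3),
        ("database manager", 4),
    ]

    ratings_by_class = defaultdict(list)
    for classification, rating in searches:
        ratings_by_class[classification].append(rating)

    for classification, ratings in sorted(ratings_by_class.items()):
        # classifications with fewer than five searches are reported as "na", as in table 6
        avg = round(sum(ratings) / len(ratings), 2) if len(ratings) >= 5 else "na"
        print(classification, avg)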
to provide as accurate a picture as possible, the surveys asked both groups to indicate whether any well-known mitigating factors had complicated the job searches. these factors are shown in table 8, which stacks both groups for comparison.

table 8. mitigating factors in hiring and job search

  administrative survey: mitigating factors in hiring (n = 93)                                                  % of responses
  we had difficulty getting an applicant pool with adequate skills.                                             54
  we are unable to offer qualified candidates what we feel is a competitive salary.                             38
  we are located in what may reasonably be perceived as an undesirable area to live.                            23
  we are located in an area with a very high cost of living.                                                    23
  we have an it infrastructure or environment that we and/or a candidate may have perceived as unacceptable.    20
  the current economic climate has made hiring for these types of positions easier.                             18
  a successful candidate did not accept an offer of employment.                                                 13

  librarian survey: mitigating factors in job search (n = 198)                                                  % of responses
  i suspect i may not have/had adequate skills or experience, or i was otherwise unqualified.                   25
  i have not been able to find a position for what i consider to be a fair salary.                              11
  many jobs are located in what may reasonably be perceived as an undesirable area to live.                     10
  many jobs are located in an area with a very high cost of living.                                             15
  some jobs have an it infrastructure or environment that i have perceived as unacceptable.                     10
  the current economic climate has now made finding these types of positions tougher.                           22
  i was a successful candidate but i could not or did not accept an offer of employment.                        3

this dataset reveals some interesting patterns. the roles that were in the most demand were also the most difficult to hire for, while these were also the easier positions for candidates to find. librarians also listed more job categories as having a significant technology component than the administrators did. perhaps most notable is the discrepancy between how administrators perceive the qualifications of candidates and how candidates view themselves: while both groups acknowledge lack of it skills and qualifications as the number-one mitigating factor, library administrators perceive the problem as significantly more serious. this data backs up other recent findings that important new job categories are being defined in lis.30 the data also further supports the view that these roles, while centering on core librarianship principles, involve a different skill set.31

■■ job readiness perspectives
issues of job readiness for academic librarians need to be looked at from a number of different perspectives. job readiness can be understood one way by a candidate and another way by an employer. job readiness is not only of critical concern at the beginning of a librarian’s career; clearly this attribute continues to be significant throughout an individual’s length of service, in one or more roles and for one or more employers. job readiness is composed of several factors, the most important being education, experience, and ongoing skill acquisition. while this is certainly true for all librarians, it is of even more concern to librarians with significant technology roles because of rapid changes in technology. a concern has been established in the literature and in this study that lis education, in the areas of technology, may be inadequate and lack the intensity necessary for modern libraries. this perception has been backed up by entrants to the profession.32 that technology skills are extremely important to library employers has been evident for at least a decade: in a 2001 case study on employment status, “newly minted” mls graduates who had just entered the profession were asked in a survey, “did specific information technology or computer skills lead to you getting a job?” the answer was a “resounding yes” from 66 percent of the respondents.33 experience is also a very important factor, with one study of academic library search committees reporting committee members mentioning that “experience trumps education.”34
this study sought to gather data on possible patterns in the job-readiness area. the authors wanted to know how job candidates and employers felt about the viability of new mls graduates, how experience factored into job readiness, how much experience is out there, and how long-term experience shaped expectations. the survey asked administrators how many years of library technology experience they preferred from a candidate. there were 97 responses; the range of preferred experience was 0–7 years, the mean was 3.06, and the mode was 3. librarians were also asked how much experience they had in a technology-related library role. there were 187 responses; the range of experience was 0–39 years, the mean was 8.7, and the mode was 5.
when participating administrators were asked whether they felt it was necessary to have an mls librarian fill a technology-related role that is heavily user-centric, 110 administrators responded: 50 percent “yes,” 38 percent “no,” and 12 percent “unsure.” to the same question, 195 practicing technology librarians responded with 58 percent “yes,” 23 percent “no,” and 20 percent “unsure.” the administrator participants were asked whether they had ever had to fill a technology-related librarian role with a non-mls hire simply because they were unable to find a qualified librarian for the job; of 106 responses, 22 percent reported that they had hired a non-mls candidate. the librarian participants were also asked to report their mls status: out of 194 responses, 93 percent reported holding an mls or equivalent. the survey also asked the librarian participants to report what year they graduated from their mls program, as the authors felt this data was important to the inherent longitudinal perspectives reported in the study. of 162 responses, participants reported graduating between 1972 and 2009; the mean was 1999, the median was 2002, and the mode was 2004. table 9 shows a question set related to experience factors, stacking both groups for comparison.

table 9. question sets related to experience factors by group

  administrative survey                                             strongly disagree   disagree   can’t say   agree   strongly agree
  new librarians right out of graduate school seem to be
    adequately prepared (n = 111)                                   7%                  40%        24%         28%     1%
  librarians with undergraduate or 2nd graduate degrees in a
    technology/computer field seem adequately prepared (n = 109)    1%                  9%         48%         39%     4%
  librarians with pre-professional technology-related experience
    seem adequately prepared (n = 109)                              1%                  6%         47%         41%     8%
  librarians with some (up to 3 years) post-mls technology
    experience seem adequately prepared (n = 111)                   1%                  10%        17%         62%     10%
  librarians with more than 3 years post-mls technology
    experience seem adequately prepared (n = 111)                   1%                  3%         24%         55%     16%
  librarians never seem adequately prepared for technology
    roles (n = 111)                                                 19%                 55%        12%         7%      6%

  librarian survey                                                  strongly disagree   disagree   other                   agree   strongly agree
  as a new librarian right out of graduate school i was
    adequately prepared (n = 187)                                   12%                 19%        no grad degree 3%       42%     8%
  i have an undergraduate or 2nd graduate degree in a
    technology/computer field that has helped me be adequately
    prepared (n = 187)                                              13%                 7%         no tech degree 60%      13%     6%
  i had pre-professional technology-related experience that
    helped me be adequately prepared (n = 187)                      3%                  7%         no such experience 20%  43%     27%
  i have less than 3 years of post-mls technology experience
    and i am adequately prepared (n = 180)                          6%                  13%        na 63%                  16%     1%
  i have more than 3 years of post-mls technology experience
    and i am adequately prepared (n = 184)                          2%                  12%        na 17%                  48%     20%
  i have never felt like i am adequately prepared for
    technology roles (n = 186)                                      19%                 43%        neutral 23%             12%     2%

there are a few notable points in this dataset, including what appears to be an area of disagreement between administrators and librarians about the readiness of new librarians and the value of related technology degrees. areas of agreement are noted in the importance of preprofessional experience and of three or more years of experience, and in the generally positive attitude regarding librarians’ ability to successfully take on significant technology roles in libraries.
■■ ongoing skill acquisition and retention
how librarians with significant technology roles acquire the skills needed to do their jobs, and how they keep those skills current, was of great interest in this study. the importance of preprofessional experience has been noted, but we should also include the value of service learning in lis education as an important starting point. successful service-learning experiences include practicums and partnerships with libraries in need of technology-related services. successful projects such as online exhibits, wireless policies, taxonomy creation, and cross-walking for contentdm are just a few of the service projects that have given lis students real-world experience.35
this research study asked administrators and librarians in what formal ways they supplement their ongoing education and skill acquisition. table 10 shows these results in a stacked format for comparison.

table 10. education and skill supplementation for librarians with technology roles

  administrative survey: in what ways have you supplemented training for your librarians or professional staff with technology-related roles? (does not include ala conferences)    %
  we have paid for technology-related conferences and preconferences.                    79
  we have paid for or allowed time off for classes.                                      72
  we have paid for or allowed time off for online workshops and/or tutorials.            87
  we have paid for books or other learning materials.                                    55
  we have paid for some or all of a 1st or 2nd graduate degree.                          12
  we would like to supplement but it is not in our budget.                               5
  we feel that keeping up with technology is essential for librarians with
    technology-related roles.                                                            73

  librarian survey: in what ways have you supplemented your own education related to technology skill development in terms of your time and/or money? (not including ala conferences)    %
  i have attended technology-related conferences and preconferences.                     73
  i have taken classes.                                                                  60
  i have taken online workshops and/or tutorials.                                        87
  i have bought books or other learning materials.                                       77
  i am getting a 1st or 2nd graduate degree.                                             9
  i would like to supplement my own education but i cannot afford it.                    13
  i would like to supplement my own education but i do not have time.                    13
  i have not had to supplement in any way.                                               1
  i feel that keeping up with technology is essential for librarians with
    technology-related roles.                                                            84
  i feel that keeping up with technology is somewhat futile.                             11

also of interest in this dataset is the higher level of importance librarians place on continuing skill development in the area of technology. in open-ended text responses, a number of librarians reported that the less formal methods of monitoring electronic discussion lists and articles were also a very important part of keeping up with technology in their area. the priority of staying educated, active, and current for librarians with significant technology roles cannot be overestimated; it is what tennant defines as technology agility, the capacity to learn constantly and quickly:

i cannot make this point strongly enough. it does not matter what they know now. can they assess a new technology and what it may do (or not do) for your library? can they stay up to date? can they learn a new technology without formal training? if they can’t they will find it difficult to do the job.36

not all librarians with technology roles start out in those positions, and thus role transformation must be examined. in some cases librarians with more traditional roles, such as reference and collection development, have transformed their skill sets and taken on technology-centric roles. table 11 shows the results of the survey questions related to role transformation in a stacked format for comparison.

table 11. role transformation from traditional library roles to technology-centric roles, and the reverse

  administrative survey (n = 104)                                                        %
  we have had one or more librarians make this transformation successfully.              53
  we have had one or more librarians attempt this transformation with some success.      35
  we have had one or more librarians attempt this transformation without success.        17
  some have been interested in doing this but have not done so.                          14
  we do not seem to have had anyone interested in this.                                  11
  we have had one or more librarians who started out in a technology-related
    librarian role but have left it for a more traditional librarian role.               5

  librarian survey (n = 184)                                                             %
  i started out in a technology-related librarian role and i am still in it.             45
  i have made a complete technology role transformation successfully from another
    type of librarian role.                                                              30
  i have attempted to make a technology role transformation but with only some
    success.                                                                             12
  i have made a technology role transformation but sometimes i wish i had not.           9
  i have made a technology role transformation but i wish i had not and i am
    interested in returning to a more traditional librarian role.                        9
  i am not a librarian.                                                                  4

to be noted in this dataset is the large number of librarians who have transitioned successfully into technology-centric roles. this supports the perception that experience and on-the-job learning play a leading role in the development of technology skills for librarians. open-ended survey comments also revealed a number of staff who initially were hired in an it role and then went on to acquire an mls while continuing in their technology-focused role.
retention is sometimes problematic for librarians with it roles, primarily because many of them are also employable in many settings apart from libraries. the survey asked administrators, “do you know any librarians with technology roles that have taken it positions outside the library field?” out of 111 respondents, 33 percent answered “yes.” in open-ended responses, the most common reasons administrators felt retention may be a problem were salary, lack of challenges and opportunities, and risk-averse cultures. the survey also asked the librarian group, “do you think you would ever consider taking an it position outside the library field?” out of 190 respondents, 34 percent answered “yes,” 23 percent “yes, but only if it was education related,” and 42 percent “no.” additionally, 38 percent of these librarian respondents knew a librarian who had taken an it position outside the library field. in an open response field, the librarian participants named work environment and lack of support for technology as the most frequently cited reasons for leaving a position.
the surveys used in this research study covered several complicated issues. those who responded were encouraged to leave open text comments in several key areas. a large number of comments were received, and many of them were of considerable length. many individuals clearly wanted to be heard, others were concerned their story would not be captured in the data, and many expressed a genuine overall interest in the topic. a few salient comments from the variety of areas covered are given in table 12.

table 12. a sample of open-ended responses from the two surveys

administrative survey
“there is a huge need for more and adequate technology training for librarians. it is essential for libraries to remain viable in the future.”
“only one library technology position (coordinator) is a professional librarian. others are professional positions without mls.”
“there is a lot of competition for few jobs, especially in the current economic climate.”
“we finally hired at the level of technician as none of the mls candidates had the necessary qualifications.”
“if i wanted a position that would develop strategy for the library’s tools on the web or create a digitization program for special collections, i probably would want an mls with library experience simply because they understand the expectations and the environment.”
“number of years of experience in technology is not as important as a willingness to learn and keep current. sometimes old dogs won’t move on to new tricks. sometimes new dogs aren’t interested in learning tricks.”

librarian survey
“i believe that because technology is constantly changing and evolving, librarians in technology-oriented positions must do the same.”
“my problem with being a systems librarian in a small institution is that the job was 24/7/365. way too much stress with no down time.”
“i have left the library field for a few years but came back. my motivation was a higher salary, but that didn’t really happen.”
“i’m considering leaving my current position because the technology role (which i do love) was added to my position without much training or support. now that part of my job is growing so that i can’t keep up with all my duties.”
“i don’t think that library school alone prepared me for my job. i had to do a lot of external study and work to learn what i did, and worked as a part-time systems library assistant while in school, where i learned the majority of what prepared me for my current job.”
“library schools need to be more rigorous about teaching students how to innovate with technology, not just use tools others have built. you can’t convert ‘traditional’ librarians into technology roles without rigorous study. otherwise, you will get mediocre and even dangerous results.”

■■ conclusion
this study seeks to provide an overview of the current issues related to it staffing in academic libraries by reporting on three areas dealing with library technology skill acquisition and employment. with regard to the status of technology course offerings in lis programs, there has been a significant increase in the number of technology-related courses, but the numbers of technology courses vary considerably from program to program, and the content of individual courses appears to vary considerably as well. there appears to be a clear need for additional courses at a more advanced level. this need is evidenced by the experiences of both information technology job candidates and the administrators involved in hiring decisions. there are clearly still difficulties both in the acquisition of needed skill sets for certain positions and in actual hiring for some information technology positions. there are also some discrepancies between how administrators perceive candidates’ qualifications and how the candidates view themselves: administrators perceive the problem of a lack of it skills and qualifications as more serious than do candidates. the two groups also differ on the question of the “readiness” of new professionals. they do agree on the importance of preprofessional experience, and both exhibit generally positive attitudes toward librarians’ ability to successfully take on significant technology roles in libraries. more research is still needed to identify the key technology skills required. case studies of successful library technology teams and individuals may reveal more about the process of skill acquisition. questions regarding how much can be taught in lis courses or practicums, and how much must be expected through on-the-job experience, are good areas for more research.

references
1. james michalko, constance malpas, and arnold arcolio, “research libraries, risk and systemic change,” oclc research (mar. 2010), http://www.oclc.org/research/publications/library/2010/2010-03.pdf.
2. lori a. goetsch, “reinventing our work: new and emerging roles for academic librarians,” journal of library administration 48, no. 2 (2008): 157–72.
3. janie m. mathews and harold pardue, “the presence of it skill sets in librarian position announcements,” college & research libraries 70, no. 3 (2009): 250–57.
4. peggy johnson, “from the editor’s desk,” technicalities 27, no. 3 (2007): 2–4.
5. ton debruyn, “questioning the focus of lis education,” journal of education for library & information science 48, no. 2 (2007): 108–15.
6. jacquelyn erdman, “education for a new breed of librarian,” reference librarian 47, no. 98 (2007): 93–94.
7. “educating library and information science professionals for a new century: the kaliper report,” executive summary, kaliper advisory committee, alise (reston, virginia, july 2000), http://www.si.umich.edu/~durrance/textdocs/kaliperfinalr.pdf (accessed june 1, 2010).
8. karen markey, “current educational trends in library and information science curricula,” journal of education for library and information science 45, no. 4 (2004): 317–39.
9. michael gorman, “whither library education?” new library world 105, no. 9/10 (2004): 376–80; michael gorman, “what ails library education?” journal of academic librarianship 30, no. 2 (2004): 99–101.
10. andrew dillon and april norris, “crying wolf: an examination and reconsideration of the perception of crisis in lis education,” journal of education for library & information science 46, no. 4 (2005): 280–98.
11. leigh s. estabrook, “crying wolf: a response,” journal of education for library & information science 46, no. 4 (2005): 299–303.
12. ian m. johnson, “education for librarianship and information studies: fit for purpose?” information development 23, no. 1 (2007): 13–14.
13. james g. neal, “raised by wolves,” library journal 131, no. 3 (2006): 42–44.
14. sheila s. intner, “library education for the third millennium,” technicalities 24, no. 6 (2004): 10–12.
15. renee d. mckinney, “draft proposed ala core competencies compared to ala-accredited, candidate, and precandidate program curricula: a preliminary analysis,” journal of education for library & information science 47, no. 1 (2006): 52–77.
16. ibid., 53–54.
17. thomas w. leonhardt, “thoughts on library education,” technicalities 27, no. 3 (2007): 4–7.
18. thomas w. leonhardt, “library and information science education,” technicalities 27, no. 2 (2007): 3–6.
19. noa aharony, “web 2.0 in u.s. lis schools: are they missing the boat?” ariadne, no. 54 (2008).
20. chuck thomas and salwa ismail patel, “competency-based training for digital librarians: a viable strategy for an evolving workforce?” journal of education for library & information science 49, no. 4 (2008): 298–309.
21. michael j. miller, “information communication technology infusion in 21st century librarianship: a proposal for a blended core course,” journal of education for library & information science 48, no. 3 (2007): 202–17.
22. “about the ischools” (2010), http://www.ischools.org/site/about/ (accessed sept. 1, 2010).
23. laurie j. bonnici, manimegalai m. subramaniam, and kathleen burnett, “everything old is new again: the evolution of library and information science education from lis to ifield,” journal of education for library & information science 50, no. 4 (2009): 263–74; andrew abbott, the chaos of disciplines (chicago: chicago univ. pr., 2001).
24. bonnici, subramaniam, and burnett, “everything old is new again,” 263–74.
25. russell a. hall, “exploring the core: an examination of required courses in ala-accredited,” education for information 27, no. 1 (2009): 57–67.
26. ibid., 62.
27. jane m. davis, “a survey of cataloging education: are library schools listening?” cataloging & classification quarterly 46, no. 2 (2008): 182–200.
28. aharony, “web 2.0 in u.s. lis schools.”
29. janie m. mathews and harold pardue, “the presence of it skill sets in librarian position announcements,” college & research libraries 70, no. 3 (2009): 250–57.
30. “redefining lis jobs,” library technology reports 45, no. 3 (2007): 40.
31. youngok choi and edie rasmussen, “what qualifications and skills are important for digital librarian positions in academic libraries? a job advertisement analysis,” journal of academic librarianship 35, no. 5 (2009): 457–67.
32. carla j. stoffle and kim leeder, “practitioners and library education: a crisis of understanding,” journal of education for library & information science 46, no. 4 (2005): 312–19.
33. marta mestrovic deyrup and alan delozier, “a case study on the current employment status of new m.l.s. graduates,” current studies in librarianship 25, no. 1/2 (2001): 21–38.
34. mary a. ball and katherine schilling, “service learning, technology and lis education,” journal of education for library & information science 47, no. 4 (2006): 277–90.
35. marta mestrovic deyrup and alan delozier, “a case study on the current employment status of new m.l.s. graduates,” current studies in librarianship 25, no. 1/2 (2001): 21–38.
36. roy tennant, “the most important management decision: hiring staff for the new millennium,” library journal 123, no. 3 (1998): 102.
costs of library catalog cards produced by computer
frederick g. kilgour: ohio college library center, columbus, ohio

production costs of 79,831 cards are analyzed. cards were produced by four variants of the columbia-harvard-yale procedure employing an ibm 870 document writer and an ibm 1401 computer. costs per card ranged from 8.8 to 9.8 cents for completed cards.

early in september, 1964, the yale medical library put into routine operation the columbia-harvard-yale computerized technique for catalog card manufacture (1), and during the following three years yale produced over 87,000 cards. the principal objective of the chy project was an on-line, computerized, bibliographic information retrieval system. however, the route selected for attaining the objective included manufacture of cards from machine-readable data to keep up the manual catalog while machine-readable records were being inexpensively accumulated for computerized subject retrieval. catalog cards were only one product of the system, but their production was designed to be as efficient as possible within the constraints of the system. nevertheless, this paper will examine chy card production costs as though this segment of the system were an isolated procedure, yielding but one product, as is the case in classical library procedures. costing will disregard other benefits, such as accession lists and machine-readable data produced for little, or no, additional expense.
the columbia medical library and harvard medical library also installed ibm 870 document writers and tested the programs for card production, but neither library routinely produced cards. however, columbia produced its acquisitions lists until october, 1966, using chy techniques. harvard issued a similar list, but for a shorter period of time, and it was harvard’s withdrawal early in 1966 that brought about the collapse of the project. nevertheless, other institutions adopted the chy procedure for catalog card production, among them the medical library at the university of rochester, which used the programs for two years following february, 1966. e. r. squibb & sons at east brunswick, new jersey, also uses the programs. at the university of kentucky an 870 document writer types catalog cards, but new programs were written to run on an ibm 7040 computer that recently have been recoded in cobol for an ibm 360/50. similarly, the library at philip morris, inc., richmond, virginia, rewrote the programs to run on an ibm 1620 computer, which punches cards that drive an 870. the korean social science bibliography project of the human relations area files has elaborated the chy technique into its automated bibliographic system (2), which in turn is the base for another bibliographic system for african studies. the machine-readable cataloging record of the chy mechanized system eventually became the great-grandfather of the marc ii format and contributed about as much to marc ii as would have been the case had their relationship been truly biological. although the columbia-harvard-yale project never did develop and activate its proposed bibliographic information retrieval system, r. k.
summit, working entirely independently, has brought into successful operation his excellent dialog system (3), which is essentially the system that chy had in design stage. moreover, summit's system is definitely superior because it has several useful functions not contemplated in chy.

nearly all reports on catalog card production limit study of costs to reproduction of cards and neglect other costs involved in preparing cards for the catalog. an exception is p. j. fasana's 1963 investigation wherein he found that library of congress cards, in seven copies and ready to be filed into a catalog, cost 16.6 cents per card; cards produced by a machine method consisting of a tape typewriter and a very small special purpose computer cost 9.9 cents (4). fasana used an hourly salary rate of $2.00. a study of early experience with chy production yielded 12.5 cents per card (1), whereas the present study shows that costs range between 8.8 and 9.8 cents per card, cards being in completed form, arranged in packs for individual catalogs, and ready for bursting before alphabetizing for filing.

methods

during the course of the three years in which the chy programs were in operation, four variant techniques were used for card production. the first three with their limitations have been described elsewhere (5). briefly, the initial system consisted of keypunching from worksheets, listing the punch cards on an ibm 870 document writer, proofreading and correcting, and processing the proofread and corrected punch cards on an ibm 1401 computer which produced punch card output that, in turn, was used to drive the 870 document writer for production of catalog cards on one-up forms. in the next arrangement, printing of cards on one-up forms was accomplished on an ibm 1401 computer driving an upper- and lower-case print chain. in the third procedure, a two-up card form replaced the one-up form. finally, the medical library returned the 870 document writer to the manufacturer, and the 1401 was programmed to do the prooflisting in upper and lower case. the yale bibliographic system (6) replaced the chy routines on 25 july 1967.

the keypuncher kept time records for the various activities listed in table 1 throughout the period of this study. during the first two months of operation, design for recording data was inadequate. subsequently an individual would, albeit infrequently, fail to record time elapsed, so that production of 7,630 cards was omitted from the study, leaving a total of 79,831 to be included. on several occasions during the fourth part of the study, the second proofreading was suspended, and only correction carried out. hence, time expended in this category is less than in the previous three periods.

at first an ibm 1401 computer in the yale computer center was used, the center being located about a mile from the medical library. subsequently, another 1401, modified to drive an upper- and lower-case print chain and located in the medical school, was employed. later this machine was transferred to the administrative data systems computer center, which moved to a new location not long after it assumed operation of the 1401. still later, the 1401 was again transferred, this time to the yale computer center. as can be seen from the computer charges in table 1, these wanderings about new haven appear to have had no effect on operating efficiency. time recorded for each computer run was actual time clocked by the operator.
other times were recorded by the individual performing the operation. salaries used in the cost calculation were salaries being paid in june, 1967, which were, of course, appreciably higher than those in the autumn of 1964; hourly rate for the first proofreader in table 1 was $2.62 and for the second $2.21. hourly rental for the 870 document writer was $.78. rate of computer charges employed in the calculation was $20 per hour, a rate that had existed during the last year or so during which data was collected. initially, computer charges had been $75 an hour, but they dropped precipitously during the first two years. costs for catalog card stock were the lowest cost charged for the two types of forms. since these forms were not standard items during the years of the study, their prices varied considerably depending upon the amount ordered.

results

table 1 contains cost figures for catalog card production by the four variant techniques. since salaries and computer charges can vary widely, particularly among countries, time per card produced is also included in the table to facilitate comparison with other systems.

table 1. per-card costs of computer-produced catalog cards (dollars and hours per card).

                                 one-up form        one-up form        two-up form        two-up form
                                 on 870,            on 1401,           on 1401,           on 1401,
                                 proof on 870       proof on 870       proof on 870       proof on 1401
                                 dollars  hours     dollars  hours     dollars  hours     dollars  hours
  keypunching                    .0219    .0099     .0218    .0099     .0222    .0101     .0235    .0106
  keypunch                       .0029    .0099     .0030    .0099     .0030    .0101     .0032    .0106
  ibm 870, proof                 .0033    .0043     .0036    .0046     .0039    .0051     —        —
  ibm 1401, proof                —        —         —        —         —        —         .0091    .0046
  proofreaders (2):
    proofreading                 .0115    .0044     .0113    .0043     .0118    .0045     .0116    .0044
    proofreading and correcting  .0120    .0055     .0122    .0055     .0119    .0054     .0091    .0041
  ibm 1401                       .0149    .0085     .0313    .0156     .0231    .0116     .0245    .0112
  ibm 870, card typing           .0104    —         —        —         —        —         —        —
  card stock                     .0149              .0149              .0125              .0125
  total                          .0918              .0981              .0884              .0935
  number of cards                15,149             9,343              27,210             28,129
  number of titles               1,655              990                2,920              3,130
  cards per title                9.2                9.4                9.3                9.0

of course, amounts of time calculated by dividing elapsed time by amount of product are not directly comparable with results of time and motion studies such as henry voos' helpful study (7). however, two different methods of comparing the input costs in table 1 with those johnson (8) published for the stanford book catalog gave divergences of only 2 and 6 per cent.

source of the increase in costs of six-tenths of a cent from the first procedure to the second is entirely the increase in computer charges when the 1401 replaced the 870 to print cards. when the two-up form was employed on the computer in variant three, charges then dropped to less than the combined 1401 and 870 costs in the first procedure. costs rose again in procedure four. here the principal cause of the increase was the substitution of computer-produced proof listings after the 870 document writer had been returned to the manufacturer.
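the costing arithmetic behind table 1 is simple: each activity's per-card cost is its time per card multiplied by the applicable hourly rate, and a variant's total is the sum of its per-card activity costs plus card stock. a minimal python sketch, assuming the reconstructed figures above; the function and the grouping are ours, not kilgour's:

    # per-card cost of a human or machine activity:
    # hours per card times the hourly rate for that labor or machine.
    def activity_cost(hours_per_card, hourly_rate):
        return hours_per_card * hourly_rate

    # e.g., the first proofreader at $2.62/hour, .0044 hours per card,
    # reproduces the proofreading entry in the first column of table 1.
    print(round(activity_cost(0.0044, 2.62), 4))   # 0.0115

    # a variant's total is the column sum; dollar figures below are
    # copied from the first column of table 1.
    variant_one = {
        "keypunching": 0.0219,
        "keypunch": 0.0029,
        "ibm 870 proof": 0.0033,
        "proofreading": 0.0115,
        "proofreading and correcting": 0.0120,
        "ibm 1401": 0.0149,
        "ibm 870 card typing": 0.0104,
        "card stock": 0.0149,
    }
    print(round(sum(variant_one.values()), 4))     # 0.0918, i.e. 9.2 cents per card

the same arithmetic also reproduces the coding-cost figures in the next paragraph: three seconds of time for a cataloger paid $7,500 a year (roughly 2,000 working hours) comes to about $.003 per card, matching the text.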
although there is no reason to think that preparation of cataloging copy on a worksheet is either more or less expensive than older techniques, coding a worksheet constitutes additional work for which there is no equivalent in classical procedures. coding costs were examined between 9 march and 11 may 1965, when six individuals, ranging from professional catalogers to a student assistant, recorded time required to code 725 worksheets. time per final catalog card produced was three seconds; in other words, $.003 for a cataloger receiving $7,500 a year, or $.001 for a student assistant earning $1.50 an hour. if total coding cost, rather than a portion of it, were to be charged to card production, costs reported in table 1 could rise one- to three-tenths of a cent.

discussion

the accurate comparison of costs would be with those of systems similar to the chy system that produce more than one product. for instance, the chy system also produced monthly accession lists from the same punch-card decklets that produced catalog cards. the accession list was produced mechanically at a cost far less than that for the previous manual preparation. the decklets also constituted machine readable information available for other purposes, most of which have not yet been realized. system costing would assign only a portion of keypunching and proofreading costs to card production.

another saving was the appreciable shortening of time required for catalog cards to appear in the catalog. in procedures one through three, usually three or four days elapsed from the day on which the cataloger completed cataloging to the day on which cards were filed into the catalog. however, in procedure four, the computer, which was then a mile distant from the medical library, was used on two separate occasions for each batch of decklets, so that elapsed time rose to at least a week.

even though other benefits are not reflected in comparative costs, it is clear from fasana's findings that the chy computer-produced cards cost far less than do lc cards, and have a similar cost to those produced mechanically on which fasana reported. although there appears to be no published evidence that photocopying techniques can produce finished catalog cards at less expense than 9 cents, it is possible that some photo-reproduced cards may be less expensive than those described in this article. however, it must be pointed out that photo-reproduced cards are products of single-product procedures, whereas the chy cards are one of several system products.

increase in cost between procedure three and procedure four was due to increase in cost of prooflisting in upper and lower case on the 1401 computer as compared to prooflisting on the 870 document writer. this cost increase was not detected until calculations were done for this investigation, and therein lies a moral. it was the policy at the yale library for all programming to be done by library programmers, since various inefficiencies, and indeed catastrophes, had occasionally been observed when non-library personnel had prepared programs for library operations. the single exception to this policy was the proof program, which this investigation reveals used an exorbitant amount of time—one-third of that required for subsequent card production. since it had been felt that writing and coding a prooflisting program
was perfectly straightforward, an outside programmer of recognized ability was employed to write and code the program. because the program was simple, and because the programmer had high competence, efficiency of the program was never checked as it should have been. this episode raises the question that if even the wary can be trapped, how can the unwary avoid pitfalls? there is no satisfactory answer, but it would appear that some difficulties could be avoided by review of new programs by experienced library programmers, of which there are unfortunately far too few. comparison with data such as that in table 1 will also be helpful, but not definitive, in evaluating new programs. of course, when widely used library computer programs of recognized efficiency are generally available, magnitude of the pitfalls will have been greatly reduced.

conclusion

computer-produced catalog cards, even when they are but one of several system products, can be prepared in finished form for a local catalog less expensively and with less delay than can library of congress printed cards. computer card production at 8.8 to 9.8 cents per completed card appears to be competitive with other procedures for preparing catalog cards. however, undetected inefficiency in a minor program increased costs, thereby emphasizing need to insure efficiency in programs used routinely.

acknowledgements

the author is most grateful to mrs. sarah boyd, keypuncher extraordinary, who maintained the record of the data used in this study. national science foundation grant no. 179 supported the chy project in part.

references

1. kilgour, frederick g.: "mechanization of cataloging procedures," bulletin of the medical library association, 53 (april 1965), 152-162.
2. koh, hesung c.: "a social science bibliographic system; computer adaptations," the american behavioral scientist, 10 (jan. 1967), 2-5.
3. summit, roger k.: "dialog; an operational on-line reference retrieval system," association for computing machinery, proceedings of 22nd national conference (1967), 51-56.
4. fasana, p. j.: "automating cataloging functions in conventional libraries," library resources & technical services, 7 (fall 1963), 350-365.
5. kilgour, frederick g.: "library catalogue production on small computers," american documentation, 17 (july 1966), 124-131.
6. weisbrod, david l.: "an integrated, computerized, bibliographic system for libraries," (in press).
7. voos, henry: standard times for certain clerical activities in technical processing (ann arbor: university microfilms, 1965).
8. johnson, richard d.: "a book catalog at stanford," journal of library automation, 1 (march 1968), 13-50.

practical limits to the scope of digital preservation

mike kastellec

abstract

this paper examines factors that limit the ability of institutions to digitally preserve the cultural heritage of the modern era. the author takes a wide-ranging approach to shed light on limitations to the scope of digital preservation. the author finds that technological limitations to digital preservation have been addressed but still exist, and that non-technical aspects—access, selection, law, and finances—move into the foreground as technological limitations recede. the author proposes a nested model of constraints to the scope of digital preservation and concludes that costs are digital preservation's most pervasive limitation.
mike kastellec (makastel@ncsu.edu) is libraries fellow, north carolina state university libraries, raleigh, nc.

introduction

imagine for a moment what perfect digital preservation would entail: a perfect archive would capture all the content generated by humanity instantly and continuously. it would catalog that information and make it available to users, yet it would not stifle creativity by undermining creators' right to control their creations. most of all, it would perfectly safeguard all the information it ingested eternally, at a cost society is willing and able to sustain.

now return to reality: digital preservation is decidedly imperfect. today's archives fall far short of the possibilities outlined above. much previous scholarship debates the quality of different digital preservation strategies; this paper looks past these arguments to shed light on limitations to the scope of digital preservation. what are the factors that limit the ability of libraries, archives, and museums (henceforth collectively referred to as archival institutions) to digitally preserve the cultural heritage of the modern era? 1 i first examine the degree to which technological limitations to digital preservation have been addressed. next, i identify the non-technical factors that limit the archival of digital objects. finally, i propose a conceptual model of limitations to digital preservation.

technology

any discussion of digital preservation naturally begins with consideration of the limits of digital preservation technology. while all aspects of digital preservation are by definition related to technology, there are two purely technical issues at the core of digital preservation: data loss and technological obsolescence. 2

many things can cause data loss. the constant risk is physical deterioration. a digital file consists at its most basic level of binary code written to some form of physical media. just like analog media (paper, vinyl recordings), digital media (optical discs, hard drives) are subject to degradation at a rate determined by the inherent properties of the medium and the environment in which it is stored. 3 when the physical medium of a digital file decays to the point where one or more bits lose their definition, the file becomes partially or wholly unreadable. other causes of data loss include software bugs, human action (e.g., accidental deletion or purposeful alteration), and environmental dangers (e.g., fire, flood, war).

assuming a digital archive can overcome the problem of physical deterioration, it then faces the issue of technological obsolescence. binary code is simply a string of zeroes and ones (sometimes called a bitstream)—like any encoded information, this code is only useful if it can be decoded into an intelligible format. this process depends on hardware, used to access a bitstream from a piece of physical media, and software, which decodes the bitstream into an intelligible object, such as a document or video displayed on a screen, a printout, or an audio output. technological obsolescence occurs when either the hardware or software needed to render a bitstream usable is no longer available. given the rapid pace of change in computer hardware and software, technological obsolescence is a constant concern. 4
most digital preservation strategies involve staying ahead of deterioration and obsolescence by copying data from older to current generations of file formats and storage media (migration) or by keeping many copies that are tested against one another to find and correct errors (data redundancy). 5 other strategies to overcome obsolescence include pre-emptively converting data to standardized formats (normalization) or avoiding conversion and instead using virtualized hardware and software to simulate the original digital environment needed to access obsolete formats (emulation). as may be expected of a young field, 6 there is a great deal of debate over the merits of each of these strategies. to date, the arguments mostly concern the quality of preservation, which is beyond the scope of this work. what should not be contentious is that each strategy also imposes limitations on the potential scale of digital preservation.

migration and normalization are intensive processes, in the sense that they normally require some level of human interaction. any human-mediated process limits the scale of an archival institution's preservation activities, as trained staffs are a limited and expensive resource. emulation postpones the processing of data until it is later accessed, potentially allowing greater ingest of information. as a strategy, however, it remains at least partly theoretical and untested, increasing the possibility that future access will be limited.

data redundancy deserves closer examination, as it has emerged as the gold standard in recent years. the limitations data redundancy imposes on digital preservation are two-fold. the first is that simple maintenance of multiple copies necessarily increases expenses; therefore—given equal levels of funding—less information can be preserved redundantly than can be preserved without such measures. (cost considerations are inextricably linked to every other limitation on digital preservation and are examined in greater detail in "finances," below.) there are practical, technical limitations on the bandwidth, disk access, and processing speeds needed to perform parity checks (tests of each bit's validity) of large datasets to guard against data loss. pushing against these limitations incurs dramatic costs, limiting the scale of digital preservation. current technology and funding are many orders of magnitude short of what is required to archive the amount of information desired by society over the long term. 7

the second way technology limits digital preservation is more complex—it concerns error rates of archived data. non-redundant storage strategies are also subject to errors, of course. only redundant systems have been proposed as a theoretical solution to the technological problem of digital preservation, 8 though, so it is necessary to examine their error rate in particular. on a theoretical level, given sufficient copies, redundant backup is all but infallible. in practice, technological limitations emerge. 9 the number of copies required to ensure perfect bit preservation is a function of the reliability of the hardware storing each copy. multiple studies have found that hardware failure rates greatly exceed manufacturers' claims. 10 rosenthal argues that, given the extreme time spans under consideration, storage reliability is not just unknown but untestable. 11 he therefore concludes that it cannot be known with certainty how many copies are needed to sustain acceptably low error rates.
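both halves of the redundancy story (copies "tested against one another to find and correct errors," and the question of how many copies are enough) can be sketched in a few lines of python. the sketch below is illustrative only: the checksum routine is a common fixity-audit pattern, and the copies calculation assumes independent, known per-copy failure probabilities, which is precisely the assumption rosenthal's argument undermines.

    import hashlib
    import math

    def fixity_digest(path, algo="sha256"):
        # checksum used to compare replicas against one another
        # and detect silent bit-level corruption
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def loss_probability(per_copy_failure, copies):
        # chance that every copy fails in the same period, assuming independence
        return per_copy_failure ** copies

    def copies_needed(per_copy_failure, target_loss):
        # smallest number of independent copies keeping loss below the target
        return math.ceil(math.log(target_loss) / math.log(per_copy_failure))

    # with a hypothetical 1 percent annual per-copy failure rate, two copies
    # give about a 1-in-10,000 annual loss probability; a 1-in-a-billion
    # target needs five copies.
    print(loss_probability(0.01, 2))    # ~0.0001
    print(copies_needed(0.01, 1e-9))    # 5

because the studies cited above found real failure rates well above manufacturers' claims, and far from independent, calculations of this kind systematically understate the number of copies required, which is rosenthal's point.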
even today's best digital preservation technologies are subject to some degree of loss and error. analog materials are also inevitably subject to deterioration, of course, but the promise of digital media leads many to unrealistic expectations of perfection. nevertheless, modern digital preservation technology addresses the fundamental needs of archival institutions to a workable degree. technological limitations to digital preservation still exist, but the aspects of digital preservation beyond purely technical considerations—access, selection, law, and finances—should gain greater relative importance than they have in the past.

access

with regard to digital preservation, there are two different dimensions of access that are important. at one end of a digital preservation operation, authorized users must be able to access an archival institution's holdings and unauthorized users restricted from doing so. this is largely a question of technology and rights management—users must be able to access preserved information and permitted to do so. this dimension of access is addressed in the technology and law sections of this paper. the other dimension of access occurs at the other end of a digital preservation operation: an archival institution must be able to access a digital object to preserve it. this simple fact leads to serious restrictions on the scope of digital preservation because much of the world's digital information is inaccessible for the purposes of archiving by libraries and archives.

there are a number of reasons why a given digital object may be inaccessible. large-scale harvesting of webpages requires automated programs that "crawl" the web, discovering and capturing pages as they go. web crawlers cannot access password-protected sites (e.g., facebook) and database-backed sites (all manner of sites, including many blogs, news sites, e-commerce sites, and countless collections of data). this inaccessible portion of the web is estimated to dwarf the readily accessible portion by orders of magnitude. there is also an enormous amount of inaccessible digital information that is not part of the web at all, such as emails, company intranets, and digital objects created and stored by individuals. 12

additionally, there is a temporal limit to access. some digital objects only are accessible (or even exist) for a short window of time, and all require some measure of active preservation to avoid permanent loss. 13 the lifespans of many webpages are vanishingly short. other pages, like some news items, are publicly accessible for a short window before they are hidden behind paywalls. even long-lasting digital objects are often dynamic: the ads accompanying a webpage may change with each visit; news articles and other documents are revised; blog posts and comments are deleted. if an archival institution cannot access a digital object quickly or frequently enough, the object cannot be archived, at least not completely. large-scale digital preservation, which in practice necessarily relies on periodic automated harvesting of content, is therefore limited to capturing snapshots of the changes digital objects undergo over their lifespans.

law

existing copyright law does not translate well to the digital realm.
leaving aside the complexities of international copyright law, in the united states it is not clear, for example, whether an archival institution like the library of congress is bound by licensing restrictions and if it can require deposit of digital objects, nor whether content on the web or in databases should be treated as published or unpublished. 14 "many of the uncertainties come from applying laws to technologies and methods of distribution they were not designed to address." 15 a lack of revised laws or even relevant court decisions significantly impacts the potential scale of digital preservation, as few archival institutions will venture to preserve digital objects without legal protection for doing so.

given this unclear legal environment, efforts at large-scale digital preservation are hampered by the need to secure permission to archive from the rights holder of each piece of content. 16 this obviously has enormous impact on preserving the web, but even scholarly databases and periodical archives may not hold full rights to all of their published content. additionally, a single digital object can include content owned by any number of authors, each of whose permission is needed for legal archival. without stronger legal protection for archival institutions, the scope of digital preservation is severely limited by copyright restrictions.

digital preservation is further limited by licensing agreements, which can be even more restrictive than general copyright law. frequently, purchase of a digital object does not transfer ownership to the end-user, but rather grants limited licensed access to the object. in this case, libraries do not enjoy the customary right of first sale that, among other things, allows for actions related to preservation that would otherwise breach copyright. 17 preservation of licensed works requires that libraries either cede archival responsibility to rights holders, negotiate the right to archive licensed copies, or create dark archives that preserve objects in an inaccessible state until their copyright expires.

selection

the limitation selection imposes on digital preservation hinges on the act of intellectual appraisal. the total digital content created each year already outstrips the total current storage capacity of the world by a wide margin. 18 it is clear libraries and archives cannot preserve everything, so, more than ever, deciding what to preserve is critical. 19

models of selection for digital objects can be plotted on a scale according to the degree of human mediation they entail. at one end, the selective model is closest to selection in the analog world, with librarians individually identifying digital objects worthy of digital preservation. at the other end of the scale, the whole domain model involves minimal human mediation, with automated harvesting of digital objects. the collaborative model, in which archival institutions negotiate agreements with publishers to deposit content, falls somewhere between these two extremes, as does the thematic model, which can apply either selective- or whole-domain-type approaches to relatively narrow sets of digital objects defined by event, topic, or community.

each of these approaches results in limits to the scope of digital preservation. the human mediation of the selective model limits the scale of what can be preserved, as objects can only be acquired as quickly as staff can appraise them.
the collaborative and thematic models offer the potential for thorough coverage of their target but by definition are limited in scope. the whole domain model avoids the bottleneck of human appraisal but, more than any other model, is subject to the access limitations discussed above. whole domain harvesting is also essentially wasteful, as it is an anti-selection approach—everything found is kept, irrespective of potential value. this wastefulness makes the whole domain model extremely expensive because of the technological resources required to manage information at such a scale.

finances

the ultimate limiting factor is financial reality. considerations of funding and cost have both broad and narrow effects. the narrow effects are on each of the other limitations previously identified—financial constraints are intertwined with the constraints imposed by technology, access, law, and selection. the technological model of digital preservation that offers the highest quality and lowest risk, redundant offsite copies, also carries hard-to-sustain costs. while the cost of storage continues to drop, hardware costs actually make up only a small percentage of the total cost of digital preservation. power, cooling, and—for offsite copy strategies—bandwidth costs are significant and do not decrease as scale increases to the same degree that storage costs do. cost considerations similarly fuel non-technical limitations: increased funding can increase the rate at which digital objects are accessed for preservation and can enable development of systems to mine deep web resources. selection is limited by the number of staff who can evaluate objects or the need to develop systems to automate appraisal. negotiating perpetual access to objects or arranging to purchase archival copies creates additional costs.

the broad financial effect is that any digital preservation requires dedicated funding over an indefinite timespan. lavoie outlines the problem:

much of the discussion in the digital preservation community focuses on the problem of ensuring that digital materials survive for future generations. in comparison, however, there has been relatively little discussion of how we can ensure that digital preservation activities survive beyond the current availability of soft-money funding; or the transition from a project's first-generation management to the second; or even how they might be supplied with sufficient resources to get underway at all. 20

there are many possible funding models for digital preservation, 21 each with their own limitations. creators and rights holders can preserve their own content but normally have little incentive to do so over the long term, as demand for access slackens. publicly funded agencies can preserve content, but they may lack a clear mandate for doing so, and they are chronically underfunded. preservation may be voluntarily funded, as is the case for wikipedia, although it is not clear if there is enough potential volunteer funding for more than a few preservation efforts. fees may support preservation, either through charging users for access or by third-party organizations charging content owners for archival services; in such cases, however, fees may also discourage access or provision of content, respectively.

a nested model of limitations

these aspects can be seen as a series of nested constraints (see figure 1).

figure 1. nested model of limitations

at the highest level, there are technical limitations on how much digital information can be preserved at an acceptable quality. within that constraint, only a limited portion of what could possibly be preserved can be accessed by archival institutions for digital preservation. next, within that which is accessible, there are legal limitations on what may be archived. the subset defined by technological, access, and legal limitations still holds far more information than archival institutions are capable of archiving; therefore selection is required, entailing either the limited quality of automated gathering or the limited quantity of human-mediated appraisal. finally, each of these constraints is in turn limited by financial considerations, so finances exert pressure at each level.
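the containment the model asserts can be sketched directly as successive filters over a candidate set. every predicate below is an invented stand-in, chosen only so the subsets nest the way the paper describes:

    # each constraint yields a subset of the previous level; the budget cut
    # is applied last here for brevity, though the paper notes that finances
    # in fact press on every level. all sets are toy stand-ins.
    everything = set(range(100))                          # all digital objects
    storable   = {o for o in everything if o % 2 == 0}    # technology
    accessible = {o for o in storable if o % 3 != 0}      # access
    legal      = {o for o in accessible if o < 80}        # law
    selected   = set(sorted(legal)[:10])                  # selection (appraisal)
    preserved  = set(sorted(selected)[:5])                # finances (budget)

    assert preserved <= selected <= legal <= accessible <= storable <= everything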
conclusion

it is possible to envision alternative ways to model these series of constraints—the order could be different, or they could all be centered on a single point but not nested within each other. thus, undue attention should not be given to the specific sequence outlined above. one important conclusion that may be drawn, however, is that the identified limitations are related but distinct. the preponderance of digital preservation research to date has understandably focused on overcoming technological limitations. with the establishment of the redundant backup model, which addresses technological limitations to a workable degree, the field would be well served by greater efforts to push back the non-technical limitations of access, law, and selection.

the other conclusion is that costs are digital preservation's most pervasive limitation. as rosenthal plainly states it, "society's ever-increasing demands for vast amounts of data to be kept for the future are not matched by suitably lavish funds." 22 if funding cannot be increased, expectations must be tempered. perhaps it has always been the case, but the scale of the digital landscape makes it clear that preservation is a process of triage. for the foreseeable future, the amount of digital information that could possibly be preserved far outstrips the amount that feasibly can be preserved. it is useful to put the advances in digital preservation technology in perspective and to recognize that non-technical factors also play a large role in determining how much of our cultural heritage may be preserved for the benefit of future generations.

references and notes

1. issues specific to digitized objects (i.e., digital versions of analog originals) are not specifically addressed herein. technological limitations apply equally to digitized and born-digital objects, however, and the remaining limitations overlap greatly in either case.
2. francine berman et al., sustainable economics for a digital planet: ensuring long-term access to digital information (blue ribbon task force on sustainable digital preservation and access, 2010), http://brtf.sdsc.edu/biblio/brtf_final_report.pdf (accessed apr. 23, 2011).
3. marilyn deegan and simon tanner, "some key issues in digital preservation," in digital convergence—libraries of the future, ed. rae earnshaw and john vince, 219–37 (london: springer london, 2007), www.springerlink.com.proxy-remote.galib.uga.edu/content/h12631/#section=339742&page=1 (accessed nov. 18, 2010).
4. berman et al., sustainable economics for a digital planet; deegan and tanner, "digital convergence."
5. data redundancy normally will also entail hardware migration; it may or may not also incorporate file format migration.
6. the library of congress, for instance, only began digital preservation in 2000 (www.digitalpreservation.gov/partners/pioneers/index.html [accessed apr. 24, 2011]).
7. david s. h. rosenthal, "bit preservation: a solved problem?" international journal of digital curation 5, no. 1 (july 21, 2010), www.ijdc.net/index.php/ijdc/article/view/151 (accessed mar. 14, 2011).
8. h. m. gladney, "durable digital objects rather than digital preservation," january 1, 2008, http://eprints.erpanet.org/149 (accessed mar. 14, 2011).
9. rosenthal, "bit preservation."
10. ibid. rosenthal cites studies by schroeder and gibson (2007) and pinheiro (2007).
11. ibid.
12. peter lyman, "archiving the world wide web," in building a national strategy for digital preservation: issues in digital media archiving (washington, dc: council on library and information resources and library of congress, 2002), 38–51, www.clir.org/pubs/reports/pub106/pub106.pdf (accessed dec. 1, 2010); f. mccown, c. c. marshall, and m. l. nelson, "why web sites are lost (and how they're sometimes found)," communications of the acm 52, no. 11 (2009): 141–45; margaret e. phillips, "what should we preserve? the question for heritage libraries in a digital world," library trends 54, no. 1 (summer 2005): 57–71.
13. deegan and tanner, "digital convergence"; mccown, marshall, and nelson, "why web sites are lost (and how they're sometimes found)."
14. june besek, copyright issues relevant to the creation of a digital archive: a preliminary assessment (the council on library and information resources and the library of congress, 2003), www.clir.org/pubs/reports/pub112/contents.html (accessed mar. 15, 2011).
15. ibid., 17.
16. archival institutions that do not pay heed to this restriction, such as the internet archive (www.archive.org), claim their actions constitute fair use. the legality of this claim is as yet untested.
17. berman et al., sustainable economics for a digital planet.
18. francine berman, "got data?" communications of the acm 51, no. 12 (december 2008): 50, http://portal.acm.org/citation.cfm?id=1409360.1409376&coll=portal&dl=acm&idx=j79&part=magazine&wanttype=magazines&title=communications (accessed nov. 20, 2010).
19. phillips, "what should we preserve?"
20. brian f. lavoie, "the fifth blackbird," d-lib magazine 14, no. 3/4 (march 2008), www.dlib.org/dlib/march08/lavoie/03lavoie.html (accessed mar. 14, 2011).
21. berman et al., sustainable economics for a digital planet.
22. rosenthal, "bit preservation."

seeing the wood for the trees: enhancing metadata subject elements with weights

hong zhang, linda c. smith, michael twidale, and fang huang gao

hong zhang (hzhang1@illinois.edu) is phd candidate, graduate school of library and information science, university of illinois at urbana-champaign, linda c. smith (lcsmith@illinois.edu) is professor, graduate school of library and information science, university of illinois at urbana-champaign, michael twidale (twidale@illinois.edu) is professor, graduate school of library and information science, university of illinois at urbana-champaign, and fang huang gao (fgao@gpo.gov) is supervisory librarian, government printing office.

subject indexing has been conducted in a dichotomous way in terms of what the information object is primarily about/of or not, corresponding to the presence or absence of a particular subject term, respectively. with more subject terms brought into information systems via social tagging, manual cataloging, or automated indexing, many more partially relevant results can be retrieved. using examples from digital image collections and online library catalog systems, we explore the problem and advocate for adding a weighting mechanism to subject indexing and tagging to make web search and navigation more effective and efficient. we argue that the weighting of subject terms is more important than ever in today's world of growing collections, more federated searching, and expansion of social tagging. such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers, but also needs to be incorporated into system functionality and metadata schemas.

subjects as important access points have largely been indexed in a dichotomous way: what the object is primarily about/of or not. this approach to indexing is implicitly assumed in various guidelines for subject indexing. for example, the dublin core metadata element set recommends the use of controlled vocabulary to represent subject in "keywords, key phrases, or classification codes."1 similarly, the library of congress practice, suggested in the subject headings manual, is to assign "one or more subject headings that best summarize the overall contents of the work and provide access to its most important topics."2 a topic is only "important enough" to be given a subject heading if it comprises at least 20 percent of a work, except for headings of named entities, which do not need to be 20 percent of the work when they are "critical to the subject of the work as a whole."3 although catalogers are aware of it when they assign terms, this weight information is left out of the current library metadata schemas and practice.

a similar practice applies in non-textual object subject indexing. because of the difficulty of selecting words to represent visual/aural symbolism, subject indexing for art and cultural objects is usually guided by panofsky's three levels of meaning (pre-iconographical, iconographical, and post-iconographical), further refined by layne in "ofness" and "aboutness" in each level. specifically, what can be indexed includes the "ofness" (what the picture depicts) as well as some "aboutness" (what is expressed in the picture) in both pre-iconographical and iconographical levels.4 in practice, vra core 4.0 for example defines subject subelements as:

terms or phrases that describe, identify, or interpret the work or image and what it depicts or expresses. these may include generic terms that describe the work and the elements that it comprises, terms that identify particular people, geographic places, narrative and iconographic themes, or terms that refer to broader concepts or interpretations.5

here again, no weighting or differentiating mechanism is included in describing the multiple elements. what is addressed is the "what" problem: what is the work of or about? metadata schemas for images and art works such as vra core and cdwa focus on specificity and exhaustivity of indexing, that is, the precision and quantity of terms applied to a subject element. however, these schemas do not address the question of how much the work is of or about the item or concept represented by a particular keyword.

recently, social tagging functions have been adopted in digital library and catalog systems to help support better searching and browsing. this introduces more subject terms into the system. yet again, there is typically no mechanism to differentiate between the tags used for any given item, except for only a few sites that make use of tag frequency information in the search interfaces. as collections grow and more federated searching is carried out, the absence of weights for subject terms can cause problems in search and navigation. the following examples illustrate the problems, and the rest of the paper further reviews and discusses the precedent research and practice on weighting, and further outlines the issues that are critical in applying a weighting mechanism.
examples of problems

exhaustive indexing: digital library collections

a search query of "tree" can return thousands of images in several digital library collections. the results include images with a tree or trees as primary components mixed with images where a tree or trees, although definitely present, are minor components of the image. figure 1 illustrates the point. these examples come from three different collections and either include the subject element of "tree" or are tagged with "tree" by users. there is no mechanism that catalogers or users have available to indicate that "tree" in these images is a minor component. note that we are not calling this out as an error in the professionally developed subject terms, nor indeed in the end user generated tags. although particular images may have an incorrectly applied keyword, we want to talk about the vast majority where the keyword quite correctly refers to a component of the image. furthermore, such keywords referring to minor components of the image are extremely useful for other queries.

figure 1. example images with "tree" as a subject item:
a. subject: women; books; dresses; flowers; trees; . . . in: victoria & albert museum (accessed aug. 30, 2010), http://collections.vam.ac.uk/item/014962/oil-painting-the-day-dream
b. tags: japanese; moon; nights; walking; tree; . . . in: brooklyn museum (accessed aug. 30, 2010), http://www.brooklynmuseum.org/opencollections/objects/121725/aoi_slope_outside_toranomon_gate_no._113_from_one_hundred_famous_views_of_edo
c. tags: japanese; birds; silk; waterfall; tree; . . . in: steve: the museum social tagging project (accessed aug. 30, 2010), http://tagger.steve.museum/steve/object/15?offset=2

this kind of exhaustive indexing of images enables the effective satisfaction of search needs, such as looking for pictures of "buildings, people, and trees" or "trees beside a river." with large image collections, such compound needs become more important to satisfy by combinations of searching and browsing. to enable them, metadata about minor subjects is essential. however, without weights to differentiate subject keywords, users will get overwhelmed with partially relevant results. for example, a user looking for images of trees (i.e., "tree" as the primary subject) would have to look through large sets of results such as a photograph of a dog with a tiny tree out of focus in the background.

for some items that include rich metadata, such as title or description, when people look at a particular item's record, with the title and sometimes the description, we may very well determine that the picture is primarily of, say, a dog instead of trees. that is, the subject elements have to be interpreted based on the context of other elements in the record to convey the "primary" and "peripheral" subjects among the listed subject terms. however, in a search and navigation system where subject elements are usually treated as context-free, search efficiency will be largely impaired because of the "noise" items and inability to refine the scope, especially when the volume of items grows.

lack of weighting also limits other potential uses of keywords or tags. for example, all the tags of all the items in a collection can be used to create a tag cloud as a low-cost way to contribute to a visualization of what a collection is "about" overall.6 unfortunately, a laboriously developed set of exhaustive tags, although valuable for supporting searching and browsing within a large image collection, could give a very distorted overview of what the whole collection is about. extending our example, the tag "tree" may occur so frequently and be so prominent in the tag cloud that a user infers that this is mostly a botanical collection.
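the tag-cloud distortion is easy to reproduce, and a weighting mechanism suggests the obvious correction: size cloud entries by summed weights rather than raw tag counts. a minimal python sketch; the records and the 0-to-1 weights are invented for illustration:

    from collections import Counter

    # each record carries (tag, weight) pairs; 1.0 marks a primary subject and
    # small values mark minor components such as a background tree.
    records = [
        [("dog", 1.0), ("tree", 0.1)],
        [("portrait", 1.0), ("tree", 0.2)],
        [("tree", 1.0), ("river", 0.6)],
    ]

    raw = Counter(tag for rec in records for tag, _ in rec)
    weighted = Counter()
    for rec in records:
        for tag, w in rec:
            weighted[tag] += w

    print(raw.most_common())       # tree counts 3, dwarfing everything else
    print(weighted.most_common())  # tree sums to 1.3, barely ahead of dog at 1.0

sized by raw counts, "tree" dominates the cloud and suggests a botanical collection; sized by summed weights, it remains visible without misrepresenting the collection.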
selective indexing: lcsh in library catalogs

although more extreme in the case of images in conveying the "ofness," the same problem with multiple subjects also applies to text in terms of "aboutness." the following example comes from an online library catalog in a faceted navigation web interface using library of congress subject headings in subject cataloging.7 the query "psychoanalysis and religion" returned 158 results, with 126 in "psychoanalysis and religion" under the topic facet. according to the subject headings manual, the first subject is always the primary one, while the second and others could be either a primary or nonprimary subject.8 this means that among these 126 books, there is no easy way to tell which books are "primarily" about "psychoanalysis and religion" unless the user goes through all of them. with the provided metadata, we do know that all books that have "psychoanalysis and religion" as the first subject heading are primarily about this topic, but a book that has this same heading as its second subject heading may or may not be primarily about this topic. there is no way to indicate which it is in the metadata, nor in the search interface.

as this example shows, the library of congress manual involves an attempt to acknowledge and make a distinction between primary and nonprimary subjects.
however, in practice the attempt is insufficient to be really useful since, apart from the first entry, it is ambiguous whether subsequent entries are additional primary subjects or nonprimary subjects. consequently, the search system and, further on, the users are not able to take full advantage of the care of a cataloger in deciding whether an additional subject is primary or not.

other information retrieval systems

the negative effect of current subject indexing without weighting on search outcomes has been identified by some researchers on particular information retrieval systems. in a study examining "the contribution of metadata to effective searching,"9 hawking and zobel found that the available subject metadata are "of little value in ranking answers" to search queries.10 their explanation is that "it is difficult to indicate via metadata tagging the relative importance of a page to a particular topic,"11 in addition to the problems in data quality and system implementation. the same problem of multiple tags without weights is described:

in the kinds of queries we have studied, there is typically one page (or at most a small number) that is particularly valuable. there are many other pages which could be said to be relevant to the query—and thus merit a metadata match—but they are not nearly so useful for a typical searcher. under the assumption that metadata is needed for search, all of these pages should have the relevant metadata tag, but this makes the particular page harder to find.12

a similar problem is reported in a recent study by lykke and eslau. in comparing searching by controlled subject metadata, searching based on automatic indexing, and searching based on automatic indexing expanded with a corporate thesaurus in an enterprise electronic document management system, the authors found that the metadata searches produced the lowest precision among the three strategies. the problem of indiscriminate metadata indexing is "remarkable" to the authors compared with the automatic indexing systems, because human indexers should be better at weighting the significance of subjects, and be more able to distinguish between important and peripheral compared with computers that base significance on term frequency.13 indeed, while various weighting algorithms have been used in automatic indexing systems to approximate the distinguishing function, there is simply no such mechanism built into human subject metadata indexing even though human indexers are able to do the job much better than computers.
weighting: yesterday, today, and future

precedent weighting practices

written more than thirty years ago, the final report of the subject access project describes how the project researchers applied weights to the newly added subject terms extracted from tables of contents and back-of-the-book indexes. the criterion used in that project was that terms and phrases with a "ten-page range or larger" were treated as "major" ones.14 a similar mechanism was adopted in the eric database beginning in the 1960s, with indexes distinguishing "major" and "minor" descriptors as the result of indexing. while some search systems allowed differentiation of major and minor descriptors in formulating searches, others simply included the distinction (with an asterisk) when displaying a record. unfortunately, this distinguishing mechanism is no longer included in the later eric indexing data.

a system using weighted indexing and searching and still running today is the medline/pubmed interface. a qualifier [majr] can be used with a medical subject headings (mesh) term in a query to "search a mesh heading which is a major topic of an article (e.g., thromboembolism[majr])."15 in the search result page, each major mesh topic term is denoted by an asterisk at the end.

weighting concept and the purpose of indexing

the weighting concept is connected with the fundamental purpose of indexing. the idea of weighting in subject indexing has been discussed in the research area of subject analysis for some time. weighting gives indexing an increased granularity and can be a device to counteract the effect of indexing specificity and exhaustivity on precision and recall, as pointed out by foskett:

whereas specificity is a device to increase relevance at the cost of recall, exhaustivity works in the opposite direction, by increasing recall, but at the expense of relevance. a device which we may use to counteract this effect to some extent is weighting. in this, we try to show the significance of any particular specification by giving it a weight on a pre-established scale. for example, if we had a book on pets which dealt largely with dogs, we might give pets a weight of 10/10, and dogs, a weight of 8/10 or less.16

anderson also includes weighting as a part of indexing in the guidelines for indexes and related information retrieval devices (niso tr02-1997):

one function of an index is to discriminate between major and minor treatments of particular topics or manifestations of particular features.17

he also notes that a weighting scheme is "especially useful in high-exhaustivity indexing"18 when both peripheral and primary topics are indicated. similarly, fidel lists "weights" as one of the issues that should be addressed in an indexing policy.19

metadata indexing without weighting is related to the simplified dichotomous assumption in subject indexing—primarily about/of and not primarily about/of—which further leads to the dichotomous retrieval result: retrieved and not retrieved. weighting as a mechanism to break this dichotomy is noted by anderson in niso tr02-1997.20 in addition, researchers have noticed the limitations of this dichotomous indexing. in an opinion piece, markey emphasizes the urgency to "replace boolean-based catalogs with post-boolean probabilistic retrieval methods,"21 especially given the challenges library systems are faced with today. it is time to change the boolean, i.e., dichotomous, practice of subject indexing and cataloging, no matter whether it is produced by professional librarians, by user tagging, or by an automatic mechanism. indeed, as declared by svenonius, "while the purpose of an index is to point, the pointing cannot be done indiscriminately."22
needed refinements in subject indexing

the fact that weighted indexing has become more prominently needed over the past decade may be related to the shift in the continuum from subject indexing as representation/surrogate to subject indexing as access points, which is consistent with the shift from a small number of subject terms to more subject terms. this might explain why the weighting practice is applied in the above-mentioned medline/pubmed system. with web-based systems, social tagging technology, federated searching, and the growing number of collections producing more subject terms, to distinguish between them has become a prominent problem. in reviewing information users and use from the 1920s to the present, miksa points out the trend to "more granular access to informational objects" "by viewing documents as having many diverse subjects rather than one or two 'main' subjects," no matter what the social and technical environment has been.23 in recognizing this theme in the future development of information organization and retrieval systems, we argue that the subject indexing mechanism should provide sufficient granularity to allow more granular access to information, as demonstrated in the examples in the previous section.

potential challenges

while arguing for the potential value of weights associated with subject terms, it is also important to acknowledge potential challenges posed by this approach.

human judgment. treating assigned terms equally might seem to avoid the additional human judgment and the subjectivity of the weight levels, because different catalogers may give different weight to a subject heading. we argue that assigning subject headings is itself unavoidably subjective. we are already using professional indexers and subject catalogers to create value-added metadata in the form of subject terms. assigning weights would be a further enhancement. on the other hand, adding a weighting mechanism into metadata schemas is independent of the issue of human indexing. no matter who will do the subject indexing or tagging, either professional librarians or users or possibly computers, there is a need for weight information in the metadata records.

the weighting scale. in terms of the specific mechanism of representing the weight rating, we can benefit from research on weighting of index terms and on the relevance of search results. for example, the three categories of relevant, partially relevant, and nonrelevant in information retrieval are similar to the major, minor, and nonpresent subject indexing method in the examples above. borlund notes several retrieval studies proposing more than three categories or using continuous scales instead of category rating.24 subject indexing involves a similar judgment of relevance when deciding whether to include a subject term. more sophisticated scales certainly enable more useful ranking of results, but the cost of obtaining such information may rise. after the mechanism of incorporating weights into subject indexing/cataloging is developed, guidelines should be provided for indexing practice to produce consistent and good quality.

weights in both indexing and retrieval system. adding weights to subject indexing/cataloging needs to be considered and applied in three parts: (1) extending metadata schemas by encoding weights in subject elements; (2) subject indexing/cataloging with weight information; and (3) retrieval systems that exploit the weighting information in subject metadata elements. the mechanism will not work effectively in the absence of any one of them.
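parts (1) and (3) can be sketched together in a few lines of python: a subject element extended with a weight attribute, and a ranking function that exploits it so that primarily-about items surface before minor matches. the record layout, field names, titles, and weights here are ours, purely illustrative:

    # part (1): subject elements carrying weights (the schema extension is hypothetical).
    catalog = [
        {"title": "psychoanalysis and religion",
         "subjects": [("psychoanalysis and religion", 1.0)]},
        {"title": "a psychoanalytic biography",
         "subjects": [("psychoanalysis", 1.0),
                      ("psychoanalysis and religion", 0.3)]},
    ]

    # part (3): retrieval that ranks matches by the weight of the matching heading.
    def search(catalog, heading):
        hits = []
        for record in catalog:
            weight = dict(record["subjects"]).get(heading)
            if weight is not None:
                hits.append((weight, record["title"]))
        return [title for weight, title in sorted(hits, reverse=True)]

    print(search(catalog, "psychoanalysis and religion"))
    # ['psychoanalysis and religion', 'a psychoanalytic biography']

with records of this shape, the 126 "psychoanalysis and religion" results from the earlier catalog example could be ordered, or filtered, by whether the heading is a primary subject.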
weighting concept and the purpose of indexing
the weighting concept is connected with the fundamental purpose of indexing. the idea of weighting in subject indexing has been discussed in the research area of subject analysis for some time. weighting gives indexing an increased granularity and can be a device to counteract the effect of indexing specificity and exhaustivity on precision and recall, as pointed out by foskett: "whereas specificity is a device to increase relevance at the cost of recall, exhaustivity works in the opposite direction, by increasing recall, but at the expense of relevance. a device which we may use to counteract this effect to some extent is weighting. in this, we try to show the significance of any particular specification by giving it a weight on a pre-established scale. for example, if we had a book on pets which dealt largely with dogs, we might give pets a weight of 10/10, and dogs, a weight of 8/10 or less."16 anderson also includes weighting as a part of indexing in the guidelines for indexes and related information retrieval devices (niso tr02-1997): "one function of an index is to discriminate between major and minor treatments of particular topics or manifestations of particular features."17 he also notes that a weighting scheme is "especially useful in high-exhaustivity indexing"18 when both peripheral and primary topics are indicated. similarly, fidel lists "weights" as one of the issues that should be addressed in an indexing policy.19 metadata indexing without weighting rests on a simplified dichotomous assumption in subject indexing—primarily about/of versus not primarily about/of—which in turn leads to a dichotomous retrieval result: retrieved or not retrieved. weighting as a mechanism to break this dichotomy is needed in metadata indexing, even though human indexers are able to do the job much better than computers.
needed refinements in subject indexing
the fact that weighted indexing has become more prominently needed over the past decade may be related to the shift in the continuum from subject indexing as representation/surrogate to subject indexing as access points, which is consistent with the shift from a small number of subject terms to more subject terms. this might explain why the weighting practice is applied in the above-mentioned medline/pubmed system. with web-based systems, social tagging technology, federated searching, and growing collections all producing more subject terms, distinguishing among those terms has become a prominent problem. in reviewing information users and use from the 1920s to the present, miksa points out the trend toward "more granular access to informational objects" "by viewing documents as having many diverse subjects rather than one or two 'main' subjects," no matter what the social and technical environment has been.23 in recognizing this theme in the future development of information organization and retrieval systems, we argue that the subject indexing mechanism should provide sufficient granularity to allow more granular access to information, as demonstrated in the examples in the previous section.
potential challenges
while arguing for the potential value of weights associated with subject terms, it is also important to acknowledge potential challenges posed by this approach.
human judgment
treating assigned terms equally might seem to avoid additional human judgment and the subjectivity of weight levels, because different catalogers may give different weights to a subject heading. we argue that assigning subject headings is itself unavoidably subjective. we are already using professional indexers and subject catalogers to create value-added metadata in the form of subject terms; assigning weights would be a further enhancement. on the other hand, adding a weighting mechanism to metadata schemas is independent of the issue of human indexing. no matter who does the subject indexing or tagging—professional librarians, users, or possibly computers—there is a need for weight information in the metadata records.
the weighting scale
in terms of the specific mechanism for representing the weight rating, we can benefit from research on the weighting of index terms and on the relevance of search results. for example, the three categories of relevant, partially relevant, and nonrelevant in information retrieval are similar to the major, minor, and nonpresent subject indexing method in the examples above. borlund notes several retrieval studies proposing more than three categories or using continuous scales instead of category ratings.24 subject indexing involves a similar judgment of relevance when deciding whether to include a subject term. more sophisticated scales certainly enable more useful ranking of results, but the cost of obtaining such information may rise. after the mechanism of incorporating weights into subject indexing/cataloging is developed, guidelines should be provided so that indexing practice produces consistent, good-quality weights.
weights in both indexing and retrieval systems
adding weights to subject indexing/cataloging needs to be considered and applied in three parts: (1) extending metadata schemas by encoding weights in subject elements; (2) subject indexing/cataloging with weight information; and (3) retrieval systems that exploit the weighting information in subject metadata elements. the mechanism will not work effectively in the absence of any one of them.
conclusion
this paper advocates adding a weighting mechanism to subject indexing and tagging, to enable search algorithms to be more discriminating and browsing better oriented, and thus to make it possible to provide more granular access to information. such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers, but also needs to be incorporated into system functionality. as social tagging is brought into today's digital library collections and online library catalogs, as collections grow and are aggregated, and as the opportunity arises to add more metadata from a variety of different sources, including end-user tagging and machine-generated metadata, such weighting becomes more important than ever if we are to make productive use of metadata richness and still see the wood for the trees.
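as a concrete (and entirely hypothetical) illustration of the three-part mechanism, the sketch below encodes weights in subject entries and lets a toy retrieval function exploit them. no metadata schema discussed in this paper actually defines a weight attribute; the 10-point scale simply echoes foskett's pets/dogs example.

# minimal sketch of weighted subject metadata and weight-aware ranking.
# the "weight" values are a hypothetical schema extension, not a real
# dublin core (or other) element; the 0-10 scale follows foskett's example.
records = [
    {"title": "caring for pets", "subjects": {"pets": 10, "dogs": 8, "cats": 3}},
    {"title": "the dog encyclopedia", "subjects": {"dogs": 10, "pets": 5}},
    {"title": "urban wildlife", "subjects": {"wildlife": 10, "cats": 2}},
]

def rank_by_subject(records, query_term):
    # part (3): score each record by the weight of the matching subject term,
    # so a minor treatment ranks below a major one instead of tying with it.
    scored = [(r["subjects"].get(query_term, 0), r["title"]) for r in records]
    return [(title, w) for w, title in sorted(scored, reverse=True) if w > 0]

for title, weight in rank_by_subject(records, "dogs"):
    print(f"{title} (weight {weight})")
# 'the dog encyclopedia' (weight 10) ranks above 'caring for pets' (weight 8),
# a distinction that boolean, unweighted indexing cannot express.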
references
1. "dublin core metadata element set, version 1.1," http://dublincore.org/documents/dces/ (accessed nov. 20, 2010).
2. library of congress, subject headings manual (washington, d.c.: library of congress, 2008).
3. ibid.
4. elaine svenonius, "access to nonbook materials: the limits of subject indexing for visual and aural languages," journal of the american society for information science 45, no. 8 (1994): 600–606.
5. "vra core 4.0 element description," http://www.loc.gov/standards/vracore/vra_core4_element_description.pdf (accessed mar. 31, 2011).
6. richard j. urban, michael b. twidale, and piotr adamczyk, "designing and developing a collections dashboard," in museums and the web 2010: proceedings, ed. j. trant and d. bearman (toronto: archives & museum informatics, 2010), http://www.archimuse.com/mw2010/papers/urban/urban.html (accessed apr. 5, 2011).
7. "vufind at the university of illinois," http://vufind.carli.illinois.edu (accessed nov. 20, 2010).
8. library of congress, subject headings manual.
9. david hawking and justin zobel, "does topic metadata help with web search?" journal of the american society for information science & technology 58, no. 5 (2007): 613–28.
10. ibid.
11. ibid.
12. ibid., 625.
13. marianne lykke and anna g. eslau, "using thesauri in enterprise settings: indexing or query expansion?" in the janus faced scholar: a festschrift in honour of peter ingwersen, ed. birger larsen et al. (copenhagen: royal school of library & information science, 2010): 87–97.
14. subject access project, books are for use: final report of the subject access project to the council on library resources (syracuse, n.y.: syracuse univ., 1978).
15. "pubmed," http://www.nlm.nih.gov/bsd/disted/pubmedtutorial/020_760.html (accessed nov. 20, 2010).
16. a. c. foskett, the subject approach to information, 5th ed. (london: library association publishing, 1996): 24.
17. james d. anderson, guidelines for indexes and related information retrieval devices, niso-tr02-1997, http://www.niso.org/publications/tr/tr02.pdf (accessed nov. 20, 2010): 25.
18. ibid.
19. raya fidel, "user-centered indexing," journal of the american society for information science 45, no. 8 (1994): 572–75.
20. anderson, guidelines for indexes and related information retrieval devices, 20.
21. karen markey, "the online library catalog: paradise lost and paradise regained?" d-lib magazine 13, no. 1/2 (2007).
22. svenonius, "access to nonbook materials," 601.
23. francis miksa, "information organization and the mysterious information user," libraries & the cultural record 44, no. 3 (2009): 343–70.
24. pia borlund, "the concept of relevance in ir," journal of the american society for information science & technology 54, no. 10 (2003): 913–25.

usability studies of faceted browsing: a literature review
jody condit fagan
jody condit fagan (faganjc@jmu.edu) is content interfaces coordinator, james madison university library, harrisonburg, virginia.
faceted browsing is a common feature of new library catalog interfaces. but to what extent does it improve user performance in searching within today's library catalog systems? this article reviews the literature for user studies involving faceted browsing and user studies of "next-generation" library catalogs that incorporate faceted browsing. both the results and the methods of these studies are analyzed by asking, what do we currently know about faceted browsing? how can we design better studies of faceted browsing in library catalogs? the article proposes methodological considerations for practicing librarians and provides examples of goals, tasks, and measurements for user studies of faceted browsing in library catalogs.
many libraries are now investigating possible new interfaces to their library catalogs.
sometimes called "next-generation library catalogs" or "discovery tools," these new interfaces are often separate from existing integrated library systems. they seek to provide an improved experience for library patrons by offering a more modern look and feel, new features, and the potential to retrieve results from other major library systems such as article databases. one interesting feature these interfaces offer is called "faceted browsing." hearst defines facets as "a set of meaningful labels organized in such a way as to reflect the concepts relevant to a domain."1 la barre defines facets as representing "the categories, properties, attributes, characteristics, relations, functions or concepts that are central to the set of documents or entities being organized and which are of particular interest to the user group."2 faceted browsing offers the user relevant subcategories by which they can see an overview of results, then narrow their list. in library catalog interfaces, facets usually include authors, subjects, and formats, but may include any field that can be logically created from the marc record (see figure 1 for an example).
figure 1. faceted results from jmu's vufind implementation
using facets to structure information is not new to librarians and information scientists. as early as 1955, the classification research group stated a desire to see faceted classification as the basis for all information retrieval.3 in 1960, ranganathan introduced facet analysis to our profession.4 librarians like metadata because they know its power, and facets can showcase metadata in new interfaces. according to mcguinness, facets perform several functions in an interface:
■■ vocabulary control
■■ site navigation and support
■■ overview provision and expectation setting
■■ browsing support
■■ searching support
■■ disambiguation support5
these functions offer several potential advantages to the user: the functions use category systems that are coherent and complete, they are predictable, they show previews of where to go next, they show how to return to previous states, they suggest logical alternatives, and they help the user avoid empty result sets as searches are narrowed.6 disadvantages include the fact that categories of interest must be known in advance, important trends may not be shown, category structures may need to be built by hand, and automated assignment is only partly successful.7 library catalog records, of course, already supply "categories of interest" and a category structure. information science research has shown benefits to users from faceted search interfaces. but do these benefits hold true for systems as complex as library catalogs? this paper presents an extensive review of both information science and library literature related to faceted browsing.
■■ method
to find articles in the library and information science literature related to faceted browsing, the author searched the association for computing machinery (acm) digital library, scopus, and library and information science and technology abstracts (lista) databases. in scopus and the acm digital library, the most successful searches included the following:
■■ (facet* or cluster*) and (usability or user stud*)
■■ facet* and usability
in lista, the most successful searches included combining product names such as "aquabrowser" with "usability." the search "catalog and usability" was also used. the author also searched google and the next generation catalogs for libraries (ngc4lib) electronic discussion list in an attempt to find unpublished studies. search terms initially included the concept of "clustering"; however, this was quickly shown to be a clearly defined, separate topic. according to hearst, "clustering refers to the grouping of items according to some measure of similarity . . . typically computed using associations and commonalities among features where features are typically words and phrases."8 using library catalog keywords to generate word clouds would be an example of clustering, as opposed to using subject headings to group items. clustering has some advantages according to hearst. it is fully automated, it is easily applied to any text collection, it can reveal unexpected or new trends, and it can clarify or sharpen vague queries. disadvantages to clustering include possible imperfections in the clustering algorithm, similar items not always being grouped into one cluster, a lack of predictability, conflating many dimensions, difficulty labeling groups, and counterintuitive subhierarchies.9 in user studies comparing clustering with facets, pratt, hearst, and fagan showed that users find clustering difficult to interpret and prefer a predictable organization of category hierarchies.10
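the practical difference between the two approaches can be sketched in a few lines: facet values are read straight from assigned metadata and counted, whereas clustering must compute similarity between records. the records and field names below are invented for illustration.

# minimal sketch: deriving facet counts from assigned metadata fields.
# records and field names are invented; a real system would read them
# from marc records or from a search index.
from collections import Counter

results = [
    {"format": "book", "subjects": ["communication", "families"]},
    {"format": "book", "subjects": ["communication"]},
    {"format": "video", "subjects": ["families"]},
]

def facet_counts(results, field):
    # each record contributes its assigned values; no similarity computation
    # is involved, which is why facets are predictable where clusters are not.
    counts = Counter()
    for record in results:
        values = record[field]
        counts.update(values if isinstance(values, list) else [values])
    return counts.most_common()

print(facet_counts(results, "format"))    # [('book', 2), ('video', 1)]
print(facet_counts(results, "subjects"))  # [('communication', 2), ('families', 2)]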
■■ results
the author grouped the literature into two categories: user studies of faceted browsing and user studies of library catalog interfaces that include faceted browsing as a feature. generally speaking, the information science literature consisted of empirical studies of interfaces created by the researchers. in some cases, the researchers' intent was to create and refine an interface intended for actual use; in others, the researchers created the interface only for the purposes of studying a specific aspect of user behavior. in the library literature, the studies found were generally qualitative usability studies of specific library catalog interface products. libraries had either implemented a new product, or they were thinking about doing so and performed a user study to inform their decision.
results: empirical studies of faceted browsing
the following summaries present selected empirical research studies that had significant findings related to faceted browsing or interesting methods for such studies. it is not an exhaustive list. pratt, hearst, and fagan questioned whether faceted results were better than clustering or relevancy-ranked results.11 they studied fifteen breast-cancer patients and families. every subject used three tools: a faceted interface, a tool that clustered the search results, and a tool that ranked the search results according to relevance criteria. the subjects were given three simple queries related to breast cancer (e.g., "what are the ways to prevent breast cancer?"), asked to list answers to these before beginning, and to answer the same queries after using all the tools. in this study, subjects completed two timed tasks. first, subjects found as many answers as possible to the question in four minutes. second, the researchers measured the time subjects took to find answers to two specific questions (e.g., "can diet be used in the prevention of breast cancer?") that related to the original, general query. for the first task, when the subjects used the faceted interface, they found more answers than they did with the other two tools. the mean number of answers found using the faceted interface was 7.80, for the cluster tool it was 4.53, and for the ranking tool it was 5.60. this difference was significant (p < 0.05).12 for the second task, the researchers found no significant difference between the tools when comparing time on task. the researchers gave the subjects a user-satisfaction questionnaire at the end of the study. on thirteen of the fourteen quantitative questions, satisfaction scores for the faceted interface were much higher than they were for either the ranking tool or the cluster tool. this difference was statistically significant (p < 0.05). all fifteen users also affirmed that the faceted interface made sense, was helpful, was useful, and had clear labels, and said they would use the faceted interface again for another search.
yee et al. studied the use of faceted metadata for image searching and browsing, using an interface they developed called flamenco.13 they collected data from thirty-two participants who were regular users of the internet, searching for information either every day or a few times a week. their subjects performed four tasks (two structured and two unstructured) on each of two interfaces. an example of an unstructured task from their study was "search for images of interest." an example of a structured task was to gather materials for an art history essay on a topic given by the researchers and to complete four related subtasks. the researchers designed the structured task so they knew exactly how many relevant results were in the system. they also gave a satisfaction survey. more participants were able to retrieve all relevant results with the faceted interface than with the baseline interface. during the structured tasks, participants received empty results with the baseline interface more than three times as often as with the faceted interface.14 the researchers found that participants constructed queries from multiple facets in the unstructured tasks 19 percent of the time and in the structured tasks 45 percent of the time.15 when given a post-test survey, participants identified the faceted interface as easier to use, more flexible, interesting, enjoyable, simple, and easy to browse. they also rated it as slightly more "overwhelming." when asked to choose between the two, twenty-nine participants chose the faceted interface, compared with two who chose the baseline (n = 31). thirty-one of the thirty-two participants said the faceted interface helped them learn more, and twenty-eight of them said it would be more useful for their usual tasks.16 the researchers concluded that even though their faceted interface was much slower than the other, it was strongly preferred by most study participants: "these results indicate that a category-based approach is a successful way to provide access to image collections."17
in a related usability study on the flamenco interface, english et al. compared two image browsing interfaces in a nineteen-participant study.18 after an initial search, the "matrix view" interface showed a left column with facets, with the images in the result set placed in the main area of the screen. from this intermediary screen, the user could select multiple terms from facets in any order and have the items grouped under any facet. the "singletree" interface listed subcategories of the currently selected term at the top, with query previews underneath. the user could then only drill down to subcategories of the current category, and could not select terms from more than one facet. the researchers found that a majority of participants preferred the "power" and "flexibility" of matrix to the simplicity of singletree. they found it easier to refine and expand searches, shift between searches, and troubleshoot research problems. they did prefer singletree for locating a specific image, but matrix was preferred for browsing and exploring. participants started over only 0.2 percent of the time for the matrix compared to 4.5 percent for singletree.19 yet the faceted interface, matrix, was not "better" at everything. for specific image searching, participants found the correct image only 22.0 percent of the time in matrix compared to 66.0 percent in singletree.20 also, in matrix, some participants drilled down in the wrong hierarchy with wrong assumptions. one interesting finding was that in both interfaces, more participants chose to begin by browsing (12.7 percent) than by searching (5.0 percent).21
uddin and janecek asked nineteen users (staff and students at the asian institute of technology) to use a website search engine with both a traditional results list and a faceted results list.22 tasks were as follows: (1) look for scholarship information for a masters program, (2) look for staff recruitment information, and (3) look for research and associated faculty member information within your interested area.23 they found that users were faster when using the faceted system, significantly so for two of the three tasks. success in finding relevant results was higher with the faceted system. in the post-study questionnaire, participants rated the faceted system more highly, including significantly higher ratings for flexibility, interest, understanding of information content, and search-result relevancy. participants rated the most useful features to be the capability to switch from one facet to another, preview the result set, combine facets, and navigate via breadcrumbs.
capra et al. compared three interfaces in use by the bureau of labor statistics website, using a between-subjects study with twenty-eight people and a within-subjects study with twelve people.24 each set of participants performed three kinds of searches: simple lookup, complex lookup, and exploratory. the researchers used an interesting strategy to help control the variables in their study: "because the bls website is a highly specialized corpus devoted to economic data in the united states organized across very specific time periods (e.g., monthly releases of price or employment data), we decided to include the us as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in our study. thus, the simple lookup tasks were constructed around a single economic facet but also included the spatial and temporal facets to provide context for the searchers. the complex lookup tasks involve additional facets including genre (e.g. press release) and/or region."25 capra et al. found that users preferred the familiarity afforded by the traditional website interface (hyperlinks + keyword search) but listed the facets on the two experimental interfaces as their best features. the researchers concluded, "if there is a predominant model of the information space, a well designed hierarchical organization might be preferred."26
zhang and marchionini analyzed results from fifteen undergraduate and graduate students in a usability study of an interface that used facets to categorize results (relation browser++).27 there were three types of tasks:
■■ type 1: simple look-up tasks (three tasks such as "check if the movie titled the matrix is in the library movie collection").
■■ type 2: data exploration and analysis tasks (six tasks that require users to understand and make sense of the information collection: "in which decade did steven spielberg direct the most movies?").
■■ type 3: one free exploration task ("find five favorite videos without any time constraints").
the tasks assigned for the two interfaces were different but comparable. for type 2 tasks, zhang and marchionini found that performance differences between the two interfaces were all statistically significant at the .05 level.28 no participants got wrong answers for any but one of the tasks using the faceted interface. with regard to satisfaction, on the exploratory tasks the researchers found statistically significant differences favoring the faceted interface on all three of the satisfaction questions. participants found the faceted interface not as aesthetically appealing nor as intuitive to use as the basic interface. two participants were confused by the constant changing and updating of the faceted interface.
the above studies are examples of empirical investigations of experimental interfaces. hearst recently concluded that facets are a "proven technique for supporting exploration and discovery" and summarized areas for further research in this area, such as applying facets to large "subject-oriented category systems," facets on mobile interfaces, adding smart features like "autocomplete" to facets, allowing keyword search terms to affect order of facets, and visualizations of facets.29 in the following section, user studies of next-generation library catalog interfaces will be presented.
results: library literature
understandably, most studies by practicing librarians focus on products their libraries are considering for eventual use. these studies all use real library catalog records, usually the entire catalog's database. in most cases, these studies were not focused on investigating faceted browsing per se, but on the usability of the overall interface. in general, these studies used fewer participants than the information science studies above, followed less rigorous methods, and were not subjected to statistical tests. nevertheless, they provide many insights into the user experience with the extremely complex datasets underneath next-generation library catalog interfaces that feature faceted browsing. in this review article, only results specifically relating to faceted browsing will be presented.
sadeh described a series of usability studies performed at the university of minnesota (um), a primo development partner.30 primo is the next-generation library catalog product sold by ex libris. the author also received additional information from the usability services lab at um via e-mail. three studies were conducted in august 2006, january 2007, and october 2007. eight users from various disciplines participated in each of the first two studies: the first study comprised one faculty member, five graduate students, and two undergraduate students; the second comprised two faculty members, four graduate students, and two undergraduate students. the third study did not report results related to faceted browsing and is not discussed here. the first study had seven scenarios; the second study had nine. the scenarios were complex: for example, one scenario began, "you want to borrow shakespeare's play, the tempest, from the library," but contained the following subtasks as well:
1. find the tempest.
2. find multiple editions of this item.
3. find a recent version.
4. see if at least one of the editions is available in the library.
5. what is the call number of the book?
6. you'd like to print the details of this edition of the book so you can refer to it later.
participants found the interface friendly, easy to use, and easy to learn. all the participants reported that faceted browsing was useful as a means of narrowing down the result lists, and they considered this tool one of the differentiating features between primo and their library opac or other interfaces. facets were clear, intuitive, and useful to all participants, including opening the "more" section.31 one specific result from the tests was that "online resources" and "available" limiters were moved from a separate location to the right with all other facets.32
in a study of aquabrowser by olson, twelve subjects—all graduate students in the humanities—participated in a comparative test in which they looked for additional sources for their dissertation.33 aquabrowser was created by medialab but is distributed by serials solutions in north america. this study also had three pilot subjects. no relevance judgments were made by the researchers. nine of the twelve subjects found relevant materials by using aquabrowser that they had not found before.34 olson's subjects understood facets as a refinement tool (narrowing) and had a clear idea of which facets were useful and not useful for them. they gave overwhelmingly positive comments. only two felt the faceted interface was not an improvement. some participants wanted to limit to multiple languages or dates, and a few were confused about the location of facets in multiple places, for example, "music" under both format and topic.
a team at yale university, led by bauer, recently conducted two tests on pilot vufind installations: a subject-based presentation of e-books for the cushing/whitney medical library and a pilot test of vufind using undergraduate students with a sample of 400,000 records from the library system.35 vufind is open-source software developed at villanova university (http://vufind.org). the team drew test questions from user search logs in their current library system. some questions targeted specific problems, such as incomplete spellings and incomplete title information. bauer notes that some problems uncovered in the study may relate to the peculiarities of the yale implementation. the medical library study contained eight participants—a mix of medical and nursing students. facets, reported bauer, "worked well in several instances, although some participants did not think they were noticeable on the right side of the page."36 the prompt for the faceted task in this study came after the user had done a search: "what if you wanted to look at a particular subset, say 'xxx' (determine by looking at the facets)."37 half of the participants used facets, half used "search within" to narrow the topic by adding keywords. sixty-two percent of the participants were successful at this task. the undergraduate study asked five participants faced with a results list, "what would you do now if you only wanted to see material written by john adams?"38 on this task, only one of the five was successful, even though the author's name was on the screen. bauer noted that in general, "the use of the topic facet to narrow the search was not understood by most participants. . . . even when participants tried to use topic facets the length of the list and extraneous topics rendered them less than useful."39 the five undergraduates were also asked, "could you find books in this set of results that are about health and illness in the united states population, or control of communicable diseases during the era of the depression?"40 again, only one of the five was successful. bauer notes that "the overly broad search results made this difficult for participants. again, topic facets were difficult to navigate and not particularly useful to this search."41 bauer's team noted that when the search was configured to return more hits, "topic facets become a confusingly large set of unrelated items. these imprecise search results, combined with poor topic facet sets, seemed to result in confusion for test participants."42 participants were not aware that topics represented subsets, although learning occurred because the "narrow" header was helpful to some participants.43 other results found by bauer's team were that participants were intrigued by facets, navigation tools are needed so that patrons may reorder large sets of topic facets, format and era facets were useful to participants, and call-number facets were not used by anyone.
antelman, pace, and lynema studied north carolina state university's (ncsu) next-generation library catalog, which is driven by software from endeca.44 their study used ten undergraduate students in a between-subjects design where five used the endeca catalog and five used the library's traditional catalog. the researchers noted that their participants may have been experienced with the library's old catalog, as log data shows most ncsu users enter one or two terms, which was not true of study participants. the researchers measured task success, duration, and difficulty, but did not measure user satisfaction. their study consisted of four known-item tasks and six topic-searching tasks. the topic-searching tasks were geared toward the use of facets, for example, "can you show me how would you find the most recently published book about nuclear energy policy in the united states?"45 all five participants using endeca understood the idea of facets, and three used them. students tried to limit their searches at the outset rather than search and then refine results. an interesting finding was that use of the facets did not directly follow the order in which facets were listed. the most heavily used facet was library of congress classification (lcc), followed closely by topic, and then library, format, author, and genre.46 results showed a significantly shorter average task duration for endeca catalog users for most tasks.47 the researchers noted that none of the students understood that the lcc facet represented call-number ranges, but all of the students understood that these facets "could be used to learn about a topic from different aspects—science, medicine, education."48 the authors could find no published studies relating to the use of facets in some next-generation library catalogs, including encore and worldcat local. although the university of washington did publish results of a worldcat local usability study in a recent issue of library technology reports, results from the second round of testing, which included an investigation of facets, were not yet ready.49
■■ discussion
summary of empirical evidence related to faceted browsing
empirical studies in the information science literature support many positive findings related to faceted browsing and build a solid case for including facets in search interfaces:
■■ facets are useful for creating navigation structures.50
■■ faceted categorization greatly facilitates efficient retrieval in database searching.51
■■ facets help avoid dead ends.52
■■ users are faster when using a faceted system.53
■■ success in finding relevant results is higher with a faceted system.54
■■ users find more results with a faceted system.55
■■ users also seem to like facets, although they do not always immediately have a positive reaction.
■■ users prefer search results organized into predictable, multidimensional hierarchies.56
■■ participants' satisfaction is higher with a faceted system.57
■■ users are more confident with a faceted system.58
■■ users may prefer the familiarity afforded by a traditional website interface (hyperlinks + keyword search).59
■■ initial reactions to the faceted interface may be cautious, seeing it as different or unfamiliar.60
users interact with specific characteristics of faceted interfaces, and they go beyond just one click with facets when it is permitted. english et al. found that 7 percent of their participants expanded facets by removing a term, and that facets were used more than "keyword search within": 27.6 percent versus 9 percent.61 yee et al. found that participants construct queries from multiple facets 19 percent of the time in unstructured tasks; in structured tasks they do so 45 percent of the time.62 the above studies did not use library catalogs; in most cases they used an experimental interface with record sets that were much smaller and less complicated than in a complete library collection. domains included websites, information from one website, image collections, video collections, and a journal article collection.
summary of practical user studies related to faceted browsing
this review also included studies from practicing librarians at live library implementations. these studies generally had smaller numbers of users, were more likely to focus on the entire interface rather than a few features, and chose more widely divergent methods. studies were usually linked to a specific product, and results varied widely between systems and studies. for this reason it is difficult to assemble a bulleted summary as with the previous section. the variety of results from these studies indicates that when faceted browsing is applied to a real-life situation, implementation details can greatly affect user performance and user preference. some, like la barre, are skeptical about whether facets are appropriate for library information. descriptions of library materials, says la barre, include analyses of intellectual content that go beyond the descriptive terms assigned to commercial items such as a laptop: "now is the time to question the assumptions that are embedded in these commercial systems that were primarily designed to provide access to concrete items through descriptions in order to enhance profit."63 it is clear that an evaluation of commercial interfaces or experimental interfaces does not substitute for an opac evaluation. yet it is a challenge for libraries to find expertise and resources to conduct user studies. the systems they want to test are large and complex. collaborating with other libraries has its own challenges: an evaluation of one product's faceted system for a library catalog does not substitute for another, the size and scope of local collections may greatly affect results, and cataloging practices and metadata will affect results. still, it is important for practicing librarians to determine if new features such as facets truly improve the user's experience.
methodological best practices
after reading numerous empirical research studies (some of which critique their own methods) and library case studies, some suggestions for designing better studies of facets in library catalogs emerged.
designing the study
■■ consider reusing protocols from previous studies. this provides not only a tested method but also a possible point of comparison.
■■ define clear goals for each study and focus on specific research questions. it's tempting to just throw the user into the interface and see what happens, but this makes it difficult, if not impossible, to analyze the results in a useful way. for example, one of zhang and marchionini's hypotheses specifically describes what rich interaction would look like: "typing in keywords and clicking visual bars to filter results would be used frequently and interchangeably by the users to finish complex search tasks, especially when large numbers of results are returned."64
■■ develop the study for one type of user. olson's focus on graduate students in the dissertation process allowed the researchers to control for variables such as interest in and knowledge about the subject.
■■ pilot test the study with a student worker or colleague to iron out potential wrinkles.
■■ let users explore the system for a short time and possibly complete one highly structured task to help the user become used to the test environment, interface, and facilitator.65 unless you are truly interested in the very first experience users have with a system, the first use of a system is an artificial case.
designing tasks
■■ make sure user performance on each task is measurable. will you measure the time spent on a task? if "success" is important, define what that would look like. for example, english et al. defined success for one of their tasks as when "the participant indicated (within the allotted time) that he/she had reached an appropriate set of images/specific image in the collection."66
■■ establish benchmarks for comparison. one can test for significant differences between interfaces, one can test for differences between research subjects and an expert user, and one can simply measure against expectations or against previous iterations of the same study. for example, "75 percent of users completed the task within five minutes." zhang and marchionini measured error rates, another possible benchmark.67
■■ consider looking at your existing opac logs for zero-results searches or other issues that might inspire interesting questions.
■■ target tasks to avoid distracters. for example, if your catalog has a glut of government documents, consider running the test with a limit set to exclude them unless you are specifically interested in their impact. for example, capra et al. decided to include the united states as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in their study.68
■■ for some tasks, give the subjects simple queries (e.g., "what are the ways to prevent breast cancer?") as opposed to asking the subjects to come up with their own topic. this can help control for the potential challenges of formulating one's own research question on the spot. as librarians know, formulating a good research question is its own challenge.
■■ if you are using any timed tasks, consider how the nature of your tasks could affect the result. for example, pratt, hearst, and fagan noted that the time that it took subjects to read and understand abstracts most heavily influenced the time for them to find an answer.69 english et al. found that the system's processing time influenced their results.70
■■ consider the implications of your local implementation carefully when designing your study. at yale, the team chose to point their vufind instance at just 400,000 of their records, drew questions from problems users were having (as shown in log files), and targeted questions to these problems.71
who to study?
■■ try to study a larger set of users. it is better to create a short test with many users than a long test with a few users. nielsen suggests that twenty users is sufficient.72 consider collaborating with another library if necessary.
■■ if you test a small number, such as the typical four to eight users for a usability test, be sure you emphasize that your results are not generalizable.
■■ use subjects who are already interested in the subject domain: for example, pratt, hearst, and fagan used breast cancer patients,73 and olson used graduate students currently writing their dissertations.74
■■ consider focusing on advanced or scholarly users. la barre suggests that undergraduates may be overstudied.75
■■ for comparative studies, consider having both between-subjects and within-subjects designs (a sketch of the corresponding statistical tests follows these suggestions).76
■❏ a between-subjects design involves creating two groups of participants, each of which tests a different system.
■❏ a within-subjects design has one group of participants test both systems.
it is hoped that if libraries use the suggestions above when designing future experiments, results across studies will be more comparable and useful.
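to make the between-subjects/within-subjects distinction concrete, the sketch below runs the corresponding statistical comparisons on invented task times. it assumes scipy is available, and a real study would verify the tests' assumptions (normality, variance) before trusting the p-values.

# minimal sketch: testing for a significant difference in task times
# between a faceted and a traditional interface. all data are invented.
from scipy import stats

# between-subjects: two independent groups, one per interface.
faceted_group = [62, 71, 58, 65, 70, 60, 66, 59]       # seconds on task
traditional_group = [78, 85, 74, 90, 80, 76, 88, 82]
t, p = stats.ttest_ind(faceted_group, traditional_group)
print(f"between-subjects: t = {t:.2f}, p = {p:.4f}")

# within-subjects: the same participants use both interfaces,
# so a paired test is appropriate.
faceted_times = [62, 71, 58, 65, 70, 60]
traditional_times = [75, 80, 66, 79, 77, 71]           # same users, other interface
t, p = stats.ttest_rel(faceted_times, traditional_times)
print(f"within-subjects:  t = {t:.2f}, p = {p:.4f}")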
designing user studies of faceted browsing
after examining both empirical research studies and case studies by practicing librarians, a key difference seems to be the specificity of research questions and designing tasks and measurements to test specific hypotheses. while describing a full user-study protocol for investigating faceted browsing in a library catalog is beyond the scope of this article, reviewing the literature and the study methods it describes provided insights into how hypotheses, tasks, and measurements could be written to provide more reliable and comparable evidence related to faceted browsing in library catalog systems. for example, one research question could surround the format facet: "compared with our current interface, does our new faceted interface improve the user's ability to find different formats of materials?" hypotheses could include the following:
1. users will be more accurate when identifying the formats of items from their result set when using the faceted interface than when using the traditional interface.
2. users will be able to identify formats of items more quickly with the faceted interface than with the traditional interface.
looking at these hypotheses, here is a prompt and some example tasks the participants would be asked to perform: "we will be asking you to find a variety of formats of materials. when we say formats of materials, we mean books, journal articles, videos, etc."
■■ task 1: please use interface a to search on "interpersonal communication." look at your results set. please list as many different formats of material as you can.
■■ task 2: how many items of each format are there?
■■ task 3: please use interface b to search on "family communication." what formats of materials do you see in your results set?
■■ task 4: how many items of each format are there?
we would choose the topics "interpersonal communication" and "family communication" because our local catalog has many material types for these topics and because these topics would be understood by most of our students. we would choose different topics to help minimize learning effects. to further address this, we would plan to have half our users start first with the traditional interface and half start first with the faceted interface. this way we can test for differences resulting from learning. the above tasks would allow us to measure several pieces of evidence to support or reject our hypotheses. for tasks 1 and 3, we would measure the number of formats correctly identified by users compared with the number found by an expert searcher. for tasks 2 and 4, we would compare the number of items correctly identified with the total items found in each category by an expert searcher. we could also time the user on each task to determine which interface helped them work more quickly. to measure user satisfaction, we would ask participants to complete the system usability scale (sus) after each interface and, at the very end of the study, complete a questionnaire comparing the two interfaces. even just selecting the format facet, we would have plenty to investigate. other hypotheses and tasks could be developed for other facet types, such as time period or publication date, or facets related to the responsible parties, such as author or director:
hypothesis: users can find more materials written in a certain time period using the faceted interface.
task: find ten items of any type (books, journals, movies) written in the 1950s that you think would have information about television advertising.
hypothesis: users can find movies directed by a specific person more quickly using the faceted interface.
task: in the next two minutes, find as many movies as you can that were directed by orson welles.
for the first task above, an expert searcher could complete the same task, and their time could be used as a point of comparison. for the second, the total number of movies in the library catalog that were directed by welles is an objective quantity. in both cases, one could compare the user's performance on the two interfaces.
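the measurements proposed for tasks 1 and 3 reduce to comparing a participant's answers against an expert searcher's baseline; the sketch below, with invented data, shows one way such a score could be computed.

# minimal sketch: scoring a format-identification task against an
# expert searcher's baseline. all data are invented for illustration.
def format_accuracy(user_formats, expert_formats):
    # fraction of the expert's formats the participant also identified,
    # plus anything the participant reported that the expert did not.
    user, expert = set(user_formats), set(expert_formats)
    recall = len(user & expert) / len(expert)
    spurious = sorted(user - expert)
    return recall, spurious

expert = ["book", "journal article", "video", "audio recording"]
participant = ["book", "video", "website"]

recall, spurious = format_accuracy(participant, expert)
print(f"identified {recall:.0%} of expert-found formats; unexpected: {spurious}")
# -> identified 50% of expert-found formats; unexpected: ['website']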
■■ conclusion
reviewing user studies about faceted browsing revealed empirical evidence that faceted browsing improves user performance. yet this evidence does not necessarily point directly to user success in faceted library catalogs, which have much more complex databases than those used in these experimental studies. previous case-study investigations of library catalog interfaces with facets have proven inconclusive. by choosing more specific research questions, tasks, and measurements for user studies, libraries may be able to design more objective studies and compare results more effectively.
references
1. marti a. hearst, "clustering versus faceted categories for information exploration," communications of the acm 49, no. 4 (2006): 60.
2. kathryn la barre, "faceted navigation and browsing features in new opacs: robust support for scholarly information seeking?" knowledge organization 34, no. 2 (2007): 82.
3. vanda broughton, "the need for faceted classification as the basis of all methods of information retrieval," aslib proceedings 58, no. 1/2 (2006): 49–71.
4. s. r. ranganathan, colon classification basic classification, 6th ed. (new york: asia, 1960).
5. deborah l. mcguinness, "ontologies come of age," in spinning the semantic web: bringing the world wide web to its full potential, ed. dieter fensel et al. (cambridge, mass.: mit pr., 2003): 179–84.
6. hearst, "clustering versus faceted categories," 60.
7. ibid., 61.
8. ibid., 59.
9. ibid., 60.
10. wanda pratt, marti a. hearst, and lawrence m. fagan, "a knowledge-based approach to organizing retrieved documents," proceedings of the sixteenth national conference on artificial intelligence, july 18–22, 1999, orlando, florida (menlo park, calif.: aaai pr., 1999): 80–85.
11. ibid.
12. ibid., 5.
13. ka-ping yee et al., "faceted metadata for image search and browsing," 2003, http://flamenco.berkeley.edu/papers/flamenco-chi03.pdf (accessed oct. 6, 2008).
14. ibid., 6.
15. ibid., 7.
16. ibid.
17. ibid., 8.
18. jennifer english et al., "flexible search and navigation using faceted metadata," 2002, http://flamenco.berkeley.edu/papers/flamenco02.pdf (accessed apr. 22, 2010).
19. ibid., 7.
20. ibid., 6.
21. ibid., 7.
22. mohammed nasir uddin and paul janecek, "performance and usability testing of multidimensional taxonomy in web site search and navigation," performance measurement and metrics 8, no. 1 (2007): 18–33.
23. ibid., 25.
24. robert capra et al., "effects of structure and interaction style on distinct search tasks," proceedings of the 7th acm-ieee-cs joint conference on digital libraries (new york: acm, 2007): 442–51.
25. ibid., 446.
26. ibid., 450.
27. junliang zhang and gary marchionini, evaluation and evolution of a browse and search interface: relation browser++ (atlanta, ga.: digital government society of north america, 2005): 179–88.
28. ibid., 183.
29. marti a. hearst, "uis for faceted navigation: recent advances and remaining open problems," 2008, http://people.ischool.berkeley.edu/~hearst/papers/hcir08.pdf (accessed apr. 27, 2010).
30. tamar sadeh, "user experience in the library: a case study," new library world 109, no. 1/2 (jan. 2008): 7–24.
31. ibid., 22.
32. jerilyn veldof, e-mail from university of minnesota usability services lab, 2008.
33. tod a. olson, "utility of a faceted catalog for scholarly research," library hi tech 25, no. 4 (2007): 550–61.
34. ibid., 555.
35. kathleen bauer, "yale university library vufind test—undergraduates," may 20, 2008, http://www.library.yale.edu/usability/studies/summary_undergraduate.doc (accessed apr. 27, 2010); kathleen bauer and alice peterson-hart, "usability test of vufind as a subject-based display of ebooks," aug. 21, 2008, http://www.library.yale.edu/usability/studies/summary_medical.doc (accessed apr. 27, 2010).
36. bauer and peterson-hart, "usability test of vufind as a subject-based display of ebooks," 1.
37. ibid., 2.
38. ibid., 3.
39. ibid.
40. ibid., 4.
41. ibid.
42. ibid., 5.
43. ibid., 8.
44. kristin antelman, andrew k. pace, and emily lynema, "toward a twenty-first century library catalog," information technology & libraries 25, no. 3 (2006): 128–39.
45. ibid., 139.
46. ibid., 133.
47. ibid., 135.
48. ibid., 136.
49. jennifer l. ward, steve shadle, and pam mofield, "user experience, feedback, and testing," library technology reports 44, no. 6 (aug. 2008): 22.
50. english et al., "flexible search and navigation."
51. peter ingwersen and irene wormell, "ranganathan in the perspective of advanced information retrieval," libri 42 (1992): 184–201; winfried gödert, "facet classification in online retrieval," international classification 18, no. 2 (1991): 98–109; w. gödert, "klassifikationssysteme und online-katalog [classification systems and the online catalogue]," zeitschrift für bibliothekswesen und bibliographie 34, no. 3 (1987): 185–95.
52. yee et al., "faceted metadata for image search and browsing"; english et al., "flexible search and navigation."
53. uddin and janecek, "performance and usability testing"; zhang and marchionini, evaluation and evolution; hao chen and susan dumais, bringing order to the web: automatically categorizing search results (new york: acm, 2000): 145–52.
54. uddin and janecek, "performance and usability testing."
55. ibid.; pratt, hearst, and fagan, "a knowledge-based approach"; hsinchun chen et al., "internet browsing and searching: user evaluations of category map and concept space techniques," journal of the american society for information science 49, no. 7 (1998): 582–603.
56. broughton, "the need for faceted classification," 49–71; pratt, hearst, and fagan, "a knowledge-based approach," 80–85; chen et al., "internet browsing and searching," 582–603; yee et al., "faceted metadata for image search and browsing"; english et al., "flexible search and navigation."
57. uddin and janecek, "performance and usability testing"; zhang and marchionini, evaluation and evolution; hideo joho and joemon m. jose, slicing and dicing the information space using local contexts (new york: acm, 2006): 66–74; yee et al., "faceted metadata for image search and browsing."
58. yee et al., "faceted metadata for image search and browsing"; chen and dumais, bringing order to the web.
59. capra et al., "effects of structure and interaction style."
60. yee et al., "faceted metadata for image search and browsing"; capra et al., "effects of structure and interaction style"; zhang and marchionini, evaluation and evolution.
61. english et al., "flexible search and navigation," 7.
62. yee et al., "faceted metadata for image search and browsing," 7.
63. la barre, "faceted navigation and browsing," 85.
64. zhang and marchionini, evaluation and evolution, 183.
65. english et al., "flexible search and navigation."
66. ibid., 6.
67. zhang and marchionini, evaluation and evolution.
68. capra et al., "effects of structure and interaction style."
69. pratt, hearst, and fagan, "a knowledge-based approach."
70. english et al., "flexible search and navigation."
71. bauer, "yale university library vufind test—undergraduates."
72. jakob nielsen, "quantitative studies: how many users to test?" online posting, alertbox, june 26, 2006, http://www.useit.com/alertbox/quantitative_testing.html (accessed apr. 7, 2010).
73. pratt, hearst, and fagan, "a knowledge-based approach."
74. tod a. olson, "utility of a faceted catalog for scholarly research," library hi tech 25, no. 4 (2007): 550–61.
75. la barre, "faceted navigation and browsing."
76. capra et al., "effects of structure and interaction style."

editor's comments
bob gerrity
information technology and libraries | september 2012
g'day, mates, and welcome to our third open-access issue. ital takes on an additional international dimension with this issue, as your faithful editor has taken up residence down under, in sunny queensland, australia. the recent ala annual meeting in anaheim marked some changes to the ital editorial board that i'd like to highlight. cynthia porter and judith carter are ending their tenure with ital after many years of service. cynthia is featured in this month's editorial board thoughts column, offering her perspective on library technology past and present. judith carter ends a long run with ital as managing editor, and i thank her for her years of dedicated service. ed tallent, director of levin library at curry college, is the incoming managing editor.
we also welcome two new members of the editorial board: brad eden, the dean of library services and professor of library science at valparaiso university, and jerome yavarkovsky, former university librarian at boston college and the 2004 recipient of ala's hugh c. atkinson award. jerome currently co-chairs the library technology working group at the mediagrid immersive education initiative. we cover a broad range of topics in this issue. ian chan, pearl ly, and yvonne meulemans describe the implementation of the open-source instant messaging (im) network openfire at california state university san marcos, in support of the integration of chat reference and internal library communications. richard gartner explores the use of the metadata encoding and transmission standard (mets) as an alternative to the fedora content model (fcm) for an "intermediary" digital-library schema. emily morton-owens and karen hanson present an innovative approach to creating a management dashboard of key library statistics. kate pittsley and sara memmott describe navigational improvements made to libguides at eastern michigan university. bojana surla reports on the development of a platform-independent, java-based marc editor. yongming wang and trevor dawes delve into the need for next-generation integrated library systems and early initiatives in that space. melanie schlosser and brian stamper begin to explore the effects of reposting library digital collections on flickr. in addition to the compelling new content in this issue of ital, we have compelling old content from the print archive of ital and its predecessor, journal of library automation (jola), that will soon be available online, thanks in large part to the work of andy boze and colleagues at the university of notre dame. scans of all of the back issues have now been deposited onto the server that currently hosts ital, and will be processed and published online over the coming months.
bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, st. lucia, queensland, australia.

editorial and technological workflow tools to promote website quality
emily g. morton-owens
emily g. morton-owens (emily.morton-owens@med.nyu.edu) is web services librarian, new york university health sciences libraries, new york.
editor's note: this paper is adapted from a presentation given at the 2010 lita forum.
library websites are an increasingly visible representation of the library as an institution, which makes website quality an important way to communicate competence and trustworthiness to users. a website editorial workflow is one way to enforce a process and ensure quality. in a workflow, users receive roles, like author or editor, and content travels through various stages in which grammar, spelling, tone, and format are checked. one library used a workflow system to involve librarians in the creation of content. this system, implemented in drupal, an open-source content management system, solved problems of coordination, quality, and comprehensiveness that existed on the library's earlier, static website.
today, libraries can treat their websites as a significant point of user contact and as a way of compensating for decreases in traditional measures of library use, like gate counts and circulation.1 websites offer more than just a gateway to journals; librarians also can consider instructional or explanatory webpages as a type of public service interaction.2 as users flock to the web to access electronic resources and services, a library's website becomes an increasingly prominent representation of the library. at the new york university health sciences libraries (nyuhsl), for example, statistics for the 2009–10 academic year showed 580,980 in-person visits for all five locations combined. by comparison, the website received 986,922 visits. in other words, the libraries received 70 percent more website visits than in-person visits. many libraries conduct usability testing to determine whether their websites meet the functional needs of their users. a concern related to usability is quality: users form an impression of the library partly based on how it presents itself via the website. as several studies outside the library arena have shown, users' experience of a website leads them to attribute characteristics of competence and trustworthiness to the sponsoring organization.
tseng and fogg, discussing non-web computer systems, present "surface credibility" as one of the types of credibility affecting users. they suggest that "small computer errors have disproportionately large effects on perceptions of credibility."3 in another paper by fogg et al., "amateurism" is one of seven factors in a study of website credibility. the authors recommend that "organizations that care about credibility should be ever vigilant—and perhaps obsessive—to avoid small glitches in their websites. . . . even one typographical error or a single broken link is damaging."4 everard and galletta performed an experimental study with 232 university students to discover whether website flaws affected perception of site quality and trust. their three types of flaws were incompleteness, language errors (such as spelling mistakes), and poor style in terms of "ambiance and aesthetics," including readable formatting of text. they discovered that subjects' perception of flaws influenced their judgment of a site being high-quality and trustworthy. further, they found that the first perceived error had a greater negative impact than additional problems did, and they described website users as "quite critical, negative, and unforgiving."5 briggs et al. did two studies of users' likelihood of accepting advice presented on a website. of the three factors they considered—credibility, personalization, and predictability—credibility was the most influential in predicting whether users would accept or reject the advice. "it is clear," they report, "that the look and feel of a web site is paramount in first attracting the attention of a user and signaling the trustworthiness of the site. the site should be . . . free of errors and clutter."6 though none of these studies focuses on libraries or academic websites and though they use various metrics of trustworthiness, together they point to the importance of quality. text quality and functional usability should be important to library website managers. libraries ask users to entrust them to choose resources, answer questions, and provide research advice, so projecting competence and trustworthiness is essential. it is a challenge to balance the concern for quality with the desire to update the website frequently and with librarians' workloads. this paper describes a solution implemented in drupal that promotes participation while maintaining quality. the editorial system described draws on the author's prior experience working in book publishing at penguin and random house, showing how a system that ensures quality in print publishing can be adjusted to fit the needs of websites.
■■ setting
editing
most people think of editing in terms of improving the correctness of a document: fixing spelling or punctuation errors, fact-checking, and so forth. these factors are probably the most salient ones in the sense that they are most noticeable when neglected. editors, however, have several other important roles. for example, they select what will be published. in book publishing, that involves rejecting the vast majority of material that is submitted. in many professional contexts, however, it means soliciting contributions and encouraging authors. either way, the editor has a role in deciding what topics are relevant and what authors should be involved. additionally, editors are often involved in presenting their products to audiences. in book publishing, that can mean weighing in on jacket designs or soliciting blurbs from popular authors. on websites, it might mean choosing templates or fonts. editors want to make materials attractive and accessible to the right audience. together, correctness, choice, and presentation are the main concerns of an editor and together contribute to quality. each of these ideas can be considered in light of library websites. correctness means offering information that is current and free of errors, contradictions, and confusing omissions. it also means representing the organization well by having text that is well written and appropriate for the audience. writing for the web is a special skill; people reading from screens have a tendency to skim, so text should be edited to be concise and preferably organized into short chunks with "visible structure."7 there is also good guidance available about using meaningful link words, action phrases, and "layering" to limit the amount of information presented at once.8 of course, correctness also means avoiding the kind of obvious spelling and grammar mistakes that users find so detrimental. choice probably will not involve rejecting submissions to the website. instead, in a library context it could mean identifying information that should appear on the website and writing or soliciting content to answer that need. presentation may or may not have a marketing aspect. a public library's website may advertise events and emphasize community participation. as an academic medical library, nyuhsl has in some sense a captive audience, but it is still important to communicate to users that librarians understand their unique and high-level information needs and are qualified to partner with them.
they suggest that “small computer errors have disproportionately large effects on perceptions of credibility.”3 in another paper by fogg et al., “amateurism” is one of seven factors in a study of website credibility. the authors recommend that “organizations that care about credibility should be ever vigilant—and perhaps obsessive—to avoid small glitches in their websites. . . . even one typographical error or a single broken link is damaging.”4
emily g. morton-owens (emily.morton-owens@med.nyu.edu) is web services librarian, new york university health sciences libraries, new york.
happens when a page moves from one state to another. the very simple workflow in figure 1 shows two roles (author and editor) and three states (draft, approval, and published). there are two transitions with permissions attached to them. only the author can decide when he or she is done working and make the transition from draft to approval. only the editor can decide when the page is ready and make the transition from approval to published. (in these figures, dotted borders indicate states in which the content is not visible to the public.) a minimal code sketch of this state-and-role model appears below. a book publishing workflow involves perhaps a dozen steps in which the manuscript passes between the author, his or her agent, and various editorial staff. a year can pass between receiving the manuscript and publishing the book. the reason for that careful, conservative process is that it is very difficult to fix a book once thousands of copies have been printed in hardcover. by contrast, consider a newspaper: a new version appears every day and contains corrections from previous editions. a newspaper workflow is hardly going to take a full year. a website is even more flexible than a newspaper because it can be fixed or improved at any time. the kind of multistep process used for books and newspapers is effective, but not practical for websites. a website should have a workflow for editorial quality control, but it should be proportional to the format in terms of the number of steps, the length of the process, and the number of people involved.
alternate workflow models
this paper focuses on a contributor/editor model in which multiple authors create material that is vetted by a central authority: the editor. other models could be implemented with much the same tools. for example, in a peer-review system as is used for academic journals, there is a reviewer role, and an article could have states like “published,” “under review,” “conditionally accepted,” and so forth.
most noticeable when neglected. editors, however, have several other important roles. for example, they select what will be published. in book publishing, that involves rejecting the vast majority of material that is submitted. in many professional contexts, however, it means soliciting contributions and encouraging authors. either way, the editor has a role in deciding what topics are relevant and what authors should be involved. additionally, editors are often involved in presenting their products to audiences. in book publishing, that can mean weighing in on jacket designs or soliciting blurbs from popular authors. on websites, it might mean choosing templates or fonts. editors want to make materials attractive and accessible to the right audience. together, correctness, choice, and presentation are the main concerns of an editor and together contribute to quality. each of these ideas can be considered in light of library websites.
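to make the state/role/transition vocabulary concrete, here is a minimal python sketch of the figure 1 model. it is illustrative only (the class and names are hypothetical, not drupal's data model), and it adds one return transition, anticipating the "send back for revision" step the paper discusses later.

# minimal sketch of figure 1: two roles, three states, and transitions
# guarded by role-based permissions. names are illustrative only.
STATES = {"draft", "approval", "published"}

# (from_state, to_state) -> the role allowed to make that transition
TRANSITIONS = {
    ("draft", "approval"): "author",      # author decides the work is done
    ("approval", "published"): "editor",  # editor decides the page is ready
    ("approval", "draft"): "editor",      # editor sends the page back for revision
}

class Page:
    def __init__(self, title):
        self.title = title
        self.state = "draft"  # new content starts out invisible to the public

    def transition(self, to_state, role):
        allowed = TRANSITIONS.get((self.state, to_state))
        if allowed is None:
            raise ValueError(f"no transition from {self.state} to {to_state}")
        if role != allowed:
            raise PermissionError(f"only the {allowed} may make this transition")
        self.state = to_state

page = Page("hours & directions")
page.transition("approval", role="author")   # draft -> approval
page.transition("published", role="editor")  # approval -> published

the point of the sketch is that permissions hang on transitions, not on states: the same table that drives the workflow also answers "who may act now?"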
correctness means offering information that is current and free of errors, contradictions, and confusing omissions. it also means representing the organization well by having text that is well written and appropriate for the audience. writing for the web is a special skill; people reading from screens have a tendency to skim, so text should be edited to be concise and preferably organized into short chunks with “visible structure.”7 there is also good guidance available about using meaningful link words, action phrases, and “layering” to limit the amount of information presented at once.8 of course, correctness also means avoiding the kind of obvious spelling and grammar mistakes that users find so detrimental. choice probably will not involve rejecting submissions to the website. instead, in a library context it could mean identifying information that should appear on the website and writing or soliciting content to answer that need. presentation may or may not have a marketing aspect. a public library’s website may advertise events and emphasize community participation. as an academic medical library, nyuhsl has in some sense a captive audience, but it is still important to communicate to users that librarians understand their unique and highlevel information needs and are qualified to partner with them. workflow a workflow is a way to assign responsibility for achieving the goals of correctness, choice, and presentation. it breaks the process down into steps that ensure the appropriate people review the material. it also leaves a paper trail that allows participants to see the history and status of material. workflow can alleviate the coordination problems that prevent a website from exhibiting the quality it should. a workflow is composed of states, roles, and transitions. pages have states (like “draft” or “published”) and users have roles (like “contributor” or “editor”). a transition figure 1. very basic workflow editorial and technological workflow tools to promote website quality | morton-owens 93 effect was on the quality of the website, which contained mistakes and confusing information. ■■ methods nyuhsl workflow and solutions to resolve its web management issues, nyuhsl chose to work with the drupal content management system (cms). the ability to set up workflow and inventory content by date, subject, or author was a leading reason for that decision. other reasons included usability of the backend for librarians, theming options, the scripting language the cms uses (php), and drupal’s popularity with other libraries and other nyu departments.9 nyuhsl’s drupal environment has four main user roles: 1. anonymous: these are visitors to the nyuhsl site who are not logged in (i.e., library users). they have no permissions to edit or manage content. they have no editorial responsibilities. 2. library staff: this group includes all the staff content authors. their role is to notice what content library users need and to contribute it. staff have been encouraged to view website contributions as something casual—more akin to writing an e-mail than writing a journal article. 3. marketing team: this five-member group checks content that will appear on the homepage. their mandate is to make sure that the content is accurate about library services and resources and represents the library well. its members include both librarians and staff with relevant experience. 4. 
administrators: there are three site admins; they have the most permissions because they also build the site and make changes to how it works. two of the three admins have copyediting experience from prior jobs, so they are responsible for content approvals. they copyedit for spelling, grammar, and readability. admins also check for malformed html created by the wysiwyg (what you see is what you get) interface provided for authors, and they use their knowledge of other material on the site to look out for potential conflicts or add relevant links. returning to the themes of correctness, choice, and presentation, it could be said that librarian authors are responsible for choice (deciding what to post), the marketing team is responsible for choice and presentation, and the administrators are responsible for all three. an important thing to understand is that each person in a role has the same permissions, and any one of in an upvoting system like reddit (http://reddit .com), content is published by default, any user has the ability to upvote (i.e., approve) a piece of content, and the criterion for being featured on the front page is the number of approvals. in a moderation system, any user can submit content and the default behavior is for the moderator to approve anything that is not outright offensive. the moderator never edits, just chooses the state “approved” or the state “denied.” moderation is often used to manage comments. another model, not considered here, is to create separate “staging” and “production” websites. content and features are piloted on the staging site before being pushed to the live site. (nyuhsl’s workflow occurs all on the live site.) still, even in a staging/production system the workflow is implicit in choosing someone who has the permission and responsibility to push the staging site to the production site. problems at nyuhsl in 2007, the web services librarian position at nyuhsl had been open for nearly a year. librarians who needed to post material to the website approached the head of library systems or the “sysadmin.” both of them could post pages, but they did not proofread. pages that became live on the website stayed: they were never systematically checked. if a librarian or user noticed a problem with a page, it was not clear who had the correct information or was responsible for fixing it. often, pages that were found to be out-of-date would be delinked from other pages but were left on the server and thus findable via search engines or bookmarks. because only a few people had ftp access to the server, but authored little content, the usernames shown on the server were useless for determining who was responsible for a page. similarly, timestamps on the server were misleading; someone might fix one link on a page without reviewing the rest of it, so the page could have a recent timestamp but be full of outdated information. even after a new web services librarian started in 2007, problems remained. the new librarian took over sole responsibility for posting content, which made the responsibility clearer but created a bottleneck, for example, if she went on vacation. furthermore, in a library with five locations and about sixty full-time employees, it was hard for one person to do justice to all the libraries’ activities. if a page required editing, there was no way to keep track of whose turn it was to work on the document. there also was no automatic notification when a page was published. 
this made it possible for content to go astray and be forgotten. these problems added up to frustration for would-be content authors, a time drain for systems staff, and less time to create new content and sites. the most significant
them can perform an action. the five marketing team members do not vote on the content, nor do they all have to approve it; instead, any one of them who happens to be at his or her workstation when the notification arrives is sufficient to perform the marketing team duty. also, the marketing team members and administrators do not “self-approve”—no matter how good an editor someone may be, he or she is rarely good at editing her own work. nyuhsl’s workflow considers three cases:
1. most types of content are reviewed by one of the administrators before going live.
2. content types that appear on the homepage (i.e., at higher visibility) are reviewed by a member of the marketing team before being reviewed by an administrator.
3. two types of content do not go through any workflow. alerts are urgent messages that appear in red at the top of the homepage. their appearance should not be delayed, so any staff author can publish one. class sessions are specific dates, times, and locations that a class is being offered. these posts are assembled from prewritten text, so there is no way to introduce errors and no reason to route them through an approval step.
figure 2 illustrates the main steps of the three cases. the names of the states are shown with arrows indicating which role can make each transition. unlabeled arrows mean that any staff member can perform that step. figure 3 shows how, at each approval step, content can be sent back to the author (with comments) for revision. although this happens rarely, it is important to have a way to communicate with the author in a way that is traceable by the workflow. figure 4 illustrates the concept of retirement. nyuhsl needed a way to hide content from library users and search engines, but it is dangerous to allow library staff to delete content. also, old content is sometimes useful to refer to or can even be republished if the need arises. any library staff user can retire content if they recognize it as no longer relevant or appropriate. additionally, library staff can resurrect retired content by resetting it to the draft state. that is, they cannot directly publish retired content (because they do not have permission to publish), but they can put it back on the path to being published by saving it as a draft, editing, and resubmitting for approval. figure 5 shows that library staff do not really need to understand the details of workflow. for any new content, they only have two options: keep the content in the draft state or move it on to whatever next step is available. all of the other options are hidden because staff do not have permission to perform them.
figure 2. approval steps
figure 3. returning content for edits
figure 4. retirement
drupal modules
nyuhsl developers used a number of different drupal modules to achieve the desired workflow functionality. a simple system could be achieved using fewer modules; the book using drupal offers a good walkthrough of workflow, actions, and trigger.10 of course, it also would be possible to implement these ideas in another cms or in a homegrown system. this list does not describe how to configure each module because the features are constantly evolving; more information is available on the drupal website.11 the drupal modules used include:
■■ workflow
■■ actions
■■ trigger
■■ token
■■ module grants
■■ wysiwyg, imce, imce wysiwyg api bridge
■■ node expire
■■ taxonomy role
■■ ldap integration
■■ rules
■■ results
participation
figure 6 shows the number of page revisions per person from july 14, 2009, to november 4, 2010. since many pages are static and were created only once, but need to be updated regularly, a page creation and a page update count equally in this accounting, which was drawn from the node_revisions table in drupal. it gives a general sense of content-related activity. a reasonable number of staff have logged in, including all of the librarians and a number of staff in key positions (such as branch managers). the black bars represent the administrators of the website. it is clear that the workflow system, while broadening participation, has hardly diffused primary responsibility of managing the website. the web services librarian and web manager have by far the most page edits, as they both write new content and edit content written by all other users.
the status of content in the workflow can be checked by clicking on the workflow tab of each page, but it also is tracked by notification e-mails. when the content enters a state requiring an approval, each person in that approving role gets an e-mail letting them know something needs their attention. the e-mail includes a link directly to the editing page. for example, if a librarian writes a blog post and changes its state from “draft” to “ready for marketing approval,” he or she gets a confirmation e-mail that the post is in the marketing approval queue. the marketing team members each get an e-mail asking them to approve the post; only one needs to do so. once someone has performed that approval, the marketing team members receive an e-mail letting them know that no further action is required. now the content is in the “ready for approval” state and the author gets another e-mail notification. the administrators get a notification with a link to edit the post. once an administrator gives the post final approval, the author gets an e-mail indicating that the post is now live.
this may sound like a large volume of e-mail, but it does not appear to bother library staff. the subject line of every e-mail generated by the system is prefaced with “[hsl site]” for easy filtering. also, every e-mail is signed with “love, the nyuhsl website.” this started as a joke during testing but was retained because staff liked it so much. one described it as giving the site a “warm, fuzzy feeling.”
the nyuhsl website workflow system also includes reminders. each piece of content in the system has an author (authorship can be reassigned, so it is not necessarily the person who originally created the page). the author receives an e-mail every four months reminding him or her to check the content, revise it if necessary, and re-save it so that it gets a new timestamp. if the author does not do so, he or she will continue to get reminders until the task is complete. also, the site administrators can refer to a list of content that is out of date and can follow up in person if needed.
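the e-mail chain and the four-month reminder cycle described above can be summarized in a short sketch. this is hypothetical glue code: in the real site these behaviors come from configuring the workflow, actions, trigger, and node expire modules, not from custom code, and every name here is invented for illustration.

# illustrative sketch of the notification-and-reminder behavior
from datetime import datetime, timedelta

APPROVERS = {
    "ready for marketing approval": ["marketing team"],
    "ready for approval": ["administrators"],
}

def notify(recipient, subject):
    # every message is prefixed for easy filtering and signed playfully
    print(f"to {recipient}: [hsl site] {subject} -- love, the nyuhsl website")

def change_state(page, new_state):
    page["state"] = new_state
    page["saved"] = datetime.now()
    notify(page["author"], f"'{page['title']}' is now in state: {new_state}")
    for role in APPROVERS.get(new_state, []):
        # every member of the approving role is alerted; any one may act
        notify(role, f"'{page['title']}' needs your approval")

def reminders_due(pages, every=timedelta(days=120)):
    # static content whose author has not re-saved it in roughly four months
    return [p for p in pages if p["static"] and datetime.now() - p["saved"] > every]

post = {"title": "new database trial", "author": "librarian",
        "state": "draft", "saved": datetime.now(), "static": False}
change_state(post, "ready for marketing approval")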
note that reminders only apply to static content types like pages and faqs, not to blog posts or event announcements, which are not expected to have permanent relevance.
figure 5. workflow choices for library staff users
figure 7 shows the distribution of content updates once the web team members have been removed. it is clear that a small number of heroic contributors are responsible for the bulk of new content and updates, with other users logging on sporadically to address specific needs or problems.
figure 6. number of revisions by user (each user is indicated by their employee type rather than by name)
figure 7. number of revisions by user, minus web team (each user is indicated by their employee type rather than by name)
how editorial workflow addresses nyuhsl’s problems
different aspects of the nyuhsl editorial workflow address different website problems that existed before the move to a cms. together, the workflow features create a clearly defined track that marches contributed content along a path to publication while always making the history and status of that content clear.
■■ keeping track of who wrote what when: this information is collected by the core drupal software and visible on administrative pages. (drupal also can be customized to display or sort this information in more convenient ways.)
■■ preventing mistakes and inconsistencies: this requires a human editor, but drupal can be used to formalize that role, assign it to specific people, and ensure nothing gets published without being reviewed by an editor.
■■ bottlenecks: nyuhsl eliminated bottlenecks that stranded content waiting for one person to post it by creating roles with multiple members, any one of whom can advance content to the next state. there is no step in the system that can be performed by only one person.
■■ knowledge: the issue of having too much going on in the library for one person to report on was addressed by making it easier for more people to contribute. drupal encourages this through its usability (especially a wysiwyg editor), and workflow makes it safe by controlling how the contributions are posted.
■■ “lost” content: when staff contribute content, they get e-mail notifications about its status and also can check the status by clicking on the workflow tab. this eliminates the discouraging mystery of having content get lost on the way to being published.
■■ identifying “problem” content: the node expire module has been modified to send e-mail reminders about stale content; as a result, this “problem” content is usually addressed by library staff without the administrators/editors doing anything at all. the administrators also can access a page that lists all the content that has been marked as “expired” so they know with whom to follow up.
■■ outdated content: some content may be outdated and undesirable to show the public or be indexed by search engines, but be useful to librarians. it also is not safe to allow staff to delete content, as they may do so by accident. these issues are addressed by the notion of “retiring” content, which hides content by unpublishing it but does not delete it from the system.
■■ conclusions
nyuhsl differs from other libraries in its size, status as an academic medical library, level of it staffing, and other ways. some aspects of nyuhsl’s experience implementing editorial workflow will, however, likely be applicable to other libraries.
it does not necessarily make sense to assign editorial responsibility to it staff; instead, there may be someone on staff who has editorial or journalistic experience and could serve as the content approver. many universities offer short copyediting courses, and a prospective website editor could attend such a course. implementing a workflow system, especially in drupal, requires a lot of detailed configuration. developers should make sure the workflow concept is clearly mapped out in terms of states, roles, and transitions before attempting to build anything. workflow can seem complicated to users too, so developers should endeavor to hide as much as possible from nonadministrators. small mistakes in drupal settings and permissions can cause confusing failures in the workflow system. for example, a user may find him- or herself unable to advance a blog post from “draft” to “ready for approval,” or a state change from “ready for approval” to “live” may not actually cause the content to be published. it would save time in the long run to thoroughly test all the possibilities with volunteers who play each role before the site is in active use. finally, when the workflow is in place, the website’s managers may find themselves doing less writing and fewer content updates. they have a new role, though: to curate the site and support staff who use the new tools. the concept of editing is not yet consistently applied to websites unless the site represents an organization that already relies on editors (like a newspaper)—but it is gaining recognition as a best practice. if the website is the most readily available public face of an institution, it should receive editorial attention just as a brochure or fundraising letter would. workflow is one way that libraries can promote a higher level of quality and perceived competence and reliability through their website presence.
■■ future work
the workflow system sets up an environment that achieves nyuhsl’s goals, structurally speaking, but social (nontechnology) considerations prevent it from living up to its full potential. not all of the librarians contribute regularly. this is partly because they are busy, and writing web content is not one of their job requirements. another reason is that some staff are more comfortable using the system than others, a phenomenon that reinforces itself as the expert users spend more time creating content and become even more expert. a third cause is that not all librarians may perceive that they have something useful to say. reluctant contributors have no external motivation to increase their involvement. it would be helpful to formalize the role of librarians as content contributors. there is presently no librarian at nyuhsl whose job description includes writing content for the website; even the web services librarian is charged only with “coordinating, designing, and maintaining” sites.
ideally, every librarian job description would include working with users and would mention writing website content as an important forum for that. that said, it is not clear what metric could be used to judge the contributions fairly. it also is important to continue to emphasize the value of content contributions so that librarians are motivated and feel recognized. even librarians whose specialties are not outreach-oriented (e.g., systems librarians) have expert knowledge that could be shared in, say, a short article on how to set up rss feeds. workflow is part of a group of concerns being called “content strategy.” this concept, which has grown in popularity since 2008, includes editorial quality alongside issues like branding/messaging, search engine optimization, and information architecture. a content strategist would be concerned with why content is meaningful in addition to how it is managed. in her brief, useful book on the topic, kristina halvorson places web content in the context of other communication methods, like e-mail marketing, press releases, and social media.12 in her view, it is not enough to consider a website on its own; it has to be part of a complete strategy for communicating with an organization’s audience. libraries embarking on a website redesign would benefit from contemplating this larger array of strategic issues in addition to the nitty-gritty of creating a process to ensure quality.
■■ acknowledgments
thank you to jamie graham, karen hanson, dorothy moore, and vikram yelanadu.
references
1. charles martell, “the absent user: physical use of academic library collections and services continues to decline 1995–2006,” journal of academic librarianship 34 (2008): 400–407.
2. jeanie m. welch, “who says we’re not busy? library web page usage as a measure of public service activity,” reference services review 33 (2005): 371–79.
3. b. j. fogg and hsiang tseng, “the elements of computer credibility” (paper presented at chi ’99, pittsburgh, pennsylvania, may 15–20, 1999): 82.
4. b. j. fogg et al., “what makes web sites credible? a report on a large quantitative study” (paper presented at sigchi ’01, seattle, washington, mar. 31–apr. 4, 2001): 67–68.
5. andrea everard and dennis f. galletta, “how presentation flaws affect perceived site quality, trust, and intention to purchase from an online store,” journal of management information systems 22 (2005–6): 79.
6. pamela briggs et al., “trust in online advice,” social science computer review 20 (2002): 330.
7. patrick j. lynch and sarah horton, “online style,” web style guide, 3rd ed., http://webstyleguide.com/wsg3/9-editorial-style/3-online-style.html (accessed dec. 1, 2010).
8. janice (ginny) redish, letting go of the words: writing web content that works (san francisco: morgan kaufman, 2007).
9. emily g. morton-owens, karen l. hanson, and ian walls, “implementing open-source software for three core library functions: a stage-by-stage comparison,” journal of electronic resources in medical libraries 8 (2011): 1–14.
10. angela byron et al., using drupal (sebastopol, calif.: o’reilly, 2008).
11. all drupal modules can be found via http://drupal.org/project/modules.
12. kristina halvorson, content strategy for the web (berkeley, calif.: new riders, 2010).
letter from the editors
kenneth j. varnum and marisha c. kelly
information technology and libraries | december 2022
https://doi.org/10.6017/ital.v41i4.16005
judging from the articles and communications in our december issue, the library technology profession has begun thinking through and reporting on the adaptations and changes wrought by the ongoing (some may say never-ending) covid-19 pandemic. four of the five articles in this issue relate to the many ways the pandemic altered how libraries do their work, both behind the scenes and in public.
from the tools we use internally for project management to those we provide to our public service colleagues, it seems no aspect of library technology has been untouched. in particular, the seriousness of the challenges caused by interfaces with poor accessibility has been brought to the foreground. a critical component of libraries’ diversity, equity, inclusion, and accessibility (deia) efforts, ensuring equitable access for all must be top of mind. when the pandemic led libraries, and education in general, to adapt to largely virtual presentation models, the interactive tools we reached for—products such as padlet, jamboard, and poll everywhere—became de rigueur for establishing two-way interactions with our communities. yet little attention was paid, until now, to the accessibility of those tools. in this issue, “tech tools in pandemic-transformed information literacy instruction: pushing for digital accessibility” provides excellent qualitative data to help us understand how well, or poorly, these tools meet accessibility needs.
articles
• digitization of libraries, archives, and museums in russia / heesop kim and nadezhda maltceva (https://ejournals.bc.edu/index.php/ital/article/view/13783)
• tech tools in pandemic-transformed information literacy instruction: pushing for digital accessibility / amanda rybin koob, kathia salomé ibacache oliva, michael williamson, marisha lamont-manfre, addie hugen, and amelia dickerson (https://ejournals.bc.edu/index.php/ital/article/view/15383)
• spatiotemporal distribution change of online reference during the time of covid-19 / thomas gerrish and ningning nicole kong (https://ejournals.bc.edu/index.php/ital/article/view/15097)
communications
• a library website redesign in the time of covid: a chronological case study / erin rushton and bern mulligan (https://ejournals.bc.edu/index.php/ital/article/view/15101)
• a library website migration: project planning in the midst of a pandemic / isabel vargas ochoa (https://ejournals.bc.edu/index.php/ital/article/view/14801)
as always, if you have lessons learned about technologies and their effect on our mission, we’d like to hear from you. our call for submissions (https://ejournals.bc.edu/index.php/ital/call-for-submissions) outlines the topics and process for submitting an article for review. if you have questions or wish to bounce ideas off the editor and assistant editor, please contact either of us at the email addresses below. we particularly welcome our public library colleagues to consider a column in our “public libraries leading the way” series; proposals for 2023 may be submitted through the pllw proposal form (https://docs.google.com/forms/d/e/1faipqlsegdx926lhtfsrsdkexaqzmx1ayfw7g2ny6j1iegy-qt6lubq/viewform?usp=sf_link).
with best wishes for 2023,
kenneth j. varnum, editor (varnum@umich.edu)
marisha c. kelly, assistant editor (marisha.librarian@gmail.com)
public libraries leading the way
the first 500 mistakes you will make while streaming on twitch.tv
chris markman, kasper kimura, and molly wallner
information technology and libraries | september 2022
https://doi.org/10.6017/ital.v41i3.15475
chris markman (chris.markman@cityofpaloalto.org) is senior librarian, palo alto city library. kasper kimura (kasper.tsutomu@gmail.com) is methodist youth fellowship high school director, wesley united methodist church.
molly wallner (molly.wallner@cityofpaloalto.org) is senior librarian, palo alto city library. © 2022.
introduction
three librarians at the palo alto city library embarked on an epic virtual event journey in 2020. this is our story. twitch.tv is the most popular video game streaming platform on the internet right now, but that does not mean it is the easiest platform for content creators to use or navigate. while the mistakes were many, you do not have to repeat them. in short, lessons learned over the past two years fell under four distinct categories, many of them interrelated or compounding one another:
• physical space limitations and challenges migrating studio setups during various phases of the covid-19 pandemic.
• complex decision making around audio and video equipment purchases.
• our own familiarity with videogame streaming platforms and specialized software.
• converting our in-person event policies and codes of conduct for virtual events.
mistakes 001–135: picking the right time, place, and software
we can say confidently that mistake #1 in your 500-mistake journey is pretending the library will strike gold with its first ever stream and achieve instant online success. we chose minecraft as our first videogame featured on twitch.tv. the cold reality is that real-world streamers who host thousands of viewers at one time are not building the interpersonal connections you are likely aiming for as a librarian. the second biggest mistake you’re likely to make while setting up a stream is in picking the right location. over the course of two years, in response to different levels of building access, we ended up moving our ad-hoc studio location a total of four times. each location posed its own challenges, and we learned more about what worked with every move. your streaming space should not only be distraction free, but also easy to adjust as needed, because your setup will change over time. picking the right av equipment for your stream is a gigantic topic, and the subject of infinite support forum threads and online discussions. the correct answer also largely depends on if you plan to stick with console game streaming, or pc, or some mixture of both. we can summarize by saying that to start off, you do not need the very best studio gear, and in fact, this thinking can lead to an artificial barrier that might result in more “tech debt” than necessary. you will end up spending a considerable amount of time troubleshooting strange quirks that were not there the last time you streamed, or with each new equipment purchase/upgrade.
mistakes 136–223: moderation tools and volunteers
we have had to block a few bots, as well as tactfully defuse some loose-cannon streamsurfers by maintaining aggressive kindness in answer to their sarcastic questioning. overall, our moderating world has not been rocked in a way we weren’t prepared for, due to our thoughtfully crafted and transparent policy that was adapted from our patron code of conduct (https://library.cityofpaloalto.org/library-policies/patron-code-of-conduct/), trained teen volunteer moderators, and clear communication as a team.
mistakes 224–301: the finer points of twitch.tv
in addition to having had little experience playing video games in general, our stream host also had no experience with streaming.
by design, kasper went into our first stream with only two guidelines for interacting with twitch viewers: don’t stop talking and be friendly. no one wants to watch someone silently play a game badly; it’s not engaging and it’s not fun. another part of using twitch that we did not account for until we were in the middle of the first stream is that the chat runs on a delay. this makes sense from a moderating point of view; you want to be able to catch inappropriate or spam comments. however, in terms of holding a conversation with the chat, it became a mental challenge to hold multiple threads of conversation at a time—all while playing—and all while narrating what’s happening on screen, and as people were typing to respond to what was just said or done. this process can be very overwhelming for twitch.tv hosts.1 imagine driving a car on the highway while also watching a movie of yourself, and then simultaneously holding a conversation with ten or more people in the back seat of this car at the same time. they’re not commenting on what you’re currently doing though, instead they’re making jokes about the on-ramp or stop light two miles back. it’s not impossible to juggle these tasks simultaneously, but as the host, it does require practice.
mistakes 302–389: art is a process, just like the inevitable bugs you will find in your setup every time you change anything
heed our warning! you can find a mountain of well-meaning online advice and tutorials about the best possible streaming setup and content strategy: much of this is outdated or aimed at a very specific subset of gamers. there is a cottage industry of media consultants and youtuber personalities that review hardware and share tips and tricks advice. your information literacy skills should not go to waste here! always consider the source.
stream decks and keyboard shortcuts: what the twitch.tv pros get right
if we could go back in time, there is one element to our stream setup that could have been integrated sooner, and that’s the stream deck by elgato (https://www.elgato.com/en/stream-deck). this extra desktop keypad is literally a game changer for usability—it is the peanut butter that smooths over all the ux cracks created by open broadcaster software (obs) and the chaos of chat interactions already discussed. this small hardware upgrade also makes onboarding new stream hosts much easier because there is no need to memorize keyboard shortcuts: the buttons on the stream deck can be customized to do exactly what they say they do (like mute audio, change screen layouts, or stop and start streaming). a hypothetical script showing this kind of one-button automation appears at the end of this section.
mistakes 390–499: do androids dream of electric animal crossing dream codes?
what does twitch.tv outreach look like?
we used social media to connect with other organizations doing similar work, such as the lgbtq+ youth space in san jose (https://youthspace.org/). we had worked with this group before the pandemic on some pride programs for teens at the library, and so in 2020, when we saw on their instagram that they had a minecraft server open to the local community, our team eagerly jumped on this opportunity to collaborate with them. we had a minecraft stream; they had a minecraft server—could the stars be any more aligned?
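for readers curious what a "one-button" automation like a stream deck key does under the hood, here is a hypothetical sketch using the community obs-websocket-py client. it assumes obs's websocket plugin speaking the v4 protocol; the host, port, password, scene, and source names are all made up for illustration.

# hypothetical one-button automation in the spirit of a stream deck key
from obswebsocket import obsws, requests

ws = obsws("localhost", 4444, "password")  # placeholder connection details
ws.connect()

def be_right_back():
    # swap to an intermission layout and mute the host mic in one step
    ws.call(requests.SetCurrentScene("BRB screen"))
    ws.call(requests.SetMute("Mic/Aux", True))

be_right_back()
ws.disconnect()

a physical stream deck key does essentially this, minus the code: each button is bound to a small batch of actions that would otherwise be several error-prone clicks mid-stream.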
after some planning, one of the server mods joined us for a stream and gave us a tour of their server, which ended up being one of our most popular streams to date.
conclusion: and what did we learn from all this?
the final mistake (#500) is giving up. over the past two years we have hosted over 50 streams at https://www.twitch.tv/paloaltolibrary and can say confidently that each virtual event was not only unique but also an improvement over the last. we encourage more librarians to test out this mode of online outreach and practice your iterative design skills. video game streaming is not only fun for both the audience and hosts, but also a great way to connect with “extremely online” patrons of all ages.
endnotes
1 to illustrate this problem in more detail: consider the events of our very first stream, in which kasper’s dog saw a postal employee through the window while live on camera and reacted accordingly. this was one of the many reasons why moving our center of operations from the living room to the library was an upgrade.
in the new lita strategic plan, members have suggested an objective for open access (oa) in scholarly communications. some people describe oa as articles the author has to pay someone to publish. that can be true, but that’s not how i think of it. oa is definitely not vanity publishing. most oa journals are peer-reviewed. i like the definition provided by enablingopenscholarship: open access is the immediate (upon or before publication), online, free availability of research outputs without any of the restrictions on use commonly imposed by publisher copyright agreements.1 my focus on oa journals increased precipitously when the licensing for a popular american weekly medical journal changed. we could only access online articles from one on-campus computer unless we increased our annual subscription payment by 500 percent. we didn’t have the funds, and now the students suffer the consequences. i think it was an unfortunate decision the journal’s publishers made. i know from experience that if a student can’t access the first article they want, they will find another one that is available. interlibrary loan is simpler than ever, but i think only the patient and curious students will make the effort to contact us and request an article they cannot obtain. in 2006 scientist gary ward wrote that faculty at many institutions experience problems accessing current research. when faculty teach “what is available to them rather than what their students most need to know, the education of these students and the future of science in the u.s. will suffer.” he explains it is a false assumption that those who need access to scientific literature already have it. interlibrary loans or pay-per-view are often offered by publishers as the solution to the access problem, but this misses an essential fact of how we use the scientific literature: we browse.
it is often impossible to tell from looking at an abstract whether a paper contains needed methodological detail or the perfect illustration to make a point to one’s students. apart from considerations of cost, time, and quality, interlibrary loans and pay-per-views simply do not meet the needs of those of us who often do not know what we’re looking for until we find it.2 i want our medical students and tomorrow’s doctors to have access to all of the most current medical research. we offer the service of providing jama articles to students, but i’m guessing that we hear from a small percentage of the students who can’t access the full text online. are people reading oa articles? not only are scholars reading the articles, but they are citing those articles in their publications. consider the public library of science’s plosone (http://www.plosone.org/home.action), a peerreviewed, open-access, online publication that features reports on primary research from all disciplines within science and medicine. in june 2010, plosone received its first impact factor of 4.351—an impressive number. that impact factor puts plosone in the top 25 percent of the institute for scientific information’s (isi) biology category.3 the impact factor is calculated annually by isi and represents the average number of citations received per paper published in that journal during the two preceding years.4 in other words, articles from plosone published in 2008 and 2009 were highly cited. is oa making an impact in my medical library? i believe it is, although i won’t be happy until our students can access the online journals they want from off campus and the library won’t have to pay outrageous licensing fees. we have more than one thousand online oa journal titles in our list of online journals. the more full text they can access, the less they’ll have to settle for their second or third choice because their first choice is not available online. i’m glad that lita members included oa in their strategic plan. the number of oa journals is increasing, and i believe we will continue to see that the articles are reaching readers and making a difference. i don’t think ital will be adopting the “author pays” model of oa, but the editorial board is dedicated to providing lita members with the access they want. references 1. enablingopenscholarship, “enabling open scholarship: open access,” http://www.openscholarship.org/jcms/c_6157/ open-access?portal=j_55&printview=true, (accessed jan. 18, 2011). 2. ward, gary, “deconstructing the arguments against improved public access,” newsletter of the american society for cell biology, nov. 2006, http://www.ascb.org/filetracker .cfm?fileid=550 (accessed jan. 18, 2011). 3. davis, phil, “plos one: is a high impact factor a blessing or a curse?” online posting, june 21, 2010, the scholarly kitchen, http://scholarlykitchen.sspnet.org/2010/06/21/plosone -impact-factor-blessing-or-a-curse/ (accessed jan. 18, 2011). 4. thomson reuters, “introducing the impact factor,” http://thomsonreuters.com/products_services/science/ academic/impact_factor/ (accessed jan. 18, 2011). cynthia porter editorial board thoughts: is open access the answer? cynthia porter (cporter@atsu.edu) is distance support librarian at a.t. still university of health sciences, mesa, arizona. 104 information technology and libraries | september 2010 development of such a mediation mechanism calls for an empirical assessment of various issues surrounding metadata-creation practices. 
the critical issues concerning metadata practices across distributed digital collections have been relatively unexplored. while examining learning objects and e-prints communities of practice, barton, currier, and hey point out the lack of formal investigation of the metadata-creation process.2 as will be discussed in the following section, some researchers have begun to assess the current state of descriptive practices, metadata schemata, and content standards. however, the literature has not yet developed to a point where it affords a comprehensive picture. given the propagation of metadata projects, it is important to continue to track changes in metadata-creation practices while they are still in constant flux. such efforts are essential for adding new perspectives to digital library research and practices in an environment where metadata best practices are being actively sought after to aid in the creation and management of high-quality digital collections. this study examines the prevailing current state of metadata-creation practices in digital repositories, collections, and libraries, which may include both digitized and born-digital resources. using nationwide survey data, mostly drawn from the community of cataloging and metadata professionals, we seek to investigate issues in creating descriptive metadata elements, using controlled vocabularies for subject access, and propagating metadata and metadata guidelines beyond local environments. we will address the following research questions:
1. which metadata schema(ta) and content standard(s) are employed in individual digital repositories and collections?
2. which controlled vocabulary schema(ta) are used to facilitate subject access?
3. what criteria are applied in selecting metadata and controlled-vocabulary schema(ta)?
4. to what extent are mechanisms for exposing and sharing metadata integrated into current metadata-creation practices?
in this article, we first review recent studies relating to current metadata-creation practices across digital collections. then we present the survey method employed to conduct this study, the general characteristics of survey participants, and the validity of the collected data, followed by the study results. we report on how metadata and controlled vocabulary schema(ta) are being used across institutions, and we present a data analysis of current metadata-creation practices. the final section summarizes the study and presents some suggestions for future studies. this study explores the current state of metadata-creation practices across digital repositories and collections by using data collected from a nationwide survey of mostly cataloging and metadata professionals. results show that marc, aacr2, and lcsh are the most widely used metadata schema, content standard, and subject-controlled vocabulary, respectively. dublin core (dc) is the second most widely used metadata schema, followed by ead, mods, vra, and tei. qualified dc’s wider use vis-à-vis unqualified dc (40.6 percent versus 25.4 percent) is noteworthy. the leading criteria in selecting metadata and controlled-vocabulary schemata are collection-specific considerations, such as the types of resources, nature of the collection, and needs of primary users and communities. existing technological infrastructure and staff expertise also are significant factors contributing to the current use of metadata schemata and controlled vocabularies for subject access across distributed digital repositories and collections.
metadata interoperability remains a major challenge. there is a lack of exposure of locally created metadata and metadata guidelines beyond the local environments. homegrown locally added metadata elements may also hinder metadata interoperability across digital repositories and collections when there is a lack of sharable mechanisms for locally defined extensions and variants.
metadata is an essential building block in facilitating effective resource discovery, access, and sharing across ever-growing distributed digital collections. quality metadata is becoming critical in a networked world in which metadata interoperability is among the top challenges faced by digital libraries. however, there is no common data model that cataloging and metadata professionals can readily reference as a mediation mechanism during the processes of descriptive metadata creation and controlled vocabulary schemata application for subject description.1
jung-ran park (jung-ran.park@ischool.drexel.edu) is assistant professor, college of information science and technology, drexel university, philadelphia, and yuji tosaka (tosaka@tcnj.edu) is cataloging/metadata librarian, tcnj library, the college of new jersey, ewing, new jersey.
jung-ran park and yuji tosaka
metadata creation practices in digital repositories and collections: schemata, selection criteria, and interoperability
possible increase in the use of locally developed schemata as many projects added new types of nontextual digital objects that could not be adequately described by existing metadata schemata.6 there is a lack of research concerning the current use of content standards; however, it is reasonable to suspect that content-standards use exhibits patterns similar to that of metadata because of their often close association with particular metadata schemata. the oclc rlg survey reveals that anglo-american cataloguing rules, 2nd edition (aacr2)—the traditional cataloging rule that has most often been used in conjunction with marc—is the most widely used content standard (81 percent). aacr2 is followed by describing archives: a content standard (dacs) with 42 percent; descriptive cataloging of rare materials with 33 percent; archives, personal papers, manuscripts (appm) with 25 percent; and cataloging cultural objects (cco) with 21 percent.7 in the same way as metadata schemata, there appears to be a concentration of a few controlled vocabulary schemata at research institutions. ma’s arl survey, for example, shows that the library of congress subject headings (lcsh) and name authority file (naf) were used by most survey respondents (96 percent and 88 percent, respectively).
these two predominantly adopted vocabularies are followed by several domain-specific vocabularies, such as art and architecture thesaurus (aat), library of congress thesaurus for graphical materials (tgm) i and ii, getty thesaurus of geographic names (tgn), and the getty union list of artists names (ulan), which were used by between 30 percent to more than 60 percent of respondents.8 the oclc rlg survey reports similar results; however, nearly half of the oclc rlg survey respondents (n = 9) indicated that they had also built and maintained one or more locally developed thesauri.9 while creating and sharing information about local metadata implementations is an important step toward increased interoperability, recent studies tend to paint a grim picture of current local documentation practices and open accessibility. in a nationwide study of institutional repositories in u.s. academic libraries, markey et al. found that only 61.3 percent of the 446 survey participants with operational institutional repositories had implemented policies for metadata schemata and authorized metadata creators.10 the oclc rlg survey also highlights limited collaboration and sharing of the metadata guidelines both within and across the institutions. it finds that even when there are multiple units creating metadata within the same institution, metadata-creation guidelines often are unlikely to be shared (28 percent do not share; 53 percent sometimes share).11 a mixed result is reported on the exposure of metadata to outside service providers. in an arl survey, the university of houston libraries institutional repository ■■ literature review as evinced by the principles and practices of bibliographic control through shared cataloging, successful resource access and sharing in the networked environment demands semantic interoperability based on accurate, complete, and consistent resource description. the recent survey by ma finds that the open archives initiative protocol for metadata harvesting (oai-pmh) and metadata crosswalks have been adopted by 83 percent and 73 percent of respondents, respectively. even though the sample comes only from sixty-eight association of research libraries (arl) member libraries, and the figures thus may be skewed higher than those of the entire population of academic libraries, there is little doubt that interoperability is a critical issue given the rapid proliferation of metadata schemata throughout digital libraries.3 while there is a variety of metadata schemata currently in use for organizing digital collections, only a few of them are widely used in digital repositories. in her arl survey, ma reports that the marc format is the most widely used metadata schema (91 percent), followed by encoded archival description (ead) (84 percent), unqualified dublin core (dc) (78 percent), and qualified dc (67 percent).4 similarly, a 2007 member survey by oclc research libraries group (rlg) programs gathered information from eighteen major research libraries and cultural heritage institutions and also found that marc is the most widely used scheme (65 percent), followed by ead (43 percent), unqualified dc (30 percent), and qualified dc (29 percent). 
the different levels of use reported by these studies are probably due to different sample sizes and compositions, but results nonetheless suggest that metadata use at research institutions tends to rely on a small number of major schemata.5 there may in fact be much greater diversity in metadata use patterns when the scope is expanded to include both research and nonresearch institutions. palmer, zavalina, and mustafoff, for example, tracked trends from 2003 through 2006 in metadata selection and application practices at more than 160 digital collections developed through institute of museum and library services grants. they found that despite perceived limitations, use of dc is the most widespread, with more than half of the digital collections using it alone or in combination with other schemata. marc ranks second, with nearly 30 percent using it alone or in combination. the authors found that the choice of metadata schema is largely influenced by practices at peer institutions and compatibility with a content management system. what is most striking, however, is the finding that locally developed schemata are used as often as marc. there is a decline in the percentage of digital projects using multiple metadata schemata (from 53 percent to 38 percent). yet the authors also saw a 106 information technology and libraries | september 2010 ■■ method the objective of the research reported in this paper is to examine the current state of metadata-creation practices in terms of the creation of descriptive metadata elements, the use of controlled vocabularies for subject access, and the exposure of metadata and metadata guidelines beyond local environments. we conducted a web survey using websurveyor (now vovici: http://www.vovici .com). the survey included both structured and openended questions. it was extensively reviewed by members of an advisory board—a group of three experts in the field—and it was pilot-tested prior to being officially launched. the survey included many multiple-response questions that called for respondents to check all applicable answers. we recruited participants through survey invitation messages and subsequent reminders to the electronic mailing lists of communities of metadata and cataloging professionals. table 1 shows the mailing lists employed for the study. we also sent out individual invitations and distributed flyers to selected metadata and cataloging sessions during the 2008 ala midwinter meeting, held that year in philadelphia. the survey attracted a large number of initial participants (n = 1,371), but during the sixty-two days from august 6 to october 6, 2008, we only received 303 completed responses via the survey management system. we suspect that the high incompletion rate (77.9 percent) stems from the fact that the subject matter may have been outside the scope of many participants’ job responsibilities. the length of the survey may also have been a factor in the incompletion rate. the profiles of respondents’ job titles (see table 2) task force found that exposing metadata to oai-pmh service providers is an established practice used by nearly 90 percent of the respondents.12 ma’s arl survey also reports the wide adoption of oai-pmh (83 percent). 
in summary, recent studies show that the current practice of metadata creation is problematic due to the lack of a mechanism for integrating various types of metadata schemata, content standards, and controlled vocabularies in ways that promote an optimal level of interoperability across digital collections and repositories. the problems are exacerbated in an environment where many institutions lack local documentation delineating the metadata-creation process. at the same time, researchers have only recently begun studying these issues, and the body of literature is at an incipient stage. the research that has been done often targeted different populations, and sample sizes varied (some very small). in some cases the literature exhibits contradictory findings about issues surrounding metadata practices, increasing the difficulty of understanding the current state of metadata creation. this points to the need for further research on current metadata-creation practice.

■■ method

the objective of the research reported in this paper is to examine the current state of metadata-creation practices in terms of the creation of descriptive metadata elements, the use of controlled vocabularies for subject access, and the exposure of metadata and metadata guidelines beyond local environments. we conducted a web survey using websurveyor (now vovici: http://www.vovici.com). the survey included both structured and open-ended questions. it was extensively reviewed by members of an advisory board—a group of three experts in the field—and it was pilot-tested prior to being officially launched. the survey included many multiple-response questions that called for respondents to check all applicable answers.

we recruited participants through survey invitation messages and subsequent reminders to the electronic mailing lists of communities of metadata and cataloging professionals. table 1 shows the mailing lists employed for the study. we also sent out individual invitations and distributed flyers to selected metadata and cataloging sessions during the 2008 ala midwinter meeting, held that year in philadelphia.

table 1. electronic mailing lists for the survey

electronic mailing list — e-mail address
autocat — autocat@listserv.syr.edu
dublin core listserv — dc-libraries@jiscmail.ac.uk
metadata librarians listserv — metadatalibrarians@lists.monarchos.com
library and information technology association listserv — lita-l@ala.org
online audiovisual catalogers electronic discussion list — olac-list@listserv.acsu.buffalo.edu
subject authority cooperative program listserv — sacolist@listserv.loc.gov
serialst — serialst@list.uvm.edu
text encoding initiative listserv — tei-l@listserv.brown.edu
electronic resources in libraries listserv — eril-l@listserv.binghamton.edu
encoded archival description listserv — ead@listserv.loc.gov

the survey attracted a large number of initial participants (n = 1,371), but during the sixty-two days from august 6 to october 6, 2008, we received only 303 completed responses via the survey management system. we suspect that the high incompletion rate (77.9 percent) stems from the fact that the subject matter may have been outside the scope of many participants' job responsibilities. the length of the survey may also have been a factor in the incompletion rate.

the profiles of respondents' job titles (see table 2) and job responsibilities (see table 3) clearly show that most of the individuals who completed the survey engage professionally in activities directly relevant to the research objectives, such as descriptive and subject cataloging, metadata creation and management, authority control, nonprint and special material cataloging, electronic resource and digital project management, and integrated library system (ils) management. although the largest number of participants (135, or 44.6 percent) chose the "other" category regarding their job title (see table 2), it is reasonable to assume that the vast majority can be categorized as cataloging and metadata professionals.15 most job titles given as "other" are associated with one of the professional activities listed in table 4. thus it is reasonable to assume that the respondents are in an appropriate position to provide first-hand, accurate information about the current state of metadata creation in their institutions.

table 2. job titles of participants (multiple responses)

job title — number of participants
other — 135 (44.6%)
cataloger/cataloging librarian/catalog librarian — 99 (32.7%)
metadata librarian — 29 (9.6%)
catalog & metadata librarian — 26 (8.6%)
head, cataloging — 26 (8.6%)
electronic resources cataloger — 17 (5.6%)
cataloging coordinator — 15 (5.0%)
head, cataloging & metadata services — 15 (5.0%)
n = 227. survey question: what is your working job title? (please check all that apply)

table 3. participants' job responsibilities (multiple responses)

job responsibility — number of participants
general cataloging (e.g., descriptive and subject cataloging) — 171 (56.4%)
metadata creation and management — 153 (50.5%)
authority control — 147 (48.5%)
nonprint cataloging (e.g., microform, music scores, photographs, video recordings) — 133 (43.9%)
special material cataloging (e.g., rare books, foreign language materials, government documents) — 126 (41.6%)
digital project management — 101 (33.3%)
electronic resource management — 62 (20.5%)
ils management — 59 (19.5%)
other — 51 (16.8%)
survey question: what are your primary job responsibilities? (please check all that apply)

table 4. professional activities specified in "other" category in table 2

professional activity — number of participants
cataloging & metadata creation — 31 (10.2%)
digital projects management — 23 (7.6%)
technical services — 17 (5.6%)
archiving — 16 (5.3%)
electronic resources and serials management — 6 (2.0%)
library system administration/other — 6 (2.0%)
n = 99. survey question: if you selected other, please specify.

concerning the institutional background of participants, of the 303 survey participants, fewer than half (121, or 39.9 percent) provided institutional information. we believe that this is mostly because the question was optional, following a suggestion from the institutional review board at drexel university. of those that provided their institutional background, the majority (75.2 percent) are from academic libraries, followed by participants from public libraries (17.4 percent) and from other institutions (7.4 percent).

■■ results

in this section, we will present the findings of this study in the following three areas: (1) metadata and controlled vocabulary schemata and metadata tools used, (2) criteria for selecting metadata and controlled vocabulary schemata, and (3) exposing metadata and metadata guidelines beyond local environments.

metadata and controlled vocabulary schemata and metadata tools used

a great variety of digital objects were handled by the survey participants, as figure 1 shows. the most frequently handled object was text, cited by 86.5 percent of the respondents. about three-fourths of the respondents described audiovisual materials (75.2 percent), while 60.1 percent described images and 51.8 percent described archival materials. more than 65 percent of the respondents handled electronic resources (68.3 percent) and digitized resources (66.7 percent), while approximately half handled born-digital resources (52.5 percent). the types of materials described in digital collections were diverse, encompassing both digitized and born-digital materials; however, digitization accounted for a slightly greater percentage of metadata creation.

figure 1. materials/resources handled (multiple responses). survey question: what type of materials/resources do you and your fellow catalogers/metadata librarians handle? (please check all that apply)

to handle these diverse digital objects, the respondents' institutions employed a wide range of metadata schemata, as figure 2 shows. yet there were a few schemata that were widely used by cataloging and metadata professionals. specifically, 84.2 percent of the respondents' institutions used marc; dc was also popular, with 25.4 percent using unqualified dc and 40.6 percent using qualified dc to create metadata. ead also was frequently cited (31.7 percent). in addition to these major types of metadata schemata, the respondents' institutions also employed metadata object description schema (mods) (17.8 percent), visual resources association (vra) core (14.9 percent), and text encoding initiative (tei) (12.5 percent).

figure 2. metadata schemata used (multiple responses). survey question: which metadata schema(s) do you and your fellow catalogers/metadata librarians use? (please check all that apply)

it is noteworthy that use of qualified dc was higher than that of unqualified dc. this result differs from the arl survey and the member survey conducted by oclc rlg programs (described in the literature review above).16 in these surveys, unqualified dc was more frequently cited than qualified dc. one possible explanation of this less frequent use of unqualified dc may lie in the limitations of unqualified dc metadata semantics. survey respondents also reported problems using dc metadata, mostly caused by semantic ambiguities and semantic overlaps of certain dc metadata elements.17 limitations and issues of unqualified dc metadata semantics are discussed in depth in park's study.18 in light of these results, examining trends of qualified dc use in a future study would be interesting.
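the semantic ambiguity at issue is easy to see side by side. below is a small illustration—the record values are invented—of how unqualified dc collapses distinctions that qualified dc can express through element refinements such as dcterms:created and dcterms:spatial.

```python
# two descriptions of the same (invented) digitized photograph. in the
# unqualified record, one broad "date" element must carry both the
# creation and digitization dates, and "coverage" mixes place and
# period; the qualified record says which is which.
unqualified = {
    "title": ["campus aerial photograph"],
    "date": ["1952", "2006"],                         # created? digitized?
    "coverage": ["san marcos, calif.", "1950-1959"],  # place or period?
}

qualified = {
    "title": ["campus aerial photograph"],
    "dcterms:created": ["1952"],
    "dcterms:issued": ["2006"],                # release of the digital copy
    "dcterms:spatial": ["san marcos, calif."],
    "dcterms:temporal": ["1950-1959"],
}

for label, record in (("unqualified dc", unqualified),
                      ("qualified dc", qualified)):
    print(label)
    for element, values in record.items():
        print(f"  {element}: {'; '.join(values)}")
```

a harvester that receives only the unqualified record has no reliable way to recover these distinctions, which is consistent with the semantic overlaps respondents reported.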
despite the wide variety of schemata reported in use, there seemed to be an inclination to use only one or two metadata schemata for resource description. as shown in table 5, the majority of the respondents' institutions (53.6 percent) used only one schema for metadata creation, while approximately 37 percent used two or three schemata (26.2 percent and 10.3 percent, respectively). the institutions using more than three schemata during the metadata-creation process comprised only 9.9 percent of the respondents.

table 5. number of metadata schemata in use

number of metadata schemata in use — number of participants
1 — 141 (53.6%)
2 — 69 (26.2%)
3 — 27 (10.3%)
4 or more — 26 (9.9%)
n = 263. survey question: which metadata schema(s) do you and your fellow catalogers/metadata librarians use the most? (please check all that apply)

turning to content standards (see figure 3), we found that aacr2 was the most widely used standard, indicated by 84.5 percent of respondents. this high percentage clearly reflects the continuing preeminence of marc as the metadata schema of choice for digital collections. dc application profiles also showed a large user base, indicated by more than one-third of respondents (37.0 percent). more than one-quarter of the respondents (28.4 percent) used ead application guidelines as developed by the society of american archivists and the library of congress, while 10.6 percent used rlg best practice guidelines for encoded archival description (2002). about one-quarter (25.7 percent) indicated dacs as their content standard.

figure 3. content standards used (multiple responses). survey question: what content standard(s) and/or guidelines do you and your fellow catalogers/metadata librarians use? (please check all that apply)

homegrown standards and guidelines are local application profiles that clarify existing content standards and specify how values for metadata elements are selected and represented to meet the requirements of a particular context. as shown in the results on metadata schemata, it is noteworthy that homegrown content standards and guidelines constituted one of the major choices of participants, indicated by more than one-fifth of the institutions (22.1 percent). almost two-fifths of the survey participants (38 percent) also reported that they add homegrown metadata elements to a given metadata schema. slightly less than half of the participants (47.2 percent) indicated otherwise. the local practice of creating homegrown content guidelines and metadata elements during the metadata-creation process deserves a separate study; this study only briefly touches on the basis for locally added custom metadata elements. the motivation to create custom metadata elements derives from the imperative to accommodate the perceived needs of local collections and users, as indicated by the two most common responses: (1) "to reflect the nature of local collections/resources" (76.9 percent) and (2) "to reflect the characteristics of target audience/community of local collections" (58.3 percent). local conditions were also cited from institutional and technical standpoints. many institutions (34.3 percent) follow existing local practices for cataloging and metadata creation, while other institutions (18.5 percent) are making homegrown metadata additions because of constraints imposed by their local systems.

table 6 summarizes the most frequently used controlled vocabulary schemata by resource type. by far the most widely used schema across all resource types was lcsh. the preeminence of lcsh evinces the critical role that it plays as the de facto form of controlled vocabulary for subject description. library of congress classification (lcc) was the second choice for all resource types other than images, cultural objects, and archives. for digital collections of these resource types and digitized resources, aat was the second most used controlled vocabulary, a fact that reflects its purpose as a domain-specific terminology used for describing works of art, architecture, visual resources, material culture, and archival materials.

table 6. the most frequently used controlled vocabulary schema(s) by resource type (multiple responses)

resource type — lcsh — lcc — ddc — aat — tgm — ulan — tgn — other
text — 79.5% (241) — 35.6% (108) — 16.8% (51) — 10.2% (31) — 6.9% (21) — 3.6% (11) — 5.0% (15) — 14.2% (43)
audiovisual materials — 67.3% (204) — 25.1% (76) — 12.9% (39) — 9.2% (28) — 8.6% (26) — 4.0% (12) — 5.0% (15) — 14.5% (44)
cartographic materials — 44.9% (136) — 17.5% (53) — 7.3% (22) — 5.0% (15) — 4.3% (13) — 1.3% (4) — 4.3% (13) — 6.3% (19)
images — 43.2% (131) — 11.9% (36) — 5.6% (17) — 25.7% (78) — 20.1% (61) — 9.9% (30) — 10.6% (32) — 11.2% (34)
cultural objects (e.g., museum objects) — 20.1% (61) — 7.3% (22) — 4.3% (13) — 13.2% (40) — 6.3% (19) — 4.6% (14) — 3.0% (9) — 7.9% (24)
archives — 44.2% (134) — 11.6% (35) — 6.3% (19) — 11.9% (36) — 6.6% (20) — 3.0% (9) — 2.6% (8) — 12.2% (37)
electronic resources — 60.7% (184) — 23.4% (71) — 8.6% (26) — 5.3% (16) — 3.6% (11) — 1.7% (5) — 3.0% (9) — 14.2% (43)
digitized resources — 51.8% (157) — 15.5% (47) — 5.0% (15) — 15.5% (47) — 10.2% (31) — 6.6% (20) — 7.6% (23) — 15.2% (46)
born-digital resources — 43.9% (133) — 13.5% (41) — 5.6% (17) — 8.3% (25) — 7.3% (22) — 4.3% (13) — 4.6% (14) — 13.9% (42)
survey question: which controlled vocabulary schema(s) do you and your fellow catalogers/metadata librarians use most? (please check all that apply)

while traditional metadata schemata, content standards, and controlled vocabularies such as marc, aacr2, and lcsh clearly were preeminent in the majority of the respondents' institutions, current metadata creation in digital repositories and collections faces new challenges from the enormous volume of online and digital resources.19 approximately one-third of the respondents' institutions (33.8 percent) were meeting this challenge with tools for semiautomatic metadata generation. yet a majority of respondents (52.5 percent) indicated that their institutions did not use any such tools for metadata creation and management. this result seems to contrast with ma's finding that automatic metadata generation was used in some capacity in nearly two-thirds of arl libraries.20 because semiautomatic metadata application is reported in depth in a separate study, we only briefly sketch the topic here.21 the semiautomatic metadata application tools used in the respondents' digital repositories and collections can be classified into five categories of common characteristics: (1) metadata format conversion, (2) templates and editors for metadata creation, (3) automatic metadata creation, (4) library system for bibliographic and authority control, and (5) metadata harvesting and importing tools. as table 7 illustrates, among those institutions that have introduced semiautomatic metadata generation tools, "metadata format conversion" (38.6 percent) and "templates and editors for metadata creation" (27 percent) are the two most frequently cited tools.

table 7. types of semiautomatic metadata generation tools in use

type — number of responses (%)
metadata format conversion — 38 (38.6%)
templates and editors for metadata creation — 26 (27.0%)
automatic metadata creation — 16 (16.7%)
library system for bibliographic and authority control — 15 (15.6%)
metadata harvesting and importing tools — 8 (8.3%)
n = 96. survey question: please describe the (semi)automatic metadata generation tools you use.
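as an illustration of the most-cited category, "metadata format conversion," here is a toy marc-to-dublin-core crosswalk; the record is a simplified dict of marc tags rather than real marc 21 data, and the field values are invented.

```python
# a toy crosswalk from marc tags to dublin core elements, the kind of
# format conversion respondents reported using. tag-to-element choices
# follow the usual marc-to-dc mappings; unmapped tags are dropped.
CROSSWALK = {
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "260": "publisher",    # publication information
    "520": "description",  # summary note
    "650": "subject",      # topical subject heading
}

marc_record = {  # invented example record
    "100": ["park, jung-ran"],
    "245": ["metadata creation practices in digital repositories"],
    "260": ["chicago : american library association"],
    "650": ["metadata", "digital libraries"],
}

def to_dublin_core(marc):
    """map each recognized marc tag onto its dublin core element."""
    dc = {}
    for tag, values in marc.items():
        element = CROSSWALK.get(tag)
        if element:
            dc.setdefault(element, []).extend(values)
    return dc

print(to_dublin_core(marc_record))
```

a production converter would also handle indicators, subfields, and repeatable fields, which is where the hand-tuning that keeps such tools only "semiautomatic" comes in.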
criteria for selecting metadata and controlled vocabulary schemata

what are the factors that have shaped the current state of metadata-creation practices reported thus far? in this section, we turn our attention to constraints that affect decision making at institutions in the selection of metadata and controlled vocabulary schemata for subject description. figure 4 presents the percentage of different metadata schemata selection criteria described by survey participants.

figure 4. criteria for selecting metadata schemata (multiple responses). survey question: which criteria were applied in selecting metadata schemata? (please check all that apply)

first, collection-specific considerations clearly played a major role in the selection. the most frequently cited reason was "types of resources" (60.4 percent). this response reflects the fact that a large number of metadata schemata have been developed, often with wide variation in content and format, to better handle particular types of information resources. the primary factor in selecting metadata schemata is their suitability for describing the most common type of resources handled by the survey participants. the second and third most common criteria, "target users/audience" (49.8 percent) and "subject matters of resources" (46.9 percent), also seem to reflect how domain-specific metadata schemata are applied. in making decisions on metadata schemata, respondents weighed materials in particular subject areas (e.g., art, education, and geography) and the needs of particular communities of practice as their primary users and audiences.

however, existing technological infrastructure and resource constraints also determine options. given the prominence of general library cataloging as a primary job responsibility, "expertise of staff" (44.2 percent) and "integrated library system" (39.9 percent) appeared to highlight the key role that marc continues to play in the metadata-creation process for digital collections (see figure 2). "budget" also appeared to be an important factor in metadata selection (17.2 percent), showing that funding levels played a considerable role in metadata decisions. at the same time, it is noteworthy that while responses were not mutually exclusive, many respondents cited the software used by their institutions—i.e., "integrated library system" (39.9 percent), "digital collection or asset management software" (25.4 percent), "institutional repository software" (19.8 percent), "union catalogs" (14.9 percent), and "archival management software" (5.6 percent)—as a reason for their selection of metadata schemata. metadata decisions thus seem to be driven by a variety of local technology choices for developing digital repositories and collections.

figure 5. criteria for selecting controlled vocabulary schemata (multiple responses). survey question: which criteria are applied in selecting controlled vocabulary schemata? (please check all that apply)

as shown in figure 5, similar patterns are observed with regard to selection criteria for controlled vocabulary schemata. three of the four selection criteria receiving majority responses—"target users/audience" (55.4 percent), "type of resources" (54.8 percent), and "nature of the collection" (50.2 percent)—suggest that controlled vocabulary decisions are influenced primarily by the substantive purpose and scope of controlled vocabularies for local collections. a major consideration seems to be whether particular controlled vocabularies are suitable for representing standard data values to improve access and retrieval for target audiences. "metadata standards," another selection criterion frequently cited in the survey (54.1 percent), reflects how some domain-specific metadata schemata tend to dictate the use of particular controlled vocabularies. at the same time, the results also suggest that the resources and technological infrastructure available to institutions were important reasons for their selections. "expertise of staff" (38.3 percent) seems to be a straightforward practical reason: the application of controlled vocabularies is highly dependent on the width and depth of staff expertise available.
likewise, when implementing controlled vocabularies in the digital environment, some institutions also took into account existing system features for authority control and controlled vocabulary searching, as exhibited by 17.2 percent of responses for "digital collection or asset management software."

exposing metadata and metadata guidelines beyond local environments

metadata interoperability across distributed digital repositories and collections is fast becoming a major issue.22 the proliferation of open-source and commercial digital library platforms using a variety of metadata schemata has implications for librarians' ability to create shareable and interoperable metadata beyond the local environment. to what extent are mechanisms for sharing metadata integrated into the current metadata-creation practices described by the respondents?

figure 6 summarizes the responses concerning the uses of three major mechanisms for metadata exposure. approximately half of respondents exposed at least some of their metadata to search engines (52.8 percent) and union catalogs such as oclc worldcat (50.6 percent). more than one-third of the respondents exposed all or some of their metadata through oai harvesters (36.8 percent). about half or more of the respondents either did not expose their metadata or were not sure about the current operations at their institutions (e.g., 47.2 percent for search engines and 63.2 percent for oai harvesters), a result that may be interpreted as a tendency to create metadata primarily for local audiences.

figure 6. mechanism to expose metadata (multiple responses). survey question: do you/your organization expose your metadata to oai (open archives initiative) harvesters, union catalogs, or search engines?

why do many institutions fail to make their locally created metadata available to other institutions despite wide consensus on the importance of metadata sharing in a networked world? responses from those institutions exposing none or not all of their metadata (see table 8) reveal that financial, personnel, and technical issues are major hindrances in promoting the exposure of metadata outside the immediate local environment. some institutions are not confident that their current metadata practices are able to satisfy the technical requirements for producing standards-based interoperable metadata. another reason frequently mentioned is copyright concerns about limited-access materials. yet some respondents simply do not see any merit to exposing their item-level metadata, citing its relative uselessness for resource discovery outside their institutions.

table 8. sample reasons for not exposing metadata

- not all our metadata conforms to standards required
- not all our metadata is oai compliant
- lack of expertise and time and money to develop it
- it restrictions
- security concerns on the part of our information technology department
- some collections/records are limited access and not open to the general public
- we think that having worldcat available for traditional library materials that many libraries have is a better service to people than having each library dump our catalog out on the web
- varies by tool and collection, but usually a restriction on the material, a technical barrier, or a feeling that for some collections the data is not yet sufficiently robust
- "still in a work in progress"

survey question: if you selected "some, but not all" or "no" in question 13 [see figure 6], please tell why you do not expose your metadata.

as stated earlier, the practice of adding homegrown metadata elements seems common among many institutions. while locally created metadata elements accommodate local needs and requirements, they may also hinder metadata interoperability across digital repositories and collections if mechanisms for finding information about such locally defined extensions and variants are absent. homegrown metadata guidelines document local data models and function as an essential mechanism for metadata creation and quality assurance within and across digital repositories and collections.23 in this regard, it is essential to examine locally created metadata guidelines and best practices.24 however, the results of the survey analysis evince that the vast majority of institutions (72.0 percent) provided no public access to local application profiles on their websites, while only 19.6 percent of respondents' institutions made them available online to the public.

■■ conclusion

metadata plays an essential role in managing, organizing, and searching for information resources. in the networked environment, the enormous volume of online and digital resources creates an impending research need to evaluate the issues surrounding the metadata-creation process and the employment of controlled vocabulary schemata across ever-growing distributed digital repositories and collections. in this paper we explored the current status of metadata-creation practices through an examination of survey responses drawn mostly from cataloging and metadata professionals (see tables 2, 3, and 4). the results of the study indicate that current metadata practices still do not create conditions for interoperability.

despite the proliferation of newer metadata schemata, the survey responses showed that marc currently remains the most widely used schema for providing resource description and access in digital repositories, collections, and libraries. the continuing predominance of marc goes hand in hand with the use of aacr2 as the primary content standard for selecting and representing data values for descriptive metadata elements. lcsh is used as the de facto controlled vocabulary schema for providing subject access in all types of digital repositories and collections, while domain-specific subject terminologies such as aat are applied at significantly higher rates in digital repositories handling nonprint resources such as images, cultural objects, and archival materials.

the dc metadata schema is the second most widely employed according to this study, with qualified dc used by 40.6 percent of responding institutions and unqualified dc used by 25.4 percent. ead is another frequently cited schema (31.7 percent), followed by mods (17.8 percent), vra (14.9 percent), and tei (12.5 percent). a trend of qualified dc being used (40.6 percent) more often than unqualified dc (25.4 percent) is noteworthy. one possible explanation of this trend may be derived from the fact that semantic ambiguities and overlaps in some of the unqualified dc elements interfere with use in resource description.25 given the earlier surveys reporting the higher use of unqualified dc over qualified dc, more in-depth examination of their use trends may be an important avenue for future studies.

despite active research and promising results obtained from some experimental tools, practical applications of semiautomatic metadata generation have been incorporated into the metadata-creation processes by only one-third of survey participants. the leading criteria in selecting metadata and controlled vocabulary schemata are derived from collection-specific considerations of the type of resources, the nature of the collections, and the needs of primary users and communities. existing technological infrastructure, encompassing digital collection or asset management software, archival management software, institutional repository software, integrated library systems, and union catalogs, also greatly influences the selection process. the skills and knowledge of metadata professionals and the expertise of staff also are significant factors in understanding current practices in the use of metadata schemata and controlled vocabularies for subject access across distributed digital repositories and collections.

the survey responses reveal that metadata interoperability remains a challenge in the current networked environment despite growing awareness of its importance. for half of the survey respondents, exposing metadata to service providers, such as oai harvesters, union catalogs, and search engines, does not seem to be a high priority because of local financial, personnel, and technical constraints. locally created metadata elements are added in many digital repositories and collections in large part to meet local descriptive needs and serve the target user community. while locally created metadata elements accommodate local needs, they may also hinder metadata interoperability across digital repositories and collections when shareable mechanisms are not in place for such locally defined extensions and variants.

locally created metadata guidelines and application profiles are essential for metadata creation and quality assurance; however, most custom content guidelines and best practices (72 percent) are not made publicly available. the lack of a mechanism to facilitate public access to local application profiles and metadata guidelines may hinder cross-checking for quality metadata and creating shareable metadata that can be harvested for a high level of consistency and interoperability across distributed digital collections and repositories. development of a searchable registry for publicly available metadata guidelines has potential to enhance metadata interoperability.

a constraining factor of this study derives from the participant population; thus we have not attempted to generalize the findings of the study. however, results indicate a pressing need for a common data model that is shareable and interoperable across ever-growing distributed digital repositories and collections. development of such a common data model demands future research of a practical and interoperable mediation mechanism underlying local implementation of metadata elements, semantics, content standards, and controlled vocabularies in a world where metadata can be distributed and shared widely beyond the immediate local environment and user community. (other issues such as semiautomatic metadata application, dc metadata semantics, custom metadata elements, and the professional development of cataloging and metadata professionals are explained in depth in separate studies.)26 for future studies, incorporation of other research methods (such as follow-up telephone surveys and face-to-face focus group interviews) could be used to better understand the current status of metadata-creation practices. institutional variation also needs to be taken into account in the design of future studies.

■■ acknowledgments

this study is supported through an early career development research award from the institute of museum and library services. we would like to express our appreciation to the reviewers for their invaluable comments.

references

1. jung-ran park, "semantic interoperability and metadata quality: an analysis of metadata item records of digital image collections," knowledge organization 33 (2006): 20–34; rachel heery, "metadata futures: steps toward semantic interoperability," in metadata in practice, ed. diane i. hillmann and elaine l. westbrooks, 257–71 (chicago: ala, 2004); jung-ran park, "semantic interoperability across digital image collections: a pilot study on metadata mapping" (paper presented at the canadian association for information science 2005 annual conference, london, ontario, june 2–4, 2005), http://www.cais-acsi.ca/proceedings/2005/park_j_2005.pdf (accessed mar. 24, 2009).

2. jane barton, sarah currier, and jessie m. n. hey, "building quality assurance into metadata creation: an analysis based on the learning objects and e-prints communities of practice" (paper presented at 2003 dublin core conference: supporting communities of discourse and practice—metadata research & applications, seattle, wash., sept. 28–oct. 2, 2003), http://dcpapers.dublincore.org/ojs/pubs/article/view/732/728 (accessed mar. 24, 2009); sarah currier et al., "quality assurance for digital learning object repositories: issues for the metadata-creation process," alt-j 12 (2004): 5–20.

3. jin ma, metadata, spec kit 298 (washington, d.c.: association of research libraries, 2007): 13, 28.

4. ibid., 12, 21–22.

5. karen smith-yoshimura, rlg programs descriptive metadata practices survey results (dublin, ohio: oclc, 2007): 6–7, http://www.oclc.org/programs/publications/reports/2007-03.pdf (accessed mar. 24, 2009); karen smith-yoshimura and diane cellentani, rlg programs descriptive metadata practices survey results: data supplement (dublin, ohio: oclc, 2007): 16, http://www.oclc.org/programs/publications/reports/2007-04.pdf (accessed mar. 24, 2009).

6. carole palmer, oksana zavalina, and megan mustafoff, "trends in metadata practices: a longitudinal study of collection federation" (paper presented at the seventh acm/ieee-cs joint conference on digital libraries, vancouver, british columbia, canada, june 18–23, 2007), http://hdl.handle.net/2142/8984 (accessed mar. 24, 2009).

7. smith-yoshimura, rlg programs descriptive metadata practices survey results, 7; smith-yoshimura and cellentani, rlg programs descriptive metadata practices survey results, 17.

8. ma, metadata, 12, 22–23.

9. smith-yoshimura, rlg programs descriptive metadata practices survey results, 7; smith-yoshimura and cellentani, rlg programs descriptive metadata practices survey results, 18–21.

10. karen markey et al., census of institutional repositories in the united states: miracle project research findings (washington, d.c.: council on library & information resources, 2007): 3, 46–50, http://www.clir.org/pubs/reports/pub140/pub140.pdf (accessed mar. 24, 2009).

11. smith-yoshimura and cellentani, rlg programs descriptive metadata practices survey results, 24.

12. university of houston libraries institutional repository task force, institutional repositories, spec kit 292 (washington, d.c.: association of research libraries, 2006): 18, 78.

13. ma, metadata, 13, 28.

14. smith-yoshimura, rlg programs descriptive metadata practices survey results, 9, 11; smith-yoshimura and cellentani, rlg programs descriptive metadata practices survey results, 27–29.

15. for the metrics of job responsibilities used to analyze job descriptions and competencies of cataloging and metadata professionals, see jung-ran park, caimei lu, and linda marion, "cataloging professionals in the digital environment: a content analysis of job descriptions," journal of the american society for information science & technology 60 (2009): 844–57; jung-ran park and caimei lu, "metadata professionals: roles and competencies as reflected in job announcements, 2003–2006," cataloging & classification quarterly 47 (2009): 145–60.

16. ma, metadata; smith-yoshimura, rlg programs descriptive metadata practices survey results.

17. jung-ran park and eric childress, "dublin core metadata semantics: an analysis of the perspectives of information professionals," journal of information science 35, no. 6 (2009): 727–39.

18. park, "semantic interoperability."

19. jung-ran park, "metadata quality in digital repositories: a survey of the current state of the art," in "metadata and open access repositories," ed. michael s. babinec and holly mercer, special issue, cataloging & classification quarterly 47, no. 3/4 (2009): 213–38.

20. ma, metadata, 12, 24. the oclc rlg survey found that about 40 percent of the respondents were able to generate some metadata automatically. see smith-yoshimura, rlg programs descriptive metadata practices survey results, 6; smith-yoshimura and cellentani, rlg programs descriptive metadata practices survey results, 35.

21. jung-ran park and caimei lu, "application of semiautomatic metadata generation in libraries: types, tools, and techniques," library & information science research 31, no. 4 (2009): 225–31.

22. park, "semantic interoperability"; sarah l. shreeves et al., "is 'quality' metadata 'shareable' metadata? the implications of local metadata practices for federated collections" (paper presented at the 12th national conference of the association of college and research libraries, apr. 7–10, 2005, minneapolis, minnesota), https://www.ideals.uiuc.edu/handle/2142/145 (accessed mar. 24, 2009); amy s. jackson et al., "dublin core metadata harvested through oai-pmh," journal of library metadata 8, no. 1 (2008): 5–21; lois mai chan and marcia lei zeng, "metadata interoperability and standardization—a study of methodology part i: achieving interoperability at the schema level," d-lib magazine 12, no. 6 (2006), http://www.dlib.org/dlib/june06/chan/06chan.html (accessed mar. 24, 2009); marcia lei zeng and lois mai chan, "metadata interoperability and standardization—a study of methodology part ii: achieving interoperability at the record and repository levels," d-lib magazine 12, no. 6 (2006), http://www.dlib.org/dlib/june06/zeng/06zeng.html (accessed mar. 24, 2009).

23. thomas r. bruce and diane i. hillmann, "the continuum of metadata quality: defining, expressing, exploiting," in metadata in practice, ed. hillmann and westbrooks, 238–56; heery, "metadata futures"; park, "metadata quality in digital repositories."

24. jung-ran park, ed., "metadata best practices: current issues and future trends," special issue, journal of library metadata 9, no. 3/4 (2009).

25. see park, "semantic interoperability"; park and childress, "dublin core metadata semantics."

26. park and childress, "dublin core metadata semantics"; park and lu, "application of semiautomatic metadata generation in libraries."

editorial

marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

welcome to 2009! it has been unseasonably cold in edmonton, with daytime "highs"—i use the term loosely—averaging around -25°c (that's -13°f, for those of you ital readers living in the states) for much of the last three weeks. factor in wind chill (a given on the canadian prairies), and you can easily subtract another 10°c. as a result, we've had more than a few days and nights where the adjusted temperature has been much closer to -40°, which is the same in either celsius or fahrenheit. while my boss and chief librarian is fond of saying that "real canadians don't even button their shirts until it gets to minus forty," i've yet to observe such a feat of derring-do by anyone at much less than twenty below. even your editor's two labrador retrievers—who love cooler weather—are reluctant to go out in such cold, with the result that both humans and pets have all been coping with bouts of cabin fever since before christmas.

■ so, when is it "too cold" for a server room?

why, you may reasonably ask, am i belaboring ital readers with the details of our weather? over the weekend we experienced near-simultaneous failures of both cooling systems in our primary server room (sr1), which meant that nearly all of our library it services, including our opac (which we host for a consortium of twenty area libraries), a separate opac for edmonton public library, our website, and access to licensed e-resources, e-mail, files, and print servers had to be shut down. temperature readings in the room soared from an average of 20–22°c (68–71.5°f) to as much as 37°c (98.6°f) before settling out at around 30°c (86°f). we spent much of the weekend and beginning of this week relocating servers to all manner of places while the cooling system gets fixed. i imagine that next we may move one into each staff person's under-heated office, where they'll be able to perform double duty as high-tech foot warmers! all of this happened, of course, while the temperature outside the building hovered between -20° and -25°c.

this is not the first time we've experienced a failure of our cooling systems during extremely cold weather. last winter we suffered a series of problems with both the systems in sr1 and in our secondary room a few feet away. the issues we had then were not the same as those we're living through now, but they occurred, as now, at the coldest time of the year.
this seeming dichotomy of an overheated server environment in the depths of winter is not a matter of accident or coincidence; indeed, while it may seem counterintuitive, the fact is that many, if not all, of our cooling woes can be traced to the cold outside. the simple explanation is that extreme cold weather stresses and breaks things, including hvac systems. as we've tried to analyze this incident, it appears likely that our troubles began when the older of our two systems in sr1 developed a coolant leak at some point after its last preventive maintenance servicing in august. fall was mild here, and we didn't see the onset of really severe cold weather until early to mid-december. since the older system is mainly intended for failover of the newer one, and since both systems last received routine service recently, it is possible that the leak could have developed at any time since, although my supposition is that it may itself be a result of the cold. in any case, all seemed well because the newer cooling system in sr1 was adequate to mask the failure of the older unit, until it suffered a controller board failure that took it offline last weekend.

but, with the failure of the new system on saturday, all it services provided from this room had to be brought down. after a night spent trying to cool the room with fans and a portable cooling unit, we succeeded in bringing the two opacs and other core services back online by sunday, but the coolant leak in the old system was not repaired until midday monday. today is friday, and we've limped along all week on about 60 percent of the cooling normally required in sr1. we hope to have the parts to repair the newer cooling system early next week (fingers crossed!).

some interesting lessons have emerged from this incident, and while probably not many of you regularly deal with -30°c winters, i think them worth sharing in the hope that they are more generally applicable than our winter extremes are:

1. document your servers and the services that reside on them. we spent entirely too much time in the early hours of this event trying to relate servers and services. we in information technology (it) may think of shutting down or powering up servers "fred," "wilma," "betty," and "barney," but, in a crisis, what we generally should be thinking of is whether or not we can shut down e-mail, file-and-print services, or the integrated library system (ils) (and, if the latter, whether we shut down just the underlying database server or also the related staff and public services). perhaps your servers have more obvious names than ours, in which case, count yourself fortunate. but ours are not so intuitively named—there is a perfectly good reason for this, by the way—and with distributed applications where the database may reside here, the application there, and the web front end yet somewhere else, i'd be surprised if your situation isn't as complex as ours. and bear in mind that documentation of dependencies goes two ways: not only do you want to know that "barney" is hosting the ils's oracle database, but you also want to know all of the servers that should be brought up for you to offer ils-related services. (a sketch of such a service map appears after this list.)

2. prioritize your services.
if your cooling system (or other critical server-room utility) were suddenly only operating at 50 percent of your normal required capacity, how would you quickly decide which services to shut down and which to leave up? i wrote in this space recently that we've been thinking about prioritized services in the context of disaster recovery and business continuity, but this week's incident tells me that we're not really there yet. optimally, i think that any senior member of my on-call staff should be empowered in a given critical situation to bring down services on the basis of a predefined set of service priorities.

3. virtualize, virtualize, virtualize. if we are at all typical of large libraries in the association of research libraries (and i think we are), then it will come as no surprise that we seem to add new services with alarming frequency. i suspect that, as with most places, we tend to try and keep things simple at the server end by hosting new services on separate, dedicated servers. the resulting proliferation of new servers has led to ever-greater strains on power, cooling, and network infrastructures in a facility that was significantly renovated less than two years ago. and i don't see any near-term likelihood that this will change. we are, consequently, in the very early days of investigating virtualization technology as a means of reducing the number of physical boxes and making much better use of the resources—especially processor and ram—available to current-generation hardware. i'm hoping that someone among our readership is farther along this path than we and will consider submitting to ital a "how we done it" on virtualization in the library server room very soon!

4. sometimes low-tech solutions work . . . no one here has failed to observe the irony of an overheated server room when the temperature just steps away is 30° below. our first thought was how simple and elegant a solution it would be to install ducting, an intake fan, and a damper to the outside of the building. then, the next time our cooling failed in the depths of winter, voila!, we could solve the problem with a mere turn of the damper control.

5. . . . and sometimes they don't. not quite, it seems. when asked, our university facilities experts told us that an even greater irony than the one we currently have would be the requirement for can$100,000 in equipment to heat that -30°c outside air to around freezing so that we wouldn't freeze pipes and other indoor essentials if we were to adopt the "low-tech" approach and rely on mother nature. oh, well . . .
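lessons 1 and 2 lend themselves to a small, machine-readable inventory. the sketch below—hostnames echo the editorial's examples, and the entries are illustrative rather than a real inventory—maps each service to the hosts it depends on and assigns a shutdown priority, so an on-call staffer can answer both "what runs on barney?" and "what must be up for the ils?" without archaeology.

```python
# a minimal service map covering lessons 1 and 2: each service lists
# the hosts it needs and a priority (higher number = shed first in a
# cooling crisis). names follow the editorial's examples; the entries
# themselves are invented for illustration.
SERVICES = {
    "ils-database":   {"hosts": ["barney"], "priority": 1},  # keep up longest
    "ils-staff-apps": {"hosts": ["barney", "betty"], "priority": 2},
    "opac":           {"hosts": ["barney", "fred"], "priority": 2},
    "e-mail":         {"hosts": ["wilma"], "priority": 3},
    "file-and-print": {"hosts": ["wilma"], "priority": 4},   # first to go
}

def shutdown_order():
    """services in the order they should be shed under reduced cooling."""
    return sorted(SERVICES, key=lambda name: SERVICES[name]["priority"],
                  reverse=True)

def hosts_for(service):
    """the boxes that must be powered up before this service can run."""
    return SERVICES[service]["hosts"]

def services_on(host):
    """what goes dark if this box is shut down (lesson 1, both ways)."""
    return [s for s, info in SERVICES.items() if host in info["hosts"]]

print(shutdown_order())     # file-and-print and e-mail go first
print(hosts_for("opac"))    # both barney and fred must be up
print(services_on("barney"))
```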
■ in memoriam

most of the snail mail i receive as editor consists of advertisements and press releases from various firms providing it and other services to libraries. but a few months ago a thin, hand-addressed envelope, postmarked pittsburgh with no return address, landed on my desk. inside were two slips of paper clipped from a recent issue of ital and taped together. on one was my name and address; the other was a mailing label for jean a. guasco of pittsburgh, an ala life member and ital subscriber. beside her name, in red felt-tip pen, someone had written simply "deceased."

i wondered about this for some time. who was ms. guasco? where had she worked, and when? had she published or otherwise been active professionally? if she was a life member of ala, surely it would be easy to find out more. it turns out that such is not the case, the wonders of the internet notwithstanding. my obvious first stop, google, yielded little other than a brief notice of her death in a pittsburgh-area newspaper and an entry from a digitized september 1967 issue of special libraries that identified her committee assignment in the special libraries association and the fact that she was at the time the chief librarian at mcgraw-hill, then located in new york. as a result of checking worldcat, where i found a listing for her master's thesis, i learned that she graduated from the now-closed school of library service at columbia university in 1953. if she published further, there was no mention of it on google. my subsequent searches under her name in the standard online lis indexes drew blanks.

from there, the trail got even colder. mcgraw-hill long ago forsook new york for the wilds of ohio, and it seems that we as a profession have not been very good at retaining for posterity our directories of those in the field. a friend managed to find listings in both the 1982–83 and 1984–85 volumes of who's who in special libraries, but all these did was confirm what i already knew: ms. guasco was an ala life member, who by then lived in pittsburgh. i'm guessing that she was then retired, since her death notice gave her age as eighty-six years. of her professional career before that, i'm sad that i must say i was able to learn no more.

student use of library computers: are desktop computers still relevant in today's libraries?

susan thompson

information technology and libraries | december 2012

susan thompson (sthompsn@csusm.edu) is coordinator of library systems, california state university san marcos.

abstract

academic libraries have traditionally provided computers for students to access their collections and, more recently, facilitate all aspects of studying. recent changes in technology, particularly the increased presence of mobile devices, call into question how libraries can best provide technology support and how it might affect the use of other library services. a two-year study conducted at california state university san marcos library analyzed student use of computers in the library, both the library's own desktop computers and laptops owned by students. the study found that, despite the increased ownership of mobile technology by students, they still clearly preferred to use desktop computers in the library. it also showed that students who used computers in the library were more likely to use other library services and physical collections.

introduction

for more than thirty years, it has been standard practice in libraries to provide some type of computer facility to assist students in their research. originally, the focus was on providing access to library resources, first the online catalog and then journal databases. for the past decade or so, this has expanded to general-use computers, often in an information-commons environment, capable of supporting all aspects of student research from original resource discovery to creation of the final paper or other research product. however, times are changing, and the ready access to mobile technology has brought into question whether libraries need to or should continue to provide dedicated desktop computers. do students still use and value access to computers in the library? what impact does student computer use have on the library and its other services? have we reached the point where we should reevaluate how we use computers to support student research?

california state university san marcos (csusm) is a public university with about nine thousand students, primarily undergraduates from the local area.
csusm was established in 1991 and is one of the youngest campuses in the 23-campus california state university system. the library, originally located in space carved out of an administration building, moved into its own dedicated library building in 2004. one of the core principles in planning the new building was the vision of the library as a teaching and learning center. as a result, a great deal of thought went into the design of technology to support this vision. rather than viewing technology's role as just supporting access to library resources, we expanded its role to providing cradle-to-grave support for the entire research process. we also felt that encouraging students to work in the library would encourage use of traditional library materials and the expertise of library staff, since these resources would be readily available.1

rethinking our assumptions about library technology's role in the student research process led us to consider the entire building as a partner in the students' learning process. rather than centralizing all computer support in one information commons, we wanted to provide technology wherever students want to use it. we used two strategies. first, we provided centralized technology using more than two hundred desktop computers, most located in four of our learning spaces: reference, classrooms, the media library, and the computer lab. three of these spaces are configured like information commons, providing full-service research computers grouped around the service desks near each library entrance. in addition, simplified "walk-up" computers are available on every floor. the simplified computers provide limited web services to encourage quick turnaround and no login requirement to ensure ready access to library collections for everyone, including community members. the other major component of our technology plan was the provision of wireless throughout the building, along with extensive power outlets to support mobile computing. more than forty quiet study rooms, along with table "islands" in the stacks, help support the use of laptops for group study. however, only two of these quiet studies, located in the media library, provide desktop computers designed specifically to support group work.

in 2009 and again in 2010, we conducted computer use studies to evaluate the success of the library's technology strategy and determine whether the library's desktop computers were still meeting student needs as envisioned by the building plan. the goal of the study was to obtain a better understanding of how students use the library's computers, including types of applications used, computer preferences, and computer-related study habits. the study addressed several specific research questions. first, librarians were concerned that the expanded capabilities of the desktop computers distracted students from an academic and library research focus. were students using the library's computers appropriately? second, the original technology plan had provided extensive support for mobile technology, but the technology landscape has changed over time. how did the increase in student ownership of mobile devices—now at more than 80 percent—affect the use of the desktop computers?
finally, did providing an application-rich computer environment encourage students to conduct more of their studying in the library, leading them more frequently to use traditional library collections and services? this article will focus on the study results pertaining to the second and third research questions. we found that, in line with our expectations, students using library computer facilities also made extensive use of traditional library services. however, we were surprised to discover that the growing availability of mobile devices had relatively little impact on students' continuing preference for library-provided desktop computers.

literature review

the concept of the information commons was just coming into vogue in the early 2000s, when we were designing our library building, and it strongly influenced our technology design as well as building design. information commons, defined by steiner as the "functional integration of technology and service delivery," have become one of the primary methods by which libraries provide enhanced computing support for students studying in the library.2 one of the changes in libraries motivating the information-commons concept is the desire to support a broad range of learning styles, including the propensity to mix academic and social activities. particularly influential to our design was the concept of the information commons supporting students' projects "from inception to completion" by providing appropriate technologies to facilitate research, collaboration, and consultation.3

providing access to computers appears to contribute to the value of libraries as "place." shill and toner, early in the era of information commons, noted "there are no systematic, empirical studies documenting the impact of enhanced library buildings on student usage of the physical library."4 since then, several evaluations of the information-commons approach seem to show a positive correlation between creation of a commons and higher library usage because students are now able to complete all aspects of their assignments in the library. for example, the university of tennessee and indiana university have shown significant increases in gate counts after they implemented their commons.5

while many studies discuss the value of information commons, very few look at why library computers are preferred over computers in other areas on campus. burke looked at factors influencing students' choice of computing facilities at an australian university.6 given a choice of central computer labs, residence hall computers, and the library's information commons, most students preferred the computers in the library over the other computer locations, with more than half using the library computers more than once a week. they rated the library most highly on its convenience and closeness to resources.

perhaps the most important trend likely to affect libraries' support for student technology needs is the increased use of mobile technology. the 2010 nationwide educause center for applied research (ecar) study, from the same year as the second csusm study, showed that 89 percent of students had laptops.7 other nationwide studies have corroborated this high level of laptop ownership.8 so, does this increased use of laptops and mobile devices affect the use of desktop computers?
the 2010 ecar study reported that desktop ownership (about 50 percent in 2010) had declined by more than 25 percent between 2006 and 2009, a significant period in the lifetime of csusm's new library building. pew's internet & american life project trend data showed desktops as the only gadget category in which ownership is decreasing, from 68 percent in 2006 to 55 percent at the end of 2011.9

some libraries and campuses are beginning to respond to the increase in laptop ownership by changing their support for desktop computers. university of colorado boulder, in an effort to decrease costs and increase the availability of flexible campus spaces, is making a major move away from providing desktop computers.10 while the university found that 97 percent of its students own laptops and other mobile devices, it was concerned that many students still preferred to use desktop computers when on campus. to entice students to bring their laptops to campus, the university is enhancing its support for mobile devices by converting its central computer labs into flexible-use space with plentiful power outlets, flexible furniture, printing solutions, and access to the usual campus software.

nevertheless, it may be premature for all libraries and universities to eliminate their desktop computer support. tom, voss, and scheetz found students want flexibility with a spectrum of technological options.11 certainly, they want wi-fi and power outlets to support their mobile technology. however, students also want conventional campus workstations providing a variety of functions, such as quick print and email computers, long-term workstations with privacy, and workstations at larger tables with multiple monitors that support group work.

while the ubiquity of laptops is an important factor today, other forms of mobile devices may become more important in the future. a 2009 wall street journal article reported that the trend for business travelers is to rely on smartphones rather than laptops.12 for the last three years, educause's horizon reports have made support for non-laptop mobile technologies one of the top trends. the 2009 horizon report mentioned that in countries like japan, "young people equipped with mobiles often see no reason to own personal computers."13 in 2010, horizon reported an interesting pilot project at a community college in which one group of students was issued mobile devices and another group was not.14 members of the group with the mobile devices were found to work on the course more during their spare time. the 2011 horizon report discusses mobiles as capable devices in their own right that are increasingly users' first choice for internet access.15 therefore, rather than trying to determine which technology is most important, libraries may need to support multiple devices.

trends described in the ecar and horizon studies make it clear that students own multiple devices. so how do they use them in the study environment? head's interviews with undergraduate students at ten us campuses found that "students use a less is more approach to manage and control all of the it devices and information systems available to them."16 for example, in the days before final exams, students were selective in their use of technology to focus on coursework yet remain connected with the people in their lives. the question then may not be which technology libraries should support but rather how to support the right technology at the right time.
method

the csusm study used a mixed-method approach, combining surveys with real-time observation to improve the effectiveness of assessment and generate a more holistic understanding of how library users made their technology choices. the study protocol received exempt status from the university human subjects review board. the study was carried out twice over a two-year period to determine whether the time of the semester affected usage. in 2009, the study was administered at the end of the spring term, april 15 to may 3. we expected that students near the end of the term would be preparing for finals and completing assignments, including major projects. the 2010 study was conducted near the beginning of the term, february 4 to february 18. we expected that early-term students would be less engaged in academic assignments, particularly major research projects.

we carried out each study over a two-week period. an attempt was made to check consistency by duplicating each observation time and location across the two studies. each location was surveyed monday through thursday, once in the morning and once in the afternoon, during the heavy-use times of 11 a.m. and 2 p.m. the survey locations included two large computer labs (more than eighty computers each), one located near the library reference desk and one near the academic technology helpdesk. other locations included twenty computers in the media library, a handful of desktop computers in the curriculum area, and laptop users, mostly located on the fourth and fifth floors of the library. the fourth- and fifth-floor observations also included the library's forty quiet study rooms. for the 2010 study, the other large computer lab on campus (108 computers), located outside the library, was also included for comparison purposes.

we used two techniques: a quantitative survey of library computer users and a qualitative observation of software application usage and selected study habits. the survey tried to determine the purpose for which the student was using the computer that day, what their computer preference was, and what other business they might have in the library. it also asked students for their suggestions for changes in the library. the survey was usually completed within the five-minute period that we had estimated and contained no identifying personal information. the survey administrator handed out the one-page paper survey, along with a pencil if desired, to each student using a library workstation or a laptop during each designated observation period. users who refused to take the survey were counted in the total number of students asked to do the survey. however, users who indicated they refused because they had already completed a survey on a previous observation date were marked as "dup" in the 2010 survey and were not counted again. the "dup" statistic proved useful as an independent confirmation of the popularity of the library computers.

the second method involved conducting "over-the-shoulder" observations of students using the library computers. while students were filling out the paper survey, the survey administrator walked behind the users and inconspicuously looked at their computer screens. all users in the area were observed whether or not they had agreed to take the survey. the one exception was users in group-study rooms. the observer did not enter the room and could only note behaviors visible from the door window, such as laptop usage or group studying.
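the sampling design can be made concrete with a short sketch. this is purely an illustration of the schedule described above, not the instrument the authors used; the location labels are shorthand for the spaces named in the text, and the slot count assumes every location was covered at both heavy-use times on every monday-thursday of the 2010 window.

# sketch: enumerate the observation slots implied by the sampling design.
from datetime import date, timedelta

locations = [
    "reference-desk lab",
    "helpdesk lab",
    "media library",
    "curriculum area",
    "4th/5th floor laptops and study rooms",
]
times = ["11:00", "14:00"]  # the two heavy-use times

def observation_slots(start, weeks=2):
    # yield (date, time, location) for monday-thursday slots in the window
    day, end = start, start + timedelta(weeks=weeks)
    while day < end:
        if day.weekday() < 4:  # monday=0 ... thursday=3
            for t in times:
                for loc in locations:
                    yield day, t, loc
        day += timedelta(days=1)

# the 2010 study ran february 4 to february 18
slots = list(observation_slots(date(2010, 2, 4)))
print(len(slots))  # 8 mon-thu days x 2 times x 5 locations = 80 slots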
based on brief (one minute or less) observations, administrators noted on a form the type of software application the student was using at that point in time. the observer also noted other, non-desktop technical devices in use (specifically laptops, headphones, and mobile devices such as smartphones) and study behaviors, such as group work (defined as two or more people working together). the student was not identified on the form. we felt that these observations could validate information provided by the users on the survey.

results

we completed 1,452 observations in 2009 and 2,501 observations in 2010. the gate counts for the primary month each study took place (70,607 for april 2009 and 59,668 for february 2010) show the library was used more heavily during the final exam period. the larger number of results the second year was due to more careful observation of laptop and study-group computer users on the fourth and fifth floors and the addition of observations in a nonlibrary computer lab, rather than an increase in students available to be observed. the observations looked at application usage, study habits, and devices present, but this article will discuss only the observations pertaining to devices.

in 2009, 17 percent of students were observed using laptops (see table 1). this number almost doubled in 2010 to 33 percent. most laptop users were observed on the fourth and fifth floors, where furniture, convenient electrical outlets, and quiet study rooms provided the best support for this technology. very few desktop computers were available there, so students desiring to study on these floors have to bring their own laptops. almost 20 percent of students in 2010 were observed with other mobile technology, such as cell phones or ipods, and 16 percent were wearing headphones, which indicated there was other, often not visible, mobile technology in use.

table 1. mobile technology observed
laptop in use: 17% (2009), 33% (2010)
headphones in use: 16% (2010)
mobile device in use (cell phone, ipod): 18% (2010)

in 2009, 1,141 students completed the computer-use survey. however, we were unable to accurately determine the return rate that year. the nature of the study, which surveyed the same locations multiple times, meant that many of the students were approached more than once to complete the survey. thus the majority of the refusals to take the survey were because the subject had already completed one previously. the 2010 study accounted for this phenomenon by counting refusals and duplications separately. in 2010, 1,123 students completed the survey out of 1,423 unique asks, resulting in a 79 percent return rate. the 619 duplicates counted represented about half of the 2010 surveys completed and could be considered another indicator of frequent use of the library's computers. the 2010 results included an additional 290 surveys completed by students using the other large computer lab on campus outside the library.

table 2. frequency of computer use
daily when on campus: 49% (2009), 42% (2010)
several times a week: 33% (2009), 30% (2010)
several times a month: 11% (2009), 15% (2010)
rarely use computers in the library: 9% (2009), 10% (2010)

in both years of the study, 78 percent of students said they preferred to use computers in the library over other computer lab locations on campus. students also indicated they were frequent users (see table 2).
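the return-rate arithmetic is easy to reproduce. a minimal sketch using only the 2010 counts reported above (1,123 completed surveys, 1,423 unique asks, 619 duplicate approaches):

# sketch: 2010 survey return rate, with duplicates excluded as described
completed = 1123      # surveys completed in 2010
unique_asks = 1423    # students asked, after removing "dup" repeats
duplicates = 619      # repeat approaches, not counted as refusals

print(f"{completed / unique_asks:.0%}")   # 79% return rate
print(f"{duplicates / completed:.0%}")    # 55%, i.e. "about half" of completed surveys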
in 2009, 82 percent of students used the library computers frequently: 49 percent daily and 33 percent several times a week. the frequency of use in the 2010 early-term study dropped about 10 percent, to 72 percent, but with the same proportion of daily vs. weekly users. convenience and quiet were the top reasons, given by more than half of students, for preferring the library computers, followed closely by atmosphere. about a quarter of students preferred library computers because of their close access to other library services.

table 3. preferred computer to use in the library
sit-down pc: 84% (2009), 71% (2010)
walk-up pc: 6% (2009), 5% (2010)
own laptop: 23% (2009), 28% (2010)
laptop checked out in library: 2% (2009), 2% (2010)

the types of computer that students preferred to use in the library were desktop computers followed by laptops owned by the students (see table 3). it is notable that the preference for desktop computers changed significantly from 2009 to 2010: 84 percent of students preferred desktop computers in 2009 vs. 72 percent in 2010, a 12 percent decrease. not surprisingly, few students preferred the simplified walk-up computers used for quick lookups. however, we did not expect such little interest in checking out laptops, with only 2 percent preferring that option.

the 2010 study added a new question to the survey to better understand the types of technology devices owned by students (see table 4). in 2010, 84 percent of students owned a laptop (combining the netbook and laptop statistics). almost 40 percent of students owned a desktop; therefore, many students owned more than one type of computer. of the 85 percent of students who indicated they had a cell phone, about one-third indicated they owned smartphones. the majority of students owned music players. the one technology students were not interested in was e-book readers, with less than 2 percent indicating ownership.

table 4. technology devices owned by students (2010)
laptop: 77%
ipod/mp3 music player: 59%
regular cell phone: 52%
desktop computer: 40%
smart phone: 31%
netbook: 7%
other handheld devices: 1%
kindle/e-book reader: 1%

to understand how the use of technology might affect use of the library in general, the survey asked students what other library services they used on the same day they were using library computers. table 5 shows survey responses were very similar between the late-term 2009 study and the early-term 2010 study. by far the most popular use of the library, by more than three-quarters of the students, was for study. around 25 percent of the students planned to meet with others, and 20 percent planned to use media services. around 15 percent of students planned to check out print books, 15 percent planned to use journals, and 10 percent planned to ask for help. the biggest difference for students early in the term was an increased interest (5 percent more) in using the library for study. the late-term students were 9 percent more likely to meet with others. by contrast, users in the nonlibrary computer lab were much less likely to make use of other library services. only 24 percent of nonlibrary users planned to study in the library, and 8 percent planned to meet with others in the library that day. use of all other library services was less than 5 percent by the nonlibrary computer users.
table 5. other library services used
study: 76% (2009), 81% (2010), 23% (nonlibrary lab)
meet with others: 35% (2009), 26% (2010), 7% (nonlibrary lab)
use media: 20% (2009), 22% (2010), 4% (nonlibrary lab)
check out a book: 16% (2009), 13% (2010), 3% (nonlibrary lab)
look for journals/newspapers: 15% (2009), 13% (2010), 3% (nonlibrary lab)
ask questions/get help: 10% (2009), 10% (2010), 2% (nonlibrary lab)
use a reserve book: 8% (2009), 9% (2010), 2% (nonlibrary lab)
create a video/web page: 6% (2009), 3% (2010), 0% (nonlibrary lab)
pick up ill/circuit: 3% (2009), 3% (2010), 0% (nonlibrary lab)
other: 0% (2009), 4% (2010), 1% (nonlibrary lab)

in 2010, we also asked users what changes they would like in the library, and 58 percent of respondents provided suggestions. the question was not limited to technology, but by far the biggest request was to provide more computers (requested by 30 percent of all respondents). analysis of the other survey questions regarding computer ownership and preferences revealed who was requesting more traditional desktops in the library. surprisingly, most were laptop users: 90 percent of laptop owners wanted more computers, and 88 percent of the respondents making this request were located on the fourth and fifth floors, which were used almost exclusively by laptop users. the next most common responses were remarks indicating satisfaction with current library services: 19 percent of students said they were satisfied with current library services, and 9 percent praised the library and its services. the commonality of requests dropped quickly at that point, with the fourth most common request being for more quiet (2 percent).

discussion

the results show that students consistently prefer to use computers in the library, with 78 percent declaring a preference for the library over other computer locations on campus in both years of the study. this preference is confirmed by statistics reported by csusm's campus it department, which tracks computer login data. these data consistently show the library computer labs are used more than nonlibrary computer labs, with the computers near the library reference desk the most popular, followed closely by the library's second large computer lab, which is located next to the technology help desk. for instance, during the 2010 study period, the reference desk lab (80 computers) had 6,247 logins compared to 3,218 logins in the largest nonlibrary lab (108 computers), double the usage. the data also show that use of the computers near the reference desk increased by 15 percent between 2007 and 2010.

supporting the popularity of using computers in the library is the fact that most students are repeat customers. table 2 shows 82 percent of the 2009 late-term respondents used the library computers daily or several times a week, with almost half using our computers daily. in contrast, 72 percent of the 2010 early-term students used the library computers daily or several times a week. the 10 percent drop in frequency of visits to the library for computing applied to both laptop and desktop users and seems to be largely due to students not yet having received enough work from classes to justify more frequent use.

the kind of computer that users preferred changed somewhat over the course of the study. the preference for desktop computers dropped from 84 percent of students in 2009 to 72 percent in 2010 (see table 3). one reason for this 12 percent drop may be related to how the survey was administered. the 2010 study did a more thorough job of surveying the fourth and fifth library floors, where most laptop users are. as a result, the laptop floors represented 29 percent of the responses in 2010 vs. only 13 percent in 2009.
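analyses like the "who asked for more computers" breakdown above are simple cross-tabulations of the survey responses. the sketch below shows the general shape of such a query, assuming a hypothetical dataframe with one row per respondent; the column names (owns_laptop, floor, requested_more_computers) are illustrative, not the study's actual codebook, and the toy rows exist only so the sketch runs.

# sketch: cross-tabulating suggestion-makers by ownership and floor
import pandas as pd

# illustrative stand-in for the real survey data
responses = pd.DataFrame({
    "owns_laptop": [True, True, False, True],
    "floor": [5, 4, 1, 5],
    "requested_more_computers": [True, True, False, True],
})

requesters = responses[responses["requested_more_computers"]]

# share of requesters located on the laptop-heavy 4th/5th floors
upper_floor_share = requesters["floor"].isin([4, 5]).mean()

# share of laptop owners who made the request
laptop_owners = responses[responses["owns_laptop"]]
owner_request_rate = laptop_owners["requested_more_computers"].mean()

print(f"{upper_floor_share:.0%} of requesters were on floors 4-5")
print(f"{owner_request_rate:.0%} of laptop owners requested more computers")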
these numbers are also reflected in the proportion of laptops observed each year: 33 percent in 2010 vs. 17 percent in 2009 (see table 1). the drop in desktop computer preference is interesting because it was not matched by an equally large increase in laptop preference, which increased by only 5 percent. the other reason for the decrease in desktop preference is likely the larger change seen nationwide in student laptop ownership. for instance, the pew study of gadget ownership showed a 13 percent drop in desktop ownership over a five-year period, 2006–2011, while at the same time laptop ownership almost doubled, from 30 percent to 56 percent.17 however, it is interesting to note that, according to the pew study, in 2011 the percentage of adults who owned each type of device was nearly equal: 55 percent for desktops and 56 percent for laptops.

the 2010 survey tried to better understand students' preferences by identifying all the kinds of technology they had available to them. we found that 77 percent of csusm students owned laptops and an additional 7 percent owned the netbook form of laptop (see table 4). the combined 84 percent laptop ownership is comparable with the 2010 ecar study's finding of 89 percent student laptop ownership nationwide.18 this high level of laptop ownership may explain why the users who preferred laptop computers almost all preferred to use their own rather than laptops checked out from the library.

despite the high laptop ownership and the decrease in desktop preference, it is significant that the majority of csusm students still prefer to use desktop computers in the library. aside from the 72 percent of respondents who specifically stated a preference for desktop computers, the top suggestion for library improvement was to add more desktop computers, requested by 38 percent of respondents. further analysis of the survey data revealed that it was the laptop owners and the fourth- and fifth-floor laptop users who were the primary requestors of more desktop computers. to try to better understand this seemingly contradictory behavior, we did some further investigation. anecdotal conversations with users during the survey indicated that convenience and reliability are two factors affecting students' decisions to use desktop computers. the desktop computers' speed and reliable internet connections were regarded as particularly important when uploading a final project to a professor, with some students stating they came to the library specifically to upload an assignment.

in may 2012, the csusm library held a focus group that provided additional insight into the question of desktops vs. laptops. all eight of the focus group participants owned laptops, yet all eight indicated that they preferred to use desktop computers in the library. when asked why, participants cited the reliability and speed of the desktop computers and the convenience of not having to remember to bring their laptop to school and "lug" it around. another element of the convenience factor may be that our campus does not require students to own a laptop and bring it to class, so they may have less motivation to travel with their laptops. supporting the idea that students perceive different benefits for each type of computer, six of the eight participants owned a desktop computer in addition to a laptop.
the 2010 study also showed that students see value in owning both a desktop and a laptop computer, since the 40 percent ownership of desktop computers overlaps the 84 percent ownership of laptops (see table 4); because those two figures sum to 124 percent, at least 24 percent of students must own both.

table 6. reasons students prefer using library computer areas (shares of students citing "library services are close" and "library staff are close," 2009 vs. 2010)

for almost half of the students surveyed, one of the reasons for their preference for using computers in the library was the ready access to library services or staff (see table 6). even more significant, when specifically asked what else they planned to do in the library that day besides using the computer (see table 5), more than 80 percent of the students indicated that they intended to use the library for purposes other than computing. the top two uses for the library were studying (76 percent in 2009, 81 percent in 2010) and meeting with others (35/26 percent), indicating the importance of the library as place. the most popular library service was the media library (20/22 percent), followed by collections, with 16/13 percent planning to check out a book and 15/13 percent planning to look for journals and newspapers. it is interesting that the level of use of these library services was similar whether early or late in the term. the biggest difference was that early-term students were less likely to be working with a group but were slightly more likely to be engaged in general studying. even the less-used services, such as asking a question (10 percent) or using a reserve book (8 percent), exhibited an appropriate amount of usage if one looks at the actual numbers. for example, 8 percent of the 1,123 2010 survey respondents represents 90 students who used reserve materials sometime during the eight hours of the two-week survey period.

to put the use of the library by computer users into perspective, we also asked students using the nonlibrary computer lab if they planned to use the library sometime that same day. only 24 percent of the nonlibrary computer users planned to study in the library that day vs. 81 percent of the library computer users; only 4 percent planned to use media vs. 24 percent; and 2 percent planned to check out a book vs. 13 percent. the implication is clear: students using computers in the library are much more likely to use the library's other services.

we usually think of providing desktop computers as a service for students, and so it is. however, the study results show that providing computers also benefits the library itself. it reinforces the library's role as place by providing a complete study environment for students and encouraging all study behaviors, including communication and working with others. the popularity of the library computers provides us with a "captive audience" of repeat customers.

conclusion

the csusm library technology that was planned in 2004 is still meeting students' needs. although most of our students own laptops, most still prefer to use desktop computers in the library. in fact, providing a full-service computer environment to support the entire research process benefits the entire library. students who use computers in the library appear to conduct more of their studying in the library and thus make more use of traditional library collections and services. going forward, several questions arise for future studies. csusm is a commuter school.
students often treat their work space in the library as their office for the day, which increases the importance of a reliable and comfortable computer arrangement. one question that could be asked is whether the results would be different for colleges where most students live on campus or nearby. if a university requires that all students own laptops and expects them to bring them to class, how does that affect the relevance of desktop computers in the library? the 2010 study was completed just a few weeks before the first ipad was introduced. since students have identified convenience and weight as reasons for not carrying their laptops, are tablets and ultralight computers, like the macbook air, more likely to be carried on campus by students and used more frequently for their research? how important is it to have a supportive mobile infrastructure with features such as high-speed wi-fi, the ability to use campus printers, and access to campus applications? are students using smartphones and other mobile devices for study purposes? in fact, are we focusing too much on laptops, and are other mobile devices starting to take over that role?

this study's results make it clear that we can't just look at data such as ecar's, which show high laptop ownership, and assume that students don't want or won't use library computers. as the types of mobile devices continue to grow and evolve, libraries should continue to develop ways to facilitate their research role. however, the bottom line may not be that one technology will replace another but rather that students will have a mix of devices and will choose the device best suited to a particular purpose. therefore libraries, rather than trying to pick which device to support, may need to develop a broad-based strategy to support them all.

references

1. susan m. thompson and gabriella sonntag, "chapter 4: building for learning: synergy of space, technology and collaboration," in learning commons: evolution and collaborative essentials (oxford: chandos publishing, 2008): 117–99.
2. heidi m. steiner and robert p. holley, "the past, present, and possibilities of commons in the academic library," reference librarian 50, no. 4 (2009): 309–32.
3. michael j. whitchurch and c. jeffery belliston, "information commons at brigham young university: past, present, and future," reference services review 34, no. 2 (2006): 261–78.
4. harold shill and shawn tonner, "creating a better place: physical improvements in academic libraries, 1995–2002," college & research libraries 64 (2003): 435.
5. barbara i. dewey, "social, intellectual, and cultural spaces: creating compelling library environments for the digital age," journal of library administration 48, no. 1 (2008): 85–94; diane dallis and carolyn walters, "reference services in the commons environment," reference services review 34, no. 2 (2006): 248–60.
6. liz burke et al., "where and why students choose to use computer facilities: a collaborative study at an australian and united kingdom university," australian academic & research libraries 39, no. 3 (september 2008): 181–97.
7. shannon d. smith and judith borreson caruso, the ecar study of undergraduate students and information technology, 2010 (boulder, co: educause center for applied research, october 2010), http://net.educause.edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed march 21, 2012).
8. pew internet & american life project, "adult gadget ownership over time (2006–2012)," http://www.pewinternet.org/static-pages/trend-data-(adults)/device-ownership.aspx (accessed june 14, 2012); the horizon report: 2009 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2010 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2011 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012).
9. pew internet, "adult gadget ownership."
10. deborah keyek-franssen et al., computer labs study, university of colorado boulder office of information technology, october 7, 2011, http://oit.colorado.edu/sites/default/files/labsstudypenultimate-10-07-11.pdf (accessed june 15, 2012).
11. j. s. c. tom, k. voss, and c. scheetz, "the space is the message: first assessment of a learning studio," educause quarterly 31, no. 2 (2008), http://www.educause.edu/ero/article/space-message-first-assessment-learning-studio (accessed june 25, 2012).
12. nick wingfield, "time to leave the laptop behind," wall street journal, february 23, 2009, http://online.wsj.com/article/sb122477763884262815.html (accessed june 15, 2012).
13. the horizon report: 2009 edition.
14. the horizon report: 2010 edition.
15. the horizon report: 2011 edition.
16. alison j. head and michael b. eisenberg, "balancing act: how college students manage technology while in the library during crunch time," project information literacy research report, information school, university of washington, october 12, 2011, http://projectinfolit.org/pdfs/pil_fall2011_techstudy_fullreport1.1.pdf (accessed june 14, 2012).
17. pew internet, "adult gadget ownership."
18. smith and caruso, ecar study.

public libraries leading the way

gathering strength to combat access inequality: how a small rural public library supported virtual access for public school students, staff, and their families

julie lane

information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.15161

julie lane (jlane@peclibrary.org) is technology resource centre coordinator and educational resource consultant, county of prince edward public library and archives. © 2022.

prince edward county (pec) is located east of toronto and covers approximately 1,050 square kilometers.
pec is part of the hastings prince edward district school board (hpedsb) and has a total of six public schools, one catholic school, and one private school. the other county serviced by our school board is hastings county. the county of prince edward public library (cpepl) system of six branches services just under 25,000 residents and countless seasonal visitors during the tourism season. our public school board services approximately 15,000 students across 7,220 square kilometers, with 39 in-person schools and a k-10 virtual school across the two counties.

starting off a technology column with a bunch of statistics is not exactly how i figured i would write this. however, context is key when discussing equity and access, and in this piece i intend to highlight how both are made significantly easier to achieve for community stakeholders with the presence of technology and education.

when the stay-at-home orders were announced in march 2020 due to the covid-19 pandemic, we knew that we would not be able to hold our scheduled and planned public library programs. we turned to live streaming story times, maker programs, and author visits, all using what equipment we had on hand: tablets, laptops, and the internet. once it became clear that students in the public schools would not return to in-person learning within any short amount of time, all school boards in ontario ensured that enough chromebooks were purchased so that every student had their own dedicated device, with the assumption that providing a device meant all students could participate in remote learning. teachers rushed to transition their teaching plans to an online format; school administrators scrambled to schedule safe device pick-ups for students; and parents were not only juggling professional responsibilities and parenthood, but now teaching and tech support as well. although school boards provided tools to meet the "classroom" requirements, they could not ensure that every single student had access to a high-speed internet connection, nor could they offer school library access remotely. this is where the cpepl was able to offer support.

the global shutdown had a significant impact on the relationship that the cpepl had with the schools in our county. a large focus of mine was to rebuild those working relationships to support students, staff, and families, and ultimately demonstrate in actionable ways how the local public library system was there for them. one immediate way i thought we could demonstrate support was through lending our wi-fi hotspots. hotspot lending programs through public libraries have gained popularity over the last few years. although our program had been in place for nearly five years, i am always surprised at the number of people who do not realize it is an available resource. with that in mind, i persistently reached out to the school administrators in our area and set up meetings to discuss how our borrow the internet program could benefit those working remotely without reliable internet. wait lists for our nine available hotspot devices drastically increased, but our patron community was incredibly supportive of our students and would frequently request that their loan, which is at maximum seven days in length, be passed to a student. though connecting families with internet hotspots was helpful for the required online learning, we could not fill the gap completely.
if we had an unlimited communications budget, the situation would have been easily remedied, but, as we all know in the library world, budgets can be very tight. this fact pushed us to find creative ways to bring as many resources as possible to the students, staff, and families in our community. to broaden the reach to individual schools (and stay persistent with that outreach), i focused on ensuring that school communities knew not only what physical resources the library had, but also what electronic resources were available. these conversations and emails with school administrators led me to get in contact with the curriculum coordinator at the board office.

this connection was a complete game changer. instead of us, as a public entity outside of the school community, contacting individual schools and trying to build relationships with teachers, librarians, and administrators, we had the person who oversaw all of the school librarians, library technicians, and curriculum development for the k-8 grades on our side. the coordinator was on board to help us make the desired connections with the schools in a number of ways. she put us in contact with the curriculum coordinator for the secondary grades (9-12), and our program and service list was sent from the board office to every teacher, principal, school librarian, and library technician in prince edward county. we were then able to set up a meeting with the coordinator of assistive technologies for the board, which set us on a track to completely revamp how we marketed and allocated our resources to schools.

it became clear in our first conversation that we needed to get students connected with their public libraries as quickly and efficiently as possible. with students split between in-person learning, virtual learning, or a combination of the two, and with still minimal to no access to school library borrowing, the online resources of the public library system seemed like the perfect solution. not only would connecting students, staff, and their families with their local public library be a way to get everyone reading, but we would also be fulfilling the opportunity to ensure that everyone had genuine and equitable access. what the school board had observed was that the required shift to remote learning made the inequality of literature access glaringly obvious. students who relied on their school library for reading were not getting that opportunity, and students who had individual education plans were jumping through hoops to get digital copies of material. so though everyone had a school-supplied chromebook, not everyone had the same access. this is where public library subscriptions to hoopla and libby came to the rescue, providing current and popular literature in a variety of electronic formats for students to immediately access for both course reading and leisure enjoyment.

connecting with like-minded, growth- and education-oriented people is incredibly empowering. the curriculum coordinators at the board office were so enthusiastic about connecting students, staff, and families in our school board with their public library that it made the next parts of the process not only successful, but fun as well! the curriculum coordinators and i created a presentation that we brought first to school administrators in prince edward county. having public library advocacy come from the school board was incredibly influential and a big step toward issuing library cards to students.
once we had buy-in from the school administrators, we circulated registration forms for families to fill out to get everyone in their household public library access. we found that the easiest way to do this was using google forms. it was simple for parents to fill out and easy for library staff to glean the required information for card registration (a sketch of this kind of workflow follows below). since the library was also working with the virtual school, we needed to be able to issue library cards even if some students were not in our catchment area. it was common for virtual classes to consist of students from the smallest village in pec all the way up to the northernmost part of hastings county, a full three hours' drive away. cpepl was able to accommodate this need. pec is a tourist destination and frequently issues cards for visitors staying in the area for an extended period of time under the rule that if you "work, live, or play" in pec, you are eligible for a public library card.

once library cards were set up or renewed for all families who requested them through the google form, i got to work teaching students and staff how to access library resources. after communicating with the curriculum staff and public school administrators, it was decided that creating an information presentation on getting started with hoopla was the best course of action. hoopla is an incredibly intuitive application in regard to its format possibilities (ebooks and audiobooks) as well as the adjustable features within each format. the available settings and adjustment options make the reading experience as comfortable and accessible as possible for users. also, since there is no wait time to borrow materials, entire classes learning remotely could all check out the same title and read together. the material presented to students was easy to understand and interactive. the session provided ample time for students to follow along and test each feature in the hoopla app with their own individual book selections.

the best part? this presentation was just the starting point. while we were only able to schedule and virtually deliver this presentation at two in-person schools, the other five schools in pec and a number of primary classes in the virtual school still participated in the google form for library card registration. teachers started asking what else the public library had to offer to enhance curriculum delivery with additional resources. many community teachers were reminded of the public library's services and resources (beyond just hoopla) and reached out for class visits or access to materials. other schools outside of our prince edward county catchment reached out and connected with their local public libraries, or vice versa.

we are still working to develop ways to meet the needs of students, staff, and their families through the public library. some schools in the northern area of the region have students coming from multiple different public library catchment areas, and most of these libraries do not have the same resources as others, especially in the case of smaller systems. this posed an issue of equitable access for students: why should some students in the class have access to library online resources, and some not, because they come from different or smaller communities?
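returning to the google form registration described above: for libraries replicating this workflow, the processing step can be as simple as reading the form's exported spreadsheet. the sketch below is a hypothetical illustration; the column names, file name, and catchment check are our assumptions, and a real workflow would feed the normalized records into whatever ils registration process the library uses.

# sketch: turn exported google form responses into card-registration
# records. column names and the catchment rule are hypothetical.
import csv

def load_registrations(path):
    # read the exported form responses and normalize each household entry
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield {
                "name": row["name"].strip(),
                "email": row["email"].strip().lower(),
                "address": row["address"].strip(),
                # out-of-catchment students were still eligible under the
                # "work, live, or play" rule described above
                "in_catchment": "prince edward" in row["address"].lower(),
            }

for patron in load_registrations("form_responses.csv"):
    print(patron["name"], "->", "local" if patron["in_catchment"] else "extended")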
we were able to mitigate this issue with the virtual school, but for students attending in-person learning, we could not give library cards to every student in the school board. thankfully, another public library system in our area stepped up to offer virtual library access to any student or teacher in hastings county (so everywhere except prince edward county). this recognition of the importance of equitable access not only enabled students to regain access to a public library system, but it also ensured that all students could access books in the way that best suited them.

when i ask a class if listening to an audiobook counts as reading, it amazes me that the majority of the class say "no." or if i ask students if they have ever read an ebook, some say it was not a "real" book. these comments and notions are not only untrue, but they are also exclusionary. countless students need formats other than just printed materials. how many would benefit from listening to an audiobook along with reading a printed version? how many students dislike reading because it is just hard to see the words, but if the text were more spaced out, or in a different font, it would make all the difference? how many times is a student not able to access a book they want because all available copies are already checked out at their school library? these are issues students in the classes i work with face. having a public library card can significantly ease these barriers to access.

all in all, we processed hundreds of card requests and renewals and were able to powerfully illustrate to teachers how they could meaningfully integrate public library resources into their classrooms, either virtually or physically. our requests for library visits came back up to prepandemic levels, but we were working with more schools than we had previously. teachers were, and still are, reaching out and asking if we can get extra copies of books, or if we can lead virtual novel studies. one of our more popular pieces of progress is the integration of our coding programs with other subjects. currently, i am running a ukulele program where students are writing group arrangements using binary code as the basis for composition. we have classes doing art projects with robotics and integrating math learning objectives. we have done virtual story time and connected the story to creating scratch programs. the possibilities are endless, and now that we once again have the interest of teachers, we are working with them to support their students and all the learning that comes with incorporating technology and maker-thinking into a classroom environment.

the momentum has not let up, and we are beyond thrilled. our communities and local school board have embraced the reality that public libraries are more than just books. public libraries are a critical part of any community and have the power to be a meaningful component of education at all levels. having schools and all educational stakeholders using public library services not only broadens the reach of a public library, but also broadens our advocacy potential. we know there is still a long way to go in terms of genuine equitable access, especially when it comes to technology. internet connectivity and technology literacy are just the tip of the iceberg, but when organizations support each other to truly serve their community, collectively, that is how you make change.
sandra shores

editorial board thoughts: issue introduction to student essays

the papers in this special issue, although covering diverse topics, have in common their authorship by people currently or recently engaged in graduate library studies. it has been many years since i was a library science student, twenty-five in fact. i remember remarking to a future colleague at the time that i found the interview for my first professional job easy, not because the interviewers failed to ask challenging questions, but because i had just graduated. i was passionate about my chosen profession, and my mind was filled from my time at library school with big ideas and the latest theories, techniques, and knowledge of our discipline. while i could enthusiastically respond to anything the interviewers asked, my colleague remarked she had been in her job so long that she felt she had lost her sense of the big questions. the busyness of her daily work life drew her focus away from contemplation of our purpose, principles, and values as librarians. i now feel at a similar point in my career as this colleague did twenty-five years ago, and for that reason i have been delighted to work with these student authors to help see their papers through to publication.

the six papers represent the strongest work from a wide selection that students submitted to the lita/ex libris student writing award competition. this year's winner is michael silver, who looks forward to graduating in the spring from the mlis program at the university of alberta. silver entered the program with a strong library technology foundation, having provided it services to a regional library system for about ten years. he notes that "the 'accidental systems librarian' position is probably the norm in many small and medium sized libraries. as a result, there are a number of practices that libraries should adopt from the it world that many library staff have never been exposed to."1 his paper, which details the implementation of an open-source monitoring system to ensure the availability of library systems and services, is a fine example of the blending of best practices from two professions. indeed, many of us who work in it in libraries have a library background and still have a great deal to learn from it professionals. silver is contemplating a phd program or else a return to a library systems position when he graduates. either way, the profession will benefit from his thoughtful, well-researched, and useful contributions to our field.

todd vandenbark's paper on library web design for persons with disabilities follows, providing a highly practical but also very readable guide for webmasters and others. vandenbark graduated last spring with a master's degree from the school of library and information science at indiana university and is already working as a web services librarian at the eccles health sciences library at the university of utah. like mr. silver, he entered the program with a number of years' work experience in the it field, and his paper reflects the depth of his technical knowledge. vandenbark notes, however, that he has found "the enthusiasm and collegiality among library technology professionals to be a welcome change from other employment experiences," a gratifying comment for readers of this journal.

ilana tolkoff tackles the challenging concept of global interoperability in cataloging.
she was fascinated that a single database, oclc, has holdings from libraries all over the world. this is also such a recent phenomenon that our current cataloging standards still do not accommodate such global participation. i was interested to see what librarians were doing to reconcile this variety of languages, scripts, cultures, and independently developed cataloging standards. tolkoff also graduated this past spring and is hoping to find a position within a music library.

marijke visser addresses the overwhelming question of how to organize and expose internet resources, looking at tagging and the social web as a solution. coming from a teaching background, visser has long been interested in literacy and life-long learning. she is concerned about "the amount of information found only online and what it means when people are unable . . . to find the best resources, the best article, the right website that answers a question or solves a critical problem." she is excited by "the potential for creativity made possible by technology" and by the way librarians incorporate "collaborative tools and interactive applications into library service." visser looks forward to graduating in may.

mary kurtz examines the use of the dublin core metadata schema within dspace institutional repositories. as a volunteer, she used dspace to archive historical photographs and was responsible for classifying them using dublin core. she enjoyed exploring how other institutions use the same tools and would love to delve further into digital archives, "how they're used, how they're organized, who uses them and why." kurtz graduated in the summer and is looking for the right job for her interests and talents in a location that suits herself and her family.

finally, lauren mandel wraps up the issue, exploring the use of a geographic information system to understand how patrons use library spaces. mandel has been an enthusiastic patron of libraries since she was a small child visiting her local county and city public libraries. she is currently a doctoral candidate at florida state university and sees an academic future for herself. mandel expresses infectious optimism about technology in libraries:

people forget, but paper, the scroll, the codex, and later the book were all major technological leaps, not to mention the printing press and moveable type. . . . there is so much potential for using technology to equalize access to information, regardless of how much money you have, what language you speak, or where you live.

big ideas, enthusiasm, and hope for the profession, in addition to practical technology-focused information, await the reader. enjoy the issue, and congratulations to the winner and all the finalists!

sandra shores (sandra.shores@ualberta.ca) is guest editor of this issue and operations manager, information technology services, university of alberta libraries, edmonton, alberta, canada.

note

1. all quotations are taken with permission from private e-mail correspondence.

a partnership for creating successful partnerships (continued from page 5)

looking ahead, it seems clear that the pace of change in today's environment will only continue to accelerate; thus the need for us to quickly form and dissolve key sponsorships and partnerships that will result in the successful fostering and implementation of new ideas, the currency of a vibrant profession. the next challenge is to realize that many of the key sponsorships and partnerships that need to be formed are not just with traditional organizations in this profession. tomorrow's sponsorships and partnerships will be with those organizations that will benefit from the expertise of libraries and their suppliers while in return helping to develop or provide the new funding opportunities and means and places for disseminating access to their expertise and resources. likely organizations would be those in the fields of education, publishing, content creation and management, and social and community web-based software. to summarize, we at ex libris believe in sponsorships and partnerships. we believe they're important and should be used in advancing our profession and organizations. from long experience we also have learned there are right ways and wrong ways to implement these tools, and i've shared thoughts on how to make them work for all the parties involved. again, i thank marc for his receptiveness to this discussion and my even deeper appreciation for trying to address the issues. it serves as an excellent example of what i discussed above.

library management practices in the libraries of pakistan: a detailed retrospective

asim ullah, shah khusro, and irfan ullah

information technology and libraries | september 2022

asim ullah (asimullah@uop.edu.pk) is doctoral candidate, department of computer science, university of peshawar. shah khusro (khusro@uop.edu.pk) is professor, department of computer science, university of peshawar. corresponding author irfan ullah (irfan@sbbu.edu.pk) is assistant professor, department of computer science, shaheed benazir bhutto university, sheringal. © 2022.

abstract

library and information science has been at an infant stage in pakistan, primarily in resource management, description, discovery, and access. the reasons are many, including the lack of interest in and use of modern tools, techniques, and best practices by librarians in pakistan. finding a solution to these challenges requires a comprehensive study that identifies the current state of libraries in pakistan. this paper fills this gap in the literature by reviewing the relevant literature published between 2015 and 2021, selected through a rigorous search and selection methodology. it also analyzes the websites of 82 libraries in pakistan through a theoretical framework based on various aspects. the findings of this study include: libraries in pakistan need a transition from traditional and limited solutions to more advanced information and communication technology (ict)-enabled, user-friendly, and state-of-the-art systems to produce a dynamic, consumable, and sharable knowledge space. they must adopt social semantic cataloging to bring all the stakeholders onto a single platform. a library consortium should be developed to link users to local, multilingual, and multicultural collections for improved knowledge production, recording, sharing, acquisition, and dissemination. these findings benefit pakistani libraries, librarians, information science professionals, and researchers in other developing countries. to the best of our knowledge, this is the first study of its kind providing insights into the current state of libraries in pakistan through the study of their websites, using a rigorous theoretical framework and in light of the latest relevant literature.
introduction

with the inception of the web, library and information science (lis) professionals and researchers have solved several major challenges and issues regarding resource description, discovery, and access. yet, many new problems arise in the practices and services delivered by libraries if they are not in line with emerging technologies and standards. these problems are promptly addressed by libraries and their lis professionals using cutting-edge technologies, sufficient training, and the availability of the required resources. this practice keeps these libraries functional and acceptable among their users, especially in developed countries. on the other hand, in less developed and developing countries, libraries are losing their importance, which may be due to the adherence of these libraries to outdated lis approaches. pakistan is one of the developing countries where this is often observed. but, before devising a solution to regain their value, importance, and acceptance, it is essential to identify the current state of libraries in pakistan. to address this need, this paper reviews and summarizes the findings of well-reputed published literature regarding libraries in pakistan and collects and analyzes important details from library websites.

this study is inspired by two review articles that considered different aspects of lis research.1 the most similar is the article from noh and chang, who analyzed lis practices by reviewing relevant literature regarding libraries in korea from 1970 to 2018.2 however, to the best of our knowledge, we found no holistic, systematic literature review covering the current state of library management practices in pakistan and highlighting its key challenges, issues, and research opportunities. similarly, ganaee and rafiq studied the current state and features of the websites of 85 academic libraries of pakistan via surveys and interviews to identify their issues and problems.3 the websites were analyzed for contrasting color schemes, readable text, minimal use of horizontal scrolling, language, staff details, opacs, navigation, and other details of the information architecture. inspired by ganaee and rafiq, this study contributes a theoretical evaluation framework to study the current state of libraries by analyzing their websites. it comprises several aspects and criteria, including the availability of general information and information about resources and collections, the use of web 2.0 tools, the design of the website, the offering of web-based services, the use of instruction tools, and the application of accessibility guidelines for supporting individuals with visual and other impairments. the paper extends the findings and implications of the aforementioned research by highlighting the current state of library management practices in the libraries of pakistan, the challenges and issues those libraries face, and the research opportunities that lie ahead of them in the realm of modern digital technologies.
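a framework of this kind is straightforward to operationalize as a per-site checklist. the sketch below shows one possible encoding, with the aspect names taken from the description above; the scoring scheme (one point per satisfied aspect) is our illustrative assumption, not the authors' published rubric.

# sketch: score a library website against the framework's aspects.
# the aspect names come from the text; the answers are illustrative.
ASPECTS = [
    "general information available",
    "resource and collection information",
    "web 2.0 tools used",
    "website design criteria met",
    "web-based services offered",
    "instruction tools used",
    "accessibility guidelines applied",
]

def score_site(name, answers):
    # answers maps each aspect to True/False; missing aspects count as False
    satisfied = sum(answers.get(a, False) for a in ASPECTS)
    print(f"{name}: {satisfied}/{len(ASPECTS)} aspects satisfied")
    return satisfied

score_site("example university library", {
    "general information available": True,
    "resource and collection information": True,
    "web-based services offered": True,
    "accessibility guidelines applied": False,
})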
the paper provides a systematic literature review of the relevant literature on the libraries of pakistan and devises a theoretical framework to collect and analyze data by visiting the websites of the selected 82 libraries of pakistan that have an online presence.4 the study has implications for researchers and lis professionals in pakistan and in other developing countries coping with similar challenges and issues. the first section of this paper presents the methodology for selecting relevant literature by adopting the well-known prisma framework.5 the second section presents a summary of key findings. the third section presents a discussion and analysis. the last section concludes the paper, followed by endnotes and an appendix holding data about the selected 82 websites of the libraries of pakistan.

methodology

this section discusses the literature search and selection strategy and the theoretical evaluation framework used to study the websites of the selected 82 libraries of pakistan.

the literature search and selection strategy

this section discusses the search and selection process for collecting the relevant literature using google scholar, which indexes more than 389 million records and has the highest coverage of knowledge and research areas.6 we developed rigorous search and selection criteria by adopting the prisma methodology for gathering the relevant scholarly literature.7 the prisma methodology is a systematic literature review approach that ensures transparent and complete reporting on the selection of relevant literature in a given course of inquiry.8 it tracks a full record of how the relevant literature was selected and visualizes the details in a prisma flow diagram,9 shown in figure 1. the first step in applying prisma is to develop a search framework consisting of keywords or search queries that maximize the coverage and accuracy of finding relevant studies. the search framework for this study was developed by following ullah and khusro and liberati et al.10 table 1 summarizes the search framework: for each search query it reports the total number of records matched; the number of relevant records identified by reading the title and text snippet in the search results list (third column); the duplicates that reappear under a later search query (fourth column), which are removed from that query's count of relevant records; and the net results to be further screened by title, abstract, and other details on the publisher's website (final column). the inclusion/exclusion criteria narrow the selection further so that only relevant items are included and irrelevant ones are filtered out. using this search framework, our inclusion criteria selected the following publications:

• publications that discuss computer and web-based software solutions regarding resource acquisition, description (cataloging), discovery, and access inside the library or libraries of pakistan.

• publications that highlight the use and the adaptation of technologies, especially modern cataloging practices, the use of the semantic web, and linked open data (lod) in the libraries of pakistan.
• publications that highlight issues and challenges faced by pakistani libraries in becoming part of the global library community and learning from its best practices in terms of software and related technologies.

• publications in the english language with pakistani context, published during 2015–2021.

the exclusion criteria removed the following items from the list:

• publications published before 2015 or written in languages other than english.

• publications of low academic significance or from low-quality publication venues, for example papers with incomplete details or those published in non-peer-reviewed journals and conferences.

• theses, dissertations, surveys, review articles, patents, and citations.

table 1. the search framework: keywords and criteria for finding relevant publications

| s. no. | search query | records matched | relevant records by title & text snippet | duplicates identified | net items to be screened by title & abstract |
| 1 | "library science", "information science", "lis", "libraries", "pakistan" | 963 | 15 | 0 | 15 |
| 2 | "academic libraries", "university libraries", "digital libraries", "pakistan" | 645 | 48 | 2 | 46 |
| 3 | "library staff", "training", "resources", "library automation", "libraries", "pakistan" | 355 | 26 | 15 | 11 |
| 4 | "libraries", "university libraries", "hec", "digital library", "pakistan" | 258 | 71 | 39 | 32 |
| 5 | "collection management", "collection development", "libraries", "pakistan" | 240 | 11 | 11 | 0 |
| 6 | "design", "accessibility", "usability", "responsiveness", "websites", "libraries", "pakistan" | 109 | 3 | 0 | 3 |
| 7 | "social networking", "social web", "libraries", "facebook", "twitter", "youtube", "pakistan" | 93 | 0 | 0 | 0 |
| 8 | "services", "web 2.0", "rating", "review", "comment", "libraries", "pakistan" | 75 | 1 | 1 | 0 |
| 9 | "library automation", "computerization", "library software", "libraries", "pakistan" | 76 | 5 | 5 | 0 |
| 10 | "automation", "integrated library systems", "library software", "pakistan" | 68 | 8 | 7 | 1 |
| 11 | "azad jammu and kashmir", "punjab", "sindh", "khyber pakhtunkhwa", "balochistan", "gilgit", "libraries", "pakistan" | 30 | 0 | 0 | 0 |
| 12 | "digitization", "digital skills", "digital competencies", "libraries", "pakistan" | 29 | 4 | 3 | 1 |
| 13 | "book selection", "acquisition", "classification", "cataloging", "libraries", "pakistan" | 17 | 2 | 1 | 1 |
| total | | 2958 | 194 | 84 | 110 |

figure 1 visualizes the search process using the well-known prisma diagram.11 google scholar retrieved 2,958 records. eighty-four duplicate records were identified across the search queries and removed, leaving 2,874 records for initial screening. the initial screening by title and text snippet identified 110 records as relevant and excluded the remaining 2,764. these 110 records were then accessed on their publishers' websites to read their title, abstract, and other details, and their full texts were obtained. after skimming the full text of these records against the inclusion/exclusion criteria, 26 were excluded, leaving 84 publications for in-depth reading and analysis. (the sketch below illustrates this screening arithmetic.)
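as a quick consistency check on the prisma flow (our illustration; the authors provide no code), the counts reported in table 1 and figure 1 can be reproduced with a few lines of python:

```python
# screening counts taken from table 1 and figure 1 of the paper
retrieved = 2958                 # records returned by google scholar
duplicates = 84                  # duplicates identified across queries
after_dedup = retrieved - duplicates                    # 2874 for initial screening
kept_by_snippet = 110                                   # relevant by title/snippet
dropped_by_snippet = after_dedup - kept_by_snippet      # 2764 excluded
dropped_after_skimming = 26                             # excluded on full-text skim
in_depth = kept_by_snippet - dropped_after_skimming     # 84 read in depth
dropped_in_depth = 3                                    # irrelevant on close reading
included = in_depth - dropped_in_depth                  # 81 analyzed in the review

assert (after_dedup, dropped_by_snippet, in_depth, included) == (2874, 2764, 84, 81)
print(f"publications included in the review: {included}")
```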
an in-depth reading of these 84 articles against the inclusion/exclusion criteria identified a further 3 articles as irrelevant, leaving 81 articles included in the analysis and discussion.

figure 1. prisma diagram regarding the selection of relevant publications.

the evaluation framework

the theoretical evaluation framework used to collect relevant data from the selected websites is shown in table 2, which summarizes the purpose of each criterion and its possible values using abbreviations.

table 2. the evaluation framework for libraries: criteria and their descriptions

1. s. no. the serial number of each record in table a-1 of appendix a: details of libraries.

2. library name. purpose: the name of the library. values: library name.

3. url. purpose: the address of the library website. values: url.

4. library website design.12 purpose: whether the website design is user-centered and accessible for blind and visually impaired people. values: language clarity (lc: yes/no); presentation clarity (pc: yes/no); support for special people (sp: yes/no); logical structure (ls: yes/no); responsive web design (rwd: yes/no); multilinguality of web pages (mlw: yes/no).

5. general information.13 purpose: general information available on the website regarding content. values: copyright statement (c); resources and services (rs); mission/goals/objectives (g); news/events (ne); contact details (cn); frequently asked questions (faq); last updated (lu); map/directions to the library (mp); calendar (cl); virtual tour (vt); policies (p); word cloud (wc); opening hours details (oh); not available (na).

6. web 2.0 tools.14 purpose: web 2.0 tools connect library users with library management for updates about the contents demanded or needed by users, and let users share and comment on library holdings within their friends' circles. this criterion analyzes whether social networking applications are used in pakistani libraries and which tools are used most. values: facebook (fb); flickr (fr); twitter (t); rss (r); social bookmarking (s); instagram (i); blogs (b); wikis (w); youtube (yt); pinterest (pi); not available (na).

7. web-based library services.15 purpose: the services offered by the library on the web, with subcolumns for search, browsing, and other services. search values: opac; author (at); title (tt); subject (su); keyword (ke); advanced search (as). browsing values: author (at); title (tt); subject (su); category (ca); keyword (ke). other values: ask a librarian (al); email (em); loan (ln); awareness (aw); newsletter (nw); delivery (de); sms; ready reference questions (rq); chat (ch); library exhibits (lx); feedback (fb); reserving computers (rc); council services (cs); smartphone-based services (sp); not available (na).

8. resources and collections.16 purpose: the nature, variety, and types of the resources most available in pakistani libraries. values: opac; bibliographic databases (bd); full-text databases (ft); journals (j); books (b); audiobooks (ab); magazines (mg); online reference sources (or); opac of other libraries (opac-o); multimedia collections (mc); other (o); special collections (sc); multilinguality of resources (mlr); not available (na); information on physical resources (ph).
9. instructional tools.17 purpose: tools to guide users in searching, browsing, and other services. values: research guides (rg); subject guides/pathfinders (sg); opac search tips (tips); information literacy program (infl); citation guides (cg); online tutorials (ot); user groups (ug); plagiarism guides (pg); webinars (wb); not available (na).

10. accessibility guidelines.18 purpose: whether the website and library follow accessibility guidelines. values: yes/no.

summary of key observations

the lis practices in pakistan's libraries are gradually shifting from manual to digital. however, they are still far from meeting the latest international practices of resource management, acquisition, cataloging, classification, circulation, discovery, access, and accessibility for people with disabilities, including those with visual impairments. this section has a twofold objective. first, it reviews the latest literature regarding the current state of lis practices in the libraries of pakistan to identify the challenges and issues being faced and future research opportunities. second, it extends these findings by evaluating the websites of the selected 82 libraries for a clearer picture of the current state of these chosen libraries.

lis practices in the light of published literature

this section discusses lis practices in the libraries of pakistan with details from the published literature, organized into the following subsections.

collection development and management

books are given the greatest importance as the main holdings in the libraries of pakistan. currently, printed books are selected in the conventional manual manner. book selection tools include suppliers' lists, publishers' catalogs, book fairs and visits to book shops, book reviews, recommendations from readers, selection committees, suggestion registers, and publishers'/suppliers' desk copies. the requested books are supplied to the libraries, where librarians check them physically and verify their accuracy; if a book is damaged or missing, it is reported to the vendor so that new copies can be arranged. online or electronic book selection and procurement from national and international book vendors is rare, as is the practice of purchasing softcover books in batches. these aspects have been discussed in several research publications by lis professionals and researchers of pakistan. one prominent reason is the lack of a sufficient budget and of a clear, standard resource acquisition and management policy.19 the following are some of the notable challenges and issues that appeared in the published literature:

• the development of a quality collection.20

• lack of formal policies and guidelines for collection selection, acquisition, and related activities.21

• lack of electronic resources22 and challenges in their subscription and off-campus access.23
• inadequate collections and the resulting limited use of resources.24

• financial constraints.25

• lack of formal policies and procedures for collection development and management, including selection, acquisition, digitization, and access.26

• lack of proper library communities and of coordination among them for collection development and management.27

• failure to fulfill users' information needs.28

researchers have made some recommendations (that could also be treated as research opportunities) to address these challenges. the libraries need to meet user needs and keep pace in disseminating current scientific knowledge and new insights in the literature to achieve excellence in service delivery.29 the factors affecting lis practices in the academic libraries of pakistan include collection development goals, management policies and procedures, user requirements, budget, and evaluation.30 user information needs should be considered to the fullest, and a user-centric approach should be developed to improve content selection.31 librarians should understand the use of linked open data (lod) for creating standard metadata records for information resources management in libraries,32 while considering the major challenges, including the lack of technical expertise, awareness of the latest tools and technologies, the complexity of technologies, non-availability of vocabularies, and legal issues.33 librarians must also consider the research community's limited demand for and use of e-resources in academic activities.34 there is a significant relationship between digital resource databases and the development of academic research for generating innovative ideas and improving researchers' cognitive abilities;35 therefore, libraries must maintain sufficient and up-to-date resources. social networking sites should be considered for knowledge management practices among employees in public and private universities.36 effective policies should be developed to increase researchers' satisfaction and research productivity.37

resource description, discovery, and access

as it relates to resource description and access, most libraries in pakistan use online public access catalogs (opacs). the use of specialized software such as dspace and e-prints for developing institutional repositories and digital libraries is rare. libraries still rely on conventional manual, partially computerized, slow, and old methods of records management and are limited to opac-based search and retrieval. they are less aware of, and less familiar with, the modern best practices of using lod for resource description, sharing, and access; lod remains unproven and new to the libraries of pakistan for several reasons, including the complexity of deployment and usage and the constraints on financial and human resources.38 some of the notable challenges and issues that appeared in the published literature include the following:

• lack of or limited searching of and access to resources39 and their sharing.40
• lack of synchronous or digital reference services41 and the poor availability of virtual reference services.42

• lack of search and retrieval solutions for multilingual resources written in pashto, arabic, and urdu.43

• limited or no use of big data analytics to improve acquisition, preservation, curation, and data analysis.44

• insufficient information on the websites regarding their libraries and lack of communication support for end users.45

• less frequent use of web 2.0 and website aid tools, and limited information about their libraries.46

• the small size of the library website and the lack of aids such as a site index, frequently asked questions, and user guides for its use.47

• the lack of awareness, best practices, and it staff, and the complexity of implementing lod in resource description, discovery, sharing, and access.48

these challenges can be addressed if the recommendations of the researchers are considered. some of these recommendations, which also serve as research opportunities, include the following. library management practices should use and exploit ontologies and lod to develop more rigorous classification systems for improved resource description, discovery, and access.49 strategic planning and policies are essential for incorporating ict in the libraries of pakistan, with emphasis on resource description, discovery, access, and sharing through web-based services.50 besides the library's reference desk and e-mail service, online instant messaging and search engine tools should be used for virtual reference service (vrs) in libraries, and a proper set of written policies and standard operating procedures (sops) for vrs must be introduced.51 collaboration and the sharing of experiences and skills in deploying lod are also vital.52 through lod, the libraries of pakistan can be linked to other global libraries to promote our indigenous literature on the web.53 it is challenging to migrate data from text-based and marc catalogs to linked data formats; assigning and resolving uris is challenging; synchronizing terminologies with linked data technology and minimizing its complexity is also challenging; and the conversion of marc 21 records to the resource description framework (rdf) is onerous54 (a minimal example of the target form is sketched below). a list of the bibliographic databases should be provided on the library website with instructions for their usage, and relevant content should be made accessible discipline-wise through proper authentication login.55 services like "ask a librarian," search, searching via barcode scanners, and maintaining a rich database should be considered by each library through their online and mobile phone interfaces.56 in developing smartphone-based library applications, it is essential to consider service quality, affinity, usefulness, ease of use, satisfaction, confirmation, and continuous usage.57 the information architecture of library websites should be analyzed from the perspective of their users, and their navigation systems improved and adapted accordingly.58 usefulness and cost are the most influential factors to consider while adopting library software such as koha.59 the design and quality of the contents and services of the library website are important.60 the use of digital library resources positively impacts research productivity and should be considered.61
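to make the lod recommendation concrete, the following minimal sketch, written with the python rdflib library, shows what a single catalog record might look like once its marc-derived fields are expressed as rdf using dublin core terms. the uri pattern and record values are invented for illustration; they are not from the paper or from any existing pakistani catalog.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# hypothetical uri pattern for a consortium catalog; not an existing service
BOOK = Namespace("https://catalog.example.pk/book/")

g = Graph()
record = URIRef(BOOK["isbn-9780000000000"])

# a few marc-derived fields expressed as dublin core terms
g.add((record, RDF.type, DCTERMS.BibliographicResource))
g.add((record, DCTERMS.title, Literal("example title", lang="en")))
g.add((record, DCTERMS.creator, Literal("example author")))
g.add((record, DCTERMS.language, Literal("ur")))   # e.g., an urdu-language work
g.add((record, DCTERMS.subject, Literal("library science")))

print(g.serialize(format="turtle"))
```

records published in this shape can later be interlinked with external vocabularies and other catalogs, which is what would let a pakistani consortium participate in the global lod cloud.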
adherence to new standards, practices, and technologies

the lack of interest from library staff in adopting and adhering to new standards and technologies is another notable issue. another reason for this non-adherence could be upper management's lack of knowledge and failure to understand the modern-day needs of library users. however, some developments are gaining momentum. several libraries offer web-based services.62 in some scenarios, university students use the social web to access and share resources.63 the pakistan scientific and technological information center (pastic) is developing a searchable database of indigenous collections64 supporting smartphone-based search and access,65 and is also creating a consortium-level public access catalog of the scientific periodicals produced by authors of pakistan.66 the aga khan university has developed an integrated resource management system for connecting the geographically dispersed libraries of its various campuses in pakistan.67 access to digital libraries through the higher education commission (hec) digital library, a library management system, and e-document delivery are some of the notable innovations in the lis domain of pakistan.68 there are 122 public universities, 95 private universities, and more than 600 non-degree-awarding institutions with hec-dl access.69

the lis practices in pakistani libraries mostly suffer from a lack of professional training,70 of awareness of the latest library standards and technologies,71 of technological and it proficiency,72 of policies for library processes and ict,73 of knowledge regarding lod technologies,74 of engagement with digitization activities,75 of resource sharing and collaboration,76 of sufficient financial resources,77 and of a supportive and assistive atmosphere for persons with special needs,78 as well as from issues regarding archiving, cataloging, and disseminating local and indigenous literature and artifacts.79 a library must find ways of adopting new tools, standards, technologies, and the necessary training to support users in resource management, discovery, and access. hec pakistan maintains one such library, offering universities and their scholars free off-campus online access to research publications and periodicals.80 however, most university library users are not fully satisfied with collection development, and a major part of the literature is still not accessible.81 besides, as discussed, pastic is playing an active role in developing a library consortium and a searchable database/catalog of the indigenous collections of pakistan. several university librarians have adopted knowledge management practices to deliver and improve their library services efficiently.82 apart from these few initiatives, the research and development of lis practices in the libraries of pakistan remain minimal and need significant attention. (a toy sketch of the record merging that a consortium catalog requires follows below.)
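pastic's consortium catalog is not described at the level of code in the paper, so the following is only our toy sketch of the minimal bookkeeping such a union catalog needs: deduplicating records across member opacs while remembering which member holds each item. the library names and isbns are invented.

```python
from collections import defaultdict

# toy member-opac exports as (isbn, title, holding library); data invented
peshawar = [("9780000000001", "digital libraries", "university of peshawar")]
karachi  = [("9780000000001", "digital libraries", "university of karachi"),
            ("9780000000002", "semantic web primer", "university of karachi")]

def merge_opacs(*exports):
    """build a union catalog keyed by isbn, accumulating holdings per record."""
    union = {}
    holdings = defaultdict(set)
    for export in exports:
        for isbn, title, library in export:
            union.setdefault(isbn, title)  # first title wins; real systems reconcile
            holdings[isbn].add(library)
    return union, holdings

union, holdings = merge_opacs(peshawar, karachi)
for isbn, title in union.items():
    print(isbn, title, "held by:", ", ".join(sorted(holdings[isbn])))
```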
some of the notable challenges and issues that appeared in the published literature include:

• librarians' limited or outdated knowledge regarding research data management.83

• inappropriate infrastructure.84

• limited or no use of ict, limited knowledge and expertise in the use of computers, internet connectivity issues, and inadequate computer labs.85

• gaps in training and leadership.86

• lack of supporting it staff.87

• lack of, or limited use of, human resource management88 and leadership.89

• financial constraints.90

• lack of dynamic websites for the libraries.91

• lack of tools and standard library software.92

• the very basic level of digital competencies for developing, managing, and protecting digital libraries in the universities of pakistan.93

• lack of uniformity and standard features in library websites.94

• less frequent use of web 2.0 and website aid tools, and limited information about their libraries.95

• the small size of the library website and the lack of aids such as a site index, frequently asked questions, and user guides for its use.96

• the relatively infant stage of information commons (information technology infrastructure, services, and resources).97

• negligible willingness and interest in research data management.98

• reluctance in sharing research data99 and weak and informal collaboration on research.100

some recommendations (that could also be treated as research opportunities) made by researchers include the following. services, including electronic services, librarians' end services, and technical knowledge services, should be improved in the special libraries of pakistan.101 it is essential to understand the need to deploy and use library software such as koha, dspace, e-prints, and evergreen.102 human resource management, especially effective leadership with a broader vision, boldness, a charismatic personality, and knowledge dissemination abilities, is required to lead staff and manage their social relationships.103 as the information manager of the library, a librarian must be fully aware of web 3.0, the semantic web, and artificial intelligence (ai) tools to become an expert in the digital landscape.104 web 2.0 tools and social networking sites should be used in marketing and advertising library services to end users.105 cataloging paradigms should incorporate social collaborative cataloging metadata.106 artificial intelligence tools and services should be considered, with lis professionals collaborating with computer science professionals to develop libraries.107 academic libraries' performance can be improved by using big data tools and analytics.108 quality enhancement and industrial affiliation are important for increasing the quality and quantity of research in academia.109
digital libraries, institutional repository software, bibliographic databases, e-journal searching, and referencing tools are very important for increasing the research production of public sector universities.110 competencies in ict skills, education in copyright laws and intellectual property, use of digital and physical learning resources, and collection development must be improved.111 hec must provide funds for information commons projects for significant benefits to library users.112

lis practices in the light of the studied websites

this section attempts to highlight the current state of the libraries in pakistan through data and observations collected from their websites. reviewing a library's website reveals several aspects of its current state. table a-1 in appendix a summarizes the data collected through the evaluation framework discussed in the methodology section and summarized in table 2. for example, a library whose website is not user-centered and accessible to people with visual impairments, a criterion outlined as the fourth item in table 2, may face issues with supporting it staff, lack of expertise, and budget constraints. a library that is unable to offer web-based services cannot meet the needs of the major portion of its users interested in accessing content and services online. a similar impact is connected to each of the remaining criteria of the evaluation framework. the lack of certain pieces of information on the library website affects users negatively and may restrain them from using it.

it is notable that most of the libraries of pakistan have no websites at all, which makes it challenging to discuss their strengths and limitations. as shown in figure 2, only 36% of the libraries listed on the hec website have websites (these 82 sites form our sample), leaving 64% with no online presence. this also makes it challenging to draw a clearer picture of the current state of the libraries of pakistan; the statistics presented here are therefore only a rough estimate.

figure 2. percentage of libraries in pakistan with and without websites.

figure 3 shows the statistics concerning the appearance and design of library websites in pakistan, which are improving in language and presentation clarity, logical structure, responsive web design, and access to the hec digital library. these websites need improvement in providing accessibility tools for people with disabilities, meeting accessibility guidelines, and incorporating multilingual support.

figure 3. library website design, accessibility, and access to the hec digital library (lc: language clarity; pc: presentation clarity; sp: support for special people; ls: logical structure; rwd: responsive web design; mlw: multilinguality of web pages; accessibility guidelines; and hec dl access).

figure 4 shows that most of the libraries (63 out of 82: 76.8%) offer general information on their websites.
the most prominent items include contact details (50 out of 82: 61%), copyright statement (47 out of 82: 57.3%), and library operating hours (46 out of 82: 56.1%), followed by resources (27 out of 82: 32.9%), news/events (25 out of 82: 30.5%), mission/goals/objectives (24 out of 82: 29.3%), maps/directions to the library building (19 out of 82: 23.2%), policies (18 out of 82: 22%), frequently asked questions (16 out of 82: 19.5%), and last update (12 out of 82: 14.6%). the virtual tour, calendar, and word cloud are the least provided, as shown. finally, a considerable number of libraries (19 out of 82: 23.2%) lack most of the general information.

figure 4. number of library websites that offer general information to their users about cn: contact details; c: copyright; oh: opening hours details; rs: resources; ne: news/events; g: mission/goals/objectives; mp: map/directions to the library; p: policies; faq: frequently asked questions; lu: last update; vt: virtual tour; wc: word cloud; cl: calendar; and na: not available.

figure 5 shows the details of the libraries that share their contents or communicate with their users using web 2.0 tools and social media. most of the libraries (53 out of 82: 64.6%) are not connected with their users through social networking. among the libraries that do exploit web 2.0 tools, most use facebook (26 out of 82: 31.7%), followed by twitter (22 out of 82: 26.8%), youtube (9 out of 82: 11%), instagram (8 out of 82: 9.8%), and rss (5 out of 82: 6.1%).

figure 5. number of library websites that provide social networking through fb: facebook; t: twitter; yt: youtube; i: instagram; r: rss; b: blog; s: social bookmarking; w: wikis; pi: pinterest; fr: flickr; na: not available.

figure 6 shows the statistics for the instructional tools used by the websites of libraries in pakistan. these tools serve new visitors or anyone who requires instruction in navigation, search, and access to the contents of the library's website. most of the libraries (67 out of 82: 81.7%) do not offer instructional tools on their websites. only a few (15 out of 82: 18.3%) provide instructional tools in one form or another. these include information literacy programs (10 out of 82: 12.2%), citation guides (7 out of 82: 8.5%), research guides (6 out of 82: 7.3%), subject guides/pathfinders (4 out of 82: 4.9%), online tutorials (3 out of 82: 3.7%), opac search tips (3 out of 82: 3.7%), webinars (2 out of 82: 2.4%), plagiarism guides (2 out of 82: 2.4%), and user groups (1 out of 82: 1.2%).

figure 6. number of library websites that provide the instructional tools of infl: information literacy program; cg: citation guides; rg: research guides; sg: subject guides/pathfinders; ot: online tutorials; tips: opac search tips; wb: webinars; pg: plagiarism guides; ug: user groups; na: not available.

figure 7 shows the statistics about searching as part of the web-based services provided by different libraries on their websites.
most libraries (53 out of 82: 64.6%) offer search, most commonly by keyword (44 out of 82: 53.7%), followed by title (42 out of 82: 51.2%), advanced search (39 out of 82: 47.6%), author (38 out of 82: 46.3%), subject (36 out of 82: 43.9%), and opac (5 out of 82: 6.1%). a considerable number of libraries (29 out of 82: 35.4%) have no search functionality.

figure 7. number of libraries offering web-based searching services through at: author; tt: title; su: subject; ke: keyword; as: advanced search; opac; na: not available.

figure 8 shows that most libraries' websites (53 out of 82: 64.6%) offer browsing using different options and filters. most allow browsing by category (42 out of 82: 51.2%), followed by title (40 out of 82: 48.8%), author (38 out of 82: 46.3%), subject (36 out of 82: 43.9%), and keyword (28 out of 82: 34.1%). several libraries (29 out of 82: 35.4%) offer no such browsing functionality.

figure 8. number of libraries offering web-based browsing service parameters including ca: category; tt: title; at: author; su: subject; ke: keyword; na: not available.

figure 9 shows the statistics for web-based services offered by libraries other than search and browsing, which are depicted separately in figures 7 and 8, respectively. most libraries (63 out of 82: 76.8%) do not offer these services on their websites. only a few (19 out of 82: 23.2%) offer services such as ask a librarian (14 out of 82: 17.1%), followed by email and delivery (9 out of 82: 11% each), loan (6 out of 82: 7.3%), chat and ready reference questions (4 out of 82: 4.9% each), and spreading awareness among users (3 out of 82: 3.7%). the remaining services, such as newsletters, reserving computers for users, council services, smartphone-based services, and short messaging service, are offered on almost none of the selected libraries' websites.

figure 9. number of libraries offering other web-based library services that provide support for accessing and discovering any service or resource other than search and browsing. these services include al: ask a librarian; em: email; de: delivery; ln: loan; fb: feedback; ch: chat; rq: ready reference questions; aw: awareness; nw: newsletter; rc: reserving computers; lx: library exhibits; cs: council services; sp: smartphone-based services; sms; na: not available.

figure 10 shows the details offered by libraries about their resources and collections. a considerable number of these libraries (20 out of 82: 24.4%) provide no such information. most libraries (62 out of 82: 75.6%) give details about books (45 out of 82: 54.9%), followed by journals (39 out of 82: 47.6%), bibliographic databases (37 out of 82: 45.1%), opac (17 out of 82: 20.7%), full-text databases (10 out of 82: 12.2%), magazines (9 out of 82: 11%), physical books (7 out of 82: 8.5%), online reference sources (6 out of 82: 7.3%), opacs of other libraries (3 out of 82: 3.7%), audiobooks (2 out of 82: 2.4%), and multimedia collections (1 out of 82: 1.2%).

figure 10. number of libraries offering resources and collections including b: books; j: journals; bd: bibliographic databases; opac; o: other; ft: full-text databases; mg: magazines; ph: physical books; or: online reference sources; opac-o: opac of other libraries; ab: audiobooks; mc: multimedia collections; sc: special collection; mlr: multilinguality of resources; na: not available.
discussion and analysis

the study of the websites of the 82 libraries of pakistan reveals that the majority are not technically sound and cannot assist and offer services to their users, including people with visual or physical impairments. the key observations made in the previous section emphasize the need for the libraries of pakistan to transform their practices from manual to automated, web-based services. this can be achieved through collaborative research and development efforts across several domains, including computer science, lis, human-computer interaction, ai, the semantic web, and lod.

there are several examples of library consortia that enable collaborative efforts to make catalogs, websites, and activities available and accessible from a single platform.113 these include the online computer library center (oclc), the international coalition of library consortia (icolc), the hathitrust digital library, the arxiv e-print archive, google books, and shared print storage.114 in pakistan, pastic made the first effort to develop such a consortium115 to allow access to the holdings of the libraries of pakistan by combining their opacs. it offers a searchable database of the collections and enables resource sharing among all the member libraries.116 however, its successful implementation in pakistan requires willingness among all the libraries of pakistan to share data, interact professionally, and benefit from modern technologies. the consortium should be supported with best practices from information retrieval and semantic web technologies to offer better search and retrieval functionality. users should be made part of the resource description so that the idea of social semantic cataloging117 can be realized, where users can discuss their information needs, recommend books and resources, and enrich the catalog with user-generated content. artificial intelligence and deep learning algorithms should be exploited in book recommendation so that the available professional metadata and user-generated content can be used to the fullest in serving users' information needs. the resulting rich metadata should be made available and consumable as lod to benefit other potential applications. this will enable the libraries to meet the complex information needs of users, who describe those needs in natural language. natural language is ambiguous, and resources described through user-generated content in that same language better support the search and recommendation of books118 (a toy example follows below). this will improve the resource description, discovery, and access services of the libraries of pakistan to a great extent.
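as a toy illustration of that recommendation idea (the paper proposes no specific algorithm, and the records below are invented), user-generated descriptions can be matched against a natural-language query with a simple tf-idf model from scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# invented records: professional metadata enriched with user-generated tags/reviews
books = {
    "cataloging with marc": "marc records cataloging metadata library standards",
    "linked data for libraries": "linked open data rdf semantic web catalogs",
    "urdu poetry anthology": "urdu poetry literature indigenous collection",
}

query = "how do i publish my catalog as linked open data?"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(books.values()) + [query])
# similarity between the query (last row) and every book description
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

for title, score in sorted(zip(books, scores), key=lambda t: -t[1]):
    print(f"{score:.2f}  {title}")
```

real systems would go well beyond tf-idf, but the sketch shows why user-generated text matters: the query and the enriching descriptions share the same everyday vocabulary, which professional metadata alone often does not.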
figure 3 depicts another significant limitation of the websites of the libraries of pakistan: extremely limited availability of navigational, retrieval, and visualization aids for people with visual impairments. most of the libraries' websites have no provision for accessibility mechanisms. this is unfortunate, as in 2017 it was reported that 21.78 million people in pakistan were affected by blindness and vision impairment.119 although several technological aids have been devised for performing daily life activities, including navigation, orientation, localization, and obstacle detection,120 the majority of the libraries of pakistan lack accessibility-related solutions for those who are blind or have a visual impairment. holdings should be enriched with audio and braille books and supplemented with an ict-based accessibility solution. the library building should accommodate visitors with diverse needs. information about accessibility should be shared as part of the general information on the library's website. in this regard, all the stakeholders of the libraries, including government and non-government organizations, educational institutions, and lis professionals, should be involved in working collaboratively on an effective accessibility solution for all library users.121 (a toy first-pass audit of the kind such work could automate is sketched below.)
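as an illustration of what a first-pass accessibility audit could automate (our sketch, not the framework used in this study; it assumes the third-party `requests` and `beautifulsoup4` packages), a script can flag two basics from the wcag guidelines, images without alt text and a missing page-language declaration:

```python
import requests
from bs4 import BeautifulSoup

def quick_accessibility_check(url: str) -> list[str]:
    """flag two basic wcag issues: images without alt text, missing lang attribute."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    issues = []
    missing_alt = [img for img in soup.find_all("img") if not img.get("alt")]
    if missing_alt:
        issues.append(f"{len(missing_alt)} image(s) without alt text")
    html = soup.find("html")
    if html is None or not html.get("lang"):
        issues.append("no lang attribute on <html> (screen readers rely on it)")
    return issues

# example with a hypothetical library url
print(quick_accessibility_check("https://library.example.edu.pk/"))
```

a full audit would cover far more (contrast, keyboard navigation, aria landmarks), but even this minimal check would have flagged most of the sites reviewed here.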
smartphones have been among the top trends in pakistan, especially for college and university students, who use them most frequently. according to the infographic by grappetite, 77% of smartphone users are between 21 and 30 years of age, and 12% are aged 31 to 40.122 people in these two age groups are the most likely users of libraries, as they usually need a variety of books. according to statista, smartphone ownership in pakistan increased from 10% in 2014 to 51% in 2020.123 according to the pakistan telecommunication authority, there are currently 191 million cellular/mobile phone subscribers and 110 million 3g/4g subscribers.124 these statistics suggest that libraries should also benefit from incorporating smartphones. the most prominent opportunities are developing smartphone apps that let users learn about a library's collection via the web and producing an interactive user interface that helps them find answers to their questions regarding library services. library opacs can be made usable and accessible through mobile web applications. there are also several prospects for opening library space to people with disabilities through smartphones; a smartphone application can be developed to assist readers in navigation, localization, and finding items of interest in the library.

conclusions

this study aims to provide a holistic view of the current state of libraries in pakistan in the light of the most relevant and recent research from lis professionals and researchers. it also attempts to identify some of the major challenges, issues, and research opportunities regarding the current state of lis practices in the libraries of pakistan compared with that of technologically advanced countries. the study suggests a need for increasing technology proficiency, adopting the latest technologies, introducing proper legislation for lis practices that meets international standards, improving collection development, and working to meet library users' needs. the libraries of pakistan need a transition from traditional and limited solutions to more advanced, ict-enabled, user-friendly, and state-of-the-art systems to produce a dynamic, consumable, and sharable knowledge space. the libraries must adopt a social semantic cataloging environment to bring all stakeholders onto a single platform. development of a library consortium is critical to connect our local, multilingual, and multicultural collections to users for improved knowledge production, recording, sharing, acquisition, and dissemination. we hope that lis professionals of pakistan, and of the rest of the world in general, find this article supportive of their current and future studies.

appendix a: details of libraries

table a-1. the comparison and evaluation of libraries using the criteria in table 2. each library appears on one line; cells are separated by "|" in the order: library name | url | library website design (lc pc sp ls rwd mlw; ✓ marks as extracted, column positions within this group could not always be recovered) | general information | web 2.0 tools | search | browse | other services | resources/collections | instructional tools | accessibility guidelines | hec dl access. "na" means not available; a hyphen (-) means the cell was blank.

1. central library university of peshawar | http://www.uop.edu.pk/library/ | ✓ ✓ | c, g, ne, cn | na | na | na | na | ph | na | - | ✓
2. brains institute peshawar | http://www.brains.edu.pk/library-2/ | ✓ ✓ | mp | fb, t, i | na | na | na | ph | na | - | ✓
3. library of edwardes college, the mall peshawar cantt | https://www.edwardes.edu.pk/library | ✓ ✓ | c, oh | na | na | na | na | na | na | - | ✓
4. the aga khan university library | https://www.aku.edu/library/pages/home.aspx | ✓ ✓ ✓ ✓ | c, rs, g, ne, cn, faq, lu, mp, cl, vt, p, wc, oh | fb, t, i, yt | ke, as | ca, ke | na | ph, o | na | - | ✓
5. air university central library | https://www.au.edu.pk/pages/library/about_library.aspx | ✓ ✓ ✓ ✓ | c, rs, g, ne, cn, faq, lu, mp, cl, vt, p, wc, oh | fb, t | ke, tt, su, as | ca | na | ph, b, bd, j | na | - | ✓
6. the allama iqbal open university (aiou) | http://library.aiou.edu.pk/ | ✓ ✓ ✓ ✓ | c, rs, g, ne, cn, faq, lu, oh, wc | na | ke, tt, su, as | ca | na | ph, b, bd, j | na | - | ✓
7. bahria university libraries | https://bahria.edu.pk/libraries/ | ✓ ✓ ✓ ✓ | c, rs, g, ne, cn, p, oh | fb, t, i, fr | ke, tt, su, as | ca | na | ph, b, bd, j | na | - | ✓
8. library of balochistan university of engineering & technology, khuzdar | http://www.buetk.edu.pk/?page_id=7368 | ✓ ✓ ✓ | g | na | na | na | na | ph | na | - | ✓
9. library of balochistan university of information technology, engineering & management sciences (buitems) | https://www.buitms.edu.pk/library/defaulthecandbuitems.aspx | - | na | na | na | na | na | na | na | - | ✓
10. library of baqai medical university | https://baqai.edu.pk/digitallibrary.php | - | na | na | na | na | na | na | na | - | ✓
11. library of barrett hodgson university | https://www.bhu.edu.pk/home/tierlibrarybuilding | - | na | na | na | na | na | na | na | - | -
12. library of beaconhouse national university | https://www.bnu.edu.pk/bnu/facilities/library | ✓ ✓ ✓ | g, oh | na | na | na | na | na | na | - | ✓
13. comsats university junaid zaidi library | https://ciit.insigniails.com/library/home ; https://library.comsats.edu.pk/ | ✓ ✓ ✓ ✓ ✓ | c, rs, g, ne, cn, faq, lu, oh | fb, t | at, tt, su, ke, as, ss | at, tt, su, ke, ca | de, al | b, o | ot | ✓ | ✓
14. city university of science and information technology | http://cusit.edu.pk/library/ | - | na | na | na | na | na | na | na | - | ✓
15. library of fatima jinnah women university | https://fjwu.edu.pk/library/ | ✓ ✓ | rs, g, ne, cn, lu, oh, p | fb, t | na | na | na | na | na | - | ✓
16. library of federal urdu university of arts, sciences & technology | https://fuuast.edu.pk/library/ | - | na | na | na | na | na | na | na | - | ✓
17. library of forman christian college | http://library.fccollege.edu.pk/ | ✓ ✓ ✓ ✓ | c, rs, cn, faq, p, oh, mp | r, t, fb | at, tt, su, ke, as, ss | at, tt, su | al, em, ln, aw, nw, de, rq, lx, fb, rc, cs | opac, bd, ft, j, b, ab, mg, opac-o, mc | rg, sg, tips, infl, cg, ot, ug, pg, wb | - | ✓
18. library of foundation university, islamabad | http://fui.edu.pk/fui_main_site/index.php/campuslife/library | - | c | na | na | na | na | na | na | - | ✓
19. library of gift university | https://www.gift.edu.pk/page/library-overview | ✓ ✓ ✓ | c, rs, g, ne, cn, p, oh | na | tt, at, su | ke | ch, al, em, aw, de | opac, bd, ft, j, b, or | na | - | ✓
20. library of ghulam ishaq khan institute of engineering sciences & technology | http://119.159.235.56:8085/forms/default.aspx | ✓ ✓ ✓ | c, oh | na | at, tt, su | at, tt, su, ca | na | opac, bd, ft, j, b, ab | na | - | ✓
21. library of gomal university | http://clib.ddns.net/ | ✓ ✓ ✓ ✓ | p, wc | na | ke, at, tt, su, as | ke, at, tt, su | na | opac, bd, ft, j, b | na | - | ✓
22. library of government college university | http://library.gcu.edu.pk/ | ✓ ✓ ✓ ✓ | c, rs, g, ne, cn, lu, mp, p, oh | na | at, tt, su, ke, as | at, tt, su, ca | al, em, ln, rq, fb | opac, bd, ft, j, b, or | rg, sg, tips, cg | - | ✓
23. government college university faisalabad | https://library.gcuf.edu.pk/ | ✓ ✓ ✓ ✓ | g, cn, p | na | ke, tt, at, as | su | na | opac, bd, ft, j | na | - | ✓
24. library of government college for women university | https://www.gcwus.edu.pk/library/ | ✓ ✓ ✓ ✓ | c, rs, p, cn, oh | na | as, ke, at, tt, su | ca, ke, at, tt, su | na | bd, ft, j | na | - | ✓
25. library of greenwich university | https://www.gre.ac.uk/it-and-library/library | ✓ ✓ ✓ ✓ | c, rs, g, ne, cn, faq, lu, mp, cl, vt, p, wc, oh | fb, t, i, yt | as, at | ca | em, ln, de, rq, ch, fb, sp | b, bd | ot, wb | ✓ | -
26. library of hitec university | http://111.68.98.204/libmax/opac/index.aspx | ✓ ✓ ✓ ✓ | c, rs, cn, faq, p, oh | na | ke | ke | na | b, j, opac | na | - | ✓
27. library of habib university | https://habib.edu.pk/library/ | ✓ ✓ ✓ ✓ | oh, c, rs, cn | fb, t, i, yt | ke | ke | al, em, ln, aw, nw, de, ch | opac, j, ft, b, mg | na | - | ✓
28. library of hamdard university | http://library.hamdard.edu.pk/ | ✓ ✓ ✓ ✓ | c, rs, cn, faq, p, oh | r, t, fb | tt, at, su, ke | ca, ke | al, de | opac, bd, j, b | infl, sg | - | ✓
29. panjab elibrary | https://elibrary.punjab.gov.pk/ | ✓ ✓ ✓ ✓ | c, rs, cn, faq, g, ne, p, oh, mp | fb, t, yt | tt, at, su, as | tt, at, su, ca | fb | opac, bd, mg, ft, b, j | infl | - | ✓
30. library of ilma university | https://ilmauniversity.edu.pk/digitallibrary | ✓ ✓ ✓ ✓ | c, mp | na | na | na | na | bd, ft, j, mg, or | na | - | ✓
31. library of iqra national university | https://iqra.edu.pk/library/ | ✓ ✓ ✓ ✓ | c, oh, cn | na | na | na | na | na | na | - | ✓
32. library of international islamic university | https://www.iiu.edu.pk/?page_id=171 | ✓ ✓ ✓ ✓ | c, ne, oh, cn, rs | fb, t | at, tt, su | at, tt, su | fb | o, bd, opac | na | - | ✓
33. library of institute of space technology | https://www.ist.edu.pk/library | - | na | na | na | na | na | na | na | - | ✓
34. library of institute of southern punjab | https://isp.edu.pk/libraryitsupport | - | na | na | na | na | na | na | na | - | ✓
35. library of islamia university punjab | http://library.iub.edu.pk/ | ✓ ✓ ✓ ✓ | oh | na | at, tt, su, ke, as | at, tt, su, ke, ca | rc, em, de | opac | na | - | ✓
36. library of isra university | https://isra.edu.pk/library/ | ✓ | na | na | tt, su, at, as | tt, su, at, ca | na | opac | na | - | ✓
37. library of jinnah sindh medical university | http://www.jsmu.edu.pk/faciltieslibrary.html | - | na | na | na | na | na | na | na | - | ✓
38. library of khyber medical university | https://www.kmc.edu.pk/new/library/ | - | na | na | na | na | na | na | na | - | ✓
39. library of king edward medical university | https://kemu.edu.pk/library | - | g, oh | na | na | na | na | na | na | - | ✓
40. library of lahore college for women university | http://www.lcwu.edu.pk/lcwu-library-research-websites.html | ✓ ✓ ✓ ✓ | g, rs, faq, p | na | na | na | na | bd | na | - | ✓
41. library of lahore university of management sciences | https://library.lums.edu.pk/ | ✓ ✓ ✓ ✓ | ne, cn, vt, oh | fb, i | ke, at, tt | ke, at, tt, su, ca | al, ch | bd, j, b | infl, tips, rg | - | ✓
42. library of mehran university of engineering & technology | http://library.muet.edu.pk/index.php | ✓ ✓ ✓ ✓ | c, ne, cn, oh | fb, yt, t, i, r, b | at, tt, su, ke, as | at, tt, su, ca | al, de, ln | bd, j, b, opac, or | infl, rg | - | ✓
43. library of minhaj university | https://library.mul.edu.pk/ | ✓ ✓ ✓ ✓ | c, ne, mp, rs, cn, oh, g | fb, t, yt | at, tt, su, ke, as | at, tt, su, ca | ln, de | b, j, or, bd | infl, pg, cg, rg | - | ✓
44. library of mirpur university of science & technology | https://cms.must.edu.pk:8083/forms/default.aspx | ✓ ✓ ✓ ✓ | c, oh, cn | na | at, tt, su, as, ke | at, tt, su, ke, ca | na | b | na | - | ✓
45. library of mohammad ali jinnah university | http://ils.jinnah.edu/ | ✓ ✓ ✓ ✓ | c, oh, cn | na | at, tt, su, ke, as | at, tt, su, ca | na | b, j, bd | na | - | ✓
46. engr. abul kalam library, ned university of engineering & technology | https://library.neduet.edu.pk/ | ✓ ✓ ✓ ✓ | c, cn | na | ke, au, tt | ke, at, tt | na | b, j, mg, bd | cg | - | ✓
47. library of namal institute, mianwali | http://library.namal.edu.pk/ | ✓ ✓ ✓ ✓ | ne, cn, oh | na | at, tt, su, ke, as | at, tt, su, ca | na | j, b, bd, mg | na | - | ✓
48. library of national defense university | http://111.68.99.107/libmax/opac/index.aspx | ✓ ✓ ✓ ✓ | c, cn, oh | na | ke, as | ke, ca | na | b, j, mg | na | - | ✓
49. library of national textile university | http://ntu.edu.pk/library/ | ✓ ✓ ✓ ✓ | cn, oh, ne, faq | fb | tt, su, at, as | tt, su, at, ca | na | b, bd, j, opac | cg, infl | - | ✓
50. library of national university of sciences & technology | http://www.nust.edu.pk/library/pages/default.aspx | ✓ ✓ ✓ ✓ | cn, mp, c, g, oh, vt, faq | na | at, tt, su, ke, as | at, tt, su, ke, ca | na | b, bd, j, opac | infl | - | ✓
51. library of peoples university of medical & health sciences for women | http://opac.pumhs.edu.pk/ | ✓ ✓ ✓ ✓ | cn, mp, c, g, oh, vt, faq | na | at, tt, su, ke, as | at, tt, su, ke, ca | na | b, bd, j, opac | infl | - | ✓
52. library of shaheed benazir bhutto university sheringal, dir upper, pakistan | http://142.54.178.188:5229/ | ✓ ✓ ✓ ✓ | na | na | at, tt, su, ke, as | at, tt, su, ke, ca | na | b, bd, j, opac | na | - | ✓
53. library of shaheed zulfikar ali bhutto institute of science & technology | https://szabist.edu.pk/szabistlibrary/ | ✓ ✓ ✓ | cn, mp, c, g, oh, vt, ne, faq, p | na | ke, as | at, tt, su, ke, ca | al, em | b, j | na | - | ✓
54. library of sir syed case institute of technology | https://case.edu.pk/library/default.aspx | ✓ ✓ ✓ ✓ | oh, cn | fb, t | tt, at, ke, su, as | tt, ca | na | b, o | na | - | ✓
55. library of the islamia college, peshawar | http://142.54.178.188:5209 | ✓ ✓ ✓ ✓ | na | na | na | na | na | na | na | - | ✓
56. library of university of balochistan | http://web.uob.edu.pk/uob/departments/library/library.php | ✓ ✓ ✓ ✓ | cn, mp, c | na | ke | ke | na | b | na | - | ✓
57. library of the university of agriculture peshawar | http://www.aup.edu.pk/library.php | - | na | na | na | na | na | na | na | - | ✓
58. library of university of buner | https://www.ubuner.edu.pk/library | ✓ | oh, g, c | na | na | na | na | na | na | - | -
59. library of university of central punjab | http://library.ucp.edu.pk/ | ✓ ✓ ✓ ✓ | oh, g, c, rs, ne, cn, mp, vt | fb | tt, at, as | tt, at, ca | na | b, mg, j, bd | cg, infl | - | ✓
60. library of university of engineering & technology khyber pakhtunkhwa | https://www.uetpeshawar.edu.pk/library.php | ✓ | na | na | na | na | na | na | na | - | ✓
61. library of university of engineering technology lahore | http://library.uet.edu.pk/ | ✓ ✓ ✓ ✓ | na | r | at, tt, su, ke, as | ke, ca, at, tt | al | b, j, bd | na | - | ✓
62. library of university of engineering & technology, taxila | https://www.uettaxila.edu.pk/library.aspx | ✓ ✓ ✓ ✓ | cn, rs, ne, oh, c | t | at, tt, su, ke, as | at, tt, su, ke, ca | al | b, bd, j | na | - | ✓
63. library of university of haripur | http://www.uoh.edu.pk/central-library.php?page=mjyx | ✓ ✓ ✓ ✓ | cn, rs, ne, oh, c | na | ke | at, tt, su, ke, ca | na | b, bd, j, o | na | - | ✓
64. library of university of karachi | http://www.uok.edu.pk/library/index.php | ✓ ✓ ✓ ✓ | cn, rs, ne, oh, c, mp | na | ke | at, tt, su, ke, ca | na | b, bd, j, o | na | - | ✓
65. library of university of management & technology | https://library.umt.edu.pk/home.aspx | ✓ ✓ ✓ ✓ | cn, rs, ne, oh, c, mp | fb, t | at, tt, su, ke, as | at, tt, su, ke, ca | al, em | b, bd, j | na | - | ✓
66. online catalogue, central library, university of sargodha | http://142.54.178.188:5157/ | ✓ ✓ ✓ ✓ | na | na | at, tt, su, ke, as | at, tt, su, ke, ca | na | b, bd, j | na | - | ✓
67. library of university of south asia | https://library.usa.edu.pk/ | ✓ ✓ ✓ ✓ | cn, rs, lu, oh, c | na | at, tt, su, ke, as | at, tt, su, ke, ca | al, rq | b, bd, j | na | - | ✓
68. library of university of the punjab | https://pulibrary.edu.pk/ | ✓ ✓ ✓ ✓ | cn, rs, oh, c | fb | tt, at, as, ke | tt, ca | al, ch, em | bd, b, j, o, opac-o | rg, sg, cg | - | ✓
69. library of zia-uddin university | https://zu.edu.pk/academics/library/ | ✓ ✓ ✓ ✓ | cn, rs, oh, c, g, ne, p, lu | na | at, tt, su, ke, as | at, tt, su, ke, ca | na | bd, j, opac-o | na | - | ✓
70. library of cabinet division, islamabad | http://ndw.gov.pk/index.html | ✓ ✓ ✓ ✓ | cn, rs, oh, c, faq, g, ne, p, lu | na | na | na | na | na | na | - | -
71. elibrary, government of the punjab | https://elibrary.punjab.gov.pk/ | ✓ ✓ ✓ ✓ | mp | fb, t, yt | at, tt, su, ke, as | at, tt, su, ke, ca | na | bd, b, j, o, opac-o | na | - | -
72. hec digital library | http://hecpk.summon.serialssolutions.com/ | ✓ ✓ ✓ ✓ | na | na | ke, as | at, su, ca | na | b, o, j, mg | na | - | ✓
73. bahauddin zakariya university (bzu), multan | http://library.bzu.edu.pk | ✓ ✓ ✓ ✓ | na | na | na | na | na | b, j | na | - | ✓
74. begum nusrat bhutto women university, sukkur | http://143.244.157.171 | ✓ ✓ ✓ | na | fb, i | opac, at, tt, su, ke, as | at, tt, su, ke, ca | na | b, j | na | - | ✓
75. cecos university of information technology & emerging sciences | http://sites.google.com/view/librarycup/home | ✓ ✓ ✓ | lu, cn, oh | na | opac, at, tt | at, tt | na | b | na | - | ✓
76. dha suffa university | http://dclkarachi.com | ✓ ✓ ✓ ✓ | c, cn, mp | fb, t | tt, at, su, ke | tt, at, su, ke | na | b | na | - | ✓
77. institute of business management | https://iobm.daphnis.opalsinfo.net/bin/home | ✓ ✓ ✓ ✓ | c, cn, lu | r | opac, at, tt, ke, as, su | ca, at, tt, su | na | b | na | - | ✓
78. jinnah university for women | https://www.juw.edu.pk/campusfacilities/library-1/ | ✓ ✓ ✓ | c, cn | na | na | na | na | na | na | - | ✓
79. khawaja freed university of engineering & information technology, rahim yar khan | https://kfueit.edu.pk/aboutlibrary?1=1&menu=sidelink?main=840&main=859&parent=facilities | ✓ ✓ ✓ | c, cn, oh | fb, t, yt | na | na | na | na | na | - | ✓
80. kinnaird college for women, lahore | http://www.kinnaird.edu.pk/library-3/ | ✓ ✓ ✓ | cn, faq | na | na | na | na | or | na | - | -
81. lahore leads university | https://leads.edu.pk/libraries-.php | ✓ ✓ ✓ | cn, c | fb, t | opac, at, tt, ke, as, su | ca, at, tt, su | na | b | na | - | ✓
82. minhaj university | https://lrc.mul.edu.pk/ | ✓ ✓ ✓ ✓ | c, cn, mp | fb, t, yt | opac, su, ke | ca | na | b | na | - | ✓

endnotes

1 younghee noh and rosa chang, "international collaboration in library and information science research in korea," international journal of knowledge content development & technology 9, no. 2 (2019): 91–110, https://doi.org/10.5865/ijkct.2019.9.2.091; muhammad abbas ganaee and muhammad rafiq, "pakistani university library web sites: features, contents, and maintenance issues," journal of web librarianship 10, no. 4 (2016): 294–315, https://doi.org/10.1080/19322909.2016.1195308.

2 noh and chang, "international collaboration in korea," 95.

3 ganaee and rafiq, "pakistani university library web sites," 294.

4 in this study, we evaluated the websites of the libraries of public and private sector universities and research institutes. these websites are listed on the digital library website of hec, pakistan, available at http://www.digitallibrary.edu.pk/institutes.php.

5 alessandro liberati et al., "the prisma statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration," journal of clinical epidemiology 6, no. 7 (2009): e1–e34, https://doi.org/10.1016/j.jclinepi.2009.06.006.

6 michael gusenbauer, "google scholar to overshadow them all? comparing the sizes of 12 academic search engines and bibliographic databases," scientometrics 118, no. 1 (2019): 177–214, https://doi.org/10.1007/s11192-018-2958-5.

7 liberati et al., "prisma," e9.

8 liberati et al., "prisma," e1.

9 liberati et al., "prisma," e5.

10 irfan ullah and shah khusro, "social book search: the impact of the social web on book retrieval and recommendation," multimedia tools and applications 79, no. 11 (2020): 8011–60, https://doi.org/10.1007/s11042-019-08591-0; liberati et al., "prisma," e9–e10.

11 liberati et al., "prisma," e5.

12 charlene l. al-qallaf and alaa ridha, "a comprehensive analysis of academic library websites: design, navigation, content, services, and web 2.0 tools," international information & library review 51, no. 2 (2019): 93–106, https://doi.org/10.1080/10572317.2018.1467166; rozalynd p. mcconnaughy and steven p. wilson, "content and design features of academic health sciences libraries' home pages," medical reference services quarterly 37, no. 2 (2018): 153–67, https://doi.org/10.1080/02763869.2018.1439219; gricel dominguez, sarah j.
editorial board thoughts: the ten commandments of interacting with nontechnical people
mark dehmlow
information technology and libraries | june 2009
mark dehmlow (mdehmlow@nd.edu) is digital initiatives librarian, hesburgh libraries, university of notre dame, notre dame, indiana.

more than ten years of working with technology and interacting with nontechnical users in a higher education environment has taught me many lessons about successful communication strategies. somehow, in that time, i have been fortunate to learn some effective mechanisms for providing constructive support and leading successful technical projects with both technically and "semitechnically" minded patrons and librarians. i have come to think of myself as someone who lives in the "in between," existing more in the beyond than the bed or the bath, and, while not a native of either place, i like to think that i am someone who is comfortable in both the technical and traditional cliques within the library.
ironically, it turns out that the most critical pieces to successfully implementing technology solutions and bridging the digital divide in libraries have been categorically nontechnical in nature; it all comes down to collegiality, clear communication, and a commitment to collaboration. as i ruminated on the last ten-plus years of working in technology, i began to think of the behaviors and techniques that have proved most useful in developing successful relationships across all areas of the library. the result is this list of the top ten dos and don'ts for those of us self-identified techies who are working more and more often with the self-identified nontechnical set.
1. be inclusive—i have been around long enough to see how projects that include only technical people are doomed to scrutiny and criticism. the single best strategy i have found for getting buy-in for technical projects is to include key stakeholders and those with influence in project planning and core decision-making. not only does this create support for projects, but it encourages others to have a sense of ownership in project implementation—and when people feel ownership for a project, they are more likely to help it succeed.
2. share the knowledge—i don't know if it is just the nature of librarianship, but librarians like to know things, and more often than not they have a healthy sense of curiosity about how things work. i find it goes a long way when i take a few moments to explain how a particular technology works. our public services specialists, in particular, often want to know the details of how our digital tools work so that they can teach users most effectively and answer questions users have about how they function. sharing expertise is a really nice way to be inclusive.
3. know when you have shared enough—in the same way that i don't need to know every deep detail of collections management to appreciate it, most nontechies don't need hour-long lectures on how each component of technology relates to the other. knowing how much information to share when describing concepts is critical to keeping people's interest and generally keeping you approachable.
4. communicate in english—it is true that every specialization has its own vocabulary and acronyms (oh how we love acronyms in libraries) that have no relevance to nonspecialists. i especially see this in the jargon we use in the library to describe our tools and services. the best policy is to avoid jargon and explain concepts in lay-person's terms or, if using jargon is unavoidable, define specialized words in the simplest terms possible. using analogies and drawing pictures can be excellent ways to describe technical concepts and how they work. it is amazing how much from kindergarten remains relevant later in life!
5. avoid techno-snobbery—i know that i am risking virtual ostracism in writing this, but i think it needs to be said. just because i understand technology does not make me better than others, and i have heard some variant of the "cup holder on the computer" joke way too often. even if you don't make these kinds of comments in front of people who aren't as technically capable as you, the attitude will be apparent in your interactions, and there is truly nothing more condescending.
6. meet people halfway—when people are trying to ask technology-related questions or converse about technical issues, don't correct small mistakes.
instead, try to understand and coax out their meaning; elaborate on what they are saying, and extend the conversation to include information they might not be aware of. people don't like to be corrected or made to feel stupid—it is embarrassing. if their understanding is close enough to the basic idea, letting small mistakes in terminology slide can create an opening for a deeper understanding. you can provide the correct terminology when talking about the topic without making a point to correct people.
7. don't make a clean technical/nontechnical distinction—after once offering the "technical" perspective on a topic, one librarian said to me that it wasn't that they themselves didn't have any technical perspective, it just wasn't perhaps as extensive as mine. each person has some level of technical expertise; it is better to encourage the development of that understanding rather than compartmentalizing people on the basis of their area of expertise.
8. don't expect everyone to be interested—just because i chose a technical track and am interested in it doesn't mean everyone should be. sometimes people just want to focus on their area of expertise and let the technical work be handled by the techies.
9. assume everyone is capable—at least at some level. sometimes it is just a question of describing concepts in the right way, and besides, not everyone should be a programmer. everyone brings their own skills to the table, and that should be respected.
10. expertise is just that—and no one, no one knows everything. there just isn't enough time, and our brains aren't that big. embrace those with different expertise, and bring those perspectives into your project planning. a purely technical perspective, while perhaps efficient, may not provide a practical or intuitive solution for users. diversity in perspective creates stronger projects.
in the same way that the most interesting work in academia is becoming increasingly multidisciplinary, so too the most successful work in libraries needs to bring diverse perspectives to the fore. while it is easy to say libraries are constantly becoming more technically oriented because of the expanse of digital collections and services, the need for the convergence of the technical and traditional domains is clear—digital preservation is a good example of an area that requires the lessons and strengths learned from physical preservation, and, if anything, the technical aspects still raise more questions than solutions—just read henry newman's article "rocks don't need to be backed up" to see what i mean.1 increasingly, as we develop and implement applications that better leverage our collections and highlight our services, their success hinges on their usability, user-driven design, and implementations based on user feedback. these "user"-based evaluation techniques fit more closely with traditional aspects of public services: interacting with patrons. lastly, it is also important to remember that technology can be intimidating. it has already caused a good deal of anxiety for those in libraries who are worried about long-term job security as technology continues to initiate changes in the way we perform our jobs.
one of the best ways to bring people along is to demystify the scary parts of technology and help them see a role for themselves in the future of the library. going back to maslow's hierarchy of needs, people want to feel a sense of security and belonging, and i believe it is incumbent upon those of us with a deep understanding of technology to help bring the technical to the traditional in a way that serves everyone in the process.

reference
1. henry newman, "rocks don't need to be backed up," enterprisestorageforum.com (mar. 27, 2009), www.enterprisestorageforum.com/continuity/features/article.php/3812496 (accessed april 24, 2009).

first aid training for those on the front lines: digital preservation needs survey results 2012
jody deridder
information technology and libraries | june 2013
jody l. deridder (jlderidder@ua.edu) is head of digital services at the university of alabama libraries, tuscaloosa.

"the dilemma for the cultural heritage preservation community derives from the lag between immediate need and the long-term transformation of digital preservation expertise."1

introduction
every day history is being made and recorded in digital form. every day, more and more digitally captured history disappears completely or becomes inaccessible due to obsolescence of hardware, software, and formats.2 although it has long been the focus of libraries and archives to retain, organize, and preserve information, these communities face a critical skills gap.3 further, the typical library cannot support a true, trusted digital repository compliant with the open archival information system (oais) framework.4 until we have in place the infrastructure, expertise, and resources to distill critical information from the digital deluge and preserve it appropriately, what steps can those in the field take to help mitigate the loss of our cultural heritage? the very "scale of the digital landscape makes it clear that preservation is a process of triage."5 while educational systems across the country are scrambling to develop training programs to address the problem, it will be years, if ever, before every cultural heritage institution has at least one of these formally trained employees on staff. librarians and archivists already in place are wondering what they can do in the meantime. those on the front lines of this battle to save our cultural history need training. surrounded by content being digitized, by digital content coming into special collections and archives, and by content creators needing assistance with their research and scholarship, these archivists and librarians need to know what they can do to prevent more critical loss. even if developing a preservation program is limited to ensuring the digital content survives long enough to be collected by some better-funded agency, capturing records in open-standard, interoperable, technology-neutral formats would help to ease later ingest of such content into a trusted digital repository.6 as molinaro has pointed out, those in the field need "the knowledge and skills to ensure that their projects and programs are well conceived, feasible, and have a solid sustainability plan."7 for those on the front lines, digital preservation education needs to be accessible, practical, and targeted to an audience that may have little technical expertise. since "resources for preservation are meager in small and medium-sized heritage organizations,"8 such training needs to be free or as low-cost as possible.
in an effort to address these needs, the library of congress established the digital preservation outreach & education (dpoe) train-the-trainer network.9 in six one-hour modules,10 this training provides a basic overview of the framework necessary to begin to develop a digital preservation program. the modules formed the basis for three well-attended aserl webinars in february 2012.11 attendee feedback after the webinars indicated a deep need for practical, detailed instruction for those in the field. this article reports on the results of a follow-up survey, conducted in the fall of 2012, to identify the topics and types of materials most important to webinar attendees and their institutions for digital preservation.

approach
the survey was open from october 2 until december 15, 2012. invitations to participate were sent to the following discussion lists: society of american archivists (saa) archives & archivists (a&a), saa preservation section discussion list, saa metadata and digital object round table discussion list, digital-curation (google group), digital library federation (dlf-announce), and the library of congress digital preservation and outreach (dpoe) general listserv. each invitation clarified that respondents need not be association of southeastern research libraries (aserl) members in order to attend the free webinars or to participate in the survey. the survey consisted of three questions, the first to determine the sources of digital content most important for respondents' institutions to preserve, and the second to identify the topics of greatest concern to respondents themselves. for these two questions, respondents were asked to rate the options as:
• extremely important
• somewhat important
• maybe of value
• not important at all
the first two questions are as follows:
please rate the following sources of digital content in terms of importance for preservation at your institution:
• born-digital institutional records
• born-digital special collections materials
• digitized collections
• digital scholarly content (institutional repository or grey literature)
• digital research data
• web content
• other
please rate the following topics in terms of importance to you, for inclusion in future training webinars:
• how to inventory content to be managed for preservation
• developing selection criteria, and setting the scope for what your institution commits to preserving
• selecting storage options and number of copies
• determining what metadata to capture and store
• methods of preservation metadata extraction, creation, and storage
• legal issues surrounding access, use, migration, and storage
• selecting file formats for archiving
• validating files and capturing checksums
• monitoring status of files and media
• file conversion and migration issues
• business continuity planning
• security and disaster planning at multiple levels of scope
• self-assessment and external audits of your preservation implementation
• developing your institution's preservation policy and planning team
• planning for provision of access over time
• other
after each of these questions, respondents were provided a free-text field in which to add additional entries related to the "other" option. the last question on the survey asked respondents whether they are members of an aserl institution, since aserl is supporting this series of webinars.
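as an aside (not part of the original study), the percentage figures reported in the results below are simple tallies of these ratings, and a minimal sketch of that tallying may make the reporting concrete. the csv layout, column names, and file name here are assumptions for illustration only:

```python
# illustrative sketch only: tally likert-style ratings the way the results
# below report them. assumes a hypothetical export "responses.csv" with one
# row per respondent and one column per survey option, each cell holding one
# of the four rating labels (or empty if the respondent skipped it).
import csv
from collections import Counter

EXTREME = "extremely important"

def tally(path):
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return {}
    summary = {}
    for option in rows[0]:
        counts = Counter(r[option] for r in rows if r[option])
        answered = sum(counts.values())  # skipped cells are excluded
        if answered:
            summary[option] = (counts[EXTREME], 100 * counts[EXTREME] / answered)
    return summary

if __name__ == "__main__":
    for option, (n, pct) in sorted(tally("responses.csv").items(),
                                   key=lambda kv: -kv[1][1]):
        print(f"{option}: {pct:.1f} percent ({n} respondents)")
```

sorting by the computed percentage, as above, reproduces rankings of the kind reported in the results section.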
results
of the 182 respondents, 37 (20.7 percent) self-identified as aserl members, 142 (79.3 percent) as non-aserl members, and three skipped the question. all respondents answered the first two queries.

sources of digital content
for the complete set of respondents, the top three types of material considered extremely important for preservation were born-digital special collections materials (65 percent, 117 respondents), born-digital institutional records (62.7 percent, 111 respondents), and digitized collections (61.2 percent, 109 respondents). digital scholarly content, digital research data, and web content trailed in importance, rated extremely important by only 37 percent (64 respondents), 33.9 percent (59 respondents), and 30.6 percent (52 respondents) respectively. in clarification, one respondent listed "born-digital correspondence (e-mail)," another listed "state government digital archival records," a third asked for instructions for use of "kodak's new asset protection film for preservation of moving and still images," and one specified that by "special collections" she meant "audiovisual." the concern for a/v materials was echoed by some of the eight respondents suggesting other content as extremely important: "born-digital moving image preservation" (an aserl respondent), "best practices for preservation of different audio and video formats" (also an aserl respondent), "born digital photographs and video of college events," and a request for an "audio digitization workshop." additional "other" entries were copyright pitfalls, data security, and "very practical steps that very small institutions can take to preserve their digital materials (e.g. how to check digital integrity, and how often, selection of storage media, and creation of a 'dark archive')." one aserl respondent indicated that she did not rate "born digital" institutional and special collections materials as extremely important for preservation only because her institution does not yet have a system set up for these, nor do they yet collect many born-digital special collections. she clarified that she does think this is extremely important despite the seeming lack of interest on the part of her institution.

figure 1. results for all survey respondents indicating sources of digital content of importance for preservation at their institution.

in comparing the responses to the first question by whether the respondents self-identified as members of an aserl institution (37 respondents as opposed to 142), those who did considered born-digital special collections materials far more important (73 percent, 27 respondents) than non-aserl respondents (62.9 percent, 88 respondents), but this still was rated most important by both groups. second for aserl respondents was digitized collections (69.4 percent, 25 respondents), whereas born-digital institutional records held second place for non-aserl respondents (62 percent, 85 respondents). third- and fourth-ranked material sources for aserl respondents were born-digital institutional records (64.9 percent, 24 respondents) and digital scholarly content (63.9 percent, 23 respondents); digital research data rated only 52.8 percent (19 respondents).
non-aserl respondents considered digitized collections the third most important source of digital content for preservation (59.7 percent, 83 respondents), and this group of respondents was far less concerned with digital scholarly content (29.9 percent, 40 respondents) or digital research data (29.6 percent, 40 respondents) than the aserl respondents were. web content ranked lowest for both groups: 29.4 percent (10) of aserl respondents and 30.6 percent (41) of non-aserl respondents considered this content extremely important.

figure 2. results for aserl survey respondents indicating sources of digital content of importance for preservation at their institution.

figure 3. results for non-aserl survey respondents indicating sources of digital content of importance for preservation at their institution.

perhaps most surprising was that 20 non-aserl respondents (14.8 percent) rated digital research data as "not important at all" for preservation at their institutions, but this may be reflective of their type of institution. museums and historical societies, non-research institutions, and government agencies likely are not concerned with research data; this theory seems to be supported by the 12.7 percent (17) of non-aserl respondents who rated digital scholarly content as "not important at all." in comparison, only one aserl respondent (2.8 percent) indicated that research data had no importance to his institution for preservation (zero for digital scholarly content). this may simply reflect a lack of awareness of current issues on the part of the respondent.

topics of interest

both groups of respondents agreed on the three most important topics for future training webinars. "methods of preservation metadata extraction, creation and storage" led the way, with 77.3 percent (140 respondents: 70.3 percent or 26 aserl and 79.4 percent or 112 non-aserl) listing this as extremely important. next was "determining what metadata to capture and store" (68 percent, 96 respondents: 62.2 percent or 23 aserl and 66.7 percent or 120 non-aserl). the third most important topic was "planning for provision of access over time" at 65.4 percent (117 respondents: 61.1 percent or 22 aserl and 65.7 percent or 92 non-aserl).

figure 4. results for all survey respondents indicating topics of importance to them, for future training webinars.

fourth in importance overall was "file conversion and migration issues" (58.8 percent, 107 respondents: 54.1 percent or 20 aserl and 60.6 percent or 86 non-aserl), though the aserl respondents thought this topic was slightly less critical than "developing selection criteria, and setting the scope for what your institution commits to preserving" (56.8 percent, 21 aserl respondents, as opposed to 49.6 percent or 70 non-aserl respondents; overall 51.9 percent, 94 respondents). close in relative importance were "validating files and capturing checksums" (53.9 percent, 97 respondents), "monitoring status of files and media" (52.8 percent, 95 respondents), and "developing your institution's preservation policy and planning team" (51.1 percent, 92 respondents). interestingly, however, "validating files and capturing checksums" is far more important to non-aserl respondents (53.6 percent, 75 respondents) than to those from aserl institutions (only 37.8 percent, 14 respondents).
"legal issues surrounding access, use, migration and storage" is a more important topic for aserl respondents (51.4 percent, 19 respondents) than for non-aserl respondents (42.8 percent, 77 respondents), and aserl respondents were more concerned (37.8 percent, 14 respondents) than non-aserl respondents (33.1 percent, 46 respondents) with "self-assessment and external audits." additionally, "selecting file formats for archiving" and "selecting storage options and number of copies" are more important for non-aserl respondents (47.5 percent, 67 respondents, and 47.9 percent, 67 respondents) than for aserl respondents (35.1 percent, 13 respondents, and 32.4 percent, 12 respondents, respectively).

figure 5. results for aserl survey respondents indicating topics of importance to them, for future training webinars.

"security and disaster planning" was ranked extremely important by only 32.6 percent (45) of respondents overall, followed by "business continuity planning" at only 29.2 percent (40). the latter may reflect a lack of widespread awareness of just how critical the loss of a single key employee can be, especially in smaller institutions. it also seems clear that there is a level of complacency, or a sense of security about our ephemeral digital content, that may be in error. then again, it is quite possible that the respondents are not administrators and feel they do not have the power in their organizations to address such issues.

figure 6. results for non-aserl survey respondents indicating topics of importance to them, for future training webinars.

additional topics considered extremely important to respondents are as follows, listed in the free-text area (the last four by aserl members):

• "clean" workstation setup—hardware and software for ingest, virus scan, checksum, disk image, metadata, conversion, etc.
• integrating tools into your workflow. there is a need to address the nuts and bolts for those of us who are further along in determining the metadata required to capture, selection criteria, and asset audit and preservation policy.
• methods for providing researchers access to born-digital content (not necessarily online; could be just in-house).
• strategies for locating digital assets on physical media in large collections that have been using mplp ["more product, less process"] for decades.
• format determination and successful migration or emulation.
• staff diversity and training.
• how to validate files, migrate files, and which born-digital institutional files our special collections needs to be preserving.
• creating and maintaining effective organizational models for digital preservation (i.e., collaboration with central it and/or external vendors, etc.).
• case studies of digital preservation, establishing workflow of digital preservation.
• web archiving (best practices, alternatives to archive-it, methods of selection, etc.).

one (non-aserl) respondent said it was "somewhat important" to include the topic of "trends for field, future outlook."

conclusions

the results from this survey are clear: free or low-cost training needs to focus immediately on preservation of born-digital special collections materials, born-digital institutional records, and digitized collections.
the topics of prime importance to respondents were "methods of preservation metadata extraction, creation and storage," "determining what metadata to capture and store," and "planning for provision of access over time." the variations in ratings between respondents self-identifying as aserl members and non-aserl members indicate that the needs of those in research libraries differ somewhat from those of cultural heritage institutions in the field dealing with "the long tail" of digital content.12 future training may need to target these differing audiences appropriately to ensure these needs are met. additionally, administrators need to be addressed as a unique audience in order to focus on the requirements for addressing "security and disaster planning" and "business continuity planning," as these critical areas need to be developed by those in management positions. future surveys of this nature should include a component to determine the level of technical expertise and support the respondents have, as well as a measure of their position or power in the administrative hierarchy. continued surveys would be extremely helpful in ensuring that available educational options meet the needs of librarians and archivists in the field.

as molinaro has pointed out, "getting the right information in the right hands at the right time is a problem that has plagued the library community for decades."13 now is the time to develop free, openly available, practical digital preservation training for those on the front lines, if we are to retain critical cultural heritage materials that are only available in digital form. for them to effectively perform necessary triage on incoming digital content, they must be trained in "first aid." our history is at stake.

references

1. paul conway, "preservation in the age of google: digitization, digital preservation, and dilemmas," library quarterly 80, no. 1 (january 2010): 73–74, doi:10.1086/648463.
2. clifford lynch, "challenges and opportunities for digital stewardship in the era of hope and crisis" (keynote speech, is&t archiving 2009 conference, arlington, virginia, may 2009).
3. karen f. gracy and miriam b. kahn, "preservation in the digital age," library resources and technical services 56, no. 1 (2012): 30.
4. marshall breeding, "from disaster recovery to digital preservation," computers in libraries 32, no. 4 (2012): 25.
5. mike kastellec, "practical limits to the scope of digital preservation," information technology & libraries 31, no. 2 (2012): 70, doi:10.6017/ital.v31i2.2167.
6. charles dollar and lori ashley, "digital preservation capability maturity model," ver. 2.4 (november 2012), https://docs.google.com/file/d/0bwbqtwrvkhokrxnvnmhxtmo2suu/edit?pli=1 (accessed december 24, 2012).
7. mary molinaro, "how do you know what you don't know? digital preservation education," information standards quarterly 22, no. 2 (2010): 45.
8. conway, "preservation in the age of google," 70.
9. library of congress, "digital preservation outreach & education: dpoe background," accessed december 31, 2012, www.digitalpreservation.gov/education/background.html.
10. library of congress, "digital preservation outreach & education: dpoe curriculum," accessed december 31, 2012, www.digitalpreservation.gov/education/curriculum.html.
11. jody l. deridder, "introduction to digital preservation—a three-part series based on the digital preservation, outreach and education (dpoe) model," association of southeastern research libraries, 2012, [archived webinars], accessed december 31, 2012, www.aserl.org/archive.
12. jody l. deridder, "benign neglect: developing life rafts for digital content," information technology & libraries 30, no. 2 (june 2011): 71–74.
13. molinaro, "how do you know what you don't know?" 47.

letter from the editors (march 2022)
kenneth j. varnum and marisha c. kelly
information technology and libraries | march 2022
https://doi.org/10.6017/ital.v41i1.14881

our first issue of 2022 brings the welcome appointment of marisha c. kelly as assistant editor for the journal. marisha is reference and instruction librarian at northcentral university, where her job duties include planning, developing, integrating, implementing, and maintaining digital systems and services. she has a bachelor of science in journalism from syracuse university, a master of science in library and information science from drexel university, and is currently pursuing a master of science in information technology from northcentral university.

contribute to the journal

are you interested in furthering the scholarly record for library technology and have a background in information technology in libraries, archives, or museums? i would assume the answer is "yes" if you are reading this issue. ital needs new editorial board members to fill vacancies starting in july. joining the board is an exciting way for members of core to contribute to the profession and engage with colleagues across all types of organizations in examining the role of technology in libraries, archives, and museums. we are especially interested in applications from those in underrepresented groups and identities and encourage all interested individuals to apply. please see the full call for nominations for more information and details on how to apply.

we also encourage all library technologists to consider submitting articles for publication. our call for submissions outlines the topics and process for submitting an article for review. if you have questions or wish to bounce ideas off the editor and assistant editor, please contact either of us at the email addresses below.

in this issue

in the final thought-provoking editorial board thoughts column ("policy before technology—don't outkick the coverage") of his editorial board term, brady lund writes about the risks of adopting new technologies before thinking through the possible policy and practical implications of offering them. we likewise highly recommend the peer-reviewed content in this issue:

1. using dpla and the wikimedia foundation to increase usage of digitized resources / dominic byrd-mcdevitt and john dewees
2. researchgate metrics' behavior and its correlation with rg score and scopus indicators / saeideh valizadeh-haghi, hamed nasibi-sis, maryam shekofteh, and shahabedin rahmatizadeh
3. balancing community and local needs: releasing, maintaining, and rearchitecting the institutional repository / daniel coughlin
4. using open access institutional repositories to save the student symposium during the covid-19 pandemic / allison symulevich and mark hamilton
5. migration of ict-based services of a research library to a cloud platform / francis jayakanth, ananda t. byrappa, and filbert minj
6. local hosting of faculty-created open education resources / joseph letriz

kenneth j. varnum, editor (varnum@umich.edu)
marisha c. kelly, assistant editor (mkelly@ncu.edu)

public libraries leading the way
automating the diversity audit process
rachel k. fischer
information technology and libraries | september 2023
https://doi.org/10.5860/ital.v42i3.16925
rachel k. fischer (rfischer@ccslib.org) is the member services librarian at cooperative computer services, a public library consortium in illinois. © 2023.

introduction

i've frequently come across the buzzwords "mirrors and windows" at conference sessions or in articles on collection development and diversity audits. this striking metaphor refers to how books and other items in libraries are windows into other cultures and mirrors that reflect our own lives and experiences. if libraries' collections and programs don't properly reflect the diversity in america, and the whole world, the citizens of this country may not have access to the materials that they need to gain appreciation for other cultures, genders, sexual orientations, or socioeconomic statuses. minorities will continue to feel marginalized by not seeing themselves and their experiences reflected in the books they read and the movies they watch. rudine sims bishop, the professor who first popularized the metaphor, stated, "when there are enough books available that can act as both mirrors and windows for all our children, they will see that we can celebrate both our differences and similarities, because together they are what makes us human."1 to get to that point, the first step that libraries need to take is to analyze their collections, programs, toys, and policies by doing a diversity audit. the diversity audit can function as a steppingstone toward an improved collection development policy.

what is a diversity audit?

organizations have historically conducted audits of the diversity of their staff or their policies to avoid lawsuits and to promote systemic change. once a baseline has been established, the company is expected to make systemic changes to improve the policies and diversity of the staff. the concept of diversity audits is rather new to libraries.
karen jensen first promoted the concept of a diversity audit of library collections in 2017 in articles published on school library journal's blog, "teen librarian toolbox."2 since then, sarah voels, a librarian at the cedar rapids public library, has published a book on the topic titled auditing diversity in library collections.3

regarding library collections, a diversity audit is a methodology for analyzing the amount of diversity represented by the items in the library, in order to establish a baseline as a benchmark. the audit consists of analyzing the diversity represented by the subjects, fictional characters, authors, and illustrators of the items in the library collection. these statistics can be compared to population statistics to set goals for increasing the diversity of the collection. library programs and toys can also be analyzed. types of diversity that are typically analyzed include race or ethnicity, gender, sexual orientation, religion, and socioeconomic status.

there are two main manual methodologies that librarians can choose from. library staff and volunteers can review the titles being audited and record the data in spreadsheets, where the statistics are tabulated. audits can be done of titles as they are purchased, of randomly selected titles in a collection, or of a whole collection. another option is called a "reverse audit." this type of audit is accomplished by comparing award lists of diverse books, like the pura belpré award, to the collection and purchasing the titles that the library doesn't own.

doing a manual audit of a full collection or a whole library can be very time-consuming. betsy bird, collection development manager at evanston public library, estimated that it took about 12 weeks to audit 18,508 titles in the adult fiction collection. if they were to attempt a manual audit of the whole library, it would take more than 6,000 hours to audit 382,981 items. can you imagine the number of hours it would take to audit the 923,673 bibliographic records for physical items that could be audited in the entire shared database of the cooperative computer services (ccs) consortium, the consortium that evanston public library is a member of? to accomplish that, an automated process needs to be utilized.

existing automated tools for diversity audits

several vendors already provide an automated diversity audit service. diverse bookfinder's collection analysis tool (cat) is a free diversity audit tool for picture books. it is an award-winning digital tool that is financially supported by bates college and the institute of museum and library services. the director and founder of the website is associate dean of faculty and professor of psychology at bates college, dr. krista aronson. diverse bookfinder has collected and analyzed more than 3,000 picture books published since 2002 featuring black people, indigenous people, and people of color (bipoc). cat allows you to upload a list of isbns and titles, and the file is compared to the diverse bookfinder (dbf) collection. cat produces a report that depicts the results of the analysis in graphs. the report explains how many titles match the dbf collection, and the graphs describe the representation of ethnicities in the collection that matches the dbf collection and how they are represented, such as in a biography or folklore.
although this service is free, its capabilities are limited to analyzing the titles that match the dbf collection. the two leading collection analysis tools, collectionhq and libraryiq, both include diversity audit reports. collectionhq is vendor neutral and owned by baker and taylor; it specializes in public libraries. libraryiq is also vendor neutral and can support all types of libraries and library consortia. both services can analyze the diversity of the collection, and each has a user-friendly interface and graphs that are easily understood. collectionhq now offers customers of baker and taylor's cataloging utility, btcat, the ability to add diversity, equity, and inclusion (dei) subject headings to bibliographic records as a bulk process. however, only libraryiq can suggest items to purchase to increase the diversity of the collection.

in addition to baker and taylor, other major library vendors, such as ingram and midwest tape, include diversity audit services. both companies allow customers to purchase a one-time analysis of their collection(s). ingram's report includes data for the whole collection set against the public library average for comparison; a separate report includes suggestions of diverse titles to purchase to improve the diversity of the collection. midwest tape's library collection diversity audit specifically audits video and audiobook collections. this service also produces a report analyzing the diversity of the collection and identifies areas to improve. the company utilizes a third party to assist in providing community demographic data to compare the library's collection to local and national demographics. midwest tape's service is also integrated with hoopla instant to help the customer fill in the gaps in diversity in the collection.

cooperative computer services' diversity audit tool

not all libraries are able to spend money on a diversity audit service provided by vendors. however, libraries that already have a systems administrator who is well-versed in sql or another query language used by their ils can create their own automated diversity audit tool. ccs has completed the creation of a diversity audit tool using tableau (see fig. 1). this tool analyzes the diversity of the physical items in the member libraries' collections according to the subject headings of the bibliographic records and allows the libraries to benchmark against the whole consortium's data or against other libraries of similar demographics. the categories audited include women, bipoc, lgbtqia+, disabilities/neurodiversity, religious minorities, immigrants, and low income/economic welfare. like vendor-provided diversity audits, the ccs diversity audit tool is limited to the data in the bibliographic record: only the subject of the titles can be audited, not the characteristics of characters or creators. the tool also includes a function to drill down into the data of narrower subcategories. in addition to the diversity audit dashboard, a collection development dashboard allows librarians to identify popular titles on diverse topics that are owned by other libraries but not yet by their own. this makes the selection and cataloging processes more efficient because the library already knows that these books will be checked out, and the bibliographic record already exists.
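to make the idea of a homegrown, subject-heading-driven audit concrete, here is a minimal sketch of the query-and-tally logic such a tool could be built on. the sqlite schema, category keywords, and file name are illustrative assumptions for the sketch, not details of the ccs implementation (which runs sql against the ils database and feeds the results to tableau).

```python
import sqlite3

# illustrative mapping from subject-heading keywords to audit categories;
# a real tool would use a much richer, locally vetted vocabulary
CATEGORIES = {
    "women": ["women", "feminism"],
    "bipoc": ["african american", "indigenous", "asian american", "latinx"],
    "lgbtqia+": ["lgbtq", "gay", "lesbian", "transgender"],
    "disabilities/neurodiversity": ["disability", "autism", "deaf"],
}

def audit(db_path):
    """count titles per category by matching subject headings.
    assumed schema: bib(id, title) and subject(bib_id, heading)."""
    conn = sqlite3.connect(db_path)
    totals = {}
    for category, keywords in CATEGORIES.items():
        clause = " or ".join("lower(heading) like ?" for _ in keywords)
        params = [f"%{kw}%" for kw in keywords]
        (count,) = conn.execute(
            f"select count(distinct bib_id) from subject where {clause}",
            params,
        ).fetchone()
        totals[category] = count
    conn.close()
    return totals

if __name__ == "__main__":
    for category, count in audit("catalog.db").items():  # assumed file name
        print(f"{category}: {count} titles")
```

the same tallies, benchmarked against consortium-wide totals, are what a dashboard layer such as tableau, google data studio, or excel would visualize.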
for an introduction to the tool, check out the video at https://youtu.be/nonp2mgssuo. information about the sql query that is used to pull the data from the database and build the tables that tableau uses to analyze the data can be found at https://reports.ccslib.org/divdoc.

figure 1. ccs diversity audit tool

conclusion

auditing the diversity of a whole library collection is possible with the use of automated diversity audit tools. with a growing number of vendors offering diversity audit services that are integrated into their collection development and sales platforms, it is becoming increasingly easier for selectors to identify diverse titles to add to library collections. it staff, systems administrators, and even catalogers without selector duties don't have to sit on the sidelines: they can become active contributors by working within libraries and consortia to create their own diversity audit tools using tableau, google data studio, or even excel. although homegrown solutions have limitations, being able to analyze an entire library or consortium automatically can greatly improve the efficiency of the diversity audit process and supplement the manual methodology.

endnotes

1 rudine sims bishop, "mirrors, windows, and sliding glass doors," perspectives: choosing and using books for the classroom 6, no. 3 (1990), https://scenicregional.org/wp-content/uploads/2017/08/mirrors-windows-and-sliding-glass-doors.pdf.
2 karen jensen, "doing a ya collection diversity audit: understanding your local community (part 1)," teen librarian toolbox, school library journal, november 1, 2017, https://teenlibrariantoolbox.com/2017/11/01/doing-a-diversity-audit-understanding-your-local-community/.
3 sarah voels, auditing diversity in library collections (santa barbara, ca: libraries unlimited, 2022).

editorial board thoughts
drained-pool politics versus digital libraries in u.s. cyberspace
mary a. guillory, mlis
information technology and libraries | december 2023
https://doi.org/10.5860/ital.v42i4.16988
mary a. guillory (corresponding author: https://www.linkedin.com/in/maryaguillory/) is a member of the ital editorial board. © 2023. opinions expressed in this column are the author's and do not necessarily reflect those of the editorial board as a whole or of core, a division of ala.

u.s. libraries in cyberspace are suffering from and combatting a series of actions and campaigns that aim to eliminate them rather than allocate mental bandwidth for diverse titles—in other words, digital libraries are dealing with modern-day drained-pool politics. as public, school, and university libraries become increasingly digital, the ability to simply switch off the entire library becomes more of a threat.
with the internet being both literacy's greatest enemy (it represents a single point of failure) and its ally (it has been almost universally adopted), volume of access, money, and grey-area legislation often play a large role in intellectual freedom. book banning—a naturally polarizing issue—has not yet found balance when it comes to books, magazines, audiobooks, movies, and music within digital libraries. though the capability to ban one book exists as it does in the physical world, access to digital libraries has become all-or-nothing in many instances around the nation. there is no consistency of approach across states, municipalities, or even school districts, with federal law applying only on a case-by-case basis.

for example, brevard public schools in florida caused a stir last year when access to epic, the digital library software students had become accustomed to for leisurely family reading time, mysteriously disappeared.1 the school district cited an inability to comply with a new state law that requires all instructional material available in the digital library to be reviewed. while this loss of access primarily affected students under the age of 12 attending schools in the district, digital library bans have a way of expanding into the adult age group and the general public, who may be receiving an entirely different type of education. in july 2023, mississippi state law essentially barred everyone under the age of 18 from the hoopla and overdrive digital libraries.2 in texas, patrons of the llano county library system—adults included—celebrated a small victory by having a federal judge restore some digital library access and the local government decide against the retaliatory closure of the libraries that filed the suit.3

when it comes to mass media, the category digital libraries belong to, there is a fine line between telling the stories of the people who make the world go round and promoting an idea or way of life. the realities of society can't be treated as elephants in a room when they are in fact the shared experiences of communities hiding in plain sight to avoid ostracism. that is, these are not stories that everyone knows about and refuses to acknowledge, but stories that people who most need to make a connection or understand a different viewpoint may never know exist. as the world depicted in fahrenheit 451 comes closer to reality, there are both current tools in place and rebalancing efforts emerging.

in september of this year, california took the opposite approach to texas and florida when its governor banned book bans in schools with state law.4 the bill was signed just months after access to the sora digital library was taken away from orange unified school district students and families.5 many public libraries in the state of california already operate under a "universal borrower" policy that allows state residents to obtain a library card at any library, which provides access to a multitude of digital library materials. in georgia, a similar budding program exists: public libraries participating in the pines program allow residents to obtain cards at any participating library within the state. these preexisting tools facilitating access help to keep the intellectual freedom scale stable while newer programs hope to rebalance a tilting scale.
some libraries and literacy organizations are taking a direct and national stand against book bans by opening their digital libraries nationwide to provide access to those who should be intellectually free. the brooklyn public library launched its books unbanned program to provide digital library access nationwide to teens and young adults within the 13–21 age range.6 the program has grown, with the boston public library, la county public library, san diego public library, and seattle public library all offering a variation. the digital public library of america's banned book club created an entire digital library consisting of books that had been banned "somewhere."7

broader legislation is needed to protect access to the truth, access to reality, and access to viewpoints sidelined by aggressive idealism. digital libraries are a melting pot for all the diverse people and experiences that make up the world. a story that is hard to hear is not a story that is wrong to listen to; writers are supposed to elicit emotion to help readers spend some time in someone else's shoes. to learn more about this multifaceted issue, visit the american library association's intellectual freedom advocacy webpage: https://www.ala.org/advocacy/intfreedom.

endnotes

1 bailey gallion, "brevard public schools cancels free online library, math game to comply with new state law," florida today, may 4, 2022, https://www.floridatoday.com/story/news/education/2022/05/04/brevard-public-schools-removes-access-epic-prodigy-florida-parental-rights-education-law/9629061002/.
2 kelly jensen, "hoopla, overdrive/libby now banned for those under 18 in mississippi," book riot, july 7, 2023, https://bookriot.com/hoopla-overdrive-libby-now-banned-for-those-under-18-in-mississippi/.
3 andrew albanese, "judge finds texas library's book bans unconstitutional, orders books returned," publishers weekly, april 3, 2023, www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/91903-judge-finds-texas-library-s-book-bans-unconstitutional-orders-books-returned.html; william melhado, "llano county library supporters declare victory as officials decide not to close all branches," the texas tribune, april 13, 2023, https://www.texastribune.org/2023/04/13/llano-county-library-books/.
4 johnathan franklin, "new california law bars schoolbook bans based on racial and lgbtq topics," npr, september 26, 2023, https://www.npr.org/2023/09/26/1201804972/california-gov-newsom-barring-book-bans-race-lgbtq.
5 jill replogle and michael flores, "a parent complained about a digital book. then an orange county school board suspended the whole library," laist, february 3, 2023, https://laist.com/news/education/school-district-book-banning-censorship-app-conservatives-orange-unifed.
6 "books unbanned," brooklyn public library, accessed november 10, 2023, https://www.bklynlibrary.org/books-unbanned.
7 christopher parker, "readers can now access books banned in their area for free with new app," smithsonian magazine, july 25, 2023, https://www.smithsonianmag.com/smart-news/banned-book-club-app-180982592/.

ruben tous, manel guerrero, and jaime delgado
semantic web for reliable citation analysis in scholarly publishing
ruben tous (rtous@ac.upc.edu) is associate professor, manel guerrero (guerrero@ac.upc.edu) is associate professor, and jaime delgado (jaime.delgado@ac.upc.edu) is professor, all in the departament d'arquitectura de computadors, universitat politècnica de catalunya, barcelona, spain.

analysis of the impact of scholarly artifacts is constrained by current unreliable practices in cross-referencing, citation discovering, and citation indexing and analysis, which have not kept pace with the technological advances that are occurring in several areas like knowledge management and security. because citation analysis has become the primary component in scholarly impact factor calculation, and considering the relevance of this metric within both the scholarly publishing value chain and (especially important) the professional curriculum evaluation of scholarly professionals, we defend that current practices need to be revised. this paper describes a reference architecture that aims to provide openness and reliability to the citation-tracking lifecycle. the solution relies on the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow, in such a manner that authors, publishers, repositories, and citation-analysis systems will have access to independent reliable evidences that are resistant to forgery, impersonation, and repudiation. as far as we know, this is the first paper to combine semantic web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing.

in recent years, the amount of scholarly communication brought into the digital realm has exponentially increased.1 this no-way-back process is fostering the exploitation of large-scale digitized scholarly repositories for analysis tasks, especially those related to impact factor calculation. the potential automation of the contribution–relevance calculation of scholarly artifacts and scholarly professionals has attracted the interest of several parties within the scholarly environment, and even outside of it. for example, one can find within articles of the spanish law related to scholarly personnel certification the requirement that the papers appearing in the curricula of candidates should appear in the subject category listing of the journal citation reports of the science citation index.2 this example shows the growing relevance of these systems today.

nevertheless, current practices in citation analysis entail serious problems, including security flaws related to the publishing process (e.g., repudiation, impersonation, and privacy of paper contents) and defects related to citation analysis, such as the following:

■■ nonidentical paper instances confusion
■■ author naming conflicts
■■ lack of machine-readable citation metadata
■■ fake citing papers
■■ impossibility for authors to control their related citation data
■■ impossibility for citation-analysis systems to verify the provenance and trust of citation data, both in the short and long term

besides the fact that they do not provide any security features, the main shortcoming of current citation-analysis systems such as isi citation index, citeseer (http://citeseer.ist.psu.edu/), and google scholar is the fact that they count multiple copies or versions of the same paper as many papers. in addition, they distribute citations of a paper between a number of copies or versions, thus decreasing the visibility of the specific work.
moreover, their use of different analysis databases leads to very different results because of differences in their indexing policies and in their collected papers.3 to remedy all these imperfections, this paper proposes a reference architecture for reliable citation analysis based on applying semantic trust mechanisms. it is important to note that a complete or partial adoption of the ideas defended in this paper will imply the effort to introduce changes within the publishing lifecycle. we believe that these changes are justified considering the serious flaws of the established solutions and the relevance that citation-analysis systems are acquiring in our society.

■■ reference architecture

we have designed a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. this architecture is based on the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow. as a trust scheme, we have chosen a public key infrastructure (pki), in which certificates are signed by certification authorities belonging to one or more hierarchical certification chains.4

trust scheme

the goal of the architecture is to allow citation-analysis systems to verify the provenance and trust of machine-readable metadata about citations before incorporating them into their repositories. as a collateral effect, authors and publishers also will be able to store evidences (in the form of digitally signed metadata graphs) that demonstrate different facts related to the creating–editing–publishing process (e.g., paper submission, paper acceptance, and paper publication). to achieve these goals, our reference architecture requires each metadata graph carrying information about events to be digitally signed by the proper subject.

because our approach is based on a pki trust scheme, each signing subject (author or publisher) will need a public key certificate (or identity certificate), which is an electronic document that incorporates a digital signature to bind a public key with an identity. all the certificates used in the architecture will include the public key information of the subject, a validity period, the url of a revocation center, and the digital signature of the certificate produced by the certificate issuer's private key.

each author will have a certificate that will include as a subject-unique identifier the author's universal resource identifier (uri), which we explain in the next section, along with the author's current information (such as name, e-mail, affiliation, and address), previous information (a list of former names, e-mails, and addresses), and a timestamp indicating when the certificate was generated. the certification authority (ca) of the author's certificate will be the university, research center, or company with which the author is affiliated. the ca will manage changes in name, e-mail, and address by generating a new certificate in which the superseded details move to the list of former information. changes in affiliation will be managed by the new ca, which will generate a new certificate with the current information. since the new certificate will have a new uri, the ca also will generate a signed link to the previous uri; therefore the citation-analysis system will be able to recognize the contributions signed with both certificates as contributions made by the same author. it will be the responsibility of the new ca to verify that the author was indeed affiliated with the former organization (which we consider a very feasible requirement).

every time an author (or group of authors) submits a paper to a conference, workshop, or journal, the corresponding author will digitally sign a metadata graph describing the paper submission event. although the paper submission will only be signed by the corresponding author, it will include the uris of all the authors. journals (and also conferences and workshops) will have a certificate that contains their related information. their ca will be the organization or editorial board behind them (for instance, acm, ieee, springer, lita, etc.).

if a paper is accepted, the journal will send a signed notification of acceptance, which will include the reviews, the comments from the editor, and the conditions for the paper to be accepted. if the paper is rejected, the journal might send a signed notification of rejection. we feel that the notification of acceptance is necessary because in certain kinds of curriculum evaluations for university professors conditionally accepted papers can be counted, and in others not. the camera-ready version will be signed by all the authors of the paper, not only the corresponding author as in the paper submission. after the camera-ready version of the paper has been accepted, the journal will send a signed notification of future publication; this notification will include the date of acceptance and an estimated date of publication. finally, once the paper has been published, the journal will send a signed notification of publication to the author. the reason for having both a notification of future publication and a notification of publication is that, again, some curriculum evaluations might be flexible enough to count papers that have been accepted for future publication, while stricter ones state explicitly that they accept only published papers.

once this process has been completed, a citation-analysis system will only need to import the authors' ca certificates (that is, the certificates of the universities, research centers, and companies) and the publishers' ca certificates (like acm, ieee, springer, lita, etc.) to be able to verify all the signed information. a chain of cas will be possible both with authors (for example, university, department, and research line) and with publications (for example, publisher and journal).
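as a rough illustration of how one of these signed evidences could be produced and checked, the sketch below signs the bytes of a serialized metadata graph and verifies the signature using python's cryptography package. in the architecture described above, the key pair would be bound to the journal's identity by its ca-issued certificate; here the key is simply generated locally, and the rdf payload is a placeholder.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# stand-in for the journal's key pair; in the proposed architecture this key
# would be certified by the publisher's ca (e.g., lita), not generated ad hoc
journal_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# the evidence to sign: a serialized metadata graph describing an event,
# e.g., a notification of publication (placeholder bytes here)
event_metadata = b"<rdf:RDF>...notification of publication...</rdf:RDF>"

signature = journal_key.sign(event_metadata, padding.PKCS1v15(), hashes.SHA256())

# a citation-analysis system holding the journal's certified public key can
# verify provenance; verify() raises InvalidSignature if anything was altered
journal_key.public_key().verify(
    signature, event_metadata, padding.PKCS1v15(), hashes.SHA256())
print("notification verified: the journal cannot repudiate it")
```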
■■ universal resource identifiers

to ensure that authors' uris are unique, they will have a tree structure similar to that of urls. the first-level element of the uri will be the id of the author's organization (be it a university or a research center). this organization id will be composed of the country code top-level domain (cctld) and the organization name, separated by an underscore.5 the citation-analysis system will be responsible for assigning these identifiers and ensuring that all organizations have different identifiers. then, in the same manner, each organization will assign second-level elements (similar to departments), and so forth.

author's ca_id: <cctld>_<organization-name>. example: es_upc
author's uri: author://<ca_id>.<second-level>. ... /<author-name>. example: author://es_upc.dac/ruben.tous

(in this example, "es" is the cctld for spain, upc (universitat politècnica de catalunya) is the university, and dac (departament d'arquitectura de computadors) is the department.)

creations' uris are built in a similar manner to authors' uris, but in this case the use of the country code as part of the publisher's id is optional. because a creation and its metadata evolve through different stages (submission and camera-ready), we will use different uris for each phase. we propose the use of this kind of uri instead of other possible schemes, such as the digital object identifier (doi), because the ones proposed in this paper have the advantage of being human readable and of containing the cas chain.6 of course, that doesn't mean that once published a paper cannot obtain a doi or another kind of identifier.

publisher's ca_id: <organization-name> or <cctld>_<organization-name>. examples: lita and it_italianjournalofzoology
creation's uri: creation://<ca_id>.<publication>/ ... /<creation-id>. example: creation://lita.ital/vol27_num1_paper124
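the templates above are easy to mechanize. the helpers below reproduce the two printed examples exactly; they are only a sketch of the naming scheme, not code from the paper.

```python
def author_uri(cctld: str, org: str, subunits: list[str], name: str) -> str:
    # ca_id is <cctld>_<organization-name>; subunits are department-like levels
    ca_id = f"{cctld}_{org}"
    return "author://" + ".".join([ca_id, *subunits]) + "/" + name

def creation_uri(publisher_ca_id: str, publication: str, creation_id: str) -> str:
    # the publisher's ca_id may omit the country code (e.g., "lita")
    return f"creation://{publisher_ca_id}.{publication}/{creation_id}"

assert author_uri("es", "upc", ["dac"], "ruben.tous") == \
    "author://es_upc.dac/ruben.tous"
assert creation_uri("lita", "ital", "vol27_num1_paper124") == \
    "creation://lita.ital/vol27_num1_paper124"
```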
confidentiality and nonrepudiation

nowadays, some conferences manage their paper submissions and notifications of acceptance (with their corresponding reviews) through e-mail, while others use a web-based application, such as edas (http://edas.info/). the e-mail-based approach has no means of providing any kind of confidentiality: each router through which the e-mails travel can see their contents (paper submissions and paper reviews). a web-based system can provide confidentiality through http secure (https), although some of the most popular applications (such as edas and myreview) do not provide it; their developers may not have thought that it was an important feature. the following is a short list of some of the existing web-based systems:

■■ edas (http://edas.info/) is probably the most popular system. it can manage a large number of conferences and special issues of journals. it does not provide confidentiality.
■■ myreview (http://myreview.intellagence.eu/index.php) is an open-source web application distributed under the gpl license for managing the paper submissions and paper reviews of a conference or journal. myreview is implemented with php and mysql. it does not provide confidentiality.
■■ conftool (http://www.conftool.net) is another web-based management system for conferences and workshops. a free license of the standard version is available for noncommercial conferences and events with fewer than 150 participants. it uses https to provide confidentiality.
■■ microsoft's conference management toolkit (cmt; http://cmt.research.microsoft.com) is a conference management service sponsored by microsoft research. it uses https to provide confidentiality, but it is a service for which you have to pay.

although some of the web-based systems provide confidentiality through https, none of them provides nonrepudiation, which we feel is even more important. this is so because nonrepudiation allows authors to certify their publications to their curriculum evaluators. our proposed scheme always provides nonrepudiation because of its use of signatures: curriculum evaluators don't need to search the publisher's website to find the evaluated author's paper. in addition, our proposed scheme allows curriculum evaluations to be performed by computer programs. confidentiality, meanwhile, can easily be achieved by encrypting the messages with the public key of the destination of the message. it should not be difficult for authors to obtain the public key of the conference or journal (which could be included in its "call for papers" or on its webpage), and, because the paper-submission message includes the author's public key, notifications of acceptance, rejection, and publication can be encrypted with that key.
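one practical caveat the sketch below makes explicit: an rsa public key cannot directly encrypt a document-sized message, so in practice "encrypting with the destination's public key" means envelope encryption, where a fresh symmetric key protects the paper and only that small key is wrapped with rsa. this is a minimal sketch under that assumption, not a wire format prescribed by the paper.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

journal_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

def encrypt_submission(paper: bytes, recipient_public_key):
    aes_key = AESGCM.generate_key(bit_length=256)   # fresh key per submission
    nonce = os.urandom(12)
    ciphertext = AESGCM(aes_key).encrypt(nonce, paper, None)
    wrapped_key = recipient_public_key.encrypt(aes_key, OAEP)  # rsa wraps only the key
    return wrapped_key, nonce, ciphertext

wrapped, nonce, ct = encrypt_submission(b"%PDF-1.4 ...", journal_key.public_key())

# only the journal's private key can unwrap the aes key and read the paper
recovered = AESGCM(journal_key.decrypt(wrapped, OAEP)).decrypt(nonce, ct, None)
assert recovered.startswith(b"%PDF")
```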
■■ modeling the scholarly communication process

citation-analysis systems operate over metadata about the scholarly communication process. currently, these metadata are usually generated automatically by the citation-analysis systems themselves, generally through a programmatic analysis of the scholarly artifacts' unstructured textual contents. these techniques have several drawbacks, as enumerated already, especially the fact that there is metadata that cannot be inferred from the contents of a paper, like all the aspects of the publishing process. to allow citation-analysis systems to access metadata about the entire scholarly artifact lifecycle, we suggest a metadata model that captures a great part of the scholarly domain's static and dynamic semantics. this model is based on knowledge representation techniques from the semantic web, such as resource description framework (rdf) graphs and web ontology language (owl) ontologies.

metadata and rdf

the term "metadata" typically refers to a certain data representation that describes the characteristics of an information-bearing entity (generally another data representation, such as a physical book or a digital video file). metadata plays a privileged role in the scholarly communication process by helping identify, discover, assess, and manage scholarly artifacts. because metadata are data, they can be represented through any of the existing data representation models, such as the relational model or the xml infoset. though the represented information should be the same regardless of the formalism used, each model offers different capabilities for data manipulation and querying. recently, a not-so-recent formalism has proliferated as a metadata representation model: rdf, from the world wide web consortium (w3c).7

we have chosen rdf for modeling the citation lifecycle because of its advantages with respect to other formalisms. rdf is modular: a subset of rdf triples from an rdf graph can be used separately, keeping a consistent rdf model, so it can be used with partial information, an essential feature in a distributed environment. the union of knowledge is mapped into the union of the corresponding rdf graphs (information can be gathered incrementally from multiple sources). rdf is the main building block of the semantic web initiative, together with a set of technologies for defining rdf vocabularies like rdf schema (rdfs) and owl.8 rdf comprises several related elements, including a formal model and an xml serialization syntax. the basic building block of the rdf model is the triple subject-predicate-object. in a graph-theory sense, an rdf instance is a labeled directed graph consisting of vertices, which represent subjects or objects, and labeled edges, which represent predicates (semantic relations between subjects and objects).

coming back to the scholarly domain, our proposal is to model static knowledge (e.g., authors and papers metadata) and dynamic knowledge (e.g., "the action of accepting a paper for publication," or "the action of submitting a paper for publication") using rdf predicates. the example in figure 1 shows how the action of submitting a paper for publication could be modeled with an rdf graph, and figure 2 shows how the example in figure 1 would be serialized using the rdf/xml syntax (the abbreviated mode).

figure 1. example rdf graph

figure 2. example rdf/xml representation of the graph in figure 1
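figure 1 itself is not reproduced here, so the sketch below builds a plausible submission-event graph with rdflib and serializes it to rdf/xml, as figure 2 does for the original. the namespace and the property names (creation, author, date) are illustrative guesses based on the event vocabulary used in the text, not the paper's actual ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

SCH = Namespace("http://example.org/scholarly#")  # hypothetical ontology namespace

g = Graph()
event = URIRef("creation://lita.ital/vol27_num1_paper124#submission")
g.add((event, RDF.type, SCH.Submitted))
g.add((event, SCH.creation, URIRef("creation://lita.ital/vol27_num1_paper124")))
g.add((event, SCH.author, URIRef("author://es_upc.dac/ruben.tous")))
g.add((event, SCH.date, Literal("2008-05-25")))

# these serialized bytes are what the corresponding author would digitally sign
print(g.serialize(format="xml"))
```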
so, in our approach, we model assertions as rdf graphs and subgraphs. to allow anybody (authors, publishers, citation-analysis systems, or others) to verify a chain of assertions, each involved rdf graph must be digitally signed by the proper principal. there are two approaches to signing rdf graphs (as also happens with xml instances). the first approach applies when the rdf graph is obtained from a digitally signed file; in this situation, one can simply verify the signature on the file. however, in certain situations the rdf graphs or subgraphs come from a more complex processing chain, and one may not have access to the original signed file. a second approach deals with this situation and faces the problem of digitally signing the graphs themselves, that is, signing the information contained in them.9 for the purpose of the reference architecture described in this paper, we do not instruct which of the two described approaches for signing rdf graphs is to be used; the decision will depend on the implementation (i.e., on how the graphs will be interchanged and processed).

owl and an ontology for the scholarly context

to allow modeling the scholarly communication process with rdf graphs, we have designed an owl description logic (dl) ontology. owl is a vocabulary for describing properties and classes of rdf resources, complementing rdfs's capabilities for providing semantics for generalization hierarchies of such properties and classes. owl enriches the rdfs vocabulary by adding, among others, relations between classes (e.g., disjointness), cardinality (e.g., "exactly one"), equality, richer typing of properties, characteristics of properties (e.g., symmetry), and enumerated classes. owl reflects the influence of more than ten years of dl research; this knowledge allowed the set of constructors and axioms supported by owl to be carefully chosen so as to balance the expressive requirements of typical applications with a requirement for reliable and efficient reasoning support. a suitable balance between these computational requirements and the expressive requirements was achieved by basing the design of owl on the sh family of description logics.10 the language has three increasingly expressive sublanguages designed for different uses: owl lite, owl dl, and owl full. we have chosen owl dl to define the ontology for capturing the static and dynamic semantics of the scholarly communication process. with respect to the other versions of owl, owl dl offers the most expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time). owl dl is so named because of its correspondence with description logics.
because metadata are data, they can be represented through any the existing data representation models, such as the relational model or the xml infoset. though the represented information should be the same regardless of the formalism used, each model offers different capabilities of data manipulation and querying. recently, a not-so-recent formalism has proliferated as a metadata representation model: rdf from the world wide web consortium (w3c).7 we have chosen rdf for modeling the citation lifecycle because of its advantages with respect to other formalisms. rdf is modular; a subset of rdf triples from an rdf graph can be used separately, keeping a consistent rdf model. it therefore can be used with partial information, an essential feature in a distributed environment. the union of knowledge is mapped into the union of the corresponding rdf graphs (information can be gathered incrementally from multiple sources). rdf is the main building block of the semantic web initiative, together with a set of technologies for defining rdf vocabularies like rdf schema (rdfs) and the owl.8 rdf comprises several related elements, including a formal model and an xml serialization syntax. the basic building block of the rdf model is the triple subjectpredicate-object. in a graph-theory sense, an rdf instance is a labeled directed graph consisting of vertices, which represent subjects or objects, and labeled edges, which represent predicates (semantic relations between subjects and objects). coming back to the scholarly domain, our proposal is to model static knowledge (e.g., authors and papers metadata) and dynamic knowledge (e.g., “the action of accepting a paper for publication,” or “the action of submitting a paper for publication”) using rdf predicates. the example in figure 1 shows how the action of submitting a paper for publication could be modeled with an rdf graph. figure 2 shows how the example in figure 1 would be serialized using the rdf xml syntax (the abbreviated mode). so, in our approach, we model assertions as rdf graphs and subgraphs. to allow anybody (authors, publishers, citation-analysis systems, or others) to verify a chain of assertions, each involved rdf graph must be digitally signed by the proper principal. there are two approaches to signing rdf graphs (as also happens with xml instances). the first approach applies when the rdf graph is obtained from a digitally signed file. in this situation, one can simply verify the signature on the file. however, in certain situations the rdf graphs or subgraphs come from a more complex processing chain, and one could not have access to the original signed file. a second approach deals with this situation, and faces the problem of digitally signing the graphs themselves, that is, signing the information contained in them.9 for 28 information technology and libraries | march 2011 note that instances of submitted and accepted event classes will point to the same creation instance because no modification of the creation is performed between these events. on the other hand, instances of tobepublished and published event classes will point to different creation instances (pointed by the cameraready and publishedcreation properties) because of the final editorial-side modifications to which a work can be subject. 
figure 3 shows a simplified graphical view of the owl ontology we have defined for capturing the static and dynamic semantics of the scholarly communication process. figure 4, figure 5, and figure 6 offer a (partial) tabular representation of the main classes and properties of the ontology. in owl, properties are independent of classes, but we have chosen to depict them in an object-oriented manner to improve understanding; for the same reason we have represented some properties as arrows between classes, even though this information is already present in the tables. uris do not appear as properties in the diagrams because each instance of a class will be an rdf resource, and any resource has a uri according to the rdf model. these uris will follow the rules described in the section "reference architecture" above. it is worth mentioning that the selection of the included properties is based on a study of several metadata formats and standards, such as dublin core (dc), dc's scholarly works application profile, vcard, and bibtex.11

figure 3. owl ontology for capturing the scholarly communication process

figure 4 shows the class publication and its subclasses, which represent the different kinds of publication. in the figure we show only classes for journals, proceedings, and books, but the hierarchy could obviously be extended to cover any kind of publication. figure 5 contains the classes for the agents of the ontology (i.e., the human beings who author papers and book chapters and the organizations to which those human beings are affiliated or that edit publications); the figure also includes the creation class (e.g., a paper or a book chapter). finally, figure 6 shows the part of the ontology that describes the different events occurring in the process of publishing a paper (i.e., paper submission, paper acceptance, notification of future publication, and publication). note that instances of the submitted and accepted event classes will point to the same creation instance, because no modification of the creation is performed between these events. on the other hand, instances of the tobepublished and published event classes will point to different creation instances (pointed to by the cameraready and publishedcreation properties) because of the final editorial-side modifications to which a work can be subject.

figure 4. part of the ontology describing publications

figure 5. part of the ontology describing agents and creations

figure 6. part of the ontology describing events

■■ advantages of the proposed trust scheme

the following is a short list of security features provided by our proposed scheme and of attacks against which it is resilient:

■■ an author can certify to any evaluation entity that will evaluate his or her curriculum the publications that he or she has authored.
■■ an evaluator entity can query the citation-analysis system and get all the publications by a certain author.
■■ an author cannot forge notifications of publication.
■■ a publisher cannot repudiate the fact that it has published an article once it has sent the certificate.
■■ two or more authors cannot team up and make the system think that they are the same person in order to accumulate more publications in their accounts (not even if they happen to have the same name).

■■ implications

the adoption of the approach proposed in this paper has certain implications in terms of technological changes, but also in terms of behavioral changes at some stages of the scholarly publishing workflow. regarding the technological impact, the approach relies on the use of semantic web technologies and public-key cryptography. the necessary changes apply not only to the citation-management software but also to all the parties involved in the publishing lifecycle (e.g., conference and journal management systems). authors and publishers would be the originators of the digitally signed evidence, so user-friendly tools for generating and signing the rdf metadata would be required. plenty of rdf editors and digital-signature toolkits exist, and we expect that conference and journal management systems such as edas could easily be extended to provide integrated functionality for generating and processing digitally signed metadata graphs. this could be transparent to users, because the rdf documents would be generated automatically (and, in the case of the publishers, also signed) during the creation-editing-publishing process. because our approach is based on a pki trust scheme, we rely on a special setup assumption: the existence of cas, which certify that the identity information and the public key contained within the public-key certificates of authors and publishers belong together. to get a publication recognized by a reliable citation-analysis system, an author or a publisher would need a public-key certificate issued by a ca trusted by that citation-analysis system. the selection of trusted cas by citation-analysis systems would require deploying mechanisms that allow an author or a publisher to request the inclusion of his or her institution in the list; this process would be eased if institutional cas belonged to trust hierarchies (e.g., national or regional ones), since including a few higher-level cas then brings in the cas of many small institutions. another technological implication concerns the interchange and storage of the metadata: users and publishers should save the signed metadata produced by a publishing process, and citation-analysis systems should harvest the digitally signed metadata. the metadata-harvesting process could be done in several different ways, but here an important benefit of the presented approach arises: it does not matter where the citation-analysis system obtains the information or whether the information is duplicated. the proposed approach guarantees that the citation-analysis subsystem can always verify the provenance and trust of the metadata, and the use of unique identifiers ensures the detection of duplicates. our approach also implies minor behavioral changes for authors, mainly related to the management of public-key certificates, which is already required for many other tasks nowadays. a collateral benefit of the approach would be the automation of the copyright-transfer procedure, which in most cases still relies on handwritten signatures; authors would only need to have their public-key certificate at hand (probably installed in the web browser), and the conference and journal management software would do all the work.
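the first of the two signing approaches discussed above, verifying a signature over the serialized rdf file itself, can be sketched with the standard java.security api. the detached-signature layout and the sha256withrsa algorithm here are assumptions for illustration, not details fixed by the paper.

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.PublicKey;
    import java.security.Signature;
    import java.security.cert.CertificateFactory;
    import java.security.cert.X509Certificate;

    public class SignedMetadataCheck {
        // verify a detached signature over a serialized rdf file against
        // the signer's x.509 certificate (e.g., a publisher's certificate)
        static boolean verify(Path rdfFile, Path signatureFile, Path certFile)
                throws Exception {
            CertificateFactory cf = CertificateFactory.getInstance("X.509");
            X509Certificate cert;
            try (InputStream in = Files.newInputStream(certFile)) {
                cert = (X509Certificate) cf.generateCertificate(in);
            }
            PublicKey key = cert.getPublicKey();

            Signature sig = Signature.getInstance("SHA256withRSA"); // assumed algorithm
            sig.initVerify(key);
            sig.update(Files.readAllBytes(rdfFile));
            return sig.verify(Files.readAllBytes(signatureFile));
        }
    }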
■■ related work

as far as we know, this is the first paper to combine semantic web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing. regarding the use of ontologies and semantic web technologies for modeling the scholarly domain, we highlight the research by rodriguez, bollen, and van de sompel.12 they define a semantic model of the scholarly communication process, which is used within an associated large-scale semantic store containing bibliographic, citation, and use data. this work is related to the mesur (metrics from scholarly usage of resources) project (http://www.mesur.org) from los alamos national laboratory, whose main goal is to provide novel mechanisms for assessing the impact of scholarly communication items, and hence of scholars, with metrics derived from use data. as in our case, the approach by rodriguez, bollen, and van de sompel models static and dynamic aspects of the scholarly communication process using rdf and owl. however, whereas our work focuses on modeling the dynamic aspects of the creation-editing-publishing workflow, their approach focuses on modeling the use of already-published bibliographic resources. regarding the combination of semantic web technologies with security and cryptography, several works exist that do not focus specifically on the scholarly domain but that we have taken into consideration. in our approach, static and dynamic metadata cross many trust boundaries, so it is necessary to apply trust-management techniques designed to protect open and decentralized systems. we have chosen a public-key infrastructure (pki) design to cover this requirement. however, other approaches exist, such as the one by khare and rifkin, which combines rdf with digital signatures in a manner related to what is known as the "web of trust."13 one aspect of any approach dealing with rdf and cryptography is how to digitally sign rdf graphs. as described above, in the section "modeling the scholarly communication process with semantic web knowledge representation techniques," there are two approaches to this task: signing the file from which the graph will be obtained (the one we have chosen) or digitally signing the graphs themselves (the information represented in them), as described by carroll.14

■■ conclusions

the work presented in this paper describes a reference architecture that aims to bring reliability to the citation and citation-tracking lifecycle. the paper argues that current practices in analyzing the impact of scholarly artifacts entail serious design and security flaws, including confusion of nonidentical instances, author-naming conflicts, fake citing, repudiation, and impersonation.
the architecture presented in this work is based on the use of digitally signed rdf graphs at the different stages of the scholarly publishing workflow, in such a manner that authors, publishers, repositories, and citation-analysis systems can have access to independent, reliable evidence. the architecture aims to allow the creation of a reliable information space that reflects not just static knowledge but also dynamic relationships, capturing the full complexity of trust relationships between the different parties in the scholarly domain. to allow modeling the scholarly communication process with rdf graphs, we have designed an owl dl ontology. rdf graphs carrying instances of classes and properties from the ontology will be digitally signed and interchanged between parties at the different stages of the creation-editing-publishing process. citation-management systems will have access to these signed metadata graphs and will be able to verify their provenance and trust before incorporating them into their repositories. because citation analysis has become a critical component in the calculation of scholarly impact factors, and considering the relevance of this metric within the scholarly publishing value chain, we contend that the importance of a reliable solution justifies the effort of introducing technological changes within the publishing lifecycle. we believe that these changes, which could easily be automated and incorporated into modern conference and journal editorial systems, are justified given the serious flaws of the established solutions and the growing relevance of citation-analysis systems in our society.

■■ acknowledgment

this work has been partly supported by the spanish administration (tec2008-06692-c02-01 and tsi2007-66869-c02-01).

references and notes

1. herbert van de sompel et al., "an interoperable fabric for scholarly value chains," d-lib magazine 12, no. 10 (2006), http://www.dlib.org/dlib/october06/vandesompel/10vandesompel.html (accessed jan. 19, 2011).
2. boletín oficial del estado (b.o.e.) 054 04/03/2005 sec 3 pag 7875 a 7887, http://www.boe.es/boe/dias/2005/03/04/pdfs/a07875-07887.pdf (accessed june 24, 2010). see also thomson isi web of knowledge, http://www.isiwebofknowledge.com/ (accessed june 24, 2010); and eugene garfield, citation indexing: its theory and application in science, technology and humanities (new york: wiley, 1979).
3. judit bar-ilan, "an ego-centric citation analysis of the works of michael o. rabin based on multiple citation indexes," information processing & management 42, no. 6 (2006): 1553-66.
4. alfred arsenault and sean turner, "internet x.509 public key infrastructure: pkix roadmap," draft, pkix working group, sept. 8, 1998, http://tools.ietf.org/html/draft-ietf-pkix-roadmap-00 (accessed june 24, 2010).
5. internet assigned numbers authority (iana), root zone database, http://www.iana.org/domains/root/db/ (accessed june 24, 2010).
6. for information on the doi system, see bill rosenblatt, "the digital object identifier: solving the dilemma of copyright protection online," journal of electronic publishing 3, no. 2 (1997).
7. resource description framework (rdf), world wide web consortium, feb. 10, 2004, http://www.w3.org/rdf/ (accessed june 24, 2010).
8. "rdf vocabulary description language 1.0: rdf schema, w3c working draft 23 january 2003," http://www.w3.org/tr/2003/wd-rdf-schema-20030123/ (accessed june 24, 2010); "owl web ontology language overview, w3c recommendation 10 february 2004," http://www.w3.org/tr/owl-features/ (accessed june 24, 2010).
9. jeremy j. carroll, "signing rdf graphs," in the semantic web - iswc 2003, vol. 2870, lecture notes in computer science, ed. dieter fensel, katia sycara, and john mylopoulos (new york: springer, 2003).
10. ian horrocks, peter f. patel-schneider, and frank van harmelen, "from shiq and rdf to owl: the making of a web ontology language," web semantics: science, services and agents on the world wide web 1 (2003): 10-11.
11. see the dublin core metadata initiative (dcmi), http://dublincore.org/ (accessed june 24, 2010); julie allinson, pete johnston, and andy powell, "a dublin core application profile for scholarly works," ariadne 50 (2007), http://www.ariadne.ac.uk/issue50/allinson-et-al/ (accessed dec. 27, 2010); world wide web consortium, "representing vcard objects in rdf/xml: w3c note 22 february 2001," http://www.w3.org/tr/2001/note-vcard-rdf-20010222/ (accessed dec. 3, 2010); and, for bibtex, "entry types," http://nwalsh.com/tex/texhelp/bibtx-7.html (accessed june 24, 2010).
12. marko a. rodriguez, johan bollen, and herbert van de sompel, "a practical ontology for the large-scale modeling of scholarly artifacts and their usage," proceedings of the 7th acm/ieee joint conference on digital libraries (2007): 278-87.
13. rohit khare and adam rifkin, "weaving a web of trust," world wide web journal 2, no. 3 (1997): 77-112.
14. carroll, "signing rdf graphs."
adding delicious data to your library website

andrew darby and ron gilmour

social bookmarking services such as delicious offer a simple way of developing lists of library resources.
this paper outlines various methods of incorporating data from a delicious account into a webpage. we begin with a description of delicious linkrolls and tagrolls, the simplest but least flexible method of displaying delicious results. we then describe three more advanced methods of manipulating delicious data using rss, json, and xml. code samples using php and javascript are provided.

andrew darby (adarby@ithaca.edu) is web services librarian, and ron gilmour (rgilmour@ithaca.edu) is science librarian at ithaca college library, ithaca, new york.

one of the primary components of web 2.0 is social bookmarking. social bookmarking services allow users to store bookmarks on the web, where they are available from any computer, and to share these bookmarks with other users. even better, these bookmarks can be annotated and tagged to provide multiple points of subject access. social bookmarking services have become popular with librarians as a means of quickly assembling lists of resources. since anything with a url can become a bookmark, such lists can combine diverse resource types such as webpages, scholarly articles, and library catalog records. it is often desirable for the data stored in a social bookmarking account to be displayed in the context of a library webpage; this creates consistent branding and a more professional appearance. delicious (http://delicious.com/), one of the most popular social bookmarking tools, allows users to extract data from their accounts and to display this data on their own websites. delicious offers multiple ways of doing this, from simply embedding html in the target webpage to interacting with the api.1 in this paper we will begin with the simplest methods, for users uncomfortable with programming, and then move on to three more advanced methods using rss, json, and xml. our examples use php, a cross-platform scripting language that may be run on either linux/unix or windows servers. while it is not possible for us to address the many environments (such as cmses) in which websites are constructed, our code should be adaptable to most contexts; adaptation will be especially simple in the many popular php-based cmses such as drupal, joomla, and wordpress. it should be noted that the process of tagging resources in delicious requires little technical expertise, so the task of assembling lists of resources can be accomplished by any librarian. the construction of a website infrastructure (presumably by the library's webmaster) is a more complex task that may require some programming expertise.

linkrolls and tagrolls

the simplest way of sharing links is to point users directly to the desired delicious page. to share all the items labeled "biology" for the user account "iclibref," one could disseminate the url http://delicious.com/iclibref/biology. the obvious downside is that users are no longer on your website, and they may be confused by their new location and what they are supposed to do there. linkrolls, a utility available from the delicious site, provides a number of options for generating code to display a set of bookmarked links, including which tags to display, the number of links, the type of bullet, and the sorting criterion (see figure 1).2 this utility creates simple html code that can be added to a website. a related tool, tagrolls, creates the ubiquitous delicious tag cloud.3 for many librarians, this will be enough.

figure 1. delicious linkroll page
with the embedded linkroll code, and perhaps a bit of css styling, they will be satisfied with the results. however, delicious also offers more advanced methods of interacting with its data. for more control over how delicious data appears on a website, the user must interact with delicious through rss, json, or xml.

rss

like most web 2.0 applications, delicious makes its content available as rss feeds. feeds are available at a variety of levels, from the delicious system as a whole down to a particular tag in a particular account. within a library context, the most useful types of feeds will be those that point to lists of resources with a given tag. for example, the request http://feeds.delicious.com/rss/iclibref/biology returns the rss feed for the "biology" tag of the "iclibref" account, with items listed as follows (element names approximate):

    <item>
      <title>darwin's dangerous idea (evolution 1)</title>
      <dc:date>2008-04-09t18:40:00z</dc:date>
      <link>http://icarus.ithaca.edu/cgi-bin/pwebrecon.cgi?bbid=237870</link>
      <dc:creator>iclibref</dc:creator>
      <description>this episode interweaves the drama in key moments of
      darwin's life with documentary sequences of current research, linking
      past to present and introducing major concepts of evolutionary theory.
      2001</description>
      <dc:subject>biology</dc:subject>
    </item>

to display delicious rss results on a website, the webmaster must use some rss parsing tool in combination with a script to display the results. the xml_rss package provides an easy way to read rss using php.4 the code for such an operation might look like this:

    <?php
    require_once "XML/RSS.php";
    $rss = new XML_RSS("http://feeds.delicious.com/rss/iclibref/biology");
    $rss->parse();
    echo "<ul>";
    foreach ($rss->getItems() as $item) {
        echo "<li><a href=\"" . $item['link'] . "\">" . $item['title'] . "</a></li>";
    }
    echo "</ul>";
    ?>

this code uses xml_rss to parse the rss feed and then prints out a list of linked results. rss is designed primarily as a current-awareness tool; consequently, a delicious rss feed returns only the thirty-one most recent items. this makes sense from an rss perspective, but it will not often meet the needs of librarians who are using delicious as a repository of resources. despite this limitation, the delicious rss feed may be useful in cases where currency is relevant, such as lists of recently acquired materials.

json

a second method of retrieving results from delicious uses javascript object notation, or json.5 as with the rss feed method, a request with credentials goes out to the delicious server. the response returns in json format, which can then be processed using javascript. an example request might be http://feeds.delicious.com/v2/json/iclibref/biology. by navigating to this url, the json response can be observed directly. a json response for a single record (formatted for readability) looks like this:

    delicious.posts = [
      {"u":"http:\/\/icarus.ithaca.edu\/cgi-bin\/pwebrecon.cgi?bbid=237870",
       "d":"darwin's dangerous idea (evolution 1)",
       "t":["biology"],
       "dt":"2008-04-09t06:40:00z",
       "n":"this episode interweaves the drama in key moments of darwin's life
       with documentary sequences of current research, linking past to present
       and introducing major concepts of evolutionary theory. 2001"}
    ];

it is instructive to look at the json feed because it displays the information elements that can be extracted: "u" for the url of the resource, "d" for the title, "t" for a comma-separated list of related tags, "n" for the note field, and "dt" for the timestamp. to display results in a webpage, the feed is requested with a script element:

    <script type="text/javascript"
            src="http://feeds.delicious.com/v2/json/iclibref/biology"></script>

then the json objects must be looped through and displayed as desired. alternately, the json objects may be placed into an array for sorting. the following is a simple example of a script that displays all of the available data, with each item in its own paragraph, and sorts the links alphabetically.
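a minimal version of such a script, assuming the feed has been loaded by the script element above and exposes the delicious.posts array with the "u," "d," and "n" properties shown in the json sample:

    // delicious.posts is populated by the json feed loaded above
    var posts = delicious.posts;
    // sort the bookmarks alphabetically by title ("d")
    posts.sort(function (a, b) {
        return a.d.toLowerCase() < b.d.toLowerCase() ? -1 : 1;
    });
    // one paragraph per bookmark: linked title, then the note ("n"), if any
    for (var i = 0; i < posts.length; i++) {
        document.write('<p><a href="' + posts[i].u + '">' + posts[i].d +
                       '</a><br/>' + (posts[i].n || '') + '</p>');
    }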
while rss returns a maximum of thirty-one entries, json allows a maximum of one hundred; the exact number of items returned may be modified through the count parameter at the end of the url. at the ithaca college library, we chose to use json because, at the time, delicious did not offer the convenient tagrolls, and the results returned by rss were displayed in reverse chronological order and truncated at thirty-one items. currently, we have a single php page that can display any delicious result set within our library website template. librarians generate links with parameters that designate a page title, a comma-delimited list of desired tags, and whether or not item descriptions should be displayed. for example, www.ithacalibrary.com/research/delish_feed.php?label=biology%20films&tag=biology,biologyi&notes=yes will return a page that looks like figure 2. the advantage of this approach is that librarians can easily generate webpages on the fly and send the url to their faculty members or add it to a subject guide or other webpage. the php script only has to read the "$_GET" variables from the url and then query delicious for this content.

xml

delicious offers an application programming interface (api) that returns xml results from queries passed to delicious through https. for instance, the request https://api.del.icio.us/v1/posts/recent?&tag=biology returns an xml document listing the fifteen most recent posts tagged as "biology" for a given account. unlike either the rss or the json method, the xml api offers a means of retrieving all of the posts for a given tag by allowing requests such as https://api.del.icio.us/v1/posts/all?&tag=biology. this type of request is labor intensive for the delicious server, so it is best to cache the results of such a query for future use. this involves writing the results of a request to a file on the server and then checking whether such an archived file exists before issuing another request. a php utility called deliciousposts, which provides caching functionality, is available for free.6 note that the username is not part of the request and must be supplied separately: unlike the public rss or json feeds, the xml api requires users to log in to their own account. from a script, this can be accomplished using the php curl functions:

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $queryurl);
    curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $posts = curl_exec($ch);
    curl_close($ch);

this code logs into a delicious account, passes it a query url, and makes the results of the query available as a string in the variable $posts. the content of $posts can then be processed as desired to create web content.
one way of doing this is to use an xslt stylesheet to transform the results into html, which can then be printed to the browser:

    /* create a new dom document from your stylesheet */
    $xsl = new DOMDocument;
    $xsl->load("mystylesheet.xsl");

    /* set up the xslt processor */
    $xp = new XSLTProcessor;
    $xp->importStylesheet($xsl);

    /* create another dom document from the contents of the $posts variable */
    $doc = new DOMDocument;
    $doc->loadXML($posts);

    /* perform the xslt transformation and output the resulting html */
    $html = $xp->transformToXML($doc);
    echo $html;

conclusion

delicious is a great tool for quickly and easily saving bookmarks. it also offers some very simple tools, such as linkrolls and tagrolls, for adding delicious content to a website. but to exert more control over this data, the user must interact with the delicious api or feeds. we have outlined three different ways to accomplish this. rss is a familiar option and a good choice if the data is to be used in a feed reader or if only the most recent items need be shown. json is perhaps the fastest method, but it requires some basic scripting knowledge and can display only one hundred results. the xml option involves more programming but allows an unlimited number of results to be returned. all of these methods facilitate the use of delicious data within an existing website.

references

1. delicious, tools, http://delicious.com/help/tools (accessed nov. 7, 2008).
2. linkrolls may be found from your delicious account by clicking settings > linkrolls, or directly at http://delicious.com/help/linkrolls (accessed nov. 7, 2008).
3. tagrolls may be found from your delicious account by clicking settings > tagrolls, or directly at http://delicious.com/help/tagrolls (accessed nov. 7, 2008).
4. martin jansen and clay loveless, "pear::package::xml_rss," http://pear.php.net/package/xml_rss (accessed nov. 7, 2008).
5. introducing json, http://json.org (accessed nov. 7, 2008).
6. ron gilmour, "deliciousposts," http://rongilmour.info/software/deliciousposts (accessed nov. 7, 2008).

information retrieval using a middleware approach

danijela boberić krstićev

abstract

this paper explores the use of a mediator/wrapper approach to enable the search of an existing library management system using different information retrieval protocols. it proposes an architecture for a software component that will act as an intermediary between the library system and search services. it provides an overview of different approaches to adding z39.50 and search/retrieval via url (sru) functionality using a middleware approach, implemented on the bisis library management system; that wrapper performs transformation of contextual query language (cql) into the lucene query language. the primary aim of this software component is to enable search and retrieval of bibliographic records using the sru and z39.50 protocols, but the proposed architecture is also suitable for including the existing library management system in a library portal. the software component provides a single interface to server-side protocols for search and retrieval of records, and additional protocols could be added.
this paper provides a practical demonstration of interest to developers of library management systems and to those who are trying to use open-source solutions to make their local catalogs accessible to other systems.

introduction

information technologies are changing and developing very quickly, forcing continual adjustment of business processes to leverage the new trends. these changes affect all spheres of society, including libraries. there is a need to add new functionality to existing systems in ways that are cost effective and do not require major redevelopment of systems that have achieved a reasonable level of maturity and robustness. this paper describes how to extend an existing library management system with new functionality supporting easy sharing of bibliographic information with other library management systems. one of the core services of library management systems is support for shared cataloging. when processing a new bibliographic unit, a librarian first checks whether the unit has already been recorded by another library somewhere in the world; if it has, the librarian stores that electronic record in his or her local database of bibliographic records. to enable those activities, a standard way of communicating between different library management systems must exist. currently, the well-known standards in this area are z39.501 and sru.2

danijela boberić krstićev (dboberic@uns.ac.rs) is a member of the department of mathematics and informatics, faculty of sciences, university of novi sad, serbia.

in this paper, a software component that integrates services for the retrieval of bibliographic records using the z39.50 and sru standards is described. the main purpose of that component is to encapsulate the server sides of the appropriate protocols and to provide a single interface for communication with the existing library management system. the same interface may be used regardless of which protocols are used for communication with the library management system. in addition, the software component acts as an intermediary between two different library management systems. the main advantage of the component is that it is independent of the library management system with which it communicates, and it can be extended with new search and retrieval protocols. by using the component, the functionality of an existing library management system can be improved without redeveloping the existing system: the library management system just needs to provide an interface for communication with the component, and that interface can even be implemented as an xml web service.

standards used for search and retrieval

the z39.50 standard was one of the first standards to define a set of services for searching and retrieving data. the standard is an abstract model that defines communication between the client and server and does not go into the details of implementing either. the model defines abstract prefixes used for search that do not depend on the implementation of the underlying system, and it defines the formats in which data can be exchanged. the z39.50 standard defines the type-1 query language, which is required when implementing this standard. the z39.50 standard has certain drawbacks that a new generation of standards, such as sru, tries to overcome.
sru tries to keep the functionality defined by the z39.50 standard while allowing implementation with current technologies. one of the main advantages of the sru protocol over z39.50 is that it allows messages to be exchanged as xml documents, which was not the case with z39.50. the query language used in sru is called contextual query language (cql).3 the sru standard has two implementations: one in which search and retrieval are done by sending messages via the hypertext transfer protocol (http) get and post methods (the sru version), and another that sends messages using the simple object access protocol (soap) (the srw version). the main difference between sru and srw is the way messages are sent.4 the srw version packs messages in a soap envelope element, while the sru version sends messages as parameter/value pairs included in the url; a hypothetical sru request might look like http://example.org/sru?operation=searchRetrieve&version=1.1&query=dc.title%3Ddarwin&maximumRecords=10, where the query parameter carries a cql query. another difference between the two versions is that the sru protocol transfers messages only over http, while srw can also use secure shell (ssh) and simple mail transfer protocol (smtp).

related work

a common approach to adding sru support to library systems, most of which already support the z39.50 search protocol,5 has been to reuse existing software architecture that supports z39.50. simultaneously supporting both protocols is very important because individual libraries will not decide to move to the new protocol until it is widely adopted within the library community. one approach to implementing a system that supports retrieval through both protocols is to create two independent server-side components for z39.50 and sru, with both components accessing a single database. this approach involves creating a server implementation from scratch, without reusing existing architectures, which could be considered a disadvantage. this approach is good if there is an existing z39.50 or sru server-side implementation, or if there is a library management system that, for example, supports just the z39.50 protocol but has open programming code and allows the changes needed to develop an sru service. the system architecture based on this approach is shown in figure 1 as a unified modeling language (uml) component diagram. in the figure, the software components that constitute the implementation of the client and server sides for each protocol are clearly separated, while the database is shared.

figure 1. software architecture of a system with separate implementations of server-side protocols

the main disadvantage of this approach is that adding support for a new search and retrieval protocol requires transforming the query language supported by that new protocol into the query language of the target system. for example, if the existing library management system uses a relational database to store bibliographic records, then for every new protocol added, its query language must be transformed into the structured query language (sql) supported by the database. however, in most commercial library management systems that support server-side z39.50, local development and maintenance of additional services may not be possible due to the closed nature of the systems.
one of the solutions in this case is to create a so-called "gateway" software component that implements both an sru server and a z39.50 client, used to access the existing z39.50 server. that is, if an sru client application sends a search request, the gateway accepts that request, transforms it into a z39.50 request, and forwards it to the z39.50 server. similarly, when the gateway receives a response from the z39.50 server, it transforms this response into an sru response and forwards it to the client. in this way, the client has the impression that it communicates directly with an sru server, while the existing z39.50 server believes it is responding directly to a z39.50 client. figure 2 presents a component diagram of the architecture based on this approach.

figure 2. software architecture of a system with a gateway

the software architecture shown in figure 2 is one of the most common approaches and is used by the library of congress (lc),6 which runs the commercial voyager7 library information system; voyager allows searching through the z39.50 protocol. to support searching the lc database via sru, indexdata8 developed the yazproxy software component,9 an sru-z39.50 gateway. the same idea10 was used in the implementation of "the european library"11 portal, which aims to provide integrated access to the major collections of all the european national libraries.

another interesting approach to designing software architecture for information retrieval can be observed in systems that search heterogeneous information sources; the architecture of these systems is shown in figure 3. the basic idea in most of these systems is to provide the user with a single interface for searching different systems. this means that a separate component accepts a user query and transforms it into a query supported by each specific system that offers search and data retrieval. this component is also known as a mediator. a separate wrapper component must be created for each system to be searched, to convert the user's query into a query understood by that particular target system.12

figure 3. architecture with the mediator/wrapper approach

figure 3 shows a system architecture that enables communication with three different systems (system1, system2, and systemn), each of which may use a different query language and therefore needs a different wrapper component (wrapper1, wrapper2, and wrappern). in this architecture, each system can itself be a new mediator component that interacts with other systems; that is, a wrapper component can communicate either with a target system or with another mediator. the role of the mediator is to accept the request defined by the user and send it to all wrapper components. the wrapper components know how to transform the query sent by the mediator into a query supported by the target system with which each wrapper communicates. in addition, the wrapper has to transform the data received from the target system into the format prescribed by the mediator.
communication between client applications and the mediator may take place through one of the protocols for search and retrieval of information, for example the sru or z39.50 protocols, or through standard http. systems whose architecture is based on the mediator/wrapper approach are described in several papers. coiera et al. (2005)13 describe the architecture of a system for federated search of journals in the field of medicine, using an internal query language called unified query language (uql). for each information source with which the system communicates, a wrapper was developed to translate queries from uql into the native query language of the source; the wrapper also returns search results to the mediator as an xml document in a defined internal format called unified response language (urel). as an alternative to these purpose-defined languages (uql and urel), the cql query language and the sru protocol could be used. another example of the use of mediators is described by cousins and sanders (2006),14 who address interoperability issues in cross-database access and suggest how to incorporate a virtual union catalogue into the wider information environment through the application of middleware, using the z39.50 protocol to communicate with the underlying sources.

software component for services integration

this paper describes a software component that enables the integration of services for search and retrieval of bibliographic records into an existing library system. the main idea is that the component should be modular and flexible, allowing the addition of new search protocols and easy integration into the existing system. based on the papers analyzed in the previous section, it was concluded that a mediator/wrapper approach would work best. the architecture of a system that includes the component and allows search and retrieval of bibliographic records from other library systems is shown in figure 4.
in order to test the intermediary component, we used the server side of the z39.50 protocol developed through the jafer project16 ; for the sru server side, we developed a special web service in the java programming language. in further discussion, it is assumed that the intermediary component receives queries from server-side z39.50 and sru services, and that this component does not contain any implementation of these protocols. the mediator component, which is part of the intermediary component, must accept queries sent by the server-side search and retrieval services. the mediator component uses its own internal representation of queries, so it is therefore necessary to transform received queries into the appropriate internal representation. after that, the mediator will establish communication with the wrapper component, which is in charge of executing queries in existing library system. the basic role of the wrapper component is to transform queries received from the mediator into queries supported by library system. after executing the query, the wrapper sends search results as an xml document to the mediator. before sending those results to server side of protocol, the mediator must transform those results into the format that was defined by the client. mediator software component the mediator is a software component that provides a unique interface for different client applications. in this study, as shown in figure 4, a slightly different solution was selected. instead of the mediator communicating directly with the client application, which in the case of protocols for data exchange is client side of that protocol, it actually communicates with the server components that implement the appropriate protocols, and the client application exchanges messages with the corresponding server-side protocol. the z39.50 client exchanges messages with the appropriate z39.50 server, and it communicates with the mediator component. a similar process is done when communication is done using the sru protocol. what is important to emphasize is that the z39.50 and sru servers communicate with the mediator through a unified user interface, represented in figure 5 by class mediatorservice. in this way the same method is used to submit the query and receive results, regardless of which protocol is used. that means information retrieval using a middleware approach | krstićev 61 that our system becomes more scalable and that it is possible to add some new search and retrieval protocols without refactoring the mediator component. figure 5 shows the uml class diagram that describes the software mediator component. the mediatorservice class is responsible for communication with the server-side z39.50 and sru protocols. this class accepts queries from the server side of protocols and returns bibliographic records in the format defined by the server. the mediator can accept queries defined by different query languages. its task is to transform these queries to an internal query language, which will be forwarded to the wrapper component. in this implementation, accepted queries are transformed into an object representation of cql, as defined by the sru standard. one of the reasons for choosing cql is that concepts defined in the z39.50 standard query language can be easily mapped to the corresponding concepts defined by cql. cql is semantically rich, so can be used to create various types of queries. 
in the case of a new query language, it would be necessary either to map the new language into cql or to extend the cql object model with new concepts. the current implementation of the mediator component can transform two types of queries into the cql object model: type-1 queries (used by z39.50) and cql queries. to add a new query language, it would only be necessary to add a new class implementing the queryconverter interface shown in figure 5; the architecture of the mediator component remains the same. another task of the mediator component is to return records in the format defined by the client that sent the request.

figure 5. uml class diagram of the mediator component

because the mediator communicates with the z39.50 and sru server sides, it is the task of those servers to check whether the format the client requires is supported by the underlying system; if it is not, the request is not sent to the mediator. otherwise, the mediator ensures the transformation of retrieved records into the chosen format. the mediator obtains bibliographic records from the wrapper in the form of an xml document that is valid according to the appropriate xml schema.18 the xml schema allows the creation of an xml document describing bibliographic records according to the unimarc19 or marc2120 format. the current implementation of the mediator component supports transforming bibliographic records into xml documents that are instances of the unimarcslim xml schema,21 the marc21slim xml schema,22 or the dublin core xml schema.23 adding support for a new format would require creating a new class extending the class recordserializer (figure 5). because the mediator component works with xml, the transformation of bibliographic records into a new format could also be done using extensible stylesheet language transformations (xslt).
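restated as java, the signatures in figure 5 reduce to a few small types. the sketch below follows the names and types given in the diagram; the packaging is an assumption, and cqlnode is the cql-java class mentioned above.

    import org.z3950.zing.cql.CQLNode;

    // unified entry point used by the z39.50 and sru server sides
    interface MediatorService {
        String[] getRecords(Object query, String format);
    }

    // one implementation per accepted query language; figure 5 shows
    // CQLStringConverter and RPNConverter implementing this interface
    interface QueryConverter {
        CQLNode parseQuery(Object query);
    }

    // one subclass per output format (unimarc, marc21, dublin core)
    abstract class RecordSerializer {
        abstract String serialize(String record);
    }

    // hands the internal cql representation to the library system
    interface Wrapper {
        String[] executeQuery(CQLNode cqlQuery);
    }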
wrapper software component

the wrapper software component is responsible for communication between the mediator and the existing library system: it transforms the cql object representation into a concrete query supported by the existing library system and obtains the results that match the query. the implementation of the wrapper component depends directly on the architecture of the existing library system. figure 7 proposes a possible architecture for the wrapper component. this proposed architecture assumes that the existing library system provides some kind of service that the wrapper component can use to send a query and obtain results; the recordmanager interface in figure 7 is an example of such a service. recordmanager has two operations: one executes the query and returns the number of hits, and the other returns the bibliographic records. this proposed solution is useful for libraries that use a library management system that can be extended; it may not be appropriate for libraries using an off-the-shelf library management system that cannot be extended.

the proposed architecture of the wrapper component is based on the strategy design pattern,24 primarily because of the need to transform a cql query into a query supported by the library system. according to the cql concept of context sets, all prefixes that can be searched are grouped into context sets, and these sets are registered with the library of congress. the concept of context sets enables specific communities and users to define their own prefixes, relations, and modifiers without fear that a name will be identical to the name of a prefix defined in another set; that is, two prefixes may have the same name, but they belong to different sets and therefore have different semantics. cql allows a single query to combine elements defined in different context sets. when parsing a query, it is necessary to check which context set a particular item belongs to and then to apply the appropriate mapping from that element to the corresponding element of the query language used by the library system. the strategy design pattern belongs to the patterns that describe the behavior of objects (behavioral patterns), which determine the responsibility of each object and the way objects communicate with one another. the main task of the strategy pattern is to enable easy adjustment, at runtime, of the algorithm applied by an object: the pattern defines a family of algorithms, each encapsulated in a single object. figure 6 shows a class diagram from the book design patterns: elements of reusable object-oriented software,25 which describes the basic elements of the strategy pattern.
the class context is responsible for selection of appropriate strategies for parsing, depending on which context set the element that is going to be transformed belongs to. class cqlstrategy and dcstrategy are responsible for mapping the elements belonging respectively to the cql or dublin core context set in the appropriate elements of a particular query language used by the library system. the use of strategy pattern makes it possible, in real time, to change the algorithm that will parse the query depending on what context set is used. the described implementation of a wrapper component enables the parsing of queries that contain only elements that belong to cql and/or the dublin core context set. in order to provide support for a new context set, a new implementation of interface strategy (figure 7) would be required, including an algorithm to parse the elements defined by this new set. information retrieval using a middleware approach | krstićev 65 figure 7. uml class diagram of wrapper component integration of intermediary software components into the bisis library system the bisis library system was developed at the faculty of science and the faculty of technical sciences in novi sad, serbia, and has had several versions since its introduction in 1993. the fourth and current version of the system is based on xml technologies. among the core functional units of bisis26 are: • circulation of library material • cataloging of bibliographic records • indexing and retrieval of bibliographic records • downloading bibliographic records through z39.50 protocol • creation of a card catalog • creation of statistical reports an intermediary software component has been integrated into the bisis system. the intermediary component was written in the java programming language and implemented as a web application. communication between server applications that support the z39.50 and sru protocols and the intermediary component is done using the software package hessian.27 hessian offers a simple implementation of two protocols to communicate with web services, a binary protocol and its corresponding xml protocol, both of which rely on http. use of hessian package makes it easy to create a java servlet on the server side and proxy object on client-side, which will be used to 0..1 1..1 0..11..1 0..1 1..1 context + + + setstrategy (string strategy) mapindext ounderlayingprefix (string index) parseoperand (string index, cqlt ermnode node) : void : string : object strategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object cqlstrategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object dcstrategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object recordmanager + + select (object query) getrecords (int hits[]) : int[] : string[] wrapper + executequery (cqlnode cqlquery) makequery (cqlnode cql, object underlayingquery) : string[] : object information technology and libraries | march 2013 66 communicate with the servlet. in this case, the proxy object is deployed on the server side of protocol and the intermediary component contains a servlet. 
integration of the intermediary software component into the bisis library system

the bisis library system was developed at the faculty of science and the faculty of technical sciences in novi sad, serbia, and has had several versions since its introduction in 1993. the fourth and current version of the system is based on xml technologies. among the core functional units of bisis26 are:

• circulation of library material
• cataloging of bibliographic records
• indexing and retrieval of bibliographic records
• downloading bibliographic records through the z39.50 protocol
• creation of a card catalog
• creation of statistical reports

an intermediary software component has been integrated into the bisis system. the intermediary component was written in the java programming language and implemented as a web application. communication between the server applications that support the z39.50 and sru protocols and the intermediary component is done using the hessian software package.27 hessian offers simple implementations of two protocols for communicating with web services, a binary protocol and a corresponding xml protocol, both of which rely on http. the hessian package makes it easy to create a java servlet on the server side and a proxy object on the client side that is used to communicate with the servlet; in this case, the proxy object is deployed on the server side of each protocol, and the intermediary component contains the servlet. communication between the intermediary and bisis is also realized using the hessian software package, which opens the possibility of creating a distributed system, because the existing library system, the intermediary component, and the server applications that implement the protocols can be located on physically separate computers.

the bisis library system uses the lucene software package for indexing and searching. lucene defines its own query language,28 so the wrapper component integrated into bisis has to transform the cql query object model into the object representation of a lucene query. the wrapper first determines which context set an index belongs to and then applies the appropriate strategy for mapping that index. the rules for mapping indexes to lucene fields are read from a corresponding xml document defined for every context set. listing 1 below gives an example of an xml document containing some rules for mapping indexes of the dublin core context set to lucene index fields. the xml element index holds the name of the index to be mapped, while the xml element mappingelement contains the name of the lucene field. for example, the title index defined in the dublin core context set, which denotes search by title of the publication, is mapped to the field ti used by the search engine of the bisis system.

    <!-- structure approximate; element names as described in the text -->
    <mappingRules>
      <rule>
        <index>title</index>
        <mappingElement>ti</mappingElement>
      </rule>
      <rule>
        <index>creator</index>
        <mappingElement>au</mappingElement>
      </rule>
      <rule>
        <index>subject</index>
        <mappingElement>sb</mappingElement>
      </rule>
    </mappingRules>

listing 1. xml document with rules for mapping the dublin core context set

after an index is mapped to the corresponding lucene field, a similar procedure is repeated for a relation, which may belong to some other context set or may have modifiers belonging to another context set; it is then necessary to switch the current mapping strategy to a new one. by doing this, all elements of the cql query are converted into a lucene query, and the new query can be sent to bisis to be executed.

approximately 40 libraries in serbia currently use the bisis system, which includes a z39.50 client allowing these libraries to search the collections of other libraries that support communication through the z39.50 protocol. by integrating the intermediary component into the bisis system, non-bisis libraries may now search the collections of libraries that use bisis. as a first step, the intermediary component was integrated in just a few libraries, without any major problems. the component is most useful to the city libraries that use bisis, because they have many branches, which can now search and retrieve bibliographic records from their central libraries. the component could potentially be used by other library management systems, assuming the presence of an appropriate wrapper component to transform cql into the target query language.
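tying the figure 7 signatures to the listing 1 mapping, a dublin core strategy might produce a lucene query for one cql term roughly as follows; the class is hypothetical, but term and termquery are standard lucene classes.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;
    import org.z3950.zing.cql.CQLTermNode;

    class DCLuceneMapping {
        // lucene field per dublin core index, as in listing 1
        static String mapIndexToUnderlayingPrefix(String index) {
            switch (index) {
                case "title":   return "ti";
                case "creator": return "au";
                case "subject": return "sb";
                default: throw new IllegalArgumentException("unmapped index: " + index);
            }
        }

        // build the lucene query for a single cql term node
        static TermQuery parseOperand(String luceneField, CQLTermNode node) {
            return new TermQuery(new Term(luceneField, node.getTerm()));
        }
    }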
approximately 40 libraries in serbia currently use the bisis system, which includes a z39.50 client, allowing those libraries to search the collections of other libraries that support communication through the z39.50 protocol. by integrating the intermediary component into the bisis system, non-bisis libraries may now search the collections of libraries that use bisis. as a first step, the intermediary component was integrated in just a few libraries, without any major problems. the component is most useful to the city libraries that use bisis, because they have many branches, which can now search and retrieve bibliographic records from their central libraries. the component could potentially be used by other library management systems, assuming the presence of an appropriate wrapper component to transform cql into the target query language.

conclusion

this paper describes an independent, modular software component that enables the integration of a service for search and retrieval of bibliographic records into an existing library system. the software component provides a single interface to server-side protocols for searching and retrieving records, and it could be extended to support additional server-side protocols. the paper describes the communication of this component with z39.50 and sru servers. the software component was developed for integration with the bisis library system, but it is an independent component that could be integrated into any other library system.

the proposed architecture of the software component is also suitable for including existing library systems in a single portal. the architecture of the portal should involve one mediator component whose task would be to communicate with the wrapper components of the individual library systems. each library system would implement its own search and storage functionality and could function independently of the portal. the basic advantage of this architecture is that it is possible to include new library systems that provide search services: it is only necessary to add a new wrapper that performs the appropriate transformation of the query obtained from the mediator component into a query that the library system can process. the task of the mediator is to send queries to the wrappers, while each wrapper establishes communication with a specific library system. after obtaining the results from the underlying library systems, the mediator should be able to combine results, remove duplicates, and sort the results. in this way, the end user would have the impression of searching a single database.

references

1. “information retrieval (z39.50): application service definition and protocol specification,” http://www.loc.gov/z3950/agency/z39-50-2003.pdf (accessed february 22, 2013).
2. “search/retrieval via url,” http://www.loc.gov/standards/sru/.
3. “contextual query language – cql,” http://www.loc.gov/standards/sru/specs/cql.html.
4. eric lease morgan, “an introduction to the search/retrieve url service (sru),” ariadne 40 (2004), http://www.ariadne.ac.uk/issue40/morgan.
5. larry e. dixson, “yaz proxy installation to enhance z39.50 server performance,” library hi tech 27, no. 2 (2009): 277-285, http://dx.doi.org/10.1108/07378830910968227; mike taylor and adam dickmeiss, “delivering marc/xml records from the library of congress catalogue using the open protocols srw/u and z39.50” (paper presented at world library and information congress: 71st ifla general conference and council, oslo, 2005).
6. mike taylor and adam dickmeiss, “delivering marc/xml records from the library of congress catalogue using the open protocols srw/u and z39.50” (paper presented at world library and information congress: 71st ifla general conference and council, oslo, 2005).
7. “voyager integrated library system,” http://www.exlibrisgroup.com/category/voyager.
8. “indexdata,” http://www.indexdata.com/.
9. “yazproxy,” http://www.indexdata.com/yazproxy.
10. theo van veen and bill oldroyd, “search and retrieval in the european library,” d-lib magazine 10, no. 2 (2004), http://www.dlib.org/dlib/february04/vanveen/02vanveen.html.
11. “the european library,” http://www.theeuropeanlibrary.org/tel4/.
12. gio wiederhold, “mediators in the architecture of future information systems,” computer 25, no. 3 (1992): 38-49, http://dx.doi.org/10.1109/2.121508.
13. enrico coiera, martin walther, ken nguyen, and nigel h. lovell, “architecture for knowledge-based and federated search of online clinical evidence,” journal of medical internet research 7, no. 5 (2005), http://www.jmir.org/2005/5/e52/.
14. shirley cousins and ashley sanders, “incorporating a virtual union catalogue into the wider information environment through the application of middleware: interoperability issues in cross-database access,” journal of documentation 62, no. 1 (2006): 120-144, http://dx.doi.org/10.1108/00220410610642084.
15. “sru software and tools,” http://www.loc.gov/standards/sru/resources/tools.html; “z39.50 registry of implementators,” http://www.loc.gov/z3950/agency/register/entries.html.
16. “jafer toolkit project,” http://www.jafer.org.
17. “cql-java: a free cql compiler for java,” http://zing.z3950.org/cql/java/.
18. bojana dimić, branko milosavljević, and dušan surla, “xml schema for unimarc and marc 21 formats,” the electronic library 28, no. 2 (2010): 245-262, http://dx.doi.org/10.1108/02640471011033611.
19. “unimarc formats and related documentation,” http://www.ifla.org/en/publications/unimarc-formats-and-related-documentation.
20. “marc 21 format for bibliographic data,” http://www.loc.gov/marc/bibliographic/.
21. “unimarcslim xml schema,” http://www.bncf.firenze.sbn.it/progetti/unimarc/slim/documentation/unimarcslim.xsd.
22. “marc21slim xml schema,” http://www.loc.gov/standards/marcxml/schema/marc21slim.xsd.
23. “dublincore xml schema,” http://www.loc.gov/standards/sru/resources/dc-schema.xsd.
24. erich gamma, richard helm, ralph johnson, and john vlissides, design patterns: elements of reusable object-oriented software (indianapolis: addison-wesley, 1994), 315-323.
25. ibid.
26. danijela boberić and branko milosavljević, “generating library material reports in software system bisis” (proceedings of the 4th international conference on engineering technologies icet, novi sad, 2009); danijela boberić and dušan surla, “xml editor for search and retrieval of bibliographic records in the z39.50 standard,” the electronic library 27, no. 3 (2009): 474-495, http://dx.doi.org/10.1108/02640470910966916 (accessed february 22, 2013); bojana dimić and dušan surla, “xml editor for unimarc and marc21 cataloguing,” the electronic library 27, no. 3 (2009): 509-528, http://dx.doi.org/10.1108/02640470910966934 (accessed february 22, 2013); jelena rađenović, branko milosavljević, and dušan surla, “modelling and implementation of catalogue cards using freemarker,” program: electronic library and information systems 43, no. 1 (2009): 63-76, http://dx.doi.org/10.1108/00330330934110 (accessed february 22, 2013); danijela tešendić, branko milosavljević, and dušan surla, “a library circulation system for city and special libraries,” the electronic library 27, no. 1 (2009): 162-186, http://dx.doi.org/10.1108/02640470910934669.
27. “hessian,” http://hessian.caucho.com/doc/hessian-overview.xtp.
28. branko milosavljević, danijela boberić, and dušan surla, “retrieval of bibliographic records using apache lucene,” the electronic library 28, no. 4 (2010): 525-539, http://dx.doi.org/10.1108/02640471011065355.

acknowledgement

the work is partially supported by the ministry of education and science of the republic of serbia, through project no. 174023: “intelligent techniques and their integration into wide-spectrum decision support.”
learning to share: measuring use of a digitized collection on flickr and in the ir

melanie schlosser and brian stamper

melanie schlosser (schlosser.40@osu.edu) is digital publishing librarian and brian stamper (stamper.10@osu.edu) is administrative associate, the ohio state university libraries, columbus, ohio.

information technology and libraries | september 2012

abstract

there is very little public data on usage of digitized library collections. new methods for promoting and sharing digitized collections are created all the time, but very little investigation has been done on the effect of those efforts on usage of the collections on library websites. this study attempts to measure the effect of reposting a collection on flickr on use of the collection in a library-run institutional repository (ir). the results are inconclusive, but the paper provides background on the topic and guidance for future efforts.

introduction

inspired by the need to provide relevant resources and make wise use of limited budgets, many libraries measure the use of their collections. from circulation counts and in-library use studies of print materials to increasingly sophisticated analyses of usage of licensed digital resources, the techniques have changed even as the need for the data has grown. new technologies have simultaneously presented challenges to measuring use and allowed those measurements to become more accurate and more relevant. in spite of the relative newness of the digital era, “librarians already know considerably more about digital library use than they did about traditional library use in the print environment.”1 arl’s libqual+,2 one of the most widely adopted tools for measuring users’ perceptions of service quality, has recently been joined by digiqual and mines for libraries. these new statsqual tools3 extend the familiar libqual focus on users into the digital environment. there are tools and studies for seemingly every type of licensed digital content, all with an eye toward better understanding their users and making better-informed collection management decisions.

those same tools and studies for measuring use of library-created digital collections are conspicuous in their absence. almost two decades into library collection digitization programs, there is not a significant body of literature on measuring use of digitized collections. a number of articles have been written about measuring usage of library websites in general; arendt and wagner4 is a recent example. in one of the few studies to specifically measure use of a digitized collection, herold5 uses google analytics to uncover the geographical location of users of a digitized archival image collection. otherwise, a literature search on usage studies uncovers very little. less formal communication channels are similarly quiet, and public usage data on digitized collections on library sites is virtually nonexistent.
commercial sites for disseminating and sharing digital media frequently display simple use metrics (image views, for example, or file downloads) alongside content; such features do not appear on digitized collections on library sites.

usage and digitization projects

digitized library collections are created with an eye toward use from their early planning stages. an influential early clir publication on selecting collections for digitization, written by a harvard task force,6 included current and potential use of the analog and digitized collection as a criterion for selection. the factors to be considered include the quantitative (“how much is the collection used?”) and the qualitative (“what is the nature of the use?”). more than ten years later, ooghe and moreels7 find that use is still a criterion for selection of collections to digitize, tied closely to the value of the collection. facilitating discovery and use of the digitized collection is a major consideration during project development. payette and rieger8 is an early example of a study of the needs of users in digital library design. usability testing of the interface is frequently a component of site design; see jeng9 for a good overview of usability testing in the digital library environment. increasing usage of the digitized collection is also a major theme in metadata research and development. standards such as the open archives initiative’s protocol for metadata harvesting10 and object reuse and exchange11 are meant to allow discovery and reuse of objects in a variety of environments, and the linked data movement promises to make library data even more relevant and reusable in the world wide web environment.12

digital collection managers have also found more radical methods of increasing usage of their collections. inserting references into relevant wikipedia articles has become a popular way to drive more users to the library’s site.13 some librarians have taken the idea a step further and have begun reposting their digital content on third-party sites. the smithsonian pioneered one reposting strategy in 2008 when it partnered with flickr, the popular photo-sharing site, to launch flickr commons.14 the commons is a walled garden within flickr that contains copyright-free images held by cultural heritage institutions such as libraries, archives, and museums. each partner institution has its own branded space (“photostream” in flickr parlance) organized into collections and sets. this model aggregates content from different organizations and locates it where users already are, but it still maintains the traditional institution/collection structure. flickr commons has been, by all measures, a very successful experiment in sharing collections with users. the smithsonian,15 the library of congress,16 the alcuin society,17 and the london school of economics18 have all written about their experiences with the commons. stephens19 and michel and tzoc20 give advice on how libraries can work with flickr, and garvin21 and vaughan22 take a broad view of the project and the partners. another sharing strategy is beginning to emerge, in which digital collection curators contribute individual or small groups of images to thematic websites.
a recent example is pets in collections,23 a whimsical tumblr photo blog created by the digital collections librarian at bryn mawr college. the site’s description states, “come on, if you work in a library, archive, or museum, you know you’ve seen at least one of these: a seemingly random image of that important person and his dog or a man and a monkey wearing overalls … so now you finally have a place to share them with the world!” the site requires submissions to include only the image and a link back to the institution or repository that houses it, although submitters may include more information if they choose. although more lighthearted than most traditional library image collections, it still performs the desired function of introducing users to digital collections they may never have encountered otherwise.

clearly, these creative and thoughtful strategies are not dreamed up by digital librarians unconcerned with end use of their collections, so why do stewards of digitized collections so rarely collect, or at least publicly discuss, statistics on the use of their content? the one notable exception may shed some light on the matter. institutional repositories (irs) have been the one area of non-licensed digital library content where usage statistics are frequently collected and publicized. dspace,24 the widely adopted ir platform developed by mit and hewlett-packard, has increasingly sophisticated tools for tracking and sharing use of the content it hosts. digital commons,25 the hosted ir solution created by bepress, provides automated monthly download reports for scholars who use it to archive their content. the development of these features has been driven by the need to communicate value to faculty and administrators. encouraging participation by faculty has been a major focus of ir managers since the initial ‘build it and they will come’ optimism faded and the challenge of adding another task to already busy faculty schedules became clear.26 having a clear need (outreach) and a defined audience (participating scholars) has led to a thriving program of usage tracking in the ir community.

the lack of an obvious constituency and the absence of pointed questions about use in the digitized-collections world have, one suspects, led to the current dearth of measurement tools and initiatives. still, questions about use do arise, particularly when libraries undertake labor-intensive usability studies or venture into the somewhat controversial landscape of sharing library-created digital objects on third-party sites.27 anecdotally, the thought of sharing library content elsewhere on the web also raises concerns about loss of context and control, as well as a fear of ‘dilution’ of the library’s web presence. “if patrons can use the library’s collections on other sites,” a fellow librarian once exclaimed, “they won’t come to the library’s website anymore!” without usage data, we cannot adequately answer questions about the value of our projects or the way they impact other library services.

justification for study and research questions

there were three major motivations for this project. first, inspired by the success of the flickr commons project, we wanted to explore a method for sharing our collections more widely. an image collection and a third-party image-sharing platform were an obvious choice, since image display is not a strength of our dspace-based repository.
flickr is currently a major presence in the image-sharing landscape, and the existence of the commons was an added incentive for choosing flickr as our platform. second, the collection we selected for the project (described more fully below) is not fully described, and we wanted to take advantage of flickr’s annotation tools to allow user-generated metadata. since further description of the images would have required an unusual depth of expertise, we were not optimistic that we would receive much useful data, and in fact we did not. still, we lost nothing by asking, and we gained familiarity with flickr’s capabilities for metadata capture. the final motivation for the project, and the focus of the study, was the desire to investigate the effect of third-party platform sharing of a local collection on usage of that collection on library sites. the data gathered were meant partly to inform our local practice, but also to address a concern that may hold librarians back from exploring such means of increasing collection usage: the fear that doing so will divert traffic from library sites. we suspected that sharing collections more widely would actually increase usage of the items on library-owned sites, and the study was developed to explore the issue in a rigorous way. the research question for this study was: does reposting digitized images from a library site to a third-party image-sharing site have an effect on usage of the images on the library site?

about the study

platforms

for the study, the images were submitted to two different platforms: the knowledge bank (kb),28 a library-managed repository, and flickr, a commercial image-sharing site. the kb is an institutional repository built on dspace software with a manakin (xml-based) user interface. established in 2005, it holds more than 45,000 items, including faculty and student research, gray literature, institutional records, and digitized library collections. image collections like the one used in this study make up a small percentage of the items in the repository. in the kb’s organizational structure, the images in the study were submitted as a collection in the library’s community, under a sub-community for the special collection that contributed them. each image was submitted as an item consisting of one image file and dublin core metadata.29

the project originally called for submitting the images to flickr commons, but the commons was not accepting new partners during the study period. instead, we created a standard flickr pro account for the libraries, while following the commons guidelines in image rights and settings. in contrast to dspace’s community/sub-community/collection structure, flickr images are organized in sets, sets belong to collections, and all images make up the account owner’s photostream. a set was created for the images, with accompanying text giving background information and inviting users to contribute to the description of the images.30 the images were accompanied by the same metadata as the items in the kb, but the files themselves were higher resolution, to take advantage of flickr’s ability to display a range of sizes for each image. all items in the set were publicly available for viewing, commenting, and tagging, and each image was accompanied by links back to the kb at the item, collection, and repository level.
the collection

the choice of a collection for the study was limited by a number of factors. first, and most obviously, it needed to be an image collection. second, it needed to be in the public domain, both to allow our digitization and distribution of the images and to satisfy flickr commons’ “no known copyright restrictions” requirement.31 this could be accomplished either by choosing a collection whose copyright protections had expired or by removing restrictions from a collection to which the libraries owned the rights. third, the curator of the collection needed to be willing and able to post the images on a commercial site. this required not only an open-minded curator, but also a collection without a restrictive donor agreement or items containing sensitive or private information. finally, we wanted the collection to be of broad public interest.

the collection chosen for the study was a set of 163 photographs from osu’s charles h. mccaghy collection of exotic dance from burlesque to clubs, held by the jerome lawrence and robert e. lee theatre research institute.32 the photographs, mainly images of burlesque dancers, were published on cabinet and tobacco cards in the 1890s, putting them solidly in the public domain.

figure 1. “the devil’s auction,” j. gurney & son (studio). http://hdl.handle.net/1811/47633 (kb), http://www.flickr.com/photos/60966199@n08/5588351865/ (flickr).

methodology

phases

the study took place in 2011 and was organized in three ten-week phases. for the first phase (january 31 through april 11), the images were submitted to the kb. the purpose of this phase was to provide a baseline level of usage for the images in the repository. in phase two (april 12 through june 20), half of the images were randomly selected and submitted to flickr (group a). the purpose of this phase was to determine what effect reposting would have on usage of items in the repository, both on those images that were reposted and on other images in the same collection that had not been reposted. in phase three (june 21 through august 29), the rest of the images (group b) were submitted to flickr. in this phase, we began publicizing the collection. publicity consisted of sharing links to the collection on social media and sending emails to scholars in relevant fields via email lists. these efforts led to further downstream publicity on popular and scholarly blogs.33

data collection

the unit of measurement for the study was views of individual images. to understand the notion of a “view,” we must contrast two different ways that an image may be viewed in the knowledge bank. each image in the collection has an individual web page (the item page) where it is presented along with metadata describing it. in addition, from that page a visitor may download and save the image file itself (in this collection, a jpeg). in the former case, the image is an element in a web page, while in the latter it is an image file independent of its web context. search engines and other sources commonly link directly to such files, so it is not unusual for a visitor to download a file without ever having seen it in context. in light of this, we produced two data sets, one for visits to item pages and another for file downloads. depending on one’s interpretation, either could be construed as a “view.” ultimately there was little distinction in usage patterns between the two types of measure.

the data were generated using dspace’s apache solr-based statistics system, which provides a queryable database of usage events. for each item in the study, we made two queries: one for per-day counts of item page views, and another for per-day counts of image file downloads (called “bitstream” downloads in dspace parlance). in both cases, views that came from automated sources such as search engine indexing agents were excluded from our counts. views of the images in flickr were noted and used as a benchmark, but were not the focus of the study. unlike cumulative views, which are tabulated and saved indefinitely, flickr saves daily view numbers for only thirty days. as a result, daily view numbers for most of the study period were not available for analysis, and the discussion of trends in the flickr data is necessarily anecdotal.
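as a concrete illustration, the following solrj sketch shows what those two per-item queries against dspace’s statistics core might look like. it is an assumption-laden sketch rather than the authors’ code: the field names (type, id, owningitem, time, isbot), the numeric type values, the item id, and the url follow common dspace statistics conventions of that era, since the paper does not print its actual queries:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class UsageCounts {
        public static void main(String[] args) throws Exception {
            // dspace's statistics core; host and path are placeholders
            SolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8080/solr/statistics");

            // per-day item page views for one item, robots excluded
            // (type:2 marks item views in dspace's statistics schema)
            SolrQuery views = new SolrQuery("type:2 AND id:12345");
            views.addFilterQuery("-isBot:true");
            views.set("facet", true);
            views.set("facet.date", "time");
            views.set("facet.date.start", "2011-01-31T00:00:00Z");
            views.set("facet.date.end", "2011-08-30T00:00:00Z");
            views.set("facet.date.gap", "+1DAY");
            QueryResponse pageViews = solr.query(views);

            // per-day bitstream downloads for the same item (type:0)
            SolrQuery downloads = new SolrQuery("type:0 AND owningItem:12345");
            downloads.addFilterQuery("-isBot:true");
            downloads.set("facet", true);
            downloads.set("facet.date", "time");
            downloads.set("facet.date.start", "2011-01-31T00:00:00Z");
            downloads.set("facet.date.end", "2011-08-30T00:00:00Z");
            downloads.set("facet.date.gap", "+1DAY");
            QueryResponse fileDownloads = solr.query(downloads);
        }
    }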
results

at the end of the study period, the data showed very little usage of the collection in the repository. this lack of usage was relatively consistent through the three phases of the study and in rough terms translates to less than one view of each item per day. between the two ways of measuring an image “view” (counting views of the item page or counting downloads of the image file), there was little distinction: knowledge bank item pages received between 5 and 38 views per item, while files were downloaded between 4 and 34 times. further, there were no significant differences in the number of views received between the first group released to flickr and the second.

                                               kb item page views      image file downloads
                                               min   median   max      min   median   max
    group a (released to flickr in phase ii)     5     10      35        5      9      25
    group b (released to flickr in phase iii)    6     10      38        4      9      34

table 1. the items in the study are divided into group a and group b, depending on when the images were placed on flickr. both groups received similar traffic over the course of the study, with items having between 5 and 38 page views in both groups (median of 10 for both) and between 4 and 34 downloads (median of 9 for both groups).

the items attracted more visitors on flickr, with the images receiving between 100 and 600 views each. with a few exceptions, the items that appeared toward the beginning of the set (as viewed by a user who starts from the set home page) received more views than items toward its end. this suggests a particular usage pattern: start at the beginning, browse through a certain number of images, and navigate away. a more significant trend in the flickr data is that most views of the images came after publicity for the collection began (approximately midway through the third phase of the study). again, the lack of daily usage numbers on flickr makes it impossible to demonstrate the publicity “bump,” but it was dramatic. we witnessed a similar, if smaller, bump in usage of the items in the kb after publicity started. we were also able to identify 65 unique visitors to the kb who came to the site via a link on flickr, out of 449 unique visitors overall. of those who came to the kb from flickr, 31 continued on to other parts of the kb, and the rest left after viewing a single item or image.
discussion

with so little data, we cannot reliably answer the primary research question. reposting certainly does not seem to have lowered usage of the items in the kb, but the numbers of views in all phases were so small as to preclude drawing meaningful conclusions. a larger issue is the fact that much of the usage came immediately following our promotional efforts. this development complicated the research in a number of ways. first, because the promotional emails and social media messages specifically pointed users to the collection in flickr, it is impossible to know how the use may have differed if the primary link in the promotion had been to the knowledge bank. would the higher use seen on flickr simply have transferred to the kb? would the unfamiliarity and non-image-centric interface of the knowledge bank have thwarted casual users in their attempt to browse the collection?

the centrality of the promotion efforts also suggests that one of the underlying assumptions of the study may have been wrong. this research project was premised on the idea that an openly available collection on a library website will attract a certain number of visitors (the number dependent on the popularity and topicality of the subject of the collection) who find the content spontaneously via searching and browsing. placing that same content on a third-party site could theoretically divert a percentage of those users, who would then never visit the library’s site. the percentage of users diverted would likely depend on how many more users browse the third-party site than the library site, as well as the relative position of the two in search rankings. the mccaghy collection should have been a good candidate for this type of use pattern. flickr is certainly heavily used and browsed, and burlesque, while not currently making headlines, is a subject with fairly broad popular appeal. the fact that users did not spontaneously discover the collection on either platform in significant numbers suggests that this may not be how discovery of library digitized collections works. it is not surprising that email lists and social media should drive larger numbers of users to a collection than happenstance; the power of link curation by trusted friends via informal communication channels is well known. what is surprising is that it was the only significant use pattern in evidence.

the primary takeaway is that promotion is key. if we do not promote our collections to the people who are likely to be interested in them, then, barring a stroke of luck, it is unlikely that they will be found. anecdotally, promotional efforts are often an afterthought in digital collections work, a pleasant but unnecessary “extra.” in our environment, the repository staff often feel that promotion is the work of the collection owner, who may not think of promoting the collection in the digital environment, nor know how to do so. as a result, users who would benefit from the collections simply do not know they exist. these results also suggest that librarians worried about the consequences of sharing their collections on third-party sites may be worrying about the wrong thing. the sheer volume of information on any given topic makes it unlikely that any but the most dedicated researcher will explore all available sources.
most other users are likely to rely on trusted information sources (traditional media, blogs, social networking sites) to steer them toward the items that are most likely to interest them. instead of wondering whether users will still come to the library’s site if the content is available elsewhere, perhaps we should be asking of our digital collections, “is anyone using them on any site?” and if the answer is no, the owners and caretakers of those collections should explore ways to bring them to the attention of relevant audiences.

conclusion

as a usage study of a collection hosted on a library site and a commercial site, this project was not a success. flawed assumptions and a lack of usable data resulted in an inability to address the primary research question in a meaningful way. however, it does shed light on the questions that motivated it. are our digitized collections being used? what effect do current methods of sharing and promotion have on that use? librarians working with digitized collections have fallen behind our colleagues in the print and institutional repository arenas in measuring use of collections, but we have the same needs for usage data. in the current climate of heightened accountability in higher education and publicly funded institutions, we need to demonstrate the value of what we do. we need to know when our efforts to promote our collections are working, and we need to determine which projects have been most successful and merit continued development. and as always, we need to share our results, both formally and informally, with our colleagues.

measuring use of digital resources is challenging, and obtaining accurate usage statistics requires not only familiarity with the tools involved but also some understanding of the ways in which the numbers can be unrepresentative of actual use. the organizations that do collect usage statistics on their digitized collections should share their methods and their results with others to help foster an environment where such data are collected and used. next steps in this area could take the shape of further research projects or simply more visible work collecting usage statistics on digital collections. of greatest utility to the field would be data demonstrating the relative effectiveness of different methods of increasing use. do labor-intensive usability studies deliver returns in the form of increased use of the finished site? which forms of reposting generate the most views? what types of publicity are most effective in bringing users to collections? how does use of a collection change over time? there are also more policy-driven questions to be answered. for example, should further investment in a collection or site be tied to increasing use of low-traffic collections, or to capitalizing on success? differences in topic, format, and audience make it difficult to generalize in this area, but we can begin building a body of knowledge that helps us learn from each other’s successes and failures.

references

1. brinley franklin, martha kyrillidou, and terry plum. “from usage to user: library metrics and expectations for the evaluation of digital libraries.” in evaluation of digital libraries: an insight into useful applications and methods, ed. giannis tsakonas and christos papatheodorou, 17-39 (oxford: chandos publishing, 2009). http://www.libqual.org/publications (accessed february 29, 2012).
2. “libqual+,” accessed february 29, 2012, http://www.libqual.org/home.
3. “statsqual,” accessed february 29, 2012, http://www.digiqual.org/.
4. julie arendt and cassie wagner. “beyond description: converting web site usage statistics into concrete site improvement ideas.” journal of web librarianship 4, no. 1 (2010): 37-54.
5. irene m. h. herold. “digital archival image collections: who are the users?” behavioral & social sciences librarian 29, no. 4 (2010): 267-282.
6. dan hazen, jeffrey horrell, and jan merrill-oldham. selecting research collections for digitization (council on library and information resources, 1998). http://www.clir.org/pubs/reports/hazen/pub74.html (accessed february 29, 2012).
7. bart ooghe and dries moreels. “analysing selection for digitisation: current practices and common incentives.” d-lib magazine 15, no. 9 (2009): 28. http://www.dlib.org/dlib/september09/ooghe/09ooghe.html.
8. sandra d. payette and oya y. rieger. “supporting scholarly inquiry: incorporating users in the design of the digital library.” the journal of academic librarianship 24, no. 2 (1998): 121-129.
9. judy jeng. “what is usability in the context of the digital library and how can it be measured?” information technology & libraries 24, no. 2 (2005): 47-56.
10. “open archives initiative protocol for metadata harvesting,” accessed february 29, 2012, http://www.openarchives.org/pmh/.
11. “open archives initiative object reuse and exchange,” accessed february 29, 2012, http://www.openarchives.org/ore/.
12. eric miller and micheline westfall. “linked data and libraries.” serials librarian 60, no. 1-4 (2011): 17-22.
13. ann m. lally and carolyn e. dunford. “using wikipedia to extend digital collections,” d-lib magazine 13, no. 5/6 (2007), accessed february 29, 2012, doi:10.1045/may2007-lally.
14. “flickr: the commons,” accessed february 29, 2012, http://www.flickr.com/commons/.
15. martin kalfatovic, effie kapsalis, katherine spiess, anne camp, and michael edson. “smithsonian team flickr: a library, archives, and museums collaboration in web 2.0 space.” archival science 8, no. 4 (2008): 267-277.
16. josh hadro. “lc report positive on flickr pilot.” library journal 134, no. 1 (2009): 23.
17. jeremiah saunders. “flickr as a digital image collection host: a case study of the alcuin society,” collection management 33, no. 4 (2008): 302-309, doi:10.1080/01462670802360387.
18. victoria carolan and anna towlson. “a history in pictures: lse archives on flickr.” aliss quarterly 6 (2011): 16-18.
19. michael stephens. “flickr.” library technology reports 42, no. 4 (2006): 58-62.
20. jason paul michel and elias tzoc. “automated bulk uploading of images and metadata to flickr.” journal of web librarianship 4, no. 4 (2010): 435-448.
21. peggy garvin. “photostreams to the people.” searcher 17, no. 8 (2009): 45-49.
22. jason vaughan. “insights into the commons on flickr.” portal: libraries & the academy 10, no. 2 (2010): 185-214.
23. “pets-in-collections,” accessed february 29, 2012, http://petsincollections.tumblr.com/.
24. “dspace,” accessed february 29, 2012, http://www.dspace.org/.
25. “digital commons,” accessed february 29, 2012, http://digitalcommons.bepress.com/.
26. dorothea salo. “innkeeper at the roach motel.” library trends 57, no. 2 (2008): 98-123.
27. for an example of the type of debate that tends to surround projects like flickr commons, see http://www.foundhistory.org/2008/12/22/tragedy-at-the-commons/ (accessed february 29, 2012).
28. “the knowledge bank,” accessed february 29, 2012, http://kb.osu.edu.
29. “charles h. mccaghy collection of exotic dance from burlesque to clubs,” accessed february 29, 2012, http://hdl.handle.net/1811/47556.
30. “charles h. mccaghy collection of exotic dance from burlesque to clubs,” accessed february 29, 2012, http://flic.kr/s/ahsjua3bgi.
31. “flickr: the commons (usage),” accessed february 29, 2012, http://www.flickr.com/commons/usage/.
32. “the jerome lawrence and robert e. lee theatre research institute,” http://library.osu.edu/find/collections/theatre-research-institute/; “charles h. mccaghy collection of exotic dance from burlesque to clubs,” http://library.osu.edu/find/collections/theatre-research-institute/personal-papers-and-special-collections/charles-h-mccaghy-collection-of-exotic-dance-from-burlesque-to-clubs/; “loose women in tights digital exhibit,” http://library.osu.edu/find/collections/theatre-research-institute/digital-exhibits-projects/loose-women-in-tights-digital-exhibit/. accessed february 29, 2012.
33. for an example of the kind of coverage the collection received, see http://flavorwire.com/195225/fascinating-photos-of-19th-century-vaudeville-and-burlesque-performers (accessed february 29, 2012).

laneconnex: an integrated biomedical digital library interface

debra s. ketchell, ryan max steinberg, charles yates, and heidi a. heilemann

debra s. ketchell (debra.ketchell@gmail.com) is the former associate dean for knowledge management and library director; ryan max steinberg (ryan.max.steinberg@stanford.edu) is the knowledge integration programmer/architect; charles yates (charles.yates@stanford.edu) is the systems software developer; and heidi a. heilemann (heidi.heilemann@stanford.edu) is the former director for research & instruction and current associate dean for knowledge management and library director at the lane medical library & knowledge management center, information resources & technology, stanford university school of medicine, stanford, california.

information technology and libraries | march 2009

this paper describes one approach to creating a search application that unlocks heterogeneous content stores and incorporates the integrative functionality of web search engines. laneconnex is a search interface that identifies journals, books, databases, calculators, bioinformatics tools, help information, and search hits from more than three hundred full-text heterogeneous clinical and bioresearch sources. the user interface is a simple query box. results are ranked by relevance, with options for filtering by content type or expanding to the next most likely set.
the system is built using a component-oriented programming design. the underlying architecture is built on apache cocoon, java servlets, xml/xslt, sql, and javascript. the system has proven reliable in production, reduced user time spent finding information on the site, and maximized the institutional investment in licensed resources.

most biomedical libraries separate searching for resources held locally from external database searching, requiring clinicians and researchers to know which interface to use to find a specific type of information. google, amazon, and other web search engines have shaped user behavior and expectations.1 users expect a simple query box with results returned from a broad array of content, ranked or categorized appropriately, with direct links to content, whether it is an html page, a pdf document, a streaming video, or an image. biomedical libraries have transitioned to digital journals and reference sources, adopted openurl link resolvers, and created institutional repositories. however, students, clinicians, and researchers are hindered from maximizing this content because of proprietary and heterogeneous systems. a strategic challenge for biomedical libraries is to create a unified search for a broad spectrum of licensed, open-access, and institutional content.

background

studies show that students and researchers will use the search path of least cognitive resistance.2 ease and speed are the most important factors in using a particular search engine. a university of california report found that academic users want one search tool to cover a wide information universe, multiple formats, full-text availability to move seamlessly to the item itself, intelligent assistance and spelling correction, results sorted in order of relevance, help navigating large retrievals by logical subsetting and customization, and seamless access anytime, anywhere.3 studies of clinicians in the patient-care environment have documented that effort is the most important factor in whether a patient-care question is pursued.4 for researchers, finding and using the best bioinformatics tool is an elusive problem.5

in 2005, the lane medical library and knowledge management center (lane) at the stanford university medical center provided access to an expansive array of licensed, institutional, and open-access digital content in support of research, patient care, and education. like most of its peers, lane required users to work with scores of different interfaces to search external databases and find digital resources. we created a local metasearch application for clinical reference content, but it did not integrate result sets from disparate resources. a review of federated-search software in the marketplace found that products were either slow or limited retrieval when faced with a broad spectrum of biomedical content. we decided to build on our existing application architecture to create a fast and unified interface. a detailed analysis of lane website-usage logs was conducted before embarking on the creation of the new search application.
key points of user failure in the existing search options were spelling errors that could easily be corrected to avoid zero results; lack of sufficiently intuitive options to move forward from a zero-results search or change topics without backtracking; lack of use of existing genre or role searches; confusion about when to use the resource, openurl resolver, or pubmed search to find a known item; and results that were cognitively difficult to navigate. studies of web search engine and pubmed search logs concurred with our usage-log analysis: single-term searches are the most common, with typical users entering at most three words.6 a pubmed study found that 22 percent of user queries were for known items rather than for a general subject, confirming our own log-analysis finding that the majority of searches were for a particular source item.7 search-term analysis revealed that many of our users were entering partial article citations (e.g., author, date) in any query box, expecting that article databases would be searched concurrently with the resource database. our displayed results were sorted alphabetically, and each version of an item was displayed separately. for the user, this meant a cluttered list with redundant title information that increased the cognitive effort needed to find meaningful items. overall, users were confronted with too many choices upfront and too few options after retrieving results.

focus groups of faculty and students were conducted in 2005. attendees wanted local information integrated into the proposed single search. local information included content such as how-to information, expertise, seminars, grand rounds, core lab resources, the drug formulary, patient handouts, and clinical calculators. most of this content is restricted to the stanford user population. users consistently described their need for a simple search interface that was fast and customized to the stanford environment. in late 2005, we embarked on a project to design a search application that would address the existing points of failure in the current system and meet the expressed need for a comprehensive discovery-and-finding tool as described in the focus groups. the result is an application called laneconnex.

design objectives

the overall goal of laneconnex is to create a simple, fast search across multiple licensed, open-access, and special-object local knowledge sources that depackages and reaggregates information on the basis of stanford institutional roles. the content of lane’s digital collection includes forty-five hundred journal titles and forty-two thousand other digital resources, including video lectures, executable software, patient handouts, bioinformatics tools, and a significant store of digitized historical materials as a result of the google books program.
media types include html pages, pdf documents, jpeg images, mp3 audio files, mpeg4 videos, and executable applications. more than three hundred reference titles have been licensed specifically for clinicians at the point of care (e.g., uptodate, emedicine, stat-ref, and micromedex clinical evidence). clinicians wanted their results to reflect subcomponents of a package (e.g., results from the micromedex patient handouts). other clinical content is institutionally managed (e.g., the institutional formulary, lab test database, or patient handouts). more than 175 biomedical research tools have been licensed or selected from open-access content. the needs of biomedical researchers include molecular biology tools and software, biomedical literature databases, citation analysis, chemical and engineering databases, expertise-finding tools, laboratory tools and supplies, institutional-research resources, and upcoming seminars.

the specific objectives of the search application are the following:
• the user interface should be fast, simple, and intuitive, with embedded suggestions for improving search results (e.g., did you mean? didn’t find it? have you tried?).
• search results from disparate local and external systems should be integrated into a single display based on popular search-engine models familiar to the target population.
• the query-retrieval and results display should be separated and reusable to allow customization by role or domain and future expansion into other institutional tools.
• resource results should be ranked by relevance and filtered by genre.
• metasearch results should be hit counts and filtered by category for speed and breadth. results should be reusable for specific views by role.
• finding a known article or journal should be streamlined and directly link to the item or a “get item” option.
• the most popular search options (pubmed, google, and lane journals) should be ubiquitous.
• alternative pathways should be dynamic and interactive at the point of need to avoid backtracking and dead ends.
• user behavior should be tracked by search term, resource used, and user location to help the library make informed decisions about licensing, metadata, and missing content.
• off-the-shelf software should be used when available or appropriate, with development focused on search integration.
• the application should be built upon existing metadata-creation systems and trusted web-development technologies.

based on these objectives, we designed an application that is an extension of existing systems and technologies. resources are acquired and metadata are provided using the voyager integrated library system (ils). the sfx openurl link resolver provides full-text article access and expands the title search beyond biomedicine to all online journals at stanford. ezproxy provides seamless off-campus access. webtrends provides usage tracking. movable type is used to create faq and help information. a locally developed metasearch application provides a cross search with hit results from more than three hundred external and internal full-text sources. the technologies used to build laneconnex and integrate all of these systems include extensible stylesheet language transformations (xslt), java, javascript, the apache cocoon project, and oracle.

systems description

architecture

laneconnex is built on a principle of separation of concerns.
the lane content owner can directly change the inclusion of search results, how they are displayed, and additional path-finding information. application programmers use java, javascript, xslt, and structured query language (sql) to create components that generate and modify the search results. the merger of content design and search results occurs “just in time” in the user’s browser. we use a component-oriented programming design whereby services provided within the application are defined by simple contracts. in laneconnex, these components (called “transformers”) consume xml information and, after transforming it in some way, pass it on to some other component. a particular contract can be fulfilled in different ways for different purposes. this component architecture allows for easy extension of the underlying apache cocoon application. if laneconnex needs some xml transformation that is not possible with the built-in cocoon transformers, it is a simple matter to create a software component that does what is needed and fulfills the transformer contract.

apache cocoon is the underlying architecture for laneconnex, as illustrated in figure 1. this java servlet is an xml-publishing engine that is built upon a component framework and uses a pipeline-processing model. a declarative language uses pattern matching to associate sets of processing components with particular request urls. content can come from a variety of sources: we use content from the local file system, a network file system, http, and a relational database. the xslt language is used extensively in the pipelines and gives fine control over individual parts of the documents being processed. the end of processing is usually an xhtml document but can be any common mime type. we use cocoon to separate areas of concern so that things like content, look and feel, and processing can all be managed as separate entities by different groups of people, with changes in one area having little effect on another. this separation of concerns is manifested by template documents that contain most of the html content common to all pages and are then combined with content documents within a processing pipeline. the declarative nature of the sitemap language and xslt facilitates rapid development, with no need to redeploy the entire application to make changes in its behavior.

figure 1. laneconnex architecture.

the laneconnex search is composed of several components integrated into a query-and-results interface: oracle resource metadata, the full-text metasearch application, movable type blogging software, a “did you mean?” spell checker, ezproxy remote access, and webtrends tracking.

full-text metasearch

integration of results from lane’s metasearch application illustrates cocoon’s many strengths. when a user searches laneconnex, cocoon sends his or her query to the metasearch application, which then dispatches the request to multiple external full-text search engines and content stores. some examples of these external resources are uptodate, access medicine, micromedex, pubmed, and md consult. the metasearch application interacts with these external resources through jakarta commons http clients. responses from external resources are turned into w3c document object model (dom) objects, and xpath expressions are used to resolve hit counts from the dom objects. as result counts are returned, they are added to an xml-based result list and returned to cocoon. the power of cocoon becomes evident as the xml-based metasearch result list is combined with a separate display template.
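the hit-count step is easy to picture in code. the sketch below is illustrative rather than lane’s implementation (the class name, parameters, and the sample sru xpath are invented), but the recipe it follows, a jakarta commons httpclient request, a w3c dom parse, and a per-engine xpath expression, is the one the passage above describes:

    import java.io.InputStream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.w3c.dom.Document;

    public class HitCountFetcher {
        // query one external search engine and pull its hit count out of
        // the xml response with an engine-specific xpath expression,
        // e.g. "/searchRetrieveResponse/numberOfRecords" for an sru server
        public int fetchHitCount(String searchUrl, String hitCountXPath)
                throws Exception {
            HttpClient client = new HttpClient();
            GetMethod get = new GetMethod(searchUrl);
            try {
                client.executeMethod(get);
                InputStream body = get.getResponseBodyAsStream();
                // parse the response into a w3c dom document
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(body);
                XPath xpath = XPathFactory.newInstance().newXPath();
                String count = (String) xpath.evaluate(
                        hitCountXPath, doc, XPathConstants.STRING);
                return count.trim().length() == 0 ? 0 : Integer.parseInt(count.trim());
            } finally {
                get.releaseConnection();
            }
        }
    }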
the template-based approach just described affords content curators the ability to directly add, group, and describe metasearch resources using the language and look that is most meaningful to their specific user communities. for example, there are currently eight metasearch templates curated by an informationist in partnership with a target community. curating these templates requires little to no assistance from programmers.

in lane’s 2005 interface, a user’s request was sent to the metasearch application, and the application waited five seconds before responding to give external resources a chance to return a result. hit counts in the user interface included a link to refresh and retrieve more results from external resources that had not yet responded. usability studies showed this to be a significant user barrier, since the refresh link was rarely clicked. the initial five-second delay also gave users the impression that the site was slow. the laneconnex application makes heavy use of javascript to solve this problem. after a user makes her initial request, javascript is used to poll the metasearch application (through cocoon) on the user’s behalf, popping in result counts as external resources respond. this adds a level of interactivity previously unavailable and makes the metasearch piece of laneconnex much more successful than its previous version.

resource metadata

laneconnex replaces the catalog as the primary discovery interface. metadata describing locally owned and licensed resources (journals, databases, books, videos, images, calculators, and software applications) are stored in the library’s current system of record, an instance of the voyager ils. laneconnex makes no attempt to replace voyager’s strengths as an application for the selection, acquisition, description, and management of access to library resources. it does, however, replace voyager’s discovery interface. to this end, metadata for about eight thousand digital resources is extracted from voyager’s oracle database, converted into marcxml, processed with xslt, and stored in a simple relational database (six tables and twenty-nine attributes) to support fast retrieval speed and tight control over search syntax. this extraction process occurs nightly, with incremental updates every five minutes. the oracle text search engine provides functionality anticipated by our internet-minded users. key features are speed and relevance-ranked results. a highly refined results ranking ensures that the logical title appears in the first few results. a user’s query is parsed for wildcard, boolean, proximity, and phrase operators, and then translated into an sql query. results are then transformed into a display version.
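for readers unfamiliar with oracle text, the contains/score idiom behind such a translated query looks roughly like the jdbc sketch below; the table and column names are invented for illustration, and the actual ranking described above is considerably more refined:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ResourceSearch {
        // relevance-ranked lookup with oracle text: contains() matches the
        // translated query against a text index and score() ranks the hits
        public void search(Connection conn, String oracleTextQuery) throws Exception {
            String sql =
                "select title, score(1) as relevance " +
                "from resource " +
                "where contains(keywords, ?, 1) > 0 " +
                "order by score(1) desc";
            PreparedStatement ps = conn.prepareStatement(sql);
            ps.setString(1, oracleTextQuery); // already parsed and translated upstream
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString("title") + "  " + rs.getFloat("relevance"));
            }
            rs.close();
            ps.close();
        }
    }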
related services
laneconnex compares a user's query terms against a dictionary. each query is sent to a cocoon spell-checking component that returns suggestions where appropriate. this component currently uses the simple object access protocol (soap)-based spelling service from google. google was chosen over the national center for biotechnology information (ncbi) spelling service because of the breadth of terms entered by users; however, cocoon's component-oriented architecture would make it trivial to change spell checkers in the future. each query is also compared against stanford's openurl link resolver (findit@stanford). client-side javascript makes a cocoon-mediated query of findit@stanford. using xslt, findit@stanford responses are turned into javascript object notation (json) objects and popped into the interface as appropriate. although the vast majority of laneconnex searches result in zero findit@stanford results, the convenience of searching all of lane's systems in a single, unified interface far outweighs the effort of implementation.

a commercial analytics tool called webtrends is used to collect web statistics for making data-centric decisions about interface changes. webtrends uses client-side javascript to track specific user click events. libraries need to track both on-site clicks (e.g., the user clicked on "clinical portal" from the home page) and off-site clicks (e.g., the user clicked on "yamada's gastroenterology" after doing a search for "ibs"). to facilitate off-site click capture, webtrends requires every external link to include a snippet of javascript. requiring content creators to input this code by hand would be error prone and tedious, so laneconnex automatically supplies this code for every class of link (search or static). this specialized webtrends method provides lane with data to inform both interface design and licensing decisions. (a sketch of this automatic link rewriting appears after the vignettes below.)

results
laneconnex version 1.0 was released to the stanford biomedical community in july 2006. the current application can be experienced at http://lane.stanford.edu. the production version has proven reliable over two years. incremental user focus groups have been employed to improve the interface as issues arose. a series of vignettes will illustrate how the current version of laneconnex meets the design objectives from the user's perspective.

figure 2. laneconnex resource search results. resource results are ranked by relevance. single-word titles are given a higher weight in the ranking algorithm to ensure they are displayed in the first five results. uniform titles are used to co-locate versions (e.g., the three instances of science from different producers). journal titles are linked to their respective impact factor page in the isi web of knowledge. digital formats that require special players or restrictions are indicated. the metadata searched for ejournals, databases, ebooks, biotools, video, and medcalcs are lane's digital resources extracted from the integrated library system into a searchable oracle database. the first "all" tab is the combined results of these genres and the lane site help and information.

figure 3. laneconnex related services search enhancements. laneconnex includes a spell checker to avoid a common failure in user searches. ajax services allow the inclusion of search results from other sources for common zero-results failures. for example, the stanford link resolver database is simultaneously searched to ensure online journals outside the scope of biomedicine are presented as a linked result for the user.

user query: "new yokrer." a faculty member is looking for an article in the new yorker for a class reading assignment. he makes a typing error, which invokes the "did you mean?" function (see figure 3). he clicks on the correct spelling. no results are found in the resource search, but a simultaneous search of the link-resolver database finds an instance of this title licensed for the campus and displays a clickable link for the user.

user query: "pathway analysis." a postdoc is looking for information on how to share an ingenuity pathway. figure 4 illustrates the integration of the locally created lane faqs. faqs comprise a broad spectrum of help and how-to information as described by our focus groups. help text is created in the movable type blog software and made searchable through the laneconnex application. the movable type interface lowers the barrier to html content creation by any staff member. more complex answers include embedded images and videos to enable the user to see exactly how to do a particular procedure. cocoon allows for the syndication of subsets of this faq content back into static html pages, where it can be displayed as both category-specific lists and as the text for scroll-over help for a link. having a single store of help information ensures the content is updated once for all instances.

figure 4. example of integration of local content stores. help information is managed in movable type and integrated into laneconnex search results.

user query: "uterine cancer kapp." a resident is looking for a known article. laneconnex simultaneously searches pubmed to increase the likelihood of user success (see figure 5). clicking on the pubmed tab retrieves the results in the native interface; however, the user sees the pubmed@stanford version, which includes embedded links to the article based on our openurl link resolver. the ability to retrieve results from bibliographic databases that include article resolution ensures that our biomedical community is always using the correct url for maximum full-text article access. user testing in 2007 found that adding the three most frequently used sources (pubmed, google, and lane catalog) into our one-box laneconnex search was a significant time saver. it addresses the expectation on the part of our users that they could search for an article or a journal title in a single search box without first selecting a database.

user query: "science." a graduate student is looking for the journal science. the laneconnex results are listed in relevance order (see figure 2). single-word titles are given a higher weight in the ranking algorithm to ensure they are displayed in the first five results. results from local metadata are displayed by uniform title. for example, lane has three instances of the journal science, and each version is linked to the appropriate external store. brief notes provide critical information for particular resources. for example, restricted local patient-education documents and video seminars note that the "sunetid login" is required.

user query: "serotonin pulmonary hypertension." a medical student is looking for the correlation of two topics. clicking on the "clinical" tab, the student sees the results of the clinical metasearch in figure 6. metasearch results are deep searches of sources within licensed packages (e.g., textbooks in md consult or a specific database in micromedex), local content (e.g., stanford's lab-test database), and open-access content (e.g., ncbi databases). pubmed results are tailored strategies tiered by evidence. for example, the evidence-summaries strategy retrieves results from twelve clinical-evidence resources (e.g., bmj clinical evidence and cochrane systematic reviews) that link to the full text licensed by stanford. an example of the bioresearch metasearch is shown in figure 7. content selected for this audience includes literature databases, funding sources, patents, structures, clinical trials, protocols, and stanford expertise integrated with gene, protein, and phenotype tools.
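as promised above, here is a sketch of the automatic off-site link rewriting used for click tracking. it is a simplified java illustration of what laneconnex presumably does in a pipeline step (most likely xslt); dcsMultiTrack is a webtrends client-side function, but the exact snippet lane injected is not published, so treat the onclick payload as a placeholder.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OffsiteClickDecorator {
    // walk every anchor in a page document and tag off-site links with a
    // tracking handler, so content creators never add the snippet by hand
    public static void decorate(Document page, String ownHost) {
        NodeList anchors = page.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            String href = a.getAttribute("href");
            if (href.startsWith("http") && !href.contains(ownHost)) {
                // placeholder payload; the real webtrends call is configurable
                a.setAttribute("onclick", "dcsMultiTrack('DCS.dcsuri','" + href + "');");
            }
        }
    }
}
```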
user testing revealed that many users did not click on the "clinical" tab. the clinical metasearch was originally developed for the clinical portal page and focused on clinicians in practice; however, the results needed to be exposed more directly as part of the laneconnex search. figure 8 illustrates the "have you tried?" feature, which displays a few relevant clinical-content sources without requiring the user to select the "clinical" tab. this feature is managed by the smartsearch component of the laneconnex system. smartsearch sends the user's query terms to pubmed, extracts a subset of articles associated with those terms, extracts the mesh headings for those articles, and computes the frequency of headings in the articles to determine the most likely mesh terms associated with the user's query terms. these mesh terms are mapped to the mesh terms associated with each metasearch resource. preliminary evaluation indicates that the clinical content is now being discovered by more users. (a sketch of this frequency computation appears at the end of this section.)

figure 5. example of integration of popular search engines into laneconnex results. three of the most popular searches based on usage analysis are included at the top level. pubmed and google are mapped to lane's link resolver to retrieve the full article.

creating or editing metasearch templates is a curator-driven task. programming is only required to add new sources to the metasearch engine. a curator may choose from more than three hundred sources to create a discipline-based layout using general templates. names, categories, and other descriptive information are all at the curator's discretion. while developing new subspecialty templates, we discovered that clinicians were confused by the difference in layout between their specialty portal and their metasearch results (e.g., the cardiology portal used the generic clinical metasearch). to address this issue, we devised an approach that merges a portal and metasearch into a single entity, as illustrated in figure 9. a combination of the component-oriented architecture of laneconnex and javascript makes it easy to integrate metasearch results into a new template patterned after a portal. this strategy will enable the creation of templates contextually appropriate to knowledge requests originating from electronic medical-record systems in the future.

direct user feedback and usage statistics confirm that search is now the dominant mode of navigation. the amount of time each user spends on the website has dropped since the release of version 1.0. we speculate that the integrated search helps our users find relevant information more efficiently. focus groups with students are uniformly positive. graduate students like the ability to find digital articles using a single search box. medical students like the clinical metasearch as an easy way to look up new topics in texts and customized pubmed searches. bioengineering students like the ability to easily look up patient care–related topics. pediatrics residents and attendings have championed the development of their portal and metasearch focused on their patient population. medical educators have commented on their ability to focus on the best information sources.
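as noted above, the smartsearch term mapping reduces to a frequency count over mesh headings. a minimal java sketch, assuming the pubmed fetch and mesh extraction have already happened (the final mapping of winning headings onto metasearch resources is omitted):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SmartSearchTerms {
    // meshByArticle: for each retrieved pubmed article, its mesh headings
    public static List<String> likelyMeshTerms(List<List<String>> meshByArticle, int topN) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> headings : meshByArticle) {
            for (String h : headings) {
                freq.merge(h, 1, Integer::sum); // count articles carrying each heading
            }
        }
        // rank headings by descending frequency and keep the top n
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(freq.entrySet());
        ranked.sort((a, b) -> b.getValue() - a.getValue());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, ranked.size()); i++) {
            top.add(ranked.get(i).getKey());
        }
        return top;
    }
}
```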
discussion
a review of websites in 2007 found that most biomedical libraries had separate search interfaces for their digital resources, library catalog, and external databases. biomedical libraries are implementing metasearch software to cross-search proprietary databases. the university of california, davis, is using the metalib software to federate searching across multiple bibliographic databases.8 the university of southern california and florida state university are using webfeat software to search clinical textbooks.9 the health sciences library system at the university of pittsburgh is using vivisimo to search clinical textbooks and bioresearch tools.10 academic libraries are introducing new "resource shopping" applications, such as the endeca project at north carolina state university, the summa project at the university of aarhus, and the vufind project at villanova university.11 these systems offer a single query box, faceted results, spell checking, recommendations based on user input, and asynchronous javascript and xml (ajax) for live status information. we believe our approach is a practical integration for our biomedical community that bridges finding a resource and finding a specific item through a metasearch of multiple databases. the laneconnex application searches across digital resources and external data stores simultaneously and presents results in a unified display. the limitation of our approach is that the metasearch returns only hit counts rather than previews of the specific content. standardization of results from external systems, particularly receipt of xml results, remains a challenge. federated search engines do integrate at this level but are usually slow or limit the number of results. true integration awaits the health level seven (hl7) clinical decision support standards and the national information standards organization (niso) metasearch initiative for query and retrieval of specific content.12

one of the primary objectives of laneconnex is speed and ease of use. ranking and categorization of results have been very successful in the eyes of the user community. the integration of metasearch results has been particularly successful with our pediatric specialty portal and search. however, general user understanding of how the clinical and biomedical tabs relate to the genre tabs in laneconnex has been problematic. we reviewed web engines and found a similar challenge in presenting disparate format results (e.g., video or image search results) or lists of hits from different systems (e.g., ncbi's entrez search results).13 we are continuing to develop our new specialty portal-and-search model and our smartsearch term-mapping component to further integrate results.

figure 6. integration of metasearch results into laneconnex. results from two general, role-based metasearches (bioresearch and clinical) are included in the laneconnex interface. the first image shows a clinician searching laneconnex for serotonin pulmonary hypertension. selecting the clinical tab presents the clinical content metasearch display (second image), and the user is placed deep inside the source by selecting a title (third image).
figure 7. example of a bioresearch metasearch.
figure 8. the smartsearch component embeds a set of the metasearch results into the laneconnex interface as "have you tried?" clickable links. these links are the equivalent of selecting the title from a clinical metasearch result. the example search for atypical malignant rhabdoid tumor (a rare childhood cancer) invokes oncology and pediatric textbook results. these texts and pubmed provide quick access for a medical student or resident on the pediatric ward.
figure 9. example of a clinical specialty portal with integrated metasearch. clinical portal pages are organized so metasearch hit counts can display next to content links if a user executes a search. this approach removes the dissonance clinicians felt existed between the separate portal page and metasearch results in version 1.0.

conclusion
laneconnex is an effective and open-ended search infrastructure for integrating the local resource metadata and full-text content used by clinicians and biomedical researchers. its effectiveness comes from the recognition that users prefer a single query box with relevance-ranked or categorically organized results that lead them to the most likely answer to a question or to prospects in their exploration. the application is based on separation of concerns and is easily extensible. new resources are constantly emerging, and it is important that libraries take full advantage of existing and forthcoming content that is tailored to their user population, regardless of the source. the next major step in the ongoing development of laneconnex is becoming an invisible back-end application that brings content directly into the user's workflow.

acknowledgements
the authors would like to acknowledge the contributions of the entire laneconnex technical team, in particular pam murnane, olya gary, dick miller, rick zwies, and rikke ogawa for their design contributions, philip constantinou for his architecture contribution, and alain boussard for his systems development contributions.

references
1. denise t. covey, "the need to improve remote access to online library resources: filling the gap between commercial vendor and academic user practice," portal: libraries and the academy 3, no. 4 (2003): 577–99; norbert lossau, "search engine technology and digital libraries," d-lib magazine 10, no. 6 (2004), www.dlib.org/dlib/june04/lossau/06lossau.html (accessed mar. 1, 2008); oclc, "college students' perceptions of libraries and information resources," www.oclc.org/reports/perceptionscollege.htm (accessed mar. 1, 2008); and jim henderson, "google scholar: a source for clinicians," canadian medical association journal 172, no. 12 (2005).
2. covey, "the need to improve remote access to online library resources"; lossau, "search engine technology and digital libraries"; oclc, "college students' perceptions of libraries and information resources."
3. jane lee, "uc health sciences metasearch exploration. part 1: graduate student focus group findings," uc health sciences metasearch team, www.cdlib.org/inside/assess/evaluation_activities/docs/2006/draft_gradreport_march2006.pdf (accessed mar. 1, 2008).
4. karen k. grandage, david c. slawson, and allen f. shaughnessy, "when less is more: a practical approach to searching for evidence-based answers," journal of the medical library association 90, no. 3 (2002): 298–304.
5. nicola cannata, emanuela merelli, and russ b. altman, "time to organize the bioinformatics resourceome," plos computational biology 1, no. 7 (2005): e76.
6. craig silverstein et al., "analysis of a very large web search engine query log," www.cs.ucsb.edu/~almeroth/classes/tech-soc/2005-winter/papers/analysis.pdf (accessed mar. 1, 2008); anne aula, "query formulation in web information search," www.cs.uta.fi/~aula/questionnaire.pdf (accessed mar. 1, 2008); and jorge r. herskovic, len y. tanaka, william hersh, and elmer v. bernstam, "a day in the life of pubmed: analysis of a typical day's query log," journal of the american medical informatics association 14, no. 2 (2007): 212–20.
7. herskovic et al., "a day in the life of pubmed."
8. davis libraries, university of california, "quicksearch," http://mysearchspace.lib.ucdavis.edu/ (accessed mar. 1, 2008).
9. eileen eandi, "health sciences multi-ebook search," norris medical library newsletter (spring 2006), norris medical library, university of southern california, www.usc.edu/hsc/nml/lib-information/newsletters.html (accessed mar. 1, 2008); and maguire medical library, florida state university, "webfeat clinical book search," http://med.fsu.edu/library/tutorials/webfeat2_viewlet_swf.html (accessed mar. 1, 2008).
10. jill e. foust, philip bergen, gretchen l. maxeiner, and peter n. pawlowski, "improving e-book access via a library-developed full-text search tool," journal of the medical library association 95, no. 1 (2007): 40–45.
11. north carolina state university libraries, "endeca at the ncsu libraries," www.lib.ncsu.edu/endeca (accessed mar. 1, 2008); hans lund, hans lauridsen, and jens hofman hansen, "summa—integrated search," www.statsbiblioteket.dk/publ/summaenglish.pdf (accessed mar. 1, 2008); and falvey memorial library, villanova university, "vufind," www.vufind.org (accessed mar. 1, 2008).
12. see the health level seven (hl7) clinical decision support working committee activities, in particular the infobutton standard proposal at www.hl7.org/special/committees/dss/index.cfm, and the niso metasearch initiative documentation at www.niso.org/workrooms/mi (accessed mar. 1, 2008).
13. national center for biotechnology information (ncbi) entrez cross-database search, www.ncbi.nlm.nih.gov/entrez (accessed mar. 1, 2008).

editorial board thoughts: just like being there, or how i learned to stop coveting bare metal and learned to love my vm
mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect in the sheridan libraries, johns hopkins university, baltimore, maryland.
information technology and libraries | june 2011

because this is a family program, and because we are all polite people, i can't really use the term i want to here. let's just say that i am an operating system [insert term here for someone who is highly promiscuous]. i simply love to install and play around with various operating systems, primarily free operating systems (oses), primarily linux distributions. and the more exotic, the better, even though i always dutifully return home at the end of the evening to my beautiful and beloved ubuntu. in the past year or two i can recall installing (and in some cases actually using) the following: gentoo, mint, fedora, debian, moonos, knoppix, damn small linux, easypeasy, ubuntu netbook remix, xubuntu, opensuse, netbsd, sabayon, simplymepis, centos, geexbox, and reactos. (aside from stock ubuntu and all things canonical, the one i keep a constant eye on is moonos [http://www.moonos.org/], a stunningly beautiful and eminently usable ubuntu-based remix by a young artist and programmer in cambodia, chanrithy thim.) in the old days i would have rustled up an old, sloughed-off pc to use as an experimental "server" upon which i would unleash each of these oses, one at a time. but those were the old days, and these are the new days. my boss kindly bought me a big honkin' windows-based workstation about a year and a half ago, a box with plenty of processing power and memory (can you even buy a new workstation these days that's not incredibly powerful, and incredibly inexpensive?), so my need for hardware above and beyond what i use in my daily life is mitigated. specifically, it's mitigated through use of virtual machines. i have long used virtualbox (http://www.virtualbox.org/) to create virtual machines (vms), lopped-off hunks of ram and disk space to be used for the installation of a completely different os.
with virtualbox, you first describe the specifications of the vm you'd like to create—how much of the host's ram to provide, how large a virtual hard disk, boot order, access to host cd drives, usb devices, etc. you click a button to create it, then you install an os onto it, the "guest" os, in the usual way. (well, not exactly the usual way; it's actually easier to install an os here because you can boot directly from a cd image, or iso file, negating the need to mess with anything so distasteful and old-fashioned and outré as an actual, physical cd-rom.) in my experience, you can create a new vm in mere seconds; then it's all a matter of how difficult the os is to install, and the linux distributions are becoming easier and easier to install as the months plow on. at any rate, as far as your new os is concerned, it is being installed on bare metal. virtual? real? for most intents and purposes the guest os knows no difference. in the titillatingly dangerous and virus-ridden cyberworld in which we live, i'll not mention the prophylactic uses of vms because, again, this is a family program and we're all polite people. suffice it to say, the typical network connection of a vm is nat'd behind the nic of the host machine, so at least as far as active network-based attacks are concerned, your guest vm is at least as secure as its host, even more so because it sits in its own private network space. avoiding software-based viruses and trojans inside your vm? let's just say that the wisdom passed down the cybergenerations still holds: when it rains, you wear a raincoat—if you see what i'm saying. aside from enabling, even promoting, my shameless os promiscuity, how are vms useful in an actual work setting? for one, as a longtime windows guy, if i need to install and test something that is *nix-only, i don't need a separate box with which to do so. (and vice versa, too, for all you unix-weaned ladies and gentlemen who find the need to test something on a rocker from redmond.) if there is a software dependency on a particular os, or a particular version of a particular os, or even if the configuration of what i'm trying to test is so peculiar that i just don't want to attempt to mix it in with an existing, stable vm, i can easily and painlessly whip up a new instance of the required os and let it fly. and deleting all this when i'm done is easily accomplished within the virtualbox gui. using a virtual machine facilitates the easy exploration of new operating systems and new applications; moving toward virtual machines felt much like when i first started using a digital camera: you are free to click click click with no further expense accrued. you don't like what you've done? blow it away and begin anew. all this vm business has spread, at my home institution, from workstation to data center. i now run both a development and a test server on vms physically sitting on a massive production server in our data center—the kind of machine that when switched on causes a brown-out in the tri-state area. this is a very efficient way to do things, though, because when i needed access to my own server, our system administrator merely whipped up a vm for me to use. to me, real or virtual, it was all the same; to the system administrator, it greatly simplified operations.
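the point-and-click vm creation described above can also be scripted. a rough sketch, driving virtualbox's vboxmanage cli from java: the vm name, memory size, disk size, and iso path are arbitrary examples, and the subcommands shown (createvm, modifyvm, createhd, storagectl, storageattach) are those documented for virtualbox releases of this column's era.

```java
import java.io.IOException;

public class VmBuilder {
    // run one vboxmanage command and fail loudly if it fails
    static void vbox(String... args) throws IOException, InterruptedException {
        String[] cmd = new String[args.length + 1];
        cmd[0] = "VBoxManage";
        System.arraycopy(args, 0, cmd, 1, args.length);
        if (new ProcessBuilder(cmd).inheritIO().start().waitFor() != 0)
            throw new IOException("VBoxManage failed: " + String.join(" ", args));
    }

    public static void main(String[] args) throws Exception {
        String vm = "scratch-linux"; // arbitrary example name
        vbox("createvm", "--name", vm, "--ostype", "Ubuntu", "--register");
        vbox("modifyvm", vm, "--memory", "1024", "--boot1", "dvd"); // 1 gb of host ram, boot the iso first
        vbox("createhd", "--filename", vm + ".vdi", "--size", "10240"); // 10 gb virtual disk
        vbox("storagectl", vm, "--name", "ide", "--add", "ide");
        vbox("storageattach", vm, "--storagectl", "ide", "--port", "0",
             "--device", "0", "--type", "hdd", "--medium", vm + ".vdi");
        vbox("storageattach", vm, "--storagectl", "ide", "--port", "1",
             "--device", "0", "--type", "dvddrive", "--medium", "ubuntu.iso"); // hypothetical iso path
    }
}
```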
and i may joke about the loud clank of the host server's power switch and subsequent dimming of the lights, but doing things this way has been shown to be more energy efficient than running a server farm in which each server sucks in enough juice to quench the thirst of its redundant power supplies. (they're redundant, they repeat themselves; they're redundant, they repeat themselves—so you don't want too many of them around slurping up the wattage, slurping up the wattage . . . )

virtual machines: zero-cost playgrounds for the promiscuous, and energy-efficient, staff-saving tools for system operations. what's not to like? throw dual monitors into the mix (one for the host os, one for the guest), and it's just like being there.

president's message: data discovery
colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011–12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york.
information technology and libraries | december 2011

last week i attended the second annual vivo conference in washington, d.c. vivo (vivoweb.org) is a semantic web application that enables the discovery of research and scholarship across disciplines in an institution, with the potential to also link scholars and research across institutions. despite an earthquake and a hurricane, the conference itself was the real showstopper—excellent, informative programming, engaging speakers, great networking and exchange of ideas. my institution is one of the core vivo members, so it was an opportunity to showcase our work, see what others are doing, and learn more about trends in research, e-science, and data discovery and collaboration initiatives. much of what i learned or rediscovered at vivo will make it into my fifty-minute presentation on the subject at the lita national forum in st. louis later this month. in fact, the vivo conference itself reminded me of our own national forum in size, scope, and content. it was a good mix of in-depth technical discussions coupled with broad coverage of issues and trends in scientific research. this attention to content balance is something that lita consistently gets right at our annual forum—there is literally something for everyone, from introductory concepts to technical details—and i look forward to seeing many familiar faces and meeting some new folks at this year's lita national forum in st. louis, "rivers of data: currents of change."

i would also like to take this opportunity to personally invite each and every ital reader to the 2012 lita national forum. building on this year's theme, the 2012 lita national forum will be "the new world of data: discover. connect. remix." i just signed off on the theme this week, and i am excited and impressed by the work completed by the national forum planning committee so far. please look for the call for papers and posters to come out in late december. i love the forum because it is much more intimate than the much larger ala meetings; i always come away with new ideas and new friends. i am not alone in this feeling. a recent forum attendee commented, "(the lita forum) was one of the best conferences i have attended. i met a far greater concentration of peers—colleagues at other libraries doing similar work—at lita forum than i have met at other similar conferences." i don't think i could say it better myself. the 2012 forum theme is one of great personal interest to me, and i plan to extend the theme to the lita president's program on june 24, 2012, in anaheim.

in fact, hardly a day goes by in my professional life (and it sometimes creeps into my personal life too!) when i don't think about the issues of connecting people with data, and then how to present that data in ways that are relevant to their needs. the tides are shifting in health sciences libraries and likely in your library too. ongoing changes in publishing and the changing nature of research have challenged the traditional nature of the library. it is no longer solely a repository for information, physical or virtual. as librarians move from collecting and cataloging bibliographic information, new roles have emerged in data discovery, in its preservation, and in helping to make data more accessible. important specialties include knowledge management, data visualization, e-science, and copyright. librarians have valuable skill sets in mining and accessing data, human–computer interaction, computer interface design, and knowledge management that can be leveraged now. it is inevitable that data discovery will quicken the pace of science and lead to collaboration, and collaboration will in turn lead to data discovery and accelerate the pace of science, and so on. in short, twentieth-century data stored in individual scientists' notebooks or computers is largely inaccessible. twenty-first-century data needs to be available 24/7 in a curated state for continuous analysis. information overload and the data deluge created by the intersection of science and technology are two very real problems that librarians have the skills and ability to deal with. and, as i talk of science, bear in mind that it extends beyond the biological and physical sciences to encompass the social sciences as well. interdisciplinary studies in particular have intensive data needs. in fields such as public health and urban planning, government data alongside research data is used to predict trends, forecast, make decisions, etc. government data is a particularly important part of the equation. consider the recent nsf requirement for researchers to provide open access to their data for any nsf-sponsored grants. it is likely other government agencies will follow suit.

one of taiga's provocative statements of 2011 is "#10. the oversupply of mlss," which states that "within five years, library programs will have overproduced mlss at a rate greater even than humanities phds and glutted a permanently diminished market."1 as the alarming scenario of an overabundance of new mlss in proportion to available library jobs presents itself, i encourage librarians to begin to envision themselves as digital information brokers or data scientists. the us department of labor, in the 2010–11 occupational outlook handbook, anticipates that librarian jobs in nontraditional settings will grow the fastest over this decade. nontraditional libraries and jobs include working as information brokers for private corporations, nonprofit organizations, and consulting firms. "many companies are turning to librarians because of their research and organizational skills and their knowledge of computer databases and library automation systems. librarians can review vast amounts of information and analyze, evaluate, and organize it according to a company's specific needs."2 we have been seeing new job titles emerging to reflect these needs, such as data curation librarian, digital data outreach librarian, gis librarian, etc.

what is your library doing with data? how can you and your library address the data needs of the twenty-first century? what technology is needed to address data needs? how can lita help you meet those needs? consider this column a call to arms for librarians of all backgrounds. the time to address data discovery is now!

references
1. "taiga 2011 provocative statements," http://taigaforumprovocativestatements.blogspot.com/ (accessed sept. 22, 2011).
2. united states department of labor, bureau of labor statistics, occupational outlook handbook, 2010–11 edition, http://www.bls.gov/oco/ocos068.htm (accessed sept. 22, 2011).

employing virtualization in library computing: use cases and lessons learned
arwen hutt, michael stuart, daniel suchy, and bradley d.
westbrook
information technology and libraries | september 2009

this paper provides a broad overview of virtualization technology and describes several examples of its use at the university of california, san diego libraries. libraries can leverage virtualization to address many long-standing library computing challenges, but careful planning is needed to determine if this technology is the right solution for a specific need. this paper outlines both technical and usability considerations, and concludes with a discussion of potential enterprise impacts on the library infrastructure.

arwen hutt (ahutt@ucsd.edu) is metadata specialist, michael stuart (mstuart@ucsd.edu) is information technology analyst, daniel suchy (dsuchy@ucsd.edu) is public services technology analyst, and bradley d. westbrook (bradw@library.ucsd.edu) is metadata librarian and digital archivist, university of california, san diego libraries.

operating system virtualization, herein referred to simply as "virtualization," is a powerful and highly adaptable solution to several library technology challenges, such as managing computer labs, automating cataloging and other procedures, and demonstrating new library services. virtualization has been used in one manner or another for decades,1 but it is only within the last few years that this technology has made significant inroads into library environments. virtualization technology is not without its drawbacks, however. libraries need to assess their needs, as well as the resources required for virtualization, before embarking on large-scale implementations. this paper provides a broad overview of virtualization technology and explains its benefits and drawbacks by describing some of the ways virtualization has been used at the university of california, san diego (ucsd) libraries.2

virtualization overview
virtualization is used to partition the physical resources (processor, hard drive, network card, etc.) of one computer to run one or more instances of concurrent, but not necessarily identical, operating systems (oses). traditionally only one instance of an operating system, such as microsoft windows, can be used at any one time. when an operating system is virtualized—creating a virtual machine (vm)—the vm communicates through virtualization middleware with the hardware or host operating system. this middleware also provides a consistent set of virtual hardware drivers that are transparent to the end user and to the physical hardware. this allows the virtual machine to be used in a variety of heterogeneous environments without the need to reconfigure or install new drivers. with the majority of hardware and compatibility requirements resolved, the computer becomes simply a physical presentation medium for a vm.

two approaches to virtualization: host-based vs. hypervisor
virtualization can be implemented using type 1 or type 2 hypervisor architectures. a type 2 hypervisor (figure 1), commonly referred to as "host-based virtualization," requires an os such as microsoft windows xp to host a "guest" operating system like linux or even another version of windows. in this configuration, the host os treats the vm like any other application. host-based virtualization products are often intended to be used by a single user on workstation-class hardware. in the type 1, or "bare-metal," hypervisor architecture (figure 2), commonly referred to as "hypervisor-based virtualization," the virtualization middleware interacts with the computer's physical resources without the need for a host operating system. such systems are usually intended for use by multiple users, with the vms accessed over the network. realizing the full benefits of this approach requires a considerable resource commitment for both enterprise-class server hardware and information technology (it) staff.
n use cases archivists’ toolkit the archivists’ toolkit (at) project is a collaboration of the ucsd libraries, the new york university libraries, and the five colleges libraries (amherst college, hampshire college, mt. holyoke college, smith college, and university of massechusetts, amherst) and is funded by the andrew w. mellon foundation. the at is an open-source archival data management system that provides broad, integrated support for the management of archives. it consists of a java client that connects to a relational database back-end (mysql, mssql, or oracle). the database can be implemented on a networked server or a single workstation. since its initial release in december 2006, the at has sparked a great deal of interest and rapid uptake of the application within the archival community. this growing interest has, in turn, created an increased demand for demonstrations of the product, workshops and training, and simpler methods for distributing the application. (of the use cases described here, the two for the at arwen hutt (ahutt@ucsd.edu) is metadata specialist, michael stuart (mstuart@ucsd.edu) is information technology analyst, daniel suchy (dsuchy@ucsd.edu) is public services technology analyst, and bradley d. westbrook (bradw@library.ucsd.edu) is metadata librarian and digital archivist, university of california, san diego libraries. employing virtualization in library computing | hutt et al. 111 distribution and laptop classroom are exploratory, whereas the rest are in production.) at workshops the society of american archivists sponsors a two-day at workshop occurring on multiple dates at several locations. in addition, the at team provides oneand two-day workshops to different institutional audiences. at workshops are designed to give participants a hands-on experience using the at application. accomplishing this effectively requires, at the minimum, supplying all participants with identical but separate databases so that participants can complete the same learning exercises simultaneously and independently without concern for working in each other’s space. in addition, an ideal configuration would reduce the workload of the instructors, freeing them from having to set up the at instructional database onsite for each workshop. for these workshops we needed to do the following: n provide identical but separate databases and database content for all workshop attendees n create an easily reproducible installation and setup for workshops by preparing and populating the at instructional database in advance virtualization allows the at workshop instructors to predefine the workstation configuration, including the installation and population of the at databases, prior to arriving at the workshop site. to accomplish this we developed a workshop vm configuration with mysql and the at client installed within a linux ubuntu os. the workshop instructors then built the at vm with the data they require for the workshop. the at client and database are loaded on a dvd or flash drive and shipped to the classroom managers at the workshop sites, who then need only to install a copy of the vm and the freely available vmplayer software (necessary to launch the at vm) onto each workstation in the classroom. the at vm, once built, can be used many times both for multiple workstations in a classroom as well as for multiple workshops at different times and locations. 
this implementation has worked very well, saving both time and effort for the instructors and classroom support staff by reducing the time and communication necessary for deploying and reconfiguring the vm. it also reduces the chances that there will be an unexpected conflict between the application and the host workstation's configuration. but the method is not perfect. more than anything else, licensing costs motivated us to choose linux as the operating system instead of a proprietary os such as windows. this reduces the cost of using the vm, but it also requires workshop participants to use an os with which they are often unfamiliar. for some participants, unfamiliarity with linux can make the workshop more difficult than it would be if a more ubiquitous os were used.

figure 1. a host-based (type 2) hypervisor implementation
figure 2. a hypervisor-based (type 1) implementation

at demonstrations
in a similar vein, members of the at team are often called upon to demonstrate the application at various professional conferences and other venues. these demonstrations require the setup and population of a demonstration database with content for illustrating all of the application's functions. one of the constraints posed by the demonstration scenario is the importance of using a local database instance rather than a networked instance, since network connections can be unreliable or outright unavailable (network connectivity being an issue we've all faced at conferences). another constraint is that portions of the demonstrations need some level of preparation (for example, knowing what search terms will return a nonempty result set), which must be customized for the unique content of a database. a final constraint is that, because portions of the demonstration (import and data merging) alter the state of the database, changes to the database must be easily reversible, or else new examples must be created before the database can be reused.

building on our experience of using virtualization to implement multiple copies of an at installation, we evaluated the possibility of using the same technology to simplify the setup necessary for demonstrating the at. as with the workshops, the use of a vm for at demonstrations allows for easy distribution of a prepopulated database, which can be used by multiple team members at disparate geographic locations and on different host oses. this significantly reduces the cost of creating (and re-creating) demonstration databases. in addition, demonstration scripts can be shared between team members, creating additional time savings as well as facilitating team participation in the development and refinement of the demonstration. perhaps most important is the ability to roll back the vm to a specific state or snapshot of the database. this means the database can be quickly returned to its original state after being altered during a demonstration. overall, despite our initial anxiety about depending on the vm for presentations to large audiences, this solution has proven very useful, reliable, and cost-effective.
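the snapshot rollback that makes these demonstration databases disposable can be scripted. the paper does not say which tool performed the rollback (the team distributed vms for vmplayer); as one concrete equivalent, virtualbox, used in the column earlier in this compilation, exposes snapshots through its vboxmanage cli, sketched here from java with invented vm and snapshot names.

```java
import java.io.IOException;

public class DemoReset {
    static void vbox(String... args) throws IOException, InterruptedException {
        String[] cmd = new String[args.length + 1];
        cmd[0] = "VBoxManage";
        System.arraycopy(args, 0, cmd, 1, args.length);
        if (new ProcessBuilder(cmd).inheritIO().start().waitFor() != 0)
            throw new IOException("VBoxManage failed");
    }

    public static void main(String[] args) throws Exception {
        // before the tour: freeze the populated demonstration database once
        vbox("snapshot", "at-demo", "take", "pristine");
        // after a demo that imported or merged records (with the vm powered
        // off): jump straight back to the frozen state instead of rebuilding
        vbox("snapshot", "at-demo", "restore", "pristine");
    }
}
```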
at distribution
implementing the at requires installing both the toolkit client and a database application such as mysql, instantiating an at database, and establishing the connection between database and client. for many potential customers of the at, the requirements for database creation and management can be a significant barrier due to inexperience with how such processes work and a lack of readily available it resources. many of these customers simply desire a plug-and-play version of the application that they can install and use without requiring technical assistance. it is possible to satisfy this need for a plug-and-play at by constructing a vm containing a fully installed and ready-to-use at application and database instance. this significantly reduces the number and difficulty of steps involved in setting up a functional at instance. the customer would only need to transfer the vm from a dvd or other source to their computer, download and install the vm reader, and then launch the at vm. they would then be able to begin using the at immediately. this removes the need for the user to perform database creation and management, arguably the most technically challenging portion of the setup process. users would still have the option of configuring the application (default values, lookup lists, etc.) in accord with the practices of their repository.

batch processing catalog records
the rapid growth of electronic resources is significantly changing the nature of library cataloging. not only are the types of library materials changing and multiplying, the amount of e-resources being acquired increases each year. electronic book and music packages often contain tens of thousands of items, each requiring some level of cataloging. because of these challenges, staff are increasingly cataloging resources with specialized programs, scripts, and macros that allow for semiautomated record creation and editing. such tools make it possible to work on large sets of resources—work that would not be financially possible to perform manually, item by item. however, the specialized configuration of the workstation required for using these automated procedures makes it very difficult to use the workstation for other purposes at the same time. in fact, user interaction with the workstation while the process is running can cause a job to terminate prior to completion. in either scenario, productivity is compromised. virtualization offers an excellent remedy to this problem. a virtual machine configured for semiautomated batch processing allows unused resources on the workstation to process the batch requests in an isolated environment while, at the same time and on the same machine, the user is able to work on other tasks. in cases where the user's machine is not an ideal candidate for virtualization, the vm can be hosted via a hypervisor-based solution, and the user can access the vm with familiar remote access tools such as remote desktop in windows xp.

secure sandbox
in addition to the challenges posed by increasingly large quantities of acquisitions, the ucsd libraries is also encountering an increasing variety of library material types. most notable is the variety and uniqueness of digital media acquired by the library, such as specialized programs to process and view research data sets, new media formats and viewers, and application installers. cataloging some of these materials requires that media be loaded and that applications be installed and run to inspect and validate content. but running or opening these materials, which are sometimes from unknown sources, poses a security risk both to the user's workstation and to the larger pool of library resources accessible via the network. many installers require a user to have administrative privileges, which can pose a threat to network security. the virtual machine allows a user to have administrative privileges within the vm, but not outside of it. the user can be provided with the privileges needed for installing and validating content without modifying their privileges on the host machine. in addition, the vm can be isolated by configuring its network connection so that any potential security risks are limited to the vm instance and do not extend to either the host machine or the network.
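a minimal sketch of that isolation step, again using virtualbox's cli as a stand-in (the paper does not name the product used for this case): attaching the sandbox vm's network interface to an internal network keeps its traffic off the campus network entirely. the vm and network names below are invented.

```java
import java.io.IOException;

public class SandboxIsolation {
    public static void main(String[] args) throws IOException, InterruptedException {
        // put the sandbox vm's only nic on a private internal network so
        // suspect installers can run with admin rights inside the vm but
        // cannot reach the host's network; "catalog-sandbox" is hypothetical
        Process p = new ProcessBuilder("VBoxManage", "modifyvm", "catalog-sandbox",
                "--nic1", "intnet", "--intnet1", "quarantine").inheritIO().start();
        if (p.waitFor() != 0) throw new IOException("VBoxManage failed");
    }
}
```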
laptop classroom
instructors at the ucsd libraries need a laptop classroom that meets the usual requirements for this type of service (mobility, dependability, etc.) but also allows for the variety of computing environments and applications in use throughout our several library locations. in a least-common-denominator scenario, computers are configured to meet a general standard (usually microsoft windows with a standard browser and office suite) and allow minimal customization. while this solution has its advantages and is easy to configure and maintain from the it perspective, it leaves much to be desired for an instructor who needs to use a variety of tools in the classroom, often on demand. the goal in this case is not to settle for a single generic build but instead to find a solution that accommodates three needs:

- the ability to switch quickly between different customized os configurations
- the ability to add and remove applications on demand in a classroom setting
- the ability to restore a computer modified during class to its original state

of course, regardless of the approach taken, the laptops still needed to retain a high level of system security, application stability, and regular hardware maintenance. after a thorough review of the different technologies and tools already in use in the libraries, we determined that virtualization might also serve to meet the requirements of our laptop classroom. the need to support multiple users and multiple vms makes this scenario an ideal candidate for hypervisor-based virtualization. we decided to use vdi (virtual desktop infrastructure), a commercially available hypervisor product from vmware. vmware is one of the largest providers of virtualization software, and we were already familiar with several iterations of its host-based vm services. the core of our project plan consists of a base vm created and managed by our it department. to support a wide variety of applications and instruction styles, instructors could create a customized vm specific to their library's instruction needs with only nominal assistance from it staff. the custom vm would then be made available on demand to the laptops from a central server (as depicted in figure 2 above). in this manner, instructors could "own" and maintain a personal instructional computing environment, while the classroom manager could still ensure that the laptop classroom as a whole maintained the secure software environment required by it. as an added benefit, once these vms are established, they could be accessed and used in a variety of diverse locations.
considerations for implementation
before implementing any virtualization solution, in-depth analysis and testing are needed to determine which type of solution, if any, is appropriate for a specific use case in a specific environment. this analysis should include three major areas of focus: user experience, application performance in the virtualized environment, and effect on the enterprise infrastructure. in this section of the paper, we review considerations that, in hindsight, we would have found extremely valuable in the ucsd libraries' various implementations of virtualization.

user experience
traditionally, system engineers have developed systems and tuned performance according to engineering metrics (e.g., megabytes per second and network latency). while such metrics remain valuable to most assessments of a computer application, performance assessments are increasingly being defined by usability and user-experience factors. in an academic computing environment, especially in areas such as library computer labs, these newer kinds of performance measures are important indicators of how effectively an application performs and, indirectly, of how well resources are being used. virtualization can be implemented in a way that allows library users to have access to both the virtualized and host oses, or to multiple virtualized oses. since virtualization essentially creates layers within the workstation, multiple os layers (either host or virtualized) can cause users to become confused as to which os they are interacting with at a given moment. in that kind of implementation, the user can lose his or her way among the host and guest oses as well as become disoriented by differing features of the virtualized oses. for example, the user may choose to save a file to the desktop but may not be aware that the file will be saved to the desktop of the virtualized os and not the host os. external device support can also be problematic for the end user, particularly with regard to common devices such as flash drives. the user needs to be aware of which operating system is in use, since it is usually the only one with which an external device is configured to work. authentication to a system is another example of how the relationship between the host and guest os can cause confusion. the introduction of a second os implicitly creates a second level of authentication and authorization that must be configured separately from that of the host os. user privileges may differ between the host and guest os for a particular vm configuration. for instance, a user might need to remember two logins or at least enter the same login credentials twice. these unexpected differences between the host and guest os produce negative effects on a user's experience. this can be a critical factor in a time-sensitive environment such as a computer lab, where the instructor needs to devote class time to teaching and not to preparing the computers for use and navigating students through applications.

interface latency and responsiveness
latency (meaning here the responsiveness or "sluggishness" of the software application or the os) in any interface can be a problem for usability. developers devote a significant amount of time to improving operating systems and application interfaces specifically to address this issue.
however, users will often be unable to recognize when an application is running in a virtualized os and will thus expect virtualized applications to perform with the same responsiveness as applications that are not virtualized. in our experience, some vm implementations exhibit noticeable interface latency because of inherent limitations of the virtualization software. perhaps the most notable and restrictive limitation is the lack of advanced 3d video rendering capability. this is due to the lack of support for hardware-accelerated graphics, which adds an extra layer of communication between the application and the video card and slows down performance. in most hardware-accelerated 3d applications (e.g., google earth pro or second life), this latency is such a problem that the application becomes unusable in a virtualized environment. recent developments have begun to address and, in some cases, overcome these limitations.3 in every virtualization solution there is overhead for the virtualization software to do its job and delegate resources. in our experience, this imposes an approximately 10–20 percent performance penalty. most applications will run well with little or moderate change to their configuration when virtualized, but the overhead should not be overlooked or assumed to be inconsequential. it is also valuable to point out that the combination of applications in a vm, as well as vms running together on the same host, can create further performance issues.

traditional bottlenecks
the bottlenecks faced in traditional library computing systems also remain in almost every virtualization implementation. general application performance is usually limited by the specifications of one or more of the following components: processor, memory, storage, and network hardware. in most cases, assuming adequate hardware resources are available, performance issues can be easily addressed by reconfiguring the resources for the vm. for example, the problem of a vm whose application is memory-bound (i.e., performance is limited by the memory available to the vm) can be resolved by adjusting the amount of memory allocated to the vm. a critical component of planning a successful virtualization deployment is a thorough analysis of user workflow and the ways in which the vm will be utilized. although the types of user workflows may vary widely, analysis and testing serve to predict and possibly avoid potential bottlenecks in system performance.

enterprise impact
when assessing the effect virtualization will have on your library infrastructure, it is important to have an accurate understanding of the resources and capabilities that will form the foundation for the virtualized infrastructure. it is a misconception that it is necessary to purchase state-of-the-art hardware to implement virtualization. not only are organizations realizing how to better utilize existing hardware with virtualization for specific projects, they are discovering that the technology can be extended to the rest of the organization and be successfully integrated into their it management practices. virtualization does, however, impose certain performance requirements for large-scale deployments that will be used in a 24/7 production environment. in such scenarios, organizations should first compare the level of performance offered by their current hardware resources with the performance of new hardware.
the most compelling reasons to buy new servers include the economies of scale that can be obtained by running more vms on fewer, more robust servers, as well as the enhanced performance supplied by newer, more virtualization-aware hardware. in addition, virtualization allows resources to be used more efficiently, resulting in lower power consumption and cooling costs.

the network is often one of the most overlooked factors when planning a virtualization project. while a local virtualized environment (i.e., a single computer) may not necessarily require a high-performance network environment, any solution that calls for a hypervisor-based infrastructure requires considerable planning and scaling for bandwidth requirements. the current network hardware available in your infrastructure may not perform or scale adequately to meet the needs of this vm use. again, this highlights the importance of thorough user workflow analyses and testing prior to implementation.

depending on the scope of your virtualization project, deployment in your library can potentially be expensive and can have many indirect costs. while the initial investment in hardware is relatively easy to calculate, other factors, such as ongoing staff training and system administration overhead, are much more difficult to determine. in addition, virtualization adds another layer to oftentimes already complex software licensing terms. to deal with the increased use of virtualization, software vendors are devoting increasing attention to the intricacies of licensing their products for use in such environments. while virtualization can ameliorate some licensing constraints (as noted in the at workshop use case), it can also conceal and promote licensing violations, such as multiple uses of a single-license application or access to license-restricted materials. license review is a prudent and highly recommended component of implementing a virtualization solution. finally, concerning virtualization software itself, it should be noted that while commercial vm companies usually provide plentiful resources for aiding implementation, several worthy open-source options also exist. as with any open-source software, the total cost of operation (e.g., the costs of development, maintenance, and support) needs to be considered.

conclusion

as our use cases illustrate, there are numerous potential applications and benefits of virtualization technology in the library environment. while we have illustrated a number of these, many more possibilities exist, and further opportunities for its application will be discovered as virtualization technology matures and is adopted by a growing number of libraries. as with any technology, many factors must be taken into account to evaluate if and when virtualization is the right tool for the job. in short, successful implementation of virtualization requires thoughtful planning. when so implemented, virtualization can provide libraries with cost-effective solutions to long-standing problems.

references and notes

1. alessio gaspar et al., "the role of virtualization in computing education," in proceedings of the 39th sigcse technical symposium on computer science education (new york: acm, 2008): 131–32; paul ghostine, "desktop virtualization: streamlining the future of university it," information today 25, no. 2 (2008): 16; robert p. goldberg, "formal requirements for virtualizable third generation architectures," communications of the acm 17, no. 7 (1974): 412–21; and karissa miller and mahmoud pegah, "virtualization: virtually at the desktop," in proceedings of the 35th annual acm siguccs conference on user services (new york: acm, 2007): 255–60.
2. for other, non-ucsd use cases of virtualization, see joel c. adams and w. d. laverell, "configuring a multi-course lab for system-level projects," sigcse bulletin 37, no. 1 (2005): 525–29; david collins, "using vmware and live cd's to configure a secure, flexible, easy to manage computer lab environment," journal of computing for small colleges 21, no. 4 (2006): 273–77; rance d. necaise, "using vmware for dual operating systems," journal of computing in small colleges 17, no. 2 (2001): 294–300; and jason nieh and chris vaill, "experiences teaching operating systems using virtual platforms and linux," sigcse bulletin 37, no. 1 (2005): 520–24.
3. h. andrés lagar-cavilla, "vmgl (formerly xen-gl): opengl hardware 3d acceleration for virtual machines," www.cs.toronto.edu/~andreslc/xen-gl/ (accessed oct. 21, 2008).

article

contactless services: a survey of the practices of large public libraries in china

yajun guo, zinan yang, yiming yuan, huifang ma, and yan quan liu
information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.14141

yajun guo (yadon0619@hotmail.com) is professor, school of information management, zhengzhou university of aeronautics. zinan yang (yangzinan612@163.com) is a master's student, school of information management, zhengzhou university of aeronautics. yiming yuan (yuanyiming361@163.com) is a master's student, school of information management, zhengzhou university of aeronautics. huifang ma (mahuifang126@126.com) is a master's student, school of information management, zhengzhou university of aeronautics. *corresponding author yan quan liu (liuy1@southernct.edu) is professor, department of information and library science, southern connecticut state university. © 2022.

abstract

contactless services have become a common way for public libraries to provide services. the strategies used by public libraries in china can help curb the spread of epidemics transmitted through human contact and can serve as a model for other libraries throughout the world. the primary goal of this study is to gain a deeper understanding of the contactless service measures provided by large chinese public libraries for users in the pandemic era, as well as the challenges and countermeasures involved in providing such services. the data for this study were obtained using a combination of website investigation, content analysis, and telephone interviews in an analytical survey of 128 large public libraries in china. the study finds that touch-free information dissemination, remote resources use, no-touch interaction self-services, network services, online reference, and smart services without personal interactions are among the contactless services available in chinese public libraries. exploring the current state of contactless services in large public libraries in china helps fill a need for empirical attention to contactless services in libraries and the public sector. up-to-date information is provided to assist libraries all over the world in improving their contactless services implementation and practices.
introduction

the spread of covid-19 began in 2020, and people all over the world are still fighting the severity of its spread, the breadth of its impact, and the extent of its endurance. the virus's continued spread has had a wide-ranging impact on industry sectors worldwide, including libraries. the growth of public libraries has also seen significant changes as a result of covid-19, resulting in added patron services, including contactless services. contactless services are those that patrons can use without having to interact face to face with librarians. these services transcend time and geographical constraints and lower the danger of disease transmission through human interaction. since the covid-19 pandemic began, contactless or touch-free interaction services have been emerging in chinese public libraries, and this service model can also serve as a reference for other libraries. this study evaluates and analyzes contactless service patterns in large public libraries in china and then suggests a contactless service framework for public libraries, which is currently in the process of being implemented.

literature review

the available literature shows that the term "non-contact" appeared as early as 1916 in the article "identification of the meningococcus in the naso-pharynx with special reference to serological reactions," where it described a patient's infection in the context of medical research.1 in recent years, with the widespread application of "internet +" and the development and promotion of technologies such as the internet of things, cloud computing, and artificial intelligence, the contactless economy has grown by leaps and bounds, and so has the research on library contactless services.2 library contactless services encompass a wide range of services such as self-services, online reference, and smart services without personal interactions. library self-service has become a major service model for contact-free services.
the self-service model was first adopted in american public libraries in the 1970s with the emergence of self-service borrowing and returning practices.3 many public libraries have since adopted stand-alone, fully automated self-service halls, self-service counters, etc.4 by the 1990s, a range of commercial self-service kiosks and self-service products had been introduced.5 currently, the most mature self-service type used by the library community is the circulation self-service product.6 in addition to self-service borrowing and returning of titles, libraries have launched self-service printing systems, self-service computer systems, and self-service booking of study spaces.7 as an example, patrons can complete printing operations using a self-service system and can pay by bank card, alipay, wechat, and other means.8 a face recognition system can also be used to borrow and return books, a solution for patrons who forget their library cards.9 these library self-service system elements are confined to simple, repetitive, and routine tasks such as conducting book inventories, book handling, circulating books, and the like, whose development stems from the widespread application of electronic magnetic stripe technology, radio frequency identification (rfid), optical character recognition (ocr) technology, and face recognition.10 new applications of technology continue to advance the development of contactless services in libraries, and the overall work and service processes of the library have been made intelligent to varying degrees.

online reference is an important service in the contactless service program, and researchers have started to study the current state of library reference services. interactive online reference services support patrons using the library, including how to search for literature, locate and renew books, schedule a study or seminar room, and participate in other library activities, such as seminars, lectures, etc.11 in response to the problem of how patrons access various library services, digital reference systems need functions such as automated semantic processing and automated scene awareness so that, through automatic calculation and adaptive matching, they can understand patrons' interests, preferences, and needs and recommend the most suitable information resources for them.12 at present, most library reference services in china mainly rely on telephone, email, wechat, robot librarians/interactive communication, microblogs, and qq, an instant messaging software popular in china. during the past two years, most public libraries in china have essentially implemented the use of the aforementioned reference tools to communicate and interact with patrons, with wechat having a 55.6% adoption rate when compared to other instant reference tools.13 the use of online chat in reference services has allowed librarians to help patrons from anywhere and at any time by embedding chat plug-ins into multiple pages of the library website and directing patrons to ask questions based on the specific page they are viewing, setting up automatic pop-up chat windows, and changing patrons' passive waiting into active engagement.14
in terms of technology, emerging technologies such as patron profiling, natural language processing, and contextual awareness can support the development of reference advisory services in libraries.15 the online reference service provides a 24/7, high-quality, efficient, and personalized service that connects libraries more closely with society and is an important window into the future smart library service system.

smart services without personal interactions may become the most popular form of library service development in the future, and research on library smart services has gradually deepened. in terms of conceptual definition, the library community generally understands library smart services as mobile library services that are not limited by time and space and that can help patrons find books and other types of materials in the library by connecting to the wireless internet.16 apart from this, there are two other ways to define library smart services. one discusses the meaning of smart services in an abstract way, for example as an advanced library form dedicated to knowledge services through human-computer interaction, a comprehensive ecosystem.17 the other concretizes the extension of this concept with the formula "smart library = library + internet of things + cloud computing + smart devices."18 applied technology research is an important part of smart services in libraries. library smart services have three main features: digitization, networking, and clustering. among them, digitization provides the technical basis, networking provides the information guarantee, and clustering provides the library management model of resources sharing, complementary advantages, and common development among libraries.19 the key breakthrough in the development of smart services is the deployment of smart technologies to truly realize a new form of integration of online and offline, virtual and reality.20 the integration of face recognition technology in traditional libraries, as well as its application to services like access control management, book borrowing and returning, and wallet payment, can help libraries build smart services faster.21 the integration of deep learning into a mobile visual search system for library smart services can play an important role in integrating multiple sources of heterogeneous visual data and the personalized preferences of patrons.22 blockchain technology, born out of the new wave of information technology, has also been applied to the construction of smart library information systems because of its decentralized and secure features.23 library smart services can leverage new technologies and smart devices to enhance the efficiency of library contact-free services and provide new opportunities for knowledge innovation, knowledge sharing, and universal participation, thereby enabling innovation in service models.

additional research discusses the development of contactless services in service areas such as library self-services, online reference, and smart services. in particular, the research and construction of smart library services have been enriched with the advent of big data and artificial intelligence. however, non-contact service has not been systematically researched and elaborated in domestic and international librarianship.
the emergence and prevalence of covid-19 has led libraries in many countries to practice various types of touch-free services, such as the introduction of postal delivery, storage deposit, and click-and-collect in australian libraries; curbside pickup or build a book bag services in us public libraries; and book delivery to the building services in chinese university libraries.24 therefore, a systematic investigation and study of contactless services in public libraries during the pandemic is of great importance for the adaptation and innovation of library services.

methods

survey samples

the survey selected some of the most typical public libraries for the study. the selection criteria targeted large public libraries in the more economically and culturally developed regions of china. a total of 128 large public libraries were identified, including national libraries, 32 provincial public libraries, and municipal public libraries in the top 100 cities by gdp ranking in 2020; five of these, including the capital library and nanjing library, are both top 100 city libraries and provincial libraries. these 128 large public libraries clearly reflect the current service level of the better-developed public libraries in china and represent the highest level of public library construction in china. (see table 1 for a list of the libraries studied.)

table 1. a list of the 128 public libraries that were studied: 1. national library of china; 2. hebei library; 3. shanxi library; 4. liaoning provincial library; 5. jilin province library; 6. heilongjiang provincial library; 7. zhejiang library; 8. anhui provincial library; 9. fujian provincial library; 10. jiangxi provincial library; 11. shandong library; 12. henan provincial library; 13. hubei provincial library; 14. hunan library; 15. guangzhou library; 16. hainan library; 17. sichuan library; 18. guizhou library; 19. yunnan provincial library; 20. shanxi library; 21. gansu provincial library; 22. qinghai library; 23. guangxi library; 24. inner mongolia library; 25. tibet library; 26. ningxia library; 27. xinjiang library; 28. shanghai library; 29. capital library of china; 30. shenzhen library; 31. guangzhou digital library; 32. chongqing library; 33. tianjin library; 34. suzhou library; 35. chengdu public library; 36. wuhan library; 37. hangzhou public library; 38. nanjing library; 39. qingdao library; 40. wuxi library; 41. changsha library; 42. ningbo library; 43. foshan library; 44. zhengzhou library; 45. nantong library; 46. dongguan library; 47. yantai library; 48. quanzhou library; 49. dalian library; 50. jinan library; 51. xi'an public library; 52. hefei city library; 53. fuzhou library; 54. tangshan library; 55. changzhou library; 56. changchun library; 57. guilin library; 58. harbin library; 59. xuzhou library; 60. shijiazhuang library; 61. weifang library; 62. shenyang library; 63. wenzhou library; 64. shaoxing library; 65. yangzhou library; 66. yancheng library; 67. nanchang library; 68. zibo library; 69. kunming library; 70. taizhou library; 71. erdos city library; 72. public library of jining; 73. taizhou library; 74. linyi library; 75. luoyang library; 76. xiamen library; 77. dongying library; 78. nanning library; 79. zhenjiang library; 80. jiaxing library; 81. xiangyang library; 82. jinhua library; 83. yichang library; 84. huizhou tsz wan library; 85. cangzhou digital library; 86. zhangzhou library; 87. weihai library; 88. digital library of handan; 89. guiyang library; 90. sun yat-sen library of guangdong province; 91. ganzhou library; 92. baotou library; 93. huaian library; 94. yulin digital library; 95. dezhou network library; 96. yuyang library; 97. changde library; 98. baoding library; 99. the library of jiujiang city; 100. taiyuan library; 101. hohhot library; 102. wuhu library; 103. langfang library; 104. national library of hengyang city; 105. maoming library; 106. nanyang library; 107. heze library; 108. urumqi library; 109. zhanjiang library; 110. zunyi library; 111. shangqiu library; 112. jiangmen library; 113. liuzhou library; 114. zhuzhou library; 115. xuchang library; 116. chuzhou library; 117. lianyungang library; 118. suqian library; 119. mianyang library; 120. zhuhai library; 121. xinyang library; 122. zhoukou library; 123. zhumadian library; 124. huzhou library; 125. lanzhou library; 126. fuyang library; 127. xinxiang library; 128. jiaozuo library.
survey methods

web-based investigation, content analysis, and interviews with librarians were used to assess the 128 public libraries. the survey was carried out between march 10 and september 15, 2021. first, the authors identified the media platforms each public library used to share information about its contactless services, including an official website, a social networking account on wechat, or a library-developed app. the authors investigated whether these media platforms were updated with information about the contactless services and whether they provided various kinds of information about these services. next, the authors searched these media platforms for the various contactless services offered by each library and recorded them. finally, the authors reviewed the data and findings from the survey to minimize errors and ensure the accuracy of the findings.

findings

touch-free information distribution

the distribution of library information is generally carried out in a touch-free manner. there are three commonly used information media in libraries: the official website, the wechat official account, and the library-developed app. the adoption rate of each information medium was determined by investigating whether libraries had opened media platforms and whether the opened platforms were updated with service information. the results showed that the information medium with the highest adoption rate was the wechat official account, reaching 100%. the library's official website showed an adoption rate of 94%. only 57% of libraries use apps to distribute contactless information (see fig. 1).

figure 1. percentage of touch-free information distribution platforms in large public libraries in china.

patron services must provide timely and convenient access if public libraries want to effectively expand their patron base or increase library usage. wechat is better adapted to user convenience than websites, which explains its greater utilization rate as a contactless information dissemination tool for libraries. as public service institutions, chinese public libraries have a considerable impact on politics, the economy, and culture, and a great influence on cultural popularization and the public's educational development. therefore, touch-free information dissemination plays an important role in improving the efficiency of information dissemination.
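the adoption rates in figure 1 are simple proportions over the 128 surveyed libraries. a minimal sketch of how such rates can be tallied from survey records follows; the records shown are hypothetical stand-ins, not the authors' dataset.

```python
# a minimal sketch: tally platform adoption rates from survey records.
# the records below are hypothetical stand-ins for the 128 surveyed libraries.
records = [
    # (library, has_wechat_account, has_official_website, has_app)
    ("library a", True, True, True),
    ("library b", True, True, False),
    ("library c", True, False, False),
]

platforms = ["wechat official account", "official website", "app"]
total = len(records)
for offset, platform in enumerate(platforms, start=1):
    adopters = sum(1 for record in records if record[offset])
    print(f"{platform}: {adopters / total:.0%} adoption")
```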
wechat has been fully integrated into china's public library services as a communication tool, allowing libraries to better foster cultural growth. in the process of cultural growth, libraries need to emphasize interactive public participation and combine public culture, social topics, citizen interaction, and media communication, bringing innovative value that promotes urban vitality and urban humanism. the widespread use of wechat helps users stay up to date on the newest information and access library resources and services more conveniently.

remote resources services

restrictions on the use of digital resources are closely related to the frequency of patrons' use, so restrictive measures that posed obstacles to patrons using digital resources were identified. among the 128 large public libraries surveyed, 42% of libraries require reader card authentication before patrons can access remote resources services, and 8% of libraries do not require users to have reader cards for services. patrons can use the remote resources services of the remaining 49% of public libraries without needing to register for a user account or patron id on the library website.

to reduce the risk of infection between librarians and patrons, some libraries adopted noncontact paper document delivery services for users in urgent need of paper books during the pandemic. for example, the peking university library's book delivery to building service (see fig. 2) and xiamen library and wenzhou library's book delivery to home (see fig. 3) allow patrons to reserve books online, and librarians will express mail the books to patrons' homes according to their needs.

figure 2. peking university library's book delivery service to the building.

figure 3. book delivery service of xiamen library and wenzhou library.

contactless services have two outstanding advantages: they can be obtained without contact with other people, and they are convenient. however, if the use of remote resources is restricted in many ways, the utilization of libraries' digital resources will decrease. while intellectual property requirements and concerns must be appropriately managed, public libraries should strive to provide patrons with unlimited access to digital materials and physical print books.

no-touch interaction self-services

no-touch interaction self-services in chinese public libraries mainly include self-checkout, self-retrieval, self-storage, self-printing, self-card registration, and other self-services, such as self-payment and self-reservation of study rooms or seminar rooms (see fig. 4).

figure 4. percentage of large public libraries in china that provide contactless self-service.

the survey of large public libraries in china shows that the majority offer self-checkout and self-retrieval services. the percentage of public libraries offering self-storage, self-card registration, and self-printing is low, with usage of 50% or less. self-storage, one of the earlier self-services, has a usage rate of 50%. only 34% of public libraries offered self-card registration. the self-service card registration machine has four main functions: reader card registration, payment, password modification, and renewal.
for example, when patrons need to pay deposits or overdue fines, they can use the self-service card registration machine to swipe their cards and make payments, facilitating subsequent borrowing of various resources. the machine supports face recognition technology for card application and online deposit recharge, catering to the needs of patrons in many aspects of operation (see fig. 5). the proportion of libraries offering self-printing is even lower, at only 15%. self-card registration and self-printing are both emerging self-service options that require strong financial and technical support and are therefore not widely available.

figure 5. self-service card registration machine in chinese large public libraries.

to further promote library contactless services and enable users to enjoy self-service library services anytime, anywhere, most public libraries in china have also set up dedicated self-service libraries or microservice halls on the wechat public account platform. for example, the changsha library (see fig. 6) and the taiyuan library (see fig. 7) have both set up a microservice hall column on their wechat public accounts, containing services such as personal appointments, book renewal, event registration, and digital resources. the emergence of online self-service library services has greatly contributed to the development of equalization and standardization of public library services.

figure 6. changsha library no-touch interaction self-service hall.

figure 7. taiyuan library no-touch interaction self-service hall.

24-hour self-service library

the 24-hour self-service library, a contactless phenomenon in china's public libraries, was introduced in 2006 and officially launched in 2007 by dongguan library, followed by shenzhen library's initial batch of ten self-service libraries. the success of the shenzhen model has sparked a boom in the construction of self-service libraries in china, with 77% of the chinese public libraries surveyed having opened self-service libraries. the development of self-service libraries is divided into two service models: space-based self-service libraries (see fig. 8), i.e., unattended libraries with a certain amount of usable space, in which patrons can freely select books and read for leisure, such as 24-hour city bookstores; and cabinet-type self-service libraries (see fig. 9), similar in appearance to a bookcase and equipped with an operating panel like a bank atm, which allow real-time data interaction with the central library via the network. the eight self-service libraries of the taiyuan library in shanxi provide self-service book borrowing through the new library + internet + credit model, which allows patrons to apply for a reader's card without a deposit, make reservations online, and have books delivered to the counter (see fig. 10). by cross-referencing the reader's card with the patron's face information, the guangzhou self-service library provides self-service borrowing and returning services for patrons through face recognition.
there are many similar self-service libraries in china, which provide various types of patron services in different forms, largely reducing direct contact between patrons and librarians and among patrons themselves. for example, when the pandemic was most severe, data collected from the ningbo self-service library showed that 7,022 physical books were borrowed and returned from january to march 2020, 50% more than in a normal year.25

figure 8. space-based self-service libraries.

figure 9. cabinet-type self-service library.

figure 10. taiyuan self-service library.

the popularity of 24-hour self-service libraries in china is first and foremost due to the strong support and financial investment of government departments in the construction of self-service libraries. secondly, the features of self-service libraries, which are convenient, time-independent, time-saving, efficient, and diversified, are in line with modern lifestyles, integrating public library services into people's lives, increasing the visibility and penetration of public library patron services, and better meeting patrons' reading needs.

network services

there is a wide range of network services, but the most common are seat reservation, online renewal, and overdue fee payment (see fig. 11). the survey found that 89% of chinese public libraries offer at least one of these network services, indicating a high adoption rate. online renewals began to appear in china in 2002 and then gradually became popular; most public libraries in china provide this service in the personal library account or wechat official account, and its adoption rate is as high as 85% among the 128 public libraries surveyed. the prevalence of seat reservation services is not high: only 28% of the public libraries surveyed offered them.

figure 11. percentage of large chinese public libraries that provide network services.

coverage of the online overdue fee payment service was even lower, with only 21% of public libraries providing access. however, some libraries have replaced the overdue fee system with other methods, such as the shantou library's lending points system. in that system, the initial number of points on a patron's account is 100, with two points added for each book borrowed and one point deducted for each day a book is overdue. when the points on the account reach zero, the reader's card is frozen for seven days and cannot be used to borrow books; after the freeze is lifted, the number of points is reset to 20.26 (a minimal sketch of this points logic appears below.) in summary, contactless services in china's public libraries are moving in a more humane direction.
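the following sketch renders the shantou lending-points rules described above as executable logic. the class and method names are illustrative inventions, not part of the shantou library's actual system.

```python
# a minimal sketch of the shantou library lending-points rules described above.
# the class and method names are illustrative, not the library's real system.
class LendingAccount:
    START_POINTS = 100   # initial points on a new account
    RESET_POINTS = 20    # points restored after a freeze is lifted
    FREEZE_DAYS = 7      # length of the freeze once points reach zero

    def __init__(self) -> None:
        self.points = self.START_POINTS
        self.frozen = False

    def borrow_book(self) -> None:
        """two points are added for each book borrowed."""
        if self.frozen:
            raise RuntimeError("card is frozen; borrowing is blocked")
        self.points += 2

    def overdue_day(self) -> None:
        """one point is deducted for each day a book is overdue."""
        self.points -= 1
        if self.points <= 0:
            self.frozen = True  # card is frozen for seven days at zero points

    def lift_freeze(self) -> None:
        """after the seven-day freeze is lifted, points reset to 20."""
        self.frozen = False
        self.points = self.RESET_POINTS
```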
online reference services

as a type of contactless service, online reference services are extremely helpful in developing access to documentary information resources. the survey shows that 94% of public libraries provide online reference services, available by telephone, website, email, qq, and wechat. telephone reference and website reference are the earliest forms of contactless service, with the highest usage rates among the public libraries surveyed, at 79% and 71% respectively. these are followed by slightly lower coverage of email reference and qq reference, at 55% and 48% respectively. wechat reference has the lowest coverage rate, at only 16% (see fig. 12). qq and wechat are both tencent instant messengers, but qq's file function is slightly stronger than wechat's: qq can send large files of over 1 gb, and files do not expire, making it easy for reference librarians to communicate with patrons.

figure 12. percentage of large public libraries in china that provided online reference service tools.

other online reference methods, such as microblog reference and intelligent robot reference, are also present in large chinese public libraries. real-time reference is labor-intensive and time-consuming, and where librarians may be unavailable to provide an immediate response, intelligent robot reference can compensate for consultants not being online full time. applying intelligent robots to library reference can also provide accurate and personalized consultation services according to patrons' needs and behavioral patterns, greatly improving the quality, effectiveness, and satisfaction of consultation services. for example, the zhejiang library offers an online reference service that includes a 24-hour robot reference module and an offline message module. patrons can also choose expert reference and see the available reference experts in the expert list along with their details, including name, library, title, specialties, status, etc.27 in addition, the hunan library provides joint online reference through a public welfare platform of the hunan provincial literature and information resources common construction and sharing collaborative network. eleven member units, including hunan library, hunan university library, and the hunan science and technology information institute, benefit from the rich literature resources, information technology, and human resources of the network, and all sites work together to provide free online reference advice and remote delivery of literature to a wide range of patrons, as well as advisory and tutorial services to guide patrons in using the library's physical and digital resources.28

smart services without personal interactions

driven by artificial intelligence, blockchain, cloud computing, and other technologies, libraries are evolving from physical and digital libraries to smart libraries. smart services without personal interactions are a fundamental capability of smart libraries. this survey found that the coverage of smart services was 52%, with virtual reality coverage at 21%, face recognition coverage at 20%, and swipe face to borrow books at 9%. face recognition can be used in library resources services, face gates, security monitoring, self-checkout, and other online and offline real-name identity verification instances, which can improve the efficiency of identity verification. the biggest advantage of face recognition is that it is contactless and easy to use, avoiding the health and safety risks associated with contact identification such as fingerprints.
swipe face to borrow books is one of the applications of face recognition technology; it allows patrons to quickly borrow and return books by swiping their faces, even if they have forgotten their reader's card. this technology also tracks the interests of patrons based on their borrowing habits and history, providing them with corresponding reading recommendation services. it is worth noting that chinese public libraries have a rich variety of smart service methods. in terms of vr technology applications, the national library of china launched the national library virtual reality system in 2008, the first service in china to bring vr technology to the public eye. the virtual reality system provides patrons with the option to explore virtual scenes and interact with the virtual resources available in the library. the virtual scenes are built using computer systems that reproduce realistic architectural structures and reading rooms, so that patrons can learn about the library in the library lobby with the help of vr equipment. virtual resources are digital resources presented in virtual form. the technology combines flash and human gesture recognition systems, allowing patrons to flip through books touch-free at virtual reality reading stations, enhancing the reading experience and interaction. in addition, the fuzhou library attends to the characteristics of different groups of people and has made virtual experiences a focus of its services, using vr technology to innovate reading methods, such as presenting animal images in 3d form on a computer screen, which has been welcomed by a large number of readers, especially children. shanghai library, tianjin library, shenzhen library, chongqing library, and jinan library have introduced vr technology into their patron services so as to attract more users. in terms of blockchain applications, the national digital library of china makes use of the special features of blockchain technology in distributed storage, traceable transmission, and high-grade encryption to provide full-time, full-domain, and full-scene copyright protection for massive digital resources and to promote the construction of intelligent library services. related to big data technology, the shanghai library provides personalized recommendation services for e-books based on the characteristics of the books borrowed by readers. patrons using a mobile phone can scan a code on borrowed books and click on the recommended book's cover for immediate reading.29

conclusion & recommendations

an in-depth analysis of the contactless service strategy will help to steadily improve the smart library development process in public libraries and to support their transition to smart libraries. this report provides a systematic framework for contactless services in public libraries based on a survey and assessment of the contactless service status of large public libraries in china. contactless patron services, contactless space services, contactless self-services, and contactless extension services are the four key components of the framework (see fig. 13).

figure 13. a systematic framework of contactless services for public libraries.

providing contactless patron services

patron services are the heart and soul of each public library.
the library’s services providing no personal physical contact or touch-free connection with patrons are referred to as contactless patron services. this includes book lending, online reference, digital resources and network reading promotion. at present, most chinese public libraries have few contactless lending options, making it difficult to meet the needs of patrons who cannot access the library due to covid-19 or transportation difficulties for various reasons. therefore, public libraries can enrich their existing book lending methods by providing patrons with contactless services, such as book delivery and online lending, to create a convenient reading environment. a focus on digital resources is fundamental to achieving contactless patron services. at present, some public libraries in china neglect the management of digital resources due to the emphasis on paper resources, and digital resources are not updated and maintained in a timely manner, which leads to the inability of patrons to use them smoothly; therefore, the effective management of digital resources in libraries is crucial. in addition, public libraries can carry out activities such as network reading promotion and reader education to effectively improve the utilization of library resources. information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 18 building contactless space services contactless space services refer to the touch-free interaction between physical space and virtual space. physical space services mainly include self-reservation of study rooms, discussion rooms, meeting rooms, as well as providing venues for public lectures or exhibitions, etc., to fulfill the space demands arising from patrons’ access to information. virtual space services mainly include building spaces for collaboration and communication, creative spaces, information sharing spaces, and cultural spaces, providing a virtual integrated environment for patrons’ needs for information exchange and acquisition in the online environment. public libraries can develop their activities through different channels according to the characteristics and elements of physical and virtual spaces, so that libraries can evolve from “library as a place” to “library as a platform.” the combination of an offline library space and an online library platform provides a more convenient and accessible library experience for patrons. implementing no-touch interaction self-services no-touch interactive self-service plays a pivotal role as one of the service forms of the contactless service strategy. it mainly includes no-touch interaction self-services such as information retrieval, resources navigation, self-checkout, and self-printing. public libraries can set up no-touch interaction self-service sections on their official websites or social media accounts to help patrons quickly access up-to-date information from anywhere and at any time. developing contactless extension services in the three dimensions of time, space, and approach, contactless extension services refer to the mutual extension of the library. public libraries can be open year round on a 24/7 basis or during holidays without librarians, allowing patrons to swipe their own cards to gain access. the traditional collection of paper books should not only be available in offline libraries but can extend to individual self-service libraries or city bookshops. libraries can approach patrons with a more individualized service strategy. 
for example, some public libraries provide a service called build a book bag, where librarians select books according to the patron's personal interests and reading preferences and deliver them to a designated location.

limitations and prospects

after analyzing the current status of contactless services in large public libraries in china, this paper finds that contactless services such as reference and access to digital resources are well established in chinese public libraries. on the other hand, contactless applications such as no-touch interaction self-services, network services, and smart services without personal interaction are less well developed. despite the rapid development and variety of touch-free services, public libraries in china have not yet implemented a system of contactless services. this paper proposes a systematic framework to improve the development and practice of contactless services in public libraries and interrupt the spread of covid-19. the framework includes four core modules: contactless patron services, contactless space services, contactless self-services, and contactless extension services. it is foreseeable that contactless services will become the mainstream of public library services in the future.

endnotes

1 fred griffith, "identification of the meningococcus in the naso-pharynx with special reference to serological reactions," journal of hygiene 15, no. 3 (1916): 446–63, https://doi.org/10.1017/s0022172400006355.
2 "guiding opinions of the state council on actively promoting the 'internet +' action," 2015, http://www.gov.cn/zhengce/content/2015-07/04/content_10002.htm.
3 d. brooks, "a program for self-service patron interaction with an online circulation file," in proceedings of the american society for information science 39th annual meeting (oxford, england, 1976).
4 beth dempsey, "do-it-yourself libraries," library journal 135, no. 12 (2010): 86–93, https://doi.org/10.1016/j.lisr.2010.03.004.
5 jackie mardikian, "self-service charge systems: current technological applications and their implications for the future library," reference services review 23, no. 4 (1995): 19–38, https://doi.org/10.1108/eb049262.
6 pan yongming, liu huihui, and liu yanquan, "mobile circulation self-service in u.s. university libraries," library and information service 58, no. 12 (2014): 26–31, https://doi.org/10.13266/j.issn.0252-3116.2014.12.004.
7 chen wu and jang airong, "building a modern self-service oriented library," journal of academic libraries, no. 3 (2013): 93–96, https://doi.org/cnki:sun:mrfs.0.2016-24-350.
8 rao zengyang, "innovative strategies for university library services in the era of smart libraries," library theory and practice, no. 12 (2016): 75–76, https://doi.org/10.14064/j.cnki.issn1005-8214.2016.12.018.
9 wang weiqiu and liu chunli, "functional design and model construction of intelligent library services in china based on face recognition technology," research on library science, no. 18 (2018): 44–50, https://doi.org/10.15941/j.cnki.issn1001-0424.2018.18.008.
10 cheng huanwen and zhong yuanxin, "a three-dimensional analysis of a smart library," library tribune 41, no. 6 (2021): 43–45.
11 nahyun kwon and vicki l. gregory, "the effects of librarians' behavioral performance and user satisfaction in chat reference services," reference & user services quarterly, no. 47 (2007): 137–48, https://doi.org/10.5860/rusq.47n2.137.
12 w. uutoni, "providing digital reference services: a namibian case study," new library world 119, no. 5 (2018): 342–56, https://doi.org/10.1108/ils-11-2017-0122.
13 zhu hui, liu hongbin, and zhang li, "an analysis of the remote service model of university libraries in response to public safety emergencies," new century library, no. 5 (2021): 39–45, https://doi.org/10.16810/j.cnki.1672-514x.2021.05.007.
14 xiangming mu, alexandra dimitroff, jeanette jordan, and natalie burclaff, "a survey and empirical study of virtual reference service in academic libraries," journal of academic librarianship 37, no. 2 (2011): 120–29, https://doi.org/10.1016/j.acalib.2011.02.003.
15 cheng xiufeng et al., "a study on a library's intelligent reference service model based on user portraits," research on library science, no. 2 (2021): 43–55, https://doi.org/10.15941/j.cnki.issn1001-0424.2021.02.012.
16 m. aittola, t. ryhänen, and t. ojala, "smart library: location-aware mobile library service," in human-computer interaction with mobile devices and services, international symposium (2003).
17 chu jingli and duan meizhen, "from smart libraries to intelligent libraries," journal of the national library of china, no. 1 (2019): 3–9, https://doi.org/10.13666/j.cnki.jnlc.2019.01.001.
18 yan dong, "iot-based smart libraries," journal of library science 32, no. 7 (2010): 8–10, http://doi.org/10.14037/j.cnki.tsgxk.2010.07.034.
19 wang shiwei, "a brief discussion of the five relationships of smart libraries," library journal 36, no. 4 (2017): 4–10, https://doi.org/10.13663/j.cnki.lj.2017.04.001.
20 morell d. boone, "unlv and beyond," library hi tech 20, no. 1 (2002): 121–23, https://doi.org/10.1108/07378830210733981.
21 qin hong et al., "research on the application of face recognition technology in libraries," journal of academic libraries 36, no. 6 (2018): 49–54, https://doi.org/10.16603/j.issn1002-1027.2018.06.008.
22 li mo, "research on a mobile visual search service model for smart libraries based on deep learning," journal of modern information 39, no. 5 (2019): 89–96.
23 zhou jie, "study on the application of lora technology in smart libraries," new century library, no. 5 (2021): 57–61, https://doi.org/10.16810/j.cnki.1672-514x.2021.05.010.
24 international federation of library associations and institutions, "the covid-19 and the global library community," 2020, https://www.ifla.org/covid-19-and-the-global-library-field/; guo yajun, yang zinan, and yang zhishun, "the provision of patron services in chinese academic libraries responding to the covid-19 pandemic," library hi tech 39, no. 2 (2021): 533–48, https://doi.org/10.1108/lht-04-2020-0098; peking university library, "book delivery service to the buildings where the patrons live," 2020, https://mp.weixin.qq.com/s/eknyg_-_rjrcl6sjc-it-a.
25 hu bin ying yan, "study on the intelligent construction of ningbo library under the influence of epidemic," jiangsu science & technology information 38, no. 24 (2021): 17–21, https://doi.org/10.3969/j.issn.1004-7530.2021.24.005.
26 shantou library, "come and be a book 'saint'! city library changes lending rules, points system instead of overdue fees," 2021, http://www.stlib.net/information/26182.
27 zhejiang library, "online reference services," 2020, https://www.zjlib.cn/yibanwt/index.htm?liid=2.
28 hunan provincial collaborative network for the construction and sharing of literature and information resources, "reference union of public libraries in hunan province," 2021, http://zx.library.hn.cn/.
29 ministry of culture and tourism of the people's republic of china, "shanghai library launches personalized recommendation service for e-books," 2021, https://www.mct.gov.cn/whzx/qgwhxxlb/sh/202101/t20210106_920497.htm.

article

exploring final project trends utilizing nuclear knowledge taxonomy: an approach using text mining

faizhal arif santosa
information technology and libraries | march 2023
https://doi.org/10.6017/ital.v42i1.15603

faizhal arif santosa (faizhalarif@gmail.com) is academic librarian, polytechnic institute of nuclear technology, national research and innovation agency. © 2022.

abstract

the national nuclear energy agency of indonesia (batan) taxonomy organizes nuclear competence fields into six categories. the polytechnic institute of nuclear technology, as an institution of nuclear education, faces a challenge in organizing student publications according to the fields in the batan taxonomy, especially in the library. the goal of this research is to determine the most efficient automatic document classification model using text mining to categorize student final project documents in indonesian and to monitor the development of the nuclear field in each category.
the knn algorithm is used to classify documents, and the best model is identified by comparing cosine similarity, correlation similarity, and dice similarity, along with two vector-creation methods, binary term occurrence and tf-idf. a total of 99 labeled documents used as reference data were obtained from the batan repository, and 536 unlabeled final project documents were prepared for prediction. several text mining approaches, such as stemming, stop word filtering, n-grams, and filtering by length, were utilized. the best number of k is 4, with cosine-binary being the best model, with an accuracy value of 97 percent; knn works more optimally with binary term occurrence than with tf-idf on indonesian-language documents. engineering of nuclear devices and facilities is the most popular field among students, while management is the least preferred; however, isotopes and radiation is the most prominent field in nuclear technochemistry. text mining can assist librarians in grouping documents based on specific criteria, and it also makes it possible to observe the evolution of each existing category as documents accumulate and to apply similar methods in various circumstances. because of the curriculum and courses given, the growth of each discipline of nuclear science in the study programs is different and varied.

introduction

the national nuclear energy agency of indonesia (batan), now known as the research organization for nuclear energy (ortn) of the national research and innovation agency (brin), in 2018 issued a decision regarding batan's six competencies: isotopes and radiation (ir), nuclear fuel cycle and advanced materials (nfcam), engineering of nuclear devices and facilities (endf), nuclear reactor (nr), nuclear and radiation safety and security (nrss), and management (mgt). these areas of focus are also known as batan's knowledge taxonomy, which is used to support nuclear knowledge management (nkm) and the grouping of explicit knowledge in repositories.1

the polytechnic institute of nuclear technology (pint), which is under the auspices of batan and is now in one of the directorates of brin, can also utilize batan's knowledge taxonomy to classify students' final assignments. every year the pint library accepts final assignments from students who have graduated from three study programs, namely nuclear technochemistry, electronics instrumentation, and electromechanics. over the past six years (2017 to 2022), 563 final assignments in indonesian were collected and needed to be classified into batan's knowledge taxonomy in order to see the document growth of each existing competency. however, it is quite time-consuming for librarians to assign individual documents to the most appropriate taxonomy term. it is also possible to involve experts to determine the right group, but this increases the working time needed to complete a document. this obstacle arises because librarians do not have in-depth and detailed knowledge of the nuclear field, so grouping errors are a risk. in this study, the author classified the collection of final project documents owned by the pint library based on batan's knowledge taxonomy, using text mining tools and the k-nearest neighbors (knn) algorithm.
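a minimal sketch of such a classification pipeline follows, using scikit-learn rather than the author's exact tooling; the documents and labels shown are hypothetical placeholders for the 99 labeled reference documents and the unlabeled final projects.

```python
# a minimal sketch of the pipeline described above, using scikit-learn rather
# than the author's exact tooling; the documents and labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# stand-ins for the 99 labeled reference documents from the batan repository.
train_texts = [
    "desain perangkat instrumentasi reaktor ...",
    "analisis radioisotop untuk aplikasi industri ...",
    "keselamatan radiasi pada fasilitas iradiasi ...",
    "manajemen pengetahuan nuklir di lembaga ...",
]
train_labels = ["endf", "ir", "nrss", "mgt"]  # batan taxonomy categories

# binary term occurrence: record the presence/absence of a term, not a weight.
vectorizer = CountVectorizer(binary=True)
x_train = vectorizer.fit_transform(train_texts)

# k = 4 with cosine similarity, the best-performing combination reported.
model = KNeighborsClassifier(n_neighbors=4, metric="cosine")
model.fit(x_train, train_labels)

# unlabeled final projects are then assigned a predicted taxonomy category.
new_texts = ["rancang bangun sistem kendali fasilitas berkas neutron ..."]
print(model.predict(vectorizer.transform(new_texts)))
```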
similar research has likewise focused on automatic document classification of particular subjects,2 in this case the subject of nuclear engineering. the hope is that users will find it easier to explore knowledge according to their area of interest through taxonomy grouping based on explicit knowledge,3 in this case pint students' final project documents. identifying the trend of research conducted by students in each subject is also one of the goals of this research.

literature review
text mining in libraries
the growing number of publications makes it a challenge to classify documents and track the growth and trends of a topic. document classification is quite time consuming, so automating it by utilizing text mining is very valuable.4 the application of text mining is very broad, and several studies have demonstrated its usefulness in libraries. pong et al. from city university of hong kong conducted research to facilitate the classification process using machine learning.5 their study aimed to streamline document categorization with a system called the web-based automatic document classification system (wadcs), presented as the first comprehensive study of automatic document classification against an already widely used scheme, the library of congress classification (lcc), utilizing knn and naive bayes (nb). the research indicates that the machine-learning algorithms they used can be applied by libraries for document classification. wagstaff and liu utilized text mining to perform automatic classification to help select candidate documents for weeding.6 their study used data from wesleyan university from 2011 to 2014 to predict which documents were eligible for weeding and which would be retained. five classifier models, namely knn, naive bayes, decision tree, random forest, and support vector machines (svm), were compared on their performance. while this process may not replace librarians, the study can help librarians make better decisions and reduce their workload significantly. lamba and madhusudhan applied text mining to extract important topics published in the desidoc journal of library and information technology over a period of 38 years.7 the latent dirichlet allocation (lda) method used in their study finds topics within a collection of documents, making it possible to see how those topics develop over time. because lda is an algorithm that derives topics from groups of words that appear together, the authors suggest expanding the study with articles that have been labeled using supervised classification.
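as a side note on the lda approach lamba and madhusudhan used, the mechanics can be illustrated in a few lines. this is a minimal sketch assuming scikit-learn and an invented toy corpus, not the authors' actual tooling:

```python
# lda discovers topics as groups of co-occurring words; toy example only
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "library automation cataloging system",
    "digital library metadata repository",
    "text mining classification machine learning",
    "machine learning text classification corpus",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)

# fit two topics over the word-count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:4]]
    print(f"topic {i}: {top}")  # top words per discovered topic
```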
knn classifier
various studies have sought the most appropriate method for grouping collections of documents. the knn and svm algorithms have both served as comparative methods in document classification studies.8 however, there is no definite standard for the methods used in text mining.9 choosing the right technique in each phase of document classification can improve the performance of the text classifier, so experts generally adjust existing methods to get better results.10 kim and choi compared knn, maximum entropy model (mem), and svm to classify japanese patent documents by focusing on the structure of patents.11 instead of comparing the entire text, specific components named semantic elements, such as purpose, background, and application fields, are compared from the training documents; these semantically grouped components are the basis for patent categorization. in addition, their strategy uses cross-references between two semantic fields to determine intentions of the patent writers that are uncertain or hidden. this strategy works better with knn than with mem and svm; svm in particular does not do well when handling large data sets. however, research conducted by alhaj et al. on arabic documents showed that svm can outperform knn when a stemming strategy is implemented.12 meanwhile, by exploiting the relationships between unstructured text documents, the study conducted by mona et al. increased the performance of knn combined with tf-idf by 5 percent.13

the knn algorithm is a popular classifier that categorizes new data based on its similarity to the surrounding data points (the number of which is determined by the specified k value).14 this method is believed to group documents effectively because it is not limited by vector size.15 wagstaff and liu noted that one weakness of knn is its long processing time on large datasets, but as a classifier it is easy to apply.16 in terms of measurement, previous experiments showed that knn was not suitable when used with euclidean distance;17 generally, similarity measures such as cosine, jaccard, and dice are used with the knn classifier.18 one problem in text classification is the large number of attributes or dimensions: many irrelevant attributes in the data set keep the classifier from performing optimally.19 for this reason, techniques are needed to increase effectiveness and reduce overly large dimensions through the selection of features or terms,20 such as within-document term frequency; weighting with tf-idf, a popular method that measures how important a word is within a corpus;21 and binary representation, which records the absence or presence of a concept in a document22 as 0 or 1.23
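before turning to the aims of the study, these measures can be made concrete. the sketch below uses invented toy vectors (not data from this study) to show cosine and dice over binary term-occurrence vectors, plus cosine over raw counts, which tf-idf would reweight:

```python
# toy comparison of similarity measures over two document vectors
import numpy as np

def cosine_sim(a, b):
    # cosine similarity: dot product normalized by vector lengths
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def dice_sim(a, b):
    # dice coefficient on binary vectors: 2 * |intersection| / (|A| + |B|)
    return 2 * np.sum(a * b) / (np.sum(a) + np.sum(b))

# term counts for two documents over a five-term vocabulary (invented)
counts_d1 = np.array([3.0, 0.0, 1.0, 2.0, 0.0])
counts_d2 = np.array([1.0, 1.0, 0.0, 2.0, 0.0])

# binary term occurrence: 1 if the term appears at all, 0 otherwise
bin_d1 = (counts_d1 > 0).astype(float)
bin_d2 = (counts_d2 > 0).astype(float)

print(cosine_sim(bin_d1, bin_d2))        # cosine on binary vectors: ~0.667
print(dice_sim(bin_d1, bin_d2))          # dice on binary vectors: ~0.667
print(cosine_sim(counts_d1, counts_d2))  # cosine on raw counts: ~0.764
```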
aims of the study
university libraries have a vital role in managing internal publications to support the education ecosystem. in connection with pint's role in supporting nkm and nuclear development, it is necessary to apply technology that can suggest classes for documents. in addition, to track scientific developments, experts generally conduct bibliometric studies, which are usually limited to the title and abstract fields. text mining provides an opportunity to dig deeper: instead of just the title and abstract, this study used the full text of the final project collection. the trend of a subject can then be seen from the growth and percentage of existing documents. the objectives of this study are to
• explore the best knn model to be applied to classify the final projects;
• know the development of nuclear subjects based on batan's knowledge taxonomy; and
• know the development of nuclear subjects in each study program at the pint.

methods
a total of 99 documents were taken from the batan repository and manually labeled as reference data. this study was conducted using rapidminer studio software. the first document processing step converts all words to lower case and divides the text into a collection of tokens. filters are then applied based on token length; in this case, the author applied a minimum of 3 characters and a maximum of 25 characters. stop-word filtering was also applied to eliminate common function words (e.g., "and," "the," and "are"), thereby reducing the vector size. both english and indonesian stop words were used, since the abstracts are in english and the document language is indonesian; the word collection from haryalesmana was chosen for the indonesian stop words.24 stemming is applied to reduce dimensionality, which improves the classification system,25 by changing word forms into base words,26 e.g., water, waters, watered, and watering all become water. this analysis applies wicaksana's data for indonesian stemming.27 some words cannot be separated from other words because together they form a meaning, e.g., nondestructive testing, biological radiation effects, structural chemical analysis, and water-cooled reactors. n-grams help identify such meaningful compound terms so that these words are not reduced away:28 an n-gram records a sequence of n consecutive words.29 to accommodate these terms, n-grams of up to three words were used in this study.

figure 1. nuclear taxonomy classification framework.

vector creation in this study used tf-idf and binary term occurrence, which were compared to determine the best performance. in the knn method the value of k must be set manually, so values of 2–10 were tested with weighted voting activated. weighted voting assigns a weight to each neighbor depending on its distance from the unknown item, so that closer neighbors contribute more to the vote.30 the numerical measures tested were cosine similarity, correlation similarity, and dice similarity. performance was measured using cross validation with 10 folds. using this set of procedures, the labeled documents from the batan repository were classified, and the procedure achieving the highest accuracy was adopted as the model. this model was then applied to the 563 unlabeled final project documents so that each document received a label from batan's knowledge taxonomy.
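the study itself was run in rapidminer studio; as a rough, hedged equivalent for readers who want to reproduce the idea, the model search can be sketched with scikit-learn. everything here is an assumption rather than the article's actual code: docs and labels are placeholders for the 99 labeled full texts, the token pattern only mimics the 3–25 character length filter, indonesian stop words and stemming would still need to be plugged in, and sklearn's knn supports cosine directly while correlation and dice would require custom metrics.

```python
# a minimal sketch of the model search, assuming scikit-learn in place of
# rapidminer; docs/labels below are placeholders for the real corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

docs = ["..."]    # replace with the 99 labeled full texts
labels = ["ir"]   # replace with their six-category taxonomy labels

def build(binary, k):
    # 1-3 word n-grams and a 3-25 character token filter mirror the
    # preprocessing described above (stop words and stemming omitted here)
    opts = dict(lowercase=True, ngram_range=(1, 3),
                token_pattern=r"(?u)\b\w{3,25}\b")
    vec = CountVectorizer(binary=True, **opts) if binary else TfidfVectorizer(**opts)
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine", weights="distance")
    return make_pipeline(vec, knn)

# 2 representations x k = 2..10, each scored with ten-fold cross validation
scores = {(binary, k): cross_val_score(build(binary, k), docs, labels, cv=10).mean()
          for binary in (True, False) for k in range(2, 11)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))  # the article reports binary, k = 4 winning
```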
results
the experiment was carried out 54 times to determine the best knn performance among the proposed approaches, namely cosine-binary, correlation-binary, dice-binary, cosine–tf-idf, correlation–tf-idf, and dice–tf-idf, using cross validation. cosine was the most accurate with tf-idf vector creation, with an accuracy of 81.89 percent at seven neighbors, while dice reached its lowest point at four neighbors. in contrast to correlation and dice, cosine also performed well with binary vector creation: cosine at four neighbors had the best performance, with a 97 percent accuracy rate. the lowest accuracy occurred when only two neighbors were selected, and all numerical measures declined once more than nine neighbors were used. the classification model chosen for the unlabeled documents was therefore the cosine-binary method with four neighbors. the experiment found that this method failed to group three documents correctly (for details of the confusion matrix, see appendix a). document 7 ought to be in nfcam but, scoring 0.49921 there, was predicted as nrss with a confidence value of 0.50079. documents 86 and 93, which were supposed to be endf, were also missed: document 93 was predicted as nrss with a confidence value of 0.50126, and document 86 as nr with a value of 0.49936.

figure 2. a comparison of the accuracy levels in the knn method.
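the near-tie confidence values above (for example, 0.50079 versus 0.49921) are what distance-weighted voting naturally produces. the following is a minimal sketch of the assumed mechanics, not rapidminer's actual implementation: each of the k neighbors votes with weight 1/distance, and the class confidences are the normalized weight totals.

```python
# hypothetical weighted-vote confidence calculation; the neighbor labels and
# distances below are invented to illustrate a near-tie between two classes.
from collections import defaultdict

def weighted_vote(neighbors):
    # neighbors: list of (label, distance) pairs for the k nearest documents
    totals = defaultdict(float)
    for label, dist in neighbors:
        totals[label] += 1.0 / max(dist, 1e-9)  # guard against zero distance
    norm = sum(totals.values())
    return {label: w / norm for label, w in totals.items()}

# two nrss neighbors barely outweigh two nfcam neighbors at k = 4
print(weighted_vote([("nrss", 0.38), ("nrss", 0.41),
                     ("nfcam", 0.39), ("nfcam", 0.40)]))
# {'nrss': 0.5003..., 'nfcam': 0.4996...}
```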
this study utilized 563 unlabeled documents spanning six years. there were 34 fewer documents in 2021 than in 2020, a significant drop from the previous year (see table 1). the number of documents then climbed again in 2022, reaching 98. rapidminer's labeling process ran into memory issues at the process-document stage, so to improve memory performance the documents were split into three runs (2017–2018, 2019–2020, and 2021–2022). the results of each run were then exported as tabular data for further study. the evolution of each nuclear subject can be seen in the final project reports year by year (see fig. 3). during the test period, 282 documents (50.09%) of the total were classified as endf, followed by ir with 95 documents (16.87%) and nfcam with 69 documents (12.26%). the difference between nr and nrss was slight: nr had 47 papers (8.35%) while nrss had 45 documents (7.99%). mgt was the subject with the fewest documents, 25 in total (4.44%) from 2017 to 2022.

table 1. the pint's final project documents growth from 2017 to 2022
study program | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | grand total
electromechanics | 35 | 34 | 43 | 35 | 24 | 41 | 212
electronics instrumentation | 27 | 34 | 38 | 38 | 22 | 28 | 187
nuclear technochemistry | 31 | 31 | 26 | 27 | 20 | 29 | 164
grand total | 93 | 99 | 107 | 100 | 66 | 98 | 563

figure 3. nuclear subject development by percentage each year.

see appendix b for more information on the confidence value of each predicted document. of the 212 final project reports in the electromechanics study program, 63.68 percent (135 documents) were predicted to be on the endf subject, followed by 17.92 percent (38 documents) on nfcam, 8.96 percent (19 documents) on nrss, and 5.19 percent (11 documents) on nr. ir had the fewest predicted papers, at 2.83 percent (6 documents), while mgt had 1.42 percent (3 documents). every year, endf was the most predicted subject in this study (see fig. 4).

figure 4. nuclear subject development in electromechanics by percentage each year.

the 187 final project reports in electronics instrumentation were predicted across five subjects. endf accounted for 141 documents (75.40%), nrss for 24 documents (12.83%), and nr for 14 documents (7.49%). furthermore, only 7 documents (3.74%) were predicted as mgt and 1 document (0.53%) as ir. nfcam, on the other hand, does not appear in any of the electronics instrumentation publications (see fig. 5).

figure 5. nuclear subject development in electronics instrumentation by percentage each year.

final processing was performed on the collection of nuclear technochemistry documents. of these 164 documents, 53.66 percent (88 documents) were predicted as ir, 18.90 percent (31 documents) as nfcam, 13.41 percent (22 documents) as nr, 9.15 percent (15 documents) as mgt, 3.66 percent (6 documents) as endf, and the remaining 1.22 percent (2 documents) as nrss. the subjects that were popular each year vary (see fig. 6), in contrast to electromechanics and electronics instrumentation, where endf was the most popular topic.

figure 6. nuclear subject development in nuclear technochemistry by percentage each year.

discussion
the study found that implementing knn with cosine similarity, binary vector creation, and k = 4 produced the highest accuracy, 97 percent. in general, this strategy outperformed every other combination examined and was matched only once, at k = 9 using correlation similarity. compared to tf-idf, binary term occurrence consistently performed well; tf-idf achieved its highest accuracy of 81.89 percent only at k = 7 using correlation similarity. cosine similarity also worked efficiently with both vector creation methods (though at k = 2, 5, and 10 its use with tf-idf was not optimal), compared with the correlation similarity and dice similarity measures. cosine similarity evaluates the similarity of documents, and a high similarity score indicates that the documents are quite similar.31

nuclear field growth
in general, aside from endf, which is steady and increasing, the other subjects fluctuate from year to year. for the past six years, endf has been the most popular subject among students, reaching its highest percentage in 2022, with 59 documents predicted on this subject. students preferred engineering final project reports on mechanics and structures, electromechanics, control systems, nuclear instrumentation, or nuclear facility process technology. research by wang et al. likewise suggests that the currently popular topic in nuclear power research is modeling and simulation.32 the endf documents' average confidence value was 0.6499916, with a median value of 0.7490455.
the two documents with the lowest confidence in endf were documents 233 and 597. document 233 had a confidence value of 0.25105 and scored close values in three other subject areas (nrss, nr, mgt). likewise, document 597 was predicted as endf with a confidence value of 0.25156, higher than its nrss, nfcam, and ir scores but not by a significant margin. both of these documents can be investigated further and evaluated directly by the librarian to obtain a more precise field. the majority of the final project reports predicted as endf have confidence levels around 0.50, and some are higher, at 0.75. this study also reveals that 11 documents in the endf category have a confidence value of 1. in addition, 239 endf documents were connected to the nrss field with lower nrss confidence values; this relationship demonstrates a tendency among students to conduct nuclear engineering work related to the nrss discipline.

though it differs significantly in scale from endf, ir is becoming a prominent field. ir final project reports grew in 2017–2018, shrank from 2019 to 2021, then increased in 2022. in comparison to other fields, ir has the highest minimum confidence score, 0.4987, with many documents lying within the 0.5 to 0.75 range; meanwhile, the confidence value for 26 documents predicted as ir is 1. nfcam frequently appears as a secondary prediction for ir documents, but with a lower level of confidence: there are 54 documents indicating research that involves isotopes and radiation in nuclear materials, nuclear excavations, radioactive waste, structures, or advanced materials.

nfcam shows the opposite trajectory to endf. after increasing in 2019, the subject declined over the next three years, with only two documents classified in it in 2022. few students are yet drawn to nuclear minerals, nuclear fuel, radioactive waste, structural materials, and advanced materials. six predicted documents in this field have confidence levels of 1, while many more fall between 0.50 and 0.75. the ir field also tends to appear alongside nfcam predictions.

there were also ups and downs in nr and nrss. twenty-five of the 47 documents identified as nr were also predicted, with a lower value, in the nrss field, showing that students explored the relationship between reactor research and safety and security in various documents. meanwhile, only eight of the 46 nrss papers are unrelated to the endf field, demonstrating that students who study nuclear safety and security tend to perform engineering to address situations involving nuclear safety and security. documents in these two fields usually concentrate around the 0.5 confidence value in both nr and nrss.

mgt is one of the least studied topics among students. human resources, organization, management, program planning, auditing, quality systems, informatics utilization, and cooperation are the themes most commonly associated with the mgt field. mgt increased in 2020, although it had the fewest documents in the other years (2017 to 2019 and 2021 to 2022). in terms of confidence value, 21 mgt documents have a value greater than 0.5, with eight documents at 1. endf, with 10 documents, is the study area most often co-predicted with mgt.
progression in each study program
even though they are all within the purview of nuclear science, the growth of the nuclear fields in each study program differs depending on the curriculum. students are influenced by knowledge, and more specifically by the process of learning and comprehending (whether theoretical or more practical).33 endf is the most popular field in the electromechanics and electronics instrumentation study programs. these two study programs offer courses in endf topic areas such as mechanical, civil and architectural, electromechanical, electrical, control systems, and radiation detection for nuclear devices. furthermore, the electronics instrumentation study program offers courses on nuclear electronics, signal processing techniques, and practical work on interface and data acquisition techniques, all of which are part of the endf nuclear instrumentation group. apart from endf, the fields of nfcam and nrss have been present in electromechanics throughout the six-year period, while mgt is a less appealing topic there: no final project reports relating to mgt have appeared in the most recent three years.

in electronics instrumentation, the absent field is nfcam; the predictions show that no documents were assigned to nfcam at all. only 10 documents intersect with nfcam as a secondary prediction, with lower confidence values ranging from 0.247 to 0.251. nuclear minerals, nuclear fuel, structural materials and advanced materials, and radioactive waste are not studied in depth in this study program, illustrating why nfcam is not predicted in electronics instrumentation.

in contrast to the other study programs, ir is the most predicted field in the nuclear technochemistry final project reports; nuclear technochemistry accounts for 88 of the 95 ir documents examined in this investigation. this study program includes ir specializations such as the use of isotopes and radiation in agriculture, health, and industry. radioisotope production is another specialization in the creation of isotopes and radiation sources, which explains why ir is so popular among nuclear technochemistry students. the nfcam field was not present in 2022, despite having been the topic of several students' studies throughout the preceding five years. endf and mgt, by contrast, have been present only in the last three years, with no predicted papers in the three years before that.

conclusion
the trend of research activities carried out by students varies from one study program to the next, although all fall within the scope of the nuclear field. for example, the field of endf is quite popular among electromechanics and electronics instrumentation students but not among nuclear technochemistry students, where endf appeared only in the last three years and the number of documents is still modest; even so, endf deserves attention. nuclear technochemistry students, with their radiochemistry learning experiences, demonstrate that the ir field is a natural and interesting fit for them. the low proportion of documents in certain categories, e.g., mgt, points to an opportunity to investigate those fields further. this study demonstrates an opportunity to use text mining to assist librarians in performing automatic document classification based on specific subjects.
the best model in this study is produced by combining knn with cosine similarity and binary term occurrence. the model can help improve the quality of decisions made to categorize documents accurately and efficiently. to determine a more specific classification, librarians should pay close attention to documents that have a low level of confidence and intersect with other subjects. this study is limited to the knn method, to documents from the batan repository, and to final project documents of pint students. large-scale testing could be conducted, for instance, on the international atomic energy agency's (iaea) nuclear repository known as the international nuclear information system (inis) repository, or on other databases with the complexity of categorizing documents across many languages.

data accessibility
datasets and data analysis code for rapidminer have been uploaded to the rin dataverse: https://hdl.handle.net/20.500.12690/rin/asrgvo. data visualization can be accessed through tableau public: https://public.tableau.com/app/profile/faizhal.arif/viz/finalprojecttrendsutilizingnuclearknowledgetaxonomy/story1

appendix a: confusion matrix of 10-fold cross validation
accuracy: 97.00% +/- 4.83% (micro average: 96.97%)
 | true nfcam | true ir | true nrss | true mgt | true nr | true endf | class precision
pred. nfcam | 13 | 0 | 0 | 0 | 0 | 0 | 100.00%
pred. ir | 0 | 18 | 0 | 0 | 0 | 0 | 100.00%
pred. nrss | 1 | 0 | 20 | 0 | 0 | 1 | 90.91%
pred. mgt | 0 | 0 | 0 | 19 | 0 | 0 | 100.00%
pred. nr | 0 | 0 | 0 | 0 | 13 | 1 | 92.86%
pred. endf | 0 | 0 | 0 | 0 | 0 | 13 | 100.00%
class recall | 92.86% | 100.00% | 100.00% | 100.00% | 100.00% | 86.67%
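as a quick arithmetic check (not part of the original article), the per-class precision and recall and the micro-averaged accuracy in appendix a can be recomputed directly from the matrix:

```python
# recompute appendix a's metrics; rows are predictions, columns are true classes
import numpy as np

m = np.array([
    [13,  0,  0,  0,  0,  0],  # pred. nfcam
    [ 0, 18,  0,  0,  0,  0],  # pred. ir
    [ 1,  0, 20,  0,  0,  1],  # pred. nrss
    [ 0,  0,  0, 19,  0,  0],  # pred. mgt
    [ 0,  0,  0,  0, 13,  1],  # pred. nr
    [ 0,  0,  0,  0,  0, 13],  # pred. endf
])

precision = m.diagonal() / m.sum(axis=1)  # per predicted class (row sums)
recall = m.diagonal() / m.sum(axis=0)     # per true class (column sums)
micro = m.diagonal().sum() / m.sum()      # 96 correct out of 99 documents

print(precision.round(4))  # [1. 1. 0.9091 1. 0.9286 1.]
print(recall.round(4))     # [0.9286 1. 1. 1. 1. 0.8667]
print(round(micro, 4))     # 0.9697
```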
appendix b: the confidence value of each field
(figures showing the distribution of confidence values for the endf, ir, mgt, nfcam, nr, and nrss predictions.)

endnotes
1 budi prasetyo and anggiana rohandi yusuf, "pengelolaan pengetahuan eksplisit berbasis teknologi informasi di batan," in prosiding seminar nasional sdm teknologi nuklir (yogyakarta: sekolah tinggi teknologi nuklir, 2018), 126–32, https://inis.iaea.org/collection/nclcollectionstore/_public/50/062/50062856.pdf?r=1.
2 joanna yi-hang pong et al., "a comparative study of two automatic document classification methods in a library setting," journal of information science 34, no. 2 (april 2008): 213–30, https://doi.org/10.1177/0165551507082592.
3 prasetyo and yusuf, "pengelolaan pengetahuan eksplisit."
4 jae-ho kim and key-sun choi, "patent document categorization based on semantic structural information," information processing & management 43, no. 5 (september 2007): 1200–15, https://doi.org/10.1016/j.ipm.2007.02.002; pong et al., "a comparative study"; khusbu thakur and vinit kumar, "application of text mining techniques on scholarly research articles: methods and tools," new review of academic librarianship (may 12, 2021): 1–25, https://doi.org/10.1080/13614533.2021.1918190.
5 pong et al., "a comparative study."
6 kiri l. wagstaff and geoffrey z. liu, "automated classification to improve the efficiency of weeding library collections," the journal of academic librarianship 44, no. 2 (march 2018): 238–47, https://doi.org/10.1016/j.acalib.2018.02.001.
7 manika lamba and margam madhusudhan, "mapping of topics in desidoc journal of library and information technology, india: a study," scientometrics 120, no. 2 (august 2019): 477–505, https://doi.org/10.1007/s11192-019-03137-5.
8 fábio figueiredo et al., "word co-occurrence features for text classification," information systems 36, no. 5 (july 2011): 843–58, https://doi.org/10.1016/j.is.2011.02.002; yen-hsien lee et al., "use of a domain-specific ontology to support automated document categorization at the concept level: method development and evaluation," expert systems with applications 174 (july 2021): 114681, https://doi.org/10.1016/j.eswa.2021.114681; yousif a. alhaj et al., "a study of the effects of stemming strategies on arabic document classification," ieee access 7 (2019): 32664–71, https://doi.org/10.1109/access.2019.2903331.
9 david antons et al., "the application of text mining methods in innovation research: current state, evolution patterns, and development priorities," r&d management 50, no. 3 (june 2020): 329–51, https://doi.org/10.1111/radm.12408; muhammad arshad et al., "next generation data analytics: text mining in library practice and research," library philosophy and practice (2020): 1–12.
10 mowafy mona, rezk amira, and hazem m. el-bakry, "an efficient classification model for unstructured text document," american journal of computer science and information technology 06, no. 01 (2018), https://doi.org/10.21767/2349-3917.100016.
11 kim and choi, "patent document categorization."
12 alhaj et al., "a study of the effects of stemming strategies."
13 mona, amira, and el-bakry, "an efficient classification model."
14 thakur and kumar, "application of text mining techniques."
15 kim and choi, "patent document categorization."
16 wagstaff and liu, "automated classification."
17 najat ali, daniel neagu, and paul trundle, "evaluation of k-nearest neighbour classifier performance for heterogeneous data sets," sn applied sciences 1, no. 12 (december 2019): 1559, https://doi.org/10.1007/s42452-019-1356-9.
18 roiss alhutaish and nazlia omar, "arabic text classification using k-nearest neighbour algorithm," the international arab journal of information technology 12, no. 2 (2015): 190–95.
19 mona, amira, and el-bakry, "an efficient classification model."
20 guozhong feng et al., "a probabilistic model derived term weighting scheme for text classification," pattern recognition letters 110 (july 2018): 23–29, https://doi.org/10.1016/j.patrec.2018.03.003.
21 snezhana sulova et al., "using text mining to classify research papers," in 17th international multidisciplinary scientific geoconference sgem 2017, vol. 17 (sofia: surveying geology & mining ecology management (sgem), 2017), 647–54, https://doi.org/10.5593/sgem2017/21/s07.083.
22 lee et al., "use of a domain-specific ontology."
23 man lan et al., "supervised and traditional term weighting methods for automatic text categorization," ieee transactions on pattern analysis and machine intelligence 31, no. 4 (april 2009): 721–35, https://doi.org/10.1109/tpami.2008.110.
24 devid haryalesmana, "masdevid/id-stopwords," 2019, https://github.com/masdevid/id-stopwords.
25 alhaj et al., "a study of the effects of stemming strategies."
26 pong et al., "a comparative study."
27 ananta pandu wicaksana, "nolimitid/nolimit-kamus," 2015, https://github.com/nolimitid/nolimit-kamus.
28 antons et al., "the application of text mining methods."
29 kanish shah et al., "a comparative analysis of logistic regression, random forest and knn models for the text classification," augmented human research 5, no. 1 (december 2020): 12, https://doi.org/10.1007/s41133-020-00032-0.
30 judit tamas and zsolt toth, "classification-based symbolic indoor positioning over the miskolc iis data-set," journal of location based services 12, no. 1 (january 2, 2018): 2–18, https://doi.org/10.1080/17489725.2018.1455992.
31 hanan aljuaid et al., "important citation identification using sentiment analysis of in-text citations," telematics and informatics 56 (january 2021): 101492, https://doi.org/10.1016/j.tele.2020.101492.
32 qiang wang, rongrong li, and gang he, "research status of nuclear power: a review," renewable and sustainable energy reviews 90 (july 2018): 90–96, https://doi.org/10.1016/j.rser.2018.03.044.
33 ronald barnett, "knowing and becoming in the higher education curriculum," studies in higher education 34, no. 4 (june 2009): 429–40, https://doi.org/10.1080/03075070902771978.
factors affecting university library website design
yong-mi kim

existing studies have extensively explored factors that affect users' intentions to use university library website resources (ulwr); yet little attention has been given to factors affecting university library website design. this paper investigates factors that affect university library website design and assesses the success of the university library website from both designers' and users' perspectives. the findings show that when planning a website, university web designers consider university guidelines, review other websites, and consult with experts and other divisions within the library; however, resources and training for the design process are lacking. while website designers assess their websites as highly successful, user evaluations are somewhat lower. accordingly, use is low, and users rely heavily on commercial websites. suggestions for enhancing the usage of ulwr are provided.

from a utilitarian perspective, a website evaluation is based on users' assessments of the website's instrumental benefits.1 if a website helps users complete their tasks, they are likely to use the website. following this line of reasoning, dominant research has reported that users are most likely to use university library website resources (ulwr) when they can help with user tasks.2 although we know now that the utilitarian perspective should be applied to web design, not clear is the extent to which web designers consider users' needs and, likewise, the extent to which users consider ulwr to be successful in terms of meeting their needs. also not clear are what factors other than user needs influence university library website design. this is not a trivial issue because university libraries have invested a massive amount of resources into providing web services and need to justify their investments to stakeholders (such as the university) by demonstrating their ability to meet users' needs.3 also important is the identification of these factors because web design and website performance are closely correlated.4 as a consequence, investigating factors that influence successful university library website design and providing managerial guidance is a timely pursuit. thus, the objectives of this paper are twofold: 1. what factors influence university library website design? 2. to what extent do website designers and users consider the university library website to be successful? to explore these research questions, this study identifies factors influencing university library website design that have been reported in existing literature. these factors include usability testing and institutional forces.5 because website design studies are sparse, this study examines the success of technology utilization studies to further identify factors that are pertinent to website design in order to provide a comprehensive view of web design success factors. a review of literature related to university library website design is offered in the next section, followed by the research methods section, which discusses the data collection strategies and the measurements used in the current study. the findings of the study are then reported and discussed after the research methods section, and the paper concludes with an overview of the implications the findings have for academia and managers.

yong-mi kim (yongmi@ou.edu) is assistant professor, school of library and information studies, university of oklahoma, tulsa, oklahoma.
■■ literature review
this section offers an overview of the existing website design literature and relevant success factors. these factors include institutional forces, supervisors' technical knowledge and support, input from secondary sources, and input from users. because the aforementioned elements are identified as independent variables, this study also adopts them as such. following existing studies, website success factors are identified from the utilitarian perspective.6 the dependent variables are (1) the extent to which website designers meet users' needs, (2) the extent to which users perceive ulwr to be useful, and (3) their actual usage. in this manner, the evaluation of success is measured from different perspectives. this discussion of the independent and the dependent variables appears in the conceptual model, figure 1.

figure 1. conceptual model for website design success

institutional forces
institutional forces refer to organizations following other organizations' practices to secure efficiency and legitimacy. existing studies have identified three institutional forces: coercive, mimetic, and normative.7 coercive force takes place when an organization pressures others to adopt a certain practice. it is higher when an organization is a subset of another organization. in this research context, the university could be an agent of coercive force. mimetic force refers to organizations following other organizations' practices, and it is especially common for organizations within the same industry group.8 because organizations within the same industry face similar problems or issues, mimetic decisions can reduce uncertainty and secure legitimacy.9 in this context, website designers may analyze and emulate other universities' websites to claim that their websites are congruent with successful websites, thereby justifying their managerial practices. normative force is associated with professionalism.10 it occurs when the norms (e.g., equity, democracy, etc.) of the professional community are integrated into organizational decision-making. in a library setting, website designers may follow a set of value systems or go to conferences to discover ways to better deliver services. there is evidence that website designers follow other organizations.11 this phenomenon is known as isomorphism. the appearance and the structure of websites show isomorphic patterns when an organization follows examples of other organizations' websites or conforms to institutional pressures.12 another study reports coercive forces in the design of university library websites; the parent institution exercises power over library website design by providing guidelines, and thus the design is not independent.13

supervisors' technical knowledge and support
supervisors' knowledge of and support for technology has long been recognized in the literature as one of the most important technology success factors.14 if supervisors are knowledgeable about technology, they are likely to support and provide resources for training.15 supervisors' technical knowledge also serves as a signal of the importance of the utilization of technology within the organization; consequently, employees actively look for ways to utilize technology and vigorously adopt it.16 although it is a critical factor for website success, there is little evidence that website designers receive strong support from their supervisors. research shows that supervisors' lack of knowledge about websites inhibits user-centered website design.17 a respondent from chen et al.'s study reports, "it's really a pain trying to connect with our administration on the topic of web design and usability, because even definitions are completely out the window" and "the dean and the associate directors know little about the need for usability and view it as a last minute check-off, so they can say that the web site is tested and usable."18 lack of supervisor support inhibits website usability.19

input from secondary sources
website designers typically aggregate information from secondary sources rather than from users. identified secondary sources are consultations with experts, other divisions within the library, webmasters, web committees, and focus groups.20 the most widely used method is consultation with experts.21 experts uncover technical flaws and any obvious usability problems with a design,22 facilitate focus groups,23 and create new information architecture.24 because they are experts, however, their ways of thinking may not be the same as users'.25 research shows that 43 percent of the problems found by expert evaluators were actually false alarms and that 21 percent of users' problems were missed by those evaluators. if this analysis is true, expert evaluators tend to miss and incorrectly identify more problems than they correctly identify;26 consequently, expert testing should not substitute for user testing.27 another problem with secondary sources is that web committees "are ignorant about integrating design with usability and focus on their own agenda."28 nonetheless, because of the lack of available resources to conduct more rigorous usability tests and the difficulty of collecting information directly from users, secondary sources such as expert evaluations are commonly used.29

input from users
user input provides a great advantage for directly finding out users' needs and integrating a user-centered design during the development stage.30 often, information from secondary sources makes assumptions about users' needs.31 to discover users' genuine needs, designers can conduct a regular user survey and/or seek out users' input.32 by surveying users' needs, one can overcome criticism such as, "most websites are created with assumptions of more expert knowledge than the users may actually possess," and can address users' needs more effectively.33 discovering users' needs goes beyond usability testing because information obtained directly from users will reveal what users want and what should be done to meet their needs, thereby enhancing ulwr usage. however, research shows that this aspect is not actively integrated into web design due to the lack of support from supervisors.34

website success
success can be measured according to the website's purpose: to what extent does the website meet users' needs? in the university library website context, following a utilitarian perspective, researchers have measured success by the degree to which ulwr are integrated into users' tasks and by users' frequent visits to the website.35 these two measurements, when combined with the designers' perceptions of success, allow one to measure both the users' and the designers' perspectives of website success. by measuring from these two sides, any discrepancy between the two success outcomes will prompt designers to adjust their viewpoints to align their success measures with users'.

■■ research methods
this section discusses the sampling strategies and the measurements for the independent and the dependent variables. because one of the contributions of this study is to compare users' and designers' perceptions of website success, the samples are drawn from two groups: one from university library website designers and the other from university library users. designer data were collected directly from university library website designers; thus, libraries without website designers are excluded. the designer sample was identified from the publicly available yahoo academic library list (http://dir.yahoo.com/reference/libraries). the list contains 448 academic libraries, including those outside the united states. the research assistant made phone calls to the libraries residing in the united states and verified the existence of website designers within the library, which yielded 86 academic libraries. if a library had a website designer, the research assistant contacted the person and invited him or her to participate in the study. because of difficulties contacting website designers, the research assistant was able to collect 16 responses between may 2009 and february 2010. once the graduate assistant identified the unreachable designers, the researcher e-mailed those designers between january and april of 2010 and added 30 more responses to the dataset, resulting in a total of 46 responses (a 54 percent response rate). for the user side, a survey questionnaire was sent to faculty, doctoral, master's, and undergraduate students between march and may 2009. a total of 315 responses were collected (139 males and 176 females; 148 undergraduates, 101 master's, and 66 doctoral/faculty; business 152, human relations 51, psychology 43, engineering 41, education 20, other 8). because a detailed discussion of the user side of this sample appears elsewhere,36 it is not repeated here. because sparse research has been done in this area, the questionnaire and its measurements were created based on literature relating to the successful deployment of technology, but they were modified to fit the website design context. because of this modification, the finalized instrument was pretested and pilot tested before use in this study.37 the institutional forces are measured in three categories: coercive isomorphism (i.e., following the university guidelines regarding website creation), mimetic isomorphism (i.e., investigating other university websites and investigating commercial websites), and normative isomorphism (i.e., attending conferences). following existing studies, supervisors' knowledge and support are assessed by the web designer in two areas: the extent to which a supervisor is knowledgeable about technology and aware of the importance of technology. the supervisor's support for the website is measured by asking web designers about the extent to which their supervisors allocated resources and offered training. input from secondary sources is measured by asking the extent to which website designers consult sources such as experts, other divisions, webmasters, and web committees. input from users is measured by the extent to which web designers collect information from website users. finally, website successes are measured in two categories: assessments made by the web designers and by the website users themselves. the finalized measurements and their sources appear in table 1.

table 1. instrument
construct | operationalization | source
institutional forces | following university guidelines regarding website creation; investigating other university websites; investigating commercial websites; attending conferences | 11, 12, 15
supervisor's technical knowledge and support | supervisor's knowledge about technology; supervisor's evaluation of the importance of technology; supervisor's evaluation of the importance of website utilization; availability of website tools; availability of budgeting; availability of technical training; availability of website creation training | 17, 22
input from secondary sources | consulting with experts; consulting with other divisions within the library; consulting with webmasters; consulting with website committee; consulting with focus group | 10, 25–26
input from users | conducting user survey; utilizing users' inputs | 10
website success measures from web designer | we meet users' needs; we provide better services via the website; we satisfy users' needs; we provide quality services; our library is overall successful | 1, 2
website success measures from website users | it lets me finish my project more quickly; it helps improve my productivity; it helps enhance the quality of my project; the extent to which users integrate website library resources into users' tasks*; frequency of users' visits to university library website** | 3, 41, 43
all items are measured with a likert scale: 1: not really; 2: somewhat; 3: greatly. * measured by percentage. ** measured by frequency.
■■ report of findings
this section reports the empirical findings for each category discussed in the previous section. figure 2 shows institutional forces that influence university library website design. the first category is coercive force, the second is mimetic force, and the third is normative force. it is clear that the majority of university library web designers (75 percent) comply with the guidelines given by the university, a measurement of coercive force; designers also investigate other universities' websites (75 percent) and commercial websites (59 percent), measurements of mimetic force; however, designers do not appear to actively attend conferences that influence website design, the measurement of normative force.

figure 2. institutional forces

the second group of factors that affects website design is supervisors' knowledge about technology and support for the utilization of technology (see figure 3). web designers have a somewhat mixed perception of their supervisors' technical knowledge. more specifically, 37 percent of respondents reported that their supervisors do not have good knowledge about technology; 23 percent responded that their supervisors were somewhat knowledgeable; and 40 percent responded that their supervisors have good knowledge about technology. web designers reported that their supervisors' perceptions of the importance of technology and websites are higher than their technical knowledge: approximately 60 percent of designers responded that their supervisors emphasize the importance of technology and websites, and the remaining respondents answered that their supervisors are only somewhat aware of the importance or do not value it at all.

figure 3. supervisor's knowledge about technology

figure 4 shows the extent to which supervisors support web designers. fifty-five percent of respondents reported that they have good web creation tools; 44 percent responded that they have enough budget for website creation, while almost as many (39 percent) reported that they do not have adequate budgets for website creation. the last two questions, concerning training, show somewhat different results from the first two: the majority of web designers do not get technology-related or website-creation-related training, and less than one-third of respondents reported that they receive enough of either.

figure 4. supervisor's support

the findings on the use of secondary sources, shown in figure 5, indicate that web designers actively leverage such information sources for web design. by category, over 80 percent of respondents reported that they consult with web experts; over 70 percent responded that they integrate input from other divisions; and around 70 percent consult with webmasters. the utilization of secondary information sources for website creation is very high except for focus groups. the most widely used technique in this category is expert consultation, followed by consultation with other divisions within the library. web designers also consider input from webmasters and web committees.

figure 5. input from secondary sources

figure 6 shows the extent to which website designers apply input directly derived from web users. around half of respondents reported that they obtain information from user surveys, and around 70 percent responded that they consider users' input collected via comments, feedback, and complaints.

figure 6. input from users

■■ website success
website success is evaluated from two sides: designer opinion and user opinion. overall, designers evaluated their websites to be highly successful. they believe that they meet users' needs, provide better services via the web, satisfy users' needs, and provide quality services. thus, their evaluation of their websites is extremely positive, as reported in figure 7.

figure 7. website success evaluated by designers

figure 8 shows users' perceptions of the usefulness of ulwr. users generally agree that ulwr are useful for their academic projects. more specifically, 55 percent responded that they are able to finish their tasks quickly because of the resources; 65 percent reported that they could increase their productivity; and 67 percent responded that they enhanced project quality thanks to the resources. on the other hand, a significant portion of respondents (more than 30 percent) do not think, or have no opinion on whether, ulwr are useful for their academic tasks.

figure 8. users' perceptions of website usefulness

figure 9 investigates how often users visit university library websites. approximately 30 percent reported that they never or rarely visited the university library website. thirty-two percent visited the website a couple of times a month, and approximately 40 percent visited the library website a couple of times a week or daily.

figure 9. frequency of visits to university library websites

figure 10 examines the users' utilization of ulwr versus commercial website resources. the responses from 315 users show that they utilize commercial websites more than ulwr. specifically, 46 percent of respondents reported that ulwr make up less than 20 percent of the resources they use, and only 8 percent reported using ulwr more than 80 percent of the time. in contrast, 14 percent reported using commercial website resources less than 20 percent of the time, and 22 percent more than 80 percent of the time. the majority of users rely on commercial web resources for their academic tasks.

figure 10. university library vs. commercial website

■■ discussion
based on the study's findings, this discussion covers the most influential factors first, followed by the least influential elements in designing a university library website. first, the most influential factors for website designers are expert opinions and consultations with other divisions within the library. these may be the most important factors because relying on experts allows designers to discover users' needs while saving costs. web designers also consider input from webmasters and web committees. coercive and mimetic forces are also highly significant factors affecting web designers. the university library is a subset of the university, and thus designers may need to align themselves with university policy. also, designers can claim legitimacy by imitating other successful university websites, thereby securing necessary resources and support for website creation; however, web designers are much less likely to imitate commercial websites. this finding is consistent with existing reports that organizations imitate the managerial practices of other successful organizations within the same industry category.38 the least influential website creation factors are supervisors' knowledge, which in turn leads to low budget allocations, and web designers' technical training. this finding is consistent with the technology deployment literature showing that supervisors' technical knowledge is highly correlated with budget allocations.39 the lack of training for web designers does not appear to have improved since the last study, conducted in 2001;40 library web designers are usually self-taught rather than formally trained.41 one promising finding, though, is that despite the relatively low technical knowledge held by supervisors, the respondents tend to rate highly their supervisors' perceptions of the importance of technology. compared with other institutional forces, normative force is relatively low. this kind of institutional force is higher at the early stage of technology adoption; the majority of universities have already launched their websites and established rules and policies, so libraries are already past this stage. also, input from user surveys is relatively low. this may be because surveys are very costly and designers have other sources to turn to, such as other universities' successful websites. website success evaluations by web designers and users show discrepancies. overall, web designers evaluate their websites to be highly successful, while user ratings offer a different picture. this incongruity is a red flag in terms of ulwr usage. the majority of users report that they turn to commercial websites more than ulwr, and one-third never or rarely visit the university website. the disparity between web designers' and users' success ratings may be attributed to the sources of information that website designers rely on. more specifically, existing studies report that input from experts and website committees is incongruent with what users really want, while feedback from focus groups can assist in understanding users' needs.42
overall, web designers evaluate their websites to be highly successful, while user ratings offer a different picture. this incongruity is a red flag in terms of ulwr usage. the majority of users report that they turn to commercial websites more than ulwr, and one-third never or rarely visit the university website. the disparity of the success between web designers and users may be attributed to the sources of information that website designers rely on. more specifically, existing studies report that input from experts and website committees is incongruent with what users really want, while feedback from focus groups can assist in understanding users’ needs.42 ■■ conclusions this study investigates the factors that website designers consider when designing university library websites. figure 9. frequency of visits to university library websites figure 10. university library vs. commercial website 106 information technology and libraries | september 2011 seriously in information systems research,” mis quarterly 29, no. 4 (2005): 591–605. 9. scott, institutions and organizations; dimaggio and powell, “the iron cage revisited”; h. haverman, “follow the leader: mimetic isomorphism and entry into new markets,” administrative science quarterly 38, no. 4 (1993): 593–627. 10. scott, institutions and organizations. 11. k. lee, dinesh mirchandani, and xinde zhang, “an investigation on institutionalization of web sites of firms,” the data base for advances in information systems 41, no. 2 (2010): 70–88. 12. lee, mirchandani, and zhang, “an investigation on institutionalization of web sites of firms.” 13. r. raward, “academic library website design principles: development of a checklist,” australian academic & research libraries 32, no. 2 (2001): 123–36. 14. y-m. kim, an investigation of the effects of it investment on firm performance: the role of complementarity (saarbrucken, germany: vdm verlag, 2008); p. weill, “the relationship between investment in information technology and firm performance: a study of the valve manufacturing sector,” information systems research 3, no. 4 (1992): 307–33. 15. a. lederer and v. sethi, “the implementation of strategic information systems planning methodologies,” mis quarterly (1988): 445–461; j. thong, c. yap, and k. raman, “top management support, external expertise and information systems implementation in small business,” information systems research 7, no. 2 (1996): 248–67; m. earl, “experiences in strategic information systems planning,” mis quarterly (1993): 1–24; a. boynton and r. zmud, “information technology planning in the 1990’s: directions for practice and research,” mis quarterly 11, no. 1 (1987): 59–72. 16. s. jarvenpaa and b. ives, “information technology and corporate strategy: a view from the top,” information systems research 1, no. 4 (1990): 351–76. 17. chen, germain, and yang, “an exploration into the practices of library web usability in arl academic libraries.” 18. ibid. 19. j. veldof and s. nackerud, “do you have the right stuff? seven areas of expertise for successful web site design in libraries,” internet reference services quarterly 6, no. 1 (2001): 20. 20. chen, germain, yang, “an exploration into the practices of library web usability in arl academic libraries”; r. raward, “academic library website design principles: development of a checklist,” australian academic & research libraries 32, no. 2 (2001): 123–36; j. 
20. chen, germain, and yang, "an exploration into the practices of library web usability in arl academic libraries"; r. raward, "academic library website design principles: development of a checklist," australian academic & research libraries 32, no. 2 (2001): 123–36; j. bobay et al., "working with consultants to test usability: the indiana university bloomington experience," in usability assessment of library-related web sites: methods and case studies, ed. n. campbell (chicago: ala, 2002): 60–76; h. king and c. jannik, "redesigning for usability: information architecture and usability testing for georgia tech library's website," oclc systems & services 21, no. 3 (2005): 235–43.
21. j. h. spyridakis, j. b. barrick, and e. cuddihy, "internet-based research: providing a foundation for web-design guidelines," ieee transactions on professional communication 48, no. 3 (2005): 242–60; t. a. powell, web design: the complete reference (berkeley, calif.: osborne/mcgraw-hill, 2002).
22. powell, web design.
23. r. tolliver, d. carter, and s. chapman, "website redesign and testing with a usability consultant: lessons learned," oclc systems & services 21, no. 3 (2005): 156–66; l. vandecreek, "usability analysis of northern illinois university libraries' website: a case study," oclc systems & services 21, no. 3 (2005): 181–92.
24. spyridakis, barrick, and cuddihy, "internet-based research."
25. b. bailey, "heuristic evaluations vs. usability testing," ui design update newsletter (2001), http://www.humanfactors.com/downloads/jan01.asp (accessed june 10, 2011).
26. powell, web design.
27. chen, germain, and yang, "an exploration into the practices of library web usability in arl academic libraries."
28. k. a. saeed, y. hwang, and v. grover, "investigating the impact of web site value and advertising on firm performance in electronic commerce," international journal of electronic commerce 7, no. 2 (2003): 119–41.
29. l. manzari and j. trinidad-christensen, "user-centered design of a web site for library and information science students: heuristic evaluation and usability testing," information technology & libraries 25, no. 3 (2006): 163–69.
30. e. abels, m. white, and k. hahn, "identifying user-based criteria for web pages," internet research 7, no. 4 (1997): 252–56.
31. l. vandecreek, "usability analysis of northern illinois university libraries' website: a case study," oclc systems & services 21, no. 3 (2005): 181–92; m. ascher, h. lougee-heimer, and d.
cunningham, "approaching usability: a study of an academic health sciences library web site," medical reference services quarterly 26, no. 2 (2007): 37–53; b. battleson, a. booth, and j. weintrop, "usability testing of an academic library web site: a case study," journal of academic librarianship 27, no. 3 (2001): 188–98; g. h. crowley et al., "user perceptions of the library's web pages: a focus group study at texas a&m university," journal of academic librarianship 28, no. 4 (2002): 205–10; b. thomsett-scott and f. may, "how may we help you? online education faculty tell us what they need from libraries and librarians," journal of library administration 49, no. 1/2 (2009): 111–35; d. turnbow et al., "usability testing for web redesign: a ucla case study," oclc systems & services 21, no. 3 (2005): 226–34; j. ward, "web site redesign: the university of washington libraries' experience," oclc systems & services 22, no. 3 (2006): 207–16.
32. chen, germain, and yang, "an exploration into the practices of library web usability in arl academic libraries."
33. ibid.
34. kim, "the adoption of university library web site resources."
35. ibid.
36. ibid.
37. y-m. kim, "validation of psychometric research instruments: the case of information science," journal of the american society for information science & technology 60, no. 6 (2009): 1178–91.
38. h. haverman, "follow the leader: mimetic isomorphism and entry into new markets," administrative science quarterly 38, no. 4 (1993): 593–627.
39. t. teo and j. ang, "an examination of major is planning problems," international journal of information management 21 (2001): 457–70.
40. r. raward, "academic library website design principles: development of a checklist," australian academic & research libraries 32, no. 2 (2001): 123–36.
41. ibid.
42. chen, germain, and yang, "an exploration into the practices of library web usability in arl academic libraries"; powell, web design; b. bailey, "heuristic evaluations vs. usability testing," ui design update newsletter (2001), http://www.humanfactors.com/downloads/jan01.asp (accessed june 15, 2011).
43. t. hennig-thurau et al., "electronic word-of-mouth via consumer-opinion platforms: what motivates consumers to articulate themselves on the internet?" journal of interactive marketing 18, no. 1 (2004): 38–52.

from our readers
the new user environment: the end of technical services?
bradford lee eden (eden@library.ucsb.edu) is associate university librarian for technical services & scholarly communication, university of california, santa barbara.

editor's note: "from our readers" is an occasional feature highlighting ital readers' letters and commentaries on timely issues.

technical services: an obsolete term used to describe the largest component of most library staffs in the twentieth century. that component of the staff was entirely devoted to arcane and mysterious processes involved in selecting, acquiring, cataloging, processing, and otherwise making available to library users physical material containing information content pieces (incops). the processes were complicated, expensive, and time-consuming, and generally served to severely limit direct service to users both by producing records that were difficult to understand and interpret, even by other library staff, and by consuming from 75–80 percent of the library's financial and personnel resources. in the twenty-first century, the advent of new forms of publication and new techniques for providing universal records and universal access to information content made the organizational structure obsolete. that change in organizational structure, more than any other single factor, is generally credited as being responsible for the dramatic improvement in the quality of library service that has occurred in the first decade of the twenty-first century.

there are many who would say that i was the one who wrote this quotation. i didn't, and it is, in fact, more than twenty-five years old!1 while i was beginning to research and prepare for this article, i began as most users today start their search for information: i started with google. granted, i rarely go beyond the first page of results (as most user surveys indicate), but the paucity of links made me click to the next screen. there, at number 16, was a scanned article. jackpot! i thought as i started perusing the contents of this resource online, thinking to myself how the future had changed so dramatically since 1984, with the emergence of the internet and the laptop, all of the new information formats, and the digitization of information. ahh, the power of full text! after reading through the table of contents, introduction, and the first chapter, i noticed that some of the pages were missing. mmmm, obviously some very shoddy scanning on the part of google. but no, i finally realized that only part of this special issue was available on google. obviously, i missed the statement at the bottom of the front scan of the book: "this is a preview. the total pages displayed will be limited. learn more." and thus the issues regarding copyright reared their ugly head. when discussing the new user environment, there are many demands facing libraries today.
in a report by marcia j. bates, which cites the principle of least effort (first attributed to philologist george zipf) and is quoted in the calhoun report to the library of congress, she states:

people do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable—so long as it requires little effort to find—rather than using information they know to be of high quality and reliable, though harder to find . . . despite heroic efforts on the part of librarians, students seldom have sufficiently sustained exposure to and practice with library skills to reach the point where they feel real ease with and mastery of library information systems.2

according to the final report of the bibliographic services task force of the university of california libraries, users expect the following:
■■ one system or search to cover a wide information universe (e.g., google or amazon)
■■ enriched metadata (e.g., onix, tables of contents, and cover art)
■■ full-text availability
■■ to move easily and seamlessly from a citation about an item to the item itself—discovery alone is not enough
■■ systems to provide a lot of intelligent assistance
■❏ correction of obvious spelling errors
■❏ results sorting in order of relevance to their queries
■❏ help in navigating large retrievals through logical subsetting or topical maps or hierarchies
■❏ help in selecting the best source through relevance ranking or added commentary from peers and experts or "others who used this also used that" tools
■❏ customization and personalization services
■■ authenticated single sign-on
■■ security and privacy
■■ communication and collaboration
■■ multiple formats available: e-books, mpeg, jpeg, rss and other push technologies, along with traditional, tangible formats
■■ direct links to e-mail, instant messaging, and sharing
■■ access to online virtual communities
■■ access to what the library has to offer without actually having to visit the library3

what is there in this new user environment for those who work in technical services? as indicated in the opening quote, would a dramatic improvement in library services occur if technical services were removed from the organizational structure? even in 1983, the huge financial investment that libraries made in the organization and description of information, inventory, workflows, and personnel was recognized; today, that investment comes under intense scrutiny as libraries realize that we no longer have a monopoly on information access, and to survive we need to move forward more aggressively into the digital environment than ever before. as marcum stated in her now-famous article,
■■ if the commonly available books and journals are accessible online, should we consider the search engines the primary means of access to them?
■■ massive digitization radically changes the nature of local libraries. does it make sense to devote local efforts to the cataloging of unique materials only rather than the regular books and journals?
■■ we have introduced our cataloging rules and the marc format to libraries all over the world. how do we make massive changes without creating chaos?
■■ and finally, a more specific question: should we proceed with aacr3 in light of a much-changed environment?4

there are larger internal issues to consider here as well. the budget situation in libraries requires the application of business models to workflows that have not normally been questioned or challenged. karen calhoun discusses this topic in a number of her contributions to the literature:

when catalog librarians identify what they contribute to their communities with their methods (the cataloging rules, etc.) and with the product they provide (the catalog), they face the danger of "marketing myopia." marketing myopia is a term used in the business literature to describe a nearsighted view that focuses on the products and services that a firm provides, rather than the needs those products and services are intended to address.5

for understanding the implementation issues associated with the leadership strategy, it is important to be clear about what is meant by the "excess capacity" of catalogs. most catalogers would deny there is excess capacity in today's cataloging departments, and they are correct. library materials continue to flood into acquisitions and cataloging departments, and staff can barely keep up. yet the key problem of today's online catalog is the effect of declining demand. in healthy businesses, the demand for a product and the capacity to produce it are in balance. research libraries invest huge sums in the infrastructure that produces their local catalogs, but search engines are students and scholars' favorite place to begin a search. more users bypass catalogs for search engines, but research libraries' investment in catalogs—and in the collections they describe—does not reflect the shift in user demand.6

i have discussed this exact problem in recent articles and technical reports as well.7 there have to be better, more efficient ways for libraries to organize and describe information, not based on the status quo of redundant "localizing" of bibliographic records. a good analogy would be the current price of gas and the looming transportation crisis. for many years, americans have had the luxury of being able to purchase just about any type of car, truck, suv, hummer, etc., that they wanted on the basis of their own preferences, personalities, and incomes, not on the size of the gas tank or on the mileage per gallon. why not buy a mercedes over a kia? but with gas prices now well above the average person's ability to consistently fill their gas tank without mortgaging their future, the market demands that people find alternative solutions in order to survive. this has meant moving away from the status quo of personal choice and selection toward a more economic and sustainable model of informed, fuel-efficient transportation, so much so that public transportation is now inundated with more users than it can handle, and consumers have all but abandoned the truck and suv markets. libraries have long worked in the mercedes arena, providing features such as authority control, subject classification, and redundant localizing of bibliographic records that were essential when libraries held the monopoly on information access but are no longer cost-efficient—nor even sane—strategies in the current information marketplace. users are not accessing the opac anymore; well-known studies indicate that more than 80 percent of information seekers begin their search on a web search engine.
libraries are investing huge amounts of staffing and priorities in fiddling with marc bibliographic records at a time when they are struggling to survive and adapt from a monopoly environment to being just one of many players in the new information marketplace. budgets are stagnant, staffing is at an all-time low, new information formats continue to appear and require attention, and users are no longer patient or comfortable working with our clunky opacs.8 why do libraries continue to support an infrastructure of buying and offering the same books, cds, dvds, journals, etc., at every library, when the new information environment offers libraries the opportunity to showcase and present their unique information resources and one-of-a-kind collections to the world? special collections materials held by every major research and public library in the world can now be digitized, and sparse library resources need to be adjusted to compete and offer these unique collections and their services to our users and the world.

the october 2007 issue of computers in libraries is devoted solely to articles related to the enhancement, usability, appropriateness, and demise of the library opac. interesting articles include "fac-back-opac: an open source solution interface to your library system," "dreaming of a better ils," "plug your users into library resources with opensearch plug-ins," "delivering what people need, when and where they need it," "the birth of a new generation of library interfaces," and "will the ils soon be as obsolete as the card catalog?" an especially interesting quote is given by cervone, then assistant university librarian for information technology at northwestern university:

what i'd like to see is for the catalog to go away. to a great degree, it is an anachronism. what we need from the ils is a solid, business-process back end that would facilitate the functions of the library that are truly unique such as circulation, acquiring materials, and "cataloging" at the item level for what amounts to inventory-control purposes. most of the other traditional ils functions could be rolled over into a centralized system, like oclc, that would be cooperatively shared. the catalog itself should be treated as just another database in the world of resources we have access to. a single interface to those resources that would combine our local print holdings, electronic text (both journal and ebook), as well as multimedia material is what we should be demanding from our vendors.9

one book that needs to be required reading for all librarians, especially catalogers, is weinberger's everything is miscellaneous.10 he describes the three orders of order (shelf organization, metadata, and digital); provides an extensive history of how western civilization has ordered information, specifically the links to nineteenth-century victorianism; and discusses the concepts of lumping and splitting. in the end, weinberger argues that the digital environment allows users to manipulate information into their own organization system, disregarding all previous organizational attempts by supposed experts using outdated and outmoded systems. in the digital disorder of information, an object (leaf) can now be placed on many shelves (branches), figuratively speaking, and this new shape of knowledge brings out four strategic principles:
1. filter on the way out, not on the way in.
2. put each leaf on as many branches as possible.
3. everything is metadata and everything can be a label.
4. give up control.
it is this last principle that libraries have challenges with. whether we agree with this principle or not, it has already happened. arguing about it, ignoring it, or just continuing to do business as usual isn't going to change the fact that information is user-controlled and user-initiated in the digital environment. so, where do we go from here?

the future of technical services (and its staff)

far be it from me to try to predict the future of libraries as viable, and more importantly marketable, information organizations in this new environment. one has only to examine the quotations from the first issues of technical services quarterly to see what happens to predictions and opinions. titles of some of the contributions (from 1983, mind you) are worthy of mention: "library automation in the year 2000," "musings on the future of the catalog," and "libraries on the line." there are developments, however, that require reexamination and strategic brainstorming regarding the future of library bibliographic organization and description.

the appearance of worldcat local will have a tremendous impact on the disappearance of proprietary vendor opacs. there will no longer be a need for an integrated library system (ils); with worldcat local, the majority of the world's marc bibliographic records are available in a library 2.0 format. the only things missing are some type of inventory and acquisitions module that can be formatted locally and a circulation module. if oclc could focus their programming efforts on these two services and integrate them into worldcat local, library administrators and systems staff would no longer have to deal with proprietary and clunky opacs (and their huge budgetary lines), but could use the power of web 2.0 (and hopefully 3.0) tools and services to better position themselves in the new information marketplace.

another major development is the google digitization project (and other associated ventures). while there are some concerns about quality and copyright,11 as well as issues related to the disappearance of print and the time involved to digitize all print,12 no one can deny the gradual and inevitable effect that mass digitization of print resources will have in the new information marketplace. just the fact that my research explorations for this article brought up digitized portions of the 1983 technical services quarterly articles is an example. more and more, published print information will be available in full text online. what effect will this have on the physical collection that all libraries maintain, not only in terms of circulation, but also in terms of use of space, preservation, and collection development? no one knows for sure, but if the search strategies and information discovery patterns of our users are any indication, then we need to be strategically preparing and developing directions and options.

automatic metadata generation has been a topic of discussion for a number of years, and jane greenberg's work at the university of north carolina–chapel hill is one of the leading examples of research in this area.13 while there are still viable concerns about metadata generation without any type of human intervention, semiautomatic and even nonlibrary-facilitated metadata generation has been successful in a number of venues.
as libraries grapple with decreased budgets, multiplying formats, fewer staff to do the work, and more retraining and professional redevelopment of existing staff, library administrators have to examine all options to maximize personnel as well as budgetary resources. incorporating new technologies and tools for generating metadata without human intervention into library workflows should be viewed as a viable option. user tagging would be included in this area. even intner, a long-time proponent of traditional technical services, has written that generating cataloging data automatically would be of great benefit to the profession, and that more tools and more programming ought to be focused toward this goal.14

so, with print workflows being replaced by digital and electronic workflows, how can administrators assist their technical services staff to remain viable in this new information environment? how can technical services staff not only help themselves but also their supervisors and administrators to incorporate their unique talents, expertise, education, and experience toward the type of future scenarios indicated above?

competencies and challenges for technical services staff

there are some good opinions available for assisting technical services staff with moving into the new environment. names have power, whether we like to admit it or not, and changing the name from "technical services" to something more understandable to our users, let alone our colleagues within the library, is one way to start. names such as "collections and data management services" or "reference data services" have been mentioned.15 an interesting quote sums up the dilemma:

it's pretty clear that technical services departments have long been the ugly ducklings in the library pond, trumped by a quintet of swans: reference departments (the ones with answers for a grateful public); it departments (the magicians who keep the computers humming); children's and youth departments (the warm and fuzzy nurturers); other specialty departments (the experts in good reads, music, art, law, business, medicine, government documents, av, rare books and manuscripts, you-name-it); and administrative groups (the big bosses). part of the trouble is that the rest of our colleagues don't really know what technical services librarians do. they only know that we do it behind closed doors and talk about it in language no one else understands. if it can't be seen, can't be understood, and can't be discussed, maybe it's all smoke and mirrors, lacking real substance. it's easy to ignore.16

ruschoff mentions competencies for technical services librarians in the new information environment: comfort working in both print and digital worlds; specialized skills such as foreign languages and subject-area expertise; facility with digital and web-based technologies (suggesting more computing and technology skills); expertise in digital asset management; and problem-solving and analytical skills.17 in a recent blog posting summarizing a presentation at the 2008 ala annual conference on this topic, comparisons between catalogers going extinct or retooling are provided. the following is a summary of that post:

converging trends
■■ more catalogers work at the support-staff level than as professional librarians.
■■ more cataloging records are selected by machines.
■■ more catalog records are being captured from publisher data or other sources.
■■ more updating of catalog records is done via batch processes.
■■ libraries continue to deemphasize processing of secondary research products in favor of unique primary materials.

what are our choices?
■■ behind door number one—the extinction model.
■■ behind door number two—the retooling model.

how it's done
■■ extinction
■❏ keep cranking about how nobody appreciates us.
■❏ assert over and over that we're already doing everything right—why should we change?
■❏ adopt a "chicken little" approach to envisioning the future.
■■ retooling
■❏ consider what catalogers already do.
■❏ look for support.
■❏ find a new job.

what catalogers do
■■ operate within the boundaries of detailed standards.
■■ describe items one-at-a-time.
■■ treat items as if they are intended to fit carefully within a specific application—the catalog.
■■ ignore the rest of the world of information.

what metadata librarians do
■■ think about descriptive data without preconceptions around descriptive level, granularity, or descriptive vocabularies.
■■ consider the entirety of the discovery and access issues around a set or collection of materials.
■■ consider users and uses beyond an individual service when making design decisions—not necessarily predetermined.
■■ leap tall buildings in a single bound.

what new metadata librarians do
■■ be aware of changing user needs.
■■ understand the evolving information environment.
■■ work collaboratively with technical staff.
■■ be familiar with all metadata formats and encoding metadata.
■■ seek out tall buildings—otherwise jumping skills will atrophy.

the cataloger skill set
■■ aacr2, lc, etc.

the metadata librarian skill set
■■ views data as collections, sets, streams.
■■ approaches the task as designing data to "play well with others."

characteristics of our new world
■■ no more ils.
■■ bibliographic utilities are unlikely to be the central node for all data.
■■ creation of metadata will become more decentralized.
■■ nobody knows how this will all shake out, but metadata librarians will be critical in forging solutions.18

while the above summary focuses on catalogers and their future, many of the directions also apply to any librarian or support staff member currently working in technical services. in a recent educause review article, brantley lists a number of mantras that all libraries need to repeat and keep in mind in this new information environment:
■■ libraries must be available everywhere.
■■ libraries must be designed to get better through use.
■■ libraries must be portable.
■■ libraries must know where they are.
■■ libraries must tell stories.
■■ libraries must help people learn.
■■ libraries must be tools of change.
■■ libraries must offer paths for exploration.
■■ libraries must help forge memory.
■■ libraries must speak for people.
■■ libraries must study the art of war.19

you will have to read the article to find out about that last point. the above mantras illustrate that each of these issues must also be aligned with the work done by technical services departments in support of the rest of the library's services. and there definitely isn't one right way to move forward; each library, with its unique blend of services and staff, has to define, initiate, and engender dialogue on change and strategic direction, and then actively make decisions with integrity and vigor toward both its users and its staff.
as calhoun indicates, there are a number of challenges to feasibility for next steps in this area, some technically oriented but many based on our own organizational structures and strictures:
■■ difficulty achieving consensus on standardized, simplified, more automated workflows.
■■ unwillingness or inability to dispense with highly customized acquisitions and cataloging operations.
■■ overcoming the "not invented here" mindset preventing ready acceptance of cataloging copy from other libraries or external sources.
■■ resistance to simplifying cataloging.
■■ inability to find and successfully collaborate with necessary partners (e.g., ils vendors).
■■ difficulty achieving basic levels of system interoperability.
■■ slow development and implementation of necessary standards.
■■ library-centric decision making; inability to base priorities on how users behave and what they want.
■■ limited availability of data to support management decisions.
■■ inadequate skill set among library staff; unwillingness or inability to retrain.
■■ resistance to change from faculty members, deans, or administrators.20

moving forward in the new information world

in a recent discussion on the autocat electronic discussion list regarding the client-business paradigm now being impressed on library staff, an especially interesting quote puts the entire debate into perspective:

the irony of this discussion is that our patrons/users/clients [et al.] expect to be treated as well as business customers. they pay tuition or taxes to most of our institutions and expect to have a return in value. and a very large percentage of them care about the differences between the government services vs. business arguments we present. what they know is that when they want something, they want it. more library powers-that-be now come from the world of business rather than libraries because of the pressure on the bottom line. business administrators are viewed, even by those in public administration, as being more fiscally able than librarians. i would recommend that we fuss less about titles and semantics and develop ways to show the value of libraries to the public.21

wheeler, in a recent educause review article, documents a number of "eras" that colleges and universities have gone through in recent history.22 first is the "era of publishing," followed by the "era of participation" with the appearance of the internet and its social networking tools. the next era, the "era of certitude," is one in which users will want quick, timely answers to questions, along with some thought about the need and context of the question. wheeler espouses five dimensions that tools of certitude must have: reach, response, results, resources, and rights. he explains these dimensions in regard to various tools and services that libraries can provide through human–human, human–machine, and machine–machine interaction.23 wheeler sees extensive rethinking and reengineering by libraries, campuses, and information technology to assist users to meet their information needs. are there ways that technical services staff can assist in these efforts?
although somewhat dated, calhoun's extensive article on what is needed from catalogers and librarians in the twenty-first century expounds a number of salient points.24 in table 1, she illustrates some of the many challenges facing traditional library cataloging, providing her opinion on what the challenges are, why they exist, and some solutions for survivability and adaptability in the new marketplace.25 one quote in particular deserves attention:

at the very least, adapting successfully to current demands will require new competencies for librarians, and i have made the case elsewhere that librarians must move beyond basic computer literacy to "it fluency"—that is, an understanding of the concepts of information technology, especially applying problem solving and critical thinking skills to using information technology. raising the bar of it fluency will be even more critical for metadata specialists, as they shift away from a focus on metadata production to approaches based on it tools and techniques on the one hand, and on consulting and teamwork on the other. as a result of the increasing need for it fluency among metadata specialists, they may become more closely allied with technical support groups in campus computing centers. the chief challenges for metadata specialists will be getting out of library back rooms, becoming familiar with the larger world of university knowledge communities, and developing primary contacts with the appropriate domain experts and it specialists.26

getting out of the back room and interacting with users seems to be one of the dominant themes of evolving technical services positions to fit the new information marketplace. putting web 2.0 tools and services into the library opac has also gained some momentum since the launch of the endeca-based opac at north carolina state university. as some people have stated, however, putting "lipstick on a pig" doesn't change the fundamental problems and poor usability of something that never worked well in the first place.27 in their recent article, jia mi and cathy weng tried to answer the following questions: why is the current opac ineffective? what can libraries and librarians do to deliver an opac that is as good as search engines to better serve our users?28 of course, the authors are biased toward the opac and wish to make it better, given that the last sentence in their abstract is, "revitalizing the opac is one of the pressing issues that has to be accomplished." users' search patterns have already moved away from the opac as a discovery tool; why should personnel and resource investment continue to be allocated toward something that users have turned away from? in their recommendations, mi and weng indicate that system limitations, failure to fully exploit the functionality already made available by ilss, and the unsuitability of marc standards for online bibliographic display are the primary factors in the ineffectiveness of library opacs. exactly. debate and discussion on autocat after the publication of their article again shows the line drawn between conservative opinions (added value, noncommercialization, and overall ideals of the library profession and professional cataloging workflows) and the newer push for open-source models, junking the opac, and learning and working with non-marc metadata standards and tools.
conclusion

from an administrative point of view, there are a number of viable options for making technical services as efficient as possible in its current emanation:
■■ conduct a process review of all current workflows, following each type of format from receipt at loading dock to access by user. revise and redesign workflows for efficiency.
■■ eliminate all backlogs, incorporating and standardizing various types of bibliographic organization (from brief records to full records, using established criteria of importance and access).
■■ as much as possible, contract with vendors to make all print materials shelf-ready, establishing and monitoring profiles for quality and accuracy. establish a rate of error that is amenable to technical services staff; once that error rate is met, review incoming print materials only once or twice a year.
■■ assure technical services staff that their skills, experience, and attention to detail are needed in the electronic environment, and provide training and professional development to assist them in scanning and digitizing unique collections, learning non-marc metadata standards, improving project management, and performing consultation training to interact with faculty and students who work with data sets, metadata, and research planning.
■■ support and actively work for revised job reclassification of library support staff positions. most libraries are forced to work with fewer staff, and it is essential that current personnel are valued for their institutional knowledge and skill sets (knowledge management philosophy).

library administrations need to emphasize to their staff that the organization has a vested interest in providing them with the tools and training they need to assist the organization in the new information marketplace. the status quo of technical services operations is no longer viable or cost-effective; all of us must look at ways to regain market share and restructure our organizations to collaborate and consult with users regarding their information and research needs. no longer is it enough to just provide access to information; we must also provide tools and assistance to the user in manipulating that information.

to end, i would like to quote from a few of the articles from that 1983 issue of technical services quarterly i have alluded to throughout this article:

like all prognostications, predictions about cataloging in a fully automated library may bear little resemblance to the ultimate reality. while the future cataloging scenario discussed here may seem reasonable now, it could prove embarrassing to read 10–20 years hence. still, i would be pleasantly surprised if, by the year 2000, ts operations are not fully integrated, ts staff has not been greatly reduced, there has not been a large-scale jump in ts productivity accompanied by a dramatic decline in ts costs, and if most of us are not cooperating through a national database.29

in conclusion, i will revert to my first subject, the uncertain nature of predictions. in addition to the fearless predictions already recorded, i predict that some of these predictions will come true and perhaps even most of them. some of them will come true, but not in the time anticipated, while others never will. let us hope that the influences not guessed that will prevent the actualization of some of these predictions will be happy ones, not dire.
however they turn out, i predict that in ten years no one will remember or really care what these predictions were.30

technical services as we know them now may well not exist by the end of the century. the aims of technical services will exist for as long as there are libraries. the technical services quarterly may well have changed its name and its coverage long before then, but its concerns will remain real and the work to which many of us devote our lives will remain worthwhile. there can be few things in life that are as worth doing as enabling libraries to fulfill their unique and uniquely important role in culture and civilization.31

twenty-five years have come and gone; some of the predictions in this first issue of technical services quarterly came true, many of them did not. there have been dramatic changes in those twenty-five years, most of which were unforeseen, as they always are. what is a certainty is that libraries can no longer sustain or maintain the status quo in technical services. what also is a certainty is that technical services staff, with their unique skills, talents, abilities, and knowledge in relation to the organization and description of information, are desperately needed in the new information environment. it is the responsibility of both library administrators and technical services staff to work together to evolve and redesign workflows, standards, procedures, and even themselves to survive and succeed into the future.

references
1. norman d. stevens, "selections from a dictionary of libinfosci terms," in "beyond '1984': the future of technical services," special issue, technical services quarterly 1, no. 1–2 (fall/winter 1983): 260.
2. marcia j. bates, "improving user access to library catalog and portal information: final report" (paper presented at the library of congress bicentennial conference on bibliographic control for the new millennium, june 1, 2003): 4, http://www.loc.gov/catdir/bibcontrol/2.3batesreport6-03.doc.pdf (accessed apr. 7, 2009). see also karen calhoun, "the changing nature of the catalog and its integration with other discovery tools," final report to the library of congress, mar. 17, 2006, 25, http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed apr. 7, 2009).
3. university of california libraries bibliographic services task force, "rethinking how we provide bibliographic services for the university of california," final report, dec. 2005, 8, http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf (accessed apr. 7, 2009).
4. deanna b. marcum, "the future of cataloging," library resources & technical services 50, no. 1 (jan. 2006): 9, http://www.loc.gov/library/reports/catalogingspeech.pdf (accessed apr. 7, 2009).
5. karen calhoun, "being a librarian: metadata and metadata specialists in the twenty-first century," library hi tech 25, no. 2 (2007), http://www.emeraldinsight.com/insight/viewcontentservlet?filename=published/emeraldfulltextarticle/articles/2380250202.html (accessed apr. 7, 2009).
6. calhoun, "the changing nature of the catalog," 15.
7. bradford lee eden, "ending the status quo," american libraries 39, no. 3 (mar. 2008): 38; eden, introduction to "information organization future for libraries," library technology reports 44, no. 8 (nov./dec. 2007): 5–7.
8. see karen schneider's "how opacs suck" series on the ala techsource blog, http://www.techsource.ala.org/blog/2006/03/how-opacs-suck-part-1-relevance-rank-or-the-lack-of-it.html, http://www.techsource.ala.org/blog/2006/04/how-opacs-suck-part-2-the-checklist-of-shame.html, and http://www.techsource.ala.org/blog/2006/05/how-opacs-suck-part3-the-big-picture.html (accessed apr. 7, 2009).
9. h. frank cervone, quoted in ellen bahr, "dreaming of a better ils," computers in libraries 27, no. 9 (oct. 2007): 14.
10. david weinberger, everything is miscellaneous: the power of the new digital disorder (new york: times, 2007).
11. for a list of these concerns, see robert darnton, "the library in the new age," the new york review of books 55, no. 10 (june 12, 2008), http://www.nybooks.com/articles/21514 (accessed apr. 7, 2009).
12. see calhoun, "the changing nature of the catalog," 27.
13. see the metadata research center, "automatic metadata generation applications (amega)," http://ils.unc.edu/mrc/amega (accessed apr. 7, 2009).
14. sheila s. intner, "generating cataloging data automatically," technicalities 28, no. 2 (mar./apr. 2008): 1, 15–16.
15. sheila s. intner, "a technical services makeover," technicalities 27, no. 5 (sept./oct. 2007): 1, 14–15.
16. ibid., 14 (emphasis added).
17. carlen ruschoff, "competencies for 21st century technical services," technicalities 27, no. 6 (nov./dec. 2007): 1, 14–16.
18. diane hillman, "a has-been cataloger looks at what cataloging will be," online posting, metadata blog, july 1, 2008, http://blogs.ala.org/nrmig.php?title=creating_the_future_of_the_catalog_aamp_&more=1&c=1&tb=1&pb=1 (accessed apr. 7, 2009).
19. peter brantley, "architectures for collaboration: roles and expectations for digital libraries," educause review 43, no. 2 (mar./apr. 2008): 31–38.
20. calhoun, "the changing nature of the catalog," 13.
21. brian briscoe, "that business/customer stuff (was: letter to al)," online posting, autocat, may 30, 2008.
22. brad wheeler, "in search of certitude," educause review 43, no. 3 (may/june 2008): 15–34.
23. ibid., 22.
24. karen calhoun, "being a librarian."
25. ibid.
26. ibid. (emphasis added).
27. andrew pace, quoted in roy tennant, "digital libraries: 'lipstick on a pig,'" library journal, apr. 15, 2005, http://www.libraryjournal.com/article/ca516027.html (accessed apr. 7, 2009).
28. jia mi and cathy weng, "revitalizing the library opac: interface, searching, and display challenges," information technology & libraries 27, no. 1 (mar. 2008): 5–22.
29. gregor a. preston, "how will automation affect cataloging staff?" in "beyond '1984': the future of technical services," special issue, technical services quarterly 1, no. 1–2 (fall/winter 1983): 134.
30. david c. taylor, "the library future: computers," in "beyond '1984': the future of technical services," special issue, technical services quarterly 1, no. 1–2 (fall/winter 1983): 92–93.
31. michael gorman, "technical services, 1984–2001 (and before)," in "beyond '1984': the future of technical services," special issue, technical services quarterly 1, no. 1–2 (fall/winter 1983): 71.

communication
a tale of two tools: comparing libkey discovery to quicklinks in primo ve
jill k. locascio and dejah rubel
information technology and libraries | june 2023. https://doi.org/10.6017/ital.v42i2.16253
jill k. locascio (jlocascio@sunyopt.edu) is associate librarian, suny college of optometry. dejah rubel (dejahrubel@ferris.edu) is metadata and electronic resources management librarian, ferris state university. © 2023.

introduction

consistent delivery of full-text content has been a challenge for libraries since the development of online databases. library systems have attempted to meet this challenge, but link resolvers and early direct linking tools often fell short of patron expectations. in the last several years, a new generation of direct linking tools has appeared, two of which will be discussed in this article: third iron's libkey discovery and quicklinks by ex libris, a clarivate company. figure 1 shows the "download pdf" link added by libkey. figure 2 shows the "get pdf" link provided by quicklinks. given the way we configured our discovery interfaces, a resource cannot receive both the libkey and quicklinks pdf links. these two direct linking tools were chosen because they were both relatively new to the market in april 2021, when this analysis took place, and because they can both be integrated into primo ve, the library discovery system of choice at the authors' home institutions of suny college of optometry and ferris state university. through analysis of the frequency of direct links, link success rate, and number of clicks, this study may help determine which product is most likely to meet your patrons' needs.

figure 1. example of a libkey discovery link in primo ve.
figure 2. example of a quicklink in primo ve.

literature review

over the past 20 years, link resolvers and direct linking have evolved in tandem. early link generator tools, such as proquest's sitebuilder, often involved a process that "… proved too cumbersome for most end-users."1 five years later, tools from ebsco, gale, ovid, and proquest had improved, but they were all proprietary. bickford postulates that metadata-based standards, like openurl, may make linking as simple as copying and pasting from the address bar; however, they may be more likely to fail "… as long as vendors use incompatible, inaccurate, or incomplete metadata."2 the first research was wakimoto's 2006 study of sfx, which relied on 224 test queries and 188,944 individual uses for its data set.3 of those queries, 39.7% of search results included a full-text link, and that link was accessed 65.2% of the time. unfortunately, wakimoto also discovered that 22.2% of all full-text results failed and concluded that most complaints against sfx were problems with the systems it links to and not the link resolver itself. although intended to be provider-neutral, the openurl standard is, in fact, vulnerable to metadata omissions. content providers, whether aggregators or publishers, have a vested interest in link stability and platform use and have therefore invested in building direct link generation tools.
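to make that vulnerability concrete, below is a minimal sketch of how an openurl request is typically assembled from citation metadata. the resolver base url and the sample citation are hypothetical; the kev (key/encoded-value) keys shown are taken from the openurl 1.0 standard. any field absent from the source metadata is simply left out of the query string, which is how incomplete vendor metadata quietly degrades the resulting link.

```python
from urllib.parse import urlencode

# hypothetical link resolver base url (not a real service)
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def build_openurl(citation: dict) -> str:
    """assemble an openurl 1.0 kev query string from citation metadata.

    fields missing from `citation` are omitted from the request, so a
    sparse record still yields a syntactically valid but weaker link."""
    params = {
        "url_ver": "Z39.88-2004",                       # openurl 1.0 version
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal article format
    }
    field_map = {
        "issn": "rft.issn",
        "title": "rft.atitle",
        "journal": "rft.jtitle",
        "volume": "rft.volume",
        "issue": "rft.issue",
        "spage": "rft.spage",
        "date": "rft.date",
    }
    for src, key in field_map.items():
        if citation.get(src):
            params[key] = citation[src]
    return f"{RESOLVER_BASE}?{urlencode(params)}"

# a citation lacking volume, issue, and start page still produces a link,
# but the resolver may match the wrong article -- or nothing at all
print(build_openurl({"issn": "0730-9295", "title": "some article", "date": "2011"}))
```

the same mechanism explains the failure rates reported above: the resolver can only match on whatever keys arrive in the request.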
in 2006, grogg examined ebsco's smartlink, which checks access rights before generating the link; proquest's crosslinks, which was used to link from proquest to another vendor's content; and silverplatter and links@ovid, which relied on a knowledge base in the terabytes for static links.4 in 2008, cecchino described the national library of medicine's linkout tool for selected publishers within pubmed.5 they also described two ovid products: links@ovid and linksolver, noting that the former is similar to linkout and the latter is similar to sfx. most of the time these tools worked well, but their use was restricted to a particular platform or set of publishers.

as online public catalogs became discovery layers, direct linking became a feature of the library management system. two studies have been done thus far: silton's analysis of summon and stuart's analysis of 360 link. in 2014, silton tested the percentage of full-text articles retrievable from summon by running a test query and examining the first 100 results. over a year, the total success rate for unfiltered queries rose from 61% to 76%. after direct linking was introduced, the success rate of link resolver links rose to 65.8–73%, and direct links succeeded 90.48–100% of the time. silton concluded, "while direct linking had some issues in its early months, it generally performs better than the link resolver."6 in 2011, stuart, varnum, and ahronheim began testing the 1-click feature of 360 link on 579 citations, 82.2% of which were successful. after direct linking became an option for summon in 2012, 61–70% of their sample relied on it. "between direct linking and 1-click about 93 to 94% of the time an attempt was made to lead users directly to the full text of the article … [and] … we were able to reach full text … from 79% to about 84% of the time."7 direct linking outperformed 1-click with a 90% success rate compared to 58–67% for 1-click. stuart also compared the actual error rate with one based on user reports and discovered that "relying solely on user reports of errors to judge the reliability of full-text links dramatically underreports true problems by a factor of 100."8 openurl links were especially alarming, with approximately 20% of them failing. although direct linking is more reliable, stuart closes by noting that direct linking binds libraries closer to vendors, thereby decreasing institutional flexibility.

methods

the goal of this project was to assess two of the latest direct linking tools: ex libris's native quicklinks feature and third iron's libkey discovery. we performed a side-by-side comparison of the two tools by searching for specific articles in primo ve, the library discovery system used by the authors' respective home institutions, suny college of optometry and ferris state university, and measuring
• how often each vendor's direct links appeared on the brief record;
• the success rate of the links; and
• the number of clicks it takes from each link to reach the pdf full text.

both suny college of optometry and ferris state university use ex libris' alma as their library services platform. alma provides a number of usage reports in its analytics module. we sourced the queries used in our analysis from the alma analytics link resolver usage report. the report contains a field, "number of requests," which records the number of times an openurl request was sent to the link resolver.
an openurl request is sent to the link resolver when the user clicks on a link to the link resolver from an outside source (such as google scholar), when the user submits a request using primo's citation linker, or when the user accesses the article's full record in primo by clicking on either the brief record's title or availability statement. this means that results that have a direct link (whether a quicklink or libkey discovery link) on the brief record will not appear in the report if the user clicked the direct link to the article. thus, in order to create test searches that would be an accurate representation of articles being accessed, we used article titles taken from suny optometry's october 2019 alma link resolver usage report—a report that was generated prior to the implementation of both libkey discovery and quicklinks. the report was filtered to include only articles with the source type of primo/primo central to ensure that the initial search took place within the native primo interface, as requests from outside sources like google scholar or from primo's citation linker are irrelevant to this analysis. this filtering generated a total of 412 articles. after further removal of duplicates and non-article material, there were 386 article titles in our test query set.

we created two separate primo views as test environments: one with libkey discovery and the other with quicklinks. we ran the test searches twice in each view. in the first round of testing, we recorded whether a direct link was present. we also recorded the name of the full-text provider (if present), as well as whether the article was open access. suny optometry does not filter their primo results by availability; therefore, many of the articles included in the initial search did not have any associated full-text activations. since these articles are irrelevant to our assessment, we removed them before analyzing the first round of data and proceeding with the second search. the exception to these removals were articles identified as open access by unpaywall, as the presence of unpaywall links is independent of any activations in alma. furthermore, third iron's libkey discovery and ex libris's quicklinks both incorporate unpaywall's api into their products to provide direct links to pdfs of open access articles. this functionality helps fill coverage gaps where institutions may not have activated a hybrid open access journal because of its paywalled content. therefore, we included the presence of direct links resulting from the unpaywall api when determining whether a libkey discovery link or quicklink was present. after filtering for availability, we had 254 article titles for the first round of searching and analysis.

the initial analysis revealed the need to further filter the articles used for the second round of searching, which would provide a much closer comparison of the two direct linking tools, as third iron had partnered with more content providers than ex libris. controlling for shared providers would give a more accurate representation of how each direct linking tool performs in relation to the other. when controlling for shared providers and open access articles, we were left with 145 article titles for the second query set.
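filtering steps like these are straightforward to script against an export of the usage report. the following is an illustrative sketch only, not the authors' actual workflow; the field names and values are hypothetical stand-ins for columns in an exported link resolver usage report.

    // illustrative sketch of the query-set filtering described above.
    // field names (sourceType, materialType, title) are hypothetical
    // stand-ins for columns in an exported usage report.
    const rows = [
      { title: 'article a', sourceType: 'primo/primo central', materialType: 'article' },
      { title: 'article a', sourceType: 'primo/primo central', materialType: 'article' }, // duplicate
      { title: 'article b', sourceType: 'google scholar', materialType: 'article' }, // outside source
      { title: 'book c', sourceType: 'primo/primo central', materialType: 'book' }, // non-article
    ];

    // keep only requests that originated in the native primo interface
    // and drop non-article material
    const articles = rows.filter(
      (r) => r.sourceType === 'primo/primo central' && r.materialType === 'article'
    );

    // remove duplicate titles to produce the test query set
    const testSet = [...new Map(articles.map((r) => [r.title, r])).values()];

    console.log(testSet.map((r) => r.title)); // [ 'article a' ]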
during the second round of searching, we measured whether the direct link was successful in linking to the full text—meaning that the link was neither broken nor linked to an incorrect article—and how many clicks were necessary to get from the direct link to the article pdf. along the way, additional qualitative measures were observed, such as document download time and metadata record quality. while not as easy to measure as the quantitative data, these observations provided additional insight into the strengths and weaknesses of each of these direct linking tools. since april 2022, when our research was conducted, ex libris has added several quicklinks providers, possibly increasing the current number of quicklinks available. additionally, both rounds of searching were conducted on campus, so our analysis excludes any consideration of authentication and/or proxy information.

results

of the 254 articles searched, 208 (82%) had libkey discovery links present while 129 (52%) had quicklinks present. while this seems like a large discrepancy between the two direct link providers, it can be explained by the fact that at the time of testing, ex libris was collaborating with fewer content providers than third iron. ex libris has since added more providers. while the provider discrepancy meant that there were many instances where a libkey discovery link was present and a quicklink was not, there were 5 articles where a quicklink was present while a libkey discovery link was not.

as mentioned previously, the criterion for the 254 articles carried into this analysis was that the articles must be activated in alma or must be open access. of these 254 articles, we identified 137 (54%) as open access. of those open access articles, 132 (96%) had libkey discovery links present, and 118 (86%) had quicklinks present. we found that 113 (82%) of the open access articles had both libkey discovery links and quicklinks present. we also discovered within this set of 137 open access articles that 30 (22%) were from non-activated resources. of those 30 open access articles from non-activated titles, all 30 (100%) had libkey discovery links appearing on the brief results and 24 (80%) had quicklinks.

to get a better idea of how libkey discovery links and quicklinks compared in terms of linking success, we filtered to only those articles available from providers who were participating in both libkey discovery and quicklinks. since both direct linking tools use unpaywall integrations, we continued to include open access articles. this filtering resulted in 145 articles, of which libkey discovery links were present in 137 (94%) while quicklinks were present in 129 (89%). we found that 123 (85%) of these 145 articles had both libkey discovery links and quicklinks present. there were 2 (1%) articles that had neither libkey discovery links nor quicklinks present despite being activated in a journal currently participating as a provider in both direct linking tools. there were also 14 articles (10%) that had libkey discovery links but not quicklinks; all of these articles were open access. in total, of the 145 articles searched, 128 (88%) were identified as open access.

as for the 137 libkey discovery links, 130 (95%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article.
of the 129 quicklinks, 126 (98%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. we also attempted to measure the time it took for pages to load after the initial click on the libkey discovery links and quicklinks; however, the tools used to measure this, as well as the environments in which the links were being clicked, proved too varied to provide an appropriate comparison. nevertheless, we observed that page load times after clicking on libkey discovery links and quicklinks were generally consistent, although quicklinks attempts to connect to the wiley platform took a significant time (at least 10 seconds) to load.

conclusions

with high article linking success rates, both third iron's libkey discovery and ex libris's quicklinks deliver on the promise to provide fast and seamless access to full-text articles. however, the libkey discovery tool far outpaces quicklinks when it comes to coverage. both direct linking tools perform well with open access articles, supplying libraries with better options for full-text links to articles that may be in hybrid journals. as with any kind of full-text linking, both direct linking tools rely on metadata. in conclusion, while libkey discovery provides a more complete direct linking solution, both libkey discovery and quicklinks are reliable tools that improve primo's discovery and delivery experience.

endnotes

1 david bickford, "using direct linking capabilities in aggregated databases for e-reserves," journal of library administration 41, no. 1/2 (2004): 31–45, https://doi.org/10.1300/j111v41n01_04.
2 bickford, 45.
3 wendy furlan, "library users expect link resolvers to provide full text while librarians expect accurate results," evidence based library and information practice 1, no. 4 (2006): 60–63, https://doi.org/10.18438/b88c7p.
4 jill e. grogg, "linking without a stand-alone link resolver," library technology reports 42, no. 1 (2006): 31–34.
5 nicola j. cecchino, "full-text linking demystified," journal of electronic resources in medical libraries 5, no. 1 (2008): 33–42, https://doi.org/10.1080/15424060802093377.
6 kate silton, "assessment of full-text linking in summon: one institution's approach," journal of electronic resources librarianship 26, no. 3 (2014): 163–69, https://doi.org/10.1080/1941126x.2014.936767.
7 kenyon stuart, ken varnum, and judith ahronheim, "measuring journal linking success from a discovery service," information technology and libraries 34, no. 1 (2015): 52–76, https://doi.org/10.6017/ital.v34i1.5607.
8 stuart, varnum, and ahronheim, 74.

communication

using qualtrics xm to create a point-of-use survey to assess the usability of a local implementation of primo

matthew black, heather ganshorn, and justine wheeler

information technology and libraries | december 2023
https://doi.org/10.5860/ital.v42i4.16475

about the authors

matthew black (corresponding author: mblack@ucalgary.ca) is the director, systems and discovery, libraries & cultural resources, university of calgary.
heather ganshorn (hganshor@ucalgary.ca) is a science subject librarian, libraries and cultural resources, university of calgary. justine wheeler (jwheeler@ucalgary.ca) is the assessment librarian, libraries and cultural resources, university of calgary. © 2023. submitted: may 29, 2023. accepted for publication: september 12, 2023. published 18 december 2023.

abstract

in 2020, libraries and cultural resources (lcr) at the university of calgary used qualtrics xm to design and pilot a point-of-use survey to collect user feedback on the usability of our implementation of primo, ex libris's web-scale discovery service. over a two-week period, users were presented with the pop-up survey while searching and asked to provide feedback. this article summarizes how we designed and implemented this point-of-use survey and the lessons learned from this project.

introduction

in 2018, libraries and cultural resources (lcr) at the university of calgary implemented ex libris's primo, a web-scale discovery service.1 through an embedded search box on the library webpage (https://www.library.ucalgary.ca), users can use primo to discover and access resources from lcr's physical and digital collections. after adopting primo, lcr wanted to assess the usability of the interface and the initial decisions made around user interface display customizations. to do this, in early 2020 lcr's primo working group piloted a point-of-use intercept survey using qualtrics xm to collect feedback from users on the success of their search experience. for this, we defined success as users' perception that they found what they were looking for. as we were unable to find comprehensive guidance on how to set up such a survey, this article shares our process and lessons learned in the hope that others will find it useful.

primo and usability

ex libris designed primo "to catch up with user expectations" through implementing "contemporary user experience elements."2 since the initial release, many academic libraries have conducted user experience studies on primo that involved recruiting users and asking them to complete specific common tasks while under observation. study designers then analyze the results for insights into user experience.3 these studies capture feedback from a sample of users in the context of a usability test environment, which is to some extent artificial. in our study, we aimed to capture authentic user feedback as users searched with primo, not as they completed predetermined tasks. in the marketing profession, this is known as an intercept survey, an online survey that is triggered during the use of a site or application and can be used to study "natural use of the product."4 in the library context, martha kyrillidou, terry plum, and bruce thompson used this method, which they referred to as point-of-use surveys, to collect user feedback on networked electronic resources.
importantly, they contended that point-of-use surveys can improve the validity and response rates of a survey because the survey does not require users to report on "predicted, intended, or remembered use" (which can introduce error) and because proactive interception increases the number of responses and decreases the potential for bias due to nonresponse.5 further, jane nichols, richard stoddart, and terry reese explained how oregon state university libraries and press designed and implemented an in-house intercept survey for collection assessment; they noted that the in-house method required significant time and support from their developers and that they would investigate whether qualtrics could support the intercept functionality.6

using qualtrics xm to deliver a survey

in early 2020, lcr formed a primo working group, which includes representatives from library systems, research and learning services, public services, and collections. the group meets monthly to review primo developments and work on improving our search experience. after our initial meetings, the group began to discuss how we collect user feedback on our discovery search interface. a subgroup was formed to come up with a strategy for this assessment. in the subgroup's initial meeting, we decided to collect user feedback on search experience by asking open-ended questions about how successful users perceived they were with their search and asking them to identify the elements of the interface that helped or hindered their success. for this, we decided to use qualtrics xm because it is the university of calgary's licensed survey tool and we determined it could be used to deliver the survey as a pop-up.

using qualtrics xm, we first developed a test survey to understand how to set up the survey flow and how to create the pop-up to display to users. after setting this up, we shared a test version with the primo working group for feedback on the design and testing in our primo sandbox. based on this feedback, we finalized a short survey that asked users if they found the resources they were looking for and included a follow-up question about why they felt their search was successful or unsuccessful (see fig. 1). the questions were as follows; a sketch of the conditional routing between questions 2 and 3 appears after this section:

1. what user group do you belong to? if you belong to more than one group, choose the group that had brought you to the library today:
• academic staff
• support staff and management & professional staff
• undergraduate student
• graduate student
• continuing education student
• community user
• other

2. did you find what you needed?
• yes (conditionally linked to question 3a)
• no (conditionally linked to question 3b)
• somewhat (conditionally linked to questions 3a and 3b)

3.
• 3a. what helped you find what you needed?
• 3b. why do you think your search was not successful?

figure 1. finalized survey questions and design in qualtrics xm.

after finalizing the survey, we configured the pop-up. our plan was to run the survey for a two-week period. during this two-week window when the survey was active, users would be presented with a pop-up during their primo search session asking if they would like to provide feedback.
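in qualtrics xm, the conditional routing above is configured as display logic in the survey flow. as a plain sketch of the behavior only (not qualtrics configuration code), the routing amounts to:

    // sketch of the survey's conditional routing: the answer to question 2
    // determines which follow-up question(s) are displayed.
    function followUpQuestions(answer) {
      switch (answer) {
        case 'yes':
          return ['3a']; // what helped you find what you needed?
        case 'no':
          return ['3b']; // why do you think your search was not successful?
        case 'somewhat':
          return ['3a', '3b']; // both follow-ups are shown
        default:
          return [];
      }
    }

    console.log(followUpQuestions('somewhat')); // [ '3a', '3b' ]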
to trigger the display of the survey in a pop-up, we used qualtrics xm's built-in functionality to create a point-of-use intercept, a website & app feedback project (see fig. 2).

figure 2. create a website & app feedback project screen.

after creating the project, we needed to configure the intercept and the creative. the intercept is used to define the conditions for when a user on a site or app is presented with the survey. the creative is the method for delivering the survey to the users once intercepted on the site. for this project, we decided to use the responsive dialog creative to connect users to our survey (see fig. 3).

figure 3. feedback collection method selection screen.

within the creative, we were able to configure and preview the size, style, and text for the dialog and buttons; add images, such as a logo; and set the display animation (see fig. 4).

figure 4. responsive dialog creative configuration screen.

we configured the dialog to allow users to accept or decline our request to take the survey and branded it with our logo and colors (see fig. 5).

figure 5. responsive dialog creative with customized message, buttons, colors, and logo.

once we configured the creative, we needed to set up the intercept. we set the survey as the target for the intercept and defined the targeting logic and frequency. for the targeting logic, we used the "if current url starts with" option with the base url for our primo instance (see fig. 6).

figure 6. intercept targeting logic configuration.

for the frequency, we configured it to intercept users only when their mouse left the web page, because we did not want the survey to intercept users before they had tried searching. this was our best approximation for ensuring the intercept timing was appropriate for the questions we were asking. we determined the other available options would display the survey too early (on load or on focus) and would not have achieved valid timing for the questions. there is also an option to use custom javascript code to set the timing, but we decided to use the out-of-the-box options because we wanted to keep the configuration simple (see fig. 7).

figure 7. intercept frequency configuration.

to avoid frustrating users by displaying the survey too frequently, we used the "repeated display prevention for browser cookie" option. within this configuration, we set the survey to display to 100% of the users who qualify but prevented repeated display for one day. this meant users would not be asked to take the survey more than once a day.

to add the intercept to primo, we copied the javascript code snippet that qualtrics xm generates and added this to the custom.js file in our primo customization package (see fig. 8).

figure 8. javascript code snippet for the responsive dialog.

once this code is deployed in primo, the intercept behavior can be controlled within qualtrics xm by simply activating or deactivating it within the website/app feedback project (see fig. 9).

figure 9. configuration page with the activation toggle highlighted.
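returning to the snippet added in the step shown in figure 8: qualtrics generates project-specific code, so it is not reproduced here. purely as an illustrative sketch of the general pattern (dropping a third-party loader into the customization package's custom.js), the shape is something like the following, with a placeholder url standing in for the generated deployment address.

    // minimal sketch of loading a vendor-generated intercept snippet from
    // primo's custom.js. the url below is a placeholder, not a real
    // qualtrics endpoint; the actual generated snippet differs per project.
    (function () {
      var INTERCEPT_URL = 'https://siteintercept.example.com/intercept.js'; // placeholder

      function loadIntercept() {
        var script = document.createElement('script');
        script.src = INTERCEPT_URL;
        script.async = true;
        document.head.appendChild(script);
      }

      // inject only after the discovery interface has finished loading
      if (document.readyState === 'complete') {
        loadIntercept();
      } else {
        window.addEventListener('load', loadIntercept);
      }
    })();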
the activation toggle was convenient because we could leave the code in the primo custom.js file and deactivate or activate it in qualtrics xm as necessary. in addition, when we later updated the target of the intercept to an updated version of the survey, we did not have to update the code in primo.

lessons learned

proactively intercepting users at point of use can result in a significant number of responses over a short period of time. as mentioned above, we initially ran the survey with the intercept for a two-week period and again, for the same period of time, during two additional semesters. between these periods, we left the survey active as a passive sidebar that users could choose to access. this passive method collected fewer responses (107) over a six-month period than the point-of-use survey (755) collected over the combined six weeks that the pop-up prompt was active. for those who are interested in running a point-of-use survey, we summarize the lessons we learned below.

when designing a survey consider the following
• keep the survey short and focused on a specific goal.
• assess the survey questions after piloting and iterate—adjust or clarify questions if needed.

when designing the point-of-use intercept consider the following
• make sure the point-of-use intercept will not frustrate users by controlling how frequently the survey is presented.
• make sure the timing of the pop-up dialog aligns with your goal(s) by considering:
o when do your primary stakeholders most heavily use library resources?
o what is the minimum number of responses you aim to collect?
o how long do you need to run the survey to achieve this number?
o at what point in the users' search will the survey intercept users to ask for feedback? will this timing be at the point of use you want feedback on?

provide users with options for timely support

we found that users sometimes confused the feedback survey with an opportunity to report issues or ask for help. accordingly, at the end of the survey we added a closing message that provided users with options to get immediate help using chat or our faqs (see fig. 10).

figure 10. survey closing message.

conclusion

for academic libraries with access via an institutional license, qualtrics xm is a flexible and relatively simple tool that can be used to gather point-of-use feedback on user experiences with library discovery services such as primo. in our case, we analyzed the results for themes and shared these with our primo working group. the group reviewed the results and came up with actionable items such as updating our display labels for facets, resource availability statements, and access and licensing. this method may also be applicable for collecting feedback on other elements of a library website, vendor platforms, or online collections.

endnotes

1 athena hoeppner, "the ins and outs of evaluating web-scale discovery services," computers in libraries, no. 3 (april 2012), https://www.infotoday.com/cilmag/apr12/hoeppner-webscale-discovery-services.shtml.
2 tamar sadeh, "user experience in the library: a case study," new library world 109, no. 1/2 (january 11, 2008): 7–24, https://doi.org/10.1108/03074800810845976.
3 annis lee adams and margot hanson, "primo on the go: a usability study of the primo mobile interface," journal of web librarianship 14, no. 1–2 (april 2, 2020): 1–27, https://doi.org/10.1080/19322909.2020.1784820; kelsey renee brett, ashley lierman, and cherie turner, "lessons learned: a primo usability study," information technology and libraries 35, no. 1 (april 1, 2016): 7–25, https://doi.org/10.6017/ital.v35i1.8965; david j. comeaux, "usability testing of a web-scale discovery system at an academic library," college & undergraduate libraries 19, no. 2–4 (april 2012): 189–206, https://doi.org/10.1080/10691316.2012.695671; sarah dahlen, kenny garcia, and kathlene hanson, "comparing apples and bananas? a/b testing for discovery system optimization," library faculty publications and presentations, january 1, 2018, https://digitalcommons.csumb.edu/lib_fac/9; blake lee galbreath, corey johnson, and erin hvizdak, "primo new user interface: usability testing and local customizations implemented in response," information technology and libraries 37, no. 2 (june 18, 2018): 10–33, https://doi.org/10.6017/ital.v37i2.10191; scott hanrath and miloche kottman, "use and usability of a discovery tool in an academic library," journal of web librarianship 9, no. 1 (january 2, 2015): 1–21, https://doi.org/10.1080/19322909.2014.983259; w. jacobs, mike demars, and j. m. kimmitt, "a multi-campus usability testing study of the new primo interface," college & undergraduate libraries 27, no. 1 (january 2, 2020): 1–16, https://doi.org/10.1080/10691316.2019.1695161; kylie jarrett, "findit@flinders: user experiences of the primo discovery search solution," australian academic & research libraries 43, no. 4 (december 2012): 278–99; greta kliewer et al., "using primo for undergraduate research: a usability study," library hi tech 34, no. 4 (january 1, 2016): 566–84, https://doi.org/10.1108/lht-05-2016-0052; aaron nichols et al., "kicking the tires: a usability study of the primo discovery tool," journal of web librarianship 8, no. 2 (april 3, 2014): 172–95, https://doi.org/10.1080/19322909.2014.903133; xi niu, tao zhang, and hsin-liang chen, "study of user search activities with two discovery tools at an academic library," international journal of human-computer interaction 30, no. 5 (may 4, 2014): 422–33, https://doi.org/10.1080/10447318.2013.873281; joy marie perrin et al., "usability testing for greater impact: a primo case study," information technology and libraries 33, no. 4 (december 18, 2014): 57–66, https://doi.org/10.6017/ital.v33i4.5174; lynne porat and nir zinger, "primo new user interface—not just for undergrads: a usability study," weave: journal of library user experience 1, no. 9 (2018), https://doi.org/10.3998/weave.12535642.0001.904; barbara valentine and beth west, "improving primo usability and teachability with help from the users," journal of web librarianship 10, no. 3 (july 2, 2016): 176–96, https://doi.org/10.1080/19322909.2016.1190678.
4 christian rohrer, "when to use which user-experience research methods," nielsen norman group, july 17, 2022, https://www.nngroup.com/articles/which-ux-research-methods/.
5 martha kyrillidou, terry plum, and bruce thompson, "evaluating usage and impact of networked electronic resources through point-of-use surveys: a mines for libraries™ study," the serials librarian 59, no. 2 (july 30, 2010): 485, https://doi.org/10.1080/03615261003674057.
6 jane nichols, richard stoddart, and terry reese, "nuanced and timely: capturing collections feedback at point of use," in too much is not enough! (charleston conference, against the grain, 2014), 299, https://doi.org/10.5703/1288284315275.

tagging: an organization scheme for the internet

marijke a. visser

information technology and libraries | march 2010

how should the information on the internet be organized? this question and the possible solutions spark debates among people concerned with how we identify, classify, and retrieve internet content. this paper discusses the benefits and the controversies of using a tagging system to organize internet resources. tagging refers to a classification system where individual internet users apply labels, or tags, to digital resources. tagging increased in popularity with the advent of web 2.0 applications that encourage interaction among users. as more information is available digitally, the challenge to find an organizational system scalable to the internet will continue to require forward thinking. trained to ensure access to a range of informational resources, librarians need to be concerned with access to internet content. librarians can play a pivotal role by advocating for a system that supports the user at the moment of need. tagging may just be the necessary system.

who will organize the information available on the internet? how will it be organized? does it need an organizational scheme at all? in 1998, thomas and griffin asked a similar question, "who will create the metadata for the internet?" in their article with the same name.1 ten years later, this question has grown beyond simply supplying metadata to assuring that at the moment of need, someone can retrieve the information necessary to answer their query. given new classification tools available on the internet, the time is right to reassess traditional models, such as controlled vocabularies and taxonomies, and contrast them with folksonomies to understand which approach is best suited for the future. this paper gives particular attention to delicious, a social networking tool for generating folksonomies.

the amount of information available to anyone with an internet connection has increased in part because of the internet's participatory nature.
users add content in a variety of formats and through a variety of applications to personalize their web experience, thus making internet content transitory in nature and challenging to lock into place. the continual influx of new information is causing a rapid cultural shift, more rapid than many people are able to keep up with or anticipate. conversations on a range of topics that take place using web technologies happen in real time. unless you are a participant in these conversations and debates using web-based communication tools, changes are passing you by. internet users in general have barely grasped the concept of web 2.0, and already the advanced "internet cognoscenti" write about web 3.0.2 regarding the organization and availability of internet content, librarians need to be ahead of the crowd as the voice that will assure content remains readily accessible to those who seek it.

internet users actively participating in and shaping the online communities are, perhaps unintentionally, influencing how those who access information via the internet expect to be able to receive and use digital resources. librarians understand that the way information is organized is critical to its accessibility. they also understand the communities in which they operate. today, librarians need to be able to work seamlessly among the online communities, the resources they create, and the end user. as internet use evolves, librarians as information stakeholders should stay abreast of web 2.0 developments. by positioning themselves to lead the future of information organization, librarians will be able to select the best emerging web-based tools and applications, become familiar with their strengths, and leverage their usefulness to guide users in organizing internet content.

shirky argues that the internet has allowed new communities to form. primarily online, these communities of internet users are capable of dramatically changing society both on- and offline. shirky contends that because of the internet, "group action just got easier."3 according to shirky, we are now at the critical point where internet use, while dependent on technology, is actually no longer about the technology at all. the web today (web 2.0) is about participation. "this [the internet] is a medium that is going to change society."4 lessig points out that content creators are "writing in the socially, culturally relevant sense for the 21st century and to be able to engage in this writing is a measure of your literacy in the 21st century."5 it is significant that creating content is no longer reserved for the internet cognoscenti. internet users with a variety of technological skills are participating in web 2.0 communities. information architects, web designers, librarians, business representatives, and any stakeholder dependent on accessing resources on the internet have a vested interest in how internet information is organized. not only does the architecture of participation inherent in the internet encourage completely new creative endeavors, it serves as a platform for individual voices, as demonstrated in personal and organizationally sponsored blogs: lessig 2.0, boing boing, open access news, and others.

marijke a. visser (marijkea@gmail.com) is a library and information science graduate student at indiana university, indianapolis, and will be graduating may 2010. she is currently working for ala's office for information and technology policy as an information technology policy analyst, where her area of focus includes telecommunications policy and how it affects access to information.
these internet conversations contribute diverse viewpoints on a stage where, theoretically, anyone can access them. web 2.0 technologies challenge our understanding of what constitutes information and push policy makers to negotiate equitable internet-use policies for the public, the content creators, corporate interests, and the service providers. to maintain an open internet that serves the needs of all the players, those involved must embrace the opportunity for cultural growth the social web represents. for users who access, create, and distribute digital content, information is anything but static; nor is using it the solitary endeavor of reading a book. its digital format makes it especially easy for people to manipulate it and shape it to create new works. people are sharing these new works via social technologies for others to then remix into yet more distinct creative work. communication is fundamentally altered by the ability to share content on the internet. today's internet requires a reevaluation of how we define and organize information. the manner in which digital information is classified directly affects each user's ability to access needed information to fully participate in twenty-first-century culture. new paradigms for talking about and classifying information that reflect the participatory internet are essential.

background

the controversy over organizing web-based information can be summed up by comparing two perspectives, represented by shirky and peterson. both authors address how information on the web can be most effectively organized. in her introduction, peterson states, "items that are different or strange can become a barrier to networking."6 shirky maintains, "as the web has shown us, you can extract a surprising amount of value from big messy data sets."7 briefly, in this instance ontology refers to the idea of defining where digital information can and should be located (virtually). folksonomy describes an organizational system where individuals determine the placement and categorization of digital information. both terms are discussed in detail below.

although any organizational system necessitates talking about the relationship(s) among the materials being organized, the relationships can be classified in multiple ways. to organize a given set of entities, it is necessary to establish in what general domain they belong and in what ways they are related. applying an ontological, or hierarchical, classification system to digital information raises several points to consider. first, there are no physical space restrictions on the internet, so relationships among digital resources do not need to be strictly identified. second, after recognizing that internet resources do not need the same classification standards as print material, librarians can begin to isolate the strengths of current nondigital systems that could be adapted to a system for the internet. third, librarians must be ready to eliminate current systems entirely if they fail to serve the needs of internet users. traditional systems for organizing information were developed prior to the information explosion on the internet. the internet's unique platform for creating, storing, and disseminating information challenges pre-digital-age models.
designing an organizational system for the internet that supports creative innovation and succeeds in providing access to the innovative work is paramount to moving twenty-first-century culture forward.

assessing alternative models

controversy encourages scrutiny of alternative models. in understanding the options for organizing digital information, it is important to understand traditional classification models. smith discusses controlled vocabularies, taxonomies, and facets as three traditional methods for applying metadata to a resource. according to smith, a controlled vocabulary is an unambiguous system for managing the meanings of words. it links synonyms, allowing a search to retrieve information on the basis of the relationship between synonyms.8 taxonomies are hierarchical, controlled vocabularies that establish parent–child relationships between terms. a faceted classification system categorizes information using the distinct properties of that information.9 in such a system, information can exist in more than one place at a time. a faceted classification system is a precursor to the bottom-up system represented by folksonomic tagging.

folksonomy, a term coined in 2004 by thomas vander wal, refers to a "user-created categorical structure development with an emergent thesaurus."10 vander wal further separates the definition into two types: a narrow and a broad folksonomy.11 in a broad folksonomy, many people tag the same object with numerous tags or a combination of their own and others' tags. in a narrow folksonomy, one or a few people tag an object with primarily singular terms.

internet searching represents a unique challenge to people wanting to organize its available information. search engines like yahoo! and google approach the chaotic mass of information using two different techniques. yahoo! created a directory similar to the file folder system with a set of predetermined categories that were intended to be universally useful. in so doing, the yahoo! developers made assumptions about how the general public would categorize and access information. the categories and subsequent subcategories were not necessarily logically linked in the eyes of the general public. the yahoo! directory expanded as internet content grew, but the digital folder system, like a taxonomy, required an expert to maintain. shirky notes the yahoo! model could not scale to the internet. there are too many possible links to be able to stay within the confines of a hierarchical classification system. additionally, on the internet, the links are sufficient for access because if two items are linked at least once, the user has an entry point to retrieve either one or both items.12 a hierarchical system does not assure a successful internet search, and it requires a user to comprehend the links determined by the managing expert. in the google approach, developers acknowledged that the user with the query best understood the unique reasoning behind her search. the user therefore could best evaluate the information retrieved. according to shirky, the google model let go of the hierarchical file system because developers recognized effective searching cannot predetermine what the user wants. unlike yahoo!, google makes the links between the query and the resources after the user types in the search terms.13 trusting in the link system led google to understand and profit from letting the user filter the search results.
to select the best organizational model for the internet, it is critical to understand its emergent nature. a model that does not address the effects of web 2.0 on internet use and fails to capture participant-created content and tagging will not be successful. one approach to organizing digital resources has been for users to bookmark websites of personal interest. these bookmarks have been stored on the user's computer, but newer models now combine the participatory web with saving, or tagging, websites. social bookmarking typifies the emergent web and the attraction of online networking. innovative and controversial, the folksonomy model brings to light numerous criteria necessary for a robust organizational system.

delicious, a social bookmarking network, is a tool for generating folksonomies. it combines a large amount of self-interest with the potential for an equal, if not greater, amount of social value. delicious users add metadata to resources on the internet by applying terms, or tags, to urls. users save these tagged websites to a personal library hosted on the delicious website. the default settings on delicious share a user's library publicly, thus allowing other people—not limited to registered delicious account holders—to view any library. that the delicious developers understood how internet users would react to this type of interactive application is reflected in the popularity of delicious. delicious arrived on the scene in 2003, and in 2007 developers introduced a number of features to encourage further user collaboration. with a new look (going from the original del.icio.us to its current moniker, delicious) as well as more ways for users to retrieve and share resources, by 2007 delicious had 3 million registered users and 100 million unique urls.14

the reputation of delicious has generated interest among people concerned with organizing the information available via the internet. how does the folksonomy, or delicious, model of open-ended tagging affect searching, information retrieving, and resource sharing? delicious, whose platform is heavily influenced by its users, operates with no hierarchical control over the vocabulary used as tags. this underscores the organization controversy. bottom-up tagging gives each person tagging an equal voice in the categorization scheme that develops through the user-generated tags. at the same time, it creates a chaotic information-retrieval system when compared to traditional controlled vocabularies, taxonomies, and other methods of applying metadata.15 a folksonomy follows no hierarchical scheme. every tag generated supplies personal meaning to the associated url and is equally weighted. there will be overlap in some of the tags users select, and that overlap will be the point of access for different users. for the unique tags, each delicious user can choose to adopt or reject them for their personal tagging system. either way, the additional tags add possible future access points for the rest of the user community. the social usefulness of the tags grows organically in relationship to their adoption by the group. can the internet support an organizational system controlled by user-generated tags? by the very nature of the participatory web, whose applications often get better with user input, the answer is yes. delicious and other social tagging systems are proving that their folksonomic approach is robust enough to satisfy the organizational needs of their users.
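the tag-overlap mechanics just described, where every tag is an equal retrieval point and social usefulness grows as more users adopt a tag, can be pictured with a toy sketch (hypothetical data and structure; delicious's actual implementation is not public):

    // toy sketch of a broad folksonomy: each user tags a url, every tag
    // becomes a retrieval point, and overlap across users adds weight.
    const tagStore = new Map(); // url -> Map(tag -> number of users applying it)

    function applyTag(url, tag) {
      if (!tagStore.has(url)) tagStore.set(url, new Map());
      const tags = tagStore.get(url);
      tags.set(tag, (tags.get(tag) || 0) + 1);
    }

    function urlsForTag(tag) {
      const hits = [];
      for (const [url, tags] of tagStore) {
        if (tags.has(tag)) hits.push({ url, weight: tags.get(tag) });
      }
      // tags adopted by more users surface first
      return hits.sort((a, b) => b.weight - a.weight).map((h) => h.url);
    }

    applyTag('http://example.com/an-article', 'folksonomy');
    applyTag('http://example.com/an-article', 'tagging');
    applyTag('http://example.com/an-article', 'folksonomy'); // a second user overlaps

    console.log(urlsForTag('folksonomy')); // [ 'http://example.com/an-article' ]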
as defined by vander wal, a broad folksonomy is a classification system scalable to the internet.16 the problem with projecting already-existing search and classification strategies onto the internet is that the internet is constantly evolving, and classic models are quickly overcome. even in the nonprint world of the internet, taxonomies and controlled vocabularies entail a commitment both from the entity wanting to organize the system and from the users who will be accessing it. developing a taxonomy involves an expert, which requires an outlay of capital, and, as in the case with yahoo!, a taxonomy is not necessarily what users are looking for. to be used effectively, taxonomies demand a certain amount of user finesse and complacency. the user must understand the general hierarchy and by default must suspend their own sense of category and subcategory if it does not mesh with the given system. the google model, where the user does the filtering, has been significantly more successful. google recognizes natural language, making it user friendly; however, it remains merely a search engine. it is successful at making links, but it leaves the user stranded without a means to organize search results beyond simple page rank. traditional hierarchical systems and search strategies like those of yahoo! and google neglect to take into account the tremendous popularity of the participatory web. successful web applications today support user interaction; to disregard this is naive and short-sighted.

in contrast to a simple page-rank results list or a hierarchical system, delicious results provide the user with rich, multilayer results. figure 1 shows four of the first ten results of a delicious search for the term "folksonomy." the articles by the four authors in the left column were tagged according to the diagram. two of the articles are peer-reviewed, and two are cited repeatedly by scholars researching tagging and the internet. in this example, three unique terms are used to tag those articles, and the other terms provide additional entry points for retrieval. further information available using delicious shows that the guy article was tagged by 1,323 users, the mathes article by 2,787 users, the shirky article by 4,383 users, and the peterson article by 579 users.17 from the basic delicious search, the user can combine terms to narrow the query as well as search what other users have tagged with those terms. similar to the card catalog, where a library patron would often unintentionally find a book title by browsing cards before or after the actual title she originally wanted, a delicious user can browse other users' libraries, often finding additional pertinent resources. a user will return a greater number of relevant and automatically filtered results than with an advanced google search. as an ancillary feature, once a delicious user finds an attractive tag stream—a series of tags by a particular user—they can opt to follow the user who created the tag stream, thereby increasing their personal resources. hence delicious is effective personally and socially. it emulates what internet users expect to be able to do with digital content: find interesting resources, personalize them, in this case with tags, and put them back out for others to use if they so choose.

proponents of folksonomy recognize there are benefits to traditional taxonomies and controlled vocabulary systems.
shirky delineates two features of an organizational system and their characteristics, providing an example of when a hierarchical system can be successful (see table 1).18 these characteristics apply to situations using databases, journal articles, and dissertations, as spelled out by peterson, for example.19 specific organizations with identifiable common terminology—for example, medical libraries—can also benefit from a traditional classification system. these domains are the antithesis of the domain represented by the web. the success of controlled vocabularies, taxonomies, and their resulting systems depends on broad user adoption. that, in combination with the cost of creating and implementing a controlled system, raises questions as to their utility and long-term viability for use on the web. though meant for longevity, a taxonomy fulfills a need at one fixed moment in time. a folksonomy is never static. taxonomies developed by experts have not yet been extended adequately for the breadth and depth of internet resources. neither have traditional viewpoints been scaled to accept the challenges encountered in trying to organize the internet. folksonomy, like taxonomy, seeks to provide the information critical to the user at the moment of need. folksonomy, however, relies on users to create the links that will retrieve the desired results.

figure 1. search results for "folksonomy" using delicious.

table 1. domains and their participants

domain to be organized    participants in the domain
small corpus              expert catalogers
formal categories         authoritative source of judgment
restricted entities       coordinated users
clear edges               expert users

doctorow puts forward three critiques of a hierarchical metadata system, emphasizing the inadequacies of applying traditional classification schemes to the digital stage:

1. there is not a "correct" way to categorize an idea.
2. competing interests cannot come to a consensus on a hierarchical vocabulary.
3. there is more than one way to describe something.

doctorow elaborates: "requiring everyone to use the same vocabulary to describe their material denudes the cognitive landscape, enforces homogeneity in ideas."20 the internet raises the level of participation to include innumerable voices. the astonishing thing is that it thrives on this participation. guy and tonkin address the "folksonomic flaw" by saying user-generated tags are by definition imprecise. they can be ambiguous, overly personal, misspelled, or a contrived compound word. guy and tonkin suggest the need to improve tagging by educating the users or by improving the systems to encourage more accurate tagging.21 this, however, does not acknowledge that successful web 2.0 applications depend on the emergent wisdom of the user community. the systems permit organic evolution and continual improvement by user participation. a folksonomy evolves much the way a species does. unique or single-use tags have minimal social import and do not gain recognition. tags used by more than a few people reinforce their value and emerge as the more robust species.

conclusion

the benefits of the internet are accessible to a wide range of users. the rewards of participation are immediate, social, and exponential in scope. user-generated content and associated organization models support the internet's unique ability to bring together unlikely social relationships that would not necessarily happen in another milieu.
to paraphrase shirky and lessig, people are participating in a moment of social and technological evolution that is altering traditional ways of thinking about information, thereby creating a break from traditional systems. folksonomic classification is part of that break. its utility grows organically as users add tagged content to the system. it is adaptive, and its strengths can be leveraged according to the needs of the group. while there are "folksonomic flaws" inherent in a bottom-up classification system, there is tremendous value in weighting individual voices equally. following the logic of web 2.0 technology, folksonomy will improve according to the input of the users. it is an organizational system that reflects the basic tenets of the emergent internet. it may be the only practical solution in a world of participatory content creation.

shirky describes the internet by saying, "there is no shelf in the digital world."22 classic organizational schemes like the dewey decimal system were created to organize resources prior to the advent of the internet. a hierarchical system was necessary because there was a physical limitation on where a resource could be located; a book can only exist in one place at one time. in the digital world, the shelf is simply not there. material can exist in many different places at once and can be retrieved through many avenues. a broad folksonomy supports a vibrant search strategy. it combines individual user input with that of the group. this relationship creates data sets inherently meaningful to the community of users seeking information on any given topic at any given moment. this is why a folksonomic approach to organizing information on the internet is successful. users are rewarded for their participation, and the system improves because of it. folksonomy mirrors and supports the evolution of the internet.

librarians, trained to be impartial and ethically bound to assure access to information, are the logical mediators among content creators, the architecture of the web, corporate interests, and policy makers. critical conversations are no longer happening only in traditional publications of the print world. they are happening on communication platforms like youtube, twitter, digg, and delicious. information organization is one issue on which librarians can be progressive. dedicated to making information available, librarians are in a unique position to take on challenges raised by the internet. as the profession experiments with the introduction of web 3.0, librarians need to position themselves between what is known and what has yet to evolve. librarians have always leveraged the interests and needs of their users to tailor their services to the individual entry point of every person who enters the library. because more and more resources are accessed via the internet, librarians will have to maintain a presence throughout the web if they are to continue to speak for the informational needs of their users. part of that presence necessitates an ability to adapt current models to the internet. more importantly, it requires recognition of when to forgo conventional service methods in favor of more innovative approaches. working in concert with the early adopters, corporate interests, and general internet users, librarians can promote a successful system for organizing internet resources. for the internet, folksonomic tagging is one solution that will assure users can retrieve the information necessary to answer their queries.

references and notes
1. charles f. thomas and linda s. griffin, "who will create the metadata for the internet?" first monday 3, no. 12 (dec. 1998).
2. web 2.0 is a fairly recent term, although now ubiquitous among people working in and around internet technologies. attributed to a conference held in 2004 between medialive international and o'reilly media, web 2.0 refers to the web as being a platform for harnessing the collective power of internet users interested in creating and sharing ideas and information without mediation from corporate, government, or other hierarchical policy influencers or regulators. web 3.0 is a much more fluid concept as of this writing. there are individuals who use it to refer to a semantic web where information is analyzed or processed by software designed specifically for computers to carry out the currently human-mediated activity of assigning meaning to information on a webpage. there are librarians involved with exploring virtual-world librarianship who refer to the 3d environment as web 3.0. the important point here is that what internet users now know as web 2.0 is in the process of being altered by individuals continually experimenting with and improving upon existing web applications. web 3.0 is the undefined future of the participatory internet.
3. clay shirky, "here comes everybody: the power of organizing without organizations" (presentation videocast, berkman center for internet & society, harvard university, cambridge, mass., 2008), http://cyber.law.harvard.edu/interactive/events/2008/02/shirky (accessed oct. 1, 2008).
4. ibid.
5. lawrence lessig, "early creative commons history, my version," videocast, aug. 11, 2008, lessig 2.0, http://lessig.org/blog/2008/08/early_creative_commons_history.html (accessed aug. 13, 2008).
6. elaine peterson, "beneath the metadata: some philosophical problems with folksonomy," d-lib magazine 12, no. 11 (2006), http://www.dlib.org/dlib/november06/peterson/11peterson.html (accessed sept. 8, 2008).
7. clay shirky, "ontology is overrated: categories, links, and tags," online posting, spring 2005, clay shirky's writings about the internet, http://www.shirky.com/writings/ontology_overrated.html#mind_reading (accessed sept. 8, 2008).
8. gene smith, tagging: people-powered metadata for the social web (berkeley, calif.: new riders, 2008): 68.
9. ibid., 76.
10. thomas vander wal, "folksonomy," online posting, feb. 7, 2007, vanderwal.net, http://www.vanderwal.net/folksonomy.html (accessed aug. 26, 2008).
11. thomas vander wal, "explaining and showing broad and narrow folksonomies," online posting, feb. 21, 2005, personal infocloud, http://www.personalinfocloud.com/2005/02/explaining_and_.html (accessed aug. 29, 2008).
12. shirky, "ontology is overrated."
13. ibid.
14. michael arrington, "exclusive: screen shots and feature overview of delicious 2.0 preview," online posting, june 16, 2005, techcrunch, http://www.techcrunch.com/2007/09/06/exclusive-screen-shots-and-feature-overview-of-delicious-20-preview/ (accessed jan. 6, 2010).
15. smith, tagging, 67–93.
16. vander wal, "explaining and showing broad and narrow folksonomies."
17. adam mathes, "folksonomies—cooperative classification and communication through shared metadata" (graduate paper, university of illinois urbana–champaign, dec. 2004); peterson, "beneath the metadata"; shirky, "ontology is overrated"; thomas and griffin, "who will create the metadata for the internet?"
18. shirky, "ontology is overrated."
19. peterson, "beneath the metadata."
20. cory doctorow, "metacrap: putting the torch to seven straw-men of the meta-utopia," online posting, aug. 26, 2001, the well, http://www.well.com/~doctorow/metacrap.htm (accessed sept. 15, 2008).
21. marieke guy and emma tonkin, "folksonomies: tidying up tags?" d-lib magazine 12, no. 1 (2006), http://www.dlib.org/dlib/january06/guy/01guy.html (accessed sept. 8, 2008).
22. shirky, "ontology is overrated."

cloud computing: case studies and total costs of ownership
yan han
yan han (hany@u.library.arizona.edu) is associate librarian, university of arizona libraries, tucson, arizona.

this paper consists of four major sections: the first section is a literature review of cloud computing and a cost model. the next section focuses on detailed overviews of cloud computing and its levels of services: saas, paas, and iaas. major cloud computing providers are introduced, including amazon web services (aws), microsoft azure, and google app engine. case studies of implementing web applications on iaas and paas using aws, linode, and google appengine are then demonstrated, and justifications for running on an iaas provider (aws) and a paas provider (google appengine) are described. the last section discusses cost and technology analysis comparing cloud computing with locally managed storage and servers. the total cost of ownership (tco) of an aws small instance is significantly lower, but the tco of a typical 10tb space in amazon s3 is significantly higher. since amazon offers lower storage pricing for huge amounts of data, that tco might be lower in some cases. readers should do their own analysis on the tcos.

a 2009 study from ithaka suggested that faculty perceive three traditional functions of a library: (1) gateway, the library as a starting point for locating information for research; (2) buyer, the library as a purchaser of resources; and (3) archive, the library as a repository of resources. the 2009 survey indicates a gradual decline in their perception of the importance of "gateway," no change in "archive," growth in "buyer," and increased importance for two new roles: "teaching support" and "research support."1 to meet customers' needs in these roles, libraries are innovating services, including catalogs and home websites (as "gateway" services), repository and digital library programs (as "archive," "teaching support," and "research support" services), and interlibrary loan (as "buyer" and "research support" services). these services rely on stable and effective it infrastructure to operate. in the past, the growing needs of these web applications increased it expenditures and work complexity. more web applications, more storage, and more it support staff were woven into centralized on-site it infrastructure, along with huge investments in physical servers, networks, and buildings. however, decreasing budgets in libraries have had a huge impact on all aspects of library operations and staffing. web applications running on local, managed servers might be neither technologically effective nor cost efficient. web applications utilizing cloud computing can be much more effective and efficient in some cases.
literature review
there are a growing number of articles related to cloud computing in libraries. chudnov described his personal experience of using the cloud services amazon ec2 and s3 in an informal tone, costing him 50 cents.2 jordan discussed oclc's strategies for building its next generation of services in the cloud and provided a clear view of oclc's future directions.3 mitchell wrote two tutorial articles: one made a case for using the cloud,4 while the other provided more details of moving a library's it infrastructure (ils, website, and digital library systems) to the cloud, along with discussing motivation, results, and evaluation in three areas (quality and stability, impact on library services, and cost).5 on the cost discussion, mitchell mentioned the difficulty of calculating technology total cost of ownership (tco) and cited two papers suggesting minimal cost savings. mitchell suggested the same but did not provide detailed cost information. in comparison, this paper has a detailed cost breakdown analysis for different services, such as web applications and storage. misra and mondal proposed a suitability index and a return on investment (roi) model by taking into consideration impacts and real value.6 their suitability index and roi model is well thought out but considers using the cloud for every aspect of all it operations as a whole. as a result, a company using this model will reach a final conclusion of "suitable," "may or may not be," or "not suitable." however, modular it operations and services (e.g., e-mail and storage) can be evaluated individually, because these services can be easily upgraded or changed with minimal impact on customers. i/o-intensive services and storage-intensive services have different resource requirements, and thus the same evaluation criteria may not give an accurate picture of costs and benefits. for example, storing digital preservation files for libraries is a one-time data-intensive operation. given the different natures of it operations and services, cloud computing may be suitable for some it operations but not for others. healy suggested that many companies did not have a complete financial analysis, missing staff retraining and system management. he listed the following areas for tco: hardware, software, recurring licensing and maintenance, bandwidth, staffing allocation, monitoring, backup, failover, security audit and compliance, integration, training, and speed to implementation.7

the author published his first paper regarding cloud computing in 2010.8 since then, the author has implemented and has been managing multiple web applications and services using iaas and paas providers. several web applications of the university of arizona libraries (ual) have been migrated to the cloud. this paper focuses on enterprise-level applications and services, not individual-level cloud applications such as google docs. the purposes of this article are to
■■ define cloud computing and levels of services;
■■ introduce and compare major cloud computing providers;
■■ provide case studies of running two web applications (dspace and a home-grown java application) utilizing cloud computing, with justification;
■■ provide a comparison of the tco of running web applications on a cloud computing provider versus a locally managed server;
■■ provide a comparison of the tco of 10tb of storage space on a cloud computing provider versus locally managed storage; and
■■ briefly discuss technology advantages of cloud computing.
definition of cloud computing and levels of services
cloud computing services and providers
cloud computing is becoming popular in the it industry. over the past few years, the supply and demand of this new area has seen a huge increase of investment in infrastructure and has been drawing broader use in the united states. the author believes that it has the potential to transform the it industry and it services, shifting the way it infrastructure and hardware are designed, purchased, and managed. many experts have their own version of cloud computing, which was discussed before.9 the national institute of standards and technology (nist) defines cloud computing as "a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction."10 nist also gives three service models layered based on computing infrastructure:
■■ software as a service (saas) allows users to use the cloud computing providers' applications through a thin client interface such as a web browser.11 in the saas model, the cloud computing providers manage almost everything in the cloud infrastructure (e.g., physical servers, network, os, applications). it is directly targeted at general end users. the end users can directly run applications on the clouds and do not need to install, upgrade, and back up applications and their work. typical saas products are google apps and salesforce sales crm.
■■ platform as a service (paas) allows users to deploy their own applications on the provider's cloud infrastructure under the provider's environment, such as programming languages, libraries, and tools.12 in this model, the cloud computing providers manage everything except the application in the cloud infrastructure. paas is directly targeted at general software developers. they can develop, test, and run their code on a paas platform. typical examples of this model include google appengine, windows azure, and joyent.
■■ infrastructure as a service (iaas) allows users to manage processing, storage, networks, and other fundamental computing resources so that they can deploy and run arbitrary software such as operating systems and applications.13 in this model, the providers only manage the underlying physical cloud infrastructure (e.g., physical servers and network) and provide services via virtualization. the users have maximum control of the infrastructure, as if they owned the underlying physical servers and network. leading providers of this model include amazon, linode, rackspace, joyent, and ibm blue cloud.

major cloud computing providers include amazon web services (aws), microsoft windows azure, and google appengine. aws is considered to be an iaas, paas, and saas provider, which offers a collection of multiple computing services through the internet, including a few well-known services such as amazon elastic compute cloud (ec2),14 amazon simple storage service (s3), and amazon simpledb. ec2 started as a public beta in 2006. it allows users to pay for computing resources as they use them. with scalable use of computing resources and attractive pricing models, ec2 is one of the biggest brand names in cloud computing. it offers different os options, including multiple linux distributions, opensolaris, and windows server. ec2 uses xen virtualization; each virtual machine is called an instance. an instance in ec2 has no persistent storage, and data stored will be lost if the instance is terminated. therefore it is typical to use ec2 along with amazon elastic block store (ebs) or s3, which provide persistent storage for ec2 instances. amazon claims that both ebs and s3 are highly available and reliable. a user can create, start, stop, and terminate server instances in multiple geographical locations for the benefits of resource optimization and high availability. for example, a user can start an instance in northern virginia, a mirroring instance in ireland, and another mirroring instance in asia. amazon keeps increasing its offerings by introducing new paas and saas services, such as simpledb, simple e-mail service, and e-commerce.
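the persistence point above (instance storage vanishes at termination; ebs survives) can be illustrated with a short sketch. the article predates today's sdks, so this is a hedged modern equivalent using boto3 rather than anything the author used; the ami id and sizes are made up, and the block device mapping mirrors the /dev/sda1=snap-...:100:false form shown in the article's appendix.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# launch an instance whose root volume is an ebs volume that is kept
# (not deleted) when the instance terminates, so the data persists
resp = ec2.run_instances(
    ImageId="ami-12345678",          # hypothetical ami id
    InstanceType="t3.micro",         # modern stand-in for an ec2 small instance
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 100, "DeleteOnTermination": False},
    }],
)
print(resp["Instances"][0]["InstanceId"])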
google app engine is a paas provider offering a cloud platform for web applications in google's data centers. it was released as a beta version in 2008 but is currently in full service mode. appengine functions like a middle layer, which frees customers from worrying about running oss, modules, and libraries. it currently supports the python and java programming languages and related frameworks, and it is expected to support more languages in the future. google app engine uses bigtable with its gql (a sql-like language). bigtable15 is google's proprietary database, used in multiple google applications such as google earth, google search, and app engine. the design of gql intentionally does not support the "join" statement, for multiple-machine optimization.16 unlike aws, google appengine has a nice feature that allows customers a taste of the platform: it is free of charge up to a certain level of resource use. after that, fees are charged for additional cpu time, bandwidth, and storage.

windows azure also is a paas provider, which runs on microsoft data centers. it provides a new way to run applications and store data in the microsoft way. microsoft customers can install and run applications on the microsoft cloud. customers are provided with two different instance types: web role instances and worker role instances. customers can use a "web role instance" to accept incoming http/https requests using asp.net, windows communication foundation (wcf), or another .net technology working with iis. a "worker role instance" is not associated with iis but functions as a background job. the two instances can be combined to create desired web services. it is clear that windows azure allows non-windows applications to run on the platform; for example, apache web server can be run as a "worker role."17 there also are a few small-to-medium size providers, such as linode.18 table 1 lists major cloud computing providers.

table 1. list of major cloud computing providers
cloud computing provider | layer
akamai | paas, saas
amazon web services | iaas, paas, saas
emc | saas
eucalyptus (open source software) | iaas
google | paas (appengine), saas
ibm | paas, saas
linode | iaas
microsoft | paas (azure), saas
rackspace | iaas, paas, saas
salesforce.com | paas, saas
vmware vcloud | paas, iaas
zoho | saas

the cloud computing providers operate in two business models: variable (pay-for-your-usage) plans and fixed plans. variable plans allow customers to pay only for the resources actually consumed (e.g., instance-hours, data transfer). aws offers a variable plan, and google app engine works in a similar way. google app engine offers two interesting features: daily budgets and free quotas. a daily budget allows customers to control the amount of resources used every day. the free quota is currently set at 6.5 hours of cpu time per day, 1 gb of data in and out per day, and 1 gb of data storage.19 by the end of each month, customers receive a bill listing the number of running hours, the amount of storage used, the size of data transfers, and other add-on services. linode only offers a fixed plan. the charge is based on the amount of ram, data storage, and data transfer, assuming an instance is always running. for example, the smallest instance has 360mb ram, 16gb storage, and 200gb transfer, and the cost is $19.95 per month.20 customers pay up front.
open-source cloud computing software and private cloud
cloud computing also goes open source if any person or organization wants to set up their own cloud. eucalyptus is an open-source cloud computing system developed by the university of california at santa barbara. some of its eye-catching features include full compatibility with the amazon ec2 public infrastructure and multiple hypervisors, which allows different virtual machines (e.g., xen, kvm, vsphere) to run on one platform.21 its open-source company, eucalyptus systems, provides technical support to end users. building a cloud infrastructure on top of other clouds is also possible and might be desirable in certain situations. current linux distributions work with eucalyptus to provide private cloud services such as ubuntu enterprise cloud and red hat's deltacloud. some organizations have been setting up private clouds to utilize the advantages of cloud computing. the private cloud eases concerns raised about the public cloud, such as security of data, control of data, and legal issues. for example, an institution can build its own cloud infrastructure using eucalyptus (or ubuntu cloud) with its own computing resources, or simply using amazon aws. the private cloud computing service becomes a customizable pool of cloud computing resources which can be configured and reconfigured as needed. why is this valuable? in traditional computing approaches, servers, storage, and networking equipment are purchased, configured, and then used without significant changes for three to five years until their lives end. in this case, some planning must be scheduled ahead of time, thinking of computing resource needs three to five years out. it is certain that additional resources (e.g., ram, hard disks, cpu) will be reserved for future needs and are currently wasted. the private cloud reduces concerns regarding security and data control. however, one must still buy, build, and manage the private cloud, increasing tco and reducing the cost benefit.

case studies: applications on the cloud
case study 1: dspace implementation and analysis
many libraries are running their institutional repositories on locally managed servers. ual has been running its repositories since 2004 as one of the earliest dspace adopters. one of the dspace instances was tested on the cloud in january 2010 after comparing costs and support. later the author chose to run a production dspace in aws starting march 2010. the repository (http://www.afghandata.org/) currently holds 1,800 titles of digitized unique afghan materials. since then, several content and system updates have been applied.

cloud computing provider selection and implementation
a typical dspace instance requires java and related libraries, a j2ee environment, and postgresql as the database backend. three cloud computing providers were evaluated: aws, linode, and google appengine. two instances were successfully installed and configured in aws and linode after a few days of testing. building a dspace instance on the cloud is the same process as running it locally, except that it is much quicker to build, restart, rebuild, and back up. for example, an initial os installation on a traditional server will take a few hours, compared to a few minutes for the same task using an iaas provider. installation on aws ec2 and linode is almost the same except for creating a login and setting up security policies. to log on to aws, command line tools using an x.509 certificate with a public/private key are the default. a generated keypair is required to ssh to an instance, and no password ssh option is provided. in addition, appropriate "security groups" are required to be set up to enable network protocols. in this case, protocols such as ssh and http along with the typical port numbers 80 and 8080 must be enabled. activities such as managing instances, creating images, and setting up security policies can be done through the aws web interface (see figure 1). steps and commands for running regular operations can be found in the appendix.

figure 1. amazon aws management console

in linode, using "root" to log on is allowed. users do not need to set network and security policies, as protocols and ports are already open. in system administration practice, running applications without enforcing security policies does present security risks to applications and systems. linode allows users to set up security policies. the author decided not to proceed with installation in google appengine because of its proprietary database gql; if implemented in google appengine, the work of modifying sql-style code would have been significant. the author has a monthly bill of $40 using an aws small instance.
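the "security groups" step above (opening ssh and http on ports 22, 80, and 8080) was done in the article with the old ec2 command line tools and web console. as a hedged modern sketch, the same policy in boto3 looks roughly like this; the group name and the open-to-the-world cidr are illustrative assumptions, not the author's settings.

import boto3

ec2 = boto3.client("ec2")

# create a security group and open the ports named in the text
ec2.create_security_group(GroupName="dspace-web", Description="dspace ports")
ec2.authorize_security_group_ingress(
    GroupName="dspace-web",
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": p, "ToPort": p,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}   # wide open, for illustration only
        for p in (22, 80, 8080)
    ],
)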
case study 2: japanese gif holding library finder application
the author helped the north american coordinating council on japanese library resources (ncc) develop and maintain a web service to identify japanese global ill framework (gif) libraries to facilitate interlibrary loan (ill) service. the application was developed in java using the j2ee framework and runs in a typical java servlet container such as tomcat. the application was initially operated on a small, locally managed server and was migrated to linode and google appengine in may 2010.

cloud computing provider selection and implementation
unlike case 1, the author tested and installed the application on aws, linode, and google appengine. aws and linode are iaas providers, which give users greater control over virtual nodes on their cloud infrastructure. google appengine might be a better choice when applications run in normal os environments, because system administration tasks can be completed by paas providers, saving users' time and resources. as a paas provider, google maintains its infrastructure environment, such as os, programming languages, and tools. installing the application in google appengine can go through an eclipse plug-in or through command lines. in this case, the gif application is a simple system written in java without any database transactions, so google app engine's proprietary gql database is not a barrier. however, users should be aware that google appengine has other unique features. for example,
currently google appengine only allows users to run code in python and java, and it uses its own database query language, gql. this creates an extra step for developers who are willing to migrate existing code to google: existing sql queries have to be rewritten.
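to make the rewrite concrete: gql has no join, so a relational query has to be restructured around references and follow-up fetches. the gif application was java, but the python form of classic app engine's gql shows the same idea in fewer lines; the model and field names here are hypothetical, and the snippet only runs inside the classic app engine runtime.

from google.appengine.ext import db

class Library(db.Model):
    name = db.StringProperty()

class Holding(db.Model):
    library = db.ReferenceProperty(Library)  # a reference stands in for a foreign key
    oclc_number = db.StringProperty()

# sql on a relational backend might join:
#   select l.name from library l join holding h on h.library_id = l.id
#   where h.oclc_number = '12345'
# gql cannot join, so the query is split: fetch holdings, then follow references
holdings = db.GqlQuery("SELECT * FROM Holding WHERE oclc_number = :1", "12345")
names = [h.library.name for h in holdings]  # dereferencing triggers the second fetch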
in addition, other limitations with google app engine include allowing only a subset of the jre standard edition, and users are unable to create new threads.22 the cost of running the application on google app engine is excellent, because google app engine is free of charge up to its free quota. google identified that 90 percent of applications were hosted free.23 this is a great paas resource for small web applications.

applications on the cloud
since 2009, the author has been running multiple web applications and services on multiple iaas and paas providers and has been very happy with the services and overall costs. the running applications and services are listed in table 2.

table 2. some ual web applications and cloud computing service providers
computing infrastructure | functions | applications | computing environment | instances | service providers
data storage | data storage | n/a | linux / windows | data storage using ebs or s3 | aws
access | digital repository | dspace | j2ee, java, tomcat, postgresql | afghanistan digital collections | aws, linode
access | content management system | joomla | linux, apache, php, mysql | afghanistan digital libraries | aws, linode
access | website | html | html | sonoran desert knowledge exchange | aws, linode
access | integrated library system | koha | linux, apache, perl, mysql | afghanistan higher education union catalog | aws, linode
web applications | home-grown j2ee web application | | j2ee, java, tomcat | japanese gif (global interlibrary-loan) holding finder, at linode and at google app engine | aws, linode, google app engine
computing services | monitoring | nagios | linux, perl | internal application | aws, linode
networked devices | administration | ssh, sftp | linux | n/a | aws, linode

analysis and discussions
cost analysis
running applications on the cloud gives many technical advantages and results in significant cost savings over running them on locally managed servers. in this section, the author presents detailed cost comparisons between virtual managed nodes in cloud computing and locally managed storage and servers in the traditional model. the cost savings and low barriers to launching web services using the cloud are significant when considering easy start-up, scalability, and flexibility. one of the biggest advantages of cloud computing lies in its on-demand nature, allowing users to start applications with minimal cost. the current cost of starting an instance on aws is $0.03 per hour if reserved. above the clouds: a berkeley view of cloud computing cites a comparison: "it costs $2.56 to rent $2 worth of cpu" and "costs are $6.00 when purchasing vs. $1.20–$1.50 per month on s3."24 clearly healy made a good case for calculating the tco.25 in the cases below, readers should be aware of the following assumptions:
■■ software, training, licensing, and maintenance costs are the same, assuming the same software environment on the local managed infrastructure and on the cloud.
■■ monitoring costs are the same, based on the fact that monitoring software has to be hosted somewhere.
■■ bandwidth and network costs are ignored.
■■ security audit and compliance are ignored, assuming all data are open.

the author runs an instance with 100gb in aws, and the monthly bill for this node is around $40. in comparison, if running a locally managed server, a physical server would have been purchased. in our case, a comparison of tco shows that the cloud computing model has a significant 50 percent cost saving, assuming a server life expectancy of five years.

■■ the tco of a physical server comparable to an aws small instance for 5 years: $5,858–$7,608.
■❏ an aws small instance is roughly 50 percent of the computing power of the server quoted (the tco here is calculated as 50 percent of $11,715–$15,215). the instance's capacity can be found on aws, and cpu power can be evaluated by using /proc/cpuinfo. amazon indicated that "one ec2 compute unit provides the equivalent cpu capacity of a 1.0–1.2 ghz 2007 opteron or 2007 xeon processor."26
■❏ hardware: $4,525.
■● $4,525 = $2,658 (server) + $1,125 (3-year support) + $1,125 x 2/3 (additional 2-year support). note: dell poweredge server: intel xeon e5630 2.53ghz with 5-year support for mission-critical 6-hour repair (source: dell.com, quoted on oct. 20, 2010).
■❏ operation expense: $7,190–$10,690, ignoring downtime and failure expenses, insurance cost, technology training, and backup process.
■● system administrator cost: $3,500–$7,000 = 5 years x 1–2 percent time x ($50,000 salary + $50,000 x 40 percent benefits). 1–2 percent time is about 5–10 minutes per day, assuming this administrator works 8 hours per day, 5 days per week at 100 percent capacity.
■● space cost: $1,500. the space cost for a book in ual is $2.80 per year; a physical server is estimated at $300 per year for space.
■● electricity cost: $2,190 = 5 years x 365 days/year x 24 hours/day x 0.5 kilowatt x $0.10/kilowatt-hour.
■■ the tco of an aws small instance for 5 years: $2,750–$3,750.
■❏ hardware: $0.
■❏ operation expense: $2,750–$3,750.
■● system administrator cost: $0–$1,000. by eliminating physical infrastructure, there is no need or minimal cost to manage a server.
■● $2,750 = $350 (aws initial subscription fee) + $40/month x 12 months x 5 years.
most libraries running digital library programs require big storage for preserving digitization files. the analysis below illustrates a comparison of the tco of 10tb of space. it shows that locally managed storage has a lower tco than amazon s3 storage. though the cloud computing model still has the advantages of on-demand use and avoiding a big initial investment in equipment, the author believes that locally managed storage may be a better solution if planned well. since amazon s3 storage pricing decreases from $0.14/gb to $0.095/gb over 500tb, amazon s3's tco might be lower if an organization has huge amounts of data. the author suggests readers do their own analysis.

■■ the tco of 10tb in amazon s3 per year: $16,800. note: amazon s3 replicates data at least 3 times; this assumes these preservation files do not need constant changes. otherwise, data transfer fees could be high.
■❏ operation expense: $16,800 per year.
■● $16,800 = $1,400/month x 12 months (based on amazon s3 pricing of $0.14/gb per month).
■● network cost ignored.
■■ the tco of a 10tb physical storage per year: $11,212–$12,612.
■❏ to match the reliability of amazon s3, locally managed storage needs three copies of the data: two on hard disk and one on tape. note: dell ax4–5i san storage, quoted on october 26, 2010; replicate data 3 times, including 2 copies on hard disks and one copy on tape; ignoring the time value of money and 3 percent inflation per year based on cpi statistical data.
■❏ hardware: $4,168 per year.
■● $20,840 for a san storage that includes 12tb of hard disks (about 10tb of usable space after raid 5 configuration) with 5-year support, assuming a 5-year life expectancy.
■❏ operation expense: $1,438–$2,138 per year.
■● system administrator cost: $700–$1,200. see above.
■● space cost: $300. see above.
■● electricity costs: $438 per year. see above.
■● network cost ignored.
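the tco arithmetic above is easy to check; the following few lines of python simply recompute the article's own figures (a sanity check, not new data).

YEARS = 5

# aws small instance: $350 initial subscription + $40/month for 5 years
aws_small_tco = 350 + 40 * 12 * YEARS               # 2750

# electricity for a local 0.5 kw server running 24/7 at $0.10/kwh
electricity = YEARS * 365 * 24 * 0.5 * 0.10         # 2190.0

# amazon s3: 10tb (about 10,000 gb) at $0.14/gb per month
s3_per_year = 0.14 * 10_000 * 12                    # 16800.0

print(aws_small_tco, electricity, s3_per_year)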
technology analysis
there is no need to purchase a server, no need to initialize a cloud node, no need to set up security policies, no need to install tomcat, java, and the j2ee environment, and no need to update software. compared to the traditional approach, paas eliminates upfront hardware and software investment, reduces the time and work of setting up a running environment, and removes hardware and software upgrade and maintenance tasks. iaas eliminates upfront hardware investment, along with the other technical advantages discussed below.

the cloud computing model offers much better scalability than the traditional model due to its flexibility and lower cost. in our repository, the initial storage requirement is not significant but can grow over time as more digital collections are added. in addition, the number of visits is not high but can increase significantly later. an accurate estimate of both factors can be difficult. in the traditional model, a purchased server has preconfigured hardware with limited storage. upgrading storage and processing power can be costly and problematic, and downtime will be certain during the upgrade process. in comparison, the cloud computing model provides an easy way to upgrade storage and processing power with no downtime if handled well. bigger storage and larger instances with high memory or high cpu can be added or removed to meet users' needs at will. rebuilding nodes and creating images are also easier on the cloud.

server failure resulting from hardware error can cause significant downtime. ual has had a few server failures in the past few years. last year a server's raid hard drives failed. the time spent ordering new hard disks, waiting for the server company technician's arrival, and finally rebuilding the software environment (e.g., os, web servers, application servers, user and group privileges) took six or more hours, not to mention the stress rising among customers due to the unavailability of services. mirroring servers could minimize service downtime, but the cost would be almost doubled. in comparison, in the cloud computing model, the author took a few snapshots using the aws web management interface; if a node fails, the author can launch an instance from a snapshot within a minute or two. factors such as software and hardware failure, natural disasters, network failure, and human error are the main causes of system downtime. the cloud computing providers generally have multiple data centers in different regions. for instance, amazon s3 and google appengine are claimed to be highly available and highly reliable, and both aws and google app engine offer automatic scaling and load balancing. the cloud computing providers have huge advantages in offering high availability to minimize hardware failure, natural disasters, network failure, and human error, while the locally managed server and storage approach requires heavy investment to reduce these risks. in 2009 and 2010 the university of arizona experienced at least two network and server outages, each lasting a few hours; one failure was because of human error and the other was because of a power failure from tucson electric power. when a power line is cut by accident, what can you do? in comparison, over the past two years minimal downtime from the cloud computing providers was reported.
there are some issues when implementing cloud computing. above the clouds: a berkeley view of cloud computing discusses ten obstacles and related opportunities for cloud computing.27 all of these obstacles and opportunities are technical. the author's first paper on this topic also discusses legal jurisdiction issues when considering cloud computing.28 users should be aware of these potential issues when making a decision on adopting the cloud.

summary
this paper starts with a literature review of articles on cloud computing, some of them describing how libraries are incorporating and evaluating the cloud. the author introduces a definition of cloud computing, identifies three levels of services (saas, paas, and iaas), and provides an overview of major players such as amazon, microsoft, and google. open-source cloud software and how the private cloud helps are discussed. the author then presents case studies using different cloud computing providers: case 1 using an iaas provider (amazon) and case 2 using a paas provider (google). in case 1, the author justifies the implementation of dspace on aws. in case 2, the author discusses the advantages and pitfalls of paas and demonstrates a small web application hosted in google appengine. a detailed analysis of the tcos comparing aws with locally managed storage and servers is presented. the analysis shows that cloud computing has technical advantages and offers significant cost savings when serving web applications. shifting web applications to the cloud provides several technical advantages over locally managed servers: high availability, flexibility, and cost-effectiveness are some of the most important benefits. however, locally managed storage is still an attractive solution in a typical case of 10tb of storage. since amazon offers lower storage pricing for huge amounts of data, readers are recommended to do their own analysis on the tcos.

references
1. roger c. schonfeld and ross housewright, faculty survey 2009: key strategic insights for libraries, publishers, and societies, 2010, http://www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000–2009/faculty-survey-2009 (accessed apr. 20, 2010).
2. daniel chudnov, "a view from the clouds," computers in libraries 30, no. 3 (2010): 33–35.
3. jay jordan, "climbing out of the box and into the cloud: building webscale for libraries," journal of library administration 51, no. 1 (2011): 3–17.
4. erik mitchell, "cloud computing and your library," journal of web librarianship 4, no. 1 (2010): 83–86.
5. erik mitchell, "using cloud services for library it infrastructure," code4lib journal 9 (2010), http://journal.code4lib.org/articles/2510 (accessed feb. 10, 2011).
6. subhas c. misra and arka mondal, "identification of a company's suitability for the adoption of cloud computing and modelling its corresponding return on investment," mathematical & computer modelling 53 (2011): 504–21, doi: 10.1016/j.mcm.2010.03.037.
7. michael healy, "beyond cya as a service," information week 1288 (2011): 24–26.
8. yan han, "on the clouds: a new way of computing," information technology & libraries 29, no. 2 (2010): 88–93.
9. ibid.
10. peter mell and tim grance, the nist definition of cloud computing, nist, http://csrc.nist.gov/groups/sns/cloud-computing/ (accessed oct. 21, 2010).
11. ibid.
12. ibid.
13. ibid.
14. amazon, amazon elastic compute cloud (amazon ec2), 2010, http://aws.amazon.com/ec2/ (accessed oct. 21, 2010).
15. fay chang et al., "bigtable: a distributed storage system for structured data," in 7th symposium on operating systems design and implementation (osdi '06), nov. 6–8, 2006, seattle, wash., https://www.usenix.org/events/osdi06/tech/chang/chang_html/ (accessed apr. 21, 2010).
16. google, "gql reference," 2010, http://code.google.com/appengine/docs/python/datastore/gqlreference.html (accessed apr. 21, 2010); google developers, "campfire one: introducing google app engine (pt. 3)," 2010, http://www.youtube.com/watch?v=og6ac7dnx8 (accessed apr. 21, 2010).
17. david chappell, "introducing windows azure," 2009, http://download.microsoft.com/download/e/4/3/e43bb484-3b52-4fa8-a9f9-ec60a32954bc/azure_services_platform.pdf (accessed apr. 2, 2010).
18. linode, "linode—xen vps hosting," 2010, http://www.linode.com/ (accessed apr. 7, 2010).
19. google, "quotas—google app engine," 2010, http://code.google.com/appengine/docs/quotas.html (accessed oct. 21, 2010).
20. jay jordan, "climbing out of the box and into the cloud: building webscale for libraries," journal of library administration 51, no. 1 (2011): 3–17.
21. daniel nurmi et al., "the eucalyptus open-source cloud-computing system," in 9th ieee/acm international symposium on cluster computing and the grid, 2009, doi: 10.1109/ccgrid.2009.93.
22. google, "the jre white list—google app engine—google code," 2010, http://code.google.com/appengine/docs/java/jrewhitelist.html (accessed apr. 9, 2010); google, "the java servlet environment," 2010, http://code.google.com/appengine/docs/java/runtime.html (accessed apr. 9, 2010).
23. google, "changing quotas to keep most apps serving free," 2009, http://googleappengine.blogspot.com/2009/06/changing-quotas-to-keep-most-apps.html (accessed oct. 21, 2010).
24. michael armbrust et al., above the clouds: a berkeley view of cloud computing (eecs department, university of california, berkeley: reliable adaptive distributed systems laboratory, 2009), http://www.eecs.berkeley.edu/pubs/techrpts/2009/eecs-2009-28.html (accessed july 1, 2009).
25. amazon, "amazon ec2 pricing," 2010, http://aws.amazon.com/ec2/pricing/ (accessed feb. 20, 2010).
26. michael healy, "beyond cya as a service," information week 1288 (2011): 24–26.
27. erik mitchell, "cloud computing and your library," journal of web librarianship 4, no. 1 (2010): 83–86.
28. michael armbrust et al., above the clouds: a berkeley view of cloud computing (eecs department, university of california, berkeley: reliable adaptive distributed systems laboratory, 2009), http://www.eecs.berkeley.edu/pubs/techrpts/2009/eecs-2009-28.html (accessed july 1, 2009).
29. yan han, "on the clouds: a new way of computing," information technology & libraries 29, no. 2 (2010): 88–93.

appendix. running instances on amazon ec2
task 1: building a new dspace instance
■■ build a clean os: select an amazon machine image (ami) such as ubuntu 9.2 to get up and running in a minute or two.
■■ install required modules and packages: install java, tomcat, postgresql, and mail servers.
■■ configure security and network access on the node.
■■ install and configure dspace: install the system and configure the configuration files.
task 2: reloading a new dspace instance
■■ create a snapshot of the current node with the ebs if desired: use aws's management tools to create a snapshot.
■■ register the snapshot using aws's management tools and write down the snapshot id; specify the kernel and ramdisk.
command: ec2-register registers the ami specified in the manifest file and generates a new ami id (see the amazon ec2 documentation). example: ec2-register -s snap-12345 -a i386 -d "description of ami" -n "name-of-image" --kernel aki-12345 --ramdisk ari-12345
■■ in the future, a new instance can be started from this snapshot image in less than a minute. command: ec2-run-instances launches one or more instances of the specified ami (see the amazon ec2 documentation). example: ec2-run-instances ami-a553bfcc -k keypair2 -b /dev/sda1=snap-c3fcd5aa:100:false
task 3: increasing storage size of current instance
■■ to create an instance with the desired persistent storage (e.g., 100 gb): command: ec2-run-instances launches one or more instances of the specified ami (see the amazon ec2 documentation). example: ec2-run-instances ami-54321 -k ec2-key1 -b /dev/sda1=snap-12345:100:false
■■ if you boot up an instance based on one of these amis with the default volume size, once it's started up you can do an online resize of the file system. command: resize2fs, the ext2 file system resizer. example: resize2fs /dev/sda1
task 4: backup
■■ go to the aws web interface and navigate to the "instances" panel.
■■ select our instance and then choose "create image (ebs ami)."
■■ this newly created ami will be a snapshot of our system in its current state.

tutorial
delivering information to students 24/7 with camtasia
kathleen carlson
kathleen carlson (kathleen.carlson@asu.edu) is health sciences librarian, information commons library, arizona state university, downtown phoenix campus.

this article examines the selection process for and use of camtasia studio software, a screen video capture program created by techsmith. the camtasia studio software allows the author to create streaming videos which give students 24-hour access to instruction on any topic, including how to order books through interlibrary loan.

how does one engage students in the library research process? in my brief time at the downtown phoenix campus library of arizona state university (asu) i have found a software program that allows librarians to bring the classroom to the student. screen capture programs allow you to create presentations and publish them for students to view on their own time. instead of telling students how to do something, we need to show them.1 recent studies show there are numerous benefits to using streaming video in higher education. students who receive streaming video instruction as well as traditional instruction show dramatic improvement in class.2 this article takes a look at how i selected one software program and created a streaming video using the application.

i examined three software applications that help create video tutorials and presentations: cam studio, macromedia's captivate, and techsmith's camtasia studio. i first experimented with cam studio, which is open-source software. there are limitations to what you can do with software that is free: the screen size is too small, and the file size it can create is limited. macromedia's captivate is good if you want to create a series of screenshots with accompanying audio. i did not choose this streaming video program because i was unsure of the software's capability, and i had no one to provide technical support. the third choice, techsmith's camtasia studio, was the software i selected. there were several reasons why i preferred this software. i had more familiarity with it, and the software is very easy to load and is user friendly.
it also has the ability to record a video of everything that is happening on your computer screen.3 another reason i selected camtasia studio was the availability of an asu software technician who had experience editing the streaming video. most users view camtasia's video through adobe flash, but the program also can produce windows media, quicktime, dvd-ready avi, ipod, iphone, realmedia, mp3, web, cd, blog, and animated gif formats.4 camtasia performs screen captures in real time. you are able to simultaneously use slideshow software, navigate to a website, and narrate step-by-step instructions.

the full version of camtasia studio runs around $300. in addition to the software program, you also must have a combination headset and microphone. a stick microphone will work, but the combination headset will help eliminate noise that can be picked up by a stick microphone. i purchased a logitech extreme pc gaming headset for about $20. when you purchase the camtasia license online at http://www.techsmith.com/, the customer service department will e-mail you the access code along with a link from which you can download the software. the cd-rom loaded with the camtasia software arrives about ten days later.

my first camtasia studio project was a tutorial on how to use the university's interlibrary loan system. here are the basic steps i took to create a streaming video:
1. preproduction. this involves the creation of a script.
2. production. the actual capturing of the video and audio content. have all websites and programs open and minimized at the bottom of the screen in order to easily select them during the video capturing.
3. postproduction. this is the most time-consuming step and involves editing the video and compressing the file for delivery to users.
4. publishing. posting the video to a web server and assessing the material's success.
to see the full 3-minute, 53-second streaming video "how to order an article that asu does not own," go to http://www.asu.edu/lib/tutorials/illiad/index.html.

implementing camtasia studio
once camtasia studio is installed on your computer, double click on the camtasia studio icon. it will bring up a welcome window where you can select from the following (see figure 1):
■ start a new project by recording the screen
■ start recording a powerpoint presentation
■ start a new project by importing media files
■ open an existing project
i have selected "start a new project by recording the screen." on the left-hand menu there is a task list, and you can select one of the following (see figure 2):
■ record the screen
■ record the powerpoint
i have selected "record the screen." this will bring up a window, "new recording wizard screen recording setup." it asks you what you would like to record (see figure 3):
■ region of the screen
■ specific window
■ entire screen
i have selected "entire screen." when you click on the "next" button, it brings up a recording options window (see figure 4).
select from the following:
■ record audio
■ record camera
i have selected "record audio while recording the screen." next you see a window that lets you choose audio settings from the following (see figure 5):
■ microphone
■ speaker audio
■ microphone and speaker audio
■ manual input selection
i have selected "microphone" (see figure 6). the next window is titled "tune volume input levels." use the input level lever to set the audio input level (see figure 7).

figure 1. welcome screen and what do you want to do?
figure 2. record the screen
figure 3. screen recording setup
figure 4. recording options
figure 5. choose audio settings

the "begin recording" window appears, which includes instructions on how to start and stop recording. you have the choice of clicking the "record" button on camtasia recorder or pressing the f9 key to start recording. to stop, click the "stop" button on camtasia recorder or press the f10 key (see figure 8). finally, click on either "record the screen" or "record powerpoint." to view your streaming video, click on the saved icon where it says clip bin, or go to the camtasia toolbar and click on view, then click on clip bin, then click on thumbnails. that's all there is to it.

summary
i found camtasia studio to be very user friendly, although i cannot emphasize enough how important it is for librarians to collaborate with their it staff. this software enables you to bring the classroom to the student when they need it. you may have instructed a class on library research, but many of these students may have already forgotten where to begin. streaming video allows students to access presentations 24/7. here is a checklist of things to think about when selecting software:
■ what do you want to accomplish with the software?
■ what kind of access are you trying to give?
■ do you want audio, video, or both?
■ is it easy for the student to access and understand?
■ have you researched the software to make sure it meets your needs?
■ how much money do you want to spend?
■ what additional equipment is necessary?
finally, and most importantly, work with your it staff on all phases of your project. by developing a collaborative relationship with them you will have fewer bumps in the road. use your imagination: the sky is the limit.

references
1. diane murley, "tools for creating video tutorials," law library journal 99, no. 4 (2007).
2. ron reed, "streaming technology improves achievement: study shows the use of standards-based video content, powered by new internet technology application, increases student achievement," t.h.e. journal 30, no. 7 (2003).
3. christopher cox, "from cameras to camtasia: streaming media without the stress," internet reference services quarterly 9, no. 3/4 (2004).
4. john d. clark and qinghua kou, "captivate/camtasia," journal of the medical library association 96, no. 1 (2008), http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2212324 (accessed june 24, 2009).

figure 6. audio volume levels
figure 7. begin recording
figure 8. camtasia recorder

public library computer waiting queues: alternatives to the first-come-first-served strategy
stuart williamson
stuart williamson (swilliamson@metrolibrary.org) is researcher, metropolitan library system, oklahoma city, oklahoma.

abstract
this paper summarizes the results of a simulation of alternative queuing strategies for a public library computer sign-up system.
using computer usage data gathered from a public library, the performance of these various queuing strategies is compared in terms of the distribution of user wait times. the consequences of partitioning a pool of public computers are illustrated, as are the potential benefits of prioritizing users in the waiting queue according to the amount of computer time they desire.

introduction
many of us at public libraries are all too familiar with the scene: a crowd of customers huddled around the library entrance in the morning, anxiously waiting for the doors to open to begin a race for the computers. from this point on, the wait for a computer at some libraries, such as the one we will examine, can hover near thirty minutes on busy days and peak at an hour or more. such long waiting times are a common source of frustration for both customers and staff. by far the most effective solution to this problem is to install more public computers at your library. of course, when the space or money runs out, this may no longer be possible. another approach is to reduce the length or number of sessions each customer is allowed. unfortunately, reducing session length can make completion of many important tasks difficult, whereas restricting the number of sessions per day can result in customers upset over being unable to use idle computers.1 finally, faced with daunting wait times, libraries eager to make their computers accessible to more people may be tempted to partition their waiting queue by installing separate fifteen-minute "express" computers. a primary focus of this paper is to illustrate how partitioning the pool of public computers can significantly increase waiting times. additionally, several alternative queuing strategies are presented for providing express-like computer access without increasing overall waiting times.

we often take for granted the notion that first-come-first-served (fcfs) is a basic principle of fairness. "i was here first" is an intuitive claim that we understand from an early age. however, the inefficiency present in a strictly fcfs queue is implicitly acknowledged when we courteously invite a person with only a few items to bypass our overflowing grocery cart to proceed ahead in the check-out line. most of us would agree to wait an additional few minutes rather than delay someone else for a much greater length of time. when express lanes are present, they formalize this process by essentially allowing customers needing help for only a short period of time to cut in line. these line cuts are masked by the establishment of separate dedicated lines, i.e., the queue is partitioned into express and non-express lines. one question addressed by this article is "is there a middle ground?" in other words, how might a library system set up its computer waiting queue to achieve express-lane type service without splitting the set of public internet computers into partitions that operate separately and in parallel? several such strategies are presented here, along with the results of how each performed in a computer simulation using actual customer usage data from a public library.

strategies
queuing systems are heavily researched in a number of disciplines, particularly computer science and operations research.
the complexity and sheer number of different queuing models can present a formidable barrier to library professionals. this is because, in the absence of real-world data, it is often necessary to analyze a queuing system mathematically by approximating its key features with an applicable probability distribution. unfortunately, applying these distributions entails adopting their underlying assumptions as well as any additional assumptions involved in calculating the input parameters. for instance, the poisson distribution (used to approximate customer arrival rates) requires that the expected arrival rate be uniform across all time intervals, an assumption which is clearly violated when school lets out and teenagers suddenly swarm the computers.2 even if we can account for such discrepancies, there remains the difficulty of estimating the correct arrival rate parameter for each discrete time interval being analyzed. fortunately, many libraries now use automated computer sign-up systems which provide access to vast amounts of real-world data. with realistic data, it is possible to simulate various queuing strategies, a few of which will be analyzed in this article. a computer simulation using real-world data provides a good picture of the practical implications of any queuing strategy we care to devise, without the need for complex models.

as is often the case, designing a waiting queue strategy involves striking a balance among competing factors. for instance, one way of reducing waiting times involves breaking with the fcfs rule and allowing users in one category to cut in front of other users. how many cuts are acceptable? does the shorter wait time for users in one category justify the longer waits in another? there are no right answers to these questions. while simulating a strategy can provide a realistic picture of its results in terms of waiting times, evaluating which strategy's results are preferable for a particular library must be done on a case-by-case basis.

in addition to the standard fcfs strategy with a single pool of computers and the same fcfs strategy implemented with one computer removed from the pool to serve as a dedicated fifteen-minute express computer (referred to as fcfs-15), we will consider for comparison three other well-known alternative queuing strategies: shortest-job-first (sjf), highest-response-ratio-next (hrrn), and a variant of shortest-job-first (sjf-fb) which employs a feedback mechanism to restrict the number of times a given user may be bypassed in the queue.3 the three alternative strategies all require advance knowledge or estimation of how long each particular computer session will last. in our case, this means customers would need to indicate how long a session they desire upon first signing up for a computer. rather than accepting any number of minutes, we will limit the sign-up options to four categories at fifteen-minute intervals: fifteen minutes, thirty minutes, forty-five minutes, and sixty minutes. each session will then be initially categorized into one of four priority classes (p1, p2, p3, and p4) accordingly. as the data will show, customers selecting shorter sessions are given a higher priority in the queue and will thus have a shorter expected waiting time. it should be noted that relying on users to choose their own session length presents its own set of problems. it is often difficult to estimate how much time will be required to accomplish a given set of tasks online.
however, users face a similar difficulty in deciding whether to opt for a dedicated fifteen-minute computer under the fcfs-15 system. the trade-off between use time and wait time should provide an incentive for some users to self-ration their computer use, placing additional downward pressure on wait times. however, user adaptations in response to various queuing strategies are outside the scope of this analysis and will not be considered further.

the shortest-job-first (sjf) strategy functions by simply selecting from the queue the user in the highest priority class. the amount of time spent waiting by each user is considered only as a tie breaker among users occupying the same priority class. our results demonstrate that the sjf strategy is generally best for minimizing overall average waiting time as well as for getting customers needing the least amount of computer time online the fastest. the main drawbacks of this strategy are that these gains come at the expense of more line cuts and higher average and maximum waiting times for the lowest priority users, those needing the longest sessions (sixty minutes). there is no limit to how many times a user can be passed over in the queue; in theory, such a user could be continually bypassed and never be assigned a computer during the day.

the sjf-fb strategy is a variant of sjf with the addition of a feedback mechanism that increases the priority of users each time they are cut in line. for instance, if a user signs up for a sixty-minute session, he or she is initially assigned a priority of 4. suppose that shortly after, another user signs up for a thirty-minute session and is assigned a priority of 2. the next available computer will be assigned to the user with priority 2, and the bypassed user's priority will then be bumped up by a set interval. in this simulation an interval of 0.5 is used, so the bypassed user's new priority becomes 3.5. as a result, users beginning with a priority of 4 will reach the highest priority of 1 after being bypassed six times and will not be bypassed further. this effectively caps the number of times a user can be cut in front of at six.

the final alternative strategy, highest-response-ratio-next (hrrn), is a balance between fcfs and sjf. it considers both the arrival time and the requested session length when assigning a priority to each user in the queue. each time a user is selected from the queue, the response ratio is recalculated for all users; the user with the highest response ratio is selected and assigned the open computer. the formula for the response ratio is

    response ratio = (waiting time + requested session length) / requested session length

this allows users with a shorter session request to cut in line, but only up to a point. even customers requesting the longest possible session move up in priority as they wait, just at a slower pace. this method produces the same benefits and drawbacks as the sjf strategy, but the effects of both are moderated, and the possibility of unbounded waiting is eliminated. still, although the expected number of cuts will be lower using hrrn than with sjf, there is no limit on how many times a user may be passed over in the queue.
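to make the three selection rules concrete, here is a minimal sketch in python; it is our own illustration, not code from the study. the priority mapping, the 0.5 bump interval, and the cap at priority 1 follow the text above, while the data structure (a list of dicts with arrival, session, and priority fields, all times in the same unit) and the tie-breaking details are assumptions.

    # illustrative sketch of the three non-fcfs selection rules (not the
    # author's code). each queued user is a dict with "arrival" (sign-up
    # time), "session" (requested session length), and "priority", where
    # priority is set at sign-up from the requested minutes.
    PRIORITY = {15: 1, 30: 2, 45: 3, 60: 4}   # requested minutes -> class
    BUMP = 0.5                                # sjf-fb feedback interval

    def select_sjf(queue, now):
        # shortest-job-first: the best (lowest) priority class wins;
        # waiting time breaks ties only within a class.
        return min(queue, key=lambda u: (u["priority"], u["arrival"]))

    def select_sjf_fb(queue, now):
        # sjf with feedback: after each selection, every bypassed user's
        # priority improves by 0.5, so a priority-4 user reaches 1 after
        # six cuts and cannot be bypassed again.
        chosen = min(queue, key=lambda u: (u["priority"], u["arrival"]))
        for u in queue:
            if u is not chosen and u["arrival"] < chosen["arrival"]:
                u["priority"] = max(1.0, u["priority"] - BUMP)
        return chosen

    def select_hrrn(queue, now, x=1.0):
        # highest-response-ratio-next; x = 1 is the standard formula.
        def ratio(u):
            wait = now - u["arrival"]
            return (wait ** x + u["session"]) / u["session"]
        return max(queue, key=ratio)

a real implementation would also remove the chosen user from the queue and log the wait that user experienced.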
the response ratio formula can be generalized by scaling the importance of the waiting time factor. for instance, in the modified response ratio below, increasing values of x > 1 will cause the strategy to more resemble fcfs, and decreasing values of 0 < x < 1 will more resemble sjf:

    modified response ratio = (waiting time^x + requested session length) / requested session length

one could experiment with different values of x to find a desired balance between the number of line cuts and the impact on average waiting times for customers in the various priority classes. this won't be pursued here, and x will be assumed to be 1.

methodology

the data used in this simulation come from the metropolitan library system's southern oaks library in oklahoma city. this library has eighteen public internet computers that customers can sign up for using proprietary software developed by jimmy welch, deputy executive director/technology for the metropolitan library system. the waiting queue employs the first-come-first-served (fcfs) strategy. customers are allotted an initial session of up to sixty minutes but may extend their session in thirty-minute increments so long as the waiting queue is empty. repeat customers are also allowed to sign up for additional thirty-minute sessions during the day, provided that no user currently in the queue has been waiting for more than ten minutes (an indication that demand for computers is currently high).

anonymous usage data gathered by the system in august 2010 were compiled to produce the information about each customer session shown in table 1.

table 1. session data (units in minutes)

the information about each session required for the simulation includes the time at which the user arrived to sign up for a computer, the number of minutes it took the user to log in once assigned a computer, how many minutes of computer time were used, whether this was the user's first or a subsequent session for the day, and finally, whether the user gave up waiting and abandoned his or her place in the queue. users are given eight minutes to log in once a computer station is assigned to them before they are considered to have abandoned the queue.

once this data has been gathered, the computer simulation runs by iterating through each second the library is open. as user sign-up times are encountered in the data, the users are added to the waiting queue. when a computer becomes available, a user is selected from the queue using the strategy being simulated and assigned to the open computer. the customer occupies the computer for the length of time given by the associated log-in delay and session length. when this time expires, the customer is removed from the computer, and the information recorded during the customer's time in the waiting queue is logged.
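in outline, the simulation amounts to a loop like the following sketch (ours, not the study's code); it assumes a sign_ups list sorted by arrival time and, for brevity, omits the eight-minute login window, queue abandonment, and session extensions described above.

    # simplified second-by-second simulation. sign_ups is a list of dicts
    # sorted by "arrival" (in seconds), each with "login_delay" and
    # "session" (also in seconds); select is one of the strategy
    # functions sketched earlier.
    def simulate(sign_ups, n_computers, select, day_seconds=14 * 3600):
        pending = list(sign_ups)
        queue, busy_until, waits = [], [0] * n_computers, []
        for now in range(day_seconds):
            while pending and pending[0]["arrival"] <= now:
                queue.append(pending.pop(0))      # user joins the queue
            for i, free_at in enumerate(busy_until):
                if free_at <= now and queue:      # a computer opens up
                    user = select(queue, now)     # strategy under test
                    queue.remove(user)
                    waits.append(now - user["arrival"])
                    busy_until[i] = now + user["login_delay"] + user["session"]
        return waits                              # wait times to analyze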
results

there were 7,403 sign-ups for the computers at the southern oaks library in august 2010. each of these requests is assigned a priority class based on the length of the session, as detailed in table 2. the intended session length of users choosing to abandon the queue is unknown; abandoned sign-ups are assigned a priority class randomly, in proportion to the overall distribution of priority classes in the data, so as not to introduce any systematic bias into the results. even though their actual session length is zero, these users participate in the queue and cause the computer eventually assigned to them to sit idle for eight minutes until it is reassigned. customers signing up for a subsequent session during the day are always assigned the lowest priority class (p-4) regardless of their requested session length; this is a policy decision not to give priority to users who have already received a computer session for the day.

table 2. assignment of priority classes

figure 1 displays the average waiting time for each priority class during the simulation (bars) along with the total number of sessions initially assigned to each class (line). it is immediately obvious from the chart that each alternative strategy excels at reducing the average wait for high-priority (p-1) users. also observe how removing one computer from the pool to serve exclusively as a fifteen-minute computer drastically increases the fcfs-15 average wait times in the other priority classes. clearly, removing one (or more) computers from the pool to serve as a dedicated fifteen-minute station is a poor strategy here for all but the 519 users in class p-1. losing just one of the eighteen available computers nearly doubles the average wait for the remaining 6,884 users in the other priority classes.

figure 1. average user wait minutes by priority class

by contrast, note that the reduced average wait times for the highest priority users in class p-1 persist in classes p-2 and p-3 for the non-fcfs strategies. the sjf strategy produces the most dramatic reductions for the 2,164 users not in class p-4. however, for the 5,239 users in class p-4, the sjf strategy produced an average wait time that was 2.1 minutes longer than the purely fcfs strategy. the hrrn strategy achieves smaller wait time reductions than sjf in the higher priority classes, but hrrn increased the average wait for users in class p-4 by only 0.7 minutes relative to fcfs. the average wait using the sjf-fb strategy falls between that of sjf and hrrn for each priority class while guaranteeing users will be cut at most six times.

an examination of the maximum wait times for each priority class in figure 2 illustrates how the express lane itself can be a bottleneck. even with a dedicated fifteen-minute express computer under the fcfs-15 strategy, at least one user would have waited over half an hour to use a computer for fifteen minutes or less. in all but the highest priority class (p-2 through p-4), the fcfs-15 strategy again performs poorly, with at least one user in each of these classes waiting over ninety minutes for a computer.

figure 2. maximum user wait minutes by priority class

capping the number of times a user may be passed over in the queue under the sjf-fb strategy makes it less likely that members of classes p-2 and p-3 will be able to take advantage of their higher priority to cut in front of users in class p-4 during periods of peak demand. as a result, the sjf-fb maximum wait times for classes p-2 and p-3 are similar to those under the fcfs strategy. this was not the case in the breakdown of sjf-fb average waiting times across priority classes in figure 1.

table 3 breaks down waiting times for each queuing strategy according to the overall percentage of users waiting no more than the given number of minutes. here we see the effects of each strategy on the system as a whole, instead of by priority class. notice that the overall average wait times for the non-fcfs strategies are lower than those of fcfs. this indicates that the total reduction in waiting times for high-priority users exceeds the additional time spent waiting by users in class p-4; in other words, these strategies are globally more efficient than fcfs. notice, too, in table 3 that the non-fcfs strategies achieve significant reductions in the median wait time compared with fcfs.
table 3. distribution of wait times by strategy

after demonstrating the impact that breaking the first-come-first-served rule can have on waiting times, it is important to examine the line cuts associated with each of these strategies. line cuts are recorded by each user in the simulation while waiting in the queue: each time a user is selected from the queue and assigned a computer, the remaining users who arrived prior to the one just selected note having been skipped over. by the time they are assigned a computer, users have recorded the total number of times they were passed over in the queue.

figure 3. cumulative distribution of line cuts by queuing strategy

figure 3 displays the cumulative percentage of users experiencing no more than the listed number of cuts for each non-fcfs strategy. the majority of users are not passed over at all under these strategies. however, a small minority of users will be repeatedly cut in line. for instance, in our simulation, one unfortunate individual was passed over in the queue sixteen times under the sjf strategy. this user waited ninety-one minutes under this strategy, as opposed to only fifty-nine minutes under the familiar fcfs waiting queue. most customers would become upset upon seeing a string of sixteen people jump ahead of them in the queue and get on a computer while they endure such a long wait. the hrrn strategy caused a maximum of nine cuts to an individual in this simulation; this user waited seventy-three minutes under hrrn versus only fifty-five minutes using fcfs. extreme examples such as these are the exception: under the hrrn and sjf-fb strategies, 99 percent of users were passed over at most four times while waiting in the queue.

conclusion

we have examined the simulation of several queuing strategies using a single month of computer usage data from the southern oaks library. the relative performance difference between queuing strategies will depend on the supply of and demand for computers at any given location. clearly, at libraries with plenty of public computers for which customers seldom have to wait, the choice of queuing strategy is inconsequential. however, for libraries struggling with waiting times on par with those examined here, the choice can have a substantial impact.

in general, these simulation results demonstrate the ability of non-fcfs queuing strategies to significantly lower waiting times for certain classes of users without partitioning the pool of computers. these reductions in waiting times come at the cost of allowing high-priority users to essentially cut in line. this causes slightly longer wait times for low-priority users, but overall average and median wait times see a small reduction. of course, for some customers, being passed over in line even once is intolerable. furthermore, creating a system to implement an alternative queuing strategy may present obstacles of its own. however, if the need to provide quick, short-term computer access is pressing enough for a library to create a separate pool of "express" computers, then one of the non-fcfs queuing strategies discussed in this paper may be a viable alternative. at the very least, the fcfs-15 simulation results should give one pause before resorting to designated "express" and "non-express" computers in an attempt to remedy unacceptable customer waiting times.
acknowledgments

the author would like to thank the metropolitan library system, kay bauman, jimmy welch, sudarshan dhall, and bo kinney for their support and assistance with this paper, as well as tracey thompson and tim spindle for their excellent review and recommendations.

references

1. j. d. slone, "the impact of time constraints on internet and web use," journal of the american society for information science and technology 58 (2007): 508–17.
2. william mendenhall and terry sincich, statistics for engineering and the sciences (upper saddle river, nj: prentice-hall, 2006), 151–54.
3. abraham silberschatz, peter baer galvin, and greg gagne, operating system concepts (hoboken, nj: wiley, 2009), 188–200.

communications

how long the wait until we can call it television

jerry borrell: congressional research service, library of congress, washington, d.c.*

*the views expressed in this paper do not necessarily represent those of the library of congress or of the congressional research service.

this brief article will review videotex and teletext. there is little need to define terminology because new hybrid systems are being devised almost constantly (hats off to oclc's latest buzzword, viewtel). most useful of all would be an examination of the types of technology being used for information provision. the basic requirement for all systems is a database, i.e., data stored so as to allow its retrieval and display on a television screen. the interactions between the computer and the television screens are the means by which to distinguish the technologies. in teletext and videotex, a device known as a decoder uses data encoded onto the lines of a broadcast signal (whatever the medium of transmission) to generate the display screen. in videotex, voice-grade telephone lines or interactive cable are used to carry data communications between two points (usually 1200 baud from the computer and 300 baud or less from the decoder and the television screen). in teletext the signal is broadcast over the airwaves (wideband) or via a time-sharing system (narrowband). the numerous configurations possible make straightforward classification of systems questionable.

a review of the systems currently available is useful to illustrate these terms, videotex and teletext. compuserve, the columbus, ohio-based company, provides on-line searching of newspapers to about 4,000 users. reader's digest recently acquired 51 percent of the source, a time-sharing service that provides more than 100 different (nonbibliographic) databases to about 5,000 users. the warner and american express joint project, qube (also columbus-based), utilizes cable broadcast with a limited interactive capability; it does not allow for on-demand provision of information but rather uses a polling technique. antiope, the french teletext system, used at ksl in st. louis last year and undergoing further tests in los angeles at knxt in the coming year, is only part of a complex data transmission system known as didon. antiope is also at an experimental stage in france, with 2,500 terminals scheduled for use in 1981. ceefax and oracle, teletext broadcast by the bbc and iba in britain, have an estimated 100,000 users currently; two thousand adapted television sets are being sold every month. prestel, the british post office's videotex system, currently has approximately 5,000 users, half of whom are businesses. all other countries in europe are conducting experiments with one of the technologies. in canada, telidon, the most technically advanced system, has 200 users.
experiments involving telidon are being conducted nationwide due to government interest in telecommunications improvements. telidon will also be used in washington in the spring of 1981 for consumer evaluation. these cursory notes should indicate the breadth of interest in alternative means of information provision. video and electronic publishing newsletters (see references) keep track of the number of users and are the best way to keep informed of activities and developments.

several important trends are becoming evident. perhaps the most evident is the realization that videography is being developed in countries other than the u.s. as a result of strong support by the national posts and telecommunications (ptt) authorities. until recently there was a feeling that the u.s. was technically behind europe. what is now evident is that in the free market system of the u.s., manufacturers and other potential system providers have had insufficient impetus to provide videotex/teletext technology. the technology of information display (see borrell, journal of library automation, v. 13 (dec. 1980), p. 277–81) in the u.s. is an order of magnitude more sophisticated than in europe. the point is that in the absence of strong ptt pressure, videography in the u.s. developed for specialized markets in which telecommunications were not a central need. in the one area of great demand, teletext services for the hearing impaired, decoders were developed and have been employed for a number of years (about 25,000 are currently in use). as the high cost of telecommunications bandwidth is eased by data compression, direct broadcasting by satellite, enhanced cable services, and fiber optic networks, videotex and teletext will become available on a wide scale in the u.s.

the computer inquiry ii decision by the fcc, involving reinterpretation of the communications act of 1934, has given at&t permission to enter the data processing market. in fact, at&t, in its third experiment with videotex, is taking such an aggressive stance that it seems to be doing everything that its critics have feared: providing updatable classified ads (dynamic yellow pages), allowing users to place information into the system memory, and providing voice mail services, thereby taking on the newspapers, home computer manufacturers, and the u.s. postal service. in addition, banking services will be offered. as the largest company in the u.s., at&t cannot be ignored. it supplies about 80 percent of the phone service in the u.s. and has the potential, if allowed, to become a broadcaster, data processor, publisher, and banker; such cross-ownership was never allowed up to this time.

the trend toward specialized services provision is also exemplified by the french and british systems. prestel, which was originally targeted for a home market, is now promoted with the tacit policy of being a special business service allowing financial and private data to be provided to subscribers. sofratev, the marketers of the french teletext system, are acknowledging the importance of transactional markets in two ways, based on a technology they have named the "smart card," a credit-card-size (in one configuration) plate with a built-in microprocessor or chip. the card will allow system users to access material that will have controlled readership. an example would be a magazine of financial data provided to those who need such information (or, more importantly, are willing to pay for it).
in a more complex effort, the largest retailer in paris will advertise material via teletext, and system users will be able to make acquisitions with their smart card, which can be programmed with financial data. nor is this the end of the effort by the french to market information display technology. the electronic phone directory being offered by bell in austin is replicated in a more modest way by the french, who plan to produce a six-by-eight-inch black-and-white display unit that will provide phone directory information (both white and yellow pages) to all of france by the 1990s. developed as part of the "télématique" program of the french government, the terminals represent to some (the parent company of the source has tendered an offer for up to 250,000 of the terminals) a low-cost alternative for providing videotex to a mass market. the tandy home computer in its videotex configuration seems to fill the same market slot.

perhaps the most disturbing trend, at least from a librarian's point of view, is the fact that contemporary data systems are being created which could benefit greatly from the experience of librarians and libraries. for instance, research into methods of access (keyword, phonetic, and geographical) by the french, intended to provide a flexible and easily used system for untrained persons searching for directory information, is being performed by an advertising and yellow pages publishing firm. with a feeling of déjà vu i listened to an explanation of how difficult it is to develop a system for the novice; one proposed solution is to allow only the first four letters of a word to be entered (one of the search methods used at the library of congress, which does suggest some cross-fertilization).

whatever the trends, the reality is that librarians and information scientists are playing decreasing roles in the growth of information display technology. hardware systems analysts, advertisers, and communications specialists are the main professions with an active role to play in the information age. perhaps the answer is an immediate and radical change in the training offered by the library schools of today. our small role may reflect our penchant to be collectors, archivists, and guardians of the information repositories. have we become the keepers of the system? the demand today is for service, information, and entertainment. if we librarians cannot fulfill these needs, our places are not assured. should the american library association (ala) be ensuring that libraries are a part of all ongoing tests of videotex, at least in some way, either as organizers, information providers, or in analysis? consider the force of the argument given at the ala 1980 new york annual conference that cable television should be a medium that librarians become involved with for the future. certainly involvement is an important role, but we, like the industrialists and marketers before us, must make smart decisions and choose the proper niche and the most effective way to use our limited resources if we are to serve any part of society in the future.

bibliography

1. electronic publishing review. oxford, england: learned information ltd. quarterly.
2. home video report. white plains, new york: knowledge industry publications. weekly.
3. ieee transactions on consumer electronics. new york: ieee broadcast, cable, and consumer electronics society. five times yearly.
4. international videotex/teletext news. washington, d.c.: arlen communications ltd. monthly.
5. videodisc/teletext news. westport, conn.: microform review. quarterly.
6. videoprint. norwalk, conn.: videoprint. two times monthly.
7. viewdata/videotex report. new york: link resources corp. monthly.

data processing library: a very special library

sherry cook, mercedes dumlao, and maria szabo: bechtel data processing library, san francisco, california.

the 1980s are here, and with them comes the ever-broadening application of the computer. this presents a new challenge to libraries. what do we do with all these computer codes? how do we index the material? and, most importantly, how do we make it accessible to our patrons or computer users? bechtel's data processing library has met these demands. the genesis for the collection was bechtel's conversion from a honeywell 6000 computer to a univac 1100 in 1974. all the programs in use at that time were converted to run on the univac system. it seemed a good time to bring the computer programs from all of the various bechtel divisions together into a controlled collection. the librarians were charged with the responsibility of enforcing standards and control of bechtel's computer programs. the major benefits derived from placing all computer programs into a controlled library were:
1. company-wide usage of the programs.
2. minimized investment in program development through common usage.
3. computer file and documentation storage by the library to safeguard the investment.
4. a central location for audits of program code and documentation.
5. centralized reporting on bechtel programs.
developing the collection involved basic cataloging techniques, which were greatly modified to encompass all the information that computer programs generate, including actual code, documentation, and listings.

are your digital documents web friendly? making scanned documents web accessible

yongli zhou is digital repositories librarian, colorado state university libraries, colorado state university, fort collins, colorado.

the internet has greatly changed how library users search and use library resources. many of them prefer resources available in electronic format over traditional print materials. while many documents are now born digital, many more are only accessible in print and need to be digitized. this paper focuses on how the colorado state university libraries creates and optimizes text-based and digitized pdf documents for easy access, downloading, and printing.

to digitize print materials, we normally scan originals, save them in archival digital formats, and then make them web-accessible. there are two types of print documents, graphic-based and text-based. if we apply the same techniques to digitize these two different types of materials, the documents produced will not be web-friendly. graphic-based materials include archival resources such as historical photographs, drawings, manuscripts, maps, slides, and posters. we normally scan them in color at a very high resolution to capture and present a reproduction that is as faithful to the original as possible. then we save the scanned images in tiff (tagged image file format) for archival purposes and convert the tiffs to jpeg (joint photographic experts group) 2000 or jpeg for web access. however, the same practice is not suitable for modern text-based documents, such as reports, journal articles, meeting minutes, and theses and dissertations. many old text-based documents (e.g., aged newspapers and books) should be treated as graphic-based material. these documents often have faded text, unusual fonts, stains, and colored backgrounds. if they are scanned using the same practice as modern text documents, the document created can be unreadable and contain incorrect information. this topic is covered in the section "full-text searchable pdfs and troubleshooting ocr errors."
currently, pdf is the file format used for most digitized text documents. while pdfs created from high-resolution color images may be of excellent quality, they can have many drawbacks. for example, a multipage pdf may have a large file size, which increases download time and the memory required while viewing. sometimes the download takes so long it fails because a time-out error occurs. printers may have insufficient memory to print large documents. in addition, the optical character recognition (ocr) process is not accurate for high-resolution images in either color or grayscale. as we know, users want the ability to easily download, view, print, and search online textual documents. all of the drawbacks created by high-quality scanning defeat one of the most important purposes of digitizing text-based documents: making them accessible to more users. this paper addresses how colorado state university libraries (csul) manages these problems and others as staff create web-friendly digitized textual documents. topics include scanning, long-term archiving, full-text searchable pdfs and troubleshooting ocr problems, and optimizing pdf files for web delivery.

preservation master files and access files

for digitization projects, we normally refer to images in uncompressed tiff format as master files and compressed files for fast web delivery as access files. for text-based files, access files normally are pdfs that are converted from scanned images. "bcr's cdp digital imaging best practices version 2.0" says that the master image should be the highest quality you can afford, it should not be edited or processed for any specific output, and it should be uncompressed.1 this statement applies to archival images, such as photographs, manuscripts, and other image-based materials. if we adopt the same approach for modern text documents, the result may be problematic. pdfs created from such master files may have the following drawbacks:
■ because of their large file size, they require a long download time or cannot be downloaded because of a time-out error.
■ they may crash a user's computer because they use more memory while viewing.
■ they sometimes cannot be printed because of insufficient printer memory.
■ poor print and on-screen viewing qualities can be caused by background noise and bleed-through of text. background noise can be caused by stains, highlighter marks made by users, and yellowed paper from aged documents.
■ the ocr process sometimes does not work for high-resolution images.
■ content creators need to spend more time scanning images at a high resolution and converting them to pdf documents.
web-friendly files should be small, accessible by most users, full-text searchable, and have good on-screen viewing and print qualities. in the following sections, we will discuss how to make scanned documents web-friendly.

scanning

there are three main factors that affect the quality and file size of a digitized document: the file format, color mode, and resolution of the source images. these factors should be kept in mind when scanning text documents.
file format and compression

most digitized documents are scanned and saved as tiff files. however, there are many different formats of tiff. which one is appropriate for your project?
■ tiff: uncompressed format. this is a standard format for scanned images. however, an uncompressed tiff file has the largest file size and requires more space to store.
■ tiff g3: tiff with g3 compression is the universal standard for faxes and multipage line-art documents. it is used for black-and-white documents only.
■ tiff g4: tiff with g4 compression has been approved as a lossless archival file format for bitonal images. tiff images saved with this compression have the smallest file size. it is a standard file format used by many commercial scanning vendors, and it should only be used for pages with text or line art. many scanning programs do not provide this file format by default.
■ tiff huffman: a method for compressing bi-level data based on the ccitt group 3 1d facsimile compression schema.
■ tiff lzw: this format uses a lossless compression that does not discard details from images. it may be used for bitonal, grayscale, and color images, and it may compress an image by up to 50 percent. some vendors hesitate to use this format because it was proprietary; however, the patent expired on june 20, 2003. this format has been widely adopted by much software and is safe to use. csul saves all scanned text documents in this format.
■ tiff zip: this is a lossless compression. like lzw, zip compression is most effective for images that contain large areas of a single color.2
■ tiff jpeg: this is a jpeg file stored inside a tiff tag. it is a lossy compression, so csul does not use this file format.
other image formats:
■ jpeg: this format is a lossy compression and can only be used for nonarchival purposes. a jpeg image can be converted to pdf or embedded in a pdf. however, a pdf created from jpeg images has a much larger file size compared to a pdf created from tiff images.
■ jpeg 2000: this format's file extension is .jp2. this format offers superior compression performance and other advantages. jpeg 2000 normally is used for archival photographs, not for text-based documents.
in short, scanned images should be saved as tiff files, either with compression or without. we recommend saving text-only pages and pages containing text and/or line art as tiff g4 or tiff lzw, and saving pages with photographs and illustrations as tiff uncompressed or tiff lzw.
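these recommendations are easy to script. the following sketch (ours, not part of csul's workflow) assumes the python pillow library and file names of our own invention; any imaging toolkit with tiff support would serve equally well.

    # hedged sketch: save scans per the recommendations above (pillow).
    from PIL import Image

    # text or line-art page: bitonal plus g4 gives the smallest file
    page = Image.open("text_page_scan.tif")
    page.convert("1").save("text_page_g4.tif", compression="group4")

    # page with photographs or illustrations: keep the color mode, use lzw
    photo_page = Image.open("photo_page_scan.tif")
    photo_page.save("photo_page_lzw.tif", compression="tiff_lzw")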
how image format and color mode affect pdf file size

color mode and file format are two factors that determine pdf file size. color images typically generate the largest pdfs and black-and-white images generate the smallest pdfs. interestingly, an image of smaller file size does not necessarily generate a smaller pdf. table 1 shows how file format and color mode affect pdf file size. the source file is a page containing black-and-white text and line-art drawings; its physical dimensions are 8.047" by 10.893", and all images were scanned at 300 dpi.

table 1. file format and color mode versus pdf file size

file format   scan specifications   tiff size (kb)   pdf size (kb)
tiff          color 24 bits         23,141           900
tiff lzw      color 24 bits         5,773            900
tiff zip      color 24 bits         4,892            900
tiff jpeg     color 24 bits         4,854            873
jpeg 2000     color 24 bits         5,361            5,366
jpeg          color 24 bits         4,849            5,066
tiff          grayscale 8 bits      7,729            825
tiff lzw      grayscale 8 bits      2,250            825
tiff zip      grayscale 8 bits      1,832            825
tiff jpeg     grayscale 8 bits      2,902            804
jpeg 2000     grayscale 8 bits      2,266            2,270
jpeg          grayscale 8 bits      2,886            3,158
tiff          black-and-white       994              116
tiff lzw      black-and-white       242              116
tiff zip      black-and-white       196              116
note: black-and-white scans cannot be saved in jpeg, jpeg 2000, or tiff jpeg formats.

csul uses adobe acrobat professional to create pdfs from scanned images. the current version we use is adobe acrobat 9 professional, but most of the features discussed in this paper are available in other acrobat versions. when acrobat converts tiff images to a pdf, it compresses the images, so a final pdf has a smaller file size than the total size of the original images. acrobat compresses tiff uncompressed, lzw, and zip the same amount and produces pdfs of the same file size. because our in-house scanning software does not support tiff g4, we did not include tiff g4 test data here; by comparing similar pages, we concluded that tiff g4 works the same as tiff uncompressed, lzw, and zip. for example, if we scan a text-based page as black-and-white and save it separately in tiff uncompressed, lzw, zip, or g4, then convert each page into a pdf, the final pdfs will have the same file size without a noticeable quality difference. tiff jpeg generates the smallest pdf, but it is a lossy format, so it is not recommended. both jpeg and jpeg 2000 have smaller file sizes but generate larger pdfs than those converted from tiff images.

recommendations
1. use tiff uncompressed or lzw in 24-bit color for pages with color graphs or for historical documents.
2. use tiff uncompressed or lzw in 8-bit grayscale for pages with black-and-white photographs or grayscale illustrations.
3. use tiff uncompressed, lzw, or g4 in black-and-white for pages containing text or line art.
to achieve the best result, each page should be scanned accordingly. for example, we had a document with a color cover, 790 pages containing text and line art, and 7 blank pages. we scanned the original document in color at 300 dpi. the pdf created from these images was 384 mb, so large that it exceeded the maximum file size our repository software allows for uploading. to optimize the document, we deleted all blank pages, converted the 790 pages with text and line art from color to black-and-white, and retained the color cover. the updated file has a file size of 42.8 mb. the example can be accessed at http://hdl.handle.net/10217/3667.

sometimes we scan a page containing text and photographs or illustrations twice, in color or grayscale and in black-and-white. when we create a pdf, we combine the two images of the same page to reproduce the original appearance and to reduce file size. how to optimize pdfs using multiple scans is discussed in a later section.
how image resolution affects pdf file size

before we start scanning, we check with our project manager regarding project standards. for some funded projects, documents are required to be scanned at no less than 600 dpi in color. our experiments show that documents scanned at 300 or 400 dpi are sufficient for creating pdfs of good quality. resolutions lower than 300 dpi are not recommended because they can degrade image quality and produce more ocr errors. resolutions higher than 400 dpi also are not recommended because they generate large files with little improvement in on-screen viewing and print quality. we compared pdf files converted from images at resolutions of 300, 400, and 600 dpi; viewed at 100 percent, the difference in image quality both on screen and in print was negligible. if a page has text with a very small font, it can be scanned at a higher resolution to improve ocr accuracy and viewing and print quality.

table 2 shows that high-resolution images produce large files and require more time to convert into pdfs. the time required to combine images is not significant compared to scanning time and ocr time, so it was omitted. our example is a modern text document with text and a black-and-white chart.

table 2. color mode and image resolution versus pdf file size

color mode   resolution (dpi)   scanning time (sec.)   ocr time (sec.)   tiff lzw (kb)   pdf size (kb)
color        600                100                    n/a*              16,498          2,391
color        400                25                     35                7,603           1,491
color        300                18                     16                5,763           952
grayscale    600                36                     33                6,097           2,220
grayscale    400                18                     18                2,888           1,370
grayscale    300                14                     12                2,240           875
b/w          600                12                     18                559             325
b/w          400                10                     10                333             235
b/w          300                8                      9                 232             140
*n/a due to an ocr error

most of our digitization projects do not require scanning at 600 dpi; 300 dpi is the minimum requirement. we use 400 dpi for most documents and choose a proper color mode for each page. for example, we scan the bitonal pages of our theses and dissertations in black-and-white at 400 dpi, and we scan pages containing photographs or illustrations in 8-bit grayscale or 24-bit color at 400 dpi.

other factors that affect pdf file size

in addition to the three main factors we have discussed, unnecessary edges, bleed-through of text and graphs, background noise, and blank pages also increase pdf file sizes. figure 1 shows how a clean scan can largely reduce a pdf file size and simultaneously improve its viewing and print quality.

figure 1. pdfs converted from different images: (a) the original pdf converted from a grayscale image with unnecessary edges (dimensions 9.127" x 11.455", grayscale, 600 dpi, tiff lzw 12.7 mb, pdf 1,051 kb); (b) the updated pdf converted from a black-and-white image with edges cropped out (dimensions 8" x 10.4", black-and-white, 400 dpi, tiff lzw 153 kb, pdf 61 kb); (c) the grayscale pdf viewed on screen at 100 percent; and (d) the black-and-white pdf viewed on screen at 100 percent.
recommendations
1. unnecessary edges: crop them out.
2. bleed-through text or graphs: place a piece of white or black card stock on the back of the page. if a page is single-sided, use white card stock; if a page is double-sided, use black card stock and increase the contrast ratio when scanning. color or grayscale images often have bleed-through problems; scanning a page containing text or line art as black-and-white will eliminate bleed-through text and graphs.
3. background noise: scanning a page containing text or line art as black-and-white can eliminate background noise. many aged documents have yellowed paper; if we scan them as color or grayscale, the result will be images with yellow or gray backgrounds, which may increase pdf file sizes greatly. we also recommend increasing the contrast for better ocr results when scanning documents with background colors.
4. blank pages: do not include them if they are not required. blank pages scanned in grayscale or color can quickly increase file size.

pdf and long-term archiving: pdf/a

pdf, short for portable document format, was developed by adobe as a unique format to be viewed through adobe acrobat viewers. as the name implies, it is portable, which means a file created on one computer can be viewed with an acrobat viewer on other computers, handheld devices, and other platforms.3 a pdf/a document is basically a traditional pdf document that fulfills precisely defined specifications. the pdf/a standard aims to enable the creation of pdf documents whose visual appearance will remain the same over the course of time. these files should be software-independent and unrestricted by the systems used to create, store, and reproduce them.4 the goal of pdf/a is long-term archiving. a pdf/a document has the same file extension as a regular pdf file and must be at least compatible with acrobat reader 4.

there are many ways to create a pdf/a document: you can convert existing images and pdf files to pdf/a, export a document to pdf/a format, or scan to pdf/a, to name a few. there are many software programs you can use to create pdf/a, such as adobe acrobat professional 8 and later versions, compart ag, pdflib, and pdf tools ag.

many pdf files cannot be saved as pdf/a files. if an error occurs when saving a pdf to pdf/a, you may use adobe acrobat preflight (advanced > preflight) to identify the problems; see figure 2. errors can be created by nonembedded fonts, embedded images with unsupported file compression, bookmarks, embedded video and audio, and so on. by default, the reduce file size procedure in acrobat professional compresses color images using jpeg 2000 compression. after running the reduce file size procedure, a pdf may not be saved as a pdf/a because of a "jpeg 2000 compression used" error. according to the pdf/a competence center, this problem will be eliminated in the second part of the pdf/a standard; pdf/a-2 is planned for 2008/2009. there are many other features in new pdfs; for example, transparency and layers will be allowed in pdf/a-2.5 however, at the time this paper was written, pdf/a-2 had not been announced.6

figure 2. example of adobe acrobat 9 preflight
full-text searchable pdfs and troubleshooting ocr errors

a pdf created from a scanned piece of paper is inherently inaccessible because the content of the document is an image, not searchable text. assistive technology cannot read or extract the words, users cannot select or edit the text, and one cannot manipulate the pdf document for accessibility. once ocr is properly applied to the scanned files, however, the image becomes searchable text with selectable graphics, and one may apply other accessibility features to the document.7

acrobat professional provides three ocr options:
1. searchable image (exact): ensures that text is searchable and selectable. this option keeps the original image and places an invisible text layer over it. it is recommended for cases requiring maximum fidelity to the original image,8 and it is the only option used by csul.
2. searchable image: ensures that text is searchable and selectable. this option keeps the original image, de-skews it as needed, and places an invisible text layer over it. the selection for downsample images in the same dialog box determines whether the image is downsampled and to what extent.9 downsampling combines several pixels in an image to make a single larger pixel, so some information is deleted from the image; however, downsampling does not affect the quality of text or line art. when a proper setting is used, the size of a pdf can be significantly reduced with little or no loss of detail and precision.
3. clearscan: synthesizes a new type 3 font that closely approximates the original and preserves the page background using a low-resolution copy.10 the final pdf looks like a born-digital pdf. because acrobat cannot guarantee the accuracy of the ocred text at 100 percent, this option is not acceptable for us.
for a tutorial on how to make a full-text searchable pdf, please see appendix a.
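outside of acrobat, the same invisible-text-layer approach can be scripted with open-source ocr. the sketch below is an illustration of an alternative, not part of csul's workflow; it assumes the pytesseract wrapper and a local tesseract installation.

    # hedged sketch: produce a searchable pdf whose invisible text layer
    # sits over the untouched scan, similar in spirit to acrobat's
    # "searchable image (exact)" option. assumes pytesseract + tesseract.
    import pytesseract
    from PIL import Image

    pdf_bytes = pytesseract.image_to_pdf_or_hocr(
        Image.open("page_scan.tif"), extension="pdf")
    with open("page_searchable.pdf", "wb") as out:
        out.write(pdf_bytes)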
troubleshooting ocr error 1: acrobat crashes

occasionally acrobat crashes during the ocr process. the error message does not indicate what causes the crash or where the problem occurs. fortunately, the page number of the error can be found on the top shortcuts menu; in figure 3, we can see the error occurs on page 7. we discovered that such errors are often caused by figures or diagrams. for a problem like this, the solution is to skip the error-causing page when running the ocr process. our initial research was performed on acrobat 8 professional; our recent study shows that this problem has been significantly improved in acrobat 9 professional.

figure 3. adobe acrobat 8 professional crash window

troubleshooting ocr error 2: could not perform recognition (ocr)

sometimes acrobat generates an "outside of the allowed specifications" error when processing ocr. this error is normally caused by color images scanned at 600 dpi or more. in the example in figure 4, the page contains only text but was scanned in color at 600 dpi. when we scanned this page as black-and-white at 400 dpi, we did not encounter the problem; we could also have used a lower-resolution color scan to avoid this error.

figure 4. "could not perform recognition (ocr)" error

our experiments also show that images scanned in black-and-white work best for the ocr process. in this article we mainly discuss running the ocr process on modern textual documents. black-and-white scans do not work well for historical textual documents or aged newspapers. these documents may have faded text and background noise; when they are scanned as black-and-white, broken letters may occur and some text might become unreadable. for this reason they should be scanned in color or grayscale. in figure 5, the image scanned in color might not produce accurate ocr results, but at least users can read all the text, while the black-and-white scan contains unreadable words.

figure 5. an aged newspaper scanned in color and in black-and-white

troubleshooting ocr error 3: cannot ocr image-based text

the search of a digitized pdf is actually performed on its invisible text layer. the automated ocr process inevitably produces some incorrectly recognized words. for example, acrobat cannot recognize the colorado state university logo correctly (see figure 6). unfortunately, acrobat does not provide a function to edit a pdf file's invisible text layer. to manually edit or add ocr'd text, adobe acrobat capture 3.0 (see figure 7) must be purchased. however, our tests show that capture 3.0 has many drawbacks. the software is complicated and produces its own errors: sometimes it consolidates words, and other times it breaks them up. in addition, it is time-consuming to add or modify invisible text layers using acrobat capture 3.0. at csul, we manually add searchable text for title and abstract pages only if they cannot be ocr'd by acrobat correctly. the example in figure 8 is a book title page for which we used acrobat capture 3.0 to manually add searchable text; the entire book may be accessed at http://hdl.handle.net/10217/1553.

figure 6. incorrectly recognized text sample (acrobat rendered the logo's text as "~do university")
figure 7. adobe acrobat capture interface
figure 8. image-based text sample

optimizing pdfs for web delivery

a digitized pdf file with 400 color pages may be as large as 200 to 400 mb. most of the time, optimizing processes can reduce files this large without a noticeable difference in quality; in some cases, quality may even be improved. we will discuss the three optimization methods we use.

method 1: using an appropriate color mode and resolution

as we have discussed in previous sections, we can greatly reduce a pdf's size by using an appropriate color mode and resolution. figure 9 shows two different versions of a digitized document. the source document has a color cover and 111 bitonal pages. the original pdf, shown in figure 9 on the left, was created by another university department and was not scanned according to the standards and procedures adopted by csul: it was scanned in color at 300 dpi and has a file size of 66,265 kb. we exported the original pdf as tiff images, batch-converted the color tiff images to black-and-white tiff images, and then created a new pdf from the black-and-white images. the updated pdf has a file size of 8,842 kb. the image on the right is much cleaner and has better print quality; the file on the left has unwanted marks and a very light yellow background, which contribute to its large file size and create ink waste when printed.

figure 9. reduce file size example
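the batch conversion step in method 1 can also be scripted. the sketch below is our own illustration using pillow; the folder names and the threshold value are assumptions to be tuned per collection.

    # hedged sketch: convert color page scans to bitonal tiff g4 images
    # before rebuilding the pdf, as described in method 1 above.
    from pathlib import Path
    from PIL import Image

    out_dir = Path("pages_bw")
    out_dir.mkdir(exist_ok=True)
    for tif in sorted(Path("pages").glob("*.tif")):
        gray = Image.open(tif).convert("L")
        # a fixed threshold instead of pillow's default dithering gives
        # cleaner text; 160 is an assumed value, not a csul standard
        bw = gray.point(lambda p: 255 if p > 160 else 0).convert("1")
        bw.save(out_dir / tif.name, compression="group4")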
method 2: running acrobat's built-in optimization processes

acrobat provides three built-in processes to reduce file size. by default, acrobat uses jpeg compression for color and grayscale images and ccitt group 4 compression for bitonal images.

optimize scanned pdf: open a scanned pdf and select documents > optimize scanned pdf. a number of settings, such as image quality and background removal, can be specified in the optimize scanned pdf dialog box. our experiments show this process can noticeably degrade images and sometimes even increase file size, so we do not use this option.

reduce file size: open a scanned pdf and select documents > reduce file size. the reduce file size command resamples and recompresses images, removes embedded base-14 fonts, and subset-embeds fonts that were left embedded. it also compresses document structure and cleans up elements such as invalid bookmarks. if the file size is already as small as possible, this command has no effect.11 after this process, some files cannot be saved as pdf/a, as discussed in a previous section. we also noticed that different versions of acrobat can create files of different sizes even when the same settings are used.

pdf optimizer: open a scanned pdf and select advanced > pdf optimizer. many settings can be specified in the pdf optimizer dialog box. for example, we can downsample images to a lower resolution and choose a different file compression. different collections have different original sources, so different settings should be applied; we normally run several tests for each collection and choose the settings that work best for it. we also make our pdfs compatible with acrobat 6 to allow users with older versions of the software to view our documents. a detailed tutorial on how to use the pdf optimizer can be found at http://www.acrobatusers.com/tutorials/understanding-acrobats-optimizer.

method 3: combining different scans

many documents have color covers and color or grayscale illustrations, but the majority of their pages are text-only. it is not necessary to scan all pages of such documents in color or grayscale. a pdf may contain pages that were scanned with different color modes and resolutions, and a pdf may also have pages of mixed resolutions. one page may contain both bitonal images and color or grayscale images, but they must be of the same resolution. the following strategies were adopted by csul:
1. combine bitmap, grayscale, and color images. we use grayscale images for pages that contain grayscale graphs, such as black-and-white photos; color images for pages that contain color images; and bitmap images for text-only pages and pages with text and line art.
2. if a page contains high-definition color or grayscale images, scan that page at a higher resolution and scan the other pages at 400 dpi.
3. if a page contains a very small font and the ocr process does not work well, scan it at a higher resolution and the rest of the document at 400 dpi.
4. if a page has text and color or grayscale graphs, we scan it twice, modify the images using adobe photoshop, and combine the two images in acrobat.

in figure 10, the grayscale image has a gray background and a true reproduction of the original photograph. the black-and-white scan has a white background and clean text, but details of the photograph are lost. the pdf converted from the grayscale image is 491 kb and has nine ocr errors; the pdf converted from the black-and-white image is 61 kb and has no ocr errors; and the pdf converted from a combination of the grayscale and black-and-white images is 283 kb and has no ocr errors.

figure 10. reduce file size example: combine images

the following steps were used to create the combined pdf in figure 10 using acrobat:
1. scan the page twice, in grayscale and in black-and-white.
2. crop out the text on the grayscale scan using photoshop.
3. delete the illustration on the black-and-white image using photoshop.
4. create a pdf using the black-and-white image.
5. run the ocr process and save the file.
6. insert the color graph: select tools > advanced editing > touchup object tool, right-click on the page and select place image, locate the graph in the open dialog, click open, and move the graph to its correct location.
7. save the file and run the reduce file size or pdf optimizer procedure.
8. save the file again.
this method produces the smallest file size with the best quality, but it is very time-consuming. at csul we used this method for some important documents, such as one of our institutional repository's showcase items, agricultural frontier to electronic frontier. the book has 220 pages, including a color cover, 76 pages with text and photographs, and 143 text-only pages. we used a color image for the cover page and 143 black-and-white images for the text-only pages, and we scanned the other 76 pages in both grayscale and black-and-white. then we used the procedure described above to combine the text pages and photographs. the final pdf has clear text and correctly reproduced photographs. the example can be found at http://hdl.handle.net/10217/1553.

conclusion

our case study, as reported in this article, demonstrates the importance of investing the time and effort to apply the appropriate standards and techniques for scanning and optimizing digitized documents. if proper techniques are used, the final result will be web-friendly resources that are easy to download, view, search, and print. users will be left with a positive impression of the library and feel encouraged to use its materials and services again in the future.
references

1. bcr's cdp digital imaging best practices working group, "bcr's cdp digital imaging best practices version 2.0," june 2008, http://www.bcr.org/dps/cdp/best/digital-imaging-bp.pdf (accessed mar. 3, 2010).
2. adobe, "about file formats and compression," 2010, http://livedocs.adobe.com/en_us/photoshop/10.0/help.html?content=wsfd1234e1c4b69f30ea53e41001031ab64-7757.html (accessed mar. 3, 2010).
3. ted padova, adobe acrobat 7 pdf bible, 1st ed. (indianapolis: wiley, 2005).
4. olaf drümmer, alexandra oettler, and dietrich von seggern, pdf/a in a nutshell—long term archiving with pdf (berlin: association for digital document standards, 2007).
5. pdf/a competence center, "pdf/a: an iso standard—future development of pdf/a," http://www.pdfa.org/doku.php?id=pdfa:en (accessed july 20, 2010).
6. pdf/a competence center, "pdf/a—a new standard for long-term archiving," http://www.pdfa.org/doku.php?id=pdfa:en:pdfa_whitepaper (accessed july 20, 2010).
7. adobe, "creating accessible pdf documents with adobe acrobat 7.0: a guide for publishing pdf documents for use by people with disabilities," 2005, http://www.adobe.com/enterprise/accessibility/pdfs/acro7_pg_ue.pdf (accessed mar. 8, 2010).
8. adobe, "recognize text in scanned documents," 2010, http://help.adobe.com/en_us/acrobat/9.0/standard/ws2a3dd1facfa54cf6-b993-159299574ab8.w.html (accessed mar. 8, 2010).
9. ibid.
10. ibid.
11. adobe, "reduce file size by saving," 2010, http://help.adobe.com/en_us/acrobat/9.0/standard/ws65c0a053-bc7c-49a2-88f1-b1bcd2524b68.w.html (accessed mar. 3, 2010).

appendix a. step-by-step: creating a full-text searchable pdf

in this tutorial, we will show you how to create a full-text searchable pdf using adobe acrobat 9 professional.

creating a pdf from a scanner

adobe acrobat professional can create a pdf directly from a scanner. acrobat 9 provides five options: black and white document, grayscale document, color document, color image, and custom scan. the custom scan option allows you to scan, run the ocr procedure, add metadata, combine multiple pages into one pdf, and also make the result pdf/a compliant. to create a pdf from a scanner, go to file > create pdf > from scanner > custom scan. see figure 1.

figure 1. acrobat 9 professional's create pdf from scanner dialog

at csul, we do not create pdfs directly from scanners because our tests show that doing so can produce fuzzy text and is not time efficient. both scanning and running the ocr process can be very time-consuming, and if an error occurs during these processes, we have to start over. we normally have images scanned at scanning stations by student employees or outsource them to vendors; library staff then perform quality control and create the pdfs on separate machines. in this way, we can work on multiple documents at the same time and ensure that we provide high-quality pdfs.

creating a pdf from scanned images

1. from the task bar select combine > merge files into a single pdf > from multiple files. see figure 2.
2. in the combine files dialog, make sure the single pdf radio button is selected. from the add files dropdown menu select add files. see figure 3.
3. in the add files dialog, locate the images, select multiple images by holding the shift key, and then click the add files button.
4. by default, acrobat sorts files by file name. use the move up and move down buttons to change the image order and the remove button to delete images. choose a target file size: the smallest icon produces a smaller file with lower image quality, and the largest icon produces a high-image-quality pdf with a very large file size. we normally use the default file size setting, which is the middle icon.
5. save the file. at this point, the pdf is not yet full-text searchable.

figure 2. merge files into a single pdf
figure 3. combine files dialog
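for readers without acrobat, the combine step can also be scripted. the sketch below assumes the python img2pdf library, which embeds page images into a pdf without recompressing them where possible (so bitonal tiff g4 pages stay small); the file names are our own invention.

    # hedged sketch: merge page images into one pdf in file-name order.
    from pathlib import Path
    import img2pdf

    pages = [str(p) for p in sorted(Path("pages_bw").glob("*.tif"))]
    with open("document.pdf", "wb") as out:
        out.write(img2pdf.convert(pages))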
president's message: the year in review—open everything
colleen cuddy
information technologies and libraries | june 2012

as i sit down to write my last president's column a variety of topics are running through my mind. but as i focus on just one word to sum up the year, "open" rises to the top of the list. for truly it was a year of all things open.
my presidential theme is open data/open science and i am looking forward to hearing tony hey and clifford lynch speak at the lita president's program later this month on this topic. dr. lynch is also the recipient of this year's lita/library hi tech award for outstanding communication in library and information technology, cosponsored by emerald group publishing limited. the prestigious frederick g. kilgour award for research in library and information technology, co-sponsored by oclc, is being given to g. sayeed choudhury this year. dr. choudhury is a longtime proponent of open data and the award recognizes his leadership in the field of data curation through the national science foundation supported data conservancy project.

as you well know, ital is now an open-access journal. open access continues to be a hot topic, and rightly so. my last column was devoted to the subject of open access, but i do want to remind librarians to advocate for open access in the coming year—please keep up the fight! in addition to seeing our journal to its new platform, the publications committee has also been busy with a few new lita guides, one of which, "getting started with gis," by eva dodsworth, provides some guidance on harnessing data sets to work with geospatial technology. ms. dodsworth will be conducting an online course on this topic in august and the education committee has many new courses in the pipeline.

internally lita has been working towards a more open and transparent governance structure. the board has been relentless in making sure that all of its meetings are open, from in-person meetings at conferences to our monthly phone meetings to conversations on ala connect. we have been streaming our board meetings live and now will archive the recordings for a limited amount of time. this move has not been without challenges as board members and the lita office struggled to build open communication with each other and the membership. sometimes the challenges were ideological or legal, and sometimes the very technology that we embrace has caused problems, but i think it is safe to say that lita leadership is working towards a common goal of a transparent structure with open communication channels.

colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011-12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york.

we opened up communication channels to get feedback on what our membership would like most when zoe stewart-marshall, incoming president, hosted a town hall meeting at the ala midwinter meeting. i know that she is working hard to address membership needs during her presidency. as a medical librarian i often travel in circles outside of ala, and when my medical colleagues learned that i was lita president they were really impressed. lita is a well-known and well-respected brand in the library community. talking to my non-lita colleagues reinforced the value that lita brings to the entire profession, particularly through our programming, education, and the way in which we share and exchange information in open forums such as the lita blog and listserv. (of course i hope that we have gained some new members through this outreach!) clearly we are doing many things right and we should not lose sight of what is great about lita as we work on addressing areas that need improvement.
one thing that is consistently great about lita is its annual sponsorship of ala emerging leaders. this year we sponsored two lita members who were part of the 2012 ala emerging leaders cohort: jodie gambill and tasha keagan. both were assigned to a team working on a lita project that asked for a recommendation and plan for the implementation of a lita experts profile system. the team was responsible for identifying the software to employ and creating an implementation plan with ontology recommendations. the team has identified vivo (an open-source, semantic-web application) as the software for the project and will present its findings and implementation plan to the lita board and the ala community at the ala annual conference. the team did an outstanding job on this project and completed the deliverable on time, with very little guidance from lita leadership—a sure sign of leadership! yet, i was often reminded that as we embrace our upcoming leaders, we should not forget that leadership occurs on all levels. one message that i heard throughout my presidency is that lita should do more for mid-career librarians—and this sentiment is shared by members of other organizations in which i am active. this is a challenge that lita leadership is poised to take on as it balances its services to membership.

as i now count eighteen occurrences of the word "open" in this column i believe i have made my point and it is time to sign off. although i am finishing up my duties as lita president, i am not saying goodbye. i look forward to my new role as past-president, particularly in hosting the 2012 lita national forum in columbus, ohio (october 4-7): new world of data: discover. connect. remix. the national forum planning committee led by susan sharpless smith has done an outstanding job putting together an excellent meeting. the committee has lined up interesting speakers such as eric hellman, ben shneiderman, and sarah houghton, and thoughtfully evaluated many paper and poster submissions. i am sure we will all learn quite a bit from our colleagues as we attend sessions and network. i will be hosting a dinner and i hope to see some of you there as i enjoy what i hope will be a more relaxed role as past-president. it has been an honor to serve you and i look forward to working with lita in the years to come!

article
to thine own 3d selfie be true
outreach for an academic library makerspace with a 3d selfie booth
alex watson
information technology and libraries | december 2023
https://doi.org/10.5860/ital.v42i4.15107

about the author
alex watson (corresponding author: apwatson@olemiss.edu) is research & instruction librarian and associate professor, university of mississippi
© 2023. submitted: may 27, 2022. accepted for publication: october 12, 2023. published 18 december 2023.

abstract
to promote an academic library makerspace, the university of mississippi libraries hosted a "3d selfie" booth which used body scanning technology. this booth, advertised on campus and set up outside the library during the first weeks of class, was designed to attract attention and perform outreach through the use of body scans to be printed in the makerspace at a later date. although the hoped-for printing of "selfies" did not materialize, the project resulted in data about interested patrons and ideas for similar projects going forward. this paper serves as a case study for other academic library makerspaces interested in similar outreach.

introduction
the idealab, a makerspace in the j. d.
williams library at the university of mississippi, opened in 2017. as a part of outreach efforts designed to increase awareness of and excitement for this makerspace, library staff created and ran a pop-up "3d selfie" booth in august 2018 during the first weeks of the fall semester. library staff focused on advertising 3d body scanning technology under the term "3d selfie," a catchier label than the more accurate "3d image" or "3d scan," to generate enthusiasm for the makerspace, with the hope that students might spend time at an associated booth or even visit the makerspace in person to obtain a 3d print of their selfie. as such, the library collected data from participants who agreed to be scanned during the four days the booth was in operation. ultimately, library staff sought to use this unique concept and technology to generate buzz and harvest data for future maker-specific outreach efforts in addition to the makerspace's already extant workshops. while the hoped-for result of a large number of selfies being printed in the idealab was not achieved, the outcome is still interesting and offers lessons for the university of mississippi—as well as any other academic library makerspaces interested in running "3d selfie" booths—going forward.

literature review
makerspaces in libraries, and academic libraries in particular, have been an important and growing area of discourse. relatively little has been written about outreach undertaken for these spaces since their inception as a trend in the early 2010s, though, and what scholarship exists tends to focus on programming and workshops rather than bespoke outreach. wallace et al. provide a variety of approaches to programming and workshopping, for instance.1 similarly, the literature is heavily skewed toward 3d printing over 3d scanning, and many articles on scanning seem to focus on the utility of scanning and printing rare or delicate items in special collections or similar contexts. passehl-stoddart et al. provide an example of this form of makerspace outreach, in this case 3d printing of cultural objects from a local collection.2

one of the earlier scholarly treatments of 3d printing and scanning in an academic library was michael groenendyk and riel gallant's work at dalhousie university in 2013, near the dawn of the current maker movement.3 writing about the dalhousie "hackerspace," which predated even the term "makerspace" being in wide use, groenendyk and gallant described the technical details of their setup for a 2013 audience before writing frankly about their outreach efforts and difficulties with 3d scanned models.4 their 3d printer and 3d scanner were placed at the main desk on the library's first floor to allow staff supervision, and with a booking system in place, the high traffic and visibility of the main desk location served to promote these services at a basic level, with the library hoping to engage in more direct outreach to specific departments later on.5 this is similar to the university of mississippi's first library 3d printer, which was kept behind a desk and strictly mediated.
the dalhousie librarians cited "trying to print 3d models that were not designed with 3d printing in mind, or by users with no knowledge of how 3d printing works" as their largest challenge, resulting in many unprintable files and the need for library staff to offer guidance and other interventions to avoid wasted resources.6 they found that, in particular, user inexperience with the 3d scanning software coupled with the relative difficulty of making a "fully formed and realized" 3d scan led to unrealistic user expectations for the timeliness and quality of their scans.7 positioning the object to be scanned, layering different scans together into a composite, and filling holes caused by missing scan data—all problems that still existed as of 2018, albeit somewhat moderated by new technology—were cited as particular difficulties with the nextengine 3d scanner used at dalhousie.8 the very first experiments in 3d scanning at the university of mississippi, when only a microsoft kinect video game scanner/controller was available, faced similar problems and prevented an early rollout of 3d scanning technology.

a year later, in 2014, jason reuscher proposed 3d scanning as a library public service at penn state's schuylkill campus with the same nextengine 3d equipment, reasoning that "academic libraries generally offer two-dimensional (2-d) or flatbed scanning to their patrons—why not 3-d scanning?"9 penn state went a step further than dalhousie in introducing mobility into the mix; their 3d scanner was able to be checked out, though initially limited to the building and requiring an attached laptop and auto-rotating platform.10 reuscher noted in particular that the device had a strong word-of-mouth outreach component; though only a few preliminary scans and workshops had been held by 2014, news of the scanner had spread far beyond schuylkill and sparked intense interest from other penn state affiliated students and faculty and created difficulties for those who wished to use it from distant campuses.11 reuscher, uniquely, advocated separating 3d printing from 3d scanning in terms of infrastructure as a library public service, similar to the way the university of mississippi sought to unencumber the scanner from the printer for outreach and promotional purposes.12

beginning in 2014, and through a 2017 revision of his book, jason griffey was beginning to realize some of the possibilities inherent in 3d scanning in library settings, dedicating a portion of a book chapter to various devices capable of doing such.13 in addition to turntable-style scanners, which mirror the rotating platform used by reuscher, griffey also addressed wireless handheld scanners and included an image of a 3d systems sense scanner being used to create a bust scan of a human model, similar to some early experiments undertaken at the university of mississippi.14 while the issues that griffey cites, such as difficulty scanning color-contrasting areas, were less of an issue in 2018–2019 than in 2014, and prices for the scanners themselves have plummeted, he is prescient in seeing the potential of the scanners to be used for outreach by creating "really interesting and useful things from a scanner."15 griffey's enthusiasm for the idea is palpable as he goes on to say that portable scanners like the sense are "far more interesting ...
in that [they allow] you to scan absolutely arbitrary objects rather than being limited to things that will fit onto a turntable ... you can scan freestanding objects, people, parts of rooms—nearly anything."16 in particular, griffey was excited about the sense scanner that worked with an apple ipad, calling it "an excellent and truly portable solution"—exactly what the university of mississippi found a similar ipad-based product to be in its 3d scanning outreach efforts.17

in 2015, ann marie lynn davis sought to do an analysis of academic library makerspace technology in a particular region—new england—through a brief survey and literature review. davis found that eight of the responding academic libraries had makerspaces, with six of them possessing some form of 3d scanner, including both handheld models and microsoft kinects.18 davis mentioned one library staff member in particular who was highly invested in 3d scanning at an institution that did not have a formal makerspace but planned to establish one in the future; this person had participated in a university museum 3d printing day, both serving as an "expert panelist" and "giving public demonstrations on library-owned 3d printers and a scanner kinect bar."19 both in using a kinect scanner as an early-stage 3d scanner and in using 3d scanning to advertise 3d printing services, this library staffer at an unnamed institution demonstrated striking parallels to the university of mississippi.

also in 2015, megan lotts described a number of "participatory events, which essentially are popup making spaces," including a 2d selfie booth at the university of virginia, primarily as stress relief.20 these pop-up makerspaces can be "easily put up, taken down, sent from one library to another, and they use little space for storing materials."21 emphasizing their outreach aspects, lotts noted that "making events can bring positive attention to the library and invite patrons to share their skills and talents."22

pushing the idea of academic library outreach with 3d scanning technology even further, erich purpur at the university of nevada reno performed an internal "pop-up maker technology outreach event" with the intent of allowing staff from a large library system to lay hands on makerspace technologies, get an introduction to 3d printing, and develop adobe cs6 literacy.23 with over 150 librarians and staff available to attend, purpur attracted 40 attendees, with surveyed attendees responding favorably to the event's effectiveness.24 the unr librarians also noted that "… faculty and staff regularly take [their] maker technology on the road engaging with both the university and greater reno communities at a variety of events … these outreach and engagement efforts have repeatedly proven to be successful, prompting the consideration of their possible use as part of internal professional development and outreach."25 these maker technology roadshows are similar to what the university of mississippi attempted, especially the "internal pop-up maker outreach event," albeit with a different user group as its focus—library staff at reno, patrons at mississippi.

finally, jennifer grayburn and ammon shepherd spoke to the overriding need for outreach in academic library makerspaces in 2017.
writing for the 2nd international symposium for academic makerspaces, the authors found that what they called a "build it and they will come" mentality did not always apply satisfactorily in academic settings.26 "while many users are familiar with making through their academic studies or personal interests, many faculty and students have never encountered these technologies or methods, let alone thought critically about the relevance of making for their own research," they wrote, adding that "... this unfamiliarity and even confusion about the purpose of a makerspace and how it is used might create the sense that a makerspace is not open or relevant to everyone."27 while citing outreach efforts such as themed programs and project-based workshops as possible ways for academic libraries to attract makers, they noted that the "apparent irrelevance" of academic makerspaces was a "major barrier-to-entry for faculty and students who might otherwise benefit academically from critical making and the resources found in a makerspace."28 continuous outreach and engagement by makerspace staff, they argued, was the only solution, taking the form of traditional outreach and other nontraditional forms of collaboration and publication such as blogging.29

the picture that emerges from this timeline of scholarship is one of academic libraries with makerspaces experimenting with outreach and 3d scanning but very rarely combining the two in a pop-up event. with this in mind, a case study of using 3d scanning from an academic library makerspace for outreach purposes seems both warranted and timely.

background
the university of mississippi libraries (uml) acquired its first 3d printer in late 2016 and began assembling a makerspace in mid-2017 once sufficient space had been allocated. uml populated the makerspace area—which was dubbed the idealab—with a 3d printer, a 3d scanner, a large-format printer, arduino circuit prototyping kits, soldering irons, sewing machines, snap circuits, lego mindstorms, and a variety of other tools. the available location in the library was large and versatile but did not receive a great deal of foot traffic and was missed by many patrons due to its location and some architectural quirks in the design of the older part of the main uml building. this lack of foot traffic to the idealab, combined with the lack of a full-time staffer to run the makerspace, led to a general lack of awareness of the space and its contents among university patrons and community stakeholders. since the idealab had been conceptualized as a service the library felt would be useful and a future growth area, this represented a problem.

once a full-time staffer was hired for the idealab in summer 2018, uml turned to the problem of trying to raise campus awareness of the makerspace during the first few weeks of class, when many new students would be on campus but before most major assignments and tests were scheduled. through a series of brainstorming sessions, uml staff decided to open a tent on the trent lott plaza, a major thoroughfare in the center of the university of mississippi's campus, around lunchtime for four days across two weeks: august 22–23 and august 27–28, 2018. a variety of other campus organizations had booths, tables, or tents in the trent lott plaza, and foot traffic through the area was anticipated to be high.
the setup was similar to that used by purpur at the university of nevada reno in its internal staff-only pop-up, with the addition of a public demonstration concept similar to that reported by davis in new england.30 uml could have simply advertised the idealab at this tent, but during the brainstorming sessions, the idea emerged that perhaps the advertising should focus on a single technology to build interest. the expectation was that a sufficiently new and exciting technology would have a "halo effect" for the idealab as a brand: by seeing an interesting and exciting makerspace technology, students, faculty, and staff might be more apt to visit the booth or the idealab itself, even if they never used the technology that initially intrigued them. the idea of a halo effect—that a positive impression created by one part of a makerspace, the 3d selfies, would influence positive opinion of the other parts, the remainder of the idealab and its equipment—is borrowed from psychology, where the term has been in use for nearly a century.31

as the oldest and most mature technology in the idealab, the 3d printer seemed like an ideal candidate for this halo effect—several library staff members knew how to use it and there had been a great deal of hands-on experimentation with the system since its acquisition nearly two years prior. however, participants in the brainstorming sessions also felt that a simple "3d print petting zoo" composed of preprinted objects would not be dynamic enough to attract interest. the desired effect was, instead, akin to that reported by reuscher at penn state's schuylkill campus: strong interest carried by word-of-mouth.32

3d body scanning was the next option that the brainstorming suggested. an early form of 3d body scanning had been devised by one of the authors in late 2016, consisting of a microsoft kinect for windows sensor and a freeware product called reconstructme. this 3d scanning, which could only capture the head and shoulders for the purposes of making busts, had generated considerable attention at the library's 2016 holiday party, where a 3d selfie booth had been set up. a later demonstration to the friends of the library (fol) executive board in mid-2017 had been met with a similar positive reaction. even though the scans were limited to busts and to a single color with no surface texture mapping, the enthusiastic response among uml staff and the fol board showed that the technology made a strong impression. this response was very similar to the results reported by davis in new england in a survey conducted in roughly the same time frame.33 figure 1 shows one of the early prototype busts created using this basic scanning rig.

figure 1. an early 3d selfie bust made with windows kinect and reconstructme.

at the same time, the idealab had been in the middle of a procurement process to acquire and implement an improved 3d body scanner. the prototype kinect-based unit had been large, bulky, required an external computer and power source, and could only scan a limited part of the body. by late 2017, the library had acquired an occipital structure 3d scanner, which offered a superior option. the occipital structure clipped to an ipad, was powered by the ipad's internal batteries, and allowed scans to be immediately emailed or uploaded to a cloud drive.
the new unit also allowed for full body scans to be easily created, as well as texturing, even though the library 3d printer—a lulzbot taz 5—was only capable of printing in a single color. the brainstorming group therefore decided that the idealab makerspace tent would be advertised as a "3d selfie" booth, allowing for a short and catchy advertising message that would showcase an exciting new technology, provoke the halo effect, and lead more traffic to the idealab's physical space. the group also considered that, by giving patrons something to print on the 3d printer that was unique and personalized, those patrons would be encouraged to come to the idealab to see their selfies printed out. by collecting some basic demographic information from the patrons at the time of the selfie scans, the uml could also see which campus group or groups were most interested in the technology, allowing further targeting and customization of the outreach message.

methods
in the weeks before august 22, 2018, uml employees advertised the 3d selfie booth across all available library channels. this included the library's official social media pages, physical flyers pinned to bulletin boards, emails sent to campus listservs, and a daily reminder in university of mississippi today—the university of mississippi's daily email event calendar. an example of a 3d selfie booth release form may be seen in appendix a.

beginning august 22, 2018, a working group of three uml employees—one faculty member and two staffers—assembled the materials for a tent to be pitched in the trent lott plaza of the university of mississippi. the uml-branded tent was already owned by the uml and was primarily used for a library presence at tailgating events before major home football games. due to the need for consistent lighting in order to allow the occipital structure 3d scanner to work properly, all scanning was done beneath the tent. the tent was also situated near enough to a major university building to be able to receive a wireless signal, which was essential to the 3d scanner's functionality. the first day—august 22, 2018—the tent was erected in the middle of the trent lott plaza just outside of weir hall, near the geographical center of campus. difficulties with glare, distance from power outlets, and other issues led the tent to be moved beneath a large tree at the east entrance to the plaza for the remaining three days—august 23 and august 27–28, 2018. figure 2 shows the library tent with 3d scanning of patrons in progress.

figure 2. the library's 3d selfie booth.

with the area under the tent dedicated to taking the 3d selfies, several tables were set up directly adjacent to it. these tables were managed by a uml staff or faculty member and contained a variety of technology from the idealab on display. items included the aforementioned "petting zoo" of interesting objects created on the 3d printer during its testing and proving period—things such as a 3d-printed printing press capable of rolling small inked images onto paper, a 3d-printed shadowbox that seemed opaque but would cast a shadow that revealed a picture, a bust from the earlier prototype 3d scanner, and several other small widgets.
other technology on display included an arduino software prototyping kit, a makey-makey input device, and a large-format printer image that doubled as an advertising billboard. library promotional materials, including handheld fans and idealab flyers, were made available at the tables.

in order to prevent any issues regarding usage of the 3d selfies for research, uml staff prepared a photo and video release form to collect patron consent. the back of the form served as the demographic information collection survey, asking a variety of questions useful to future uml library promotions and events. information collected included school status (undergraduate, faculty, etc.), whether or not patrons had visited the physical idealab, a question about how the patrons knew about the 3d selfie booth, two questions gauging future interest in the idealab and uml resources, and a final question asking which—if any—social media platform(s) the patrons used to follow uml. for the purposes of this survey, an older library resource was also included: studioone, a one-button recording and editing studio. this was done because studioone fell under the jurisdiction of the idealab staff despite being in a separate location. a full copy of this form can be found in appendix b.

once the 3d selfie was complete, it was shown to the patron and they were given the option of receiving a copy of the scan through email. patrons were also told to come to the idealab for assistance in printing their 3d selfies, if desired. figure 3 shows a completed 3d selfie, taken of the first patron in figure 2.

figure 3. a 3d selfie taken at the library's booth. the student signed a release form for this purpose. note that there are some 3d artifacts present around the periphery where the scanner picked up miscellaneous items like the legs of the tent; these would have been removed using a 3d image editor before printing. the 3d model is also textured in full color, which works as a digital artifact but which cannot be reproduced on the library's lulzbot taz 5, which was limited to monochrome printing at the time.

data
a total of 112 patrons stopped by the library tent during the four days it was in operation. of those 112 patrons, a total of 29 (approximately 26%) had a 3d selfie made with the library scanner. this includes several patrons who had 3d selfies taken together with friends or loved ones but excludes the test scans made by library workers on-site to test their equipment. table 1 shows the breakdown in the number of patrons and selfies per day, as well as the conversion rate, or the percent of visitors who had 3d selfies taken.

table 1. patrons, 3d selfies, and conversion rate by day

date        visitors  selfies  conversion rate
8/22/2018         15        4           26.67%
8/23/2018         32        7           21.88%
8/27/2018         38        7           18.42%
8/28/2018         27       11           40.74%
total            112       29           25.89%

the first day had by far the fewest visitors and selfies, while the final day had the best conversion rate, nearly double that of the next best day. aside from the very high conversion rate on the final day, the other three days had very similar rates. the number of visitors on the final three days was also broadly similar.

of the 112 patrons who visited the library booth for long enough to be counted, only the 29 who had 3d selfies taken completed the full demographic survey form, as had been the original plan.
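the conversion rate is simply selfies divided by visitors. a quick python check of table 1's arithmetic, with the counts transcribed from the table (the overall rate, 29/112, works out to about 25.89%):

# (visitors, selfies) per day, transcribed from table 1
booth = {
    "8/22/2018": (15, 4),
    "8/23/2018": (32, 7),
    "8/27/2018": (38, 7),
    "8/28/2018": (27, 11),
}

for day, (visitors, selfies) in booth.items():
    print(f"{day}: {selfies / visitors:.2%}")  # daily conversion rate

total_visitors = sum(v for v, _ in booth.values())       # 112
total_selfies = sum(s for _, s in booth.values())        # 29
print(f"overall: {total_selfies / total_visitors:.2%}")  # 25.89%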
tables 2–5 show highlights of this demographic data; the complete responses are available in appendix c.

table 2. patrons by school status

school status   patrons  percent
undergraduate        20   68.97%
graduate              6   20.69%
faculty/staff         3   10.34%

undergraduates were by far the largest patron group, with nearly 70% of all 3d selfies, followed by graduate students at approximately 20% and faculty/staff at 10%.

table 3. patron responses to library-specific questions

question                                     yes  no  maybe
ever visited the idealab or studioone?         5  24    n/a
will you use either after this event?         20   0      9
like to know more about library resources?    24   5    n/a

a significant majority of patrons (24 of 29) had not visited the idealab, the uml makerspace, or studioone, the uml one-button film studio. all but nine patrons indicated an interest in using either or both after the event, and those nine patrons indicated "maybe." no patrons claimed to be uninterested in either or both spaces.

table 4. how patrons found out about the 3d selfie booth

source                            number of patrons  percent
flyer                                             2    6.90%
university of mississippi today                   7   24.14%
website                                           1    3.45%
walk-up                                          11   37.93%
other                                             8   27.59%

the largest number of patrons (approximately 38%) simply walked by the tent or were drawn in by speaking to library staff. many (approximately 25%) learned about the 3d selfie booth from university of mississippi today, the university of mississippi's daily email event calendar. very few cited flyers or the uml website. of those listing "other," two cited friends, one cited facebook, one cited a departmental listserv, and three others did not specify.

table 5. patrons following @umlibraries on social media

social media platform(s)          number of patrons
instagram                                         2
facebook                                          4
twitter                                           1
instagram, facebook, and twitter                  4
instagram and facebook                            2
instagram and twitter                             0
facebook and twitter                              0
none of the above                                13
do not use social media                           3

of the patrons who had 3d selfies made, 13 followed uml on at least one social media platform. thirteen others did not follow uml on any of the listed platforms, while two others were not social media users of any sort. finally, of the 29 patrons who had 3d selfies made, only one later stopped by the idealab to have it printed on the uml lulzbot taz 5 machine.
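the summary tables can be reproduced directly from the raw responses in appendix c. a minimal sketch, assuming the q3 ("how did you learn about today's event?") answers have been transcribed into table 4's five categories:

from collections import Counter

# q3 values from the 29 appendix c rows, collapsed to table 4's categories
q3 = (["flyer"] * 2
      + ["university of mississippi today"] * 7
      + ["website"] * 1
      + ["walk-up"] * 11
      + ["other"] * 8)

counts = Counter(q3)
for source, n in counts.most_common():
    print(f"{source}: {n} ({n / len(q3):.2%})")  # matches table 4's percents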
it seems clear that some of the assumptions made by the uml brainstorming group were incorrect. only one of the 29 patrons was interested in having a 3d print of their selfie made, for instance—this did not seem to be a strong inducement for the patrons to visit the idealab. as far as could be determined, this student did not follow through on their interest and print their selfie. based on discussions the patrons had with uml staff, some patrons seemed to be more invested in their scans as digital objects rather than physical models—they wanted the scan but not necessarily the print. one patron spoke of using their scan to create user profile pictures for gaming sites, for instance, and another wanted to use the texture mapping from the scan— available as a separate file—to put their face on a video game avatar. the location and timing of the 3d selfie booth also seems to merit further consideration. looking at the patron data, the first day had far fewer patrons and 3d selfies while the remaining three are much more consistent. this may reflect the change in location between the first two days, from a hot and exposed location in the middle of the trent lott plaza to a much more shaded area at its east end. the choice of lunchtime for the booth may also have affected the ability of passersby to stop and engage, since students, faculty, and staff may not have been able to sacrifice their lunch hour to experimenting with library technology. nevertheless, most patrons who stopped for any length of time seemed impressed by the 3d selfie apparatus and the petting zoo of 3d objects. as such, while the hoped-for halo effect may not have led to many patrons coming to the idealab to have their 3d selfies printed, it did seem to open the door for more engagement with the space. due to the space that the 3d selfie booth occupied, it was not surprising that nearly 70% of patrons were undergraduates—the trent lott plaza is one of three primary routes from east to west on the university of mississippi campus and lies adjacent to the business and accountancy schools. choosing a similar high-traffic location would seem to be a good way to entice walk-up patrons, especially considering that almost 40% of the patrons who got 3d selfies were walk-ups. the effect of library staff verbally advertising the tent cannot be understated, as well. hardly any of the patrons had used the idealab or studioone before having a scan made. in the case of the idealab, that was unsurprising—it had only been open with a full-time staffer for a few months at that time. studioone had been open for much longer, several years, and had seen relatively steady use, but due to the fact that it was a separate location in the library, its inclusion on a survey otherwise focusing on the idealab may have confused respondents. patrons may also have been familiar with the spaces but not their names, or had thought that they needed to use both, rather than either, to answer affirmatively. in retrospect, the inclusion of studioone on the survey may have been a mistake—none of its technology was represented on the table, in the petting zoo of 3d objects, or in the promotional materials. information technology and libraries december 2023 to thine own 3d selfie be true 12 watson responses to the other survey questions were encouraging—when asked, the patrons getting 3d selfies seemed very open to the possibility of using library technology and learning about library resources based on their responses. 
however, this may have partly been because of a self-selecting bias; patrons who were more drawn to technology may have been more likely to take a 3d selfie and therefore answer the survey. ideally, in the future, surveys would be provided to all participants to rule out any such inaccuracies. it is also worth noting that a fair percentage of the patrons who stopped followed uml on some variety of social media—despite very few saying that they had learned about the 3d selfie booth from social media. this suggests that many of the patrons who responded to the selfie booth were already predisposed to engage positively with uml, even if they did not follow it closely enough to cite library social media as the primary thing bringing them to the 3d selfie booth. library staff were also impressed with the number of patrons who cited university of mississippi today as the source of their information about the event—anecdotally, many in the library claim that no one reads the daily emails. clearly, this is an assumption that the data has challenged.

thus far, further attempts to iterate on the 3d selfie booth have failed due to circumstance. in 2019 and 2021, the lab's head left the institution, resulting in a vacancy that precluded any major new initiatives. the uml campus was closed during most of 2020 and had restrictions for part of 2021 due to the covid-19 pandemic as well.

conclusion
ultimately, the 3d selfie booth did not work entirely as intended. since only one patron had their selfie printed, there was no real way to track how many patrons who were intrigued by the idealab actually made it to the makerspace in person. the way the questionnaires were organized, the format of their questions, and the fact that they were only given to people who had scans made are all factors that make the data susceptible to self-selecting bias. the small number of patrons overall also makes it dangerous to generalize much from the data set. ideally, any future iterations of the 3d selfie booth would have a questionnaire that is altered to reflect a smaller sample size, better organized, and stripped of references to superfluous resources.

the 3d selfie booth did succeed in attracting a large amount of foot traffic and interest, however; 112 individuals across four days, with nearly 30 being engaged enough to go through the lengthy scanning process, compares favorably to many other library events and activities that uml has held. the hoped-for halo effect of patrons being intrigued by the 3d scanning technology did seem to be at least partially real, even if it did not translate into actual 3d prints as expected. a microsoft kinect for windows costs $50–$75, while an occipital structure sensor costs $399 new. given the costs of the tools involved, and the fact that a 3d printer did not seem to be integral to the process, a 3d selfie booth may be a possibility for other makerspaces looking to drum up interest. the strategy of putting it in a high-traffic area seems sound, especially if library staff are available to manage the booth for more than four days per semester. the experience that uml has had with its 3d selfie booth serves as a case study for other makerspaces attempting a similar promotion: refining the general idea to be more effective, questioning assumptions about the process, and avoiding some of the pitfalls.
appendix a: release form

the university of mississippi photo & video release

1. i hereby authorize the university of mississippi and those acting pursuant to its authority ("university") to: (a) record my likeness and voice on video, audio, film, photograph, digital, electronic or any other medium; (b) use my name in connection with these recordings; and (c) use, reproduce, exhibit or distribute, in any manner and medium, these recordings for any purpose that the university deems appropriate, including promotional, educational or advertising efforts.
2. i agree that all licenses and permissions granted in this agreement are perpetual and transferable.
3. i hereby represent that i have the full rights to enter into this agreement and i release the university from liability for any violation of any personal or proprietary right i may have in connection with such use. i understand that all such recording, in whatever medium, shall remain the property of the university. i represent and warrant that i am over 18 years of age and have authority to enter into this agreement.

name: _
address:
phone no.: _
signature:
parent/guardian signature (if under 18):

appendix b: demographic information

1. audience participation information:
   undergraduate / graduate / faculty / staff / community resident / other (please specify) _____
2. have you ever visited the idealab or studioone in the [library]?
   yes / no
3. how did you learn about today's event?
   newspaper / website / social media (specify) _____ / flyer / university of mississippi today / other (specify) _____
4. after this event, will you use or visit studioone or the idealab in the library?
   yes / maybe / no
5. would you like to know more about the resources the [institution] libraries has to offer?
   yes / no
6. do you follow the [library handle] on any social media platforms?
   instagram / facebook / twitter / none of the above / i do not use social media

appendix c: full responses

# | q1 | q2 | q3 | q4 | q5 | q6
1 | undergraduate | yes | other (walk-up) | yes | no | facebook
2 | undergraduate | yes | other (unspecified) | yes | yes | instagram, facebook, twitter
3 | graduate | no | flyer | maybe | no | facebook
4 | undergraduate | no | other (english listserv) | maybe | no | none of the above
5 | graduate | no | other (facebook) | yes | no | facebook
6 | staff | no | other (coworker) | yes | yes | instagram, facebook, twitter
7 | staff | no | other (unspecified) | yes | yes | instagram, facebook, twitter
8 | staff | no | university of mississippi today | yes | yes | instagram, twitter
9 | undergraduate | yes | university of mississippi today | yes | yes | instagram
10 | undergraduate | no | university of mississippi today | yes | yes | none of the above
11 | undergraduate | no | other (walk-up) | yes | yes | do not use social media
12 | undergraduate | no | other (walk-up) | maybe | no | do not use social media
13 | undergraduate | no | university of mississippi today | maybe | yes | none of the above
14 | graduate | no | other (walk-up) | yes | yes | none of the above
15 | undergraduate | yes | flyer | yes | yes | none of the above
16 | graduate | no | university of mississippi today | yes | yes | facebook
17 | graduate | no | website | yes | yes | instagram, facebook
18 | undergraduate | no | other (walk-up) | yes | yes | none of the above
19 | undergraduate | no | n/a | yes | yes | none of the above
20 | undergraduate | no | other (unspecified) | maybe | yes | none of the above
21 | undergraduate | no | university of mississippi today | yes | yes | none of the above
22 | undergraduate | yes | other (walk-up) | yes | yes | instagram, facebook
23 | undergraduate | no | other (walk-up) | maybe | yes | twitter
24 | undergraduate | no | other (walk-up) | maybe | yes | none of the above
25 | undergraduate | no | university of mississippi today | maybe | yes | instagram
26 | undergraduate | no | other (friend) | yes | yes | none of the above
27 | undergraduate | no | other (walk-up) | maybe | yes | instagram, facebook, twitter
28 | undergraduate | no | other (walk-up) | yes | yes | none of the above
29 | graduate | no | other (walk-up) | yes | yes | none of the above

endnotes
1 martin k. wallace et al., "making maker literacies: integrating academic library makerspaces into the undergraduate curriculum," in 2nd international symposium for academic makerspaces (cleveland, ohio: case western reserve, 2017).
2 erin passehl-stoddart et al., "history in the making: outreach and collaboration between special collections and makerspaces," collaborative librarianship 10, no. 2 (2018).
3 michael groenendyk and riel gallant, "3d printing and scanning at the dalhousie university libraries: a pilot project," library hi tech 31 (2014): 34.
4 groenendyk and gallant, "3d printing," 38.
5 groenendyk and gallant, "3d printing," 39.
6 groenendyk and gallant, "3d printing," 38.
7 groenendyk and gallant, "3d printing," 39.
8 groenendyk and gallant, "3d printing," 39.
9 jason reuscher, "three-dimensional (3-d) scanning within academic libraries: exploring and considering a new public service," pennsylvania libraries: research & practice 2, no. 1 (2014): 64.
10 reuscher, "three-dimensional," 68.
11 reuscher, "three-dimensional," 69.
12 reuscher, "three-dimensional," 68.
13 jason griffey, 3-d printers for libraries (chicago: american library association, 2017), pages??.
14 griffey, 3-d printers, 19.
15 griffey, 3-d printers, 18–20.
16 griffey, 3-d printers, 19.
17 griffey, 3-d printers, 20.
18 ann marie lynn davis, "current trends and goals in the development of makerspaces at new england college and research libraries," information technology and libraries 37, no. 2 (2018): 107, 109, https://doi.org/10.6017/ital.v37i2.9825.
19 davis, "current trends," 108–9.
20 megan lotts, "implementing a culture of creativity: pop-up making spaces and participating events in academic libraries," college & research libraries news 76, no. 2 (2015): 72.
21 lotts, "implementing a culture of creativity," 72.
22 lotts, "implementing a culture of creativity," 75.
23 erich purpur et al., "refocusing mobile makerspace outreach efforts internally as professional development," library hi tech 34, no. 1 (2016): 132.
24 purpur et al., "refocusing mobile makerspace outreach," 137.
25 purpur et al., "refocusing mobile makerspace outreach," 130.
26 jennifer grayburn and ammon shepherd, "beyond the page: outreach and research in academic library makerspaces," in 2nd international symposium for academic makerspaces (cleveland, ohio: case western reserve, 2017), paragraph 5.
27 grayburn and shepherd, "beyond the page," paragraph 5.
28 grayburn and shepherd, "beyond the page," paragraph 5.
29 grayburn and shepherd, "beyond the page," paragraph 14.
30 purpur et al., "refocusing mobile makerspace outreach," 132; davis, "current trends," 109.
31 edward l. thorndike, "a constant error in psychological ratings," journal of applied psychology 4, no. 1 (1920): 27, https://doi.org/10.1037/h0071663.
32 reuscher, "three-dimensional," 69.
33 davis, "current trends," 109.

an integrated computer based technical processing system in a small college library
jack w. scott: kent state university library, kent, ohio (formerly lorain county community college, lorain, ohio)

a functioning technical processing system in a two-year community college library utilizes a model 2201 friden flexowriter with punch card control and tab card reading units, an ibm 026 key punch, and an ibm 1440 computer, with two tape and two disc drives, to produce all acquisitions and catalog files based primarily on a single typing at the time of initiating an order. records generated by the initial order, with slight updating of information, are used to produce, via computer, manual and mechanized order files and shelf lists, catalogs in both the traditional 3x5 card form and book form, mechanized claiming of unfilled orders, and subject bibliographies.

the lorain county community college, a two-year institution designed for 4000 students, opened in september 1964, with no librarian and no library collection. when the librarian was hired in october 1964, lack of personnel, both professional and clerical, forced him to examine closely traditional ways of ordering and preparing materials, his main task being the controlled building of a collection as quickly as possible. no library having been established, there were no inflexible rules governing acquisitions or cataloging and no catalogs or other files enforcing their pattern on future plans. the librarian was free to experiment and adapt as much as he desired; and adapt and experiment he did, remembering, at least most of the time, the primary reasons for designing the system.
these were:
1) to notify the vendor about what material was desired;
2) to have readily available information about when material had been ordered and when it might arrive;
3) to provide a record of encumbrances;
4) to make sure that material received was the material which had been ordered;
5) to initiate payment for material received;
6) to provide catalog copy for technical processes to use in producing card and book catalogs;
7) to provide inexpensive control cards for a circulation system; and
8) to provide whatever other statistics might be needed by the librarian.

the librarian attended the purdue conference on library automation (october 2-3, 1964) and an ibm conference on automation held in cleveland (december 1964), and visited libraries with data processing installations, such as the decatur public library. then an extensive literature search was run on the subject of mechanization of libraries and the available material thoroughly reviewed. it was the consensus of the president, the librarian, and the manager of data processing that, as white said later, "the computer will play a major part in how libraries are organized and operated because libraries are a part of the fabric of society and computers are becoming a daily accepted part of life." (1) moreover, it was agreed that the use of data processing equipment would be justified only if it made building a collection more efficient and more economical than manual methods could do.

after careful consideration of the ibm 870 document writing system (2) and the system described by kraft (3) as input techniques for the college library, it was decided to use the friden flexowriter, recommended both at purdue and, in european applications, by bernstein (4). its most attractive feature was the use of paper tapes to generate various secondary records without the necessity of proofreading each one. the college, by mid-1965, had the following equipment available for library use: one friden flexowriter (model 2201) with card punch control unit and tab card reading unit, one ibm 026 key punch with alternate programming, and guaranteed time on the college-owned ibm 1440 8k computer with two tape and two disc drives. to produce punched paper tape and tab cards with only one keyboarding, an electrical connection between the flexowriter and the keypunch was especially designed and installed.

it was fortunate for the library that the college also had an excellent data processing manager who was interested in seeing data processing machines and techniques utilized in as many ways as possible. with his enthusiastic support, aid in programming and preparation of flow charts, and patient cooperation, it was not surprising that the automation of library processes was completely successful.

at this time it was decided that since the college was likely to remain a single-campus institution it would be uneconomical to rely solely on a book catalog, even though the portability of such a device was most attractive to librarian and faculty alike. therefore, it was planned to have the public catalog, as well as the official shelf list, in card form, permitting both to be kept current economically. these two files were to be supplemented with crude book catalogs which would be a by-product, among others, of the typing of the original book orders. these book catalogs were not to replace the card catalog but simply to extend and facilitate use of the collection.
it was also decided to design a system which would duplicate as few as possible of the manual aspects of normal technical processing systems, but one which would, at the same time, permit the return to a manual system from a machine system with a minimum of trouble and tribulation if support for the library's automated system should be withdrawn. concern about such withdrawal of support had originally been voiced by durkin and white in 1961, when they said: "there have been a number of unfortunate examples of libraries that abandoned their home-grown catalogs for a machine retrieval program because there was some free computer time, only to lose their machine time to a higher priority project and to be left with information storage to which they no longer have access. many of these librarians, and others who have heard about their plight, are determined not to burn their bridges behind them by abandoning their reliable, if old-fashioned, 3x5 card catalogs." (5) although the necessity of returning to an inefficient manual system has not, to date, raised its ugly head, there were times when it was most comforting to know that routes of retreat and reformation were available.

under the present system there is only one manual keyboarding of descriptive catalog main entries for most titles. all other records are generated from these main entries. this integrated system was adopted on the assumption that cataloging information in some form (6) would be available for a high percentage of books. experience showed that about 95 percent of acquisitions did have catalog copy readily available: of 4029 titles processed in a 5-month period, catalog copy was available for 3824.

after verification that a requested title is neither in the library nor on order, a copy of a catalog entry is located in a source such as the national union catalog, library of congress proofsheets, or publishers' weekly. the catalog information is manually typed in its entirety (including subject headings) onto five-part multiple request forms, using the friden flexowriter. output from the friden consists of the multiple order, a punched paper tape containing the full bibliographic entry but no order information, and tab cards, punched by the slave ibm key punch, which contain full order information but only abbreviated bibliographic data (figure 1).

[figure 1. on order creation routine.]

the tab cards, containing full order information, are used as input to the 1440 computer to create an "on order" file arranged by order number and stored on magnetic tape, from which an "on order" printout is produced weekly (figure 2). at any given time this magnetic tape order file can be used to total the dollar amount of outstanding orders to any given vendor, or the total amount outstanding to all vendors (figure 3).

[figure 2. on order update.]

[figure 3. on order cost tally.]

the punched paper tape and two copies of the request form are stored in a standard 3x5 card file arranged by main entry. one copy of the request form is to be used as a work slip when material is received. the original and one copy of the request form are sent to the vendor, with instructions to return one copy with shipment.
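a minimal sketch of the weekly batch cycle just described, assuming the toy tab-card records from the earlier sketch: merge the week's new order cards into the tape file, drop filled orders, and tally encumbrances by vendor. the sort key and field names are assumptions, not the article's actual layout.

```python
def weekly_on_order_update(on_order: list[dict],
                           new_cards: list[dict],
                           received: set[int]) -> list[dict]:
    """one pass of the weekly 'on order' update: drop filled orders,
    merge this week's cards, keep the file sorted by order number."""
    kept = [r for r in on_order if r["order_number"] not in received]
    return sorted(kept + new_cards, key=lambda r: r["order_number"])

def outstanding_to_vendor(on_order: list[dict], vendor: str) -> float:
    """the 'cost tally': total dollar amount encumbered to one vendor."""
    return sum(r["cost"] for r in on_order if r["vendor"] == vendor)

def outstanding_total(on_order: list[dict]) -> float:
    """total encumbered to all vendors."""
    return sum(r["cost"] for r in on_order)
```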
in the event the vendor does not comply, the main entry can be located readily by checking the order number or order date on the "on order" printout and using the abbreviated bibliographic information which appears there. if the material requested has not been shipped within three months, the magnetic tape order file is used to prepare tab cards containing all original order information, and the cards are sent to the library with a notice stating that shipment is overdue. these tab cards are used as input to the flexowriter tab card reader unit, which activates the flexowriter itself and prepares "overdue, ship or cancel" notices to the vendor (figure 4).

[figure 4. late on order routine.]

products

when material is received, the paper tape and one copy of the main entry work slip are pulled from the card order file and sent to the cataloger, who notes on the work slip the call number to be used as well as any changes. the work slip, punched paper tape, and book then pass to the technician who does the shelf listing. at this point the original output paper tape containing full bibliographic information is used as input for the flexowriter to create a standard 3x5 hard-copy shelf list card containing full bibliographic information, as well as inventory data such as vendor, date of receipt, and cost. the last three items and the call number are added manually as "changes." simultaneously a new paper tape is produced as output which contains bibliographic information from the first tape and all revisions deemed necessary by the cataloger.

the revised paper tape is used on the flexowriter to prepare 3x5 card sets for the public catalog. at the same time the slave keypunch prepares a set of tab cards containing full acquisitions information: cost, vendor, date of receipt; and abbreviated bibliographic information: short author, short title, full call number (including copy, year, part and volume), accession number, and short edition statement (figure 5).

[figure 5. shelf list creation routine.]

the tab cards are used first to delete the item from the magnetic tape "on order" file and second as input to create a magnetic tape shelf list of abbreviated information arranged by call number (figure 6). the magnetic tape shelf list is used to create 1) eight copies of author, title, and classified catalogs which are updated semi-annually; 2) printouts of weekly acquisitions; 3) subject printouts on demand; and 4) tab cards which serve as circulation cards for books, films, drawings, tape and disc recordings, filmstrips, and any other materials. the tab cards can be used with the ibm 357 circulation system or any similar system.

[figure 6. weekly shelf list update.]

discussion

the efficiency of this system is most dramatically demonstrated by the amount of work accomplished per person per year. one technician can process over one thousand orders per month. over fifteen thousand fully cataloged volumes per year (approximately eleven thousand titles) are added to the collection by a technical processing department which consists solely of one full-time cataloger and two full-time technicians.
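the claiming and receiving runs above lend themselves to the same toy model. the three-month cutoff is the article's; the notice wording, field names, and error handling are illustrative assumptions only.

```python
from datetime import date, timedelta

CLAIM_AFTER = timedelta(days=90)  # stands in for the article's "three months"

def overdue_orders(on_order: list[dict], today: date) -> list[dict]:
    """orders whose shipment is overdue, as selected from the tape file."""
    return [r for r in on_order
            if today - date.fromisoformat(r["order_date"]) > CLAIM_AFTER]

def claim_notice(r: dict) -> str:
    """one 'overdue, ship or cancel' notice (wording is illustrative)."""
    return (f'order {r["order_number"]} ({r["order_date"]}): '
            f'{r["short_author"]}, {r["short_title"]} -- '
            f'overdue, ship or cancel.')

def receive(on_order: list[dict], shelf_list: list[dict],
            order_number: int, call_number: str) -> None:
    """delete a received item from the on-order file and add an abbreviated
    record to the shelf list, kept sorted by call number; raises
    StopIteration if the order number is not on file."""
    idx = next(i for i, r in enumerate(on_order)
               if r["order_number"] == order_number)
    record = on_order.pop(idx)
    record["call_number"] = call_number
    shelf_list.append(record)
    shelf_list.sort(key=lambda r: r["call_number"])
```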
one technician spends one half of her time typing orders and the other half preparing the shelf list. at present the limiting factor in processing material is not the personnel time available but rather time on the flexowriter-keypunch combination, which runs continuously for sixty hours per week. the cataloger feels that if some thirty hours more per week were available for running the machines, or if a second flexowriter were available to handle catalog card output, it would then be possible to order, receive, and fully process fifteen thousand titles per year (eighteen to twenty thousand volumes) with only the present technical processing staff.

references

1. white, herbert s.: "to the barricades! the computers are coming!" special libraries 57 (november, 1966), 631.
2. general information manual: mechanized library procedures (white plains, n.y.: ibm, n.d.).
3. kraft, donald h.: library automation with data processing equipment (chicago: ibm, 1964).
4. bernstein, hans h.: "die verwendung von flexowritern in dokumentation und bibliothek," nachrichten für dokumentation 12 (june, 1961), 92.
5. durkin, robert e.; white, herbert s.: "simultaneous preparation of library catalogs for manual and machine applications," special libraries 52 (may, 1961), 231.
6. kaiser, walter h.: "new face and place for the catalog card," library journal 88 (january, 1963), 186.

"discovery" focus as impetus for organizational learning

jennifer l. fabbi

information technology and libraries | december 2009

jennifer l. fabbi (jennifer.fabbi@unlv.edu) is special assistant to the dean at the university of nevada las vegas libraries.

the university of nevada las vegas libraries' focus on the concept of discovery and the tools and processes that enable our users to find information began with an organizational review of the libraries' technical services division. this article outlines the phases of this review and subsequent planning and organizational commitment to discovery. using the theoretical lens of organizational learning, it highlights how the emerging focus on discovery has provided an impetus for genuine learning and change.

the university of nevada las vegas (unlv) libraries' focus on the concept of discovery and the tools and processes that enable our users to find information stemmed from the confluence of several initiatives. however, a significant path that is directly responsible for the increased attention on discovery leads through one unit in unlv libraries—technical services. this unit, consisting of the materials ordering and receiving (acquisitions) and bibliographic and metadata services (cataloging) departments, had been without a permanent director for three years when i was asked to take the interim post in april 2008. while the initial expectation was that i would work with the staff to continue to keep technical services functioning while we performed our third search for a permanent director, it became clear after three months that, because of nevada's budgetary limitations, we would not be able to go forward with a search at that time. as all personnel searches in unlv libraries were frozen, managers and staff across the divisions moved quickly to reassign staff with the aim of mitigating the effects of staff vacancies.
there was division between the library administrators as to what the solution would be for technical services: split up the division—for which we had trouble recruiting and retaining a leader in the past—and divvy up its functions among other divisions in the libraries, or continue to hold down the fort while conducting a review of technical services that would inform what it might become in the future. other organizations have taken serious looks at, and provided roadmaps of, how their organizations' focus of technical services will change in the future.1 the latter route was chosen, and the review—eventually dubbed revisioning technical services—led directly to the inquiries and activities documented in this ital special issue. detailing the process of revisioning technical services and using the theoretical lens of organizational learning, i will demonstrate how the libraries' emerging focus on the concept of discovery has provided an impetus for genuine learning and change.

organizational learning

in images of organization, morgan devotes a chapter to theories of organizational development that characterize organizations using the metaphor of the brain.2 based on the principles of modern cybernetics, argyris and schön provide a framework for thinking about how organizations can learn to learn.3 while many organizations have become adept at single-loop learning—the ability to scan the environment, set objectives, and monitor their own general performance in relation to existing operating norms—these types of systems are generally designed to keep the organization "on course." double-loop learning, on the other hand, is a process of learning to learn, which depends on being able to take a "double look" at the situation by questioning the relevance of operating norms (see figure 1).

[figure 1. single- and double-loop learning. source: learning-org discussion pages, "single and double loop learning," learning-org dialog on learning organizations, http://www.learning-org.com/graphics/lo23374singledll.jpg (accessed aug. 11, 2009).]

bureaucratized organizations have fundamental organizing principles, including management hierarchy and subunit goals that are seen as ends in themselves, which can actually obstruct the learning process. to become skilled in the art of double-loop learning, organizations must avoid getting trapped in single-loop processes, especially those created by "traditional management control systems" and the "defensive routines" of organizational members.4 according to morgan, cybernetics suggests that learning organizations must develop capacities that allow them to do the following:5

- scan and anticipate change in the wider environment to detect significant variations by
  - embracing views of potential futures as well as of the present and the past;
  - understanding products and services from the customer's point of view; and
  - using, embracing, and creating uncertainty as a resource for new patterns of development.
- develop an ability to question, challenge, and change operating norms and assumptions by
  - challenging how they see and think about organizational reality using different templates and mental models;
  - making sure strategic development does not run ahead of organizational reality; and
  - developing a culture that supports change and risk taking.
- allow an appropriate strategic direction and pattern of organization to emerge by
  - developing a sense of vision, norms, values, limits, or "reference points" to guide behavior, including the ability to question the limits being imposed;
  - absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation; and
  - placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals.

unlv libraries' revisioning technical services process and the resulting organizational focus on discovery are outlined below, and the elements identifying unlv libraries as a learning organization throughout this process are highlighted (see appendix a).

revisioning technical services

this review of technical services was a process consisting of several distinct steps over many months, and each step was informed by the data and opinions gained in the prior steps:

phase 1: technical services baseline, focusing on the nature of technical services work at unlv libraries, in the library profession, and factors that affect this work now and in the future
phase 2: organizational call to action, engaging the entire organization in shared learning and input
phase 3: summit on discovery, shifting significantly away from technical services and toward the concept of discovery of information and the experience of our users

technical services baseline

the first phase of the process, which i called the "technical services baseline," included a face-to-face meeting with me and all technical services staff. we talked openly about the challenges that we faced, options on the table for the division and why i thought that taking on this review would be the best course to pursue, and goals of the review. outcomes of the process were guided by the dean of libraries, were written by me, and received input from technical services staff, resulting in the following goals:

1. collect input about the kinds of skills and leadership we would like to see in our new technical services director. (while creating these goals, we were given the go-ahead to continue our search for a new director.)
2. investigate the organization of knowledge at a broad level—what is the added value that libraries provide?
3. increase overall knowledge of professional issues in technical services and what is most meaningful for us at unlv.
4. encourage technical services staff to consider current and future priorities.

after establishing these goals, i began to document information about the process on unlv libraries' staff website (figure 2: project's wiki page on staff website) so that all staff could follow its progress. with the feedback i received at the face-to-face meeting and guided by the stated goals of the process, i gave technical services staff a series of three questions to answer individually:

1. what do you think the major functions of technical services are? examples are "cataloging physical materials" and "ordering and paying for all resources purchased from the collections budget."
2. what external factors—in librarianship and otherwise—should we be paying the most attention to in terms of their effect on technical services work? examples are "the ways that users look for information" and "reduction of print book and serials budgets." feel free to do a little research on this question and provide the sources of the information that you find.
3. what are the three highest priority/most important tasks on your to-do list right now?
eighteen of twenty staff members responded to the questions. i then analyzed the twenty pages of feedback according to two specific criteria: (1) i paid special attention to phrases that indicated an individual's beliefs, values, or philosophies to identify potential sources of conflict as we moved through the process; and (2) i looked for priority tasks listed that were not directly related to the individual's job duties, as many of them were indicators of work stress or anxiety related to perceived impending change.

during this phase, organizational learning was initiated through the process of challenging how technical services staff and others viewed technical services as a unit in the organization, and through the creation of shared reference points to guide our future actions. while beginning a dialogue about a variety of future management options for technical services work functions may have raised levels of anxiety within the organization, it also invited administration and staff to question the status quo and consider alternative modes of operation within the context of efficiency.6 in addition to thinking about current realities and external influences, staff were asked to participate in generating outcomes to guide the review process. these shared goals helped to develop a sense of coherence for what started out as a very loose assignment—a review that would inform what the unit might become in the future.

organizational call to action

the next phase of the process, "a call to action," required library-wide involvement and input. while i knew that this phase would involve a library staff survey, i also desired that all staff responding to the survey had a basic knowledge of some of the issues facing library technical services today. using input from the two technical services department heads, i selected two readings for all library staff: bothmann and holmberg's chapter on strategic planning for electronic resource management addressed many of the planning, policy, and workflow issues that unlv libraries has experienced7; and coyle's article on information organization and the future of the library catalog offers several ideas for ensuring that valuable information is visible to our users in the information environments they are using.8 i also asked the library staff to visit the university of nebraska–lincoln's "encore catalog search" (http://iris.unl.edu) and go through the discovery experience by performing a guided search and a search on a topic of their choice. they were then asked to ponder what collections of physical or digital resources we currently own at the libraries that are not available from the library catalog.

after completing these steps, i directed library staff to a survey of questions related to the importance of several items referenced in the articles in terms of the following unlv libraries priorities:

- creating a single search interface for users pulling together information from the traditional library catalog as well as other resources (e.g., journal articles, images, archival materials)
- considering non-marc records in the library catalog for the integration of nontraditional library and nonlibrary resources into the catalog
- linking to access points for full-text resources from the catalog
- creating ways for the catalog to recommend items to users
- creating metadata for materials not found in the catalog
- creating "community" within the library catalog
- implementing an electronic resource management system (erms) to help manage the details related to subscriptions to electronic content
- implementing federated searching so that users can search across multiple electronic resource interfaces at once
- making electronic resource license information available to library staff and patrons

there also were several questions asking library staff to prioritize many of the functions that technical services already undertakes to some extent:

- cataloging specialized or unique materials
- cataloging and processing gift collections
- ensuring that full-text electronic access is represented accurately in the catalog
- claiming and binding print serials
- ordering and receiving physical resources
- ordering and receiving electronic resources
- maintaining and communicating acquisitions budget and serials data

the survey asked technical services staff to "think of your current top three priority to-do items. in light of what you read and what you think is important for us to focus on, how do you think your work now will have changed in five years?" all other library staff members were asked to respond to the following:

1. please list two ways that technical services supports your work now.
2. please list two things you would like technical services to start doing in support of your work now.
3. please list two things you think technical services can stop doing now.
4. please list two things technical services will need to begin doing to support your work in the next five years.

finally, the survey included ample opportunity for additional comments. fifty-eight staff members (over half of all library staff) completed the readings, activity, and survey. i analyzed the information to inform the design of subsequent phases of revisioning technical services. the dean of libraries' direct reports then reviewed the design. in addition, many library staff contributed additional readings and links to library catalogs and other websites to add to the revisioning technical services staff webpage.

throughout this phase, the organization was invited into the learning process through engagement with shared reference points, the ability to question the status quo, and the ability to embrace views of potential futures as well as of the present and the past.9 the careful selection of shared readings and activities created coherence among the staff in terms of thinking about the future, but these ideas also raised many questions about the concept of discovery and what route unlv libraries might take. the survey allowed library staff to better understand current practices in technical services, to prioritize new ideas against these practices, and to think about future options and their potential impact on their individual work as well as the collective work of the libraries.

summit on discovery

in the third phase of this process, "the discovery summit," focus began to shift significantly from technical services as an organizational unit to the concept of discovery and what it means for the future of unlv libraries.
during this half-day event, employing a facilitator from off campus, the dean of libraries and i designed a program to fulfill the following desired outcome: through a process of focused inquiry, observation, and discussion, participants will more fully understand the discovery experience of unlv libraries users. the event was open to all library staff members; however, individuals were required to rsvp and complete an activity before the day of the event. (the facilitator worked specifically with the technical services staff at a retreat designed to prepare for upcoming interviews for technical services director candidates.) participants were each sent a "summit matrix" (see appendix b) ahead of time, which asked them to look for specific pieces of information by doing the following:

1. search for the information requested with three discovery tools as your starting points: the libraries' catalog, the libraries' website, and a general internet search engine (like google).
2. for each discovery tool, rate the information that you were able to find in terms of "ease of discovery" on a scale of 1 (lowest ease—few results) to 5 (highest ease—best results).
3. document the thoughts and feelings you had and/or process you went through in searching for this information.
4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to?

the information that staff members were asked to search for using each discovery tool was mostly specific to the region of southern nevada, such as, "i heard that henderson (a city in southern nevada) started as a mining community. does unlv libraries have any books about that?" and "find any photograph of the gay pride parade in las vegas that you can look at in unlv libraries."

during the summit, the approximately sixty participants were asked to discuss their experiences searching for the matrix information, including any affective component to their experience, and they were asked to specify criteria for their definition of "ease of discovery." next, we showed end-user usability video testing footage of a unlv professor, a human resources employee, and a unlv librarian going through similar discovery exercises. after each video, we discussed these users' experiences—their successes, failures, and frustrations—and the fact that even our experts were unable to discover some of this information. finally, we facilitated a robust brainstorming session on initiatives we could undertake to improve the discovery experience of our users. [editor's note: read more about this usability testing in "usability as a method for assessing discovery" on page 181 of this issue.]

during the wrap-up of the discovery summit, the final phase of this initial process, the discovery mini-conference was introduced. a call for proposals for library staff to introduce or otherwise present discovery concepts to other library staff was distributed. this call tied together the revisioning technical services process to date and also connected the focus on discovery to the libraries' upcoming strategic planning process. this strategic planning process, outlining broad directions for the libraries to focus on for the next two years, would be the first time we would use our newly created evaluation framework. we focused on the concepts of discovery, access, and use, all tied together through an emphasis on the user.
all library staff members were invited to submit a poster session or other visual display on various themes related to discovery of information to add to our collective and individual knowledge bases and to better understand our colleagues' philosophies and positions on discovery. in addressing one of six mini-conference themes listed below, all drawn directly from the revisioning technical services survey results, potential participants were asked to consider the question, "what are your ideas for ways to improve how users find library resources?"

- single search interface (federated searching, harvester-type platform, etc.)
- open source vs. vendor infrastructure
- information-seeking behavior of different users
- social networking and web 2.0 features as related to discovery
- describing primary sources and other unique materials for discovery
- opening the library catalog for different record types and materials

proposals could include any of these perspectives:

- an environmental scan with a summary of what you learn
- a visual representation of what you would consider improvement or success
- a position for a specific approach or solution that you advocate

ultimately, we had seventeen distinct projects involving twenty-four staff members for the afternoon mini-conference. it was attended by approximately seventy additional staff members from unlv libraries as well as representatives from institutions who share our innovative system. we collected feedback on each project in written form and electronically after the mini-conference. mini-conference content was documented on its own wiki pages and in this special issue of ital.

during this phase of the revisioning technical services process, there was an emphasis on understanding our services from the customers' point of view, a hallmark of a learning organization.10 during the discovery summit, we aimed to transform frustration and uncertainty over the user experience of the services we are providing into a motivation to embrace potential futures. the mini-conference utilized the discovery themes that had evolved throughout the revisioning technical services process to provide a cohesive framework for library staff members to share their knowledge and ideas about discovery systems and to question the status quo.

organizational ownership of discovery: strategic planning and beyond

through the phases of the revisioning technical services process outlined above, it should be evident how the concept of discovery, highlighted during the process, moved from being focused on technical services to being owned by the entire organization. while the vocabulary of discovery had previously been owned by pockets of staff throughout unlv libraries, it has now become a common lexicon for all. the libraries' evaluation framework, which includes discovery, had set the stage for our upcoming organizational strategic plan. just prior to the discovery summit, the dean of libraries' direct reports group began to discuss how it would create a strategic plan for the 2009–11 biennium. it became increasingly apparent how important a focus on discovery would be in this process, and that we needed to time our planning right, allowing the organization and ourselves time to become familiar with the potential activities we might commit to in this area before locking into a strategic plan.
the dean's direct reports group first spent time crafting a series of strategic directions to focus on in the two-year time period we were planning for. rather than give the organization specific activities to undertake, the strategic directions were meant to focus our new initiatives—and in a way to limit that activity to those that would move us past the status quo. of the sixteen directions, one stemmed directly from the organization's focus on discovery: "improve discoverability of physical and electronic resources in empowering users to be self sufficient; work toward an interface and system architecture that incorporates our resources, internal and external, and allows the user to access them from their preferred starting point." an additional direction also touched on the discovery concept: "monitor and adapt physical and virtual spaces to ensure they respond to and are informed by next-generation technologies, user expectations, and patterns in learning, social interactions, and research collaboration; encourage staff to experiment with, explore, and share innovative and creative applications of technology."

through their division directors and standing committees, all library staff members were subsequently given the opportunity to submit action items to the strategic plan within the framework of the strategic directions. the effort was made by the dean of libraries for this part of the process to coincide with the discovery mini-conference, a time when many library staff members were being exposed to a wide variety of potential activities that we might take as an organization in this area. one of the major action items that made it into the strategic plan was for the dean's direct reports to charge an oversight task force with the investigation and recommendation of a system or systems that would foster increased, unified discovery of library collections. the charge of this newly created discovery task force includes a set of guiding principles for the group in recommending a discovery solution that

- creates a unified search interface for users pulling together information from the library catalog as well as other resources (e.g., journal articles, images, archival materials);
- enhances discoverability of as broad a spectrum of library resources as possible;
- is intuitive: minimizes the skills, time, and effort needed by our users to discover resources;
- supports a high level of local customization (such as accommodating branding and usability considerations);
- supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure);
- demonstrates commitment to sustainability and future enhancements; and
- is informed by preferred starting points of the user.

in setting forth these guiding principles, the work of the discovery task force is informed by the organization's discovery values, which have evolved over a year of organizational learning. in the timing of the strategic planning process and the emphasis of the plan, we made sure that the organization's strategic development did not run ahead of organizational reality and also have worked to develop a culture that supports change and risk taking.11 the strategic discovery direction and pattern of organizational focus has been allowed to emerge throughout the organizational learning process.
as evidenced in both the strategic plan directions and the guiding principles laid out in the charge of the discovery task force, the organization has begun to absorb the basic philosophy that will guide appropriate objectives in this area and has focused more on this guiding philosophy than on the active pursuit of one right answer as it continues to learn.

conclusion

using the theoretical lens of organizational learning, i have documented how unlv libraries' emerging focus on the concept of discovery has provided an impetus for learning and change (see appendix a). our experience throughout this process supports the theory that organizational intelligence evolves over time and in reference to current operating norms.12 argyris and schön warn that a top-down approach to management focusing on control and clearly defined objectives encourages single-loop learning.13 had unlv libraries chosen a more management-oriented route at the beginning of this process, it most likely would have yielded an entirely different result. in this case, genuine organizational learning proved to be action based and ever-emerging, and while this is known to introduce some level of anxiety into an organization, the development of the ability to question, challenge, and potentially change operating norms has been worth the cost.14 i believe that while any single idea we have broached in the discovery arena may not be completely unique, it is the entire process of organizational learning that is significant and applicable to many information- and technology-related areas of interest.

references

1. karen calhoun, the changing nature of the catalog and its integration with other discovery tools (washington, d.c.: library of congress, 2006), http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed aug. 12, 2009); bibliographic services task force, rethinking how we provide bibliographic services for the university of california (univ. of california libraries, 2005), http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf (accessed aug. 12, 2009).
2. gareth morgan, images of organization (thousand oaks, calif.: sage, 2006).
3. chris argyris and donald a. schön, organizational learning ii: theory, method, and practice (reading, mass.: addison wesley, 1996).
4. morgan, images of organization, 87.
5. morgan, images of organization, 87–97.
6. ibid.
7. robert l. bothmann and melissa holmberg, "strategic planning for electronic resource management," in electronic resource management in libraries: research and practice, ed. holly yu and scott breivold, 16–28 (hershey, pa.: information science reference, 2008).
8. karen coyle, "the library catalog: some possible futures," the journal of academic librarianship 33, no. 3 (2007): 414–16.
9. morgan, images of organization.
10. ibid.
11. ibid.
12. ibid.
13. argyris and schön, organizational learning ii.
14. morgan, images of organization.

appendix a. tracking unlv libraries' discovery focus across characteristics of organizational learning

scan and anticipate change in the wider environment to detect significant variations by
- embracing views of potential futures as well as of the present and the past (revisioning phase 1: technical services questions);
- understanding products and services from the customer's point of view (revisioning phase 3: summit); and
- using, embracing, and creating uncertainty as a resource for new patterns of development (revisioning phase 1: meeting; phase 3: summit).

develop an ability to question, challenge, and change operating norms and assumptions by
- challenging how they see and think about organizational reality using different templates and mental models (revisioning phase 2: survey);
- making sure strategic development does not run ahead of organizational reality (strategic planning process; discovery task force charge); and
- developing a culture that supports change and risk taking (strategic planning process).

allow an appropriate strategic direction and pattern of organization to emerge by
- developing a sense of vision, norms, values, limits, or "reference points" to guide behavior, including the ability to question the limits being imposed (revisioning phase 1: outcomes; phase 2: shared readings, activity; strategic planning process; discovery task force charge);
- absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation (strategic planning process, discovery task force charge); and
- placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals (strategic planning process, discovery task force charge).

appendix b. summit matrix

please complete the following and bring to the summit on discovery—february 24:

1. search for the information requested in each row of the table below with three discovery tools as your starting points: the libraries catalog, the libraries website, and a general internet search engine (like google).
2. for each discovery tool, rate the information that you were able to find in terms of "ease of discovery" on a scale of 1 (lowest ease) to 5 (highest ease).
3. document the thoughts and feelings you had and/or process you went through in searching for this information in the space provided.
4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to?

the matrix provides a column for each discovery tool (libraries catalog, libraries website, google) plus space for thoughts on what was discovered, against rows such as:

- what's all the fuss about frazier hall? why is it important? does unlv libraries have any documents about the history of the university that reference it?
- it's black history month and my professor wants me to find an oral history about african americans in las vegas that is available in unlv libraries.
- i heard that henderson started as a mining community. does unlv libraries have any books about that?
- find any photograph of the gay pride parade in las vegas that you can look at in unlv libraries.

president's message: join us at the forum!

michelle frisque

information technology and libraries | march 2010

michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago.

the first lita national forum i attended was in milwaukee, wisconsin. it seems like it was only a couple of years ago, but in fact nine national forums have since passed. i was a new librarian, and i went on a lark when a colleague invited me to attend and let me crash in her room for free. i am so glad i took her up on the offer because it was one of the best conferences i have ever attended.
it was the first conference that i felt was made up of people like me, people who shared my interests in technology within the library. the programming was a good mix of practical know-how and mind-blowing possibilities. my understanding of what was possible was greatly expanded, and i came home excited and ready to try out the new things i had learned.

almost eight years passed before i attended my next forum in cincinnati, ohio. after half a day i wondered why i had waited so long. the program was diverse, covering a wide range of topics. i remember being depressed and outraged at the current state of internet access in the united states as reported by the office for information technology policy. i felt that surge of recognition when i discovered that other universities were having a difficult time documenting and tracking the various systems they run and maintain. i was inspired by david lankes's talk, "obligations of leadership." if you missed it you can still hear it online; it is linked from the lita blog (http://www.litablog.org).

while the next forum may seem like a long way off to you, it is in the forefront of my mind. the national forum 2010 planning committee is busy working to make sure this forum lives up to the reputation of forums past. this year's forum takes place in atlanta, georgia, september 30–october 3. the theme is "the cloud and the crowd." program proposals are due february 19, so i cannot give you specifics about the concurrent sessions, but we do hope to have presentations about projects, plans, or discoveries in areas of library-related technology involving emerging cloud technologies; software-as-a-service, as well as social technologies of various kinds; using virtualized or cloud resources for storage or computing in libraries; library-specific open-source software (oss) and other oss "in" libraries; technology on a budget; using crowdsourcing and user groups for supporting technology projects; and training via the crowd.

each accepted program is scheduled to maximize the impact for each attendee. programming ranges from five-minute lightning talks to full-day preconferences. in addition, on the basis of attendee comments from previous forums, we have also decided to offer thirty- and seventy-five-minute concurrent sessions. these concurrent sessions will be a mix of traditional single- or multispeaker formats, panel discussions, case studies, and demonstrations of projects. finally, poster sessions will also be available.

while programs such as the keynote speakers, lightning talks, and concurrent sessions are an important part of the forum experience, so is the opportunity to network with other attendees. i know i have learned just as much talking with a group of people in the hall between sessions, during lunch, or at the networking dinners as i have sitting in the programs. not only is it a great opportunity to catch up with old friends, you will also have the opportunity to make new ones. for instance, at the 2009 national forum in salt lake city, utah, approximately half of the people who attended were first-time attendees. the national forum is an intimate event whose attendance ranges between 250 and 400 people, thus making it easy to forge personal connections. attendees come from a variety of settings, including academic, public, and special libraries; library-related organizations; and vendors. if you want to meet the attendees in a more formal setting you can attend a networking dinner organized on-site by lita members.
this year the dinners were organized by the lita president, lita past president, lita president-elect, and a lita director-at-large. if you have not attended a national forum or it has been a while, i hope i have piqued your interest in coming to the next national forum in atlanta. registration will open in may! the most up-to-date information about the 2010 forum is available at the lita website (http://www.lita.org). i know that even after my lita presidency is a distant memory, i will still make time to attend the lita national forum. i hope to see you there!

president's message: btop, broadband, e-rate, and lita

karen j. starr

information technology and libraries | december 2010

karen j. starr (kstarr@nevadaculture.org) is lita president 2010–11 and assistant administrator for library and development services, nevada state library and archives, carson city.

by now, most lita members have likely heard about the broadband technology opportunities program (btop) and the national broadband plan. the federal government is allocating grants to the states to develop their broadband infrastructure, and libraries are receiving funding to implement and expand computing in their local facilities. by september 30, 2010, the national telecommunications and information administration (ntia) will have made all btop awards. information about these initiatives can be found at www2.ntia.doc.gov (btop), www.broadband.gov (national broadband plan), and www.ala.org/ala/aboutala/offices/oitp/index.cfm (ala office for information technology policy).

on september 21, 2010, a public forum was held in silicon valley to discuss e-rate modernization and innovation in education. the conversation addressed the need to prepare schools and public libraries for broadband. information about the forum is archived at blog.broadband.gov. established in 1996, the e-rate program has provided funding for k–12 schools and public libraries for telecommunications and internet access. the program was successful in a dial-up world. it is now time to address broadband access, which is not ubiquitous on a national basis. while the social norm suggests that technology is everywhere and everyone has the skills to use it, there is still plenty of work left to do to ensure that people can use technology and compete in an increasingly digital and global world.

how does lita participate? the new strategic plan includes an advocacy and policy goal that calls for lita to advocate for and participate in the adoption of legislation, policies, technologies, and standards that promote equitable access to information and technology. within that goal are two strategies that lend themselves to the topics, including playing a role with the office for information technology policy (oitp) with regard to technology-related public policy and actively participating in the creation and adoption of international standards within the library community.

colby riggs (university of california–irvine) represents lita on the office for information technology policy advisory committee. she also serves on the lita technology access committee, which addresses similar issues. the committee is chaired by elena m. soltau (nova southeastern university). the standards interest group is chaired by anne liebst (linda hall library of science, engineering, and technology). yan han (university of arizona) chairs the standards task force, which was charged to explore and recommend strategies and initiatives lita can implement to become more active in the creation and adoption of new technology standards that align with the library community. the task force will submit their final report before the 2011 ala midwinter meeting. for ongoing information about lita committees, interest groups, task forces, and activities being implemented on these and related topics, be sure to check out ala connect (http://connect.ala.org/) and the lita website (http://www.lita.org). the lita electronic discussion list is there to pose questions you might have. lita members have an opportunity to advocate and participate in a leadership role as the broadband initiative sets the infrastructure for the next ten to fifteen years. lita members are encouraged to pursue these opportunities to ensure a place at the table for lita, its members, and libraries.

generating collaborative systems for digital libraries: a model-driven approach

alessio malizia, paolo bottoni, and s. levialdi

information technology and libraries | december 2010

alessio malizia (alessio.malizia@uc3m.es) is associate professor, universidad carlos iii, department of informatics, madrid, spain; paolo bottoni (bottoni@di.uniroma1.it) is associate professor and s. levialdi (levialdi@di.uniroma1.it) is professor, "sapienza" university of rome, department of computer science, rome, italy.

the design and development of a digital library involves different stakeholders, such as information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. to this end, high-level, language-neutral models have to be devised. metamodeling techniques favor the definition of domain-specific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. this paper describes cradle (cooperative-relational approach to digital library environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. a collection of tools allows the automatic generation of several services, defined with the cradle visual language, and of the graphical user interfaces providing access to them for the final user. the effectiveness of the approach is illustrated by presenting digital libraries generated with cradle, while the cradle environment has been evaluated by using the cognitive dimensions framework.

digital libraries (dls) are rapidly becoming a preferred source for information and documentation. both at research and industry levels, dls are the most referenced sources, as testified by the popularity of google books, google video, ieee explore, and the acm portal. nevertheless, no general model is uniformly accepted for such systems. only few examples of modeling languages for developing dls are available,1 and there is a general lack of systems for designing and developing dls. this is even more unfortunate because different stakeholders are interested in the design and development of a dl, from information architects, to librarians, to software engineers, to experts of the specific domain served by the dl. these categories may have contrasting objectives and views when deploying a dl: librarians are able to deal with faceted categories of documents, taxonomies, and document classification; software engineers usually concentrate on services and code development; information architects favor effectiveness of retrieval; and domain experts are interested in directly referring to the content of interest without going through technical jargon. designers of dls are most often library technical staff with little to no formal training in software engineering, or computer scientists with little background in the research findings of hypertext information retrieval. thus dl systems are usually built from scratch using specialized architectures that do not benefit from previous experience and from research in software engineering. wasted effort and poor interoperability can therefore ensue, raising the costs of dls and jeopardizing the fluidity of information assets in the future.

in addition, there is a need for modeling services and data structures as highlighted in the "digital library reference model" proposed by the delos eu network of excellence (also called the "delos manifesto");2 in fact, the distribution of dl services over digital networks, typically accessed through web browsers or dedicated clients, makes the whole theme of interaction between users important, for both individual usage and remote collaboration. designing and modeling such interactions call for considerations pertaining to the fields of human–computer interaction (hci) and computer-supported cooperative work (cscw). as an example, scenario-based or activity-based approaches developed in the hci area can be exploited in dl design.

to meet these needs we developed cradle (cooperative-relational approach to digital library environments),3 a metamodel-based digital library management system (dlms) supporting collaboration in the design, development, and use of dls, exploiting patterns emerging from previous projects. the entities of the cradle metamodel allow the specification of collections, structures, services, and communities of users (called "societies" in cradle) and partially reflect the delos manifesto. the metamodel entities are based on existing dl taxonomies, such as those proposed by fox and marchionini,4 gonçalves et al.,5 or in the delos manifesto, so as to leverage available tools and knowledge. designers of dls can exploit the domain-specific visual language (dsvl) available in the cradle environment—where familiar entities extracted from the referred taxonomies are represented graphically—to model data structures, interfaces, and services offered to the final users. the visual model is then processed and transformed, exploiting suitable templates, toward a set of specific languages for describing interfaces and services. the results are finally transformed into platform-independent (java) code for specific dl applications. cradle supports the basic functionalities of a dl through interfaces and service templates for managing, browsing, searching, and updating.
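the paper names the four metamodel entities (collections, structures, services, and societies) but does not publish their internal definitions, so the following is a minimal sketch of how such a metamodel instance might look; all attribute names and the toy example are assumptions, not cradle's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Structure:
    """how resources in a collection are organized (e.g., a metadata schema)."""
    name: str
    elements: list[str]  # e.g., ["title", "creator", "date"]

@dataclass
class Collection:
    """a set of digital resources described by a structure."""
    name: str
    structure: Structure

@dataclass
class Service:
    """a dl function such as searching or browsing, bound to collections."""
    name: str
    operates_on: list[Collection]

@dataclass
class Society:
    """a community of users granted access to a set of services."""
    name: str
    can_use: list[Service] = field(default_factory=list)

# a toy model instance: one catalog collection with search and browse
# services offered to a "patrons" society.
marc_like = Structure("bibliographic", ["title", "creator", "subject"])
catalog = Collection("catalog", marc_like)
search = Service("search", [catalog])
patrons = Society("patrons", [search, Service("browse", [catalog])])
```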
these can be further specialized to deploy advanced functionalities as defined by designers through the entities of the proposed visual language. cradle is based on the entity-relationship (e/r) formalism, which is powerful and general enough to describe dl models and is supported by many tools as a metamodeling language. moreover, we observed that users and designers involved in the dl environment, but not coming from a software engineering background, may not be familiar with an advanced formalism like the unified modeling language (uml), but they usually have basic notions on database management systems, where e/r is largely employed.

literature review

dls are complex information systems involving technologies and features from different areas, such as library and information systems, information retrieval, and hci.
the new generation of dl systems will be highly distributed, providing adaptive and interoperable behavior by adjusting their structure dynamically in order to act in dynamic environments (e.g., interfacing with the physical world).13 to manage such large and complex systems, a systematic engineering approach is required, typically one that includes modeling as an essential design activity, where the availability of such domain-specific concepts as first-class elements in dl models will make application specification easier.14 while most of the disciplines related to dls—e.g., databases,15 information retrieval,16 and hypertext and multimedia17—have underlying formal models that have properly steered them, little is available to formalize dls per se. wang described the structure of a dl system as a domain-specific database together with a user interface for querying the records stored in the database.18 castelli et al. present an approach involving multidimensional query languages for searching information in dl systems that is based on first-order logic.19 these works model metadata specifications and thus are the main examples of system formalization in dl environments. cognitive models for information retrieval, as used for example by oddy et al.,20 focus on users' information-seeking behavior (i.e., the formation, nature, and properties of a user's information need) and on how information retrieval systems are used in operational environments. other approaches based on models and languages for describing the entities involved in a dl are the digital library definition language,21 the dspace data model22 (with the definitions of communities and workflow models), the metis workflow framework,23 and the fedora structoid approach.24 e/r approaches are frequently used for modeling database management system (dbms) applications,25 but as e/r diagrams only model the static structure of a dbms, they generally do not deal deeply with dynamic aspects. temporal extensions add dynamic aspects to the e/r approach, but most of them are not object-oriented.26 the advent of object-oriented technology calls for approaches and tools for information system design resulting in object-oriented systems. these considerations drove research toward modeling approaches as supported by uml.27 however, since the uml metamodel is not yet widespread in the dl community, we adopted the e/r formalism and complemented it with the specification of the dynamics made available through the user interface, as described by malizia et al.28 cradle is based on the entity-relationship (e/r) formalism, which is powerful and general enough to describe dl models and is supported by many tools as a metamodeling language. moreover, we observed that users and designers involved in the dl environment, but not coming from a software engineering background, may not be familiar with advanced formalisms like the unified modeling language (uml), but they usually have basic notions of database management systems, where e/r is largely employed. using the metamodel, we have defined a domain-specific visual language (dsvl), including basic entities and language constructs. the need for the integration of multiple languages has also been indicated as a key aspect of the dsvl approach.29 in fact, complex domains like dls typically consist of multiple subdomains, each of which may require its own particular language. in the current implementation, the definition of dsvls exploits the metamodeling facilities of atom3, based on graph-grammars.30 atom3 has been typically used for simulation and model transformation, but we adopt it here as a tool for system generation.
■■ requirements for modeling digital libraries
we follow the delos manifesto by considering a dl as an organization (possibly virtual and distributed) for managing collections of digital documents (digital contents in general) and preserving their images on storage. a dl offers contextual services to communities of users, a certain quality of service, and the ability to apply specific policies. in cradle we leave the definition of quality of service to the service-oriented architecture standards we employ and partially model the applicable policy, but we focus here on crucial interactivity aspects needed to make dls usable by different communities of users. in particular, we model interactive activities and services based on librarians' experiences in face-to-face communication with users, or designing exchange and integration procedures for communicating between institutions and managing shared resources. while librarians are usually interested in modeling metadata across dls, software engineers aim at providing multiple tools for implementing services,31 such as indexing, querying, semantics,32 etc. therefore we provide a visual model useful for librarians and information architects to mimic the design phases they usually perform. moreover, by supporting component services, we help software engineers to specify and add services on demand to dl environments. to this end, we use a service component model. by sharing a common language, users from different categories can communicate to design a dl system while concentrating on their own tasks (services development and design for software engineers, and dl design for librarians and information architects). users are modeled according to the delos manifesto as dl end-users (subdivided into content creators, content consumers, and librarians), dl designers (librarians and information architects), dl system administrators (typically librarians), and dl application developers (software engineers).

several activities have been started on modeling domain-specific dls. as an example, the u.s. national science digital library (nsdl) program promotes educational dls and services for basic and advanced science education, as discussed by wattenberg or zia.33 in the nsdl program, a new generation of services has been developed that includes support for teaching and learning; this means also considering users' activities or scenarios, and not only information access. services for implementing personal content delivery and sharing, or managing digital resources and modeling collaboration, are examples of tools introduced during this program. the virtual reference desk (vrd) is emerging as an interactive service based on dls. with vrd, users can take advantage of domain experts' knowledge and librarians' experience to locate information. for example, the u.s. library of congress ask a librarian service acts as a vrd for users who want help in searching information categories or to interact with expert librarians to search for a specific topic.34 the interactive and collaborative aspects of activities taking place within dls facilitate the development of user communities. social networking, work practices, and content sharing are all features that influence the technology and its use. following borgmann,35 lynch sees the future of dls not in broad services but in supporting and facilitating "customization by community," i.e., services tailored for domain-specific work practices.36 we also examined the research agenda on system-oriented issues in dls and the delos manifesto.37 the agenda abstracts the dl life cycle, identifying five main areas, and proposes key research problems. in particular we tackle activities such as formal modeling of dls and their communities and developing frameworks coherent with such models. at the architectural level, one point of interest is to support heterogeneous and distributed systems, in particular networked dls and services.38 for interoperability, one of the issues is how to support and interoperate with different metadata models and standards to allow distributed cataloguing and indexing, as in the open archive initiative (oai).39 finally, we are interested in the service level of the research agenda, and more precisely in web services and workflow management, as crucial features when including communities and designing dls for use over networks and for sharing content. as a result of this analysis, the cradle framework features the following:
■■ a visual language to help users and designers when visually modeling their specific dl (without knowing any technical detail apart from learning how to use a visual environment providing diagram representations of domain-specific elements)
■■ an environment integrating visual modeling and code generation, instead of simply providing an integrated architecture that does not hide technical details
■■ interface generation for dealing with different users and designers
■■ relationships for modeling dl-related scenarios and activities
■■ flexible metadata definitions
■■ a set of interactive integrated tools for user activities with the generated dl system
to sum up, cradle is a dlms (digital library management system) aimed at supporting all the users involved in the development of a dl system and providing interfaces, data modeling, and services for user-driven generation of specific dls. although cradle does not yet satisfy all requirements for a generic dl system, it addresses issues focused on developing interactive dl systems, stressing interfaces and communication between users. nevertheless, we employed standards when possible, to leave it open for further specification or enhancements from the dl user community. extensive use of xml-based languages allows us to change document information depending on implemented recognition algorithms, so that expert users can easily model their dl by selecting the best recognition and indexing algorithms.
cradle evolves from the jdan (java-based environment for document applications on networks) platform, which managed both document images and forms on the basis of a component architecture.40 jdan was based on xml technologies, and its modularity allowed its integration in service-based and grid-based scenarios. it supported template code generation and modeling, but it required the designer to write xml specifications and edit xml schema files in order to model the dl document types and services, thus requiring technical knowledge that should be avoided to let users concentrate on their specific domains.

■■ modeling digital library systems
the cradle framework shows a unique combination of features: it is based on a formal model, exploits a set of domain-specific languages, and provides automatic code generation. moreover, fundamental roles are played by the concepts of society and collaboration.41 cradle generates code from tools built after modeling a dl (according to the rules defined by the proposed metamodel) and performs automatic transformation and mapping from model to code to generate software tools for a given dl model. the specification of a dl in cradle encompasses four complementary dimensions:
■■ multimedia information supported by the dl (collection model)
■■ how that information is structured and organized (structural model)
■■ the behavior of the dl (service model)
■■ the different societies of actors and groups of services acting together to carry out the dl behavior (societal model)
figure 1 depicts the design approach supported by the cradle architecture, namely, modeling the society of actors and services interacting in the domain-specific scenarios, and describing the documents and metadata structure included with the library, by defining a visual model for all these entities. the dl is built using a collection of stock parts and configurable components that provide the infrastructure for the new dl. this infrastructure includes the classes of objects and relationships that make up the dl, and processing tools to create and load the actual library collection from raw documents, as well as services for searching, browsing, and collection maintenance. finally, the code generation module generates tailored dl services code stubs by composing and specializing components from the component pool.
figure 1. cradle architecture
initially, a dl designer is responsible for formalizing (starting from an analysis of the dl requirements and characteristics) a conceptual description of the dl using metamodel concepts. model specifications are then fed into a dl generator (written in python for atom3) to produce a dl tailored to specific platforms and requirements. after these design phases, cradle generates the code for the user interface and the parts of code corresponding to services and actors interacting in the described society. a set of templates for code generation has been built for typical services of a dl environment. to improve acceptability and interoperability, cradle adopts standard specification sublanguages for representing dl concepts. most of the cradle model primitives are defined as xml elements, possibly enclosing other sublanguages to help define dl concepts. in more detail, mime types constitute the basis for encoding elements of a collection. the xml user interface language (xul)42 is used to represent appearance and visual interfaces, and xdoclet is used in the libgen code generation module, as shown in figure 1.43

■■ the cradle metamodel
in the cradle formalism, the specification of a dl includes a collection model describing the maintained multimedia documents, a structural model of information organization, a service model for the dl behavior, and a societal model describing the societies of actors and groups of services acting together to carry out the dl behavior. a society is an instance of the cradle model defined according to a specific collaboration framework in the dl domain. a society is the highest-level component of a dl and exists to serve the information needs of its actors and to describe its context of usage. hence a dl collects, preserves, and shares information artefacts for society members. the basic entities in cradle are derived from the categorization along the actors, activities, components, socioeconomic, and environment dimensions. we now show in detail the entities and relations in the derived metamodel, shown in figure 2.
figure 2. the cradle metamodel with the e/r formalism
actor entities
actors are the users of dls. actors interact with the dl through services (interfaces) that are (or can be) affected by the actors' preferences and messages (raised events). in the cradle metamodel, an actor is an entity with a behavior that may concurrently generate events. communications with other actors may occur synchronously or asynchronously. actors can relate through services to shape a digital community, i.e., the basis of a dl society. in fact, communities of students, readers, or librarians interact with and through dls, generally following predefined scenarios. as an example, societies can behave as query generator services (from the point of view of the library) and as teaching, learning, and working services (from the point of view of other humans and organizations). communication between actors within the same or different societies occurs through message exchange. to operate, societies need shared data structures and message protocols, enacted by sending structured sequences of queries and retrieving collections of results. the actor entity includes three attributes:
1. role identifies which role is played by the actor within the dl society. examples of specific human roles include authors, publishers, editors, maintainers, developers, and the library staff. examples of nonhuman actors include computers, printers, telecommunication devices, software agents, and digital resources in general.
2. status is an enumeration of possible statuses for the actor: i. none (default value); ii. active (present in the model and actively generating events); iii. inactive (present in the model but not generating events); iv. sleeping (present in the model and awaiting a response to a raised event).
3. events describes a list of events that can be raised by the actor or received as a response message from a service. examples of events are borrow, reserve, return, etc. events triggered from digital resources include store, trash, and transfer. examples of response events are found, not found, updated, etc.
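to make these three attributes concrete, here is a minimal java sketch of the actor entity; the class shape, method names, and event strings are illustrative assumptions, not cradle source code.

import java.util.ArrayList;
import java.util.List;

// illustrative sketch of the actor entity and its role, status, and events attributes
public class Actor {
    public enum Status { NONE, ACTIVE, INACTIVE, SLEEPING }

    private final String role;           // e.g., "author", "librarian", or a nonhuman "software agent"
    private Status status = Status.NONE; // none is the default value
    private final List<String> events = new ArrayList<>(); // e.g., "borrow", "reserve", "return"

    public Actor(String role) {
        this.role = role;
    }

    // raising an event leaves the actor sleeping until a response message arrives
    public void raise(String event) {
        events.add(event);
        status = Status.SLEEPING;
    }

    // a response such as "found", "not_found", or "updated" reactivates the actor
    public void onResponse(String response) {
        status = Status.ACTIVE;
    }
}

the sleeping transition mirrors the fourth status above: an actor waiting for a reply to an event it has raised.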
service entities
services describe scenarios, activities, operations, and tasks that ultimately specify the functionalities of a dl, such as collecting, creating, disseminating, evaluating, organizing, personalizing, preserving, requesting, and selecting documents, and providing services to humans concerned with fact-finding, learning, gathering, and exploring the content of a dl. all these activities can be described and implemented using scenarios, and appear in the dl setting as a result of actors using services (thus societies). furthermore, these activities realize and shape relationships within and between societies, services, and structures. in the cradle metamodel, the service entity models what the system is required to do, in terms of actions and processes, to achieve a task. a detailed task analysis helps understand the current system and the information flow within it, in order to design and allocate tasks appropriately. the service entity has four attributes:
1. name is a string representing a textual description of the service.
2. sync states whether communication is synchronous or asynchronous, modeled by values wait and nowait, respectively.
3. events is a list of messages that can trigger actions among services (tasks); for example, valid or notvalid in case of a parsing service.
4. responses contains a list of response messages that can reply to raised events; they are used as a communication mechanism by actors and services.
the collection entity
collections are sets of documents of arbitrary type (e.g., bits, characters, images, etc.) used to model static or dynamic content. in the static interpretation, a collection defines information content interpreted as a set of basic elements, often of the same type, such as plain text. examples of dynamic content include video delivered to a viewer, animated presentations, and so on. the attributes of collection are name and documents. name is a string, while documents is a list of pairs (documentname, documentlabel), the latter being a pointer to the document entity.
the document entity
documents are the basic elements in a dl and are modeled with attributes label and structure. label defines a textual string used by a collection entity to refer to the document. we can consider it as a document identifier, specifying a class or a type of document. structure defines the semantics and area of application of the document. for example, any textual representation can be seen as a string of characters, so that a text document, including scientific articles and books, becomes a sequence of strings.
the struct entity
a struct is a structural element specifying a part of a whole. in dls, structures represent hypertexts, taxonomies, relationships between elements, or containment. for example, books can be structured logically into chapters, sections, subsections, and paragraphs, or physically into cover, pages, line groups (paragraphs), and lines. structures are represented as graphs, and the struct entity (a vertex) contains four attributes:
1. document is a pointer to the document entity the structure refers to.
2. id is a unique identifier for a structure element.
3. type takes three possible values: i. metadata denotes a content descriptor, for instance title, author, etc.; ii. layout denotes the associated layout, e.g., left frame, columns, etc.; iii. item indicates a generic structure element used for extending the model.
4. values is a list of values describing the element content, e.g., title, author, etc.
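read together, the attribute lists above map naturally onto plain object definitions. the java sketch below (assumed names, not generated code) renders the service, collection, document, and struct entities, including the get operation that services perform on collections:

import java.util.ArrayList;
import java.util.List;

class Document {
    String label;      // identifier naming a class or type of document
    String structure;  // semantics and area of application, e.g., "sequence of strings"
}

class Collection {
    String name;
    List<Document> documents = new ArrayList<>();

    // get is one of the operations (get, add, del) services perform on collections;
    // it produces a collection of documents as its result
    Collection get(String label) {
        Collection result = new Collection();
        for (Document d : documents) {
            if (d.label.equals(label)) {
                result.documents.add(d);
            }
        }
        return result;
    }
}

class Struct {
    Document document;  // pointer to the document entity the structure refers to
    String id;          // unique identifier for the structure element
    String type;        // "metadata", "layout", or "item"
    List<String> values = new ArrayList<>(); // element content, e.g., title, author
}

class Service {
    String name;
    boolean wait;       // the sync attribute: wait = synchronous, nowait = asynchronous
    List<String> events = new ArrayList<>();    // messages that trigger actions, e.g., "valid"
    List<String> responses = new ArrayList<>(); // reply messages to raised events
}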
actors interact with services in an event-driven way. services are connected via messages (send and reply) and can be sequential, concurrent, or task-related (when a service acts as a subtask of a macroservice). services perform operations (e.g., get, add, and del) on collections, producing collections of documents as results. struct elements are connected to each other as nodes of a graph representing metadata structures associated with documents. the metamodel has been translated to a dsvl, associating symbols and icons with entities and relations (see "cradle language and tools" below). with respect to the six core concepts of the delos manifesto (content, user, functionality, quality, policy, and architecture), content can be modeled in cradle as collections and structs, user as actor, and functionality as service. the quality concept is not directly modeled in cradle, but for quality of service we support standard service architecture. policies can be partially modeled by services managing interaction between actors and collections, making it possible to apply standard access policies. from the architectural point of view, we follow the reference architecture of figure 1.
■■ cradle language and tools
in this section we describe the selection of languages and tools of the cradle platform. to improve interoperability and collaboration, cradle makes extensive use of existing standard specification languages. most cradle outputs are defined with xml-based formats, able to enclose other specific languages. the basic languages and corresponding tools used in cradle are the following:
■■ mime type. multipurpose internet mail extensions (mime) constitute the basis for encoding documents in cradle, supporting several file formats and types of character encoding. mime was chosen because of the wide availability of mime types and the standardisation of the approach. this makes it a natural choice for dls, where different types of documents need to be managed (pdf, html, doc, etc.). moreover, mime standards for character-encoding descriptions help keep the cradle framework open and compliant with standards.
■■ xul. the xml user interface language (xul) is an xml-based markup language used to represent appearance and visual interfaces. xul is not a public standard yet, but it uses many existing standards and technologies, including dtd and rdf,44 which makes it easily readable for people with a background in web programming and design. the main benefit of xul is that it provides a simple definition of common user interface elements (widgets). this drastically reduces the software development effort required for visual interfaces.
■■ xdoclet. xdoclet is used for generating services from tagged-code fragments. it is an open-source code generation library which enables attribute-oriented programming for java via insertion of special tags.45 it includes a library of predefined tags, which simplify coding for various technologies, e.g., web services. the motivation for using xdoclet in the cradle framework is related to its approach to template code generation. designers can describe templates for each service (browse, query, and index), and the xdoclet-generated code can be automatically transformed into the java code for managing the specified service.
■■ atom3. atom3 is a metamodeling system to model graphical formalisms. starting from a metaspecification (in e/r), atom3 generates a tool to process models described in the chosen formalism. models are internally represented using abstract syntax graphs. model manipulation can then be expressed via graph grammars, also specified in atom3.
the general process of automatic creation of cooperative dl environments for an application is shown in figure 3. initially, a designer formalizes a conceptual description of the dl using the cradle metamodel concepts. this phase is usually preceded by an analysis of requirements and interaction scenarios, as seen previously. model specifications are then provided to a dl code generator (written in python within atom3) to produce dls tailored to specific platforms and requirements. these are built on a collection of templates of services and configurable components providing infrastructure for the new dl. the sketched infrastructure includes classes for objects (tasks), relationships making up the dl, and processing tools to upload the actual library collection from raw documents, as well as services for searching and browsing and for document collection maintenance. the cradle generator automatically generates different kinds of output for the cradle model of the cooperative dl environment, such as service and collection managers. collection managers define the logical schemata of the dl, which in cradle correspond to a set of mime types, xul, and xdoclet specifications, representing digital objects, their component parts, and linking information. collection managers also store instances of their classes and function as search engines for the system. services classes are also generated and are represented as attribute-oriented classes involving parts and features of entities.
figure 3. cooperative dl generation process with cradle framework
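to picture one of these generated outputs, the sketch below gives a plausible shape for a collection manager that stores schema information and doubles as a simple search engine; every name in it is our assumption:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// hypothetical shape of a generated collection manager
public class LibraryCollectionManager {
    // logical schema: a mime type per document label, plus xul/xdoclet specification files
    private final Map<String, String> mimeTypeByLabel = new HashMap<>();
    private final List<String> specificationFiles = new ArrayList<>();

    public void register(String documentLabel, String mimeType) {
        mimeTypeByLabel.put(documentLabel, mimeType);
    }

    public void addSpecification(String fileName) {
        specificationFiles.add(fileName);
    }

    // collection managers also "function as search engines for the system"
    public List<String> search(String documentLabel) {
        List<String> hits = new ArrayList<>();
        if (mimeTypeByLabel.containsKey(documentLabel)) {
            hits.add(documentLabel);
        }
        return hits;
    }
}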
■■ cradle platform
the cradle platform is based on a model-driven approach for the design and automatic generation of code for dls. in particular, the dsvl for cradle has four diagram types (collection, structure, service, and actor) to describe the different aspects of a dl. in this section we describe the user interface (ui) and service templates used for generating the dl tools. in particular, the ui layout is mainly generated from the structured information provided by the document, struct, and collection entities. the ui events are managed by invoking the appropriate services according to the imported xul templates. at the service and communication levels, the xdoclet code is generated by the service and actor entities, exploiting their relationships. we also show how code generation works and the advanced platform features, such as automatic service discovery. at the end of the section a running example is shown, representing all the phases involved in using the cradle framework for generating the dl tools for a typical library scenario.
user interface templates
the generation of the ui is driven by the visual model designed by the cradle user. specifically, the model entities involved in this process are document, struct, and collection (see figure 2) for the basic components and layout of the interfaces, while linked services are described in the appropriate templates. the code generation process takes place through transformations implemented as actions in the atom3 metamodel specification, where graph-grammar rules may have a condition that must be satisfied for the rule to be applied (preconditions), as well as actions to be performed when the rule is executed (postconditions). a transformation is described during the visual modeling phase in terms of conditions and corresponding actions (inserting xul language statements for the interface in the appropriate code template placeholders). the generated user interface is built on a set of xul template files that are automatically specialized on the basis of the attributes and relationships designed in the visual modeling phase. the layout template for the user interface is divided into two columns (see figure 4). the left column is made of three boxes: (1) the collection box, (2) the metadata box, and (3) the metadata operations box. the right column manages visualization and multimedia information obtained from documents. the basic features provided with the ui templates are document loading, visualization, metadata organization, and management. the layout template, in the collection box, manages the visualization of the documents contained in a collection, while the visualization template works according to the data (mime) type specified by the document. actually, by selecting a document included in the collection, the corresponding data file is automatically uploaded and visualized in the ui. the metadata visualization in the code template reflects the metadata structure (a tree) represented by a struct, specifying the relationship between parent and child nodes. thus the xul template includes an area (the metadata box) for managing tree structures as described in the visual model of the dl. although the tree-like visualization has potential drawbacks if there are many metadata items, there should be no real concern with medium loads. the ui template also includes a box to perform operations on metadata, such as insert, delete, and edit. users can select a value in the metadata box and manipulate the presented values. figure 4 shows an example of a ui generated from a basic template.
figure 4. an example of an automatically generated user interface. (a) document area; (b) collection box; (c) metadata box; (d) metadata operations box.
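conceptually, the specialization of a xul template amounts to placeholder substitution driven by the visual model. the java sketch below is a deliberately simplified illustration; the real mechanism is an atom3 graph-grammar action, and the @...@ placeholders are our own convention:

import java.util.List;

// simplified placeholder substitution over a fragment of the two-column xul layout
public class XulTemplateSpecializer {
    static final String TEMPLATE =
        "<vbox>"
        + "<listbox id='collection-box'>@DOCUMENT_LIST@</listbox>"
        + "<tree id='metadata-box'>@METADATA_TREE@</tree>"
        + "<box id='metadata-operations-box'/>"
        + "</vbox>";

    public static String specialize(List<String> documentNames, String metadataTreeXul) {
        StringBuilder items = new StringBuilder();
        for (String name : documentNames) {
            items.append("<listitem label='").append(name).append("'/>");
        }
        return TEMPLATE
            .replace("@DOCUMENT_LIST@", items.toString())
            .replace("@METADATA_TREE@", metadataTreeXul);
    }
}

here listbox, tree, and listitem are standard xul widgets, matching the collection box and metadata tree described above.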
service templates
to achieve automated code generation, we use xdoclet to specify parameters and service code generation according to such parameters. cradle can automatically annotate java files with name–value pairs, and xdoclet provides a syntax for parameter specification. code generation is based on code templates. hence service templates are xdoclet templates for transforming xdoclet code fragments obtained from the modeled service entities. the basic xdoclet template manages messages between services, according to the event and response attributes described in "cradle language and tools" above. in fact, cradle generates a java application (a service) that needs to receive messages (event) and reply to them (response) as parameters for the service application. in xdoclet, these can be attached to the corresponding field by means of annotation tags, as in the following code segments:

public class MsgArguments {
    /**
     * @msgarguments.argname name="event" desc="event_string"
     */
    protected String eventMsg = null;

    /**
     * @msgarguments.argname name="response" desc="response_string"
     */
    protected String responseMsg = null;
}

each msgarguments.argname tag related to a field is called a field tag. each field tag can have multiple parameters, listed after the field tag. in the tag name msgarguments.argname, the prefix serves as the namespace of all tags for this particular xdoclet application, thus avoiding naming conflicts with other standard or customized xdoclet tags. not only fields can be annotated: other elements such as classes and functions can have tags too. xdoclet enables powerful code generation requiring little or no customization (depending on how much is provided by the template). the type of code to be generated using the parameters is defined by the corresponding xdoclet template. we have created template files composed of java code and special xdoclet instructions in the form of xml tags. these xdoclet instructions allow conditionals (if) and loops (for), thus providing us with expressive power close to a programming language. in the following example, we first create an array containing labels and other information for each argument:

public class <XDtClass:className/>Impl extends <XDtClass:className/> {
    public static String[][] argumentNames = new String[][] {
        <XDtField:forAllFields>
            <XDtField:ifHasFieldTag tagName="msgarguments.argname">
                { "<XDtField:fieldName/>",
                  "<XDtField:fieldTagValue tagName="msgarguments.argname" paramName="name"/>",
                  "<XDtField:fieldTagValue tagName="msgarguments.argname" paramName="desc"/>" },
            </XDtField:ifHasFieldTag>
        </XDtField:forAllFields>
    };
}

the first lines declare a class whose name is the annotated class name with an impl suffix, extending the annotated class. the xdoclet template tag xdtclass:classname denotes the name of the class in the annotated java file. all standard xdoclet template tags have a namespace starting with "xdt." the rest of the template uses xdtfield:forallfields to iterate through the fields. for each field with a tag named msgarguments.argname (checked using xdtfield:ifhasfieldtag), it creates a subarray of strings using the values obtained from the field tag parameters. xdtfield:fieldname gives the name of the field, while xdtfield:fieldtagvalue retrieves the value of a given field tag parameter. characters that are not part of some xdoclet template tag are directly copied into the generated code. the following code segment was generated by xdoclet using the annotated fields and the above template segment:

public class MsgArgumentsImpl extends MsgArguments {
    public static String[][] argumentNames = new String[][] {
        { "eventMsg", "event", "event_string" },
        { "responseMsg", "response", "response_string" },
    };
}

similarly, we generate the getter and setter methods for each field:

<XDtField:forAllFields>
    <XDtField:ifHasFieldTag tagName="msgarguments.argname">
        public java.lang.String get<XDtField:fieldName/>() {
            return <XDtField:fieldName/>;
        }
        public void set<XDtField:fieldName/>(String value) {
            setValue("<XDtField:fieldName/>", value);
        }
    </XDtField:ifHasFieldTag>
</XDtField:forAllFields>
this translates into the following generated code:

public java.lang.String getEventMsg() {
    return eventMsg;
}
public void setEventMsg(String value) {
    setValue("eventMsg", value);
}
public java.lang.String getResponseMsg() {
    return responseMsg;
}
public void setResponseMsg(String value) {
    setValue("responseMsg", value);
}

the same template is used for managing the name and sync attributes of service entities.
code generation, service discovery, and advanced features
a service or interface template only describes the solution to a particular design problem—it is not code. consequently, users will find it difficult to make the leap from the template description to a particular implementation, even though the template might include sample code. others, like software engineers, might have no trouble translating the template into code, but they still may find it a chore, especially when they have to do it repeatedly. the cradle visual design environment (based on atom3) helps alleviate these problems. from just a few pieces of information (the visual model), typically application-specific names for actors and services in a dl society along with choices for the design tradeoffs, the tool can create class declarations and definitions implementing the template. the ultimate goal of the modeling effort remains, however, the production of reliable and efficiently executable code. hence a code generation transformation produces interface (xul) and service (java code from xdoclet templates) code from the dl model. we have manually coded xul templates specifying the static setup of the gui, the various widgets, and their layout. this must be complemented with code generated from a dl model of the system's dynamics coded into services. while other approaches are possible,46 we employed the solution implemented within the atom3 environment, according to its graph-grammar modeling approach to code generation. cradle supports a flexible iterative process for visual design and code generation, since a design change might require substantial reimplementation.
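at run time, the event and response parameters wired in by these templates support the two communication styles named by the sync attribute. the sketch below, with assumed names, illustrates the wait/nowait distinction; it is not the generated code itself:

// sketch: delivering an event to a service either synchronously or asynchronously
public class MessageDispatcher {
    public String send(GeneratedService target, String event, boolean wait) {
        if (wait) {
            // sync = "wait": block until the service replies, e.g., "valid" or "notvalid"
            return target.handle(event);
        }
        // sync = "nowait": fire and forget; the response arrives later as a message
        new Thread(() -> target.handle(event)).start();
        return null;
    }
}

interface GeneratedService {
    String handle(String event); // returns a response message such as "found"
}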
in the library model of figure 5, icons resembling sets of paper sheets represent collections, while a single rectangle connected to a collection represents a document entity; the circles linked to the document entity are the struct (metadata) entities. metadata entities are linked to each other by node relationships (organized as a tree) and linked to the document entity by a metadata linktype relationship. the search service is synchronous (sync attribute set to "wait"). it queries the document collection (get operation) looking for the requested document (using metadata information provided by the borrow request), and waits for the result of get (a collection of documents). based on this result, the service returns a boolean message "is_available," which is then propagated as a response to the librarian and eventually to the student, as shown in figure 5.
figure 5. the library model, alias the model of the library society
when the library designer has built the model, the transformation process can be run, executing the code generation actions associated with the entities and services represented in the model. the code generation process is based on template code snippets generated from the atom3 environment graph transformation engine, following the generative rules of the metamodel. we also use pre- and postconditions on application of transformation rules to have code generation depend on verification of some property. the generated ui is presented in figure 6. on the right side, the document area is presented according to the xul template. documents are managed according to their mime type: the pdf file of the example is loaded with the appropriate adobe acrobat reader plug-in. on the left column of the ui are three boxes, according to the xul template. the collection box—figure 6(b)—presents the list of documents contained in the collection specified by the documents attribute of the library collection entity, and allows users to interact with documents. after selecting a document by clicking on the list, it is presented in the document area—figure 6(a)—where it can be managed (edit, print, save, etc.). in the metadata box—figure 6(c)—the tree structure of the metadata is depicted according to the categorization modeled by the designer. the xul template contains all the basic layout and action features for managing a tree structure. the generated box contains the parent and child nodes according to the attributes specified in the corresponding struct elements. the user can click on the root for compacting or exploding the tree nodes; by selecting one, the ui activates the metadata operations box—figure 6(d). the selected metadata node will then be presented in the lower (metadata operations) box, labeled "set metadata values," replacing the default "none" value as shown in figure 6. after the metadata item is presented, the user can edit its value and save it by clicking on the "set value" button. the associated action saves the metadata information and causes its display in the intermediate box (tree-like structure), changing the visualization according to the new values.
figure 6. the ui generated by cradle transforming the library model in xul and xdoclet code
the code generation process for the do_search and front desk services is based on xdoclet templates. in particular, a message listener template is used to generate the java code for the front desk service. in fact, the front desk service is asynchronous and manages communications between actors. the actors' classes are also generated by using the services templates, since they have attributes, events, and messages, just like the services. the do_search service code is based on the producer and consumer templates, since it is synchronous by definition in the modeled scenario. a get method retrieving a collection of documents is implemented from the getter template. the routine invoked by the transformation action for struct entities performs a breadth-first exploration of the metadata tree in the visual model and attaches the corresponding xul code for displaying the struct node in the correct position within the graph structure of the ui.
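reusing the collection sketch shown earlier, the do_search service of this example can be pictured roughly as follows; the names are illustrative, since the actual class is generated from the producer and consumer templates:

// sketch of the synchronous do_search service from the library example
public class DoSearchService {
    private final Collection libraryCollection;

    public DoSearchService(Collection libraryCollection) {
        this.libraryCollection = libraryCollection;
    }

    // sync = "wait": query the collection via get and reply at once with a boolean message
    public String handleBorrowRequest(String metadataValue) {
        Collection found = libraryCollection.get(metadataValue);
        boolean available = !found.documents.isEmpty();
        // the response is propagated to the front desk (librarian) and then to the student
        return available ? "is_available" : "not_found";
    }
}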
■■ designing and generating advanced collaborative dl systems
in this section we show the use of cradle as an analytical tool helpful in comprehending specific dl phenomena, to present the complex interplays that occur between cradle components and dl concepts in a real dl application, and to illustrate the possibility of using cradle as a tool to design and generate advanced tools for dl development.
modeling document images collections
with cradle, the designer can provide the visual model of the dl society involved in document management, and the remaining phases are automatically carried out by cradle modules and templates. we have provided the user with basic code templates for the recognition and indexing services, the data-entry plug-in, and archive release. the designer can thus simply translate the particular dl society into the corresponding visual model within the cradle visual modeling editor. as a proof of concept, figure 7 models the jdan architecture, introduced in "requirements for modeling digital libraries," exploiting the cradle visual language.
figure 7. the cradle model for the jdan framework
the recognition service performs the automatic document recognition and stores the corresponding document images, together with the extracted metadata, in the archive collection. it interacts with the scanner actor, representing a machine or a human operator that scans paper documents. designers can choose their own segmentation method or algorithm; what is required to be compliant with the framework is to produce an xdoclet template. the recognition service stores the document images into the archive collection, together with the layout information of their different regions, according to the xml metadata schema provided by the designer. if there is at least one region marked as "not interpreted," the data-entry service is invoked on the "not interpreted" regions. the data-entry service allows operators to evaluate the automatic classification performed by the system and edit the segmentation for indexing. operators can also edit the recognized regions with the classification engine (included in the recognition service) and adjust their values and sizes. the output of this phase is an xml description that will be imported in the indexing service for indexing (and eventually querying). the archive collection stores all of the basic information kept in jdan, such as text labels, while the indexing service, based on a multitier architecture exploiting jboss 3.0, has access to them. this service is responsible for turning the data fragments in the archive collection into useful forms to be presented to the final users, e.g., a report or a query result. the final stage in the recognition process could be to release each document to a content management or workflow system. the release collection maintains the image files in permanent storage, while data is written to the target database or content management software, together with xml metadata snippets (e.g., to be stored in an xml native dbms). a typical configuration would have the recognition service running on a server cluster, with many data-entry services running on different clients (web browsers directly support xul interfaces). whereas current document capture environments are proprietary and closed, the definition of an xml-based interchange format allows the suitable assembly of different component-based technologies in order to define a complex framework. the realization of the jdan dl system within the cradle framework can be considered a preliminary step in the direction of a standard multimedia document managing platform with region segmentation and classification, thus aiming at automatic recognition of image databases and batch acquisition of multiple multimedia document types and formats.
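the recognition flow just described condenses into a short control-flow sketch; the class and method names below are ours, standing in for the generated jdan services:

// sketch of the jdan-style document flow modeled in figure 7
public class RecognitionPipeline {
    public void process(ScannedPage page) {
        Region[] regions = segment(page);    // designer-chosen segmentation algorithm
        for (Region region : regions) {
            if (region.notInterpreted) {
                dataEntry(region);           // operator validates and edits the region
            }
        }
        String xml = describe(regions);      // xml description of regions and metadata
        index(xml);                          // imported by the indexing service
        release(xml);                        // optional hand-off to a content management or workflow system
    }

    // stubs standing in for the generated services
    Region[] segment(ScannedPage page) { return new Region[0]; }
    void dataEntry(Region region) {}
    String describe(Region[] regions) { return "<document/>"; }
    void index(String xml) {}
    void release(String xml) {}
}

class ScannedPage {}

class Region {
    boolean notInterpreted;
}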
personal and collaborative spaces
a personal space is a virtual area (within the dl society) that is modeled as being owned and maintained by a user, including resources (document collections, services, etc.), or references to resources, which are relevant to a task, or set of tasks, the user needs to carry out in the dl. personal spaces may thus contain digital documents in multiple media, personal schedules, visualization tools, and user agents (shaped as services) entitled with various tasks. resources within personal spaces can be allocated according to the user's role. for example, a conference chair would have access to conference-specific materials, visualization tools, and interfaces to upload papers for review by a committee. similarly, we denote a group space as a virtual area in which library users (the entire dl society) can meet to conduct collaborative activities synchronously or asynchronously. explicit group spaces are created dynamically by a designer or facilitator who becomes (or appoints) the owner of the space and defines who the participants will be. in addition to direct user-to-user communication, users should be able to access library materials and make annotations on them for every other group to see. ideally, users should be able to act (and carry dl materials with them) between personal and group spaces or among group spaces to which they belong. it may also be the case, however, that a given resource is referenced in several personal or group spaces. basic functionality required for personal spaces includes capabilities for viewing, launching, and monitoring library services, agents, and applications. like group spaces, personal spaces should provide users with the means to easily become aware of other users and resources that are present in a given group space at any time, as well as mechanisms to communicate with other users and make annotations on library resources. we employed this personal and group space paradigm in modeling a collaborative environment in the academic conferences domain, where a conference chair can have a personal view of the document collections (resources) and metadata, but also can share information with the various committees collaborating for certain tasks.
ideally, users should be able to act (and carry dl materials with them) between personal and group spaces or among group spaces to which they belong. it may also be the case, however, that a given resource is referenced in several personal or group spaces. basic functionality required for personal spaces includes capabilities for viewing, launching, and monitoring library services, agents, and applications. like group spaces, personal spaces should provide users with the means to easily become aware of other users and resources that are present in a given group space at any time, as well as mechanisms to communicate with other users and make annotations on library resources. we employed this personal and group space paradigm in modeling a collaborative environment in the academic conferences domain, where a conference chair can have a personal view of the document collections (resources) figure 7. the cradle model for the jdan framwork 184 information technology and libraries | december 2010 of “sapienza” university of rome (undergraduate students), shown in figure 5, and (2) an application employed with a project of records management in a collaboration between the computer science and the computer engineering department of “sapienza” university, as shown in figure 7. usability of the generated tools environments for single-view languages generated with atom3 have been extensively used, mostly in an academic setting, in different areas like software and web engineering, modeling, and simulation; urban planning; etc. however, depending on the kind of the domain, generating the results may take some time. for instance, the state reachability analysis in the dl example takes a few minutes; we are currently employing a version of atom3 that includes petri-nets formalism where we can test the services states reachability.49 in general, from application experience, we note the general agreement that automated syntactical consistency support greatly simplifies the design of complex systems. finally, some users pointed out some technical limitations of the current implementation, such as the fact that it is not possible to open several views at a time. altogether, we believe this work contributes to make more efficient and less tedious the definition and maintenance of environments for dls. our model-based approach must be contrasted with the programmingcentric approach of most case tools, where the language and the code generation tools are hard-coded so that whenever a modification has to be done (whether on the language or on the semantic domain) developers have to dive into the code. ■■ conclusions and future work dls are complex information systems that integrate findings from disciplines such as hypertext, information retrieval, multimedia, databases, and hci. dl design is often a multidisciplinary effort, including library staff and computer scientists. wasted effort and poor interoperability can therefore ensue. examining the related bibliography, we noted that there is a lack of tools or automatic systems for designing and developing cooperative dl systems. moreover, there is a need for modeling interactions between dls and users, such as scenario or activity-based approaches. the cradle framework fulfills this gap by providing a model-driven approach for generating visual interaction tools for dls, supporting design and automatic generation of code for dls. 
in particular, we use a metamodel made of different diagram types (collection, structure, service, and society), which describe the different aspects of a dl. we have built a code generator able to produce xul code from the design models for the dl user interface. moreover, we use template code generation, integrating predefined components for the different services (xdoclet language) according to the model specification. extensions of cradle with behavioral diagrams and the addition of analysis and simulation capabilities are under study. these will exploit the new atom3 capabilities for describing multiview dsvls, to which this work directly contributed.
references
1. a. m. gonçalves and e. a. fox, "5sl: a language for declarative specification and generation of digital libraries," proc. jcdl '02 (new york: acm, 2002): 263–72.
2. l. candela et al., "setting the foundations of digital libraries: the delos manifesto," d-lib magazine 13 (2007), http://www.dlib.org/dlib/march07/castelli/03castelli.html (accessed oct. 18, 2010).
3. a. malizia et al., "a cooperative-relational approach to digital libraries," proc. ecdl 2007, lncs 4675 (berlin: springer, 2007): 75–86.
4. e. a. fox and g. marchionini, "toward a worldwide digital library," communications of the acm 41 (1998): 29–32.
5. m. a. gonçalves et al., "streams, structures, spaces, scenarios, societies (5s): a formal model for digital libraries," acm transactions on information systems 22 (2004): 270–312.
6. j. c. r. licklider, libraries of the future (cambridge, mass.: mit pr., 1965).
7. d. m. levy and c. c. marshall, "going digital: a look at assumptions underlying digital libraries," communications of the acm 38 (1995): 77–84.
8. r. reddy and i. wladawsky-berger, "digital libraries: universal access to human knowledge—a report to the president," 2001, www.itrd.gov/pubs/pitac/pitac-dl-9feb01.pdf (accessed mar. 16, 2010).
9. e. l. morgan, "mylibrary: a digital library framework and toolkit," journal of information technology & libraries 27 (2008): 12–24.
10. t. r. kochtanek and k. k. hein, "delphi study of digital libraries," information processing & management 35 (1999): 245–54.
11. s. e. howe et al., "the president's information technology advisory committee's february 2001 digital library report and its impact," proc. jcdl '01 (new york: acm, 2001): 223–25.
12. n. reyes-farfan and j. a. sanchez, "personal spaces in the context of oa," proc. jcdl '03 (ieee computer society, 2003): 182–83.
13. m. wirsing, report on the eu/nsf strategic workshop on engineering software-intensive systems, 2004, http://www.ercim.eu/eu-nsf/sis.pdf (accessed oct. 18, 2010).
14. s. kelly and j.-p. tolvanen, domain-specific modeling: enabling full code generation (hoboken, n.j.: wiley, 2008).
15. h. r. turtle and w. bruce croft, "evaluation of an inference network-based retrieval model," acm transactions on information systems 9 (1991): 187–222.
16. r. a. baeza-yates and b. a. ribeiro-neto, modern information retrieval (reading, mass.: addison-wesley, 1999).
17. d. lucarella and a. zanzi, "a visual retrieval environment for hypermedia information systems," acm transactions on information systems 14 (1996): 3–29.
18. b. wang, "a hybrid system approach for supporting digital libraries," international journal on digital libraries 2 (1999): 91–110.
19. d. castelli, c. meghini, and p. pagano, "foundations of a multidimensional query language for digital libraries," proc. ecdl '02, lncs 2458 (berlin: springer, 2002): 251–65.
20. r. n. oddy et al., eds., proc. joint acm/bcs symposium in information storage & retrieval (oxford: butterworths, 1981).
21. k. maly, m. zubair, et al., "scalable digital libraries based on ncstrl/dienst," proc. ecdl '00 (london: springer, 2000): 168–79.
22. r. tansley, m. bass, and m. smith, "dspace as an open archival information system: current status and future directions," proc. ecdl '03, lncs 2769 (berlin: springer, 2003): 446–60.
23. k. m. anderson et al., "metis: lightweight, flexible, and web-based workflow services for digital libraries," proc. 3rd acm/ieee-cs jcdl '03 (los alamitos, calif.: ieee computer society, 2003): 98–109.
24. n. dushay, "localizing experience of digital content via structural metadata," proc. 2nd acm/ieee-cs jcdl '02 (new york: acm, 2002): 244–52.
25. m. gogolla et al., "integrating the er approach in an oo environment," proc. er '93 (berlin: springer, 1993): 376–89.
26. heidi gregersen and christian s. jensen, "temporal entity-relationship models—a survey," ieee transactions on knowledge & data engineering 11 (1999): 464–97.
27. b. berkem, "aligning it with the changes using the goal-driven development for uml and mda," journal of object technology 4 (2005): 49–65.
28. a. malizia, e. guerra, and j. de lara, "model-driven development of digital libraries: generating the user interface," proc. mddaui '06, http://sunsite.informatik.rwth-aachen.de/publications/ceur-ws/vol-214/ (accessed oct. 18, 2010).
29. d. l. atkins et al., "mawl: a domain-specific language for form-based services," ieee transactions on software engineering 25 (1999): 334–46.
30. j. de lara and h. vangheluwe, "atom3: a tool for multi-formalism and meta-modelling," proc. fase '02 (berlin: springer, 2002): 174–88.
31. j. m. morales-del-castillo et al., "a semantic model of selective dissemination of information for digital libraries," journal of information technology & libraries 28 (2009): 21–30.
32. n. santos, f. c. a. campos, and r. m. m. braga, "digital libraries and ontology," in handbook of research on digital libraries: design, development, and impact, ed. y.-l. theng et al. (hershey, pa.: idea group, 2008): 1:19.
33. f. wattenberg, "a national digital library for science, mathematics, engineering, and technology education," d-lib magazine 3, no. 10 (1998), http://www.dlib.org/dlib/october98/wattenberg/10wattenberg.html (accessed oct. 18, 2010); l. l. zia, "the nsf national science, technology, engineering, and mathematics education digital library (nsdl) program: new projects and a progress report," d-lib magazine 7, no. 11 (2002), http://www.dlib.org/dlib/november01/zia/11zia.html (accessed oct. 18, 2010).
34. u.s. library of congress, ask a librarian, http://www.loc.gov/rr/askalib/ (accessed mar. 16, 2010).
35. c. l. borgmann, "what are digital libraries? competing visions," information processing & management 25 (1999): 227–43.
36. c. lynch, "coding with the real world: heresies and unexplored questions about audience, economics, and control of digital libraries," in digital library use: social practice in design and evaluation, ed. a. p. bishop, n. a. van house, and b. buttenfield (cambridge, mass.: mit pr., 2003): 191–216.
37. y. ioannidis et al., "digital library information-technology infrastructure," international journal of digital libraries 5 (2005): 266–74.
38. e. a. fox et al., "the networked digital library of theses and dissertations: changes in the university community," journal of computing higher education 13 (2002): 3–24.
fox et al., “the networked digital library of theses and dissertations: changes in the university community,” journal of computing higher education 13 (2002): 3–24. 39. h. van de sompel and c. lagoze, “notes from the interoperability front: a progress report on the open archives initiative,” proc. 6th ecdl, 2002, lncs 2458 (berlin: springer 2002): 144–57. 40. f. de rosa et al., “jdan: a component architecture for digital libraries,” delos workshop: digital library architectures, (padua, italy: edizioni libreria peogetto, 2004): 151–62. 41. defined as a set of actors (users) playing roles and interacting with services. 42. mozilla developer center, xul, https://developer librarians and technology skill acquisition: issues and perspectives | farney 141click analytics: visualizing website use data | farney 141 tutorial tabatha a. farney librarians who create website content should have access to website usage statistics to measure their webpages’ effectiveness and refine the pages as necessary.3 with web analytics libraries can increase the effectiveness of their websites, and as marshall breeding has observed, libraries can regularly use website statistics to determine how new webpage content is actually being used and make revisions to the content based on this information.4 several recent studies used google analytics to collect and report website usage statistics to measure website effectiveness and improve their usability.5 while web analytics are useful in a website redesign process, several studies concluded that web usage statistics should not be the sole source of information used to evaluate a website. these studies recommend using click data in conjunction with other website usability testing methods.6 background a lack of research on the use of click analytics in libraries motivated the web services librarian to explore their potential by directly implementing them on the library’s website. she found that there are several click analytics products available and each has its own unique functionality. however, many are commercially produced and expensive. with limited funding, the web services librarian selected google analytics’ in-page analytics, clickheat, and crazy egg because they are either free or inexpensive. each tool was evaluated on the library’s website for over a six month period. because google analytics cannot discern between the same link repeated in multiple places on a webpage. furthermore, she wanted to use website use data to determine the areas of high and low usage on the library’s homepage, and use this information to justify her webpage reorganization decisions. although this data can be found in a google analytics report, the web services librarian found it difficult to easily identify the necessary information within the massive amount of data the reports contain. the web services librarian opted to use click analytics, also known as click density analysis or site overlay, a subset of web analytics that reveals where users click on a webpage.1 a click analytics report produces a visual representation of what and where visitors are clicking on an individual webpage by overlaying the click data on top of the webpage that is being tested. rather than wading through the data, libraries can quickly identify what content users are clicking by using a click analytics report. the web services librarian tested several click analytics products while reassessing the library’s homepage. 
in-page analytics

google analytics is a popular, comprehensive web analytics tool that contains a click analytics feature called in-page analytics (formerly site overlay) that visually displays click data by overlaying that information on the current webpage (see figure 1). site overlay was used during the library’s redesign process; however, it was replaced by in-page analytics in october 2010.7 the web services librarian reassessed the library’s homepage using in-page analytics and found that the current tool resolved some of site overlay’s shortcomings. site overlay is no longer accessible in google analytics, so this paper will discuss in-page analytics. essentially, in-page analytics is an updated version of site overlay (see figure 2).

figure 1. screenshot of google analytics’ defunct site overlay

figure 2. screenshot of google analytics’ in-page analytics

in addition to visually representing click data on a webpage, in-page analytics contains new features, including the ability to easily segment data. web analytics expert avinash kaushik stresses the importance of segmenting website use data because it breaks down the aggregated data into specific data sets that represent more defined groups of users.8 rather than studying the total number of clicks a link received, an in-page analytics report can segment the data into specific groups of users, such as mobile device users. in-page analytics provides several default segments, but custom segments can also be applied, allowing libraries to further filter the data in ways that are constructive to them.

in-page analytics also displays a complementary overview report of statistics located in a side panel next to the typical site overlay view. this overview report extracts useful data from other reports generated in google analytics without having to leave the in-page analytics report screen. the report includes the webpage’s inbound sources, also called top referrals, which are links from other webpages leading visitors to that page, and outbound destinations, links that navigate visitors away from that webpage. the inbound sources and outbound destinations reports can track outbound links, which are links that have a different domain or url address from the website tracked within google analytics. for libraries, outbound links include library catalogs or subscription databases. additional javascript tags must be added to each outbound link for google analytics to track that data.9 once google analytics recognizes the outbound links, their click data will be available in the in-page analytics report.
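as a concrete illustration of the tagging just described: in the classic google analytics (ga.js) syntax that was current when this paper was written, an outbound link could carry an onclick handler that records a virtual pageview. this is a minimal sketch, not the library’s actual markup; the catalog url and the “/outbound/” virtual path are invented placeholders.

  <!-- hypothetical outbound link; the onclick records a virtual pageview -->
  <!-- so that the click becomes visible in google analytics reports -->
  <a href="http://catalog.example.edu/"
     onclick="_gaq.push(['_trackPageview', '/outbound/library-catalog']);">
    library catalog
  </a>

an event (_trackEvent) could be recorded instead of a virtual pageview; either way, the point is that outbound clicks are invisible to google analytics until each outbound link carries its own tag.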
evaluation of in-page analytics

in-page analytics’ advanced segmenting ability far exceeds the old site overlay functionality. segmenting click data at the link level helps web managers see how groups of users are navigating through a website. for example, in-page analytics can monitor the links mobile users are clicking, allowing web managers to track how that group of users navigates through a website. this data could be used in designing a mobile version of a site.

in-page analytics integrates a site overlay report and an overview report that contains selected web use statistics for an individual webpage. although the overview report is not in visual context with the site overlay view, it combines the necessary data to determine how a webpage is being accessed and used. this assists in identifying possible flaws in a website’s navigation, layout, or content. it also has the potential to clarify misleading website statistics. for instance, google analytics’ top exit pages report indicates the library’s homepage is the top exit page for the site. exit pages are the last page a visitor views before leaving the site.10 having a high exit rate could imply visitors were leaving the library’s site from the homepage and potentially missing a majority of the library’s online resources. using in-page analytics, it was apparent the library’s homepage had a high number of exits because many visitors clicked on outbound links, such as the library catalog, that navigated them away from the library’s website. rather than revealing a potential problem, in-page analytics indicated that the homepage’s layout successfully led visitors to a desired point of information. while the data from the outbound links is available in the data overview report, it is not displayed within the site overlay view. it is possible to work around this problem by creating internal redirect pages, but it is time consuming and may not be worth the effort since the data are indirectly available.11

a major drawback to in-page analytics is that it does not discern between the same links listed in multiple places on a webpage. instead it tracks redundant links as one link, making it impossible to distinguish which repeated link received more use on the library’s homepage. similarly, the library’s homepage uses icons to help draw attention to certain links. these icons are linked images next to their counterpart text links. since the icon and text link share the same url, in-page analytics cannot reveal which is receiving more clicks. in-page analytics is useless for comparing repetitive links on a webpage, but google reports that it is working on adding this capability.12

as stated earlier, in-page analytics lays the click data over the current webpage in real time, which can be both useful and limiting. using the current webpage allows libraries to navigate through their site while staying within the in-page analytics report. libraries can follow in the tracks of website users to learn how they interact with the site’s content and navigation. the downside is that it is difficult to compare a new version of a webpage with an older version, since only the current webpage is displayed. for example, the web services librarian could not accurately compare the use data between the old homepage and the revised homepage within the in-page analytics report because the newly redesigned homepage replaced the old page. comparing different versions of a webpage could help determine whether the new revisions improved the page or not. an archive or export feature would remedy this problem, but in-page analytics does not have this capacity. additionally, an export function would improve the ability to share this report with other librarians without having them log in to the google analytics website. currently, the web services librarian uses a screen capture tool, such as the firefox add-on screengrab,13 to collect and archive the in-page analytics reports, but the process is clunky and results in the loss of the ability to segment the data.

clickheat

labsmedia’s clickheat is an open source heat mapping tool that visually displays the clicks on a webpage, using color to indicate the amount of clicks an area receives. similar to in-page analytics, a clickheat heat map displays the current webpage and overlays that page with click data (see figure 3). instead of listing percentages or actual numbers of clicks, the heat map represents clicks using color. the warmer the color, such as yellows, oranges, or reds, the more clicks that area receives; the absence of color implies little to no click activity. each heat map has an indicator that outlines the number of clicks a color represents. a heat map clearly displays the heavily used and underused sections on a webpage, making it easy for people with little experience interpreting website usage statistics to interpret the data. however, a heat map is not about exact numbers but rather general areas of usage. for exact numbers, a traditional, comprehensive web analytics tool is required.

figure 3. screenshot of clickheat’s heat map report

clickheat can stand alone or be integrated into other web analytics tools.14 to have a more comprehensive web analytics product, the web services librarian opted to use the clickheat plugin for piwik, a free, open source web analytics tool that seeks to be an alternative to google analytics.15 by itself piwik has no click analytics feature, so clickheat is a useful plugin. both piwik and clickheat require access to a web server for installation and knowledge of php and mysql to configure them. because the kraemer family library does not maintain its own web servers, the web services librarian worked with the campus’s information technology department to install piwik with the clickheat plugin on a campus web server. once installed, piwik and clickheat generate javascript tags that must be added to every page on which website use data will be tracked. although piwik and clickheat can be integrated, the tools work separately, so two javascript tags must be added to a webpage to track click data in piwik as well as in clickheat. only the pages that contain the clickheat tracking script will generate heat maps, which are then stored within the local piwik interface.
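to make the “two javascript tags” concrete, the snippets below sketch the approximate shape of the piwik and clickheat tags of that era. the server url, site id, and site/group labels are placeholders, and a local install generates its own values, so treat this as an assumption-laden sketch rather than the library’s actual code.

  <!-- tag 1: piwik page tracking (placeholder server url and site id) -->
  <script type="text/javascript">
  var pkBaseURL = "http://analytics.example.edu/piwik/";
  document.write(unescape("%3Cscript src='" + pkBaseURL +
    "piwik.js' type='text/javascript'%3E%3C/script%3E"));
  </script>
  <script type="text/javascript">
  try {
    // the second argument is the piwik site id assigned to the tracked site
    var piwikTracker = Piwik.getTracker(pkBaseURL + "piwik.php", 1);
    piwikTracker.trackPageView();
    piwikTracker.enableLinkTracking();
  } catch (err) {}
  </script>

  <!-- tag 2: clickheat click logging; only pages carrying this tag -->
  <!-- produce heat maps in the piwik/clickheat interface -->
  <script type="text/javascript"
    src="http://analytics.example.edu/clickheat/js/clickheat.js"></script>
  <script type="text/javascript">
  clickHeatSite = "library";     // site label shown in the clickheat interface
  clickHeatGroup = "homepage";   // one group per page or layout being tested
  clickHeatServer = "http://analytics.example.edu/clickheat/click.php";
  initClickHeat();
  </script>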
evaluation of clickheat

in-page analytics only tracks links or items that perform some sort of action, such as playing a flash video,16 but clickheat tracks clicks on internal links, outbound links, and even nonlinked objects, such as images. hence, clickheat is able to track clicks on the entire webpage. tracking nonlinked objects was unexpectedly useful in identifying potential flaws in a webpage’s design. for instance, within a week of beta testing the library’s redesigned homepage, it was evident that users clicked on the graphics that were positioned closely to text links. the images were intended to draw the user’s attention to the text links, but instead users clicked on the graphics themselves, expecting them to be links. to alleviate possible user frustration, the web services librarian added links to the graphics to take visitors to the same destinations as their companion text links.

clickheat treats every link or image as its own separate component, so it has the ability to compare the same link listed in multiple places on the same page. unlike in-page analytics, clickheat was particularly helpful in analyzing which redundant links received more use on the homepage. in addition, the heat map also revealed that users ignored the site’s main navigation on the homepage and opted to use links prominently displayed within the homepage’s content. this indicated that either the users did not notice the main navigation dropdown menus or that they chose to ignore them. further usability testing of the main navigation is necessary to better understand why users do not utilize it.

clickheat is most useful when combined with a comprehensive web analytics tool, such as piwik. since clickheat only collects data on where visitors are clicking, it does not track other web analytics metrics, which limits its ability to segment the click data. currently, clickheat only segments clicks by browser type or screen resolution. additional segmenting ability would enhance this tool’s usefulness. for example, the ability to segment clicks from new visitors and returning visitors may reveal how visitors learn to use the library’s homepage. furthermore, the heat map report does not provide the actual number of clicks on individual links or content areas, since heat maps generalize click patterns. the precise number of clicks is available in traditional web analytics reports.

installing and configuring clickheat is a potential drawback for some libraries that do not have access to the necessary technology or staff to maintain it. even with access to a web server and knowledgeable staff, the web services librarian still experienced glitches implementing clickheat. she could not add clickheat to any high-trafficked webpage because it created a slight, but noticeable, lag in response time on any page to which it was added. the cause was an out-of-box configuration setting that had to be fixed by the campus’s information technology department.17 another concern for libraries is that clickheat is continuously being developed, with new versions or patches released periodically.18 like any locally installed software, libraries must plan for continuing maintenance of clickheat to keep it current.

just as with in-page analytics, clickheat has no export or archive function. this impedes the web services librarian’s ability to share the heat maps and compare different versions of a webpage. again, the web services librarian manually archives the heat maps using a screen capture tool, but the process is not the perfect solution.

crazy egg

crazy egg is a commercial, hosted click analytics tool selected for this project primarily for its advanced click tracking functionality. it is a fee-based service that requires a monthly subscription. there are several subscription packages based on the number of visits and “snapshots.” snapshots are webpages that are tracked by crazy egg. the kraemer family library subscribes to the standard package, which allows up to twenty snapshots at one time with a combined total of 25,000 visits a month. to help manage how those visits are distributed, each tracked page can be assigned a specific number of visits or a time period so that one webpage does not use all the visits early in the month. once a snapshot reaches its target number of visits or its allocated time period, it automatically stops tracking clicks, and that snapshot is archived within the crazy egg website.19

the snapshots convert the click data into three different click analytics reports: heat map, site overlay, and something called “confetti view.” crazy egg’s heat map report is comparable to clickheat’s heat map; both use intensity of colors to show high areas of clicks on a webpage (see figure 4). crazy egg’s site overlay is similar to in-page analytics in that both display the number of clicks a link receives (see figure 5). unlike in-page analytics, crazy egg tracks all clicks, including outbound links as well as nonlinked content, such as graphics, if it has received multiple clicks. every clicked link and graphic is treated as its own separate entity, allowing crazy egg to differentiate between the same link or image listed multiple times on a webpage. crazy egg displays this data in color-coded plus signs located next to the link or graphic each represents. the color is based on the percentage of clicks that area has received, with the brighter colors representing a higher percentage of clicks. the plus signs can be expanded to show the total number of clicks an item has received, and this number can be easily filtered into eleven predefined segments that include day of week, browser type, and top referring websites. custom segments may be applied if they are set up within the crazy egg profile.

figure 4. screenshot of crazy egg’s heat map report

figure 5. screenshot of crazy egg’s site overlay report

the confetti view report displays every click the snapshot recorded and overlays those clicks as colored dots on the snapshot, as shown in figure 6. the color of each dot corresponds to a specific segment value. the confetti view report uses the same default segment values used in the site overlay report, but here they can be further filtered into defined values for that segment. for example, the confetti view can segment the clicks by window width and then further filter the data to display only the clicks from visitors with window widths under 1,000 pixels to see if users with smaller screen resolutions are scrolling down long webpages to click on content. this information is hard to glean from crazy egg’s site overlay report because it focuses on the individual link or graphic. the confetti view report focuses on clicks at the webpage level, allowing libraries to view usage trends on a webpage.

figure 6. screenshot of crazy egg’s confetti view report

crazy egg is a hosted service like google analytics, which means all the data are stored on crazy egg’s web servers and accessed through its website. implementing crazy egg on a webpage is a two-step process requiring the web manager to first set up the snapshot within the crazy egg profile and then add the tracking javascript tags to the webpage it will track. once the javascript tags are in place, crazy egg takes a picture of the current webpage and stores that as the snapshot on which to overlay the click data reports. since it uses a “snapshot” of the webpage, the website manager needs to retake the snapshot if there are any changes to the page. retaking the snapshot requires only a click of a button to automatically stop the old snapshot and regenerate a new one based on the current webpage, without having to change the javascript tags.
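the second step of that setup, adding the tracking javascript, follows the usual pattern for hosted analytics services: a small asynchronous loader that pulls an account-specific script from the vendor’s servers. the url below is a placeholder, not crazy egg’s real endpoint; the actual script address is generated inside the crazy egg profile when a snapshot is created.

  <!-- hypothetical asynchronous loader for a hosted click-tracking script -->
  <script type="text/javascript">
  (function () {
    var ce = document.createElement("script");
    ce.type = "text/javascript";
    ce.async = true;              // load without blocking page rendering
    ce.src = "http://tracking.example.com/pages/scripts/0000/0000.js";
    var s = document.getElementsByTagName("script")[0];
    s.parentNode.insertBefore(ce, s);
  })();
  </script>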
evaluation of crazy egg

crazy egg combines the capabilities of in-page analytics and clickheat in one tool and expands on their abilities. it is not a comprehensive web analytics tool like google analytics or piwik, but rather is designed specifically to track where users are clicking. crazy egg’s heat map report is comparable to the one freely available in clickheat; however, its site overlay and confetti view reports are more sophisticated than what is currently available for free. the web services librarian found crazy egg to be a worthwhile investment during the library’s homepage redesign because it provided additional context to show how users were interacting with the library’s website.

the site overlay facilitated the ability to compare the same link listed in multiple locations on the library’s homepage. not only could the web services librarian see how many clicks the links received, but she could also segment and compare that data to learn which links users were finding faster and which links new visitors or returning visitors preferred. this data helped her decide which redundant links to remove from the homepage.

the confetti view report was useful for studying clicks on the entire webpage. segmenting this data allowed the web services librarian to identify click patterns on the webpage from a specific group. for example, the report revealed that mobile device users would scroll horizontally on the homepage to click on content, but rarely vertically. she also focused on the time-to-click segment, which reports how long it took a visitor to click on something, in the confetti view to identify links or areas that took users over half a minute to click. both segments provided interesting information, but further usability testing is necessary to better understand why mobile users preferred not to scroll vertically or why it took users longer to click on certain links.

crazy egg also has the ability to archive its snapshots within its profile. this is useful for comparing different versions of a webpage to discover whether the modifications were an improvement. one goal for the library’s homepage redesign was to shorten the page so users did not have to scroll down too much to get to needed links. by comparing the old homepage and the new homepage confetti reports in crazy egg, it was instantly apparent that the new homepage had significantly fewer clicks on its bottom half than the old version. furthermore, comparing the different versions using the time-to-click segment in the site overlay showed that placing a link more prominently on the webpage decreased the overall time it took users to click on it.

crazy egg’s main drawback is that archived pages that are no longer tracking click data count toward the overall number of snapshots that can be tracked at one time. if libraries regularly retest a webpage, they will easily reach the maximum number of snapshots their subscription permits in a relatively short period. once a crazy egg subscription is cancelled, data stored in the account is no longer accessible. this increases the importance of regularly exporting data. crazy egg is designed to export the heat map and confetti view reports. the direct export function takes a snapshot of the current report as it is displayed and automatically converts that image into a pdf. exporting the heat map is fairly simple because the report is a single image, but exporting all the content in the confetti view report is more difficult because the report is based on segments of click data; each segment type would have to be exported in a separate pdf report to retain all of the content. in addition, there is no export option for the site overlay report, so there is no easy method to manage that information outside of crazy egg. even if libraries are actively exporting reports from crazy egg, data loss is inevitable.

summary and conclusions

closely examining in-page analytics, clickheat, and crazy egg reveals that each tool has different levels of click tracking abilities; however, all provide a distinct picture of how visitors use a webpage. by using all of them, the web services librarian was able to clearly identify and recommend the links for removal. in addition, she identified other potential usability concerns, such as visitors clicking on nonlinked graphics rather than the links themselves.

a major bonus of using click analytics tools is their ability to create easy-to-understand reports that instantly display where visitors are clicking on a webpage. no previous knowledge of web analytics is required to understand these reports. the web services librarian found it simple to present and discuss click analytics reports with other librarians with little to no background in web analytics. this helped increase the transparency of why links were targeted for removal from the homepage.

as useful as click analytics tools are, they cannot determine why users click on a link, only where they have clicked. click analytics tools simply visualize website usage statistics. as elizabeth black reports, these “statistics are a trail left by the user, but they do not explain the motivations behind the behavior.”20 she concludes that additional usability studies are required to better understand users and their interactions on a website.21 libraries can use the click analytics reports to identify a problem on a webpage, but further usability testing will explain why there is a problem and help library web managers fix the issue and prevent repeating the mistake in the future.

the web services librarian has incorporated the use of in-page analytics, clickheat, and crazy egg in her web analytics practices, since these tools continue to be useful for testing the usage of new content added to a webpage. furthermore, she finds that click analytics’ straightforward reports have prompted her to share website use data more often with fellow librarians to assist in other decision-making processes for the library’s website. next, she will explore ways to automate the process of sharing website use data to make this information more accessible to other interested librarians. by sharing this information, the web services librarian hopes to promote informed decision making for the library’s web content and design.

references

1. avinash kaushik, web analytics 2.0: the art of online accountability and science of customer centricity (indianapolis: wiley, 2010): 81–83.
2. laura b. cohen, “a two-tiered model for analyzing library website usage statistics, part 2: log file analysis,” portal: libraries & the academy 3, no. 3 (2003): 523–24.
3. jeanie m. welch, “who says we’re not busy? library web page usage as a measure of public service activity,” reference services review 33, no. 4 (2005): 377–78.
4. marshall breeding, “an analytical approach to assessing the effectiveness of web-based resources,” computers in libraries 28, no. 1 (2008): 20–22.
5. julie arendt and cassie wagner, “beyond description: converting web site statistics into concrete site improvement ideas,” journal of web librarianship 4, no. 1 (january 2010): 37–54; steven j. turner, “websites statistics 2.0: using google analytics to measure library website effectiveness,” technical services quarterly 27, no. 3 (2010): 261–78; wei fang and marjorie e. crawford, “measuring law library catalog web site usability: a web analytic approach,” journal of web librarianship 2, no. 2–3 (2008): 287–306.
6. arendt and wagner, “beyond description,” 51–52; andrea wiggins, “data-driven design: using web analytics to validate heuristics,” bulletin of the american society for information science and technology 33, no. 5 (2007): 20–21; elizabeth l. black, “web analytics: a picture of the academic library web site user,” journal of web librarianship 3, no. 1 (2009): 12–13.
7. trevor claiborne, “introducing in-page analytics: visual context for your analytics data,” google analytics blog, oct. 15, 2010, http://analytics.blogspot.com/2010/10/introducing-in-page-analytics-visual.html (accessed feb. 7, 2011).
8. kaushik, web analytics 2.0, 88.
9. turner, “websites statistics 2.0,” 272–73.
10. kaushik, web analytics 2.0, 53–55.
11. site overlay not displaying outbound links, google analytics help forum, http://www.google.com/support/forum/p/google+analytics/thread?tid=39dc323262740612&hl=en (accessed feb. 7, 2011).
12. claiborne, “introducing in-page analytics.”
13. screengrab, firefox add-ons, https://addons.mozilla.org/en-us/firefox/addon/1146/ (accessed feb. 7, 2011).
14. clickheat, labsmedia, http://www.labsmedia.com/clickheat/index.html (accessed feb. 7, 2011).
15. piwik, http://piwik.org/ (accessed feb. 7, 2011).
16. paul betty, “assessing homegrown library collections: using google analytics to track use of screencasts and flash-based learning objects,” journal of electronic resources librarianship 21, no. 1 (2009): 81–84.
17. clickheat performance and optimization, labsmedia, http://www.labsmedia.com/clickheat/156894.html (accessed feb. 7, 2011).
18. clickheat, sourceforge, http://sourceforge.net/projects/clickheat/files/ (accessed feb. 7, 2011).
19. crazy egg, http://www.crazyegg.com/ (accessed mar. 25, 2011).
20. black, “web analytics,” 12.
21. ibid., 12–13.

president’s message

michelle frisque

michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago.

by the time you read this column i will be lita president; however, as i write this i still have a couple of weeks left in my vice-presidential year. i have been warned by so many that my presidential year will fly by, and i am beginning to understand how that could be. i can’t believe i am almost done with my first year. i have enjoyed it and sometimes been overwhelmed by it—especially when i began the process of appointing lita volunteers to committees and liaison roles. i didn’t realize how many appointments there were to make. i want to thank all of the lita members who volunteered. you really helped make the appointment process easier. as a volunteer organization, lita relies on you, and once again many of you have stepped up. thank you. during the appointment process i was introduced to many lita members whom i had not yet met. i enjoyed being introduced to you virtually, and i look forward to meeting you in person in the coming year. i also want to thank the lita office. they were there whenever i needed them. without their assistance i would not have been able to successfully complete the appointment process.

over the last year i have been working closely with this year’s lita emerging leaders, lisa thomas and holly tomren. i have really enjoyed the experience. their enthusiasm and energy are contagious. i wish every lita member could have been at this year’s lita camp in columbus, ohio, on may 8. during one of the lightning round sessions, lisa went to the podium and gave an impassioned speech about the benefits of belonging to a professional organization like lita. if there was a person in the audience who was not yet a lita member, i am sure they joined immediately afterward. she really captured the essence of why i became active in lita and why i continue to stay so involved in this organization so many years later. i can honestly say that as much as i have given to lita, i have received so much more in return. that is the true benefit of lita membership.

over the last year, the lita board has had some great discussions with lita members and leaders.
those conversations will continue as we start the work of drafting a new strategic plan. i want to create a strategic plan that will chart a meaningful path for the association and its members for the next several years. i want it to provide direction but also be flexible enough to adapt to changes in the information technology association landscape. as andrew pace mentioned in his last president’s message, changes will be coming. while we still aren’t sure exactly what those changes are, we know that it is time to seriously look at the current organizational structure of lita to make sure it best fits our needs today while continuing to remain flexible enough to meet our needs tomorrow.

when i think of the organizational changes we are exploring, i can’t help but think of the houses i see on my favorite home improvement shows. lita has good bones. the structure and foundation are solid and well built, and as long as the house is well cared for, it should last for years to come. however, like all houses, improvements need to be made over time to keep up with the market. the lita structure and foundation will be the same. when you drive up to the house you will still recognize the lita structure. when you walk in the door my hope is that you will still get that same homey feeling you had before, maybe with a few “oohs” and “aahs” thrown in as you notice the upgrades and enhancements. as the year progresses we will know more. i will use this column and other communication avenues to keep you informed of our plans and to gather your input.

i would like to close my first column by thanking you for giving me this opportunity to serve you as the lita president. i am honored and humbled by the trust you have placed in me, and i am ready to start my presidential year. i hope it does not go by too quickly. i want to savor the experience. now let’s get started!

smartphones: a potential discovery tool

wendy starkweather and eva stowers

the anticipated wide adoption of smartphones by researchers is viewed by the authors as a basis for developing mobile-based services. in response to the unlv libraries’ strategic plan’s focus on experimentation and outreach, the authors investigate the current and potential role of smartphones as a valuable discovery tool for library users.

wendy starkweather (wendy.starkweather@unlv.edu) is director, user services division, and eva stowers (eva.stowers@unlv.edu) is medical/health sciences librarian at the university of nevada las vegas libraries.

when the dean of libraries announced a discovery mini-conference at the university of nevada las vegas libraries to be held in spring 2009, we saw the opportunity to investigate the potential use of smartphones as a means of getting information and services to students. being enthusiastic users of apple’s iphone, we and the web technical support manager developed a presentation highlighting the iphone’s potential value in an academic library setting. because wendy is unlv libraries’ director of user services, she was interested in the applicability of smartphones as a tool for users to more easily discover the libraries’ resources and services. eva, as the health sciences librarian, was aware of a long tradition of pda use by medical professionals. indeed, first-year bachelor of science nursing students are required to purchase a pda bundled with select software. together we were drawn to the student-outreach possibilities inherent in new smartphone applications such as twitter, facebook, and myspace.
presentation

our brief review of the news and literature about mobile phones in general provided some interesting findings and served as a backdrop for our presentation:

- a total of 77 percent of internet experts agreed that the mobile phone would be “the primary connection tool” for most people in the world by 2020.1 the number of smartphone users is expected to top 100 million by 2013; there are currently 25 million smartphone users, with sales in north america having grown 69 percent in 2008.2
- smartphones offer a combination of technologies, including gps tracking, digital cameras, and digital music, as well as more than fifty thousand specialized apps for the iphone and new ones being designed for the blackberry and the palm pre.3 the palm pre offered fewer than twenty applications at its launch, but one million application downloads had been performed by june 24, 2009, less than a month after launch.4
- the 2009 horizon report predicts that the time to adoption of these mobile devices in the educational context will be “one year or less.”5

data gathered from campus users was also presented, providing another context. in march 2009, a survey of university of california, davis (uc-davis) students showed that 43 percent owned a smartphone.6 uc-davis is participating in apple’s university education forum. here at unlv, 37 percent of students and 26 percent of faculty and staff own a smartphone.7

the presentation itself highlighted the mobile applications being developed in several libraries to enhance student research, provide library instruction, and promote library services. two examples were abilene christian university (http://www.acu.edu/technology/mobilelearning/index.html), which in fall 2008 distributed iphones and ipod touches to the incoming freshman class, and stanford university (http://www.stanford.edu/services/wirelessdevice/iphone/), which participates in “itunes u” (http://itunes.stanford.edu/). if the libraries were to move forward with smartphone technologies, they would be following the lead of such universities. readers also may be interested in joan lippincott’s recent concise summary of the implications of mobile technologies for academic libraries, as well as the chapter on library mobile initiatives in the july 2008 library technology report.8

goals: a balancing act

ultimately the goal for many of these efforts is to be where the users are. this aspiration is spelled out in unlv libraries’ new strategic plan relating to infrastructure evolution, namely, “work towards an interface and system architecture that incorporates our resources, internal and external, and allows the user to access from their preferred starting point.”9 while such a goal is laudable and fits very well into the discovery emphasis of the mini-conference presentation, we are well aware of the need for further investigation before proceeding directly to full-scale development of a complete suite of mobile services for our users. of critical importance is ascertaining where our users are and determining whether they want us to be there and in what capacity. the value of this effort is demonstrated in booth’s research report on student interest in emerging technologies at ohio university. the report includes the results of an extensive environmental survey of their
library users. the study is part of ohio university’s effort to actualize its culture of assessment and continuous learning and to use “extant local knowledge of user populations and library goals” to inform “homegrown studies to illuminate contextual nuance and character, customization that can be difficult to achieve when using externally developed survey instruments.”10

unlv libraries are attempting to balance early experimentation and more extensive data-driven decision-making. the recently adopted strategic plan includes specific directions associated with both efforts. for experimentation, the direction states, “encourage staff to experiment with, explore, and share innovative and creative applications of technology.”11 to that end, we have begun working with our colleagues to introduce easy, small-scale efforts designed to test the waters of mobile technology use through small pilot projects. “text-a-librarian” has been added to our existing group of virtual reference services, and we introduced a “text the call number and record” service in our library’s opac in july 2009.

unlv libraries’ strategic plan helps foster the healthy balance by directing library staff to “emphasize data collection and other evidence based approaches needed to assess efficiency and effectiveness of multiple modes and formats of access/ownership” and “collaborate to educate faculty and others regarding ways to incorporate library collections and services into education experiences for students.”12 action items associated with these directions will help the libraries learn and apply information specific to their users as the libraries further adopt and integrate mobile technologies into their services. as we begin our planning in earnest, we look forward to our own set of valuable discoveries.

references

1. janna anderson and lee rainie, the future of the internet iii, pew internet & american life project, http://www.pewinternet.org/~/media//files/reports/2008/pip_futureinternet3.pdf (accessed july 20, 2009).
2. sam churchill, “smartphone users: 110m by 2013,” blog entry, mar. 24, 2009, dailywireless.org, http://www.dailywireless.org/2009/03/24/smartphone-users-100m-by-2013 (accessed july 20, 2009).
3. mg siegler, “state of the iphone ecosystem: 40 million devices and 50,000 apps,” blog entry, june 8, 2009, techcrunch, http://www.techcrunch.com/2009/06/08/40-million-iphones-and-ipod-touches-and-50000-apps (accessed july 20, 2009).
4. jenna wortham, “palm app catalog hits a million downloads,” blog entry, june 24, 2009, new york times technology, http://bits.blogs.nytimes.com/2009/06/24/palm-app-catalog-hits-a-million-downloads (accessed july 20, 2009).
5. larry johnson, alan levine, and rachel smith, horizon report, 2009 edition (austin, tex.: the new media consortium, 2009), http://www.nmc.org/pdf/2009-horizon-report.pdf (accessed july 20, 2009).
6. university of california, davis, “more than 40% of campus students own smartphones, yearly tech survey says,” technews, http://technews.ucdavis.edu/news2.cfm?id=1752 (accessed july 20, 2009).
7. university of nevada las vegas, office of information technology, “student technology survey report: 2008–2009,” http://oit.unlv.edu/sites/default/files/survey/surveyresults2008_students3_27_09.pdf (accessed july 20, 2009).
8. joan lippincott, “mobile technologies, mobile users: implications for academic libraries,” arl bi-monthly report 261 (dec. 2008), http://www.arl.org/bm~doc/arl-br-261-mobile.pdf
(accessed july 20, 2009); ellyssa kroski, “library mobile initiatives,” library technology reports 44, no. 5 (july 2008): 33–38.
9. “unlv libraries strategic plan 2009–2011,” http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 20, 2009): 2.
10. char booth, informing innovation: tracking student interest in emerging library technologies at ohio university (chicago: association of college and research libraries, 2009), http://www.ala.org/ala/mgrps/divs/acrl/publications/digital/ii-booth.pdf (accessed july 20, 2009); “unlv libraries strategic plan 2009–2011,” 6.
11. “unlv libraries strategic plan 2009–2011,” 2.
12. ibid.

editorial board thoughts

kyle felker

kyle felker (felkerk@wlu.edu) is an ital editorial board member, 2007–09, and technology coordinator at washington and lee university library in lexington, virginia.

editor’s note: we have an excellent editorial board for this journal and with this issue we’ve decided to begin a new column. in each issue of ital, one of our board members will reflect on some question related to technology and libraries. we hope you find this new feature thought-provoking. enjoy!

any librarian who has been following the professional literature at all in the past ten years knows that there has been an increasing emphasis on user-centeredness in the design and creation of library services. librarians are trying to understand and even anticipate the needs of users to a degree that’s perhaps unprecedented in the history of our profession. it’s no mystery as to why. we now live in a world where global computer networks link users directly with information in such a way that often no middleman is required. users are exploring information on their own terms, at their own convenience, sometimes even using technologies and systems that they themselves have designed or contributed to. at the same time, most libraries are feeling a financial pinch. resources are tight, and local governments, institutions of higher education, and corporations are all scrutinizing their library operations more closely, asking “what have you done for me lately?” the unspoken coda is “it better be something good, or i’m cutting your funding.”

the increasing need to justify our existence, together with our desire to build more relevant services, is driving an increased interest in assessment. how do we know when we’ve built a successful service? how do we define “success”? and, perhaps most importantly, in a world filled with technologies that are “here today, gone tomorrow,” how do we decide which ones are appropriate to build into enduring and useful services?

as a library technologist, it’s this last question that concerns me the most. i’m painfully aware of how quickly new technologies develop, mature, and fade silently into that good night with nary a trace. it’s like watching protozoa under a microscope. which of these can serve as the foundation for real, useful services? it’s obvious to me that if i’m going to choose well, it’s vital that i place these services in context—and not my context, the user context. in order to do that, i need to understand the users. how do they do their work? what are they most concerned with? how do they think about the library in relation to the research process? how do they use technology as part of that process? how does that process fit into the larger context of the assignment?

to answer questions like these, librarians often turn to basic marketing techniques such as the survey or the focus group. whether we are aware of it or not, the emphasis on user-centered design is making librarians into marketers. this is a new role for us, and one that most of us have not had the training to cope with.
since most of us haven’t been exposed to marketing as a discipline of study, we don’t think of what we do as marketing, even when we use marketing techniques. but that’s what it is. so whether we know it or not, marketing, particularly market research, is important to us.

marketing as a discipline is in the process of undergoing some major changes right now. recent research in sociology, psychology, and neuroscience has uncovered some new and often startling insights into how human beings think and make decisions. marketers are struggling to incorporate these new models into their research methods, and to change their own thinking about how they discover what people want. i recently collided with this change when my own library decided to do a focus group to help us redesign our website. since we have a school of business, i asked one of our marketing professors for help. her advice? don’t do it. as she put it: “you and the users would just be trading ignorances.” she then gave me a reading list, which included how customers think by gerald zaltman, which i now refer to as “the book that made marketing sexy.”1

zaltman’s book pulls together a lot of the recent research on how people think, make choices, and remember. some of it is pretty mind-blowing:

- 95% of human reasoning is unconscious. it happens at a level we are barely aware of.
- we think in images much more than we do in language.
- social context, emotion, and reason are all involved in the decision-making process. without emotion, we literally are unable to make choices.
- all human beings use metaphors to explain and understand the world around them. metaphor is the bridge between the rational and emotional parts of the decision-making process.
- memory is not a collection of immutable snapshots we carry around in our heads. it’s much more like a narrative or story—one that we change just by remembering it. our experience of the past and present are inextricably linked—one is constantly influencing the other.

heady stuff. if you follow many of these ideas to their logical conclusions, you end up questioning the value of many traditional marketing techniques, such as surveys and focus groups. for example, if the social context in
invite interviewees to describe their feelings and experiences in metaphor. explore the metaphors they come up with to more fully understand all the context. if this sounds more like therapy than marketing to you, then your initial reaction is pretty similar to mine. but the techniques follow logically from the research zaltman presents. how many of us have done user assessment and launched a new service, only to find a less than warm reception for it? how many of us have had users tell us they want something, only to see it go unused when it’s implemented? zaltman’s model offers potential explanations for why this happens, and methods for avoiding it. lest you think this has nothing to do with technology, let me offer an example: library facebook/myspace profile pages. there’s been a lot of debate on how effective and appropriate these are. it seems to me that we can’t gauge how receptive users are to this unless we understand how they feel about and think about those social spaces. this is exactly the sort of insight that new marketing techniques purport to offer us. in fact, if the research is right, and there is a social and emotional component to every choice a person makes, then that applies to every choice a user makes with regard to the library, whether it’s the choice to ask a question at the reference desk, the choice to use the library website, or the choice to vote on a library bond issue. librarians are doing a lot of things we never imagined we’d ever need or want to do. web design. archival digitization. tagging. perhaps it’s also time to acknowledge that what we do has an important marketing component, and to think of ourselves as marketers (at least part time). i’m sold enough on zaltman’s ideas that i’m willing to try them out at my own institution, and i encourage you to do the same. reference 1. zaltman, gerald. how customers think: essential insights into the mind of the market (boston, mass.: harvard business school press, 2003.) 158 information technology and libraries | december 2009 michelle frisquepresident’s message i know the president’s message is usually dedicated to talking about where lita is now or where we are hoping lita will be in the future, but i would like to deviate from the usual path. the theme of this issue of ital is “discovery,” and i thought i would participate in that theme. like all of you, i wear many hats. i am president of lita. i am head of the information services department at the galter health sciences library at northwestern university. i also am a new part-time student in the masters of learning and organizational change program at northwestern university. as a student and a practicing librarian, i am now on both sides of the discovery process. as head of the information systems department, i lead the team that is responsible for developing and maintaining a website that assists our health-care clinicians, researchers, students, and staff with selecting and managing the electronic information they need when they need it. as a student, i am a user of a library discovery system. in a recent class, we were learning about the burkelitwin causal model of organization performance and change. the article we were reading described the model; however, it did not answer all of my questions. i thought about my options and decided i should investigate further. before i continue, i should confess that, like many students, i was working on this homework assignment at the last minute, so the resources had to be available online. 
this should be easy, right? i wanted to find an overview of the model. i first tried the library’s website, using several search strategies, and browsed the resources in metalib, the library catalog, and libguides with no luck. the information i found was not what i was looking for. i then tried wikipedia without success. finally, as a last resort, i searched google. i figured i would find something there, right? i didn’t. while i found many scholarly articles and sites that would give me more information for a fee, none of the results i reviewed gave me an overview of the model in question. i gave up. the student in me thought: it should not be this hard! the librarian in me just wanted to forget i had ever had this experience.

this got me to thinking: why is this so hard? libraries have “stuff” everywhere. we access “stuff,” like books, journals, articles, images, datasets, etc., from hundreds of vendors and thousands of publishers who guard their stuff and dictate how we and our users can access that stuff. that’s a problem. i could come up with a million other reasons why this is so difficult, but i won’t. instead, i would like to think about what could be. in this same class we learned about appreciative inquiry (ai) theory. i am simplifying the theory, but the essence of ai is to think about what you want something to be instead of identifying the problems of what is. i decided to put ai to the test and tried to come up with my ideal discovery process. i put both my student and librarian hats on, and here is what i have come up with so far:

- i want to enter my search in one place and search once for what i need. i don’t want to have to search the same terms many times in various locations in the hopes that one of them has what i am looking for. i don’t care where the stuff is or who provides the information. if i am allowed to access it, i want to search it.
- i want items to be recommended to me on the basis of what i am searching. i also want the system to recommend other searches i might want to try.
- i want the search results to be organized for me. while perusing a result list can be loads of fun because you never know what you might find, i don’t always have time to go through pages and pages of information.
- i want the search results to be returned to me in a timely manner.
- i want the system to learn from me and others so that the results list improves over time.
- i want to find the answer.

i’m sure if i had time i would come up with more. while we aren’t there yet, we should continually take steps—both big and small—to perfect the discovery process. i look forward to reading the articles in this issue to see what other librarians have discovered, and i hope to learn new things that will bring us one step closer to creating the ultimate discovery experience.

michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago.

building an open source institutional repository at a small law school library: is it realistic or unattainable?

fang wang

digital preservation activities among law libraries have largely been limited by a lack of funding, staffing, and expertise.
building an open source institutional repository at a small law school library: is it realistic or unattainable? digital preservation activities among law libraries have largely been limited by a lack of funding, staffing, and expertise. most law school libraries that have already implemented an institutional repository (ir) chose proprietary platforms because they are easy to set up, customize, and maintain with the technical and development support they provide. the texas tech university school of law digital repository is one of the few law school repositories in the nation that is built on the dspace open source platform.1 the repository is the law school's first institutional repository in its history. it was designed to collect, preserve, share, and promote the law school's digital materials, including research and scholarship of the law faculty and students, institutional history, and law-related resources. in addition, the repository also serves as a dark archive to house internal records. in this article, the author describes the process of building the digital repository from scratch, including hardware and software, customization, collection development, marketing and outreach, and future projects. although the development of the repository is ongoing, it is valuable to share the experience with other institutions who wish to set up an institutional repository of their own and also to add to the knowledge base of ir development. fang wang (fang.wang@ttu.edu) is digital information management librarian, texas tech university school of law library, lubbock, texas.
institutional repository from the ground up unlike most large university libraries, law school libraries are usually behind on digital initiative activities because of smaller budgets, lack of staff, and fewer resources. although institutional repositories have already become a trend for large university libraries, the concept still appears to be new for many law school libraries. at the beginning of 2009, i was hired as the digital information management librarian to develop a digital repository for the law school library. when i arrived at texas tech university law library, there was no institutional repository implemented, and very few digital projects had been done at the law library. one digital collection was of faculty scholarship; this collection was displayed on a webpage with links to pdf files. another digital project, to digitize and provide access to the texas governor executive orders found in the texas register, was planned and then disbanded because the previous employee left the position. i started by looking at the digitization equipment in the library. the equipment was very limited: a very old and rarely used book scanner and a sheet-fed scanner. the good thing was that the library did have extra pcs to serve as workstations. i did research on the book scanner we had and also consulted colleagues i met at various digital library conferences about it. because the model was very outdated, had been discontinued by the vendor, and thus had little value to our digitization project, i decided to get rid of the scanner. i then proposed to purchase an epson perfection v700 flatbed scanner, which was recommended by many digitization best practices in texas. for software, we had all the important basics, such as ocr and image-editing software, for the project to start. for the following several months, i did extensive research on what digital asset management platform would be the best solution for the law library. we had options to continue displaying the digital collections through webpages or to use a digital asset management platform that would provide long-term preservation as well as retrieval functions. we made the decision to go with the latter. generally speaking, there are two types of digital asset management platforms: proprietary and open source. on rare occasions, a library with its own programmers chooses to develop a system rather than use either type of platform. there are pros and cons to both proprietary and open source platforms. although setting up the repository is fairly quick and easy on a proprietary platform, it can be very expensive to pay annual fees for hosting and using the service. open source software may appear to be "free" up front; however, installing and customizing the repository can be very time consuming, and these solutions often lack technical and development support. there is no uniform rule for choosing a platform; it depends on what the organization wants to achieve and its own unique circumstances. i explored several popular proprietary platforms such as contentdm and digital commons. contentdm is an oclc product, which has a lot of capability and is especially good for displaying image collections. digital commons is owned by the berkeley electronic press (bepress) and is often used in the law library community. as a smaller law library, our budget did not allow us to purchase those platforms, which require annual fees of more than $10,000. so we had to look at the open source options. for the open source platforms, i investigated dspace, fedora, eprints, and greenstone. dspace is a java-based system developed by mit and hp labs. it offers a communities-and-collections model and has built-in submission workflows and a long-term preservation function. it can be installed "out of the box" and is easy to use, and it has been widely adopted as institutional repository software in the united states and worldwide. fedora was also developed in the united states. it is more of a backend software with no web-based administration tools and requires a lot of programming effort. similar to dspace, eprints is another easy-to-set-up and easy-to-use ir software, developed in the u.k.; it is written in perl and is more widespread in europe. greenstone is a tool developed in new zealand for building and distributing digital library collections. it provides interfaces in 35 languages, so it has many international users. when choosing an ir platform, it is not a question of which software is superior to the others but rather which is more appropriate for the purpose and the content of the repository. our goal was to find a platform that had low costs and did not involve much programming. we also wanted a system that was capable of archiving digital items in various formats for the long term, was flexible for data migration, had a widely accepted metadata scheme and decent search capability, and was easy to use. another factor we had to consider was the user base. because open source software relies on the users themselves for technical support for the most part, we wanted software that had an active user community in the united states. dspace seemed to satisfy all of our needs. also, according to repository66, the majority of the repositories worldwide were created using the dspace platform.2
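to make the platform comparison concrete, here is a minimal sketch of a weighted decision matrix over the criteria just listed. the weights and scores are hypothetical (the article's actual evaluation was qualitative), but the shape of the exercise is the same.

```python
# hypothetical weighted scoring of the ir platforms discussed above.
CRITERIA = {                 # criterion -> weight (invented)
    "low cost": 3,
    "little programming": 3,
    "long-term archiving": 2,
    "metadata/search": 2,
    "active u.s. community": 2,
}

PLATFORMS = {                # platform -> criterion -> score 0-5 (invented)
    "dspace":     {"low cost": 4, "little programming": 4, "long-term archiving": 5,
                   "metadata/search": 4, "active u.s. community": 5},
    "fedora":     {"low cost": 4, "little programming": 1, "long-term archiving": 5,
                   "metadata/search": 3, "active u.s. community": 3},
    "eprints":    {"low cost": 4, "little programming": 4, "long-term archiving": 4,
                   "metadata/search": 3, "active u.s. community": 2},
    "greenstone": {"low cost": 4, "little programming": 3, "long-term archiving": 3,
                   "metadata/search": 3, "active u.s. community": 2},
}

def total(platform):
    # weighted sum of that platform's scores
    return sum(CRITERIA[c] * s for c, s in PLATFORMS[platform].items())

for name in sorted(PLATFORMS, key=total, reverse=True):
    print(name, total(name))
```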
installation and customization of our dspace unlike large university libraries, smaller special libraries face many challenges while trying to establish an open source repository. after making the decision to use dspace, the first challenge we faced was the installation. dspace runs on postgresql or oracle and requires a server installation; customizing the web interface requires either the jspui (javaserver pages user interface) or the xmlui (extensible markup language user interface). for the installation, we looked at the opportunity to use services provided by the state digital library consortium, the texas digital library (tdl), and tried to pursue a partnership with the main university library, which had already implemented a digital repository. however, because of financial reasons and separate budgets, those approaches did not work out, so we decided to have our own it department install dspace. the staff in our it department knew little about dspace. however, another special library on campus, which had just installed dspace, offered their installation notes to our system administrator. although dspace runs on a variety of operating systems, we purchased red hat enterprise linux after some testing because it is the recommended os for dspace. our system administrator then spent several months trying to figure out how to install the software in addition to his existing projects. because we did not have dedicated it personnel working on the installation, the work was often interrupted and very difficult to complete. our it staff also found it very difficult to continue with the installation because the software requires a lot of expertise. two months later, we discovered that a preconfigured application called jumpbox for dspace had been released, and it proved to be a much easier solution for the installation. the price was reasonable too, $149 a year (the price has jumped quite a bit since then). however, using jumpbox would leave our newly purchased red hat linux server of no use because jumpbox runs on ubuntu, so after some discussion we decided not to pursue it. we were a little stuck in the installation process, and outsourcing the installation seemed to be a feasible solution for us at this point. we identified a reputable dspace service provider after doing extensive research, including comparing vendors, obtaining references, and pursuing other avenues. after obtaining a quote, we were quite satisfied with the price and decided to contract with the vendor. while waiting for the contract to be approved by the university contracting office, i began designing a look and feel unique to the ttu school of law, with some help from another library staff member. the installation finally took place at the beginning of january 2010. i worked very closely with the service provider during the installation to ensure the desired configuration for our dspace instance. our repository site with the ttu law branding became accessible to the public three days later, and within the several weeks of warranty we were able to adjust several configurations, including display thumbnails for images. overall, we are very pleased with the results. since the installation, our it department maintains the dspace site and we host all the content on our own server.
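a small post-installation check like the following could confirm that a newly hosted instance is answering. this is an illustration rather than anything the article describes; the host name is hypothetical, and the oai-pmh path varies across dspace versions and deployments.

```python
# a minimal post-install smoke test: confirm the repository homepage and
# its oai-pmh endpoint respond. host name is hypothetical; adjust paths
# to the actual deployment.
import urllib.request

BASE = "http://repository.law.example.edu"               # hypothetical host
CHECKS = [
    ("homepage", BASE + "/"),
    ("oai-pmh", BASE + "/oai/request?verb=Identify"),     # path may differ
]

for name, url in CHECKS:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(f"{name}: ok ({resp.status})")
    except Exception as exc:                              # refused, 404, timeout
        print(f"{name}: FAILED ({exc})")
```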
collection development of the ir content is the most critical element of an institutional repository. while we were waiting for our it department to install dspace, we prepared and scanned two collections: the "texas governor executive orders" collection and the "professor j. hadley edgar's personal papers" collection. the latter was donated by professor edgar's wife after he passed away in 2009. professor edgar taught at the law school from 1971 to 1991; he was named the robert h. bean professor of law and was twice voted the outstanding law professor by the student body. the collection contains personal correspondence, photos, newspaper clippings, certificates, and other materials, many of which have high historic value to the law school. for the scanning standards, we used 200 dpi for text-based materials and 400 dpi for pictures. we chose pdf as our production file format, as it is a common document format and smaller in size to download.
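the 200/400 dpi standard lends itself to a simple pre-ingest check. below is a minimal sketch, assuming the pillow imaging library and dpi metadata written by the scanner; the file names are hypothetical.

```python
# check scans against the article's standard: 200 dpi for text-based
# materials, 400 dpi for pictures. reads the dpi the scanner recorded
# in each image file's metadata.
from PIL import Image

REQUIRED_DPI = {"text": 200, "picture": 400}

def check_scan(path, kind):
    """return True if the scan at `path` meets the dpi floor for `kind`."""
    with Image.open(path) as img:
        dpi = img.info.get("dpi", (0, 0))[0]   # (x_dpi, y_dpi); 0 if absent
    ok = dpi >= REQUIRED_DPI[kind]
    print(f"{path}: {dpi} dpi ({'ok' if ok else 'rescan needed'})")
    return ok

# hypothetical files from the edgar papers scanning queue
check_scan("edgar_letter_001.tif", "text")
check_scan("edgar_photo_014.tif", "picture")
```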
after the installation was completed at the beginning of january, i drafted and shortly afterward implemented a digital repository collection development policy to ensure proper procedures and guidance for the repository's development. the policy includes elements such as the purpose of the repository, the scope of the collections, selection criteria and responsibilities, editorial rights, and how to handle challenges and withdrawals. i also developed a repository release form to obtain permissions from donors and authors and to ensure open access for the materials in the repository. twelve collections were initially planned for the repository: "faculty scholarship," "personal manuscripts," "texas governor executive orders," "law school history," "law library history," "regional legal history," "law student works," "audio/video collection," "dark archive," "electronic journals," "conference, colloquium and symposium," and "lectures and presentations." there will be changes to the collections in the future, as the digital repository collection development policy will be reviewed each year. based on the collection development policy, we made a decision to migrate the content of the old "faculty scholarship" collection from webpages into the digital repository; the collection was intended to include all publications of the texas tech law school faculty. we then hired a second-year law student as the digital project assistant and trained him on scanning, editing, and ocr-ing pdf files; uploading files to dspace; and creating basic metadata. we also brought two more student assistants on board to help with the migration of the faculty scholarship collection. the faculty services librarian checked the copyright status with faculty members and publishers, while i (the digital information management librarian) served as the repository manager, handling more complicated metadata creation, performing quality control over student submissions, and overseeing the whole project.
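student submissions of this kind can also be batched. one common route is dspace's simple archive format, in which each item is a directory holding a dublin_core.xml plus a contents file naming its bitstreams. the sketch below generates such a package; the article does not say whether batch import was used, and the names here are invented.

```python
# generate a one-item dspace simple archive format package:
#   batch/item_0001/dublin_core.xml  (descriptive metadata)
#   batch/item_0001/contents         (one bitstream filename per line)
import os
from xml.sax.saxutils import escape

def make_saf_item(base_dir, item_name, title, author, pdf_name):
    item_dir = os.path.join(base_dir, item_name)
    os.makedirs(item_dir, exist_ok=True)
    with open(os.path.join(item_dir, "dublin_core.xml"), "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n<dublin_core>\n')
        f.write(f'  <dcvalue element="title" qualifier="none">{escape(title)}</dcvalue>\n')
        f.write(f'  <dcvalue element="contributor" qualifier="author">{escape(author)}</dcvalue>\n')
        f.write('</dublin_core>\n')
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as f:
        f.write(pdf_name + "\n")

make_saf_item("batch", "item_0001",
              "a hypothetical faculty article", "doe, jane", "doe_article.pdf")
# the package would then be loaded with dspace's batch importer,
# e.g. [dspace]/bin/dspace import --add ... (flags vary by version)
```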
later development and promoting the ir during the faculty scholarship migration process, we discovered a need to customize dspace to allow active urls for publications. we wanted all the articles linked to three widely used legal databases: westlaw, lexisnexis, and heinonline. because the default dspace system does not support active urls, it requires some programming effort to make the system detect a particular metadata field and then render it as a clickable link. we outsourced the development to the same service provider who had installed dspace for us. the results were very satisfying: the vendor customized the system to allow active urls and displayed the links as clickable icons for each legal database.
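the vendor's actual customization lived inside dspace's user interface layer and is not described in detail; the sketch below only illustrates the underlying idea of detecting designated metadata fields and rendering each as a clickable link. the field names are hypothetical stand-ins, not dspace's schema.

```python
# detect designated metadata fields on an item and render each as an
# html anchor. field names are hypothetical stand-ins.
LINK_FIELDS = {                      # metadata field -> link label
    "law.link.westlaw": "westlaw",
    "law.link.lexisnexis": "lexisnexis",
    "law.link.heinonline": "heinonline",
}

def render_links(item_metadata):
    """return html anchors for any recognized link fields on an item."""
    anchors = []
    for field, label in LINK_FIELDS.items():
        url = item_metadata.get(field)
        if url:                       # only fields actually present
            anchors.append(f'<a href="{url}">{label}</a>')
    return " ".join(anchors)

item = {"dc.title": "a sample article",
        "law.link.heinonline": "https://heinonline.org/..."}
print(render_links(item))
```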
in april 2010, the "professor j. hadley edgar's personal papers" collection was made available in conjunction with his memorial service, hosted by the lubbock county bar association at the ttu law school. we made the initial announcement about the new digital initiative service we had established to the law faculty and staff, and later to the lubbock county bar. we received very positive feedback from the law community, and professor edgar's family was delighted to see his collection made available to the public. following the success of the initial launch, i developed an outreach plan to promote the digital repository. to make the repository site more visible, several efforts were made: the repository site url was submitted to the dspace user registry, the directory of open access repositories (opendoar), and the registry of open access repositories (roar); the site was registered with google webmaster tools for better indexing; and the repository was linked from several websites of the law school and library. the "faculty scholarship" collection and the "texas governor executive orders" collection became available shortly after. i then developed a poster on the newly established digital repository and presented it at the texas conference on digital libraries, held at the university of texas at austin in may 2010. as of august 2010, our digital repository has more than eight hundred digital items. with more and more content becoming available in the repository, we plan on making an official announcement to the law community. we will also make entering first-year law students aware of the ir by including an article about the new repository in the library newsletter that is distributed to them during their orientation. our future marketing plan includes sending out announcements of new collections to the law school using our online announcement system, techlawannounce; promoting the digital repository through the law library's social networking pages on facebook and twitter; and hosting information sessions for our law faculty and students to learn more about the digital repository.
future projects there is no doubt that our digital repository will grow significantly, because we have exciting collections planned for future projects. one of our law faculty, professor daniel benson, donated some of his personal files from an eight-year litigation representing the minority plaintiffs in the civil rights case of jones v. city of lubbock, 727 f.2d 364 (5th cir. 1984), in which the minority plaintiffs won. the lawsuit changed the city of lubbock's election system for city council members from the "at large" method to the "single member district" system, which allowed minority candidates to be consistently elected. this collection contains materials, notes, memoranda, letters, and other documents prepared and used by the plaintiffs' attorneys. it has significant historical value because a texas tech law professor and five texas tech law graduates participated in that case, successfully, as pro bono attorneys for the minority plaintiffs. in addition, we plan on adding social science research network (ssrn) links to individual articles in the faculty scholarship collection. after that, the next collections we will work on are the law school and law library history materials. we also plan to do some development on the dspace authentication to integrate it with the ttu "eraider" system and enable single log-in. in the future, we want to explore the possibilities of setting up a collection for the works of our law students and of engaging in electronic journal publishing using our digital repository.
conclusion it is not an easy task to develop an institutional repository from scratch, especially for a smaller organization. installation and development are certainly a big challenge for a smaller library with a limited number of it staff; outsourcing these needs to a service provider seems to be a feasible solution. another challenge is training. we overcame this challenge by taking advantage of the state consortium's dspace training sessions. subscribing to the dspace mailing list is also necessary, as it is a communication channel for dspace users to ask questions, seek help, and keep up to date about the software. all roads lead to rome. no matter what platform you choose, whether open source or not, the goal is to pick a system that best suits your organization's needs. building a successful institutional repository is not simply "scanning" and "putting stuff online." various factors need to be considered, such as digitization, the ir platform, collection development, metadata, copyright issues, and marketing and outreach. our experience has proven that it is possible for a smaller special library with limited resources and funding to establish an open source ir such as dspace and to continue to maintain the site and build the collections with success. open source software is certainly not "free," because it requires a lot of effort. however, in the end it still costs a lot less than what we would pay to the proprietary software vendors. references 1. "the texas tech university school of law digital repository," http://repository.law.ttu.edu/ (accessed apr. 5, 2011). 2. "repository maps," http://maps.repository66.org/ (accessed aug. 16, 2010).
president's message: open access/open data colleen cuddy information technology and libraries | march 2012 i am very excited to write this column. this issue of information technology and libraries (ital) marks the beginning of a new era for the journal. ital is now an open-access, electronic-only journal. there are many people to thank for this transition. the lita publications committee, led by kristen antelman, did a thorough analysis of publishing options and presented a thoughtful proposal to the lita board; the lita board had the foresight to push for an open-access journal even if it might mean a temporary revenue loss for the division; bob gerrity, ital editor, has enthusiastically supported this transition and did the heavy lifting to make it happen; and the lita office staff worked tirelessly for the past year to help shepherd this project. i am proud to be leading the organization during this time. to see ital go open access in my presidential year is extremely gratifying. as cliff lynch notes in his editorial, "the library profession has been slow to open up access to the publications of its own professional societies, to take advantage of the greater reach and impact that such policies can offer." as librarians, myself included, challenge publishers to pursue open-access venues, i am relieved to no longer be a hypocrite. by supporting open access we are sending a strong message to the community that we believe in the benefits of open access, and we encourage other library organizations to do the same. ital will now reach a much broader and larger audience. this will benefit our authors, the organization, and the scholarship of our profession. i understand that while our members embrace open access, not everyone is pleased with an online-only journal.
the number of new journals being offered electronically only is growing, and i believe we are beginning to see a decline in the dual publishing model of publishers and societies offering both print and online journals. my library has been cutting back consistently on print copies of journals and this year will get only a handful of journals in print. personally, i have embraced the electronic publishing world. in fact, i held off on subscribing to the new yorker until it had an ipad subscription model! i estimate that i read 95 percent of my books and all of my professional journals electronically. the revolution has happened for me and for many others. i know that our membership will adapt and transition their ital reading habits to our new electronic edition, and i look forward to seeing this column and the entire journal in its new format. colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011–12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york. earlier this week saw the research works act die. librarians and researchers across the country celebrated this victory, as we preserved an important open-access mandate requiring the deposit of research articles funded by the national institutes of health into pubmed central. this act threatened not just research but the availability of health information to patients and their families. as librarians, we still need to be vigilant about preserving open access and supporting open-access initiatives. i would like to draw your attention to the federal research public access act (frpaa, hr 4004). this act was recently introduced in the house, with a companion bill in the senate. as described by the association of research libraries, frpaa would ensure free, timely, online access to the published results of research funded by eleven u.s. federal agencies. the bill gives individual agencies flexibility in choosing the location of the digital repository to house this content, as long as the repositories meet conditions for interoperability and public accessibility and have provisions for long-term archiving. the legislation would extend and expand access to federally funded research resources and, importantly, spur and accelerate scientific discovery. notably, this bill does not take anything away from publishers. no publisher will be forced to publish research under the bill's provisions; any publisher can simply decline to publish the material if it feels the terms are too onerous. i encourage the library community to contact their representatives to support this bill. open access and open data are the keystones of e-science and its goals of accelerating scientific discovery. i hope that many of you will join me at the lita president's program on june 24, 2012, in anaheim. tony hey, corporate vice president of microsoft research connections and former director of the u.k.'s e-science initiative, and clifford lynch, executive director of the coalition for networked information, will discuss data-intensive scientific discovery and its implications for libraries, drawing from the seminal work the fourth paradigm. librarians are beginning to explore our role in this new paradigm of providing access to and helping to manage data in addition to bibliographic resources. it is a timely topic and one in which librarians, due to our skill set, are poised to take a leadership role.
reading the fourth paradigm was a real game changer for me, and it is still extremely relevant. you might consider reading a chapter or two prior to the program. it is an open-access e-book available for download from microsoft research (http://research.microsoft.com/en-us/collaboration/fourthparadigm/). i keep a copy on my ipad, right there with downloaded ital article pdfs.
communications. tam dalrymple. "just-in-case" answers: the twenty-first-century vertical file. this article discusses the use of oclc's questionpoint service for managing electronic publications and other items that fall outside the scope of oclc library's opac and web resources pages, yet need to be "put somewhere." the local knowledge base serves as both a collection development tool and as a virtual vertical file, with records that are easy to enter, search, update, or delete. tam dalrymple (dalrympt@oclc.org) is senior information specialist at oclc, dublin, ohio. "we do not deliberately collect for the vertical file, but add to it day by day the useful thing which turns up. these include clippings from newspapers, excerpts from periodicals . . . broadsides that are not injured by folding . . . anything that we know will be used if available." —wilson bulletin, 1919. information that "will be used if available" sounds like the contents of the internet.1 as with libraries everywhere, the oclc library has come to depend on the internet as an almost limitless resource. and like libraries everywhere, it has confronted the advantages and disadvantages of that scope. this means that in addition to using the opac and oclc library's webpages, oclc library staff have used a mix of bookmarks, del.icio.us tags, and post-it® notes to keep track of relevant, authoritative, substantive, and potentially reusable information. much has been written about the use of questionpoint's transaction management capabilities and about the important role of knowledge bases in providing closure to an inquiry. in contrast, this article will look at questionpoint's use as a management tool for future questions—for items that fall outside the scope of oclc library's opac and web resources pages yet need to be "put somewhere." the questionpoint local knowledge base is just the spot for these new vertical file items. about oclc library oclc is the world's largest nonprofit membership computer library service and research organization. more than 69,000 libraries in 112 countries and territories around the world use oclc services to locate, acquire, catalog, lend, and preserve library materials. oclc library was established in 1977 to provide support for oclc's mission. the collection concentrates on library, information, and computer sciences and business management, and it has special collections that include the papers of frederick g. kilgour and the archives of the dewey decimal classification™. oclc library has a distinct clientele to which it offers a complete range of services—print and electronic collections, reference, interlibrary loan—within its subject areas. because of the nature of the organization, the library supports long-term and collaborative research, such as that done by oclc programs and research staff, as well as the immediate information needs of product management and marketing staff.
oclc library also provides information to oclc's other service areas, such as finance and human resources. while most oclc library acquisitions are done on demand, oclc library selects and maintains an extensive collection of periodicals, journals, and reference resources, most of them online and accessible—along with the opac—to oclc employees worldwide from the library's webpages (see figure 1). often, however, oclc staff, like those of many organizations, are too busy to consult these resources themselves and thus depend on the library. oclc library staff pursue the answers to such research questions through its collections and look to enhance the collections with "anything that we know will be" of use. one of the challenges is keeping track of the "anything" that falls outside the library's primary collections scope; questionpoint helps with that task. traditional uses of questionpoint questionpoint is a service that provides question management tools aimed at increasing the visibility of reference services and making them more efficient. oclc library uses many of those tools, but there are significant ones it does not use (for example, chat). and although the library's questionpoint-based aska link is visible by default on the front page of the corporate intranet as well as on oclc library–specific pages, less than 8 percent of questions over the last year were received through that link. one reason for this low use may be that for most of oclc library's history, e-mail has been the primary contact method, and so it remains. even when the staff need clarification of a question, they automatically opt for telephone or e-mail messaging. working with a web form and question-and-answer software has not caught on as a replacement for these more established methods. however, questionpoint remains the reference "workspace." when questions come in through e-mail or phone, librarians enter them into questionpoint, using it to add notes and keep track of sources checked. completed transactions are added to the local knowledge base. (because their questions involve proprietary matters, many special libraries do not add their answers to the global knowledge base, and oclc library is no exception; the local knowledge base is accessible only by oclc library staff.) not surprisingly, most of the questions received are about libraries, museums, and other cultural institutions—their collections, users, and staff. this means that the likelihood of reuse of the information in the oclc library knowledge base is relatively high, and it makes the local knowledge base an early stop in the reference process. though statistics vary widely by individual institution and type of library—and though some libraries have opted not to use the knowledge base—the average ratio for all questionpoint libraries is about one knowledge base search for every three questions received. in contrast, in the past year oclc library staff averaged 4.2 local knowledge base searches for every three questions received. the view of the questionpoint knowledge base as a repository of answers to questions that have already been asked is a traditional one. oclc library's use of the questionpoint knowledge base in anticipation of the information needs of its clients—as a way of doing collection development—is distinctive.
in many respects this use creates an updated version of the old-fashioned vertical file. nontraditional uses of questionpoint just-in-case the vertical file has a quirky place in the annals of librarianship. it has been the repository for facts and information too good to throw away but not quite good enough to catalog. h. w. wilson still offers its vertical file index, a specialized subject index to pamphlets issued on topics often unavailable in book form, which began in 1932. by now, except for special collections, the internet has practically relegated the vertical file to the backroom with the card platens and electric erasers. oclc library now uses its questionpoint knowledge base to manage information that once might have gone into a vertical file: the authoritative reports, studies, .org sites, and other resources that are often not substantive enough to catalog, but too good to hide away in a single staff member's bookmarks. the questionpoint knowledge base provides a place for these resources; more important, questionpoint provides fast, efficient ways to collect, tag, manage, and use them. questionpoint allows the development of such collections with powerful capabilities for future retrieval and use of the information, and it does so without the incredibly time-consuming processes of the past. a 1909 description of such processes details the inefficiency of yore: "in the public library [sic] of newark, n.j., material is filed in folders made of no. 1 tag manila paper, cut into pieces about 11x18 inches in size. one end is so turned up against the others as to make a receptacle 11x19 1/2 inches. the front fold is a half inch shorter than the back one, and this leaves a margin exposed on the back one, whereon the subject of that folder is written."2 thus a major benefit of using questionpoint to manage these resources is saving time. because questionpoint is a routine part of oclc library's workflow, it allows the addition of items directly to the knowledge base quickly and with a minimum of fuss. [figure 1: oclc library intranet homepage.] there is initially no need to make the entry "pretty," but only to describe the resource briefly, add the url, and tag it (see figure 2). unlike a physical vertical file, tagging items in the knowledge base allows items to be "put" in multiple places. staff can also add comments that characterize the authoritativeness of a resource. occasionally librarians come across articles or resources that might address multiple questions. instead of burying the data in one overarching knowledge base record, staff can make an entry for each aspect of the resource. an example of this is www.galbithink.org/libraries/analysis.htm, a page created by douglas galbi, senior economist with the federal communications commission (see figure 3). the site provides statistics, including historical statistics, on u.s. public libraries. rather than describe these generically with a tag like "library statistics"—not very useful in any case—each source can be added separately to the questionpoint knowledge base. for example, the item "audiovisual materials in u.s. public libraries" can be assigned specific tags—audiovisual, av, videos—that will make the data more accessible in the future. in other words, librarians use the faq model of asking and answering just one question at a time.
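the mechanics of such a virtual vertical file are simple enough to sketch. the toy model below (not oclc's software) shows how tagging lets one record be "put" in multiple places and found under any of its tags; the sample record echoes the galbi example above.

```python
# a toy tagged knowledge base: one record, many tags, retrievable under any.
from collections import defaultdict

class VerticalFile:
    def __init__(self):
        self.records = []
        self.by_tag = defaultdict(list)   # tag -> records carrying it

    def add(self, title, url, tags, note=""):
        rec = {"title": title, "url": url, "tags": tags, "note": note}
        self.records.append(rec)
        for tag in tags:                   # the record is "filed" everywhere
            self.by_tag[tag].append(rec)
        return rec

    def find(self, tag):
        return self.by_tag.get(tag, [])

vf = VerticalFile()
vf.add("audiovisual materials in u.s. public libraries",
       "http://www.galbithink.org/libraries/analysis.htm",
       tags=["audiovisual", "av", "videos"],
       note="one faq-style entry per aspect of the source")

for rec in vf.find("av"):
    print(rec["title"], "->", rec["url"])
```

unlike the manila folders of 1909, the same entry answers a lookup under "audiovisual," "av," or "videos" with no refiling.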
an important element in adding "answers" to oclc library's knowledge base is the ability to provide context. with questionpoint, librarians can describe not only what the resource is but why it may be of future use. and just the act of adding information to the knowledge base serves as a valuable mnemonic—"i've seen that somewhere." records added to the knowledge base in this way can be easily updated with information about newer editions or better sources. equally valuable is the ability to edit and add keywords when the resource becomes useful for unforeseen questions. sharing information with staff the knowledge base also serves as a more formal collection development tool. when librarians run across potentially valuable resources, they can send a description and a link to a product manager who may find it of use. library staff use questionpoint's keyword capability to add tags of people's names and job titles to facilitate ongoing current awareness. employees may provide feedback suggesting an item be added to the permanent print collection or linked to from the library website. [figure 2: a sample questionpoint entry, this one for a report by the national endowment for the arts. figure 3: a page with diverse facts and figures: www.galbithink.org/libraries/analysis.htm.] oclc library strives to inform users without subjecting them to information overload. when a 2007 survey of oclc staff found the library's rss feeds seldom used, librarians began to send e-mails directly to individuals and teams. the reaction of oclc staff indicates that such personal messages, with content summaries that allow recipients to quickly evaluate the contents, are more often read than oclc library rss feeds—especially if the items sent continue to be valuable. requirements that enable this kind of sharing include knowledge of company goals, staff needs, and product initiatives. to keep up to date, librarians meet regularly with other oclc staff and monitor organizational changes. attendance at oclc's members council meetings provides information on hot topics that helps identify resources for future use. while oclc's growth as a global organization has brought challenges in maintaining awareness of the full range of organizational needs, the questionpoint knowledge base offers a practical way to manage the increased volume. maintaining resources of potential interest to staff with questionpoint has another benefit: it helps keep librarians aware of internal experts who can help the library with questions, and in many cases it allows the library to connect staff with mutual interests to one another. this has become especially important as oclc has grown and its services continue to integrate with one another. conclusions beyond its usefulness as a system to receive, manage, and answer inquiries, questionpoint is providing a way to facilitate access to online resources that addresses the particular needs of oclc library's constituency. it is fast and easy to use: a standard part of the daily workflow. it enables direct links to sources and accommodates tagging those sources with the names of people and projects, as well as subjects. it serves as part of the library's collection management and selection system. using questionpoint in this way has some potential drawbacks.
"just in case" acquisition of virtual resources entails some of the risks of traditional acquisitions: acquiring resources that are seldom used, creating a database of resources that are difficult to retrieve, and perhaps the necessity of "weeding" or updating obsolete items. with company growth comes the issue of scalability as well. but for now, the benefits have far outweighed the risks. most of the items added have been identified for and shared with at least one staff member, so the effort has provided immediate payoff.
- the knowledge base serves as a collection development tool, helping to identify items that can be cataloged and added to the permanent collection.
- a record in the knowledge base can serve as a reminder to check for later editions.
- knowledge base records are easy to update or even delete.
the questionpoint virtual vertical file helps oclc library manage and share those useful things that "just turn up." references 1. "the vertical file for pamphlets and miscellany," wilson bulletin 1, no. 16 (june 1919): 351. 2. kate louise roberts, "vertical file," public libraries 12 (oct. 1907): 316–17.
editorial marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. as i write this, hurricane ike is within twelve hours of making landfall in texas; currently, it appears that the storm will strike directly at the houston–galveston area. houstonians with long memories will be comparing ike to hurricane alicia, which devastated the region in 1983, killing twenty-one and doing $2.6 billion in damage.1 younger residents and/or more recent immigrants to the area will recall tropical storm allison, which, though not of hurricane force, lashed the city and much of east texas for two weeks in june 2001, leaving in its wake twenty-three dead, $6.4 billion in losses, and tens of thousands of homes damaged or destroyed.2 and of course, more recently, and much better known to all of us regardless of where we live, katrina, the "mother of all storms," killed over eighteen hundred, caused over $80 billion in damage, left huge swaths of new orleans uninhabitable, and created a population exodus with whose effects we are living even to this day.3 common to each of these disasters—and so many others like them—is the fact that they have often wrought terrible damage on libraries in their areas. most of us have probably seen the pictures of the water- and mildew-damaged collections at tulane, xavier, the university of new orleans, and the new orleans public library system. and the damage from these events is long-term or even permanent. i formerly worked at the university of houston (uh), and when i left there in 2006 that institution was still dealing with the consequences of allison's destruction of uh's subterranean law library. and now i have to wonder whether uh librarians, faculty, and students might not be facing a similar or even worse catastrophe all over again with ike. ital editorial board member donna hirst has done the profession a great service with her column, "the iowa city flood of 2008: a librarian and it professional's perspective," which appears in this issue. her account of how library it folks there dealt with relocations of servers, library staff, and indeed library it staff members themselves should be required reading for all of us in the field, as well as for senior library administrators. the problem, i think we all secretly know, is that emergency preparedness—also known by its current moniker, "business continuity planning" (bc)—and disaster recovery (dr) are not "sexy" subjects.
devoting a portion of our always too modest resources of money, equipment, staffing, and time to what is, at best, a sort of insurance against what might happen someday seems inexcusably profligate today. such planning and preparation doesn't roll out any shiny new services and will win few plaudits from staff or patrons, to say nothing of new resources from those who control our institutional purse strings. buying higher-bandwidth equipment for a switching closet is likely to be a far easier sell. that is, until that unthinkable something happens, and your organization is facing (or suffers) a catastrophic loss of it services. note that i didn't say "equipment" or "infrastructure." the really important loss will be one of services. "stuff"—in the form of servers, workstations, networks, etc.—all costs money, but ultimately it is replaceable. what are not replaceable—at least not immediately—are library services to staff and patrons: access to computing (networking, e-mail, productivity applications, etc.), internet resources, and, perhaps most importantly nowadays, the licensed electronic content on which we and our patrons have so come to rely. while the news coverage will emphasize (not without justice, i think) the lost or rescued books in a catastrophic-loss situation, what staff and patrons are likely to demand first and loudest will be the continuation or restoration of technology-based library services such as e-mail, web presence, web access, and licensed content. lest there be doubt, does anyone recall what drove evacuees into public libraries in the wake of katrina? it was, as much as anything, the desire to locate loved ones and especially the need to seek information and forms for government assistance—all of which required access to networked computing resources. if we have one at all, i suspect that many of us have a dr plan that is sadly dated and that has never been tested. look at it this way: would you roll out a critical and highly visible new web service without careful preparation and testing? yet many of us somehow think that bc or dr is different, with no periodic review or testing required. since we feel we have no resources to devote to bc or dr planning and testing, we excuse our failure by telling ourselves and our administrations that "we can't really plan for a disaster, since the precise circumstances for which we're planning won't be the ones that actually occur." and so we find ourselves later facing a crisis without any preparation. here at the university of alberta libraries, we've been giving the questions of business continuity and disaster recovery a good deal of thought lately. our preexisting dr plan was typical of the sort i've described above: out-of-date, vanishingly skeletal in its details, without explicit reference or relevance to the maintenance and restoration of mission-critical services, and, of course, untested. impetus for our review has come from several sources. perhaps the most interesting of these has been a university-sponsored bc planning process that embraces a two-pronged approach:
- identify and prioritize your organization's services.
working with other constituencies within the library, we have identified and prioritized approximately ten broad services to be maintained or restored in the event of an interruption of our normal business activities. for example, our top priority is the continuation or restoration of access to licensed electronic content (e.g., e-journals, e-books, databases, etc.). our it disaster planning will be informed by and respond to this goal.
- identify "upstream" and "downstream" dependencies. we are dependent on others for services so that we can provide our own; thus we cannot offer access to the internet for our users unless campus it provides us with a gateway to off-campus networks. we need to make certain as we plan that campus it is aware of and can provide this service in the scenarios for which we're planning. by the same token, others are dependent on us for the provision of services critical to their planning: our consortial partners, for example, rely on us for ils, document delivery, and other technology-based services that we need to plan to continue in the event of a disaster.
these two facets—services and dependencies—can be expressed as a matrix that is helpful in planning for bc and dr goals that are both responsive to the needs of the organization and achievable in terms of upstream and downstream dependencies. it has been an enlightening exercise. one consequence has been our decision to include, as part of next fiscal year's budget request, funding to help create a dr site at our library's remote storage facility, to enable us to quickly restore access to our most critical technology services. in the past, we might have used this annual request as an opportunity to highlight our need for funding to support rolling out some glamorous new service initiative. with this request, though, we are explicitly recognizing that we as an organization need to commit to measures that ensure the continuance of our existing core services in a variety of situations. that's a major change in mindset for us, as i suspect it would be for many library it organizations. a final interesting aspect of our planning process is that one of the major drivers for the university is a concern about business continuity in the event of a people-based disaster. as avian influenza (aka "bird flu") has spread beyond the confines of its southeast asian point of origin, worry about how we continue to operate in the midst of a pandemic has been added to the more predictable suite of fires, floods, tornadoes, and earthquakes (okay, not likely in alberta). indeed, pandemic planning is in many ways far more difficult than planning for more "normal" disasters. while in many smaller libraries the "it shop" may be one person in many hats, in larger organizations such as ours (approximately 25 full-time-equivalent employees in library it), there tends to be a great deal of specialization. can the webmaster, in the midst of a crisis, support staff workstations? can the help desk technician deduce why our vendor for web of science has suddenly and inexplicably disabled our access? our bc process rules tell us that we should be planning for "three-deep" expertise in all critical areas, since the assumption is that a pandemic might mean that a third or more of our staff would be ill (or worse) at any given time. how many of us offer critical technology services that suffer from that it manager's ultimate staffing nightmare, the single point of failure?
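the "three-deep" rule is easy to audit once services and staff coverage are written down. a minimal sketch, with hypothetical names and services:

```python
# flag any critical service whose staffing bench is thinner than three.
REQUIRED_DEPTH = 3

coverage = {                    # service -> staff able to run it (invented)
    "licensed e-content access": ["ana", "ben", "chao"],
    "ils / document delivery":   ["ben", "dee"],
    "web presence":              ["chao"],
}

for service, staff in coverage.items():
    depth = len(staff)
    status = "ok" if depth >= REQUIRED_DEPTH else "single point of failure risk"
    print(f"{service}: {depth} deep -> {status}")
```

the same table, extended with upstream and downstream dependencies per service, would approximate the planning matrix described above.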
we have no profound answers to these questions, and our planning process is by no means the one that will work for all organizations. but the evidence of katrina, ike, and iowa city is plain: we need to be as prepared as possible for these events. the time to "get religion" about business continuity and disaster recovery is before the unthinkable occurs, not after. are there any of you out there with experiences—either in preparation and planning or in recovery operations—that you would consider sharing with ital readers? we all would benefit from your thoughts and experiences. i know i would! post-ike postscript: ike roared ashore four days ago, and it is clear from media coverage since that galveston suffered a catastrophe and houston was badly damaged. reports from area libraries are sketchy and only today beginning to filter out. meanwhile, at the university of houston, the building housing the architecture library lost its roof, and the salvageable portions of its collection are to be relocated to the main m. d. anderson library. references 1. "hurricane alicia," wikipedia, http://en.wikipedia.org/wiki/hurricane_alicia (accessed sept. 12, 2007). 2. "tropical storm allison," wikipedia, http://en.wikipedia.org/wiki/tropical_storm_allison (accessed sept. 12, 2007). 3. "hurricane katrina," wikipedia, http://en.wikipedia.org/wiki/hurricane_katrina (accessed sept. 12, 2007).
outgoing editor's column: parting thoughts marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. ■■ more from the far side of the k–t boundary in my september column, i offered some old-school suggestions for how we as a profession might cope with our confused and unbalanced times. since then, several more have crossed my mind, and i thought i'd offer them, for what they may be worth: ■■ we can outsource everything but responsibility. whether it's "the cloud," vendor acquisition profiles, or shelf-ready cataloguing, outsourcing has become a popular way of dealing with budgetary and staffing stresses during the past few years. generally speaking, i have serious reservations about outsourcing our services, but i do recognize the imperatives that have caused us to resort to them. that said, in farming out critical library services, we do not at the same time gain license to farm out responsibility for their efficient operation. oversight and quality control are still up to us, and it simply will not wash with patrons today, next year, or a century from now to be told that a collection or service is unacceptably substandard because we outsourced it. a vendor's failure is our failure, too. it's still "our stuff," and so are the services. ■■ we're here to make decisions, not avoid them. document delivery, patron-driven acquisitions, usability studies, and evidence-based methodologies should help to inform and serve as validity checks for our decisions, not be replacements for them. as with outsourcing and our over-reliance on technology-driven solutions, i fear that these services and methodologies are in real danger of becoming crutches, enabling us to avoid making decisions that may be difficult, unpopular, tedious, or simply too much work. but if decisions regarding collections and services can be reduced to simple questions of demand or the outcome of a survey, then who needs us? it's our job to make these decisions; demand- or survey-driven techniques are simply there to assist us in doing so. ■■ relevance is relative. we talk about "relevance" in much the same breathlessly reverential voice as we speak of the "user" . . .
as if there were but one uniquely "relevant" service model for that single, all-encompassing "user." one of the perils of our infatuation with "relevance" is the illusion that by adopting this or that technology or targeted service, we are somehow remaining relevant to "the user." which user? just as not all patrons come to us seeking potboiler romances, so too not all users demand that all services and collections be made available electronically, over mobile platforms. since we do recognize that our resources are finite, rather than pandering to some groups at the expense of others with trendy temporal come-ons, why not instead focus on long-term services and collections that reflect our values? the patrons who really should matter most to us will respect us for this demonstration of our integrity. ■■ libraries are ecosystems. as with the rest of the world around us, libraries comprise arrays of interlocking, interdependent, and often poorly understood and documented entities, services, and systems. they've developed that way over centuries. and just as so often happens in the larger world, any and every change we make can cause a cascade of countless other changes, many of which we might not anticipate before making that seemingly simple initial change. we are stewards of the libraries in which we work: our obligation, as librarians, is to respect what was bequeathed to us, to care for and use it wisely, and to pass it on to those who follow in at least the condition in which we received it—preferably better. environments, including libraries, change and evolve, of course, but critics of the supposedly slow pace of change in libraries fail to grasp that our role is just as much that of the conservationist as it is that of the advocate of development and change. our mission is not change for change's sake; rather, it is incremental, considered change that will benefit not only today's patrons and librarians but also respect those of the past and serve those of the future. perhaps librarians need an analogue to the medical profession's hippocratic oath: primum non nocere, "first, do no harm." ■■ innocents abroad probably few ital readers will be aware (i certainly wasn't!) that mark twain's bestselling book during his lifetime was not tom sawyer or huckleberry finn—or any of a host of others of his now better-remembered works—but rather his 1869 travelogue innocents abroad, or the new pilgrims' progress. the book, which i've been savoring in my spare leisure reading time over the past several months, records in journal form twain's involvement in a voyage in 1867 by a group of american tourists to various locales in mediterranean europe, northern africa, and the near east. in the book, twain gleefully skewers the tour-goers, those they encounter, and of course himself; as with twain generally, it is at turns witty, outlandish, biting, and—by today's lights—completely lacking in political correctness. in short, it's vintage mark twain: delicious! i mention innocents abroad not simply because i'm currently enjoying it (and hoping that by saying so, i might pique some other ital reader's interest in giving it a test drive) but also because it—as with other books, songs, stories, etc., about journeys taken—is a metaphor for life. we are all "innocents" in some sense as we traverse the days and years of growth in self-awareness, relationships, work, and all the other facets that make up life. it's a comforting way of viewing the world, i think. i've served with ital in various capacities for more than eleven years. that's a very long time in terms of one particular ala/lita committee. it's now time for my journey and ital's to part ways. this is my final column as editor of this journal. this "innocent" is debarking the ital ship and moving on. ital is the product of the dedicated labor of many people, of whom i am but one. for some of them, it is a labor of love. as with the credits at the end of a film, it is customary for an editor in her or his final column to recognize and thank the people who made it all possible. i'd like to do so now.
polite audience members know to remain until "the end" rolls by. i hope you'll help me honor these people by doing so, too: ■■ mary taylor, valerie edmonds, and melissa prentice in the lita office. over the years, they've been unfailingly helpful to me, to say nothing of being nearly as unfailingly tolerant of my clueless and occasionally obstreperous, passive-aggressive ignorance of the byzantine ways of the ala bureaucracy. ■■ ala production services. production services folk are the professionals who, among innumerable other skills, copyedit and typeset manuscripts, perform miracles with figures and tables, and generally make ital into the quality product you receive (whether it is cellulose- or electron-based). regardless of ital's future publishing format and directions, count yourself fortunate as long as the good people in production services continue to play a role. i'd especially like to single out tim clifford, ital's production editor, who over the past several years has brought skill, grace, stability, and a healthy dose of humor to this critical post. ■■ the members—past and present—of the ital editorial board. the editorial board is a lita committee; the members of this committee serve as the editor's primary group of reviewer-referees of manuscripts submitted for publication consideration. as committee assignments go, i think it fair to say that this is probably one of the more thankless. board members must be expert in all areas of technology and, as important, willing and able to do a credible job of pretending to be so in those areas where they are not expert! they must be able to recognize and create good prose and to offer authors practical, constructive insights and guidance in the sometimes black art of turning promising manuscripts into great articles. as i think many ital authors will attest, they do a superb job at this. they also write some of the most interesting and perceptive editorial columns you'll see in ital! ■■ judith carter. it's really impossible to overstate the contributions judith has made to ital. other than a brief four-year interlude during which i served in the role, judith has been managing editor for much of the past decade and more. she taught me the job when she relinquished it in early 2004, and then graciously offered to take it back again when i was named editor four years later. more than any other single person, she is responsible for the ital you hold in your hands, and she does it with skill and tireless dedication. she also has been my coach, my confidante, and—as only a true friend can be—even my butt-kicker when i was late in observing a deadline, which has not infrequently been the case. thank you for everything, judith. ■■ dan and john. the late dan marmion brought me on board at ital as a board member in 2000; he later asked me to serve as his managing editor. he also encouraged me to succeed john webb as editor in 2007. from both dan and john i learned much about the role of an editor and especially about what ital could and should be. i am endlessly appreciative of their mentoring and hope that i have been reasonably successful in maintaining the high standards that they set for the journal. ■■ the authors. without interesting, well-researched, and timely content, there would be no ital. i have been blessed with a rich and nearly constant supply of superb manuscript submissions that the folks who make up the ital "publication machine" have then turned into a highly stimulating and readable journal. i hope you agree. ■■ the readers. and finally, i thank all of you, gentle readers. you are the reason that ital exists. i have been grateful for your support, your patience, and your always-constructive suggestions.
from the managing editor
i’d like to take this opportunity to give marc truitt my heartfelt thanks and best wishes as he leaves his long-term relationship with information technology and libraries (ital). i appreciate how he ably stepped into the role of managing editor (me) when i needed to resign to focus on my full-time job. a few years later he became the new editor, and i accepted his request to be his me. i think we’ve had a good partnership. i’ve nudged marc about the production schedule while he has managed manuscripts and the peer-review process, and eloquently represented the journal when needed. marc held and communicated a clear and scholarly view of the journal to the editorial board and to lita. i have fond memories of many cups of tea drunk in various ala conference venues while we discussed ital, lita, and shared news of mutual friends. we endured the loss of our friend and mentor dan marmion together a year ago september, when marc wrote a letter which i read at the memorial service. this too may be my final issue of ital; it is unknown at time of printing. i support the online future of ital and have offered my services to robert gerrity until a paper version is no longer supported and we successfully transition my duties into an online environment or to a new me. i know he will take the journal into its new iteration with skill and grace. i have served lita and ital for over 13 years and am proud of the quality peer-reviewed journal dan marmion, john webb, marc truitt, the editorial board members, and i have shared with the members of lita. it has also been my honor to communicate with each of the authors and to facilitate their scholarly communication to our profession. without the authors, where would we be? thank you all, judith carter.

journal of library automation vol. 14/1, march 1981
with a feeling of deja vu i listened to an explanation of how difficult it is to develop a system for the novice; one proposed solution is to allow only the first four letters of a word to be entered (one of the search methods used at the library of congress, which does suggest some cross-fertilization). whatever the trends, the reality is that librarians and information scientists are playing decreasing roles in the growth of information display technology. hardware systems analysts, advertisers, and communications specialists are the main professions that have an active role to play in the information age. perhaps the answer is an immediate and radical change in the training offered by the library schools of today. our small role may reflect our penchant to be collectors, archivists, and guardians of the information repositories. have we become the keepers of the system? the demand today is for service, information, and entertainment. if we librarians cannot fulfill these needs, our places are not assured. should the american library association (ala) be ensuring that libraries are a part of all ongoing tests of videotex—at least in some way—either as organizers, information providers, or in analysis? consider the force of the argument given at the ala 1980 new york annual conference that cable television should be a medium that librarians become involved with for the future. certainly involvement is an important role, but we, like the industrialists and marketers before us, must make smart decisions and choose the proper niche and the most effective way to use our limited resources if we are to serve any part of society in the future.

bibliography
1. electronic publishing review. oxford, england: learned information ltd. quarterly.
2. home video report. white plains, new york: knowledge industry publications. weekly.
3. ieee transactions on consumer electronics. new york: ieee broadcast, cable, and consumer electronics society. five times yearly.
4. international videotex/teletext news. washington, d.c.: arlen communications ltd. monthly.
5. videodisc/teletext news. westport, conn.: microform review. quarterly.
6. videoprint. norwalk, conn.: videoprint. two times monthly.
7. viewdata/videotex report. new york: link resources corp. monthly.

data processing library: a very special library
sherry cook, mercedes dumlao, and maria szabo: bechtel data processing library, san francisco, california.
the 1980s are here and with them comes the ever-broadening application of the computer. this presents a new challenge to libraries. what do we do with all these computer codes? how do we index the material? and most importantly, how do we make it accessible to our patrons or computer users? bechtel’s data processing library has met these demands. the genesis for the collection was bechtel’s conversion from a honeywell 6000 computer to a univac 1100 in 1974. all the programs in use at that time were converted to run on the univac system. it seemed a good time to bring together all of the computer programs from the various bechtel divisions into a controlled collection. the librarians were charged with the responsibility of enforcing standards and control of bechtel’s computer programs. the major benefits derived from placing all computer programs into a controlled library were: 1. company-wide usage of the programs. 2. minimized investment in program development through common usage. 3. computer file and documentation storage by the library to safeguard the investment.
4. a central location for audits of program code and documentation. 5. centralized reporting on bechtel programs. developing the collection involved basic cataloging techniques which were greatly modified to encompass all the information that computer programs generate, including actual code, documentation, and listings. historically, this information must be kept indefinitely on an archival basis. the machine-readable codes themselves are grouped together and maintained from the library’s budget. finally, a reference desk is staffed to answer questions from the entire user community. documentation for programs is strictly controlled. code changes are arranged chronologically to provide only the most current release of a program to all users. historical information is kept and is crucial to satisfy the demands of auditors (such as the nuclear regulatory commission). additionally, the names of people administratively connected with the program are recorded and their responsibilities defined (valuable in situations of liability for work completed yesteryear). the backbone of the operation is a standards manual that spells out and discusses the file requirements, documentation specifications, and control forms. this standard is made readily available throughout bechtel. in addition, there are in-house education classes about the same document. indeed, the central data processing library is the repository of computer information at bechtel. the centralization and control of computer programs eliminates the chaos that can occur if too many individuals maintain and use the same computer program.

editorial board thoughts
the june issue of ital featured a new column entitled editorial board thoughts. the column features commentary written by ital editorial board members on the intersection of technology and libraries. in the june issue kyle felker made a strong case for gerald zaltman’s book how customers think as a guide to doing user-centered design and assessment in the context of limited resources and uncertain user needs. in this column i introduce another factor in the library–it equation, that of rapid technological change. in the midst of some recent spring cleaning in my library i had the pleasure of finding a report documenting the current and future it needs of purdue university’s hicks undergraduate library. the report is dated winter 1995. the following summarizes the hicks undergraduate library’s it resources in 1995: [the library] has seven public workstations running eight different databases and using six different search software programs. six of the stations support a single database only; one station supports one cd-rom application and three other applications (installed on the hard drive). none of the computers runs windows, but the current programs do not require it. five stations are equipped with six-disc cd-rom drives. we do not anticipate that we will be required to upgrade to windows capability in the near future for any of the application programs. today the hicks undergraduate library’s it resources are dramatically different. as opposed to seven public workstations, we have more than seventy computers distributed throughout the library and the digital learning collaboratory, our information commons. this excludes forty-six laptops available for patron checkout and eighty-eight laptops designated for instructional use.
we have moved from eight cd-rom databases to more than four hundred networked databases accessible throughout the purdue university libraries, campus, and beyond. as a result, there are hundreds of “search software programs”—doesn’t that phrase sound odd today?—including the library databases, the catalog, and any number of commercial search engines like google. today all, or nearly all, of our machines run windows, and the macs have the capability of running windows. in addition to providing access to databases, our machines are loaded with productivity and multimedia software allowing students to consume and produce a wide array of information resources. beyond computers, our library now loans out additional equipment including hard drives, digital cameras, and video cameras. the 1995 report also includes system specifications for the computers. these sound quaint today. of the seven computers, six were 386 machines with processors clocking in at 25 mhz. the computers had between 640k and 2.5mb of ram, with hard drives with capacities between 20 and 60mb. the seventh computer was a 286 machine, probably with a 12.5 mhz processor and correspondingly smaller memory and hard disc capacity. the report does not include monitor specifications, though, based on the time, they were likely fourteen- or fifteen-inch cga or ega cathode ray tube monitors. modern computers are astonishingly powerful in comparison. according to a member of our it unit, the computers we order today have 2.8 ghz dual core processors, 3gb of ram, and 250gb hard drives. this equates to being 112 times faster, with 1,200 times more ram and hard drives that are 4,167 times larger than the 1995 computers! as a benchmark, consider moore’s law, a doubling of transistors every two years, or roughly a sixty-four-fold increase over a thirteen-year period. who would have thought that library computers would outpace moore’s law?! today’s computers are also smaller than those of 1995. our standard desktop machines serve as an example, but perhaps not as dramatically as laptops, mini-laptops, and any of the mobile computing machines small enough to fit into your pocket. monitors are smaller, though also bigger. each new computer we order today comes standard with a twenty-inch flat panel lcd monitor. it is smaller in terms of weight and overall size, but the viewing area is significantly larger. these trends are certainly not unique to purdue. nearly every other academic library could boast similar it advancements. with this in mind, and if moore’s law continues as projected, imagine the computer resources that will be available on the average desktop machine—although one wonders if it will in fact be a desktop machine—in the next thirteen years. what things out on the distant horizon will eventually become commonplace? here the quote from the 1995 report about windows is particularly revealing. what things that are currently state-of-the-art will we leave behind in the next decade? what’s dos? what’s a cd-rom? will we soon say, what’s a hard drive? what’s software? what’s a desktop computer? in the last thirteen years we have also witnessed the widespread adoption and proliferation of the internet, the network that is the backbone for many technologies that have become essential components of physical and digital libraries.
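the hardware ratios quoted above are easy to check. the short python sketch below redoes the arithmetic using only the figures stated in the column; the moore's-law line assumes one doubling every two years, which the column rounds to a sixty-four-fold (2^6) increase.

```python
# arithmetic check of the 1995-versus-2008 hardware comparison above;
# all specification figures come from the column itself.
specs_1995 = {"processor (mhz)": 25, "ram (mb)": 2.5, "hard drive (mb)": 60}
specs_2008 = {"processor (mhz)": 2800, "ram (mb)": 3 * 1000, "hard drive (mb)": 250 * 1000}

for part in specs_1995:
    ratio = specs_2008[part] / specs_1995[part]
    print(f"{part}: {ratio:,.0f}x")  # 112x, 1,200x, 4,167x

# moore's-law benchmark: one doubling every two years. an exact calculation
# over thirteen years gives ~91x, which the column rounds down to the
# sixty-four-fold (2 ** 6) figure quoted in the text.
years = 13
print(f"moore's law over {years} years: {2 ** (years / 2):,.0f}x")
```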
matthew m. bejune (mbejune@purdue.edu) is an ital editorial board member (2007–09), assistant professor of library science at purdue university, and doctoral student in the graduate school of library and information science at the university of illinois at urbana–champaign. earlier this year, i co-authored an arl spec kit entitled social software in libraries.1 the survey reports on the usage of ten types of social software within arl libraries: (1) social networking sites like myspace and facebook; (2) media sharing sites like youtube and flickr; (3) social bookmarking and tagging sites like del.icio.us and librarything; (4) wikis like wikipedia and library success: a best practices wiki; (5) blogs; (6) rss used to syndicate content from webpages, blogs, podcasts, etc.; (7) chat and instant messenger services; (8) voice over internet protocol (voip) services like googletalk and skype; (9) virtual worlds like second life and massively multiplayer online games (mmogs) like world of warcraft; and (10) widgets either developed by libraries, like facebook applications and firefox catalog search extensions, or implemented by libraries, like meebome and firefox plugins. of the 64 arl libraries that responded (a 52% response rate), 61 (95% of respondents) said they are using social software. of the three libraries not using social software, two indicated they plan to do so in the future. in combination, then, 63 out of 64 respondents (98%) indicated they are either currently using or planning to use social software. as part of the survey there was a call for examples of social software used in libraries. of the 370 examples we received, we selected around 70 for publication in the spec kit. the examples are captivating, and they illustrate the wide variety of applications in use today. of the ten social software applications in the spec kit, how many of them were at our disposal in 1995? by my count three: chat and instant messenger services, voip, and virtual worlds such as text-based muds and moos. of these three, how many were in use in libraries? very few, if any. in our survey we asked libraries for the year in which they first implemented social software. the earliest applications were cu-seeme, a voip chat service at cornell university in 1996, im at the university of california riverside in 1996 as well, and interoffice chat at the university of kentucky in 1998. the remaining libraries adopted social software in 2000 and beyond, with 2005 being the most common year, with 22 responses, or 34% of the libraries that had adopted social software (figure 1 charts responses to the question, “please enter the year in which your library first began using social software”; n=61). a look at this data shows that my earlier use of a thirteen-year time period to illustrate how difficult it is to project technological innovations that may prove disruptive to our organizations is too broad a time frame. perhaps we should scale this back to looking at five-year increments of time. using the spec kit data, in 2003 a total of 16 arl libraries had adopted social software. this represents 25% of the total number of institutions that responded when we did our survey. this seems like a more reasonable time frame to be looking to the future. so, what does the future hold for it and libraries, whether it be thirteen or five years in the future?
i am not a technologist by training, nor do i consider myself a futurist, so i typically defer to my colleagues. there are three places i look to for prognostications of the future. the first is lita’s top technology trends, a recurring discussion group that is a part of ala’s annual conferences and midwinter meetings. past top technology trends discussions can be found on lita’s blog and on lita’s website (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm). the second source is the horizon project, a five-year qualitative research effort aimed at identifying and describing emerging technologies within the realm of teaching and learning. the project is a collaboration between the new media consortium and educause. the horizon project website (http://horizon.nmc.org/wiki/main_page) contains the annual horizon reports going back to 2004. a final approach to projecting the future of it and libraries is to consider the work of our peers. the next library innovation may emerge from a sister institution. or perhaps it may take root at your local library first!

reference
1. bejune, matthew m. and jana ronan. social software in libraries. arl spec kit 304. washington, d.c.: association of research libraries, 2008.

mathew j. miles and scott j. bergstrom
classification of library resources by subject on the library website: is there an optimal number of subject labels?
mathew j. miles (milesm@byui.edu) is systems librarian and scott j. bergstrom (bergstroms@byui.edu) is director of institutional research at brigham young university–idaho in rexburg.

the number of labels used to organize resources by subject varies greatly among library websites. some librarians choose very short lists of labels while others choose much longer lists. we conducted a study with 120 students and staff to try to answer the following question: what is the effect of the number of labels in a list on response time to research questions? what we found is that response time increases gradually as the number of the items in the list grows until the list size reaches approximately fifty items. at that point, response time increases significantly. no association between response time and relevance was found. it is clear that academic librarians face a daunting task drawing users to their library’s web presence. “nearly three-quarters (73%) of college students say they use the internet more than the library, while only 9% said they use the library more than the internet for information searching.”1 improving the usability of library websites therefore should be a primary concern for librarians. one feature common to most library websites is a list of resources organized by subject. libraries seem to use similar subject labels in their categorization of resources. however, the number of subject labels varies greatly. some use as few as five subject labels while others use more than one hundred. in this study we address the following question: what is the effect of the number of subject labels in a list on response times to research questions?

■■ literature review
mcgillis and toms conducted a performance test in which users were asked to find a database by navigating through a library website. they found that participants “had difficulties in choosing from the categories on the home page and, subsequently, in figuring out which database to select.”2 a review of relevant research literature yielded a number of theses and dissertations in which the authors compared the usability of different library websites.
jeng in particular analyzed a great deal of the usability testing published concerning the digital library. the following are some of the points she summarized that were highly relevant to our study:
■■ user “lostness”: users did not understand the structure of the digital library.
■■ ambiguity of terminology: problems with wording accounted for 36 percent of usability problems.
■■ finding periodical articles and subject-specific databases was a challenge for users.3
a significant body of research not specific to libraries provides a useful context for the present research. miller’s landmark study regarding the capacity of human short-term memory showed as a rule that the span of immediate memory is about 7 ± 2 items.4 sometimes this finding is misapplied to suggest that menus with more than nine subject labels should never be used on a webpage. subsequent research has shown that “chunking,” which is the process of organizing items into “a collection of elements having strong associations with one another, but weak associations with elements within other chunks,”5 allows human short-term memory to handle a far larger set of items at a time. larson and czerwinski provide important insights into menuing structures. for example, increasing the depth (the number of levels) of a menu harms search performance on the web. they also state that “as you increase breadth and/or depth, reaction time, error rates, and perceived complexity will all increase.”6 however, they concluded that a “medium condition of breadth and depth outperformed the broadest, shallow web structure overall.”7 this finding is somewhat contrary to a previous study by snowberry, parkinson, and sisson, who found that when testing structures of 2^6, 4^3, 8^2, and 64^1 (2^6 means two menu items per level, six levels deep), the 64^1 structure grouped into categories proved to be advantageous in both speed and accuracy.8 larson and czerwinski recommended that “as a general principle, the depth of a tree structure should be minimized by providing broad menus of up to eight or nine items each.”9 zaphiris also corroborated that previous research concerning depth and breadth of the tree structure held true for the web: the deeper the tree structure, the slower the user performance.10 he also found that response times for expandable menus are on average 50 percent longer than for sequential menus.11 both the research and current practices are clear concerning the efficacy of hierarchical menu structures, so hierarchy was not a focus of our research. the focus instead was on a single-level menu and how the number and characteristics of subject labels would affect search response times.

■■ background
in preparation for this study, library subject lists were collected from a set of thirty library websites in the united states, canada, and the united kingdom. we selected twelve lists from these websites that were representative of the entire group and that varied in size from small to large. to render some of these lists more usable, we made slight modifications. there were many similarities between label names.
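the depth-versus-breadth structures compared in the literature review above reduce to a simple constraint: breadth b and depth d with b^d equal to the same 64 leaf items. the sketch below enumerates the four cited structures under a deliberately naive scan-cost model of our own (a user reads, on average, half the items in each menu, once per level); it is meant only to make the trade-off concrete, not to reproduce the cited studies' measured results.

```python
# the four menu structures tested by snowberry, parkinson, and sisson:
# b items per menu, d levels deep, always reaching the same 64 leaf items.
# the cost model (scan half of each menu, once per level) is our
# simplifying assumption for illustration, not data from the studies.
structures = [(2, 6), (4, 3), (8, 2), (64, 1)]  # (breadth b, depth d)

for b, d in structures:
    assert b ** d == 64                # every structure reaches 64 items
    avg_scanned = d * (b + 1) / 2      # expected items read per search
    print(f"{b}^{d}: depth {d}, ~{avg_scanned:.1f} items scanned")
```

under this toy model the medium-breadth 4^3 structure looks cheapest, echoing larson and czerwinski's “medium condition”; that snowberry, parkinson, and sisson found a categorized 64^1 menu fastest empirically is a reminder that grouping, not just arithmetic, drives real scan times.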
■■ research design
participants were randomly assigned to one of twelve experimental groups. each experimental group would be shown one of the twelve lists that were selected for use in this study. roughly 90 percent of the participants were students; the remaining 10 percent were full-time employees who worked in these same departments. the twelve lists ranged in number of labels from five to seventy-two: group a, 5 subject labels; group b, 9; group c, 9; group d, 23; group e, 6; group f, 7; group g, 12; group h, 9; group i, 35; group j, 28; group k, 49; and group l, 72. each participant was asked to select a subject label from a list in response to eleven different research questions. the questions are listed below:
1. which category would most likely have information about modern graphical design?
2. which category would most likely have information about the aztec empire of ancient mexico?
3. which category would most likely have information about the effects of standardized testing on high school classroom teaching?
4. which category would most likely have information on skateboarding?
5. which category would most likely have information on repetitive stress injuries?
6. which category would most likely have information about the french revolution?
7. which category would most likely have information concerning walmart’s marketing strategy?
8. which category would most likely have information on the reintroduction of wolves into yellowstone park?
9. which category would most likely have information about the effects of increased use of nuclear power on the price of natural gas?
10. which category would most likely have information on the electoral college?
11. which category would most likely have information on the philosopher immanuel kant?
the questions were designed to represent a variety of subject areas that library patrons might pursue. each subject list was printed on a white sheet of paper in alphabetical order in a single column, or double columns when needed. we did not attempt to test the subject lists in the context of any web design; we were more interested in observing the effect of the number of labels in a list on response time independent of any web design. each participant was asked the same eleven questions in the same order. the order of questions was fixed because we were not interested in testing for the effect of order and wanted a uniform treatment, thereby not introducing extraneous variance into the results. for each question, the participant was asked to select a label from the subject list under which they would expect to find a resource that would best provide information to answer the question. participants were also instructed to select only a single label, even if they could think of more than one label as a possible answer. participants were encouraged to ask for clarification if they did not fully understand the question being asked. recording of response times did not begin until clarification of the question had been given. response times were recorded unbeknownst to the participant. if the participant was simply unable to make a selection, that was also recorded. two people administered the exercise: one recorded response times; the other asked the questions and recorded label selections. relevance rankings were calculated for each possible combination of labels within a subject list for each question.
for example, if a subject list consisted of five labels, for each question there were five possible answers. two library professionals—one with humanities expertise, the other with sciences expertise—assigned a relevance ranking to every possible combination of question and labels within a subject list. the rankings were then averaged for each question–label combination.

■■ results
the analysis of the data was undertaken to determine whether the average response times of participants, adjusted by the different levels of relevance in the subject list labels that prevailed for a given question, were significantly different across the different lists. in other words, would the response times of participants using a particular list, for whom the labels in the list were highly relevant to the question, be different from students using the other lists for whom the labels in the list were also highly relevant to the question? a separate univariate general linear model analysis was conducted for each of the eleven questions. the analyses were conducted separately because each question represented a unique search domain. the univariate general linear model provided a technique for testing whether the average response times associated with the different lists were significantly different from each other. this technique also allowed for the inclusion of a covariate—relevance of the subject list labels to the question—to determine whether response times at an equivalent level of relevance were different across lists. in the analysis model, the dependent variable was response time, defined as the time needed to select a subject list label. the covariate was relevance, defined as the perceived match between a label and the question. for example, a label of “economics” would be assessed as highly relevant to the question, what is the current unemployment rate? the same label would be assessed as not relevant for the question, what are the names of four moons of saturn? the main factor in the model was the actual list being presented to the participant. there were twelve lists used in this study. the statistical model can be summarized as follows: response time = list + relevance + (list × relevance) + error. the general linear model required that the following conditions be met: first, data must come from a random sample from a normal population; second, all variances within each of the groupings are the same (i.e., they have homoscedasticity). an examination of whether these assumptions were met revealed problems both with normality and with homoscedasticity. a common technique—logarithmic transformation—was employed to resolve these problems. accordingly, response-time data were all converted to common logarithms. an examination of assumptions with the transformed data showed that all questions but three met the required conditions. the three questions (5, 6, and 7) were excluded from subsequent analysis.
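the model statement above maps directly onto standard linear-model tooling. below is a minimal sketch using the statsmodels formula api; the dataframe and its column names (list_id, relevance, rt_seconds) are hypothetical stand-ins, since the paper publishes neither its data nor its code.

```python
# sketch of the paper's univariate general linear model: log response
# time as the dependent variable, list as the factor, relevance as the
# covariate, plus their interaction. the paper fits one such model per
# question; the random data and column names here are stand-ins.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 120  # the study had 120 participants
df = pd.DataFrame({
    "list_id": rng.choice(list("abcdefghijkl"), size=n),  # twelve lists
    "relevance": rng.uniform(1, 5, size=n),               # averaged expert rankings
    "rt_seconds": rng.lognormal(mean=1.5, sigma=0.4, size=n),
})

# the logarithmic (common log) transformation used to restore
# normality and homoscedasticity
df["log_rt"] = np.log10(df["rt_seconds"])

# response time = list + relevance + (list x relevance) + error
fit = smf.ols("log_rt ~ C(list_id) * relevance", data=df).fit()
print(fit.summary())
```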
■■ conclusions
the series of graphs in the appendix show the average response times, adjusted for relevance, for eight of the eleven questions for all twelve lists (i.e., experimental groups); three of the eleven questions were excluded from the analysis because of heteroscedasticity. an inspection of these graphs shows no consistent pattern in response time as the number of the items in the lists increases. essentially, this means that, for any given level of relevance, the number of items in the list does not affect response time significantly. it seems that for a single question, characteristics of the categories themselves are more important than the quantity of categories in the list. the response times using a subject list with twenty-eight labels are similar to the response times using a list of six labels. a statistical comparison of the mean response time for each group with that of each of the other groups for each of the questions largely confirms this: very few of the comparisons were statistically significant, and the spikes and valleys of the graphs in the appendix are generally not significantly different. however, when the average response time associated with all lists is combined into an overall average from all eight questions, a somewhat clearer picture emerges (see figure 1). response times increase gradually as the number of the items in the list increases until the list size reaches approximately fifty items. at that point, response time increases significantly. no association was found between response time and relevance: a fast response time did not necessarily yield a relevant response, nor did a slow response time yield an irrelevant response.
figure 1. the overall average of average search times for the eight questions for all experimental groups (i.e., lists).

■■ observations
we observed that there were two basic patterns exhibited when participants made selections. the first pattern was the quick selection—participants easily made a selection after performing an initial scan of the available labels. nevertheless, a quick selection did not always mean a relevant selection. the second pattern was the delayed selection. if participants were unable to make a selection after the initial scan of items, they would hesitate as they struggled to determine how the question might be reclassified to make one of the labels fit. we did not have access to a high-tech lab, so we were unable to track eye movement, but it appeared that the participants began scanning up and down the list of available items in an attempt to make a selection. the delayed selection seemed to be a combination of two problems: first, none of the available labels seemed to fit; second, the delay in scanning increased as the list grew larger. it’s possible that once the list becomes large enough, scanning begins to slow the selection process. a delayed selection did not necessarily yield an irrelevant selection. the label names themselves did not seem to be a significant factor affecting user performance. we did test three lists, each with nine items and each having different labels, and response times were similar for the three lists. a future study might compare a more extensive number of lists with the same number of items but different labels to see if label names have an effect on response time. this is a particular challenge to librarians in classifying the digital library, since they must come up with a few labels to classify all possible subjects. creating eleven questions to span a broad range of subjects is also a possible weakness of the study. we had to throw out three questions that violated the assumptions of the statistical model. we tried our best to select questions that would represent the broad subject areas of science, arts, and general interest.
we also attempted to vary the difficulty of the questions. a different set of questions may yield different results.

references
1. steve jones, the internet goes to college, ed. mary madden (washington, d.c.: pew internet and american life project, 2002): 3, www.pewinternet.org/pdfs/pip_college_report.pdf (accessed mar. 20, 2007).
2. louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62, no. 4 (2001): 361.
3. judy h. jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, new brunswick, new jersey): 38–42.
4. george a. miller, “the magical number seven plus or minus two: some limits on our capacity for processing information,” psychological review 63, no. 2 (1956): 81–97.
5. fernand gobet et al., “chunking mechanisms in human learning,” trends in cognitive sciences 5, no. 6 (2001): 236–43.
6. kevin larson and mary czerwinski, “web page design: implications of memory, structure and scent for information retrieval” (los angeles: acm/addison-wesley, 1998): 25, http://doi.acm.org/10.1145/274644.274649 (accessed nov. 1, 2007).
7. ibid.
8. kathleen snowberry, mary parkinson, and norwood sisson, “computer display menus,” ergonomics 26, no. 7 (1983): 705.
9. larson and czerwinski, “web page design,” 26.
10. panayiotis g. zaphiris, “depth vs. breadth in the arrangement of web links,” www.soi.city.ac.uk/~zaphiri/papers/hfes.pdf (accessed nov. 1, 2007).
11. panayiotis g. zaphiris, ben shneiderman, and kent l. norman, “expandable indexes versus sequential menus for searching hierarchies on the world wide web,” http://citeseer.ist.psu.edu/rd/0%2c443461%2c1%2c0.25%2cdownload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/22119/http:zszzszagrino.orgzszpzaphirizszpaperszszexpandableindexes.pdf/zaphiris99expandable.pdf (accessed nov. 1, 2007).

appendix.
response times by question by group. the appendix presents eight charts, one each for questions 1, 2, 3, 4, 8, 9, 10, and 11, plotting average response time across the twelve groups in order of list size: grp a (5 items), grp e (6 items), grp f (7 items), grp b (9 items), grp c (9 items), grp h (9 items), grp g (12 items), grp d (23 items), grp j (28 items), grp i (35 items), grp k (49 items), and grp l (72 items).
michelle frisque
president’s message: the end and new beginnings
michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago.

as i write this last column, the song “my way” by frank sinatra keeps going through my head. while this is definitely not my final curtain, it is the final curtain of my presidency. like sinatra i have a few regrets, “but then again, too few to mention.” there was so much more i wanted to accomplish this year; however, as usual, my plans were more ambitious than the time i had available. being lita’s president was a big part of my life, but it was not the only part. those other parts—like family, friends, work, and school—demanded my attention as well. i have thought about what to say in this final column. do i list my accomplishments of the last year? nah, you can read all about that in the lita annual report, which i will post in june. tackle some controversial topic? while i can think of a few, i have not yet thought of any solutions, and i do not want to rant against something without proposing some type of solution or plan of attack. i thought instead i would talk about where i have devoted a large part of my lita time over the last year. as i look back at the last year, i am also thinking ahead to the future of lita. we are currently writing lita’s strategic plan. we have a lot of great ideas to work with. lita members are always willing to share their thoughts both formally and informally. i have been charged with the task of taking all of those great ideas, gathered at conferences, board meetings, hallway conversations, surveys, e-mail, etc., to create a roadmap for the future. after reviewing all of the ideas gathered over the last three years, i was able to narrow that list down to six major goal areas. with the assistance of the lita board of directors and the lita executive committee, we whittled the list down to five major goal areas of the lita strategic plan:
1. training and continuing education: lita will be nationally recognized as the leading source for continuing education opportunities for library information technologists and all library staff who have an interest in technology.
2. innovation: to serve the library community, lita expert members will identify and demonstrate the value of new and existing technologies within ala and beyond.
3. advocacy and policy: lita will advocate for and participate in the adoption of legislation, policies, technologies, and standards that promote equitable access to information and technology.
4. the organization: lita will have a solid structure to support its members in accomplishing its mission, vision, and strategic plan.
5. collaboration and outreach: lita will reach out and collaborate with other library organizations to increase the awareness of the importance of technology in libraries, improve services to existing members, and reach out to new members.
the lita executive committee is currently finalizing the strategies lita will pursue to achieve success in each of the goal areas. it is my hope that the strategies for each goal are approved by the lita board of directors before the 2010 ala annual conference in washington, d.c. that way the finalized version of the lita strategic plan can be introduced to the committee and interest group chairs and the membership as a whole at that conference. this will allow us to start the next fiscal year with a clear road for the future. while i am excited about what is next, i have also been dreading the end of my presidency. i have truly enjoyed my experience as lita president, and in some way wish it was not about to end. i have learned so much and have met so many wonderful people. thank you for giving me this opportunity to serve you and for your support. i have truly appreciated it.

kayla l. quinney, sara d. smith, and quinn galbraith
bridging the gap: self-directed staff technology training
kayla l. quinney (quinster27@gmail.com) is research specialist, sara d. smith (saradsmith@gmail.com) is research specialist, and quinn galbraith (quinn_galbraith@byu.edu) is library human resource training and development manager, brigham young university library, provo, utah.
undergraduates, as members of the millennial generation, are proficient in web 2.0 technology and expect to apply these technologies to their coursework—including scholarly research. to remain relevant, academic libraries need to provide the technology that student patrons expect, and academic librarians need to learn and use these technologies themselves. because leaders at the harold b. lee library of brigham young university (hbll) perceived a gap in technology use between students and their staff and faculty, they developed and implemented the technology challenge, a self-directed technology training program that rewarded employees for exploring technology daily. the purpose of this paper is to examine the technology challenge through an analysis of results of surveys given to participants before and after the technology challenge was implemented. the program will also be evaluated in terms of the adult learning theories of andragogy and self-directed learning. hbll found that a self-directed approach fosters technology skills that librarians need to best serve students. in addition, it promotes lifelong learning habits to keep abreast of emerging technologies. this paper offers some insights and methods that could be applied in other libraries, the most valuable of which is the use of self-directed and andragogical training methods to help academic libraries better integrate modern technologies.

leaders at the harold b. lee library of brigham young university (hbll) began to suspect a need for technology training when employees were asked during a meeting if they owned an ipod or mp3 player. out of the twenty attendees, only two raised their hands—one of whom worked for it.
perceiving a technology gap between hbll employees and student patrons, library leaders began investigating how they could help faculty and staff become more proficient with the technologies that student patrons use daily. to best serve student patrons, academic librarians need to be proficient with the technologies that student patrons expect. hbll found that a self-directed learning approach to staff technology training not only fosters technology skills, but also promotes lifelong learning habits. to further examine the technology gap between librarians and students, the hbll staff, faculty, and student employees were given a survey designed to explore generational differences in media and technology use. student employees were surveyed as representatives of the larger student body, which composes the majority of hbll patrons. as anticipated, results indicated that students frequently use text messages, social networks, blogs, etc., while fewer staff members use these technologies. for example, 42 percent of the students reported that they write a blog, while only 26 percent of staff and faculty do so. also, 74 percent of the students and only 30 percent of staff and faculty indicated that they belonged to a social network. after concluding that staff and faculty were not as connected as their student patrons are to technology, library administration developed the technology challenge to help close this gap. the technology challenge was a self-directed training program requiring participants to explore new technology on their own by spending at least fifteen minutes each day learning new technology skills. this program was successful in promoting lifelong learning by teaching technology applicable to the work and home lives of hbll employees. we will first discuss literature that shows how technology training can help academic librarians connect with student patrons, and then we will describe the technology challenge and demonstrate how it aligns with the principles of self-directed learning. the training will be evaluated by an analysis of the results of two surveys given to participants before and after the technology challenge was implemented.

■■ library 2.0 and “librarian 2.0”
hbll wasn’t the first to notice the gap between librarians and students: mcdonald and thomas noted that “gaps have materialized,” and library technology does not always “provide certain services, resources, or possibilities expected by emerging user populations like the millennial generation.”1 college students, who grew up with technology, are “digital natives,” while librarians, many having learned technology later in life, are “digital immigrants.”2 the “digital natives” belong to the millennial generation, described by shish and allen as a generation of “learners raised on and confirmed experts in the latest, fastest, coolest, greatest, newest electronic technologies.”3 according to sweeny, when students use libraries, they expect the same “flexibility, geographic independence, speed of response, time shifting, interactivity, multitasking, and time savings” provided by the technology they use daily.4
students are masters of “informal learning”; that is, they are accustomed to easily and quickly gathering information relevant to their lives from the internet and from friends. shish and allen claimed that millennials prefer “interactive, hyper-linked multimedia over the traditional static, text-oriented printed items. they want a sense of control; they need experiential and collaborative approaches rather than formal, librarian-guided, library-centric services.”5 these students arrive on campus expecting “to handle the challenges of scholarly research” using similar methods and technologies.6 interactive technologies such as blogs, wikis, streaming media applications, and social networks are referred to as “web 2.0.” abram argued that web 2.0 technology “could be useful in an enterprise, institutional research, or community environment, and could be driven or introduced by the library.”7 “library 2.0” is a concept referring to a library’s integration of these technologies; it is essentially the use of “web 2.0 opportunities in a library environment.”8 maness described library 2.0 as user-centered, social, innovative, and a provider of multimedia experiences.9 it is a community that “blurs the line between librarian and patron, creator and consumer, authority and novice.”10 libraries have been using web 2.0 technology such as blogs,11 wikis,12 and social networks13 to better serve and connect with patrons. blogs allow libraries to “provide news, information and links to internet resources,”14 and wikis create online study groups15 and “build a shared knowledge repository.”16 social networks can be particularly useful in connecting with undergraduate students: millennials use technology to collaborate and make collective decisions,17 and libraries can capitalize on this tendency by using social networks, which for students would mean, as bates argues, “an informational equivalent of the reliance on one’s facebook friends.”18 students expect library 2.0—and as libraries integrate new technologies, the staff and faculty of academic libraries need to become “librarian 2.0.” according to abram, librarian 2.0 understands users and their needs “in terms of their goals and aspirations, workflows, social and content needs, and more. librarian 2.0 is where the user is, when the user is there.”19 the modern library user “needs the experience of the web . . . to learn and succeed,”20 and the modern librarian can help patrons transfer technology skills to information seeking. librarian 2.0 is prepared to help patrons familiar with web 2.0 to “leverage these [technologies] to make a difference in reaching their goals.”21 therefore staff and faculty “must become adept at key learning technologies themselves.”22 stephen abram asked, “are the expectations of our users increasing faster than our ability to adapt?”23 and this same concern motivated hbll and other institutions to initiate staff technology training programs. the public library of charlotte and mecklenburg county of north carolina (plcmc) developed “learning 2.0,” a program that “focuses on self-exploration and encourages staff to learn about new technologies on their own.”24 learning 2.0 encouraged library staff to explore web 2.0 tools by completing twenty-three exercises involving new technologies. plcmc’s program has been replicated by more than 250 libraries and organizations worldwide,25 and several libraries have written about their experiences, including academic26 and public libraries.27 these programs—and the technology challenge implemented by hbll—integrate the theories of adult learning. in the 1960s and 1970s, malcolm knowles introduced the theory of andragogy to describe the way adults learn.28 knowles described adults as learners who (1) are self-directed, (2) use their experiences as a resource for learning, (3) learn more readily when they experience a need to know, (4) seek immediate application of knowledge, and (5) are best motivated by internal rather than external factors.29 the theory and practice of self-directed learning grew out of the first learning characteristic and assumes that adults prefer self-direction in determining and achieving learning goals, and therefore learners exercise independence in determining how and what they learn.30 these theories have had a considerable effect on adult education practice31 and employee development programs.32 when adults participate in trainings that align with the assumptions of andragogy, they are more likely to retain and apply what they have learned.33
■■ the technology challenge
hbll’s technology challenge is similar to learning 2.0 in that it encourages self-directed exploration of web 2.0 technologies, but it differs in that participants were even more self-directed in exploration and that they were asked to participate daily. these features encouraged more self-directed learning in areas of participant interest as well as habit formation. it is not our purpose to critique learning 2.0, but to provide some evidence and analysis to demonstrate the success of hands-on, self-directed training approaches and to suggest other ways for libraries to apply self-directed learning to technology training. the technology challenge was implemented from june 2007 to january 2008. hbll staff included 175 full-time employees, 96 of whom participated in the challenge. (the student employees were not involved.) participants were asked to spend fifteen minutes each day learning a new technology skill. hbll leaders used rewards to make the program enjoyable and to motivate participation: for each minute spent learning technology, participants earned one point, and when one thousand points were earned, the participant would receive a gift certificate to the campus bookstore. staff and faculty participated and tracked their progress through an online board game called “techopoly.” participation was voluntary, and staff and faculty were free to choose which tasks and challenges they would complete. tasks fell into one of four categories: software, hardware, library technology, and the internet. participants were required to complete one hundred points in each category, but beyond that, were able to decide how to spend their time.
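as a concrete sketch of the bookkeeping just described, the hypothetical class below models one point per minute, the thousand-point goal, and the hundred-point minimum in each of the four categories; the class and method names are ours, not part of the techopoly system.

```python
# hypothetical model of the technology challenge scoring described above:
# one point per minute of learning, a 1,000-point goal, and at least
# 100 points in each of the four task categories. names are illustrative.
CATEGORIES = ("software", "hardware", "library technology", "internet")

class ChallengeLog:
    GOAL = 1000
    CATEGORY_MINIMUM = 100

    def __init__(self) -> None:
        self.points = {category: 0 for category in CATEGORIES}

    def record(self, category: str, minutes: int) -> None:
        """credit one point per minute spent learning in a category."""
        self.points[category] += minutes

    def total(self) -> int:
        return sum(self.points.values())

    def complete(self) -> bool:
        return (self.total() >= self.GOAL
                and all(p >= self.CATEGORY_MINIMUM for p in self.points.values()))

log = ChallengeLog()
log.record("software", 15)          # one day's fifteen-minute session
print(log.total(), log.complete())  # 15 False
```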
■■ the technology challenge and adult learning

after finishing the challenge, participants took an exit survey to evaluate the experience and report changes in their ability to learn and use technology. to be eligible to receive the gift card, participants were required to take this exit survey. sixty-four participants, all of whom had met or exceeded the thousand-point goal, chose to complete this survey, so the results of this survey represent the experiences of 66 percent of the participants. of course, if those who had not completed the technology challenge had taken the survey the results may have been different, but the results do show how those who chose to actively participate reacted to this training program. the survey included both quantifiable and open-ended questions (see appendix b for survey results and a list of the open-ended questions). the survey results, along with an analysis of the structure of the challenge itself, demonstrate that the program aligns with knowles's five principles of andragogy to successfully help employees develop both technology skills and learning habits.

self-direction

the technology challenge was self-directed because it gave participants the flexibility to select which tasks and challenges they would complete. garrison wrote that in a self-directed program, "learners should be provided with choices of how they wish to proactively carry out the learning process. material resources should be available, approaches suggested, flexible pacing accommodated, and questioning and feedback provided when needed."34 hbll provided a variety of challenges and training sessions related to various technologies. technology challenge participants were given the independence to choose which learning methods to use, including which training sessions to attend and which challenges to complete. according to the exit survey, the most popular training methods were small, instructor-led groups, followed by self-learning through reading books and articles. group training sessions were organized by hbll leadership and addressed topics such as microsoft office, rss feeds, computer organization skills, and multimedia software. other learning methods included web tutorials, dvds, large group discussions, and one-on-one tutoring. the group training classes preferred by hbll employees may be considered more teacher-directed than self-directed, but the technology challenge was self-directed as a whole in that learners were given the opportunity to choose what they learned and how they learned it.

the structure of the technology challenge allowed participants to set their own pace. staff and faculty were given several months to complete the challenge and were responsible to pace themselves. on the exit survey, one participant commented: "if i didn't get anything done one week, there wasn't any pressure." another enjoyed flexibility in deciding when and where to complete the tasks: "i liked being able to do the challenge anywhere. when i had a few minutes between appointments, classes, or meetings i could complete some of the challenges." employees could also determine how much or how little of the challenge they wanted to complete: many reached well over the thousand-point goal, while others fell a little short. participants began at different skill levels, and thus could use the time and resources allotted to explore basic or more advanced topics according to their needs and interests.

garrison had noted the importance of providing resources and feedback in self-directed learning.35 the techopoly website provided resources (such as specific blogs or websites to visit) and instructions on how to use and access technology within the library. hbll also hired a student to assist staff and faculty one-on-one by explaining answers to their questions about technology and teaching other skills he thought might be relevant to their initial problem. the entrance and exit surveys provided opportunities for self-reflection and self-evaluation by questioning the participants' use of technology before the challenge and asking them to evaluate their proficiency in technology after the challenge.
use of experience

the use of experience as a source of learning is important to adult learners: "the richest resource for learning resides in adults themselves; therefore, tapping into their experiences through experiential techniques (discussions, simulations, problem-solving activities, or case methods) is beneficial."36 the small-group discussions and one-on-one problem solving made available to hbll employees certainly fall into these categories. small-group classes are one of the best ways to encourage adults to share and validate their experiences, and doing so increases retention and application of new information.37 the trainings and challenges encouraged participants to make use of their work and personal experiences by connecting the topic to work or home application. for example, one session discussed how blogs relate to libraries, and another helped participants learn adobe photoshop skills by editing personal photographs.

need to know

adult learners are more successful when they desire and recognize a need for new knowledge or skills. the role of a trainer is to help learners recognize this "need to know" by "mak[ing] a case for the value of learning."38 hbll used the generational survey and presurvey to develop a need and desire to learn. the results of the generational survey, which demonstrated a gap in technology use between librarians and students, were presented and discussed at a meeting held before the initiation of the technology challenge to help staff and faculty understand why it was important to learn 2.0 technology. results of the presurvey showed that staff and faculty were willing, even excited, to learn technology skills: 37 percent "agreed" and 60 percent "strongly agreed" that they were interested in learning new technology. their desire to learn was cultivated by the survey itself, which helped them recognize and focus on this interest, and the challenge provided a way for employees to channel their desire to learn technology.
immediate application

learners need to see an opportunity for immediate application of their knowledge: ota et al. explained that "they want to learn what will help them perform tasks or deal with problems they confront in everyday situations and those presented in the context of application to real life."39 because of the need for immediate application, the technology challenge encouraged staff and faculty to learn technology skills directly related to their jobs—as well as technology that is applicable to their personal or home lives. hbll leaders hoped that as staff became more comfortable with technology in general, they would be motivated to incorporate more complex technologies into their work.

here is one example of how the technology challenge catered to adult learners' need to apply what they learn: before designing the challenge, hbll held a training session to teach employees the basics of photoshop. even though attendees were on the clock, the turnout was discouraging. library leaders knew they needed to try something new. in the revamped photoshop workshop that was offered as part of the technology challenge, attendees brought family photos or film and learned how to edit and experiment with their photos and burn dvd copies. this time, the class was full: the same computer program that before drew only a few people was now exciting and useful. focusing on employees' personal interests in learning new software, instead of just on teaching the software, better motivated staff and faculty to attend the training.

motivation

as stated by ota et al., adults are motivated by external factors but are usually more motivated by internal factors: "adults are responsive to some external motivators (e.g., better job, higher salaries), but the most potent motivators are internal (e.g., desire for increased job satisfaction, self-esteem)."40 on the entrance survey, participants were given the opportunity to comment on their reasons for participating in the challenge. the gift card, an example of an external motivation, was frequently cited as an important motivation. but many also commented on more internal motivations: "it's important to my job to stay proficient in new technologies and i'd like to stay current"; "i feel that i need to be up-to-date on technology in order to effectively help patrons"; "to identify and become comfortable with new technologies that will make my work more efficient, more presentable, and more accurate."

■■ lifelong learning

staff and faculty responded favorably to the training. none of the participants who took the exit survey disliked the challenge; 34 percent even reported that they strongly liked it. ninety-five percent reported that they enjoyed the process of learning new technology, and 100 percent reported that they were willing to participate in another technology challenge—thus suggesting success in the goal of encouraging lifelong technology learning. the exit survey results indicate that after completing the challenge, staff and faculty are more motivated to continue learning—which is exactly what hbll leaders hoped to accomplish. eighty-nine percent of the participants reported that their desire to learn new technology had increased, and 69 percent reported that they are now able to learn new technology faster after completing the technology challenge. ninety-seven percent claimed that they were more likely to incorporate new technology into home or work use, and 98 percent said they recognized the importance of staying on top of emerging technologies.

participants commented that the training increased their desire to learn. one observed, "i often need a challenge to get motivated to do something new," and another participant reported feeling "a little more comfortable trying new things out." the exit survey asked participants to indicate how they now use technology. one employee keeps a blog for her daughter's dance company, and another said, "i'm on my way to a full-blown google reader addiction." another participant applied these new skills at home: "i'm not so afraid of exploring the computer and other software programs. i even recently bought a computer for my own personal use at home." the technology challenge was also successful in helping employees better serve patrons: "i can now better direct patrons to services that i would otherwise not have known about, such as streaming audio and video and e-book readers." another participant felt better connected to student patrons: "i understand the students better and the things they use on a daily basis." staff and faculty also found their new skills applicable to work beyond patron interaction, and many listed specific examples of how they now use technology at work:

■■ "i have attended a few microsoft office classes that have helped me tremendously in doing my work more efficiently, whether it is for preparing monthly statistical reports or working with colleagues from other libraries."
■■ "i learned how to set up a server that i now maintain on a semi-regular basis. i learned a lot about sfx and have learned some perl programming language as well that i use in my job daily as i maintain sfx."
■■ "the new oclc client was probably the most significant. i spent a couple of days in an online class learning to customize the client, and i use what i learned there every single day."
■■ "i use google docs frequently for one of the projects i am now working on."

participants also indicated weaknesses in the technology challenge. almost 20 percent of those who completed the challenge reported that it was too easy. this is a valid point—the challenge was designed to be easy so as not to intimidate staff or faculty who are less familiar with technology. it is important to note that these comments came from those who completed the challenge—other participants may have found the tasks and mini-challenges more difficult. the goal was to provide an introduction to web 2.0, not to train experts. however, a greater range of tasks and challenges could be provided in the future to allow staff and faculty more self-direction in selecting goals relevant to their experience.

to encourage staff and faculty to attend sponsored training sessions as part of the challenge, hbll leaders decided to double points for time spent at these classes. this certainly encouraged participation, but it led to "point inflation"—perhaps being one reason why so many reported that the challenge was too easy to complete. the doubling of points may also have encouraged staff to spend more time in workshops and less time practicing or applying the skills learned. a possible solution would be offering 1.5 points, or offering a set number of points for attendance instead of counting per minute.

it also may have been informative for purposes of analysis to have surveyed both those who did not complete the challenge and those who chose not to participate. because the presurvey indicated that time was the biggest deterrent to learning and incorporating new technology, we assume that many of those who did not participate or who did not complete the challenge felt that they did not have enough time to do so. there is definitely potential for further investigation into why library staff would not want to participate in a technology training program, what would motivate them to participate, and how we could redesign the technology challenge to make it more appealing to all of our staff and faculty.
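the point-inflation concern above is easy to quantify. a back-of-the-envelope sketch follows, with the flat per-class award an arbitrary assumed figure rather than anything hbll proposed:

```python
# minutes of sponsored-class time needed to reach the 1,000-point
# goal under the award schemes discussed above (hypothetical numbers).
GOAL = 1000

def minutes_to_goal(points_per_minute: float) -> float:
    return GOAL / points_per_minute

print(minutes_to_goal(1.0))   # 1000.0 minutes at the normal rate
print(minutes_to_goal(2.0))   #  500.0 minutes with doubled points
print(minutes_to_goal(1.5))   # ~666.7 minutes at the proposed 1.5x rate

# a flat award decouples points from minutes entirely: for example,
# 50 points per class attended (an assumed figure) means twenty
# classes reach the goal regardless of how long each class runs.
print(GOAL / 50)              # 20.0 classes
```

the doubling scheme halves the time a participant must invest, which is one concrete way to see why so many completers found the challenge easy.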
several library employees have requested that hbll sponsor another technology challenge program. because of the success of the first and because of continuing interest in technology training, we plan to do so in the future. we will make changes and adjustments according to the feedback we have received, and continue to evaluate and improve the program based on survey results. the purpose of a second technology challenge would be to reinforce what staff and faculty have already learned, to teach new skills, and to help participants remember the importance of lifelong learning when it comes to technology.

■■ conclusion

hbll's self-directed technology challenge was successful in teaching technology skills and in promoting lifelong learning—as well as in fostering the development of librarian 2.0. abram listed key characteristics and duties of librarian 2.0, including learning the tools of web 2.0; connecting people, technology, and information; embracing "nontextual information and the power of pictures, moving images, sight, and sound"; using the latest tools of communication; and understanding the "emerging roles and impacts of the blogosphere, web syndicasphere, and wikisphere."41 survey results indicated that hbll employees are on their way to developing these attributes, and that they are better equipped with the skills and tools to keep learning.

like plcmc's learning 2.0, the technology challenge could be replicated in libraries of various sizes. obviously an exact replication would not be feasible or appropriate for every library—but the basic ideas, such as the principles of andragogy and self-directed learning, could be incorporated, as well as the daily time requirement or the use of surveys to determine weaknesses or interests in technology skills. whatever the case, there is a great need for library staff and faculty to learn emerging technologies and to keep learning them as technology continues to change and advance.

but the most important benefit of a self-directed training program focusing on lifelong learning is effective employee development. the goal of any training program is to increase work productivity—and as employees become more productive and efficient, they are happier and more excited about their jobs. on the exit survey, one participant expressed initially feeling hesitant about the technology challenge and feared that it would increase an already hefty workload. however, once the challenge began, the participant enjoyed "taking the time to learn about new things. i feel i am a better person/librarian because of it." and that, ultimately, is the goal—not only to create better librarians, but also to create better people.
notes

1. robert h. mcdonald and chuck thomas, "disconnects between library culture and millennial generation values," educause quarterly 29, no. 4 (2006): 4.
2. richard t. sweeney, "reinventing library buildings and services for the millennial generation," library administration & management 19, no. 4 (2005): 170.
3. win shih and martha allen, "working with generation-d: adopting and adapting to cultural learning and change," library management 28, no. 1/2 (2006): 89.
4. sweeney, "reinventing library buildings," 170.
5. shih and allen, "working with generation-d," 96.
6. ibid., 98.
7. stephen abram, "social libraries: the librarian 2.0 phenomenon," library resources & technical services 52, no. 2 (2008): 21.
8. ibid.
9. jack m. maness, "library 2.0 theory: web 2.0 and its implications for libraries," webology 3, no. 2 (2006), http://www.webology.ir/2006/v3n2/a25.html (accessed jan. 8, 2010).
10. ibid., under "blogs and wikis," para. 4.
11. laurel ann clyde, "library weblogs," library management 22, no. 4/5 (2004): 183–89; maness, "library 2.0 theory."
12. see matthew m. bejune, "wikis in libraries," information technology & libraries 26, no. 3 (2007): 26–38; darlene fichter, "the many forms of e-collaboration: blogs, wikis, portals, groupware, discussion boards, and instant messaging," online: exploring technology & resources for information professionals 29, no. 4 (2005): 48–50; maness, "library 2.0 theory."
13. mary ellen bates, "can i facebook that?" online: exploring technology and resources for information professionals 31, no. 5 (2007): 64; sarah elizabeth miller and lauren a. jensen, "connecting and communicating with students on facebook," computers in libraries 27, no. 8 (2007): 18–22.
14. clyde, "library weblogs," 183.
15. maness, "library 2.0 theory."
16. fichter, "many forms of e-collaboration," 50.
17. sweeney, "reinventing library buildings"; bates, "can i facebook that?"
18. bates, "can i facebook that?" 64.
19. abram, "social libraries," 21.
20. ibid., 20.
21. ibid., 21.
22. shih and allen, "working with generation-d," 90.
23. abram, "social libraries," 20.
24. helene blowers and lori reed, "the c's of our sea change: plans for training staff, from core competencies to learning 2.0," computers in libraries 27, no. 2 (2007): 11.
25. helene blowers, learning 2.0, 2007, http://plcmclearning.blogspot.com (accessed jan. 8, 2010).
26. for examples, see ilana kingsley and karen jensen, "learning 2.0: a tool for staff training at the university of alaska fairbanks rasmuson," the electronic journal of academic & special librarianship 12, no. 1 (2009), http://southernlibrarianship.icaap.org/content/v10n01/kingsley_i01.html (accessed jan. 8, 2010); beverly simmons, "learning (2.0) to be a social library," tennessee libraries 58, no. 2 (2008): 1–8.
27. for examples, see christine mackenzie, "creating our future: workforce planning for library 2.0 and beyond," australasian public libraries & information services 20, no. 3 (2007): 118–24; liisa sjoblom, "embracing technology: the deschutes public library's learning 2.0 program," ola quarterly 14, no. 2 (2007): 2–6; hui-lan titangos and gail l. mason, "learning library 2.0: 23 things @ scpl," library management 30, no. 1/2 (2009): 44–56; illinois library association, "continuous improvement: the transformation of staff development," the illinois library association reporter 26, no. 2 (2008): 4–7; and thomas simpson, "keeping up with technology: orange county library embraces 2.0," florida libraries 20, no. 2 (2007): 8–10.
28. sharan b. merriam, "andragogy and self-directed learning: pillars of adult learning theory," new directions for adult & continuing education 89 (2001): 3–13.
29. malcolm shepherd knowles, the modern practice of adult education: from pedagogy to andragogy (new york: cambridge books, 1980).
30. jovita ross-gordon, "adult learners in the classroom," new directions for student services 102 (2003): 43–52.
31. merriam, "pillars of adult learning"; ross-gordon, "adult learners."
32. carrie ota et al., "training and the needs of learners," journal of extension 33 (2005), http://www.joe.org/joe/2006december/tt5.php (accessed jan. 8, 2010); wayne g. west, "group learning in the workplace," new directions for adult and continuing education 71 (1996): 51–60.
33. ota et al., "needs of learners."
34. d. r. garrison, "self-directed learning: toward a comprehensive model," adult education quarterly 48 (1997): 22.
35. ibid.
36. ota et al., "needs of learners," under "needs of the adult learner," para. 4.
37. ota et al., "needs of learners"; west, "group learning."
38. ota et al., "needs of learners," under "needs of the adult learner," para. 2.
39. ibid., para. 6.
40. ibid., para. 7.
41. abram, "social libraries," 21–22.
appendix a. technology challenge "mini challenges"

technology challenge participants had the opportunity to complete fifteen of twenty mini-challenges to become eligible to win a second gift certificate to the campus bookstore. below are some examples of technology mini-challenges:

1. read a library or a technology blog
2. listen to a library podcast
3. check out a book from circulation's new self-checkout machine
4. complete an online copyright tutorial
5. catalog some books on librarything
6. read an e-book with a sony ebook reader or amazon kindle
7. scan photos or copy them from a digital camera and then burn them onto a cd
8. back up data
9. change computer settings
10. schedule meetings with microsoft outlook
11. create a page or comment on a page on the library's intranet wiki
12. use one of the library's music databases to listen to music
13. use wordpress or blogger to create a blog
14. post a photo on a blog
15. use google reader or bloglines to subscribe to a blog or news page using rss
16. reserve and check out a digital camera, camcorder, dvr, or slide scanner from the multimedia lab and create something with it
17. convert media on the analog media racks
18. edit a family photograph using photo-editing software
19. attend a class in the multimedia lab
20. make a phone call using skype

appendix b. exit survey results

how did you like the technology challenge overall?
strongly disliked: 0 (0 percent); disliked: 0 (0 percent); liked: 42 (66 percent); strongly liked: 22 (34 percent)

how did you like the reporting system used for the technology challenge (the techopoly game)?
strongly disliked: 0 (0 percent); disliked: 4 (6 percent); liked: 41 (64 percent); strongly liked: 19 (30 percent)

would you participate in another technology challenge?
yes: 64 (100 percent); no: 0 (0 percent)

what percentage of time did you spend using the following methods of learning? (participants were asked to allocate 100 points among the categories; average responses shown)
instructor-led large group: 15.3; instructor-led small group: 27; one-on-one instruction: 3.5; web tutorial: 12.8; self-learning (books, articles): 27.4; dvds: 0.5; small group discussion: 2.7; large group discussion: 2.6; other: 6.7

i am more likely to incorporate new technology into my home or work life.
strongly disagree: 0 (0 percent); disagree: 2 (3 percent); agree: 49 (77 percent); strongly agree: 13 (20 percent)

i enjoy the process of making new technology a part of my work or home life.
strongly disagree: 0 (0 percent); disagree: 2 (3 percent); agree: 37 (58 percent); strongly agree: 24 (38 percent)

after completing the technology challenge, my desire to learn new technologies has increased.
strongly disagree: 0 (0 percent); disagree: 7 (11 percent); agree: 44 (69 percent); strongly agree: 13 (20 percent)

i feel i now learn new technologies more quickly.
strongly disagree: 0 (0 percent); disagree: 20 (31 percent); agree: 39 (61 percent); strongly agree: 5 (8 percent)
how much more proficient do you feel in . . .
hardware: not any, 31 percent; somewhat, 64 percent; a lot, 5 percent
software: not any, 8 percent; somewhat, 72 percent; a lot, 20 percent
internet resources: not any, 17 percent; somewhat, 68 percent; a lot, 15 percent
library technology: not any, 23 percent; somewhat, 64 percent; a lot, 13 percent

in order for you to succeed in your job, how important is keeping abreast of new technologies to you?
not important: 1 (2 percent); important: 22 (34 percent); very important: 41 (64 percent)

open-ended questions
■■ what would you change about the technology challenge?
■■ what did you like about the technology challenge?
■■ what technologies were you introduced to during the technology challenge that you now use on a regular basis?
■■ in what ways do you feel the technology challenge has benefited you the most?

editorial board thoughts: technology and mission: reflections of a first-year college library director
ed tallent
information technology and libraries | december 2012

ed tallent (edtallent@curry.edu) is director, levin library, curry college, milton, massachusetts.

as i reflect on my first year as director for a small college library, several themes are clear to me, but perhaps none resonates as vibrantly as the challenges in managing technology, technology planning, and the never-ending need for technology integration, both within the library and the college. it is all-encompassing, involving every library activity and initiative. while my issues will naturally have a contextual flavor unique to my place of employment, i imagine they reflect issues that all librarians face (or have already faced). what is perhaps less unique is how these issues of library technology intersect with some very high priority college initiatives and challenges. and, given myriad reports on students' ongoing ambivalent attitudes toward libraries (after everything we have done for them!), it still behooves us to keep working at this integration of the library into the learning and teaching process and to hitch our wagon to larger strategic missions. so, what issues have i faced?

the campus portal vs. library web site: this issue is neither new nor unique, but it is still a tangled web of conflicting priorities and attitudes, campus politics and technology vision, the extent and location of technology support, and the flexibility of the campus portal or content management system (cms) and the people who direct it. it is not a question of any misunderstandings, as the need to market the library via the campus web site is obvious and the goal of personalized service is laudatory. yet, marrying the external marketing needs with the internal support needs is a difficult balance to achieve. the web offers a more dramatic entrée to the library than a portal/intranet, and portal technology is not perfect, as jakob nielsen highlights in a recent post (http://www.useit.com/alertbox/intranet-usability.html). the goal obviously is further complicated by the fact that the support needed to maintain a quality web presence--one that is graphically interesting, vibrant, and intuitive--is significant when one considers that library web sites are rarely used as a place to begin research by students and faculty.

the portal, on the other hand, promises a personalized approach and easier maintenance, but lacks the level of operability that would be desirable. the web presence can support both user needs and offer visitors a sense of the quality services and collections the library provides. so, at this writing, what we have is a litany of questions not yet resolved.

mobile, tablets, and virtual services: the questions also abound in these areas. should we build our own mobile services, or contract out the development? do we (can we) focus on creating a leadership role for the library in the area of emerging technology, or wait for a coordinated institutional vision and plan to emerge?
in the area of tablets, we are about to commence circulating ipads, and anyone who has gone through the labyrinthine process just to load apps will know that the process gives one pause as to the value of such an initiative, and that is before they circulate and need to be managed. still, it is a technology initiative that demands review of library workflows, security, student training, and collection access. virtual services were at a fairly nascent state upon my arrival and have grown slowly, as they are being developed in a culture that stressed individual, hands-on, and personalized services. virtual services can be all that, but that needs to be demonstrated not only to the user but to the people delivering the service. the added value here is that the work engages us in valuable reflections on the way in which we work or should work.

value of the library: i began my new position at a time when the college was deeply engrossed in the issue of student recruitment, retention, and success. for my employer these are significant institutional identity issues, and the library is expected to document its contributions to student outcomes and success. not nearly enough has been done, though a working relationship with a new director of institutional research is developing, and critical issues such as information literacy, integrated student support, learning spaces, learning analytics, and the need for a data warehouse will be incorporated into the college's strategic plan. the opportunity is there for the library to link with major college initiatives, for example, and make information literacy more than a library issue.

citation management: now, here is a traditional library activity, the bane of many a reference service interaction and the undergraduate's last-minute nightmare. a combination of technical, service, and fiscal challenges revolves around the campus climate on the use of technology to respond to this quandary. what to do with faculty who believe strongly that the best way to learn this skill is by hand, not with any system that aims for interoperability and a desire to save the time of the user? for others, which tool should be used? should we not just go with a free one? while discipline differences will always exist, the current environment does present opportunities for the library to take a leadership role in defining what the possibilities are and ideally connecting the approach to appropriate and measurable learning outcomes and to the larger issue of academic integrity.

e-books, pda, article tokens: one of the unforeseen benefits of my moving to a small college library is that there is not the attachment to a print collection that exists in many/most research libraries. there is remarkable openness to experimenting with and committing to various methods of digital delivery of content. thus, we have been able to test myriad possibilities, from patron-driven book purchasing, to tokens for journal articles, to streaming popular films from a link in the library management system. this blurring of content, delivery, and functionality presents numerous opportunities for librarians to have conversations with departments about the future of collections.
connecting with alumni: this is always an important strategic issue for colleges and universities, and it seems as though there are promising emerging options for libraries to deliver database content to alumni, as vendors are beginning to offer more reasonable alumni-oriented packages. my library will be working with the appropriate campus offices next year to develop a plan for funding targeted library content for alumni as part of the college's broader strategic activities to engage alumni.

web design skills: while i understand the value that products like libguides can bring to the community, allowing content experts (librarians) to quickly and easily create template-driven web-based subject guides, i remain troubled by the lack of design skills librarians possess, and by the lack of recognition that good design can be just as important as good content. this is not a criticism, as we are not graphic designers. we have a sense of user needs, knowledge about content, and a desire to deliver, but i believe that products like this lead librarians to believe that good design for learning is easy. i do not claim to be an expert, but i know this is not the case. this approach does not translate into user-friendly guides that hold to consistent standards. i think we need to recognize that we can benefit from non-librarian expertise in the area of web design. one opportunity that i want to investigate along these lines is to create student internships that would bring design skills and the student perspective to the work. a win-win, as this also supports the college's desire for more internships and experiential learning for students.

there is neither time nor space to address an even broader library technology issue on the near horizon, which will be another campus engagement moment: the future ils for the library. yet maybe that should have been addressed first, since from what i have read and heard, the new ilss will solve all of the above problems!

mapping for the masses: gis lite and online mapping tools in academic libraries
kathleen w. weessies and daniel s. dotson
information technology and libraries | march 2013

abstract

customized maps depicting complex social data are much more prevalent today than in the past. beyond formal published outlets, interactive mapping tools make it easy to create and publish custom maps in both formal and more casual outlets such as social media. this article defines gis lite, describes three commercial products currently licensed by institutions, and discusses issues that arise from their varied functionality and license restrictions.

introduction

news outlets, from newspapers to television to the internet, are these days filled with maps that make it possible for readers to visualize complex social data. presidential election results, employment rates, and the plethora of data arising from the census of population are just a small sampling of social data mapped and consumed daily. the sharp rise in published maps in recent years has increased consumer awareness of the effectiveness of presenting data in map format and has raised expectations for finding, making, and using customized maps. not just in news media, but in academia also, researchers and students have high interest in being able to make and use maps in their work. just a few years ago even the simplest maps had to be custom made by specialists. researchers and publishers had to seek out highly trained experts to make maps on their behalf.
as a result, custom maps were generally only to be found in formal publications. the situation has changed partly because geographic information system (gis) software for geographic analysis and map making is more readily available than in years past. it does, however, remain specialized and requires considerable training for users to be proficient at even a basic level.1 this gap between supply and demand has been partly filled, especially in the last five years, by the growth of internet-based "gis lite" tools. while some basic tools are freely available on the internet, several tools are subscription-based and are licensed by libraries, schools, and businesses for use. college and university libraries especially are quickly becoming a major resource for data visualization and mapping tools. the aim of this article is to describe several data-rich gis lite tools available in the library market and how these products have met or failed to meet the needs of several real-life college class situations. this is followed by a discussion of issues arising from user needs and restrictions posed by licensing and copyright.

kathleen w. weessies (weessie2@msu.edu), a lita member, is geosciences librarian and head of the map library, michigan state university, lansing, michigan. daniel s. dotson (dotson.77@osu.edu) is mathematical sciences librarian and science education specialist, associate professor, ohio state university libraries, columbus, ohio.

what is gis lite?

students and faculty across the academic spectrum often discover that their topic has a geographic element to it and a map would enhance their work (paper, presentation, project, poster, article, book, thesis or dissertation, etc.). if their research involves data analysis, geospatial tools will draw attention to spatial patterns in the data that might not otherwise be apparent. every scholar with such needs must make a cost/benefit decision concerning gis: is his or her need greater than the cost in time and effort (sometimes money) necessary to learn or hire skills to produce map products? a full-functioning gis, being a specialized system of software designed to work with geospatially referenced datasets, is designed to address all the problems above. the data may be analyzed and output into customized maps exactly to the researcher's need. the traditional low-end solution available to non-experts, on the other hand, is colorizing a blank outline map, either with hand-held tools (markers, colored pencils, etc.) or on a computer using a graphic editing program. the profusion of web mapping options dangles tantalizingly with possibility, and occasionally (and increasingly) is able to provide an output that illustrates a useful point of users' research in a professional enough manner to fill a need.

in recent years the web has blossomed with map applications collectively called the "geoweb" or "geospatial web." geoweb or geospatial web refers to the "emerging distributed global gis, which is a widespread distributed collaboration of knowledge and discovery."2 some geoweb applications are well-known street map resources such as google maps and mapquest.
others are designed to deliver data from an organization, such as the national hazards support system (http://nhss.cr.usgs.gov), the national pipeline mapping system (http://www.npms.phmsa.dot.gov/publicviewer), and the national broadband map (http://www.broadbandmap.gov). a few tools focus on map creation and output, such as arcgis online (http://www.arcgis.com/home/webmap/viewer.html) and scribble maps (http://www.scribblemaps.com). the newest subgenre of the geoweb consists of participatory mapping sites such as openstreetmap (http://www.openstreetmap.org), did you feel it? (http://earthquake.usgs.gov/earthquakes/dyfi), and ushahidi (http://community.ushahidi.com/deployments).

the geoweb literature is small but growing.3 elwood reviewed published research on the geographic web.4 the geoweb literature tends to focus on creation of mappable data and delivery of geoweb services.5 in these the map consumer only appears as a contributor of data. very little has been written about users' needs from the geoweb.

the term gis lite has arisen among map and gis librarians to describe a subset of geoweb applications. gis lite is useful to library patrons lacking specialized gis training but who wish to conduct some gis and map-making activities on a lower learning curve. for the purpose of this article, gis lite will refer to applications, usually web-based, which allow users to manipulate geospatial data and create map outputs without programming skills or training in full gis software. while many geoweb applications allow only low-level output options, gis lite will provide an output intended to be used in activities or rolled into a gis for further geospatial processing.

in libraries, gis lite is closely allied with data and statistics resources. data and statistics librarianship have already been discussed as disciplines in the literature, such as by hogenboom6 and gray.7 new technologies and access to deeper data resources such as the ones presented here have raised the bar for librarians' responsibilities for curating, serving, and aiding patrons in its use. rather than be passive shepherds of information resources, librarians are now active participants and even information partners. librarians with map and gis skills similarly can directly enhance the quality of student scholarship across academic disciplines.8 the gis lite resources, however, need not remain specialized tools of map and gis librarians. librarians working in disciplines across the academic spectrum may incorporate them into their arsenal of tools to meet patron needs.

data visualization tools

a growing number of academic libraries have licensed access to online data providers. the following data tools contain enough gis lite functionality to aid patrons in visualizing and manipulating data (primarily social data) and creating customized map outputs. three of the more powerful commercial products described here are social explorer, simplymap, and proquest statistical datasets.
social explorer

licensed by oxford university press, social explorer provides selected data from the us decennial census 1790 to 2010, plus the american community survey 2006 through 2010.9 the interface enables either retrieval of tabular data or visualization of data in an interactive map. as the user selects options through pull-down menus, the map automatically refreshes to reflect the chosen year and population statistics. the level of geography depicted defaults to county-level data. if a user zooms in to an area smaller than a county, then data refresh to smaller geographies such as census tracts if they are available at that level for that year. output is in the form of graphic files suitable for sharing in a computer presentation (see figure 1).

one advantage of social explorer is that it utilizes historic boundaries as they existed for states, territories, counties, and census tracts for each given year. social explorer utilizes data and boundary files generated by the national historical gis (nhgis) based at the university of minnesota in collaboration with other partners. the creation of these historical boundaries was a significant undertaking and accomplishment.10 custom tables of data and the historic geographic boundaries may also be retrieved and downloaded for use from an affiliated engine through the nhgis website (http://www.nhgis.org). a disadvantage of this product is that the tool, while robust, does not completely replicate all the data available in the original paper census volumes. also, historical boundaries have not been created for city- or township-level data. the final map layout is not customizable either in the location of title and legend or in the data intervals.

figure 1: map depicting population having four or more years of college, 1960 (source: social explorer, 2012; image used with permission)

simplymap

simplymap (http://geographicresearch.com/simplymap) is a product of geographic research. this powerful interface brings together public and licensed proprietary data to offer a broad array of 75,000 data variables in the united states. us census data are available 1980–2010, normalized to the user's choice of either year 2000 or year 2010 geographies. numerous other licensed datasets primarily focus on demographics and consumer behavior, which makes it popular as a marketing research tool. each user establishes a personal login which allows created maps and tables to persist from session to session. upon creating a map view, the user may adjust the smaller geographic unit at which the theme data is displayed and also may adjust the data intervals as desired. the user creates a layout, adjusting the location of the map legend and title before exporting as a graphic or pdf (see figure 2). data are also exportable as gis-friendly shapefiles.

the great advantage of this product is the ability to customize the data intervals. this makes it possible to filter the data and display specific thresholds meaningful to the user.
for instance, if a user needs to illustrate places where an activity or characteristic is shared by "over half" of the population, then one may change the map to display two data categories: one for places where up to 50 percent of the population shares the characteristic, and a second category for places where more than 50 percent of the population shares the characteristic. another potential advantage is that all local data have been allocated pro rata so that all variables, regardless of their original granularity, may be expressed by county boundaries, by zip code boundaries, or by census tract. a disadvantage of the product is the lack of historical boundaries to match historical data.

figure 2. map depicting census tracts that have more than 50% black population (yellow line indicates cincinnati city boundary) (source: simplymap, 2012; image used with permission)

proquest statistical datasets

statistical datasets was developed by conquest systems inc. and is licensed by proquest. this product also mingles a broad array of several thousand public and licensed proprietary datasets, including some international data, in one interface. the user may retrieve data and view it in tabular or chart form. if the data have a geographic element, then the user may switch the view to a map interface. the resulting map may be exported as an image. the data may also be exported to a gis-friendly shapefile format. this product offers more robust data manipulation than the other products, in that the user may perform calculations between any of the data tables and create a chart or map of the created data element (see figure 3). statistical datasets, however, has more simplistic map layout capabilities than the other products.

figure 3. map of sorghum production, by country, in 2010 (source: proquest statistical datasets, 2012; image used with permission)

case studies

the following three case studies are of college classroom situations in which students utilized maps or map making as part of the assigned course work. the above mapping options are assessed for how well they met the assignment needs.

case study 1

an upper-level statistics course at the ohio state university requires students to create maps using sas (http://www.sas.com). while many may not associate the veteran statistical software package with creating maps, this course uses it along with sas/graph to combine statistical data with a map. the project requires data articulated at the county level in ohio, which the students then combine into multi-county regions. the end result is a map with regions labeled and rendered in 3d according to the data values. an example of the type of map that could be produced from such data using sas can be seen in figure 4.

figure 4. map of observed rabbit density in ohio using sas, sas/graph, and mail carrier survey data, 1998 (image used with permission)

while the data are provided in this course, students could potentially seek help from the library in a traditional way to find numerical data expressed at a county level. the librarian would guide patrons through appropriate avenues to locate data, such as the three products listed above. all three options contain numerous data variables for ohio at the county level.
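the course itself handles this step in sas and sas/graph, but the export-join-map workflow these products support can be sketched in python with the geopandas library. this is a minimal illustration only, and the file and column names are hypothetical stand-ins for an exported boundary file and data table:

```python
# minimal sketch: join county-level tabular data (exported from a
# gis lite product) to county boundaries and draw a choropleth.
# file and column names are hypothetical stand-ins.
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

counties = gpd.read_file("ohio_counties.shp")    # exported boundary file
data = pd.read_csv("ohio_county_stats.csv")      # exported data table

# join boundaries and attributes on a shared county identifier,
# e.g. a fips code column present in both files
joined = counties.merge(data, on="FIPS")

joined.plot(column="median_income", legend=True, cmap="viridis")
plt.title("hypothetical county-level choropleth")
plt.savefig("ohio_map.png", dpi=150)
```

the same joined layer could just as easily be handed off to a full gis, which is the "rolled into a gis for further geospatial processing" path described earlier.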
because the students are further processing the data elsewhere (in this case sas), the output options of the three products are less important. ultimately the availability of data on a desired subject would be the primary determinant for choosing one of the three gis lite options discussed here. social explorer will export the data in tabular form, which can then be ingested into sas. simplymap and proquest statistical datasets would both be a bit easier, though, because both packages allow the user to export the data as shapefiles, which are directly imported into sas/graph as both boundary files and joined tabular data.

case study 2

a first-year writing class at michigan state university has a theme of the american ethnic and racial experience. assignments all relate to a student's chosen ethnic group and geographic location from approximately 1880 to 1930. assignments build upon each other to culminate in a final semester paper. students with ancestors living in the united states at that time are encouraged to examine their own family's ethnicity and how they fit in their geographic context. otherwise, students may choose any ethnic group and place of interest. maps are a required element in the assignments. maps that display historical census data help students place the subject ethnic group into the larger county, state, and national context over the time frame. the students can see, for instance, if their subject household was part of an ethnic cluster or an outlier to ethnic clusters. the parameters for finding data and maps are generous and open to each student's interpretation. the wish is for students to find social statistics and maps that are insightful to their topic and will help them tell their story.

of the three statistical resources considered above, currently the only useful one is social explorer because it covers the time period studied by the class. the students may map several social indicators at the county level across several decades and compare their local area to the region and the nation. also they may save their maps and include them in their papers (properly credited).

case study 3

"the ghetto" is an elective geography class restricted to upperclassmen at michigan state university. in the semester project, students analyze the spatial organization and demographic variables of "ghetto" neighborhoods in a chosen city. a ghetto is defined as neighborhoods that have a 50 percent or higher concentration of a definable ethnic group. since black and white are the only two races consistently reported at the census tract level for all the years covered by the class (1960 through 2010), the students necessarily use that data for their projects.

data needs for the class are focused and deep. the students specifically need to visualize us census data from 1960 through 2010 at the census tract level within the city limits for several social indicators. indicators include median income, median housing value, median rent, educational attainment, income, and rate of unemployment. the instructor has traditionally required use of the paper census volumes, and students created hand-made maps that highlight tracts in the subject city that conformed to the ghetto definition and those that did not for each of the census years covered.
computer-retrieved data and computer-generated maps would be acceptable, but at the time of this writing no gis lite product is able to make all the maps that meet the specific requirements of this class. social explorer covers all of the date range and provides data down to the tract level. however, it does not provide an outline of the city limits and does not provide all the data variables required in the assignment. simplymap will only work for 2000 through 2010 because tract boundaries are only available for those two years, even though the data go back to 1980. simplymap does provide two excellent features, though: it is the only product that allows an overlay of the (modern) city boundary on top of the census tract map, and it is the only product that allows manipulation of the data intervals. students may choose to break the data at the needed 50 percent mark, while the other products utilize fixed data intervals not useful to this class. proquest statistical datasets can compute the data into two categories to create the necessary data intervals; however, census data are only available beginning with census 2000.
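the two-category break the class needs is a one-line classification once tract-level data are in hand. the sketch below, again with hypothetical file and column names, mirrors in geopandas what simplymap's custom intervals do inside its interface:

```python
# minimal sketch of the two-class break at the 50 percent mark
# used to flag tracts meeting the course's "ghetto" definition.
# file and column names are hypothetical stand-ins.
import geopandas as gpd

tracts = gpd.read_file("city_tracts.shp")

# tracts at or above a 50 percent concentration of the studied
# group form one class; all remaining tracts form the other.
tracts["meets_threshold"] = tracts["pct_group"] >= 50.0

tracts.plot(column="meets_threshold", categorical=True, legend=True)
```

fixed-interval products cannot express this break directly, which is exactly why the custom-interval feature matters to this assignment.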
map products for user needs

these three real-life class scenarios illustrate how the rich and seemingly duplicative resources of the library can range from perfectly suitable to perfectly useless depending on each project's exact needs. the appropriateness of any given tool can only be assessed fairly if the librarian is familiar with all the "ins and outs" of every product. the geoweb and gis lite tools mentioned throughout this article are summarized in table 1. the suitability of gis lite tools will be further affected by the following issues.

historical boundaries

the range and granularity of data tools are subject to factors sometimes at odds with what a researcher would wish to have. at this time, for instance, many historical resources provide data only as detailed as the county level. county-level data are available largely due to the efforts of the nhgis mentioned above and the newberry library's atlas of historical county boundaries project (http://publications.newberry.org/ahcbp). far fewer resources provide historical data at smaller geographies such as city, township, or census tract levels. this is because the smaller the geographies get, the exponentially more there are to create and for map interfaces to process. from the well-known resource county and city data book,11 it is easy enough to retrieve us city data. the historical boundaries of every city in the united states, however, have not been created. this is because city boundaries are much more dynamic than county boundaries and there is no centralized authoritative source for their changes over time. two of the three case studies presented here utilized historic data. this isn't necessarily a representative proportion of user needs; librarians should assess data resources in light of their own patrons' needs.

normalization

two equally valid data needs concerning any kind of time series data concern changing geographic boundaries. census tracts, for instance, provide geographic detail roughly at the neighborhood level, designed by the bureau of census to encompass approximately 2,500 to 8,000 people.12 because people move around and the density of population changes from decade to decade, the configuration and numbering of tracts change over time. some scholars will wish to see the data values in the tracts as they were drawn at the time of issue. in this situation, a neighborhood of interest might belong to different tracts over the years or even be split between two or more tracts. other scholars focused on a particular neighborhood may wish to see many decades of census data re-cast into stable tracts in order to be directly comparable. data providers will take one approach or the other on this issue, and librarians will do well to be aware of their choice.

license restrictions

a third issue affecting use of these products is the ability to use derived map images, not only in formal outlets such as professional presentations, articles, books, and dissertations, but also informal outlets such as blogs and tweets. for the most part gis lite vendors are willing—even pleased—to see their products promoted in the literature and in social media. the vendors uniformly wish any such use to be properly credited. the license that every institution signs when acquiring these products will specify allowed and disallowed activities. the license, fixated on disallowing abuse or resale or other commercialization of the data, might leave a chilling effect on users wishing to use the images in their work. if a user is in any doubt as to the suitability of an intended use of a map, he or she should be encouraged to contact the vendor to seek permission for its use.

as data resources grow and become more readily usable, the possibility for scholarly inquiry grows. librarians with familiarity with gis lite tools may partner with their patrons and guide them to the best resources.

table 1. a selection of geoweb and gis lite tools and their output options (tool name | url | free or fee | electronic output options*)

geoweb tools
atlas of historical county boundaries | http://publications.newberry.org/ahcbp/ | free | spatial data as shapefile, kmz; image as pdf
did you feel it? | http://earthquake.usgs.gov/earthquakes/dyfi/ | free | tabular data as txt, xml; image as jpg, pdf, ps
google maps | https://maps.google.com/ | free | none
mapquest | http://www.mapquest.com | free | none
national broadband map | http://www.broadbandmap.gov/ | free | image as png
national hazards support systems (usgs) | http://nhss.cr.usgs.gov/ | free | image as pdf, png
national pipeline mapping system | https://www.npms.phmsa.dot.gov/publicviewer/ | free | image as jsf
openstreetmap | http://www.openstreetmap.org/ | free | tabular data as xml; image as png, jpg, svg, pdf
ushahidi community deployments | http://community.ushahidi.com/deployments/ | free | image as jpg

gis lite tools
arcgis online | http://www.arcgis.com | limited free options; access is part of institutional site license | spatial data as arcgis 10; image as png (in arcexplorer)
proquest statistical datasets | http://cisupa.proquest.com/ws_display.asp?filter=statistical%20datasets%20overview | fee | tabular data as excel, pdf, delimited text, sas, xml; spatial data as shapefile; image may be copied to clipboard
sas/graph | http://www.sas.com/technologies/bi/query_reporting/graph/index.html | fee | image as pdf, png, ps, emf, pcl
scribble maps | http://www.scribblemaps.com/ | free | spatial data as kml, gpx; image as jpg
simplymap | http://geographicresearch.com/simplymap | fee | tabular data as excel, csv, dbf; spatial data as shapefile; image as pdf, gif

* does not include taking a screen shot of the monitor or making a durable url to the page
references

1. national research council, division on earth and life studies, board on earth sciences and resources, geographical sciences committee, learning to think spatially (washington, d.c.: national academies press, 2006): 9.
2. pinde fu and jiulin sun, web gis: principles and applications (redlands, ca: esri press, 2011): 15.
3. for good overviews of the geoweb, see muki haklay, alex singleton, and chris parker, "web mapping 2.0: the neogeography of the geoweb," geography compass 2, no. 6 (2008): 2011–2039, http://dx.doi.org/10.1111/j.1749-8198.2008.00167.x; jeremy w. crampton, "cartography: maps 2.0," progress in human geography 33, no. 1 (2009): 91–100, http://dx.doi.org/10.1177/0309132508094074.
4. sarah elwood, "geographic information science: visualization, visual methods, and the geoweb," progress in human geography 35, no. 3 (2010): 401–408, http://dx.doi.org/10.1177/0309132510374250.
5. songnian li, suzana dragićević, and bert veenendaal, eds., advances in web-based gis, mapping services and applications (boca raton, fl: crc press, 2011).
6. karen hogenboom, carissa phillips, and merinda hensley, "show me the data! partnering with instructors to teach data literacy," in declaration of interdependence: the proceedings of the acrl 2011 conference, march 30–april 2, 2011, philadelphia, pa, ed. dawn m. mueller (chicago: association of college and research libraries, 2011), 410–417, http://www.ala.org/acrl/files/conferences/confsandpreconfs/national/2011/papers/show_me_the_data.pdf.
(chicago: association of college and research libraries, 2011), 410-417, http://www.ala.org/acrl/files/conferences/confsandpreconfs/national/2011/papers/show_me_the_data.pdf.
7. ann s. gray, "data and statistical literacy for librarians," iassist quarterly 28, no. 2/3 (2004): 24-29, http://www.iassistdata.org/content/data-and-statistical-literacy-librarians.
8. kathy weimer, paige andrew, and tracey hughes, map, gis and cataloging / metadata librarian core competencies (chicago: american library association map and geography round table, 2008), http://www.ala.org/magirt/files/publicationsab/magertcorecomp2008.pdf.
9. social explorer, http://www.socialexplorer.com/pub/home/home.aspx.
10. catherine fitch and steven ruggles, "building the national historical geographic information system," historical methods 36, no. 1 (2003): 41-50, http://dx.doi.org/10.1080/01615440309601214.
11. u.s. bureau of the census, county and city data book, http://www.census.gov/prod/www/abs/ccdb.html.
12. census tracts and block numbering areas, http://www.census.gov/geo/www/cen_tract.html.

acknowledgments

the authors wish to thank dr. michael fligner, dr. clarence hooker, and dr. joe darden for permission to use their courses as case studies.

social contexts of new media literacy: mapping libraries
elizabeth thorne-wallington
information technology and libraries | december 2013

abstract

this paper examines the issue of universal library access by conducting a geospatial analysis of library location and certain socioeconomic factors in the st. louis, missouri, metropolitan area. framed around the issue of universal access to internet, computers, and technology (ict) for digital natives, this paper demonstrates patterns of library location related to race and income. this research then raises important questions about library location and, in turn, how location affects access to ict for young people in the community.

objectives and purpose

the development and diffusion of new media and digital technologies have profoundly affected the literacy experiences of today's youth.1 young people today develop literacy through a variety of new media and digital technologies.2 the dissemination of these resources has also allowed youth to have literacy-rich experiences in an array of different settings.
ernest morrell, literacy researcher, writes, "as english educators, we have a major responsibility to help future english teachers to redefine literacy instruction in a manner that is culturally and socially relevant, empowering, and meaningful to students who must navigate a diverse and rapidly changing world."3 this paper will explore how mapping and geographic information systems (gis) can help illuminate the cultural and social factors related to how and where students access and use new media literacies and digital technology. libraries play an important role in encouraging new media literacy development;4 yet access to libraries must be understood through social and cultural contexts. the objective of this paper is to demonstrate how mapping and gis can be used to provide rigorous analysis of how library location in st. louis, missouri, is correlated with socioeconomic factors defined by the us census, including median household income and race. by using gis, the role of libraries in providing universal access to new media resources can be displayed statistically, both challenging and confirming previously held beliefs about library access. this analysis raises new questions about how libraries are distributed across the st. louis area and whether they truly provide universal and equal access. elizabeth thorne-wallington (ethornew@wustl.edu) is a doctoral student in the department of education at washington university in st. louis.

literature review

advances in technologies are transforming the very meaning of literacy.5 traditionally, literacy has been defined as the ability to understand and make meaning of a given text.6 the changing global economy requires a variety of digital literacies, which schools do not provide.7 instead, young people acquire literacy through a multitude of in- and out-of-school experiences with new media and digital technology.8 libraries play a vital role in supporting new media literacy by offering out-of-school access and experiences. to understand the role that libraries play in offering access to new media literacy technologies, a few key concepts must be defined. first is the concept of the digital native. those born around 1980, who have essentially grown up with technology, are known as digital natives.9 digital natives are expected to have a base knowledge of technology and to be able to pick up and learn new technology quickly because of that base knowledge. digital natives have been exposed to technology from a young age and are adept at using a variety of digital technologies. the suggestion is that young people can quickly learn to make use of the new media and technology available in a specific location. key to any discussion of digital natives is the concept of the digital divide. the digital divide has been a central issue of education policy since the mid-1990s.10 early work on the digital divide was concerned primarily with equal access.11 more recently, however, the idea of a "binary digital divide" has been replaced by studies focusing on a multidimensional view of the digital divide.12 hargittai asserts that even among digital natives, there are large variations in internet skills and uses correlated with socioeconomic status, race, and gender.13 these variations call for a nuanced study examining social and cultural factors associated with new media literacy, including out-of-school contexts.
the concept of literacy and learning in out-of-school contexts has a strong historical context. hull and schultz provide a review of the theory and research on literacy in out-of-school settings.14 they reviewed a variety of studies, including self-guided literacy activities, after-school programs, and reading programs, and the significance of out-of-school learning opportunities was supported by these studies. importantly for the research here, the use of digital technology in out-of-school settings has also been studied. lankshear and knobel examine out-of-school practices extensively with their work on new literacies.15 lankshear and knobel also make clear the complexity of out-of-school experiences among young people. students participate in nontraditional literacy activities such as blogging and remix in a variety of out-of-school contexts, from home computers to community-based organizations to libraries. most importantly, lankshear and knobel found that the students did connect what they learned in the classroom with these out-of-school activities. the connection between out-of-school literacies and in-school learning has also been studied. education policy researcher allan luke writes, "the redefined action of governments . . . is to provide access to combinatory forms of enabling capital that enhance students' possibilities of putting the kinds of practices, texts, and discourses acquired in schools to work in consequential ways that enable active position taking in social fields."16 collins also writes about this relationship between in- and out-of-school literacies. in her case study she finds a variety of "imports" and "exports" in terms of practices. that is, skill transaction works in both directions, with skills learned out of school used in school, and skills learned in school used out of school.17 skerrett and bomer make this connection even more explicit when looking at adolescent literacy practices.18 their article examines how a teacher in an urban classroom drew on her students' out-of-school literacies to inform teaching and learning in a traditional literacy classroom. the authors found that the teacher in their study was able to create a curriculum that engaged students by inviting them to use literacies learned in out-of-school settings. however, the authors write that this type of literacy study was taxing and time-consuming for both the teacher and the student. still, it is clear that connections between in- and out-of-school literacies can be made. the role libraries play in making this connection has not been studied as extensively. yet it is clear that young people do use libraries to access technology. becker et al. found that nearly half of the nation's 14- to 18-year-olds had used a library computer within the past year; they additionally found that for poor children and families, libraries are a "technological lifeline." among those below the poverty line, 61 percent used public library computers and the internet for educational purposes.19 tripp writes that libraries have long played an important role in helping people gain access to digital media tools, resources, and skills,20 and argues that libraries should capitalize on the potential of new media to engage young people. additionally, tripp argues that librarians need to develop skills to train young people to use new media.
the idea that libraries are important in meeting this need is further supported by recent grants, totaling $1.2 million, from the john d. and catherine t. macarthur foundation to build "innovative learning labs for teens" in libraries. this grant making was a response to president obama's "educate to innovate" campaign, a nationwide effort to bring american students to the forefront in science and math.21 this literature review demonstrates that the body of research currently available focuses on digital natives and the digital divide, but that the research lacks the nuance needed to capture the complexity of social and cultural contexts surrounding the issue. this literature review further demonstrates both the importance of new media literacy and out-of-school learning, as well as the key role that libraries play in supporting these learning opportunities. the study provided here uses gis analysis to demonstrate important socioeconomic and cultural factors that surround libraries and library access. first, i describe the role of gis in understanding context. next, i describe the methods used in this paper. finally, i analyze the results and implications for the study.

geographic information systems analysis in education

there is a burgeoning body of research that uses geographic information systems (gis) to better understand socioeconomic and cultural contexts of education and literacy issues.22 there are several key works that link geography and social context. lefebvre defines space as socially produced, and he writes that space embodies social relationships shaped by values and meanings. he describes space as a tool for thought and action or as a means of control and domination. lefebvre writes that there is a need for spatial reappropriation in everyday urban life. the struggle for equality, then, is central to the "right of the city."23 the unequal distribution of resources in the city helps to maintain socially and economically advantaged positions, which is important to the analysis here of library access. this unequal distribution of resources continues today. de souza briggs and others write that there is clear geographical segregation in american cities today.24 this is seen in housing choice, racial attitudes, and discrimination, as well as metropolitan development and policy coalitions. in the conclusion of his book, de souza briggs writes that housing choice is limited for low-ses minorities, and these limitations produce myriad social effects. again, this finding is important to the contexts of where libraries are located. jargowsky writes of similar findings.25 like de souza briggs, jargowsky focuses on the role that geography plays in terms of neighborhood and poverty. jargowsky even identifies social characteristics of these neighborhoods: a higher prevalence of single-parent families, lower educational attainment, a higher level of dropouts, and more children living in poverty. important here, though, is that all such characteristics can be displayed geographically, which means that varying housing, economic, and social conditions can be displayed alongside library locations. soja goes beyond the geographic analysis offered by de souza briggs and jargowsky and writes that space should be applied to contemporary social theory.26 soja argues that spatiality, approached through critical human geography, can advance a theory of justice on multiple levels.
he writes that injustice is spatially construed and that this spatiality shapes social injustice as much as social injustice shapes a specific geography. this understanding, then, shapes how i approach the study of new media literacies as influenced by cultural and social factors. these factors are particularly prevalent in the st. louis, missouri, area. colin gordon reiterates the arguments of lefebvre, jargowsky, and de souza briggs in arguing that st. louis is a city in decline.27 by providing maps that project housing policies, gordon is able to provide a clear link between historical housing policies such as racial covenants and current urban decline. he shows that vast populations are moving out of st. louis city and into the county, resulting in a concentration of minority populations in the northern part of the city. gordon argues that the policies and programs offered by st. louis city have only exacerbated the problem and led to greater blight.28 in terms of literacy, morrell makes the most explicit connection between literacy and mapping with a study that used a community-asset mapping activity to make the argument that teachers need to make an explicit connection between literacy at school and the new literacies experienced in the community.29 the significance of this is that gis can be used to illuminate the social and economic contexts of new media literacy opportunities as well, which in turn could help inform social dialogue about the availability of and access to informal education opportunities for new media literacy.

methods and data

the gis analysis performed here concerns library locations in the st. louis metropolitan area, including st. louis city and st. louis county. the st. louis metropolitan area was chosen because of past research mapping the segregation of the city, largely because the city and county are so clearly segregated racially and economically along the north–south line. this segregation is striking when displayed geographically and illuminating when mapped with library location. maps were created using tiger files (www.census.gov/geo/maps-data/data/tiger.html) and us census data (http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml), both freely available to the public via internet download. libraries were identified using the st. louis city library's "libraries & hours" webpage (www.slpl.org/slpl/library/article240098545.asp), the st. louis county library "locations & hours" webpage (www.slcl.org/about/hours_and_locations), google maps (www.maps.google.com), and the yellow pages for the st. louis metropolitan area (www.yellowpages.com). the address of each library was entered into itouchmap (http://itouchmap.com) to identify the latitude and longitude of the library. a spreadsheet containing this information was then loaded into the gis software and displayed as x–y data. the maps were then displayed using median household income, african american population, and latino and hispanic population as obtained from the us census at the census tract level. for median household income, the data was from 1999. for all other census data, the year was 2010. for district-level data, communication arts data from the missouri department of elementary and secondary education (modese) website (http://dese.mo.gov/dsm) was entered into microsoft excel and then displayed on the maps.
the data is district level, representing all grades tested for communication arts across all district schools. the modese data was from 2008, the most recent year available at the time the analysis was performed. the communication arts data was taken from the missouri assessment program test. this test is given yearly across the state to all public school students. the state then collects the data and makes it available at the state, district, and school level. the data used here is district-level data. scores are broken into four categories: advanced, proficient, basic, and below basic. the groups for proficient and advanced were combined to indicate the district's success on the map test, as these are the two levels generally considered acceptable or passing by the state.30 before looking at patterns of library location and these socioeconomic and educational factors, density analysis was performed on the library locations using esri arcgis software, version 9.0, to analyze whether clustering was statistically significant. this analysis was used to demonstrate whether libraries were clustered in a statistically significant pattern or whether location was random. the nearest neighbor tool of arcgis was used to determine if a set of features, in this case the libraries, shows a statistically significant level of clustering. this was done by measuring the distance from each library to its single nearest neighbor and calculating the average distance of all the measurements. the tool then created a hypothetical set of data with the same number of features, but placed randomly within the study area. an average distance was calculated for these features and compared to the real data. that is, a hypothetical random set of locations was compared to the set of actual library locations. a nearest-neighbor index was produced, which expresses the ratio of the observed average distance to the average distance from the hypothetical data, thus comparing the two sets.31 this score was then standardized, producing a z-score, reported below in the results section.

results and conclusions

using the nearest neighbor tool produced a z-score of -3.08, showing that the data is clustered beyond the 0.01 significance level. this means that there is a less than 1 percent chance that library location would be clustered to this degree based on chance. knowing, then, that library location is not random, we can now examine socioeconomic patterns of the areas where libraries are located. figure 1 shows library location and population of individuals under the age of 18 at the census tract level for st. louis city and county, using data from the 2010 us census. to clarify, the city and county are divided by the bold black line crossing the middle of the map, the only such boundary in figure 1, where the county is the larger geographic area.
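the arithmetic behind the test just described can be made concrete. the following is a minimal sketch in perl of the standard clark-evans nearest-neighbor statistic to which the description above corresponds; it is offered as an illustration of the computation, not as the arcgis implementation, and the point list and study-area value are assumptions supplied by the caller.

#!/usr/bin/perl
use strict;
use warnings;

# returns the nearest-neighbor ratio (observed/expected mean distance;
# values below 1 suggest clustering) and its z-score under complete
# spatial randomness.
sub nearest_neighbor_z {
    my ($pts, $area) = @_;   # $pts: arrayref of [x, y]; $area: study area size
    my $n = scalar @$pts;
    my $sum = 0;
    for my $i (0 .. $n - 1) {
        my $min;
        for my $j (0 .. $n - 1) {
            next if $i == $j;
            my $dx = $pts->[$i][0] - $pts->[$j][0];
            my $dy = $pts->[$i][1] - $pts->[$j][1];
            my $d  = sqrt($dx * $dx + $dy * $dy);
            $min = $d if !defined($min) || $d < $min;
        }
        $sum += $min;   # distance to this point's single nearest neighbor
    }
    my $observed = $sum / $n;                      # mean observed distance
    my $density  = $n / $area;
    my $expected = 0.5 / sqrt($density);           # mean distance if random
    my $se       = 0.26136 / sqrt($n * $density);  # standard error
    return ($observed / $expected, ($observed - $expected) / $se);
}

a strongly negative z-score, like the -3.08 reported above, means the observed mean nearest-neighbor distance is well below what random placement would produce.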
library location is important because previous research shows that young people use informal learning environments to access new media technologies,32 and libraries are a key informal learning environment.33 this map demonstrates, however, that libraries are not located in the census tracts with the highest populations of individuals under the age of 18 in st. louis city and county. in fact, the tracts with the highest number of individuals under the age of 18 contain no libraries at all. this is especially concerning given that young people may have less access to transportation, so their access to facilities in neighboring census tracts may be quite limited.

figure 1. number of individuals under the age of 18 by census tract and library location in st. louis city and st. louis county. source: 2010 us census.

figure 2 includes maps showing library locations in st. louis city and county in terms of poverty and race at the census tract level, as well as act score by district, represented by the bold lines, where st. louis city is represented by a single district, the st. louis public school district. median household income is indicated by the gray shading, with white areas not having data available. first, census tracts with low median household income are clustered in the northern part of the city and county. there are four libraries in the northern half of the city, and eleven libraries in the central and southern parts of the city. there are fewer libraries in the census tracts with low median household income.

figure 2. median household income, act score, and library location, st. louis city and county. source: 2010 us census and missouri department of elementary and secondary education, 2010, www.modese.gov.

while the nearest neighbor analysis has already demonstrated that the libraries are significantly clustered, the maps suggest the shape of that clustering. this is especially concerning given the report by becker that 61 percent of those living below the poverty line use libraries to access the internet.34 first, in terms of median household income, it does appear that many libraries are located in higher income areas of the city and county. while the libraries appear to be clustered centrally, and particularly near major freeways, there appear to be libraries in many of the higher income census tracts. a further concern, beyond location itself, is access to these library locations. for those living below the poverty line, transportation is often a prohibitive cost, so access via public transportation should also be a major concern for libraries. additionally, in a pattern repeated in figure 4, the location of libraries does not appear to have any effect on act scores, but there are clearly higher act scores in wealthier areas of the city and county. this is not to claim a statistical relationship between act score and library location, but rather to look at the spatial patterns of each in order to note similarities and differences in those patterns. figure 3 shows library location by race, including african american or black and hispanic or latino. first, it is important to note that patterns of race in st. louis have been carefully documented by gordon.35 the st. louis area is clearly a highly segregated region, which makes the social contexts of libraries in the st.
louis area even more important. this map demonstrates that while there are many libraries in the northern parts of st. louis city and county, none of these libraries is located in the census tracts with the highest populations of those identifying themselves as african american or black in either the city or county. this raises questions about the inequality of access to the libraries. on the other hand, the densest populations of those identifying themselves as hispanic or latino are in the southern part of the city, but not the county. there is a library located in one of those tracts. it appears the areas with higher concentrations of african americans or blacks have fewer libraries, while areas with higher concentrations of latinos or hispanics are located in the southern parts of the city that do have libraries. it is important to note, however, that the concentrations of latinos and hispanics are quite low, and those areas are majority white census tracts. as noted above, beyond location, access from public transportation is also an important issue. at the same time, the clustering and patterns shown on these maps raise key issues about access based on income and race. libraries are not located in areas with low median household income or in areas with high concentrations of african americans or blacks. this raises serious questions about why libraries are located where they are, and whether the individuals located in these areas have equal access to library resources, particularly new media technologies.

figure 3. african american or black and hispanic, library location, st. louis city and county. source: 2010 us census.

the final map raises a slightly different issue, one of test scores and student achievement. figure 4 shows library location by percent proficient or advanced on the missouri assessment program test by district. beyond the location of the libraries, one factor that stands out is that the areas with the lowest percent proficient or advanced are also the areas with the lowest median household income and the highest percentage of those identifying as african american or black. here an interesting pattern emerges. while there are many libraries in the city and northern part of the county, the percent proficient or advanced on the communication arts portion of the exam is quite low (20–30 percent). on the other hand, in the western part of the county, there are few libraries, but the percent proficient or advanced is at its highest level. this suggests that there may not be a strong connection between achievement on the map exam and library location, similar to the lack of relationship seen between average act score and library location in figure 2. at the same time, there does appear to be a correlation between race, income, and test scores. this correlation is noted throughout the literature on student achievement.36 clearly, these maps raise important questions such as how and why libraries are located in a certain area, who uses libraries in a given area, as well as what other informal learning environments and community assets exist in these areas. what is made clear by the maps, though, is that gis can be used as a tool to help understand the context of new media literacy.

figure 4. proficient or advanced, communication arts map by district, 2009, and library location.
source: missouri department of elementary and secondary education, 2010, www.modese.gov.

significance

these results demonstrate that gis can be used to illuminate the social, cultural, and economic complexity that surrounds informal learning environments, particularly libraries. this can help demonstrate not only where young people have the opportunity to use new media literacy, but also the complex contextual factors surrounding those opportunities. paired with traditional qualitative and quantitative work, gis can provide an additional lens for understanding new media literacy ecologies, which can help inform dialogue about this topic. as for the results of this study, there does appear to be a relationship between library location and race and income. this study illuminates the complex contextual factors affecting libraries. because of the important role that libraries can play in offering young people out-of-school learning opportunities, particularly in terms of access to new media resources, these contextual factors are important to ensuring equal access and opportunity for all.

references

1. ernest morrell, "critical approaches to media in urban english language arts teacher development," action in teacher education 33, no. 2 (2011): 151–71, doi: 10.1080/01626620.2011.569416.
2. mizuko ito et al., hanging out, messing around, and geeking out: kids living and learning with new media (cambridge: mit press/macarthur foundation, 2010).
3. morrell, "critical approaches to media in urban english language arts teacher development."
4. lisa tripp, "digital youth, libraries, and new media literacy," reference librarian 52, no. 4 (2011): 329–41, doi: 10.1080/02763877.2011.584842.
5. gunther kress, literacy in the new media age (london: routledge, 2003).
6. ibid.
7. donna e. alvermann and alison h. heron, "literacy identity work: playing to learn with popular media," journal of adolescent & adult literacy 45, no. 2 (2001): 118–22.
8. colin lankshear and michele knobel, new literacies: everyday practices and classroom learning (maidenshead: open university press, 2006).
9. john palfrey and urs gasser, born digital: understanding the first generation of digital natives (new york: perseus, 2009).
10. karin m. wiburg, "technology and the new meaning of educational equity," computers in the schools 20, no. 1–2 (2003): 113–28, doi: 10.1300/j025v20n01_09.
11. rob kling, "learning about information technologies and social change: the contribution of social informatics," information society 16, no. 3 (2000): 212–24.
12. james r. valadez and richard p. durán, "redefining the digital divide: beyond access to computers and the internet," high school journal 90, no. 3 (2007): 31–44, http://www.jstor.org/stable/40364198.
13. eszter hargittai, "digital na(t)ives? variation in internet skills and uses among members of the 'net generation,'" sociological inquiry 80, no. 1 (2010): 92–113, doi: 10.1111/j.1475-682x.2009.00317.x.
14. glynda hull and katherine schultz, "literacy and learning out of school: a review of theory and research," review of educational research 71, no. 4 (2001): 575–611, http://www.jstor.org/stable/3516099.
15. colin lankshear and michele knobel, new literacies.
16.
allan luke, "literacy and the other: a sociological approach to literacy research and policy in multilingual societies," reading research quarterly 38, no. 1 (2003): 132–41, http://www.jstor.org/stable/415697.
17. stephanie collins, "breadth and depth, imports and exports: transactions between the in- and out-of-school literacy practices of an 'at risk' youth," in cultural practices of literacy: case studies of language, literacy, social practice, and power (mahwah, nj: lawrence erlbaum, 2007).
18. allison skerrett and randy bomer, "borderzones in adolescents' literacy practices: connecting out-of-school literacies to the reading curriculum," urban education 46, no. 6 (2011): 1256–79, doi: 10.1177/0042085911398920.
19. samantha becker et al., opportunity for all: how the american public benefits from internet access at u.s. libraries (washington, dc: institute of museum and library services).
20. lisa tripp, "digital youth, libraries, and new media literacy."
21. nora fleming, "museums and libraries awarded $1.2m to build learning labs," education week (blog), december 7, 2012, http://blogs.edweek.org/edweek/beyond_schools/2012/12/museums_and_libraries_awarded_12_million_to_build_learning_labs_for_youth.html.
22. see william f. tate iv and mark hogrebe, "from visuals to vision: using gis to inform civic dialogue about african american males," race ethnicity and education 14, no. 1 (2011): 51–71, doi: 10.1080/13613324.2011.531980; mark c. hogrebe and william f. tate iv, "school composition and context factors that moderate and predict 10th-grade science proficiency," teachers college record 112, no. 4 (2010): 1096–1136; robert j. sampson, great american city: chicago and the enduring neighborhood effect (chicago: university of chicago press, 2012).
23. henri lefebvre, the production of space (oxford: blackwell, 1991).
24. xavier de souza briggs, the geography of opportunity: race and housing choice in metropolitan america (washington, dc: brookings institute press, 2005).
25. paul jargowsky, poverty and place: ghettos, barrios, and the american city (new york: russell sage foundation, 1997).
26. edward w. soja, postmodern geographies: the reassertion of space in critical social theory (new york: verso, 1989).
27. colin gordon, mapping decline: st. louis and the fate of the american city (university of pennsylvania press, 2008).
28. ibid.
29. ernest morrell, "critical approaches to media in urban english language arts teacher development."
30. missouri department of elementary and secondary education, http://dese.mo.gov/dsm/.
31. david allen, gis tutorial ii: spatial analysis workbook (redlands, ca: esri press, 2009).
32. becker et al., opportunity for all: how the american public benefits from internet access at u.s. libraries.
33. lisa tripp, "digital youth, libraries, and new media literacy."
34. becker et al., opportunity for all: how the american public benefits from internet access at u.s. libraries.
35. colin gordon, mapping decline: st. louis and the fate of the american city.
36.
see mwalimu shujaa, beyond desegregation: the politics of quality in african american schooling (thousand oaks, ca: corwin, 1996); william j. wilson, the truly disadvantaged: the inner city, the underclass, and public policy (chicago: university of chicago press, 1987); gary orfield and mindy l. kornhaber, raising standards or raising barriers: inequality and high-stakes testing in public education (new york: century foundation, 2010).

information technology and libraries | september 2008
communications
james feher and tyler sondag
administering an open-source wireless network

this tutorial presents enhancements to an open-source wireless network discussed in the june 2007 issue of ital that should reduce its administrative burden. in addition, it will demonstrate an open-source monitoring script written for the wireless network. as it has become increasingly important to provide wireless internet access for patrons, libraries and colleges are almost expected to offer this service. inexpensive methods of providing wireless access—such as adding a commodity wireless access point to an existing network—can suffer from security issues, access by external entities, and bandwidth abuses. designs that address these issues often involve more costly proprietary hardware as well as expertise and effort that are often not readily available. a wireless network built with open-source software and commodity hardware that addressed the cost, security, and equal-access issues mentioned above was presented in the june 2007 issue of ital.1 this tutorial highlights enhancements to the previous design that help to explain the technical hurdles in implementation, and includes a program that monitors the status of the various software and hardware components, helping to reduce the time required to administer the network. the wireless network presented requires several different pieces of software that must work together. because each of the required software programs is frequently updated, slight changes to the implementation may also be needed. a few issues that have arisen since the previous paper was written are addressed. a note is provided explaining the significance of setting the correct media access control (mac) address for the radius server and for the wireless distribution system (wds) when configuring the system. in addition, in order to provide secure exchange of authentication credentials (username and password), the secure socket layer was used. a brief explanation of how to install a registered certificate on the gateway server is provided. lastly, a program that monitors the status of the network, provides a web page displaying the status of the various hardware and software components, and e-mails administrators with any changes to the network status—along with information on how this program is to be deployed within the network—is presented.

configuration changes for previous design

as new exploits are discovered and patched on a continual basis, any system should be regularly updated to ensure that the most recent software is being used.
the network design provided in the previous article used many different software components, including but not limited to:

access point software: openwrt—whiterussian rc3
dns cache: dnsmasq v2.32
gateway: chillispot v1.0
operating system: fedora core 4
radius server: freeradius v1.0.4
web caching server: squid v2.5
web server: apache 2.2.3

many of these components can be kept up-to-date by using the yellow dog updater, modified (yum).2 for example, to update a given package, with root access, at the command line enter:

yum update packagename

the yum command may also be used to update each package that has an available update by simply removing the package name from the command and entering the following:

yum update

yum may also be used to upgrade the entire operating system.3 keep in mind that with any change in software, the configuration of any particular package may change as well. for example, the newest version of squid is currently 2.6. appendix d in the previous paper explained how to allow transparent relay of web requests so that client browsers did not have to be reconfigured. while version 2.5 required four changes to allow the transparent relay, the current version—found in appendix a—requires only one. in addition to changes in software, occasionally even entire websites move, as happened with chillispot.4 another change involved the configuration of the linksys wrt54gs access points. the newer versions of this access point/router sold by linksys have half the flash memory and half of the ram of the older versions.5 while the newer versions of the linksys wrt54gs can be flashed with custom firmware,6 the firmware that will fit on the newer unit lacks some of the capability of the standard firmware. given this, those wishing to implement such a wireless network should investigate the capability of the models to be deployed, as well as the version numbers of the access points chosen. the current versions of the linksys wrt54gl and wrtsl54gs units retain enough flash memory and ram to be updated with the standard firmware mentioned in the previous article.7 james feher (jdfeher@mckendree.edu) is associate professor of computer science and computer information systems at mckendree university, lebanon, illinois. tyler sondag (sondag@cs.iastate.edu) is a phd candidate in computer science at iowa state university, ames. in addition, the procedure for upgrading the firmware for the wrtsl54gs is simpler than the procedure outlined in appendix i of the previous paper. the factory-installed firmware on version 1.1 can be flashed directly using the web interface provided by linksys. so, while this tutorial and the previous paper outline the design of a network, the administrator will need to be vigilant in updating the packages used and keep in mind that the configuration specifications may also change with those updates. the administrator for the network must also investigate the capability of the standard hardware used to ensure that it retains the functionality required for the system.

choosing the correct mac address for the access point

the access points used will have more than one interface and as such more than one mac address.
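to see which mac address belongs to which interface, the interfaces on the access point can be listed from its shell. a minimal sketch, assuming ssh access to the openwrt command line (the address and any interface names are illustrative; the wireless interface name varies by model and firmware):

ssh root@192.168.182.10
ifconfig | grep -i hwaddr

each interface is listed with its hardware address; the entry for the wireless interface, rather than the lan bridge or wan port, is the one to record in the radius users file or the wds settings.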
when entering the mac address of a given access point into either the users file for the radius server or the access points that use the wds, use the mac address associated with the wireless interface.8 using the incorrect mac address will result in problems when communicating with the various access points. for the radius server, the access point will not get the correct ip address, which will prevent remote administration of the unit. incorrect mac addresses used for the wds settings will cause even worse problems, as the unit will not be able to relay data from users who connect to that access point.

installation of a registered ssl certificate

as users are required to enter their authentication credentials to gain access to the internet, the exchange of this data is encrypted using the secure socket layer.9 while administrators can self-sign the certificates used for their web servers, it is recommended that a registered certificate be obtained and installed for the system. this can help prevent common attacks and has the added benefit of eliminating warnings from client browsers when they detect unregistered certificates. a search for "ssl certificate" will yield any number of commercial vendors from which a certificate can be obtained. generally the installation of a certificate is fairly straightforward. the openssl command line utility can be used to generate an ssl key and certificate signing request (csr); example commands are shown below.10 once the csr is generated, pick a vendor/certificate authority who can sign your key. it should be noted that the design presented required the authentication gateway to be behind the main router. this required a certificate to be signed for a server within an intranet that does not have a fully qualified domain name. so, when generating the ssl key and csr, make sure to use gatewayhostname.localnet as the common name of your server. of course, gatewayhostname is whatever you choose as the name of your gateway host. the term localnet is used to refer to the server existing within an intranet. then make sure to place an entry for gatewayhostname.localnet into the hosts file of the server that is providing domain name service for your network. an example entry for the hosts file, which is in the /etc directory of a standard fedora core installation, is found in appendix b.

monitoring script for wireless network

as the wireless network has many separate hardware and software components, many possible points of failure exist for the system. the script from appendix c, which was written in perl,11 uses ping to test whether each access point is still connected to the network and nmap to test whether the port associated with a given network service is still available.12 this program can be run manually or, even better, run automatically through the unix cron utility to update a webpage that displays the current state of all the network components; a sample crontab entry is shown below. the webpage generated by this script for the mckendree college wireless network may be found at http://lance.mckendree.edu/cgi-bin/wireless/status.cgi. (additionally, a sample of this page is available as a figure in appendix d.) this script actually contains a script within a script. the main script must be run on the gateway machine, chilli on the diagram in appendix e, as only this machine has access to ping the access points. when the script determines that an access point or daemon is down, it will e-mail the system administrator.
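a minimal sketch of the key and csr generation with the standard openssl commands (the file names are illustrative; openssl prompts for the certificate fields, and gatewayhostname.localnet should be given as the common name):

openssl genrsa -out gateway.key 1024
openssl req -new -key gateway.key -out gateway.csr

and a sample crontab entry for the gateway machine (the script path is an assumption) that runs the monitor every three minutes, consistent with the script's note in appendix c that $updatemin should be roughly three times the cron interval:

*/3 * * * * /root/wireless/monitor.pl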
when an access point is down, in addition to sending the system administrator an e-mail, the script can also send notification to an e-mail address associated with that device. this allows someone other than the system administrator, someone who may have closer physical access to the unit, to check the access point on behalf of the administrators for simple issues, such as the access point losing power. this script then generates another cgi script that can be transmitted to an external server that can be reached from anywhere on the internet. this generated script can be run as a web-based application or by the system itself using the cron utility. if run by the cron daemon, it will also e-mail the administrators if the script has not been updated recently. the script requires the use of several perl modules that will need to be installed:

expect
mail::mailer
net::ping

the script has been released using the gnu general public license, version 2 (gpl).13 the first portion of the script contains a reference to the gpl, followed by a brief explanation of the script as well as a set of parameters that should be changed to fit the specifications of the network designed.

conclusion

administrators should be vigilant in updating the entire system to assure security, keeping in mind that new versions of software or hardware may necessitate changes in the overall configuration of the system. in addition, while the monitoring script provides a useful aid in monitoring the network, it could be further expanded to include a more comprehensive review of the level of use of the various access points by different users. this would be best done through a database, which would require a higher level of administrative effort. a brief frequently asked questions list, along with the script and a link to the code for the script, can be found at http://lance.mckendree.edu/csi/wirelessfaq.html.

references

1. tyler sondag and james feher, "open source wifi hotspot implementation," information technology and libraries 26, no. 2 (2007): 35–43, http://ala.org/ala/lita/litapublications/ital/262007/2602jun/toc.cfm (accessed july 24, 2008).
2. linux@duke, "yum: yellow dog updater, modified," http://linux.duke.edu/projects/yum (accessed july 24, 2008).
3. upgrading fedora using yum frequently asked questions, http://fedoraproject.org/wiki/yumupgradefaq (accessed mar. 16, 2007).
4. chillispot—open source wireless lan access point controller, "spice up your hotspot with chilli," www.chillispot.info/ (accessed may 22, 2008).
5. openwrtdocs/hardware/linksys/wrt54gs—openwrt, http://wiki.openwrt.org/openwrtdocs/hardware/linksys/wrt54gs (accessed july 24, 2008).
6. bitsum technologies wiki—wrt54g5 cfe, http://bitsum.com/openwiking/owbase/ow.asp?wrt54g5_cfe (accessed july 24, 2008).
7. openwrtdocs/hardware/linksys/wrtsl54gs—openwrt, http://wiki.openwrt.org/openwrtdocs/hardware/linksys/wrtsl54gs (accessed july 24, 2008).
8. openwrtdocs/whiterussian/configuration, wireless distribution system (wds)/repeater/bridge, http://wiki.openwrt.org/openwrtdocs/whiterussian/configuration (accessed july 24, 2008).
9. john viega, matt messier, and pravir chandra, network security with openssl: cryptography for secure communications (sebastopol, calif.: o'reilly and associates, 2002).
10. generating a key pair and csr for an apache server with modssl,
www.verisign.com/support/tlc/csr/modssl/v00.html (accessed feb. 20, 2007).
11. larry wall, tom christiansen, and randal schwartz, programming perl, third edition (sebastopol, calif.: o'reilly and associates).
12. nmap—free security scanner for network exploration and security audits, http://insecure.org/nmap/ (accessed feb. 20, 2007).
13. gnu general public license version 2, june 1991, www.gnu.org/licenses/gpl.txt.

appendix a. squid configuration changes

# changes made to squid.conf
# lines needed for squid 2.5
#httpd_accel_port 80
#httpd_accel_host virtual
#httpd_accel_with_proxy on
#httpd_accel_uses_host_header on
#
# one line needed in version 2.6
http_port 3128 transparent

appendix b. /etc/hosts entry on marla for localnet entry

127.0.0.1      marla localhost.localdomain localhost
66.128.109.60  bob
66.99.172.252  lance.mckendree.edu lance
# next line is for the ssl certificate to work properly
192.168.176.1  chilli.localnet chilli

appendix c. monitoring script

#!/usr/bin/perl
#########################################################
# code released 03/22/07 under:
# the gnu general public license, version 2
# http://www.gnu.org/licenses/gpl.txt
#
# it is recommended that this script is run as a cron
# job frequently to find changes in the network. this
# script will check the status of the wireless access
# points/routers as well as the daemons necessary to
# run the network. it will then output the results to
# another perl file that is copied to a remote
# webserver. when the script observes a change in the
# availability of any access point or daemon, email
# will be sent to the specified administrator
# address(es). the option exists to send an email to
# an additional person for each access point.
#
# additionally, the output file on the remote webserver
# will check when it was last updated, if that script
# is run from the command line or via cron. if it has
# not been updated for a specified number of minutes,
# it will send an email to the administrator. it is
# also recommended that this output script be run as a
# cron job. this output script can also be executed
# as a cgi program to generate a display of network
# status.
#########################################################

use strict;
use Expect;        # needed to scp to webserver
use Mail::Mailer;  # needed to send emails if outages
use Net::Ping;     # needed to check the status of aps

# variables for webserver to host status pages
my $webservuname = "username";
my $webservpass  = "password";
my $webservurl   = "lance.mckendree.edu";
my $webservtarg  = "/var/www/cgi-bin/wireless/";
my $weboutputurl = "http://lance.mckendree.edu/cgi-bin/wireless/status.cgi";
my $instname     = "mckendree college";

# default background color of the status page
my $defbgcolor = "#660066";

# if the page on the webserver has not been updated in
# $updatemin minutes, send an email that the service is
# down (set to =~ 3*crontime)
my $updatemin = 10;

# email address errors will be sent to
my $fromemail = 'admin1@email.com';
my $toemail   = 'admin1@email.com, admin2@email.com';

# file where errors will be stored on remote host
my $logfilename = "/tmp/wireleslog.txt";

# hash for routers/aps
# location is displayed on the webpage and in status emails;
# changes in status regarding an ap are also sent to its
# owner address (optional)
my %iptoloc = (
    "192.168.182.10" => { "location" => "clark 205",     "owner" => '' },
    "192.168.182.11" => { "location" => "clark 202a",    "owner" => 'apuser1@email.com' },
    "192.168.182.12" => { "location" => "pac lounge",    "owner" => 'apuser2@email.com' },
    "192.168.182.20" => { "location" => "library main",  "owner" => 'apuser3@email.com' },
    "192.168.182.21" => { "location" => "library upper", "owner" => '' },
    "192.168.182.22" => { "location" => "library lower", "owner" => '' },
    "192.168.182.30" => { "location" => "carnegie",      "owner" => 'apuser4@email.com' });

# hash for daemons
my %daemons = (
    "dnsmasq dns server"  => { "ip_addr" => "10.4.1.90", "port" => "53",   "proto" => "tcp" },
    "radius authenticate" => { "ip_addr" => "10.4.1.90", "port" => "1812", "proto" => "udp" },
    "chilli capt. portal" => { "ip_addr" => "10.5.3.30", "port" => "0",    "proto" => "local" },
    "squid web cache"     => { "ip_addr" => "10.4.1.90", "port" => "3128", "proto" => "tcp" },
    "apache web server"   => { "ip_addr" => "10.5.3.30", "port" => "80",   "proto" => "tcp" });

########################################################
#                                                      #
#  no changes need to be made to the following code    #
#                                                      #
########################################################

# get the current time
my $currenttime = scalar localtime();
my $starttime = time();

# open old output status script to get previous status'
open(old, "status.cgi");
my @tmpoldstatfile = <old>;
my $oldstatfile = join("", @tmpoldstatfile);

# check routers/aps using ping
my $diff = '';
my $allrouterstat;
foreach my $host (sort keys %iptoloc){
    my $p = Net::Ping->new();
    my $pingresult = $p->ping($host);
    if(!$pingresult){
        sleep 10;
        $pingresult = $p->ping($host);
    }
    my $thislaststat = ( $oldstatfile =~ m/$iptoloc{$host}{location}<\/td>/ );
    # ... (remainder of the loop body was lost in extraction; it calls
    # &printstatus to build the table row and appends to $allrouterstat)
    $p->close();
}

# check the status of each daemon
my $alldaemonstat = '';
foreach my $i (sort keys %daemons){
    my $thislaststat = ( $oldstatfile =~ m/$i<\/td>/ );
    # ... (loop body lost in extraction; it calls &checkdaemon and
    # &printstatus and appends to $alldaemonstat)
}

# build the status script that is copied to the remote webserver
# (heredoc start reconstructed; the generated script's preamble,
# which sets \$currenttime, \$lasttime, and \$currentuser, was lost
# in extraction)
my $perloutput = <<"output_file_for_remote_host";
#!/usr/bin/perl
if (\$currenttime > (\$lasttime + (60 * $updatemin))){
    \$systemstatus = "#ff0000";
    \$message = "status update failed";
}
# if this is cron running the script
if (\$currentuser =~ "$webservuname"){
    # send email if status is down & logfile doesn't exist
    &sendemail() if ( (\$systemstatus =~ "#ff0000") && !(-e "$logfilename") );
    # delete log file if everything is up
    unlink("$logfilename") if ( (!(\$systemstatus =~ "#ff0000")) && (-e "$logfilename") );
}
# else apache is accessing the page (it's a web request)
else{
    # print the page
    print header();
    ############################
    #   start of html output   #
    ############################
    print <<web_output;
<!-- html markup lost in extraction; the page titled
     "$instname wireless status" shows \$message, then
     $allrouterstat under an "access point status" heading,
     $alldaemonstat under a "daemon status" heading, and
     "last updated $currenttime" -->
web_output
    ##########################
    #   end of html output    #
    ##########################
} # end else

sub sendemail {
    my \$mailer = Mail::Mailer->new("sendmail");
    \$mailer->open({from => '$fromemail',
                   to => [\$toemail],
                   subject => "wireless problem"});
    my \$message = "the wireless system has failed to "
                 . "update its status.\n\n$weboutputurl\n";
    print \$mailer \$message;
    \$mailer->close();
    open(file, ">>$logfilename");
    print file "failed to update system.";
    close(file);
}
output_file_for_remote_host
########################################################
#            end of script output block                #
########################################################

# write output code to the file
my $perloutputfile = "status.cgi";
open (out, ">$perloutputfile");
print out $perloutput;
close (out);
chmod 0755, $perloutputfile;

# send email if necessary
&sendemail($diff, $weboutputurl, $fromemail, $toemail) if ($diff);

# send perl file to webserver
&scpfile($perloutputfile, $webservuname, $webservpass,
         $webservurl, $webservtarg);

################################################
#                                              #
#    end main code block, start functions      #
#                                              #
################################################

# given the name and status of something (ap or
# daemon), this returns a string for the table
# row for displaying the status of the ap/daemon
sub printstatus {
    my ($service, $status, $oldstatus, $owner, $toemail,
        $oldstatusfile, $currenttime ) = @_;
    my $msg = "";
    # (row markup partly lost in extraction; cell tags reconstructed)
    my $statusline = "<tr>\n<td>$service</td>";
    # if current status is up
    if ($status){
        $statusline .= "<td class=\"up\">up</td>";
        # if last two status' were down
        if ($oldstatusfile =~ m/\($service\)-0--->/){
            $msg = "$service back up at $currenttime\n";
            # if service has owner & not already in mail list,
            # add owner to mail list
            $toemail .= ", \'$owner\'"
                if ($owner && (!($toemail =~ $owner)));
        }
    }
    # else current status is down
    else{
        $statusline .= "<td class=\"down\">down</td>";
        # if last status was down & before that status was up
        if ($oldstatusfile =~ m/\($service\)-0-1-->/){
            $msg = "$service down at $currenttime\n";
            # if service has owner & not already in mail list,
            # add owner to mail list
            $toemail .= ", \'$owner\'"
                if ($owner && (!($toemail =~ $owner)));
        }
    }
    $statusline .= "</tr>";
    return ($statusline, $toemail, $msg);
} # end printstatus function

# checks the status for the given daemon
# takes in ip, port to check, daemon name, and protocol
# (tcp/udp). if given port=0 it checks for local daemon
sub checkdaemon {
    my ($ip, $port, $daemon, $proto) = @_;
    my $dstat = 0;
    if ($proto !~ /local/){
        # -su checks for udp ports
        my $com = ($proto =~ "tcp") ?
            ("nmap -p $port $ip | grep $port") :
            ("nmap -su -p $port $ip | grep $port");
        open(tmp, "$com|");
        my $comout = <tmp>;
        close(tmp);
        if ($comout =~ /open/){
            $dstat = 1;   # if port is open, status is up
        }
    }
    else{
        $daemon =~ s/ +.*//g;
        # \l lowercases the first letter of $daemon
        my $com = "which \l$daemon";
        open(tmp, "$com|");
        my $comout = <tmp>;
        close(tmp);
        $com = "ps aux | awk '{print \$11}' | grep $comout";
        open(tmp, "$com|");
        $comout = <tmp>;
        close(tmp);
        $dstat = 1 if ($comout);
    }
    return $dstat;
} # end checkdaemon function

# send the output perl status file to the webserver
sub scpfile {
    my ($filepath, $webservuname, $webservpass,
        $webservurl, $webservtarg ) = @_;
    my $command = "scp $filepath $webservuname"
                . "\@$webservurl:$webservtarg";
    my $exp1 = Expect->spawn($command);
    # the first argument "30" may need to be adjusted
    # if your system has very high latency
    my $ret = $exp1->expect(30, "word:");
    print $exp1 "$webservpass\r";
    $ret = $exp1->expect(undef);
    $exp1->close();
} # end scpfile function

# send an email to the admin & append error to log file
sub sendemail {
    my ($errorlist, $weboutputurl, $fromemail, $toaddresses ) = @_;
    my $mailer = Mail::Mailer->new("sendmail");
    $mailer->open({from => "$fromemail",
                   to => [$toaddresses],
                   subject => "wireless problem"});
    $errorlist .= "\n\n$weboutputurl";
    print $mailer $errorlist;
    $mailer->close();
} # end sendemail function

appendix d. script output page

appendix e. diagram of network

marc truitt
editorial: computing in the "cloud"
silver lining or stormy weather ahead?
marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

cloud computing. remote hosting. software as a service (saas). outsourcing. terms that all describe various parts of the same it elephant these days. the sexy ones—cloud computing, for example—emphasize new age-y, "2.0" virtues of collaboration and sharing with perhaps slightly mystic overtones: exactly where and what is the "cloud," after all? others, such as the more utilitarian "remote hosting" and "outsourcing," appeal more to the bean counters and sustainability-minded among us. but they're really all about the same thing: the tradeoff between cost and control. that the issue increasingly resonates with it operations at all levels these days can be seen in various ways. i'll cite just a few:

- at the meeting of the lita heads of library technology (holt) interest group at the 2009 ala annual conference in chicago, two topics dominated the list of proposed holt programs for the 2010 annual conference. one of these was the question of virtualization technology, and the other was the whole white hat–black hat dichotomy of the cloud.1 practically everyone in the room seemed to be looking at—or wanting to know more about—the cloud and how it might be used to benefit institutions.

- my institution is considering outsourcing e-mail. all of it—to google. times are tough, and we're being told that by handing e-mail over to the googleplex, our hardware, licensing, evergreening, and technical support fees will total zero. zilch. with no advertising. heady stuff when your campus hosts thirty-plus central and departmental mail servers, at least as many blackberry servers, and total costs in people, hardware, licensing, and infrastructure are estimated to exceed can$1,000,000 annually.
■ in the last couple of days, library electronic discussion lists such as web4lib have been abuzz—or do we now say a-twitter?—about amazon’s orwellian kindle episode, in which the firm deleted copies of 1984 and animal farm from subscribers’ kindle e-book readers without their knowledge or consent.2 indeed, amazon’s action was in violation of its own terms of service, in which the company “grants [the kindle owner] the non-exclusive right to keep a permanent copy of the applicable digital content and to view, use, and display such digital content an unlimited number of times, solely on the device or as authorized by amazon as part of the service and solely for [the kindle owner’s] personal, noncommercial use.”3

all of this has me thinking back to the late 1990s marketing slogan of a manufacturer of consumer-grade mass storage devices—remember removable hard drives? iomega launched its advertising campaign for the 1 gb jaz drive with the catch-line “because it’s your stuff.” ultimately, whether we park it locally or send it to the cloud, i think we need to remember that it is our stuff. what i fear is that in straitened times, it becomes easy to forget this as we struggle to balance limited staff, infrastructure, and budgets. we wonder how we’ll find the time and resources to do all the sexy and forward-looking things, burdened as we are with the demands of supporting legacy applications, “utility” services, and a huge and constantly growing pile of all kinds of content that must be stored, served up, backed up (and, we hope, not too often, restored), migrated, and preserved.

the buzz over the cloud and all its variants thus has a certain siren-like quality about it. the notion of signing over to someone else’s care—for little or no apparent cost—our basic services and even our own content (our stuff) is very appealing. the song is all the more persuasive in a climate where we’ve moved from just the normal bad news of merely doing more with less to a situation where staff layoffs are no longer limited to corporate and public libraries, but indeed extend now to our greatest institutions.4

at the risk of sounding like a paranoid naysayer to what might seem a no-brainer proposition, i’d like to suggest a few test questions for evaluating whether, how, and when we send our stuff into the cloud:
1. why are we doing this? what do we hope to gain?
2. what will it cost us? bear in mind that nothing is free—except, in the open-source community, where free beer is, unlike kittens, free. if, for example, the borg offer to provide institutional mail without advertisements, there is surely a cost somewhere. the borg, sensibly enough, are not in business to provide us with pro bono services.
3. what is the gain or loss to our staff and patrons in terms of local customization options, functionality, access, etc.?
4. how much control do we have over the service offered or how our content is used, stored, repurposed, or made available to other parties?
5. what’s the exit strategy? what if we want to pick up and move elsewhere? can we reclaim all of our stuff easily and portably, leaving no sign that we’d ever sent it to the cloud?

we are responsible for the services we provide and for the content with which we have been entrusted.
we cannot shrug off this duty by simply consigning our services and our stuff to the cloud. to do so leaves us vulnerable to an irreparable loss of credibility with our users; eventually some among them would rightly ask, “so what is it that you folks do, anyway?” we’re responsible for it—whether it’s at home or in the cloud—because it’s our stuff. it is our stuff, right?

references and notes
1. i should confess, in the interest of full disclosure, that it was eli neiburger of the ann arbor district library who suggested “hosted services as savior or slippery slope” for next year’s holt program. i’ve shamelessly filched eli’s topic, if not his catchy title, for this column. thanks, eli. also, again in the interest of full disclosure, i suggested the virtualization topic, which eventually won the support of the group. finally, some participants in the discussion observed that virtualization technology and hosting are in many ways two sides of the same topical coin, but i’ll leave that for others to debate.
2. brad stone, “amazon erases orwell books from kindle,” new york times, july 17, 2009, http://www.nytimes.com/2009/07/18/technology/companies/18amazon.html?_r=1 (accessed july 21, 2009).
3. amazon.com, “amazon kindle: license agreement and terms of use,” http://www.amazon.com/gp/help/customer/display.html?nodeid=200144530 (accessed july 21, 2009).
4. “budget cutbacks announced in libraries, center for professional development,” stanford university news, june 10, 2009, http://news.stanford.edu/news/2009/june17/layoffs-061709.html (accessed july 22, 2009); “harvard libraries cuts jobs, hours,” harvard crimson (online edition), june 26, 2009, http://www.thecrimson.com/article.aspx?ref=528524 (accessed july 22, 2009).

michael jay, betsy simpson, and doug smith
catqc and shelf-ready material: speeding collections to users while preserving data quality
michael jay (emjay@ufl.edu) is information technology expert, software unit, information technology department; betsy simpson (betsys@uflib.ufl.edu) is chair, cataloging and metadata department; and doug smith (dougsmith@uflib.ufl.edu) is head, copy cataloging unit, cataloging and metadata department, george a. smathers libraries, university of florida, gainesville.

libraries contract with vendors to provide shelf-ready material, but is it really shelf-ready? it arrives with all the physical processing needed for immediate shelving, then lingers in back offices while staff conduct item-by-item checks against the catalog. catqc, a console application for microsoft windows developed at the university of florida, builds on oclc services to get material to the shelves and into the hands of users without delay and without sacrificing data quality. using standard c programming, catqc identifies problems in marc record files, often applying complex conditionals, and generates easy-to-use reports that do not require manual item review.

a primary goal behind improvements in technical service workflows is to serve users more efficiently. however, the push to move material through the system faster can result in shortcuts that undermine bibliographic quality. developing safeguards that maintain sufficiently high standards but don’t sacrifice productivity is the modus operandi for technical service managers.
the implementation of oclc’s worldcat cataloging partners (wcp, formerly promptcat) and bibliographic record notification services offers an opportunity to retool workflows to take advantage of automated processes to the fullest extent possible, but also requires some backroom creativity to assure that adequate access to material is not diminished.

■ literature review

quality control has traditionally been viewed as a central aspect of cataloging operations, either as part of item-by-item handling or manual and automated authority maintenance. how this activity has been applied to outsourced cataloging was the subject of a survey of academic libraries in the united states and canada. a total of 19 percent of libraries in the survey indicated that they forgo quality control of outsourced copy, primarily for government documents records. however, most respondents reported they review records for errors. of that group, 50 percent focus on access points, 30 percent check a variety of fields, and a significant minority—20 percent—look at all data points. overall, the libraries expressed satisfaction with the outsourced cataloging using the following measures of quality supplied by the author: accuracy, consistency, adequacy of access points, and timeliness.1

at the inception of oclc’s promptcat service in 1995, ohio state university libraries participated in a study to test similar quality control criteria with the stated goals of improving efficiency and reducing copyediting. the results were so favorable that the author speculated that promptcat would herald a future where libraries can “reassess their local practices and develop greater confidence in national standards so that catalog records can be integrated into local opacs with minimal revision and library holdings can be made available in bibliographic databases as quickly as possible.”2 fast forward a few years and the new incarnation of promptcat, wcp, is well on its way to fulfilling this dream.

in a recent investigation conducted at the university of arkansas libraries, researchers concluded that error review of copy supplied through promptcat is necessary, but the error rate does not warrant discontinuance of the service. the benefits in terms of time savings far outweigh the effort expended to correct errors, particularly when the focus of the review is to correct errors critical to user access. while the researchers examined a wide variety of errors, a primary consideration was series headings, particularly given the problems cited in previous studies and noted in the article.3 with the 2006 announcement by the library of congress (lc) to curtail its practice of providing controlled series access, the cataloging community voiced great concern about the effect of that decision on user access.4 the arkansas study determined that “the significant number of series issues overall (even before lc stopped performing series authority work) more than justifies our concern about providing series authority control for the shelf-ready titles.” approximately one third of the outsourced copy across the three record samples studied had a series, and, of that group, 32 percent needed attention, predominantly taking the form of authority record creation with associated analysis and classification decisions.5

the overwhelming consensus among catalogers is that error review is essential. as far as can be determined, an underlying premise behind such efforts seems to be that it is done with the book in hand.
but could there be a way to satisfy the concerns without the book in hand? certainly, validation tools embedded in library management systems provide protections whether records are manually entered or batchloaded, and outsourced authority maintenance services (for those who can use them) offer further control. but a customizable tool that allows libraries to target specific needs, both standards-based and local, without relying on item-by-item handling can contribute to an economy of scale demanded by an environment with shrinking budgets and staff to devote to manual bibliographic scrutiny. if that tool is viewed as part of a workflow stream involving local error detection at the receiving location as well as enhancement at the network level (i.e., oclc’s bibliographic record notification service), then it becomes an important step in freeing catalogers to turn their attention to other priorities, such as digitized and hidden collections.

■ local setting and workflow

the george a. smathers libraries at the university of florida encompasses six branches that address the information needs of a diverse academic research campus with close to fifty thousand undergraduate and graduate students. the technical services division, which includes the acquisitions and licensing department and the cataloging and metadata department, acquires and catalogs approximately forty thousand items annually. seeking ways to minimize the handling of incoming material, beginning in 2006 the departments developed a workflow that made it possible to send shelf-ready incoming material directly to the branches after check-in against the invoice. shelf-ready items represent approximately 30 percent of the libraries’ purchased monographic resources at this time. by using wcp record loads along with vendor-supplied shelf-ready processing, the time from receipt to shelf has been reduced significantly because it is no longer necessary to send the bulk of the shipments to cataloging and metadata. exceptions to this practice include specific categories of material that require individual inspection. the vendor is asked to include a flag in books that fall into many of these categories:
■ any nonprocessed book or book without a spine label
■ books with spine labels that have numbering after the date (e.g., vol. 4, no. 2)
■ books with cds or other formats included
■ books with loose maps
■ atlases
■ spiral-bound books
■ books that have the words “annual,” “biennial,” or a numeric year in the title (these may be a serial add to an existing record or part of a series that will be established during cataloging)

to facilitate a post-receipt record review for those items not sent to cataloging and metadata, acquisitions and licensing runs a local programming tool, catqc, which reports records containing attributes cataloging and metadata has determined necessitate closer examination. figure 1 is an example of the reports generated, which are viewed using the mozilla firefox browser. copy catalogers rotate responsibility for checking the report and revising records when necessary.
retrieval of the physical piece is only necessary in the 1 percent of cases where the item needs to be relabeled.

■ catqc report

catqc analyzes the content of the wcp record file and identifies records with particular bibliographic coding, which are used to detect potential problems:
1. encoding levels 2, 3, 5, 7, e, j, k, m
2. 040 with non-english subfield b
3. 245 fields with subfields h, n, or p
4. 245 fields with subfields a or b that contain numerals
5. 245 fields with subfields a or b that contain red flag keywords
6. 246 fields
7. 490 fields with first indicator 0
8. 856 fields without subfield 3
9. 6xx fields with second indicators 4, 5, 6, and 7

[figure 1. an example report from catqc]

the numbers following each problem listed below indicate which codes are used to signal the presence of a potential problem.

minimal-level copy (1)
the library’s wcp profiles, currently in place for three vendors, are set up to accept all oclc encoding levels. with such a wide-open plan, it is important to catch records with minimal-level copy to assure that appropriate access points exist and are coded correctly. the library encounters these less-than-full encoding levels infrequently.

parallel records (2)
catqc identifies foreign library records that are candidates for parallel record treatment by indicating in the report if the 040 has a non-english subfield b. the report includes a 936 field if present to alert catalogers that a parallel record is available.

volume sets (3, 4, 5)
the library does not generally analyze the individual volumes of multipart monographic sets (i.e., volume sets) even when the volumes have distinctive titles. these volumes are added to the collection under the title of the set. the june 2006 decision by lc to produce individual volume records when a distinctive title exists caused concern about the integrity of the libraries’ existing open volume set records. because such records typically have enumeration indicated in the subfield n, and sometimes p, of the 245 field, the program searches for instances of those subfields. in addition, the program detects the presence of numerals in the 245 and keywords such as “volume,” “part,” and “number” as well as common abbreviations of those words (e.g., v. or vol.).

serial vs. monograph treatment (4, 5)
titles owned by the library and classified as serials sometimes are ordered inadvertently as monographs, resulting in the delivery of a monographic record. a similar problem also occasionally arises with new titles. by detecting numerals, keywords, or the presence of one or more of the subfields in the 245 field, we can quickly scan a list of records with these characteristics. of course, most of the records detected by catqc are false hits because of the broad scope of the search; however, it takes only a few minutes to scan through the record list.

non-print formats (3)
the library does not receive records for any format other than print through wcp. consequently, detecting the presence of a subfield h in the 245 field is a good signal that there may be a problem with the record.

alternate titles (6)
alternate titles can be an important access point for library users. sometimes text that should properly be in subfield i (e.g., “at head of title”) of the 246 field is placed in subfield a in front of the alternate title. this adversely affects user access to the title through browse searching. catqc checks for and reports the presence of a 246 field. the cataloger can then quickly confirm that it is coded correctly.

untraced series (7)
as a program for cooperative cataloging (pcc) participant, the library opted to follow pcc practice to continue to trace series despite lc’s decision in 2006 to treat as untraced all series statements in newly cataloged records. because some libraries chose to follow lc in its decision, there has been an overall increase in the use of untraced series statements across all types of record-encoding levels. to address this issue, catqc searches all wcp records for 490 fields with first indicator 0. catalogers check the authority files for the series and make any necessary changes to the records. this is by far the most frequent correction made by catalogers.

links (8)
to provide users with information about the nature of the urls displayed in the catalog, catalogers insure that explanatory text is recorded in subfield 3 of the 856 field. catqc looks for the absence of subfield 3, and, if absent, displays the 856 field in the report as a hyperlink. the cataloger adds the appropriate text (e.g., full text) as needed.

subject headings with second indicators 4, 5, 6, and 7 (9)
the catqc report reviewed by catalogers includes subject headings with second indicator 4. when these headings duplicate headings already on the record, catalogers delete them from our local system. when the headings are not duplicates, the catalogers change the second indicator 4 to 0. typically, 6xx fields with second indicators 5, 6, and 7 contain non-english headings based on foreign thesauri. these headings can conflict with lc headings and, in some cases, are cross references on lc authorities. the resulting split files are not only confusing to patrons, but also add to the numbers of errors reported that require authority maintenance. for these reasons, our policy is to delete the headings from our local system. catqc detects the presence of second indicators 5, 6, or 7 and creates a modified file with the headings removed with one exception: a heading with second indicator 7 and subfield 2 of “nasat,” which indicates the heading is taken from the national aeronautics and space administration thesaurus, is not removed because the local preference is to retain the “nasat” headings.

■ library-specific issues

catqc resolves local problems when needed. for example, when more than one lc call number was present on the record, the wcp spine manifest sent to the vendor used to contain the second call number, which was affixed to the item. when the wcp records were loaded into the library’s catalog, the first call number populated the holding. as a result, there was a discrepancy between the spine label on the book and the call number in the catalog. prior to generating the report, catqc found multiple instances of call numbers in the records in the wcp file and created a modified file with the call numbers reordered so that the correct call number was used on the holding when the record was loaded.

previously, the library’s opac did not display the text in subfield 3 of the 856 field, which specifies the type of material covered by the link, and to the user it appeared that the link was to a full-text resource. this was particularly troublesome for records with lc links to table of contents, publisher descriptions, contributor information, and sample text. to prevent user frustration, catqc was programmed to move the links on the wcp records to 5xx fields. when the opac interface improved and the programming was no longer necessary, catqc was revised.
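to make the checks above concrete, here is a minimal sketch of how a few of them (the encoding-level test, the 040 subfield b test, the 245 subfield/numeral/keyword tests, and the 490 first-indicator test) might be re-expressed in perl with the cpan marc::record modules. catqc itself is a c program, so this is not the authors’ code; the keyword list and the report format below are assumptions for illustration only.

use strict;
use warnings;
use MARC::Batch;

# an assumed subset of the red-flag keyword list
my @keywords = ('volume', 'vol.', 'v.', 'part', 'number');

my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
while (my $record = $batch->next) {
    my @flags;

    # criterion 1: less-than-full encoding levels (leader byte 17)
    my $enc = substr($record->leader, 17, 1);
    push @flags, "encoding level $enc" if $enc =~ /[2357ejkm]/;

    # criterion 2: 040 with non-english subfield b
    if (my $f040 = $record->field('040')) {
        my $lang = $f040->subfield('b');
        push @flags, "040 \$b $lang" if defined $lang && $lang ne 'eng';
    }

    # criteria 3-5: 245 subfields h/n/p, numerals, red-flag keywords
    if (my $f245 = $record->field('245')) {
        for my $code ('h', 'n', 'p') {
            push @flags, "245 \$$code" if defined $f245->subfield($code);
        }
        my $title = lc join ' ', grep { defined }
                    $f245->subfield('a'), $f245->subfield('b');
        push @flags, 'numerals' if $title =~ /\d/;
        push @flags, 'keywords' if grep { index($title, $_) >= 0 } @keywords;
    }

    # criterion 7: untraced series (490 with first indicator 0)
    for my $f490 ($record->field('490')) {
        push @flags, '490 first indicator 0' if $f490->indicator(1) eq '0';
    }

    print $record->title, "\n  ", join(', ', @flags), "\n" if @flags;
}

invoked, say, as “perl sketch.pl wcp.mrc” (the script and file names are hypothetical), this prints one line per flagged record, echoing the broad, err-on-the-side-of-caution character of the real report.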
■ analysis

to see how well catqc and oclc’s bibliographic notification service were meeting our goal of maintaining high-quality bibliographic control, 63 reports were randomly selected from the 171 reports generated by catqc between october 2007 and april 2008. catqc found no problems in twelve (19 percent) of the selected reports. these twelve were not used in the analysis, leaving fifty-one catqc reports examined with at least one potential problem flagged for review. an average of 35.6 percent of the records in the sample of reports was flagged as requiring review by a cataloger. an average of thirteen possible problems was detected per report. of these, 55 percent were potential problems requiring at least some attention from the cataloger. the action required of the cataloger varied from simply checking the text of a field displayed in the report (e.g., 246 fields) to bringing up the record in aleph and editing the bibliographic record (e.g., verifying and correcting series headings or eliminating unwanted subject headings).

why the relatively high rate of false positives (45 percent)? to minimize missing serials and volumes belonging to sets, catqc is designed to err on the side of caution. two of the criteria listed earlier were responsible for the vast majority of the false positives generated by catqc: 245 fields with subfields a or b that contain numerals and 245 fields with subfields a or b that contain red-flag keywords. clearly, if every record with a numeral in the 245 is flagged, a lot of hits will be generated that are not actual problems. the list of keywords was purposefully designed to be extensive. for example, “volume,” “vol.,” and “v.” are all triggers causing a record to be flagged. therefore a bibliographic record containing the phrase “volume cost profit analysis” in the 245 field would be flagged as a potential problem.

at first glance, a report filled with so many false positives may seem inefficient and burdensome for catalogers to use; however, this is largely mitigated by the excellent display format. the programmer worked closely with the copy cataloging unit staff to develop a user-friendly report format. each record is framed separately, making it easy to distinguish from adjoining records. potential problems are highlighted with red lettering immediately alerting catalogers to what the potential problem might be. whenever a potential problem is found, the text of the entire field appears in the report so that catalogers can see quickly whether the field triggering the flag is an actual problem. it takes a matter of seconds to glance through the 245 fields of half a dozen records to see if the numeral or keyword detected is a problem. the catalogers who work with these reports estimated that it took them between two and three hours per month to both review the files and make corrections to bibliographic records.

a second component of bibliographic quality maintenance is oclc’s bibliographic record notification service. this service compares newly upgraded oclc records with records held by the library and delivers the upgraded records to the library. because catqc flags records with encoding levels of 2, 3, 5, 7, e, j, k, and m, it was possible to determine if these records had, in fact, been upgraded in oclc. in the sample, thirty-three records were flagged because of the encoding level.
no upgrade had been made to 21.2 percent of the records in oclc as of august 2008. upgrades had been made to 45.5 percent of the records. the remaining 33.3 percent of the records were manually loaded by catalogers in copy cataloging. these typically are records for items brought to copy cataloging by acquisitions and licensing because they meet one or more of the criteria for individual inspection discussed previously. when catalogers search oclc and find that the received record has not been upgraded, they search for another matching record. a third of the time, a record of higher quality than that received is found in oclc and exported to the catalog. the reason why the record of better quality is not harvested initially is not clear. it is possible that at the time the records were harvested both records were of equivalent quality and by chance one was enhanced over another. in no instance had any of the records originally harvested been upgraded (this is not reflected in the 21.2 percent of records not upgraded). encoding level 8 records are excluded from catqc reports. because of the relatively quick turnaround for upgrades of this type of copy, the library decided to rely solely on the bibliographic record notification service.

■ technical specifications

catqc is a console application for windows. written in standard c, it is designed to be portable to multiple operating systems with little modification. no graphic interface was developed because (a) the users are satisfied with the current operating procedure and (b) the treatment of the records is predefined as a matter of local policy. the user opens a command console (cmd.exe) and types “catqc”+space+“[name of marc file]”+enter. the corrected file is generated; catqc analyzes the modified file and creates the xml report. it moves the report to a reviewing folder on a file server across the lan and indicates to the user that it is terminating. modifications require action by a programmer; the user cannot choose from a list of options. benefits include a 100 kb file size and a processing speed of approximately 1,000 records per second. no quantitative analysis has yet been done related to the speed of processing, but to the user the entire process seems nearly instantaneous.

the genesis of the project was an interest in the record structure of marc files brought about in the programmer by the use of earlier local automation tools. the project was speculative. the first experiment contained the programming structure that would become catqc. one record is read into memory at a time, and there is another array held for individual marc fields. conceptually, the records are divided into three portions—leader, directory, and dataset—when the need arises to build an edited record. initially there was no editing, only the production of the report.

the generation of strict, valid xml is a significant aspect of catqc. an original document type was created, along with a corresponding cascading style sheet. the reports are viewable to anyone with an xml-capable browser either through file server, web server, or e-mail. (the current version of internet explorer does not fully support the style sheet syntax.) this continues to be convenient for the report reviewers because they do not have to be client application operators. see appendix a for an excerpt of a document instance and appendix b for the document type definition.
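the three-part division just described (leader, directory, dataset) follows the standard marc transmission format (iso 2709): a 24-byte leader, a directory of 12-byte entries, then the field data. a minimal sketch of that split, written here in perl rather than catqc’s c and assuming a file of raw usmarc records as the first argument, might look like this:

use strict;
use warnings;

local $/ = "\x1d";   # marc records end with the record terminator 0x1d
open my $fh, '<:raw', $ARGV[0] or die "cannot open $ARGV[0]: $!";

while (my $raw = <$fh>) {
    next unless length($raw) > 24;
    my $leader = substr $raw, 0, 24;           # portion 1: the leader
    my $base   = substr($leader, 12, 5) + 0;   # base address of data
    my $dir    = substr $raw, 24, $base - 25;  # portion 2: the directory
                                               # (minus its 0x1e terminator)
    # each directory entry is 12 bytes: tag(3), field length(4), offset(5)
    while ($dir =~ /(\d{3})(\d{4})(\d{5})/g) {
        my ($tag, $len, $start) = ($1, $2, $3);
        my $field = substr $raw, $base + $start, $len;  # portion 3: field data
        printf "%s  %4d bytes\n", $tag, $len;
    }
    print 'encoding level: ', substr($leader, 17, 1), "\n\n";
}
close $fh;

holding the record and its field slices separately, as above, is what makes it cheap to rebuild an edited record: only the changed fields, the directory, and the leader lengths need to be rewritten.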
catqc is not currently a generalized tool such as marcedit, a widely used marc editing utility that provides a standard array of basic capabilities: field counting, field and subfield deletion (with certain conditional checks), field and subfield additions, field swapping and text replacement, and file conversion to and from various formats such as marcxml and dublin core as well as between marc-8 and utf-8 encodings.6 marcedit continues to grow and does offer programmability that relies on the windows scripting host. this requires the user to either learn vbscript or use the wizards offered by marcedit. the catqc development goal was to create a report, viewable through a lan or the internet, which alerts a group of catalogers to potential problems with specific records, often illustrating those problems. although it might have been possible to use a combination of marcedit capabilities and local programming to help achieve this goal, it likely would have been a more cumbersome route, particularly taking into consideration the multidimensional conditionals desired. it was deemed easier to write a program that addresses local needs directly in a language already familiar to the programmer. as catqc evolved, it was modified to identify more potential problems and to do more logical comparisons as well as to edit the files as necessary before generating the reports.

catqc addresses a particular workflow directly and provides one solution. it is procedural as opposed to event driven or object oriented. with version 1.3, the generic functions were extracted into marclib 1.0, a common object file format library. functions specific to local workflow remain in catqc. the program is freely available to interested libraries by contacting the authors. as of this writing, the university of florida plans to distribute this utility under the gnu public license version 3 (see www.opensource.org/licenses/gpl-3.0.html) while retaining copyright.

■ conclusion

catqc provides catalogers an easy way to check the bibliographic quality of shelf-ready material without the book in hand. as a result, throughput time from receipt to shelf is reduced, and staff can focus data review on problem areas—those affecting access or interfering with local processes. some of the issues addressed by catqc are of concern to all libraries while others reflect local preferences. the program could be easily modified to conform to those preferences. automation tools such as catqc are of key importance to libraries seeking ways to streamline workflows to the benefit of users.

references and notes
1. vinh-the lam, “quality control issues in outsourcing cataloging in united states and canadian academic libraries,” cataloging & classification quarterly 40, no. 1 (2005): 101–22.
2. mary m. rider, “promptcat: a projected service for automatic cataloging—results of a study at the ohio state university libraries,” cataloging & classification quarterly 20, no. 4 (1995): 43.
3. mary walker and deb kulczak, “shelf-ready books using promptcat and ybp: issues to consider (an analysis of errors at the university of arkansas),” library collections, acquisitions, & technical services 31, no. 2 (2007): 61–84.
4. “lc pulls plug on series authority records,” cataloging & classification quarterly 43, no. 2 (2006): 98–99.
5. walker and kulczak, “shelf-ready books.”
6. for more information about marcedit, see http://oregonstate.edu/~reeset/marcedit/html/index.php.
appendix a. catqc document instance excerpt

wcp file analysis: 201 records analyzed.
record: 71
oclc number: 243683394
timestamp: 20080824000000.0
245: 10 |a difference algebra /|c levin alexander.
245 h  245 n  245 p  numerals  keywords
490: 0 |a algebras and applications ;|v v. 8
. . .

appendix b. catqc document type definition
[document type definition not reproduced]

ronald j. murray and barbara b. tillett
cataloging theory in search of graph theory and other ivory towers. object: cultural heritage resource description networks
ronald j. murray (rmur@loc.gov) is a digital conversion specialist in the preservation reformatting division, and barbara b. tillett (btil@loc.gov) is the chief of the policy and standards division at the library of congress.

this paper summarizes a research program that focuses on how catalogers, other cultural heritage information workers, web/semantic web technologists, and the general public understand, explain, and manage resource description tasks by creating, counting, measuring, classifying, and otherwise arranging descriptions of cultural heritage resources within the bibliographic universe and beyond it. a significant effort is made to update the nineteenth-century mathematical and scientific ideas present in traditional cataloging theory to their twentieth- and twenty-first-century counterparts. there are two key elements in this approach: (1) a technique for diagrammatically depicting and manipulating large quantities of individual and grouped bibliographic entities and the relationships between them, and (2) the creation of resource description exemplars (problem–solution sets) that are intended to play theoretical, pedagogical, and it system design roles.

to the reader: this paper presents a major re-visioning of cataloging theory, introducing along the way a technique for depicting diagrammatically large quantities of bibliographic entities and the relationships between them. as many details of the diagrams cannot be reproduced in regularly sized print publications, the reader is invited to follow the links provided in the endnotes to pdf versions of the figures.

cataloging—the systematic arrangement of resources through their descriptions that is practiced by libraries, archives, and museums (i.e., cultural heritage institutions) and other parties1—can be placed in an advanced, twenty-first-century context by updating its preexisting scientific and mathematical ideas with their more contemporary versions. rather than directing our attention to implementation-oriented details such as metadata formats, database designs, and communications protocols, as do technologists pursuing bottom-up web and semantic web initiatives, in this paper we will define a complementary, top-down approach. this top-down approach focuses on how catalogers, other cultural heritage information workers, web/semantic web technologists, and the general public have understood, explained, and managed their resource description tasks by creating, counting, measuring, classifying, and otherwise arranging descriptions of cultural heritage resources within and beyond the bibliographic universe. we go on to prescribe what enlargements of cataloging theory and practice are required such that catalogers and other interested parties can describe pages from unique, ancient codices as readily as they might describe information elements and patterns on the web. we will be enhancing cataloging theory with concepts from communications theory, history of science, graph theory, computer science, and from the hybrid field of anthropology and mathematics called ethnomathematics.
employing this strategy benefits two groups:
■ workers in the cultural heritage realm, who will acquire a broadened perspective on their resource description activities, who will be better prepared to handle new forms of creative expressions as they appear, and who will be able to shape the development of information systems that support more sophisticated types of resource descriptions and ways of exploring those descriptions. to build a better library system (perhaps an n-dimensional, n-connected system?), one needs better theories about the library collections and the people or groups who manage and use them.
■ the full spectrum of people who draw on cultural heritage resources: scholars, creatives (novelists, poets, visual artists, musicians, and so on), professional and technical workers, students, and other people or groups pursuing specific or general, long or short-term interests, entertainment, etc.

to apply a multidisciplinary perspective to the processes by which resource description data (linked or otherwise) are created and used is not an ivory tower exercise. our approach draws lessons from the debates on why, what, and how to describe physical phenomena that were conducted by physicists, engineers, software developers (and their historian and philosopher of science observers) during the evolution of high-energy physics. during that time, intensive debates raged over theory and observational/experimental data, the roles of theorists, experimenters, and instrument builders, instrumentation, and hardware/software system design.2 accommodating the resulting scientific approaches to description, collaboration, and publishing has required the creation of information technologies that have had and continue to have world-shaking effects.

these physics research facilities and their supporting academic institutions are the same ones whose scientific subcultures (theory, experiment, and instrument building) generated the data creation, management, analysis, and publication requirements that resulted in the creation of the web. in response to this development, we have come to believe that cultural heritage resource description (i.e., the process of identifying and describing phenomena in the bibliographic universe as opposed to the physical one) must now be as open to the concepts and practices of those twenty-first-century physics subcultures as it had been to the natural sciences during the nineteenth century.3 we have consequently undertaken an intensive study of the scientific subcultures that generate scientific data and have identified four principles on which to base a more general approach to cultural heritage resource description:
1. observations
2. complementarity
3. graphs
4. exemplars

the cultural heritage resource description theory to follow proposes a more articulated view of the complex, collaborative process of making available—through their descriptions—socially relevant cultural heritage resources at a global scale. we will demonstrate that a broader understanding of this resource description process (along with the ability to create improved implementations of it) requires integrating ideas from other fields of study, reaching beyond it system design to embrace larger issues.

■■ cataloging as observation

as stated in the oxford english dictionary, an observation is:

the action or an act of observing scientifically; esp. the careful watching and noting of an object or phenomenon in regard to its cause or effect, or of objects or phenomena in regard to their mutual relations (contrasted with experiment). also: a measurement or other piece of information so obtained; an experimental result.4

following the scientific community’s lead in striving to describe the physical universe through observations, we adapted the concept of an observation into the bibliographic universe and assert that cataloging is a process of making observations on resources. human or computational observers following institutional business rules (i.e., the terms, facts, definitions, and action assertions that represent constraints on an enterprise and on the things of interest to the enterprise)5 create resource descriptions—accounts or representations of a person, object, or event being drawn on by a person, group, institution, and so on, in pursuit of its interests. given this definition, a person (or a computation) operating from a business rules–generated institutional or personal point of view, and executing specified procedures (or algorithms) to do so, is an integral component of a resource description process (see figure 1). this process involves identifying a resource’s textual, graphical, acoustic, or other features and then classifying, making quality and fitness for purpose judgments, etc., on the resource. knowing which institutional or individual points of view are being employed is essential when parties possessing multiple views on those resources describe cultural heritage resources. how multiple resource descriptions derived from multiple points of view are to be related to one another becomes a key theoretical issue with significant practical consequences.

[figure 1. a resource description modeled as a business rule-constrained account of a person, object, or event]

■■ niels bohr’s complementarity principle and the library

in 1927, the physicist niels bohr offered a radical explanation for seemingly contradictory observations of physical phenomena confounding physicists at that time.6 according to bohr, creating descriptions of nature is the primary task of the physicist:

it is wrong to think that the task of physics is to find out how nature is. physics concerns what we can say about nature.7

descriptions that appear contradictory or incomparable may in fact be signaling deep limitations in language. bohr’s complementarity principle states that a complete description of atomic-level phenomena requires descriptions of both wave and particle properties. this is generally understood to mean that in the normal language that physicists use to communicate experimental results, the wholeness of nature is accessible only through the embrace of complementary, contradictory, and paradoxical descriptions of it. later in his career, bohr vigorously affirmed his belief that the complementarity principle was not limited to quantum physics:

in general philosophical perspective, it is significant that, as regards analysis and synthesis in other fields of knowledge, we are confronted with situations reminding us of the situation in quantum physics. thus, the integrity of living organisms, and the characteristics of conscious individuals, and most of human cultures, present features of wholeness, the account of which implies a typically complementary mode of description. . . . we are not dealing with more or less vague analogies, but with clear examples of logical relations which, in different contexts, are met with in wider fields.8

within a library, there are many things catalogers, conservators, and preservation scientists—each with their distinctive skills, points of view, and business rules—can observe and say about cultural heritage resources.9 much of what these specialists say and do strongly affects library users’ ability to discover, access, and use library resources in their original or surrogate forms. while observations made by these specialists from different perspectives may lead to descriptions that must be accepted as valid for those specialists, a fuller appreciation of these descriptions calls for the integration of those multiple perspectives into a well-articulated, accessible whole.

reflecting the perspectives of the library of congress directorates in which we work, the acquisitions and bibliographic access (aba) directorate and the preservation directorate, we assert that the most fundamental complementary views on cultural heritage resources involve describing a library’s resources in terms of their availability (from an acquisitions perspective), in terms of their information content (from a cataloging perspective), and in terms of their physical properties (from a preservation perspective). for example, in the normal languages used to communicate their results, preservation directorate conservators narrate their condition assessments and record simple physical measurements of library-managed objects—while at the same time preservation scientists in another section bring instrumentation to acquire optical and chemical data from submitted materials and from reference collections of physical and digital media. even though these assessments and measurements may not be comprehended by or made accessible to most library users, the information gathered possesses a critical logical relationship to bibliographic and other descriptions of those same resources. key decisions regarding a library resource’s fitness for purpose, its reformatting, and its long-term preservation must take into consideration that resource’s physical characteristics.

having things to say about cultural heritage resources—and having many “voices” with which to say them—presents the problem of creating a well-articulated context for library-generated resource descriptions as well as those from other sources. these contextualization issues must be addressed theoretically before implementation-level thinking, and the demands of contextualization require visualization tools to complement the narratives common to catalogers, scholars, and other users. this is where mathematics and ethnomathematics make their entrance. ethnomathematics is the study of the mathematical practices of specific cultural groups over the course of their daily lives and as they deal with familiar and novel problems.10 an ethnomathematical perspective on cultural heritage resource description directs one’s attention to the existence of simple and complex resource descriptions, the patterns of descriptions that have been created, and the representation of these patterns when they are interpreted as expressions of mathematical ideas. a key advantage of operating from an ethnomathematical perspective is becoming aware that mathematical ideas can be observed within a culture (namely the people and institutions who play key roles in observing the bibliographic universe) before their having been identified and treated formally by western-style mathematicians.

■■ resource description as graph creation

relationships between cultural heritage resource descriptions can be represented as conceptually engaging and flexible systems of connections mathematicians call graphs. a full appreciation of two key mathematical ideas underlying the evolution of cataloging—putting things into groups and defining relationships between things and groups of things—was only possible after the founding, naming, and expansion of graph theory, which is a field of mathematics that emerged in the 1850s, and the eventual acceptance around 1900 of set theory, a field founded amid intense controversy in 1874. between the emergence of formal mathematical treatments of those ideas by mathematicians and their actual exploitation by cataloging theorists—or by anyone capable of considering library resource description and organization problems from a mathematical perspective—lay a gulf of more than one hundred years.11

it remained for scholars in the library world to begin addressing the issue. tillett’s 1987 work on bibliographic relationships and svenonius’s 2000 definition of bibliographic entities in set-theoretic terms identified those mathematical ideas in cataloging theory and developed them formally.12 then in 2009, we were able to employ graph theory (expressed in set-theoretical terms and in its highly informative graphical representation) as part of a broader historical and cultural analysis.13 cataloging theory had by 2009 haltingly embraced a new view on how resources in libraries have been described and arranged via their descriptions—an activity that in principle stretches back to catalogs created for the library of alexandria14—and how these structured resource descriptions have evolved over time, irrespective of implementation. murray’s investigation into this issue revealed that the increasingly formalized and refined rules that guided anglo-american catalogers had, by 1876, specified sophisticated systems of cross-references (i.e., connections between bibliographic descriptions of works, authors, and subjects)—systems whose properties were not yet the subject of formal mathematical treatment by mathematicians of the time.15 murray also found that library resource description structures—when teased out of their book and card and digital catalog implementations and treated as graphs—are arguably more sophisticated than those being explored in the world wide web consortium’s (w3c) library linked data initiative.16

implementation-oriented substitutes for graph theory

cataloging theory has been both helped and hindered by the use of information technology (it) techniques like entity-relationship modeling (e-r, first used extensively by tillett in 1987 to identify bibliographic relationships in cataloging records) and object-oriented (oo) modeling.17 e-r and oo modeling may be used effectively to create information systems that are based on an inventory of “things of interest” and the relationships that exist between them. unfortunately, the things of interest in cultural heritage institutions keep changing and may require redefinition, aggregation, disaggregation, and re-aggregation. e-r and oo modeling as usually practiced are not designed to manage the degree and kind of changes that take place under those circumstances.

when trying to figure out what is “out there” in the bibliographic universe, we assert that focus should first be placed on identifying and describing the things of interest, what relationships exist between them, and what processes are involved in the creation, etc., of resource descriptions. having accomplished this, attention can then be safely paid to defining and managing information deemed essential to the enterprise, that is, undertaking it system analysis and design. but when an it-centric modeling technique becomes the bed on which the resource description theory itself is constructed, the resulting theory will be driven in a direction that is strongly influenced by the modeling technique. what is required instead is theory-based guidance of systems development, alongside theory testing and improvement through application use. if software development is not constrained by a tacit or explicit resource description theory or practice, graph or other data structures familiar to the historically less well-informed, those favored by an institution’s system designers and developers, or those familiar to and favored by implementation-oriented communities may be invoked inappropriately.18 given graph theory’s potentially overwhelming mathematical power—as evidenced by its many applications in the physical sciences, engineering, and computer science—investigations into graph theory and its history require close attention both to the history and evolving needs of the cultural heritage community.19

the unnecessary constraint on resource description theory formation occasioned by the use of e-r or oo modeling can be removed by dispensing with it system analysis tools and expressing resource description concepts in graph-theoretical terms. with this step, the very general elements (i.e., entities and relationships) that characterize e-r models and the more implementation-oriented ones in oo models are replaced by more mathematically flexible, theory-relevant elements expressed in graph-theoretical terms. the result is a “graph-friendly” theory of cultural heritage resource description, which can borrow from other fields (e.g., ethnomathematics, history of science) to improve its descriptive and predictive power, guide it system design and use, and, in response to users’ experiences with functioning systems, results in improved theories and information systems.

graph theory in a cultural heritage context

ever since the nineteenth century foundation of graph theory (though scholars regularly date its origins from euler’s 1736 paper)20 and its move from the backwaters of recreational mathematics to full field status by 1936, graph theory has concerned itself with the properties of systems of connections—nowadays regularly expressed as the mathematical objects called sets.21 in addition to its set notational form, graphs also are depicted and manipulated in diagrammatic form as dots/labeled nodes linked by labeled or unlabeled, simple or arrowed lines. for example, the graph x, consisting of one set of nodes labeled a, b, c, d, e, and f and one set of edges labeled ab, bd, de, ef, and fc, can be depicted in set notation as x = {{a b c d e f}, {ab bd de ef fc}} and can be depicted diagrammatically as in figure 2.

[figure 2. a diagrammatic representation of graph x]
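for readers who want to handle such a structure programmatically, the graph x above can also be held as a simple adjacency list. the short perl sketch below is ours, not the authors’; it simply records the same node and edge sets in a hash and prints each node’s neighbors.

use strict;
use warnings;

# graph x = {{a b c d e f}, {ab bd de ef fc}}
my @edges = qw(ab bd de ef fc);
my %adj   = map { $_ => [] } qw(a b c d e f);

for my $edge (@edges) {
    my ($u, $v) = split //, $edge;
    push @{ $adj{$u} }, $v;   # the lines in figure 2 are simple
    push @{ $adj{$v} }, $u;   # (nondirectional), so record both ends
}

for my $node (sort keys %adj) {
    print "$node: @{ $adj{$node} }\n";
}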
when graphs are defined to represent different types of nodes and relationships, it becomes possible to create and discuss structures that can support cultural heritage resource description theory and application building. the following diagrams depict simple resource description graphs that are based on real-world bibliographic descriptions. nodes in the graphs represent text, numbers, or dates and relationships that can be nondirectional (as a simple line), unidirectional (as single arrowed lines) or bidirectional (as a double arrowed line).

[figure 3. library of congress catalog data for thomas pynchon’s novel gravity’s rainbow, represented as an all-in-one graph labeled c]

the all-in-one resource description graph in figure 3 can be divided and connected according to the kinds of relationships that have been defined for cultural heritage resources. this is the point where institutional, group, and individual ways of describing resources shape the initial structure of the graph. once constructed, graph structures like this and their diagrammatic representations are then interpreted in terms of a tacit or explicit resource description theory. in the case of graphs constructed according to ifla’s functional requirements for bibliographic records (frbr) standard,22 figure 3 can be subdivided into four frbr sub-graphs, yielding figure 4. the four diagrams depict the initial graph of cataloging data as four complementary frbr wemi (w–work, e–expression, m–manifestation, and i–item) graphs. note that the item graph contains the call numbers (used here to identify the location of the copy) of three physical copies of the novel. this use of call numbers is qualitatively different from the values found in the manifestation graph in that resource descriptions in this graph apply to the entire population of physical copies printed by the publisher.

[figure 4. the all-in-one graph in figure 3, separated into four frbr work (top-left), expression (top-right), manifestation (bottom-left), and item (bottom-right) graphs]

the descriptions contained in figure 4’s frbr subgraphs reproduce bibliographic characteristics found useful by catalogers, scholars, other educationally oriented end users, and to varying extents the public in general. once created, resource description graphs and subgraphs (in mathematical notation or in simple diagrams like figure 4) can proliferate and link in multiple and complex ways—in parallel with or independently of the resources they describe. figure 4’s diagrammatic simplicity becomes problematic when large quantities of resources are to be described, when the number and kinds of relationships recorded grows large, and when more comprehensive but less-detailed views of bibliographic relationships are desired. to address these problems in a comprehensive fashion, we examined similar complex description scenarios in the sciences and borrowed another idea from the physics community—paper tool creation and use.

■■ paper tools: graph-aware diagram creation

paper tools are collections of symbolic elements (diagrams, characters, etc.), whose construction and manipulation are subject to specified rules and constraints.23 berzelian chemical notation (e.g., c6h12o6) and—more prominently—feynman diagrams like those in figure 5 are familiar examples of paper tool creation and use.24

[figure 5. feynman diagrams of elementary particle interactions]

creating a paper tool resource diagram requires that the rules for creating resource descriptions be reflected in diagram elements, properties of diagram elements, and drawing rules that define how diagram/symbolic elements are connected to one another (e.g., the formula c6h12o6 specifies six molecules of carbon, twelve of hydrogen, and six of oxygen). the detailed bibliographic information in figure 4 is progressively schematized in a way that reflects frbr definitions of bibliographic things of interest and their relevant relationships. as a first step, the four wemi descriptions in figure 4 are given a common identity by linking them to a c node, as in figure 6. the diagram is then further schematized such that frbr description types and relationships are represented by appropriate graphical elements connected to other elements.

[figure 6. a frbr resource description graph]

the result shows how a frbr paper tool makes it much easier to construct and examine complex large-scale properties of resource and resource description structures (like figure 7, right side) without being distracted by textual and linkage details. the resource described (but not shown) by the figure 6 graph is now represented explicitly by a black dot in a ring in the more schematic paper tool version. resource descriptions are then represented in fixed colors and positions relative to the resource/ring: the work-level resource description is represented by a blue box, expression by a green box, manifestation by a yellow box, and item by a red box. depicting one aspect of the frbr model graphically, the descriptions closest to the black dot resource/slot are the most concrete and those furthest away the most abstract. (readers wishing to interpret frbr paper tool diagrams without reference to color values should note the strict ordering of wemi elements: w–e–m–i–resource/ring or resource/ring–i–m–e–w.) finally, to minimize element use when pairs of wemi boxes touch, the appropriate frbr linking relationship for the relevant pair of descriptions (as explicitly shown in the expanded graph) is implied but not shown.

[figure 7. a frbr paper tool diagram element (left) and the less schematic frbr resource description graph it depicts (right)]

with appropriate diagramming conventions, the process of creating and exploring resource description complexes addresses combined issues of cataloging theory and institutional policy—and results in an ability to make better-informed judgments/computations about resource descriptions and their referenced resources. as a result, resource description graphs are readily created and transformed to serve theoretical—and with greater experience in thinking and programming along graph-friendly lines, practical—ends. one example of transformability would arise when exploring the implications of removing redundant portions of related resource descriptions as more copies of the same work are brought to the bibliographic universe. the frbr paper tool elements and the more articulated resource description graphs in figure 8 both depict the consequences of a practical act: combining resource descriptions for two copies of the same edition of the novel gravity’s rainbow.25 the top-most frbr diagram and its magnified section depict how the graph would look with a single item-level description, the call number for one physical copy. the bottom-most frbr diagram and its magnified section depict the graph with two item-level descriptions, the call numbers for two physical copies.

[figure 8. frbr paper tool diagram elements and the frbr resource description graphs they depict]

a frbr paper tool’s flexibility is useful for exploring potentially complex bibliographic relationships created or uncovered by scholars—parties whose expertise lies in identifying, interrelating, and discussing creative concepts and influences across a full range of communicative expressions. the work products of scholars—especially those creations that are dense with quotations, citations, and other types of direct and derived textual and graphical reference within and beyond themselves—are excellent environments for paper tool explorations and more generally, for testing of exemplars—solutions to the potentially complex problem of describing cultural heritage resources.
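one way to make the wemi layering and the two-copy merge of figure 8 concrete is to hold a description as nested data, with item-level descriptions kept in a list so that adding a copy is just adding an entry. the perl sketch below is illustrative only; the field names and bracketed placeholder values are our assumptions, not the article’s data.

use strict;
use warnings;

my $resource = {
    work          => { title  => "gravity's rainbow",
                       author => 'thomas pynchon' },
    expression    => { language => '[language]' },
    manifestation => { publisher => '[publisher]', year => '[year]' },
    # item-level descriptions sit in a list, so combining the records
    # for two copies of the same edition (the figure 8 case) is just
    # a matter of keeping two entries here
    item          => [ { call_number => '[call number, copy 1]' },
                       { call_number => '[call number, copy 2]' } ],
};

for my $copy (@{ $resource->{item} }) {
    print "$resource->{work}{title}: $copy->{call_number}\n";
}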
with appropriate diagramming conventions, the process of creating and exploring resource description complexes addresses combined issues of cataloging theory and institutional policy—and results in an ability to make better-informed judgments/computations about resource descriptions and their referenced resources. as a result, resource description graphs are readily created and transformed to serve theoretical—and with greater experience in thinking and programming along graph-friendly lines, practical—ends. one example of transformability would arise when exploring the implications of removing redundant portions of related resource descriptions as more copies of the same work are brought to the bibliographic universe. the frbr paper tool elements and the more articulated resource description graphs in figure 8 both depict the consequences of a practical act: combining resource descriptions for two copies of the same edition of the novel gravity's rainbow.25 the top-most frbr diagram and its magnified section depict how the graph would look with a single item-level description, the call number for one physical copy. the bottom-most frbr diagram and its magnified section depict the graph with two item-level descriptions, the call numbers for two physical copies.

figure 8. frbr paper tool diagram elements and the frbr resource description graphs they depict
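the practical act figure 8 depicts can also be sketched as plain data manipulation. assuming (our simplification) that each cataloged copy is a dictionary keyed by wemi level, merging copies of one edition keeps the shared levels once and lets only the item-level call numbers multiply:

# a sketch of the merge in figure 8: redundant work/expression/
# manifestation descriptions collapse; item descriptions accumulate
def merge_copies(descriptions):
    """descriptions: list of dicts, one per cataloged copy, each with
    'work', 'expression', 'manifestation', and 'call_number' keys."""
    merged = {"items": []}
    for d in descriptions:
        for level in ("work", "expression", "manifestation"):
            # disagreement at a shared level means the copies are not
            # the same edition and must not be merged
            if level in merged and merged[level] != d[level]:
                raise ValueError(f"{level} differs; not the same edition")
            merged[level] = d[level]
        merged["items"].append(d["call_number"])
    return merged

copy1 = {"work": "gravity's rainbow", "expression": "english text",
         "manifestation": "viking press, 1973", "call_number": "c.1"}
copy2 = dict(copy1, call_number="c.2")
print(merge_copies([copy1, copy2])["items"])  # ['c.1', 'c.2']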
a frbr paper tool's flexibility is useful for exploring potentially complex bibliographic relationships created or uncovered by scholars—parties whose expertise lies in identifying, interrelating, and discussing creative concepts and influences across a full range of communicative expressions. the work products of scholars—especially those creations that are dense with quotations, citations, and other types of direct and derived textual and graphical reference within and beyond themselves—are excellent environments for paper tool explorations and, more generally, for testing of exemplars—solutions to the potentially complex problem of describing cultural heritage resources.

■■ exemplars

the fourth principle in our cultural heritage resource description theory involves exemplar identification and analysis. according to the historian of science thomas s. kuhn, exemplars are sets of concrete problems and solutions encountered during one's education, training, and work. in the sciences, exemplar-based problem finding and solving involves mastery of relevant models, builds knowledge bases, and hones problem-solving skills. every student in a field would be expected to demonstrate mastery by learning and using their field's exemplars. change within a scientific field is manifested by the need to modify old or create new exemplars as new problems appear and must be solved.26 a cultural heritage resource description theorist would, in addition to identifying and developing exemplars from real bibliographic data and other sources, want to speculate about possible resource/description configurations that call for changes in existing information technologies. to the theorist, it would be as important to find out what can't be done with frbr and other resource description models at library, archive, museum, and internet scales as it is to be able to explain routine item cataloging and tagging activities. discovering system limitations is better done in advance by simulating uncommon or challenging circumstances than by having problems appear later in production systems.

exemplars are not use cases

use cases are a software modeling technique employed by the w3c library linked data incubator group (lld xg) in support of requirements specification.27 kuhn-style exemplars are definitely not to be confused with use cases, which are requirements-gathering documents that contribute to software engineering projects. there is a wikipedia definition of a use case that describes its properties: a use case in software engineering and systems engineering is a description of steps or actions between a user (or "actor") and a software system which leads the user towards something useful. the user or actor might be a person or something more abstract, such as an external software system or manual process. . . . use cases are mostly text documents, and use case modeling is primarily an act of writing text and not drawing diagrams. use case diagrams are secondary in use case work.28

as products of and guides for theory making, resource description exemplars have different origins and audiences than those for use cases. while use cases and exemplars offer perspectives that can support information system design, exemplars were originally introduced as theoretical entities by kuhn to explain how theories and theory-committed communities can crystallize around problem-solution sets, how these sets also can serve as pedagogical tools, and why and when problem-solution sets get displaced by new ones. the proposed process of cultural heritage exemplar creation and use, followed by modification or replacement in the face of changes in the bibliographic universe, draws on kuhn's and historian of science david kaiser's interest in how work gets done in the sciences, in addition to their rejection of paradigms as eerie self-directing processes.29 in addition, resource description structures specified in an exemplar can and should represent a more abstract treatment of a resource description and not just data or data structures engaged by end users.

exemplars on hand and others to come

cultural heritage resource description exemplars have been created over time as solutions to problems of resource description and later made available for use, study, mastery, and improvement. while not necessarily bound to a particular information technology, such as papyrus, parchment, index cards, database records, or rdf aggregations, resource description exemplars have historically provided descriptive solutions for physical resources whose physical and intellectual structure had originally been innovative solutions to describing, for example,

■■ a manuscript (individual and related multiples, published but host to history, imaginary, etc.);
■■ a monograph in one edition (individual and related multiples);
■■ a monograph in multiple editions (individual and related multiples); and
■■ a publication in multiple media, created sequentially or simultaneously.

with the advent of electronic and then digital communications media, more complex resource description problem-solution sets have been called for as a response to enduringly or recently more sophisticated creative/editorial decision-making and to more flexible print and digital information technology production capabilities. the most challenging problem-solution sets involve the assembly and cross-referencing of several multipart—and possibly multimedia—creative or editorially constructed works, such as the following:

■■ a work published as a monograph, but which has been reprinted and reedited; translated into numerous languages; supplemented by illustrations from multiple artists; excerpted and adapted as plays, an opera, comic books, cartoon series, and multimedia mash-ups; directly quoted in paintings and other graphic arts productions; and made the subject of dissertations, monographs, journal articles, etc.
■■ a continuing publication (individual and related multiple publications; special editions; name, publisher, and editorial policy changes; etc.).
■■ a monograph whose main content is composed nearly entirely of excerpts from other print publications.30
■■ a library-hosted multimedia resource and its associated resource description network.
■■ a webpage and its underlying, globally distributed, multimedia resource network, as it changes over time.

such exemplars can be presented diagrammatically through the use of paper tools. this use of diagrams in support of conceptualization and information system design is deliberately patterned after professional data modeling theory and practice.31 paper tool–supported analyses of a nineteenth-century american novel (exemplar 1) and of eighteenth-century french poems drawn from state archives (exemplar 2) will be presented to illustrate how information system design and pedagogy can be informed by exemplary scholarly research and publication, combined with narrativized diagrammatic representations of bibliographic and other relationships in traditional and digital media.
exemplar 1. from moby-dick to mash-ups—a print publication history and multimedia mash-up

problem: document the publication history of print copies of a literary work, identifying editorially driven content transfer across print editions along with content selection and transformation in support of multimedia resource creation.

solution: the solution to this descriptive problem relies heavily on placing resource descriptions into groups and then defining relationships within and across those groups—i.e., on graph creation. after locating a checklist that documented the publication history of the novel and after identifying key components of a moby-dick and orson welles–themed multimedia resource appropriation and transformation network, murray used the frbr paper tool along with additional connection rules to create a resource description diagram (rdd) that represented g. thomas tanselle's documentation of the printing history (from 1851 to 1976) of herman melville's epic novel, moby-dick.32 the resulting diagram provides a high-level view of a large set of printed materials—depicting concepts such as a creative work, the expression of the work in a particular mode of languaging (i.e., speech, sign, image), and more concrete concepts such as publications. to reduce displayed complexity, sets of frbr diagram elements were collapsed into green shaded squares representing entire editions/printings, yielding figure 9.33 the vertical axis represents the year of publication, starting with the 1851 printings at the top.

figure 9. a moby-dick resource description diagram, depicting relationships between printings made between 1851 and 1976 (greatly reduced scale)

connected squares

the resulting network of connections in figure 9 can be interpreted in publishing terms. one or more lines descending downward from a printing's green square are interpreted to mean that the printing gave rise to one or more additional printings, which may occur in the same or later years. two or more lines converging on a green square from above indicate that the printing was created by combining texts from multiple prior printings—an editorial/creative technique similar to that used to construct the mash-ups published on the web.
connecting unconnected squares

tanselle's checklist did not specify predecessor or successor relationships for each post-1851 printing. this often unavoidable, incomplete status is depicted in figure 9 as green squares that are

■■ not linked to any squares above them, i.e., to earlier printings; and/or
■■ not linked to any squares below them, i.e., to later printings; or
■■ connected islands, without a link to the larger structure.

recognizing the extent of moby-dick printing disconnectedness, and developing a strategy for dealing with it, by analyzing tanselle's checklist alone would be extremely difficult. in contrast, the disconnectedness of the moby-dick resource description network—and its implications for search-based discovery based on following the depicted relationships—is readily discernible in figure 9. the ease with which the disconnected condition can be assessed also hints at benefits to be gained by collaborative resource description supported by paper tool diagram creation, analysis, and subsequent action, namely,

■■ connecting the squares (i.e., assigning at least one relationship to a printing) ensures access based on the relationship assigned; and
■■ parties located around the globe can examine a given connected or disconnected resource description network and develop strategies for enhancing its usefulness.
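for readers who want to experiment, the bookkeeping behind figure 9 and the disconnectedness check just described reduce to standard operations on a directed graph. the sketch below again uses python's networkx, with invented printings standing in for tanselle's checklist:

import networkx as nx

# printings as nodes dated by year; descending lines in figure 9
# become directed edges from earlier printing to later printing
pg = nx.DiGraph()
pg.add_nodes_from([("1851-us", {"year": 1851}), ("1851-uk", {"year": 1851}),
                   ("1892", {"year": 1892}), ("1930", {"year": 1930}),
                   ("1952", {"year": 1952})])
pg.add_edge("1851-us", "1892")   # printing gave rise to a later one
pg.add_edge("1851-us", "1930")
pg.add_edge("1851-uk", "1930")   # two lines converge: combined texts
# "1952" stays unlinked, like a checklist entry with no stated predecessor

combined = [n for n in pg if pg.in_degree(n) > 1]  # mash-up-like printings
islands = [c for c in nx.weakly_connected_components(pg) if len(c) == 1]
print(combined, islands)         # ['1930'] [{'1952'}]

the same component listing that flags the lone 1952 printing here is what, at full scale, would tell collaborating catalogers which parts of the network still need connecting.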
the wealth of descriptive information available in the moby-dick exemplar illustrates how previous and future collaborative efforts between cultural heritage institutions and other parties have already generated resource descriptions that possess a network structure alongside their content. with a more graph-friendly and collaborative implementation, melville scholars, scholarly organizations,34 and enthusiasts could more effectively examine, discuss, and through their actions enhance the moby-dick resource description network's documentary, scholarly, and educational value.

in its original form, the moby-dick resource description diagram (and the exemplar it partially documents) only depicted full-length publications of melville's work. as a test of the frbr paper tool's ability to accommodate both traditional and modern creative expressions in individual and aggregate form—while continuing to serve theoretical, practical, and educational ends—murray added a resource description network for orson whales,35 alex itin's moby-dick-themed multimedia mash-up, to the print media diagram. the four-minute-long orson whales multimedia mash-up contains hundreds of hand-painted page images from the novel, excerpts from the led zeppelin song "moby dick," parts of two vocal performances by the actor orson welles, and a video clip from welles's motion picture citizen kane. the result is shown in figure 10.36 the leftmost group of descriptions in figure 10 depicts various releases of led zeppelin's "moby dick." the central group depicts the sources of two orson welles audio dialogues after they had been ripped (i.e., digitized from physical media) and made available online. the grouping on the right depicts the orson whales mash-up itself and collections of digital images of painted pages created from two printed copies of the novel.

figure 10. a resource description diagram of alex itin's moby-dick multimedia work, depicting the resources and their frbr descriptions

exemplar 2. poetry and the police—archival content identification and critical analysis

problem: examine archival collections and select, describe, and document ownership and other relationships of a set of documents (poems) alleged to have circulated within a loosely defined social group.

solution 1: in his 2010 work, poetry and the police: communication networks in eighteenth-century paris, historian robert darnton studied a 1749 paris police investigation into the transmission of poems highly critical of the french king, louis xv. after combing state archives for police reports, finding and identifying scraps of paper once held as evidence, and collecting other archival materials, darnton was able to construct a poetry communication network diagram,37 which, along with his narrative account, identified a number of parties who owned, copied, and transmitted six of the scandalous poems and placed their activities in a political, social, and literary context. darnton's book can stand on its own as an exemplar for historical method, with the diagram providing additional diagrammatic support.

solution 2: darnton's analysis treated each poem found in the archives as an individual creative work,38 enabling the use of the frbr paper tool (as a bookkeeping device this time) instead of a tool designed to aggregate and describe archival materials. the resulting diagram is a more articulated frbr paper tool depiction of darnton's poetry communication network, a section of which appears as figure 11. the depiction of the poetry communication network shown in figure 11 is composed of

■■ tan squares that depict individuals (clerks, professors, priests, students, etc.) who read, discussed, copied, and passed along the poems;
■■ diagram elements that depict poetry written on scraps of paper (treated as resources) that were in police custody, were admitted to having existed by suspects, or were assumed to have existed by the police—if one's theory and business rules permit it, paper tool drawing conventions can depict descriptions of lost and nonexistent but nonetheless describable resources; and
■■ arrowed lines that represent relationships between a poem and the individuals who owned copies, those who created or received copies of the poem, etc.39

figure 11. a section of darnton's poetry communication network

with darnton's monograph to provide background information regarding the historical personages involved, relationships between the works and the people, document selection from archival fonds, and the point of view of the scholar, the resulting problem-solution set can

■■ serve as enhanced documentation for darnton-style communication network analysis and discussion; and
■■ serve as an exemplar for catalogers, scholars, and others who seek similar solutions to their problems with identifying, describing, depicting, and discussing, as individual works, documents ordinarily bundled within hierarchically structured archival fonds at multiple locations.
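the figure 11 conventions translate just as naturally into a typed graph. in the sketch below (our illustration; the people, poems, and relationships are invented, not darnton's findings), an existence attribute lets lost-but-describable scraps carry descriptions alongside extant ones:

import networkx as nx

# people and poems as typed nodes, relationships as labeled arrows
pn = nx.DiGraph()
pn.add_node("clerk", kind="person")
pn.add_node("priest", kind="person")
pn.add_node("poem-1", kind="poem", existence="extant, in police custody")
pn.add_node("poem-2", kind="poem", existence="assumed by police, not found")
pn.add_edge("clerk", "poem-1", rel="copied")
pn.add_edge("clerk", "priest", rel="passed copy to")
pn.add_edge("priest", "poem-2", rel="admitted owning")

for source, target, data in pn.edges(data=True):
    print(f"{source} --{data['rel']}--> {target}")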
■■ a paper tool into a power tool

there are limits to what can be done with a hand-drawn frbr paper tool. while murray was able to depict large-scale bibliographic relationships that probably had not been observed before, he was forced to stop work on the moby-dick diagram because much of the useful information available could not fit into a static, hand-drawn diagram. we think that automated assistance in creating resource description diagrams from bibliographic records is required. with that capability available, cataloging theorists and parties with scholarly and pedagogical interests could interactively and efficiently explore how scholars and sophisticated readers describe significant quantities of analog and digital resources. it would then be possible and extremely useful to be able to initiate a scholarly discussion or begin a lecture by saying, "given a moby-dick resource description network . . ." and then proceed to argue or teach from a diagram depicting all known printings of moby-dick, along with all of the adaptations and excerpts extant within a specified bibliographic universe (such as the cataloging records that appear in oclc's worldcat bibliographic database).

resource description diagrams, created from real-world or theoretically motivated considerations, would then provide a diagrammatic means for depicting the precise and flexible underlying mathematical ideas that, heretofore unrecognized but nonetheless systematically employed, serve resource description ends. if the structure of a well-motivated and constructed resource description diagram subsequently makes data representation and management requirements that a given information system cannot accommodate, cataloging theorists and information technologists alike will then know of that system's limitations, will work together on mitigating them, and will embark on improving system capabilities.

■■ cataloging theory, tool-making, education, and practice

this modernized resource description theory offers new and enhanced roles and benefits for cultural heritage personnel as well as for the scholars, students, and members of the general public who require support not just for searching but also for collecting, reading, writing, collaborating, monitoring, etc.40 information systems that couple modern, high-level understandings about how cultural heritage resources can be described, organized, and explored with data models that support linking within and across multiple points of view will be able to support those requirements.

the complementarity of cosmological and quantum-level views

cataloging theory formation and practice—two areas of activity that did not interest many outside of cultural heritage institutions—can now be understood as a much more comprehensive, multilayered activity that is approachable from at least two distinct points of view. the approach presented in this paper represents a cosmological-level view of the bibliographic universe. this treatment of existing or imaginable large-scale configurations of cultural heritage resource descriptions serves as a complement to the quantum-level view of resource description, as characterized by it-related specificities such as character sets, identifiers, rdf triples, triplestores, etc. activities at the quantum level—the domain of semantic web technologists and others—yield powerful and relatively unconstrained information management systems. in the absence of cosmological-level inspiration or guidance, these systems have not necessarily been tested against nontrivial, challenging cultural heritage resource description scenarios like those documented in the above two exemplars. applying both views to the bibliographic universe would clearly be beneficial for all institutional and individual parties involved. if ever a model for multilevel, multidisciplinary effort was required, the history of physics is illuminated by mutually influential interactions of cosmological and quantum-level theories, practices, and pedagogy. workers in cultural heritage institutions and technologists pursuing w3c initiatives would do well to reflect on the result.

■■ ready for the future—and creating the future

to explore the cultural, scientific, and mathematical ideas underlying cultural heritage resource description, to identify, study, and teach with exemplars, and to exploit the theoretical reach and bookkeeping capability of paper tool–like techniques is to pay homage to the cultural heritage community's 170+-year-old talent for pragmatic, implementation-oriented thinking, while at the same time pointing out a rich set of possibilities for enhanced service to society. the cultural heritage community can draw inspiration from geometrician bernhard riemann's own justification for his version of thinking outside of the box called euclidean geometry: the value of non-euclidean geometry lies in its ability to liberate us from preconceived ideas in preparation for the time when exploration of physical laws might demand some geometry other than the euclidean.41

taking riemann to heart, we assert that the value of describing cultural heritage resources as observations organized into graphs, and of enhancing and supplementing the resource description exemplars that have evolved over time and circumstance, rests in opportunities for liberating the cultural heritage community from preconceived ideas about resource description structures and from long-standing points of view on those resources. having achieved such a goal, the cultural heritage community would then be ready when the demand came for resource description structures that must be more flexible and powerful than the traditional ones. given the unprecedented development of the web and the promise of bottom-up semantic web initiatives, we think that the time for the cultural heritage community's liberation is at hand.
■■ acknowledgments

the authors wish to thank beacher wiggins and dianne van der reyden, directors of the library of congress acquisitions and bibliographic access directorate and the preservation directorate, respectively, for supporting the authors' efforts to explore and renew the scientific and mathematical foundations of cultural heritage resource description. thanks also to marcia ascher, david hay, robert darnton, daniel huson, and mark ragan, whose scholarship informed our own; and to joanne o'brien-levin for her critical eye and for editorial advice.

references and notes

1. oed online, "catalogue, n.," http://www.oed.com/viewdictionaryentry/entry/28711 (accessed aug. 10, 2011).
2. peter galison, "part ii: building data," in image & logic: a material culture of microphysics (chicago: univ. of chicago pr., 2003): 370–431.
3. gordon mcquat, "cataloguing power: delineating 'competent naturalists' and the meaning of species in the british museum," british journal for the history of science 34, no. 1 (mar. 2001): 1–28. exclusive control of classification schemes and of the records that named and described its specimens are said to have contributed to the success of the british museum's institutional mission in the nineteenth century. as a division of the british museum, the british library appears to have incorporated classification concepts (hierarchical structuring) from its parent and elaborated on the museum's strategies for cataloging species.
4. oed online, "observation, n.," http://www.oed.com/viewdictionaryentry/entry/129883 (accessed july 8, 2011).
5. david c. hay, uml and data modeling: a vade mecum for modern times (bradley beach, n.j.: technics pr., forthcoming 2011): 124–25. some scholars argue that decisions as to what the things of interest are and the categories they belong to are influenced by social and political factors. geoffrey c. bowker and susan leigh star, sorting things out: classification and its consequences (cambridge, mass.: mit pr., 1999).
6. gerald holton, "the roots of complementarity," daedalus 117, no. 3 (1988): 151–97, http://www.jstor.org/stable/20023980 (accessed feb. 24, 2011).
7. niels bohr, quoted in aage petersen, "the philosophy of niels bohr," bulletin of the atomic scientists 19, no. 7 (sept. 1963): 12.
8. niels bohr, "quantum physics and philosophy: causality and complementarity," in essays 1958–1962 on atomic physics and human knowledge (woodbridge, conn.: ox bow, 1997): 7.
9. for cataloging theorists, the description of cultural heritage things of interest yields groups of statements that occupy different levels of abstraction. upon regarding a certain physical object, a marketer describes product features, a linguist enumerates utterances, a scholar perceives a work with known or inferred relationships to other works, and so on.
10. marcia ascher, ethnomathematics: a multicultural view of mathematical ideas (pacific grove, calif.: brooks/cole, 1991); ascher, mathematics elsewhere: an exploration of ideas across cultures (princeton: princeton univ. pr., 2002).
11. a timeline of events, people, and so on that have had or should have had an impact on describing cultural heritage resources is available online. seven fields or subfields are represented in the timeline and keyed by color: library & information science; mathematics; ethnomathematics; physical sciences; biological sciences; computer science; and arts & literature. ronald j. murray, "the library organization problem," dipity.com, aug. 2011, http://www.dipity.com/rmur/library-organization-problem/ or http://www.dipity.com/rmur/library-organization-problem/?mode=fs (fullscreen view).
12. barbara ann barnett tillett, "bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging" (phd diss., university of california, los angeles, 1987); elaine svenonius, the intellectual foundation of information organization (cambridge, mass.: mit pr., 2000): 32–51. svenonius's definition is opposed to database implementations that permitted boolean operations on records at retrieval time.
13. ronald j. murray, "the graph-theoretical library," slideshare.net, july 5, 2011, http://www.slideshare.net/ronmurray/-the-graph-theoretical-library.
14. francis j. witty, "the pinakes of callimachus," library quarterly 28, no. 1–4 (1958): 132–36.
15. ronald j. murray, "re-imagining the bibliographic universe: frbr, physics, and the world wide web," slideshare.net, oct. 22, 2010, http://www.slideshare.net/ronmurray/frbrphysics-and-the-world-wide-web-revised.
16. for an overview of the technology-driven library linked data initiative, see http://linkeddata.org/faq. murray's analyses of cultural heritage resource descriptions may be explored in a series of slideshows at http://www.slideshare.net/ronmurray/.
17. pat riva, martin doerr, and maja žumer, "frbroo: enabling a common view of information from memory institutions," international cataloging & bibliographic control 38, no. 2 (june 2009): 30–34.
18. the prospects for creating graph-theoretical functions that operate on resource description networks are extremely promising. for example, combinatorica (an implementation of graph theory concepts created for the computer mathematics application mathematica) is composed of more than 450 functions. were cultural heritage resource description networks to be defined using this application's graph-friendly data format, significant quantities of combinatorica functions would be available for theoretical and applied uses; sriram pemmaraju and steven skiena, computational discrete mathematics: combinatorics and graph theory with mathematica (new york: cambridge univ. pr., 2003).
19. dénes könig, theory of finite and infinite graphs, trans. richard mccoart (boston: birkhäuser, 1990); fred buckley and marty lewinter, a friendly introduction to graph theory (upper saddle river, n.j.: pearson, 2003); oystein ore and robin wilson, graphs and their uses (washington, d.c.: mathematical association of america, 1990).
20. leonhard euler, "solutio problematis ad geometriam situs pertinentis," commentarii academiae scientarium imperalis petropolitanae no. 8 (1736): 128–40.
21. "set theory, branch of mathematics that deals with the properties of well-defined collections of objects, which may or may not be of a mathematical nature, such as numbers or functions. the theory is less valuable in direct application to ordinary experience than as a basis for precise and adaptable terminology for the definition of complex and sophisticated mathematical concepts." quoted from encyclopædia britannica online, "set theory," oct. 2010, http://www.britannica.com/ebchecked/topic/536159/set-theory (accessed oct. 27, 2010).
22. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records: final report (munich: k.g. saur, 1998). this document is downloadable as a pdf from http://www.ifla.org/vii/s13/frbr/frbr.pdf or as an html page at http://www.ifla.org/vii/s13/frbr/frbr.htm.
23. ursula klein, ed., experiments, models, paper tools: cultures of organic chemistry in the nineteenth century (stanford, calif.: stanford univ. pr., 2003); klein, ed., tools and modes of representation in the laboratory sciences (boston: kluwer, 2001); david kaiser, drawing theories apart: the dispersion of feynman diagrams in postwar physics (chicago: univ. of chicago pr., 2005).
24. for more examples and a general description of feynman diagrams, see http://www2.slac.stanford.edu/vvc/theory/feynman.html.
25. an enlarged version of this diagram may be found online. ronald j. murray and barbara b. tillett, "frbr paper tool diagram elements and the frbr resource description graphs they depict," aug. 2011, http://arizona.openrepository.com/arizona/bitstream/10150/139769/2/fig%208%20frbr%20paper%20tool%20elements%20and%20graphs.pdf. other informative illustrations also are available. murray and tillett, "resource description diagram supplement to 'cataloging theory in search of graph theory and other ivory towers. object: cultural heritage resource description networks,'" aug. 2011, http://hdl.handle.net/10150/139769.
26. thomas s. kuhn, the structure of scientific revolutions, 2nd ed. (chicago: univ. of chicago pr., 1970).
27. daniel vila suero, "use case report," world wide web consortium, june 27, 2011, http://www.w3.org/2005/incubator/lld/wiki/usecasereport.
28. wikipedia.org, "use case," june 13, 2011, http://en.wikipedia.org/wiki/use_case.
29. kaiser, drawing theories, 385–86.
30. prime examples being jacques derrida's typographically complex 1974 work glas (univ. of nebraska pr.) and reality hunger: a manifesto (vintage), david shields's 2011 textual mash-up on the topic of originality, authenticity, and mash-ups in general.
31. graeme simsion, data modeling: theory and practice (bradley beach, n.j.: technics, 2007): 333.
32. herman melville, moby-dick (new york: harper & brothers; london: richard bentley, 1851). moby-dick edition publication history excerpted from g. thomas tanselle, checklist of editions of moby-dick 1851–1976, issued on the occasion of an exhibition at the newberry library commemorating the 125th anniversary of its original publication (evanston, ill.: northwestern univ. pr.; chicago: newberry library, 1976).
33. ronald j. murray, "from moby-dick to mash-ups: thinking about bibliographic networks," slideshare.net, apr. 2011, http://www.slideshare.net/ronmurray/from-mobydick-to-mashups-revised. the moby-dick resource description diagram was presented to the american library association committee on cataloging: description and access at the ala annual conference, washington, d.c., july 2010.
34. the life and works of herman melville, melville.org, july 25, 2000, http://melville.org.
35. the new york artist alex itin describes his creation: "it is more or less a birthday gift to myself. i've been drawing it on every page of moby dick (using two books to get both sides of each page) for months. the soundtrack is built from searching 'moby dick' on youtube (i was looking for orson's preacher from the the [sic] john huston film) . . . you find tons of led zep [sic] and drummers doing bonzo and a little orson . . . makes for a nice melville in the end. cinqo [sic] de mayo i turn forty. ahhhhhhh the french champagne." quoted from alex itin, "orson whales," youtube, jan. 2011, http://www.youtube.com/watch?v=2_3-gem6o_g.
36. the multimedia mash-up in figure 10 was linked to the much larger moby-dick structure depicted in figure 9. the combination of the two yields figure 10a, which is too detailed for printout but which can be downloaded for inspection as the following pdf file: ronald j. murray and barbara b. tillett, "transfer and transformation of content across cultural heritage resources: a moby-dick resource description network covering full-length printings from 1851–1976," july 2011, http://arizona.openrepository.com/arizona/bitstream/10150/136270/4/fig%2010a%20orson%20whales%20in%20moby%20dick%20context.pdf. in the figure, two print publications have been expanded to reveal their own similar mash-up structure.
37. robert darnton, poetry and the police: communication networks in eighteenth-century paris (cambridge, mass.: belknap pr. of harvard univ. pr., 2010): 16.
38. ronald j. murray in a discussion with robert darnton, sept. 20, 2010. darnton considered the poems retrieved from the archives as distinct intellectual creations, which permitted the use of frbr diagram elements for the analysis. otherwise, a paper tool with diagram elements based on the archival descriptive standard isad(g) would have been used. committee on descriptive standards, isad(g): general international standard archival description (stockholm, sweden, 1999– ).
39. the complete poetry communication diagram may be viewed at http://arizona.openrepository.com/arizona/bitstream/10150/136270/6/fig%2011%20poetry%20communication%20network.pdf.
40. carole l. palmer, lauren c. teffeau, and carrie m. pittman, scholarly information practices for the online environment: themes from the literature and implications for library science development (dublin, ohio: oclc research, 2009), http://www.oclc.org/programs/publications/reports/200902.pdf (accessed july 15, 2011).
41. g. f. b. riemann, quoted in marvin j. greenberg, euclidean and non-euclidean geometry: development and history (new york: freeman, 2008): 371.
marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

marc truitt

editorial: and now for something (completely) different

the issue of ital you hold in your hands—be that issue physical or virtual; we won't even go into the question of your hands!—represents something new for us. for a number of years, ex libris (and previously, endeavor information systems) has generously sponsored the lita/ex libris (née lita/endeavor) student writing award competition. the competition seeks manuscript submissions from enrolled lis students in the areas of ital's publishing interests; a lita committee on which the editor of ital serves as an ex-officio member evaluates the entries and names a winner. traditionally, the winning essay has appeared in the pages of ital. in recent years, perhaps mirroring the waning interest in publication in traditional peer-reviewed venues, the number of entrants in the competition has declined. in 2008, for instance, there were but nine submissions, and to get those, we had to extend the deadline six weeks from the end of february to mid-april. in previous years, as i understand it, there often were even fewer. this year, without moving the goalposts, we had—hold onto your hats!—twenty-seven entries. of these, the review committee identified six finalists for discussion. the turnout was so good, in fact, that with the agreement of the committee, we at ital proposed to publish not only the winning paper but the other finalist entries as well. we hope that you will find them as stimulating as have we. even more importantly, we hope that by publishing such a large group of papers representing 2009's best in technology-focused lis work, we will encourage similarly large numbers of quality submissions in the years to come. i would like to offer sincere thanks to my university of alberta colleague sandra shores, who as guest editor for this issue worked tirelessly over the past few months to shepherd quality student papers into substantial and interesting contributions to the literature.
she and managing editor judith carter—who guest-edited our recent discovery issue—have both done fabulous jobs with their respective ital special issues. bravo!

■■ ex libris' sponsorship

in one of those ironic twists that one more customarily associates with movie plots than with real life, the lita/ex libris student writing award recently almost lost its sponsor. at very nearly the same time that sandra was completing the preparation of the manuscripts for submission to ala production services (where they are copyedited and typeset), we learned that ex libris had notified lita that it had "decided to cease sponsoring" the student writing award. a brief round of e-mails among principals at lita, ex libris, and ital ensued, with the outcome being that carl grant, president of ex libris north america, graciously agreed to continue sponsorship for another year and reevaluate underwriting the award for the future. we at ital, and i personally, are grateful. carl's message about the sponsorship raises some interesting issues on which i think we should reflect. his first point goes like this: it simply is not realistic for libraries to continue to believe that vendors have cash to fund these things at the same levels when libraries don't have cash to buy things (or want to delay purchases or buy the product for greatly reduced amounts) from those same vendors. please understand the two are tied together. point taken and conceded. money is tight. carl's argument, i think, speaks as well to a larger, implied question. libraries and library vendors share highly synergistic and, in recent years, increasingly antagonistic relationships. library vendors—and i think library system vendors in particular—come in for much vitriol and precious little appreciation from those of us on the customer side. we all think they charge too much (and by implication, must also make too much), that their support and service are frequently unresponsive to our needs, and that their systems are overly large, cumbersome, and usually don't do things the way we want them done. at the same time, we forget that they are catering to the needs and whims of a small, highly specialized market that is characterized by numerous demands, a high degree of complexity, and whose members—"standards" notwithstanding—rarely perform the same task the same way across institutions. we expect very individualized service and support, but at the same time are penny-pinching misers in our ability and willingness to pay for these services. we are beggars, yet we insist on our right to be choosers. finally, at least for those of us of a certain generation—and yep, i count myself among its members—we chose librarianship for very specific reasons, which often means we are more than a little uneasy with concepts of "profit" and "bottom line" as applied to our world. we fail to understand that the open-source dictum "free as in kittens and not as in beer" means that we will have to pay someone for these services—it's only a question of whom we will pay. carl continues, making another point: i do appreciate that you're trying to provide us more recognition as part of this. frankly, that was another consideration in our thought of dropping it—we just didn't feel like we were getting much for it.
i've said before and i'll say again, i've never, in all my years in this business, had a single librarian say to me that because we sponsored this or that, it was even a consideration in their decision to buy something from us. not once, ever. companies like ours live on sales and service income. i want to encourage you to help make librarians aware that if they do appreciate when we do these things, it sure would be nice if they'd let us know in some real tangible ways that show that is true. . . . good will does not pay bills or salaries unless that good will translates into purchases of products and services (and please note, i'm not just speaking for ex libris here, i'm saying this for all vendors). and here is where carl's and my views may begin to diverge. let's start by drawing a distinction between vendor tchotchkes and vendor sponsorship. in fairness, carl didn't say anything about tchotchkes, so why am i? i do so because i think that we need to bear in mind that there are multiple ways vendors seek to advertise themselves and their services to us, and geegaws are one such. trinkets are nice—i have yet to find a better gel pen than the ones given out at iug 14 (would that i could get more!)—but other than reminding me of a vendor's name, they serve little useful purpose. the latter, vendor sponsorship, is something very different, very special, and not readily totaled on the bottom line. carl is quite right that sponsorship of the student writing award will not in and of itself cause me to buy aleph, primo, or sfx (oh right, i have that last one already!). these are products whose purchase is the result of lengthy and complex reviews that include highly detailed and painstaking needs analysis, specifications, rfps, site visits, demonstrations, and so on. due diligence to our parent institutions and obligations to our users require that we search for a balance among best-of-breed solutions, top-notch support, and fair pricing. those things aren't related to sponsorship. what is related to sponsorship, though, is a sense of shared values and interests. of "doing the right thing." i may or may not buy carl's products because of the considerations above (and yes, ex libris fields very strong contenders in all areas of library automation); i definitely will, though, be more likely to think favorably of ex libris as a company that has similar—though not necessarily identical—values to mine, if it is obvious that it encourages and materially supports professional activities that i think are important. support for professional growth and scholarly publication in our field are two such values. i'm sure we can all name examples of this sort of behavior: in addition to support of the student writing award, ex libris' long-standing prominence in the national information standards organization (niso) comes to mind. so too does the founding and ongoing support by innovative interfaces and the library consulting firm r2 for the taiga forum (http://www.taigaforum.org/), a group of academic associate university librarians. to the degree that i believe ex libris or another firm shares my values by supporting such activities—that it "does the right thing"—i will be just a bit more inclined to think positively of it when i'm casting about for solutions to a technology or other need faced by my institution. i will think of that firm as kin, if you will.
with that, i will end this by again thanking carl and ex libris—because we don't say thank you often enough!—for their generous support of the lita/ex libris student writing award. i hope that it will continue for a long time to come. that support is something about which i do care deeply. if you feel similarly—be it about the student writing award, niso, taiga, or whatever—i urge you to say so by sending an appropriate e-mail to your vendor's representative or by simply saying thanks in person to the company's head honcho on the ala exhibit floor. and the next time you are neck-deep in seemingly identical vendor quotations and need a way to figure out how to decide between them, remember the importance of shared values.

■■ dan marmion

longtime lita members and ital readers in particular will recognize the name of dan marmion, editor of this journal from 1999 through 2004. many current and recent members of the ital editorial board—including managing editor judith carter, webmaster andy boze, board member mark dehmlow, and i—can trace our involvement with ital to dan's enthusiastic period of stewardship as editor. in addition to his leadership of ital, dan has been a mentor, colleague, boss, and friend. his service philosophy is best summarized in the words of a simple epigram that for many years has graced the wall behind the desk in his office: "it's all about access!!" because of health issues, and in order to devote more time to his wife diana, daughter jennifer, and granddaughter madelyn, dan recently decided to retire from his position as associate director for information systems and digital access at the university of notre dame hesburgh libraries. he also will pursue his personal interests, which include organizing and listening to his extensive collection of jazz recordings, listening to books on cd, and following the exploits of his favorite sports teams, the football irish of notre dame, the indianapolis colts, and the new york yankees. we want to express our deep gratitude for all he has given to the profession, to lita, to ital, and to each of us personally over many years. we wish him all the best as he embarks on this new phase of his life.

camilla fulton

web accessibility, libraries, and the law

camilla fulton (cfulton2@illinois.edu) is web and digital content access librarian, university of illinois, urbana-champaign.

with an abundance of library resources being served on the web, researchers are finding that disabled people oftentimes do not have the same level of access to materials as their nondisabled peers. this paper discusses web accessibility in the context of united states' federal laws most referenced in web accessibility lawsuits. additionally, it reveals which states have statutes that mirror federal web accessibility guidelines and to what extent. interestingly, fewer than half of the states have adopted statutes addressing web accessibility, and fewer than half of these reference section 508 of the rehabilitation act or web content accessibility guidelines (wcag) 1.0. regardless of sparse legislation surrounding web accessibility, librarians should consult the appropriate web accessibility resources to ensure that their specialized content reaches all.

imagine you are a student. in one of your classes, a teacher and librarian create a webpage that will help the class complete an online quiz. this quiz constitutes 20 percent of your final grade. through the exercise, your teacher hopes to instill the importance of quality research resources found on the web. the teacher and librarian divide their hand-picked resources into five subject-based categories. each resource listing contains a link to that particular resource followed by a paragraph of pertinent background information. the list concludes with a short video tutorial that prepares students for the layout of the online quiz. neither the teacher nor the librarian has extensive web design experience, but they both have basic html skills. the library's information technologists give the teacher and librarian web space, allowing them to freely create their content on the web. unfortunately, they do not have a web librarian at their disposal to help construct the page. they solely rely on what they recall from previous web projects and visual layouts from other websites they admire. as they begin to construct the page, they first style each category's title with font tags to make them bolder and larger than the surrounding text. they then separate each resource and its accompanying description with the equivalent of hard returns (or line breaks). next, they place links to the resources within the description text and label them with "search this resource." finally, they create the audiovisual tutorial with a runtime of three minutes.

as a typical student, you are able to scan the resources and descriptions, familiarize yourself with the quiz's format, and follow the link to the quiz with no inherent problems. everything on the page flows well for you and the content is broken up easily for navigation. now imagine that you are legally blind. you navigate to the webpage with your screen reader, a software device that allows you to surf the web despite your impairment. ideally, the device gives you equal access to webpages, and you can navigate them in an equivalent manner as your peers. when you visit your teacher's webpage, however, you start experiencing some problems. for one, you cannot scan the page like your peers because the category titles were designed with font tags instead of heading tags styled with cascading style sheets (css). most screen readers use heading tags to create the equivalent of a table of contents. this table of contents function divides the page into navigable sections instead of making the screen reader relay all page content as a single mass. second, most screen readers also allow users to "scan" or navigate a page by its listed links.
when you visit your teacher’s page, you get a list of approximately twenty links that all read, “search this resource.” unfortunately, you are unable to differentiate between the separate resources without having the screen reader read all content for the appropriate context. third, because the resources are separated by hard returns, you find it difficult to differentiate between each listed item. your screen reader does not indicate when it approaches a list of categorized items, nor does it pause between each item. if the resources were contained within the proper html list tags of either ordered or unordered (with subsequent list item tagging), then you could navigate through the suggested resources more efficiently (see figures 1, 2, and 3). finally, the video tutorial’s audio tract explains much of the quiz’s structure; however, the video relies on image-capture alone for page orientation and navigation. without a visual transcript, you are at a disadvantage. stylistic descriptions of the page and its buttons are generally unhelpful, but the page’s textual content, and the general movement through it, would better aid you in preparation for the quiz. to be fair, your teacher would already be cognizant of your visual disability and would have accommodated your class needs appropriately. the individuals with disabilities education act (idea) mandates educational institutions to provide an equal opportunity to education.1 your teacher would likely avoid posting any class materials online without being certain that the content was fully accessible and usable to you. unlike educational institutions, however, most libraries are not legally bound to the same law. idea does not command libraries to provide equal access to information through with an abundance of library resources being served on the web, researchers are finding that disabled people oftentimes do not have the same level of access to materials as their nondisabled peers. this paper discusses web accessibility in the context of united states’ federal laws most referenced in web accessibility lawsuits. additionally, it reveals which states have statutes that mirror federal web accessibility guidelines and to what extent. interestingly, fewer than half of the states have adopted statutes addressing web accessibility, and fewer than half of these reference section 508 of the rehabilitation act or web content accessibility guidelines (wcag) 1.0. regardless of sparse legislation surrounding web accessibility, librarians should consult the appropriate web accessibility resources to ensure that their specialized content reaches all. i magine you are a student. in one of your classes, a teacher and librarian create a webpage that will help the class complete an online quiz. this quiz constitutes 20 percent of your final grade. through the exercise, your teacher hopes to instill the importance of quality research resources found on the web. the teacher and librarian divide their hand-picked resources into five subject-based categories. each resource listing contains a link to that particular resource followed by a paragraph of pertinent background information. the list concludes with a short video tutorial that prepares students for the layout of the online quiz. neither the teacher nor the librarian has extensive web design experience, but they both have basic html skills. the library’s information technologists give the teacher and librarian web space, allowing them to freely create their content on the web. 
unfortunately, they do not have a web librarian at their disposal to help construct the page. they solely rely on what they recall from previous web projects and visual layouts from other websites they admire. as they begin to construct the page, they first style each category’s title with font tags to make them bolder and larger than the surrounding text. they then separate each resource and its accompanying description with the equivalent of hard returns (or line breaks). next, they place links to the resources within the description text and label them with “search this resource.” finally, they create the audiovisual tutorial with a runtime of three minutes. camilla fulton (cfulton2@illinois.edu) is web and digital content access librarian, university of illinois, urbana-champaign. web accessibility, libraries, and the law | fulton 35 providing specifics on when those standards should apply. for example, section 508 of the rehabilitation act could serve as a blueprint for information technology guidelines that state agencies should follow. section 508 states that federal employees with disabilities [must] have access to and use of information and data that is comparable to the access and use by federal employees who are not individuals with disabilities, unless an undue burden would be imposed on the agency.4 section 508 continues to outline how the declaration should be met when procuring and managing software, websites, telecommunications, multimedia, etc. section 508’s web standards comply with w3c’s web content accessibility guidelines (wcag) 1.0; stricter compliance is optional. states could stop at section 508 and only make web accessibility laws applicable to other state agencies. section 504 of the rehabilitation act, however, provides additional legislation to model. in section 504, no disabled person can be excluded from programs or activities that are funded by federal dollars.5 section 504 further their websites. neither does the federal government possess a carte blanche web accessibility law that applies to the nation. this absence of legislation may give the impression of irrelevance, but as more core components of librarianship migrate to the web, librarians should confront these issues so they can serve all patrons more effectively. this article provides background information on the federal laws most frequently referenced within web accessibility cases. additionally, this article tests three assumptions: ■■ although the federal government has no web accessibility laws in place for the general public, most states legalized web accessibility for their respective state agencies. ■■ most state statutes do not mention section 508 of the americans with disabilities act (ada) or acknowledge world wide web consortium (w3c) standards. ■■ most libraries are not included as entities that must comply with state web accessibility statutes. further discussion on why these issues are important to the library profession follows. ■■ literature review no previous study has systematically examined state web accessibility statutes as they relate to libraries. 
most articles that address issues related to library web accessibility view libraries as independent entities and run accessibility evaluators on preselected library and university websites.2 those same articles also evaluate the meaning and impact of federal disability laws that could drive the outcome of web accessibility in academia.3 in examining state statutes, additional complexities may be unveiled when delving into the topic of web accessibility and librarianship. ■■ background with no definitive stance on public web accessibility from the federal government, states became tasked with figure 1. these webpages look exactly the same to users, but the html structure actually differs in source code view. 36 information technology and libraries | march 2011 title ii, section 201 (1) defines “public entity” as state and local governments, including their agencies, departments, and districts.9 title iii, section 302(a) builds on title ii and states that in the case of commercial facilities, no individual shall be discriminated against on the basis of disability in the full and equal enjoyment of the goods, services, facilities, privileges, advantages, or accommodations of any place of public accommodation by any person who owns, leases . . . or operates a place of public accommodation.10 delineates specific entities subject to the auspice of this law. though section 504 never mentions web accessibility specifically, states could freely interpret and apply certain aspects of the law for their own use (e.g., making organizations receiving state funds create accessible websites to prevent the exclusion of disabled people). if states wanted to provide the highest level of service to all, they would also consider incorporating the most recent w3c recommendations. the w3c formed in 1994 to address the need for structural consistency across multitudinous websites and web browsers. the driving principle of the w3c is to make the benefits of the web accessible to all, “whatever their hardware, software, network infrastructure, native language, culture, geographical location, or physical or mental ability.”6 the most recent w3c guidelines, wcag 2.0, detail web accessibility guidelines that are simpler to understand and, if followed, could improve both accessibility and usability despite browser type. alternatively, states could decide to wait until the federal government mandates an all-encompassing law on web accessibility. the national federation of the blind (nfb) and american council of the blind (acb) have been trying commercial entities in courts, claiming that inaccessible commercial websites discriminate against disabled people. the famous nfb lawsuit against target provided a precedent for other courts to acknowledge; commercial entities should provide an accessible means to purchase regularly stocked items through their website (if they are already maintaining one).7 these commercial web accessibility lawsuits are often defended with title ii and title iii of the ada. title ii, section 202 states, subject to the provisions of this title, no qualified individual with a disability shall, by reason of such disability, be excluded from participation in or be denied the benefits of the services, programs, or activities of a public entity, or be discriminated by any such entity.8 figure 2. here we see distinct variances in the source code. the image at the top (inaccessible) reveals code that does not use headings or unordered lists for each resource. 
■■ background with no definitive stance on public web accessibility from the federal government, states became tasked with providing specifics on when those standards should apply. for example, section 508 of the rehabilitation act could serve as a blueprint for information technology guidelines that state agencies should follow. section 508 states that federal employees with disabilities [must] have access to and use of information and data that is comparable to the access and use by federal employees who are not individuals with disabilities, unless an undue burden would be imposed on the agency.4 section 508 continues to outline how the declaration should be met when procuring and managing software, websites, telecommunications, multimedia, etc. section 508's web standards comply with w3c's web content accessibility guidelines (wcag) 1.0; stricter compliance is optional. states could stop at section 508 and only make web accessibility laws applicable to other state agencies. section 504 of the rehabilitation act, however, provides additional legislation to model. in section 504, no disabled person can be excluded from programs or activities that are funded by federal dollars.5 section 504 further delineates specific entities subject to the auspice of this law. though section 504 never mentions web accessibility specifically, states could freely interpret and apply certain aspects of the law for their own use (e.g., making organizations receiving state funds create accessible websites to prevent the exclusion of disabled people). if states wanted to provide the highest level of service to all, they would also consider incorporating the most recent w3c recommendations. the w3c formed in 1994 to address the need for structural consistency across multitudinous websites and web browsers. the driving principle of the w3c is to make the benefits of the web accessible to all, "whatever their hardware, software, network infrastructure, native language, culture, geographical location, or physical or mental ability."6 the most recent w3c guidelines, wcag 2.0, detail web accessibility guidelines that are simpler to understand and, if followed, could improve both accessibility and usability regardless of browser type. alternatively, states could decide to wait until the federal government mandates an all-encompassing law on web accessibility. the national federation of the blind (nfb) and american council of the blind (acb) have been taking commercial entities to court, claiming that inaccessible commercial websites discriminate against disabled people. the famous nfb lawsuit against target provided a precedent for other courts to acknowledge: commercial entities should provide an accessible means to purchase regularly stocked items through their website (if they are already maintaining one).7 these commercial web accessibility lawsuits are often argued under title ii and title iii of the americans with disabilities act (ada). title ii, section 202 states, subject to the provisions of this title, no qualified individual with a disability shall, by reason of such disability, be excluded from participation in or be denied the benefits of the services, programs, or activities of a public entity, or be discriminated by any such entity.8 title ii, section 201(1) defines "public entity" as state and local governments, including their agencies, departments, and districts.9 title iii, section 302(a) builds on title ii and states that in the case of commercial facilities, no individual shall be discriminated against on the basis of disability in the full and equal enjoyment of the goods, services, facilities, privileges, advantages, or accommodations of any place of public accommodation by any person who owns, leases . . . or operates a place of public accommodation.10 this title's proclamation seems clear-cut; however, legal definitions of "public accommodation" differ. title iii, section 301(7) defines a list of acceptable entities to receive the title of "public accommodation."11 among those listed are auditoriums, theaters, terminals, and educational facilities. courts using title iii in defense of web accessibility argue that the web is a place, and therefore cannot discriminate against those with visual, motor, or mental disabilities.12 those arguing against using title iii for web accessibility believe that section 301(7) specifically denotes places of physical accommodation because the authors' original intent did not include virtual ones.13 settling on a definition for "public accommodation" is so divisive that three district courts are receptive to "public accommodation" referring to nonphysical places, four district courts have ruled against the notion, and four have not yet made a decision.14 despite legal battles within the commercial sector, state statute analysis shows that states felt compelled to address web accessibility on their own terms. figure 3. fangs (http://www.standards-schmandards.com/projects/fangs/) visually emulates what a standard screen reader outputs so that designers can take the first steps in creating more accessible content on the web. ■■ method this study surveys the most current state statute web presences as they pertain to web accessibility and their connection to libraries. using georgia institute of technology's state e&it accessibility initiatives database and golden's article on accessibility within institutions of higher learning as starting points, i searched each state government's online statutes for the most recently available code.15 examples of search terms used include "web accessibility," "information technology," and "accessibility -building -architecture -health." "building," for example, excluded statute results that pertained to building accessibility. i then reviewed each statute to determine whether its mandates applied to web accessibility. some statutes excluded mention of web accessibility but outlined specific requirements for an institution's software procurement. when statutes on web accessibility could not be found, additional searches were conducted for the most recently available web accessibility guidelines, policies, or standards. using a popular web search engine and the search terms "[state] web accessibility" usually resulted in finding the state's standards online. if the search engine did not offer desirable results, then i visited the appropriate state government's website. the term "web accessibility" was used within the state government's site search. the following results serve only as a guide. because of the ever-changing nature of the law, please consult legal advisors within your institution for changes that may have occurred after article publication.
■■ results "although the federal government has no web accessibility laws in place for the general public, most states have legalized web accessibility for their respective state agencies." false—only seventeen states have codified laws ensuring web accessibility for their state websites.16 four of these seventeen extended coverage to include agencies receiving state funds (with no exceptions).17 though that number seems disappointingly low, many states have addressed web accessibility through other means. thirty-one states without web accessibility statutes posted some form of standard, policy, or guideline online in its place (see appendix). these standards only apply to state entities, however, and have no legal footing outside of federal law to spur enforcement. at the time of article submission, alaska and wyoming were the only two states without an accessibility standard, policy, or guideline available on the web.
"most state statutes do not mention section 508 of the rehabilitation act or acknowledge world wide web consortium (w3c) standards." true—interestingly, only seven of the seventeen states with web accessibility statutes reference section 508 or wcag 1.0 directly within their statute text (see appendix).18 minnesota is the only state that references the more current wcag 2.0 standards.19 these numbers may seem minuscule as well, but all states have supplemented their statutes with more descriptive guidelines and standards that delineate best practices for compliance (see appendix). within those guidelines and standards, section 508 and wcag 1.0 are mentioned with more frequency. "most libraries are not included as entities that must comply with state web accessibility statutes." true—from the perspective of a librarian, the above data mean that forty-eight states would require web accessibility compliance for their state libraries (see appendix). four of those states (arkansas, california, kentucky, and montana) require all libraries receiving state funds to maintain an accessible website.20 an additional four states (illinois, oklahoma, texas, and virginia) explicitly hold universities, and therefore their libraries, to the same standards as their state agencies.21 despite the commendable efforts of eight states pushing for more far-reaching web accessibility, thousands of k–12, public, and academic libraries nationwide escape these laws' reach. ■■ discussion and conclusion without legal backing for web accessibility issues at all levels, "equitable access to information and library services" might remain a dream.22 notably, researchers have witnessed web accessibility improvements over a four-year span; however, as of 2006, even libraries at institutions with ala-accredited library and information science programs did not average an accessibility validation of 70 percent or higher.23 additionally, a survey of carnegie-classified institutions with library websites found that less than half of each degree-producing division was directed by their institution to comply with the ada for web accessibility.24 some may not recognize the significance of providing accessible library websites, especially if they do not witness a large quantity of accommodation requests from their users. coincidentally, perceived societal drawbacks could keep disabled users from seeking the assistance they need.25 according to american community survey terminology, disabilities negatively affecting web accessibility tend to be sensory and self-care based.26 the 2008 american community survey public use microdata sample estimates that 10,393,100 noninstitutionalized americans of all ages live with a hearing disability and 6,826,400 live with a visual disability.27 according to the same survey, an estimated 7,195,600 noninstitutionalized americans live with a self-care disability. in other words, nearly 24.5 million people in the united states are unable to retrieve information from library websites unless web authors make accessibility and usability their goal. as gatekeepers of information and research resources, librarians should want to be the first to provide unrestricted and unhindered access to all patrons regardless of ability. nonetheless, potential objections to addressing web accessibility can deter improvement: "learning and applying web accessibility guidelines will be difficult. there is no way we can improve access to disabled users in a way that will be useful." actually, more than 90 percent of sensory-accessibility issues can be resolved through steps outlined in section 508, such as utilizing headings properly, giving alternative image descriptions, and providing captions for audio and video. granted, these elements may be more difficult to manage on extensive websites, but wisely applied web content management systems could alleviate information technology units' stress in that respect.28 "creating an accessible website is time consuming and resource draining. this is obviously an 'undue burden' on our facility. we cannot do anything about accessibility until we are given more funding." the "undue burden" clause seen in section 508 and several state statutes is a real issue that government officials needed to address. however, individual institutions are not supposed to view accessible website creation as an isolated activity. "undue burden," as defined by the code of federal regulations, relies upon the overall budget of the program or component being developed.29 claiming an "undue burden" means that the institution must extensively document why creating an accessible website would cause a burden.30 the institution would also have to provide disabled users an alternative means of access to the information provided online.
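a minimal sketch of the section 508 fixes just named — proper headings, alternative image descriptions, and captions. the filenames are hypothetical, and the html5 track element is a present-day way to attach captions, not something section 508 itself prescribes:

<?php /* a php page is plain html outside the php tags */ ?>
<h1>library tutorials</h1>                  <!-- a real heading, not styled text -->
<img src="floor-map.png"
     alt="floor map of the main library, first floor">  <!-- alternative description -->
<video controls>
  <source src="catalog-tour.mp4" type="video/mp4">
  <track kind="captions" src="catalog-tour.vtt" srclang="en" label="english">
</video>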
"no one will sue an institution focused on promoting education. we will just continue providing one-on-one assistance when requested." in 2009, a blind student, backed by the nfb, initiated litigation against the law school admissions council (lsac) because of the inaccessibility of its online tests.31 in 2010, four law schools were added as defendants: university of california hastings college of the law, thomas jefferson school of law, whittier law school, and chapman university school of law.32 these law schools were added because they host their application materials on the lsac website.33 assuredly, if instructors and students are encouraged or required to use library webpages for assignments and research, those unable to use them in a manner equivalent to their peers may pursue litigation to force change. ultimately, providing accessible websites for library users should not be perceived as a hassle. sure, it may entail a new way of thinking, but the benefits of universal access and improved usability far outweigh the frustration that users feel when they cannot be self-sufficient in their web-based research.34 regardless of whether the disabled user is in a k–12, college, university, or public library, they are paying for a service that requires more than just physical accommodation.35 federal agencies, state entities, and individual institutions are all responsible (and important) in the promotion of accessible website construction.
lack of statutes or federal laws should not exempt libraries from providing equivalent access to all; it should drive libraries toward it. references 1. individuals with disabilities education act of 2004, 40 u.s.c. §1411–§1419. 2. see david comeaux and axel schmetzke, "accessibility trends among academic library and library school web sites in the usa and canada," journal of access services 6 (jan.–june 2009): 137–52; julia huprich and ravonne green, "assessing the library homepages of copla institutions for section 508 accessibility errors: who's accessible, who's not and how the online webxact assessment tool can help," journal of access services 4, no. 1 (2007): 59–73; michael providenti and robert zai iii, "web accessibility at kentucky's academic libraries," library hi tech 25, no. 4 (2007): 478–93. 3. ibid.; michael providenti and robert zai iii, "web accessibility at academic libraries: standards, legislation, and enforcement," library hi tech 24, no. 4 (2007): 494–508. 4. 29 u.s.c. §794(d); 36 code of federal regulations (cfr) §1194.1. 5. 29 u.s.c. §794. 6. world wide web consortium, "w3c mission," http://www.w3.org/consortium/mission.html (accessed jan. 28, 2010). 7. national federation of the blind v. target corp., 452 f. supp. 2d 946 (n.d. cal. 2006). 8. 42 u.s.c. §12132. 9. 42 u.s.c. §12131. 10. 42 u.s.c. §12182. 11. 42 u.s.c. §12181. 12. carrie l. kiedrowski, "the applicability of the ada to private internet web sites," cleveland state law review 49 (2001): 719–47; shani else, "courts must welcome the reality of the modern world: cyberspace is a place under title iii of the americans with disabilities act," washington & lee law review 65 (summer 2008): 1121–58. 13. ibid. 14. nikki d. kessling, "why the target 'nexus test' leaves disabled americans disconnected: a better approach to determine whether private commercial websites are 'places of public accommodation,'" houston law review 45 (summer 2008): 991–1029. 15. state e & it accessibility initiatives workgroup, "state it database," georgia institute of technology, http://accessibility.gtri.gatech.edu/sitid/state_prototype.php (accessed jan. 28, 2010); nina golden, "why institutions of higher education must provide access to the internet to students with disabilities," vanderbilt journal of entertainment & technology law 10 (winter 2008): 363–411. 16. arizona revised statutes §41-3532 (2010); arkansas code of 1987 annotated §25-26-201–§25-26-206 (2009); california government code §11135–§11139 (2010); colorado revised statutes §24-85-101–§24-85-104 (2009); florida statutes §282.601–§282.606 (2010); 30 illinois compiled statutes annotated 587 (2010); burns indiana code annotated §4-13.1-3 (2010); kentucky revised statutes annotated §61.980–§61.988 (2010); louisiana revised statutes §39:302 (2010); maryland state finance and procurement code annotated §3a-311 (2010); minnesota annotated statutes §16e.03 subdivisions 9–10 (2009); missouri revised statutes §191.863 (2009); montana code annotated §18-5-601 (2009); 62 oklahoma statutes §34.16, §34.28–§34.30 (2009); texas government code §2054.451–§2054.463 (2009); virginia code annotated §2.2-3500–§2.2-3504 (2010); west virginia code §18-10n-1–§18-10n-4 (2009). 17. arkansas code of 1987 annotated §25-26-202(7) (2009); california government code §11135 (2010); kentucky revised statutes annotated §61.980(4) (2010); montana code annotated §18-5-602 (2009). 18. arizona revised statutes §41-3532 (2010); california government code §11135(d)(2) (2010); burns indiana code annotated §4-13.1-3-1(a) (2010); florida statutes §282.602 (2010); kentucky revised statutes annotated §61.980(1) (2010); minnesota annotated statutes §16e.03 subdivision 9(b) (2009); missouri revised statutes §191.863(1) (2009). 19. minnesota annotated statutes §16e.03 subdivision 9(b) (2009). 20. arkansas code of 1987 annotated §25-26-202(7) (2009); california government code §11135 (2010); kentucky revised statutes annotated §61.980(4) (2010); montana code annotated §18-5-602 (2009). 21. 30 illinois compiled statutes annotated 587/10 (2010); 62 oklahoma statutes §34.29 (2009); texas government code §2054.451 (2009); virginia code annotated §2.2-3501 (2010). 22. american library association, "alahead to 2010 strategic plan," http://www.ala.org/ala/aboutala/missionhistory/plan/2010/index.cfm (accessed jan. 28, 2010). 23. comeaux and schmetzke, "accessibility trends." 24. ruth sara connell, "survey of web developers in academic libraries," journal of academic librarianship 34, no. 2 (2008): 121–29. 25. patrick m. egan and traci a. guiliano, "unaccommodating attitudes: perceptions of students as a function of academic accommodation use and test performance," north american journal of psychology 11, no.
3 (2009): 487–500; ramona paetzold et al., "perceptions of people with disabilities: when is accommodation fair?" basic & applied social psychology 30 (2008): 27–35. 26. u.s. census bureau, american community survey, puerto rico community survey: 2008 subject definitions (washington, d.c.: government printing office, 2009). hearing disability pertains to deafness or difficulty in hearing. visual disability pertains to blindness or difficulty seeing despite prescription glasses. self-care disability pertains to those who have "difficulty dressing or bathing." 27. u.s. census bureau, data set: 2006–2008 american community survey (acs) public use microdata sample (pums) 3-year estimates (washington, d.c.: government printing office, 2009). for a more interactive table, with statistics drawn directly from the american community survey pums data files, see the database created and maintained by the employment and disability institute at cornell university: m. j. bjelland, w. a. erickson, and c. g. lee, disability statistics from the american community survey (acs), cornell university rehabilitation research and training center on disability demographics and statistics (statsrrtc), http://www.disabilitystatistics.org (accessed jan. 28, 2010). 28. sébastien rainville-pitt and jean-marie d'amour, "using a cms to create fully accessible web sites," journal of access services 6 (2009): 261–64; laura burzagli et al., "using web content management systems for accessibility: the experience of a research institute portal," in proceedings of the 11th international conference on computers helping people with special needs, vol. 5105, lecture notes in computer science (linz, austria: springer-verlag, 2008): 454–61; david kane and nora hegarty, "new site, new opportunities: enforcing standards compliance within a content management system," library hi tech 25, no. 2 (2007): 276–87. 29. 28 cfr §36.104. 30. ibid. 31. sheri qualters, "blind law student sues law school admissions council over accessibility," national law journal (feb. 20, 2009), http://www.law.com/jsp/nlj/pubarticlenlj.jsp?id=1202428419045 (accessed jan. 28, 2010). follow the case at the county of alameda's superior court of california, available online (search for case number rg09436691): http://apps.alameda.courts.ca.gov/domainweb/html/index.html (accessed sept. 20, 2010). 32. ibid. 33. ibid. after finding the case, click on "register of actions" in the side navigation menu. these details can be found on page 10 of the action "joint case management statement filed," uploaded june 30, 2010. 34. jim blansett, "digital discrimination: ten years after section 508, libraries still fall short of addressing disabilities online," library journal 133 (aug. 2008): 26–29; drew robb, "one site fits all: companies are working to make their web sites comply with accessibility guidelines because the effort translates into more customers," computerworld (mar. 28, 2005): 29–32. 35. the united states department of justice supports title iii's application of "public accommodation" to include virtual web spaces. see u.s. department of justice, "settlement agreement between the united states of america and city of missoula county, montana under the americans with disabilities act," dj# 204-44-45, http://www.justice.gov/crt/foia/mt_1.php and http://www.ada.gov/missoula.htm (accessed jan. 28, 2010).
appendix. library website accessibility requirements, by state
state | libraries included? | code | online state statutes | online statements/policies/guidelines
ala. | n/a | n/a | n/a | http://isd.alabama.gov/isd/statements.aspx
alas. | n/a | n/a | n/a | n/a
ariz.* | state and state-funded (with exceptions) | arizona revised statutes §41-3532 | http://www.azleg.state.az.us/arizonarevisedstatutes.asp?title=41 | http://az.gov/polices_accessibility.html
ark. | state and state-funded | arkansas code annotated §25-26-201 thru §25-26-206 | http://www.arkleg.state.ar.us/assembly/arkansascodelargefiles/title%2025%20state%20government-chapter%2026%20information%20technology.htm and http://www.arkleg.state.ar.us/bureau/publications/arkansas%20code/title%2025.pdf | http://portal.arkansas.gov/pages/policy.aspx
calif.* | state and state-funded | california government code §11135 thru §11139 | http://www.leginfo.ca.gov/calaw.html | http://www.webtools.ca.gov/accessibility/state_standards.asp
colo. | state | colorado revised statutes §24-85-101 thru §24-85-104 | http://www.state.co.us/gov_dir/leg_dir/olls/colorado_revised_statutes.htm | www.colorado.gov/colorado/accessibility.html
conn. | n/a | n/a | n/a | http://www.access.state.ct.us/
del. | n/a | n/a | n/a | http://gic.delaware.gov/information/access_central.shtml
fla.* | state | florida statutes §282.601 thru §282.606 | http://www.leg.state.fl.us/statutes/ | http://www.myflorida.com/myflorida/accessibility.html
ga. | n/a | n/a | n/a | http://www.georgia.gov/00/static/0,2085,4802_0_0_accessibility,00.html
hawaii | n/a | n/a | n/a | http://www.ehawaii.gov/dakine/docs/ada.html
idaho | n/a | n/a | n/a | http://idaho.gov/accessibility.html
ill. | state and university | 30 illinois compiled statutes annotated 587 | http://www.ilga.gov/legislation/ilcs/ilcs.asp | http://www.dhs.state.il.us/page.aspx?item=32765
ind.* | state and local government | burns indiana code annotated §4-13.1-3 | http://www.in.gov/legislative/ic/code/title4/ar13.1/ch3.html | http://www.in.gov/core/accessibility.htm
iowa | n/a | n/a | n/a | http://www.iowa.gov/pages/accessibility
kans. | n/a | n/a | n/a | http://www.kansas.gov/about/accessibility_policy.html
ky.* | state and state-funded | kentucky revised statutes annotated §61.980 thru §61.988 | http://www.lrc.ky.gov/krs/titles.htm | http://technology.ky.gov/policies/webtoolkit.htm
la. | state | louisiana revised statutes §39:302 | http://www.legis.state.la.us/ | http://www.louisiana.gov/government/policies/#webaccessibility
maine | n/a | n/a | n/a | http://www.maine.gov/oit/accessibility/policy/webpolicy.html
md. | state and (possibly) community college | maryland state finance and procurement code annotated §3a-311 | http://www.michie.com/maryland/ and http://www.dsd.state.md.us/comar/comar.aspx | http://www.maryland.gov/pages/accessibility.aspx
mass. | n/a | n/a | n/a | http://www.mass.gov/accessibility and http://www.mass.gov/?pageid=mg2utilities&l=1&sid=massgov2&u=utility_policy_accessibility
mich. | n/a | n/a | n/a | http://www.michigan.gov/som/0,1607,7–192–26913–2090—,00.html
minn.** | state | minnesota annotated statutes §16e.03 subdivisions 9–10 | https://www.revisor.mn.gov/pubs/ | http://www.starprogram.state.mn.us/accessibility_usability.htm
miss. | n/a | n/a | n/a | http://www.mississippi.gov/access_policy.jsp
mo.* | state | missouri revised statutes §191.863 | http://www.moga.mo.gov/statutes/statutes.htm | http://oa.mo.gov/itsd/cio/standards/ittechnology.htm
mont. | state and state-funded | montana code annotated §18-5-601 | http://data.opi.mt.gov/bills/mca_toc/index.htm | http://mt.gov/discover/disclaimer.asp#accessibility
neb. | n/a | n/a | n/a | http://www.webmasters.ne.gov/accessibilitystandards.html
nev. | n/a | n/a | n/a | http://www.nitoc.nv.gov/psps/3.02_standard_webstyleguide.pdf
n.h. | n/a | n/a | n/a | http://www.nh.gov/wai/
n.j. | n/a | n/a | n/a | http://www.state.nj.us/nj/accessibility.html
n.m. | n/a | n/a | n/a | http://www.newmexico.gov/accessibility.htm
n.y. | n/a | n/a | n/a | http://www.cio.ny.gov/policy/nys-p08–005.pdf
n.c. | n/a | n/a | n/a | http://www.ncsta.gov/docs/principles%20practices%20standards/application.pdf
n. dak. | n/a | n/a | n/a | http://www.nd.gov/ea/standards/
ohio | n/a | n/a | n/a | http://ohio.gov/policies/accessibility/
okla. | state and university | 62 oklahoma statutes §34.16, §34.28 thru §34.30 | http://www.lsb.state.ok.us/ | http://www.ok.gov/accessibility/
ore. | n/a | n/a | n/a | http://www.oregon.gov/accessibility.shtml
pa. | n/a | n/a | n/a | http://www.portal.state.pa.us/portal/server.pt/community/it_accessibility/10940
r.i. | n/a | n/a | n/a | http://www.ri.gov/policies/access.php
s.c. | n/a | n/a | n/a | http://sc.gov/policies/accessibility.htm
s. dak. | n/a | n/a | n/a | http://www.sd.gov/accpolicy.aspx
tenn. | n/a | n/a | n/a | http://www.tennesseeanytime.org/web-policies/accessibility.html
tex. | state and university | texas government code §2054.451 thru §2054.463 | http://www.statutes.legis.state.tx.us/ | http://www.texasonline.com/portal/tol/en/policies
utah | n/a | n/a | n/a | http://www.utah.gov/accessibility.html
va. | state, university, and commonwealth | virginia code annotated §2.2-3500 thru §2.2-3504 | http://leg1.state.va.us/000/src.htm | http://www.virginia.gov/cmsportal3/about_virginia.gov_4096/web_policy.html
vt. | n/a | n/a | n/a | http://www.vermont.gov/portal/policies/accessibility.php
wash. | n/a | n/a | n/a | http://isb.wa.gov/webguide/accessibility.aspx
w. va. | state | west virginia code §18-10n-1 thru §18-10n-4 | http://www.legis.state.wv.us/wvcode/code.cfm | http://www.wv.gov/policies/pages/accessibility.aspx
wis. | n/a | n/a | n/a | http://www.wisconsin.gov/state/core/accessibility.html
wyo. | n/a | n/a | n/a | n/a
*these states mention section 508 of the rehabilitation act within statute text. **this state mentions wcag 2.0 within its statute text. note: most states with statutes on web accessibility also have statements, policies, and guidelines that are more detailed than the statute text and may contain references to section 508 and wcag 2.0. all webpages were visited between january 1, 2010, and february 12, 2010.
andromeda yelton. a simple scheme for book classification using wikipedia. editor's note: this article is the winner of the lita/ex libris student writing award, 2010. because the rate at which documents are being generated outstrips librarians' ability to catalog them, an accurate, automated scheme of subject classification is desirable. however, simplistic word-counting schemes miss many important concepts; librarians must enrich algorithms with background knowledge to escape basic problems such as polysemy and synonymy. i have developed a script that uses wikipedia as context for analyzing the subjects of nonfiction books. though a simple method built quickly from freely available parts, it is partially successful, suggesting the promise of such an approach for future research.
as the amount of information in the world increases at an ever-more-astonishing rate, it becomes both more important to be able to sort out desirable information and more egregiously daunting to manually catalog every document. it is impossible even to keep up with all the documents in a bounded scope, such as academic journals; there were more than twenty thousand peer-reviewed academic journals in publication in 2003.1 therefore a scheme of reliable, automated subject classification would be of great benefit. however, there are many barriers to such a scheme. naive word-counting schemes isolate common words, but not necessarily important ones. worse, the words for the most important concepts of a text may never occur in the text. how can this problem be addressed? first, the most characteristic (not necessarily the most common) words in a text need to be identified—words that particularly distinguish it from other texts. second, some corpus that connects words to ideas is required—in essence, a way to automatically look up ideas likely to be associated with some particular set of words. fortunately, there is such a corpus: wikipedia. what, after all, is a wikipedia article but an idea (its title) followed by a set of words (the article text) that characterize that title? furthermore, the other elements of my scheme were readily available. for many books, amazon lists statistically improbable phrases (sips)—that is, phrases that are found "a large number of times in a particular book relative to all search inside! books."2 and google provides a way to find pages highly relevant to a given phrase. if i used google to query wikipedia for a book's sips (using the query form "site:en.wikipedia.org sip"), would wikipedia's page titles tell me something useful about the subject(s) of the book? andromeda yelton (andromeda.yelton@gmail.com) graduated from the graduate school of library and information science, simmons college, boston, in may 2010. ■■ background hanne albrechtsen outlines three types of strategies for subject analysis: simplistic, content-oriented, and requirements-oriented.3 in the simplistic approach, "subjects [are] absolute objective entities that can be derived as direct linguistic abstractions of documents." the content-oriented model includes an interpretive step, identifying subjects not explicitly stated in the document. requirements-oriented approaches look at documents as instruments of communication; thus they anticipate users' potential information needs and consider the meanings that documents may derive from their context. (see, for instance, the work of hjørland and mai.4) albrechtsen posits that only the simplistic model, which has obvious weaknesses, is amenable to automated analysis. the difficulty in moving beyond a simplistic approach, then, lies in the ability to capture things not stated, or at least not stated in proportion to their importance. synonymy and polysemy complicate the task. background knowledge is needed to draw inferences from text to larger meaning. these would be insuperable barriers if computers were limited to simple word counts. however, thesauri, ontologies, and related tools can help computers as well as humans in addressing these problems; indeed, a great deal of research has been done in this area. for instance, enriching metadata with princeton university's wordnet and the national library of medicine's medical subject headings (mesh) is a common tactic,5 and the yahoo! category structure has been used as an ontology for automated document classification.6 several projects have used library of congress classification (lcc), dewey decimal classification (ddc), and similar library tools for automated text classification, but their results have not been thoroughly reported.7 all of these tools have had problems, though, with issues such as coverage, currency, and cost. this has motivated research into the use of wikipedia in their stead. since wikipedia's founding in 2001, it has grown prodigiously, encompassing more than 3 million articles in its english edition alone as of this writing; this gives it unparalleled coverage. wikipedia also has many thesaurus-like features. redirects function as "see" references by linking synonyms to preferred terms. disambiguation pages deal with homonyms. the polyhierarchical category structure provides broader and narrower term relationships; the vast majority of pages belong to at least one category. links between pages function as related-term indicators. ■■ an initial test case to explore whether my method was feasible, i needed to try it on a test case. i chose stephen hawking's a brief history of time, a relatively accessible meditation on the origin and fate of the universe, classified under "cosmology" by the library of congress. i began by looking up its sips on amazon.com.
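the query form just described is the heart of the method; sketched in php, as an illustration only (the phrase is one of the sips from table 1, and the url and key handling follow the appendix script's use of google's ajax search api, which google has since retired):

<?php
// build the site-restricted google query for one statistically improbable phrase.
$sip = 'grand unification energy';    // hypothetical input; one of the sips in table 1
$query = str_replace(' ', '+', $sip); // spaces would break the search url
$url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0'
     . '&q=site%3aen.wikipedia.org+' . $query
     . '&key=your_key_goes_here';     // an api key is required; see note 18
$googdata = file_get_contents($url);  // json whose "url" fields are the hits
?>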
noticing that amazon also lists capitalized phrases (caps)—“people, places, events, or important topics mentioned frequently in a book”—i included those as well (see table 1).14 i then queried wikipedia via google for each of these phrases, using queries such as “site:en.wikipedia .org ‘grand unification theory.’” i selected the top three wikipedia article hits for each phrase. this yielded a list of sixty-one distinct items with several interesting properties: ■■ four items appeared twice (arrow of time, entropy [arrow of time], inflation [cosmology], richard feynman). however, nothing appeared more than twice; that is, nothing definitively stood out. ■■ many items on the list were clearly relevant to brief history, although often at too small a level of granularity to be good subject headings (e.g., black hole, second law of thermodynamics, time in physics). ■■ some items, while not unrelated, were wrong as subject classifications (e.g., list of solar system objects by size, nobel prize in physics). ■■ some items were at best amusingly, and at worst bafflingly, unrelated (e.g., alpha centauri [doctor who], electoral district [canada], james k. polk, united states men’s national soccer team). ■■ in addition, i had to discard some of the top google hits because they were not articles but wikipedia special pages, such as “talk” pages devoted to discussion of an article. this test showed that i needed an approach that would give me candidate subject headers at a higher level of granularity. i also needed to be able to draw a brighter line between candidates and noncandidates. the presence of noncandidates was not in itself distressing—any automated approach will consider avenues a human would not—but not having a clear basis for discarding low-probability descriptors was a problem. as it happens, wikipedia itself offers candidate subject headers at a higher level of granularity via its categories system. most articles belong to one or more categories, which are groups of pages belonging to the same list or topic.15 i hoped that by harvesting categories from the sixty-one pages i had discovered, i could improve my method. this yielded a list of more than three hundred categories. unsurprisingly, this list mostly comprised irrelevant because of this thesaurus structure, all of which can be harvested and used automatically, many researchers have used wikipedia for metadata enrichment, text clustering and classification, and the like. for example, han and zhao wanted to automatically disambiguate names found online but faced many problems familiar to librarians: “the traditional methods measure the similarity using the bag of words (bow) model. the bow, however, ignores all the semantic relations such as social relatedness between named entities, associative relatedness between concepts, polysemy and synonymy between key terms. so the bow cannot reflect the actual similarity.” to counter this, they constructed a semantic model from information on wikipedia about the associative relationships of various ideas. they then used this model to find relationships between information found in the context of the target name in different pages. this enabled them to accurately group pages pertaining to particular individuals.8 carmel, roitman, and zwerdling used page categories and titles to enhance labeling of document clusters. although many algorithms exist for sorting large sets of documents into smaller, interrelated clusters, there is less work on labeling those clusters usefully. 
by extracting cluster keywords, using them to query wikipedia, and algorithmically analyzing the results, they created a system whose top five recommendations contained the human-generated cluster label more than 85 percent of the time.9 schönhofen looked at the same problem i examine— identifying document topics with wikipedia data—but he used a different approach. he calculated the relatedness between categories and words from titles of pages belonging to those categories. he then used that relatedness to determine how strongly words from a target document predicted various wikipedia categories. he found that although his results were skewed by how wellrepresented topics were on wikipedia, “for 86 percent of articles, the top 20 ranked categories contain at least one of the original ones, with the top ranked category correct for 48 percent of articles.”10 wikipedia has also been used as an ontology to improve clustering of documents in a corpus,11 to automatically generate domain-specific thesauri,12 and to improve wikipedia itself by suggesting appropriate categories for articles.13 in short, wikipedia has many uses for metadata enrichment. while text classification is one of these potential uses, and one with promise, it is under-explored at present. additionally, this exploration takes place almost entirely in the proceedings of computer science conferences, often without reference to library science concepts or in a place where librarians would be likely to benefit from it. this paper aims to bridge that gap. a simple scheme for book classification using wikipedia | yelton 9 computationally trivial to do so, given such a list. (the list need not be exhaustive as long as it exhaustively described category types; for instance, the same regular expression could filter out both “articles with unsourced statements from october 2009” and “articles with unsourced statements from may 2008.”) at this stage of research, however, i simply ignored these categories in analyzing my results. to find a variety of books to test, i used older new york times nonfiction bestseller lists because brand-new books are less likely to have sips available on amazon.19 these lists were heavily slanted toward autobiography, but also included history, politics, and social science topics. ■■ results of the thirty books i examined (the top fifteen each from paperback and hardback nonfiction lists), twenty-one had sips and caps available on amazon. i ran my script against each of these phrase sets and calculated three measures for each resulting category list: ■■ precision (p): of the top categories, how many were synonyms or near-synonyms of the book’s lcshs? ■■ recall (r): of the book’s lcshs, how many had synonyms or near-synonyms among the top categories? ■■ right-but-wrongs (rbw): of the top categories, how many are reminiscent of the lcshs without actually being synonymous? these included narrower terms (e.g., the category “african_american_actors” when the lcshs included “actors—united states —biography”), broader terms (e.g., “american_folk_ singers” vs. “dylan, bob, 1941–”), related terms (e.g., “the_chronicles_of_narnia_books” vs. “lion, the witch and the wardrobe (motion picture)”), and examples (“killian_documents_controversy” vs. “united states—politics and government—2001–2009”). 
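stated as formulas (the set notation is mine, not the article's), with t the list of top categories for a book and l its set of lcshs:

\[
p = \frac{\lvert \{\, c \in t : c \text{ is a (near-)synonym of some heading in } l \,\}\rvert}{\lvert t \rvert},
\qquad
r = \frac{\lvert \{\, h \in l : h \text{ has a (near-)synonym in } t \,\}\rvert}{\lvert l \rvert}.
\]

for example, a book with five top categories, exactly one of which matches one of its two lcshs, scores p = 1/5 = 0.2 and r = 1/2 = 0.5, as in the chronicles row of table 2.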
i considered the “top categories” for each book to be the five that most commonly occurred (excluding wikipedia administrative categories), with the following exceptions: ■■ because i had no basis to distinguish between them, i included all equally popular categories, even if that would bring the total to more than five. thus, for example, for the book collapse, the most common category occurred seven times, followed by two categories with five appearances and six categories with four. rather than arbitrarily selecting two of the six four-occurrence categories to bring the total to five, i examined all nine top categories. ■■ if there were more than five lcshs, i expanded the number of categories accordingly, so as not to candidates (“wars involving the states and peoples of asia,” “video games with expansion packs,” “organizations based in sweden,” among many others). many categories played a clear role in the wikipedia ecology of knowledge but were not suitable as general-purpose subject headers (“living people,” “1849 deaths”). strikingly, though, the vast majority of candidates occurred only once. only forty-two occurred twice, fifteen occurred three times, and one occurred twelve times: “physical cosmology.” twelve occurrences, four times as many as the next candidate, looked like a bright line. and “physical cosmology” is an excellent description of brief history— arguably better than lcsh’s “cosmology.” the approach looked promising. ■■ automating further test cases the next step was to test an extensive variety of books to see if the method was more broadly applicable. however, running searches and collating queries for even one book is tedious; investigating a large number by hand was prohibitive. therefore i wrote a categorization script (see appendix) that performs the following steps:16 ■■ reads in a file of statistically improbable phrases17 ■■ runs google queries against wikipedia for all of them18 ■■ selects the top hits after filtering out some common wikipedia nonarticles, such as “category” and “user” pages ■■ harvests these articles’ categories ■■ sorts these categories by their frequency of occurrence this algorithm did not filter out wikipedia administrative categories, as creating a list of them would have been prohibitively time-consuming. however, it would be table 1. sips and caps for a brief history of time sips grand unification energy, complete unified theory, thermodynamic arrow, psychological arrow, primordial black holes, boundary proposal, hot big bang model, big bang singularity, more quarks, contracting phase, sum over histories caps alpha centauri, solar system, nobel prize, north pole, united states, edwin hubble, royal society, richard feynman, milky way, roger penrose, first world war, weak anthropic principle 10 information technology and libraries | march 2011 “continental_army_generals” vs. “united states— history—revolution, 1775–1783.” ■■ weak: some categories treated the same subject as the lcsh but not at all in the same way ■■ wrong: the categories were actively misleading the results are displayed in table 2. ■■ discussion the results of this test were decidedly more mixed than those of my initial test case. on some books the wikipedia method performed remarkably well; on misleadingly increase recall statistics. ■■ i did not consider any categories with fewer than four occurrences, even if that left me with fewer than five top categories to consider. 
the lists of three-, two-, and one-occurrence categories were very long and almost entirely composed of unrelated items. i also considered, subjectively, the degree of overlap between the lcshs and the top wikipedia categories. i chose four degrees of overlap: ■■ strong: the top categories were largely relevant and included synonyms or near-synonyms for the lcsh ■■ near miss: some categories suggested the lcsh but missed its key points, such as table 2. results (sorted by percentage of relevant categories). book p r rbw subjective quality chronicles, bob dylan 0.2 0.5 0.8 strong the chronicles of narnia: the lion, the witch and the wardrobe official illustrated movie companion, perry moore 0.25 1 0.625 strong 1776, david mccullough 0 0 0.8 near miss 100 people who are screwing up america, bernard goldberg 0 0 0.625 weak the bob dylan scrapbook, 1956–1966, with text by robert santelli 0.2 0.5 0.4 strong three weeks with my brother, nicholas sparks 0 0 0.57 weak mother angelica, raymond arroyo 0.07 0.33 0.43 near miss confessions of a video vixen, karrine steffans 0.25 0.33 0.25 weak the fairtax book, neal boortz and john linder 0.17 0.33 0.33 strong never have your dog stuffed, alan alda 0 0 0.43 weak the world is flat, thomas l. friedman 0.4 0.5 0 near miss the tender bar, j. r. moehringer 0 0 0.2 wrong the tipping point, malcolm gladwell 0 0 0.2 wrong collapse, jared diamond 0 0 0.11 weak blink, malcolm gladwell 0 0 0 wrong freakonomics, steven d. levitt and stephen j. dubner 0 0 0 wrong guns, germs, and steel, jared diamond 0 0 0 weak magical thinking, augusten burroughs 0 0 0 wrong a million little pieces, james frey 0 0 0 wrong worth more dead, ann rule 0 0 0 wrong tuesdays with morrie, mitch albom no category with more than 4 occurrences a simple scheme for book classification using wikipedia | yelton 11 my method’s success with a brief history of time. i tested another technical, jargon-intensive work (n. gregory mankiw’s macroeconomics textbook), and found that the method also worked very well, giving categories such as “macroeconomics” and “economics terminology” with high frequency. therefore a system of this nature, even if not usable for a broad-based collection, might be very useful for scientific or other jargon-intensive content such as a database of journal articles. ■■ future research the method outlined in this paper is intended to be a proof of concept using readily available tools. the following work might move it closer to a real-world application: ■■ a configurable system for providing statistically improbable phrases; there are many options.23 this would provide the user with more control over, and understanding of, sip generation (instead of the amazon black box), as well as providing output that could integrate directly with the script. ■■ a richer understanding of the wikipedia category system. some categories (e.g., “all articles with unsourced statements”) are clearly useful only for wikipedia administrative purposes, not as document descriptors; others (e.g., “physical cosmology”) are excellent subject candidates; others have unclear value as subjects or require some modification (e.g., “environmental non-fiction books,” “macroeconomics stubs”). many of these could be filtered out or reformatted automatically. ■■ greater use of wikipedia as an ontology. for example, a map of the category hierarchies might help locate headers at a useful level of granularity, or to find the overarching meaning suggested by several headers by finding their common broader terms. 
a more thorough understanding of wikipedia’s relational structure might help disambiguate terms.24 others, it performed very poorly. however, there are several patterns here: many of these books were autobiographies, and the method was ineffective on nearly all of these.20 a key feature of autobiographies, of course, is that they are typically written in the first person, and thus lack any term for the major subject—the author’s name. biography, by contrast, is rife with this term. this suggests that including titles and authors along with sips and caps may be wise. additionally, it might require making better use of wikipedia as an ontology to look for related concepts (rather in the manner that han and zhao used it for name disambiguation).21 books that treat a single, well-defined subject are easier to analyze than those with more sprawling coverage. in particular, books that treat a concept via a sequence of illustrative essays (e.g., tipping point, freakonomics) do not work well at all. sips may apply only to particular chapters rather than to the book as a whole, and the algorithm tends to pick out topics of particular chapters (e.g., for freakonomics, the fascinating chapter on sudhir venkatesh’s work on “gangs_in_chicago, _illinois”22) rather than the connecting threads of the entire book (e.g. “economics—sociological aspects”). the tactics suggested for autobiography might help here as well. my subjective impressions were usually, but not always, borne out by the statistics. this is because some of the rbws were strongly related to one another and suggested to a human observer a coherent narrative, whereas others picked out minor or dissimilar aspects of the book. there was one more interesting, and promising, pattern: my subjective impressions of the quality of the categories were strongly predicted by the frequency of the most common category. remember that in the brief history example, the most common category, “physical cosmology,” occurred twelve times, conspicuously more than any of its other categories. therefore i looked at how many times the top category for each book occurred in my results. i averaged this number for each subjective quality group; the results are in table 3. in other words, the easier it was to draw a bright line between common and uncommon categories, the more likely the results were to be good descriptions of the work. this suggests that a system such as this could be used with very little modification to streamline categorization. for example, it could automatically categorize works when it met a high confidence threshold (when, for instance, the most common category has double-digit occurrence), suggest categories for a human to accept or reject at moderate confidence, and decline to help at low confidence. it was also interesting to me that—unlike my initial test case—none of the bestsellers were scientific or technical works. it is possible that the jargon-intensive nature of science makes it easier to categorize accurately, hence table 3. category frequency and subjective quality subjective quality of categories frequencies of most common category average frequency of most common category strong 6, 12, 16, 19 13.25 near miss 5, 5, 7, 10 6.75 weak 4, 5, 6, 7, 8 6 wrong 3, 4, 4, 5, 5, 5, 7, 7 5 12 information technology and libraries | march 2011 (1993): 219. 4. birger hjørland, “the concept of subject in information science,” journal of documentation 48, no. 
2 (1992): 172; jenserik mai, “classification in context: relativity, reality, and representation,” knowledge organization 31, no. 1 (2004): 39; jens-erik mai, “actors, domains, and constraints in the design and construction of controlled vocabularies,” knowledge organization 35, no. 1 (2008): 16. 5. xiaohua hu et al., “exploiting wikipedia as external knowledge for document clustering,” in proceedings of the 15th acm sigkdd international conference on knowledge discovery and data mining, paris, france, 28 june–1 july 2009 (new york: acm, 2009): 389. 6. yannis labrou and tim finin, “yahoo! as an ontology— using yahoo! categories to describe documents,” in proceedings of the eighth international conference on information and knowledge management, kansas city, mo, usa 1999 (new york: acm, 1999): 180. 7. kwan yi, “automated text classification using library classification schemes: trends, issues, and challenges,” international cataloging & bibliographic control 36, no. 4 (2007): 78. 8. xianpei han and jun zhao, “named entity disambiguation by leveraging wikipedia semantic knowledge,” in proceeding of the 18th acm conference on information and knowledge management, hong kong, china, 2–6 november 2009 (new york: acm, 2009): 215. 9. david carmel, haggai roitman, and naama zwerdling, “enhancing cluster labeling using wikipedia,” in proceedings of the 32nd international acm sigir conference on research and development in information retrieval, boston, ma, usa (new york: acm, 2009): 139. 10. peter schönhofen, “identifying document topics using the wikipedia category network,” in proceedings of the 2006 ieee/wic/acm international conference on web intelligence, hong kong, china, 18–22 december 2006 (los alamitos, calif.: ieee computer society, 2007). 11. hu et al., “exploiting wikipedia.” 12. david milne, olena medelyan, and ian h. witten, “mining domain-specific thesauri from wikipedia: a case study,” in proceedings of the 2006 ieee/wic/acm international conference on web intelligence, 22–26 december 2006 (washington, d.c.: ieee computer society, 2006): 442. 13. zeno gantner and lars schmidt-thieme, “automatic content-based categorization of wikipedia articles,” in proceedings of the 2009 workshop on the people’s web meets nlp, acl-ijcnlp 2009, 7 august 2009, suntec, singapore (morristown, n.j.: association for computational linguistics, 2009): 32. 14. “amazon.com capitalized phrases,” amazon.com, http://www.amazon.com/gp/search-inside/capshelp.html/ ref=sib_caps_help (accessed mar. 13, 2010). 15. for more on the epistemological and technical roles of categories in wikipedia, see http://en.wikipedia.org/wiki/ wikipedia:categorization. 16. two sources greatly helped the script-writing process: william steinmetz, wicked cool php: real-world scripts that solve difficult problems (san francisco: no starch, 2008); and the documentation at http://php.net. 17. not all books on amazon.com have sips, and books that do may only have them for one edition, although many editions may be found separately on the site. there is not a readily apparent pattern determining which edition features sips. therefore ■■ a special-case system for handling books and authors that have their own article pages on wikipedia. 
in addition, a large-scale project might want to work from downloaded snapshots of wikipedia (via http://download.wikimedia.org/), which could be run on local hardware rather than burdening wikipedia's servers; this would require using something other than google for relevance ranking (there are many options), with a corresponding revision of the categorization script. ■■ conclusions even a simple system, quickly assembled from freely available parts, can have modest success in identifying book categories. although my system is not ready for real-world applications, it demonstrates that an approach of this type has potential, especially for collections limited to certain genres. given the staggering volume of documents now being generated, automated classification is an important avenue to explore. i close with a philosophical point. although i have characterized this work throughout as automated classification, and it certainly feels automated to me when i use the script, it does in fact still rely on human judgment. wikipedia's category structure and its articles linking text to title concepts are wholly human-created. even google's pagerank system for determining relevancy rests on human input, using web links to pages as votes for them (like a vast citation index) and the texts of these links as indicators of page content.25 my algorithm therefore does not operate in lieu of human judgment. rather, it lets me leverage human judgment in a dramatically more efficient, if also more problematic, fashion than traditional subject cataloging. with the volume of content spiraling ever further beyond our ability to individually catalog documents—even in bounded contexts like academic databases, which strongly benefit from such cataloging—we must use human judgment in high-leverage ways if we are to have a hope of applying subject cataloging everywhere it is expected. references and notes 1. carol tenopir, "online databases—online scholarly journals: how many?" library journal (feb. 1, 2004), http://www.libraryjournal.com/article/ca374956.html (accessed mar. 13, 2010). 2. "amazon.com statistically improbable phrases," amazon.com, http://www.amazon.com/gp/search-inside/sipshelp.html/ref=sib_sip_help (accessed mar. 13, 2010). 3. hanne albrechtsen, "subject analysis and indexing: from automated indexing to domain analysis," the indexer 18, no. 4 21. han and zhao, "named entity disambiguation." 22. sudhir venkatesh, off the books: the underground economy of the urban poor (cambridge: harvard univ. pr., 2006). 23. see karen coyle, "machine indexing," the journal of academic librarianship 34, no. 6 (2008): 530. she gives as examples phraserate (http://ivia.ucr.edu/projects/phraserate/), kea (http://www.nzdl.org/kea/), and extractor (http://extractor.com/). 24. per han and zhao, "named entity disambiguation." 25. lawrence page et al., "the pagerank citation ranking: bringing order to the web," stanford infolab (1999), http://ilpubs.stanford.edu:8090/422/ (accessed mar. 13, 2010). this paper precedes the launch of google; as the title indicates, the citation index is one of google's foundational ideas. this step cannot be automated. 18. be aware that running automated queries without permission is an explicit violation of google's terms of service.
see google webmaster central, "automated queries," http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66357 (accessed mar. 13, 2010). before using this script, obtain an api key, which confers this permission. ajax web search api keys can be instantly and freely obtained via http://code.google.com/apis/ajaxsearch/web.html. 19. "hardcover nonfiction," new york times, oct. 9, 2005, http://www.nytimes.com/2005/10/09/books/bestseller/1009besthardnonfiction.html?_r=1 (accessed mar. 13, 2010); "paperback nonfiction," new york times, oct. 9, 2005, http://www.nytimes.com/2005/10/09/books/bestseller/1009bestpapernonfiction.html?_r=1 (accessed mar. 13, 2010). 20. for the purposes of this discussion i consider the problematic million little pieces to be autobiography, as it has that writing style, and as its lcsh treats it thus.

appendix. php script for automated classification

<?php
// usage (assumed; the script's opening lines were garbled in extraction):
//   php classify.php sips.txt 3
// $argv[1] = file containing one comma-separated list of sips/caps;
// $argv[2] = number of wikipedia articles to keep per phrase (at most 4).
if ($argv[2] > 4) {
    echo "i'm sorry; the number specified cannot be more than 4.";
    die;
}
// next, turn our comma-separated list into an array.
$sip_temp = fopen($argv[1], 'r');
$sip_list = '';
while (!feof($sip_temp)) {
    $sip_list .= fgets($sip_temp, 5000);
}
fclose($sip_temp);
$sip_array = explode(', ', $sip_list);
/* here we access google search results for our sips and caps. it is a
   violation of the google terms of service to run automated queries without
   permission. obtain an ajax api key via http://code.google.com. */
$apikey = 'your_key_goes_here';
$wikipages = array();
foreach ($sip_array as $query) {
    /* in multiword terms, change spaces to + so as not to break the google
       search. */
    $query = str_replace(' ', '+', $query);
    $googresult = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site%3aen.wikipedia.org+$query&key=$apikey";
    $googdata = file_get_contents($googresult);
    // pick out the urls we want and put them into the array $links
    preg_match_all('|"url":"[^"]*"|i', $googdata, $links);
    /* strip out some crud from the json syntax to get just urls */
    $links[0] = str_replace('"url":"', '', $links[0]);
    $links[0] = str_replace('"', '', $links[0]);
    /* here we step through the links in the page google returned to us and
       find the top wikipedia articles among the results */
    $i = 0;
    foreach ($links[0] as $testlink) {
        /* these variables test to see if we have hit a wikipedia special page
           instead of an article. there are many more flavors of special page,
           but these are the most likely to show up in the first few hits. */
        $filetest = strpos($testlink, 'wiki/file:');
        $cattest = strpos($testlink, 'wiki/category:');
        $usertest = strpos($testlink, 'wiki/user');
        $talktest = strpos($testlink, 'wiki/talk:');
        $disambtest = strpos($testlink, '(disambiguation)');
        $templatetest = strpos($testlink, 'wiki/template_');
        if (!$filetest && !$cattest && !$usertest && !$talktest && !$disambtest && !$templatetest) {
            $wikipages[] = $testlink;
            $i++;
        }
        /* once we've accumulated as many article pages as the user asked for,
           stop adding links to the $wikipages array. */
        if ($i == $argv[2]) {
            break;
        } // this closes the foreach loop which steps through $links
    }
} // this closes the foreach loop which steps through $sip_array
/* for each page that we identified in the above step, let's find the
   categories it belongs to. */
$mastercatarray = array();
foreach ($wikipages as $targetpage) {
    // scrape category information from the article page.
patrick griffis

building pathfinders with free screen capture tools

patrick griffis (patrick.griffis@unlv.edu) is business librarian, university of nevada las vegas libraries.

this article outlines freely available screen capture tools, covering their benefits and drawbacks as well as their potential applications. in discussing these tools, the author illustrates how they can be used to build pathfinding tutorials for users and how these tutorials can be shared with users. the author notes that the availability of these screen capture tools at no cost, coupled with their ease of use, provides ample opportunity for low-stakes experimentation by library staff in building dynamic pathfinders to promote the discovery of library resources.

one of the goals related to discovery in the university of nevada las vegas (unlv) libraries' strategic plan is to "expand user awareness of library resources, services and staff expertise through promotion and technology."1 screencast videos and screenshots can be used effectively to show users how to access materials using finding tools in a systematic, step-by-step way. screencasting and screen capture tools are becoming more intuitive to learn and use and can be downloaded for free. as such, these tools are becoming an efficient and effective method for building pathfinders for users.

one such tool is jing (http://www.jingproject.com), freeware that is easy to download and use. jing allows for short screencasts of five minutes or less to be created and uploaded to a remote server on screencast.com. once a jing screencast is uploaded, screencast.com provides a url for the screencast that can be shared via e-mail or instant message or on a webpage. another function of jing is recording screenshots, which can be annotated and shared by url or pasted into documents or presentations. jing serves as an effective tool for enabling librarians working with students via chat or instant messaging to quickly create screenshots and videos that visually demonstrate to students how to get the information they need. jing stores the screenshots and videos on its server, which allows those files to be reused in subject or course guides and in course management systems, course syllabi, and library instructional handouts. moreover, jing's file storage provides an opportunity for librarians to incorporate tutorials into a variety of spaces where patrons may need them, without requiring internal library server space or work from internal library web specialists.

trailfire (http://www.trailfire.com) is another screen capture tool that can be utilized in the same manner. trailfire allows users to create a trail of webpage screenshots that can be annotated with notes and shared with others via a url. such trails can provide users with a step-by-step slideshow outlining how to obtain specific resources.
when a trail is created with trailfire, a url is provided to share. like jing, trailfire is free to download and easy to learn and use.

wink (http://debugmode.com/wink) was originally created for producing software tutorials, which makes it well suited for creating tutorials about how to use databases. although wink is much less sophisticated than expensive software packages, it can capture screenshots and add explanation boxes, buttons, titles, and voice to tutorials. screenshots are captured automatically as you use your computer, on the basis of mouse and keyboard input. wink files can be converted into very compressed flash presentations and a wide range of other file types, such as pdf, but avi files are not supported. as such, wink tutorials converted to flash have a fluid movie feel similar to jing screencasts, but wink tutorials can also be converted to more static formats like pdf, which provides added flexibility.

slideshare (http://www.slideshare.net) allows for the conversion of uploaded powerpoint, openoffice, or pdf files into online flash movies. an option to sync audio to the slides is available, and widgets can be created to embed slideshows onto websites, blogs, subject guides, or even social networking sites.

any of these tools can be utilized for just-in-time virtual reference questions in addition to the common use of just-in-case instructional tutorials. such just-in-time screen capturing and screencasting offer a viable solution for providing more equitable service and teachable moments within virtual reference applications. these tools allow library staff to answer patron questions via e-mail and chat reference in a manner that lets patrons see the processes for obtaining information sources. demonstrations that are typically provided in face-to-face reference interactions and classroom instruction sessions can be provided to patrons virtually. the efficiency of this practice is that it is simpler and faster to capture and share a screencast tutorial when answering virtual reference questions than to explain complex processes in written form. additionally, the fact that these tools are freely available and easy to use gives library staff the opportunity to pursue low-stakes experimentation with screen capturing and screencasting.

the primary drawback to these freely available tools is that none of them provides a screencast that allows for both voice and text annotations, unlike commercial products such as camtasia and captivate. however, tutorials rendered with these freely available tools can be repurposed into a tutorial within commercial applications like camtasia studio (http://www.techsmith.com/camtasia.asp) and adobe captivate (http://www.adobe.com/products/captivate/).

as previously mentioned, these easy-to-use tools allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots is library "how do i find . . ." research help guides. many of these "how do i find . . ." research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources.
currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials.

reference

1. "unlv libraries strategic plan 2009–2011," http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2.

statement of ownership, management, and circulation

information technology and libraries, publication no. 280-800, is published quarterly in march, june, september, and december by the library and information technology association, american library association, 50 e. huron st., chicago, illinois 60611-2795. editor: marc truitt, associate director, information technology resources and services, cameron library, university of alberta, edmonton, ab t6g 2j8 canada. annual subscription price, $65. printed in u.s.a. with periodical-class postage paid at chicago, illinois, and other locations. as a nonprofit organization authorized to mail at special rates (dmm section 424.12 only), the purpose, function, and nonprofit status for federal income tax purposes have not changed during the preceding twelve months. extent and nature of circulation (average figures denote the average number of copies printed each issue during the preceding twelve months; actual figures denote the actual number of copies of the single issue published nearest to the filing date: september 2009 issue). total number of copies printed: average, 5,096; actual, 4,751. mailed outside-country paid subscriptions: average, 4,090; actual, 3,778. sales through dealers and carriers, street vendors, and counter sales: average, 430; actual, 399. total paid distribution: average, 4,520; actual, 4,177. free or nominal rate copies mailed at other classes through the usps: average, 54; actual, 57. free distribution outside the mail (total): average, 127; actual, 123.
total free or nominal rate distribution: average, 181; actual, 180. total distribution: average, 4,701; actual, 4,357. office use, leftover, unaccounted, spoiled after printing: average, 395; actual, 394. total: average, 5,096; actual, 4,751. percentage paid: average, 96.15; actual, 95.87. statement of ownership, management, and circulation (ps form 3526, september 2007) filed with the united states post office postmaster in chicago, october 1, 2009.

michaela brenner and peter klein

discovering the library with google earth

michaela brenner (brennerm@pdx.edu) is assistant professor and database maintenance and catalog librarian at portland state university library, oregon. peter klein (peter.klein@colorado.edu) is an aerospace engineering bs/ms student at the university of colorado at boulder.

libraries need to provide attractive and exciting discovery tools to draw patrons to the valuable resources in their catalogs. the authors conducted a pilot project to explore the free version of google earth as such a discovery tool for portland state library's digital collection of urban planning documents. they created eye-catching placemarks with links to parts of this collection, as well as to other pertinent materials like books, images, and historical background information. the detailed how-to part of this article is preceded by a discussion about discovery of library materials and followed by possible applications of this google earth project.

in calhoun's report to the library of congress, it becomes clear that staff time and resources will need to move from cataloging traditional formats, like books, to cataloging unique primary sources, and then to providing access to these sources from many different angles: "organize, digitize, expose unique special collections" (calhoun 2006). in 2005, portland state university library received a grant "to develop a digital library under the sponsorship of the portland state university library to serve as a central repository of the collection, accession, and dissemination of [urban] key planning documents . . . that have high value for oregon citizens and for scholars around the world" (abbott 2005). this collection is called the oregon sustainable community digital library (oscdl) and is an ongoing project that includes literature, planning reports, maps, images, rlis (regional land information system) geographical data, and more. much of the older material is unpublished, and making it available online presents a valuable resource. most of the digitized—and, more recently, born-digital—documents are accessible through the library's catalog, where patrons can find them together with other library materials about the city of portland. the bibliographic records are arranged in the catalog in an electronic resource management (erm) system (brenner, larsen, and weston 2006). additionally, these bibliographic data are regularly exported from the library catalog to the oscdl web site (http://oscdl.research.pdx.edu) and there integrated with gis (geographic information system) features, thus optimizing cataloging costs by reusing data in a different electronic environment. committed to not falling into the trap that clifford lynch had in mind when he wrote, "i think there is a mental picture that many of us have that digitization is something you do and you finish . . .
a finite, one-time process" (lynch 2002), and agreeing with gatenby that "it doesn't matter at all if a user finds our opac through the 'back door'" (gatenby 2007), the authors looked into further using these existing data from the library catalog by making them accessible from a popular and appealing place on the internet, a place that users are more likely to visit than the library catalog. the free version of google earth, a virtual-globe program that can be installed on pcs, lent itself to experimenting. "google earth combines the power of google search with satellite imagery, maps, terrain and 3-d buildings to put the world's geographic information at your fingertips" (http://earth.google.com). from there, the authors provide links to the digitized documents in the library catalog. easy distribution, as well as the more playful nature of this pilot project and the inclusion of pictures, makes the available data even more attractive to users.

"google now reigns"

"google now reigns," claims karen markey (markey 2007), and many others agree that using google is easier and more appealing to most than using library catalogs. google's popularity has been growing spectacularly. in august 2007, google accounted for 64 percent of all u.s. searches (avtec media group 2007). in contrast, the oclc report on how users perceive the library shows that only one percent of respondents begin their information search on a library web site, while 84 percent use search engines (de rosa et al. 2005). "if we [libraries] want to survive," says stephen abram, "we must place our messages where the users are seeking answers and will trip over them. today that usually means at yahoo, msn, and google" (abram 2005). according to lorcan dempsey, in the longer run, traffic to the library catalog will come by linking from larger consolidated resources, like open worldcat and google scholar (dempsey 2005). dempsey also stressed that it becomes more and more significant to differentiate between discovery and location (dempsey 2006a). initially, users want to discover; they want to find what interests them independent of where this information is actually located and available. while there may be lots of valuable, detailed, and exceptionally well-organized bibliographic information in the library catalog, not many users (one percent) are willing to discover this information through the catalog. they may not discover what a library has to offer if "the library does not find a way to go to the user, rather than waiting for the user to come to the library" (coyle 2007). unless the intent is to keep our treasures buried, the library community needs to work with popular outside discovery environments—like search engines—to bring information available in libraries to users from the outside. libraries are, although sometimes reluctantly, responding. google, google scholar, and google books are open worldcat partner sites that are now or soon will be providing access to worldcat records.
google book search includes "find this book in the library," and the advanced book search also has the option to limit a search to library catalogs with access to the worldcat web record for each item. "deep linking" enables web users to link from search results in yahoo, google, or other partner sites to the "find in a library" interface in open worldcat, and then directly to the item's record in their library's online public access catalog (opac). simply put, "find it on google, get it from your library" (calhoun 2006). the "leveraged discovery environment" is an expression coined by dempsey; it means that it becomes increasingly important to leverage a "discovery environment which is outside your control to bring people back into our catalog environment (like amazon, google scholar)" (dempsey 2006b). issues in calhoun's report to the library of congress include the question of how to get a google user from google to library collections. she quotes an interviewee saying that "data about a library's collection needs to be on google and other popular sites as well as the library interface" (calhoun 2006). with evidence pointing to the heavy use of google for discovery and with google earth technology providing such a powerful visualization tool, the authors felt tempted to experiment with existing data from portland state library's digital oscdl collection and make these data accessible through a virtual globe.

the king's college cultural heritage project

martyn jessop from king's college london, united kingdom, published an article about a relatively small pilot project on providing access to a digital cultural heritage collection through a geographical information system (jessop 2005). jessop's approach of exploring different technologies and techniques to apply to existing data about unique primary sources was exactly what the authors had in mind for this project, and it provided encouragement to move forward with the idea of providing additional access to the oregon sustainable community digital library (oscdl) collections through google earth. like jessop, the authors regard it an unaffordable luxury to put a great deal of effort into collecting, digitizing, and cataloging materials without making them available to a much broader audience through multiple access points. comparable to jessop, the goal of this project was to find a relatively simple, low-cost technological solution that could also be applied to a much wider range of data without much more investment in staff time and money. once the authors mastered the initial hurdle of understanding google earth's programming language, they could easily identify with jessop's notion of "project creep" as more and more possibilities arose to make the project more appealing. this, as with the king's college project, was a valuable part of the development process, the details of which are described below.

the portland state library oscdl-on-google-earth project

the authors chose ten portland-based oscdl subcollections as the basis of this pilot project: harbor drive, front street, portland public market, urban studies collection, downtown, park blocks, south park blocks, pioneer courthouse square, portland city archives, and jpact (joint policy advisory committee on transportation). the programming language for google earth is kml (keyhole markup language), a file format used to display geographic data. kml is based on the xml standard and can be created with the google earth user interface or from scratch with a simple text editor.
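to give a sense of the syntax, a bare-bones placemark of the kind shown in the article's figures 1 and 2 looks roughly like the following sketch. the coordinates and wording here are illustrative, not copied from the project files, and the namespace reflects the current kml 2.2 standard rather than the exact header the authors used:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>portland public market</name>
    <!-- the description balloon accepts html, which is how links to the
         library catalog and the oscdl web site can be presented -->
    <description><![CDATA[
      a brief collection description, with links back to the
      library catalog and the oscdl web site.
    ]]></description>
    <Point>
      <!-- longitude,latitude,altitude -->
      <coordinates>-122.675,45.519,0</coordinates>
    </Point>
  </Placemark>
</kml>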
having no previous kml experience, the authors decided to use both.

figure 1. basic placemark in google earth
figure 2. kml script for basic placemark

a basic placemark provided by google earth (figure 1), copied and pasted into notepad (figure 2), was the starting point. at portland state library, information technology routinely batch exports cataloged oscdl data from the library catalog (ils) to the oscdl web site to reuse them. for the google earth project, the authors had two options: either export the data relevant to these collections from the ils to a spreadsheet, or use an existing excel spreadsheet containing most of the same data, including place coordinates. this spreadsheet was one of many that had been created to keep track of the digitization process as well as for creating bibliographic records for the library catalog later. using the available spreadsheet again, the following data were retained:

• the title of the collection
• longitude and latitude of the place the collection refers to
• a brief description of the collection

the following were added manually to the remaining spreadsheet:

• all the texts and urls for the collection-specific links
• urls for the collection-specific images

the authors extracted the placemark-specific script from figure 2 to create a template in notepad. a general description and all links that were the same for the ten collections were added to this template, and placeholders were inserted for the collection-specific data (figure 3; a sketch of such a template follows at the end of this section). using microsoft office word's mail merge, the authors populated the template with the data from the spreadsheet in one quick step. the result was a kml script that included all the placemark data for the ten collections (figure 4). the script was saved as plain text (.txt) first and then renamed with the extension .kml, which represents the final file (figure 5).

figure 3. detail of template with variables between « double brackets »
figure 4. detail: "downtown" placemark of finished kml script
figure 5. simplified process

clicking the oscdl.kml icon on a desktop or inside a web application opens google earth. the user "flies" to portland, where ten stars represent the ten collections (figure 6). zooming in, the placemarks show the locations to which the collections refer. considering the many layers and icons available in google earth, the authors decided to use yellow stars to make them more visible. in order to avoid clutter and overlapping labels, titles only appear on mouse-over (figures 7 and 8).

figure 6. ten stars representing the ten collections
figure 7. zoomed in with mouse-over placemark
figure 8. location of the pioneer courthouse square placemark

figure 9 shows the open placemark for portland public market. "portland state university" with the university's logo is a link that takes the user to the university's homepage. the next line is the title of the collection, followed by a brief description. the paragraph after that is the same for all collections and includes links to the portland state university library and the oscdl web site. the collection-specific links that follow go to the library catalog, where the user has access to the digitized manuscripts of this collection (figure 10). other pertinent links—in this case to a book available in the library, a public web site on the history of the market, and a historic image of the market—were added as well. to make the placemarks visually more attractive, all links are presented in the school's "psu green," and an image representative of the collection was added. the pictures can be enlarged in a new window by clicking on them. to avoid copyright issues, the authors photographed their own images.

figure 9. portland public market
figure 10. access to the collection in library catalog
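a template of the kind described above (figure 3) might look roughly like the following sketch, with « double-bracket » mail-merge fields standing in for the collection-specific data; the field names are illustrative, not the authors' actual ones:

<Placemark>
  <name>«title»</name>
  <description><![CDATA[
    «description»
    <a href="«catalog_url»">view the digitized collection in the library catalog</a>
  ]]></description>
  <Point>
    <coordinates>«longitude»,«latitude»,0</coordinates>
  </Point>
</Placemark>

running word's mail merge over the spreadsheet replaces each field once per row, producing all ten placemarks in a single pass.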
the last link opens an e-mail window for questions and comments (figure 11). this link is intended to bring some feedback and suggestions on how to improve the project and on its value for researchers and other users. the authors have been toying with the idea of including more elaborate features, such as video clips and music, in the future.

figure 11. ready-to-go e-mail window

one more recent feature is that kml files created in google earth can now also be viewed on the web by simply entering the url of the kml file into the search box of google maps (figure 12), thus creating google earth placemarks in google maps with different view options (figures 13 and 14). not all formatting is correctly transferred, and at this point there is no way to correct this in google maps. for example, the yellow stars were white, the mouse-over didn't work, and the size of the placemarks was imprecise. however, the content of the placemarks—except for the images, which didn't show on some computers—was fully retained, and all links worked (figure 15). although the use of the kml file in google maps is not as elegant as in google earth, it has the advantage that there is no need to install software, as there is with google earth. this adds value to kml files and makes projects like this more versatile.

figure 12. url of kml file in google maps search box
figure 13. "map" view in google maps
figure 14. "satellite" view in google maps
figure 15. portland public market placemark in google maps

the authors have identified several uses for the kml file:

• a workstation in the library can be dedicated to resources about the city of portland. an icon on the desktop of this workstation will open google earth and "fly" directly to portland, where the yellow stars are displayed.
• professors can easily add the .kml file to webct (now blackboard) or other course management systems.
• the file can be e-mailed as an attachment to those interested in the development of the city of portland.
• a link from the wikipedia page related to the oscdl project leads to the google earth pilot project.
• the project was added to the google earth gallery, where many remarkable projects created by individuals and groups can be found.
• it can also be accessed through the oscdl web site, and relevant links from the records in the library catalog to google maps can be included. it may be useful to alert patrons who actually did come to the catalog by themselves to this visual tool.

conclusion

"the question now is not how we improve the catalog as such," says dempsey. "it is how we provide effective discovery and delivery of library materials in a network environment where attention is scarce and information resources are abundant and where discovery opportunities are being centralized into major search engines and distributed to other environments" (dempsey 2006a).
with this in mind, the authors took on the challenge to create another discovery tool for one of the library's primary unique digital collections. google earth is not the web, and it needs to be installed on a workstation in order to use a kml file. on the other hand, the file created in google earth can also be used on the web, more readily but less elegantly, in google maps, thus possibly reaching a larger audience. similar to the king's college project, and following abram's suggestion that "we should experiment more with pilots in specific areas" (abram 2005), this pilot project is of an exploratory, experimental nature. and as with many experiments, the authors were testing an idea, trying something different and new to find out how useful it might be, and useful applications for this project were identified. google earth is a sophisticated, attractive, and exciting program—and fun to play with. in a time "where attention is scarce and information resources are abundant," as dempsey (2006a) says, we need to provide these kinds of discovery tools to attract patrons and to lure them to the valuable resources in our library's catalog that we created with so much diligence and cost of staff time and resources.

works cited

abbott, carl. 2005. planning a sustainable portland: a digital library for local, regional, and state planning and policy documents. framing paper. http://oscdl.research.pdx.edu/documents/library_grant.pdf.
abram, stephen. 2005. the google opportunity. library journal 130, no. 2: 34.
avtec media group. 2007. search engine statistics. http://avtecmedia.com/internet-marketing/internet-marketing-trends.htm.
brenner, michaela, tom larsen, and claudia weston. 2006. digital collection management through the library catalog. information technology and libraries 25, no. 2: 65–77.
calhoun, karen. 2006. the changing nature of the catalog and its integration with other discovery tools; final report, prepared for the library of congress. www.loc.gov/catdir/calhoun-report-final.pdf.
coyle, karen. 2007. the library catalog in a 2.0 world. the journal of academic librarianship 33, no. 2: 289–291.
de rosa, cathy, et al. 2005. perceptions of libraries and information resources: a report to the oclc membership. www.oclc.org/reports/pdfs/percept_all.pdf.
dempsey, lorcan. 2005. making data work—web 2.0 and catalogs. lorcan dempsey's weblog on libraries, services, and networks, october 4, 2005. http://orweblog.oclc.org/archives/000815.html.
dempsey, lorcan. 2006a. the library catalogue in the new discovery environment: some thoughts. ariadne 48. www.ariadne.ac.uk/issue48/dempsey.
dempsey, lorcan. 2006b. lifting out the catalog discovery experience. lorcan dempsey's weblog on libraries, services, and networks, may 14, 2006. http://orweblog.oclc.org/archives/001021.html.
gatenby, janifer. 2007. accessing library materials via google and other web sites. paper presented to elag (european library automation group), may 9, 2007. http://elag2007.upf.edu/papers/gatenby_2.pdf.
jessop, martyn. 2005. the application of a geographical information system to the creation of a cultural heritage digital resource. literary and linguistic computing: journal of the association for literary and linguistic computing 20, no. 1: 71–90.
lynch, clifford. 2002. digital collections, digital libraries, and the digitization of cultural heritage information. first monday 7, no. 5. www.firstmonday.org/issues/issue7_5/lynch.
markey, karen. 2007. the online library catalog.
d-lib magazine 13, no. 1/2. www.dlib.org/dlib/january07/markey/01markey.html.

allison symulevich and mark hamilton

using open access institutional repositories to save the student symposium during the covid-19 pandemic

information technology and libraries | march 2022. https://doi.org/10.6017/ital.v41i1.14175

allison symulevich (asymulev@usf.edu) is scholarly communications librarian, university of south florida. mark hamilton (hamiltonma@longwood.edu) is research and digital services librarian, longwood university. © 2022.

abstract

in 2020, during the covid-19 pandemic, colleges and universities around the world were forced to close or move to online instruction. many institutions host yearly student research symposiums. this article describes how two universities used their institutional repositories to adapt their student research symposiums to virtual events in a matter of weeks. both universities use the bepress digital commons platform for their institutional repositories. even though the two universities' symposium strategies differed, some commonalities emerged, particularly with regard to learning the best practices to highlight student work and support their universities' efforts to host research symposiums virtually.

introduction

many colleges and universities host student research symposiums as a way to celebrate students' intellectual experiences and support the high-impact practice of presenting original student research. students contribute research outputs and share their projects with others in their institution's community, beyond the classroom. typically, many of these student research symposiums are conducted in the second half of the spring semester in order to allow students to work on their research throughout the course of the year. during the 2020 school year, the world experienced the covid-19 pandemic. the many ways this pandemic has changed our society are only now being understood, but the pervasive move to virtual meetings and presentations is certainly one of the most dramatic. college campuses began delivering remote instruction in a matter of days, and organizers of student research symposiums around the country were forced either to cancel or reimagine the events. longwood university and university of south florida st. petersburg campus (usf) were two institutions that transformed their in-person student symposiums into online events in a matter of weeks. in this article, the authors share their experiences of working with many people throughout their campuses to create a student research symposium experience similar to their past in-person events. both universities use bepress' digital commons platform for their institutional repositories. overall, longwood's and usf's symposium strategies were different in some regards, but some commonalities emerged, particularly with regard to learning the best practices that celebrate the students' achievements and support their universities' efforts promoting high-impact student research.

literature review

student research has grown in importance.
following george kuh's 2008 report, high-impact educational practices: what they are, who has access to them, and why they matter, universities recognized and responded to the need to integrate these high-impact practices into their curricular and co-curricular efforts.1 one of the recognized high-impact practices is student research.2 students can contribute to their disciplinary scholarly conversation through their original research and by presenting on their research projects, and colleges and universities can promote this conversation by facilitating the display of student work and enabling interactive discussions between the student presenters and other members of their academic community. the number of student research conferences has increased internationally.3 students participate in the formal aspect of these conferences, as well as in informal conversations where they can continue to expound on their research, extend their professional social networks, and gain confidence as researchers.4

student research is also being captured in institutional repositories (irs) more than in the past.5 "these most junior members of the academic community are doing research and adding to the body of knowledge generated by their institutions and in their disciplines."6 passehl-stoddart and monge point out the importance of institutional repositories supporting student work: "the ir also serves to support, enhance, and capture evidence of high-impact educational practices; acts as an equitable access point to meaningful learning opportunities; and provides a platform for students to begin to develop academic confidence and an entryway into the scholarly communication learning cycle."7

in supporting high-impact student research, librarians do not act alone. we collaborate with other departments on campus, such as offices of undergraduate and graduate studies; offices of research; honors colleges; student affairs; and more. krause, eickholt, and otto describe how the library collaborated with the music department at eastern washington university to upload student musical performances to the institutional repository.8 this type of collaboration leads to increased student support, as well as increased discoverability of student intellectual and creative scholarship.

when the covid-19 pandemic hit, universities around the world were forced to change their means of conducting business. classes were moved online at many institutions. conferences were either canceled or moved online as well. many colleges and universities around the country host student research symposiums to highlight the high-impact work that students are doing. these symposiums needed to move to remote delivery, and many of them had to move quickly, as the spring semester was well underway when institutions were being forced to close. symposiums and conferences adapted to online environments by moving away from their in-person events. this applied to both academic and professional conferences. for example, oregon state university (osu) and the new haven local section of the american chemical society hosted their respective events virtually using a variety of technologies.
osu worked with its distance learning unit to create a canvas course, whereas the new haven local section of the american chemical society used a combination of open broadcaster software (obs studio), youtube, zoom, and google drive. as for professional conferences, many used prerecorded sessions when hosting on digital platforms such as zoom.9

there were positive outcomes from these virtual symposiums. for example, osu saw benefits of "enhanced ability to devote personalized attention to presenters (e.g., by providing links to relevant publications or websites), fewer distractions, more time to craft thoughtful responses, and an ability for students to keep track of shared resources and discussants' contact information that could be used for follow-up after the event." their post-event surveys also showed that students who could not previously participate due to distance circumstances were able to participate in an online forum. osu's approach involved using canvas, their learning management system (lms), through which students submitted prerecorded lightning talks over powerpoint slides with a written narrative. the canvas course was open to the osu community. discussion boards for commenting were open for a two-day period.10 in her article, stephanie houston interviewed various conference coordinators.11 interviewees stated that a major benefit was global access to information from top researchers.12 with regard to cancer research conferences, free registration vastly expanded the number of registrants from previous years.13 conference hosts felt as though some of the differences of online events would persist in future years because of personal scheduling issues, the ability to provide global access, and environmental impact.14 others think the novelty of virtual events may wear off following the pandemic.15

however, the switch to virtual events was not without challenges. osu noted that the two main challenges they faced were the organization of presentations and presenters responding to comments on their asynchronous presentations.16 houston's interviewees explain that the lack of informal discussions and face-to-face interactions was a negative of hosting virtual symposiums.17 speirs also states that virtual poster sessions suffer from the lack of face-to-face exchanges, especially for young researchers.18 some saw that the large number of participants made it difficult to engage in question-and-answer sessions.19 two of the interviewees attempted to fix this by using twitter for asynchronous q&a with a specific hashtag for the event.20 technology issues such as limited bandwidth and internet connectivity problems are a concern for virtual conferences.21

conferences that are not archived can result in a loss of material beyond the original event. jonathan bull and stephanie davis-kahl discuss the problem of conference ephemera not being accessible in their poster presentation.22 they explain that conference hosting as an institutional repository service can assist with this lack of accessibility.
"by posting documents and artifacts from conferences within an institutional repository, the content is not only accessible for future use, but also preserves those materials for the future and for institutional memory."23

virtual student research symposiums

longwood university's virtual spring showcase for research and creative inquiry

longwood university is a public university in south central virginia. it has about 4,000 undergraduate and 500 graduate students. it is known for its liberal arts focus, with strong programs in the humanities, nursing, and education. since spring of 2018 there has been a spring symposium of undergraduate research (now called the spring showcase for research and creative inquiry). for the first three years of its existence, the spring showcase was planned as a single-day event in april; a fall showcase was added in november 2019.

in january 2020, the university showcase committee began planning an in-person spring showcase for research and creative inquiry on april 22, 2020. the proposed schedule was as follows: students would register to be part of it by march 13, 2020; they would be notified of their acceptance by the end of march; and they would be encouraged to submit posters to the institutional repository, digital commons @ longwood, by the date of the spring showcase, april 22. planning for the in-person showcase continued throughout february and the first part of march. one of the elements on the registration form was giving permission for student content to
this involved a bit of dialogue back and forth between the showcase co-chairs, digital commons staff, and the library. because the registration form already gave permission to post student content, it was decided that the university did not need to ask for this permission a second time for a virtual conference. new workflows were developed for the research submission process which included posters, presentations, and videos. faculty would submit files on behalf of students in their classes. library staff and instructional designers developed video and printed upload instructions. they were posted on the showcase website (part of the university website) as well as advertised via the library website. faculty asked students to submit their final projects by thursday, april 16, so there was time to upload the project posters and videos to digital commons @ longwood. most students submitted their projects through the campus lms, canvas. as faculty attempted to upload content, library staff were available to help them. the author helped one faculty member via zoom, describing the process of uploading to digital commons. one process that had to be adjusted involved powerpoint presentations that contained videos—they had to be separately downloaded and then just the powerpoint of the poster re-uploaded, so visitors could both view the powerpoint and watch the video. hamilton and the other staff member worked with faculty to place all the required content for each presentation; then the library’s digital initiatives specialist made all the content live. a number of activities occurred during the showcase. initially the digital initiatives specialist individually approved the comments that were posted, because this was the default set up. later this was changed to allow for automatic posting to speed up the approval process and to remove any apparent bias on the part of the administrators. some faculty also uploaded a few new versions of presentations. some of the science students decided to post only their abstracts, https://disqus.com/ https://intensedebate.com/ information technology and libraries march 2022 using open access institutional repositories to save the student symposium | symulevich and hamilton 5 because they were going to publish their research in journals and did not want their content to be open access and because some faculty were co-authors in these publications. in the subject listing, library staff included a link to the live zoom presentations that were offered. there was also a link to a live zoom session for the showcase awards ceremony, highlighting submissions to the journal of the college of arts and sciences. longwood university has hosted two more virtual showcases: in fall 2020 and spring 2021. the showcase organizers chose to switch the hosting platform to symposium by forager one (https://symposium.foragerone.com/), a third-party platform that allows for virtual and live posting of presentations and videos. the new platform provided an easier interface for students to submit research and administrators to manage it. library staff worked with the showcase organizing committee to preserve all the abstracts from the spring showcase. they are in discussions with how future content will be preserved and whether library staff should collect some of the research into the institutional repository. university of south florida st. petersburg campus virtual student research symposium the university of south florida st. 
petersburg campus is a branch campus of the larger university of south florida (usf). usf is an r1 research institution with approximately 50,000 undergraduate and graduate students in tampa, florida. at the time of the 2020 virtual student research symposium, usf st. petersburg campus was a separately accredited institution with roughly 5,000 undergraduate and graduate students. the student research symposium was in its 17th iteration in 2020. the office of research at usf st. petersburg organized the event and coordinated with the nelson poynter memorial library and the honors program. undergraduate and graduate students were invited to share their work with the campus community to demonstrate the high-impact research that they were conducting. in 2019, the library had worked with the office of research to host award winners and posters nominated by faculty on the bepress digital commons institutional repository called usfsp digital archive. for the 2020 symposium, the office of research began planning in august 2019. the inperson symposium was scheduled for april 16, 2020. when the covid-19 pandemic hit, the usf st. petersburg campus moved to remote instruction and ended on-campus activities on march 20, 2020. the office of research staff contacted librarians at nelson poynter memorial library to discuss the possibility of a virtual symposium. author allison symulevich considered a variety of platforms for hosting the research symposium, such as the campus website, canvas, libguides, digital commons, and facebook. criteria for platforms included factors related to team control, security, engagement, and archiving. becaus e of these factors and the prior pilot project, the library decided to recommend that the institutional repository be used to host the virtual research symposium. the office of research wanted to capture an experience for the students similar to that of the inperson event. thus, they requested that the platform include both video and audio options, as well as a way for the poster to be viewed. the office of research also requested audience participation through a commenting feature if possible. they also extended student submission deadlines to assist with the disruption in students’ lives. the office of research used a course in canvas, the learning management system used at usf, to collect research posters and presentations from the students. the library digital team was given https://symposium.foragerone.com/' information technology and libraries march 2022 using open access institutional repositories to save the student symposium | symulevich and hamilton 6 access to the canvas course so that the team could download posters, presentations, and abstracts to then upload to the institutional repository. the library uploaded 55 student projects, 43 of which had a video or audio presentation. the digital team had hoped to batch upload the files to the institutional repository using spreadsheets containing metadata such as author names, titles, abstracts, and links to audio or video presentation files. however, due to technical concerns, everything was uploaded manually, with work divided amongst team members who had previously used the system. first, all of the content was downloaded from canvas. these files were then posted to a shared drive in a variety of folders organized to maintain a workflow. the projects with audio/video presentations were uploaded first. then the projects that had abstracts and posters were added. 
due to time constraints, the digital team wanted to make sure the basics were done first so students would have time to make any revisions necessary before the site was promoted to the usf st. petersburg community. after this initial implementation, the team had a meeting with the larger committee to discuss the progress of the digital collection. the committee suggested some changes and offered constructive feedback. once the abstracts, posters, and presentations (either audio or video) were posted, the team noticed issues with some submissions. some students had submitted powerpoint presentations that did not display as the team was hoping, so one of the team members changed the format to mp4 files. audio files did not include a visual component. as a way to add a visual component, the team worked with digital commons to create a digital image gallery and add thumbnail images that could then be added to a special metadata field called poster preview. this enabled the collection to have a visual of the poster displayed above the audio file, allowing virtual attendees to press play on the audio file and see the poster image on the same page. the team then turned to the office of research’s request for a feature that allowed virtual attendees to interact with student presenters. digital commons does not have a commenting feature, so the digital team had to look at third-party commenting platforms. digital commons was able to integrate the platform chosen, intense debate, so that virtual attendees could comment on presentations. students were asked to monitor their posters for a two-week period. moving forward, the library and the usf st. petersburg campus discussed using the institutional repository for the spring 2021 symposium. however, due to administrative consolidation of the usf tampa, st. petersburg, and sarasota-manatee campuses into a oneusf with a single accreditation, the new combined office of research on the tampa campus decided to host the newly expanded, one-university, undergraduate student symposium through a canvas course.24 download statistics following the research symposiums, the authors looked at metrics for the virtual events. at usf, 55 presentations were uploaded to the ir. all time downloads from april 1, 2020 to december 31, 2021, including additional supplementary files, is 2,068, from 53 countries around the world. total streams of audio or video presentations for the same timeframe are 1,168. at longwood, 200 presentations were uploaded to the ir. all time downloads from april 13, 2020 to december 31, 2021, including additional supplementary files, is 16,190 from 124 countries around the world. total streams of video presentations for the same timeframe are 2,541. see figures 1 and 2. these presentations are still getting downloads and streams—one of the benefits of preserving high-impact student research projects. information technology and libraries march 2022 using open access institutional repositories to save the student symposium | symulevich and hamilton 7 university of south florida longwood university figure 1. downloads of symposium materials from each campus from april 20, 2020 to december 31, 2021. blue represents downloads of presentations. red represents supplementary materials. university of south florida longwood university figure 2. streams of symposium materials from each campus from april 20, 2020 to december 31, 2021. dark blue represents plays, blue represents views, and light blue represents completed viewings. 
best practices

after reflecting on these large undertakings to move in-person events to online student research symposiums, the authors have identified some common best practices, meant to assist other institutions making similar decisions. these practices are based on the following core requisites.

consistent university branding

although both universities used bepress' digital commons platform, institutions can use a variety of online platforms, such as campus websites, other institutional repositories, and third-party conference-hosting software such as symposium by forager one or lumen learning. use a system that creates a cohesive look and feel for the collection of student research projects. usf had to do this with audio-only presentations for consistency of viewing, adding a visual component to match those of video presentations.

university access or open access

use a platform that allows archiving of student projects. even if the platform chosen for hosting the event does not allow for archiving, libraries should work with event hosts to provide institutional repository digital archiving of projects, similar to usf's pilot project and longwood's spring 2021 project of archiving abstracts. libraries can offer this as a solution to provide permanent archiving of high-impact student work.25 institutions need to consider whether they will keep their symposiums closed, meaning only accessible to the university community, or open to the world. while it is technically straightforward to restrict access using the campus lms, irs using net id sign-ins, or private websites, the authors argue for worldwide access to these presentations.

archiving student work

archiving these projects allows students to build their cvs for graduate school or interviews by providing hyperlinked citations to worldwide published projects. making these projects available open access allows students to contribute to the worldwide scholarly conversation on their given research topics.26 statistics from both longwood and usf show international downloads.

file formats

as for the file formats that work best, consider embedding video and audio files using consistent formats. mp4 video files worked best for both longwood and usf on the digital commons platform. audio files should be consistent as well, for preservation.

cross-unit collaborations

work with other departments on campus to host these major academic events. many units on campus contribute to student success, and these efforts can be combined to distribute work among university faculty and staff so as not to overload one department and to provide the best possible symposium. different departments have different skill sets, such as technology and marketing. both longwood and usf st. petersburg libraries worked with departments such as undergraduate studies and communications to switch these in-person events to successful online programs. consider working with distance learning units to increase distance learning students' participation in student research symposiums.27 distance learning and it departments may have additional technology experience that could lead to a better overall experience for students. in 2021, longwood worked with the office of student research, the library, marketing and communications, and academic affairs to put on the spring student showcase. this inter-unit
flexibility

institutions should use flexible workflows when transitioning in-person events to online. both longwood and usf used flexible workflows for posting presentations to institutional repositories. however, the two universities differed in their submission processes. longwood had faculty submit student projects directly to the ir, a more distributed approach. usf st. petersburg had students submit projects to canvas, after which the digital team posted the projects to the ir, a more centralized approach. institutions will need to decide which approach works best for them; usf st. petersburg, for example, does not have a history of allowing outside submissions to its ir. the digital teams needed to remain flexible as event dates moved and online technology requests changed, for example, event coordinators requesting online commenting features. similarly, deadlines should be set with realistic timeframes, allowing enough time for uploading projects to the online platform. longwood and usf worked with their offices of research to establish flexible timelines for the digital teams.

submission forms

consider using forms or another system to collect student submissions. google forms, microsoft forms, or a learning management system such as canvas are all ways to collect projects. make sure to test these prior to the submission process. both universities used canvas during the 2020 student research symposiums to collect student projects. in 2021, however, longwood students (both graduate and undergraduate) submitted directly to forager one's symposium platform because it was already integrated into the campus single sign-on service, enabling ease of submission. for the graduate student research symposium in 2021, usf used microsoft forms to create a form tailored to file format preference. although this form was not used after the office of graduate studies went in another direction, symulevich felt it was an improvement over the previous year's collection process because it output an excel spreadsheet that could be used for metadata collection and batch uploading. (a sketch of this kind of form-to-spreadsheet workflow appears after the engagement discussion below.)

abstract archiving

institutions should consider allowing students to submit abstracts only. longwood allowed students not to submit complete presentations if they were planning to publish their projects. this may be more of an issue when students are working with faculty members on research to be published at a later date.

promoting the symposium and creating engagement

promote the event to increase student participation. this can be done both through social media and through university web presences. consider working with your campus marketing and communications department to broaden marketing beyond the library. this marketing can serve both to gain student projects and to promote the event to the broader campus community. likewise, seek ways to promote engagement on the institutional repository or whatever platform is chosen. this could include using a third-party commenting feature as a way to further engage students with their scholarly topics. however, make sure to monitor commenting in some capacity to avoid spam, and turn off commenting features after a certain period of time so as not to overburden students. commenting features and increased engagement via online platforms, like video and audio presentations, help avoid the negative impact of a lack of face-to-face interactions.28
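as referenced under submission forms above, one practical step is turning a form export into a repository batch-upload spreadsheet. the following python sketch illustrates the idea; the input file, column names, and output columns are hypothetical, since bepress' actual batch-upload template and each campus's form fields will differ.

# a minimal sketch, assuming the form tool exports csv with hypothetical
# columns "student name", "project title", "abstract", and "file link";
# the output columns mimic, but are not, the real digital commons template.
import csv

with open("forms_export.csv", newline="", encoding="utf-8") as src, \
     open("batch_upload.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.DictWriter(
        dst, fieldnames=["title", "abstract", "author1_fname",
                         "author1_lname", "fulltext_url"])
    writer.writeheader()
    for row in csv.DictReader(src):
        # naive split of "first last"; real names need more careful handling
        first, _, last = row["student name"].partition(" ")
        writer.writerow({
            "title": row["project title"].strip(),
            "abstract": row["abstract"].strip(),
            "author1_fname": first,
            "author1_lname": last,
            "fulltext_url": row["file link"],
        })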
hybrid symposiums

even after the covid pandemic ends and events resume in person, hybrid symposium models should be considered, as evidenced by longwood's use of synchronous presentations via zoom, with links integrated into the ir. osu is considering hybrid solutions in the future as well.29

conclusion

moving in-person student research symposiums to online platforms during a pandemic is challenging, but the process of creating online events allows students to continue to celebrate their high-impact research and contribute to the scholarly community. open access archiving of these projects has been successful, based on download counts at longwood university and the usf st. petersburg campus. the authors hope to continue to use innovative digital archiving to support student research projects. remaining flexible and working with other departments on campus can lead to successful online events. the authors hope in-person events will eventually return; however, these online platforms can enhance student research symposiums by providing global access to high-impact student projects.

acknowledgement

the authors thank the collaborative teams at longwood university and the university of south florida st. petersburg campus that helped make these student research symposiums happen and succeed during a very difficult time.

endnotes

1 george d. kuh, "high-impact educational practices: what they are, who has access to them, and why they matter," leap (2008), association of american colleges & universities, https://provost.tufts.edu/celt/files/high-impact-ed-practices1.pdf.

2 kuh, "high-impact."

3 helen walkington, jennifer hill, and pauline e. kneale, "reciprocal elucidation: a student-led pedagogy in multidisciplinary undergraduate research conferences," higher education research & development 36, no. 2 (2017): 417, https://doi.org/10.1080/07294360.2016.1208155.

4 walkington, hill, and kneale, "reciprocal elucidation," 417–18.

5 danielle barandiaran, betty rozum, and becky thoms, "focusing on student research in the institutional repository: digitalcommons@usu," college & research libraries news 75, no. 10 (2014): 546–49, https://doi.org/10.5860/crln.75.10.9209; betty rozum, becky thoms, scott bates, and danielle barandiaran, "we have only scratched the surface: the role of student research in institutional repositories" (paper, acrl 2015 conference, portland, or, march 26, 2015), https://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2015/rozum_thoms_bates_barandiaran.pdf.

6 rozum, thoms, bates, and barandiaran, "we have only scratched the surface," 804.
7 erin passehl-stoddardt and robert monge, "from freshman to graduate: making the case for student-centric institutional repositories," journal of librarianship and scholarly communication 2, no. 3 (2014): 2, https://doi.org/10.7710/2162-3309.1130.

8 rose sliger krause, andrea langhurst eickholt, and justin l. otto, "creative collaboration: student creative works in the institutional repository," digital library perspectives 34, no. 1 (2018): 20–31, https://doi.org/10.1108/dlp-03-2017-0010.

9 jessica g. freeze et al., "orchestrating a highly interactive virtual student research symposium," journal of chemical education 97, no. 9 (2020): 2773–78, https://dx.doi.org/10.1021/acs.jchemed.0c00676; sophie pierszalowski et al., "developing a virtual undergraduate research symposium in response to covid-19 disruptions: building a canvas-based shared platform and pondering lessons learned," scholarship and practice of undergraduate research 4, no. 1 (fall 2020): 75, https://doi.org/10.18833/spur/4/1/10.

10 pierszalowski et al., "developing a virtual undergraduate research symposium," 75.

11 stephanie houston, "lessons of covid-19: virtual conferences," journal of experimental medicine 217, no. 9 (2020): e20201467, https://doi.org/10.1084/jem.20201467.

12 houston, "lessons of covid-19," 2.

13 valerie speirs, "reflections on the upsurge of virtual cancer conferences during the covid-19 pandemic," british journal of cancer 123 (2020): 698–99, https://doi.org/10.1038/s41416-020-1000-x.

14 houston, "lessons of covid-19," 3.

15 speirs, "reflections on the upsurge," 699.

16 pierszalowski et al., "developing a virtual undergraduate research symposium," 75.

17 houston, "lessons of covid-19," 2–3; goedele roos et al., "online conferences—towards a new (virtual) reality," computational and theoretical chemistry 1189 (november 2020): 5, https://doi.org/10.1016/j.comptc.2020.112975.

18 speirs, "reflections on the upsurge," 699.

19 houston, "lessons of covid-19," 2–3.

20 houston, "lessons of covid-19," 2.

21 houston, "lessons of covid-19," 3; roos et al., "online conferences," 5; speirs, "reflections on the upsurge," 699.

22 jonathan bull and stephanie davis-kahl, "contributions to the scholarly record: conferences & symposia in the repository," library faculty presentations (2015): paper 12, http://scholar.valpo.edu/ccls_fac_presentations/12.
23 bull and davis-kahl, "contributions to the scholarly record."

24 digital commons @ usf will be used for a hybrid symposium, the 2022 annual undergraduate research conference. there will be an in-person component, as well as both synchronous and asynchronous presentations.

25 passehl-stoddardt and monge, "from freshman to graduate," 2; barandiaran, rozum, and thoms, "focusing on student research in the institutional repository"; rozum, thoms, bates, and barandiaran, "we have only scratched the surface," 804.

26 houston, "lessons of covid-19," 2.

27 pierszalowski et al., "developing a virtual undergraduate research symposium," 75.

28 houston, "lessons of covid-19," 2–3; roos et al., "online conferences," 3.

29 pierszalowski et al., "developing a virtual undergraduate research symposium," 75.

president's message: lita now

andrew k. pace

information technology and libraries | march 2009

andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio.

at the time of this writing, my term as lita president is half over; by the time of publication, i will be in the home stretch—a phrase that, to me, always connotes relief and satisfaction that is never truly realized. i hope that this time between ala conferences is a time of reflection for the lita board, committees, interest groups, and the membership at large. various strategic planning sessions are, i hope, leading us down a path of renewal and regeneration of the division. of course, the world around us will have its effect—in particular, a political and economic effect. first, the politics. i was asked recently to give my opinion about where the new administration should focus its attention regarding library technology. i had very little time to think of a pithy answer to this question, so i answered with my gut that the united states needs to continue its investment in it infrastructure so that we are on par with other industrialized nations while also lending its aid to countries that are lagging behind. furthermore, i thought it an apt time to redress issues of data privacy and retention.
the latter is often far from our minds in a world that is ever more connected, increasingly through wireless technology, and with a user base that, as one privacy expert put it, would happily trade a dna sample for an extra value meal. i will resist the urge to write at greater length a treatise on the bill of rights and its status in 2008. i will hope, however, that lita's technology and access and legislation and regulation committees will feel reinvigorated post-election and post-inauguration to look carefully at the issues of it policy. our penchant for new tools should always be guided and tempered by the implementation and support of policies that rationalize their use.

as for the economy, it is our new backdrop. one anecdotal view of this is the number of e-mails i've received from committee appointees apologizing that they will not be able to attend ala conferences as planned because of the economic downturn and local cuts to library budgets. libraries themselves are in a paradoxical situation—facing increasing demand for the free services that libraries offer while simultaneously facing massive cuts to the budgets that support the very collections and programs people are demanding. what can we do? well, i would suggest that we look at library technology through a lens of efficiency and cost savings, not just from a perspective of what is cool or trendy. when it comes to running systems, we need to keep our focus on end-user satisfaction while considering total cost of ownership. and if i may be selfish for a moment, i hope that we will not abandon our professional networks and volunteer activities. while we all make sacrifices of time, money, and talent to support our profession, it is often tempting when economic times are hard to isolate ourselves from the professional networks that sustain us in times of plenty.

politics and economics? though i often enjoy being cynical, i also try to make lemonade from lemons whenever i can. i think there are opportunities for libraries to get their own economic bailout by supporting public works and emphasizing our role in contributing to the public good. we should turn our "woe-are-we" tendencies that decry budget cuts and low salaries into championed stories of "what libraries have done for you lately." and we should go back to the roots of it, no matter how mythical or anachronistic, and think about what we can do technically to improve systemwide efficiencies. i encourage the membership to stay involved and reengage, whether through direct participation in lita activities or through a closer following of the activities of the ala office of information technology policy (oitp, www.ala.org/ala/aboutala/offices/oitp) and the ala washington office itself. there is much to follow in the world that affects our profession, and so many are doing the heavy lifting for us. all we need to do sometimes is pay attention. make fun of me if you want for stealing a campaign phrase from richard nixon, but i kept coming back to it in my head. in short, library information technology—now more than ever.

editor's comments

bob gerrity

information technology and libraries | december 2012

past and present converge with the december 2012 issue of information technology and libraries (ital), as we also publish online the first volume of ital's predecessor, the journal of library automation (jola), originally published in print in 1968.
the first volume of jola offers a fascinating glimpse into the early days of library automation, when many things were different, such as the size (big) and capacity (small) of computer hardware, and many things were the same (e.g., richard johnson's description of the book catalog project at stanford, where "the major achievement of the preliminary systems design was to establish a meaningful dialogue between the librarian and systems and computer personnel"). plus ça change, plus c'est la même chose. there are articles by luminaries in the field: richard de gennaro describes approaches to developing an automation program in a large research library; frederick kilgour, from the ohio college library center (now oclc), analyzes catalog-card production costs at columbia, harvard, and yale in the mid-1960s (8.8 to 9.8 cents per completed card); and henriette avram of the library of congress describes the successful use of the cobol programming language to manipulate marc ii records. (the first volume of jola is available at http://ejournals.bc.edu/ojs/index.php/ital/issue/view/312.)

the december 2012 issue marks the completion of ital's first year as an e-only, open-access publication. while we don't have readership statistics for the previous print journal to compare with, download statistics for the e-version appear healthy, with more than 30,000 full-text article downloads for 2012 content so far this year, plus more than 10,000 downloads of content from previous years. based on the download statistics, the topics of most interest to today's ital readers are discovery systems, web-based research guides, digital preservation, and digital copyright. this month's issue takes some of these themes further, with articles that examine the usability of autocompletion features in library search interfaces (ward, hahn, and feist), reveal patterns of student use of library computers (thompson), propose a cloud-based digital library storage solution (sosa-sosa), and summarize attributes of open standard file formats (park, oh). happy reading.

bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia.

from chatgpt to catgpt: the implications of artificial intelligence on library cataloging

richard brzustowicz

information technology and libraries | september 2023
https://doi.org/10.5860/ital.v42i3.16295

richard brzustowicz (rrbrzustowicz@carlow.edu) is instruction and outreach librarian, carlow university. © 2023.

abstract

this paper explores the potential of language models such as chatgpt to transform library cataloging. through experiments with chatgpt, the author demonstrates its ability to generate accurate marc records using rda and other standards such as the dublin core metadata element set. these results demonstrate the potential of chatgpt as a tool for streamlining the record creation process and improving efficiency in library settings. the use of ai-generated records, however, also raises important questions related to intellectual property rights and bias. the paper reviews recent studies on ai in libraries and concludes that further research and development of this innovative technology is necessary to ensure its responsible implementation in the field of library cataloging.

introduction

as librarianship continues to evolve in the digital age, the importance of cataloging as a tool for accessing vast amounts of information cannot be overstated.
unfortunately, this crucial process can be both labor-intensive and time-consuming, often requiring significant resources. in recent years, automation and artificial intelligence (ai) technologies have emerged as potential solutions for streamlining workflows. openai's language model chatgpt1 is one such technology, offering the potential to automate various tasks, including text generation and even creating working code.2 this paper explores the potential applications of chatgpt in library cataloging, examining the results of my own experiments using this innovative technology.

literature review

applications of large language models (llms) have been explored in a range of contexts. taecharungroj explores reactions to chatgpt, noting wide public interest expressed via social media posts. because of chatgpt's ability to generate accurate information in a conversational tone, it provides an accessible medium for interacting with an ai resource. these technologies will usher in substantial changes to how we do our work: "the next evolution of jobs will likely be impacted by chatgpt and other innovative ai technologies."3 at the same time, the author cautions that while chatgpt performs admirably in generating semantically and syntactically correct information, it does not always provide accurate information. this presents an opening for targeted professional development opportunities in fields that may be affected by llms.

kasneci et al. argue that ai and language models, if used effectively, are an opportunity for better, more learner-centered education. they caution, however, that llms' potential for bias may necessitate careful training and review of records by professionals: "[i]f a model is trained on data that is biased towards certain groups of people, it may produce results that are unfair or discriminatory towards those groups (e.g., local knowledge about minorities such as small ethnic groups or cultures can fade into the background)."4 to counter the potential for bias and abuse, the authors advise that individuals using these resources should ensure that the data training the ai systems are diverse and inclusive. to that end, they recommend making the data open and available to users. likewise, the authors recommend regular reviews and continuous updates of ai models. this will not only ensure that the ai (e.g., chatgpt) generates accurate information but also provide a model for continuous improvement of the machine learning algorithm.5

recent research has examined the potential of chatgpt in the library context. lund and wang discuss the implications of chatgpt for libraries and the academy. according to their findings, chatgpt can enhance library search systems' accuracy and effectiveness by automatically generating cataloging and metadata for library resources, such as book titles and descriptions.6 however, they also note the potential ethical and legal implications of using chatgpt. as chatgpt draws on publicly available material, it may include copyrighted content, necessitating careful evaluation of ai-generated content and coordination with the copyright holder. according to atlas, the initial impact of ai in higher education is still not fully understood, but it does present an opportunity for professionals to incorporate chatgpt into their workflow to increase efficiency and streamline tasks.
atlas advises that a well-thought-out plan with clear goals in mind is crucial when utilizing chatgpt.7 these findings suggest that ai resources such as chatgpt have the potential to improve library cataloging by automating labor-intensive and time-consuming tasks. moreover, they point to opportunities for professional development resources to ensure that libraries remain up to date with current technologies.

methodology

to evaluate chatgpt's potential for library cataloging, i used the chatgpt free research preview version (released february 13, 2023) to generate marc records in rda for various items. when asked about its training data, chatgpt replied:

i was trained on a vast amount of data, including examples of marc and rda records, as well as the rules and guidelines for creating these records. my training data includes a wide variety of cataloging resources, such as online guides and manuals, training modules, and real-world examples of marc and rda records. this training allows me to generate accurate and standardized marc and rda records based on the information provided (chatgpt, personal communication, february 23, 2023).

while this response was interesting, i wanted to test chatgpt's claims. i asked it to generate records for six items, including one with no corresponding worldcat entry, and compared the results to records created by human catalogers. five items had existing marc records, while the sixth item was specifically chosen to test chatgpt's ability to generate an original record. i evaluated chatgpt's accuracy, efficiency, and ability to handle different types of materials and cataloging/metadata standards, given that chatgpt is known to produce both accurate and inaccurate or misleading information.8

to test chatgpt's ability to create marc records, i conducted an initial experiment using readily available materials. i asked chatgpt to generate a marc record for the 1996 edition of anne rice's interview with the vampire9 using rda (chatgpt, personal communication, february 23, 2023). the resulting record is shown in table 1, and i compared it to a record in oclc's worldcat, which is illustrated in table 2. the results of this test indicate that chatgpt can produce an accurate and effective record for interview with the vampire.

after this first success, i attempted to generate a marc record for the 2018 vinyl reissue of david bowie's 1977 album low10 using chatgpt and the rda standard (chatgpt, personal communication, february 23, 2023). the resulting marc record is presented in table 3, which was then compared to professional catalogers' records. table 4 shows an existing marc record for low in oclc's worldcat. notable differences were observed between the human-generated and chatgpt-generated marc records, with the chatgpt record lacking foreign-language headings and subject headings in certain fields (6xx). this is not surprising, as such tasks require a degree of personal discernment on the part of the cataloger. these discrepancies spurred me to investigate the applications further. i refined the question to test chatgpt's ability to generate appropriate library of congress call numbers. for this example, i requested: "generate a marc record using rda that includes library of congress call number for the 1971 german edition of pedagogy of the oppressed"11 (chatgpt, personal communication, february 24, 2023).
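as an aside, the experiments above were run through the chatgpt web interface, but the same kind of prompt could be scripted. the following is a minimal sketch using the openai python client; the model name, the prompt wording, and the idea of scripting this at all are illustrative assumptions, not part of the author's method.

# a minimal sketch, assuming the openai python client (v1) and an api key
# in the openai_api_key environment variable; the author's experiments
# actually used the chatgpt web interface, not the api.
from openai import OpenAI

client = OpenAI()

prompt = (
    "generate a marc record using rda that includes a library of congress "
    "call number for the 1971 german edition of pedagogy of the oppressed."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # hypothetical choice; any chat model would do
    messages=[{"role": "user", "content": prompt}],
)

# the reply is plain text that still needs review by a cataloger
print(response.choices[0].message.content)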
tables 5 and 6 demonstrate that while chatgpt may not always "choose" the same subject access points or consistently format all relevant fields as effectively as a human cataloger, given proper training and oversight it can be used as an effective supplement to human cataloging. the accurate formatting of field 050 and the appropriate "dummy" call number (lb875.p442) further demonstrate this technology's potential for streamlining cataloging and resource description, given proper training. in this instance, the ai noted that multiple fields would need to be edited: "please note that the control number (001) and the date (005) in the above record are placeholders and should be replaced with actual values when creating the record" (chatgpt, personal communication, february 24, 2023).

to further put chatgpt's abilities to the test, i asked it to generate a citation for the 2018 russian print edition of cixin liu's the three body problem12 (chatgpt, personal communication, march 2, 2023). this was a more complex request than the previous ones; it required chatgpt to extract and incorporate metadata from a non-latin character set (cyrillic) and in a foreign language. table 7 shows the marc record generated by chatgpt, while table 8 displays the existing marc record for the russian translation of this work found in worldcat. although there were differences between the two records, chatgpt's output was comparable to the professional catalogers' work. the discrepancies between the records, however, suggested that chatgpt was not merely reproducing existing records but creating original marc records, as it claimed. the results of this test further demonstrate chatgpt's potential as a powerful tool for automating the generation of accurate metadata records.

during my testing, i discovered that the limited vinyl pressing of alternative rock band mood rings' 2013 single "pathos y lagrimas"13 had no worldcat entry. to see if chatgpt could generate an original marc record for this item, i asked it, "can you generate a marc record using rda for mood rings' 2013 single 'pathos y lagrimas'" (chatgpt, personal communication, march 8, 2023). despite the absence of an equivalent worldcat record, chatgpt was able to provide a sample marc record, which i have included in table 9. this record, complete with sample text for the leader and control fields (00x), serves as evidence of two important capabilities of chatgpt: its ability to generate original cataloging records and its incorporation of placeholder content in fields that are collection specific. chatgpt's ability to generate accurate marc records using both rda and ersatz "original" cataloging demonstrates its potential as a cataloging and item description resource. additionally, chatgpt's versatility is further highlighted by its ability to produce original content in other metadata formats. when asked if it could generate records using the dublin core metadata element set, chatgpt not only confirmed its ability but also provided a sample entry for "pathos y lagrimas," as seen in table 10. while some modifications may be necessary to cater to collection-specific demands, this showcases chatgpt's potential as a time-saving tool for automating record generation in multiple formats.
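because generated records arrive as plain text with placeholder control fields, a library would need to parse and clean them before loading. as a rough illustration, the following sketch builds a record with the pymarc library and flags placeholder 001/005 values for replacement; pymarc 5 and the placeholder patterns are assumptions here, and the paper itself does not describe any such post-processing step.

# a minimal sketch, assuming pymarc 5; checks an ai-generated record for
# placeholder control fields before it is loaded into a catalog.
from pymarc import Record, Field, Subfield

record = Record()
record.add_field(Field(tag="001", data="123456789"))  # placeholder value
record.add_field(Field(
    tag="245",
    indicators=["1", "0"],
    subfields=[Subfield(code="a", value="pathos y lagrimas /"),
               Subfield(code="c", value="mood rings.")],
))

# hypothetical placeholder patterns; real checks would follow local policy
placeholders = {"123456789", "2740930"}
for field in record.get_fields("001", "005"):
    if field.data in placeholders:
        print(f"field {field.tag} holds a placeholder and needs a real value")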
in addition to its ability to generate accurate records adhering to multiple metadata standards, the results of this study also highlight the potential versatility of chatgpt as a cataloging and item description resource. the model's ability to generate records for different media and in different languages could prove particularly useful for librarians and other information professionals who manage diverse collections. moreover, while catalogers may need to modify the pregenerated records to suit their specific collections' requirements, chatgpt's user-friendly interface and accurate record generation suggest that it could be a valuable tool for improving cataloging workflows and increasing efficiency. with further development and refinement, chatgpt has the potential to significantly enhance the capabilities of information professionals and improve the discoverability of library collections.

results

this study provides evidence that chatgpt can generate accurate records that conform to multiple metadata standards. the model can extract essential metadata, including title, author, publisher, publication date, subject headings, and other descriptive elements, with precision. additionally, my research reveals that chatgpt's ability to generate marc records is not limited to specific formats or languages, as it successfully created marc records for various media and materials in different languages, such as english, german, and russian. chatgpt was able to generate both accurate existing authority records and entirely original ones, and it could generate records using both rda and dublin core standards. according to chatgpt, it has been trained on data from various catalogs, including oclc's worldcat, the library of congress, the national library of medicine, the british library, copac (the uk academic and national library catalog), europeana, and the hathitrust digital library (chatgpt, personal communication, march 9, 2023). this poses a unique challenge, as these catalogs may have different policies on access and reuse of their data. for example, oclc's catexpress is a subscription-based automated cataloging system. if chatgpt or a future "catgpt" draws on oclc's data and makes it available for free, it may raise questions about oclc's copyright holdings. additionally, while chatgpt may generate records for materials available on the public internet, such as "pathos y lagrimas," questions remain regarding how to credit the intellectual labor necessary for creating these records.

my comparison of chatgpt-generated marc records against records created manually by professional catalogers had positive results. while the accuracy of the chatgpt-generated records was comparable to that of the manually created records, notable differences existed in how subject access points were assigned. this suggests that chatgpt has the potential to provide new methods for growing the discipline of library cataloging by automating the more rote, labor-intensive, and time-consuming tasks (for example, copy cataloging). in future studies, it may be of interest to the discipline to further test the applications of ai-generated marc records on a catalog-wide scale. while chatgpt has the potential to streamline aspects of the cataloging process, it is not a complete replacement for human catalogers.
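differences in subject access points, like those observed here, are straightforward to surface mechanically. the sketch below compares the 650 fields of two records side by side; it assumes pymarc 5 and two marc files on disk, both of which are illustrative stand-ins rather than artifacts of this study.

# a minimal sketch, assuming pymarc 5 and two single-record marc files
# (file names are hypothetical); prints subject headings unique to each.
from pymarc import MARCReader

def subjects(path):
    # collect the 650 subject headings from the first record in the file
    with open(path, "rb") as fh:
        record = next(MARCReader(fh))
    return {field.format_field() for field in record.get_fields("650")}

generated = subjects("chatgpt_record.mrc")
worldcat = subjects("worldcat_record.mrc")

print("only in the chatgpt record:", sorted(generated - worldcat))
print("only in the worldcat record:", sorted(worldcat - generated))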
the records generated by chatgpt can serve as effective starting points, but they often contain discrepancies when compared to professional catalogers' records. for example, while the placeholder text in fields 001 and 005 can be useful, it may not match the formatting standards used by specific library collections. nonetheless, chatgpt-generated records can be accurate and effective in classifying information that is not specific to any collection, such as call numbers.

bias

while chatgpt shows promise as a tool for generating marc and dublin core-style records, it is also limited by its training data. at present, chatgpt draws on public records (e.g., worldcat). as a result, any records it generates will draw on existing professional catalogers' records. if a record is incomplete or contains bias—even via omission—then chatgpt will reflect those biases in its output. this will necessitate close monitoring of both original records and those which chatgpt has created through virtual copy cataloging. chatgpt's ability to copy and generate records is rooted in its machine learning-based understanding of cataloging and metadata standards. this ai system uses training data from oclc's worldcat to generate records, which means that the quality of the generated records depends on the quality and comprehensiveness of the training data.14 biases or limitations in the training data can result in biased or incomplete records. for example, if the training data is restricted to certain regions, languages, or publishers, the generated records may not reflect the full diversity of a library's collections. similarly, biases in subject headings, descriptors, or other fields in the training data may also manifest in the generated records. while chatgpt itself has no biases, it is possible for biases to be introduced through the training data, which makes it essential for librarians and other information professionals to curate and update the data regularly.

to address these potential biases, information professionals training a large language model should curate the training data carefully and periodically review and update it to ensure it is comprehensive, representative, and unbiased. they may also need to manually review and edit generated records to correct any biases or inaccuracies identified. this approach would provide new opportunities for the profession to highlight diversity, equity, and inclusion in the development and use of ai. while an ai may not have biases of its own, the biases of the people involved in training and applying the ai can affect the generated content. like other machine learning models, chatgpt acquires its biases from external sources, as it can only respond to the data it has been trained on, which may reflect human errors or intentions. therefore, while chatgpt could streamline and improve the record generation process, information professionals should approach its use with awareness of its limitations and potential biases. to ensure the accuracy, comprehensiveness, and fairness of the generated records, information professionals should take proactive measures to mitigate any biases and errors.

discussion

the results of this study have significant implications for library cataloging.
the ability to accurately create descriptive records using chatgpt could significantly reduce the time and resources required for copy cataloging; this could free up library workers to focus on other important tasks, such as collection development, user services, and metadata management. moreover, chatgpt could improve the accuracy and consistency of records in library catalogs. because chatgpt follows established cataloging rules, records created by the model are less likely to contain errors or inconsistencies; this could lead to improved search and discovery experiences for library users, as well as better interoperability between library catalogs and other systems.

the intellectual property concerns surrounding chatgpt's ability to generate content are multifaceted. one concern is the potential for copyright infringement, as chatgpt's detailed descriptions of original works may be too similar to the originals, leading to legal issues for those who use the generated content without proper attribution or permission. this concern is particularly heightened for copyrighted works like books or music, where even small portions of the work can be protected. therefore, it is crucial for chatgpt's output to be thoroughly reviewed and vetted before being used in any public-facing materials.

another concern is the possibility of misattribution of authorship. chatgpt's use of dublin core to describe original works could lead to disputes over ownership and potentially even legal action if it generates a description that attributes authorship to the wrong person or entity. to prevent such conflicts, information professionals should ensure that the metadata generated by chatgpt accurately reflects the authorship and ownership of the original work. this can be done by reviewing and editing chatgpt's output to ensure that the metadata is correct before it is shared publicly.

the ownership of the generated content is also a concern, as it is not clear who owns content created by chatgpt. as a machine learning model, chatgpt generates content based on the data it has been trained on, raising questions about the ownership of the content it produces. establishing clear guidelines for ownership and use of the generated content can help avoid potential disputes over ownership and ensure that appropriate attribution and permissions are obtained; this is particularly important given the potential commercial value of the content that chatgpt can produce. furthermore, it is essential to consider the ethical and legal implications of the generated content, such as data privacy and protection, and to ensure that these concerns are addressed when designing guidelines for ownership and use.

finally, there is the potential for unintentional disclosure of sensitive or confidential information. chatgpt's ability to generate detailed descriptions of original works may inadvertently disclose unpublished findings or proprietary information, potentially causing harm to the author or institution. to mitigate this risk, chatgpt's output must be carefully reviewed and edited to ensure that it does not inadvertently disclose sensitive information. implementing appropriate data security measures and access controls may help prevent unauthorized access to sensitive information.
conclusion

the study demonstrates that chatgpt has the potential to significantly streamline the cataloging process in libraries by generating accurate and consistent records for a diverse range of materials. however, it should be used as an auxiliary tool in conjunction with human cataloging efforts to ensure the highest level of accuracy and impartiality. regular monitoring and evaluation of the model are necessary to detect any potential biases or limitations in the training data. by applying a careful and considered approach to its use, librarians and other information professionals can leverage chatgpt to enhance the efficiency and effectiveness of cataloging processes, ultimately benefiting library and information center patrons.

the accurate and comprehensive marc records produced by chatgpt highlight its potential to enhance the effectiveness of library cataloging systems. by extracting metadata information such as author, publisher, subject headings, title, and other descriptive components with high precision, the technology can improve the search and discovery experience for library users. as with any machine learning model, though, there is a risk of bias that needs to be considered when utilizing chatgpt. therefore, it is crucial to monitor and evaluate the training data used to build the model, to ensure that it is extensive, impartial, and representative.

the use of dublin core to describe original works is another factor that should be taken into consideration when using chatgpt for cataloging. the model's potential to generate accurate metadata records is reliant on the quality of the input data, including the use of standardized vocabularies like dublin core. the use of any metadata standard, such as dublin core, may introduce biases and errors that could affect the quality of chatgpt-generated records. therefore, information professionals should ensure that the metadata standards used by chatgpt are appropriate for the materials being cataloged and that the resulting records are regularly reviewed and evaluated for accuracy and impartiality.

chatgpt's accurate and comprehensive metadata records have the potential to enhance the effectiveness of library cataloging systems and improve the search and discovery experience for library users. however, as with any machine learning model, there is a risk of bias that needs to be considered when utilizing chatgpt. to ensure error-free results, librarians and other information professionals should approach chatgpt's application systematically, by monitoring and evaluating the training data used to develop the model's capabilities and by regularly curating and updating those data. additionally, periodic inspection and amendment of the generated records may be necessary to avoid inaccuracies and discrepancies arising from biases in the training data. by taking a careful and deliberate approach to chatgpt's use, we can ensure that the resulting metadata records are as accurate, comprehensive, and unbiased as possible, while also being mindful of the intellectual property considerations inherent in using a machine learning model to generate content.

tables
table 1. chatgpt-generated marc record for the 1996 reprint of interview with the vampire by anne rice

ldr 01110cam a2200325 a 4500
001 2740930
005 20040429112254.0
008 960304s1996 nyu 000 1 eng d
010 96011406
020 0345337662 (pbk.)
040 dlc $c dlc $d dlc
100 1 rice, anne.
245 1 0 interview with the vampire / $c by anne rice.
260 new york : $b ballantine books, $c 1996.
300 372 p. ; $c 18 cm.
336 text $b txt $2 rdacontent
337 unmediated $b n $2 rdamedia
338 volume $b nc $2 rdacarrier
650 0 vampires $v fiction.
650 0 horror tales.
650 0 vampires $v fiction.
650 0 horror tales.

table 2. worldcat marc record for the 1996 reprint of interview with the vampire by anne rice

ldr c a22 0 4500
001 1052676753
005 20220606044740.3
008 960903r19761996nyua b 001 0 eng
010 $z 75036792
020 9780394498218 (hardcover) ; 0394498216 (hardcover)
040 ntg $b eng $c ntg $d oclco $d oclcf $d oclcq $d oclco $d oclca
049 crcb
050 4 ps3568.i265 $b i5 1996
082 1 4 [e]
082 0 4 fic rice
090 $b
100 1 rice, anne, $d 1941-2021.
245 1 0 interview with the vampire / $c by anne rice.
250 anniversary ed.
260 new york : $b knopf, $c 1996.
300 340 pages ; $c 22 cm
336 text $b txt $2 rdacontent
337 unmediated $b n $2 rdamedia
338 volume $b nc $2 rdacarrier
500 "published may 5, 1976 ... reprinted eleven times"--title...
520 0 in a remote room in a large city, a young reporter sits fa...
600 0 lestat $c (vampire), $d 1760$v fiction.
650 0 vampires $v fiction.
650 0 imaginary interviews.
650 6 vampires $v romans, nouvelles, etc.
650 6 entretiens imaginaires.
600 0 lestat $c (vampire), $d 1760$2 fast $0 (ocolc)fst00996665
650 7 imaginary interviews. $2 fast $0 (ocolc)fst00967566
650 7 vampires. $2 fast $0 (ocolc)fst01163968
655 7 fiction. $2 fast $0 (ocolc)fst01423787
655 7 horror fiction. $2 fast $0 (ocolc)fst01921684
655 7 horror fiction. $2 lcgft

table 3. chatgpt-generated marc record for low by david bowie (vinyl)

ldr 01523njm a2200373 i 4500
001 ocn1029184931
005 20220130152943.0
007 sd fsngnn|||eu
008 180430s2018 xx umm n eng d
024 1 00602557675302
028 2 lcrhlp14
028 3 lcrhcd14
040 ##$adlc$beng$cdlc$doclco$docolc
049 ##$aocoz
100 1 bowie, david,$eperformer.
245 10 low$h[sound recording] /$cdavid bowie.
264 31 [place of publication not identified] :$b[label not identified],$c[2018]
300 ##$a1 sound disc :$banalog, 33 1/3 rpm ;$c12 in.
336 ##$anotated music$bntm$2rdacontent
337 ##$audio$sanc$2rdamedia
338 ##$aaudio disc$bac$2rdacarrier
500 ##$aoriginally released in 1977.
500 ##$atitle from container.
500 ##$alyrics and full credits on container insert.
511 0 david bowie, vocals, saxophones, guitars, piano, and chamberlin ; brian eno, synthesizer, guitar treatments, and piano ; carlos alomar, rhythm guitar ; george murray, bass guitar ; dennis davis, drums and percussion.
518 ##$arecorded august-september 1976 at the château d'hérouville, hérouville, france.
650 _0 $arock music$y1971-1980.
650 _0 $arock music$y2011-2020.
650 _0 $arock music$zengland.
650 _0 $avinyl lps.
700 1 eno, brian,$eperformer.
700 1 alomar, carlos,$eperformer.
700 1 murray, george,$eperformer.
700 1 davis, dennis,$eperformer.
730 0 low.
902 ##$amusic$new.
table 4. worldcat marc record for low by david bowie (vinyl)

ldr 02213cjm a2200529 i 4500
005 20220317111101.1
008 180321t20182017enkrcna zxx d
040 ## $asfr $beng $erda $csfr $doclcf $doclca $doclcq $doclco
024 3 $a0190295842918
028 02 $a0190295842918 $bparlophone
049 ## $acrcb
100 1 $abowie, david, $ecomposer, performer, producer.
245 10 $alow / $cdavid bowie.
264 #1 $a[london] : $bparlophone records, $c[2018].
264 #4 $c℗2017
300 ## $a1 audio disc ; $c12 in.
336 ## $aperformed music $bprm $2rdacontent
337 ## $aaudio $bs $2rdamedia
338 ## $aaudio disc $bsd $2rdacarrier
340 ## $avinyl
344 ## $aanalog $2rdatr
344 ## $c33 1/3 rpm
344 ## $dmicrogroove
344 ## $gstereo $2rdacpc
500 ## $areissue, remastered, originally released in 1977.
508 ## $aproduced by david bowie and tony visconti.
511 0 $adavid bowie ; with accompanying musicians.
505 0 $tspeed of life -$tbreaking glass -$twhat in the world -$tsound and vision -$talways crashing in the same car -$tbe my wife -$ta new career in a new town -$twarszawa -$tart decade -$tweeping wall -$tsubterraneans.
538 ## $avinyl disc, double-sided; turntable required.
650 0 $arock music $y1971-1980.
650 0 $apopular music $y1971-1980.
650 0 $aprogressive rock music.
650 6 $arock (musique) $y1971-1980.
650 6 $amusique populaire $y1971-1980.
650 6 $arock progressif.
650 7 $apopular music. $2fast $0(ocolc)fst01071422
650 7 $aprogressive rock music. $2fast $0(ocolc)fst01078743
650 7 $arock music. $2fast $0(ocolc)fst01099204
648 7 $a1971-1980 $2fast
655 7 $apopular music. $2fast $0(ocolc)fst01726706
655 7 $aprogressive rock (music) $2fast $0(ocolc)fst01726709
655 7 $apopular music. $2lcgft
655 7 $aprogressive rock (music) $2lcgft
655 7 $amusique populaire. $2rvmgf
655 7 $arock progressif. $2rvmgf
700 1 $avisconti, tony, $eproducer.

table 5. chatgpt-generated marc record for the 1971 german edition of pedagogy of the oppressed by paolo freire

ldr 00000nam a2200000 4500
001 ocm04045077
003 ocolc
005 2023-02-23
008 711215s1971 gw 000 0 ger
020 $a9783801200059
020 $a3801200054
035 $a(ocolc)40053172
040 $adlc$cdlc$docolc$doclcf
041 1 $ager$hpor
042 $apcc
043 $ae-gx--
050 00 $alb875.p442
100 1 $afreire, paulo.
240 10 $apedagogy of the oppressed.$lgerman
245 10 $apädagogik der unterdrückten /$cpaulo freire ; aus dem portugiesischen von walter jens.
260 $astuttgart :$bklett-cotta verlag,$c1971.
300 $a218 pages ;$c21 cm
336 $atext$btxt$2rdacontent
337 $aunmediated$bn$2rdamedia
338 $avolume$bnc$2rdacarrier
546 $ain german.
650 0 $aeducation$xphilosophy.
650 0 $apopular education$zbrazil.
650 0 $asocial change.
651 0 $abrazil$xsocial conditions.
700 1 $ajens, walter,$d1923-2013,$etranslator.

table 6. worldcat marc record for the 1971 german edition of pedagogy of the oppressed by paolo freire

ldr cam mi
001 63439859
005 20220106164408.5
008 830119s1971 gw 000 0 ger d
040 nlggc $b dut $e fobidrtb $c nlggc $d hebis $d debbg $d oclcq $d oclco
015 72,a06,0123 $2 dnb
016 7 720015707 $2 de-101
020 3783103746
020 9783783103748
041 1 1 ger $h por
084 5,3 $2 ssgn
084 bu 1410 $2 rvk
084 dd 6455 $2 rvk
084 dt 1000 $2 rvk
090 $b
049 crcb
100 1 freire, paulo, $d 1921
245 10 pädagogik der unterdrückten / $c paulo freire ; mit einer einf. von ernst lange ; [autor. übertr. aus dem portugesischen].
260 stuttgart : $b kreuz-verlag, $c 1971.
300 218 p. ; $c 21 cm
650 04 onderdrukking.
650 04 pedagogiek.
650 7 bildungspolitik. $2 fes
650 7 bildungsziel. $2 fes
650 7 erziehung. $2 fes
650 7 pädagogik $2 gnd
650 7 unterdrückung $2 gnd
650 7 alphabetisierung $2 gnd
650 7 politische bildungsarbeit $2 gnd
650 7 emanzipatorische erziehung $2 gnd
651 7 brasilien $2 gnd
650 17 pedagogiek. $2 gtt
650 17 onderdrukking. $2 gtt
655 4 unterdrückte.
655 7 unterdrückte. $2 swd
029 0 nlggc $b 830230688
029 1 debbg $b bv005751307
029 1 debsz $b 003425525
029 1 hebis $b 045711100

table 7. chatgpt-generated marc record for the 2018 print edition in russian of the three body problem by cixin liu

ldr cam i
001 1037948853
005 20220402070218.7
008 180529s2018 ru 000 1 rus d
040 cle $b eng $e rda $c cle $d oclcf $d ocl $d oclcq $d oclco
066 $c (n
020 9785040891122
020 5040891121
041 1 $h chi $a rus
043 a-cc--
090 $b
049 crcb
100 1 $a liu, cixin, $e author.
240 10 $a san ti. $l russian
245 10 $a задача трех тел / $c лю цысинь.
245 10 $a zadacha trekh tel / $c li︠u ︡ t︠s ︡ysinʹ.
264 1 $a москва : $b э, $c 2018.
264 1 $a moskva : $b ė, $c 2018.
300 $a 462 pages ; $c 22 cm
336 $a text $b txt $2 rdacontent
337 $a unmediated $b n $2 rdamedia
338 $a volume $b nc $2 rdacarrier
490 0 $a sci-fi universe
650 0 human-alien encounters, fiction
650 0 imaginary wars and battles, fiction
651 0 china -history -cultural revolution, 1966-1976 -fiction
650 6 rencontres avec les extraterrestres, romans, nouvelles, etc.
650 6 guerres et batailles imaginaires, romans, nouvelles, etc.
651 6 chine -histoire -1966-1976 (révolution culturelle) -romans, nouvelles, etc.
650 7 human-alien encounters. $2 fast $0 (ocolc)fst00963475
650 7 imaginary wars and battles. $2 fast $0 (ocolc)fst00967580
651 7 china. $2 fast $0 (ocolc)fst01206073
647 7 cultural revolution (china : 1966-1976) $2 fast $0 (ocolc)fst01352662
648 7 1966-1976 $2 fast
655 7 fiction. $2 fast $0 (ocolc)fst01423787
655 7 history. $2 fast $0 (ocolc)fst01411628
655 7 science fiction. $2 fast $0 (ocolc)fst01726489
655 7 science fiction. $2 lcgft
700 1 translation of: liu, cixin. san ti.

table 8. worldcat record for the 2018 print edition in russian of the three body problem by cixin liu

ldr cam i
001 1037948853
005 20220402070218.7
008 180529s2018 ru 000 1 rus d
040 cle $b eng $e rda $c cle $d oclcf $d ocl $d oclcq $d oclco
066 $c (n
020 9785040891122
020 5040891121
041 1 $h chi $a rus
043 a-cc--
090 $b
049 crcb
100 1 $a liu, cixin, $e author.
240 10 $a san ti. $l russian
245 10 $a задача трех тел / $c лю цысинь.
245 10 $a zadacha trekh tel / $c li︠u ︡ t︠s ︡ysinʹ.
264 1 $a москва : $b э, $c 2018.
264 1 $a moskva : $b ė, $c 2018.
300 $a 462 pages ; $c 22 cm
336 $a text $b txt $2 rdacontent
337 $a unmediated $b n $2 rdamedia
338 $a volume $b nc $2 rdacarrier
490 0 $a sci-fi universe
650 0 human-alien encounters, fiction
650 0 imaginary wars and battles, fiction
651 0 china -history -cultural revolution, 1966-1976 -fiction
650 6 rencontres avec les extraterrestres, romans, nouvelles, etc.
650 6 guerres et batailles imaginaires, romans, nouvelles, etc.
651 6 chine -histoire -1966-1976 (révolution culturelle) -romans, nouvelles, etc.
650 7 human-alien encounters. $2 fast $0 (ocolc)fst00963475
650 7 imaginary wars and battles. $2 fast $0 (ocolc)fst00967580
651 7 china. $2 fast $0 (ocolc)fst01206073
647 7 cultural revolution (china : 1966-1976) $2 fast $0 (ocolc)fst01352662
648 7 1966-1976 $2 fast
655 7 fiction. $2 fast $0 (ocolc)fst01423787
655 7 history. $2 fast $0 (ocolc)fst01411628
655 7 science fiction. $2 fast $0 (ocolc)fst01726489
655 7 science fiction. $2 lcgft
700 1 translation of: liu, cixin. san ti.

table 9. chatgpt's original marc record for "pathos y lagrimas" by mood rings

ldr 02461cjm a2200517 a 4500
001 123456789
005 20181025171247.0
007 sd fungnnmmned
008 181025s2013 nyurcn n zxx d
024 8 767870622805
035 (ocolc)851415773
040 dlc $b eng $c dlc $e rda $d dlc
041 0 $g spa
043 n-mx--
047 vd $a pp
048 ka01
049 dlc
050 00 m1630.18 $b .m66 2013
082 00 782.421660972
100 1 mood rings $4 prf
245 10 pathos y lagrimas / $c mood rings.
246 3 333
264 1 [brooklyn, ny] : $b mexican summer, $c [2013]
300 1 audio disc : $b analog, 33 1/3 rpm ; $c 7 in.
336 performed music $b prm $2 rdacontent
337 audio $b s $2 rdamedia
338 audio disc $b sd $2 rdacarrier
500 title from label.
500 b-side label has number "333."
511 0 mood rings.
500 "limited edition" "500 copies."
650 0 rock music $z mexico $y 2011-2020.
655 7 rock music. $2 lcgft
655 7 popular music. $2 lcgft
700 1 mood rings. $4 prf
710 2 mexican summer (firm) $4 pbl.

table 10. chatgpt's original record using the dublin core for "pathos y lagrimas" by mood rings

title pathos y lagrimas / mood rings
creator mood rings
contributor mexican summer
date 2013
type sound
format audio/vinyl
identifier 333
language eng
relation mexican summer
source mexican summer
coverage atlanta, ga
rights all rights reserved

endnotes

1 viriya taecharungroj, "what can chatgpt do? analyzing early reactions to the innovative ai chatbot on twitter," big data and cognitive computing 7, no. 1: 35, https://doi.org/10.3390/bdcc7010035. gpt (generative pretrained transformer) technologies are structured in a way that involves training language models on a large body of data. fine-tuning processes are then applied to enhance the model's performance on specific tasks and domains.

2 roberto gozalo-brizuela and eduardo c. garrido-merchan, "chatgpt is not all you need. a state of the art review of large generative ai models," arxiv:2301.04655v1 [cs.lg]: 15, https://doi.org/10.48550/arxiv.2301.04655.

3 taecharungroj, "what can chatgpt do?"

4 enkelejda kasneci et al., "chatgpt for good? on opportunities and challenges of large language models for education," edarxiv (january 30, 2023), https://doi.org/10.35542/osf.io/5er8f.

5 kasneci et al., "chatgpt for good?"

6 brady d. lund and ting wang, "chatting about chatgpt: how may ai and gpt impact academia and libraries?" library hi tech news 40, no. 3 (2023): 26–29, https://doi.org/10.1108/lhtn-01-2023-0009.
endnotes

1 viriya taecharungroj, “what can chatgpt do? analyzing early reactions to the innovative ai chatbot on twitter,” big data and cognitive computing 7, no. 1: 35, https://doi.org/10.3390/bdcc7010035. gpt (generative pretrained transformer) technologies are structured in a way that involves training language models on a large body of data. fine-tuning processes are then applied to enhance the model’s performance on specific tasks and domains.

2 roberto gozalo-brizuela and eduardo c. garrido-merchan, “chatgpt is not all you need. a state of the art review of large generative ai models,” arxiv:2301.04655v1 [cs.lg]: 15, https://doi.org/10.48550/arxiv.2301.04655.

3 taecharungroj, “what can chatgpt do?”

4 enkelejda kasneci et al., “chatgpt for good? on opportunities and challenges of large language models for education,” edarxiv (january 30, 2023), https://doi.org/10.35542/osf.io/5er8f.

5 kasneci et al., “chatgpt for good?”

6 brady d. lund and ting wang, “chatting about chatgpt: how may ai and gpt impact academia and libraries?” library hi tech news 40 (2023), no. 3: 26–29, https://doi.org/10.1108/lhtn-01-2023-0009.

7 stephen atlas, “chatgpt for higher education and professional development: a guide to conversational ai,” (2023): 106–7, https://digitalcommons.uri.edu/cba_facpubs/548/.

8 ali borji, “a categorical archive of chatgpt failures,” arxiv:2302.03494 [cs.cl]: 11, https://doi.org/10.48550/arxiv.2302.03494.

9 anne rice, interview with the vampire (new york: alfred a. knopf, 1996).

10 david bowie, low, recorded september–october 1976, rca victor, 1977, vinyl lp.

11 paulo freire, pädagogik der unterdrückten (stuttgart: kreuz-verlag, 1971).

12 cixin liu, задача трех тел (moscow: sci-fi universe, 2018).

13 mood rings, “pathos y lagrimas,” recorded ca. 2013, mexican summer, 2013, vinyl single.

14 tom b. brown et al., “language models are few-shot learners,” arxiv:2005.14165v4 [cs.cl]: 8–9, https://doi.org/10.48550/arxiv.2005.14165.

editorial: reflections on what we mean by “forever”

marc truitt

marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

what do we mean when we tell people that we want or intend to preserve content or an object “forever”?

a couple of weeks ago, i attended the fall meeting of the preservation and archiving special interest group (pasig) in san francisco. the group, generously sponsored by sun microsystems, is the brainchild of art pasquinelli of sun and michael keller of stanford.

first, a confession on my part. since the university of alberta (ua) was one of the founding members of pasig, i had occasion to attend the first several pasig meetings. in the beginning, there were just a handful of—perhaps fewer than ten—institutions represented. it seemed at the first couple of meetings, when the group was still finding its direction, that the content was slim, repetitious, and overly focused on sun’s own solutions in the digital preservation and archiving (dpa) arena. since we had other attendees ably representing ua, i stayed away from the following several meetings.

well, pasig has grown up. the attendee list for this meeting boasted nearly two hundred persons representing more than thirty institutions. among the attendees were many of the leading lights in dpa and the profession generally. institutions represented included several north american and european national libraries, as well as arls, memory institutions, and a host of companies and consultants offering a range of dpa solutions. yes, pasig has arrived, and we have art, mike, and sun to thank for this.

if i have one real remaining complaint about pasig, it’s that the group is still overly focused on sun’s solutions. true, other vendors such as ex libris and vtls attended, but their solutions don’t compete; rather, they build on sun’s offerings. and while microsoft also was in attendance for the first time, its presentation focused not so much on dpa solutions—it has none—as on a raft of interesting and useful plug-ins whose purpose is to facilitate preservation of content created in microsoft products such as word, excel, powerpoint, etc. other large vendors of dpa solutions—think ibm, for one—remain conspicuously absent.
it’s time for sun to do the “right thing” and “open source” pasig. if sun wishes to continue to sponsor pasig by lending administrative and organizational expertise, that would be great. indeed, a leading but not controlling role in pasig would be entirely consistent with the company’s new focus on support of open-source efforts such as mysql, openoffice, and opensolaris.

so, what about the title of this editorial? when we talk of digital preservation, just how long are we thinking of preserving an object? ask any twenty specialists in dpa, and chances are that you’ll get at least ten different answers. for some, the timeframe can be as short as five to twenty years. for others, it’s fifty or perhaps one hundred years. at pasig, at least one presenter described an organizational business model that envisions preserving content for five hundred years. and there are even some in our profession who glibly use what one might call “the dpa f-word,” although fortunately none of them seemed to be in attendance at this fall’s pasig.

what does this mean in a very practical, nuts-and-bolts it sense? chris wood of sun gave a presentation at the 2008 pasig spring meeting in which he estimated that the cost to supply power and cooling alone to maintain a petabyte (1,000 tb) of disk-based digital content for a mere ten years would easily exceed $1 million.1 refining his figures downward somewhat, wood noted a few months later at the following pasig meeting that for a 1 tb drive, the five-year estimated power and cooling cost for 2008–12 could be estimated at approximately $320, or $640,000 per petabyte over ten years, still a considerable sum.2 add to this the costs of migration—consider that a modern spinning disk is generally thought to have a useful lifespan of about five years, while tape may last two or three decades—and the need for regular integrity-checking of digital content for “bit-rot,” and you have the stuff of a sustainability nightmare.
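wood’s two figures are easy to reconcile with a quick back-of-the-envelope check; the only assumption in this sketch is the editorial’s own 1,000 tb = 1 pb convention:

```python
# check wood's revised estimate: $320 per 1 tb drive for five years of
# power and cooling, scaled to a petabyte over ten years.
cost_per_tb_5yr = 320   # usd, 2008-12 estimate for one 1 tb drive
tb_per_pb = 1_000       # the editorial's petabyte convention
years = 10

cost_per_pb = cost_per_tb_5yr * tb_per_pb * (years / 5)
print(f"${cost_per_pb:,.0f} per petabyte over {years} years")  # $640,000
```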
these challenges don’t even include the messy question of preserving an object so that it is usable in a century or five. while we probably will be able to read word and excel files for the foreseeable future, there are already countless files created with now-defunct pc applications of the 1980s and 1990s; many are stored on all kinds of obsolete media and today are skating on the edge of inaccessibility. already we are seeing concern expressed at institutions with significant digital library and digitization commitments that curating, migrating, and ensuring the integrity and usability of growing petabytes of content over centuries may be unsustainable in both dollars and staff.3 can we even imagine the possible maintenance burden for our descendants, say, 250 or 500 years from now? in 2006, alexander stille observed that “one of the great ironies of the information age is that, while the late twentieth century will undoubtedly have recorded more data than any other period in history, it will also almost certainly have lost more information than any previous era.”4

how are we to deal with this? can we meaningfully plan for the preservation of digital content over centuries given our poor track record over just the past few decades? perhaps we’re thinking too big when we speak of “forever.” maybe we need to begin by conceptualizing and implementing on a more manageable scale. or, to adopt a phrase that seemed to become the informal mantra of both this year’s pasig and the immediately preceding ipres meeting, “to get to forever you have to get to five years first.”5

about this issue of ital

a few months ago, while she was still working at the university of nevada las vegas, ital’s longtime managing editor, judith carter, shared with me the program for a discovery mini-conference that had just been held at unlv. the presentations, originally cast as poster sessions, suggested a diverse and fascinating collection of insights deserving of wider attention. i suggested to judith that she and her colleagues had the makings of a great ital theme issue, and i’m pleased that they accepted my invitation to rework the presentations into a form suitable for publication here. i hope that you will find the results of their work interesting—i certainly do. they’ve done a superb job! bravo to judith and the presenters at the unlv discovery mini-conference!

corrigenda

in our september issue, in an article by kathleen carlson, we inadvertently characterized camtasia studio as an open-source product. it is not. camtasia studio is published by techsmith corporation. you can find out more at the product website (http://www.techsmith.com/camtasia.asp). also, in the same article, we provided a url to a flash tutorial titled “how to order an article that asu does not own.” ms. carlson has recently advised us that the tutorial in question is no longer available.

references and notes

1. chris wood, “the billion file problem and other archive issues” (presentation, spring meeting of the sun preservation and archiving special interest group [pasig], san francisco, california, may 28, 2008), http://events-at-sun.com/pasig_spring/presentations/chriswood_massivearchive.pdf (accessed oct. 22, 2009).

2. chris wood, “archive and preservation: emerging storage: technologies & trends” (presentation, fall meeting of pasig, baltimore, maryland, nov. 19, 2008), http://events-at-sun.com/pasig_fall08/presentations/pasig_wood.pdf (accessed oct. 22, 2009).

3. consider, for example, the following extract from a recent posting to the syslib-l electronic discussion list by the head of library systems at the university of north carolina at chapel hill:

i’m exaggerating a little in my subject line, but it’s been less than 4 years since we purchased our first large (5tb) storage array. we now have a raw 65tb online, and 84tb on order—although a considerable chunk of that 84 is going to replace storage that’s going out of warranty/maintenance and is more cost effective to replace (apple xraids, for instance). in the end, though we’ll net out with 100tb or thereabouts by the end of next year. a great deal of this space is going to digitization projects—no surprise there. we have over 20tb now in our “digital archive,” storage i consider dim, if not dark. we need a heck of a lot of space for staging backups, givien [sic] how much we write to tape in a 24-hour period. individual staff aren’t abusing our lack of quotas—it’s really almost all legitimate, project-driven work that’s eating us up.
what’s scarier is that we’re now talking seriously about moving from project-driven work to programmatic work: the latest large photographic archive we acquired is being scanned as part of the acquisition/processing workflow. we’re looking at ways to prioritize the scanning of our manuscript collections. donors increasingly expect to see their gifts online. and we’re not even yet supporting an “institutional repository.”

will owen, “0 to 60 in three years: mass storage management,” online posting, dec. 8, 2008, syslib-l@listserv.indiana.edu, https://listserv.indiana.edu/cgi-bin/wa-iub.exe?a0=syslib-l (account required; accessed oct. 22, 2009).

4. alexander stille, “are we losing our memory? or, the museum of obsolete technology,” lost magazine, no. 3 (feb. 2006), http://www.lostmag.com/issue3/memory.php (accessed oct. 22, 2009). while stille was referring in this quotation to both digital and nondigital materials, his comments are but part of a larger debate positing that the latter half of the twentieth century could well come to be known in the future as a “digital dark age” because of the vast quantity of at-risk digital content, recently estimated by one expert at some 369 exabytes (369 billion gb) worth of data. physorg.com, “‘digital dark age’ may doom some data,” http://www.physorg.com/news144343006.html (accessed oct. 22, 2009).

5. ed summers, “ipres, iipc, pasig roundup/braindump,” online posting, oct. 14, 2009, inkdroid, http://inkdroid.org/journal/2009/10/14/ipres-iipc-pasig-roundupbraindump/ (accessed oct. 22, 2009).

improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides

kate a. pittsley and sara memmott

kate a. pittsley (kpittsle@emich.edu) is an assistant professor and business information librarian and sara memmott (smemmott@emich.edu) is an instructor and emerging technologies librarian at eastern michigan university, ypsilanti, michigan.

information technology and libraries | september 2012

abstract

can the navigation of complex research websites be improved so that users more often find their way without intermediation or instruction? librarians at eastern michigan university discovered both anecdotally and by looking at patterns in usage statistics that some students were not recognizing navigational elements on web-based research guides, and so were not always accessing secondary pages of the guides. in this study, two types of navigation improvements were applied to separate sets of online guides. usage patterns from before and after the changes were analyzed. both sets of experimental guides showed an increase in use of secondary guide pages after the changes were applied, whereas a comparison group with no navigation changes showed no significant change in usage patterns. in this case, both duplicate menu links and improvements to tab design appeared to improve independent student navigation of complex research sites.

introduction

anecdotal evidence led librarians at eastern michigan university (emu) to investigate possible navigation issues related to the libguides platform. anecdotal evidence included (1) incidents of emu librarians not immediately recognizing the tab navigation when looking at implementations of the libguides platform on other university sites during the initial purchase evaluation, (2) multiple encounters with students at the reference desk who did not notice the tab navigation, and (3) a specific case involving use of a guide with an online course. the case investigation started with a complaint from a professor that graduate students in her online course were suddenly using far fewer resources than students in the same course during previous semesters.
the students in that semester’s section relied heavily—often solely—on one database, while most students during previous semesters had used multiple research sources. this course has always relied on a research guide prepared by the liaison librarian; the selection of resources provided had not changed significantly between the semesters, and the assignment had not changed. furthermore, the same professor taught the course and did not alter her recommendation to the students to use the resources on the research guide. what had changed between the semesters was the platform used to present research guides. the library had just migrated from a simple one-page format for research guides to the more flexible multipage format offered by the libguides platform. only a few resources were listed on the first libguides page of the guide used for the course. only one of these resources was a subscription database, and that database was the one that current students were using to the exclusion of many other useful sources. after speaking with the professor, the liaison librarian also worked one-on-one with a student in the course. the student confirmed that she had not noticed the tab navigation and so was unaware of the numerous resources offered on subsequent pages. the professor then sent a message to all students in the course explaining the tab navigation. subsequently the professor reported that students in the course used a much wider range of sources in assignments.

statistical evidence of the problem

a look at statistics on guide use for fall 2010 showed that on almost all guides the first page was the most heavily used. as the usual entry point, the first page would naturally receive the most use; however, on many multipage guides, the difference in use between the first page and all secondary pages was dramatic. that users missed the tab navigation and so did not realize additional guide pages existed seemed like a possible explanation for this usage pattern. librarians felt strongly that most users should be able to navigate guides without direct instruction in their use, and they were concerned by the evidence that indicated problems with the guide navigation. was there something that could be done to improve independent student navigation in libguides? two types of design changes to navigation were considered. to test the changes, each navigation change was applied to separate sets of guides. usage patterns were then compared for those guides before and after changes were made. the investigators also looked at usage patterns over the same period for a comparison group to which no navigation changes had been made.

literature review

navigation in libguides and pathfinders

the authors reviewed numerous articles related to libguides or pathfinders generally, but found few that mention navigation issues. they then turned to studies of website navigation in general.
in an early article on the transition to web-based library guides, cooper noted that “computer screens do not allow viewers to visualize as much information simultaneously as do print guides, and consequently the need for uncomplicated, easily understood design is even greater.”1 four university libraries’ usability studies of the libguides platform specifically address navigation issues. university of michigan librarians dubicki et al. found that “tabs are recognizable and meaningful—users understood the function of the tabs.”2 the michigan study then focused on the use of meaningful language for tab labels. however, at the la trobe university library (australia), corbin and karasmanis found a consistent pattern of students not recognizing the navigation tabs, and so recommended providing additional navigation links elsewhere on the page.3 at the university of washington, hungerford et al. found students did not immediately recognize the tab navigation:

during testing it was observed that users frequently did not notice a guide’s tabs right away as a navigational option. users’ eyes were drawn to the top middle of the page first and would focus on content there, especially if there was actionable content, such as links to other pages or resources.4

the solution at the university of washington was to require that all guides have a main page navigation area (libguides “box”) with a menu of links to the tabbed pages. after a usability study, mit libraries also recommended use of a duplicate navigation menu on the first page, stating in mit libraries staff guidelines for creating libguides to “make sure to link to the tabs somewhere on the main page” as “users don’t always see the tabs, so providing alternate navigation helps.”5

navigation

palmer mentions navigation as one of the factors most significantly associated with website success as measured by user satisfaction, likelihood to use a site again, and use frequency.6 however, effective navigation may be difficult to achieve. nielsen found in numerous studies that “users look straight at the content and ignore the navigation areas when they scan a new page.”7 in a presentation on the top ten mistakes in web design, human–computer interaction scholar tullis included “awkward or confusing navigation.”8 the following review of the literature on website navigation design is limited to studies of navigation models that use browsing via menus, tabs, and menu bars.

the navigation problem seen in libguides is far from unique. usability studies for other information-rich websites demonstrate similar problems with users not recognizing navigation tabs or menu bars similar to those used in libguides. in 2001, mcgillis and toms investigated the usability of a library website with a horizontal navigation bar at the top of the page, a design similar to the single row of libguides tabs. this study found that users either did not see the navigation bar or did not realize it could be clicked.9 in multiple usability studies, u.s. census bureau researchers found similar problems with navigation bars on government websites. in 2009, olmsted-hawala et al. reported that study participants did not use the top-navigation bar on the census bureau’s business and industry website.10 the next year, chen et al.
again reported problems with top-navigation bar use on the governments division public website, explaining that the “top-navigation bar blends into the header, leading participants to skip over the tabs and move directly to the main content. this is a recurring issue the usability laboratory has identified with many web sites.”11

one possible explanation for user neglect of tabs and navigation bars may be a phenomenon termed “banner blindness.” as early as 1999, benway provided in-depth analysis of this problem. in his thesis, he uses the word “banner” not just for banner ads, but also for banners that consist of horizontal graphic buttons similar to the libguides tab design. benway’s experiments show that an attempt to make important items visually prominent may have the opposite effect—that “the visual distinctiveness may actually make important items seem unimportant.” benway follows with two recommendations: (1) that “any method that is created to make something stand out should be carefully tested with users who are specifically looking for that content to ensure that it does not cause banner blindness,” and (2) that “any item visually distinguished on a page should be duplicated within a collection of links or other navigation areas of the page. that way, if searchers ignore the large salient item, they can still find what they need through basic navigation.”12

in 2005, tullis cited multiple studies that showed that users found information faster or more effectively by using a simple table of contents than by using other navigation forms, including tab-based navigation.13 yet in 2011, nicolson et al. found that “participants rarely used table of contents; and often appeared not to notice them.”14 yelinek et al. pointed to a practical problem in using content menus on libguides pages: since libguides pages can be copied or mirrored on other guides, guide authors must be cognizant that such menus could cause problems with incorrect or confusing navigational links on copied or mirrored pages.15

success can also depend on the location of navigational elements, although researchers disagree on effects of location. in addition, user expectations of where to look for navigation elements may change over time along with changes in web conventions. in 2001, bernard studied user expectations as to where common web functions would be located on the screen layout. he found that “most participants expected the links to web pages within a website to be almost exclusively located in the upper-left side of a web page, which conforms to the current convention of placing links on [the] left side.”16 in 2004, pratt et al. found that users were equally effective using horizontal or vertical navigation menus, but when given a choice more users chose to use vertical navigation.17 also in 2004, mccarthy et al.
performed an eye-tracking study, which showed faster search times when sites conformed to the expected left navigation menu and a user bias toward searching the middle of the screen; but it also found that the initial effect of menu position diminished with repeated use of a site.18 nonetheless, jones found that by 2006 most corporate webpages used “horizontally aligned primary navigation using buttons, tabs, or other formatted text.”19 in 2008, cooke found that users looked equally at left, top, and center menus; however, when “a visually prominent navigation menu populated the center of the web page, participants were more likely to direct their search in this location.”20

wroblewski describes how tab navigation was first popularized by amazon.21 burrell and sodan investigated user preferences for six navigation styles and found that users clearly preferred tab navigation “because it is most easily understood and learned.”22 in the often-cited web design manual don’t make me think, krug also recommends tabs: “tabs are one of the very few cases where using a physical metaphor in a user interface actually works.”23 krug recommends that tabs be carefully designed to resemble file folder tabs. they should “create the visual illusion that the active tab is in front of the other tabs . . . the active tab needs to be a different color or contrasting shade [than the other tabs] and it has to physically connect with the space below it. this is what makes the active tab ‘pop’ to the front.”24

an often-cited u.s. department of health and human services manual on research-based web design addresses principles of good tab design, stating that tabs should be located near the top of the page and should “look like clickable versions of real-world tabs. real-world tabs are those that resemble the ones found in a file drawer.”25 nielsen provides similar guidelines for tab design, which include that the selected tab should be highlighted, the current tab should be connected to the content area (just like a physical tab), and that one should use only one row of tabs.26 more recently, cronin highlighted examples of good tab design that effectively use elements such as rounded tab corners, space between tabs, and an obvious design for the active tab that visually connects the tab to the area beneath it.27 christie also provides best practices for tab design that include consistent use of only one row of tabs, use of a prominent color for the active tab and a single background color for unselected tabs, changing the font color on the active tab, and use of rounded corners to enhance the file-folder-tab metaphor.28

two articles mention that the complexity of a site can be a factor in navigation success. mccarthy et al. found that search times are significantly affected by site complexity and recommended finding ways to balance the provision of numerous user options with simplifying the site so that users can find their way.29 little specifically suggests reducing the amount of extraneous information on libguides pages in her article, which applies cognitive load theory to use of library research guides.30

in sum, effective navigation is difficult to achieve. however, navigation design can be improved by considering the purpose of the site, user expectations, common conventions, best practices, the possibility that intuitive ideas for design may not perform as expected (e.g., banner blindness), the site’s complexity, and more.
research question and method

could design changes improve independent student use of libguides tab navigation? the literature reviewed above suggested two likely design changes to test: adding additional navigation links in the body of the page and improving the tab design. testing these design changes on selected guides would allow the emu library to assess the impact before implementing changes on all library research guides. for this experiment, each type of navigation change was applied to separate subsets of guides; a subset of similar guides was selected as a comparison group; and usage patterns were analyzed for similar periods before and after changes were made.

navigation design changes were made to fourteen subject guides related to business. the business subject guides were divided into two experimental groups of seven guides. in group a, a table of contents box with navigation links was added to the front page of each guide, and in group b, the navigation tabs were altered in appearance. no navigation changes were made to comparison group c. class-specific guides were excluded from the experiment, as in many cases the business librarian would have instructed students in the use of tabs on class guides. changes were made at the beginning of the winter 2011 semester so that an entire semester’s data could be collected and compared to the previous semester’s usage patterns.

the design for group a was similar to the university of washington implementation of a “what’s in the guide” box on guide homepages that repeated the tab navigation links.31 for guides in group a, a table of contents box was placed on the guide homepages. it contained a simple list of links to the secondary pages of the guides, using the same labels as on the navigation tabs. the table of contents box used a larger font size than other body text and was given an outline color that contrasted with the outline color used on other boxes and matched the navigation tab color, to create visual cues that this box had a different function (navigation) from the other boxes on the page. the table of contents box was placed alongside other content on the guide homepages so users could still see the most relevant resources immediately. figure 1 shows a guide containing a table of contents box.

figure 1. group a guide with content menu box labeled “guide sections”

the design change for group b focused on the navigation tabs. libguides tabs exhibit some of the properties of good tab design, such as allowing for rounded corners and contrasting colors for the selected tabs. other aspects are not ideal, such as the line that separates the active tab from the page body.32 in the emu library’s initial libguides implementation, the option for tabs with rounded corners was used to resemble the design of manila file folders and increase the association with the file-folder metaphor. possibilities for further design adaptation on the experimental guides were somewhat limited because these changes needed to be applied to the tabs of just a selected set of guides. the investigators theorized that increasing the height of the tabs might make them more closely resemble paper file folder tabs. increasing the height would also increase the area of the tabs, and the larger size might also make the tabs more noticeable. this option was simple to implement on the guides in group b by adding html break tags (<br>) to the tab text.
taller tabs also provided more room for text on the tabs. tabs in libguides will expand in width to fit the text label used, and if the tabs on a guide require more space on the page, they will be displayed in multiple rows. multiple rows of tabs are visually confusing and break the tabs metaphor, decreasing their usefulness for navigation.33 the emu library’s best practices for research guides already encouraged limiting tabs to one row. adding height to tabs allowed for clearer text labels on some guides without expanding the tab display beyond a single row. figure 2 shows a guide containing the altered taller tabs.

figure 2. group b guide with tabs redesigned to look more like file folder tabs

while variations in content and usage of library guides did not allow for a true control group, other social science subject guides were selected as a comparison group. social science subject guides were excluded from the comparison group if they had very low guide usage during the fall 2010 semester (fewer than thirty uses), or if they had fewer than three tabs, making them structurally dissimilar to the business guides. this left a group of sixteen comparison guides. no changes were made to the navigation design of these guides during the test period. the business guides—which the authors had permission to experiment with—tend to be longer and have more pages than other guides. on average, the experimental guides had more pages per guide than the comparison guides; guides in groups a and b averaged nine pages per guide, and comparison guides averaged five pages per guide. guides with more pages will tend to have a higher percentage of hits on secondary pages because there are more pages available to users. however, the authors intended to measure the change in usage patterns with each guide measured against itself in different periods, and the number of pages in each guide did not change from semester to semester.

data collection and results

libguides provides monthly usage statistics that include the total hits on each guide and the number of hits on each page of a guide. use of secondary pages of the guides was measured by calculating the proportion of hits to each guide that occurred on secondary pages. data for the fall 2010 semester (september through december 2010) was used to measure usage patterns before navigation changes were made to the experimental guides. data for the winter 2011 semester (january through april 2011) was used to measure usage patterns after navigation changes were made. each would represent a full semester’s use at similar enrollment levels with many of the same courses and assignments. usage patterns for the comparison guides were also examined for these periods.
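as a concrete illustration of this metric, here is a minimal sketch; the page names and hit counts are hypothetical stand-ins for the per-page counts that libguides reports in its monthly statistics:

```python
# share of a guide's hits that fall on pages other than the first (home) page.
# page names and counts below are hypothetical stand-ins for libguides'
# per-page monthly statistics.
page_hits = {
    "home": 412,           # first page, the usual entry point
    "find articles": 96,   # secondary pages
    "find books": 57,
    "company research": 33,
}

total_hits = sum(page_hits.values())
secondary_hits = total_hits - page_hits["home"]
print(f"{secondary_hits / total_hits:.1%} of hits on secondary pages")  # 31.1%
```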
as shown in figures 3 and 4, in both group a and group b, the percentage of hits on secondary pages increased in five guides and decreased in two guides.

figure 3. group a: change in secondary page usage with content menus added for winter 2011

figure 4. group b: change in secondary page usage with new tab design for winter 2011

both groups of experimental guides showed an increase in use of secondary guide pages after the design changes were made. the median usage score was calculated for each group. group a, with the added menu links, showed an increase of 10.3 points in the median percentage of guide hits on secondary pages. group b, with redesigned tabs, showed an increase of 10.4 points in the median percentage of guide hits on secondary pages. within the comparison guides, the proportion of hits on secondary pages did not change significantly from fall 2010 to winter 2011. table 1 shows the median percentage of guide hits on secondary pages before and after navigation design changes.

table 1. median percentage of guide hits on secondary pages

            | group a: menu links added | group b: tabs redesigned | group c: comparison group
fall 2010   | 39.1%                     | 50.5%                    | 37.7%
winter 2011 | 49.4%                     | 60.9%                    | 37.4%

the box plot in figure 5 graphically illustrates the range of the usage of secondary pages in each group of guides and the changes from fall 2010 to winter 2011, showing the minimum, maximum, and median scores, as well as the range of each quartile.

figure 5. distribution of percentage of guide hits on secondary pages. this figure demonstrates the change in usage pattern for groups a and b and the lack of change in usage pattern for comparison group c.

averages for the percentage change in secondary tab use were also computed for the combined experimental groups and the comparison group.

table 2. average change in secondary tab use from fall 2010 to winter 2011, comparing all experimental guides (groups a & b) with all comparison (group c) guides

             | n  | mean    | std. deviation | std. error mean
experimental | 14 | .07871  | .097840        | .026149
comparison   | 16 | -.02550 | .145977        | .036494

when comparing all experimental guides and all comparison guides, the change in use of secondary pages was found to be statistically significant. the average change in use of secondary pages for all experimental guides (groups a and b) was .07871, and the average for all comparison guides (group c) was -.02550. a t test showed that this difference was significant at the p < .05 level (p = .032).
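this result can be reproduced from the summary statistics in table 2 alone. a minimal sketch, assuming the pooled-variance, two-tailed form of the test (an assumption, but one that matches the reported p value):

```python
from scipy.stats import ttest_ind_from_stats

# summary statistics from table 2: change in the share of hits on
# secondary pages, fall 2010 to winter 2011.
t, p = ttest_ind_from_stats(
    mean1=0.07871, std1=0.097840, nobs1=14,   # experimental (groups a and b)
    mean2=-0.02550, std2=0.145977, nobs2=16,  # comparison (group c)
    equal_var=True,                           # pooled-variance t test
)
print(f"t = {t:.2f}, p = {p:.3f}")  # t = 2.26, p = 0.032
```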
study limitations

in some (possibly many) cases, the first page of the guide provides all necessary sources and advice for an assignment. we measured actual use of secondary pages, but were unable to measure recognition of navigation elements where the student did not use the secondary pages because they had no need for additional resources. because it wasn’t possible to control use of the guides during the periods studied, it is possible that factors other than the design changes contributed to the pattern of hits. though subject guides rather than class guides were used to limit the influence of instruction in the use of guides, it wasn’t possible to determine with certainty if other faculty members instructed a significant number of students in the use of particular guides during the periods examined. the comparison group was slightly dissimilar in that its guides had fewer pages than the experimental guides; however, the number of pages on a guide did not correlate with a change in percentage of hits on secondary pages from one semester to the next.

application of findings

when presented with the study results, the full library faculty at emu expressed interest in using both design changes across all library research guides. the change to tab design—which is easiest to implement—has been made to all subject guides. some librarians also chose to add content menus to selected guides. since the complexity of research guides is also a factor in successful navigation,35 a recent libguides enhancement was used to move elements from the header area to the bottom of the guides. the elements moved out of the header included the date of last update, guide url, print option, and rss updates. the investigators hypothesize that the reduced complexity of the header may help users recognize the tab navigation. although convinced that the experimental changes made a difference to independent student navigation in research guides, the authors hope to find further ways to strengthen independent navigation. vendor design changes to enhance the tab metaphor, such as creating a more visible connection between the active tab and page, might also improve navigation.36

conclusion

designing navigation for complex sites, such as library research guides, is likely to be an ongoing challenge. this study suggests that thoughtful design changes can improve navigation. in this case, both duplicate menu links and improvements to tab design improved independent student navigation of complex research sites.

references and notes

1. eric a. cooper, “library guides on the web: traditional tenets and internal issues,” computers in libraries 17, no. 9 (1997): 52.

2. barbara dubicki beaton et al., libguides usability task force guerrilla testing (ann arbor: university of michigan, 2009), http://www.lib.umich.edu/content/libguides-guerilla-testing.

3. jenny corbin and sharon karasmanis, health sciences information literacy modules usability testing report (bundoora, australia: la trobe university library, 2009), http://arrow.latrobe.edu.au:8080/vital/access/handleresolver/1959.9/80852.

4. rachel hungerford, lauren ray, christine tawatao, and jennifer ward, libguides usability testing: customizing a product to work for your users (seattle: university of washington libraries, 2010), 6, http://hdl.handle.net/1773/17101.

5. mit libraries, research guides (libguides) usability results (cambridge, ma: mit libraries, 2008), http://libstaff.mit.edu/usability/2008/libguides-summary.html; mit libraries, guidelines for staff libguides (cambridge, ma: mit libraries, 2011), http://libguides.mit.edu/staff-guidelines.

6. jonathan w. palmer, “web site usability, design, and performance metrics,” information systems research 13, no. 2 (2002): 151–67, doi:10.1287/isre.13.2.151.88.

7. jakob nielsen, “is navigation useful?,” jakob nielsen’s alertbox, http://www.useit.com/alertbox/20000109.html.

8. thomas s. tullis, “web-based presentation of information: the top ten mistakes and why they are mistakes,” in hci international 2005 conference: 11th international conference on human-computer interaction, 22–27 july 2005, caesars palace, las vegas, nevada usa (mahwah, nj: lawrence erlbaum associates, 2005), doi:10.1.1.107.9769.

9. louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62, no. 4 (2001): 355–67, http://crl.acrl.org/content/62/4/355.short.

10. erica olmsted-hawala et al., usability evaluation of the business and industry web site, survey methodology #2009–15 (washington, dc: statistical research division, u.s. census bureau, 2009), http://www.census.gov/srd/papers/pdf/ssm2009–15.pdf.
11. jennifer chen et al., usability evaluation of the governments division public web site, survey methodology #2010–02 (washington, dc: u.s. census bureau, usability laboratory, 2010), 19, http://www.census.gov/srd/papers/pdf/ssm2010-02.pdf.

12. jan panero benway, “banner blindness: what searching users notice and do not notice on the world wide web” (phd diss., rice university, 1999), 75, http://hdl.handle.net/1911/19353.

13. tullis, “web-based presentation of information.”

14. donald j. nicolson et al., “combining concurrent and sequential methods to examine the usability and readability of websites with information about medicines,” journal of mixed methods research 5, no. 1 (2011): 25–51, doi:10.1177/1558689810385694.

15. kathryn yelinek et al., “using libguides for an information literacy tutorial 2.0,” college & research libraries news 71, no. 7 (july 2010): 352–55, http://crln.acrl.org/content/71/7/352.short.

16. michael l. bernard, “developing schemas for the location of common web objects,” proceedings of the human factors and ergonomics society annual meeting 45, no. 15 (october 1, 2001): 1162, doi:10.1177/154193120104501502.

17. jean a. pratt, robert j. mills, and yongseog kim, “the effects of navigational orientation and user experience on user task efficiency and frustration levels,” journal of computer information systems 44, no. 4 (2004): 93–100.

18. john d. mccarthy, m. angela sasse, and jens riegelsberger, “could i have the menu please? an eye tracking study of design conventions,” people and computers 17, no. 1 (2004): 401–14.

19. scott l. jones, “evolution of corporate homepages: 1996 to 2006,” journal of business communication 44, no. 3 (2007): 236–57, doi:10.1177/0021943607301348.

20. lynne cooke, “how do users search web home pages?” technical communication 55, no. 2 (2008): 185.

21. luke wroblewski, “the history of amazon’s tab navigation,” lukew ideation + design, may 7, 2007, http://www.lukew.com/ff/entry.asp?178. after addition of numerous product categories made tabs impractical, amazon now relies on a left-side navigation menu.

22. a. burrell and a. c. sodan, “web interface navigation design: which style of navigation-link menus do users prefer?” in 22nd international conference on data engineering workshops, april 2006. proceedings (washington, dc: ieee computer society, 2006), 42–42, doi:10.1109/icdew.2006.163.

23. steve krug, don’t make me think! a common sense approach to web usability, 2nd ed. (berkeley: new riders, 2006), 79.

24. ibid., 82.
25. u.s. department of health and human services, “navigation,” in research-based web design & usability guidelines (washington, dc: u.s. department of health and human services, 2006), 8, http://www.usability.gov/pdfs/chapter7.pdf.

26. jakob nielsen, “tabs, used right,” jakob nielsen’s alertbox, http://www.useit.com/alertbox/tabs.html.

27. matt cronin, “showcase of well-designed tabbed navigation,” smashing magazine, april 6, 2009, http://www.smashingmagazine.com/2009/04/06/showcase-of-well-designed-tabbed-navigation.

28. alex christie, “usability best practice, part 1—tab navigation,” tamar, january 13, 2010, http://blog.tamar.com/2010/01/usability-best-practice-part-1-tab-navigation.

29. mccarthy, sasse, and riegelsberger, “could i have the menu please?”

30. jennifer j. little, “cognitive load theory and library research guides,” internet reference services quarterly 15, no. 1 (2010): 52–63, doi:10.1080/10875300903530199.

31. hungerford et al., libguides usability testing.

32. christie, “usability best practice”; nielsen, “tabs, used right”; krug, don’t make me think; cronin, “showcase of well-designed tabbed navigation.”

33. christie, “usability best practice”; nielsen, “tabs, used right.”

34. eva d. vaughan, statistics: tools for understanding data in the behavioral sciences (upper saddle river, nj: prentice hall, 1998), 66.

35. mccarthy, sasse, and riegelsberger, “could i have the menu please?”

36. springshare, the libguides vendor, has been amenable to customer feedback and open to suggestions for platform improvements.

library use of web-based research guides

jimmy ghaphery and erin white

jimmy ghaphery (jghapher@vcu.edu) is head, library information systems, and erin white (erwhite@vcu.edu) is web systems librarian, virginia commonwealth university libraries, richmond, va.

information technology and libraries | march 2012

abstract

this paper describes the ways in which libraries are currently implementing and managing web-based research guides (a.k.a. pathfinders, libguides, subject guides, etc.) by examining two sets of data from the spring of 2011. one set of data was compiled by visiting the websites of ninety-nine american university arl libraries and recording the characteristics of each site’s research guides. the other set of data is based on an online survey of librarians about the ways in which their libraries implement and maintain research guides. in conclusion, a discussion follows that includes implications for the library technology community.

selected literature review

while there has been significant research on library research guides, there has not been a recent survey either of the overall landscape or of librarian attitudes and practices. there has been recent work on the efficacy of research guides as well as strategies for their promotion. there is still work to be done on developing a strong return-on-investment metric for research guides, although the same could probably be said for other library technologies including websites, digital collections, and institutional repositories.

subject-based research guides have a long history in libraries that predates the web as a service-delivery mechanism. a literature-review article from 2007 found that research on the subject gained momentum around 1996 with the advent of electronic research guides, and that there was a need for more user-centric testing.1 by the mid-2000s, it was rare to find a library that did not offer research guides through its website.2 the format of guides has certainly shifted over time to database-driven efforts through local library programming and commercial offerings. a number of other articles start to answer some of the questions about usability posed in the 2007 literature review by vileno.
in 2008, grays, del bosque, and costello used virtual focus groups as a test bed for guide evaluation.3 two articles from the august 2010 issue of the journal of library administration contain excellent literature reviews and look toward marketing, assessment, and best practices.4 also in 2010, vileno followed up on the 2007 literature review with usability testing that pointed toward a number of areas in which users experienced difficulties with research guides.5

in terms of cross-library studies, an interesting collaboration in 2008 between cornell and princeton universities found that students, faculty, and librarians perceived value in research guides, but that their qualitative comments and content analysis of the guides themselves indicated a need for more compelling and effective features.6 the work of morris and grimes from 1999 should also be mentioned; the authors surveyed 53 university libraries, finding that it was rare to find a library with formal management policies for their research guides.7

most recently, libguides has emerged as a leader in this arena, offering a popular software-as-a-service (saas) model, and as such is not yet heavily represented in the literature. a multichapter libguides lita guide is pending publication and will cover such topics as implementing and managing libguides, setting standards for training and design, and creating and managing guides.

arl guides landscape

during the week of march 3, 2011, the authors visited the websites of 99 american university arl libraries to determine the prevalence and general characteristics of their subject-based research guides. in general, the visits reinforced the overarching theme within the literature that subject-based research guides are a core component of academic library web services. all 99 libraries offered research guides that were easy to find from the library home page. libguides was very prominent as a platform, in production at 67 of the 99 libraries. among these, it appeared that at least 5 libraries were in the process of migrating from a previous system (either a homegrown, database-driven site or static html pages) to libguides.

in addition to the presence and platform, the authors recorded additional information about the scope and breadth of each site’s research guides. for each site, the presence of course-based research guides was recorded. in some cases the course guides had a separate listing, whereas in others they were intermingled with the subject-based research guides. course guides were found at 75 of the 99 libraries visited. of these, 63 were also libguides sites. it is certainly possible that course guides are being deployed at some of the other libraries but were not immediately visible in visiting the websites, or that course guides may be deployed through a course management system. nonetheless, it appears that the use of libguides encourages the presence of public-facing course guides. qualitatively, there was wide diversity in how course guides were organized and presented, varying from a simple a-to-z listing of all guides to separately curated landing pages specifically organized by discipline. the number of guides was recorded for each libguides site.
it was possible to append “/browse.php?o=a” to the base url to determine how many guides and authors were published at each site. this php extension was the publicly available listing of all guides on each libguides platform. the “/browse.php?o=a” extension no longer publicly reports these statistics; however, findings could be reproduced by manually counting the number of guides and authors on each site. the authors confirmed the validity of this method in the fall of 2011 by revisiting four sites and finding that the numbers derived from manual counting were in line with the previous findings.

of the 63 libguides sites we observed, a total of 14,522 guides were counted from 2,101 authors, for an average of 7 guides per author. on average, each site had 220 guides from 32 authors (median of 179 guides; 29 authors). at the high end of the scale, one site had 713 guides from 46 authors. based on the volume observed, libraries appear to be investing significant time toward the creation, and presumably the maintenance, of this content. in addition to creation and ongoing maintenance, such long lists of topics raise a number of usability issues that libraries will also be wise to keep in mind.8
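a tally like this lends itself to automation. the following is a hypothetical sketch only: the listing url is the one described above (which no longer reports these statistics), and the css selector and site address are placeholders, not real libguides markup, so both would have to be adapted to the actual page structure:

```python
# hypothetical sketch of counting published guides from a libguides site's
# public browse listing. "a.guide-title" and the example url are
# placeholders, and /browse.php?o=a no longer exposes this data.
import requests
from bs4 import BeautifulSoup

def count_guides(base_url: str) -> int:
    html = requests.get(f"{base_url}/browse.php?o=a", timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select("a.guide-title"))  # placeholder selector

print(count_guides("http://libguides.example.edu"))
```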
when asked which statement best described who selected the guides system, 67 respondents (36%) indicated their library research library use of web-based research guides | ghaphery and white 24 guides were “initiated by public services,” followed closely by “more of a library-wide initiative” at 63 responses (34%). in the middle at 34 responses (18%) was “initiated by an informal crossdepartmental group.” only 10 respondents (5%) selected “initiated by systems,” with the top down approach of “initiated by administration” gathering 14 responses (7%). when narrowing the responses to those sites that are using libguides or campus guides, the portrait is not terribly different, with 36% library-wide, 35% public services, 18% informal cross-departmental, 7% administration, and systems trailing at 4%. likewise there was not a strong indication of library systems involvement in maintaining or supporting research guides. sixty-nine responses (37%) indicated “no ongoing involvement” and an additional 35 (19%) indicated “n/a we do not have a systems department.” there were only 21 responses (11%) stating “considerable ongoing involvement,” with the balance of 63 responses (34%) for “some ongoing involvement.” not surprisingly, there was a correlation between the type of research guide and the amount of systems involvement. for sites running a “customized open source system,” “other commercial system,” or “homegrown system,” at least 80% of responses indicated either “considerable” or “some” ongoing systems involvement. in contrast, 37% of sites running libguides or campusguides indicated “considerable” or “some” technical involvement. further, the libguides and campusguides users recorded the highest percentage (43%) of “no ongoing involvement” compared to 37% of all respondents. interestingly, 20% of libguides and campus guides users answered “n/a we do not have a systems department,” which is not significantly higher than all respondents for this question at 19%. the level of interaction between research guides and enterprise library systems was not reported as strong. when asked “which statement best describes the relationship between your web content management system and your research guides?” 112 responses (60%) indicated that “our content management system is independent of our research guides” with an additional 51 responses (27%) indicating that they did not have a content management system (cms). only 12 respondents (6%) said that their cms was integrated with their research guides with a remaining 13 (7%) saying that their cms was used for “both our website and our research guides.” a similar portrait was found in seeking out the relationship between research guides and discovery/federated search tools. when asked “which statement best describes the relationship between your discovery/federated search tool and your research guides?” roughly half of the respondents (96, 51%) did not have a discovery system (“n/a we do not have a discovery tool”). only 12 respondents (6%) selected “we prominently feature our discovery tool on our guides,” whereas more than double that number, 26 (14%), said “we typically do not include our discovery tool on our guides.” fifty four respondents (29%) took the middle path of “our discovery tool is one of many search options we feature on our guides.” in the case of both discovery systems and content management systems, it seems that research guides are typically not deeply integrated. 
when asked “what other type of content do you host on your research guides system?” respondents selected from a list of choices as reflected in table 1.

answer | total | percent | percent among libguides/campusguides users
course pages | 127 | 68% | 74%
“how to” instruction | 123 | 65% | 77%
alphabetical list of all databases | 76 | 40% | 42%
“about the library” information (for example hours, directions, staff directory, events) | 59 | 31% | 35%
digital collections | 34 | 18% | 19%
everything—we use the research guide platform as our website | 16 | 9% | 9%
none of the above | 17 | 9% | 2%

table 1. other types of content hosted on research guides system

these answers reinforce the portrait of integration within the larger library web presence. while the research guides platform is an important part of that presence, significant content is also being managed by libraries through other systems. it is also consistent with the findings from the arl website visits, where course pages were consistently found within the research guides platform. for sites reporting libguides or campusguides as their platform, inclusion of course pages and how-to instruction was even higher, at 74% and 77%, respectively.

another multi-answer question sought to determine what types of policies are being used by libraries for the management of research guides: “which of the following procedures or policies do you have in place for your research guides?” responses are summarized in table 2.

answer | total | percent | percent among libguides/campusguides users
style guides for consistent presentation | 105 | 56% | 58%
maintenance and upkeep of guides | 94 | 50% | 53%
link checking | 87 | 46% | 50%
required elements such as contact information, chat, pictures, etc. | 78 | 41% | 56%
training for guide creators | 73 | 39% | 43%
transfer of guides to another author due to separation or change in duties | 72 | 38% | 41%
defined scope of appropriate content | 43 | 23% | 22%
allowing and/or moderating user tags, comments, ratings | 36 | 19% | 25%
none of the above | 36 | 19% | 19%
controlled vocabulary/tagging system for managing guides | 23 | 12% | 25%

table 2. management policies/procedures for research guides

while nearly one in five libraries reported none of the policies in place at all, the responses indicate that there is effort being applied toward the management of these systems. the highest percentage for any given policy was 56% for “style guides for consistent presentation.” best practices in these areas could be emerging, or many of these policies could be specific to individual library needs. as with the survey question on content, the research-guides platform also has a role, with the libguides and campusguides users reporting much higher rates of policies for “controlled vocabulary/tagging” (25% vs. 12%) and “required elements” (56% vs. 41%). in both of these cases, it is likely that the need for policies arises from the availability of these features and options that may not be present in other systems. based on this supposition, it is somewhat surprising that the libguides and campusguides sites reported the same lack of policy adoption (none of the above; 19%).

the final question in the survey further explored the management posture for research guides by asking a free-text question: “how do you evaluate the success or failure of your research guides?” results were compiled into a spreadsheet.
the authors used inductive coding to find themes and perform a basic data analysis on the responses, including a tally of which evaluation methods were used and how often. one in five institutions (37 respondents, 19.6%) looked only to usage statistics, while seven respondents (4%) indicated that their library had performed usability testing as part of the evaluation. forty-four respondents (23.4%) said they had no evaluation method in place (“ouch! it hurts to write that.”), though many expressed an interest in or plans to begin evaluation. another emerging theme included ten respondents who quantified success in terms of library adoption and ease of use. this included one respondent who had adopted libguides in light of prohibitive it regulations (“we choose libguides because it would not allow us to create class specific research webpages”). several institutions also expressed frustration with the survey instrument because they were in the process of moving from one guides system to another and were not sure how to address many questions. most responses indicated that there are more questions than answers regarding the efficacy of research guides, though the general sentiment toward the idea of guides was positive, with words such as “positive,” “easy,” “like,” and “love” appearing in 16 responses. countering that, 5 respondents indicated that their libraries’ research-guides projects had fallen through.

conclusion

this study confirms previous research that web-based research guides are a common offering, especially in academic libraries. adding to this, we have quantified the adoption of libguides, both through visiting arl websites and through a survey distributed to library listservs. further, this study did not find a consistent management or assessment practice for library research guides. perhaps the most interesting finding from this study is the role of library systems departments with regard to research guides: it appears that many library systems departments are not actively involved in either the initiation or ongoing support of web-based research guides.

what are the implications for the library technology community, and what questions arise for future research? the apparent ascendancy of libguides over local solutions is certainly worth considering and in part demonstrates some comfort within libraries with cloud computing and saas. time will tell how this might spread to other library systems. the popularity of libguides, at its heart a specialized content management system, also calls into question the vitality and adaptability of local content management system implementations in libraries. more generally, does the desire to professionally select and steward information for users on research guides indicate librarian misgivings about the usability of enterprise library systems? how do attitudes toward research guides differ between public services and technical services? hopefully these questions serve as a call for continued technical engagement with library research guides. what shape that engagement may take in the future is an open question, but based on the prevalence and descriptions of current implementations, such consideration by the library technology community is worthwhile.

references

1. luigina vileno, “from paper to electronic, the evolution of pathfinders: a review of the literature,” reference services review 35, no. 3 (2007): 434–51.
2. martin courtois, martha higgins, and aditya kapur, “was this guide helpful? users’ perceptions of subject guides,” reference services review 33, no. 2 (2005): 188–96.

3. lateka j. grays, darcy del bosque, and kristen costello, “building a better m.i.c.e. trap: using virtual focus groups to assess subject guides for distance education students,” journal of library administration 48, no. 3/4 (2008): 431–53.

4. mira foster et al., “marketing research guides: an online experiment with libguides,” journal of library administration 50, no. 5/6 (july/september 2010): 602–16; alisa c. gonzalez and theresa westbrock, “reaching out with libguides: establishing a working set of best practices,” journal of library administration 50, no. 5/6 (july/september 2010): 638–56.

5. luigina vileno, “testing the usability of two online research guides,” partnership: the canadian journal of library and information practice and research 5, no. 2 (2010), http://journal.lib.uoguelph.ca/index.php/perj/article/view/1235 (accessed august 8, 2011).

6. angela horne and steve adams, “do the outcomes justify the buzz? an assessment of libguides at cornell university and princeton university—presentation transcript,” presented at the association of college and research libraries conference, seattle, wa, 2009, http://www.slideshare.net/smadams/do-the-outcomes-justify-the-buzz-an-assessment-of-libguides-at-cornell-university-and-princeton-university (accessed august 8, 2011).

7. sarah morris and marybeth grimes, “a great deal of time and effort: an overview of creating and maintaining internet-based subject guides,” library computing 18, no. 3 (1999): 213–16.

8. mathew miles and scott bergstrom, “classification of library resources by subject on the library website: is there an optimal number of subject labels?” information technology & libraries 28, no. 1 (march 2009): 16–20, http://www.ala.org/lita/ital/files/28/1/miles.pdf (accessed august 8, 2011).

9. association of research libraries, “association of research libraries: member libraries,” http://www.arl.org/arl/membership/members.shtml (accessed october 24, 2011).

appendix. survey

library use of web-based research guides

please complete the survey below. we are researching libraries’ use of web-based research guides. please consider filling out the following survey, or forwarding this survey to the person in your library who would be in the best position to describe your library’s research guides. responses are anonymous. thank you for your help!

jimmy ghaphery, vcu libraries
erin white, vcu libraries

1) what is the name of your organization? __________________________________
note that the name of your organization will only be used to make sure multiple responses from the same organization are not received. any publication of results will not include specific names of organizations.

2) which choice best describes your library?
o arl
o university library
o college library
o community college library
o public library
o school library
o private library
o governmental library
o nonprofit library

3) what type of system best describes your research guides by subject?
o libguides or campusguides
o customized open source system
o other commercial system
o homegrown system
o static html pages

4) which statement best describes the selection of your current research guides system?
o initiated by administration
o initiated by systems
o initiated by public services
o initiated by an informal cross-departmental group
o more of a library-wide initiative

5) how much ongoing involvement does your systems department have with the management of your research guides?
o no ongoing involvement
o some ongoing involvement
o considerable ongoing involvement
o n/a we do not have a systems department

6) what other type of content do you host on your research guides system?
o course pages
o “how to” instruction
o alphabetical list of all databases
o “about the library” information (for example: hours, directions, staff directory, events)
o digital collections
o everything—we use the research guide platform as our website
o none of the above

7) which statement best describes the relationship between your discovery/federated search tool and your research guides?
o we typically do not include our discovery tool on our guides
o our discovery tool is one of many search options we promote on our guides
o we prominently feature our discovery tool on our guides
o n/a we do not have a discovery tool

8) which statement best describes the relationship between your web content management system and your research guides?
o our content management system is independent of our research guides
o our content management system is integrated with our research guides
o our content management system is used for both our website and our research guides
o n/a we do not have a content management system

9) which of the following procedures or policies do you have in place for your research guides?
o defined scope of appropriate content
o required elements such as contact information, chat, pictures, etc.
o style guides for consistent presentation
o allowing and/or moderating user tags, comments, ratings
o training for guide creators
o controlled vocabulary/tagging system for managing guides
o maintenance and upkeep of guides
o link checking
o transfer of guides to another author due to separation or change in duties
o none of the above

10) how do you evaluate the success or failure of your research guides? [free text]

dublin core, dspace, and a brief analysis of three university repositories
mary kurtz
information technology and libraries | march 2010

this paper provides an overview of dublin core (dc) and dspace together with an examination of the institutional repositories of three public research universities. the universities all use dc and dspace to create and manage their repositories. i drew a sampling of records from each repository and examined them for metadata quality using the criteria of completeness, accuracy, and consistency. i also examined the quality of records with reference to the methods of educating repository users. one repository used librarians to oversee the archiving process, while the other two employed two different strategies as part of the self-archiving process.
the librarian-overseen archive had the most complete and accurate records for dspace entries.

mary kurtz (mhkurtz@gmail.com) is a june 2009 graduate of drexel university’s school of information technology. she also holds a bs in secondary education from the university of scranton and an ma in english from the university of illinois at urbana–champaign. currently, kurtz volunteers her time in technical services/cataloging at simms library at albuquerque academy and in corporate archives at lovelace respiratory research institute (www.lrri.org), where she is using dspace to manage a diverse collection of historical photographs and scientific publications.

the last quarter of the twentieth century has seen the birth, evolution, and explosive proliferation of a bewildering variety of new data types and formats. digital text and images, audio and video files, spreadsheets, websites, interactive databases, rss feeds, streaming live video, computer programs, and macros are merely a few examples of the kinds of data that can now be found on the web and elsewhere. these new dataforms do not always conform to conventional cataloging formats. in an attempt to bring some sort of order from chaos, the concept of metadata (literally “data about data”) arose. metadata is, according to ala, “structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities.”1 metadata is an attempt to capture the contextual information surrounding a datum. the enriching contextual information assists the data user to understand how to use the original datum. metadata also attempts to bridge the semantic gap between machine users of data and human users of the same data.

dublin core

dublin core (dc) is a metadata schema that arose from an invitational workshop sponsored by the online computer library center (oclc) in 1995. “dublin” refers to the location of this original meeting in dublin, ohio, and “core” refers to the fact that dc is a set of metadata elements that are basic but expandable. dc draws upon concepts from many disciplines, including librarianship, computer science, and archival preservation. the standards and definitions of the dc element sets have been developed and refined by the dublin core metadata initiative (dcmi) with an eye to interoperability. dcmi maintains a website (http://dublincore.org/documents/dces/) that hosts the current definitions of all the dc elements and their properties.

dc is a set of fifteen basic elements plus three additional elements. all elements are both optional and repeatable. the basic dc elements are: 1. title, 2. creator, 3. subject, 4. description, 5. publisher, 6. contributor, 7. date, 8. type, 9. format, 10. identifier, 11. source, 12. language, 13. relation, 14. coverage, and 15. rights. the additional dc elements are: 16. audience, 17. provenance, and 18. rights holder. dc allows for element refinements (or subfields) that narrow the meaning of an element, making it more specific. the use of these refinements is not required. dc also allows for the addition of nonstandard elements for local use.

dspace

dspace is an open-source software package that provides management tools for digital assets. it is frequently used to create and manage institutional repositories. first released in 2002, dspace is a joint development effort of hewlett packard (hp) labs and the massachusetts institute of technology (mit).
today, dspace’s future is guided by a loose grouping of interested developers called the dspace committers group, whose members currently include hp labs, mit, oclc, the university of cambridge, the university of edinburgh, the australian national university, and texas a&m university. dspace version 1.3 was released in 2005, and the newest version, dspace 1.5, was released in march 2008. more than one thousand institutions around the world use dspace, including public and private colleges and universities and a variety of not-for-profit corporations. dc is at the heart of dspace. although dspace can be customized to a limited extent, the basic and qualified elements of dc and their refinements form dspace’s backbone.2

how dspace works: a contributor’s perspective

dspace is designed for use by “metadata naive” contributors. this is a conscious design choice made by its developers, in keeping with the philosophy of inclusion for institutional repositories: dspace was developed for use by a wide variety of contributors with a wide range of metadata and bibliographic skills. dspace simplifies the metadata markup process by using terminology that is different from dc standards and by automating the production of element fields and xml/html code.

dspace has four hierarchical levels of users: users, contributors, community administrators, and network/systems administrators. the user is a member of the general public who retrieves information from the repository by browsing the database or conducting structured searches for specific information. the contributor is an individual who wishes to add their own work to the database. to become a contributor, one must be approved by a dspace community administrator and receive a password. a contributor may create, upload, and (depending upon the privileges bestowed by the community administrator) edit or remove informational records; their editing and removal privileges are restricted to their own records. a community administrator has oversight within their specialized area of dspace and accordingly has more privileges within the system than a contributor. a community administrator may create, upload, edit, and remove records, but can also edit and remove all records available within the community’s area of the database. additionally, the community administrator has access to some metadata about the repository’s records that is not available to users and contributors and has the power to approve requests to become contributors and grant upload access to the database. lastly, the community administrator sets the rights policy for all materials included in the database and writes the statement of rights that every contributor must agree to with every record upload. the network/systems administrator is not involved with database content, focusing rather on software maintenance and code customization.

when a dspace contributor wishes to create a new record, the software walks them through the process. dspace presents seven screens in sequence that ask for specific information to be entered via check buttons, fill-in textboxes, and sliders. at the end of this process, the contributor must electronically sign an acceptance of the statement of rights. because dspace’s software attempts to simplify the metadata-creation process for contributors, its terminology is different from dc’s: dspace uses more common terms that are familiar to a wider variety of individuals.
for example, dspace asks the contributor to list an “author” for the work, not a “creator” or a “contributor.” in fact, those terms appear nowhere in the dspace contributor interface. instead, dspace takes the text entered in the author textbox and maps it to a dc element—something that has profound implications if the mapping does not follow expected dc definitions. likewise, dspace does not use “subject” when asking the contributor to describe their material. instead, dspace asks the contributor to list keywords. text entered into the keyword field is then mapped into the subject element. while this seems like a reasonable path, it does have some interesting implications for how the subject element is interpreted and used by contributors.

dc’s metadata elements are all optional. this is not true in dspace. dspace has both mandatory and automatic elements in its records. because of this, data records created in dspace look different than data records created in dc. these mandatory, automatic, and default fields affect the fill frequency of certain dc elements, with all of these elements having 100 percent participation.

in dspace, the title element is mandatory; that is, it is a required element. the software will not allow the contributor to proceed if the title text box is left empty. as a consequence, all dspace records will have 100 percent participation in the title element.

dspace has seven automatic elements, that is, element fields that are created by the software without any need for contributor input. three are date elements, two are format elements, one is an identifier, and one is provenance. dspace automatically records the time of each record’s creation in machine-readable form. when the record is uploaded into the database, this timestamp is entered into three element fields: dc.date.available, dc.date.accessioned, and dc.date.issued. therefore dspace records have 100 percent participation in the date element. for previously published materials, a separate screen asks for the original publication date, which is then placed in the dc.date.issued element. like title, the original date of publication is a mandatory field, and failure to enter a meaningful numerical date into the textbox will halt the creation of a record.

in a similar manner, dspace “reads” the kind of file the contributor is uploading to the database. dspace automatically records the size and type (.doc, .jpg, .pdf, etc.) of the file or files. this data is automatically entered into dc.format.mimetype and dc.format.extent. like date, all dspace records will have 100 percent participation in the format element. likewise, dspace automatically assigns a location identifier when a record is uploaded to the database. this information is recorded as a uri and placed in the identifier element. all dspace records have a dc.identifier.uri field.

the final automatic element is provenance. at the time of record creation, dspace records the identity of the contributor (derived from the sign-in identity and password) and places this information into a dc.provenance element field. this information becomes a permanent part of the dspace record; however, this field is hidden from users. typically only community and network/systems administrators may view provenance information. still, like the date, format, and identifier elements, dspace records have automatic 100 percent participation in provenance.
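to make the division between automatic and contributor-supplied fields concrete, the following sketch assembles a record of the kind described above. this is an illustration in python, not dspace's actual code: the field labels follow the qualified dc names used in this article, but the function and the example values are hypothetical.

from datetime import datetime, timezone

def build_record(title, authors, keywords, size_bytes, mimetype):
    # illustrative only: contributor input plus automatically generated fields
    now = datetime.now(timezone.utc).isoformat()
    return {
        # contributor-supplied fields (title is mandatory; others may be skipped)
        "dc.title": title,
        "dc.contributor.author": authors,
        "dc.subject": keywords,  # free-text keywords, no authority control
        # automatic fields: always present, hence 100 percent participation
        "dc.date.accessioned": now,
        "dc.date.available": now,
        "dc.date.issued": now,
        "dc.format.extent": str(size_bytes) + " bytes",
        "dc.format.mimetype": mimetype,
        "dc.identifier.uri": "http://hdl.handle.net/0000/placeholder",  # assigned at upload
    }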
because of the design of dspace’s software, all dspace-created records will have a combination of both contributor-created and dspace-created metadata.

all dspace records can be edited. during record creation, the contributor may at any time move backward through the record to alter information. once the record has been finished and the statement of rights signed, the completed record moves into the community administrator’s workflow. once the record has entered the workflow, the community administrator is able to view the record with all the metadata tags attached and make changes using dspace’s editing tools. however, depending on local practices and the volume of records passing through the administrator’s workflow, the administrator may simply upload records without first reviewing them. a record may also be edited after it has been uploaded, with any changes being uploaded into the database at the end of the editing process. in editing a record after it has been uploaded, the contributor, provided he has been granted the appropriate privileges, is able to see all the metadata elements attached to the record. calling up the editing tools at this point allows the contributor or administrator to make significant changes to the elements and their qualifiers, something that is not possible during the record’s creation. when using the editing tools, the simplified contributor interface disappears, and the metadata element fields are labeled with their dc names. the contributor or administrator may remove metadata tags and the information they contain and add new ones, selecting the appropriate metadata element and qualifier from a slider. for example, during the editing process, the contributor or administrator may choose to create dc.contributor.editor or dc.subject.lcsh options—something not possible during the record-creation process. in the examination of the dspace records from our three repositories, dspace’s shaping influence on element participation and metadata quality will be clearly seen.

the repositories

dspace is principally used by academic and corporate nonprofit agencies to create and manage their institutional repositories. for this study, i selected three academic institutions that shared similar characteristics (large, public, research-based universities) but which had differing approaches to how they managed their metadata-quality issues.

the university of new mexico (unm) dspace repository (dspaceunm) holds a wide-ranging set of records, including materials from the university’s faculty and administration, the law school, the anderson school of business administration, and the medical school, as well as materials from a number of tangentially related university entities like the western water policy review advisory commission, the new mexico water trust board, and governor richardson’s task force on ethics reform. at the time of the initial research for this paper (spring 2008), dspaceunm provided little easily accessible on-site education for contributors about the dspace record-creation process. what was offered—a set of eight general information files—was buried deep inside the library community; a contributor would have to know the files existed to find them. by summer 2009, this had changed. dspaceunm had a new homepage layout, with a link to “help sheets and promotional materials” at the top center of the homepage. this link leads to the previously difficult-to-find help files. the content of the help files, however, remains largely unchanged.
they discuss community creation, copyrights, administrative workflow for community creation, a list of supported formats, a statement of dspaceunm’s privacy policy, and a list of required, encouraged, and not-required elements for each new record created. for the most part, the dspaceunm help sheets do not attempt to educate the contributor in issues of metadata quality. there is no discussion of dc terminology, no attempt to refer the contributor to a thesaurus or controlled vocabulary list, nor any explanation of the record-creation or editing process. this lack of contributor education may be explained in part because dspaceunm requires all new records to be reviewed by a subject area librarian as part of the dspace community workflow. thus any contributor errors, in theory, ought to be caught and corrected before being uploaded to the database.

the university of washington (uw) dspace repository (researchworks at the university of washington) hosts a narrower set of records than dspaceunm, with the materials limited to those contributed by the university’s faculty, students, and staff, plus materials from uw’s archives and uw’s school of public and community health. in 2008, researchworks was self-archiving; most contributors were expected to use dspace to create and upload their records. the publicly available information about the record-creation workflow gives no indication of whether record reviews were conducted before record upload. the help link on the researchworks homepage brought contributors to a set of screen-by-screen instructions on how to use dspace’s software to create and upload a record. the step-through did not include instructions on how to edit a record once it had been created. no explanation of the meanings or definitions of the various dc elements was included in the help files. there also were no suggestions about the use of a controlled vocabulary or a thesaurus for subject headings. by 2009, this link had disappeared, and the associated contributor education materials with it.

the knowledge bank at ohio state university (osu) is the third repository examined for this paper. osu’s repository hosts more than thirty communities, all of which are associated with various academic departments or special university programs. like researchworks at uw, osu’s repository appears to be self-archiving, with no clear policy statement as to whether a record is reviewed before it is uploaded to the repository’s database. osu makes a strong effort to educate its contributors. on the upper left of the knowledge bank homepage is a slider link that brings the contributor (or any user) to several important and useful sources of repository information: about knowledge bank, faqs, policies, video upload procedures, community set-up form, describing your resources, and knowledge bank licensing agreement. the existence and use of metadata in knowledge bank are explicitly mentioned in the faq and policies areas, together with an explanation of what metadata is and how metadata is used (faq) and a list of supported metadata elements (policies). the describing your resources section gives extended definitions of each dspace-available dc metadata element and provides examples of appropriate metadata-element use. knowledge bank provides the most comprehensive contributor education information of any of the three repositories examined.
it does not use a controlled vocabulary list for subject headings, and it does not offer a thesaurus.

data and analysis

i chose twenty randomly selected full records from each repository. no more than one record was taken from any one collection in order to gather a broad sampling from each repository. i examined each record for the quality of its metadata. metadata quality is a semantically slippery term. park, in the spring 2009 special metadata issue of cataloging and classification quarterly, suggested that the most commonly accepted criteria for metadata quality are completeness, accuracy, and consistency.3 those criteria will be applied in this analysis.

for the purpose of this paper, i define completeness as the fill rate for key metadata elements. because the purpose of metadata is to identify the record and to assist in the user’s search process, the key elements are title, contributor/creator, subject, and description.abstract—all contributor-generated fields. i chose these elements because these are the fields that the dspace software uses when someone conducts an unrestricted search. table 1 shows the fill rate for the title element is 100 percent for all three repositories. this is to be expected because, as noted above, title is a mandatory field. the fill rate for contributor/creator is likewise high: 16 of 20 (80 percent) for unm, 19 of 20 (95 percent) for uw, and 19 of 20 (95 percent) for osu. (osu’s fill rates for creator and contributor were summed because osu uses different definitions for the creator and contributor element fields than do unm or uw. this discrepancy will be discussed in greater depth in the discussion of consistency of metadata terminology below.) the fill rate for subject was more variable: unm’s subject fill rate was 100 percent, while uw’s was 55 percent and osu’s was 40 percent. the fill rate for the description.abstract subfield was 12 of 20 (60 percent) at unm, 15 of 20 (75 percent) at uw, and 8 of 20 (40 percent) at osu. (see appendix a for a complete list of metadata elements and subfields used by each of the three repositories.) the relatively low fill rate (below 50 percent) at the osu knowledge bank in both subject and description.abstract suggests a lack of completeness in that repository’s records.
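the fill-rate calculation itself is simple; the sketch below (my own representation of records as python dictionaries keyed by element name, not an export format used by dspace) tallies the fraction of sampled records that fill each key element.

def fill_rates(records, elements=("title", "contributor/creator", "subject", "description.abstract")):
    # fraction of records with a nonempty value for each key element
    return {e: sum(1 for r in records if r.get(e)) / len(records) for e in elements}

# hypothetical two-record sample: "subject" is filled in one of two records (50 percent)
sample = [{"title": "charge", "subject": "golf"}, {"title": "report"}]
print(fill_rates(sample))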
accuracy in metadata quality is the essential “correctness” of a record. correctness issues in a record range from data-entry problems (typos, misspellings, and inconsistent date formats) to the incorrect application of metadata definitions and data overlaps.4 accuracy is perhaps the most difficult of the metadata quality criteria to judge. local practices vary widely, and dc allows for the creation of custom metadata tags for local use. additionally, there is long-standing debate and confusion about the definitions of metadata elements even among librarians and information professionals.5 because of this, only the most egregious of accuracy errors were considered for this paper. all three repositories had at least one record that contained one or more inaccurate metadata fields; two of them had four or more inaccurate records.

inaccurate records included a wide variety of accuracy errors: poor subject information (no matter how loosely one defines a subject heading, “the” is not an accurate descriptor); mutually contradictory metadata (a record contained two different language tags, although only one applied to the content); and one record in which the abstract was significantly longer than, and only tangentially related to, the file it described. additionally, records showed confusion over the contributor versus creator elements; in a few records, contributors entered duplicate information into both element fields. this observation supports park and childress’s findings that there is widespread confusion over these elements.6 among the most problematic records in terms of accuracy were those contained in uw’s early buddhist manuscripts project. this collection, which has been removed from public access since the original data was drawn for this paper, contained numerous ambiguous, contradictory, and inaccurate metadata elements.7 while contributor-generated subject headings were specifically not examined for this paper, it must be noted that there was wide variation in the level of detail and vocabulary used to describe records. no community within any of the repositories had specific rules for the generation of keyword descriptors for records, and the lack of guidance shows.

consistency can be defined as the homogeneity of formats, definitions, and use of dc elements within the records. this consistency, or uniformity, of data is important because it promotes basic semantic interoperability. consistency, both inside the repository itself and with other repositories, makes the repository easier to use and provides the user with higher quality information. all three repositories showed 100 percent consistency in dspace-generated elements. dspace’s automated creation of date and format fields provided reliably consistent records in those element fields. dspace’s automatic formatting of personal names in the dc.contributor.author and dc.creator fields also provided excellent internal consistency. however, the metadata elements were much less consistent for contributor-generated information. inconsistency within the subject element is where most problems occurred. personal names used as subject headings and capitalization within subject headings both proved to be particular issues. dspace alphabetizes subject headings according to the first letter of the free text entered in the keyword box. thus the same name entered in different formats (first name first or last name first) generates different subject-heading listings. the same is true for capitalization: any difference in capitalization of any word within the free-text entry generates a separate subject heading.

another field where consistency was an issue was dc.description.sponsorship. sponsorship is a problem because different communities, even different collections within the same community, use the field to hold different information. some collections used the sponsorship field to hold the name of a thesis or dissertation advisor. some collections used sponsorship to list the funding agency or underwriter for a project being documented inside the record. some collections used sponsorship to acknowledge the donation of the physical materials documented by the record. while all of these are valid uses of the field, they are not the same thing and do not hold the same meaning for the user.
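the capitalization behavior described above is easy to demonstrate. in the sketch below (my own illustration, not dspace code), case-sensitive grouping splits what a user would read as one heading into three, while lowercasing before grouping collapses the variants.

from collections import Counter

keywords = ["Water Policy", "water policy", "WATER POLICY", "golf course"]

# grouping on the raw free text, as described for dspace's subject listings:
print(Counter(keywords))  # each capitalization variant becomes its own heading

# normalizing case first collapses the variants into a single heading:
print(Counter(k.lower() for k in keywords))  # {'water policy': 3, 'golf course': 1}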
table 1. metadata fields and their frequencies

element | univ. of n.m. | univ. of wash. | ohio state univ.
title | 20 | 20 | 20
creator | 0 | 0 | 16
subject | 20 | 11 | 8
description | 12 | 16 | 17
publisher | 4 | 4 | 8
contributor | 16 | 19 | 3
date | 20 | 20 | 20
type | 20 | 20 | 20
identifier | 20 | 20 | 20
source | 0 | 0 | 0
language | 20 | 20 | 20
relation | 3 | 1 | 6
coverage | 2 | 0 | 0
rights | 2 | 0 | 0
provenance | ** | ** | **

**provenance tags are not visible to public users

the largest consistency issue, however, came from a comparison of repository policies regarding element use and definition. unaltered dspace software maps contributor-generated information entered into the author textbox during the record-creation process into the dc.contributor.author field. however, osu’s dspace software has been altered so that the dc.contributor.author field does not exist. instead, text entered into the author textbox during the record-creation process maps to dc.creator. although both uses are correct, this choice does create a significant difference in element definitions: osu’s dspace author fields are no longer congruent with other dspace author fields.

conclusions

dspace was created as a repository management tool. by streamlining the record-creation workflow and partially automating the creation of metadata, dspace’s developers hoped to make institutional repositories more useful and functional while at the same time providing an improved experience for both users and contributors. in this, dspace has been partially successful. dspace has made it easier for the “metadata naive” contributor to create records, and, in some ways, dspace has improved the quality of repository metadata. its automatically generated fields ensure better consistency in those elements and subfields. its mandatory fields guarantee 100 percent fill rates in some elements, and this contributes to an increase in metadata completeness. however, dspace still relies heavily on contributor-generated data to fill most of the dc elements, and it is in these contributor-generated fields that most of the metadata quality issues arise. nonmandatory fields are skipped, leading to incomplete records. data-entry errors, a lack of authority control over subject headings, and confusion over element definitions can lead to poor metadata accuracy. a lack of enforced, uniform naming and capitalization conventions leads to metadata inconsistency, as do localized and individual differences in the application of metadata element definitions. while most of the records examined in this small survey could be characterized as “acceptable” to “good,” some are abysmal.

to improve the inconsistency of dspace records, the three universities have tried differing approaches. only unm’s required record review by a subject area librarian before upload seems to have made any significant impact on metadata quality. unm has a 100 percent fill rate for subject elements in its records, while uw and osu do not. this is not to say that unm’s process is perfect and that poor records do not get into the system—they do (see appendix b for an example). but it appears that, for now, the intermediary intervention of a librarian during the record-creation process is an improvement over self-archiving—even with education—by contributors.

references and notes

1. association of library collections & technical services, committee on cataloging: description & access, task force on metadata, “final report,” june 16, 2000, http://www.libraries.psu.edu/tas/jca/ccda/tf-meta6.html (accessed mar. 10, 2007).
2. a voluntary (and therefore less-than-complete) list of current dspace users can be found at http://www.dspace.org/index.php?option=com_content&task=view&id=596&itemid=180. further specific information about dspace, including technical specifications, training materials, licensing, and a user wiki, can be found at http://www.dspace.org/index.php?option=com_content&task=blogcategory&id=44&itemid=125.

3. jung-ran park, “metadata quality in digital repositories: a survey of the current state of the art,” cataloging & classification quarterly 47, no. 3 (2009): 213–28.

4. sarah currier et al., “quality assurance for digital learning object repositories: issues for the metadata creation process,” alt-j: research in learning technology 12, no. 1 (2004): 5–20.

5. jung-ran park and eric childress, “dc metadata semantics: an analysis of the perspectives of information professionals,” journal of information science 20, no. 10 (2009): 1–13.

6. ibid.

7. for a fuller discussion of the collection’s problems and challenges in using both dspace and dc, see kathleen forsythe et al., “university of washington early buddhist manuscripts project in dspace” (paper presented at dc-2003, seattle, wash., sept. 28–oct. 2, 2003), http://dc2003.ischool.washington.edu/archive-03/03forsythe.pdf (accessed mar. 10, 2007).

appendix a. a list of the most commonly used qualifiers in each repository

university of new mexico
dc.date.issued (20)
dc.date.accessioned (20)
dc.date.available (20)
dc.format.mimetype (20)
dc.format.extent (20)
dc.identifier.uri (20)
dc.contributor.author (15)
dc.description.abstract (12)
dc.identifier.citation (6)
dc.description.sponsorship (4)
dc.subject.mesh (2)
dc.contributor.other (2)
dc.description.sponsor (1)
dc.date.created (1)
dc.relation.isbasedon (1)
dc.relation.ispartof (1)
dc.coverage.temporal (1)
dc.coverage.spatial (1)
dc.contributor.other (1)

university of washington
dc.date.accessioned (20)
dc.date.available (20)
dc.date.issued (20)
dc.format.mimetype (20)
dc.format.extent (20)
dc.identifier.uri (20)
dc.contributor.author (18)
dc.description.abstract (15)
dc.identifier.citation (4)
dc.identifier.issn (4)
dc.description.sponsorship (1)
dc.contributor.corporateauthor (1)
dc.contributor.illustrator (1)
dc.relation.ispartof (1)

ohio state university
dc.date.issued (20)
dc.date.available (20)
dc.date.accessioned (20)
dc.format.mimetype (20)
dc.format.extent (20)
dc.identifier.uri (20)
dc.description.abstract (8)
dc.identifier.citation (4)
dc.subject.lcsh (4)
dc.relation.ispartof (4)
dc.description.sponsorship (3)
dc.identifier.other (2)
dc.contributor.editor (2)
dc.contributor.advisor (1)
dc.identifier.issn (1)
dc.description.duration (1)
dc.relation.isformatof (1)
dc.description.statementofresponsibility (1)
dc.description.tableofcontents (1)

appendix b. sample record

dc.identifier.uri http://hdl.handle.net/1928/3571
dc.description.abstract president schmidly’s charge for the creation of a north golf course community advisory board.
dc.format.extent 17301 bytes
dc.format.mimetype application/pdf
dc.language.iso en_us
dc.subject president
dc.subject schmidly
dc.subject north
dc.subject golf
dc.subject course
dc.subject community
dc.subject advisory
dc.subject board
dc.subject charge
dc.title community_advisory_board_charge
dc.type other

editorial: farewell and thank you
john webb
information technology and libraries | december 2007

this issue of information technology and libraries (ital), december 2007, marks the end of my term as editor. it has been an honor and a privilege to serve the lita membership and ital readership for the past three years. it has been one of the highlights of my professional career.

editing a quarterly print journal in the field of information technology is an interesting experience. my deadlines for the submission of copy for an issue are approximately three and a half months prior to the beginning of the month in which the issue is published; for example, my deadline for the submission of this issue to ala production services was august 15. therefore, most articles that can appear in an issue were accepted in final form at least five months before they were published. some are older; one was a baby at only four months old. when one considers the rate of change in information technologies today, one understands the need for blogs, wikis, lists, and other forms of professional discourse in our field.

what role does ital play in this rapidly changing environment? for one, unlike these newer forms, it is double-blind refereed. published articles run a peer review gauntlet. this is an important distinction, not least to the many lita members who work for academic institutions. it may be crass to state it so baldly, but publication in ital can help one earn tenure, an old-fashioned fact of life. it is indexed or abstracted in nineteen published sources, not all of them in english. many of its articles appear in various digital repositories and archives, and these also are harvested or indexed or both. in addition, its articles are cataloged in worldcat local. many of lita’s most prominent members—your distinguished peers—have published articles in ital. the journal also serves as a source for the wider dissemination of sponsored research, a requirement of most grants. and you can read it on the bus or at the beach (heaven forbid!), in the brightest sunlight, or with a flashlight under the covers (though there are no reports of this ever having been observed).

i am amazed at how quickly these three years have passed, though that may be at least as much a function of my advanced age as of the fun and pleasure i have had as editor. certainly, these past three years have hosted some notable landmarks in our history. lita and ital both celebrated their fortieth anniversaries. sadly, the death of one of lita’s founders and ital’s first editor, frederick g. kilgour, on july 31, 2006, at age ninety-two, was a landmark in the passing of an era. oclc and rlg’s merger, which fred lived to witness, was a landmark of a different sort—one of maturity, we hope. ital is now an electronic as well as a print journal. this conversion has had some rough passages, but i trust these will have been ironed out by the time you read this.

when i became editor, i had a number of goals for the journal, which i stated in my first editorial in march 2005.
reading that editorial today, i realize that we successfully accomplished the concrete ones that were most important to me then: increasing the number of articles from library and i-school faculty; increasing the number that result from sponsored research; increasing the number that describe any relevant research or cutting-edge advancements; increasing the number of articles with multiple authors; and finding a model for electronic publication of the journal. the accomplishment of the most abstract and ambitious goal, “to make ital a destination journal of excellence for both readers and authors,” only you, the readers and authors, can judge.

i thank mary taylor, lita executive director, and her staff for all of the support they provided to me during my term. i owe a debt that i can never repay to all of the staff of ala production services who worked with me these past three years; their patience with my sometimes bumbling ways was award-winning. thank all of you. the lita presidents and other officers and board members were unfailingly supportive, and i thank you all. in the lita organizational structure, the ital editor and the editorial board report to the lita publications committee, and the editor is a member of that body. i thank all of the chairs and other members of that committee for their support.

once more, and sadly for the last time, i thank all of the members of the ital editorial board who served during my term for their service and guidance. they perform more than their share of refereeing, but more importantly, as i have written before, they are the junkyard dogs who have kept me under control and prevented my acting on my worst instincts. i say again, you, the lita membership and ital readership, owe them more than you can ever guess. trust me.

to marc truitt, ital managing editor and the incoming ital editor for the 2008–2010 volume years, i must say, “thank you, thank you, thank you!” marc and the ala production services staff were responsible for the form, fit, and finish of the journal issues you received in the mail, held in your hands, and read under the covers. finally, most of all, thank you authors whose articles, communications, and tutorials i have had the privilege to publish, and you whose articles have been accepted and await publication.

john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university, and editor of information technology and libraries.

not only is this the end of my term as editor, but i also have retired. from now on, my only role in the field of library and information technology will be as a user. those of you who have seen the movie the graduate probably remember the early scene when benjamin, the dustin hoffman character, receives the single word of advice regarding his future: “plastics.” (i don’t know if that scene is in the novel from which the movie was adapted.) my single word of advice to those of you too young or too ambitious to retire from our field is: “handhelds.” i am surprised that my treo is more valuable to me now in retirement than it was when i was working. (i’m not surprised that my ipod video is, nor that word thinks that treo and ipod are misspellings.) i just wish that more of the web were as easily accessible on my treo as are google maps and almost all of yahoo!. handhelds. trust me.
technology integration in storytime programs: provider perspectives
maria cahill, erin ingram, and soohyung joo
information technology and libraries | june 2023
https://doi.org/10.6017/ital.v42i2.15701

maria cahill (maria.cahill@uky.edu) is professor, university of kentucky. erin ingram (erin.ingram@chpl.org) is youth librarian, cincinnati and hamilton county public library. soohyung joo (soohyung.joo@uky.edu) is associate professor, university of kentucky. © 2023.

abstract

technology use is widespread in the lives of children and families, and parents and caregivers express concern about children’s safety and development in relation to technology use. children’s librarians have a unique role to play in guiding the technology use of children and families, yet little is known about how public library programs facilitate children’s digital literacy. this study sought to uncover librarians’ purposes for using technology in programs with young children as well as the supporting factors and barriers they encountered in attempting to do so. findings reveal 10 purposes for integrating technology into public library storytime programs and 15 factors across four dimensions that facilitate and/or inhibit its inclusion. if librarians are to embrace the media mentor role with confidence and the necessary knowledge and skills required of the task, much greater attention should be devoted to the responsibility, and more support in the way of professional development and resources is necessary.

introduction

technology use is widespread in the lives of children and families. from a very early age, children in highly developed countries across the world regularly interact with technology, and data from device trackers substantiate parental reports.1 nearly all families have access to one or more mobile devices, and nearly three-fourths of children in the united states begin some form of digital engagement, primarily television viewing, before age three.2 prior to formal schooling, children (ages two to four) in highly developed countries tend to use a device with a screen for about two and a half hours per day on average.3 differences in screen use by income level and race are significant, with children from lower-income families and children of color spending more time on electronic devices than children from higher-income families and children who are white. though most parents do allow their children to use technology, many voice some concerns about their children’s well-being, particularly regarding privacy as well as the content of the media.4 yet, young children’s digital activity can be beneficial, particularly when the technology is designed to foster active, meaningful engagement and when it facilitates social interaction.5 in light of children’s usage and parents’ concerns, librarians in public libraries have a unique role to play in this information realm.
not only can librarians provide access to technology and recommended resources, but they can also provide guidance in how to use technology to contribute to children’s learning, especially in the areas of reading, information literacy, and academic concepts.6 yet, little is known about whether librarians actually facilitate children’s digital literacy through integration of technology into programs, and this dearth of empirical evidence is highlighted in the association for library service to children (alsc) research agenda.7 storytime, as a program attended by both children and caregivers, can be used as a time for children’s librarians to integrate technology for the purposes of modeling and explaining how various electronic tools might be beneficial for young children.8 due to this potential, it is important to understand how and why children’s librarians are—or are not—integrating technology into storytime programs.

previous studies of technology use in children’s programs and storytimes

internationally, there have been few investigations of technology integration within library programs for young children. within the united states, two survey studies, both commissioned by alsc, sought to capture the use of technology in youth programming.9 the initial survey launched in 2014 and the follow-up survey in 2018. respondents to these surveys reported that the types of devices used most often in libraries were proprietary institutional devices, digital tablets, tangible tech such as squishy circuits that allow children to build electrical circuits with play dough, and programmable tech such as cubetto, a wooden robot toy.10 additionally, more than half of respondents working in medium and large libraries and more than 45% of those working in small libraries indicated using digital devices during storytimes.11 conversely, a comprehensive study of programming for young children in public libraries, which included observations, concluded that, “while many libraries offer families a place to use computers and other digital resources together, few libraries actively promote the use of technology during their programming.”12 notably, neither the 2014 nor the 2018 alsc survey included questions about the types of technology used in storytimes, nor were respondents asked to explain their thoughts on why or how technology was or was not included in storytime.13

a study conducted in aotearoa new zealand collected data about technology use in storytime in three phases: a survey of 25 children’s librarians, interviews with librarians in nine libraries, and a survey of 28 caregivers who attend a library storytime with a young child.14 slightly more than a quarter of the librarians responding to the survey reported incorporating digital technology such as tablets or e-books into storytime programs. the most common rationale for technology use in storytime was to educate caregivers; other reasons included the novelty of the technology and promoting accessibility and the aims of library services. interviewees explained that they used technology in storytime to show caregivers the availability of high-quality digital media such as e-books and educational apps, with one likening the use and recommendation of digital media to librarians’ traditional role as recommenders of storybooks (i.e., readers’ advisory services).
conversely, one interviewee expressed reservations about using technology for fear that children would be distracted from the content of the story. the majority of caregiver respondents who had attended a storytime with digital technology reported enjoying the experience. however, those who had never attended a storytime with technology were apprehensive about doing so.

technology best practices: joint media engagement and media mentorship

recent scholarship encourages children's librarians to use their expertise and experience to evaluate and recommend technology and new media resources as well as to model for adults how to interact with children as they use technology.15 for example, librarians can promote joint media engagement during storytimes both by modeling the practice and by directly explaining it to the adults in attendance. using technology during storytime can be seen as modeling modern literacy practices, just as reading print books has modeled literacy practices in traditional storytimes since the 1940s.16 alsc instructs youth services librarians to act as media mentors, a role that means they will assist caregivers in choosing and using technology by researching new technology and by modeling technology use, such as joint media engagement, for caregivers in programs such as storytimes.17 media mentorship is seen as an extension of how youth services librarians have traditionally been called upon to meet the needs of caregivers and children with their knowledge of child development and ability to facilitate caregivers' information seeking.18 while alsc encourages media mentorship, the extent to which children's librarians have embraced this role remains unclear in the professional research. findings from prior surveys and interviews with storytime providers suggest that librarians are regularly integrating technology into programs, while observations of library programs suggest otherwise.19 further, goulding and colleagues found that while many librarians were comfortable recommending technology such as apps, it was unclear whether or not they were modeling its use during storytimes.20

study objectives

the overarching research question of this study is "how do storytime providers view the integration of technology into storytime programs?" the following three research questions guide this study.

1. what are the purposes for using technology in storytimes?
2. what are factors associated with adopting technology in storytimes?
3. what are barriers to integrating technology in storytimes?

method

participants

as part of a larger institute of museum and library services (imls)-funded, multistate study that was approved by the university of kentucky institutional review board (irb number 42829), researchers conducted semi-structured interviews with 34 library staff who facilitate storytime programs at public libraries serving urban, suburban, and rural communities across kentucky, ohio, and indiana.21 interviewees were not asked to identify their race or ethnicity. thirty-two identified as female and two as male. all but one of the participants (97%) had earned a college degree, but only 13 (38.2%) held a master's degree from a library and information science (lis) program, while another two were enrolled in an lis master's degree program when the interviews occurred. the majority of participants (57.1%) had five years or more of experience in children's library services.
the participants will be referred to as "storytime providers."

procedure

the interviews were conducted by one member of the research team. other members of the team created written transcripts from recordings of the interviews. for the study reported in this paper, researchers focused on participants' answers to the interview question "what place, if any, does technology or digital media have in a quality storytime program?" an open coding method was used to organize participants' statements within three categories: purposes underlying technology use, factors associated with technology adoption, and barriers to technology integration. three researchers conducted open coding independently and came up with the initial set of coding results. then, the researchers discussed the coding results multiple times to assess the relevance of the coded constructs, refine operational definitions, and select one representative quote for each code. interviewees were assigned a number between 1 and 34 to eliminate identifying information.

results

what are the purposes for using technology in storytimes?

to find answers to this research question, the researchers coded statements related to how or why interviewees used or wanted to use technology in storytime programs. we identified 10 specific purposes, formed operational definitions for each, and chose one representative quote (table 1). although most purposes had statements from more than one interviewee associated with them, we collaborated to choose one example due to space constraints. researchers determined that the purposes for technology use could be divided into two categories: experiential and learning. experiential purposes are those for which technology is used to create a positive, engaging experience for child and/or adult participants. learning purposes are those for which technology use is intended to help child and/or adult participants learn.

what are factors associated with adopting technology in storytimes?

to answer the second research question, researchers looked for statements explaining the reasons or causes for storytime providers using or wanting to use technology in their storytime programs. these would be factors that facilitate technology adoption. researchers coded statements independently and then discussed results multiple times to verify relevance and consolidate categories into 15 factors in four dimensions: storytime provider, participant, library system, and content. though many factors had more than one corresponding statement from participants, we chose one representative quote for each. results are presented in table 2.

what are barriers to integrating technology in storytimes?

to answer this question, researchers independently reviewed responses, looking for statements related to why storytime providers did not or did not wish to use technology during storytime. after individual coding, we collaborated to verify relevance, refine definitions of the 15 identified barriers, and choose representative quotes. the results are presented in table 3. researchers found that three of the dimensions created for factors that lead to technology adoption could also be applied to barriers to technology integration: storytime provider, participant, and library system.
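before turning to the tables, note their shared shape: every code sits in a category or dimension and carries an operational definition plus one representative quote tagged with an interviewee number. purely as an illustration (a sketch of the codebook structure, not the authors' actual analysis tooling; the excerpted entries are abbreviated from table 1 below), that structure could be captured as a small data structure:

```python
# illustrative sketch of the study's codebook structure (not the authors'
# tooling): each code belongs to a category/dimension and carries an
# operational definition plus one representative quote from a numbered
# interviewee (1-34).
from dataclasses import dataclass

@dataclass
class Code:
    name: str
    definition: str
    interviewee: int  # participant number assigned to eliminate identifying info
    quote: str

# abbreviated excerpt of table 1 (purposes), grouped by category
purposes = {
    "experiential": [
        Code("accommodating large groups",
             "technology is used to enable a large group to view books/materials",
             2, "i had this huge group of kids..."),
    ],
    "learning": [
        Code("teaching caregivers",
             "technology is integrated to model for caregivers",
             11, "i think it's important to share with parents really good e-resources..."),
    ],
}

# a summary such as "10 purposes in 2 categories" is then just a count per key:
print({category: len(codes) for category, codes in purposes.items()})
```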
table 1. purposes for using technology in storytimes

experiential purposes:
- accommodating large groups: technology is used to enable a large group to view books/materials. (2: "i had this huge group of kids. and i took them to our red room and did a story on our big screen. you know, through tumblebooks.")
- children's enjoyment: provider incorporates media or technology because children enjoy it. (14: "and then as far as, um, sometimes, um, we'll have, like, at the end of a storytime, we may have a little short, um, like nonfiction or sign language or if we were doing something on the alphabet, maybe i would throw in a little dvd and give them popcorn for the end of storytime and things like that and i think that they really enjoy that. it is important to integrate that in.")
- facilitating adult participation: provider uses technology to display the words to songs to facilitate adult participation. (12: "the closest thing i would say, i use a powerpoint that has the words on it for the parents to be able to follow along, um, or for the kids if they can pick out some of the letters or start to read, even some of the older ones.")
- facilitating movements: technology is used to facilitate movements or dancing. (19: "in addition to our singing, just to give, you know, to change it up a little bit. so, they can hear the music. we clap rhythms. so, we use that a lot.")
- playing songs/music: technology is used to play songs or music. (13: "we have a sound system that i love, with surround sound. we always do our last song with, you know, that, and i've been fortunate that it's worked all the time.")
- sound effects: technology is used to create a sound or voice. (17: "one of the better things that i've done, that i like to do, is, i like to use animal sounds. i'll research or pull up a list of sounds on youtube or whatever and have the kids listen to them. i think that's always been a fun way to work in a little bit of technology without taking out all of the flow.")
- visual aids: technology is used to support children's visual experience. (24: "and, like, it gives the kids a visual. and i feel like sometimes, if we could give them a better visual, they might be more engaged.")

learning purposes:
- support for adult-child interaction: technology is used to support adult-child interaction. (1: "if you're actually sitting down with your child, looking at it together, it's a lot more effective and the child is getting a lot more out of it versus just sitting them in front of it and expecting to teach something to the child.")
- teaching caregivers: technology is integrated to model for caregivers. (11: "i think it's important to share with parents really good e-resources, such as, like, apps. and books and stuff. so, that, i think it's very important…. i have, like, when i have like a screen, a projector screen, maybe when the book i picked for the storytime was an e-book that they could get through the library, and kind of, you know, advertise that resource, and then we would, we would read the e-book, you know, from the projector. so i've done, like, e-books and stuff.")
- teaching concepts: technology is used to present letters, words, numbers, shapes, sign language, colors, or coding skills to children. (22: "…. all these different color songs, um, and they're actually just on youtube…. so that is one way that we've been incorporating technology, um, is with those color songs because it spells it out for them. they can see the word, it's a familiar tune, and it helps them, you know, at least be able to sing, sing the song.")

table 2. factors associated with adopting technology in storytimes

storytime provider dimension:
- awareness: provider is aware of the tool/technology available for storytime. (1: "i'm aware of all kinds of apps that are out there and of course the ebooks.")
- familiarity: provider feels comfortable with the technology and with integrating the technology into programs. (1: "i feel like it's going to be effective if it's what you're comfortable with and you're excited about. because that will come through when you actually provide the storytime.")
- choice of provider: ultimately it is up to the provider to choose to integrate technology or not. (1: "i think it all depends on the provider.")
- provider's philosophy and approach: how the provider views storytime and its purpose influences technology integration. (1: "everyone has their own, unique storytime philosophy and the way that they approach planning storytimes…. so, really, a lot of it is just ... theory of how you want to approach it since there's so many options out there.")
- reaction/success with initial attempt: if the provider tried technology integration, the success or failure of that initial attempt influences subsequent attempts. (2: "it went over really well.")
- research base: provider is aware of research to support integration of technology. (1: "... it's kind of what the research is saying with parents and digital media at home. it all depends on how you are using it. if you're actually sitting down with your child, looking at it together, it's a lot more effective and the child is getting a lot more out of it versus just sitting them in front of it and expecting to teach something to the child.")

participant dimension:
- number of participants: the number of participants facilitates technology integration. (2: "i think this summer was the first time i ever did that [used technology], and it was because i had this huge group of kids.")
- perception of caregivers' reactions: provider's perception of how the caregivers would react to technology use. (1: "i think they would probably be open to it…. i don't know if maybe the perception some parents don't want any technology, that would keep some people from appreciating it. but i think in general, it would be well-received if we tried it.")
- responsive to children's interests: provider uses digital resources because the children show interest or engagement. (10: "kids are automatically interested in that stuff. they don't need to be enticed. you know, you just get out an iphone or an ipad and they're, like, gasp.")
library system dimension:
- access to equipment and resources: provider has access to technology and tools. (1: "... we have technology, i think, in our system to implement it. you know, e-readers and ipads and things that we can use in storytimes. and large screen tvs.")
- colleague support: provider is part of a branch or system that shares information and resources for technology integration. (17: "so, you know, we have, and we've gotten pretty [good] at sharing with other storytime providers in our system if we have any websites or anything that we've been using or music that works really well for 'movers and shakers' or anything like that.")
- expectation to integrate technology in programs: provider feels pressure to integrate technology and is defensive about the choice not to do so. (1: "i kind of apologize for it…. so, we have the technology available, and they encourage us to use it....")
- training: provider has used or wants to use technology during storytime because of a training. (17: "we did a digital mentoring training about how to appropriately model, like, tech skills and screen time with families. so we've been encouraged to add in a little bit more technology into our storytimes if we can do those, you know, in an appropriate way.")

content dimension:
- interactivity: provider can use technology to facilitate interactivity. (24: "... i would love to use some, like, smart tvs, smart boards, those kind of things. just for some interactive songs and you know, activities... when i go into these kindergartens and first grade and second grade rooms, like, these kids are using the smart boards for interactive activities for abcs and colors and shapes and numbers. and it may be through an activity or a song that's being used with that smart board. and i say, 'oh, i love that! i wish i could do that!'")
- theme: provider uses technology that clearly connects to the theme of the storytime. (17: "actually in my kinderbridge storytime now, it's shapes month. we have the osmo tangrams that i bring out. so that's one of the ones all four weeks i'm going to use the apps and bring out both of our ipads so that kids can practice those spatial shapes.")

table 3. barriers to integrating technology in storytime

storytime provider dimension:
- fear of difficulties/problems: provider doesn't plan or hesitates to plan technology use because there may be problems with using it. (13: "but technology can be a problem. when you're planning or something and it's not working.")
- previous/own child's experiences with tech: provider has negative experience using technology with children. (5: "i have a four-year old. and it's interesting to see how he responds to technology and what he responds to. and what helps him to learn the most. and it's just, like, night and day what he learns from. you know, hearing repeated songs and rhymes and just reading tons of books versus what he learns.… i mean, i think that probably the most he ever learned from an ipad was getting to watch sesame street. just sort of the same, sort of like watching a storytime, i think. but yeah, i think just now from experience seeing like, 'oh! that really doesn't. it's not a helpful tool, i don't think, for that age.' just from my experience.")
- undecided about the value of tech: provider is unsure if tech integration is appropriate. (5: "i have been all over the board in terms of that subject … like i said, it's really important for me to pack in as much of what i think they need in a storytime. and i don't know, again, i'm not sure that i'm doing exactly what is correct and maybe i should be exposing them more. but i feel like, especially for three- to five-year-olds, it's one of those things....")
- screen time/overuse concerns: provider is concerned about children's screen time. (2: "because i think there's plenty of opportunity to be had in other places.")
- storytime activities as purposeful alternative to technology: provider deliberately chooses not to use technology in storytime because they see storytime activities as equally or more beneficial. (16: "and one thing that i've gotten feedback on is that kids are exposed to the technology in pretty much every facet of their life, so if we can make this a space where they can learn and experience things in a way that doesn't have technology and they can see that it's still really fun and exciting and we can learn a lot, then that has its own place, too.")
- unwilling to adopt a new technology: provider keeps using the prior tool and does not try a new alternative technology. (18: "i'm kind of old school because we've been using our cd player.")

participant dimension:
- children devalue other components of storytime when tech is integrated: provider perceives that the children prefer tech over other components of storytime. (5: "i used to sometimes show a short video, and then i kind of found that that's what they looked forward to most. i wanted to sort of change that perception of what the library was for some kids.")
- difficult to use tech with young children: provider experiences difficulty using technology with young children. (5: "i have found, for preschoolers, that it is really hard to incorporate anything digital.")
- lack of access to the internet: poor broadband in a rural area; why expose children to something they can't use at home. (5: "i feel like, especially here in this rural area, … [w]e have a really poor broadband network here, so not a lot of people have access to the internet. and so sometimes i feel like, also, showing them something that they can't really utilize at home is not really helpful until they're a little older also.")
- perception or anticipated perception of some parents/caregivers: if the provider perceives that some parents/caregivers will object to tech integration, the storytime provider may be reluctant to do so. (1: "i don't know if maybe the perception, some parents don't want any technology, that would keep some people from appreciating it.")
- tech is distracting for young children: provider believes technology is distracting. (5: "personally, i think i kind of get distracted by the media, so, then i think they would, too.")
library system dimension:
- lack of access to devices: library does not have a certain device or technology even though the provider would like to have it or thinks it would be useful for storytime. (24: "um, i'll be honest with you, if we had the ability, i would love to use some like smart tvs, smart boards, those kind of things.… we just don't really have that option here.")
- lack of time: to integrate tech into storytime, the provider has to have time to explore tools and know the best resources/media to integrate, and that takes time. (1: "and part of it's time, too. having the time to find quality resources, and to learn how to use them. because we have the technology, i think, in our system to implement it.")
- lack of training: provider feels they do not have the knowledge, interest, skill, or training to use technology during storytime. (15: "and i'd be open to ways to use it, but i guess i haven't taken, you know, any trainings on … i mean, i really haven't seen a lot of things offered at conferences.")
- old facility: library building does not support installing newer technology. (21: "... that's a thing that we have struggled with previously because of our infrastructure and set-up. it was almost a hazard to set up a projector and have some sort of digital aspect to storytime.")

discussion

purposes

experiential

many of the storytime providers' purposes for using technology revealed a goal to create a positive, engaging experience for all children and adults who attend storytime, a theme that prior research has highlighted.22 specifically, technology facilitates the sharing of visual aids, sound effects, and songs. providers also use technology to encourage adult participation, and like their early childhood educator colleagues, storytime providers in this study reported using technology to scaffold and coordinate children's gross motor movements with songs and action rhymes.23

learning

storytime providers' responses also show the aim to contribute to the learning of children and adults in storytime. this finding mirrors those of goulding, shuker, and dickie, which found that providers like to use technology in ways that coincide with the aims of children's services.24 two of the purposes show an awareness of best practices in technology integration: support for adult-child interaction and teaching caregivers.25 additionally, storytime can be an opportune time for providers to model technology best practices for caregivers, as providers have been modeling literacy best practices throughout the history of storytime programming.26 importantly, when storytime providers do model and intentionally seek to support caregivers' learning, caregivers expand their knowledge, experience heightened confidence, and tend to utilize the strategies they encountered.27 notably, storytime providers tend to feel discomfort with providing instructional or developmental information directly to caregivers via "asides"; thus, a more palatable approach for many storytime providers might include using "we" language along the lines of "when we use digital media, we want to be sure that we are developing healthy habits.
some families set a timer to help them monitor the duration of their children's screentime."28 one way that storytime providers might model digital media use is to search for and find information related to the storytime theme or book in one of the library's databases. for example, if a book shared in storytime included a sloth, the storytime provider might demonstrate how to search for a video of a sloth in one of the library's digital encyclopedias (e.g., encyclopedia britannica). storytime providers should also keep in mind that digital play can be incorporated into the informal activities that typically occur before and after storytime programs as a means to support children's social interaction with other children.29 for example, if puzzles are typically included as one of the informal activity options before or after the storytime program, the provider might offer both traditional and digital puzzles (e.g., https://kids.nationalgeographic.com/games/puzzles/) on library-owned tablets and provide a simple how-to if needed.

supports and barriers

through the process of open coding, researchers identified four dimensions that storytime providers' perceived supports and barriers could fall into based on the primary influential factor: provider, library system, participants, or content.

provider

the providers' perceptions about technology and experiences with technology in the library setting serve as facilitators or barriers to integration. if a provider is aware of useful technology, familiar and comfortable with its use, knowledgeable of research supporting technology use, has a professional philosophy that can accommodate technology use, and/or has had a positive experience trying out technology, then these may be factors that lead to the adoption of technology in storytime. on the other hand, if the provider has concerns about the difficulties of technology use or the amount of time children spend on screens, if the provider's professional philosophy views storytime as a deliberate alternative to time with technology, or if the provider has had a negative experience with technology, then these may be factors that prevent the adoption of technology in storytime. these same factors affect early childhood practitioners and influence their decisions to incorporate technology into classroom practices.30

the factors that lead to technology integration could be seen as related to media mentorship. a media mentor has awareness, familiarity, knowledge, and a professional philosophy that supports technology use, all of which were factors identified by interviewees. professional training in mentorship was mentioned by one interviewee (17) who stated, "we did a digital mentoring training about how to appropriately model, like, tech skills and screen time with families." thus, some providers' responses indicate some general awareness of the currently emphasized best practice of media mentorship. however, the ambivalence toward the role of media mentor that goulding and colleagues found amongst librarians is also found here, as interviewees' responses do not give a clear picture of how they model technology use for caregivers during storytimes.31 in addition, responses that highlight barriers to technology integration show ways in which some providers are opposed to employing the role of media mentor specifically during storytime.
as such, our findings align with prior observational studies that noted "few instances of librarians willing to speak directly to parents about how to interact with their children using technology."32

participant

providers consider the perspectives of the adult and child participants in storytimes in relation to integrating technology. providers are more likely to integrate technology if they view it as an aid to facilitating sessions for large groups, they believe caregivers will be open to the technology, and they appreciate that young children show a high interest in devices such as ipads. however, children's high interest in devices was seen by other providers as a negative aspect of technology use and a barrier to integration because they thought children were too focused on the technology itself or would be distracted by the technology. just as early childhood teachers have been encouraged to broaden their perspectives of literacy to encompass digital literacy, so too might storytime providers, as this shift in focus would enable them to view these incidences as engagement rather than distraction.33 also, the same interviewee who thought caregivers might be open to technology in storytime expressed the concern that other caregivers might not like its use. our findings related to caregiver reaction echo similar findings from goulding and colleagues: the reaction that providers anticipate from adult participants might be either a support or a barrier for technology integration.34

library system

two aspects of the library system were present in both factors and barriers: access and training. when the library system in which the provider worked gave them access to technology and training in its use for programs, they were more likely to integrate technology. in contrast, when a provider did not have access to technology, the library building did not support its use, or training was not given, the provider was less likely to integrate technology. libraries pride themselves on providing the highest level of service to members of the community and "removing barriers to access presented by socioeconomic circumstances."35 yet, if libraries are to facilitate the digital learning of young children, it is necessary for them to recognize the digital divide impacting children's access to technology throughout the world, and parents' reluctance to spend money on digital apps.36

content

content was a dimension only found in factors that support technology integration, not in barriers. providers used or wanted to use technology because they could connect the technology to two essential elements in the content of storytime: interactivity and theme. this dimension relates to purposes for technology use in the learning category, as providers want to use the interactivity of technology as well as technology directly related to the session's theme to boost children's learning.
indeed, child learning has long been librarians' goal in providing storytime programs, as has facilitating the development of parent skills.37

conclusion

technology is prevalent in the lives of children, and many begin interacting with digital tools as early as the first year of life; caregivers seek guidance regarding their children's technology use.38 while alsc has championed children's librarians as media mentors, findings from this study, coupled with those from prior research, highlight storytime providers' opposition to the media mentor role and the integration of technology within storytime programs.39 some first steps storytime providers might take are to integrate the digital tools the library is already providing. for example, if the library offers e-books (e.g., via libby), the storytime provider might consider integrating one or more picturebooks from that collection into storytime. alternatively, if the library does not have the tools necessary to share the book electronically during the program (e.g., a screen large enough for the storytime group), the provider might read the print version but then follow that up with a comment along the lines of "grownups, did you know that the library also offers this as an e-book that you could read on a phone, tablet, or other device? i would be happy to show you how to access it and other e-books after the program." providers looking for other ways to incorporate digital tools into library programs might read strategies recommended by librarians in a fully and freely accessible online book.40

as scholars have previously noted, early childhood providers, including those who support young children and families in libraries, need much more professional development.41 specifically, the field needs more opportunities for librarians and other early childhood educators to develop their knowledge and skills within the realm of digital technology for young children, but they also need training that advances the notion of media mentor and boosts their confidence and identities relative to that role.42 the institute of museum and library services recently funded a project designed to support librarians' knowledge and skills within the realm of family media for children ages five to eleven years, and products from that project are certainly a good starting place for storytime providers; however, additional resources and research focused on library programs and services designed for children from birth through five years are needed.43 if librarians are to embrace the media mentor role with confidence and with the knowledge and skills the task requires, much greater attention must be devoted to that responsibility, along with more support in the form of professional development and resources.

acknowledgement

this work was supported by the institute of museum and library services [federal award identification number: lg-96-17-0199-17].
endnotes

1 nalika unantenne, mobile device usage among young kids: a southeast asia study (the asianparent insights, november 2014), https://s3-ap-southeast-1.amazonaws.com/tap-sg-media/theasianparent+insights+device+usage+a+southeast+asia+study+november+2014.pdf; brooke auxier, monica anderson, andrew perrin, and erica turner, parenting children in the age of screens (pew research center, 2020), https://www.pewresearch.org/internet/2020/07/28/parenting-children-in-the-age-of-screens/; stephane chaudron, rosanna di gioia, and monica gemo, young children (0–8) and digital technology: a qualitative study across europe (publications office of the european union, 2018), https://doi.org/10.2760/294383; organization for economic cooperation and development, what do we know about children and technology? (2019), https://www.oecd.org/education/ceri/booklet-21st-century-children.pdf; victoria rideout and michael b. robb, the common sense census: media use by kids age zero to eight, 2020 (common sense media, 2020), https://www.commonsensemedia.org/sites/default/files/uploads/research/2020_zero_to_eight_census_final_web.pdf; jenny s. radesky et al., "young children's use of smartphones and tablets," pediatrics 146, no. 1 (2020).

2 unantenne, mobile device usage; auxier, anderson, perrin, and turner, parenting children; chaudron, di gioia, and gemo, young children (0–8) and digital technology.

3 rideout and robb, the common sense census; sebastian paul suggate and philipp martzog, "preschool screen-media usage predicts mental imagery two years later," early child development and care (2021): 1–14.

4 auxier, anderson, perrin, and turner, parenting children; suggate and martzog, "preschool screen-media usage."

5 marc w. hernandez, carrie e. markovitz, elc estrera, and gayle kelly, the uses of technology to support early childhood practice: instruction and assessment. sample product and program tables (administration for children & families, u.s. department of health & human services, 2020), https://www.acf.hhs.gov/media/7970; lisa b. hurwitz and kelly l. schmitt, "can children benefit from early internet exposure? short- and long-term links between internet use, digital skill, and academic performance," computers & education 146 (2020): 103750; kathy hirsh-pasek et al., "putting education in 'educational' apps: lessons from the science of learning," psychological science in the public interest 16, no. 1 (2015): 3–34.

6 amy koester, ed., young children, new media, and libraries: a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), https://littleelit.files.wordpress.com/2015/06/final-young-children-new-media-and-libraries-full-pdf.pdf.

7 association for library service to children, national research agenda for library service to children (ages 0–14), 2019, https://www.ala.org/alsc/sites/ala.org.alsc/files/content/200327_alsc_research_agenda_print_version.pdf.
8 christner, hicks, and koester, "chapter six: new media in storytimes: strategies for using tablets in a program setting," in a. koester, ed., young children, new media, and libraries: a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), 77–88.

9 kathleen campana, j. elizabeth mills, marianne martens, and claudia haines, "where are we now? the evolving use of new media with young children in libraries," children and libraries 17, no. 4 (2019): 23–32; j. elizabeth mills, emily romeign-stout, cen campbell, and amy koester, "results from the young children, new media, and libraries survey: what did we learn?," children and libraries 13, no. 2 (2015): 26–32.

10 campana, mills, martens, and haines, "where are we now?"

11 campana, mills, martens, and haines, "where are we now?"

12 susan b. neuman, naomi moland, and donna celano, "bringing literacy home: an evaluation of the every child ready to read program" (chicago: association for library service to children and public library association, 2017), 5, http://everychildreadytoread.org/wp-content/uploads/2017/11/2017-ecrr-report-final.

13 campana, mills, martens, and haines, "where are we now?"; mills, romeign-stout, campbell, and koester, "results from the young children, new media, and libraries survey."

14 anne goulding, mary jane shuker, and john dickie, "media mentoring through digital storytimes: the experiences of public libraries in aotearoa new zealand," in proceedings of ifla wlic (2017), https://library.ifla.org/id/eprint/1742/1/138-goulding-en.pdf.

15 goulding, shuker, and dickie, "media mentoring through digital storytimes"; cen campbell and amy koester, "new media in youth librarianship," in a. koester, ed., young children, new media, and libraries: a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), 8–24.

16 jennifer nelson and keith braafladt, technology and literacy: 21st century library programming for children and teens (chicago: american library association, 2012).
17 c. campbell, c. haines, a. koester, and d. stoltz, media mentorship in libraries serving youth (chicago: association for library service to children, 2015), https://www.ala.org/alsc/sites/ala.org.alsc/files/content/media%20mentorship%20in%20libraries%20serving%20youth_final_no%20graphics.pdf.

18 association for library service to children, competencies for librarians serving children in libraries.

19 campana, mills, martens, and haines, "where are we now?"; mills, romeign-stout, campbell, and koester, "results from the young children, new media, and libraries survey"; neuman, moland, and celano, "bringing literacy home"; goulding, shuker, and dickie, "media mentoring through digital storytimes."

20 goulding, shuker, and dickie, "media mentoring through digital storytimes."

21 institute of museum and library services, public libraries survey, 2016, https://www.imls.gov/research-evaluation/data-collection/public-libraries-survey.

22 maria cahill, soohyung joo, mary howard, and suzanne walker, "we've been offering it for years, but why do they come? the reasons why adults bring young children to public library storytimes," libri 70, no. 4 (2020): 335–44; peter andrew de vries, "parental perceptions of music in storytelling sessions in a public library," early childhood education journal 35, no. 5 (2008): 473–78; goulding and crump, "developing inquiring minds."

23 courtney k. blackwell, ellen wartella, alexis r. lauricella, and michael b. robb, technology in the lives of educators and early childhood programs: trends in access, use, and professional development from 2012 to 2014 (chicago: northwestern school of communication, 2015).

24 campbell and koester, "new media in youth librarianship."

25 campbell, haines, koester, and stoltz, media mentorship in libraries serving youth; prachi e. shah et al., "daily television exposure, parent conversation during shared television viewing and socioeconomic status: associations with curiosity at kindergarten," plos one 16, no. 10 (2021): e0258572.

26 nelson and braafladt, technology and literacy.

27 roger a. stewart et al., "enhanced storytimes: effects on parent/caregiver knowledge, motivation, and behaviors," children and libraries 12, no. 2 (2014): 9–14; scott graham and andré gagnon, "a quasi-experimental evaluation of an early literacy program at the regina public library/évaluation quasi-expérimentale d'un programme d'alphabétisation des jeunes enfants à la bibliothèque publique de regina," canadian journal of information and library science 37, no. 2 (2013): 103–21.

28 maria cahill and erin ingram, "instructional asides in public library storytimes: mixed methods analyses with implications for librarian leadership," journal of library administration 61, no. 4 (2021): 421–38.

29 leigh disney and gretchen geng, "investigating young children's social interactions during digital play," early childhood education journal (2021): 1–11.
30 hernandez, markovitz, estrera, and kelly, "the uses of technology"; karen daniels et al., "early years teachers and digital literacies: navigating a kaleidoscope of discourses," education and information technologies 25, no. 4 (2020): 2415–26.

31 goulding, shuker, and dickie, "media mentoring through digital storytimes."

32 neuman, moland, and celano, "bringing literacy home," 58.

33 daniels et al., "early years teachers and digital literacies."

34 goulding, shuker, and dickie, "media mentoring through digital storytimes."

35 association for library service to children, competencies for librarians serving children in libraries (2020), https://www.ala.org/alsc/edcareeers/alsccorecomps; american library association, code of ethics of the american library association (2021), https://www.ala.org/tools/ethics.

36 jenna herdzina and alexis r. lauricella, "media literacy in early childhood report," child development 101 (2020): 10; sara ayllon et al., digital diversity across europe: policy brief september 2021 (digigen project, 2021), https://www.digigen.eu/news/digital-diversity-across-europe-recommendations-to-ensure-children-across-europe-equally-benefit-from-digital-technology/.

37 goulding and crump, "developing inquiring minds"; nancy l. kewish, "south euclid's pilot project for two-year-olds and parents," school library journal 25, no. 7 (1979): 93–97.

38 auxier, anderson, perrin, and turner, parenting children; rideout and robb, the common sense census.

39 neuman, moland, and celano, "bringing literacy home"; goulding, shuker, and dickie, "media mentoring through digital storytimes."

40 koester, ed., young children, new media, and libraries.

41 us department of education, office of educational technology, policy brief on early learning and use of technology, 2016, https://tech.ed.gov/files/2016/10/early-learning-tech-policy-brief.pdf.

42 herdzina and lauricella, "media literacy in early childhood report."

43 rebekah willett, june abbas, and denise e. agosto, navigating screens (blog), https://navigatingscreens.wordpress.com.
redesigning research guides: lessons learned from usability testing at the university of memphis

jessica mcclure, carl hess, and david marsicano
information technology and libraries | september 2023
https://doi.org/10.5860/ital.v42i3.15535

jessica mcclure (jmcclre3@memphis.edu) is the virtual instruction librarian, university of memphis. carl hess (carl.hess@memphis.edu) is the undergraduate success librarian, university of memphis. david marsicano (dmmrscno@memphis.edu) is a library assistant ii, university of memphis. © 2023.

abstract

at the university of memphis, a team of librarians and library staff formed the research guides redesign team (rgrt) to redesign, organize, and evaluate the university libraries' (ul) research guides. the purpose of the project was to ensure that the new design of the research guides homepage was intuitive to use. while it is impossible to ensure absolute usability for every user, this usability study attempts to eliminate the most common interface issues experienced by the university of memphis community. the rgrt conducted usability testing to evaluate the effectiveness of the new standardized format, grouped headings, and the appearance of the interface. the rgrt worked within the limitations of springshare's software to create the design and then chose five users to complete various task scenarios. upon analysis of the users' ability to complete the tasks, the rgrt discovered that, overall, the design was effective, but they did make a few minor changes. this study describes the process and includes the original design, the new design, edits made after usability testing was conducted, and plans for future testing.

introduction

in the spring of 2021, the research guides redesign team (rgrt) assembled to establish a new workflow for maintaining and designing research guides at the university of memphis. previously, all university library (ul) faculty (hereafter "librarians") were tasked with creating and maintaining research guides for academic programs and courses in their liaison areas in addition to their other duties. all librarians took part in the liaison program but did not have extensive training in guide creation. as a result, many guides consisted of lists of resources without instructions on how to use them, did not encourage information-seeking behavior, or did not cover basic information literacy skills.1 while some librarians had administrative privileges over the research guides software (springshare's libguides product), no one was tasked with reviewing or evaluating the guides holistically. consequently, librarians were creating guides for their liaison departments without considering whether the information they included was covered in existing guides, and many were not regularly updated. the rgrt was created to solve these issues. it allowed for a more centralized workflow for creating and maintaining guides. it comprised a smaller group of librarians and library staff volunteering to take the baton from library liaisons in creating, updating, and deleting guides based on the needs of all patrons at the university of memphis. the role of liaison librarians in the process changed to providing content expertise for their assigned departments' guides as needed.
the rgrt selected the redesign of the research guides homepage as the first project. the homepage (illustrated in figs. 1, 2, and 3) had become an unwieldy list, organized by a mix of topic headings and headings reflecting university college and school names. users had to scroll through this list to find guides, which the rgrt decided was clunky and inefficient. there was a search bar for locating guides, but it was in the upper right of the page, away from the guide list (see figs. 1–3). despite this long list of guides, users were often choosing between very similar guides with redundant content and long lists of undifferentiated resources, and if a user found a guide, it was not guaranteed to be up to date and often did not follow a consistent format. improving this experience became the focus of the rgrt.

while the rgrt considered starting with usability testing on the old homepage, it determined this was not necessary. the guides included many of the common usability issues in libguides as described by thorngate and hoden, including "inconsistent design from page to page and from guide to guide," "cluttered pages lacking a focal point," and "too much content, not appropriately scoped to the task at hand"; and anecdotally, librarians and staff in the past had expressed frustrations with trying to find useful guides to share with patrons while navigating through the clutter of the homepage.2 further, the rgrt began the project of homepage and guide redesign early in the covid-19 pandemic. this would have required recruiting and running usability testing remotely, since the campus was primarily engaging in virtual instruction, adding difficulty to the project. the pandemic virtual instruction environment also gave the rgrt an added sense of urgency to quickly design a more usable research guide environment, since the ul's digital learning objects and digital services had become the primary tool for patrons to interact with the ul. performing virtual usability testing was a less efficient use of time when a wide variety of usability issues with the old setup was already known.

the homepage redesign project took place simultaneously with a project to reorganize the subject guides, consolidate related academic programs to limit the amount of redundant content, focus more on teaching users how to use the resources recommended in the guides, and create a more manageable number of guides both for users and for the rgrt to update. all guides now have a consistent format and include instructional materials (videos, tutorials, etc.) to teach users how to use the resources in each guide. this study describes the creation and testing conducted on the new design of the research guides homepage and the redesigned subject guides. to assess the success of the rgrt's work on the new homepage, the researchers' objectives for this study were to:

- identify whether users could effectively navigate the new homepage structure, including categorization and search features.
- determine whether the new structure allowed users to identify useful resources with a minimum of attempts and clicks.
- demonstrate that the updated design facilitated and encouraged users' information-seeking behavior.
figure 1. the research guides homepage before the redesign, pt. 1.
figure 2. the research guides homepage before the redesign, pt. 2.
figure 3. the research guides homepage before the redesign, pt. 3.

literature review

online learning is an established format of higher education, and the number of students attending classes online is increasing.3 this growth has implications for the use of library digital materials. a primary research group study of online library services revealed that 10 out of 37 academic libraries reported that distance students utilized research guides and tutorials more than traditional students.4 another stated the use of research guides increased by 56 percent in 2020.5 colorado christian university also used the covid-19 pandemic as an opportunity to redesign its websites and research guides.6 ghaphery and white's study states that "libguides are as commonplace as books" in academic libraries, reiterating a need for gathering more usage statistics on them.7 based on this data, increasing the effectiveness and usefulness of these library digital materials is imperative to expand the scope of the library's reach to all users, not just those attending on campus. it is clear research guides are one of the primary ways academic libraries share their resources with the community and that ongoing maintenance and attention are essential, particularly for online users. the authors of this study wanted to know how a topic-based navigation design would work for the university of memphis and wanted to conduct usability testing based on this design.

the value of usability testing in libraries is well documented.8 at california state university, vargas discovered that using a topic-based navigation design is user-friendly and helps patrons "get used to" the site layout.9 furthermore, polger observed that the website, not the building, is the user's first interaction with the space and is often deemed the "face" of the library.10 designers of library web pages must organize information in a way that is intuitive for all users. this includes faculty, staff, undergraduate and graduate students, visiting researchers, and those with disabilities. while it is an impossible task to ensure absolute usability for every user, this usability study attempts to eliminate the most common interface issues experienced by the university of memphis community.

redesigning research guides to share resources with faculty and students more holistically is an effective design approach. in the past, many research guides were lists of key reference sources; catalogs for finding books, theses, and dissertations; and periodical databases for finding journal articles and news articles, without much guidance on how to use them.11 similarly, the authors of a five-step usability study with research guides discovered that a significant challenge was that the "purpose and nature of research guides was not readily evident to users," and the library jargon used was confusing to users as well.12 an example of a more holistic design would be one guiding students through the research process.13 replacing library vocabulary with more commonly used language is also recommended.14
stated, "consequently, the questions of what to put on a guide and how best to arrange that material have driven an entire research agenda for the better part of a decade." their study has objectives similar to this one regarding ease of use and navigation menus. they elected to use an a/b method of usability testing, presenting students with two different versions of the same page. their study was an excellent example of providing a format that helps users comprehend the content rather than having to understand the design itself.15

for this study, the rgrt adopted the five-user assumption commonly practiced among user experience professionals. the five-user assumption posits that, as usability testing proceeds, each additional user after the first will find usability problems already discovered by an earlier user in addition to discovering new problems. after the fifth user, the number of repeat problems will so greatly outnumber the newly discovered problems as to render further testing an inefficient use of time and resources.16 borsci et al. addressed and tested the debate concerning the five-user assumption and its established adoption. they observed that, while it is widely accepted, human-computer interaction professionals are split into two camps: those who apply and accept the model, and those who are critical of it but apply it nevertheless. they proposed an alternative model called the "grounded procedure" but still recommended the five-user assumption as a starting point.17

additionally, the rgrt adopted the use of task scenarios for usability testing to gain valuable insights from users. rather than setting goals such as "find a website on citation," it is more helpful to design a task that presents context: a task scenario lets the user know why they need to find something, not simply how. mccloskey recommended designing tasks based on what a user would realistically do when visiting the website, and making each task "actionable" to set up the context of why a user would need to visit the site.18 the success of mccloskey's recommended actionable tasks is observed in many other usability studies.19

redesign process

to create the design, the rgrt began by independently investigating other universities' research guides homepage designs and reporting back to the group on their findings. the rgrt was looking for a model that improved categorization and guides, limited the use of long lists, and foregrounded search. after investigating a few candidates, the group unanimously agreed that the university of arizona's research guides homepage (see fig. 4) should be the model.20 the design was effective because the content was not excessive, scrolling was unnecessary, and headings were descriptive. the visual elements were not purely decorative and helped the patron understand the purpose of the content. the university of arizona gave the rgrt approval to model the new research guides homepage on theirs.21 in this model, a topic-based navigation design would organize sections into research help guides, subject guides, topic guides, and course guides. every guide would be included in only one of these four categories.
research help guides covered information literacy and library use skills, subject guides covered major academic programs, and course guides were for specific classes. topic guides served as a catch-all category for guides that did not fit easily into the other categories. for each category, a short list of the most heavily used guides in that category, based on springshare user statistics and updated regularly, would be displayed on the homepage, followed by a link to a complete list of guides in the category. at the request of librarians at the university of memphis's three branch libraries (health sciences, lambuth, and music), guides connected to these branches were also grouped together. the bottom of the homepage was dedicated to an ask a librarian help box.

figure 4. university of arizona research guides homepage.

to create the redesigned subject guides, the rgrt identified a course guide created by the university of memphis's instruction curriculum coordinator as the design standard. it included tutorials, videos, and other instructional materials guiding students on how to use the linked content.22 each guide would have a standardized set of content pages based on this model, including redirect pages for the introduction to research and writing help guides. the rgrt created a list of subject guides and used it to identify redundant guides that could be consolidated into the new subject guides. each guide would cover a broad subject area (e.g., business & economics, social sciences), and its main page would cover commonly used resources such as the major databases for that area. more specialized resources would be included on topic pages for subareas reflecting specific academic programs (e.g., accountancy, psychology). in the spring of 2021, the rgrt completed the design for the homepage (see fig. 5). it was populated with the new subject guides (see fig. 6 for an example), which followed a standard format (see fig. 7 for the page-level organization).

figure 5. initial redesigned research guides homepage.
figure 6. business & economics subject guide homepage.
figure 7. business & economics subject guide navigation.

usability testing

this study was exempt from irb (institutional review board) approval. qualitative data were gathered using task scenarios to gain insights into the new research guides homepage and its linked pages. task scenarios 1, 2, 3, and 5 were designed to reflect real user needs and required access to a variety of different guides, which would involve either using the guides homepage structure or using the libguides search. task scenario 4 was designed to see whether users identified the ask a librarian section at the bottom. with a laptop and a series of task scenarios (see table 1), a usability testing station was set up at the university of memphis's starbucks. for recruitment, two of the authors asked students in line and socializing at tables to participate; a $5 starbucks gift card was awarded to participants. data from five recruited participants were gathered.
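for context on the five-user decision above, the nielsen article cited in endnote 16 motivates the assumption with a simple problem-discovery model. the following is a sketch of that model rather than anything the rgrt computed: if a single test user surfaces any given usability problem with average probability λ (nielsen reports roughly 0.31 across the projects he studied), the expected proportion of problems found after n users is

    P(n) = 1 - (1 - \lambda)^n, \qquad P(5) = 1 - (1 - 0.31)^5 \approx 0.84

by this estimate, five users surface about 84 percent of the problems present, and each additional user mostly rediscovers known problems, which is the inefficiency described above.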
before beginning each testing session, one of the authors acted as facilitator, explaining that the testing was meant to ensure the website was intuitive and set up in a helpful way. she assured the participants that their knowledge was not being tested; instead, the goal was to make sure that the web page was user-friendly and accessible. the participants were encouraged to use a think-aloud method when completing tasks so that the author acting as note-taker could transcribe the thought process of each participant.23

table 1. task scenarios
task #1: you need to find information on mla citation. how would you use this page to find that information?
task #2: you are in a course titled soci 4420: racial inequality. how would you use this page to find information on that course?
task #3: how would you use this page to find historical newspapers to use in a research project?
task #4: if you were struggling to access a library resource, how would you use this page to solve the problem?
task #5: if you wanted a video explaining how to find articles to use in your research assignment, how would you find it?

the same series of task scenarios was given to each participant. the authors began each testing session on a laptop with the research guides homepage already open. the main goal of each session was to observe whether each participant could successfully complete the task scenarios using the homepage and its links to external and internal web pages. the secondary goal was for participants to complete each task scenario with minimal attempts and limited clicks. when a task was completed, the facilitator recorded how many attempts were made and whether the participant was successful. an attempt was documented as successful if the participant completed the task using the information on the homepage and its links. an attempt was also documented as successful if the user did not complete a task but was able to use the page to demonstrate adequate information-seeking behavior (e.g., using chat or the ask a librarian box to seek answers). if a participant did not complete a task and did not use the homepage to seek answers, the task was marked as unsuccessful.

results

of the 25 total task scenarios conducted during usability testing, 20 were successfully completed and 5 failed (see table 2). a quick summary of each participant's experience with the task scenarios is provided below; more detailed tables and notes for each participant are included in appendix a.

table 2. task scenarios completed or failed by participant (task scenarios 1 through 5, in order)
participant 1: completed, completed, completed, failed, completed
participant 2: completed, completed, completed, completed, completed
participant 3: completed, completed, completed, completed, completed
participant 4: completed, completed, failed, completed, completed
participant 5: failed, failed, completed, failed, completed

participant #1 successfully completed four of the five task scenarios. they used the topic-based navigation to complete tasks 1, 2, and 5, and they initially failed task scenario 3 using the topic-based navigation before completing it with the search bar.
they failed task scenario 4, not understanding the "access a resource" language in the task scenario and stating that they would email someone for help.

participant #2 completed all five task scenarios. they used topic-based navigation to complete tasks 1 and 2. for task scenario 3, the participant used the search bar. though there was again some confusion over the "access a resource" language in task 4, the participant stated they would use chat or the contact information in the ask a librarian box to complete the task. to complete task 5, the participant scanned the homepage and found the library help videos link.

participant #3 successfully completed all five task scenarios but required multiple attempts for tasks 3 and 4. ultimately, tasks 1, 2, 3, and 5 were completed using topic-based navigation. after unsuccessfully clicking various links in an attempt to complete task 4, the participant stated they would use the contact information in the ask a librarian box.

participant #4 successfully completed four of the five task scenarios. tasks 1, 2, and 5 were completed using the topic-based navigation, and task scenario 4 was completed by scrolling to the ask a librarian section of the homepage. the participant tried to use the topic-based navigation for task 3, but they found a guide on contemporary, not historical, newspapers and failed the task.

participant #5 completed two of the five task scenarios. task 3 was completed by using the search bar and finding an faq on historical newspapers, and task 5 was completed using the topic-based navigation. for task scenario 1, the participant left the research guides homepage via the logo in the banner at the top of the page, which took them to the university libraries website; they had to be returned to the guides homepage for the following tasks. the participant tried the topic-based navigation three times for task 2 before giving up, and they likewise tried topic-based navigation for task scenario 4 before giving up.

discussion

of the 20 task scenarios successfully completed during usability testing, 15 were completed by users browsing the research guides homepage and making use of the headings and layout. these results demonstrate that the topic-based navigation design, separating sections into research help, subject guides, topic guides, and course guides, was effective. users were regularly able to intuit the correct area to look under with minimal errors; scrolling was limited and clicks were few.

the search bar was used less frequently, to the surprise of the rgrt. only three task scenarios were successfully completed using the search. participants often clicked through pages instead of using the search bar, and even when the search bar was used, it was not always immediately identified by the users. the authors noted that users would skip directly to the four category headings.

the success of the ask a librarian section was mixed. three of the five users successfully completed task scenario 4. part of the difficulty was that some users were not familiar with the phrase "report access issues," and even after one of the authors explained further, many of them were still unsure of whom to ask or where to report issues. additionally, only two of the five participants asked for help when they could not solve other task scenarios.
the remaining three gave up or moved on to the next question regardless of the availability of chat or the ask a librarian box.

limitations

four of the five participants were undergraduate students. an ideal study would have included an additional graduate student and perhaps a faculty or staff member from the university of memphis. an additional limitation was that the disability status of the participants was unknown, so the accessibility of the design was not formally tested beyond running the design through a web accessibility checker. lastly, the original design was not tested for user errors; assumptions were made based on the rgrt's observations and springshare usage statistics. edits were made after usability testing occurred; therefore, improvements in usability between the original homepage design and the revision cannot be shown directly.

post-test revisions

since most of the task scenarios were completed successfully, the new design for the research guides homepage proved usable overall. still, when the authors presented the findings to the rest of the rgrt at one of its regular meetings, the group discussed ideas for addressing the issues identified in testing. as a result, some minor changes were made:

• the four categories (research help, topic guides, course guides, and subject guides) were moved to the top of the page, with the search bar below them.
• a graphic was created to draw attention to the search bar.
• the music, health sciences, and lambuth campus guides were made more visible by giving each link a university of memphis brand-approved blue background with white lettering.

the final product after usability testing is shown in figure 8. these minor changes were not themselves subjected to usability testing at the time to determine whether they improved the usability of the homepage; further usability testing on the homepage is in the planning stages as the authors write this article.

figure 8. post-testing redesigned research guides homepage.

conclusion

this article reports the first of three rounds of usability testing conducted on the research guides homepage. the rgrt elected to do three small tests with five users each. the later tests were conducted in the same manner as the first, with the authors asking users to complete tasks in a starbucks on a laptop; the only difference was the task scenarios. the rgrt designed new task scenarios for each test to reflect changes made to the homepage based on previous user experiences. for example, some questions were designed to test the relocation of the search bar and the branch library (music, lambuth, and health sciences) headings. the task scenarios from the other two tests are included in appendix b and appendix c.

this study demonstrates that it is imperative for library web page designers to familiarize themselves with their community's information-seeking behavior in order to design pages that are intuitive and easy to use. while perhaps the most valuable outcome of this study was an effective library web page design, the rgrt also learned that streamlining workflows and making maintenance of research guides a priority are essential to enhancing an academic library's online presence.
furthermore, the rgrt's goal is to conduct usability testing of its websites on a quarterly basis, covering not just the homepage but many of its subpages and the design of the guides themselves. this study exemplifies the value of usability testing to the ul: now, any time a crucial change is made to the research guides, the rgrt is equipped to conduct usability testing to verify modifications and introduce new designs.

appendix a. summary of participant tests

participant #1 (undergraduate student)

task scenario 1: you need to find information on mla citation. where would you go?
attempt 1: citation resources guide → mla subsection. completed.
the participant quickly found it without hesitation. while thinking aloud, the participant stated that they did not understand what "mla" or "citation" meant, but they were able to find the information by scanning for the words provided.

task scenario 2: if you were in the class soci 4420: racial inequality, where would you find information on that course?
attempt 1: all course guides → soci 4420: racial inequality guide. completed.
the participant quickly found it without hesitation.

task scenario 3: where would you go to find historical newspaper articles to use in a research paper?
attempt 1: open educational resources guide. failed.
attempt 2: used the search bar → historical newspapers guide. completed.
the participant paused to work through this task and did not notice the search bar during the first attempt.

task scenario 4: if you were struggling to access a resource for the library, where would you go?
attempt 1: participant did not know where to click. failed.
the participant did not appear to understand the phrase "access a resource" and stated, "i don't know … i would just email someone."

task scenario 5: if you needed to find articles to use in your research paper, where would you go to find a video explaining how to do that?
attempt 1: library help videos → articles subsection. completed.
the participant found the link to report an access issue while performing this task.

participant #2 (undergraduate student)

task scenario 1: you need to find information on mla citation. where would you go?
attempt 1: citation resources → mla subsection. completed.
the participant quickly found it without hesitation.

task scenario 2: if you were in the class soci 4420: racial inequality, where would you find information on that course?
attempt 1: all course guides → soci 4420: racial inequality guide. completed.
the participant quickly found it without hesitation.

task scenario 3: where would you go to find historical newspaper articles to use in a research paper?
attempt 1: used the search bar → historical newspapers guide. completed.
the participant took more time with this task and ultimately relied on the search bar to complete it.

task scenario 4: if you were struggling to access a resource for the library, where would you go?
attempt 1: chat and ask a librarian. completed.
the participant gave up on browsing and said they would rely on chat and the ask a librarian function on the web page to report an access issue. this attempt was documented as successful since they used the page to exhibit sufficient information-seeking behavior.

task scenario 5: if you needed to find articles to use in your research paper, where would you go to find a video explaining how to do that?
attempt 1: library help videos → articles subsection. completed.
the participant quickly found it without hesitation.

participant #3 (graduate student and staff member)

task scenario 1: you need to find information on mla citation. where would you go?
attempt 1: citation resources → mla subsection. completed.
the participant quickly found it without hesitation.

task scenario 2: if you were in the class soci 4420: racial inequality, where would you find information on that course?
attempt 1: all course guides → soci 4420: racial inequality guide. completed.
the participant quickly found it without hesitation.

task scenario 3: where would you go to find historical newspaper articles to use in a research paper?
attempt 1: library help videos. failed.
attempt 2: all research help guides → historical newspapers guide. completed.
when they did not find what they were looking for, they switched tactics and selected the all research help guides link, where they found the historical newspapers guide in an alphabetical list. this was the only user who completed this task using the navigation menu headings on the homepage.

task scenario 4: if you were struggling to access a resource for the library, where would you go?
attempt 1: could not find it by clicking various links. failed.
attempt 2: ask a librarian. completed.
the student clicked on various links on the homepage looking for the phrase "access a resource" (the note-taker could not keep up with the various links the participant tried). eventually the student stated they would rely on the ask a librarian box.

task scenario 5: if you needed to find articles to use in your research paper, where would you go to find a video explaining how to do that?
attempt 1: online tutorials and help guides → library help videos → articles subsection. completed.
the participant commented, "i would use the libraries search bar to look for articles. i wouldn't need a video."

participant #4 (undergraduate student)

task scenario 1: you need to find information on mla citation. where would you go?
attempt 1: citation resources → mla subsection. completed.
the participant quickly found it without hesitation.

task scenario 2: if you were in the class soci 4420: racial inequality, where would you find information on that course?
attempt 1: all course guides → soci 4420: racial inequality guide. completed.
the participant quickly found it without hesitation.

task scenario 3: where would you go to find historical newspaper articles to use in a research paper?
attempt 1: all topic guides → news literacy & news sources guide. failed.

task scenario 4: if you were struggling to access a resource for the library, where would you go?
attempt 1: scrolled to the ask a librarian box → report access issue. completed.

task scenario 5: if you needed to find articles to use in your research paper, where would you go to find a video explaining how to do that?
attempt 1: library help videos → journals subsection. completed.

participant #5 (undergraduate student)

task scenario 1: you need to find information on mla citation. where would you go?
attempt 1: university libraries homepage → searched kinesiology → selected an ebook on kinesiology. failed.
the participant immediately left the research guides homepage by selecting the banner at the top and did not understand the question.
task scenario 2: if you were in the class soci 4420: racial inequality, where would you find information on that course?
attempt 1: library help videos. failed.
attempt 2: writing help guide. failed.
attempt 3: humanities guide → courses and topics subsection → books and ebooks. failed.
the participant gave up and wanted to move to the next question.

task scenario 3: where would you go to find historical newspaper articles to use in a research paper?
attempt 1: used the search bar → faq: can i access historical issues of the new york times? → proquest historical newspapers: the new york times with index. completed.
the participant did not notice the links to the historical newspapers guide after using the search bar.

task scenario 4: if you were struggling to access a resource for the library, where would you go?
attempt 1: open educational resources guide. failed.
the participant gave up quickly and appeared exasperated.

task scenario 5: if you needed to find articles to use in your research paper, where would you go to find a video explaining how to do that?
attempt 1: library help videos → articles subsection. completed.

appendix b. task scenarios for second usability test

task #1: you are enrolled in a course where you are tasked with writing an informative paper on the musical instrument of your choice; how would you use this page to locate that information?
task #2: you have heard about a service where you can borrow books and articles from other libraries. you remember it's called interlibrary loan. how would you use this page to use that service?
task #3: where would you go to find historical newspapers?
task #4: where would you go to find support for lgbtqia+ students?
task #5: in which category (research help, subject guides, topic guides, course guides) would you look for: a) a guide to the natural and physical sciences? (subject) b) a guide on news literacy and news sources? (topic)

appendix c. task scenarios for third usability test

task #1: where would you look to find information on the history of the library's lambuth branch?
task #2: how would you search through information about the health sciences library?
task #3: how would you use this page to find primary sources?
task #4: you are in a class where the instructor asks you to cite in apa format. you are unfamiliar with this citation style and seek assistance from the library. how would you use this page to find information on apa citation?
task #5: you are in a criminology course and need to find crime data to include in a project. how would you use this page to find that information?

endnotes

1 nigel ford, "what is information behaviour and why do we need to know about it?," in introduction to information behaviour (london: facet publishing, 2015), 14.

2 sarah thorngate and allison hoden, "exploratory usability testing of user interface options in libguides 2," college & research libraries 78, no. 6 (2017): 846, https://doi.org/10.5860/crl.78.6.844.

3 peggy c. holzweiss, barbara polnick, and fred c. lunenberg, "online in half the time: a case study with online compressed courses," innovative higher education 44, no. 4 (2019): 299–315, https://doi.org/10.1007/s10755-019-09476-8; andrew j. magda, david capranos, and carol b.
aslanian, online college students 2020: comprehensive data on demands and preferences (louisville, ky: wiley education), https://universityservices.wiley.com/wp-content/uploads/2020/06/ocs2020report-online-final.pdf; julia e. seaman, i. elaine allen, and jeff seaman, grade increase: tracking distance education in the united states (babson park, ma: babson survey research group, 2018), https://eric.ed.gov/?id=ed580852.

4 primary research group, the survey of library services for moocs: blended and distance learning programs, 2016 ed. (new york: primary research group inc., 2015), 24, https://digital.auraria.edu/files/pdf?fileid=149c1b4a-a35d-48f1-81d8-5f47bb757ecf.

5 ruth sara connell, lisa c. wallis, and david comeaux, "the impact of covid-19 on the use of academic library resources," information technology & libraries 40, no. 2 (2021): 8, https://doi.org/10.6017/ital.v40i2.12629.

6 oliver schulz, "library support for distance learning at colorado christian university," theological librarianship 13, no. 2 (2020): 20–22, https://doi.org/10.31046/tl.v13i2.1938.

7 jimmy ghaphery and erin white, "library use of web-based research guides," information technology & libraries 31, no. 1 (2012): 23, https://doi.org/10.6017/ital.v31i1.1830.

8 amy e. g. barker and ashley t. hoffman, "student-centered design: creating libguides students can actually use," college & research libraries 82, no. 4 (2021): 75–91, https://doi.org/10.5860/crl.82.1.75; danielle a. becker and lauren yannotta, "modeling a library website redesign process: developing a user-centered website through usability testing," information technology & libraries 32, no. 1 (2013): 6–22, https://doi.org/10.6017/ital.v32i1.2311; suzanna conrad and christy stevens, "am i on the library website?: a libguides usability study," information technology & libraries 38, no. 3 (2019): 49–81, https://doi.org/10.6017/ital.v38i3.10977; amanda donovan, "building the website of the future in-house," computers in libraries 42, no. 2 (2022): 4–8, https://www.infotoday.com/cilmag/mar22/index.shtml.

9 isabel vargas ochoa, "navigation design and library terminology: findings from a user-centered usability study on a library website," information technology & libraries 39, no. 4 (2020): 12, https://doi.org/10.6017/ital.v39i4.12123.

10 mark aaron polger, "student preferences in library website vocabulary," library philosophy & practice (2011): 1, https://digitalcommons.unl.edu/libphilprac/618/.

11 jon c. giullian and ernest a. zitser, "beyond libguides: the past, present, and future of online research guides," slavic and east european information resources 16, no. 4 (2015): 170–80, https://doi.org/10.1080/15228886.2015.1094718.
12 ashley lierman, bethany scott, mea warren, and cherie turner, "testing for transition: evaluating the usability of research guides around a platform migration," information technology and libraries 38, no. 4 (2019): 83, https://doi.org/10.6017/ital.v38i4.11169.

13 yoo young lee and m. sara lowe, "building positive learning experiences through pedagogical research guide design," journal of web librarianship 12, no. 4 (2018): 205–31, https://doi.org/10.1080/19322909.2018.1499453.

14 kimberly l. o'neill and brooke a. guilfoyle, "sign, sign, everywhere a sign: what does 'reference' mean to academic library users?," the journal of academic librarianship 41, no. 4 (2015): 386–93, https://doi.org/10.1016/j.acalib.2015.05.007.

15 aaron bowen, jake ellis, and barbara chaparro, "long nav or short nav?: student responses to two different navigational interface designs in libguides version 2," the journal of academic librarianship 44, no. 3 (2018): 391–403, https://doi.org/10.1016/j.acalib.2018.03.002.

16 jakob nielsen, "why you only need to test with 5 users," accessed may 22, 2023, https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/.

17 simone borsci et al., "reviewing and extending the five-user assumption: a grounded procedure for interaction evaluation," acm transactions on computer-human interaction (tochi) 20, no. 5 (2013): 11–18, https://doi.org/10.1145/2506210.

18 marieke mccloskey, "turn user goals into task scenarios for usability testing," accessed may 22, 2023, https://www.nngroup.com/articles/task-scenarios-usability-testing/.

19 olga torres-hostench, joss moorkens, sharon o'brien, and joris vreeke, "testing interaction with a mobile mt post-editing app," translation & interpreting 9, no. 2 (2017): 138–50, https://doi.org/10.12807/ti.109202.2017.a09; jure trilar, tjaša sobočan, and emilija stojmenova duh, "family-centered design: interactive performance testing and user interface evaluation of the slovenian edavki public tax portal," sensors 21, no. 15 (2021): 5161, https://doi.org/10.3390/s21155161; scott uhl, "applying user-centered design to discovery layer evaluation in the law library," legal reference services quarterly 38, no. 1/2 (2019): 30–63, https://doi.org/10.1080/0270319x.2019.1614373.

20 jeremiah paschke-wood, ellen dubinsky, and leslie sult, "creating a student-centered alternative to research guides: developing the infrastructure to support novice learners," in the library with the lead pipe (october 2020), https://www.inthelibrarywiththeleadpipe.org/2020/student-centered-alternative-research-guides/.

21 university libraries, university of arizona, "research: by course, subject, or topic," accessed may 22, 2023, https://libguides.library.arizona.edu/library-guides.
22 university libraries, university of memphis, "engl 1020: english composition 1020," accessed may 22, 2023, https://libguides.memphis.edu/engl1020.

23 jakob nielsen, "thinking aloud: the #1 usability tool," accessed may 22, 2023, https://www.nngroup.com/articles/thinking-aloud-the-1-usability-tool/.

monitoring network and service availability with open-source software
t. michael silver
information technology and libraries | march 2010
t. michael silver (michael.silver@ualberta.ca) is an mlis student, school of library and information studies, university of alberta, edmonton, alberta, canada.

silver describes the implementation of a monitoring system using an open-source software package to improve the availability of services and reduce the response time when troubles occur. he provides a brief overview of the literature available on monitoring library systems, and then describes the implementation of nagios, an open-source network monitoring system, to monitor a regional library system's servers and wide area network. particular attention is paid to using the plug-in architecture to monitor library services effectively. the author includes example displays and configuration files.

editor's note: this article is the winner of the lita/ex libris writing award, 2009.

library it departments have an obligation to provide reliable services both during and after normal business hours. the it industry has developed guidelines for the management of it services, but the library community has been slow to adopt these practices. the delay may be attributed to a number of factors, including a dependence on vendors and consultants for technical expertise, a reliance on librarians who have little formal training in it best practices, and a focus on automation systems instead of infrastructure. larger systems that employ dedicated it professionals to manage the organization's technology resources likely implement best practices as a matter of course and see no need to discuss them within the library community.

in the practice of system and network administration, thomas a. limoncelli, christina j. hogan, and strata r. chalup present a comprehensive look at best practices in managing systems and networks. early in the book they provide a short list of first steps toward improving it services, one of which is the implementation of some form of monitoring. they point out that without monitoring, systems can be down for extended periods before administrators notice or users report the problem.1 they dedicate an entire chapter to monitoring services.
in it, they discuss the two primary types of monitoring: real-time monitoring, which provides information on the current state of services, and historical monitoring, which provides long-term data on uptime, use, and performance.2 while the software discussed in this article provides both types of monitoring, i focus on real-time monitoring and the value of problem identification and notification.

service monitoring does not appear frequently in library literature, and what is written often relates to single-purpose custom monitoring. an article in the september 2008 issue of ital describes the development and deployment of a wireless network, including a perl script written to monitor the wireless network and associated services.3 the script updates a webpage to display the results and sends an e-mail notifying staff of problems. an enterprise monitoring system could perform these tasks and present the results within the context of the complete infrastructure. it would require using advanced features because of the segregation of networks discussed in their article, but it would require little or no more effort than it took to write the single-purpose script. dave pattern at the university of huddersfield shared another perl script that monitors opac functionality.4 again, the script provided a single-purpose monitoring solution that could be integrated within a larger model. below, i discuss how i modified his script to provide more meaningful monitoring of our opac than the stock webpage monitoring plug-in included with our open-source network monitoring system, nagios.

service monitoring can consist of a variety of tests. in its simplest form, a ping test will verify that a host (server or device) is powered on and successfully connected to the network. feher and sondag used ping tests to monitor the availability of the routers and access points on their network, as do i for monitoring connectivity to remote locations.5 a slightly more meaningful check tests for the establishment of a connection on a port; feher and sondag used this method to check the daemons on their network.6 a step further is to evaluate a service response, for example by checking the status code returned by a web server. evaluating content forms the next level of meaning. limoncelli, hogan, and chalup discuss end-to-end monitoring, where the monitoring system actually performs meaningful transactions and evaluates the results.7 pattern's script, mentioned above, tests opac functionality by submitting a known keyword search and evaluating the response.8 i implemented this after an incident where nagios failed to alert me to a problem with the opac: the web server returned a status code of 200 for the request for the search page, but users want more from an opac than a search page, and attempts to search were unsuccessful because of problems with the index server. modifying pattern's original script, i was able to put together a custom check command that verifies a greater level of functionality by evaluating the number of results returned for a known search.

software selection

limoncelli, hogan, and chalup do not address specific how-to issues and rarely mention specific products. their book provides the foundational knowledge necessary to identify what must be done.
in terms of monitoring, they leave the selection of an appropriate tool to the reader.9 myriad monitoring tools exist, both commercial and open-source. some focus on network analysis, and some even target specific brands or model lines. the selection of a specific software package should depend on the services being monitored and the goals for the monitoring. wikipedia lists thirty-five different products, of which eighteen are commercial (some with free versions offering reduced functionality or features); fourteen are open-source projects under a general public license or similar license (some with commercial support available but without different feature sets or licenses); and three offer different versions under different licenses.10 von hagen and jones suggest two of them: nagios and zabbix.11

i selected the nagios open-source product (http://www.nagios.org). the software has an established history of active development, a large and active user community, a significant number of included and user-contributed extensions, and multiple books published on its use. commercial support is available from a company founded by the creator and lead developer, as well as from other authorized solution providers. monitoring appliances based on nagios are available, as are sensors designed to interoperate with nagios. because of the flexibility of a software design that uses a plug-in architecture, service checks for library-specific applications can be implemented: if a check or action can be scripted using practically any protocol or programming language, nagios can monitor it. nagios also provides a variety of information displays, as shown in appendixes a–e.

installation

the nagios system provides an extremely flexible solution for monitoring hosts and services. its object orientation and use of plug-ins allow administrators to monitor any aspect of their infrastructure or services using standard plug-ins, user-contributed plug-ins, or custom scripts. additionally, the open-source nature of the package allows independent development of extensions to add features or integrate the software with other tools. community sites such as monitoringexchange (formerly nagios exchange), nagios community, and nagios wiki provide repositories of documentation, plug-ins, extensions, and other tools designed to work with nagios.12 but that flexibility comes at a cost: nagios has a steep learning curve, and user-contributed plug-ins often require the installation of other software, most notably perl modules.

nagios runs on a variety of linux, unix, and berkeley software distribution (bsd) operating systems. for testing, i used a standard linux server distribution installed on a virtual machine. virtualization provides an easy way to test software, especially if an alternate operating system is needed. if given sufficient resources, a virtual machine is capable of running the production instance of nagios. after installing and updating the operating system, i installed the following packages:

• apache web server
• perl
• gd development library, needed to produce graphs and status maps
• libpng-devel and libjpeg-devel, both needed by the gd library
• gcc and gnu make, which are needed to compile some plug-ins and perl modules

most major linux and bsd distributions include nagios in their software repositories for easy installation using the native package management system. although the software in the repositories is often not the most recent version, using these repositories simplifies the installation process.
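as an illustration only (the article gives no exact commands, and package names differ across distributions), the prerequisites listed above might be installed on a red hat-style system with something like the following; the package names here are assumptions, not the author's:

    # prerequisites for nagios and its plug-ins (red hat-style names, assumed)
    yum install httpd perl gd gd-devel libpng-devel libjpeg-devel gcc make

    # nagios itself, if the distribution's repository carries a recent version
    yum install nagios nagios-plugins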
if a reasonably recent version of the software is available from a repository, i will install from there; packages that are outdated or unavailable i install manually. detailed installation instructions are available on the nagios website, in several books, and on the previously mentioned websites.13 the documentation for version 3 includes a number of quick-start guides.14 most package managers will take care of some of the setup, including modifying the apache configuration file to create an alias available at http://server.name/nagios. i prepared the remainder of this article using the latest stable versions of nagios (3.0.6) and the plug-ins (1.4.13) at the time of writing.

configuration

nagios configuration relies on an object model, which allows a great deal of flexibility but can be complex; planning your configuration beforehand is highly recommended. nagios has two main configuration files, cgi.cfg and nagios.cfg. the former is used primarily by the web interface to authenticate users and control access, defining whether authentication is used and which users can access which functions. the latter is the main configuration file and controls all other program operations. the cfg_file and cfg_dir directives allow the configuration to be split into manageable groups using additional resource files and the object definition files (see figure 1). the flexibility offered allows a variety of different structures; i group network devices into groups but create individual files for each server.

figure 1. nagios configuration relationships. copyright © 2009 ethan galstead, nagios enterprises. used with permission.

nagios uses an object-oriented design. the objects in nagios are displayed in table 1.

table 1. nagios objects
hosts: servers or devices being monitored
hostgroups: groups of hosts
services: services being monitored
servicegroups: groups of services
timeperiods: scheduling of checks and notifications
commands: checking hosts and services; notifying contacts; processing performance data; event handling
contacts: individuals to alert
contactgroups: groups of contacts

a complete review of nagios configuration is beyond the scope of this article; the documentation installed with nagios covers it in great detail. special attention should be paid to the concepts of templates and object inheritance, as they are vital to creating a manageable configuration. the discussion below provides a brief introduction, while appendixes f–j provide concrete examples of working configuration files.

cgi.cfg

the cgi.cfg file controls the web interface and its associated cgi (common gateway interface) programs. during testing, i often turn off authentication by setting use_authentication to 0 if the web interface is not accessible from the internet. various other configuration directives provide greater control over which users can access which features. the users themselves are defined in the /etc/nagios/htpasswd.users file; a summary of commands for managing entries is presented in table 2.

table 2. sample commands for managing the htpasswd.users file
create or modify an entry, with the password entered at a prompt: htpasswd /etc/nagios/htpasswd.users username
create or modify an entry, supplying the password on the command line: htpasswd -b /etc/nagios/htpasswd.users username password
delete an entry from the file: htpasswd -D /etc/nagios/htpasswd.users username
(in these commands, username and password stand for the values you supply.)

the web interface includes other features, such as sounds, status map displays, and integration with other products. discussion of these directives is beyond the scope of this article; the cgi.cfg file provided with the software is well commented, and the nagios documentation provides additional information. a number of screenshots from the web interface are provided in the appendixes, including status displays and reporting.

nagios.cfg

the nagios.cfg file controls the operation of everything except the web interface. although it is possible to have a single monolithic configuration file, organizing the configuration into manageable files works better. the two main directives of note are cfg_file, which includes a single named file, and cfg_dir, which includes all files in the specified directory with a .cfg extension.
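to illustrate, here is a minimal sketch of how these two directives can look in nagios.cfg; the paths are illustrative assumptions, though the file and directory names match the layout described in the next section:

    # nagios.cfg excerpt: include individual object files...
    cfg_file=/etc/nagios/commands.cfg
    cfg_file=/etc/nagios/contacts.cfg

    # ...and include every *.cfg file found in a directory
    cfg_dir=/etc/nagios/servers
    cfg_dir=/etc/nagios/devices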
a third type of file that gets included is resource.cfg, which defines various macros for use in commands.

organizing the object files takes some thought. i monitor more than one hundred services on roughly seventy hosts, so the method of organizing the files was of more than academic interest. i use the following configuration files:

• commands.cfg, containing command definitions
• contacts.cfg, containing the list of contacts and associated information, such as e-mail addresses (see appendix h)
• groups.cfg, containing all groups—hostgroups, servicegroups, and contactgroups (see appendix g)
• templates.cfg, containing all object templates (see appendix f)
• timeperiods.cfg, containing the time ranges for checks and notifications

all devices and servers that i monitor are placed in directories using the cfg_dir directive:

• servers—contains server configurations. each file includes the host and service configurations for a physical or virtual server.
• devices—contains device information. i create individual files for devices with service monitoring that goes beyond simple ping tests for connectivity; devices monitored solely for connectivity are grouped logically into a single file. for example, we monitor connectivity with fifty remote locations, and all fifty of them are placed in a single file.

the resource.cfg file uses two macros to define the paths to plug-ins and event handlers; thirty other macros are available. because the cgi programs do not read the resource file, restrictive permissions can be applied to it, enabling some of the macros to be used for the usernames and passwords needed in check commands. placing sensitive information in service configurations instead would expose it to the web server, creating a security issue.

configuration example

the appendixes include the object configuration files for a simple monitoring situation. a switch is monitored using a simple ping test (see appendix j), while an opac server on the other side of the switch is monitored for both web and z39.50 operations (see appendix i). note that the opac configuration includes a parents directive that tells nagios that a problem with the gateway-switch will affect connectivity with the opac server. i monitor fifty remote sites; if my router is down, a single notification regarding my router provides more information than the same alert buried in a storm of notifications about the remote sites.

the web port, web service, and opac search services demonstrate different levels of monitoring (a sketch of definitions along these lines follows this section). the web port check simply attempts to establish a connection to port 80 without evaluating anything beyond a successful connection. the web service check requests a specific page from the web server and evaluates only the status code returned by the server. it displays a warning because i configured the check to download a file that does not exist: the web server is clearly running because it returns an error code, hence the warning status.
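as promised above, here is a minimal sketch of host and service definitions at these different levels. the host address and service names are illustrative assumptions, and generic-service is the sample service template shipped with nagios rather than one named in the article; the author's working files appear in appendixes f–j:

    # host behind the monitored switch; 'parents' lets nagios blame the
    # switch rather than flooding alerts for everything behind it
    define host {
        use        linux-server        ; chained host templates, described below
        host_name  opac
        address    192.0.2.15          ; illustrative address
        parents    gateway-switch
    }

    # level 1: can a tcp connection be established on port 80 at all?
    define service {
        use                 generic-service
        host_name           opac
        service_description web port
        check_command       check_tcp!80
    }

    # level 2: does the web server return an acceptable http status code?
    define service {
        use                 generic-service
        host_name           opac
        service_description web service
        check_command       check_http
    }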
the opac search check uses a known search to evaluate the result content, specifically whether the correct number of results is returned for the known search.

i used a number of templates in creating this configuration. templates reduce the amount of repetitive typing by allowing the reuse of directives, and they can be chained, as seen in the host templates: the opac definition uses the linux-server template, which in turn uses the generic-host template. a host definition inherits the directives of the template it uses, overriding any conflicting elements and adding new ones. in practical terms, generic-host directives are read first, and linux-server directives are applied next; if there is a conflict, the linux-server directive takes precedence. finally, opac is read, and again any conflicts are resolved in favor of the last configuration read, in this case opac.

plug-ins and service checks

the nagios plugins package provides numerous plug-ins, including the check-host-alive, check_ping, check_tcp, and check_http commands. using the plug-ins is straightforward, as demonstrated in the appendixes, and most plug-ins will provide information on their use if executed with --help supplied as an argument. by default, the plug-ins are installed in /usr/lib/nagios/plugins, though some distributions may install them in a different directory.

the plugins folder contains a subfolder with user-contributed scripts that have proven useful. most of these plug-ins are perl scripts, many of which require additional perl modules available from the comprehensive perl archive network (cpan). the check_hip_search plug-in (appendix k) used in the examples requires additional modules. installing perl modules is best accomplished using the cpan perl module. detailed instructions on module installation are available online.15 some general tips:

• gcc and make should be installed before trying to install perl modules, regardless of whether you are installing manually or using cpan. most modules are provided as source code, which may require compiling before use; cpan automates this process but requires the presence of these packages.
• alternately, many linux distributions provide perl module packages. using repositories to install usually works well, assuming the repository has all the needed modules. in my experience, that is rarely the case.
• many modules depend on other modules, sometimes requiring multiple install steps. both cpan and distribution package managers usually satisfy dependencies automatically; manual installation requires the installer to satisfy the dependencies one by one.
• most plug-ins provide information on required software, including modules, in a readme file or in the source code of the script. in the absence of such documentation, running the script on the command line usually produces an error containing the name of the missing module.
• testing should be done as the nagios user. using another user account, especially the root user, to create directories, copy files, and run programs creates folders and files that are not accessible to the nagios user.
the best practice is to use the nagios user for as much of the configuration and testing as possible. the lists and forums frequently include questions from new users who have successfully installed, configured, and tested nagios as the root user and are confused when nagios fails to start or function properly.

advanced topics

once the system is running, more advanced features can be explored. the documentation describes many such enhancements, but the following may be particularly useful depending on the situation.

• nagios provides access control through the combination of settings in the cgi.cfg and htpasswd.users files. library administration and staff, as well as patrons, may appreciate the ability to see the status of the various systems. however, care should be taken to avoid disclosing sensitive information regarding the network or passwords, or allowing access to cgi programs that perform actions.
• nagios permits the establishment of dependency relationships. host dependencies may be useful in some rare circumstances not covered by the parent–child relationships mentioned above, but service dependencies provide a method of connecting services in a meaningful manner. for example, certain opac functions depend on ils services. defining these relationships takes both time and thought, which may be worthwhile depending on the situation.
• event handlers allow nagios to initiate certain actions after a state change. if nagios notices that a particular service is down, it can run a script or program to attempt to correct the problem. care should be taken when creating these scripts, as service restarts may delete or overwrite information critical to solving a problem, or worsen the situation if an attempt to restart a service or reboot a server fails.
• nagios provides notification escalations, permitting the automatic escalation of notifications for problems that last longer than a certain time. for example, a service escalation could send the first three alerts to the admin group; if properly configured, the fourth alert would be sent to the managers group as well as the admin group. in addition to escalating issues to management, this feature can be used to establish a series of responders for multiple on-call personnel.
• nagios can work in tandem with remote machines. in addition to custom scripts using secure shell (ssh), the nagios remote plug-in executor (nrpe) add-on allows the execution of plug-ins on remote machines, while the nagios service check acceptor (nsca) add-on allows a remote host to submit check results to the nagios server for processing. implementing nagios on the feher and sondag wireless network mentioned earlier would require one of these options because the wireless network is not accessible from the external network. these add-ons also allow for distributed monitoring, sharing the load among a number of servers while still providing administrators with a single interface to the entire monitored network. the nagios exchange (http://exchange.nagios.org/) contains similar user-contributed programs for windows.
• nagios can be configured to provide redundant or failover monitoring.
appropriate use

monitoring uses bandwidth and adds to the load of the machines being monitored. accordingly, an it department should only monitor its own servers and devices, or those for which it has permission to do so. imagine what would happen if all the users of a service such as worldcat started monitoring it! the additional load would be noticeable and could conceivably disrupt service. aside from reasons connected with being a good "netizen," monitoring appears similar to port-scanning, a technique used to discover network vulnerabilities. an organization that blithely monitors devices without the owner's permission may find its traffic throttled back or blocked entirely. if a library has a definite need to monitor another service, obtaining permission to do so is a vital first step. if permission is withheld, the service level agreement between the library and its service provider or vendor should be reevaluated to ensure that the provider has an appropriate system in place to respond to problems.

benefits

the system-administration books provide an accurate overview of the benefits of monitoring, but personally reaping those benefits provides a qualitative background to the experience. i was able to justify the time spent on setting up monitoring the first day of production. one of the available plug-ins monitors sybase database servers. it was one of the first contributed plug-ins i implemented because of past experiences with our production database running out of free space, causing the system to become nonfunctional. this happened twice, approximately a year apart. each time, the integrated library system was down while the vendor addressed the issue. when i enabled the sybase service checks, nagios immediately returned a warning for the free space. the advance warning allowed me to work with the vendor to extend the database volume with no downtime for our users. that single event convinced the library director of the value of the system. since that time, nagios has proven its worth in alerting it staff to problem situations and providing information on outage patterns, both for in-house troubleshooting and for discussions with service providers.

conclusion

monitoring systems and services provides it staff with a vital tool for providing quality customer service and managing systems. installing and configuring such a system involves a learning curve and takes both time and computing resources. my experiences with nagios have convinced me that the return on investment more than justifies the costs.
references

1. thomas a. limoncelli, christina j. hogan, and strata r. chalup, the practice of system and network administration, 2nd ed. (upper saddle river, n.j.: addison-wesley, 2007): 36.
2. ibid., 523–42.
3. james feher and tyler sondag, "administering an open-source wireless network," information technology & libraries 27, no. 3 (sept. 2008): 44–54.
4. dave pattern, "keeping an eye on your hip," online posting, jan. 23, 2007, self-plagiarism is style, http://www.daveyp.com/blog/archives/164 (accessed nov. 20, 2008).
5. feher and sondag, "administering an open-source wireless network," 45–54.
6. ibid., 48, 53–54.
7. limoncelli, hogan, and chalup, the practice of system and network administration, 539–40.
8. pattern, "keeping an eye on your hip."
9. limoncelli, hogan, and chalup, the practice of system and network administration, xxv.
10. "comparison of network monitoring systems," wikipedia, the free encyclopedia, dec. 9, 2008, http://en.wikipedia.org/wiki/comparison_of_network_monitoring_systems (accessed dec. 10, 2008).
11. william von hagen and brian k. jones, linux server hacks, vol. 2 (sebastopol, calif.: o'reilly, 2005): 371–74 (zabbix), 382–87 (nagios).
12. monitoringexchange, http://www.monitoringexchange.org/ (accessed dec. 23, 2009); nagios community, http://community.nagios.org (accessed dec. 23, 2009); nagios wiki, http://www.nagioswiki.org/ (accessed dec. 23, 2009).
13. "nagios documentation," nagios, mar. 4, 2008, http://www.nagios.org/docs/ (accessed dec. 8, 2008); david josephsen, building a monitoring infrastructure with nagios (upper saddle river, n.j.: prentice hall, 2007); wolfgang barth, nagios: system and network monitoring, u.s. ed. (san francisco: open source press; no starch press, 2006).
14. ethan galstad, "nagios quickstart installation guides," nagios 3.x documentation, nov. 30, 2008, http://nagios.sourceforge.net/docs/3_0/quickstart.html (accessed dec. 3, 2008).
15. the perl directory (http://www.perl.org/) contains complete information on perl. specific information on using cpan is available in "how do i install a module from cpan?" perlfaq8, nov. 7, 2007, http://perldoc.perl.org/perlfaq8.html (accessed dec. 4, 2008).
16. limoncelli, hogan, and chalup, the practice of system and network administration, 539–40.
17. thomas dwyer iii, qpage solutions, http://www.qpage.org/ (accessed dec. 9, 2008).
18. petr šimek, "nagioschecker," google code, aug. 12, 2008, http://code.google.com/p/nagioschecker/ (accessed dec. 8, 2008).
19. "notifications," monitoringexchange, http://www.monitoringexchange.org/inventory/utilities/addon-projects/notifications (accessed dec. 23, 2009).

appendix a. service detail display from test system
appendix b. service details for opac (hip) and ils (horizon) servers from production system
appendix c. sybase freespace trends for a specified period
appendix d. connectivity history for a specified period
appendix e. availability report for host shown in appendix d
appendix f. templates.cfg file

############################################################################
# templates.cfg   sample object templates
############################################################################

############################################################################
# contact templates
############################################################################

# generic contact definition template. this is not a real contact, just
# a template!
define contact{
        name                            generic-contact
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        register                        0
        }

############################################################################
# host templates
############################################################################

# generic host definition template. this is not a real host, just
# a template!
define host{
        name                            generic-host
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period             24x7
        register                        0
        }

# linux host definition template. this is not a real host, just a template!
define host{
        name                            linux-server
        use                             generic-host
        check_period                    24x7
        check_interval                  5
        retry_interval                  1
        max_check_attempts              10
        check_command                   check-host-alive
        notification_period             workhours
        notification_interval           120
        notification_options            d,u,r
        contact_groups                  admins
        register                        0
        }

# define a template for switches that we can reuse
define host{
        name                            generic-switch
        use                             generic-host
        check_period                    24x7
        check_interval                  5
        retry_interval                  1
        max_check_attempts              10
        check_command                   check-host-alive
        notification_period             24x7
        notification_interval           30
        notification_options            d,r
        contact_groups                  admins
        register                        0
        }

############################################################################
# service templates
############################################################################

# generic service definition template. this is not a real service,
# just a template!
define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            2
        contact_groups                  admins
        notification_options            w,u,c,r
        notification_interval           60
        notification_period             24x7
        register                        0
        }

# define a ping service. this is not a real service, just a template!
define service{
        use                             generic-service
        name                            ping-service
        notification_options            n
        check_command                   check_ping!1000.0,20%!2000.0,60%
        register                        0
        }
appendix g. groups.cfg file

############################################################################
# contact group definitions
############################################################################

# we only have one contact in this simple configuration file, so there is
# no need to create more than one contact group.
define contactgroup{
        contactgroup_name       admins
        alias                   nagios administrators
        members                 nagiosadmin
        }

############################################################################
# host group definitions
############################################################################

# define an optional hostgroup for linux machines
define hostgroup{
        hostgroup_name  linux-servers    ; the name of the hostgroup
        alias           linux servers    ; long name of the group
        }

# create a new hostgroup for ils servers
define hostgroup{
        hostgroup_name  ils-servers      ; the name of the hostgroup
        alias           ils servers      ; long name of the group
        }

# create a new hostgroup for switches
define hostgroup{
        hostgroup_name  switches         ; the name of the hostgroup
        alias           network switches ; long name of the group
        }

############################################################################
# service group definitions
############################################################################

# define a service group for network connectivity
define servicegroup{
        servicegroup_name       network
        alias                   network infrastructure services
        }

# define a servicegroup for ils
define servicegroup{
        servicegroup_name       ils-services
        alias                   ils related services
        }

appendix h. contacts.cfg

############################################################################
# contacts.cfg   sample contact/contactgroup definitions
############################################################################

# just one contact defined by default: the nagios admin (that's you).
# this contact definition inherits a lot of default values from the
# 'generic-contact' template which is defined elsewhere.
define contact{
        contact_name    nagiosadmin
        use             generic-contact
        alias           nagios admin
        email           nagios@localhost
        }

appendix i. opac.cfg

############################################################################
# opac server
############################################################################

############################################################################
# host definition
############################################################################

# define a host for the server we'll be monitoring.
# change the host_name, alias, and address to fit your situation.
define host{
        use             linux-server
        host_name       opac
        parents         gateway-switch
        alias           opac server
        address         192.168.1.123
        }

############################################################################
# service definitions
############################################################################

# create a service for monitoring the http port
define service{
        use                     generic-service
        host_name               opac
        service_description     web port
        check_command           check_tcp!80
        }

# create a service for monitoring the web service
define service{
        use                     generic-service
        host_name               opac
        service_description     web service
        check_command           check_http!-u/bogusfilethatdoesnotexist.html
        }

# create a service for monitoring the opac search
define service{
        use                     generic-service
        host_name               opac
        service_description     opac search
        check_command           check_hip_search
        }

# create a service for monitoring the z39.50 port
define service{
        use                     generic-service
        host_name               opac
        service_description     z3950 port
        check_command           check_tcp!210
        }

appendix j. switches.cfg

############################################################################
# switch.cfg   sample config file for monitoring switches
############################################################################

############################################################################
# host definitions
############################################################################

# define the switch that we'll be monitoring
define host{
        use             generic-switch
        host_name       gateway-switch
        alias           gateway switch
        address         192.168.0.1
        hostgroups      switches
        }

############################################################################
# service definitions
############################################################################

# create a service to ping the switches.
# note: this entry will ping every host in the switches hostgroup.
define service{
        use                     ping-service
        hostgroups              switches
        service_description     ping
        normal_check_interval   5
        retry_check_interval    1
        }

appendix k. check_hip_search script

#!/usr/bin/perl -w
#########################
# check horizon information portal (hip) status.
# hip is the web-based interface for dynix and horizon
# ils systems by sirsidynix corporation.
#
# this plugin is based on a standalone perl script written
# by dave pattern. please see
# http://www.daveyp.com/blog/index.php/archives/164/
# for the original script.
#
# the original script and this derived work are covered by
# http://creativecommons.org/licenses/by-nc-sa/2.5/
#########################
use strict;
use LWP::UserAgent;    # note the requirement for perl module LWP::UserAgent!
use lib "/usr/lib/nagios/plugins";
use utils qw($TIMEOUT %ERRORS);

### some configuration options
my $hipserverhome = "http://ipac.prl.ab.ca/ipac20/ipac.jsp?profile=alap";
my $hipserversearch = "http://ipac.prl.ab.ca/ipac20/ipac.jsp?menu=search&aspect=subtab132&npp=10&ipp=20&spp=20&profile=alap&ri=&index=.gw&term=linux&x=18&y=13&aspect=subtab132&getxml=true";
my $hipsearchtype = "xml";
my $httpproxy = '';

### check home page is available...
{
    my $ua = LWP::UserAgent->new;
    $ua->timeout( 10 );
    if( $httpproxy ) { $ua->proxy( 'http', $httpproxy ) }
    my $response = $ua->get( $hipserverhome );
    my $status = $response->status_line;
    if( $response->is_success ) {
        # home page ok; fall through to the search check.
    }
    else {
        print "hip_search critical: $status\n";
        exit $ERRORS{'CRITICAL'};
    }
}

### check search page is returning results...
{
    my $ua = LWP::UserAgent->new;
    $ua->timeout( 10 );
    if( $httpproxy ) { $ua->proxy( 'http', $httpproxy ) }
    my $response = $ua->get( $hipserversearch );
    my $status = $response->status_line;
    if( $response->is_success ) {
        my $results = 0;
        my $content = $response->content;
        if( lc( $hipsearchtype ) eq 'html' ) {
            # the hit count appears as "<b>NNN</b>&nbsp;titles matched" in html output.
            if( $content =~ /\<b\>(\d+?)\<\/b\>\&nbsp\;titles matched/ ) {
                $results = $1;
            }
        }
        if( lc( $hipsearchtype ) eq 'xml' ) {
            # the hit count appears as "<hits>NNN</hits>" in xml output.
            if( $content =~ /\<hits\>(\d+?)\<\/hits\>/ ) {
                $results = $1;
            }
        }
        ### modified section: the original script triggered another function to
        ### save results to a temp file and email an administrator.
        unless( $results ) {
            print "hip_search critical: no results returned|results=0\n";
            exit $ERRORS{'CRITICAL'};
        }
        if( $results ) {
            print "hip_search ok: $results results returned|results=$results\n";
            exit $ERRORS{'OK'};
        }
    }
}

appendix l. nagios checker display

batch loading collections into dspace: using perl scripts for automation and quality control

maureen p. walsh

information technology and libraries | september 2010
this paper describes batch loading workflows developed for the knowledge bank, the ohio state university's institutional repository. in the five years since the inception of the repository approximately 80 percent of the items added to the knowledge bank, a dspace repository, have been batch loaded. most of the batch loads utilized perl scripts to automate the process of importing metadata and content files. custom perl scripts were used to migrate data from spreadsheets or comma-separated values files into the dspace archive directory format, to build collections and tables of contents, and to provide data quality control. two projects are described to illustrate the process and workflows.

maureen p. walsh (walsh.260@osu.edu) is metadata librarian/assistant professor, the ohio state university libraries, columbus, ohio.

the mission of the knowledge bank, the ohio state university's (osu) institutional repository, is to collect, preserve, and distribute the digital intellectual output of osu's faculty, staff, and students.1 the staff working with the knowledge bank have sought from its inception to be as efficient as possible in adding content to dspace. using batch loading workflows to populate the repository has been integral to that efficiency. the first batch load into the knowledge bank was august 29, 2005. over the next four years, 698 collections containing 32,188 items were batch loaded, representing 79 percent of the items and 58 percent of the collections in the knowledge bank. these batch loaded collections vary from journal issues to photo albums. the items include articles, images, abstracts, and transcripts. the majority of the batch loads, including the first, used custom perl scripts to migrate data from microsoft excel spreadsheets into the dspace batch import format for descriptive metadata and content files. perl scripts have been used for data cleanup and quality control as part of the batch load process. perl scripts, in combination with shell scripts, have also been used to build collections and tables of contents in the knowledge bank. the workflows using perl scripts to automate batch import into dspace have evolved through an iterative process of continual refinement and improvement. two knowledge bank projects are presented as case studies to illustrate a successful approach that may be applicable to other institutional repositories.

literature review

batch ingesting is acknowledged in the literature as a means of populating institutional repositories. there are examples of specific batch loading processes minimally discussed in the literature. branschofsky and her colleagues briefly described batch loading marc metadata crosswalked to dspace dublin core (dc) in a poster session.2 mishra and others developed a perl script to create the dspace archive directory for batch import of electronic theses and dissertations (etds) extracted with a java program from an in-house bibliographic database.3 mundle used perl scripts to batch process etds for import into dspace with marc catalog records or excel spreadsheets as the source metadata.4 brownlee used python scripts to batch process comma-separated values (csv) files exported from filemaker database software for ingest via the dspace item importer.5 more in-depth descriptions of batch loading are provided by thomas; kim, dong, and durden; proudfoot et al.; witt and newton; drysdale; ribaric; floyd; and averkamp and lee. however, irrespective of repository software, each describes a process to populate their repositories dissimilar to the workflows developed for the knowledge bank in approach or source data. thomas describes the perl scripts used to convert marc catalog records into dc and to create the archive directory for dspace batch import.6 kim, dong, and durden used perl scripts to semiautomate the preparation of files for batch loading a university of texas harry ransom humanities research center (hrc) collection into dspace. the xml source metadata they used was generated by the national library of new zealand metadata extraction tool.7 two subsequent projects for the hrc revisited the workflow described by kim, dong, and durden.8 proudfoot and her colleagues discuss importing metadata-only records from departmental refbase, thomson reuters endnote, and microsoft access databases into eprints. they also describe an experimental perl script written to scrape lists of publications from personal websites to populate eprints.9 two additional workflow examples used citation databases as the data source for batch loading into repositories. witt and newton provide a tutorial on transforming endnote metadata for digital commons with xslt (extensible stylesheet language transformations).10 drysdale describes the perl scripts used to convert thomson reuters reference manager files into xml for the batch loading of metadata-only records into the university of glasgow's eprints repository.11 the glasgow eprints batch workflow is additionally described by robertson and by nixon and greig.12 several workflows were designed for batch loading etds into repositories. ribaric describes the automatic preparation of etds from the internet archive (http://www.archive.org/) for ingest into dspace using php utilities.13 floyd describes the processor developed to automate the ingest of proquest etds via the dspace item importer.14 also using proquest etds as the source data, averkamp and lee described using xslt to transform the proquest data to bepress' (the berkeley electronic press) schema for batch loading into a digital commons repository.15 the knowledge bank workflows described in this paper use perl scripts to generate dc xml and create the archive directory for batch loading metadata records and content files into dspace using excel spreadsheets or csv files as the source metadata.
background

the knowledge bank, a joint initiative of the osu libraries (osul) and the osu office of the chief information officer, was first registered in the registry of open access repositories (roar) on september 28, 2004.16 as of december 2009 the repository held 40,686 items in 1,192 collections. the knowledge bank uses dspace, the open-source java-based repository software jointly developed by the massachusetts institute of technology libraries and hewlett-packard.17 as a dspace repository, the knowledge bank is organized by communities. the fifty-two communities currently in the knowledge bank include administrative units, colleges, departments, journals, library special collections, research centers, symposiums, and undergraduate honors theses. the commonality of the varied knowledge bank communities is their affiliation with osu and their production of knowledge in a digital format that they wish to store, preserve, and distribute.

the staff working with the knowledge bank includes a team of people from three osul areas—technical services, information technology, and preservation—and the contracted hours of one systems developer from the osu office of information technology (oit). the osul team members are not individually assigned full-time to the repository. the current osul team includes a librarian repository manager, two metadata librarians, one systems librarian, one systems developer, two technical services staff members, one preservation staff member, and one graduate assistant. the knowledge bank is currently running dspace 1.5.2 and the relational database postgresql 8.1.11 on the red hat enterprise linux 5 operating system.

the structure of the knowledge bank follows the hierarchical arrangement of dspace. communities are at the highest level and can be divided into subcommunities. each community or subcommunity contains one or more collections. all items—the basic archival elements in dspace—are contained within collections. items consist of metadata and bundles of bitstreams (files). dspace supports two user interfaces: the original interface based on javaserver pages (jspui) and the newer manakin (xmlui) interface based on the apache cocoon framework. at this writing, the knowledge bank continues to use the jspui interface. the default metadata used by dspace is a qualified dc schema derived from the dc library application profile.18 the knowledge bank uses a locally defined extended version of the default dspace qualified dc schema, which includes several additional element qualifiers. the metadata management for the knowledge bank is guided by a knowledge bank application profile and a core element set for each collection within the repository derived from the application profile.19 the metadata librarians at osul create the collection core element sets in consultation with the community representatives. the core element sets serve as metadata guidelines for submitting items to the knowledge bank regardless of the method of ingest. the primary means of adding items to collections in dspace, and the two ways used for knowledge bank ingest, are (1) direct (or intermediated) author entry via the dspace web item submission user interface and (2) in batch via the dspace item importer. recent enhancements to dspace, not yet fully explored for use with the knowledge bank, include new ingest options using simple web-service offering repository deposit (sword), open archives initiative object reuse and exchange (oai-ore), and dspace package importers such as the metadata encoding and transmission standard submission information package (mets sip) format. this paper describes ingest via the dspace batch item importer.
the dspace item importer is a command-line tool for batch ingesting items. the importer uses a simple archive format diagramed in figure 1. the archive is a directory of items that contain a subdirectory of item metadata, item files, and a contents file listing the bitstream file names.

figure 1. dspace simple archive format

archive_directory/
    item_000/
        dublin_core.xml    -- qualified dublin core metadata
        contents           -- text file containing one line per filename
        file_1.pdf         -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        file_1.pdf
        ...

each item's descriptive metadata is contained in a dc xml file. the format used by dspace for the dc xml files is illustrated in figure 2.

figure 2. dspace qualified dublin core xml

<dublin_core>
    <dcvalue element="title" qualifier="none">notes on the bird life of cedar point</dcvalue>
    <dcvalue element="date" qualifier="issued">1901-04</dcvalue>
    <dcvalue element="creator" qualifier="none">griggs, robert f.</dcvalue>
</dublin_core>

automating the process of creating the unix archive directory has been the main function of the perl scripts written for the knowledge bank batch loading workflows. a systems developer uses the test mode of the dspace item importer tool to validate the item directories before doing a batch load. any significant errors are corrected and the process is repeated. after a successful test, the batch is loaded into the staging instance of the knowledge bank and quality checked by a metadata librarian to identify any unexpected results and script or data problems that need to be corrected. after a successful load into the staging instance the batch is loaded into the production instance of the knowledge bank.
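the heart of such a script is small. as a minimal sketch of writing a single item in the simple archive format (the metadata values, directory name, and file name here are invented for illustration, and error handling is kept to a minimum):

    #!/usr/bin/perl -w
    # sketch: write one dspace simple-archive item directory
    # (dublin_core.xml plus a contents file). all values are illustrative.
    use strict;
    use File::Path qw(mkpath);

    my $itemdir = "archive_directory/item_000";
    mkpath($itemdir);

    # minimal qualified dc record for the item.
    open( my $dc, ">", "$itemdir/dublin_core.xml" ) or die $!;
    print $dc <<"XML";
    <dublin_core>
        <dcvalue element="title" qualifier="none">a sample title</dcvalue>
        <dcvalue element="date" qualifier="issued">2009</dcvalue>
        <dcvalue element="creator" qualifier="none">doe, jane</dcvalue>
    </dublin_core>
    XML
    close($dc);

    # the contents file lists one bitstream file name per line.
    open( my $contents, ">", "$itemdir/contents" ) or die $!;
    print $contents "file_1.pdf\n";
    close($contents);

the item importer's test mode can then be pointed at archive_directory to validate the load set before anything touches the repository.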
most of the knowledge bank batch loading workflows use excel spreadsheets or csv files as the source for the descriptive item metadata. the creation of the metadata contained in the spreadsheets or files has varied by project. in some cases the metadata is created by osul staff. in other cases the metadata is supplied by knowledge bank communities in consultation with a metadata librarian or by a vendor contracted by osul. whether the source metadata is created in-house or externally supplied, osul staff are involved in the quality control of the metadata. several of the first communities to join the knowledge bank had very large retrospective collection sets to archive. the collection sets of two of those early adopters, the journal issues of the ohio journal of science (ojs) and the abstracts of the osu international symposium on molecular spectroscopy, currently account for 59 percent of the items in the knowledge bank.20 the successful batch loading workflows developed for these two communities—which continue to be active content suppliers to the repository—are presented as case studies.

case studies

the issues of the ohio journal of science

ojs was jointly published by osu and the ohio academy of science (oas) until 1974, when oas took over sole control of the journal. the issues of ojs are archived in the knowledge bank with a two-year rolling wall embargo. the issues for 1900 through 2003, a total of 639 issues containing 6,429 articles, were batch loaded into the knowledge bank. due to rights issues, the retrospective batch loading project had two phases. the project to digitize ojs began with the 1900–1972 issues that osu had the rights to digitize and make publicly available. osu later acquired the rights for 1973–present, and (accounting for the embargo period) 1973–2003 became phase 2 of the project. the two phases of batch loads were the most complicated automated batch loading processes developed to date for the knowledge bank. to batch load phase 1 in 2005 and phase 2 in 2006, the systems developers working with the knowledge bank wrote scripts to build collections, generate dc xml from the source metadata, create the archive directory, load the metadata and content files, create tables of contents, and load the tables of contents into dspace. the ojs community in the knowledge bank is organized by collections representing each issue of the journal. the systems developers used scripts to automate the building of the collections in dspace because of the number needed as part of the retrospective project. the individual articles within the issues are items within the collections. there is a table of contents for the articles in each issue as part of the collection homepages.21 again, due to the number required for the retrospective project, the systems developers used scripts to automate the creation and loading of the tables of contents. the tables of contents are contained in the html introductory text section of the collection pages. the tables of contents list title, authors, and pages. they also include a link to the item record and a direct link to the article pdf that includes the file size.

for each phase of the ojs project, a vendor contracted by osul supplied the article pdfs and an excel spreadsheet with the article-level metadata. the metadata received from the vendor had not been customized for the knowledge bank. the ojs issues were sent to a vendor for digitization and metadata creation before the knowledge bank was chosen as the hosting site of the digitized journal. the osu digital initiatives steering committee 2002 proposal for the ojs digitization project had predated the knowledge bank dspace instance. osul staff performed quality-control checks of the vendor-supplied metadata and standardized the author names. the vendor supplied the author names as they appeared in the articles—in direct order, comma separated, and including any "and" that appeared. in addition to other quality checks performed, osul staff edited the author names in the spreadsheet to conform to dspace author-entry convention (surname first). semicolons were added to separate author names, and the extraneous ands were removed. a former metadata librarian mapped the vendor-supplied article-level metadata to knowledge bank dc, as illustrated in table 1. the systems developers used the mapping as a guide to write perl scripts to transform the vendor metadata into the dspace schema of dc.

table 1. mapping of vendor metadata to qualified dublin core

vendor-supplied metadata    knowledge bank dublin core
file                        [n/a: pdf file name]
cover title                 dc.identifier.citation*
issn                        dc.identifier.issn
vol.                        dc.identifier.citation*
iss.                        dc.identifier.citation*
cover date                  dc.identifier.citation*
year                        dc.date.issued
month                       dc.date.issued
fpage                       dc.identifier.citation*
lpage                       dc.identifier.citation*
article title               dc.title
author names                dc.creator
institution                 dc.description
abstract                    dc.description.abstract
n/a                         dc.language.iso
n/a                         dc.rights
n/a                         dc.type

*format: [cover title]. v[vol.], n[iss.] ([cover date]), [fpage]-[lpage]

the workflow for the two phases was nearly identical, except each phase had its own batch loading scripts. due to a staff change between the two phases of the project, a former osul systems developer was responsible for batch loading phase 1 and the oit systems developer was responsible for phase 2. the phase 1 scripts were all written in perl. the four scripts written for phase 1 created the archive directory, performed database operations to build the collections, generated the html introduction table of contents for each collection, and loaded the tables of contents into dspace via the database. for phase 2, the oit systems developer modified and added to the phase 1 batch processing scripts. this case study focuses on phase 2 of the project.

batch processing for phase 2 of ojs

the annotated scripts the oit systems developer used for phase 2 of the ojs project are included in appendix a, available on the italica weblog (http://ital-ica.blogspot.com/). a shell script (mkcol.sh) added collections based on a listing of the journal issues. the script performed a login as a selected user id to the dspace web interface using the web access tool curl.
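the same login-and-submit pattern can be written with the lwp::useragent module used elsewhere in these workflows. the sketch below illustrates the technique only: the urls and form field names are placeholders, not the knowledge bank's actual dspace parameters.

    #!/usr/bin/perl -w
    # sketch: authenticate to the dspace web ui and submit form data,
    # as mkcol.sh did with curl. urls and field names are illustrative.
    use strict;
    use LWP::UserAgent;
    use HTTP::Cookies;

    my $ua = LWP::UserAgent->new( cookie_jar => HTTP::Cookies->new );

    # log in once; the session cookie is kept in the cookie jar.
    my $login = $ua->post(
        'http://dspace.example.edu/dspace/password-login',    # placeholder url
        { login_email => 'admin@example.edu', login_password => 'secret' }
    );
    die "login failed: ", $login->status_line unless $login->is_success;

    # submit data for one new collection using the stored credentials.
    my $resp = $ua->post(
        'http://dspace.example.edu/dspace/tools/collection-wizard',    # placeholder
        { name => 'ohio journal of science: volume 74, issue 3' }
    );
    print $resp->is_success ? "created\n" : "failed: " . $resp->status_line . "\n";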
a subsequent simple looping perl script (mkallcol.pl) used the stored credentials to submit data via this channel to build the collections in the knowledge bank. the metadata.pl script created the archive directory for each collection. the oit systems developer added the pdf file for each item to unix. the vendor-supplied metadata was saved as unicode text format and transferred to unix for further processing. the developer used vi commands to manually modify metadata for characters illegal in xml (e.g., "<" and "&"). (although manual steps were used for this project, the oit systems developer improved the perl scripts for subsequent projects by adding code for automated transformation of the input data to help ensure xml validity.) the metadata.pl script then processed each line of the metadata along with the corresponding data file. for each item, the script created the dc xml file and the contents file and moved them and the pdf file to the proper directory. load sets for each collection (issue) were placed in their own subdirectory, and a load was done for each subdirectory. the items for each collection were loaded by a small perl script (loaditems.pl) that used the list of issues and their collection ids and called a shell script (import.sh) for the actual load. the tables of contents for the issues were added to the knowledge bank after the items were loaded. a perl script (intro.pl) created the tables of contents using the metadata and the dspace map file, a stored mapping of item directories to item handles created during the load. the tables of contents were added to the knowledge bank using a shell script (installintro.sh) similar to what was used to create the collections. installintro.sh used curl to simulate a user adding the data to dspace by performing a login as a selected user id to the dspace web interface.
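the map file mentioned here is the plain-text file the dspace item importer writes during a load, pairing each item directory with the handle the item received; that pairing is what lets a table-of-contents entry link to its item record. a sketch of the idea (the map file name, the metadata stubs, and the handle values are illustrative):

    #!/usr/bin/perl -w
    # sketch: join the dspace map file (item directory <-> handle) with the
    # source metadata to emit html table-of-contents lines.
    use strict;

    # read the map file written by the item importer, e.g. "item_000 1811/12345".
    my %handle_for;
    open( my $map, "<", "mapfile" ) or die $!;
    while (<$map>) {
        my ( $dir, $handle ) = split;
        $handle_for{$dir} = $handle;
    }
    close($map);

    # stand-ins for whatever structure the article metadata was parsed into.
    my %title_for = ( 'item_000' => 'notes on the bird life of cedar point' );
    my %pages_for = ( 'item_000' => '137-148' );

    # emit one toc line per item, linking through the handle resolver.
    for my $dir ( sort keys %handle_for ) {
        print qq{<a href="http://hdl.handle.net/$handle_for{$dir}">},
              qq{$title_for{$dir}</a> pp. $pages_for{$dir}<br />\n};
    }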
a simple looping perl script (ldallintro.pl) called installintro.sh and used the stored credentials to submit the data for the tables of contents.

the abstracts of the osu international symposium on molecular spectroscopy

the knowledge bank contains the abstracts of the papers presented at the osu international symposium on molecular spectroscopy (mss), which has met annually since 1946. beginning with the 2005 symposium, the complete presentations from authors who have authorized their inclusion are archived along with the abstracts. the mss community in the knowledge bank currently contains 17,714 items grouped by decade into six collections. the six collections were created "manually" via the dspace web interface prior to the batch loading of the items. the retrospective years of the symposium (1946–2004) were batch loaded in three phases in 2006. each symposium year following the retrospective loads was batch loaded individually.

retrospective mss batch loads

the majority of the abstracts for the retrospective loads were digitized by osul. a vendor was contracted by osul to digitize the remainder and to supply the metadata for the retrospective batch loads. the files digitized by osul were sent to the vendor for metadata capture. osul provided the vendor a metadata template derived from the mss core element set. the metadata taken from the abstracts comprised author, affiliation, title, year, session number, sponsorship (if applicable), and a full transcription of the abstract. to facilitate searching, the formulas and special characters appearing in the titles and abstracts were encoded using latex, a document preparation system used for scientific data. the vendor delivered the metadata in excel spreadsheets as per the spreadsheet template provided by osul. quality-checking the metadata was an essential step in the workflow for osul. the metadata received for the project required revisions and data cleanup. the vendor originally supplied incomplete files and spreadsheets that contained data errors, including incorrect numbering, data in the wrong fields, and inconsistency with the latex encoding. the three knowledge bank batch load phases for the retrospective mss project corresponded to the staged receipt of metadata and digitized files from the vendor. the annotated scripts used for phase 2 of the project, which included twenty years of the osu international symposium between 1951 and 1999, are included in appendix b, available on the italica weblog. the oit systems developer saved the metadata as a tab-separated file and added it to unix along with the abstract files. a perl script (mkxml2.pl) transformed the metadata into dc xml and created the archive directories for loading the metadata and abstract files into the knowledge bank. the script divided the directories into separate load sets for each of the six collections and accounted for the inconsistent naming of the abstract files. the script added the constant data for type and language that was not included in the vendor-supplied metadata. unlike the ojs project, where multiple authors were on the same line of the metadata file, the mss phase 2 script had to code for authors and their affiliations on separate lines. once the load sets were made, the oit systems developer ran a shell script to load them. the script (import_collections.sh) was used to run the load for each set so that the dspace item import command did not need to be constructed each time.
annual mss batch loads

a new workflow was developed for batch loading the annual mss collection additions. the metadata and item files for the annual collection additions are supplied by the mss community. the community provides the symposium metadata in a csv file and the item files in a tar archive file. the symposium uses a web form for latex-formatted abstract submissions. the community processes the electronic symposium submissions with a perl script to create the csv file. the metadata delivered in the csv file is based on the template created by the author, which details the metadata requirements for the project. the oit systems developer borrowed from and modified earlier perl scripts to create a new script for batch processing the metadata and files for the annual symposium collection additions. to assist with the development of the new script, i provided the developer a mapping of the community csv headings to the knowledge bank dc fields. i also provided a sample dc xml file to illustrate the desired result of the perl transformation of the community metadata into dc xml. for each new year of the symposium, i create a sample dc xml result for an item to check the accuracy of the script. a dc xml example from a 2009 mss item is included in appendix c, available on the italica weblog. unlike the previous retrospective mss loads in which the script processed multiple years of the symposium, the new script processes one year at a time. the annual symposiums are batch loaded individually into one existing mss decade collection. the new script for the annual loads was tested and refined by loading the 2005 symposium into the staging instance of the knowledge bank. problems encountered with character encoding and file types were resolved by modifying the script. the metadata and files for the symposium years 2005, 2006, and 2007 were made available to osul in 2007, and each year was individually loaded into the existing knowledge bank collection for that decade.
these first three years of community-supplied csv files contained author metadata inconsistent with knowledge bank author entries. the names were in direct order, uppercase, split by either a semicolon or "and," and included extraneous data, such as an address. the oit systems developer wrote a perl script to correct the author metadata as part of the batch loading workflow. an annotated section of that script illustrating the author modifications is included in appendix d, available on the italica weblog. the mss community revised the perl script they used to generate the csv files by including an edited version of this author entry correction script and were able to provide the expected author data for 2008 and 2009. the author entries received for these years were in inverted order (surname first) and mixed case, were semicolon separated, and included no extraneous data. the receipt of consistent data from the community for the last two years has facilitated the standardized workflow for the annual mss loads.
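appendix d itself is hosted on the italica weblog, but the flavor of the correction can be sketched as follows; the sample input, the splitting rules, and the simple assumption that the final word of each name is the surname are illustrative rather than the production logic:

    #!/usr/bin/perl -w
    # sketch: normalize author strings like
    #   "JOHN A. SMITH AND MARY JONES, DEPT. OF CHEMISTRY"
    # toward the repository convention "smith, john a.; jones, mary".
    use strict;

    sub normalize_authors {
        my ($raw) = @_;
        $raw =~ s/,\s*dept\..*$//i;    # drop trailing address-like data (illustrative)
        my @names = split /\s*(?:;|\band\b)\s*/i, $raw;
        my @fixed;
        for my $name (@names) {
            next unless $name =~ /\S/;
            $name = lc $name;          # the document's register is lowercase
            my @parts = split ' ', $name;
            my $surname = pop @parts;  # naive: treat the final word as the surname
            push @fixed, "$surname, @parts";
        }
        return join '; ', @fixed;
    }

    print normalize_authors("JOHN A. SMITH AND MARY JONES, DEPT. OF CHEMISTRY"), "\n";
    # prints: smith, john a.; jones, mary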
the scripts used to batch load the 2009 symposium year are included in appendix e, which appears at the end of this text. the oit systems developer unpacked the tar file of abstracts and presentations into a directory named for the year of the symposium on unix. the perl script written for the annual mss loads (mkxml.pl) was saved on unix and renamed mkxml2009.pl. the script was edited for 2009 (including the name of the csv file and the location of the directories for the unpacked files and generated xml). the csv headings used by the community in the new file were checked and verified against the extract list in the script. once the perl script was up-to-date and the base directory was created, the oit systems developer ran the perl script to generate the archive directory set for import. the import.sh script was then edited for 2009 and run to import the new symposium year into the staging instance of the knowledge bank as a quality check prior to loading into the live repository. the brief item view of an example mss 2009 item archived in the knowledge bank is shown in figure 3.

figure 3. mss 2009 archived item example

summary and conclusion

each of the batch loads that used perl scripts had its own unique features. the format of content and associated metadata varied considerably, and custom scripts to convert the content and metadata into the dspace import format were created on a case-by-case basis. the differences between batch loads included the delivery format of the metadata, the fields of metadata supplied, how metadata values were delimited, the character set used for the metadata, the data used to uniquely identify the files to be loaded, and how repeating metadata fields were identified. because of the differences in supplied metadata, a separate perl script for generating the dc xml and archive directory for batch loading was written for each project. each new perl script borrowed from and modified earlier scripts. many of the early batch loads were firsts for the knowledge bank and the staff working with the repository, both in terms of content and in terms of metadata. dealing with community- and vendor-supplied metadata and various encodings (including latex), each of the early loads encountered different data obstacles, and in each case solutions were written in perl. the batch loading code has matured over time, and the progression of improvements is evident in the example scripts included in the appendixes. batch loading can greatly reduce the time it takes to add content and metadata to a repository, but successful batch loading workflows are dependent upon the quality of the data and metadata loaded. along with testing scripts and checking imported metadata by first batch loading to a development or staging environment, quality control of the supplied metadata is an integral step. the flexibility of perl allowed testing and revising to accommodate problems encountered with how the metadata was supplied for the heterogeneous collections batch loaded into the knowledge bank. however, toward the goal of standardizing batch loading workflows, the staff working with the knowledge bank iteratively refined not only the scripts but also the metadata requirements for each project and how those were communicated to the data suppliers with mappings, explicit metadata examples, and sample desired results. the efficiency of batch loading workflows is greatly enhanced by consistent data and basic standards for how metadata is supplied. batch loading is not only an extremely efficient means of populating an institutional repository, it is also a value-added service that can increase buy-in from the wider campus community. it is hoped that by openly sharing examples of our batch loading scripts we are contributing to the development of an open library of code that can be borrowed and adapted by the library community toward future institutional repository success stories.

acknowledgments

i would like to thank conrad gratz, of osu oit, and andrew wang, formerly of osul.
gratz wrote the shell scripts and the majority of the perl scripts used for automating the knowledge bank item import process and ran the corresponding batch loads. the early perl scripts used for batch loading into the knowledge bank, including the first phase of ojs and mss, were written by wang. parts of those early perl scripts written by wang were borrowed for subsequent scripts written by gratz. gratz provided the annotated scripts appearing in the appendixes and consulted with the author regarding the description of the scripts. i would also like to thank amanda j. wilson, a former metadata librarian for osul, who was instrumental to the success of many of the batch loading workflows created for the knowledge bank.

references and notes

1. the ohio state university knowledge bank, "institutional repository policies," 2007, http://library.osu.edu/sites/kbinfo/policies.html (accessed dec. 21, 2009). the knowledge bank homepage can be found at https://kb.osu.edu/dspace/ (accessed dec. 21, 2009).
2. margret branschofsky et al., "evolving metadata needs for an institutional repository: mit's dspace," proceedings of the 2003 international conference on dublin core and metadata applications: supporting communities of discourse and practice—metadata research & applications, seattle, washington, 2003, http://dcpapers.dublincore.org/ojs/pubs/article/view/753/749 (accessed dec. 21, 2009).
3. r. mishra et al., "development of etd repository at iitk library using dspace," in international conference on semantic web and digital libraries (icsd-2007), ed. a. r. d. prasad and devika p. madalli (2007), 249–59, http://hdl.handle.net/1849/321 (accessed dec. 21, 2009).
4. todd m. mundle, "digital retrospective conversion of theses and dissertations: an in house project" (paper presented to the 8th international symposium on electronic theses & dissertations, sydney, australia, sept. 28–30, 2005), http://adt.caul.edu.au/etd2005/papers/080mundle.pdf (accessed dec. 21, 2009).
5. rowan brownlee, "research data and repository metadata: policy and technical issues at the university of sydney library," cataloging & classification quarterly 47, no. 3/4 (2009): 370–79.
6. steve thomas, "importing marc data into dspace," 2006, http://hdl.handle.net/2440/14784 (accessed dec. 21, 2009).
7. sarah kim, lorraine a. dong, and megan durden, "automated batch archival processing: preserving arnold wesker's digital manuscripts," archival issues 30, no. 2 (2006): 91–106.
8. elspeth healey, samantha mueller, and sarah ticer, "the paul n. banks papers: archiving the electronic records of a digitally-adventurous conservator," 2009, https://pacer.ischool.utexas.edu/bitstream/2081/20150/1/paul_banks_final_report.pdf (accessed dec. 21, 2009); lisa schmidt, "preservation of a born digital literary genre: archiving legacy macintosh hypertext files in dspace," 2007, https://pacer.ischool.utexas.edu/bitstream/2081/9007/1/mj%20wbo%20capstone%20report.pdf (accessed dec. 21, 2009).
9. rachel e. proudfoot et al., "jisc final report: increase (increasing repository content through automation and services)," 2009, http://eprints.whiterose.ac.uk/9160/ (accessed dec. 21, 2009).
10. michael witt and mark p. newton, "preparing batch deposits for digital commons repositories," 2008, http://docs.lib.purdue.edu/lib_research/96/ (accessed dec. 21, 2009).
11. lesley drysdale, "importing records from reference manager into gnu eprints," 2004, http://hdl.handle.net/1905/175 (accessed dec. 21, 2009).
12. r. john robertson, "evaluation of metadata workflows for the glasgow eprints and dspace services," 2006, http://hdl.handle.net/1905/615 (accessed dec. 21, 2009); william j. nixon and morag greig, "populating the glasgow eprints service: a mediated model and workflow," 2005, http://hdl.handle.net/1905/387 (accessed dec. 21, 2009).
13. tim ribaric, "automatic preparation of etd material from the internet archive for the dspace repository platform," code4lib journal no. 8 (nov. 23, 2009), http://journal.code4lib.org/articles/2152 (accessed dec. 21, 2009).
14. randall floyd, "automated electronic thesis and dissertations ingest" (mar. 30, 2009), http://wiki.dlib.indiana.edu/confluence/x/01y (accessed dec. 21, 2009).
15. shawn averkamp and joanna lee, "repurposing proquest metadata for batch ingesting etds into an institutional repository," code4lib journal no. 7 (june 26, 2009), http://journal.code4lib.org/articles/1647 (accessed dec. 21, 2009).
16. tim brody, registry of open access repositories (roar), http://roar.eprints.org/ (accessed dec. 21, 2009).
17. duraspace, dspace, http://www.dspace.org/ (accessed dec. 21, 2009).
18. dublin core metadata initiative libraries working group, "dc-library application profile (dc-lib)," http://dublincore.org/documents/2004/09/10/library-application-profile/ (accessed dec. 21, 2009).
19. the ohio state university knowledge bank policy committee, "osu knowledge bank metadata application profile," http://library.osu.edu/sites/techservices/kbappprofile.php (accessed dec. 21, 2009).
20. ohio journal of science (ohio academy of science), knowledge bank community, http://hdl.handle.net/1811/686 (accessed dec. 21, 2009); osu international symposium on molecular spectroscopy, knowledge bank community, http://hdl.handle.net/1811/5850 (accessed dec. 21, 2009).
21. ohio journal of science (ohio academy of science), ohio journal of science: volume 74, issue 3 (may, 1974), knowledge bank collection, http://hdl.handle.net/1811/22017 (accessed dec. 21, 2009).

appendixes a–d available at http://ital-ica.blogspot.com/

appendix e. mss 2009 batch loading scripts

mkxml2009.pl

#!/usr/bin/perl
use Encode;           # routines for utf encoding
use Text::xSV;        # routines to process csv files
use File::Basename;

# open and read the comma separated metadata file.
my $csv = new Text::xSV;
#$csv->set_sep("\t");    # use for tab separated files.
$csv->open_file("mss2009.csv");
$csv->read_header();     # process the csv column headers.

# constants for file and directory names.
$basedir = "/common/batch/input/mss/";
$indir = "$basedir/2009";
$xmldir = "./2009xml";
$imagesubdir = "processed_images";
$filename = "dublin_core.xml";

# process each line of metadata, one line per item.
$linenum = 1;
while ($csv->get_row()) {
    # this divides the item's metadata into fields, each in its own variable.
    my (
        $identifier,
        $title,
        $creators,
        $description_abstract,
        $issuedate,
        $description,
        $description2,
        $abstract,
        $gif,
        $ppt,
    ) = $csv->extract(
        "talk_id",
        "title",
        "creators",
        "abstract",
        "issuedate",
        "description",
        "authorinstitution",
        "image_file_name",
        "talk_gifs_file",
        "talk_ppt_file"
    );

    $creatorxml = "";
    # multiple creators are separated by ';' in the metadata.
    if (length($creators) > 0) {
        # create xml for each creator.
        @creatorlist = split(/;/, $creators);
        foreach $creator (@creatorlist) {
            if (length($creator) > 0) {
                $creatorxml .= '<dcvalue element="creator" qualifier="none">'
                    . $creator . '</dcvalue>' . "\n ";
            }
        }
    } # done processing creators for this item.

    # create the xml string for the abstract.
    $abstractxml = "";
    if (length($description_abstract) > 0) {
        # convert special metadata characters for use in xml/html.
        $description_abstract =~ s/\&/&amp;/g;
        $description_abstract =~ s/\>/&gt;/g;
        $description_abstract =~ s/\</&lt;/g;
        $abstractxml = '<dcvalue element="description" qualifier="abstract">'
            . $description_abstract . '</dcvalue>';
    }

    # create the xml string for the description.
    $descriptionxml = "";
    if (length($description) > 0) {
        # convert special metadata characters for use in xml/html.
        $description =~ s/\&/&amp;/g;
        $description =~ s/\>/&gt;/g;
        $description =~ s/\</&lt;/g;
        $descriptionxml = '<dcvalue element="description" qualifier="none">'
            . $description . '</dcvalue>';
    }

    # create the xml string for the author institution.
    $description2xml = "";
    if (length($description2) > 0) {
        # convert special metadata characters for use in xml/html.
        $description2 =~ s/\&/&amp;/g;
        $description2 =~ s/\>/&gt;/g;
        $description2 =~ s/\</&lt;/g;
        $description2xml = '<dcvalue element="description" qualifier="none">'
            . 'author institution: ' . $description2 . '</dcvalue>';
    }

    # convert special characters in title.
    $title =~ s/\&/&amp;/g;
    $title =~ s/\>/&gt;/g;
    $title =~ s/\</&lt;/g;

    # create the item subdirectory and write the dublin core metadata file.
    $subdir = "item_" . $linenum;
    system "mkdir -p $basedir/$subdir";
    open($fh, ">:encoding(utf-8)", "$basedir/$subdir/$filename");
    print $fh <<"xml";
<dublin_core>
    <dcvalue element="identifier" qualifier="none">$identifier</dcvalue>
    <dcvalue element="title" qualifier="none">$title</dcvalue>
    <dcvalue element="date" qualifier="issued">$issuedate</dcvalue>
    $abstractxml
    $descriptionxml
    $description2xml
    <dcvalue element="type" qualifier="none">article</dcvalue>
    <dcvalue element="language" qualifier="iso">en</dcvalue>
    $creatorxml
</dublin_core>
xml
    close($fh);

    # create contents file and move files to the load set.
    # copy item files into the load set.
    if (defined($abstract) && length($abstract) > 0) {
        system "cp $indir/$abstract $basedir/$subdir";
    }
    $sourcedir = substr($abstract, 0, 5);
    if (defined($ppt) && length($ppt) > 0) {
        system "cp $indir/$sourcedir/$sourcedir/*.* $basedir/$subdir/";
    }
    if (defined($gif) && length($gif) > 0) {
        system "cp $indir/$sourcedir/$imagesubdir/*.* $basedir/$subdir/";
    }

    # make the 'contents' file and fill it with the file names.
import.sh

#!/bin/sh
#
# import a collection from files generated on dspace
#
collection_id=1811/6635
eperson=[name removed]@osu.edu
source_dir=./2009xml
base_id=`basename $collection_id`
mapfile=./map-dspace03-mss2009.$base_id
/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add \
    --eperson=$eperson --collection=$collection_id \
    --source=$source_dir --mapfile=$mapfile

extending im beyond the reference desk: a case study on the integration of chat reference and library-wide instant messaging network

ian chan, pearl ly, and yvonne meulemans

information technology and libraries | september 2012

abstract

openfire is an open-source instant messaging (im) network and a single unified application that meets the needs of chat reference and internal communication. in fall 2009, the california state university san marcos (csusm) library began using openfire and other jive software im technologies to simultaneously improve our existing im-integrated chat reference software and implement an internal im network. this case study describes the chat reference and internal communications environment at the csusm library and the selection, implementation, and evaluation of openfire. in addition, the authors discuss the benefits of deploying an integrated im and chat reference network.

introduction

instant messaging (im) has become a prevalent contact point for library patrons to get information and reference help, commonly known as chat reference or virtual reference. however, im can also offer a unique method of communication between library staff. librarians are able to rapidly exchange information synchronously or asynchronously in an informal way. im provides another means of building relationships within the library organization and can improve teamwork.

many different chat-reference software packages are widely used by libraries, including questionpoint, meebo, and libraryh3lp. less commonly used is openfire (www.igniterealtime.org/projects/openfire), an open-source im network and a single unified application that uses the extensible messaging and presence protocol (xmpp), a widely adopted open protocol for im. since 2009, the california state university san marcos (csusm) kellogg library has used openfire for chat reference and internal im communication. openfire was relatively easy for the web development librarian to set up and administer, and librarians and library users have found the im interface to be intuitive. in addition to helpful chat reference features such as statistics capture, queues, transfers, and linking to meebo widgets, openfire offers the unique capability to host an internal im network within the library.

ian chan (ichan@csusm.edu) is web development librarian, california state university san marcos; pearl ly (pmly@pasadena.edu) is access services & emerging technologies librarian, pasadena community college, pasadena; and yvonne meulemans (ymeulema@csusm.edu) is information literacy program coordinator, california state university san marcos, california.

in this article, the authors present a literature review on im as a workplace communication tool and its successful use in libraries for chat reference services. a case study on the selection, implementation, and evaluation of openfire for use in chat reference and as an internal network will be discussed.
in addition, survey results on library staff use of the internal im network and its implications for collaboration and increased communication are shared.

literature review

although there is a great deal of literature on im for library reference services, publications on the use of im in libraries for internal communication are scarce; a review of library and information science (lis) literature revealed very limited work on this aspect of instant messaging. however, a wider literature review in the fields of communications, computer science, and business indicates there is growing interest in studying the benefits of im within organizations.

instant messaging in the workplace

in the workplace, im can offer a cost-effective means of connecting in real time and may increase communication effectiveness between employees. it offers a number of advantages over email, telephone, and face-to-face communication that we discuss further in the following section. within the academic library, im offers the possibility not only of improving access to librarians for research help but also of enhancing communication and collaboration throughout the entire organization.

research findings indicate that im allows coworkers to maintain a sense of connection and context that is different from email, face-to-face (ftf), and phone conversations.1 each im conversation is designed to display as a single textual thread with one window per conversation. the contributions from each person in the discussion are clearly indicated, and it is easy to review what has been said. this design supports the intermittent resumption of a conversation; in contrast to email, "intermittent instant messages were thought to be more immersive and to give more of a sense of a shared space and context than such email exchanges."2 through the use of im, coworkers gain a highly interactive channel of communication that is not available via other methods of communication.3

phone and ftf conversations are two of the most common forms of interruption within the workplace.4 however, garrett and danziger found that "instant messaging in the workplace simultaneously promotes more frequent communications and reduces interruptions."5 participants reported they were better able to manage disruptions using im and that im did not increase their communication time. the findings of this study revealed that some communication that otherwise may have occurred over email, by telephone, or in person was instead delivered via im. this likely contributed to the reduced interruptions because im does not require full and immediate attention, unlike a phone call or face-to-face communication. in addition, im study participants reported the ability to negotiate their availability by postponing conversations, and these findings support earlier studies suggesting im is less intrusive than traditional communication methods for determining availability of coworkers.6

a number of research studies show that im improves teamwork and is useful for discussing complex tasks. huang, hung, and yen compared the effectiveness of email and im for generating new ideas; they found that groups utilizing im generated more ideas than the email groups.7 they suggested that the spontaneous and rapid interchanges typical of im facilitate brainstorming between team members.
the information that is uniquely visible through im and the ease of sending messages help create opportunities for spontaneous dialog. this is supported by a study by quan-haase, cothrel, and wellman, which found im promotes team interaction by indicating the likelihood of a faster response.8 ou et al. also suggest im has the "potential to empower teamwork by establishing social networks and facilitating knowledge sharing among organizational members."9

im can enhance the social connectedness of coworkers through its focus on contact lists and instant, opportunistic interactivity. the informal and personalized nature of im allows workers to build relationships while promoting the sharing of information. cho, trier, and kim suggest that the use of im as a communication tool encourages unplanned virtual hallway discussions that would otherwise be difficult for those located in different parts of a building, on different campuses, or in remote locations.10 im can build relationships between teams and organizations whose members are in physically separated locations. however, cho, trier, and kim also note that im is more successful in building relationships between coworkers who already have an existing relationship. wu et al. argue that by helping to build the social network within the organization, instant messaging can contribute to increased productivity.11

several studies have cautioned that im, like other forms of communication, requires organizational guidelines on usage and best practices. mahatanankoon suggests that productivity or job satisfaction may decrease without policies and workplace norms that guide im use.12 other research indicates that personality, employee status, and working style may affect the usefulness of im for individual employees.13 some workers may find the multitasking nature of im works in their favor, while those who prefer sequential task completion may find im disruptive. the hierarchy of work relationships and the nature of managerial styles are likely to have an impact on the use of im as well.

while there are no research findings associated with the use of im for internal communication within libraries, there are articles encouraging its use. breeding writes of the potential for im to bring about "a level of collaboration that only rarely occurs with the store-and-forward model of traditional e-mail."14 fink provides a concise introduction to the advantages of using internal im for communication between library staff.15 in addition, he provides an overview of the implementation and success of the openfire-based im network at mcmaster university.

success of chat reference in libraries

im gives libraries the means to more easily offer low-cost delivery of synchronous, real-time research assistance to their users, commonly referred to as "chat reference." although libraries have used im for the last decade and many currently subscribe to questionpoint, a collaborative virtual reference service through oclc, two newer online services helped propel the growth of im-based chat reference. first available in 2006, the web-based meebo (www.meebo.com) made it much easier to use im for localized chat reference because library patrons were no longer required to have accounts on a proprietary network, such as aol or yahoo, to communicate with librarians.16 instead, meebo provided web widgets that allowed users to chat via the web browser.
libraries could easily embed these widgets throughout their websites, and, unlike questionpoint, meebo is free and does not require a subscription. librarians could answer questions using either their account on meebo's website or a locally installed instant messaging client. in comparison to im-based chat reference, a number of libraries found questionpoint difficult to use due to its complexity and awkward interface.17 in 2008, libraryh3lp (http://libraryh3lp.com) pushed the growth of im-based chat reference even further because it offered a low-cost, library-specific service that required little technical expertise to implement and operate. libraryh3lp improved on the meebo model by adding features such as queues, multi-user accounts, and assessment tools.18

im adds a more informal means of interaction that helps librarians build relationships with their users. several recent studies have shown that users respond positively to the use of im for chat reference. the illinois state university milner library found that switching from its older chat reference software to im increased transactions by 161 percent within one year.19 with the introduction of web-based im widgets, pennsylvania state university library's im-based chat reference grew from 20 percent to 60 percent of all virtual reference (vr), which includes email reference, in one year.20 a 2010 study of vr and im service at the university of guelph library found 71 percent user satisfaction with im compared to 70 percent satisfaction with vr overall.21 im use in academic libraries has become ubiquitous, and other types of libraries also use im to communicate with library patrons.

case study

california state university, san marcos (csusm) is a mid-size public university with approximately 9,500 students. csusm is a commuter campus, with the majority of students living in north county san diego, and offers many online or distance courses at satellite campuses. the csusm kellogg library has a robust chat reference service that is used by students on and off campus. the library has about forty-five employees, including librarians, library administrators, and library assistants. the following sections discuss the meebo chat reference pilot, the selection of openfire to replace meebo, the implementation and customization of openfire, and the evaluation of openfire for chat reference by librarians and as an internal network for all library personnel.

meebo chat reference pilot

to examine the feasibility of using im for chat reference at csusm, the reference librarians initiated a pilot program using meebo (2008–9). a meebo widget was placed on the library's homepage, the ask a librarian page, and library research guides. within the first year of the pilot project, chat reference grew to more than 41 percent of all reference transactions.22 based on responses to user satisfaction surveys, 85 percent indicated they would recommend chat reference to other students, and 69 percent said they preferred it to other forms of reference services. chat reference is now an integral part of the library's research assistance program, and im has become a permanent access point for students to contact reference librarians.
although the new im service was successful, the pilot program uncovered a number of key shortcomings with meebo when used for chat reference; these shortcomings are documented in a case study by meulemans et al.23 these findings matched problems reported by other libraries that used meebo in their reference services.24 meebo is best suited to individual users who communicate one-to-one via im. for example, meebo chat widgets are specific to each meebo user, and it is not possible to share a single widget between multiple librarians. in addition, features such as message queues and message transfers, invaluable for managing a heavily used chat reference service, are not available in meebo. those features are essential for working with multiple, simultaneous incoming im messages, a common occurrence in virtual reference. other shortcomings included the lack of built-in transcript retention and the lack of automated usage statistics.25

selecting openfire

based on the need for a more robust chat reference system, the csusm reference librarians and the web development librarian explored other im options, especially open-source software. the web development librarian had previous experience using openfire for an internal library im network at the university of alaska anchorage and investigated its capabilities to replace meebo as a chat reference tool. the desire to replace meebo for chat reference at csusm also provided the opportunity to pilot an internal im network. openfire, part of the suite of open-source instant messaging tools from jive software, was the only application that could easily fulfill both roles, and it offered a number of features that made it highly preferable when compared to other im-based chat reference systems. of its many features, one of the most valuable was the integration between openfire user accounts and our campus email system. being able to tap into the university's email system meant automated configuration and updating of all staff accounts and contact lists. this removed the burden of individual account maintenance associated with external services such as meebo, libraryh3lp, and questionpoint. openfire supports internal im networks at educational institutions such as the university of pennsylvania, central michigan university, and the university of california, san francisco.

openfire could meet our im chat reference needs because it includes the fastpath plugin, a complete web-based chat management system available at www.igniterealtime.org/projects/openfire/plugins.jsp. this robust system incorporates important features such as message queues, message transfer, statistics, and canned messages. james cook university library in australia also chose openfire with the fastpath plugin as its chat reference solution based on its need for those features.26 other institutions using fastpath and openfire for chat reference or support include the university of texas, the oregon/ohio multistate virtual reference consortium, mozilla.com, and the university of wisconsin.

when reviewing chat reference solutions, we considered the possibility of using chat modules available through drupal (http://drupal.org), the web content management system (cms) for our library website. the primary advantage of that option was complete integration with the library website and intranet.
further analysis of the drupal option revealed that the available chat modules were too basic for our needs and that reconfiguring our intranet and website to incorporate a workable chat reference system would require extensive time. in comparison to the implementation time associated with deploying the openfire system, using drupal-based chat modules did not provide a favorable cost-benefit ratio.

while the proprietary libraryh3lp offered similar functionality for chat reference, its inability to integrate with our email system was clearly a deficit when compared to openfire. in libraryh3lp, it is necessary to create accounts for all library personnel involved in chat reference. fastpath does not have that requirement if you integrate openfire with your organization's lightweight directory access protocol (ldap) directory; instead, the system will automatically create accounts for all library staff. furthermore, the administrative options and interface of libraryh3lp did not compare favorably with those of fastpath. the fastpath interface for assigning users is more intuitive, and the system generates a customizable chat initiation form for each workgroup (figures 1 and 2). oregon's l-net and ohio's knowitnow24x7 offer information about software requirements and an online demonstration of spark/fastpath.27

figure 1. fastpath chat initiation form for csusm research help desk

figure 2. fastpath chat initiation form for csusm media library

for our requirements, openfire was clearly superior to the available systems for chat reference. its relatively simple deployment requirements and ease of setup helped make it our first choice for building a combined im network and chat reference system. in the following sections, we discuss the installation, customization, and assessment of our openfire implementation.

openfire installation and configuration

the openfire application is a free download from ignite realtime, a community of jive software. the program will run on any server with a windows, linux, or macintosh operating system. if configured as a self-contained application, openfire requires only that java be available on your server. installation of the software is an automated process, and system configuration is handled through a web-based setup guide. after the initial language selection form, the next step in the server configuration process is to enter the web server url and the ports through which the server will communicate with the outside world (figure 3). the third step provides fields for selecting the type of database to use with openfire and for entering any information relating to your selection (figure 4).

figure 3. openfire server settings screen

figure 4. openfire database configuration form

openfire uses a database to store information such as im network settings, user account information, and transcripts. database options include using an embedded database or connecting to an external database server. using the embedded database is the simpler option and is helpful if you do not have access to a database server. connecting to an external database server offers more control of the data generated by openfire and provides additional backup options. openfire works with a number of the more commonly used database servers, such as mysql, postgresql, and microsoft sql server; in addition, oracle and ibm's db2 are options with additional free plugins from those vendors. we chose to use mysql because of our experience using it with other library web applications. if using the external database option, creating and configuring the external database before installing openfire is highly recommended.
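as an illustration of that preparation step, a short perl dbi script along the following lines could create an empty database and a dedicated account for openfire to use; the host name, database name, and credentials here are invented for the example, and any mysql client could issue the same statements.

#!/usr/bin/perl
# prepare an empty mysql database for openfire before running its setup guide.
# host, database, and account names below are illustrative only.
use strict;
use warnings;
use DBI;

# connect to the mysql server as an administrative user.
my $dbh = DBI->connect("DBI:mysql:host=localhost", "root", "admin-password",
                       { RaiseError => 1 });

# create the database openfire will use.
$dbh->do("CREATE DATABASE openfire CHARACTER SET utf8");

# create a dedicated account and grant it rights on that database only.
$dbh->do(q{CREATE USER 'openfire'@'localhost' IDENTIFIED BY 'change-me'});
$dbh->do(q{GRANT ALL PRIVILEGES ON openfire.* TO 'openfire'@'localhost'});

$dbh->disconnect;

the openfire setup guide then needs only the database name and the account credentials on the form shown in figure 4.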
after choosing a database, the openfire configuration requires the selection of an authentication method for user accounts. one option is to use openfire's internal authentication system. while the internal system is robust, it requires additional administrative support to manage the process of creating and maintaining user accounts. the recommended option is to connect openfire with your organization's lightweight directory access protocol (ldap) directory (figure 5). ldap is a protocol that allows external systems to interact with the user information stored in an organization's email system. using ldap with openfire is highly preferable because it simplifies access for your librarians and staff by automatically creating user accounts based on the information in your organization's email system. library staff simply log in with their work email or network account information; they are not required to create a new username and password.

figure 5. openfire ldap configuration form
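the account lookups behind this integration are ordinary ldap searches that openfire performs internally; as a rough sketch of the kind of query involved, using perl's net::ldap module (the directory host, bind dn, search base, and account id are all hypothetical):

#!/usr/bin/perl
# illustration of the kind of ldap lookup an openfire/ldap integration relies on.
# host, dn, base, and uid values are hypothetical.
use strict;
use warnings;
use Net::LDAP;

my $ldap = Net::LDAP->new('ldap.example.edu') or die "$@";

# bind with a read-only service account.
my $mesg = $ldap->bind('cn=openfire,ou=services,dc=example,dc=edu',
                       password => 'change-me');
die $mesg->error if $mesg->code;

# look up a staff member by the id portion of their email address.
$mesg = $ldap->search(
    base   => 'ou=people,dc=example,dc=edu',
    filter => '(uid=jsmith)',
    attrs  => [ 'cn', 'mail' ],
);
foreach my $entry ($mesg->entries) {
    printf "%s <%s>\n", $entry->get_value('cn'), $entry->get_value('mail');
}
$ldap->unbind;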
the last step in the configuration process is to grant system administrator access to the appropriate users. if using the ldap authentication method, you are able to select one or more users in your organization by entering their email id (the portion before the at sign). the selected users will have complete access to all aspects of the openfire server. once the setup and configuration process is complete, the server is ready to accept im connections and route messages. reviewing the settings and options within the openfire system administration area is highly recommended; most libraries will likely want to adjust the configurations within the sections for server settings and archives.

connecting the im network

the second phase of the implementation process connected our library personnel with the im network using im software installed on their workstations. the openfire im server works with any multiprotocol im client ("multiprotocol" refers to support for simultaneous connections to multiple im networks) that provides options for configuring an xmpp or jabber account. some of the more popular im clients that offer this functionality include spark, trillian, miranda, and pidgin. based on our chat reference requirements, we chose to use spark (www.igniterealtime.org/projects/spark), an im client designed to work specifically with the fastpath web chat service. spark comes with a fastpath plugin that enables users to receive and send messages to anyone communicating through the web-based fastpath chat widgets (more information on fastpath configuration is in the next section of this article). this plugin provides a tab for logging into a fastpath group and for viewing the status of the group's message queues (figure 6). spark also includes many of the features offered by other im clients, including built-in screen capture, message transfer, and group chat.

figure 6. the fastpath plugin for spark
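because openfire speaks standard xmpp, the network is scriptable as well: any xmpp library can log in with a staff account and deliver a message exactly as a desktop client would. a minimal sketch with perl's net::xmpp module follows; the server host, account names, and message text are invented for the example.

#!/usr/bin/perl
# send a single instant message through an openfire server over plain xmpp.
# host and account names are invented for the example.
use strict;
use warnings;
use Net::XMPP;

my $client = Net::XMPP::Client->new();
my $status = $client->Connect(hostname => 'im.library.example.edu',
                              port     => 5222);
die "cannot connect to server\n" unless defined $status;

# authenticate as an account the server already knows (e.g., via ldap).
my @auth = $client->AuthSend(username => 'jsmith',
                             password => 'change-me',
                             resource => 'script');
die "authentication failed: $auth[1]\n" unless $auth[0] eq 'ok';

# deliver a message to a colleague on the same server.
$client->MessageSend(to   => 'mjones@im.library.example.edu',
                     body => 'the new load set is ready for review.');
$client->Disconnect();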
library personnel were able to install spark on their own by downloading it from the ignite software website and launching the software's installation package. the installation process is very simple, and user-specific information is required only when spark is started for the first time. the fields required for login are the username and password of the user's organizational email account and the address of the im server. as part of our implementation process, we also provided library staff with recommendations regarding the selection and configuration of optional settings that might enhance their im experience. recommendations included auto-starting spark when logging in to the computer and activating incoming message signals, such as sound effects and pop-ups. on our openfire server, we had also installed the kraken gateway (http://kraken.blathersource.org) plugin to enable connections to external im networks. the gateway plugin works with spark to integrate library staff accounts on chat networks such as google talk, facebook, and msn (an example of integrated networks is shown in figure 6). by integrating meebo as well, librarians were able to continue using the meebo widgets they had embedded into their research guides and faculty profile pages. this allowed them to use spark to receive im messages rather than logging on to the meebo website.

configuring the fastpath plugin for chat reference

a primary motivation for using openfire was the feature set available in the fastpath plugin. fastpath is a complete chat messaging system that includes workgroups, queues, chat widgets, and reporting. fastpath actually consists of two plugins that work together: fastpath service, for managing the chat system, and fastpath webchat, for web-based chat widgets. both plugins are available as free downloads from the openfire plugins section of the ignite software website—www.igniterealtime.org/projects/openfire/plugins.jsp. to install fastpath, upload the packages using the form in the plugins section of the openfire administrative interface. the plugins will automatically install and add a fastpath tab to the administrative main menu. the first step in getting started with the system is to create a workgroup and add members (figure 7). within each new workgroup, one or more queues are required to process and route incoming requests, and each queue requires at least one "agent." in fastpath, the term agent refers to those who will receive the incoming chat requests.

figure 7. workgroup setup form in fastpath

as workgroups are created, the system automatically generates a chat initiation form, which by default includes fields for name, email, and question. administrators can remove, modify, and add any combination of field types, including text fields, dropdown menus, multiline text areas, radio buttons, and check boxes. you may also configure the chat initiation form to require completion of some, all, or none of the fields. at csusm, our form (figures 1 and 2) includes fields for name and email, a dropdown menu for selecting the topic area of the user's research, and a field for the user to enter their question. the information in these fields allows us to quickly route incoming questions to the appropriate subject librarian. fastpath includes the ability to create routing rules that use the values submitted in the form to send messages to specific queues within a workgroup. in the future, we may use the dropdown menu to automatically route questions to the subject specialist based on the student's topic.

there are two methods to make the fastpath chat widget available to the public. the standard approach embeds a presence icon on your webpage and provides automatic status updates; clicking on the icon displays the chat initiation form. for our needs, we chose to embed the chat initiation form itself in our webpages (see appendix b for sample code). when the user submits the form, openfire routes the message to the next available librarian. on the librarian's computer, the spark program plays a notification sound and displays a pop-up dialog. the pop-up dialog remains open until the librarian accepts the message, passes it on, or the time limit for acceptance is reached, in which case the message returns to the queue for the next available librarian.
evaluation of openfire for enhanced chat reference

the csusm reference librarians found fastpath and openfire to be much more robust than meebo for chat reference. the ability to keep chat transcripts and to retain metadata such as time stamps, duration of chats, and topic of research for each conversation is very helpful for analyzing the effectiveness of chat research assistance and for statistical reporting. the automated recording of transcripts and metadata saved time compared to meebo, where transcripts were manually copied into a microsoft word document and tracking statistics of im interactions were kept in a shared excel spreadsheet. other useful features of fastpath were the capability of transferring patrons to other librarians and of having more than one librarian monitor incoming questions. furthermore, access to the database holding the fastpath data allowed us to build an intranet page to monitor real-time incoming im messages and their responses.

however, some issues were encountered with the fastpath plugin when initiating chat connections. we experienced intermittent, random instances of dropped im connections and lost messages. while many of these lost connections were likely the result of user actions (accidentally closing the chat pop-up, walking away from the computer, etc.), others appear to have been due to problematic connections between the server and the user's browser. to address these issues, we now ask users to provide their email when they initiate a chat session. with user emails and our real-time chat monitoring system, we are able to follow up with reference patrons who experience im connection issues and provide research assistance via email.

evaluation of openfire as an internal communication tool

while the adoption of im as an internal communication tool was highly encouraged, its use was not mandatory for all library personnel. based on the varied technical backgrounds of our staff and librarians, we recognized that some might find im difficult to integrate within their workflow or communication style, and we chose a soft launch for our network. in summer 2011, we conducted a survey of csusm library personnel (44 respondents, 99 percent of total staff) to evaluate im as an internal communication tool (see appendix a for the survey questions). we found that 59 percent of staff use the internal im network, while 85 percent use some type of im or web-based chat for work. of those who use internal im, 30 percent used it daily. while the survey was anonymous, anecdotal discussions indicate adoption rates are higher among library units where the work is technically oriented or instructional in nature, such as library systems and the information literacy program/reference. among the respondents who use im, 45 percent indicated they use it because it allows quick communication with others in the library, and 39 percent like its informal nature. twenty percent of total respondents preferred im to email and phone communications. two respondents use the internal im network but were dissatisfied with it and indicated it did not work well, while one found it too difficult to use.

an additional survey question was geared toward staff members who do not use the internal im network at all ("why do you not use the library im network?"). this question was designed to find areas of possible improvement within our system to encourage greater use. survey respondents were allowed to select more than one reason. the most common reasons given by those who do not use the library im network were that they don't feel the need (34 percent of nonusers), that they mainly communicate with staff members who are also not on the im network (18 percent), that im does not work for their communication style (14 percent), and privacy concerns (14 percent). we believe more in-depth analysis is necessary to learn more regarding the perceived usefulness of im within our organization and to further its adoption.

conclusion

through additional training and user education, we hope to promote greater use of the openfire internal im network among those who work in the library. while 100 percent adoption of im as a communication tool is not a stated goal of our project, we believe that some staff have not realized the full potential of im for collaboration and productivity due to a lack of experience with this technology. in hindsight, additional training sessions beyond the initial introductory workshop to set up the spark im client may have increased the usage of im by staff. for example, providing more information on the library's policies regarding internal im tracking and the configuration of our system may have alleviated concerns regarding privacy. in addition, we need to lead more discussions on the benefits of im for collaboration, lowering disruptions, and increasing effectiveness in the workplace.

openfire and fastpath have brought many features that were previously unavailable to chat reference at csusm. the addition of queues, message transfer, and transcripts has enhanced the effectiveness of this service and eased its management. compared to the prior chat reference implementations that used questionpoint and meebo, this new system is more user friendly and robust. furthermore, the internal im network and its connection to web-based chat widgets offer the opportunity to build a library that is more open to its users. library users could feasibly contact any library staff member, not just reference librarians, via im for help. we are testing this concept with a pilot project involving the csusm media library, which is staffing its own chat workgroup; a chat widget is now available on its website. in the future, we also hope to employ a chat widget for circulation and ill services, another public services area that frequently works with library users.
it is important to note that the success of openfire and im in the library attracted the attention of other csusm instructional and student support areas. in spring 2011, instructional and information technology services (iits), which provides campus-wide technology services for faculty, staff, and students, piloted an openfire-based im helpdesk service to assist users with technology questions and problems. as of fall 2011, the "ask an it technician" service is fully implemented and available on all campus webpages. discussions on the adoption of im for other campus student services, such as financial aid and counseling, have also occurred. in addition to being a contact point for students, im has the potential to improve internal communication within the organization.

references

1. hee-kyung cho, matthias trier, and eunhee kim, "the use of instant messaging in working relationship development: a case study," journal of computer-mediated communication 10, no. 4 (2005), http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2005.tb00280.x/full (accessed aug. 1, 2011).

2. bonnie a. nardi, steven whittaker, and erin bradner, "interaction and outeraction: instant messaging in action," in proceedings of the 2000 acm conference on computer supported cooperative work (new york: acm press, 2000), 79–88.

3. ellen isaacs et al., "the character, functions, and styles of instant messaging in the workplace," in proceedings of the 2002 acm conference on computer supported cooperative work (new york: acm press, 2002), 11–20.

4. victor m. gonzález and gloria mark, "constant, constant, multi-tasking craziness: managing multiple working spheres," in proceedings of the sigchi conference on human factors in computing systems (new york: acm press, 2004), 113–20.

5. r. kelly garrett and james n. danziger, "im = interruption management? instant messaging and disruption in the workplace," journal of computer-mediated communication 13, no. 1 (2007), http://jcmc.indiana.edu/vol13/issue1/garrett.html (accessed jun. 15, 2011).

6. nardi, whittaker, and bradner, "interaction and outeraction," 83.

7. albert h. huang, shin-yuan hung, and david c. yen, "an exploratory investigation of two internet-based communication modes," computer standards & interfaces 29, no. 2 (2006): 238–43.

8. anabel quan-haase, joseph cothrel, and barry wellman, "instant messaging for collaboration: a case study of a high-tech firm," journal of computer-mediated communication 10, no. 4 (2005), http://jcmc.indiana.edu/vol10/issue4/quan-haase.html (accessed jun. 12, 2011).

9. carol x. j. ou et al., "empowering employees through instant messaging," information technology & people 23, no. 2 (2010): 193–211.

10. cho, trier, and kim, "the use of instant messaging in working relationship development."

11. lynn wu et al., "value of social network—a large-scale analysis on network structure impact to financial revenue of information technology consultants" (paper presented at the winter information systems conference, salt lake city, ut, feb. 5, 2009).

12. pruthikrai mahatanankoon, "exploring the impact of instant messaging on job satisfaction and creativity," conf-irm 2010 proceedings (2010).

13. ashish gupta and han li, "understanding the impact of instant messaging (im) on subjective task complexity and user satisfaction," pacis 2009 proceedings,
paper 10, http://aisel.aisnet.org/pacis2009/1; and stephanie l. woerner, joanne yates, and wanda j. orlikowski, "conversational coherence in instant messaging and getting work done," in proceedings of the 40th annual hawaii international conference on system sciences (2007), http://www.computer.org/portal/web/csdl/doi/10.1109/hicss.2007.152.

14. marshall breeding, "instant messaging: it's not just for kids anymore," computers in libraries 23, no. 10 (2003): 38–40.

15. john fink, "using a local chat server in your library," feliciter 56, no. 5 (2010): 202–3.

16. william breitbach, matthew mallard, and robert sage, "using meebo's embedded im for academic reference services: a case study," reference services review 37, no. 1 (2009): 83–98.

17. cathy carpenter and crystal renfro, "twelve years of online reference services at georgia tech: where we have been and where we are going," georgia library quarterly 44, no. 2 (2007), http://digitalcommons.kennesaw.edu/glq/vol44/iss2/3 (accessed aug. 25, 2011); and danielle theiss-white et al., "im'ing overload: libraryh3lp to the rescue," library hi tech news 26, no. 1/2 (2009): 12–17.

18. theiss-white et al., "im'ing overload," 12–17.

19. sharon naylor, "why isn't our chat reference used more?" reference & user services quarterly 47, no. 4 (2008): 342–54.

20. sam stormont, "becoming embedded: incorporating instant messaging and the ongoing evolution of a virtual reference service," public services quarterly 6, no. 4 (2010): 343–59.

21. lorna rourke and pascal lupien, "learning from chatting: how our virtual reference questions are giving us answers," evidence based library & information practice 5, no. 2 (2010): 63–74.

22. pearl ly and allison carr, "do u im?: using evidence to inform decisions about instant messaging in library reference services" (poster presented at the 5th evidence based library and information practice conference, stockholm, sweden, june 29, 2009), http://blogs.kib.ki.se/eblip5/posters/ly_carr_poster.pdf (accessed aug. 1, 2011).

23. yvonne nalani meulemans, allison carr, and pearl ly, "from a distance: robust reference service via instant messaging," journal of library & information services in distance learning 4, no. 1 (2010): 3–17.

24. theiss-white et al., "im'ing overload," 12–17.

25. meulemans, carr, and ly, "from a distance," 14–15.

26. nicole johnston, "improving the reference and information experience of students in regional areas—does an instant messaging service make a difference?" (paper presented at the 4th alia new librarians symposium, melbourne, australia, december 5–6, 2008), http://eprints.jcu.edu.au/2076 (accessed aug. 17, 2011); and alan cockerill, "open source for im reference: openfire, fastpath and spark" (workshop presented at fair shake of the open source bottle, griffith university, queensland college of art, brisbane, australia, november 20, 2009), http://www.quloc.org.au/download.php?doc_id=6932&site_id=255 (accessed aug. 4, 2011).

27. oregon state multistate collaboration, "multi-state collaboration: home," http://www.oregonlibraries.net/multi-state (accessed aug. 16, 2011).
appendix a. library instant messaging (im) usage survey

the information you submit is confidential. your name and campus id are not included with your response.

which of the following do you use . . . (answered separately for work and for personal use)
● library's im network (spark)
● meebo
● msn
● yahoo
● gtalk
● facebook or other website-specific chat system
● im app on my phone
● trillian, pidgin, or other im aggregator
● skype
● i don't use im or web-based chat
● other
if you selected other, please describe: ____________________________________________________________________

on average, how often do you communicate via im or web-based chat at work?
● several times a day
● almost daily
● several times a week
● several times a month
● never

how often do you use im or web-based chat to . . . (rated on a five-point scale: 5—often, 3—sometimes, 1—never)
● discuss a work-related topic
● socialize with a co-worker
● answer questions from library users
● talk about a non-work-related topic
● request tech support
● other
if you selected other, please describe: ____________________________________________________________________

if you use im to communicate at work, what do you like about it?
● allows for quick communication with others in the library
● facilitates informal conversation
● students like to use it to ask library related questions
● i prefer im over phone or email
● other:

why do you not use the library im network?
● don't feel the need
● the people i usually talk to aren't on it
● does not work well
● never get around to it . . . but would like to
● it doesn't work for my communication style
● the system is too difficult to use
● privacy concerns
● other:

additional comments? ____________________________________________________________________

appendix b. iframe code for embedding fastpath chat widget

management and support of shared integrated library systems

jason vaughan and kristen costello

information technology and libraries | june 2011

the university of nevada, las vegas (unlv) university libraries has hosted and managed a shared integrated library system (ils) since 1989. the system and the number of partner libraries sharing the system have grown significantly over the past two decades. spurred by the level of involvement and support contributed by the host institution, the authors administered a comprehensive survey to current innovative interfaces libraries. research findings are combined with a description of unlv's local practices to provide substantial insights into shared funding, support, and management activities associated with shared systems.

jason vaughan (jason.vaughan@unlv.edu) is director, library technologies, university of nevada las vegas. kristen costello (kristen.costello@unlv.edu) is systems librarian, university of nevada las vegas.
since 1989, the university of nevada, las vegas university libraries has hosted and managed a shared integrated library system (ils). currently, partners include the university of nevada, las vegas university libraries (consisting of one main and three branch libraries, and hereafter referred to as unlv libraries); the administratively separate unlv law library; the college of southern nevada (a community college system consisting of three branch libraries); nevada state college; and the desert research institute. the original ils installation included just the unlv libraries and the clark county community college (now known as the college of southern nevada). the desert research institute joined in the early 1990s, the unlv law library joined with the establishment of the william s. boyd school of law in 1998, and, finally, nevada state college joined upon its creation in 2002.

over time, the technological underpinnings of the ils have changed tremendously and have migrated firmly into a web-based environment unknown in 1989. the system was migrated to innovative interfaces' current java-based platform, millennium, beginning in 1999. since the original installation, there have been three major full hardware migrations, in 1997, 2002, and 2009. over time, regular innovative software updates, as well as additional purchased software modules, have greatly extended both the staff and end user functionality of the ils. in early 2001, unlv and its partners conducted a marketplace assessment of ils vendors catering to academic customers.1 the assessment reaffirmed the consortium's commitment to innovative interfaces. shortly thereafter, the second major hardware migration occurred, and an initial memorandum of understanding (mou) was drafted by the unlv libraries. this mou is still used by the libraries. the mou was discussed with all partners and ultimately signed by the director of each library.

since the mou was signed nearly a decade ago, the system has continued to grow by all measures—size of the database, number of users, number of software modules comprising the complete system, and the financial and staff commitment toward support and maintenance. despite the emergence of a large number of other network-based technologies critical to library operations and services, the ils remains a critical system that supports many library operations. the research described in this paper developed in part because there is a dearth of published survey-based research on shared ils management and financial support. this article interweaves local existing practices with research findings. for brevity's sake, the system shared by the unlv university libraries and four additional partners will be referred to as unlv's system. to provide a relative sense of the footprint of each partner on the system, various measures can be used (see figure 1).

figure 1. various measures of ils footprints for unlv's shared ils (percentage of overall system)

partner / full-time library staff / bibliographic records / item records / order records / patron records / staff login licenses
unlv libraries / 105 (70.9%) / 1,494,890 (78.2%) / 1,906,225 (81.1%) / 74,223 (58.4%) / 40,788 (59.6%) / 85 (69.1%)
unlv law library / 13 (8.8%) / 246,678 (12.9%) / 243,788 (10.4%) / 29,921 (23.5%) / 2,034 (3%) / 13 (10.6%)
college of southern nevada / 27 (18.2%) / 146,118 (7.6%) / 175,862 (7.5%) / 22,142 (17.4%) / 23,876 (34.9%) / 20 (16.3%)
nevada state college / 1 (.7%) / 17,787 (.9%) / 17,979 (.8%) / 841 (.7%) / 1,718 (2.5%) / 3 (2.4%)
desert research institute / 2 (1.4%) / 5,396 (.3%) / 5,361 (.2%) / 0 (0%) / 24 (<.1%) / 2 (1.6%)

note: "staff login licenses" refers to the number of simultaneous staff users each institution can have on the system at any given time.
■■ survey method

in april 2010, the authors administered a 20-question survey to the innovative users group (iug) via the group's listserv. the survey focused on libraries that are part of a consortial or otherwise shared innovative ils. the innovative users group is the primary users group associated with the innovative ils and suite of products. the iug hosts a busy listserv, coordinates the annual north american conference devoted solely to the innovative system, and provides innovative with customer-driven enhancement requests. to prevent multiple individuals from the same consortium responding to the survey, instructions indicated that only one individual from the main institution hosting the system should officially respond. given the anonymity of the survey and the desire to provide confidentiality, there is the possibility that some survey responses refer to the same system. the survey consisted primarily of multiple choice, "select all that apply," and free-text response questions. the survey was divided into four broad topical areas: (1) background information; (2) funding; (3) support; and (4) training, professional development, and planning. the survey was open for a period of three weeks. because respondents could choose to skip questions, the number of responses received per question varied; on average, 43 individual responses were received for each question. innovative currently has more than 1,200 millennium ils installations.2 not all of those installations support multiple, administratively separate library entities, and it is unknown how many shared innovative library systems exist. while a true response rate cannot be determined, such a measure is not critical for this research. the survey questions with summarized results are provided in appendix a.
■■ survey background

unlv's system, with only five unique library entities, is a "small" system when compared with survey responses. survey respondents indicated a range of 2 to 80 unique members sharing their system. of the 48 responses received for this background question, 26 (54 percent) indicated 10 or fewer partners on the system. seven (14.6 percent) indicated 40 or more partners. the average number of partners sharing an ils implementation was 18, and the median was 8.5. there can be varying levels of partnership within a shared ils system. unlv's instance is a rather informal partnership; some survey respondents indicated the existence of a far more structured or dedicated support group not directly associated with any particular library. one respondent noted they have a central office, comprised of an executive director and two additional staff, responsible for ils administration; this central office reports to a board of directors comprised of the library directors of each member library. another indicated they have a central office responsible not only for the ils but also for other things such as wide and local area networks and workstation support. one respondent indicated that they are actually a consortium of consortia, with 9 hosts each comprised of anywhere from 4 to 11 libraries.

twenty-three respondents out of 52 (44.2 percent) indicated that they and all of their current existing partners originally purchased the system together; 20 (38.5 percent) indicated they purchased the system with some of their current existing partners, while 9 (17.3 percent) indicated that they, as the main institution, originally and solely purchased the system. several of the entities sharing the unlv libraries' system did not even exist when the ils was originally purchased; only two of the current partners shared the original purchase cost of the system.

another background question sought to understand how partners potentially individualize the system despite being on a shared platform. innovative, and likely other similar ils vendors, offers several products to help libraries better manage and control their holdings and acquisitions. of potential benefit to staff operations and workflow, innovative offers the option to have multiple acquisitions and/or serials control units, which provide separate fund files and ranges of order records for different institutions sharing the ils system. of 51 responses received, 44 respondents (86.3 percent) indicated they had multiple acquisitions and serials units and 7 (13.7 percent) do not. innovative offers two web-based discovery interfaces for patrons: the traditional online public access catalog, known as webpac, and their version of a next-generation discovery layer, known as encore. of potential benefit to staff as well as patrons, innovative offers "scoping" modules that help patrons using one of the web-based discovery interfaces, as well as staff using the millennium staff modules. the scoping module allows holdings segmentation by location or material type. scopes allow libraries to define their collections and offer their patrons the option to search just the collection of their applicable library. forty-six (88.5 percent) of the 52 respondents indicated they use scoping and 6 (11.5 percent) do not. unlv has multiple serials and acquisitions units as well as multiple scopes configured to help segment the records for each entity's particular collection.

innovative offers various levels of maintenance support. unlv's level of support includes the vendor supplying services such as application troubleshooting resolution, software updates, and some degree of operating system and hardware configuration and advice. unlv also contracts with the hardware vendor for hardware maintenance and underlying operating system support. the unlv libraries have had the opportunity to hire fully qualified and capable technical staff to provide a high level of support for the ils. unlv's level of vendor support has evolved from an original full turnkey installation, with innovative providing all support, to a present level of more modest support. nearly half of all survey respondents, 25 of 52 (48.1 percent), indicated they had a turnkey arrangement with innovative; the remaining 27 respondents had a lesser level of support. maintenance and support obviously carry a cost with one or more third-party providers. the majority of the respondents, 40 of 51 (78.4 percent), indicated there is a cost-sharing structure in place where maintenance support costs related to the ils are spread across partner libraries. six respondents (11.8 percent) indicated the main institution fully funds the maintenance support costs.

the unlv libraries drafted the first and current mou in 2002 for all five entities sharing the ils system. thirty-five of 51 survey respondents (68.6 percent) indicated they, too, have a mou in place. unlv's mou is a basic document, two pages in length, split into the following sections: background; acquisition of new or additional hardware; acquisition of new or additional software; annual maintenance associated with the primary vendor and third-party suppliers and, importantly, the associated cost allocation method for how annual support costs are split between the partners; how new products are purchased from the vendor; and management and support responsibilities of the hosting institution. many of the survey respondents provided details on items contained in their own mous, which can be clustered into several broad categories. these include budgeting, payments, and funding formulas; general governance and voting matters; support (e.g., contractual service responsibilities, responsibilities of member libraries); equipment (e.g., title and use of equipment, who maintains equipment); and miscellaneous. this latter category includes items such as expectations for record quality, network requirements/restrictions, fine collection, and holds management. the majority of unlv's mou addresses shared costs for annual maintenance. unlv's cost-sharing structure is simple: the system has a particular number of associated staff (simultaneous login) licenses, which have gradually increased as the libraries have grown. logins are separated by institution, and each member is assessed their share of funding toward annual maintenance based on their number of staff licenses, as shown in figure 1.
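as an arithmetic illustration of that allocation method, the following perl snippet apportions a hypothetical $50,000 annual maintenance invoice across the partners by the staff license counts from figure 1 (the invoice amount is invented for the example):

#!/usr/bin/perl
# apportion an annual maintenance invoice by staff login licenses.
# license counts are taken from figure 1; the invoice amount is invented.
use strict;
use warnings;

my %licenses = (
    'unlv libraries'             => 85,
    'unlv law library'           => 13,
    'college of southern nevada' => 20,
    'nevada state college'       => 3,
    'desert research institute'  => 2,
);
my $invoice = 50_000;

my $total = 0;
$total += $_ for values %licenses;    # 123 licenses in all

for my $partner (sort { $licenses{$b} <=> $licenses{$a} } keys %licenses) {
    my $share = $invoice * $licenses{$partner} / $total;
    printf "%-28s %3d licenses  \$%9.2f\n",
           $partner, $licenses{$partner}, $share;
}

under this formula the unlv libraries, with 85 of the 123 licenses, would carry roughly 69 percent of the invoice—matching the license percentage shown in figure 1.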
■■ funding support from partners

mous appear to include funding and budgeting information more than any other discrete topic. direct support costs can include the maintenance support costs paid to one or more vendors, costs for additional vendor-authored software modules purchased in addition to the base software, and perhaps licensing costs associated with a database or operating system used by the ils (e.g., an oracle license for oracle-based ils systems). there are many parameters by which costs could be determined for partners, and, given the dearth of published research on the topic, a chief focus of this research sought more information on what factors were used by other consortia. the authors brainstormed 10 elements that could potentially figure into the overall cost-sharing method. thirty-eight respondents provided information on factors playing a role in their cost-sharing arrangements, illustrated in figure 2. respondents could mark more than one answer for this question, as more than one factor could be involved. the top two factors relate directly to vendor costs—whether annual support costs or acquisition of new vendor software. hardware placed third in overall frequency; for innovative, and likely for other ils systems, ils hardware can be purchased from the vendor or an approved platform can be sourced from a reseller directly. support costs from third parties and the number of staff login ports were each identified as a factor by more than a third of all respondents.

figure 2. cost-sharing formula factors. the ten candidate factors presented in the survey were:
● the amount of the overall yearly innovative interfaces maintenance/support invoice
● the amount of any additional 3rd-party maintenance/support agreements associated with the innovative system (such as contracts with the hardware manufacturer—hp, sun microsystems [oracle], etc.)
● the purchase cost(s) for newly acquired innovative modules/products
● the purchase cost(s) for newly acquired hardware associated with the innovative system (such as a server, additional disk space, backup equipment, etc.)
● the number of incident reports (or time spent) by personnel at the main institution related to research, troubleshooting, etc. of support issues reported by partner institutions
● the "size" of the partner institution's portion of the innovative system, as measured by institution fte
● the "size" of the partner institution's portion of the innovative system, as measured by the number of bib or item records the partner institution has in the innovative database
● the "size" of the partner institution's portion of the innovative system, as measured by the number of staff login ports dedicated to the partner library
● the "size" of the partner institution's portion of the innovative system, as measured by the number of user searches conducted from ip ranges associated with the partner institution
● the "size" of the partner institution's portion of the innovative system, as measured by the number of patron records whose home library is associated with the partner institution
■■ funding support from partners

mous appear to include funding and budgeting information more than any other discrete topic. direct support costs can include the maintenance support costs paid to one or more vendors, costs for additional vendor-authored software modules purchased in addition to the base software, and, perhaps, licensing costs associated with a database or operating system used by the ils (e.g., an oracle license for oracle-based ils systems).

there are many parameters by which costs could be determined for partners, and, given the dearth of published research on the topic, a chief focus of this research was to gather more information on what factors were used by other consortia. the authors brainstormed 10 elements that could potentially figure into the overall cost-sharing method. thirty-eight respondents provided information on factors playing a role in their cost-sharing arrangements, illustrated in figure 2. respondents could mark more than one answer for this question, as more than one factor could be involved. the top two factors relate directly to vendor costs—whether annual support costs or acquisition of new vendor software. hardware placed third in overall frequency; for innovative, and likely for other ils systems, ils hardware can be purchased from the vendor or an approved platform can be sourced from a reseller directly. support costs from third parties and the number of staff login ports were each identified as a factor by more than a third of all respondents.

figure 2. cost-sharing formula factors. (the factors presented were: the amount of the overall yearly innovative interfaces maintenance/support invoice; the amount of any additional third-party maintenance/support agreements associated with the innovative system, such as contracts with the hardware manufacturer—hp, sun microsystems [oracle], etc.; the purchase cost(s) for newly acquired innovative modules/products; the purchase cost(s) for newly acquired hardware associated with the innovative system, such as a server, additional disk space, backup equipment, etc.; the number of incident reports (or time spent) by personnel at the main institution related to research, troubleshooting, etc. of support issues reported by partner institutions; and the "size" of the partner institution's portion of the innovative system, as measured by institution fte, by the number of bib or item records the partner institution has in the innovative database, by the number of staff login ports dedicated to the partner library, by the number of user searches conducted from ip ranges associated with the partner institution, or by the number of patron records whose home library is associated with the partner institution.)
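where several of these factors are combined, the blend is usually expressed as a weighted formula. the sketch below shows one plausible shape such a formula could take; the partners, measures, and weights are invented for illustration and are not drawn from any survey response.

```python
# an illustrative multi-factor cost-share formula: normalize each "size"
# measure across all partners, then blend the normalized values with weights.
# partners, measures, and weights here are invented for the example.

measures = {
    # partner: (staff login ports, institution fte, bib/item records)
    "library a": (40, 20_000, 1_500_000),
    "library b": (10, 5_000, 250_000),
    "library c": (5, 2_000, 150_000),
}
weights = (0.50, 0.25, 0.25)  # hypothetical emphasis on staff login ports

def blended_shares(measures, weights):
    """return each partner's fractional share under the weighted blend."""
    totals = [sum(m[i] for m in measures.values()) for i in range(len(weights))]
    return {
        partner: sum(w * m[i] / totals[i] for i, w in enumerate(weights))
        for partner, m in measures.items()
    }

for partner, share in blended_shares(measures, weights).items():
    print(f"{partner}: {share:.1%}")  # shares sum to 100% across partners
```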
■■ software purchases

depending on the software, additional modules extending the system capabilities can benefit a single partner or, in unlv's experience, all partners on the system. traditionally, the unlv libraries have had the largest operating budget of the group, and a majority of new software requests have come internally from unlv libraries staff. over the past 20 years, the unlv libraries have fully funded the initial purchase costs of a majority of the software extending the system, regardless of whether it benefits just the unlv libraries or all system partners. there are numerous exceptions where the partner libraries have contributed funding, including significant start-up costs associated with the unlv law library joining the system in 1998 and the addition of nevada state college in 2002. in both instances, those bodies funded required and recommended software directly applicable to their operation, such as additional serials and accounting units (for the law library), check-in and order records, and staff licenses. in addition, when the system was migrated from the aging text-based system (innopac) to the current millennium java-based gui system in 1999, the then-current partners contributed toward the upgrade cost based on number of staff licenses. partner institutions have continued to fund items of sole benefit to their operation, such as adding staff licenses or required network port interfaces associated with patron self-check stations installed at their facilities. during the 2000s, the unlv libraries have fully funded a majority of software of potential benefit to all partners, such as the electronic resource management module, the encore next-generation discovery platform, and various opac/encore enhancements. software additions typically increase the annual maintenance bill, and all partners help maintain new software acquisitions by contributing toward the annual maintenance.

regarding new software acquisitions, cost-sharing practices varied among the 44 respondents providing information in the survey. eight (18.2 percent) indicated there is consultation with other partners and there is some arrangement to share costs between the majority or all partners sharing the system. two respondents (4.5 percent) indicated the institution expressing the initial interest in the product fully funds the purchase. nineteen respondents (43.2 percent) indicated that they have had instances of both these scenarios (shared funding and sole funding). two respondents (4.5 percent) indicated they could not recall ever adding any additional software. thirteen respondents (29.5 percent) offered details on other scenarios. several indicated that if a product is directly applicable to only one library, such as self-check interfaces and additional acquisition units, then the library in need fully funds the purchase, which mirrors the local practice at unlv. several respondents indicated that if a product benefits all libraries, then costs are shared equally. one respondent indicated that the partner libraries discuss the potential item, and collectively they may choose not to purchase, even if one or more partners are very interested; in such cases, those partners have the option to purchase the product and must agree to make it available to all partners. several respondents indicated that, as the largest entity using the shared system, they generally always purchased new software for their operation as needed, with the associated benefit that the other partners of the system were allowed to use the software as well. three respondents reiterated that a central office funds add-on modules, in one case from funding set aside each year for system improvements. a fourth respondent indicated that a "joiners fee" fund, built up from new members joining the system, allows for the purchase of new software. clearly there are many scenarios of how new software is funded. generally, regardless of whether the funding is sole or shared, if a product can benefit all partners, it's allowed to do so.

thirty-six survey respondents provided details on what factors determine how much each partner contributes toward new software purchases. seven respondents (19.4 percent) indicated the number of staff licenses plays a role (as in the unlv model). three respondents (8.3 percent) indicated that institution fte played a role, while three other respondents indicated that the number of partner bibliographic/item records played a role. the majority of respondents, 25 (69.4 percent), provided alternate scenarios or otherwise more information. nine of these 25 respondents indicated costs were split evenly across all partners. several indicated that the formula used for determining maintenance costs was also applied to new software purchases. four respondents indicated that the library service population was a factor. two indicated that circulation counts were a factor. one indicated that it's negotiated on a per-purchase basis, based on varying factors.
■■ hardware purchases

hardware needs related to the underlying infrastructure, such as server(s), disk space, and backup equipment, increase as the ils grows. unlv's ils installation has grown tremendously. new software modules have been purchased, application architecture changes occurred with the release of the millennium suite in the late 1990s, regular annual updates to the system software have been applied, the number of staff users has increased significantly, and the system was migrated to an underlying oracle database in 2004. since the original system was purchased in 1989 and fully installed in 1990, the central, locally hosted server has been replaced three times, in 1997, 2002, and 2009. partners contributed toward the costs of the server upgrades in 1997 and 2002, while the unlv libraries fully funded the 2009 upgrade. software and hardware components comprising the backup system have been significantly enhanced with a modern system capable of the speed, capacity, and features needed to perform appropriately in the short backup window available each night. unlv funded the initial backup software and hardware, and the partner institutions contribute toward the annual maintenance associated with the backup equipment and software.

one survey question focused on major central infrastructure supporting the ils (defined as items exceeding $1,000, with several examples listed). the question did not focus on hardware that could be provided by ils vendors benefiting a single partner, such as self-check stations or inventory devices. fourteen (31.8 percent) of the 44 respondents indicated that if major new hardware was needed, there was consultation with other partners, and, if purchased, a cost-sharing agreement was arranged. two respondents (4.5 percent) indicated the institution expressing the initial interest fully funds the purchase, and seven respondents indicated they've had instances in the past of both these scenarios. three respondents (6.8 percent) indicated their shared system hardware had never been replaced or upgraded to their knowledge. nineteen respondents provided information on alternate scenarios or otherwise more details as to local practice. several indicated a separate fund is maintained solely for large ils-related improvements or purchases. revenue for these funds can be built up over time through maintenance and use payments by partner libraries or by a small additional fee, earmarked for future hardware replacement needs, collected each year. one respondent indicated they have been able to get grant funds to cover major purchases. with few exceptions, the majority of free-text responses indicated that costs for major purchases were shared by partners or otherwise funded by the central consortium or cooperative agency.

as with regular annual maintenance and new software purchases, various elements can determine what portion of hardware replacement costs are borne by partner libraries. these include number of staff licenses (21.9 percent of responses), institutional fte count (15.6 percent), number of bibliographic or item records (15.6 percent), and number of patron records (9.4 percent). twenty respondents provided additional information. several indicated that the costs are split evenly across all partners. several indicated that population served was a factor. others reiterated that costs for central hardware replacements are determined by the same formula used for assessing the share of annual maintenance.
■■ additional purchases

the last funding-related survey question asked if ongoing content enrichment services were subscribed to and, if so, how the cost-share amount is determined for partner libraries. content enrichment services can provide additional evaluative content such as book cover images, tables of contents (toc), and book reviews. unlv subscribes to a toc service as well as an additional service providing book covers, reviews, and excerpts. partner institutions contribute to the annual service charge associated with the toc service and pay for each record enhanced at their library. unlv fully funds the book cover/review/excerpt service that benefits all partners. fourteen of the 43 survey respondents (32.6 percent) indicated they did not subscribe to enrichment services. twelve respondents (27.9 percent) indicated they had one or more enrichment services and that the costs were fully funded by the main institution. seventeen respondents (39.5 percent) subscribe to enrichment services and share the costs. several indicated the existing cost-sharing formula used for other assessments (annual maintenance, hardware, or nonsubscription-based software) is also used for the ongoing enrichment services. one respondent indicated they maintain a collective fund for enrichment services and estimate the cost of all shared subscriptions; this figure is integrated into the share each institution contributes to the central fund annually. one respondent indicated that their system only uses free enrichment services.
■■ support

the next section of the survey addressed staff support efforts related to management of the ils. twenty years ago, when unlv installed its ils, staff support included one librarian and one additional staff member; both focused on various aspects of system support, from maintaining hardware to working with the vendor, in addition to having other primary job responsibilities completely unrelated to the ils. in addition, over time, functional experts developed for particular modules of the system, such as cataloging, acquisitions, circulation, and serials control. this group of functional experts eventually became known as the unlv innovative module coordinators group, which was chaired by the head of the library systems department. this group met quarterly and included experts from unlv as well as one representative from each partner institution. each module coordinator served as the contact person charged with maintaining familiarity with the functions and features of a particular module, testing enhancements within new releases, keeping other staff informed of changes, and alerting the system vendor of any problems with the module. annually, module coordinators were to consider new software and prioritize and recommend ils software the library should consider purchasing. module coordinators were tasked to maintain a system-wide view of the ils and alert others if they discovered problems or made changes to the ils that could affect other areas of the system. in addition, module coordinators were encouraged to subscribe to the iug listserv to monitor discussions and to maintain awareness of overall system issues. all staff had access to the system's user manual, but if they had questions on system features or functions, the module coordinator served as an additional resource. in addition, any bug reports were provided to the most appropriate module coordinator, who would contact innovative.

the unlv systems staff, which has grown over time and is now part of the library technologies division, was responsible for all hardware and networking problems and for scheduling and verifying nightly data backups. the systems department coordinated any new software installations with the module coordinators group, library staff, and library partners. in 2006, the unlv libraries reorganized and hired a dedicated systems librarian focused on the ils. the systems librarian's principal job responsibility is to serve as the central administrator and site coordinator of the unlv libraries' shared ils. responsibilities include communicating with colleagues regarding current system capabilities, monitoring vendor software developments, monitoring how other libraries utilize their innovative systems, and recommending enhancements. the systems librarian is the site contact with innovative and coordinates and monitors support calls, software and patch upgrades, and new software module installations. the position serves as the contact person for the shared institutions whenever they have questions or issues with the ils. the systems librarian has taken over much of the work previously coordinated through the module coordinators group. while the formal module coordinators group no longer exists, module experts still provide assistance as needed, and consultation always occurs with partners on system-wide issues as they arise.

unlv is not unique in how it manages its ils. in the survey results, 36 respondents (87.8 percent) indicated there is a dedicated individual at the main institution who has a primary responsibility of overseeing the ils. to help clarify the responses, "primary responsibility" is defined as individuals spending more than half their time devoted to support, research, troubleshooting, and system administration duties related to the ils.
the authors created a list of 20 duties related to ils system administration and asked respondents to indicate whether the main library or a central consortial or cooperative office dedicated to the ils handles a particular duty; the duty is shared between the main library and partner libraries; or the duty is handled by just a partner library. as illustrated in figure 3, the survey results overwhelmingly show that the main library in a shared system provides the majority of system administration support. only two tasks were broadly shared between the main library and partner libraries: maintenance of the institution's records (bibliographic, item, patron, order, etc.) and maintaining network and label printers. other shared tasks included changes to the circulation parameters tables (e.g., configuring loan rules and specifying open hours and days-closed tables for materials they themselves circulate), with 40.5 percent of the respondents indicating this as a shared responsibility; opening support calls with the vendor (38.1 percent); monitoring bounced export and fts mail (33.3 percent); and account management (31 percent). the more typical system administration activities are done solely by the main library. typical system administration activities include managing and executing mid-release and major-release software upgrades (95.2 percent of all respondents indicated the main library is solely responsible); managing, coordinating, and scheduling new products for installation (95.2 percent); monitoring disk space (95 percent); and scheduling and monitoring backups (92.9 percent).

unlv's ils support model is very similar to the survey results. the systems librarian at unlv manages all software upgrades, as well as coordinating and scheduling new ils software product and module installs. the library technologies division monitors and schedules the nightly backups and disk-space usage. certain unlv libraries staff and selected individuals from the partner libraries are authorized to open support calls with the system vendor, although the systems librarian often handles this activity herself. other functions, such as maintaining the year-to-date and last-year circulation statistics, are also performed by the unlv libraries systems librarian. updating circulation parameters tables is a task best performed by each of the partner institutions, with advice and assistance as necessary provided by the systems librarian.

figure 3. systems administration / support responsibilities. (the 20 duties surveyed were: account management (create new/delete accounts; millennium authorizations); manage and execute innovative mid-release and major-release software upgrades; manage, coordinate, and schedule new innovative software product installations; schedule and monitor backups; write scripts to automate processes (i.e., circulation overrides report, system status reports, etc.); perform review file maintenance and take action should all files fill; open support calls with innovative; monitor status of open calls and serve as liaison with innovative for resolution of support calls; maintain year-to-date/last-year circulation statistic counters; monitor system messages; monitor disk space usage; monitor bounced export and fts mail; maintain code tables (fixed length, variable length, etc.); update circulation parameters tables (loan rules, hours open, days closed, etc.); set up, monitor, and troubleshoot notices issues; write or modify load tables for new record loading; maintain system printers (label, networked laser printers); provide maintenance on records (patron, bib, item, etc.); manage system security through innovative system settings and/or host-based or network-based firewalls; and provide emergency (off-hours) response to reports of innovative downtime or server hardware failures.)
the authors were also interested in whether an ils oversight body exists for other shared systems and, if so, what issues are discussed. responses indicated that a variety of groups exist and, in some instances, multiple groups may exist within one consortium (some groups have a more specific ils focus and others a more tangential involvement). as illustrated in figure 4, a minority of respondents, 11 of 41 (26.8 percent), indicated that they do not have a group providing ils oversight. if such a group exists, respondents were allowed to select various predefined duties performed by that group. twenty-three respondents indicated the group discusses purchasing decisions. respondents also indicated that such a group discusses the impact of the vendor enhancements offered by mid-releases and regular full releases (19) and when to schedule the upgrades (12). the absence of an oversight group doesn't imply that consultation doesn't occur; rather, it may be the responsibility of an individual as opposed to an effort coordinated by a group. some libraries also have module-driven committees, which disseminate information, introduce new ideas, and try to promote cohesiveness throughout the consortium. other duties that such an oversight group may focus on include workflow issues, discussion of system issues, and definition of policies and procedures. some groups provide recommendations to a larger executive board for the consortium. the meeting frequency of these groups is as varied as the libraries. some groups meet quarterly (33.3 percent) or monthly (20 percent), but the majority meet at other frequencies (40 percent), such as every other month or twice a year. some libraries use e-mail to communicate as opposed to having regular in-person meetings. in addition to a standing committee focused on the ils, and similar to unlv's experience, libraries may have finite working groups to implement particular products.

figure 4. issues discussed by ils oversight body. (response options were: updates on unresolved problem calls with innovative; discussion on enhancements offered by mid-release and regular full-release software upgrades and their impact (positive/negative) on users of the system; scheduling mid-release/full-release software upgrades; prioritizing and selecting choices related to the innovative user's group enhancements ballot for your installation; discussion of potential new software/modules to purchase from innovative; n/a—an oversight group, body, or committee does not exist related to the oversight of the innovative system; and other.)
■■ training, professional development, and planning

the survey also focused on training, professional development, and planning activities related to the ils. there are many methods that library staff can use to stay current with their ils. typical training methods include in-person workshops or online tutorials, as well as other venues for professional development, such as conference attendance. the authors were interested in how libraries sharing an ils determined training needs and who was responsible for the training. the survey results showed that libraries value a variety of training opportunities, regardless of the library's status. the easiest and cheapest method of awareness involves having someone monitor the iug electronic discussion list, with 29 respondents (70.7 percent) indicating that both the main library and one or more partner libraries participate in this activity. attendance at the national and regional iug meetings was also valued highly by libraries, with 26 respondents (66.7 percent) indicating both the main libraries and their partner libraries have had a staff member attend such meetings in the past 5 years. sixteen respondents (64 percent) indicated both the main library and their partner libraries regularly send staff to the american library association annual conference and midwinter meeting; iug typically has a meeting the friday before the midwinter meeting. attendance at training workshops held at the vendor headquarters, as well as online training, is an activity in which the main library participates more frequently than the partner libraries (61.1 percent). complete survey results are provided in appendix a, available at http://www.lita.org/ala/mgrps/divs/lita/ital/302011/3002jun/pdf/vaughan_app.pdf.

■■ research summary and future directions

integrated library systems shared by multiple partners hold the promise of shared efficiencies. given a rather significant number of responses, shared systems appear to be quite common, ranging from a few partners to systems with many partners. perhaps reflecting this, shared systems range from loose federations of library partners to shared systems managed by a more formalized, official consortium. a majority of libraries with shared systems have a mou or other official documents to help define the nature of the relationship, focusing on such topics as budgeting, payments, and funding formulas; general governance and voting matters; support; and equipment. most libraries sharing a system have a method or funding formula outlining how the ils is funded on an annual basis and the contributions provided by each partner. such methods can cover not only annual maintenance but also the procurement of new hardware and software extending the system capabilities. while many support functions are carried out by a central office or staff at the main library hosting the shared system, partner libraries often participate in annual user group and library association conferences, where they stay abreast of vendor ils developments.

the research above describes the authors' investigations into management of shared integrated library systems. in particular, the authors were interested in how other consortia sharing an ils managed their systems, specifically regarding cost sharing, support, and rights and responsibilities. in conducting this background research, a paucity of published literature was observed, and thus the authors hope the findings above may help other established consortia who may be interested in reviewing or tweaking the mous or more formalized agreements likely already in place. the findings may also provide some considerations for libraries contemplating a shared ils instance, something that, given the current recession, may be a topic worth exploring. given that nearly a decade has passed since the original unlv mou was drafted and agreed to, several revisions will be proposed and drafted. these include formalization of how costs are divided for enrichment services (new since the original mou) and formalization in writing of the coordination role of the systems librarian in her capacity as chief manager of the ils. other ideas gathered from survey responses are worth consideration, such as a base additional fee contributed each year (above and beyond the fee assessed as determined by staff licenses). such a fee could help recoup real, sometimes significant costs associated with the system, such as the purchase of additional software benefiting all partners (often, in practice, funded solely by the main library). such a fee could also help recoup more tangential (but still real) expenses, such as replacement of backup media. however, at the time of writing, tweaking (increasing) the fee assessed to partner institutions is a delicate issue. as with many other institutions of learning and their associated libraries, the nevada system of higher education has been particularly hard hit by funding cuts, even when compared against serious cuts experienced by colleagues nationwide. by all measures (unemployment, state budget shortfall, foreclosures, etc.), nevada has been one of the hardest-hit states in the current recession. while the knowledge gained from this survey was useful (and current), what effect it will have in changing the cost structure is, for now, on hold. in the spirit of support among the libraries in the same system of higher education, and in continuing to demonstrate serious shared efficiencies (by maintaining one joint system as opposed to five individual systems), no new fee structure will be implemented in the short term. at the appropriate time, different costing structures such as those elicited in the survey results will merit closer attention.

references

1. jason vaughan, "a library's integrated online library system: assessment and new hardware implementation," information technology and libraries 23, no. 2 (june 2004): 50–57.
2. innovative interfaces, "about us: history," http://www.iii.com/about/history.shtml (accessed may 17, 2010).
regardless of the library’s status. the easiest and cheapest method of awareness involves having someone monitor the iug electronic discussion list, with 29 respondents (70.7 percent) indicating that both the main library and one or more partner libraries participate in this activity. attendance at the national and regional iug meetings was also valued highly by libraries with 26 respondents (66.7 percent) indicating both the main libraries and their partner libraries having a staff member attend such meetings in the past 5 years. sixteen respondents (64 percent) indicated both the main library and their partner libraries regularly send staff to the american library association annual conference and midwinter meeting. iug typically has a meeting the friday before the midwinter meeting. attendance at training workshops held at the vendor headquarters, as well as online training, is an activity in which the main library participates more frequently than the partner libraries (61.1 percent). complete survey results are provided in appendix a, available at http://www.lita.org/ala/mgrps/divs/lita/ ital/302011/3002jun/pdf/vaughan_app.pdf. ■■ research summary and future directions integrated library systems shared by multiple partners hold the promise of shared efficiencies. given a rather significant number of responses, shared systems appear to be quite common, ranging from a few partners to systems with many partners. perhaps reflecting this, shared systems range from loose federations of library partners to shared systems managed by a more formalized, official consortium. a majority of libraries with shared systems have a mou or other official documents to help define the nature of the relationship, focusing on such topics as budgeting, payments, and funding formulas; general governance and voting matters; support; and equipment. most libraries sharing a system have a method or funding formula outlining how the ils is funded on an annual basis and the contributions provided by each partner. such methods can include not only annual maintenance, but also the procurement of new hardware and software extending the system capabilities. while many support functions are carried out by a central office or staff at the main library hosting the shared system, partner libraries often participate in annual user group and library association conferences where they help stay abreast of vendor ils developments. the research above describes the authors’ investigations into management of shared integrated library systems. in particular, the authors were interested in how other consortia sharing an ils managed their system, investigations into library web-scale discovery services jason vaughan information technology and libraries | march 2012 32 abstract web-scale discovery services for libraries provide deep discovery to a library’s local and licensed content and represent an evolution—perhaps a revolution—for end-user information discovery as pertains to library collections. this article frames the topic of web-scale discovery and begins by illuminating web-scale discovery from an academic library’s perspective—that is, the internal perspective seeking widespread staff participation in the discovery conversation. this included the creation of the discovery task force, a group that educated library staff, conducted internal staff surveys, and gathered observations from early adopters. the article next addresses the substantial research conducted with library vendors that have developed these services. 
such work included drafting of multiple comprehensive question lists distributed to the vendors, onsite vendor visits, and continual tracking of service enhancements. together, feedback gained from library staff, insights arrived at by the discovery task force, and information gathered from vendors collectively informed the recommendation of a service for the unlv libraries.

introduction

web-scale discovery services, combining vast repositories of content with accessible, intuitive interfaces, hold the potential to greatly facilitate the research process. while the technologies underlying such services are not new, commercial vendors' release of such services, and their work and agreements with publishers and aggregators to pre-index content, is very new. this article frames the topic of web-scale discovery and helps illuminate some of the concerns and commendations related to web-scale discovery from one library's staff perspective—that is, the internal perspective. the second part focuses on detailed dialog with the commercial vendors, enabling the library to gain a better understanding of these services; in this sense, the second half is focused externally. given that web-scale discovery is new for the library environment, the author was unable to find any substantive published work detailing identification, research, evaluation, and recommendation related to library web-scale discovery services. it's hoped that this article will serve as a primer for other libraries exploring or contemplating exploration of these groundbreaking services.

web-scale discovery services are able to index a variety of content, whether hosted locally or remotely. such content can include library ils records, digital collections, institutional repository content, and content from locally developed and hosted databases. such capabilities existed, to varying degrees, in next-generation library catalogs that debuted in the mid-2000s. in addition, web-scale discovery services pre-index remotely hosted content, whether purchased or licensed by the library. this latter set of content—hundreds of millions of items—can include e-books, publisher or aggregator content for tens of thousands of full-text journals, content from abstracting and indexing databases, and materials housed in open-access repositories. for purposes of this article, web-scale discovery services are flexible services that provide quick and seamless discovery, delivery, and relevancy-ranking capabilities across a huge repository of content. commercial web-scale discovery vendors have brokered agreements with content providers (publishers and aggregators), allowing them to pre-index item metadata and full-text content (unlike the traditional federated search model). this approach lends itself to extremely rapid search and return of results ranked by relevancy, which can then be sorted in various ways according to the researcher's whim (publication date, item type, full text only, etc.). by default, an intuitive, simple, google-like search box is provided (along with advanced search capabilities for those who prefer that approach). the interface includes design cues expected by today's researchers (such as faceted browsing) and, for libraries wishing to extend and customize the service, embraces an open architecture in comparison to traditional ils systems.
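to make the contrast with federated search concrete, the toy sketch below illustrates the general pre-indexed approach: records from many sources are merged into one local index ahead of time, so a query becomes a fast local lookup plus a relevance sort rather than live round-trips to each source. this is an illustration of the technique only, not any vendor's implementation, and the sample records are invented.

```python
# toy illustration of a pre-built ("web-scale") central index: records from
# many sources are indexed in advance, so a search is a local lookup plus a
# crude relevance sort -- no live queries to each source at search time.
from collections import defaultdict

records = [
    {"id": 1, "source": "ils catalog", "text": "digital library systems"},
    {"id": 2, "source": "publisher a", "text": "library discovery services"},
    {"id": 3, "source": "repository", "text": "discovery of digital collections"},
]

# build the inverted index once, ahead of any searching
index = defaultdict(set)
for rec in records:
    for term in rec["text"].split():
        index[term].add(rec["id"])

def search(query):
    """rank record ids by how many query terms they match."""
    scores = defaultdict(int)
    for term in query.split():
        for rec_id in index.get(term, ()):
            scores[rec_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(search("digital discovery"))  # record 3 matches both terms, so it ranks first
```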
why web-scale discovery?

as illustrated by research dating back primarily to the 1990s, library discovery systems within the networked online environment have evolved, yet they continue to struggle to serve users. as a result, the library (or the systems supported and maintained by the library) is often not the first stop for research—or worse, not a stop at all. users accustomed to a quick, easy, "must have it now" environment have defected, and research continues to illustrate this fact. rather than weave these research findings into a paragraph or page, below are some illustrative quotes conveying this challenge. the quotations were chosen because they succinctly capture findings from research involving dozens, hundreds, and in some cases thousands of participants or respondents:

people do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable—so long as it requires little effort to find—rather than using information they know to be of high quality and reliable, though harder to find.1

* * *

today, there are numerous alternative avenues for discovery, and libraries are challenged to determine what role they should appropriately play. basic scholarly information use practices have shifted rapidly in recent years, and as a result the academic library is increasingly being disintermediated from the discovery process, risking irrelevance in one of its core functional areas [that of the library serving as a starting point or gateway for locating research information] . . . we have seen faculty members steadily shifting towards reliance on network-level electronic resources, and a corresponding decline in interest in using locally provided tools for discovery.2

* * *

a seamless, easy flow from discovery through delivery is critical to end users. this point may seem obvious, but it is important to remember that for many end users, without the delivery of something he or she wants or needs, discovery alone is a waste of time.3

* * *

end users' expectations of data quality arise largely from their experiences of how information is organized on popular web sites. . . .4

* * *

[user] expectations are increasingly driven by their experiences with search engines like google and online bookstores like amazon. when end users conduct a search in a library catalog, they expect their searches to find materials on exactly what they are looking for; they want relevant results.5

* * *

users don't understand the difference in scope between the catalog and a&i services (or the catalog, databases, digitized collections, and free scholarly content).6

* * *

it is our responsibility to assist our users in finding what they need without demanding that they acquire specialized knowledge or select among an array of "silo" systems whose distinctions seem arbitrary . . . the continuing proliferation of formats, tools, services, and technologies has upended how we arrange, retrieve, and present our holdings. our users expect simplicity and immediate reward, and amazon, google, and itunes are the standards against which we are judged. our current systems pale beside them.7

* * *

q: if you could provide one piece of advice to your library, what would it be?
a: just remember that students are less informed about the resources of the library than ever before because they are competing heavily with the internet.8

additional factors sell the idea of web-scale discovery. obviously, something must be discoverable for it to be used by (and of value to) a researcher; ideally, content should be easily discoverable. since these new services index content that previously was housed in dozens or hundreds of individual silos, they can greatly facilitate the search process for many research purposes. libraries often spend large sums of money to license and purchase content, sums that often increase annually. any tool that holds the potential to significantly increase the discovery and use of such content should cause libraries to take notice. at the time of writing, early research is beginning to indicate that these tools can increase discovery. doug way compared link-resolver database and full-text statistics prior to and after grand valley state university's implementation of the summon web-scale discovery service.9 his research suggests both that the service was broadly adopted by the university's community and that it has led to an increase in the library's electronic resource discovery and use. willamette university implemented worldcat local, and bill kelm presented results that showed an increase both in ill requests and in use of the library's electronic resources.10 from another angle, information-literacy efforts focus on connecting users to "legitimate" content and providing researchers the skills to identify content quality and legitimacy. given that these web-scale discovery services index, or even primarily focus on indexing, a large amount of scholarly research, such services can serve as another tool in the library's arsenal. results retrieved from these services—largely content licensed or purchased by libraries—are accurate, relevant, and vetted, compared to the questionable or opinionated content that may often be returned through a web search engine query. several of the services currently allow a user to refine results to just those categorized as peer-reviewed or scholarly.

the internal academic library perspective: genesis of the unlv libraries discovery task force

the following sections of this article begin with a focus on the internal unlv library perspective—from early discussions focused on the broad topic of discovery to establishing a task force charged to identify, research, evaluate, and recommend a potential service for purchase. throughout this process, and as detailed below, communication with and feedback from the variety of library staff was essential in ensuring success. given the increasing vitality of content in electronic format, and the fact that such content was increasingly spread across multiple access points or discovery systems, in late 2008 the university of nevada las vegas (unlv) libraries began an effort to engage library staff in information discovery and how such discovery would ideally occur in the future. related to the exponential growth of content in electronic format, traditional technical-services functions of cataloging and acquisitions were changing or would soon change, not just at unlv but throughout the academic library community.
coinciding with this, the libraries were working on drafting their 2009–11 strategic plan and wanted a section highlighting the importance of information discovery and delivery, with action items focused on improving this critical responsibility of libraries. in spring 2009, library staff were given the opportunity to share with colleagues a product or idea, related to some aspect of discovery, which they felt was worthy of further consideration. this event, open to unlv libraries staff and other nevada colleagues, was titled the discovery mini-summit, and more than a dozen participants shared their ideas, most in a poster-session format. one of the posters focused on serials solutions' summon, an early entrant into the vendor web-scale discovery service landscape; at the time, it was a few months from public release. other posters included topics such as the flickr commons (cultural heritage and academic institutions exposing their digital collections through this popular platform) and a working prototype of a homegrown, open-source federated search approach searching across various subscribed databases.

in august 2009, the dean of the unlv university libraries charged a ten-person task force to investigate and evaluate web-scale discovery services, with the ultimate goal of providing a final recommendation for potential purchase. representation on the task force included three directors and a broad cross section of staff from across the functional areas of the library, including back-of-the-house and public-service operations. the director of library technologies, and author of this article, was tasked with drafting a charge and chairing the committee; once charged, the discovery task force worked over the next fifteen months to research, evaluate, and ultimately provide a recommendation regarding a web-scale discovery service. to help illustrate some of the events described, a graphical timeline of activities is presented as appendix a; the original charge appears as appendix b. in retrospect, the initial target date of early 2010 to make a recommendation was naive, as three of the five products ultimately identified and evaluated by the task force weren't publicly released until 2010. several boundaries were provided within the charge, including the fact that the task force was not investigating and evaluating traditional federated search products. the libraries had had a very poor experience with federated search a few years earlier, and the shortcomings of the traditional federated search approach—regardless of vendor—are well known.

the remainder of this article discusses the various steps taken by the discovery task force in evaluating and researching web-scale discovery services. while many libraries have begun to implement the web-scale discovery services evaluated by this group, many more are currently at the learning and evaluation stage or have not yet begun. many libraries that have already implemented a commercial service likely went through an evaluation process, but perhaps not at the scale conducted by the unlv libraries, if for no other reason than that the majority of commercial services are extremely new. even in early 2010, there was less competition, fewer services to evaluate, fewer vendors to contact, and fewer early adopters from whom to seek references.
fortunately, the initial target date of early 2010 for a recommendation was a soft target, and the discovery task force was given ample time to evaluate the products. based on presentations given by the author in 2010, it can't be presumed that an understanding of web-scale discovery—or awareness of the commercial services now available—is necessarily widespread. in that sense, it's the author's hope and intent that information contained in this article can serve as a primer, or a recipe, for those libraries wishing to learn more about web-scale discovery and perhaps begin an evaluation process of their own. while research exists on federated search technologies within the library environment, the author was unable to find any peer-reviewed published research on the evaluation model and investigations for vendor-produced web-scale discovery services as described in this paper. however, some reports are available on the open web, providing some insights into web-scale discovery evaluations led by other libraries, such as two reports provided by oregon state university. the first, dated march 2009, describes a task force whose activities included "scrutinize wcl [worldcat local], investigate other vendors' products, specifically serials solutions' summon, the recently announced federated index discovery system; ebsco's integrated search; and innovative interfaces' encore product, so that a more detailed comparison can be done," and "by march 2010, communicate . . . whether wcl or another discovery service is the optimal purchase for osu libraries."11 note that in 2009, encore existed as a next-generation discovery layer, and it had an optional add-on called "encore harvester," which allows for the harvesting of local digital collections. the report cites the university of michigan's evaluation of wcl and adds their additional observations. the march 2009 report provides a features comparison matrix for worldcat local, encore, summon, and libraryfind (an open-source search tool developed at osu that provides federated searching for selected resources). feature sets include the areas of search and retrieval, content, and added features (e.g., book covers, user tagging, etc.). the report also describes some usability testing involving wcl and integration with other local library services. a second set of investigations followed "in order to provide the task force with an opportunity to more thoroughly investigate other products" and is described in a second report provided at the end of 2009.12 at the time of both phases of this evaluation (and drafted reports), three of the web-scale discovery products had yet to enter public release. the december 2009 report focused on the two released products, serials solutions summon and worldcat local, and includes a feature matrix like the earlier report, with the added feature set of "other," which included the features of "clarity of display," "icons/images," and "speed." the latter report briefly describes how they obtained subject librarian feedback and the pros and cons observed by the librarians in looking at summon. it also mentions obtaining feedback from two early adopters of the summon product, as well as obtaining feedback from librarians whose library had implemented worldcat local.
apart from the oregon reports, some other reports on evaluations (or selection) of a particular service, or a set of particular services, are available, such as the university of michigan's article discovery working group, which submitted a final report in january 2010.13

activity: understanding web-scale

the first activity of the discovery task force was to educate its members and, later, other library colleagues on web-scale discovery. terms such as "federated search," "metasearch," "next-generation catalogs," and "discovery layers" had all come before, and "web-scale" was a rather new concept that wasn't widely understood. the discovery mini-summit served as a springboard that, perhaps more by chance than design, introduced to unlv library staff what would later become more commonly known as web-scale discovery, though even we weren't familiar with the term back in spring 2009. in fall 2009, the discovery task force identified reports from entities such as oclc and ithaka, and reports prepared for the library of congress, highlighting changing user behavior and expectations; these reports helped form a solid foundation for understanding the "whys" related to web-scale discovery. registration and participation in sponsored web-scale discovery webcasts and meetings with vendors at library conferences helped further the understanding of web-scale discovery.

after the discovery task force had a firm understanding of web-scale discovery, the group hosted a forum for all library staff to help explain the concept of web-scale discovery and the role of the discovery task force. specifically, this first forum outlined some key components of a web-scale discovery service, discussed research the task force had completed to date, and outlined some future research and evaluation steps. a summary of these steps appears in the timeline in appendix a. time was allowed for questions and answers, and then the task force broadcast several minutes of a (then recent) webcast discussing web-scale discovery.

as part of its education role, the discovery task force set up an internal wiki-based webpage in august 2009 upon formation of the group, regularly added content, and notified staff when new content was added. a goal of the task force was to keep the evaluative process transparent, and over time the wiki became quite substantial. links to "live" services were provided on the wiki. given that some services had yet to be released, some links were to demo sites or to the closest approximation available; i.e., some services yet to be released were built on an existing discovery layer already in general release, and thus the look, feel, and functionality of such services was essentially available for staff review. the wiki also provided links to published research and webcasts on web-scale discovery. such content grew over time as additional web-scale discovery products entered general release. in addition to materials on particular services, links were provided to important background documents and reports on topics related to the user discovery experience and user expectations for search, discovery, and delivery. discovery task force meeting notes and staff survey results were posted to the wiki, as were evaluative materials such as information on the content-overlap analysis conducted for each service. announcements of relevant vendor programs at the american library association's annual conference were also posted to the wiki.
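the content-overlap analysis mentioned above can be approximated with simple set arithmetic: compare the library's licensed title list against the title list a vendor reports as indexed. the sketch below assumes issn-keyed title lists; the issns are invented placeholders, and the real analysis ran against far larger lists for each of the five services.

```python
# a rough sketch of a content-overlap check: what fraction of the library's
# licensed titles does a discovery service claim to index? the issn sets
# below are invented placeholders for the real title lists.

licensed_issns = {"0001-0001", "0002-0002", "0003-0003", "0004-0004"}
vendor_indexed_issns = {"0001-0001", "0003-0003", "0005-0005"}

overlap = licensed_issns & vendor_indexed_issns
coverage = len(overlap) / len(licensed_issns)

print(f"titles indexed by the service: {sorted(overlap)}")
print(f"coverage of licensed titles: {coverage:.0%}")  # 50% in this toy example
print(f"licensed but not indexed: {sorted(licensed_issns - vendor_indexed_issns)}")
```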
activity: initial staff survey

as noted above, when the task force began its work, only two products (out of five ultimately evaluated) were in general release. as more products entered public release, a next step was to invite vendors onsite to show their publicly released product or a working, developed prototype nearing initial public release. to capture a sense of the library staff ahead of these vendor visits, the discovery task force conducted the first of two staff surveys. the 21-question survey consisted of a mix of rank-on-a-scale questions, multiple-choice questions, and free-text response questions. both the initial and subsequent surveys were administered through the online surveymonkey tool. respondents were allowed to skip any question they wished. the survey was broken into three broad topical areas: "local library customization capabilities," "end user aspect: features and functionality," and "content." the survey had an average response rate of 47 staff, or 47% of the library's 100-strong workforce. the survey questions appear in appendix c. in hindsight, some of the questions could have benefited from more careful construction. that said, there was a conscious juxtaposition of differing concepts within the same question—the task force did not want to receive a set of responses in which all library staff felt it was important for a service to do everything—in short, to be all things to all people. forcing staff to rate varied concepts within a question could provide insights into what they felt was really important. a brief summary of some key questions for each section follows.

as an introduction, one question in the survey asked staff to rate the relative importance of each overarching aspect related to a discovery service (customization, end-user interface, and content). staff felt content was the most critical aspect of a discovery service, followed by the end-user interface, followed by the ability to heavily customize the service. a snapshot of some of the capabilities library staff thought were important (or not) is provided in table 1.

table 1. web-scale discovery service capabilities

web-scale capabilities | sa | a | n | d | sd
physical item status information | 81.6% | 18.4% | | |
publication date sort capability | 75.5% | 24.5% | | |
display library-specified links in the interface | 69.4% | 30.6% | | |
one-click retrieval of full-text items | 61.2% | 36.7% | 2% | |
ability to place ill / consortial catalog requests | 59.2% | 36.7% | 4.1% | |
display the library's logo | 59.2% | 36.7% | 4.1% | |
to be embedded within various library website pages | 58% | 42% | | |
full-text items first sort capability | 58.3% | 31.3% | 8.3% | 2.1% |
shopping cart for batch printing, emailing, saving | 55.1% | 44.9% | | |
faceted searching | 48.9% | 42.6% | 8.5% | |
media type sort capability | 47.9% | 43.8% | 4.2% | 4.2% |
author name sort capability | 41.7% | 37.5% | 18.8% | 2.1% |
have a search algorithm that can be tweaked by library staff | 38% | 36% | 20% | 4% | 2%
user account for saved searches and marked items | 36.7% | 44.9% | 14.3% | 4.1% |
book cover images | 25% | 39.6% | 20.8% | 10.4% | 4.2%
have a customizable color scheme | 24% | 58% | 16% | 2% |
google books preview button for book items | 18.4% | 53.1% | 24.5% | 4.1% |
tag cloud | 12.5% | 52.1% | 31.3% | 4.2% |
user authored ratings | 6.4% | 27.7% | 44.7% | 12.8% | 8.5%
user authored reviews | 6.3% | 20.8% | 50% | 12.5% | 10.4%
user authored tags | 4.2% | 33.3% | 39.6% | 10.4% | 12.5%

sa = strongly agree; a = agree; n = neither agree nor disagree; d = disagree; sd = strongly disagree
none of the results was surprising, other than perhaps the low interest in, or indifference to, several web 2.0 community features, such as the ability for users to provide ratings, reviews, or tags for items, and even a tag cloud. the unlv libraries already had a next-generation catalog offering these features, and they have not been heavily used. even if there had been appreciable adoption of these features by end users in the next-generation catalog, they are perhaps less applicable to a web-scale discovery service—users are probably less inclined to post reviews and ratings for an article, as opposed to a monograph—and article-level content vastly outnumbers book-level content within web-scale discovery services.

the final survey section focused on content. one question asked about the incorporation of ten different information types (sources) and asked staff to rank how important it was that a service include such content. results are provided in table 2.

table 2. importance of content indexed in discovery service

information source | rating average
ils catalog records | 1.69
majority of full-text articles / other research contained in vendor-licensed online resources | 2.54
majority of citation records for non-full-text vendor-licensed a&i databases | 4.95
consortial catalog records | 5.03
electronic reserves records | 5.44
records within locally created and hosted databases | 5.64
digital collection records | 5.77
worldcat records | 6.21
ils authority control records | 6.5
institutional repository records | 6.68

a bit surprisingly, inclusion of catalog records was seen as most important. not surprisingly, full-text and a&i content from subscription resources were ranked very highly. it should also be noted that at the time of the survey, the institutional repository was in its infancy, with only a few sample records, and awareness of this resource was low among library staff. another question listed a dozen existing publishers (e.g., springer, elsevier, etc.) deemed important to the libraries and asked staff to rank the importance that a discovery service index items from these publishers on a four-point scale from "essential" to "not important." results showed that all publishers were ranked as essential or important. related to content, 83.8 percent of staff felt that it was preferable for a service to de-dupe records such that an item appears once in the returned list of results; 14.6 percent preferred that the service not de-dupe results.
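the de-duping staff preferred can be sketched as collapsing the result list on a shared match key, so an item held in several indexed sources appears once. the records below and the doi-or-title key are assumptions for illustration; production services use much richer matching.

```python
# a minimal de-duplication pass over a result list: keep the first record
# seen for each match key (a doi when present, else a normalized title).

results = [
    {"title": "Library Metadata", "doi": "10.1000/x1", "source": "aggregator"},
    {"title": "library metadata", "doi": "10.1000/x1", "source": "publisher"},
    {"title": "Open Repositories", "doi": None, "source": "catalog"},
]

def dedupe(records):
    """drop records whose doi (or, failing that, title) was already seen."""
    seen, unique = set(), []
    for rec in records:
        key = rec["doi"] or rec["title"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

print([r["source"] for r in dedupe(results)])  # -> ['aggregator', 'catalog']
```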
activity: second staff survey

within a month after the five vendor onsite visits, a content analysis of the overlap between unlv-licensed content and content indexed by the discovery services was conducted. after these steps, a second staff survey was administered. this second staff survey had questions focused on the same three functional areas as the first: local library customization features, end user features and functionality, and content. since the vendor visits had taken place and staff could now consider the questions in the context of the products, questions were asked from the perspective of each product, e.g., "please rate on a five point likert scale whether each discovery service appears to adequately cover a majority of the critical publisher titles (worldcat local, summon, eds, encore synergy, primo central)." in addition, there were free-text questions focused on each individual product, allowing colleagues to share additional, detailed thoughts. the second survey totaled 25 questions and averaged 18 respondents per question, or about 18 percent of library staff. several staff conducted a series of sample searches in each of the services and provided feedback on their findings. though the response rate was small, two of the five products rose to the top, a third was a strong contender, and two were seen as less desirable. the lower response rate is perhaps indicative of several things. first, not all staff had attended the onsite vendor demonstrations or had taken the time to test drive the services via the links provided on the discovery task force wiki site. second, some questions were more appropriately answered by a subset of staff; for example, the content questions were best matched to those with reference, collection development, or curriculum and program liaison duties. finally, intricate details emerged once a thorough analysis of the vendor services commenced. the first survey focused more on the philosophy of what was desirable; the second survey took this a step further and asked how well each product matched such wishes. discovery services are changing rapidly with respect to interface updates, customization options, and scope of content. as such, and also reflective of the lower response rate, the author is not providing response information or analysis for this second survey within this article; however, results may be provided upon specific request to the author. the questions themselves for the second staff survey are significant, and they could serve as a model for other libraries evaluating existing services on the market. as such, the questions appear in appendix d.

activity: early adopter references

one of the latter steps in the evaluation process from the internal academic library perspective was to obtain early adopter references from other academic library customers. a preliminary shortlist was compiled through a straw vote of the discovery task force, and the results of the vote showed a consensus. this vote narrowed down the task force's list of services still in contention for a potential purchase. the shortlist was based on the growing mass of research conducted by the discovery task force and informed by the staff surveys and feedback to date. three live customers were identified for each service that had made the shortlist, and the task force successfully obtained two references for each service.
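the article does not spell out the ballot rules behind the task force's straw vote; purely as an illustration, a borda-style tally is one simple way such ranked preferences can be consolidated into a shortlist. the product names and ballots below are invented.

```python
from collections import defaultdict

def tally(ballots, points=(3, 2, 1)):
    """ballots: list of ranked product lists, best first.
    each member's top three choices earn 3, 2, and 1 points."""
    scores = defaultdict(int)
    for ballot in ballots:
        for rank, product in enumerate(ballot[:len(points)]):
            scores[product] += points[rank]
    return sorted(scores.items(), key=lambda kv: -kv[1])

ballots = [["product a", "product b", "product c"],
           ["product a", "product c", "product b"],
           ["product b", "product a", "product c"]]
print(tally(ballots))   # product a emerges as the consensus first choice
```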
reference requests were intensive and involved a set of two dozen questions that references either responded to in writing or answered during scheduled conference calls. to help libraries conducting, or interested in conducting, their own evaluation and analysis of these services, this list of questions appears in appendix e. the services are so new that the live references weren't able to comprehensively answer all the questions—they simply hadn't had sufficient time to fully assess the service they'd chosen to implement. still, some important insights were gained about the specific products and, at the larger level, about discovery services as a whole. as noted earlier, discovery services are changing rapidly in the sense of interface updates, customization options, and scope of content. as such, the author is not providing response information or analysis for each specific product—such investigations and interpretations are the job of each individual library seriously wishing to evaluate the services and decide which product seems most appropriate for its particular environment. several broad insights merit notice, and they are shared below.

regarding a question on implementation, though a few responders mentioned some challenges, nothing reached the threshold of serious concern. all respondents indicated the new discovery service is already the default or primary search box on their website.

one section of the early adopter questions focused on content. respondents seemed to find it difficult to provide much detail for the questions in this area. in terms of "adequately covering a majority of the important library titles," responses varied from "too early to tell" and "it covers many areas but there are some big names missing" to two respondents answering simply, "yes." several respondents also clearly indicated that the web-scale discovery service is not the "beginning and ending" for discovery, a fact that even some of the discovery vendors openly note. for example, one respondent indicated that web-scale discovery doesn't replace remote federated searching. a majority (though not all) of the discovery vendors also have a federated search product that can, to varying degrees, be integrated with their discovery service's preharvested, centralized index. this allows additional content to be searched, because such databases may include content not indexed within the web-scale discovery service. however, many are familiar with the limitations of federated search technologies: slow speed, poor relevancy ranking of results, and the need to configure and maintain sources and targets. such problems remain with federated search products integrated with web-scale discovery services. another respondent indicated they were targeting their discovery service at undergraduate research needs. another responded, "as a general rule, i would say the discovery service does an excellent job covering all disciplines. if you start really in-depth research in a specific discipline, it starts to break down. general searches are great . . . dive deeper into any discipline and it falls apart. for example, for a computer science person, at some point they will want to go to acm or ieee directly for deep searches." related to this: "the catalog is still important, if you want to do a very specific search for a book record, the catalog is better.
the discovery service does not replace the catalog." in terms of satisfaction with content type (newspapers, articles, proceedings, etc.), respondents seemed generally happy with the content mix. a range of responses was received, such as "doesn't appear to be a leaning one way or another, it's a mix. some of these things depend on how you set the system up, as there is quite a bit of flexibility; the library has to make a decision on what they want searched." other examples were "the vendor has been working very hard to balance content types and i've seen a lot of improvement" and "no imbalance, results seem pretty well rounded." another responded, "a common complaint is that newspapers and book reviews dominate the search results, but that is much more a function of search algorithms than the amount of content in the index."

when asked about positive or critical faculty feedback on the service, several respondents indicated they hadn't had a lot of feedback yet. one indicated they had anecdotal feedback. another indicated they'd received backlash from some users who were used to other search services (but added that it was no greater than the backlash from any other service they'd implemented in the past, so it wasn't a surprise). one indicated there was "not a lot of feedback from faculty, the tendency is to go to databases directly, librarians need to instruct them in the discovery service." for student feedback, one indicated, "we have received a few positive comments and see increased usage." another indicated, "reviews are mixed. we have had a lot of feedback thanking us for providing a search that covers articles and books. they like the ability to do one search and get a mix of resources without the search taking a long time. other feedback usually centers around a bug or a feature not working as it should, or as they understand it should. in general, however, the feedback has been positive." another replied, "comments we receive are generally positive, but we've not collected them systematically." some respondents indicated they had done some initial usability testing on the initial interface, but not the most recent one now in use. others indicated they had not yet conducted usability testing, but it was planned for later in 2010 or 2011.

in terms of fellow library staff and their initial satisfaction, one respondent indicated, "somewhere between satisfied and very satisfied . . . it has been increasing with each interface upgrade . . . our instruction librarians are not planning to use the discovery service this fall [in instruction efforts] because they need more experience with it . . . they have been overall intrigued and impressed by it . . . i would say our organization is grappling more with the implications of discovery tools as a phenomenon than with our particular discovery service in particular. there seems to be general agreement that it is a good search tool for the unmediated searcher." another indicated some concerns with the initial interface provided: "if librarians couldn't figure it out, users can't figure it out." another responded that it was "a big struggle with librarians getting on board with the system and promoting the service to students. they continually compare it against the catalog. at one point, they weren't even teaching the discovery service in bib instruction. the only way to improve things is with librarian feedback; it's getting better, it has been hard.
librarians have a hard time replacing the catalog and changing things that they are used to."

in terms of local customization, responses varied; some libraries had done basically no customization to the out-of-the-box interface, while others had done extensive customization. one indicated they had tweaked sort options and added widgets to the interface. another indicated they had made extensive changes to the css. one indicated they had customized the colors, added a logo, tweaked the headers and footers, and created "canned," or preconfigured, search boxes searching a subset of the index. another indicated they couldn't customize the header and footer to the degree they would have liked, but were able to customize these elements to a degree. one respondent indicated they'd done a lot of customization to an earlier version of the interface, which had been rather painstaking, and that much of this broke when they upgraded to the latest version; that said, they also indicated the latest version was much better than the previous one. one respondent indicated it would be nice if the service could draw enriched record content from multiple sources so that better coverage could be achieved. one respondent indicated they were working on a complete custom interface built from scratch, which would be partially populated with results from the discovery service index (as well as other data sources).

a few questions asked about relevancy as a search concept and how satisfied the respondents were with the quality of results returned for queries. one respondent indicated, "we have been able to tweak the ranking and are satisfied at this point." another indicated, "overall, the relevance is good – and it has improved a lot." another noted, "known item title searching has been a problem . . . the issues here are very predictable – one word titles are more likely to be a problem, as well as titles with stopwords," adding that the vendor was aware of the issue and was improving this. one noted, "we would like to be able to experiment with the discovery service more," and observed there was "no relevancy algorithm control." another indicated they would investigate relevance more once usability studies commenced, and noted they had worked with the vendor on some code changes to the default search mechanism. one noted that they'd like to be able to specify additional fields to be included in the relevancy algorithm. another optimistically noted, "as an early adopter, it has been amazing to see how relevance has improved. it is not perfect, but it is constantly evolving and improving."

a final question asked simply, "overall, do you feel your selection of this vendor's product was a good one? do you sense that your users – students and faculty – have positively received the product?" for the majority of responses, there was general agreement from the early adopters that they felt they'd made the right choice. one noted that it was still early and the evaluation was still a work in progress, but felt the product had been positively received. the majority were more certain: "yes, i strongly feel that this was the right decision . . .
as more users find it, i believe we will receive additional positive feedback," "yes, we strongly believe in this product and feel it has been adopted and widely accepted by our users," and "i do feel it was a good selection."

the external perspective: dialog with web-scale discovery vendors

the preceding sections focused on an academic library's perspective on web-scale discovery services—the thoughts, opinions, preferences, and vetting activities involving library staff. the following sections focus on the extensive dialog and interaction with the vendors themselves, apart from the internal library perspective, and highlight the thorough, meticulous research activities conducted on five vendor services. the discovery task force sought to learn as much about each service as possible, a challenging proposition given that, at the start of investigations, only two of the five services had been released and, unsurprisingly, very little research existed. as such, it was critical to work with vendors to best understand their services and how each compared to others in the marketplace. broadly summarized, efforts included identification of services, drafting of multiple comprehensive question lists distributed to the vendors, onsite vendor visits, and continual tracking of service enhancements.

activity: vendor identification

over the course of a year's work, the discovery task force executed several steps to systematically understand the vendor marketplace—the capabilities, content considerations, development cycles, and future roadmaps associated with five vendor offerings. given that the task force began its work when only two of these services were in public release, there was no manual, recipe, or substantial published research to rely on. the beginning, for the unlv libraries, lay in identification of the services—one must first know the services to be evaluated before evaluation can commence. as mentioned previously, the discovery mini-summit held at the unlv libraries highlighted one product, serials solutions summon; the only released product at the time of the mini-summit was worldcat local. while no published peer-reviewed research highlighting these new web-scale discovery services existed, press and news releases did exist for the three to-be-released services. such releases shed light on the landscape of services the task force would review—a total of five services, from the first to market, worldcat local, to the most recent entrant, primo central. oclc worldcat local, released in november 2007, can be considered the first web-scale discovery service as defined in this research; the experience of an early pilot partner (the university of washington) is profiled in a 2008 issue of library technology reports.14 in the uw pilot, approximately 30 million article-level items were included with the worldcat database. another product, serials solutions summon, was released in july 2009, and together these two services were the only ones publicly released when the discovery task force began its work. the task force identified three additional vendors, each working on its own web-scale discovery service; each of these services would enter initial general release as the task force continued its research: ebsco eds in january 2010, innovative interfaces encore synergy around may 2010, and ex libris primo central in june 2010.
while each of these three was new in terms of web-scale discovery capabilities, each was built, at least in part, on earlier systems from the vendors. eds draws heavily from the ebscohost interface (the original version of which dates back to the 1990s), while the base encore and base primo systems were next-generation catalog systems that debuted in 2007.

activity: vendor investigations

after identification of existing and under-development discovery services, a next step in unlv's detailed vendor investigations was the creation of a uniform, comprehensive question list sent to each of the five vendors. the discovery task force ultimately developed a list of 71 questions divided into nine functional areas, as follows, with an example question for each:

• section 1: background. "when did product development begin (month, year)?"
• section 2: locally hosted systems and associated metadata. "with what metadata schemas does your discovery platform work? (e.g., marc, dublin core, ead, etc.)"
• section 3: publisher/aggregator coverage (full text and citation content). "with approximately how many publishers/aggregators have you forged content agreements?"
• section 4: records maintenance and rights management. "how is your system initialized with the correct set of rights management information when a new library customer subscribes to your product?"
• section 5: seamlessness & interoperability with existing content repositories. "for ils records related to physical holdings, is status information provided directly within the discovery service results list?"
• section 6: usability philosophy. "describe how your product incorporates published, established best practices in terms of a customer focused, usable interface."
• section 7: local "look & feel" customization options. "which of the following can the library control: color scheme; logo / branding; facet categories and placement; etc."
• section 8: user experience (presentation, search functionality, and what the user can do with the results). "at what point does a user leave the context and confines of the discovery interface and enter the interface of a different system, whether remote or local?"
• section 9: administration module & statistics. "describe in detail the statistics reporting capabilities offered by your system. does your system provide the following sets of statistics . . ."

all vendors were given 2–3 weeks to respond, and all vendors responded. it was evident from the uneven level of responses that the vendors were at different developmental states with their products. some vendors were still 6–9 months away from initial public release; some were not even firm on when their service would enter release. some vendors were also less explicit in the level of detail provided, in some cases reflective of their development state and in other cases perhaps regardless of it. a refined subset of the original 71 questions appears as a list of 40 questions in appendix f. apart from the detailed question list, various sets of free and licensed information on these discovery services are available online, and the task force sought to identify and digest that information.
the charleston advisor has conducted interviews with several of the library web-scale discovery vendors on their products, including ebsco,15 serials solutions,16 and ex libris.17 these interviews, each around a dozen questions, ask the vendors to describe their product and how it differs from other products in the marketplace, and include questions on metadata and content—all important questions. an article by ronda rowe reviews summon, eds, and worldcat local, and provides some analysis of each product on the basis of content, user interface and searchability, pricing, and contract options.18 it also provides a comparison of 24 product features across these three services, such as "search box can be embedded in any webpage," "local branding possible," and "supports social networking." a wide variety of archived webcasts, many provided by library journal, are available through free registration, and new webcasts are being offered at the time of writing; these presentations to some degree touch on discussions with the discovery vendors, and are often moderated by, or include, company representatives as part of the discussion group.19 several libraries have authored reports and presentations that, at least partially, discuss information on particular services gained through their evaluations, which included dialog with the vendors.20 vendors themselves each have a section of their corporate website devoted to their service; the information provided there ranges from extremely brief to, in the case of worldcat local, very detailed and informative. in addition, much can be gained by "test-driving" live implementations. as such, a listing of vendor website addresses providing more information, as well as a list of sample live implementations, is provided in appendix g.

activities: vendor visits and content overlap analysis

each of the five vendors visited the unlv libraries in spring 2010. vendor visits all occurred within a nine-day span; visits were intentionally scheduled close together to keep things fresh in the minds of library staff, and such proximity helped with product comparisons. vendor visits lasted approximately half a day, and each often included the field or regional sales representative as well as a product manager or technical expert. vendor visits included a demonstration and q&a for all library staff as well as invited colleagues from other southern nevada libraries, a meeting with the discovery task force, and a meeting with the technical staff at unlv responsible for website design and application development and customization. vendors were each given a uniform set of fourteen questions on topics to address during their visit; these appear in appendix h. questions were divided into the broad topical areas of content coverage, end user interface and functionality, and staff "control" over the end user interface. on average, approximately 30–40 percent of library staff attended the open vendor demo and q&a session. shortly after the vendor visits, a content-overlap analysis comparing unlv serials holdings with content indexed in each discovery service was sought from each vendor. given that the amount of content indexed by each discovery service was growing (and continues to grow) extremely rapidly as new publisher and aggregator content agreements are signed, this content-overlap analysis was intentionally not sought at an earlier date. a sketch of the basic title-matching idea behind such an analysis follows.
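this is a minimal sketch only, assuming two hypothetical csv files (the library's licensed journals and a vendor-supplied list of indexed titles, each with an issn column); real overlap analyses are messier, contending with issn variants and differing metadata depth per title.

```python
import csv

def issns(path):
    """collect normalized issns from a csv file with an 'issn' column."""
    with open(path, newline="") as f:
        return {row["issn"].replace("-", "").strip()
                for row in csv.DictReader(f) if row.get("issn")}

# file names are illustrative assumptions, not actual task force artifacts
held = issns("unlv_licensed_journals.csv")
indexed = issns("vendor_indexed_titles.csv")

overlap = held & indexed
print(f"coverage: {100 * len(overlap) / len(held):.1f}% "
      f"of {len(held)} licensed titles found in the vendor index")
```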
some vendors were able to provide detailed coverage information against our existing journal titles (unlv currently subscribes to approximately 20,000 e-journals and provides access to another 7,000+ open-access titles). for others, this was more difficult. recognizing this, the head of collection development was asked to provide a list of the "top 100" journal titles for unlv, based on such factors as usage statistics and whether the title was a core title for part of the unlv curriculum. the remaining vendors were able to provide content coverage information against this critical title list. four of the five products had quite comprehensive coverage (more than 80 percent) of the unlv libraries' titles. while outside the scope of this article, "coverage" can mean different things for different services. driven by the publisher agreements they are able to secure, some discovery services may have extensive coverage for particular titles (such as the full text, abstracts, author-supplied keywords, subject headings, etc.), whereas other services, while covering the same title, may have "thinner" metadata, such as basic citation information (article title, publication title, author, publication date, etc.). more discussion of this topic appears in the january 2011 library technology reports on library web-scale discovery services.21

activity: product development tracking

one aspect of web-scale discovery services, and of the next-generation discovery layers that preceded them, is a rapid enhancement cycle, especially when juxtaposed against the turnkey-style ils systems that dominated library automation for many years. as an example, minor enhancements are provided by serials solutions to summon approximately every three to four weeks; by ebsco to ebsco discovery service approximately every three months; and by ex libris to primo/primo central approximately every three months. many vendors unveil updates coinciding with annual library conferences, and 2010 was no exception. in late summer/early fall 2010, the discovery task force had conference calls or onsite visits with several of the vendors, focused on new enhancements and changes to the services, as well as on obtaining answers to any questions that had arisen since the vendors' visits several months earlier. since the vendor visits in spring 2010, each service had changed, and two services had unveiled significantly different and improved interfaces. the discovery task force's understanding of web-scale discovery services had also expanded greatly since starting its work. coordinated with this second series of vendor visits and discussions, an additional list of more than two dozen questions, reflecting this refined understanding, was sent to the majority of vendors. a portion of these questions is provided as part of the refined list of questions presented in appendix f.
this second set of questions dealt with complex discussions of metadata quality, such as what level of content publishers and aggregators were providing for indexing purposes (e.g., full text, abstracts, tables of contents, author-supplied keywords or subject headings, or particular citation and record fields), and also the vendor's stance on content neutrality, i.e., whether they were entering into exclusive agreements with publishers and aggregators, and, if the discovery service vendor is owned by a company involved with content, whether that content is promoted or weighted more heavily in result sets. other questions dealt with such topics as current install-base counts and technical clarifications about how the service worked. the questions related to content, in particular, were tricky for many (though not all) of the vendors to address. still, the discovery task force was able to gain a better understanding of how things worked in the evolving discovery environment. combined with the internal library perspective and the early adopter references, the information gathered from vendors provided the necessary data set to submit a recommendation with confidence.

activity: recommendation

by mid-fall 2010, the discovery task force had conducted, and had at its disposal, a tremendous amount of research. recognizing how quickly these services change, and the fact that a cyclical evaluation could occur, the task force members felt they had met their charge. if all things failed during the next phase—implementation—at least no one would be able to question the thoroughness of the task force's efforts. unlike the hasty decision that, in part, led to a less-than-stellar experience with federated search a few years earlier, the evaluation process to recommend a new web-scale discovery service was deliberate, thorough, transparent, and vetted with library stakeholders. given that the discovery task force was entering its final phase, official price quotes were sought from each vendor. each task force member was asked to develop a pro/con list for all five identified products based on the knowledge that had been gained. these lists were anonymized and consolidated into a single, extensive pro/con list for each service. some of the pros and cons were subjective (such as interface aesthetics); some were objective (such as a particular discovery service not offering a desired feature). at one of the final meetings of the task force, members reaffirmed the three top contenders, indicated the other two were no longer under consideration, and were then asked to rank their first, second, and third choices among the remaining services. while complete consensus wasn't achieved, there was a resounding first choice, second choice, and third choice. the task force presented a summary of findings at a meeting open to all library staff. this meeting summarized the research and evaluation steps the task force had conducted over the past year, framed each of the three shortlisted services by discussing some strengths and weaknesses observed by the task force, and sought to answer any questions from the library at large. prior to drafting the final report and making the recommendation to the dean of libraries, several task force members led a discussion and final question-and-answer session at a libraries' cabinet meeting, one of the high-level administrative groups at the unlv libraries.
vetting by this body represented the last step in the discovery task force's investigation, evaluation, and recommendation for purchase of a library web-scale discovery service. the recommendation was broadly accepted by the library cabinet, and shortly afterward the discovery task force was officially disbanded, having met its goal.

next steps

the dialog above describes the research, evaluation, and recommendation model used by the unlv libraries to select a web-scale discovery service. such a model and the associated appendixes could serve as a framework, with some adaptations perhaps, for other libraries considering the evaluation and purchase of a web-scale discovery service. together, the discovery task force's internal and external research and evaluation provided a substantive base of knowledge on which to make a recommendation. after the recommendation, the project progressed from a research and recommendation phase to an implementation phase. the libraries' cabinet brainstormed a list of more than a dozen concise implementation bullet points—steps that would need to be addressed—including the harvesting and metadata mapping of local library resources, local branding and some level of customization work, and integration of the web-scale discovery search box in the appropriate locations on the libraries' website. project implementation co-managers were assigned (the director of technical services and the web technical support manager), as well as key library personnel who would aid in one or more implementation steps. in january 2011, the implementation commenced, with public launch of the new service planned for mid-2011. the success of a web-scale discovery service at the unlv libraries is a story yet to be written, but one full of promise.

acknowledgements

the author wishes to thank the other members of the unlv libraries' discovery task force in the research and evaluation of library web-scale discovery services: darcy del bosque, alex dolski, tamera hanken, cory lampert, peter michel, vicki nozero, kathy rankin, michael yunkin, and anne zald.

references

1. marcia j. bates, improving user access to library catalog and portal information, final report, version 3 (washington, dc: library of congress, 2003), 4, http://www.loc.gov/catdir/bibcontrol/2.3batesreport6-03.doc.pdf (accessed september 10, 2010).
2. roger c. schonfeld and ross housewright, faculty survey 2009: key strategic insights for libraries, publishers, and societies (new york: ithaka s+r, 2010), 4, http://www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000-2009/faculty%20study%202009.pdf (accessed september 10, 2010).
3. oclc, online catalogs: what users and librarians want (dublin, oh: oclc, 2009), 20, http://www.oclc.org/reports/onlinecatalogs/fullreport.pdf (accessed september 10, 2010).
4. ibid., vi.
5. ibid., 14.
6. karen calhoun, the changing nature of the catalog and its integration with other discovery tools: final report (washington, dc: library of congress, 2006), 35, http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed september 10, 2010).
7.
bibliographic services task force, rethinking how we provide bibliographic services for the university of california: final report ([pub location?] university of california libraries, 2005), 2, http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf (accessed september 10, 2010).
8. oclc, college students' perceptions of libraries and information resources (dublin, oh: oclc, 2006), part 1, page 4, http://www.oclc.org/reports/pdfs/studentperceptions.pdf (accessed september 10, 2010).
9. doug way, "the impact of web-scale discovery on the use of a library collection," serials review, in press.
10. bill kelm, "worldcat local effects at willamette university," presentation, prezi, july 21, 2010, http://prezi.com/u84pzunpb0fa/worldcat-local-effects-at-wu/ (accessed september 10, 2010).
11. michael boock, faye chadwell, and terry reese, "worldcat local task force report to lamp," march 27, 2009, http://hdl.handle.net/1957/11167 (accessed february 12, 2012).
12. michael boock et al., "discovery services task force recommendation to university librarian," http://hdl.handle.net/1957/13817 (accessed february 12, 2012).
13. ken varnum et al., "university of michigan library article discovery working group final report," umich, january 29, 2010, http://www.lib.umich.edu/files/adwg/final-report.pdf. [access date?]
14. jennifer ward, pam mofjeld, and steve shadle, "worldcat local at the university of washington libraries," library technology reports 44, no. 6 (august/september 2008).
15. dennis brunning and george machovec, "an interview with sam brooks and michael gorrell on the ebscohost integrated search and ebsco discovery service," charleston advisor 11, no. 3 (january 2010): 62–65.
16. dennis brunning and george machovec, "interview about summon with jane burke, vice president of serials solutions," charleston advisor 11, no. 4 (april 2010): 60–62.
17. dennis brunning and george machovec, "an interview with nancy dushkin, vp discovery and delivery solutions at ex libris, regarding primo central," charleston advisor 12, no. 2 (october 2010): 58–59.
18. ronda rowe, "web-scale discovery: a review of summon, ebsco discovery service, and worldcat local," charleston advisor 12, no. 1 (october 2010): 5–10.
19. library journal archived webcasts are available at http://www.libraryjournal.com/csp/cms/sites/lj/tools/webcast/index.csp (accessed september 10, 2010).
20. boock, chadwell, and reese, "worldcat local task force report to lamp"; boock et al., "discovery services task force recommendation to university librarian"; ken varnum et al., "university of michigan library article discovery working group final report."
21. jason vaughan, "library web-scale discovery services," library technology reports 47, no. 1 (january 2011).

note: appendices a–h available as supplemental files.
investigations into library web-scale discovery services: appendices a–h
jason vaughan

appendices

appendix a. discovery task force timeline
appendix b. discovery task force charge
appendix c. discovery task force: staff survey 1 questions
appendix d. discovery task force: staff survey 2 questions
appendix e. discovery task force: early adopter questions
appendix f. discovery task force: initial vendor investigation questions
appendix g. vendor websites and example implementations
appendix h. vendor visit questions

appendix a. discovery task force timeline

[the timeline appears in the source as a graphic and is not reproduced here.]

appendix b. discovery task force charge

discovery task force charge

informed through various efforts and research at the local and broader levels, and as expressed in the libraries' 2010/12 strategic plan, the unlv libraries have the desire to enable and maximize the discovery of library resources for our patrons. specifically, the unlv libraries seek a unified solution which ideally could meet these guiding principles:

• creates a unified search interface for users, pulling together information from the library catalog as well as other resources (e.g., journal articles, images, archival materials)
• enhances discoverability of as broad a spectrum of library resources as possible
• intuitive: minimizes the skills, time, and effort needed by our users to discover resources
• supports a high level of local customization (such as accommodation of branding and usability considerations)
• supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure)
• demonstrates commitment to sustainability and future enhancements
• informed by preferred starting points

as such, the discovery task force advises libraries administration on a solution that appears to best meet the goal of enabling and maximizing the discovery of library resources. a bulk of the work will entail a marketplace survey and evaluation of vendor offerings.

charge

specific deliverables for this work include:

1. identify vendor next generation discovery platforms, whether established and currently on the market, or those publicized and at an advanced stage of development, with an expectation of availability within a year's time. identify & create a representative list of other academic libraries which have implemented or purchased currently available products.
2. create a checklist / criteria of functional requirements / desires for a next generation discovery platform.
3. create lists of questions to distribute to potential vendors and existing customers of next generation discovery platforms. questions will focus on broad categories such as the following:
a. seek to understand how content hosted in our current online systems (iii catalog, contentdm, locally created databases, vendor databases, etc.) could/would (or not be able to) be incorporated or searchable within the discovery platform. apart from our existing online systems as we know them today, the task force will explore, in general terms, how new information resources could be incorporated into the discovery platform.
more explicitly, the task force will seek an understanding of what types of existing records are discoverable within the vendor's next generation discovery platform, and seek an understanding of what basic metadata must exist for an item to be discoverable.
b. seek to understand whether the solution relies on federated search, the creation of a central site index via metadata harvesting, or both, to enable discovery of items.
c. additional questions, such as pricing, maintenance, install base, etc.
4. evaluate gathered information and seek feedback from library staff.
5. provide to the dean's directs a final report which summarizes the task force findings. this report will include a recommended product(s) and a broad, as opposed to detailed, summary of workload implications related to implementation and ongoing maintenance. the final report should be provided to the dean's directs by february 15, 2010.

boundaries

the work of the task force does not include:

• detailing the contents of "hidden collections" within the libraries and seeking to make a concrete determination that such hidden collections, in their current form, would be discoverable via the new system.
• conducting an inventory, recommending, or prioritizing collections or items which should be cataloged or otherwise enriched with metadata to make them discoverable.
• coordination with other southern nevada nshe entities.
• an ils marketplace survey. the underlying innovative millennium system is not being reviewed for potential replacement.
• implementation of a selected product.

[the charge concluded with a list of members for the task force]

appendix c. discovery task force: staff survey 1 questions

"rank" means the surveymonkey question will be set up such that each option can only be chosen once, and will be placed on a scale that corresponds to the number of choices overall. "rate" means there will be a 5-point likert scale ranging from strongly disagree to strongly agree.

section 1: customization. the "staff side" of the house

1. customization. it is important for the library to be able to control/tweak/influence the following design elements [strongly disagree / disagree / neither agree nor disagree / agree / strongly agree]
• general color scheme
• ability to include a unlv logo somewhere on the page
• ability to add other branding elements to the page
• ability to add one or more library-specified links prominently in the interface (example: a link to the libraries' home page)
• ability to customize the name of the product (meaning the vendor's name for the product doesn't need to be used nor appear within the interface)
• ability to embed the search box associated with the discovery platform elsewhere in the library website, such as the homepage (i.e., the user could start a search without having to go directly to the discovery platform)

2. customization. are there any other design customization capabilities that are significantly important? please list, and please indicate if this is a high, medium, or low priority in terms of importance to you. (freetext box)

3. search algorithms. it is important for the library to be able to change or tweak the platform's native search algorithm to be able to promote desired items such that they appear higher in the returned list of results [strongly disagree / disagree / neither agree nor disagree / agree / strongly agree] [e.g.
the library, at its option, could tweak one or more search algorithms to more heavily weight resources it wants to promote. for example, if a user searches for "hoover dam," the library could set a rule that would heavily weight and promote unlv digital collection images of hoover dam – those results would appear on the first page of results.]

4. statistics. the following statistics are important to have for the discovery platform [strongly disagree / disagree / neither agree nor disagree / agree / strongly agree]
• number of searches, by customizable timeframe
• number of item- or article-level records accessed (that is, a user clicks on something in the returned list of results)
• number of searches generating 0 results
• number of items accessed by type
• number of items accessed by provider of content (that is, number of articles from a particular database/full-text vendor)

5. statistics. what other statistics would you like to see a discovery platform provide, and how important is this to you? (freetext box)

6. staff summary. please rank on a 1-3 scale how important the following elements are, with a "1" being most important, a "2" being second most important, and a "3" being third most important.
• heavy customization capabilities as described in questions 1 & 2 above
• ability to tweak search algorithms as described in question 3
• ability for the system to natively provide detailed search stats such as described in questions 4 & 5

section 2. the "end user" side of the house

7. searching. which of the following search options is preferable when a user begins their search? [choose one]
• the system has a "google-like" simple search box
• the system has a "google-like" simple search box, but also has an advanced search capability (user can refine the search to certain categories: author, journal, etc.)
• no opinion

8. zero hit searches. for a search that retrieves no actual results: [choose one]
• the system should suggest something else or ask, "did you mean?"
• retrieving precise results is more important, and the system should not suggest something else or ask "did you mean?"
• no opinion

9. de-duplication of similar items. which of the following is preferable? [choose one]
• the system automatically de-dupes records (the item only appears once in the returned list)
• the system does not de-dupe records (the same item could appear more than once in the returned list, such as when we have overlapping coverage of a particular journal from multiple subscription vendors)
• no opinion

10. sorting of returned results. it is important for the user to be able to sort or reorder a list of returned results by . . . [strongly disagree / disagree / neither agree nor disagree / agree / strongly agree]
• publication date
• alphabetical by author name
• alphabetical by title
• full-text items first
• by media type (examples: journal, book, image, etc.)

11. web 2.0 functionality on returned results. the following items are important for a discovery platform to have . . . [strongly disagree / disagree / neither agree nor disagree / agree / strongly agree] (note: if necessary, please conduct a search in the libraries' encore system to help illustrate / remember some of the features/jargon mentioned below. in encore, "facets" appear on the left hand side of the screen; the results with book covers, "add to cart," and "export" features appear in the middle; and a tag cloud to the right.
note: this question is asking about having the particular feature regardless of which vendor, and not how well or how poorly you think the feature works for the encore system.)
• a tag cloud
• faceted searching
• ability to add user-generated tags to materials ("folksonomies")
• ability for users to write and post a review of an item
• other (please specify)

12. enriched record information on returned results. the following items are important to have in the discovery system . . . [strongly disagree / disagree / neither agree nor disagree / agree / strongly agree]
• book covers for items held by the libraries
• a google books preview button for print items held by the libraries
• displays item status information for print items held by the libraries (example: available, checked out)

13. what the user can do with the results. the following functionality is important to have in the discovery system . . . [strongly disagree / disagree / neither agree nor disagree / agree / strongly agree]
• retrieve the full text of an item with only a single click on the item from the initial list of returned results
• ability to add items to a cart for easy export (print, email, save, export to refworks)
• ability to place an interlibrary loan / link+ request for an item
• system has a login/user account feature which can store user search information for later. in other words, a user could potentially log in to retrieve saved searches, previously stored items, or create alerts when new materials become available.

14. miscellaneous. the following features/attributes are important to have in the discovery system . . . [strongly disagree / disagree / neither agree nor disagree / agree / strongly agree]
• the vendor has an existing mobile version of their discovery tool for use by smartphones or other small internet-enabled devices.
• the vendor has designed the product such that it can be incorporated into other sites used by students, such as webcampus and/or social networking sites. such "designs" may include the use of persistent urls to embed hyperlinks, the ability to place the search box in another website, or specifically designed widgets developed by the vendor.
• indexing and availability of newly published items occurs within a matter of days, as opposed to a week or perhaps a month.
• library catalog authority record information is used to help return proper results and/or populate a tag cloud.

15. end user summary. please rank on a 1-8 scale how important the following elements are; a "1" means you think it is the most important, a "2" second most important, etc.
• system offers a "google-like" simple search box only, as detailed in question 7 above
• system offers a "did you mean?" or alternate suggestions for all searches retrieving 0 results, as detailed in question 8 above (obviously, if you value precision of results over "did you mean" functionality, you would rank this toward the lower end of the spectrum)
• system de-dupes similar items as detailed in question 9 above (if you believe the system should not de-dupe similar items, you would rate this toward the lower end of the spectrum)
• system provides multiple sort options of returned results as detailed in question 10 above
• system offers a variety of web 2.0 features as detailed in question 11 above
• system offers enriched record information as detailed in question 12 above
• system offers flexible options for what a user can do with the results, as detailed in question 13 above
• system has one or more miscellaneous features as detailed in question 14 above

section 3: content

16. incorporation of different information types. in an ideal world, a discovery platform would incorporate all of our electronic resources, whether locally produced or licensed/purchased from vendors. below is a listing of different information types. please rank on a scale of 1-10 how vital it is that a discovery platform accommodate these information types ("1" is the most important item in your mind, a "2" is second most important, etc.).
a. innopac millennium records for unlv print & electronic holdings
b. link+ records for print holdings held within the link+ consortium
c. innopac authority control records
d. records within oclc worldcat
e. contentdm records for digital collection materials
f. bepress digital commons institutional repository materials
g. locally created web-accessible database records (e.g., the special collections & architecture databases)
h. electronic reserves materials hosted in eres
i. a majority of the citation records from non-full-text, vendor-licensed online index/abstract/citation databases (e.g., the "agricola" database)
j. a majority of the full-text articles or other research contained in many of our vendor-licensed online resources (e.g., "academic search premier," which contains a lot of full-text content, and the other full-text resource packages / journal titles we subscribe to)

17. local content. related to item (g) in the question immediately above, please list any locally produced collections that are currently available either on the website, or in electronic format as a word document, excel spreadsheet, or access database (and not currently available on the website), that you would like the discovery platform to incorporate. (freetext box)

18. particular sets of licensed resources: what's important? please rank which of the licensed (full-text or primarily full-text) existing publishers below are most important for a discovery platform to accommodate.
elsevier
sage
wiley
springer
american chemical society
taylor & francis (informaworld)
ieee
american institute of physics
oxford
ovid
nature
emerald

section 4: survey summary

19. overarching survey question. the questions above were roughly categorized into three areas. given that no discovery platform will be everything to everybody, please rank on a 1-3 scale what the most important aspects of a discovery system are to you (1 is most critical, 2 is second in importance overall, etc.).
• the platform is highly customizable by staff (types of things in area 1 of the survey)
• the platform is highly flexible from the end-user standpoint (types of things in area 2 of the survey)
• the platform encompasses a large variety of our licensed and local resources (types of things in area 3 of the survey)

20. additional input.
the survey above is roughly drawn from a larger list of 71 questions sent to the discovery task force vendors. what other things do you think are really important when thinking about a next-generation discovery platform? (freetext input; you may write a sentence or a book)

21. demographic. what library division do you belong to?
library administration
library technologies
research & education
special collections
technical services
user services

appendix d. discovery task force: staff survey 2 questions

for the comparison questions, products are listed by order of vendor presentation. please mark an answer for each product.

part i. licensed publisher content (e.g., full-text journal articles; citations / abstracts)

sa = strongly agree; a = agree; n = neither agree nor disagree; d = disagree; sd = strongly disagree

1. "the discovery platform appears to adequately cover a majority of the critical publisher titles." [sa / a / n / d / sd / i don't know enough about the content coverage for this product to comment]
products rated: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon

2. "the discovery platform appears to adequately cover a majority of the second-tier or somewhat less critical publisher titles." [sa / a / n / d / sd / i don't know enough about the content coverage for this product to comment]
products rated: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon

3. overall, from the content coverage point of view, please rank each platform from best to worst. [worst / 2nd worst / middle / 2nd best / best]
products ranked: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon

4. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the content coverage standpoint. [unacceptable / acceptable]
products rated: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon

part ii. end-user functionality & ease of use

5. from the user perspective, how functional do you think the discovery platform is? are the facets and/or other methods that one can use to limit or refine a search appropriate? were you satisfied with the export options offered by the system (email, export into refworks, print, etc.)? if you think web 2.0 technologies are important (tag cloud, etc.), were one or more of these present (and well executed) in this product? [response options for each product:]
• the platform appears to be severely limited in major aspects of end user functionality.
• the platform appears to have some level of useful functionality, but perhaps not as much or as well executed as some competing products.
• yes, the platform seems quite rich in terms of end user functionality, and such functions are well executed.
• i can't comment on this particular product because i didn't see the vendor demo, haven't visited any of the live implementations linked on the discovery wiki page, or otherwise don't have enough information.
products rated: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon

6. from the user perspective, for a full-text pdf journal article, how easy is it to retrieve the full text? does it take many clicks? are there confusing choices? [response options for each product:]
- it's very cumbersome trying to retrieve the full text of an item, there are many clicks, and/or it's simply confusing when going through the steps to retrieve the full text.
- it's somewhat straightforward to retrieve a full text item, but perhaps it's not as easy or as well executed as some of the competing products.
- it's quite easy to retrieve a full text item using this platform, as good as or better than the competition, and i don't feel it would be a barrier to a majority of our users.
- i can't comment on this particular product because i didn't see the vendor demo, haven't visited any of the live implementations linked on the discovery wiki page, or otherwise don't have enough information.
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
7. how satisfied were you with the platform's handling of "dead end" or "zero hit" searches? did the platform offer "did you mean" spelling suggestions? did the platform offer you the option to request the item via doc delivery / link+? is the vendor's implementation of such features well executed, or were they difficult, confusing, or otherwise lacking?
- the platform appears to be severely limited in, or otherwise poorly executes, how it responds to a dead end or zero hit search.
- the platform handled dead end or zero hit results, but perhaps not as seamlessly or as well executed as some of the competing products.
- i was happy with how the platform handled "dead end" searches, and such functionality appears to be well executed, as good as or better than the competition.
- i can't comment on this particular product because i didn't see the vendor demo, haven't visited any of the live implementations linked on the discovery wiki page, or otherwise don't have enough information.
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
8. how satisfied were you with the platform's integration with the opac? were important things such as call numbers, item status information, and enriched content immediately available and easily viewable from within the discovery platform interface, or did it require an extra click or two into the opac, and did you find this cumbersome or confusing?
- the platform provides minimal opac item information, and a user would have to click through to the opac to get the information they might really need; and/or it took multiple clicks or was otherwise cumbersome to get the relevant item level information.
- the platform appeared to integrate ok with the opac in terms of providing some level of relevant item level information, but perhaps not as much or as well executed as competing products.
- i was happy with how the platform integrated with the opac. a majority of the opac information was available in the discovery platform, and/or their connection to the opac was quite elegant.
- i can't comment on this particular product because i didn't see the vendor demo, haven't visited any of the live implementations linked on the discovery wiki page, or otherwise don't have enough information.
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
9. overall, from an end user functionality / ease of use standpoint (how a user can refine a search, export results, easily retrieve the fulltext, easily see information from the opac record), please rank each platform from best to worst. (worst / 2nd worst / middle / 2nd best / best)
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
10. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the user functionality / ease of use standpoint. (unacceptable / acceptable)
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
part iii. staff customization
11. the "out of the box" design demoed at the presentation (or linked to the discovery wiki page, whichever particular implementation you liked best for that product) was . . .
- seriously lacking, and i feel it would need major design changes and customization by library web technical staff.
- middle of the road: some things i liked, some things i didn't. the interface design was better than some competing products, worse than others.
- appeared very professional, clean, well organized, and usable; the appearance was better than most/all of the other products.
- i can't comment on this particular product because i didn't see the vendor demo, haven't visited any of the live implementations linked on the discovery wiki page, or otherwise don't have enough information.
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
12. all products offer some level of customization options that allow at least some changes to the "out of the box" platform. based on what the vendors indicated about the level of customization possible with the platform (e.g. look and feel, ability to add library links, ability to embed the search box on a homepage), do you feel there is enough flexibility with this platform for our needs?
- the platform appears to be severely limited in the degree or types of customization that can occur at the local level. we appear "stuck" with what the vendor gives us, for better or worse.
- the platform appeared to have some level of customization, but perhaps not as much as some competing products.
- yes, the platform seems quite rich in terms of customization options under our local control; more so than the majority or all of the other products.
- i can't comment on this particular product because i didn't see the vendor demo, don't have enough information, and/or would prefer to leave this question to technical staff to weigh in on.
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
13. overall, from a staff customization standpoint (the ability to change the interface, embed links, define facet categories, define labels, place the searchbox in a different webpage, etc.), please rank each platform from best to worst. (worst / 2nd worst / middle / 2nd best / best)
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
14. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the staff customization standpoint.
(unacceptable / acceptable)
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
part iv. summary questions
15. overall, from a content coverage, user functionality, and staff customization standpoint, please rank each product from best to worst. (worst / 2nd worst / middle / 2nd best / best)
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
16. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the overall standpoint of content coverage, user functionality, and staff customization. (unacceptable / acceptable)
products: ex libris primo central; oclc worldcat local; ebsco discovery services; innovative encore synergy; serials solutions summon
part v. additional thoughts
17. please share any additional thoughts you have on ex libris primo central. (freetext box)
18. please share any additional thoughts you have on oclc worldcat local. (freetext box)
19. please share any additional thoughts you have on ebsco discovery services. (freetext box)
20. please share any additional thoughts you have on innovative encore synergy. (freetext box)
21. please share any additional thoughts you have on serials solutions summon. (freetext box)
appendix e. discovery task force: early adopter reference questions
author's note: appendix e originally appeared in the january 2011 library technology reports, web scale discovery services, as chapter 7, "questions to consider."
part 1 background
1. how long have you had your discovery service available to your end users? (what month and year did it become generally available to your primary user population and linked to your public library website?)
2. after you had selected a discovery service, approximately how long was the implementation period – how long did it take to "bring it up" for your end-users and make it available (even if in 'beta' form) on your library website?
3. what have you named your discovery service, and is it the 'default' search service on your website at this point? in other words, regardless of other discovery systems (ils, digital collection management system, ir, etc.), has the new discovery service become the default or primary search box on your website?
part 2 content: article level content coverage & scope
"article level content" = articles from academic journals, articles from mainstream journals, newspaper content, conference proceedings, open access content
4. in terms of article level content, do you feel the preindexed, preharvested central index of the discovery platform adequately covers a majority of the titles important to your library's collection and focus?
5. have you observed any particular strengths in terms of subject content in any of the three major overarching areas (humanities, social sciences, sciences)?
6. have you observed any big, or appreciable, gaps in any of the three major overarching areas (humanities, social sciences, sciences)?
7. have you observed that the discovery service leans toward one or a few particular content types (e.g. peer reviewed academic journal content; mainstream journal content; newspaper article content; conference proceedings content; academic open access content)?
8. are there particular publishers whose content is either not incorporated (or not adequately incorporated) into the central index that you'd like to see included (e.g. elsevier journal content)?
9. have you received any feedback, positive or negative, from your institution's faculty related to the content coverage within the discovery service?
10. taking all of the above questions into consideration, are you happy, satisfied, or dissatisfied with the scope of subject content, and formats covered, in the discovery platform's central index?
11. in general, are you happy with the level of article level metadata associated with the returned citation level results (that is, before one retrieves the complete full text)? in other words, the product may incorporate basic citation level metadata (e.g. title, author, publication info), or it may include additional enrichment content, such as abstracts, author supplied keywords, etc. overall, how happy do you sense your library staff is with the quality and amount of metadata provided for a "majority" of the article level content indexed in the system?
part 3 content: your local library resources
12. it's presumed that your local library ils bib records have been harvested into the discovery solution. do you have any other local "homegrown" collections (hosted by other systems at your library or institution) whose content has been harvested into the discovery solution? examples would include digital collection content, institutional repository content, library subject guide content, or other specialized, homegrown local database content. if so, please briefly describe the content: focus of collection, type of content (images, articles, etc.), and a ballpark number of items. if no local collections other than ils bib record content have been harvested, please skip to question 15.
13. [for local collections other than ils bib records] did you use existing, vendor provided ingestors to harvest the local record content (i.e. ingestors to transfer the record content and apply any transformations and normalizations to migrate the local content to the underlying discovery platform schema)? or did you develop your own ingestors from scratch, or using a toolkit or application profile template provided by the vendor?
14. [for local collections other than ils bib records] did you need extensive assistance from the discovery platform vendor to help harvest any of your local collections into the discovery index? if so, regardless of whether the vendor offered this assistance for free or charged a fee, were you happy with the level of service received from the vendor?
15. do you feel your local content (including ils bib records) is adequately "exposed" during a majority of searches? in other words, if your local harvested content equaled a million records, and the overall size of the discovery platform index was a hundred million records, do you feel your local content is "lost" for a majority of end user searches, or adequately exposed?
part 4 interface: general satisfaction level
16. overall, how satisfied are you and your local library colleagues with the discovery service's interface?
17. do you have any sense of how satisfied faculty at your institution are with the discovery service's interface? have you received any positive or negative comments from faculty related to the interface?
18. do you have any sense of how satisfied your (non-faculty) end-users are with the discovery service's interface?
have you received any positive or negative comments from users related to the interface?
19. have you conducted any end-user usability testing related to the discovery service? if so, can you provide the results, or otherwise some general comments on the results of these tests?
20. related to searching, are you happy with the relevance of results returned by the discovery service? have you noticed any consistent "goofiness" or surprises with the returned results? if you could make a change in the relevancy arena, what would it be, if anything?
part 5 interface: local customization
21. has your library performed what you might consider any "major customization" to the product? or has it primarily been customizations such as naming the service, defining hyperlinks and the color scheme? if you've done more extensive customization, could you please briefly describe it, and was the product architecture flexible enough to allow you to do what you wanted to do? (also see question 22 below, which is related.)
22. is there any particular feature or function that is missing or non-configurable within the discovery service that you wish were available?
23. in general, are you happy with the "openness" or "flexibility" of the system in terms of how customizable it is by your library staff?
part 6: final thoughts
24. overall, do you feel your selection of this vendor's product was a good one? do you sense that your users (students and faculty) have positively received the product?
25. have you conducted any statistics review or analysis (through the discovery service statistics, or link resolver statistics, etc.) that would indicate, or at least suggest, that the discovery service has improved the discoverability of some of your materials (whether local library materials or remotely hosted publisher content)?
26. if you have some sense of the competition in the vendor discovery marketplace, do you feel this product offers something above and beyond the other competitors in the marketplace? if so, what attracted you to this particular product; what made it stand out?
appendix f. discovery task force: initial vendor investigation questions
section 1: general / background questions
1. customer install base. how many current customers do you have which have implemented the product at their institution? (the tool is currently available to users / researchers at that institution.) how many additional customers have committed to the product? how many of these customers fall within our library type (e.g. higher ed academic, public, k-12)?
2. references. can you provide website addresses for live implementations which you feel serve as a representative model matching our library type? can you provide references – the name and contact information for the lead individuals you worked with at several representative customer sites which match our library type?
3. pricing model, optional products. describe your pricing model for a library type such as ours, including initial upfront costs and ongoing costs related to the subscription and technical support. what optional add-on services or modules (federated search, recommender services, enrichment services) do you market which we should be aware of, related to and able to be integrated with your web scale discovery solution?
4. technical support and troubleshooting. briefly describe options customers have, and hours of availability, for reporting mission critical problems, and for reporting observed non mission-critical glitches. briefly describe any consulting services you may provide above and beyond support services offered as part of the ongoing subscription (e.g. consulting services related to harvesting of a unique library resource for which an ingest/transform/normalize routine does not already exist). is there a process for suggesting enhancement requests for potential future incorporation into the product?
5. size of the centralized index. how many periodical titles does your preharvested, centralized index encompass? how many indexed items?
6. statistics. please describe what you feel are some of the more significant use, management, or content related statistics available out-of-the-box with your system. are the statistics counter compliant?
7. ongoing maintenance activities, local library staff. for instances where the interface and discovery service is hosted on your end, please describe any ongoing local library maintenance activities associated with maintaining the service for the local library's clientele (e.g. maintenance of the link resolver database; ongoing maintenance associated with periodic local resource harvest updates; etc.).
section 2: local library resources
8. metadata requirements and existing ingestors. what mandatory record fields for a local resource have to exist for the content to be indexed and discoverable within your platform (title, date)? please verify that your platform has existing connectors (ingest/transform/normalize tools, transfer mechanisms, and/or application profiles) for the following schema used by local systems at our library (e.g. marc 21 bibliographic records; unqualified / qualified dublin core; ead; etc.). please describe any standard tools your discovery platform may offer to assist local staff in crosswalking between the local library database schema and the underlying schema within your platform. our library uses the abc digital collection management software. do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery product? our library uses the abc institutional repository software. do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery product?
9. resource normalization. is content from both local and remote sources normalized to a single schema? if so, please offer comments on how local and remote (publisher/aggregator) content is normalized to this single underlying schema. to what degree can collections from different sources have their own unique field information which is displayed and/or figures into the relevancy ranking algorithm for retrieval purposes?
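a concrete sense of what the ingest/transform/normalize routines in questions 8 and 9 do may help readers outside technical services. the following minimal python sketch crosswalks a simple dublin core record into a hypothetical internal discovery schema; the field mapping, the internal field names, and the mandatory-field rule are illustrative assumptions, not any vendor's actual implementation.

# mapping from dublin core elements to hypothetical internal schema fields
DC_TO_INTERNAL = {
    "dc:title": "title",
    "dc:creator": "author",
    "dc:date": "publication_date",
    "dc:subject": "subjects",
    "dc:description": "abstract",
    "dc:identifier": "source_identifier",
}

REQUIRED_FIELDS = {"title", "publication_date"}  # cf. "mandatory record fields"

def crosswalk(dc_record: dict) -> dict:
    """normalize a simple dublin core record to the (hypothetical) internal
    schema, rejecting records that lack mandatory fields."""
    internal = {}
    for dc_field, internal_field in DC_TO_INTERNAL.items():
        value = dc_record.get(dc_field)
        if value is None:
            continue
        # repeatable elements (e.g. dc:subject) are kept as lists
        if internal_field == "subjects" and not isinstance(value, list):
            value = [value]
        internal[internal_field] = value
    missing = REQUIRED_FIELDS - internal.keys()
    if missing:
        raise ValueError(f"record rejected, missing mandatory fields: {missing}")
    return internal

# example: a contentdm-style dublin core record
print(crosswalk({
    "dc:title": "photographs of the las vegas strip, 1955",
    "dc:creator": "unknown",
    "dc:date": "1955",
    "dc:subject": ["las vegas (nev.)", "casinos"],
}))

a production ingestor would also handle character encoding, repeatable fields beyond subjects, and deletions, but the shape of the task is the same: map, normalize, validate.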
10. schedule. for records hosted in systems at the local library, how often do you harvest information to account for record updates, modifications, and deletions? can the local library invoke a manual harvest of locally hosted resource records on a per-resource basis? for example, if the library launches a new digital collection and wants the records to be available in the discovery platform shortly after they are available in our local digital collection management system, is there a mechanism to force a harvest prior to the next regularly scheduled harvest routine? after harvesting, how long does it typically take for such updates, additions, and deletions to be reflected in the searchable central index?
11. policies / procedures. please describe any general policies and procedures not already addressed which the local library should be aware of as relates to the harvesting of local resources.
12. consortial union catalogs. can your service harvest or provide access to items within a consortial or otherwise shared catalog (e.g. the inn-reach catalog)? please describe.
section 3: publisher and aggregator indexed content
13. publisher/aggregator agreements: general. with approximately how many publishers have you forged content agreements? are these agreements indefinite or do they have expiration dates? have you entered into any exclusive agreements with any publishers/aggregators (i.e. the publisher/aggregator is disallowed from forging agreements with competing discovery platform vendors, or disallowed from providing the same deep level of metadata/full text for indexing purposes)?
14. comments on metadata provided. could you please provide some general comments on the level of data provided to you, for indexing purposes, by the "majority" of major publishers/aggregators with which you have forged agreements. please describe to what degree the following elements play a role in your discovery service:
a. "basic" bibliographic information (article title / journal title / author / publication information)
b. subject descriptors
c. keywords (author supplied?)
d. abstracts (author supplied?)
e. full text
15. topical content strength. do you feel there is a particular content area that the service covers especially well or leans heavily toward (e.g. humanities, social sciences, sciences)? do you feel there is a particular content type that the service covers very well or leans heavily toward (scholarly journal content, mainstream journal content, newspapers, conference proceedings)? what subject / content areas, if any, do you feel the service may be somewhat weak in? are there current efforts to mitigate these weaknesses (e.g. future publisher agreements on the horizon)?
16. major publisher content agreements. are there major publisher agreements that you feel are especially significant for your service? if so, which publishers, and why (e.g. other discovery platform vendors may not have such agreements with those particular providers; the amount of content was so great that it greatly augmented the size and scope of your service; etc.)?
17. content considered key by local library (by publisher). following is a list of some major publishers whose content the library licenses which is considered "key." has your company forged agreements with these publishers to harvest their materials? if so, please describe in general the scope of the agreement. how many titles are covered for each publisher? what level of metadata are they providing to you for indexing purposes (e.g. basic citation level metadata: title, author, publication date; abstracts; full text)?
a. ex. elsevier
b. ex. sage
c. ex. taylor and francis
d. ex. wiley / blackwell
18. content considered key by local library (by title). following is a list of some major journal / newspaper titles whose content the library licenses which is considered "key." could you please indicate if your central index includes these titles, and if so, the level of indexing (e.g. basic citation level metadata: title, author, publication date; abstracts; full text)?
a. ex. nature
b. ex. american historical review
c. ex. jama
d. ex. wall street journal
19. google books / google scholar. do any agreements exist at this time to harvest the data associated with the google books or google scholar projects into your central index? if so, could you please describe the level of indexing (e.g. basic citation level metadata: title, author, publication date; abstracts; full text)?
20. worldcat catalog. does your service include the oclc worldcat catalog records? if so, what level of information is included? the complete record? holdings information?
21. e-book vendors. does your service include items from major e-book vendors?
22. record information. given the fact that the same content (e.g. metadata for a unique article) can be provided by multiple sources (e.g. the original publisher of the journal itself, an open access repository, a database / aggregator, another database / aggregator, etc.), please provide some general comments on how records are built within your discovery service. for example:
a. you have an agreement with a particular publisher/aggregator and they agree to provide you with rich metadata for their content, perhaps even provide you with indexing they've already done for their content, and may even provide you with the full text for you to be able to "deep index" their content.
b. you've got an agreement with a particular publisher who happens to be the only publisher/provider of that content. they may provide you rich info, or they may provide you rather weak info. in any case, you choose to incorporate this into your service, as they are the only provider/publisher of the info. or, alternately, they may not be the only publisher/provider of the info, but they are the only publisher/provider you've currently entered into an agreement with for that content.
c. for some items appearing within your service, content for those items is provided by multiple different sources with whom you've made agreements. in short, there will be some/many cases of overlap for unique items, such as a particular article title. in such cases, do you create a "merged/composite/super record," where your service utilizes particular metadata from each of the multiple sources, creating a "strong" single record built from these multiple resources?
23. deduping. related to the question immediately above, please describe your service's approach (or not) to deduplicating items in your central index. if your service incorporates content for the same unique item from more than one content provider, does your index retrieve and display multiple instances of the same title? or do you create a merged/composite/super record, and only this single record is displayed? please describe.
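to make the merged/composite record idea in questions 22 and 23 concrete, here is a minimal python sketch of one common approach: records from different providers are grouped on a normalized match key, and a composite record keeps the richest value seen for each field. the match-key recipe and field names are illustrative assumptions, not any vendor's published algorithm.

import re
from collections import defaultdict

def match_key(record: dict) -> str:
    """build a normalized match key from title, first author, and year;
    this particular recipe is an illustrative assumption."""
    title = re.sub(r"[^a-z0-9 ]", "", record.get("title", "").lower())
    author = record.get("authors", [""])[0].split(",")[0].lower().strip()
    year = str(record.get("year", ""))
    return f"{title}|{author}|{year}"

def merge(records: list[dict]) -> dict:
    """fold duplicate records into one composite record, keeping the
    longest (richest) value seen for each field."""
    composite: dict = {"sources": []}
    for rec in records:
        composite["sources"].append(rec.get("source", "unknown"))
        for field, value in rec.items():
            if field == "source":
                continue
            current = composite.get(field)
            if current is None or len(str(value)) > len(str(current)):
                composite[field] = value
    return composite

def dedupe(records: list[dict]) -> list[dict]:
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return [merge(group) for group in groups.values()]

# the same article delivered by two aggregators, one with an abstract
results = dedupe([
    {"source": "aggregator a", "title": "Giraffe Behavior During the Drought Season",
     "authors": ["smith, j."], "year": 2010},
    {"source": "aggregator b", "title": "giraffe behavior during the drought season",
     "authors": ["Smith, J."], "year": 2010,
     "abstract": "observations of giraffe herds..."},
])
print(len(results), results[0]["abstract"])

the two provider records collapse to one composite that carries the abstract only one provider supplied, which is exactly the "strong single record" behavior the question asks vendors to describe.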
section 4: open access content
24. open access content sources. does your service automatically include (out of the box, no additional charge) materials from open access repositories? if so, could you please list some of the major repositories included (e.g. arxiv e-prints; hindawi publishing corporation; the directory of open access journals; hathi trust materials; etc.)?
25. open access content sources: future plans. in addition to the current open access repositories that may be included in your service, are there other repositories whose content you are planning to incorporate in the future?
26. exposure to other libraries' bibliographic / digital collection / ir content. are ils bibliographic records from other customers using your discovery platform exposed for discoverability in the searchable discovery instance of another customer? are digital collection records? institutional repository records?
section 5: relevancy ranking
27. relevancy determination. please describe some of the factors which comprise the determination of relevancy within your service. what elements play a role, and how heavily are they weighted for purposes of determining relevancy?
28. currency. please comment on how heavily the currency of an item plays into relevancy determination. does currency weigh more heavily for certain content types (e.g. newspapers)?
29. local library influence. does the local library have any influence or level of control over the relevancy algorithm? can they choose to "bump up" particular items for a search? please describe.
30. local collection visibility. could you please offer some comments on how local content (e.g. ils bibliographic records; digital collections) remains visible and discoverable within the larger pool of content indexed by your service? for example, local content may measure a million items, and your centralized index may cover half a billion items.
31. exposure of items with minimal metadata. some items likely have lesser metadata than other items. could you please offer some comments on how your system ensures discoverability for items with lesser or minimal metadata?
32. full text searching. does your service offer the capability for the user to search the fulltext of materials in your service (i.e. are they searching a full text keyword index)? if so, approximately what percentage of items within your service are "deep indexed"?
33. please describe how your system responds when no hits are retrieved for a search. does your system enable "best-match" retrieval, that is, something will always be returned or recommended? what elements play into this determination; how is the user prevented from having a completely "dead-end" search?
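question 33's "dead-end" behavior can be illustrated with a small sketch: when a query returns nothing, the system first tries spelling suggestions, then broadens the query. the vocabulary, the stub search function, and the broadening rule here are all invented for illustration; real services use their own term indexes and heuristics.

import difflib

# a tiny index vocabulary standing in for the terms of a discovery index;
# in a real service this would come from the search engine itself.
VOCABULARY = ["giraffe", "behavior", "drought", "season", "library", "discovery"]

def suggest(query: str) -> list[str]:
    """offer "did you mean" alternatives for each misspelled query term."""
    suggestions = []
    for term in query.lower().split():
        if term in VOCABULARY:
            suggestions.append(term)
        else:
            close = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.7)
            suggestions.append(close[0] if close else term)
    return suggestions

def search_with_fallback(query: str, search_fn) -> dict:
    """run a search; on zero hits, retry with spelling suggestions, and as a
    last resort broaden the query by dropping its final term."""
    hits = search_fn(query)
    if hits:
        return {"query": query, "hits": hits}
    corrected = " ".join(suggest(query))
    if corrected != query:
        hits = search_fn(corrected)
        if hits:
            return {"query": corrected, "did_you_mean": True, "hits": hits}
    terms = query.split()
    if len(terms) > 1:
        return search_with_fallback(" ".join(terms[:-1]), search_fn)
    return {"query": query, "hits": [], "recommend": "request via ill / doc delivery"}

# a stub search function: only one phrase returns results
fake_index = {"giraffe behavior": ["giraffe behavior during the drought season"]}
print(search_with_fallback("giraffe behaviour", lambda q: fake_index.get(q, [])))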
section 6: authentication and rights management
34. open / closed nature of your discovery solution. does your system offer an unauthenticated view / access? please describe, and offer some comments on what materials will not be discoverable/visible for an unauthenticated user:
a. licensed full text
b. records specifically or solely sourced from abstract and indexing databases
c. full citation information (e.g. an unauthenticated user may see just a title; an authenticated user would see fuller citation information)
d. enrichment information (such as book image covers, table of contents, abstracts, etc.)
e. other
35. exposure of non-licensed resource metadata. if one weren't to consider and take into account any e-journal / publisher package / database subscriptions and licenses the local library pays for, is there a base index of citation information that's exposed and available to all subscribers of your discovery service? this may include open access materials and/or bibliographic information for some publisher / aggregator content (which often requires a local library license to access the full text). please describe. would a user need to be authenticated to search (and retrieve results from) this "base index"? approximately how large is this "base index" which all customers may search, regardless of local library publisher/aggregator subscriptions?
36. rights management. please discuss how rights management is initialized and maintained in your system, for purposes of determining whether a local library user should have access to the full text (or otherwise "full resolution" if a library doesn't license the fulltext, such as resolution to a detailed citation/abstract). our library uses the abc link resolver. our library uses the abc a-z journal listing service. our library uses the abc electronic resource management system. is your discovery solution compatible with one/all of these systems for rights management purposes? is one approach preferable to the other, or does your approach explicitly depend on one of these particular services?
section 7: user interface
37. openness to local library customization. please describe how "open" your system is to local library customization. for example, please comment on the local library's ability to:
a. rename the service
b. customize the header and footer hyperlinks / color scheme
c. choose which facet clusters appear
d. define new facet clusters
e. embed the search box in other venues
f. create canned, pre-customized searches for an instance of the search box
g. define and promote a collection, database, or item such that it appears at the top or on the first page of any search
i. develop custom "widgets" offering extra functionality, or download "widgets" from an existing user community (e.g. image retrieval widgets such as flickr integration; library subject guide widgets such as libguides integration; etc.)
j. incorporate links to external enriched content (e.g. google book previews; amazon.com item information)
k. other
38. web 2.0 social community features. please describe some current web 2.0 social features present in your discovery interface (e.g. user tagging, ratings, reviews, etc.). what, if any, plans do you have to offer or expand such functionality in future releases?
39. user accounts. does your system offer user accounts? if so, are these mandatory or optional? what services does this user account provide?
a. save a list of results to return to at a later time?
b. save canned queries for later searching?
c. see a list of recently viewed items?
d. perform typical ils functions such as viewing checked out items / renewals / holds?
e. create customized rss feeds for a search?
40. mobile interface. please describe the mobile interfaces available for your product. is it a browser based interface optimized for small-screen devices? is it a dedicated iphone, android, or blackberry based executable application?
41. usability testing. briefly describe how your product incorporates published, established "best practices" in terms of a customer focused, usable interface. what usability testing have you performed and/or do you conduct on an ongoing basis? have any other customers that have gone live with your service completed usability testing that you're aware of?
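one concrete mechanism behind question 36 is the openurl: discovery services commonly hand a citation to the library's link resolver as an openurl request, and the resolver applies the library's holdings knowledge base to decide whether full text is available. a minimal python sketch follows; the resolver address and issn are placeholders, and the journal title echoes the example used in appendix h.

from urllib.parse import urlencode

# hypothetical resolver base url; a real library would use its own
# link resolver endpoint here.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def article_openurl(atitle, jtitle, issn, volume, issue, spage, date):
    """build an openurl 1.0 (z39.88-2004) query for a journal article,
    using the standard kev journal format keys."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": spage,
        "rft.date": date,
    }
    return f"{RESOLVER_BASE}?{urlencode(params)}"

print(article_openurl(
    atitle="giraffe behavior during the drought season",
    jtitle="journal of animal studies", issn="1234-5678",
    volume="12", issue="3", spage="45", date="2010",
))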
appendix g: vendor websites and example implementations
oclc worldcat local: www.oclc.org/us/en/worldcatlocal/default.htm
example implementations:
lincoln trails library system www.lincolntrail.info/linc.html
university of delaware www.lib.udel.edu
university of washington www.lib.washington.edu
willamette university http://library.willamette.edu
serials solutions summon: www.serialssolutions.com/summon
example implementations:
dartmouth college www.dartmouth.edu/~library/home/find/summon
drexel university www.library.drexel.edu
university of calgary http://library.ucalgary.ca
western michigan university http://wmich.summon.serialssolutions.com
ebsco discovery services: www.ebscohost.com/discovery
example implementations:
james madison university www.lib.jmu.edu
mississippi state university http://library.msstate.edu
northeastern university www.lib.neu.edu
university of oklahoma http://libraries.ou.edu
innovative interfaces encore synergy: encoreforlibraries.com/tag/encore-synergy
example implementations:
university of nebraska-lincoln http://encore.unl.edu/iii/encore/home?lang=eng
university of san diego http://sallypro.sandiego.edu/iii/encore/home?lang=eng
scottsdale public library http://encore.scottsdaleaz.gov/iii/encore/home?lang=eng
sacramento public library http://find.saclibrarycatalog.org/iii/encore/home?lang=eng
ex libris primo central: www.exlibrisgroup.com/category/primocentral
example implementations: (note: example implementations are listed in alphabetical order. some implementations are more open to search by an external audience, based on configuration decisions at the local library level.)
brigham young university scholarsearch www.lib.byu.edu (note: choose all-in-one search)
northwestern university http://search.library.northwestern.edu
vanderbilt university discoverlibrary http://discoverlibrary.vanderbilt.edu (note: choose books, media, and more)
yonsei university (korea) wisearch: articles + library holdings http://library.yonsei.ac.kr/main/main.do (note: choose the articles + library holdings link. the interface is available in both korean and english; to change to english, select english at the top right of the screen after you have conducted a search and are within the primo central interface)
appendix h. vendor visit questions
content
1. please speak to how well you feel your product stacks up against the competition in terms of the licensed full-text / citation content covered by your product. based on whatever marketplace or other competitive analysis you may have done, do you feel the agreements you've made with publishers equal, exceed, or trail the agreements other competitors have made?
2. from the perspective of an academic library serving undergraduate and graduate students as well as faculty, do you feel that there are particular licensed content areas your product covers very well (e.g. humanities, social sciences, sciences)? do you feel there are areas which you need to build up?
3. what's your philosophy going forward in inking future agreements with publishers to cover more licensed content? are there particular key publishers your index currently doesn't include, but with whom you are in active negotiations?
4. we have several local content repositories, such as our digital collections in contentdm, our growing ir repository housed in bepress, and locally developed, web-searchable mysql databases. given the fact that most discovery platforms are quite new, do you already have existing customers harvesting their local collections, such as the above, into the discovery platform? have any particular, common problems surfaced in their attempts to get their local collections searchable and exposed in the discovery platform?
5. let's say the library subscribes to an ejournal title, journal of animal studies, that's from a publisher with whom you don't have an agreement for their metadata, and thus, supposedly, don't index. if a student tried to search for an article in this journal ("giraffe behavior during the drought season"), what would happen? is this content still somehow indexed in your tool? would the discovery platform invoke our link resolver? please describe.
6. our focus is your next generation discovery platform, and not your "traditional" federated search product, which may be able to cover other resources not yet indexed in your next generation discovery platform. that said, please briefly describe the role of your federated search product vis-a-vis the next generation discovery platform. do you see your federated search product "going away" once more and more content is eventually indexed in your next generation discovery platform?
end user interface & functionality
7. are there any particular or unique look and feel aspects of your interface that you feel elevate your product above your competitors? if so, please describe.
8. are there any particular or unique functionality aspects of your product that you feel elevate it above the competition (e.g. presearch or postsearch refinement categories, export options, etc.)?
9. studies show that end users want very quick access to full text materials such as electronic journal articles and ebooks. what is your product's philosophy in regards to this? does your platform, in your opinion, provide seamless, quick access to full text materials, with a minimum of confusion? please describe. related to this, does your platform de-dupe results, or is the user presented with a list of choices for a single, particular journal article they are trying to retrieve? in addition, please describe a bit how your relevancy ranking works for returned results. what makes an item appear first or on the first page of results?
10. please describe how "well" your product integrates with the library's opac (in our case, innovative's millennium opac). what information about opac holdings can be viewed directly in the discovery platform without clicking into the catalog and opening a new screen (e.g. call #, availability, enriched content such as table of contents or book covers)? in addition, our opac uses "scopes" which allow a user, if they choose, to limit at the outset (prior to a search being conducted) what collection they are searching. in other words, these scopes are location based, not media type based. for our institution, we have a scope for the main library, one for each of our three branch libraries, and a scope for the entire unlv collection. would your system be able to incorporate or integrate these pre-existing scopes in an advanced search mode? and/or could these location based scopes appear as facets which a user could use to drill down a results list?
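question 10's suggestion that location scopes could surface as facets is easy to picture in code: a discovery layer derives facets by counting field values across the current result set, and filters when a facet is clicked. the records and the "location" field below are invented for illustration.

from collections import Counter

# invented result records; a real discovery layer would take these from
# the current search's result set.
results = [
    {"title": "giraffe behavior during the drought season", "location": "main library", "format": "article"},
    {"title": "desert water management", "location": "architecture branch", "format": "book"},
    {"title": "mojave flora", "location": "main library", "format": "book"},
]

def facet_counts(records, field):
    """count how many results fall under each value of a field; the counts
    become the clickable facet entries shown beside the result list."""
    return Counter(rec[field] for rec in records if field in rec)

def apply_facet(records, field, value):
    """drill down: keep only results matching the chosen facet value."""
    return [rec for rec in records if rec.get(field) == value]

print(facet_counts(results, "location"))  # main library: 2, architecture branch: 1
print(apply_facet(results, "location", "main library"))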
11. what is your platform's philosophy in terms of "dead end searches"? does such a thing exist with your product? please describe what happens if a user a) misspells a word, or b) searches for a book or journal title / article that our library doesn't own/license, but that we could acquire through interlibrary loan.
staff "control" over the end user interface
12. how "open" is your platform to customization or interface design tweaks desired by the library? are there any particular aspects that the library can customize with your product that you feel elevate it above your competitors (e.g. defining facet categories; completely redesigning the end-user interface with colors, links, logos; etc.)? what are the major things customizable by the library, and why do you think this is something important that your product offers?
13. how "open" is your platform to porting over to other access points? in other words, provided appropriate technical skills exist, can we easily embed the search box for your product into a different webpage? could we create a "smaller," more streamlined version of your interface for smartphone access?
overarching question
14. in summary, what are some of the chief differentiators of your product from the competition? why is your product the best and most worthy of serious consideration?
editorial board thoughts
the iowa city flood of 2008: a librarian and it professional's perspective
donna hirst (donna-hirst@uiowa.edu) is project coordinator, library information technology, university of iowa libraries, iowa city.
do you like to chase fire trucks? do you enjoy watching a raft of adventurers go over the waterfall, careening from rock to rock? well, this is a story of the iowa city flood of 2008, a flood projected to happen once every five hundred years, from the perspective of a librarian and it professional.
the approach of the flood
the winter of 2008 was hard, and we got mounds of snow. the spring was wet that year in iowa city. it rained almost every day. minnesota's snow melt-off hadn't been released from the reservoir due to the heavy rains. everyone watched the river rise, day by day. the parks were underwater; the river was creeping up toward buildings, including the university of iowa. in early june, with about a day and a half notice, library staff at the university's main library, art library, and music library were told to evacuate. one of the first acts of evacuation was the relocation of all of the library servers to the engineering building up the hill—high and dry—literally rolling them across the street and up the sidewalk. although all servers were relocated to engineering, engineering didn't have enough power in their server room to handle the extra capacity to run all of our machines.
the five primo servers that run our discovery searching service had to stay disconnected. with the servers safe and sound, we moved our attention to staff workstations. the personal workstations of the administrative staff and the finance department were moved to the business library. the libraries' laptops were collected and moved into the branch libraries, which would be receiving displaced staff. many staff would be expected to work from public clusters in the various library branches, locked down to specific functions. as library staff were collecting their critical possessions, the town was madly sandbagging. more than a million sandbags were piled around university buildings, private businesses, and residences. in retrospect, some of the sandbags may have made a difference, but since the flood was so much greater than anticipated, the water largely went over and around, leaving a lot of soggy sandbags. on june 13, the day before the main library was to be closed, the decision was made to move books up from the basement. there were well over 500,000 volumes in the basement, and a group of approximately five hundred volunteers moved 62,000 volumes and 37,000 manuscript boxes from the lower shelves. volunteers passed books hand to hand into the third, fourth, and fifth floors of the building. a number of the volunteers came from sandbagging teams.
[photos: boxes of manuscripts being stacked on the fifth floor (carol jonck); moving boxes out of the basement (university of iowa news service)]
individuals who had never been in a library, didn't know what a circulation desk was, or what a library of congress call number was, were working hard side by side with physicians, ministers, scientists, students, and retirees. the end result was not orderly, but the collection was saved from the encroaching river. the libraries at the university of iowa are indebted to these volunteers who helped protect the collection from the expected water.
the river peaks
approximately twenty university buildings were closed because of the flood, including the main library, the art building, and the music building. the university's power plant was closed. the entire arts campus was deeply under water. most of the main roads connecting the east side of iowa city to the west side were closed, and most of the highways into iowa city were closed. interstate 80 was closed in multiple places, and no traffic was allowed from the east side of the state to the west side. many bridges in and around iowa city were closed; some had actually crumbled and floated downstream. so the president of the university, sally mason, closed the university for the first time in its history. most staff would not be able to get to work anyway. many individuals were struggling with residences and businesses that were under water. the university was to be closed for the week of june 15, with the university's hospitals continuing to operate under strained conditions; continued delivery of patient services was a priority. most library staff stayed home and followed the news stories, shocked at the daily news of destruction and loss. select library it staff began working in the background to set up new work environments for library staff returning to foreign workstations or relocated work environments.
at the flood's peak, the main library took several inches of water in the basement. there was slight rusting in the compact shelving, but the collection was completely saved. a portion of the basement was lower, and the computer equipment controlling the libraries' public computer cluster was completely ruined. this computer cluster, housing more than two hundred workstations, was completely out of commission. the basements and first floors of the art and music buildings were completely ruined, but the libraries for these disciplines were on higher floors. the collections were spared, but there was absolutely no access to the building.
[photos: library staff and volunteers sandbagging (carol jonck); moving books out of the basement and the beginning of a book chain to the fourth floor (university of iowa news service); library staff sandbagging (donald baxter)]
the libraries take baby steps to resume service
after a week of being completely shut down, the university opened to a first day of summer school, but things were not the same. for the nineteen university buildings that had been flooded, hordes of contractors, subcontractors, and laborers began the arduous task of reclamation. university staff could work at home when that was possible, and most of the library's dislocated reference staff did that, developing courses for the fall, progressing on selection work, and so on. staff could take vacation, but few chose this option. approximately 160 staff from the main library and the art and music libraries were reassigned to four branch libraries that were not affected by the flood. all of central technical services (cts) and interlibrary loan staff were assigned to the hardin health science library. central shipping and facilities was also at hardin library, thus the convoluted distribution of mail started from here. most of the public machines were taken by cts staff, but their routine work proceeded very slowly. cts did not have access to oclc until the end of their flood relocation, which seriously impacted their workflow. an early problem that had to be solved was providing telephones and printing to relocated staff. virtually none of the relocated staff had dedicated telephones, even the administration. in any given location the small number of regular branch staff graciously shared their phones with their visitors. sharing equipment tended to be true for printers as well. for a few critical phone numbers in the main library, the phone number was transferred to a designated phone in the branch. thus often, when regular staff or student workers answered a phone, they had no idea what number the originating caller was trying to call. staff were encouraged to transfer their office phone number to their cell phone. at the business library, the library administrative staff and the finance staff had their personal workstations, which had been moved on the last day before the evacuation. much of this administrative work could proceed, and during the first week at the business library our finance department successfully completed our end-of-year rollover process on all our materials funds. staff from the music library, art library, preservation, and special collections were assigned to the business library. the engineering library adopted the main library circulation and reserve departments. the media services staff was relocated to the physics library. the media staff had cleverly pulled most of the staff development videos and made them available to staff from the physics library, thus allowing the many displaced library staff to make progress on staff development requirements. it staff were housed throughout the newly distributed libraries complex. one it staff member was at the engineering library, one was at the health science library, and two were at the business library. several it staff were relocated to the campus computer center.
the libraries proceed apace despite hurdles
as the water receded and workers cleaned and proceeded with air handling and mold abatement, a very limited number of library staff were allowed back into the main library, typically with escorts, for very limited periods of time. during this time it staff was able to go into the main library and retrieve barcode scanners to allow cts staff to progress with book processing. staff went back for unprocessed materials needing original cataloging, since staff had the time to process materials but didn't have the materials. it staff retrieved some of our zebra printers so that labels could be applied to unbound serials. as it staff were allowed limited access to the main library, they went around to the various staff workstations and powered them up so that relocated staff could utilize the remote desktop function.
moving back
the art and music libraries were evacuated june 10. the main library was evacuated june 13. the main library was closed for about four weeks. the art and music libraries may be closed for a year. when library staff returned to the main library, there were books and manuscript boxes piled on the floor and on top of all the study tables. some of the main corridors, approximately twenty-one feet wide, were so filled with library materials that you almost had to walk sideways and suck in your tummy to walk down the hall. bathrooms were blocked and access to elevators was limited. every library study table on the third through fifth floors was piled three feet high or more with books. for many weeks, library staff and volunteers carefully sorted through the materials and reshelved them as required. many materials needed conservation treatment, not because of the flood, but because of age and handling. many adjustments needed to be made to resume full service. due dates for all circulation categories had to be retrospectively altered to allow for the libraries being closed and for the extraordinary situations in which our library users found themselves during the flood. library materials were returned wet and moldy, and some items were lost. during the flood, in some cases, buildings actually floated down river. the libraries' preservation department did extensive community education regarding treatment of materials damaged in the flood. the university was very interested in documenting the effect of the flood, and thus the libraries cooperated in trying to gather statistics on the number of hours of library staff and volunteers used during the flood. record keeping was complex, since one person could be a staff person working on flood efforts but also a volunteer working evenings and weekends.
our neighbors
the effect of the iowa city flood of 2008 has been extensive, but was nothing compared to the flood in cedar rapids, our neighbor to the north. the cedar rapids public library lost their entire collection of 300,000 volumes, except for the children's collection and 26,000 volumes that were checked out to library users that week.
[photo: the main library, passing the books up the stairs (university of iowa news service)]
towards an open source-first praxis in libraries
j. robertson mcilwain
information technology and libraries | december 2023
https://doi.org/10.5860/ital.v42i4.16025
about the author: j. robertson mcilwain (corresponding author: mcilwain@berlin-international.de) is librarian and research associate, berlin international university of applied sciences. © 2023. submitted: december 22, 2022. accepted for publication: september 12, 2023. published: 18 december 2023.
abstract
in terms of utility and technical quality, open-source software solutions have become a common option for many libraries. as barriers to adoption have been reduced and systems such as folio appear poised to change the landscape of lis technology, it is worth examining how the use of open source can support the normative core values of librarianship and outlining a strategy for critical engagement with the technology that is beneficial to patrons and libraries. such a strategy will require further codification, institutionalization, and investigation of open source at many levels.
introduction
open-source software has continued to gain popularity among libraries in the past decade. it has moved from the periphery to become a major competitor with some of the most established software in the library technology sector, but implementation has been uneven and is still represented in only a small percentage of libraries. among those that have adopted open-source systems, the language used to describe the switch is often related more to pragmatism than normative concerns.1 as acceptance of open source as a legitimate technical alternative to proprietary systems has gained traction, some may be interested in reevaluating the heretofore utilitarian drivers of open source adoption and asking how it can bolster the values and ideals of librarianship. the open-source movement, while sharing some of the same civic ideals as librarianship, is not as motivationally coherent. some corners of the movement are motivated by industrial or market concerns. therefore, as open source emerges as a common option for many libraries, it is in the interests of the profession to establish, early on, the terms on which it will critically engage with open source. as software has matured and third-party support has expanded, the technical barriers to adopting open source have greatly diminished and, especially when viewed through the lens of critical librarianship, the reasons to choose open source are more pertinent than ever. as noted, for many libraries, the conversation has until now focused, and not entirely unjustly so, largely on utility and cost-effectiveness (an unfortunately myopic view of open-source software that stops at "potential utility" and highlights "ease of installation") while ignoring how open source can support the values of librarianship and the library's mission. while questions of support personnel and budget are still relevant, advances in the past decade mean that they no longer must represent the entirety of the discussion of open source in libraries. libraries now have the opportunity to look at what is arguably the more fundamental reason they should adopt an open-source-first praxis, an approach where closed-source proprietary systems should only be considered as a last resort.
libraries have a duty to their patrons. in order to serve them well, the profession has adopted a set of associated core values such as service, privacy, equity of access, stewardship, and intellectual freedom. the use of closed-source technology presents complicated ethical questions related to, among other things, information security, privacy, and transparency. fortunately, the lis and open-source communities share many of the same core values and can support each other in addressing the deficiencies and transgressions of proprietary software. because of the lowered barriers to entry and because the values of librarianship and the open-source community complement each other so well, open-source technologies present libraries with both a pragmatic solution to better serve patrons and a solution that aligns with the values of the profession. the justification arguments for libraries to use open source represent the intersection of pragmatic, utilitarian, and moral nonutilitarian stances.

however, if open source is to reinforce the mission of libraries, it must be viewed through a critical lens. librarians must ask whether efforts to develop and introduce systems that are fundamental to their missions are best led by private enterprise or by libraries themselves. the motivation of this article is to review the current state of open-source technology in lis, address common concerns, especially regarding the principles of librarianship, and critically evaluate developments in the field. the use of open-source technology presents a pragmatic opportunity for libraries, but if not approached thoughtfully, it could potentially result in a compromise of professional ethics akin to what has already occurred more generally with the commodification of the information profession.2

theoretical note

broadly speaking, this article is informed by a critical theoretical approach. “critical theories have been applied to lis under a general umbrella of ‘critical librarianship,’ which takes an explicitly political approach to information work, seeking to promote ethical practices which support the ethical creation and communication of scholarly knowledge with a focus on implications for social justice.”3 moreover, this article advocates a praxis in line with that defined by john budd: “action that carries social and ethical implications and is not reducible to technical performance of tasks.”4 more specifically, much is owed to bergquist et al.’s application of boltanski and thévenot’s justification framework to the development of the free and open-source software movement.5 it is further applied here to the use of open source in lis. put briefly, the framework presents a typology that describes how actors in various settings justify means and initiatives.6 the typology is composed of six justification logics: inspirational, related to seeking an authenticity in life via artistry; domestic, related to maintenance of a traditional status quo; popular, in which personal aggrandizement is prioritized; civic, where the common good is paramount; market, where commerce is the focus; and industrial, where qualities such as efficiency, productivity, and functionality are used to justify actions (see table 1). this framework is particularly useful in a discussion of praxis since the nuances of motivation and justification can be more easily clarified.
after briefly providing context for open source, its current use in libraries, and the core values associated with librarianship, i use this framework to inform my discussion of open source and librarianship.

table 1. boltanski and thévenot’s justification typology

    justification logic    defined by
    inspirational          authenticity in life via artistry
    domestic               maintenance of a traditional status quo
    popular                personal aggrandizement
    civic                  prioritization of the common good
    market                 prioritization of commerce
    industrial             efficiency, productivity, and functionality

open-source technology

what is discussed here as open source is known as open-source software (oss), free and open-source software (foss), or free/libre open-source software (floss or f/loss), each variation representing one of the conflicting philosophies within the movement, which range from communal development for the public good to profit-maximizing neoliberal business models. in the interest of simplicity and brevity, and since it is the most commonly used term within lis literature, the terms open source and open-source software are used throughout this discussion.

the concepts underpinning open source were first introduced in the 1980s as private firms began restricting access to software (specifically to its source code) under the auspices of intellectual property rights. it was at this time that the gnu general public license (gpl) was written by richard stallman, the founder of the free software foundation. it stipulated that items licensed under the gpl were subject to the “four essential freedoms” to run, study, share, and modify the information therein, and that any derivative works should be subject to those freedoms as well. this latter concept, related to derivative works, is known as “copyleft.” according to ettlinger, “many open-source and free software developers have deliberately subverted the idea of intellectual-property rights and, in the process, created a rich common to which all could contribute, according to their abilities, and from which all could benefit, according to their needs; where innovations could be shared for free.”7

following this initial period of idealistically motivated development came another decisive moment for open source when linus torvalds, while working on linux in the early 1990s, discovered that by releasing the code as he went and making it easy for others to review and contribute to it, the quality of the software was much higher than if one person or team had worked on it in isolation. torvalds estimated that he coded only 2% of the project himself; the remainder came from contributors.8 soon industry found it difficult to ignore a development model that offered such a cost-effective approach to making high-quality software.

later, other licenses, referred to as “permissive,” were introduced that did not require that derivative works observe the same freedoms as the original. as a result, they were seen as less hostile to intellectual property and private enterprise. while a compromise of the original principles of the free software movement, this change was seen as a major turning point for open source, as it resulted in a significant growth in the amount, and use, of open-source software. as the foundational freedoms were de-emphasized, we see the term “open source” instead of “free software” used more often from this point forward.
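in practice, the copyleft/permissive distinction is something a library can audit directly in its own software stack. as a minimal, illustrative sketch (not part of the original discussion, and assuming a python environment), the following lists installed packages and sorts their declared license metadata into copyleft and permissive families using a rough keyword heuristic; it is a convenience check, not a legal determination.

    # a minimal sketch (illustrative): classify the declared licenses of
    # installed python packages as copyleft vs. permissive.
    from importlib.metadata import distributions

    COPYLEFT = ("gpl", "agpl", "lgpl")            # gnu-family, share-alike licenses
    PERMISSIVE = ("mit", "apache", "bsd", "isc")  # no share-alike obligation

    for dist in distributions():
        name = dist.metadata["Name"]
        # license information may appear in the license field or in trove classifiers
        text = " ".join(
            [dist.metadata["License"] or ""]
            + (dist.metadata.get_all("Classifier") or [])
        ).lower()
        if any(marker in text for marker in COPYLEFT):
            kind = "copyleft"
        elif any(marker in text for marker in PERMISSIVE):
            kind = "permissive"
        else:
            kind = "unclassified"
        print(f"{name}: {kind}")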
today open source is a common foundation for, or component of, proprietary software, and firms like google and microsoft are major contributors to the development of open source. likewise, in the lis sector, it is not uncommon for open-source technologies to represent significant components of closed-source systems.

within these developments of the open-source movement, three major currents of importance for the present argument can be observed. put another way, using the concepts of justification borrowed from luc boltanski and laurent thévenot, the three justification regimes employed for the use of open source could be described as civic, industrial, and market logic.9 during its early stages, use of open-source software was dominated by a civic logic based “on principles and rules defining free software as a common good” as codified in the “four essential freedoms” of the gnu gpl, and later by an industrial logic that prioritized quality and efficiency,10 as exemplified by torvalds’s work on the linux kernel. later still we see market justification employed with the introduction of permissive licenses. this will be addressed further below, but it is worth noting here that while there are additional logics at play when justifying the use of open source in general, it is the interaction of civic, industrial, and market logic that is especially relevant here, because these logics are mirrored in the justification for use of open source within librarianship.

open-source trends in librarianship

    because we share so many of the values of the oss community, we should feel an obligation to promote open source in the library community.11

at this point it is worth briefly surveying the four major pieces of open-source software used in libraries (see table 2), all of which are library systems. the discussion of open source in libraries is often focused on integrated library systems (ilss), because they represent the single largest mission-critical system that most libraries work with on a daily basis and they affect almost every operation of the library. the discussion here tends to focus on ilss as well, but that should not suggest that there are not other powerful open-source technologies available to librarians. there are notable examples in discovery systems (aspen discovery, blacklight, vufind), institutional repositories (eprints, dspace, islandora, omeka, opus 4, samvera hyrax), content management systems (drupal, subjectsplus, wordpress, etc.), wikis (bookstack, mediawiki, etc.), and analytics (matomo, umami). there are even robust open-source platforms for networking and communication, such as the ascendant mastodon microblogging platform.

koha

one of the first and, to date, most actively developed pieces of open-source lis software is the koha ils.12 it was launched in 2000 in new zealand for a group of three libraries, and it is licensed under the gnu gpl. it has a very active global community and many private firms that offer support. traditionally popular with small to medium-sized libraries, koha has gained traction with larger academic and public libraries in recent years.

opals

in 2001, six new york state school library systems came together to create what would become opals (open-source automated library system). today opals is developed by a single company, media flex, and used primarily in school libraries.
“opals support is provided through districts, other service centers, or directly through media flex. although an open-source software, development for opals is performed primarily by media flex.”13 while open source and licensed with the gnu gpl, opals does not appear to take advantage of a collaborative development model, as its source code is available only by request from media flex.14

evergreen

in 2004, the georgia public library system began development of the evergreen ils for its large consortium of public libraries, and in 2006 evergreen was launched with a gnu gpl license. afterwards a nonprofit corporation, equinox, was formed to promote, develop, and support the system. because evergreen was originally developed with large consortia or library systems in mind, it offers possibilities of scale but requires significant resources, which may have heretofore slowed its growth.

folio

folio was introduced in 2016 under the apache 2.0 license, which, unlike the gnu gpl, does not require that derivative works carry the same license as the source. this means that in the future, proprietary software can be built with folio as a base, much as the license’s namesake, the apache web server, is used as the base for much of the internet today. despite relatively low levels of current adoption (see table 3), folio should not be underestimated. folio is being heavily promoted and has found several high-profile early adopters, especially from the now abandoned kuali ole project. notably, in 2022, the library of congress announced its intent to migrate to, and support, folio.15

the folio project is currently developed under the auspices of a single-member limited liability company of the same name, nested within the open library foundation (olf), and is supported by many large libraries and library consortia. but it was ebsco that, in 2015, began exploring the possibility of creating an open-source project, and it has since significantly funded, promoted, and steered the project.16 ebsco, as the only “enabling partner,” has stated that it “does not expect to exert direct control” beyond “its basic expectations of an open and modular system.”17 while ebsco’s outsized role in the conception, funding, and current presence in the project must not be overlooked, it is an open-source project, and many (mostly academic) libraries have been present since early on. in addition, ebsco engaged index data, a well-respected lis software firm, to develop the initial technical platform.18 index data also provides services in support of folio for libraries.

table 2. open-source ilss and license types

    open-source ils    license type
    koha               gpl – copyleft
    opals              gpl – copyleft
    evergreen          gpl – copyleft
    folio              apache 2.0 – permissive

awareness and use of open source in libraries

while limited to reporting about integrated library systems and platforms, marshall breeding’s annual library automation perceptions reports show a significant growth in interest in open source in the past decade.
the 2012 “survey reflected fairly low levels of interest in migrating to an open-source ils, even when the library rates their satisfaction with their current proprietary ils and its company as poor,” compared to the 2022 report that noted “open source products are a routine option in all library sectors.”19 a closer look at specific sectors reveals a more complicated picture, however. in academic libraries in the us, we see in 2019 that use of open-source software is highest among those academic institutions that offer doctoral programs and lowest among those that offer associate degrees.20 awareness was not a barrier to adoption, but among current non-adopters there were surprisingly low levels of intent.21 in contrast, among public libraries, choi found in 2021 that awareness was still a barrier for adoption and, moreover, among current non-adopters there was very low intent to migrate to open source in the near future.22

breeding’s libraries.org features an extensive database of libraries worldwide and provides data based on library type from which we may draw some inferences. again, accounting only for ilss, open-source options currently account for just around 11% of the systems among academic, public, school, and special libraries (see table 3), but again here we see an uneven distribution. the popularity of the opals system among small school libraries (78%) may distort the overall picture (see table 4). folio, despite much discussion in the field, still has a relatively small footprint, even among medium to large libraries (see table 5). in general, if we exclude opals from the calculation we see similar adoption rates of around 8–10% for all libraries. special libraries have higher rates of open-source adoption, ranging from 26% to 30%, but the relatively low sample sizes must be taken into account (see tables 3–5). in the end, we still see modest adoption rates among libraries of all sizes, barring some outliers among small school and special libraries. despite anecdotal evidence that interest in or discussion of open source in libraries is increasing relative to 10 years ago, that does not seem to have translated into significant adoption rates and, as choi and pruett have noted, interest among nonadopters is still low.23 an important question, then, is why have open-source solutions not been more widely adopted? while beyond the scope of the current paper, evidence suggests that lack of staffing for maintenance or customization is the biggest barrier blocking adoption, but as we will see later, the introduction of more and more third-party lis it support firms could lower that barrier.24
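the overall rates cited in this section can be recomputed directly from the counts in table 3 below. a minimal sketch (the counts are breeding’s; the code itself is illustrative):

    # recompute overall open-source ils shares from the table 3 counts
    # (source: marshall breeding, libraries.org, accessed december 21, 2022).
    ils_counts = {"koha": 4882, "evergreen": 1775, "opals": 1767, "folio": 90}
    grand_total = 79262  # all academic, public, school, and special libraries with an ils

    open_source = sum(ils_counts.values())             # 8,514
    without_opals = open_source - ils_counts["opals"]  # 6,747

    print(f"open-source share: {open_source / grand_total:.2%}")        # 10.74%
    print(f"share excluding opals: {without_opals / grand_total:.2%}")  # 8.51%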
table 3. open-source ils/lsp adoption among libraries by type

                            academic          public            school            special           all types
                            n       percent   n       percent   n       percent   n       percent   n       percent
    koha                    955     11.43%    3,227   8.98%     406     1.20%     294     25.59%    4,882   6.16%
    evergreen               37      0.44%     1,679   4.67%     49      0.14%     10      0.87%     1,775   2.24%
    opals                   50      0.60%     15      0.04%     1,663   4.92%     39      3.39%     1,767   2.23%
    folio                   81      0.97%     6       0.02%     0       0.00%     3       0.26%     90      0.11%
    open source subtotal    1,123   13.44%    4,927   13.71%    2,118   6.26%     346     30.11%    8,514   10.74%
    grand total             8,358             35,943            33,812            1,149             79,262

note: grand total here equals all libraries identified by type, irrespective of collection size, but excludes those that did not indicate any ils.
source: marshall breeding, “libraries.org,” accessed december 21, 2022, https://librarytechnology.org/products/marketshare.pl.

table 4. open-source ils/lsp adoption among small libraries by type

                            academic          public            school            special           all types
                            n       percent   n       percent   n       percent   n       percent   n       percent
    koha                    136     16.04%    375     9.55%     32      3.90%     57      23.75%    600     10.28%
    evergreen               5       0.59%     177     4.51%     2       0.24%     1       0.42%     185     3.17%
    opals                   10      1.18%     4       0.10%     641     78.17%    14      5.83%     669     11.46%
    folio                   2       0.24%     0       0.00%     0       0.00%     0       0.00%     2       0.03%
    open source subtotal    153     18.04%    556     14.15%    675     82.32%    72      30.00%    1,456   24.95%
    grand total             848               3,928             820               240               5,836

note: small libraries are defined as those with a collection size of less than 20,000 items.
source: marshall breeding, “libraries.org,” accessed december 21, 2022, https://librarytechnology.org/products/marketshare.pl.

table 5. open-source ils/lsp adoption among medium/large libraries by type

                            academic          public            school            special           all types
                            n       percent   n       percent   n       percent   n       percent   n       percent
    koha                    368     9.39%     515     7.26%     35      14.52%    56      23.05%    974     8.47%
    evergreen               22      0.56%     656     9.24%     0       0.00%     2       0.82%     680     5.91%
    opals                   17      0.43%     5       0.07%     60      24.90%    3       1.23%     85      0.74%
    folio                   53      1.35%     1       0.01%     0       0.00%     2       0.82%     56      0.49%
    open source subtotal    460     11.73%    1,177   16.58%    95      39.42%    63      25.93%    1,795   15.61%
    grand total             3,920             7,097             241               243               11,501

note: medium and large libraries are defined as those with a collection size of greater than 19,999 items.
source: marshall breeding, “libraries.org,” accessed december 21, 2022, https://librarytechnology.org/products/marketshare.pl.

core values

though librarianship is not a monolithic profession, there are values associated with lis, and many argue that they are quite robust and coherent, even internationally. it was, after all, in 1931 that ranganathan wrote the five laws of library science, asserting that: (1) books are for use, (2) every reader his/her book, (3) every book its reader, (4) save the time of the reader, and (5) the library is a growing organism.25 ranganathan’s five laws have been interpreted and reinterpreted many times over, but in them we may recognize the values still associated with librarianship.
michael gorman, the notable library scholar and former president of the american library association, expounded on and made explicit the notion of core values during his career, identifying eight: stewardship, service, intellectual freedom, privacy, rationalism, commitment to literacy and learning, equity of access, and democracy.26 foster and mcmenemy went further and compared the codes of ethics of 36 national library associations and found that of gorman’s eight values, five appeared the most often: service, privacy, equity of access, stewardship, and intellectual freedom (see fig. 1).27

looking at the values identified here by ranganathan, gorman, and foster and mcmenemy, we start to see the intersection of the pragmatic, utilitarian, and moral nonutilitarian stances that define the profession. regarding open-source technology in libraries, utilitarian considerations have heretofore dominated the discussion, but thanks to the maturation of current technologies and the dialog around critical librarianship, librarians may now want to evaluate open source in light of the ethics, ideals, and values associated with lis. though there are arguably valid and mutually reinforcing relationships between many of the identified values and open source, this discussion will be confined to the five most cited values identified by foster and mcmenemy in the previous paragraph because, owing to their prevalence internationally, these may be considered the most universal.

[figure 1. percentage of 36 codes of ethics studied by foster and mcmenemy that adhere to gorman’s eight core values.28]

open source and libraries

many librarians have long identified the shared values between the profession and the open-source community,29 but perceived barriers (outlined below) have prevented widespread adoption of open-source technologies. this section addresses the use of open source in libraries considering the five core values identified above and argues that many of those perceived barriers are misguided, outdated, or otherwise not completely applicable.

service

    librarianship is a profession defined by service. every aspect of librarianship, every action that we take as librarians can and should be measured in terms of service.30

perhaps the most fundamental mission of the librarian is to assist patrons in locating the knowledge they seek. in its purest form one might imagine the reference interview, the one-on-one interaction between patron and librarian in which the patron is guided through various resources until the answer is found. but the reference interview represents only one point of contact, and its prominence in the popular image of the librarian overshadows the other complex labor that aims to connect the patron with information resources. technology plays an enormous role in the myriad complex tasks that are performed largely in the background. indeed, as noted by barron and preater, “contemporary librarianship, as practitioners have constructed it, could not exist without library systems.”31 it is, therefore, appropriate to begin a discussion of the use of open source in libraries with a discussion of how those technologies can allow librarians to better serve their communities, specifically how technology costs and functionality can affect service.
costs

cost is often the first argument made for open source in libraries, and given the perennial budget constraints of many libraries, it is easy to see why. the largest cost advantage of open source comes from the elimination of license fees and from support flexibility. since the code is open and not owned by any single vendor, vendors cannot demand fees for the use of the software or per-user/per-installation fees. users are free to use the software as they wish, limited in most cases only by hardware availability and in some cases technical expertise. corrado goes further and notes “open-source software not only has a lower acquisition cost than proprietary software, it often has lower evaluation/implementation and support costs as well.”32 indeed, as noted by choi and pruett in their examination of open source adoption in academic libraries, the “ability to download and test the software in advance” was the fourth most cited driver for choosing open source.33

while there are often lower costs associated with open source, there are still costs, especially for support and infrastructure. some libraries will already have the technical expertise and physical hosting capacity to maintain and run open-source systems, and other “organizations will contract with specialized firms for the services needed to operate the software with the levels of reliability and performance expected for critical business functions.”34 the perceived lack of in-house technical expertise is a common barrier among libraries that are considering open-source solutions, but here again open source presents opportunities for libraries.35 instead of depending on a single firm that produces the software and provides support, open source allows libraries to select the options best suited for them based on the on-site expertise and physical capacity already available.

flexibility and avoidance of vendor lock-in are closely tied to any discussion of cost and have been noted as significant drivers in choosing open source.36 the main distinction between support for proprietary and open-source systems is that with proprietary systems, support is generally limited to the firm that developed the software. if there is an issue that requires additional expertise, a library may be required to purchase an elevated support tier or may otherwise be left waiting for a bug to be fixed or a feature introduced at the discretion of the firm.37 in the open-source support world, on the other hand, there are more options: first, with regard to the companies providing support—if company a cannot or will not provide the desired level of support, company b may be a better option—and second, there are more options from the user community—if several users want a certain feature, they may work together to develop it and contribute it back to the project, making it available for everyone. or, as with projects that have formalized decision-making structures, they may decide to become active within the governance bodies to steer a project in a certain direction. moreover, support for an issue may already be openly available in the form of online documentation or user-driven support forums.
so, while potentially spending less on support and infrastructure that is at the same time more bespoke, a library can support vendors and communities whose values more closely align with its own and can avoid being locked into lengthy service agreements (vendor lock-in) with the developers of the software.38 today there is a robust ecosystem around open source, providing support and hosting solutions. arguably one of the most prominent current examples in the open-source library community is bywater solutions. bywater solutions started in 2009 to provide support for the open-source koha ils, and while it was not the first firm set up to support open-source library systems, it differed notably from some predecessors such as ptfs (which absorbed liblime) because it strove to have a collaborative relationship with the global koha community. other prominent examples include catalyst, equinox, and ptfs europe (not related to the ptfs cited above).

while cost is an oft-cited reason for interest in or adoption of open source, marshall breeding noted that “concerted interest in open source ilss began” in 2006, not primarily out of budgetary concerns but rather out of frustration with the functionality of proprietary ils options.39

quality/functionality/customization

as the expectations of patrons change, the need for more and more sophisticated technology increases year on year, and as the needs of each institution are different, the desire to customize that technology to meet those needs increases in kind, creating a source of tension between libraries and library software vendors in the process. private firms, especially publicly traded ones, are under pressure to make the minimum viable product to maximize profits, hardly an offense for a for-profit company, but it does represent the divergent interests of firms and libraries.40

functionality and customization are at once barriers to and drivers for adopting open-source solutions, and this fact alone demonstrates the continued misconceptions around open source in libraries.41 still, for the present argument it is sufficient to say that, despite earlier doubts around the open-source development model and the quality of the software, the continuous growth in popularity of open source has proved it is a legitimate alternative to proprietary systems in terms of quality. indeed, perhaps the strongest argument for the quality of open-source technology can be made by the firms that produce proprietary software, including in the library sector, since many of them use open-source technology in their own software. for example, ex libris’s alma system, used by 36% of academic libraries in the us in mid-2022, relies on the open-source apache solr for its search index.42

another part of providing the best service to patrons is being able to evaluate how our systems function and how they serve results. the black-box nature of proprietary systems (i.e., we know what goes in and what comes out, but have little notion of what decisions are made within) means that librarians’ ability to serve their patrons is at times significantly hindered. for example, as corrado has noted, the inclusion or exclusion of open-access journals in the indices of proprietary discovery systems such as ebsco discovery services (eds) and ex libris’s primo, while not as opaque as academic search engines such as google scholar, is not always transparent.43 this could represent a specific problem for some libraries, but it also speaks to a more fundamental problem. because of the nature of software development and the business models of private firms, there is an associated amount of “protected” information that may be considered trade sensitive, and whenever it is not clear how a system arrived at, or delivered, a specific piece of information, that creates a power differential and disadvantages libraries and users. smith and hanson go further to note that the uneven power dynamics in library services limit patrons’ access to information and can limit librarians’ ability to work toward socially just outcomes.44 the increased transparency of open-source systems may provide librarians the means to better serve users by allowing them to better understand how library or discovery systems are serving results, ultimately helping users more easily find relevant information.
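the difference in transparency is concrete. because solr is open and can be run locally, a library operating its own index can ask the engine to explain exactly how each result was scored. a minimal sketch, assuming a local solr instance at localhost:8983 with a hypothetical core named “catalog” (the query field is likewise hypothetical):

    # ask a locally hosted apache solr index to explain its relevance scoring.
    # solr's debugQuery parameter returns a per-document score breakdown,
    # the kind of ranking transparency a closed discovery index withholds.
    import requests

    resp = requests.get(
        "http://localhost:8983/solr/catalog/select",
        params={"q": 'title:"open source"', "rows": 5, "debugQuery": "true", "wt": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    for doc_id, explanation in resp.json()["debug"]["explain"].items():
        # the first line of each explanation carries the top-level score formula
        print(doc_id, "->", explanation.splitlines()[0])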
the current dominant paradigm in lis is that libraries pay companies for access to mission-critical systems. all support and development are provided by one firm. if there are problems or bugs, librarians must dedicate resources to reporting them to the firm to be fixed (or not) at the discretion of the developer. barron and preater, referring to galvan’s “architecture of authority,” noted that “whereas community developers are actively contributing to open source projects, systems librarians contributing to supplier-hosted community areas are providing free labor to improve a system for which they have already paid: ‘we’re one of the only industries that pays for the privilege of improving products, just to get them to work the way we needed them to in the first place.’”45 librarians contributing to proprietary systems (that they have already paid for) provides a particularly stark illustration of an exploitative power dynamic. of course, private enterprise will continue to profit from the unpaid labor of open-source contributors as long as their systems are built on top of open-source packages (e.g., elasticsearch, apache server and solr, nginx, and mariadb, to name the most obvious), but at least libraries will not pay twice—once for the product and again for the labor to improve the product—as in the current model. in the end, proprietary firms and open source both have the capacity to produce modern, high-quality systems, but all things being equal, open source has the added advantage of transparency and control, which reinforces rather than compromises the core values of the profession.

privacy

libraries have an obligation to ensure the privacy of those who use their services. the use of remotely hosted proprietary software suites can make that difficult, impossible, or at the very least difficult to appraise. the dominant model for ils hosting is now one in which the provider also hosts the software on its own servers, as opposed to locally installed instances. patron data—from name, birthdate, and home address to search queries and circulation records—are now often stored in remote databases to which system administrators may not have complete access. the terms of use of this data are detailed in the vendor’s privacy policy, which may change over time. due to limited capacity, libraries may not have the time or resources to review in detail each vendor’s privacy policy or each change to that policy.
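with full control over a locally hosted open system, by contrast, a library can enforce such protections itself rather than depending on a vendor’s policy. as one hedged illustration (not from the article), patron identifiers in circulation records can be pseudonymized with a keyed hash before long-term retention:

    # a minimal sketch (illustrative): pseudonymize patron identifiers in a
    # circulation log before retention, the kind of proactive privacy measure
    # a locally controlled system permits.
    import hashlib
    import hmac

    SECRET_KEY = b"local-secret-rotate-periodically"  # hypothetical key, stored apart from the records

    def pseudonymize(patron_id: str) -> str:
        # keyed hash: stable enough to link records within a retention window,
        # but not reversible without the key
        return hmac.new(SECRET_KEY, patron_id.encode(), hashlib.sha256).hexdigest()[:16]

    record = {"patron_id": "p0012345", "barcode": "31234000123456", "due": "2023-12-01"}
    record["patron_id"] = pseudonymize(record["patron_id"])
    print(record)  # patron_id is now an opaque token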
remotely hosting library systems provides advantages of scale for the ils providers and may reduce the it costs of the library, but while it represents an outsourcing of library it labor, it also represents another point where we see power dynamics shifting in favor of ils firms. with less control of and access to the systems used in the library, librarians are disadvantaged. moreover, warehousing the data of many libraries in one place may create a more attractive target for nefarious actors. for libraries without on-site it knowledge, having a system hosted remotely on servers maintained by dedicated professional staff offers clear advantages, and obviously using open-source software doesn’t immediately eliminate privacy concerns, but it does shift the power dynamic back to the librarian and enhances their agency in terms of proactively protecting users’ interests. as we will see below, privacy also features in discussions of stewardship and intellectual freedom.

equity of access

the technologies used in lis are designed either to allow librarians to better serve their patrons or, in many cases, to allow patrons themselves to directly access knowledge. they are, therefore, essential to any discussion of equity of access, a “basic premise that everyone has a right to have access to library resources and services, irrespective of who they are and where and under which conditions they live.”46 making high-quality, modern technology available with the lowest possible barrier is important to providing that access, and as noted previously, producing high-quality software is one area where open-source technology excels. it was also noted above, in the discussion of cost, that depending on the required third-party support and infrastructure, open source is often a less costly solution. the absence of annual licensing fees means that a larger portion of the money invested in systems will go towards development and maintenance, activities that directly serve the user.

stewardship

according to gorman, “stewardship in the library context has three components: the preservation of the human record to ensure that future generations know what we know, the care and nurture of education for librarianship so that we pass on our best professional values and practices, [and] the care and maintenance of our libraries so that we earn the respect of our communities.”47 referring to gorman’s first point, henderson notes that “libraries play this archival role because history has shown that it is not economically viable for profit-based businesses to do so.”48 the most pressing threat posed by closed-source technology to this concept of stewardship is long-term access to the proprietary systems and formats that contain and transmit knowledge. paradoxically, this brings us to another of the main reasons, as identified by wilson and mitchell, that libraries are reluctant to adopt open source: “the risks involved in using oss are too great.” namely, libraries are worried about investing in systems where no single company is responsible for their development.49 while it is true that generally no single entity is solely responsible for development, that can be an advantage.
with the barriers to the transit of capital across national borders reduced or eliminated and with the liberalization of financial markets in many parts of the world comes the consolidation of industries, including the publishing and library technology sectors, a topic familiar to most librarians. when one firm acquires another, priorities may change, and as trends, tastes, and economic environments change, technologies may be rendered uneconomical, redundant, and ultimately useless. this can mean that a piece of software or a file format that was in active development one day is shelved the next, its proprietary source code permanently frozen and support for it curtailed and then eliminated at the earliest contractually possible moment. users are left locked into an increasingly out-of-date technology, exposed to data security vulnerabilities (creating potential privacy issues among other problems), or faced with the costly prospect of migrating to a new system. this scenario is taken for granted today, because operating at the whims of technology firms is a common occurrence, but the open-source model offers an alternative. there is nothing preventing interest in a particular piece of open-source software from waning for some of the same reasons as mentioned for closed-source software (changing trends, tastes, etc.), but what happens next is fundamentally different. instead of the source code being permanently frozen in a firm’s archives, anyone could take the open-source code and update it or adapt it for future use. if a group of libraries is using a piece of open-source software that is no longer actively developed by the community, they could pool their resources to adapt or update the software to their needs, maintaining functionality and addressing security issues.

intellectual freedom

intellectual freedom is perhaps the most obvious value shared by the open-source and lis communities. if we return briefly to the formative ideas around the open-source movement, intellectual freedom is central, especially when viewed in light of the freedoms to run, study, share, and modify source code outlined in the initial gnu gpl license. applied to traditional libraries, these freedoms might be reinterpreted as read, study, share, and modify. often, “intellectual freedom begins with opposition to censorship of books and other library materials,”50 but it should apply no less to computer code. supporting open source and a model of knowledge creation that eschews copyright maximalism and embraces the information commons reinforces librarianship’s own values around intellectual freedom.

to return again briefly to privacy: it, too, is necessary to intellectual freedom, representing another, indirect relationship between open source and libraries’ promotion of intellectual freedom.51 without privacy, patrons cannot fully utilize the information resources available to them. “protecting information privacy allows individuals to feel free to sample the marketplace of ideas without fear of interference or scrutiny, which could inhibit curiosity.”52

the prevalence of these and other core values within the lis community is a proclamation of what is important to the profession. they help guide practitioners and help us to keep our focus on the communities we serve.
that doesn’t mean there isn’t any room for interpretation; indeed, as seen in figure 1, the core values we have identified here are interpreted differently and are adhered to in varying degrees in different places. it is the responsibility of each of us to apply these values to the work we do each day.

a critical appraisal of current trends

it’s hard to discuss the current state of open source in libraries without talking about folio, or “the future of libraries is open.” the enthusiasm behind folio is notable and its early adoption among large, established academic libraries is impressive, especially for an open-source project, but the prominent role that the private sector plays in its development deserves critical examination. indeed, with the introduction of the folio library services platform (lsp), it is worth looking more closely at a strategy among private companies to leverage open-source technology (and the labor behind it) to bolster profits and reputational capital.

already in 1999, eric raymond identified “open development,” a term used by linus torvalds to describe what would become known as open source, and “decentralized peer review” as ways to “lower costs and improve software quality.”53 “open innovation,” as it became known, is a business model designed to profit from open-source technology.54 with the ascendancy of open innovation, the dominant justification was no longer civic but rather industrial (efficiency, quality, scale) and market (competition, profit), and there are many examples.

in recent years, there has been much discussion of microsoft shipping a linux kernel inside of windows, something that would have been unimaginable twenty years ago, when steve ballmer declared that “linux is a cancer that attaches itself in an intellectual property sense to everything it touches”—presumably a reference to the gnu gpl’s requirement that derivative works carry the same open license.55 as more permissive licenses were introduced, microsoft made more and more moves towards interoperability between its own systems and open source. setting aside the 2014 statement from its then ceo that “microsoft ♥ linux,” microsoft’s approach to open source has been largely calculated and pragmatic, a strategy to ensure that its azure cloud computing service can host the systems that the vast majority of the web runs on.56 still, its 2018 purchase of the code-sharing platform github for $7.5 billion was a testament to the fact that microsoft saw value in open innovation and open source.57

the same could be said of google. when suddenly confronted with a major competitor potentially cornering the market for mobile operating systems (the 2007 release of apple’s ios), google decided to put its energies into supporting the development of the android open source project (aosp) and building proprietary components on top of it. aosp is, as the name suggests, the open-source base underpinning android, whereas android itself includes many critical proprietary components. this is made possible because aosp is licensed with a permissive open-source license (apache 2) that does not require derivative works to have a similarly open license. as time passed, google introduced more and more closed-source components that mirrored essential aosp functionality, at which point in many cases development on the aosp counterpart ceased, at least on google’s part. this has left the original aosp project largely unusable without additional (now) proprietary components.
the most instructive example for our discussion, however, is ibm. ibm became the first major firm to pivot to supporting open source when, in 2001, it announced that it would invest $1 billion in open-source development. ibm’s then ceo lou gerstner explained the company’s shift to investing in open source, and the proprietary software that it planned to develop on top of it, when he earnestly commented that “giving one away helps increase sales of the other.”58 pamela samuelson went further: “there are at least three stories one can tell about this shift. ibm’s adoption of open source can be viewed: as an anti-microsoft strategy; as a consequence of changed business models in the software industry; and as a manifestation of an open innovation strategy for promoting faster and more robust technical advances.”59 if we take this quote and replace ibm with ebsco and microsoft with proquest, we may have a ready-made explanation of folio as well.

around the same time that its competitor proquest purchased library system developer ex libris in 2015, ebsco announced the launch of a competing open-source platform, folio. after initial discussions were carried out in the first half of 2015, formal approval arrived in the autumn of the same year, and development began in earnest soon after.60 irrespective of motivations, the decision leveraged the predictable community enthusiasm for open source while reaping the benefits of that community’s efforts to develop the platform. according to ebsco executive vice president sam brooks, “ebsco will contribute more than any previous library vendor has to an open source project, comparable or greater than what other organizations have invested in creating proprietary lsps.”61 ettlinger notes that “through a series of calculated tactics, firms can appear to be altruistically contributing technologies to the public domain, while indirectly promoting demand for their products.”62 beyond the direct profits earned as a folio service and hosting provider, the benefits for ebsco—from gaining foundational access to a library system platform that has been built to its own specifications to acquiring reputational capital, capital that, among some in the lis community, frames the firm as a benevolent and selfless patron of libraries—are clear. librarians must evaluate whether this is the best model for libraries and their patrons. the potential benefits of a robust, versatile, and scalable open-source library system for the lis community are great, but librarians must ensure that the core values that shape the profession are not compromised during its development.

alternative models

the communities that have emerged around projects such as koha and evergreen are sizable and have resulted in robust systems. other examples, such as kuali ole, were less successful. it is beyond the scope of this paper to examine the specific reasons for the relative successes of some projects compared to others, but it may be valuable to briefly explore some alternative models to private enterprise-led open-source development, since, as seen above, those models may not represent the best interests of libraries or the public in terms of core values. with open source, the community around a project is key to its success, but funding and leadership are also essential.
first, funding to develop open-source library systems can come from anyone who is interested in the project, but with funding comes the ability to directly or indirectly steer the project. therefore, there is a strong argument for such projects to be largely publicly funded. making libraries better and more accessible is in the public interest. libraries are a legitimate recipient of public funding, and that extends to the software that makes possible many of the services that users have come to expect. to look briefly at europe and the united states, there are several potential partners. in europe, the european union and its member states have, in recent years, committed in various ways to promoting and using open source.63 the eu’s stated motivations, or operational principles as first laid out in the 2018 european commission digital strategy, are digital by default, security and privacy, openness and transparency, interoperability and cross-border, and user-centric/data-driven/agile.64 there is obvious overlap here with the identified values of librarianship, and the eu has already shown itself to be a valuable partner to libraries through such efforts as the europeana project.65 at the national level there are many prospective supporters, including the german research foundation (deutsche forschungsgemeinschaft), with an annual budget of €3.6 billion in 2021,66 the belgian science policy office (belspo), the dutch research council (nederlandse organisatie voor wetenschappelijk onderzoek), the french national research agency (agence nationale de la recherche), and the italian national research agency (agenzia nazionale per la ricerca), among others. in the us, the institute of museum and library services, established in 1996, is a logical source of funding, as its mission is “to advance, support, and empower america’s museums, libraries, and related organizations through grantmaking, research, and policy development.”67

as for leadership, again there is a strong argument to be made for stakeholders, in this case libraries themselves, to govern and steer open-source lis projects. this requires open and transparent governance that again reflects the values of the profession, e.g., equity of access. there is a long history of national libraries leading publicly funded projects, from the library of congress developing any number of technologies, including marc records, to the koninklijke bibliotheek providing administrative support to europeana. there is also room for library consortia or associations to lead these efforts. in germany, for example, regional library consortia have been developing and sharing library-related technology for years, including widely used solutions such as dbis (datenbank-infosystem), the ezb (elektronische zeitschriftenbibliothek), and opus 4. indeed, the participation of several german library consortia (among many other international library partners) in the folio project suggests that it will not likely become locked to any one private-sector actor, though, given the foundational support provided by some, ebsco and index data in particular, it may be difficult to imagine the project continuing if that support were to suddenly vanish. as profits dictate corporate acquisitions and acquisitions dictate priorities, librarianship is often placed at a disadvantage.
librarians and libraries must evaluate whether a more sustainable solution may be found in a model that is publicly funded and led by libraries.

conclusion

open-source technology presents a valuable opportunity for libraries and librarians to better serve their users by supporting the core values of the profession. supporting these core values is both pragmatic (aligned with the core value of service) and moral-idealistic (aligned with the core values of privacy, equity of access, stewardship, and intellectual freedom). at the same time, it is important for librarians to critically evaluate and challenge cultural assumptions around the current state of open source, its inherent power dynamics, and information as a commodity. awareness and use of open source continue to increase among libraries of all sizes, but research suggests disparities between different types and sizes of libraries. moreover, the nuances regarding open-source technology are rarely addressed in the literature. in order to further promote its shared values and enrich the profession, librarianship as a whole should formally address and support open source through further codification, institutionalization, and investigation. this could be done by including open source in the accreditation requirements for lis degree programs, for instance by including it in the technology section of the ala’s core competences of librarianship.68 individual librarians are encouraged to explore resources like awesome self-hosted (https://selfhosted.libhunt.com/) and to continue to develop and promote open source in their libraries. turning to communities such as code{4}lib (https://code4lib.org/) and the eu’s open source observatory (https://joinup.ec.europa.eu/collection/open-source-observatory-osor/) for questions or to share experiences is also valuable. once awareness of open source and its nuances is more widespread within the profession, we may start to have more critical conversations about the most beneficial ways of using the technology to better serve our users.

endnotes

1 namjoo choi and joseph a. pruett, “the context and state of open source software adoption in us academic libraries,” library hi tech 37, no. 4 (november 18, 2019): 648, https://doi.org/10.1108/lht-02-2019-0042.

2 stuart lawson, kevin sanders, and lauren smith, “commodification of the information profession: a critique of higher education under neoliberalism,” journal of librarianship and scholarly communication 3, no. 1 (march 10, 2015), https://doi.org/10.7710/2162-3309.1182.

3 lawson, sanders, and smith, “commodification,” 17.

4 john m. budd, “the library, praxis, and symbolic power,” the library quarterly 73, no. 1 (january 2003): 20, https://doi.org/10.1086/603373.

5 magnus bergquist, jan ljungberg, and bertil rolandsson, “a historical account of the value of free and open source software: from software commune to commercial commons,” in ifip international conference on open source systems (springer, 2011), 196–207.

6 bergquist, ljungberg, and rolandsson, “a historical account,” 197.

7 nancy ettlinger, “the openness paradigm,” new left review, no. 89 (october 2014): 97.

8 ettlinger, “the openness paradigm,” 94.

9 luc boltanski and laurent thévenot, on justification: economies of worth (princeton university press, 2006).
see also the discussion of boltanski and thévenot’s justification theory applied to the development of the open-source software movement in bergquist, ljungberg, and rolandsson, “a historical account.”

10 bergquist, ljungberg, and rolandsson, “a historical account,” 199, 201.

11 jason puckett, “open source software and librarian values,” georgia library quarterly 49 (2012): 4.

12 “about – official website of koha library software,” accessed december 7, 2023, https://koha-community.org/about/.

13 marshall breeding, “library systems report 2016,” american libraries magazine, may 2, 2016, https://americanlibrariesmagazine.org/2016/05/02/library-systems-report-2016/.

14 marshall breeding, “major open source ils products,” library technology reports 44, no. 8 (february 26, 2009): 16–31.

15 leah knobel, “library of congress launches effort to transform collections management and access,” library of congress newsroom, september 21, 2022, https://newsroom.loc.gov/news/library-of-congress-launches-effort-to-transform-collections-management-and-access/s/c432d3c2-780b-4bfe-9123-bbb6c25631bc.

16 liu tiewei, “how is folio different from its predecessors?,” international journal of librarianship 6, no. 2 (december 22, 2021): 41; marshall breeding, “ebsco supports new open source project,” american libraries magazine (april 22, 2016), https://americanlibrariesmagazine.org/2016/04/22/ebsco-kuali-open-source-project/.

17 “members,” folio, accessed october 24, 2023, https://folio.org/community/members/; breeding, “ebsco supports new open source project.”

18 marshall breeding, “folio: a new open source initiative,” in “open source library systems: the current state of the art,” library technology reports 53, no. 6 (august/september 2017): 27.

19 marshall breeding, “perceptions 2012: an international survey of library automation,” january 21, 2013, https://librarytechnology.org/perceptions/2012/; marshall breeding, “library perceptions 2022: results of the 15th international survey of library automation,” april 17, 2022, https://librarytechnology.org/perceptions/2021/.

20 choi and pruett, “the context and state of open source software adoption,” 653.

21 choi and pruett, “the context and state of open source software adoption,” 641.

22 namjoo choi, “an empirical examination of open source software adoption in us public libraries,” the electronic library 35, no. 5 (2021): 695.

23 choi and pruett, “the context and state of open source software adoption”; choi, “an empirical examination.”

24 choi and pruett, “the context and state of open source software adoption,” 646.

25 s. r. ranganathan, the five laws of library science (london: edward goldston, ltd, 1931), https://hdl.handle.net/2027/uc1.$b99721.

26 michael gorman, our enduring values revisited: librarianship in an ever-changing world (chicago: ala editions, 2015).

27 catherine foster and david mcmenemy, “do librarians have a shared set of values?
28 foster and mcmenemy, “do librarians have a shared set of values?,” 253.
29 daniel chudnov, “open source software: the future of library systems?,” library journal 124, no. 13 (august 1, 1999): 40; micah altman, “open source software for libraries: from greenstone to the virtual data center and beyond,” iassist quarterly 25, no. 4 (december 4, 2002): 5, https://doi.org/10.29173/iq856; puckett, “open source software and librarian values”; choi and pruett, “the context and state of open source software adoption.”
30 gorman, our enduring values revisited, 92.
31 simon barron and andrew preater, “critical systems librarianship,” in the politics of theory and the practice of critical librarianship, ed. karen p. nicholson and maura seale (sacramento, california: library juice press, 2017).
32 edward m. corrado, “the importance of open access, open source, and open standards for libraries,” issues in science and technology librarianship, no. 42 (spring 2005): 3, https://doi.org/10.5062/f42f7kd8.
33 choi and pruett, “the context and state of open source software adoption,” 648.
34 breeding, “folio: a new open source initiative,” 7.
35 choi and pruett, “the context and state of open source software adoption,” 646.
36 choi and pruett, “the context and state of open source software adoption,” 648.
37 corrado, “the importance of open access,” 3.
38 robert wilson and james mitchell, open source library systems: a guide, lita guides (lanham: rowman & littlefield, 2021), 23.
39 marshall breeding, “adoption patterns of proprietary and open source ils in u.s. libraries,” computers in libraries 35, no. 8 (january 2015): 18.
40 barron and preater, “critical systems librarianship,” 95.
41 wilson and mitchell, open source library systems, 21; choi and pruett, “the context and state of open source software adoption,” 248.
42 marshall breeding, “libraries.org,” accessed august 9, 2022, https://librarytechnology.org/libraries/.
43 edward m. corrado, “revisiting the importance of open access, open source, and open standards for libraries,” technical services quarterly 38, no. 3 (july 3, 2021): 287, https://doi.org/10.1080/07317131.2021.1934312.
44 lauren smith and michael hanson, “communities of praxis: transforming access to information for equity,” the serials librarian 76, no. 1–4 (june 14, 2019): 43, https://doi.org/10.1080/0361526x.2019.1593015.
45 barron and preater, “critical systems librarianship,” 95.
46 gorman, our enduring values revisited, 159.
47 gorman, our enduring values revisited, 76.
48 carol c. henderson, “why librarians care about intellectual property law and policy,” american library association, march 10, 2019, https://www.ala.org/advocacy/copyright/copyrightarticle/librariescreatures.
49 wilson and mitchell, open source library systems, 20.
50 gorman, our enduring values revisited, 110.
51 barron and preater, “critical systems librarianship,” 98.
52 cherie l. givens, information privacy fundamentals for librarians and information professionals (blue ridge summit: rowman & littlefield publishers, 2014), 27.
53 e. raymond and b. young, the cathedral & the bazaar: musings on linux and open source by an accidental revolutionary, rev. ed. (o’reilly, 2001), xi.
54 henry chesbrough, open business models: how to thrive in the new innovation landscape (harvard business press, 2006), 43.
55 dave newbart, “microsoft ceo takes launch break with the sun-times,” chicago sun-times, 2001, https://web.archive.org/web/20011211130654/http://www.suntimes.com/output/tech/cst-fin-micro01.html.
56 steven vaughan-nichols, “why microsoft loves linux,” zdnet, october 29, 2014, https://www.zdnet.com/article/why-microsoft-loves-linux/; cade metz, “why microsoft ceo satya nadella loves what steve ballmer once despised,” wired, october 21, 2014, https://www.wired.com/2014/10/microsoft-ceo-satya-nadella-loves-steve-ballmer-despised/.
57 alex hern, “microsoft is buying code-sharing site github for $7.5bn,” technology, the guardian, june 4, 2018, https://www.theguardian.com/technology/2018/jun/04/microsoft-is-buying-code-sharing-site-github-say-reports.
58 chesbrough, open business models, 240.
59 pamela samuelson, “ibm’s pragmatic embrace of open source,” communications of the acm 49, no. 10 (october 2006): 22, https://doi.org/10.1145/1164394.1164412.
60 breeding, “ebsco supports new open source project.”
61 breeding, “ebsco supports new open source project.”
62 ettlinger, “the openness paradigm,” 98.
63 see for example “open source software strategy 2020–2023: think open,” communication to the commission (brussels: european commission, october 21, 2020), https://ec.europa.eu/info/sites/default/files/en_ec_open_source_strategy_2020-2023.pdf; “berlin declaration on digital society and value-based digital government” (ministerial meeting during the german presidency of the council of the european union, december 8, 2020), https://ec.europa.eu/isa2/sites/isa/files/cdr_20201207_eu2020_berlin_declaration_on_digital_society_and_value-based_digital_government_.pdf; “european commission digital strategy: next generation digital commission” (european commission, june 30, 2022), https://ec.europa.eu/info/sites/default/files/strategy/decision-making_process/documents/c_2022_4388_1_en_act.pdf.
64 “european commission digital strategy: a digitally transformed, user-focused and data-driven commission” (european commission, november 21, 2018), 4–6.
65 j. robertson mcilwain, “the eu and library science: fostering legitimacy through partnership” (presentation, 27th international conference of europeanists, june 2021), https://doi.org/10.6084/m9.figshare.20347917.v2.
66 “jahresbericht 2021: aufgaben und ergebnisse” (bonn: deutsche forschungsgemeinschaft), accessed august 24, 2022, https://www.dfg.de/download/pdf/dfg_im_profil/geschaeftsstelle/publikationen/dfg_jb2021.pdf.
67 institute of museum and library services, fy 2022–2026 strategic plan, march 2022, 3, https://www.imls.gov/sites/default/files/2022-02/imls-strategic-plan-2022-2026.pdf.
68 american library association, ala’s core competences of librarianship, january 27, 2009, https://www.ala.org/educationcareers/files/careers/corecomp/corecompetences/finalcorecompstat09.pdf.
information discovery insights gained from multipac, a prototype library discovery system
alex a. dolski
information technology and libraries | december 2009

at the university of nevada las vegas libraries, as in most libraries, resources are dispersed into a number of closed “silos” with an organization-centric, rather than patron-centric, layout. patrons frequently have trouble navigating and discovering the dozens of disparate interfaces, and any attempt at a global overview of our information offerings is at once incomplete and highly complex. while consolidation of interfaces is widely considered desirable, certain challenges have made it elusive in practice.

multipac is an experimental “discovery,” or metasearch, system developed to explore issues surrounding heterogeneous physical and networked resource access in an academic library environment. this article discusses some of the reasons for, and outcomes of, its development at the university of nevada las vegas (unlv).

the case for multipac

fragmentation of library resources and their interfaces is a growing problem in libraries, and unlv libraries is no exception. electronic information here is scattered across our innovative webpac; our main website; our three branch library websites; remote article databases; local custom databases; local digital collections; special collections; and other remotely hosted resources (such as libguides). the number of these resources, as well as the total volume of content offered by the libraries, has grown over time (figure 1), while access provisions have not kept pace in terms of usability. in light of this dilemma, the libraries and various units within have deployed finding and search tools that provide browsing and searching access to certain subsets of these resources, depending on criteria such as

■ the type of resource;
■ its place within the libraries’ organizational structure;
■ its place within some arbitrarily defined topical categorization of library resources;
■ the perceived quality of its content; and
■ its uniqueness relative to other resources.
these tools tend to be organization-centric rather than patron-centric, as they are generally provisioned in relative isolation from each other, without as much emphasis on the big picture (figure 2). the result is, from the patron’s perspective, a disaggregated mass of information and scattered finding tools that, to varying degrees, each accomplishes its own specific goals at the expense of macro-level findability. currently, a comprehensive search for a given subject across as many library resources as possible might involve visiting a half-dozen interfaces or more—each one predicated upon awareness of each individual interface, its relation to the others, and the characteristics of its specific coverage of the corpus of library content.

figure 1. “silos” in the library
figure 2. organization-centric resource provisioning

alex a. dolski (alex.dolski@unlv.edu) is web & digitization application developer at the university of nevada las vegas libraries.

our library website serves as the de facto gateway to our electronic, networked content offerings. yet usability studies have shown that findability, when given our website as a starting point, is poor. undoubtedly this is due, at least in part, to interface fragmentation. test subjects, when given a task to find something and asked to use the library website as a starting point, fail outright in a clear majority of cases.1

multipac is a technical prototype that serves as an exploration of these issues. while the system itself breaks no new technical ground, it brings to the forefront critical issues of metadata quality, organizational structure, and long-term planning that can inform future actions regarding strategy and implementation of potential solutions at unlv and elsewhere. yet it is only one of numerous ways that these issues could be addressed.2 in an abstract sense, multipac is biased toward principles of simplification, consolidation, and unification. in theory, usability can be improved by eliminating redundant interfaces, consolidating search tools, and bringing together resource-specific features (e.g., opac holdings status) in one interface to the maximum extent possible (figure 3). taken to an extreme, this means being able to support searching all of our resources, regardless of type or location, from a single interface; abstracting each resource from whatever native or built-in user interface it might offer; and relying instead on its data interface for querying and result-set gathering. thus multipac is as much a proof of concept as it is a concrete implementation.

background: how multipac became what it is

multipac came about from a unique set of circumstances. from the beginning, it was intended as an exploratory project, with no serious expectation of it ever being deployed. our desire to have a working prototype ready for our discovery mini-conference meant that we had just six weeks of development time, which was hardly sufficient for anything more than the most agile of development models.
the resulting design, while foundationally solid, was limited in scope and depth because of time constraints. another option, instead of developing multipac, would have been to demonstrate an existing open-source discovery system. the advantage of this approach is that the final product would have been considerably more advanced than anything we could have developed ourselves in six weeks. on the other hand, it might not have provided a comparable learning opportunity.

survey of similar systems

were its development to continue, multipac would find itself among an increasingly crowded field of competitors (table 1). a number of library discovery systems already exist, most backed by open-source or commercially available back-end search engines (table 2), which handle the nitty-gritty, low-level ingestion, indexing, and retrieval. these lists of systems are by no means comprehensive and do not include notable experimental or research systems, which would make them much longer.

table 1. some popular existing library discovery systems
name | company/institution | commercial status
aquabrowser | serials solutions | commercial
blacklight | university of virginia | open-source (apache)
encore | innovative interfaces | commercial
extensible catalog | university of rochester | open-source (mit/gpl)
libraryfind | oregon state university | open-source (gpl)
metalib | ex libris | commercial
primo | ex libris | commercial
summon | serials solutions | commercial
vufind | villanova university | open-source (gpl)
worldcat local | oclc | commercial

table 2. some existing back-end search servers
name | company/institution | commercial status
endeca | endeca technologies | commercial
idol | autonomy | commercial
lucene | apache foundation | open-source (apache)
search server | microsoft | commercial
search server express | microsoft | free
solr (superset of lucene) | apache foundation | open-source (apache)
sphinx | sphinx technologies | open-source (gpl)
xapian | community | open-source (gpl)
zebra | index data | open-source (gpl)

architecture

in terms of how they carry out a search, metasearch applications can be divided into two main groups: distributed (or federated) search, in which searches are “broadcast” to individual resources that return results in real time (figure 4); and harvested search, in which searches are carried out against a local index of resource contents (figure 5).3 both have advantages and disadvantages beyond the scope of this article. multipac takes the latter approach. it consists of three primary components: the search server, the user interface, and the metadata harvesting system (figure 6).

figure 3. patron-centric resource provisioning
figure 4. the federated search process
figure 5. the harvested search process
figure 6. the three main components of multipac

search server

after some research, solr was chosen as the search server because of its ease of use, proven library track record, and http-based representational state transfer (rest) application programming interface (api), which improves network-topological flexibility by allowing it to be deployed on a different server than the front-end web application—an important consideration in our server environment.4 jetty—a java web application server bundled with solr—proved adequate and convenient for our needs.
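to make this rest interface concrete, here is a minimal sketch in php (the language the multipac front end is written in) of the kind of query a front end might send to solr over http. the host, port, and field names are illustrative assumptions, not multipac’s actual configuration.

<?php
// minimal sketch: send a query to a solr core over its http/rest api and
// decode the json response. host, port, and field names are illustrative
// assumptions, not multipac's actual configuration.
$base   = 'http://solr.example.edu:8983/solr/select';
$params = array(
    'q'    => 'title:"las vegas"',
    'wt'   => 'json',   // ask solr to serialize the result set as json
    'rows' => 10,
);
$json = file_get_contents($base . '?' . http_build_query($params));
$data = json_decode($json, true);

foreach ($data['response']['docs'] as $doc) {
    echo $doc['title'] . "\n";
}

because the interface is plain http, the same query could come from the php front end, a harvested-data audit script, or a command line, which is exactly the decoupling the article describes.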
the metadata schema used by solr can be customized. we derived ours from the unqualified dublin core metadata element set (dcmes),5 with a few fields removed and some fields added, such as “library” and “department,” as well as fields that support various multipac features, such as thumbnail images and primary record urls. dcmes was chosen for its combination of generality, simplicity, and familiarity. in practice, the solr schema is for finding purposes only, so whether it uses a standard schema is of little importance.

user interface

the front-end multipac system is written in php 5.2 in a model-view-controller design based on classical object design principles. to support modularity, new resources can be added as classes that implement a resource-class interface (a sketch of such an interface appears below). the multipac html user interface is composed of five views: search, browse, results, item, and list, which exist to accommodate the finding process illustrated in figure 7. each view uses a custom html template that can be easily styled by nonprogrammer web designers. (needless to say, judging by figures 8–12, they haven’t been.) most dynamic code is encapsulated within dedicated “helper” methods in an attempt to decouple the templates from the rest of the system. output formats, like resources, are modular and decoupled from the core of the system. the html user interface is one of several interfaces available to the multipac system; others include xml and json, which effectively add web services support to all encompassed resources—a feature missing from many of the resources’ own built-in interfaces.6

search view

search view (figure 8) is the simplest view, serving as the “front page.” it currently includes little more than a brief introduction and a search field. the search field is not complicated; it is, in fact, possible to include search forms on any webpage and scope them to any subset of resources on the basis of facet queries. for example, a search form could be scoped to las vegas–related resources in special collections, which would satisfy the demand of some library departments for custom search engines tailored to their resources without contributing to the “interface fragmentation” effect discussed in the introduction. (this would require a higher level of metadata quality than we currently have, which will be discussed in depth later.) because search forms can be added to any page, this view is not essential to the multipac system. to improve simplification, it could easily be removed and replaced with, for example, a search form on the library homepage.

browse view

browse view (figure 9) is an alternative to search view, intended for situations in which the user lacks a “concrete target” (figure 7). as should be evident by its appearance, this is the least-developed view, simply displaying facet terms in an html unordered list. notice the facet terms in the format field: this is malprocessed, marc-encoded information resulting from a quick-and-dirty extensible stylesheet language (xsl) transformation from marcxml to solr xml.

figure 7. the information-finding process supported by multipac
figure 8. the multipac search view page
figure 9. the multipac browse view page
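the facet-based scoping described under search view reduces, in solr terms, to filter queries. the sketch below is a hypothetical example, not multipac code; the field names and values are assumptions, and the url is assembled by hand because solr expects the fq parameter repeated once per filter.

<?php
// sketch: scope a search to one slice of the index with solr filter
// queries (fq), the mechanism behind scoped search forms. field names
// and values are hypothetical.
$base = 'http://solr.example.edu:8983/solr/select';
$url  = $base
      . '?q=' . urlencode('neon signs')
      . '&wt=json'
      // solr expects fq repeated once per filter, so append each by hand
      . '&fq=' . urlencode('department:"special collections"')
      . '&fq=' . urlencode('coverage:"las vegas (nev.)"')
      . '&facet=true&facet.field=format';

$data = json_decode(file_get_contents($url), true);
echo $data['response']['numFound'] . " items in scope\n";

// solr returns facet values and counts interleaved in one flat array
$terms = $data['facet_counts']['facet_fields']['format'];
for ($i = 0; $i < count($terms); $i += 2) {
    echo $terms[$i] . ': ' . $terms[$i + 1] . "\n";
}

the same response also carries the facet counts that a browse view can render as its term list, so one query can feed several views.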
results view

the results page (figure 10) is composed of three columns:

1. the left column displays a facet list—a feature generally found to be highly useful for results-gathering purposes.7 the data in the list is generated by solr and transformed to an html unordered list using php. the facets are configurable; fields can be made “facetable” in the solr schema configuration file.

2. the center column displays results for the current search query that have been provided by solr. thumbnails are shown for resources that have them; generic icons are provided for those that do not. currently, the results list displays the item title and description fields. some items have very rich descriptions; others have minimal descriptions or none at all. this happens to be one of several significant metadata quality issues that will be discussed later.

3. the right column displays results from nonindexed resources, including any that it would not be feasible to index locally, such as google, our article databases, and so on. multipac displays these resources as collapsed panes that expand when their titles are clicked and initiate an ajax request for the current search query. in a situation in which there might be twenty or more “panes” to load, performance would obviously suffer greatly if each one had to be queried every time the results page loaded. the on-demand loading process greatly speeds up the page load time.

currently, the right column includes only a handful of resource panes—as many as could be developed in six weeks alongside the rest of the prototype. it is anticipated that further development would entail the addition of any number of panes—perhaps several dozen. the ease of developing a resource pane can vary greatly depending on the resource. for developer-friendly resources that offer a useful javascript object notation (json) api, it can take less than half an hour. for article databases, which vendors generally take great pains to “lock down,” the task can entail a two-day marathon involving trial-and-error http-request-token authentication and screen-scraping of complex invalid html. in some cases, vendor license agreements may prohibit this kind of use altogether. there is little we can do about this; clearly, one of multipac’s severest limitations is its lack of adeptness at searching these types of “closed” remote resources.

item view

item view (figure 11) provides greater detail about an individual item, including a display of more metadata fields, an image, and a link to the item in its primary context, if available. it is expected that this view would also include holdings status information for opac resources, although this has not been implemented yet. the availability of various page features depends on values encoded in the item’s solr metadata record. for example, if an image url is available, it will be displayed; if not, it won’t. an effort was made to keep the view logic separate from the underlying resource to improve code and resource maintainability. the page template itself does not contain any resource-dependent conditionals.

list view

list view (figure 12), essentially a “favorites” or “cart” view, is so named because it is intended to duplicate the list feature of unlv libraries’ innovative millennium opac. the user can click a button in either results view or item view to add items to the list, which is stored in a cookie. although currently not feature-rich, it would be reasonable to expect the ability to send the list as an e-mail or text message, among other features.

figure 10. the multipac results view page
figure 11. the multipac item view page
figure 12. the multipac list view page
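to make the modularity point concrete, here is a hypothetical sketch of a resource-class interface of the kind described under user interface; it is not multipac’s actual class design. open library’s public json search api stands in for a developer-friendly remote resource of the sort that can be wrapped in under half an hour.

<?php
// hypothetical sketch of a resource-class interface: each remote resource
// pane implements one contract so the results view can treat all panes
// interchangeably. this is not multipac's actual class design.
interface SearchableResource
{
    /** human-readable pane title */
    public function getLabel();

    /** run $query against the remote resource; return at most $limit hits */
    public function search($query, $limit);
}

class OpenLibraryResource implements SearchableResource
{
    public function getLabel()
    {
        return 'open library';
    }

    public function search($query, $limit)
    {
        // open library offers a public json search api; a locked-down
        // vendor database might need token authentication or scraping here
        $url  = 'https://openlibrary.org/search.json?' .
                http_build_query(array('q' => $query, 'limit' => $limit));
        $data = json_decode(file_get_contents($url), true);

        $hits = array();
        foreach ($data['docs'] as $doc) {
            $hits[] = array('title' => $doc['title']);
        }
        return $hits;
    }
}

because every pane honors the same contract, the ajax endpoint that fills the right column only needs to know which class to instantiate for a given pane name.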
metadata harvesting system

for metadata to be imported into solr, it must first be harvested. in the harvesting process, a custom script checks source data and compares it with local data. it downloads new records, updates stale records, and deletes missing records. not all resources support the ability to easily check for changed records, meaning that the full record set must be downloaded and converted during every harvest. in most cases, this is not a problem; most of our resources (the library catalog excluded) can be fully dumped in a matter of a few seconds each. in a production environment, the harvest scripts would be run automatically every day or so.

in practice, every resource is different, necessitating a different harvest script. the open archives initiative protocol for metadata harvesting (oai-pmh) is the protocol that first jumps to mind as ideal for metadata harvesting, but most of our resources do not support it. ideally, we would modify as many of them as possible to be oai-compliant, but that would still leave many that are out of our hands. either way, a substantial number of custom harvest scripts would still be required.

for demonstration purposes, the multipac prototype was seeded with sample data from a handful of diverse resources:

1. a set of 16,000 marc records from our library catalog, which we converted to marcxml and then to solr xml using xsl transformations
2. our locally built las vegas architects and buildings database, a mysql database containing more than 10,000 rows across 27 tables, which we queried and dumped into xml using a php script
3. our locally built special collections database, a smaller mysql database, which we dealt with the same way
4. our contentdm digital collections, which we downloaded via oai-pmh and transformed using another custom xsl stylesheet

there are typically a variety of conversion options for each resource. because of time constraints, we simply chose what we expected would be the quickest route for each and did not pay much attention to the quality of the conversion.
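for the resources that do speak oai-pmh, an incremental harvest script of the kind described above can be quite small. the sketch below assumes a hypothetical endpoint and omits resumption-token paging.

<?php
// sketch: incremental oai-pmh harvest. fetch records changed since the
// last run, note deletions, and stage the rest for indexing. the endpoint
// is hypothetical, and resumption-token paging is omitted for brevity.
$url = 'http://digital.example.edu/oai?' . http_build_query(array(
    'verb'           => 'ListRecords',
    'metadataPrefix' => 'oai_dc',
    'from'           => '2009-06-01',   // date of the last successful harvest
));
$xml = simplexml_load_file($url);

foreach ($xml->ListRecords->record as $record) {
    $id = (string) $record->header->identifier;

    // oai-pmh marks removed records with status="deleted" on the header
    if ((string) $record->header['status'] === 'deleted') {
        echo "purge from index: $id\n";
        continue;
    }

    // unqualified dublin core lives under the oai_dc and dc namespaces
    $dc = $record->metadata
                 ->children('http://www.openarchives.org/OAI/2.0/oai_dc/')
                 ->dc
                 ->children('http://purl.org/dc/elements/1.1/');
    echo "add or update: $id - " . (string) $dc->title . "\n";
}

the from parameter is what makes the harvest incremental; a resource that cannot answer such date-scoped requests forces the full-dump-every-time approach described above.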
how multipac answers unlv libraries’ discovery questions

multipac has essentially proven its capability of solving interface multiplication and fragmentation issues. by adding a layer of abstraction between resource and patron, it enables us to reference abstract resources instead of their specific implementations—for example, “the library catalog” instead of “the innopac catalog.” this creates flexibility gains with regard to resource provision and deployment. this kind of “pervasive decoupling” can carry with it a number of advantages. first, it can allow us to provide custom-developed services that vendors cannot or do not offer. second, it can prevent service interruptions caused by maintenance, upgrades, or replacement of individual back-end resources. third, by making us less dependent on specific implementations of vendor products—in other words, reducing vendor “lock-in”—it can potentially give us leverage in vendor contract negotiations.

because of the breadth of information we offer from our website gateway, we as a library are particularly sensitive about the continued availability of access to our resources at stable urls. when resources are not persistent, patrons and staff need to be retrained, expectations need to be adjusted, and hyperlinks—scattered all over the place—need to be updated. by decoupling abstract resources from their implementations, multipac becomes, in effect, its own persistent uri system, unifying many library resources under one stable uri schema. in conjunction with a url rewriting system on the web server, a resource-based uri schema (figure 13) would be both powerful and desirable.8

figure 13. example of an implementation-based vs. resource-based uri
implementation-based: http://www.library.unlv.edu/arch/archdb2/index.php/projects/view/1509
resource-based (hypothetical): http://www.library.unlv.edu/item/483742
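one plausible way to realize such a resource-based uri schema is a web-server rewrite rule feeding a small front controller that resolves the stable item id against a registry and redirects to the item’s current home. the table and column names below are hypothetical, not part of multipac.

<?php
// sketch: front controller behind a rewrite rule such as
//   RewriteRule ^item/(\d+)$ /index.php?id=$1
// stable urls like /item/483742 keep working even when the underlying
// silo is replaced. table and column names are hypothetical.
$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;

$pdo  = new PDO('mysql:host=localhost;dbname=registry', 'user', 'secret');
$stmt = $pdo->prepare('SELECT target_url FROM items WHERE item_id = ?');
$stmt->execute(array($id));
$target = $stmt->fetchColumn();

if ($target === false) {
    header('HTTP/1.1 404 Not Found');
    exit('no such item');
}
// permanent identity, movable location: send the patron to wherever the
// item currently lives in its native system
header('Location: ' . $target);

the design choice is the classic "cool uris don't change" one: the public identifier is permanent, and only the registry row changes when a back-end system is swapped out.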
lessons learned in the development of multipac

the lessons learned in the development of multipac fall into three main categories, listed here in order of importance.

metadata quality considerations

quality metadata—characterized by unified schemas; useful crosswalking; and consistent, thorough description—facilitates finding and gathering. in practice, a surrogate record is as important as the resource it describes. below a certain quality threshold, its accompanying resource may never be found, in which case it may as well not exist. surrogate record quality influences relevance ranking and can mean the difference between the most relevant result appearing on page 1 or page 50 (relevance, of course, being a somewhat disputed term). solr and similar systems will search all surrogates, including those that are of poor quality, but the resulting relevancy ranking will be that much less meaningful.

metadata quality can be evaluated on several levels, from extremely specific to extremely broad (figure 14). that which may appear to be adequate at one level may fail at a higher level. using this figure as an example, multipac requires strong adherence to level 5, whereas most of our metadata fails to reach level 4. a “level 4 failure” is illustrated in table 3, which compares sample metadata records from four different multipac resources. empty cells are not necessarily “bad”—not all metadata elements apply to all resources—but this type of inconsistency multiplies as the number of resources grows, which can have negative implications for retrieval.

figure 14. example scopes of metadata application and evaluation, from broad (top) to specific

table 3. comparing sample crosswalked metadata from four different unlv libraries resources
element | library catalog | digital collections | special collections database | las vegas architects & buildings database
title | goldfield: boom town of nevada | map of tonopah mining district, nye county, nevada | 0361 : mines and mining collection | flamingo hilton las vegas
creator | paher, stanley w. | booker & bradford | (empty) | (empty)
call number | f849.g6p34 | (empty) | (empty) | (empty)
contents | (empty) | (empty) | (item-level description of contents) | (empty)
format | (empty) | digital object | photo collection | database record
language | eng | eng | eng | (empty)
coverage | (empty) | tonopah mining district (nev.); ray mining district (nev.) | (empty) | (empty)
description | (omitted for brevity) | (empty) | (empty) | (empty)
publisher | nevada publications | university of nevada las vegas libraries | (empty) | unlv architecture studies library
subject | (lcsh omitted for brevity) | (lcsh omitted for brevity) | (empty) | (empty)
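the crosswalking that produces rows like those in table 3 amounts to a field-mapping pass per resource. a minimal sketch follows, with hypothetical source field names rather than the actual schema of the architects and buildings database.

<?php
// sketch: crosswalk a record from one silo's native fields into the
// shared, dcmes-derived solr schema. the source field names here are
// hypothetical, not the architects and buildings database's real schema.
function crosswalk_architects_record(array $row)
{
    return array(
        'id'         => 'archdb-' . $row['project_id'],
        'title'      => $row['building_name'],
        'creator'    => $row['architect'],   // may be empty: not every
                                             // element applies everywhere
        'format'     => 'database record',
        'library'    => 'architecture studies library',
        'department' => 'las vegas architects & buildings database',
    );
}

$solr_doc = crosswalk_architects_record(array(
    'project_id'    => 1509,
    'building_name' => 'flamingo hilton las vegas',
    'architect'     => '',
));

the mapping itself is trivial; the hard part, as the next section argues, is agreeing on what each target element means across every resource that feeds the index.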
sound metadata is essential for proper functioning of a unified discovery system, and descriptive uniformity must be ensured on multiple levels, from the element level to the institution level. technical facilitators of improved discoverability already exist; the responsibility falls on us to adapt to the demands of future discovery systems. the specific discovery tool itself is only a facilitator, the specific implementation of which is likely to change over time. what will not change are library-wide metadata quality issues that will serve any tool we happen to deploy. the multipac project brought to light important library-wide discoverability issues that may not have been as obvious before, exposing a number of limitations in our existing metadata as well as giving us a glimpse of what it might take to improve our metadata to accommodate a next-generation discovery system, in whatever form that might take. references 1. unlv libraries usability committee, internal library website usability testing, las vegas, 2008. 2. karen calhoun, “the changing nature of the catalog and its integration with other discovery tools.” report prepared for the library of congress, 2006. 3. xiaoming liu et al., “federated searching interface techniques for heterogeneous oai repositories,” journal of digital information 4, no. 2 (2002). 4. apache software foundation, apache solr, http://lucene .apache.org/solr/ (accessed june 11, 2009). 5. dublin core metadata initiative, “dublin core metadata element set, version 1.1,” jan. 14, 2008, http://dublincore.org/ documents/dces/ (accessed june 25, 2009). 6. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, jan. 20, 2006, http://orweblog.oclc .org/archives/000927.html (accessed july 15, 2009). 7. tod a. olson, “utility of a faceted catalog for scholarly research,” library hi tech 4, no. 25 (2007): 550–61. 8. tim berners-lee, “hypertext style: cool uris don’t change,” 1998, http://www.w3.org/provider/style/uri (accessed june 23, 2009). 9. bowen, jennifer, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology and libraries 2, no. 27 (june 2008): 6–19. 10. calhoun, “the changing nature of the catalog.” editorial | truitt 163 ■■ the space in between in my opinion, ital has an identity crisis. it seems to try in many ways to be scholarly like jasist, but lita simply isn’t as formal a group as asist. on the other end of the spectrum, code4lib is very dynamic, informal and community-driven. ital kind of flops around awkwardly in the space in between. —comment by a respondent to ital’s reader survey, december 2009 last december and january, you, the readers of information technology and libraries were invited to participate in a survey aimed at helping us to learn your likes and dislikes about ital, and where you’d like to see this journal go in terms of several important questions. the responses provide rich food for reflection about ital, its readers, what we do well and what we don’t, and our future directions. indeed, we’re still digesting and discussing them, nearly a year after the survey. i’d like to use some of my editorial space in this issue to introduce, provide an overview, and highlight a few of the most interesting results. i strongly encourage you to access the full survey results, which i’ve posted to our weblog italica (http:// ital-ica.blogspot.com/); i further invite you to post your own thoughts there about the survey results and their meaning. 
we ran the survey from mid-december to mid-january. a few responses trickled in as late as mid-february. the survey invitation was sent to the 2,614 lita personal members; nonmembers and ital subscribers (most of whom are institutions) were excluded. we ultimately received 320 responses—including two from individuals who confessed that they were not actually lita members—for a response rate of 12.24 percent. thus the findings reported below reflect the views of those who chose to respond to the survey. the response rate, while not optimal, is not far from the 15 percent that i understand lita usually expects for its surveys. as you may guess, not all respondents answered all questions, which accounts for some small discrepancies in the numbers reported.

who are we?

in analyzing the survey responses, one of the first things one notices is the range and diversity of ital’s reader base, and by extension, of lita’s membership. the largest groups of subscribers identify themselves either as traditional systems librarians (58, or 18.2 percent) or web services/development librarians (31, or 9.7 percent), with a further cohort of 7.2 percent (23) composed of those working with electronic resources or digital projects. but more than 20 percent (71) come from the ranks of library directors and associate directors. nearly 15 percent (47) identify their focus as being in the areas of reference, cataloguing, acquisitions, or collection development. see figure 1.

figure 1. professional position of lita members

the bottom line is that more than a third of our readers come from areas outside of library it. a couple of other demographic items:

■■ while nearly six in ten respondents (182, or 57.6 percent) work in academic libraries, that still leaves a sizable number (134, or 42.3 percent) who don’t. more than 14 percent (45) of the total 316 respondents come from the public library sector.

■■ nearly half (152, or 48.3 percent) of our readers indicated that they have been with lita for five years or fewer. note that this does not necessarily indicate the age or years of service of the respondents, but it’s probably a rough indicator. still, i confess that this was something of a surprise to me, as i expected larger numbers of long-time members. and how do the numbers shake out for us old geezers? the 6–10 and greater-than-15-years cohorts each composed about 20 percent of those responding; interestingly, only 11.4 percent (36) answered that they’d been lita members for between 11 and 15 years. assuming that these numbers are an accurate reflection of lita’s membership, i can’t help but wonder about the explanation for this anomaly. see figure 2.

figure 2. years of lita membership

how are we doing?

question 4 on the survey asked readers to respond to several statements:

“it is important to me that articles in ital are peer-reviewed.” more than 75 percent (241, or 77.2 percent) answered that they either “agreed” or “strongly agreed.”

“ital is timely.” more than seven in ten respondents (228, or 73.0 percent) either “agreed” or “strongly agreed” that ital is timely. only 27 (8.7 percent) disagreed. as a technology-focused journal, where time-to-publication is always a sensitive issue, i expected more dissatisfaction on this question (and no, that doesn’t mean that i don’t worry about the nine percent who believe we’re too slow out of the gate).
“i use information from ital in my work and/or i find it intellectually stimulating.” by a nearly identical margin to that regarding timeliness, ital readers (226, or 72.7 percent) either “agreed” or “strongly agreed” that they use ital in their work or find its contents stimulating.

“ital is an important benefit of lita membership.” an overwhelming majority (248, or 79.78 percent) of respondents either “agreed” or “strongly agreed” with this statement.1 this perception clearly emerges again in responses to the questions about whether readers would drop their lita membership if we produced an electronic-only or open-access ital (see below).

where should we be going?

several questions sought your input about different options for ital as we move forward. question 7, for example, asked you to rank how frequently you access ital content via several channels, the choices being “print copy received via membership,” “print copy received by your institution/library,” “electronic copy from the ital website,” or “electronic copy accessed via an aggregator service to which your institution/library subscribes (e.g., ebsco).” the channel most frequently used was the print copy received via membership, at 81.1 percent (228).

question 8 asked about your preferences regarding ital’s publication model. of the 307 responses, 60.6 percent (186) indicated a preference for continuing the present arrangement, whereby we publish paper and electronic versions simultaneously. four in ten respondents preferred that ital move to electronic-only publication.2 of those who favored continued availability of paper, the great majority (159, or 83.2 percent) indicated in question 9 that they simply prefer reading ital in paper. those who advocate moving to electronic-only do so for more mixed reasons (question 10), the most popular being cost-effectiveness, timeliness, and the environmental friendliness of electronic publication.

a final question in this section asked that you respond to the statement “if ital were to become an electronic-only publication i would continue as a dues-paying member of lita.” while a reassuring 89.8 percent (273) of you answered in the affirmative, 9.5 percent (29) indicated that you would likely quit lita, with narrative explanations that clearly underscore the belief that ital—especially a paper ital—is viewed by many as an important benefit of membership. the following comments are typical:

■■ “lita membership would carry no benefits for me.”
■■ “dues should decrease, though.” [from a respondent who indicated he or she would retain lita membership]
■■ “ital is the major benefit to me as we don’t have funds for me to attend lita meetings or training sessions.”
■■ “the paper journal is really the only membership benefit i use regularly.”
■■ “actually my answer is more, ‘i don’t know.’ i really question the value of my lita membership. ital is at least some tangible benefit i receive. quite honestly, i don’t know that there really are other benefits of lita membership.”

question 12 asked whether ital should continue with its current delayed open-access model (i.e., the latest two issues embargoed for non-lita members) or go completely open-access. by a three-to-two margin, readers favored moving to an open-access model for all issues. in the following question, which asked whether respondents would continue or terminate lita membership were ital to move to a completely open-access publication model, the results were remarkably similar to those for the question linking print availability to lita membership, with the narrative comments again suggesting much the same underlying reasoning.

in sum, the results suggest to me more satisfaction with ital than i might have anticipated; at the same time, i’ve only scratched the surface in my comments here. the narrative answers in particular—which i have touched on in only the most cursory fashion—have many things to say about ital’s “place,” suggestions for future articles, and a host of other worthy ideas. there is as well the whole area of crosstabbing: some of the questions, when analyzed with reference to the demographic answers at the beginning of the survey, may highlight entirely new aspects of the data. who, for instance, favors continuance of a paper ital, and who prefers electronic-only?

but to come back to that reader’s comment about ital and “the space in between” that i used to frame this discussion (indeed, this entire column): to me, the demographic responses—which clearly show ital has a substantial readership outside of library it—suggest that that “space in between” is precisely where ital should be. we may or may not occupy that space “awkwardly,” and there is always room for improvement, although i hope we do better than “flop around”! the results make clear that ital’s readers—who would be you!—encompass the spectrum from the tech-savvy early-career reader of code4lib journal (electronic-only, of course!) to the library administrator who satisfies her need for technology information by taking her paper copy of ital along when traveling. elsewhere on that continuum, there are reference librarians and catalogers wondering what’s new in library technology, and a traditional systems librarian pondering whether there is an open-source discovery solution out there that might breathe some new life into his lipstick-on-a-pig ils. somewhere else there’s a library blogger who fends off bouts of insomnia by reading “wonky” ital papers in the wee hours of the morning. and that ain’t the half of it, as they say. in short—in terms of readers, interests, and preferences—“the space in between” is a pretty big niche for ital to serve. we celebrate it. and we’ll keep trying our best to serve it well.

■■ departures

as i write these lines in late september, it’s been a sad few weeks for those of us in the ital family. in mid-august, former ital editor jim kopp passed away following a battle with cancer. last week, dan marmion—jim’s successor as editor (1999–2004) and a dear friend to many of us on the current ital editorial board—also left us, the victim of a malignant brain tumor. i never met jim, but lita president karen starr eulogized him in a posting to lita-l on august 16, 2010.3 i noted dan’s retirement due to illness in this space in march.4

i first met dan in the spring of 2000, when he arrived at notre dame as the new associate director for information systems and digital access (i think the position was differently titled then) and, incidentally, my new boss. dan arrived only six weeks after my own start there. things at notre dame were unsettled at the time: the libraries had only the year before successfully implemented exlibris’ aleph500 ils, the first north american site to do so. while exlibris moved on to implementations at mcgill and the university of iowa, we at notre dame struggled with the challenges of supporting and upgrading a system then new to the north american market. it was not always easy or smooth, but throughout, dan always maintained an unflappable and collegial manner with exlibris staff and a quiet but supportive demeanor toward those of us who worked for him. i wish i could say that i understood and appreciated this better at the time, but i can’t. i still had some growing ahead of me—i’m sure that i still do.

dan was there for me again as an enthusiastic reference when i moved on, first to the university of houston in 2003 and then to the university of alberta three years later. in these jobs i’d like to think i’ve come to understand a bit better the complex challenges faced by senior managers in large research libraries; in the process, i know i’ve come to appreciate dan’s quiet, knowledgeable, and hands-off style with department managers. it is one i’ve tried (not always successfully) to cultivate.
while i was still at notre dame, dan invited me to join the editorial board of information technology and libraries, a group which over the years has come to include many “friends of dan,” including judith carter (quite possibly the world’s finest managing editor), andy boze (ital’s webmaster), and mark dehmlow. while dan left ital in 2004, i think that he left the journal a wonderful and lasting legacy in these extremely capable and dedicated folks.

my fondest memories of dan concern our shared passion for model trains. i remember visiting a train show in south bend with him a couple of times, and our last time together (at the ala midwinter meeting in denver two years ago) was capped by a snowy trek with exlibris’ carl grant, another model train enthusiast, to the mecca of model railroading, caboose hobbies. three boys off to see their toys—oh, exquisite bliss!

i don’t know whether ital or its predecessor jola have ever reprinted an editorial, but while searching the archives to find something that would honor both jim and dan, i found a piece that i hope speaks eloquently of their contributions and to ital’s reason for being. dan’s editorial, “why is ital important?” originally published in our june 2002 issue, appears again immediately following this column. i think its message and the views expressed therein by jim and dan remain as valid today as they were in 2002. they also may help to frame my comments concerning our reader survey in the previous section.

farewell, jim and dan. you will both be sorely missed.
notes and references

1. a number of narrative answers to the survey make it clear that ital readers who are lita members perceive a link between membership and receipt of the journal, and infer that a portion of their dues pays for the publication and mailing of ital. in years past, ital’s advertising income paid the bills and even generated additional revenue for lita. today, the shoe is on the other foot because of declining advertising revenue, but ital is still expected to pay its own way, which it has failed to do in recent years. but to those who reasonably believe that some portion of their dues is dedicated to the support of ital, well, t’ain’t so. bothered by this? complain to the lita board.

2. as a point of comparison, consider the following results from the 2000 ital reader survey. respondents were asked to rank several publishing options on a scale of 1 to 3 (with 1 = most preferred option and 3 = least preferred option):

ital should be published simultaneously as a print-on-paper journal and an electronic journal (n = 284): 1 = 169 (59.5%); 2 = 93 (32.7%); 3 = 22 (7.7%)

ital should be published in an electronic form only (n = 293): 1 = 55 (18.8%); 2 = 61 (20.8%); 3 = 177 (60.4%)

in other words, then as now, about 60% of readers preferred paper and electronic to electronic-only.

3. karen starr, “fw: [libs-or] jim kopp: celebration of life,” online posting, aug. 16, 2010, lita-l, http://lists.ala.org/sympa/arc/lita-l/2010-08/msg00079.html (accessed sept. 29, 2010).

4. marc truitt, “dan marmion,” information technology & libraries 29 (mar. 2010): 4, http://www.ala.org/ala/mgrps/divs/lita/ital/292010/2901mar/editorial_pdf.cfm (accessed sept. 29, 2010).

playing tag in the dark: diagnosing slowness in library response time
margaret brown-sica | tutorial
information technology and libraries | december 2008

in this article the author explores how the systems department at the auraria library (which serves more than thirty thousand primarily commuting students at the university of colorado–denver, the metropolitan state college of denver, and the community college of denver) diagnosed and analyzed slow response time when querying proprietary databases. issues examined include vendor issues, proxy issues, library network hardware, and bandwidth and network traffic.
margaret brown-sica (margaret.brown-sica@ucdenver.edu) is head of technology and distance education support, auraria library, serving the university of colorado–denver, metropolitan state college of denver, and the community college of denver.

“why is everything so slow?” this is the question that library systems departments often have the most trouble answering. it is also easy to dismiss because it is often the fault of factors beyond the control of library staff. what usually prompts these questions are the experiences of the reference librarians. when these librarians are trying to help students at the reference desk, it is very frustrating when databases seem to respond to queries slowly and files take forever to load onto the computer screen, all while the line in front of the desk continues to grow. or the library gets calls from students using databases and the catalog from their homes who complain that searching library resources takes too long, and that they are getting frustrated and using google instead. this question is so painful because libraries spend so much of their shrinking budgets on high-quality information in the form of expensive proprietary databases, and it is all wasted if users have trouble using them. in this case the problem seemed to be how slow the process of searching for information and downloading documents from databases was. for lack of a better term, the auraria library called this the “response time” problem.

this article will discuss the various ways the systems (technology) department of the auraria library, which serves the university of colorado–denver, metropolitan state college of denver, and the community college of denver, tried to identify problems and improve database response time. the systems department defined “response time” as the time it took for a person to send a query from a computer at home or in the library to a proprietary information database and receive a response back, or how long it took to load a selected full-text article from a database.

when a customer sets out to use a database in the library, the query could be slowed down by many different factors. the first is the proxy, in our case innovative interfaces inc.’s web access management (iii wam), a product that authenticates the user via the iii api (application program interface) product. to do this the query travels over network hardware, switches, and wires to the iii server and back again. then the query goes to the database’s server, which may be almost anywhere in the world. hardware problems at the database vendor’s end can affect this transfer. in the case of auraria library, this transfer can be influenced by traffic on the library’s network, the university’s network, and any other point in between. it can also be hampered by the amount of memory in the computer where the query originates, by the number of tasks being performed by that computer, and so on. the bandwidth of the network and its speed can also have an effect. basically, the bottlenecks needed to be found and fixed. bottlenecks are described by webopedia as “the delay in transmission of data through the circuits of a computer’s microprocessor or over a tcp/ip network. the delay typically occurs when a system’s bandwidth cannot support the amount of information being relayed at the speed it is being processed. there are, however, many factors that can create a bottleneck in a system.”1
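one way to start locating such a bottleneck is to time the same request along different paths, for example with and without the proxy in the loop. the sketch below is a minimal php illustration; both urls are hypothetical, and the rewritten host name only loosely imitates wam-style url rewriting.

<?php
// sketch: time the same vendor request directly and through the proxy to
// see which hop contributes the latency. both urls are hypothetical.
function time_fetch($url)
{
    $context = stream_context_create(array(
        'http' => array('timeout' => 30),
    ));
    $start = microtime(true);
    file_get_contents($url, false, $context);
    return microtime(true) - $start;
}

$direct  = time_fetch('http://search.vendor.example.com/?q=nursing');
$proxied = time_fetch('http://0-search.vendor.example.com.proxy.example.edu/?q=nursing');

printf("direct: %.2f s  proxied: %.2f s  proxy overhead: %.2f s\n",
       $direct, $proxied, $proxied - $direct);

run from inside and outside the library network, a comparison like this helps separate proxy overhead from vendor-side or wide-area delays.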
literature review

there is not a lot on database response slowness in library literature, probably because the issue overlaps with computer science and really is not one problem but potentially one of several. the issue is figuring out where the problem lies. gerhan and mutula examined technical reasons for network slowness, performing bandwidth testing at a library in botswana and one in the united states using the same computer, and giving several suggestions for testing, fixing technical problems, and issues to examine. gerhan and mutula concluded that bandwidth and insufficient network infrastructure were the main culprits in their situation. they studied both bandwidth and bandwidth "squeeze." looking for the bandwidth "squeeze" means looking along the internet's "journey of many stages through routers and exchange points, each successively farther removed from the user."2 bandwidth bottlenecks could occur at any one or more of those stages in the query's transmission. the following four sections parse that lengthy pathway and examine how each may contribute to delays. badue et al., in their article "basic issues on the processing of web queries," described web queries, load balancing, and how they function.3 bertot and mcclure's "assessing sufficiency and quality of bandwidth for public libraries" is based on data collected as part of the 2006 public libraries and the internet study and provides a very straightforward approach for checking specific areas for problems.4 it outlines why basic data such as bandwidth readings may not give the complete picture. it also gives a nice outline of factors involved, such as local settings and parameters, ultimate connectivity path, application resource needs, and protocol priority. azuma, okamoto, hasegawa, and masayuki's "design, implementation and evaluation of resource management system for internet servers" was very helpful in understanding the role and function of proxy servers and problems they can present.5

vendor issues

this is a very thorny topic because it is out of the library's control, and also because the library has so many databases. the systems department asked the reference staff to send reports of problems listing the type of activity attempted, the times and dates, the names of the databases, the problem, and any error messages encountered. a few that seemed to be the slowest were selected for special examination. one vendor worked extensively with the library, and in the end it was believed that there were problems at their end in load balancing, which eventually seemed to be fixed. that company was in the middle of a merger, and that may have also been an issue. we also noted that a database that uses very large image files, artstor, was hard to use because it was so slow. this company sent the library an application that simulated the database's use and was supposed to test whether bandwidth at auraria library was sufficient for that database. according to the test, it was. the databases that were consistently perceived as the slowest were those that had the largest documents and pictures, such as those that used primarily pdfs and visual material. this, together with the results of the testing, pointed to a problem independent of vendor issues.

bandwidth and network traffic

the systems department decided to do bandwidth testing on the library's public and staff computers after reading gerhan and mutula's article about the university of botswana.
the general perception is that bandwidth is often the primary cause of network slowness, as well as of problems with databases that use larger files. several of the computers were tested on several successive days during what is usually the busiest time for the network, between noon and 2 p.m. the results were good, averaging about 3000 kilobits per second (kbps). for this test we used the cnet bandwidth meter, which downloads an image to your computer, measures the time of the download, and compares it to the maximum speeds offered by other internet service providers.6 there are several bandwidth meters available on the internet. when the network administrator checked the switches for network traffic, they showed low traffic, almost always less than 20 percent of capacity. this was confusing: if the problem was neither with the bandwidth nor the vendors, what was causing the slow network performance? one of the university network administrators was consulted to see if any factor in their sphere could be having an effect on our network. we knew that the main university network had implemented a bandwidth shaper to regulate bandwidth. "these devices limit bandwidth . . . by greedy applications, guarantee minimum throughput for users, groups or protocols, and better utilize wide-area connections by smoothing out bursty traffic."7 it was thought that perhaps this might be incorrectly prioritizing some of the library's traffic. this was a dead end, though—the network administrators had stopped using the device. if the bandwidth was good and the traffic was manageable, then the problem appeared to not be at the library. however, according to bertot and mcclure, the bandwidth question is complex because

typically an arbitrary number describes the number of kbps used to define "broadband." . . . such arbitrary definitions to describe bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term "high speed" for connections of 200kbps in at least one direction. there are three problematic issues with this definition:
1. it specifies unidirectional bandwidth, meaning that a 200kbps download, but a much slower upload (e.g., 56kbps) would fit this definition;
2. regardless of direction, bandwidth of 200kbps is neither high speed nor does it allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly.
3. the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high-use multiple-workstation public-access context.8
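the meter approach described above (download a known payload, time it, and divide) is simple to reproduce; a rough python sketch, with a placeholder test url:

# rough sketch of the bandwidth meter's approach: download a known
# payload, time it, and report kilobits per second. the url is a
# placeholder for any reasonably large, uncached test file.
import time
import urllib.request

TEST_FILE = "https://speedtest.example.com/1MB.bin"  # hypothetical

start = time.perf_counter()
with urllib.request.urlopen(TEST_FILE, timeout=60) as resp:
    payload = resp.read()
elapsed = time.perf_counter() - start

kilobits = len(payload) * 8 / 1000
print(f"downloaded {len(payload)} bytes in {elapsed:.2f} s, "
      f"~{kilobits / elapsed:.0f} kbps")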
proxy issues

auraria library uses the iii wam proxy server product. there were several things that pointed to the proxy being an issue. one was that the systems department had been experimenting with invoking the proxy in the library building in order to collect more accurate statistics and found that complaints about speed seemed to have started around the same time as this experiment. but if the bandwidth was not showing inadequacy and the traffic was light, why was this happening? the answer is better explained by azuma et al.:

needless to say, busy web servers must have many simultaneous http sessions, and server throughput is degraded when effective resource management is not considered, even with large network capacity. web proxy servers must also accommodate a large number of tcp connections, since they are usually prepared by isps (internet service providers) for their customers. furthermore, proxy servers must handle both upward tcp connections (from proxy server to web servers) and downward tcp connections (from client hosts to proxy server). hence, the proxy server becomes a likely spot for bottlenecks to occur during web document transfers, even when the bandwidth of the network and web server performance are adequate.9

testing was done from on campus and off campus, with and without using the proxy server. the results showed that the connection was faster without the proxy. when testing was done from the health sciences library at the university of colorado with the same type of server and proxy, the response time was much faster. the difference is that the community auraria library serves (the community college of denver, metropolitan state college, and the university of colorado–denver) is much larger and overwhelmingly uses databases from home, thereby taxing the proxy server. the other library belonged to a smaller campus, but the hardware was the same. the proxy was immediately dropped for on-campus users, and that resulted in some response-time improvements. a conference call was set up with the proxy vendor to determine if improvements in response time might be attained by changing from a proxy server to ldap (lightweight directory access protocol) authentication. the response given was that although there might be other benefits, improved response time was not one of them.
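the with/without-proxy comparison lends itself to the same timing approach; a minimal sketch, assuming a placeholder proxy address and test url rather than the library's actual wam configuration:

# minimal sketch of the with/without-proxy comparison. the proxy
# address and test url are placeholders, not the library's actual
# wam configuration.
import time
import urllib.request

TEST_URL = "https://db.example.com/search?q=test"   # hypothetical
PROXY = {"https": "http://proxy.example.edu:8080"}  # hypothetical

def timed_fetch(opener) -> float:
    """return elapsed seconds for one request through the given opener."""
    start = time.perf_counter()
    with opener.open(TEST_URL, timeout=30) as resp:
        resp.read()
    return time.perf_counter() - start

direct = urllib.request.build_opener()
proxied = urllib.request.build_opener(urllib.request.ProxyHandler(PROXY))

print(f"direct:  {timed_fetch(direct):.2f} s")
print(f"proxied: {timed_fetch(proxied):.2f} s")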
library network hardware

it was evident that the biggest bottleneck was the proxy, so the systems department decided to take a closer look at iii's hardware. the switch that regulated traffic between the network and the server that houses our integrated library system, part of which is the proxy server, was discovered to have been set at "half-duplex." half-duplex refers to

the transmission of data in just one direction at a time. for example, a walkie-talkie is a half-duplex device because only one party can talk at a time. in contrast, a telephone is a full-duplex device because both parties can talk simultaneously. duplex modes often are used in reference to network data transmissions. some modems contain a switch that lets you select between half-duplex and full-duplex modes. the correct choice depends on which program you are using to transmit data through the modem.10

when this setting was changed to full duplex, response time improved. there was also concern that this switch had not been functioning as well as it could. the switch was replaced, and this also improved response time. in addition, the old server purchased through iii was a generic server whose specifications were based on the demands of the ils software and didn't take into consideration the amount of traffic going to the proxy server. auraria library, which serves a campus of more than thirty thousand full-time equivalent students, is a library with one of the largest commuter student populations in the country. a new server had been scheduled to be purchased in the near future, so a call was made to the ils vendor to talk about our hypothesis and requirements. the vendor agreed that the library should change the specification on the new server to make sure it served the library's unique demands. a server will be purchased with increased memory and a second processor to hopefully keep these problems from happening again in the next few years. also, the cabling between the switch and the server was changed to better accommodate heavy traffic.

conclusion

although it is sometimes a daunting task to try to discover where problems occur in the library's database response time, because there are so many contributing factors and because librarians often do not feel that they have enough technical knowledge to analyze such problems, there are certain things that can be examined and analyzed. it is important to look at how each library is unique and may be inadequately served by current bandwidth and hardware configurations. it is also important not to be intimidated by computer science literature and to trust patterns of reported problems. the auraria library systems department was fortunate to also be able to compare problems with colleagues at other libraries and test in those libraries, which revealed issues that were unique and therefore most likely due to a problem at the library end. it is important to keep learning about how your system functions and to try to diagnose the problem by slowly looking at one piece at a time. though no one ever seems to be completely satisfied with the speed of their network, the employees of auraria library, especially those who work with the public, have been pleased with the increased speed they are experiencing when using proprietary databases. with the response-time issue improved, other problems that are not caused by the proxy hardware have been illuminated, such as browser configuration, which may be hampering certain databases—something that had previously been attributed to the network.

references

1. webopedia, s.v. "bottleneck," www.webopedia.com/term/b/bottleneck.html (accessed oct. 8, 2008).
2. david r. gerhan and stephen mutula, "bandwidth bottlenecks at the university of botswana," library hi tech 23, no. 1 (2005): 102–17.
3. claudine badue et al., "basic issues on the processing of web queries," sigir forum; 2005 proceedings (new york: association for computing machinery, 2005): 577–78.
4. john carlo bertot and charles r. mcclure, "assessing sufficiency and quality of bandwidth for public libraries," information technology and libraries 26, no. 1 (mar. 2007): 14–22.
5. kazuhiro azuma, takuya okamoto, go hasegawa, and murata masayuki, "design, implementation and evaluation of resource management system for internet servers," journal of high speed networks 14, no. 4 (2005): 301–16.
6. "cnet bandwidth meter," http://reviews.cnet.com/internet-speed-test (accessed oct. 8, 2008).
7. michael j. demaria, "warding off wan gridlock," network computing, nov. 15, 2002, www.networkcomputing.com/showitem.jhtml?docid=1324f3 (accessed oct. 8, 2008).
8. bertot and mcclure, "assessing sufficiency and quality of bandwidth for public libraries," 14.
9. azuma, okamoto, hasegawa, and masayuki, "design, implementation and evaluation of resource management system for internet servers," 302.
10. webopedia, s.v. "half-duplex," www.webopedia.com/term/h/half_duplex.html (accessed oct. 8, 2008).
editorial: no more silver bullets, please
marc truitt
marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

a recent library journal (lj) story referred to "the palpable hunger public librarians have for change . . . and, perhaps, a silver bullet to ensure their future" in the context of a presentation at the public library association's 2010 annual conference by staff members of the rangeview (colo.) library district. now, lest there be any doubt on this point, allow me to state clearly from the outset that none of the following ramblings are in any way intended as a specific critique of the measures undertaken by rangeview. far be it from me to second-guess the rangeview staff's judgment as to how best to serve the community there.1 rather, what got my attention was lj's reference to a "palpable hunger" for magic ammunition, from whose presumed existence we in libraries seem to draw comfort. in the last quarter century, it seems as though we've heard about and tried enough silver bullets to keep our collective six-shooters endlessly blazing away. here are just a few examples that i can recall off the top of my head, and in no particular order:

■■ library cafes and coffee shops.
■■ libraries arranged along the lines of chain bookstores.
■■ general-use computers in libraries (including information/knowledge commons and what-have-you).
■■ computer gaming in libraries.
■■ lending laptops, digital cameras, mp3 players and ipods, e-book readers, and now ipads.
■■ mobile technology (e.g., sites and services aimed at and optimized for iphones, blackberries, etc.).
■■ e-books and e-serials.
■■ chat and instant-message reference.
■■ libraries and social networking (e.g., facebook, twitter, second life, etc.).
■■ "breaking down silos," and "freeing"/exposing our bibliographic data to the web, and reuse by others outside of the library milieu.
■■ ditching our old and "outmoded" systems, whether the object of our scorn is aacr2, lcsh, lcc, dewey, marc, the ils, etc.
■■ library websites generally. remember how everyone—including us—simply had to have a website in the 1990s? and ever since then, it's been an endless treadmill race to find the perfect, user-centric library web presence? if sisyphus were to be incarnated today, i have little doubt that he would appear as a library web manager and his boulder would be a library website.
■■ oh, and as long as we're at it, "user-centricity" generally. the implication, of course, is that before the term came into vogue, libraries and librarians were not focused on users.
■■ "next-gen" catalogs.

i'm sure i'm forgetting a whole lot more. anyway, you get the picture. each of these has, at one time or another, been positioned by some advocate as the necessary change—the "silver bullet"—that would save libraries from "irrelevance" (or worse!), if we would but adopt it now, or better yet, yesterday. well, to judge from the generally dismal state of libraries as depicted by some opinion-makers in our profession—or perhaps simply from our collective lack of self-esteem—we either have been misled about the potency of our ammunition, or else we've been very poor markspersons. notwithstanding the fact that we seem to have been indiscriminately blasting away with shotguns rather than six-shooters, our shooting has neither reversed the trends of shrinking budgets and declining morale nor staunched the ceaseless dire warnings of some about "irrelevance" resulting from ebbing library use.
to stretch the analogy a bit further still, one might even argue that all this shooting has done damage of its own, peppering our most valuable services with countless pellet-sized holes. at the same time, we have in recent years shown ourselves to be remarkably susceptible to the marketing-focused hyperbole of those in and out of librarianship about technological change. each new technology is labeled a "game-changer"; change in general is either—to use the now slightly-dated, oh-so-nineties term—a "paradigm shift" or, more recently, "transformational." when did we surrender our skepticism and awareness of a longer view? what's wrong with this picture?2 i'd like to suggest another way of viewing this. a couple of years ago, alan weisman published the world without us, a book that should be required reading for all who are interested in sustainability, our own hubris, and humankind's place in the world. the book begins with our total, overnight disappearance, and asks (1) what would the earth be like without us? and (2) what evidence of our works would remain, and for how long? the bottom-line answers for weisman are (1) in the long run, probably much better off, and (2) not much and not for very long, really. so, applying weisman's first question to our own, much more modest domain, what might the world be like if tomorrow librarians all disappeared or went on to work doing something else—became consultants, perhaps?—and our physical and virtual collections were padlocked? would everything be okay, because, as some believe,
it's all out there on the web anyway, and google will make it findable? absent a few starry-eyed bibliophiles and newly out-of-work librarians—those who didn't make the grade as consultants—would anyone mourn our disappearance? would anyone notice? if a tree falls in the woods . . . in short, would it matter? and if so, why and how much? the answer to the preceding two questions, i think, can help to point the way to an approach for understanding and evaluating services and change in libraries that is both more realistic and less draining than our obsessive quest for the "silver bullet." what exactly is our "value-add"? what do we provide that is unique and valuable? we can't hope to compete with barnes and noble, starbucks, or the googleplex; seeking to do so simply diverts resources and energy from providing services and resources that are uniquely ours. instead, new and changed services and approaches should be evaluated in terms of our value-add: if they contribute positively and are within our abilities to do them, great. if they do not contribute positively, then trying to do them is wasteful, a distraction, and ultimately disillusioning to those who place their hopes in such panaceas. some of the "bullets" i listed above may well qualify as contributing to our value-add, and that's fine. my point isn't to judge whether they are "bad" or "good." my argument is about process and how we decide what we should do and not do. understanding what we contribute that is uniquely ours should be the reference standard by which proposed changes are evaluated, not some pie-in-the-sky expectation that pursuit of this or that vogue will magically solve our funding woes, contribute to higher (real or virtual) gate counts, make us more "relevant" to a particular user group, or even raise our flagging self-esteem. in other words, our value-add must stand on its own, regardless of whether it actually solves temporal problems. it is the "why" in "why are we here?" if, at the end of the day, we cannot articulate that which makes us uniquely valuable—or if society as a whole finds that contribution not worth the cost—then i think we need to be prepared to turn off the lights, lock the doors, and go elsewhere, because i hope that what we're doing is about more than just our own job security. and if the far-fetched should actually happen, and we all disappear? i predict that at some future point, someone will reinvent libraries and librarians, just as others have reinvented cataloguing in the guise of metadata.

notes and references

1. norman oder, "pla 2010 conference: the anythink revolution is ripe," library journal, mar. 26, 2010, http://www.libraryjournal.com/article/ca6724258.html (accessed mar. 30, 2010). there, i said it! a fairly innocuous disclaimer added to one of my columns last year seemed to garner more attention (http://freerangelibrarian.com/2009/06/13/marc-truitts-surprising-ital-editorial/) than did the content of the column itself. will the present disclaimer be the subject of similar speculation?

2. one of my favorite antidotes to such bloated, short-term language is embodied in michael gorman's "human values in a technological age," ital 20, no. 1 (mar. 2000): 4–11, http://www.ala.org/ala/mgrps/divs/lita/ital/2001gorman.cfm (accessed apr. 12, 2010)—highly recommended. the following is but one of many calming and eminently sensible observations gorman makes:

the key to understanding the past is the knowledge that people then did not live in the past—they lived in the present, just a different present from ours. the present we are living in will be the past sooner than we wish. what we perceive as its uniqueness will come to be seen as just a part of the past as viewed from the point of a future present that will, in turn, see itself as unique. people in history did not wear quaintly old-fashioned clothes—they wore modern clothes. they did not see themselves as comparing unfavorably with the people of the future; they compared themselves and their lives favorably with the people of their past. in the context of our area of interest, it is particularly interesting to note that people in history did not see themselves as technologically primitive. on the contrary, they saw themselves as they were—at the leading edge of technology in a time of unprecedented change.

from our readers: virtues and values in digital library architecture
mark cyzyk
mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect, library digital programs group, sheridan libraries, johns hopkins university in baltimore.

editor's note: "from our readers" will be an occasional feature, highlighting ital readers' letters and commentaries on timely issues.

at the fall 2007 coalition for networked information (cni) conference in washington, d.c., i presented "a survey and evaluation of open-source electronic publishing systems." toward the end of my presentation was a slide enumerating some of the things i had personally learned as a web application architect during my review of the systems under consideration:

■■ platform independence should not be neglected.
■■ one inherits the flaws of external libraries and frameworks. choose with care.
■■ installation procedures must be simple and flawless.
■■ don't wake the sysadmin with "slap a gui on that xml!"—and push application administration out, as much as possible, to select users.
■■ documentation must be concise, complete, and comprehensive. "i can't guess what you're thinking."

initially, these were just notes i thought might be useful to others, figuring it's typically helpful to share experiences, especially at international conferences. but as i now look at those maxims, it occurs to me that when abstracted further they point in the direction of more general concepts and traits—concepts and traits that accurately describe us and the products of our labor if we are successful, and prescribe to us the concepts and traits we need to understand and adopt if we are not. in short, peering into each maxim, i can begin to make out some of the virtues and values that underlie, or should underlie, the design and architecture of our digital library systems.

freedom and equality

platform independence should not be neglected. "even though this application is written in platform-independent php, the documentation says it must be run on either red hat or suse, or maybe it will run on solaris too, but we don't have any of these here." while i no doubt will be heartily flamed for suggesting that microsoft has done more to democratize computing than any other single company, i nevertheless feel the need to point out that, for many of us, windows server operating systems and our responsibility for administering them way back when provided the impetus for adding our swipe-card barcodes to the acl of the data center—surely a badge of membership in the club of enterprise it if ever there was one. you may not like the way windows does things. you may not like the way microsoft plays with the other boys. but to act like they don't exist is nothing more than foolishly burying one's head in the *nix sand. windows servers have proven themselves time and again as being affordable, easily managed, dependable, and, yes, secure workhorses. windows is the ford pickup truck of the server world, and while that pickup will some day inevitably suffer a blowout of its twenty-year-old head gasket (and will therefore be respectfully relegated to that place where all dearly departed trucks go), it's been a long and good run. we should recognize and appreciate this. windows clearly has a place in the data center, sitting quietly humming alongside its unix and linux brothers. i imagine that it actually takes some effort to produce platform-dependent applications using platform-independent languages and frameworks. such effort should be put toward other things. keep it pure. and by that i mean, keep it platform independent. freedom to choose and presumed equality among the server-side oses should reign.

responsibility and good sense

one inherits the flaws of external libraries and frameworks. choose with care. so you've installed the os, you've installed and configured the specified web server, you've installed and configured the application platform, you've downloaded and compiled the source, yet there remains a long list of external libraries to install and configure. one by one you install them. suddenly, when you get to library number 16 you hit a snag. it won't install. it requires a previous version of library number 7, and multiple versions of library number 7 can't be installed at the same time on the same box.
worse yet, as you take a break to read some more of the documentation, it sure looks like required library number 19 is dependent on the current version of library number 7 and won't work with any previous version. and could it be that library number 21 is dependent on library number 20, which is dependent on library number 23, which is dependent on—yikes—library number 21? all things come full circle. but let's suppose you've worked out all of these dependencies, you've figured out the single, secret order in which they must install, you've done it, and it looks like it's working! yet, when you go to boot up the web service, suddenly there are errors all over the place, a fearsome crashing and burning that makes you want to go home and take a nap. something in your configuration is wrong? something in the way your configuration is interacting with an external library is wrong? you search the logs. you gather the relevant messages. they don't make a lot of sense. now what to do? you search the lists, you search the wikis to no avail, and finally, in desperation, you e-mail the developers. "but that's a problem with library x, not with our application." au contraire. i would like to strongly suggest a copernican revolution in how we think about such situations. while it's obvious that the developers of the libraries themselves are responsible for developing and maintaining them, i'd like to suggest that this does not relieve you, the developer of a system that relies on their software, from responsibility for its bugs and peculiar configuration problems. i'd like to suggest that, far from pushing responsibility in the case mentioned above out to the developers of the malfunctioning external library, you, in choosing that library in the first place, have now inherited responsibility for it. even if you don't believe in this notion of inheritance, if you would please at least act as if it were true, we'd all be in a better place. part of accepting this kind of responsibility is you then acting as a conduit through which we poor implementers learn the true nature of the problem and any solutions or temporary workarounds we may apply so that we can get your system up and running pronto. in the end, it's all about your system. your system as a whole is only as strong as the weakest link in its chain of dependencies.

simplicity and perfection

installation procedures must be simple and flawless. it goes without saying that if we can't install your system we a fortiori can't adopt it for use in our organization. i remember once having such a difficult time trying to get a system up and running that i almost gave up. i tried first to get it running against apache 1.4, then against apache 2.0. i had multiple interactions with the developers. i banged my head against the wall of that system for days in frustration. the documentation was of little help. it seemed to be more part of an internal documentation project, a way for the developers to communicate among themselves, than to inform outsiders like me about their system. and related to this i remember driving to work during this time listening to a report on npr about the famous hopkins pediatric neurosurgeon, dr. ben carson.
apparently, earlier in the week he had separated the brains of siamese twins and the twins were now doing fine, recuperating. the npr commentator marveled at the intricacy of the operation and at the fact that the whole thing took, i believe, five hours. "five hours? five hours?!" i exclaimed while barreling down the highway in my vintage 1988 ford ranger pickup (head gasket mostly sealed tight, no compression leakage). "i can't get this system at work installed in five days!" our goal as system architects needs to be that we provide to our users simple and flawless installation procedures so that our systems can, on average, be installed and configured in equal or less time than it takes to perform major brain surgery.1 "all in an afternoon" should become our motto. i am happy to find that there are useful and easy-to-use package managers, e.g., yum and synaptic, for doing such things on various linux distributions. windows has long had solid and sophisticated installation utilities. tomcat supports drop-in-place war files. when possible and appropriate, we need to use them.

justice and e-z livin

don't wake the sysadmin with "slap a gui on that xml!"—and push application administration out, as much as possible, to select users. i remember reading plato's republic as an undergraduate and the feeling of being let down when the climax of the whole thing was a definition in which "justice" simply is each man serving his proper place in society and not transgressing the boundaries of his role. "that's it?" i thought. "so you have this rigidly hierarchical society and each person in it knows his role and knows in which slot his role fits—and keeping to this is 'justice'?" this may not be such a great way to structure a society, but now that i think about it, it's a great way to structure a computer application. sit down and carefully look at the functions your program will provide. then create a small set of user roles to which these functions will be carefully mapped. in the end you will have a hierarchical structure of roles and functions that should look perfectly simple and rational when drawn on a piece of paper. and while the superuser role should have power over all and access to all functions in the application, the list of functions that he alone has access to should be small, i.e., the actual work of the superuser should be minimized as much as possible by making sure that most functions are delegated to the members of other, appropriate, proper user roles. doing this happily results in what i call the state of e-z livin: the last thing you want is for users to constantly be calling you with data issues to fix. you therefore will model management of the data—all of it—and the configuration of the application itself—most of it—directly into the architecture of the application, provide users the guis they need to configure and manage things themselves, and push as much functionality as you can out to them where it belongs. let them click their respective ways to happiness and computing goodness. you build the tool, they use it, and you retire back to the land of e-z livin. users are assigned to their roles, and all roles are in their proper places. application architecture justice is achieved.
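a minimal sketch of that role-to-function mapping in python (role and function names are invented for illustration, not drawn from any particular system):

# minimal sketch of mapping application functions to a small set of
# user roles, keeping the superuser's exclusive list short. role and
# function names are invented for illustration.
ROLE_FUNCTIONS = {
    "superuser": {"manage_roles"},            # deliberately small
    "editor":    {"edit_records", "import_files"},
    "reviewer":  {"view_records", "comment"},
}

def allowed(role: str, function: str) -> bool:
    """true if the role may call the function (superuser may call all)."""
    if role == "superuser":
        return True  # power over all and access to all functions
    return function in ROLE_FUNCTIONS.get(role, set())

assert allowed("editor", "edit_records")
assert not allowed("reviewer", "edit_records")
assert allowed("superuser", "edit_records")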
clarity and wholeness

documentation must be concise, complete, and comprehensive. "i can't guess what you're thinking." as system developers we've probably all had the magical experience of a mind meld with a fellow developer when working intensively on a project. i have had this experience with two other developers, separately, at different stages of my career. (one of them, in fact, used to point out to everyone that, "between the two of us, we make one good developer!") this is a wonderful and magical and productive working relationship in which to be, and it needs to be recognized, supported, and exploited whenever it happens. you are lucky if you find yourself designing and developing a system and your counterpart is reading your mind and finishing your sentences. however, just as it's best to leave that nice young couple cuddling in the corner booth alone, so too it really doesn't make a lot of sense to expect the mind-melded developers to turn out anything that remotely resembles coherent and understandable documentation. those undergoing a mind meld by definition know perfectly well what they mean. to the rest of us it just feels like we missed a memo. if you have the luxury, make sure that the one writing the documentation is not currently undergoing a mind meld with anyone else on the development team. scotty typically stayed behind while he beamed the others down. beam them down. be that scotty. you do the world a great service by staying behind on the ship and dutifully reporting, clearly and comprehensively, what's happening down on the red planet. to these five maxims, and their corresponding virtues, i would add one more set, one upon which the others rely:

empathy and graciousness

you are not your audience. at least in applied computing fields like ours, we need to break with the long-held "guru in the basement" mentality. the actions of various managerial strata have now ostensibly acknowledged for us that technical expertise, especially in applied fields, is a commodity, i.e., it can be bought. a dearth of such expertise is remedied by simply applying money to the situation—admittedly difficult to do at the majority of institutions of higher education, but a common occurrence at the wealthiest. nevertheless, the dogmatic hold of the guru has been broken and the magical aura that once draped her is not now so resplendent—her relative rarity, and the clubby superiority that depended upon it, has been diluted significantly by the sheer number of counterparts who can and will gleefully fill her function. we respect, value, and admire her; it's just that her stranglehold on things has (rightfully) been broken. and while nobody is truly indispensable, what is more difficult and rare to find is someone who has the guru's same level of technical chops coupled with a genuine empathic ability to relate to those who are the intended users of her systems and services. unless your systems and services are geared primarily toward other developers, programmers, and architects—and presumably they are not, nor, in the library world, should they be—your users will typically be significantly unlike you. let me repeat that: your users are not like you. rephrased: you are not your audience. when looking back over the other maxims, values, and virtues mentioned in this essay, then, the moral-psychological glue that binds them all is composed of empathy for our users—faculty, students, librarians, non-technical staff—and the graciousness to design and carry out a project plan in a spirit of openness, caring, flexibility, humility, respect, and collaboration.
when empathy for the users of our systems is absent—and there are cases where you can actually see this in the design and documentation of the system itself—our systems will ultimately not be used. when the spirit of graciousness is broken, men become robots, mere rule followers, and users will boycott using their systems and will look elsewhere, naturally preferring to avoid playing the simon-says games so often demanded by tech folk in their workaday worlds; there is a reason the comic strip dilbert is so funny and rings so true. when confronted with a lack of empathy and graciousness on our part, the users who can boycott using our systems and services will boycott using our systems and services. and we'll be left out in the rain, feeling like, as bonnie raitt once sadly sang, "i can't make you love me if you don't / i can't make your heart feel something it won't." empathy and graciousness, while not guaranteeing enthusiastic adoption of our systems and services, are a necessary precondition for users even countenancing participation. there are undoubtedly other virtues and values that can usefully be expounded in the context of digital library architecture—consistency, coherence, and elegance immediately come to mind—and i could go on and on analyzing the various maxims surrounding these that bubble up through the stack of consciousness during the course of the day. yet doing so would conflict with another virtue i think is key to the success and enjoyment of opinion-piece essays like this and maybe even of other sorts of publications and presentations: brevity.

note

1. a colleague of mine has since informed me that carson's operation took twenty-five hours, not five. nevertheless, my admonition here still holds. when installation and configuration of our systems are taking longer, significantly longer, than it takes to perform major brain surgery, surely there is something amiss?

supporting faculty's instructional video creation needs for remote teaching: a case study on implementing eglass technology in a library multimedia studio space
hanwen dong
information technology and libraries | june 2023
https://doi.org/10.6017/ital.v42i2.15201
hanwen dong (hanwendong@uidaho.edu) is instructional technology librarian, university of idaho. © 2023.

abstract

in 2021, alongside seven colleges at the university of idaho campus, the university of idaho library received an eglass system (https://eglass.io) with funding from the governor's emergency education relief grant to expand faculty's capacity to create instructional videos. the eglass is a transparent glass whiteboard that allows instructors to write, draw, and annotate. it comes with a built-in camera that can capture instructors' facial expressions and gestures while facing their remote students and allow better engagement. the eglass is suitable for creating asynchronous instructional videos for flipped classrooms and integrating zoom for synchronous online classes. this article details the eglass equipment setup, studio space optimization, outreach efforts and initiatives, usage examples of early adopters, lessons learned during the first year of the eglass deployment, and future considerations.
introduction

in 2021, the university of idaho library (library) received a transparent glass whiteboard called the eglass for faculty to record video-based lectures. the eglass was based on a similar glass whiteboard technology, called the lightboard, that the library already owned. initially built by university of idaho engineering students and later gifted to the library, the lightboard presented challenges to library staff, as properly supporting the technology required spending a significant amount of time. offering similar functionalities, the eglass had the potential to also address the issues that the lightboard presented. similar to the lightboard, the eglass allowed instructors to write and draw on the glass while facing their audience, typically students who would be watching the recorded videos later, to provide better engagement. the eglass could also be used for creating asynchronous instructional videos for flipped classrooms and integrating zoom for synchronous online classes. to implement the eglass, it was necessary to consider factors such as the functionality, the space to be occupied, and faculty interest. a year after the original deployment of this tool, the author reports on the lessons learned: the eglass equipment setup, multimedia studio space optimization, outreach efforts and initiatives, usage examples of early adopters, and future considerations are explored later in this article.

background

the studio in the university of idaho library provides space and audiovisual equipment to students, faculty, and staff to pursue curricular, personal, and creative multimedia projects. originally converted from a 200-square-foot meeting room, the studio is equipped with a 27-inch imac, a 32-inch full-hd victek monitor, a scarlett 18i20 audio interface, a dbx 266xs 2-channel compressor/gate, two krk rokit 5 g3 powered studio monitors, two shure sm58 dynamic vocal microphones with microphone arm stands and pop filters, several portable lights, a green screen, and more. software installed on the imac includes audacity, camtasia, and the essential adobe creative cloud applications such as photoshop, premiere pro, indesign, etc. patrons can use the studio software and equipment to record voice-over narrations and podcasts as well as to edit multitrack audio clips and videos. in addition to using the studio equipment, patrons can also borrow other multimedia equipment, such as video camcorders, audio recorders, tripods, a usb microphone, and a dslr camera, at the circulation desk. initially managed by two library support staff, both of whom left the organization to pursue other opportunities, the studio operations were taken over by the author in 2020. due to the covid-19 pandemic and the lack of air ventilation in the space, the studio was closed in march 2020 and did not reopen until august 2021. while any university-affiliated patron is welcome to use the studio, first-time users were expected to complete an orientation with the author to become familiar with the equipment setup and the audio workflow. to use the studio, patrons had to make reservations, up to two weeks in advance, for up to two hours per day. reservations were made from the studio's webpage and managed through springshare's libcal product.
patrons who frequented the studio pursued various personal, creative, instructional, and curriculum-related projects, including video recording with the green screen, video editing, podcast recording, voice-over narration recording, etc. the studio was used by patrons several times a week. according to the libcal space statistics, in fall semester 2021, the studio had 48 unique users, 147 total bookings, and 211 hours booked, and the average reserved time block was 86 minutes. in spring semester 2022, the studio had 30 unique users, 64 total bookings, and 103 hours booked, and the average reserved time block was 97 minutes. a noticeable usage drop in the spring semester was likely due to a reduced number of advertised studio orientations provided to the campus community and fewer classroom assignments that required or promoted studio use. for several years, the studio was home to a lightboard for faculty to record class lectures. designed as open-source hardware by dr. michael peshkin from the mccormick school of engineering at northwestern university, the lightboard was a transparent glass whiteboard illuminated with a built-in light, and the ink would glow in low-light environments. instructors could write and draw on the glass with neon markers while facing the viewers, and the writings and drawings along with the instructor could all be captured in the same frame using a separate camera.1 dr. peshkin provided two solutions for those who were interested in acquiring a lightboard: buying a commercially produced one or building one from scratch. the lightboard in the studio was built by a group of students in a mechanical engineering class for a senior capstone project as part of a design challenge in partnership with the center for excellence in teaching and learning (cetl), and the students later gifted the lightboard to the library. the lightboard that the studio received came with a steel frame and wheels. the unit's overall dimensions were 75 inches long, 45 inches wide, and 78 inches high. the glass board itself measured 71.5 by 47.5 inches (see figure 1).

figure 1. the lightboard that the library received.

the lightboard was used by a few instructors who frequented the studio over the years. during fall 2019, one faculty member from the college of natural resources regularly used the lightboard two to three times per week for about 45 minutes to an hour per session. another engineering faculty member, whose students built the lightboard, also used the lightboard several times but did not have a regularly scheduled appointment. there had not been any regular users since then. recording videos using the lightboard required a complicated setup. first, instructors would need to gather several pieces of equipment. for instance, they would need to check out a video camera and a tripod at the circulation desk downstairs and a lavalier microphone at the room adjacent to the studio. the setup required the lightboard to be positioned between the instructor and the camera. it was necessary to change the camera setting to flip the video horizontally; otherwise, any writings or drawings in the final recording would be displayed backward.
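where a camera offers no mirror setting, the same correction can be applied in post-production; a small sketch calling ffmpeg's hflip filter from python, with placeholder file names:

# sketch: mirroring a lightboard recording in post-production with
# ffmpeg's hflip filter, for cameras with no built-in mirror setting.
# file names are placeholders; ffmpeg must be on the system path.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "lecture_raw.mp4",   # recording with backward writing
        "-vf", "hflip",            # flip the video horizontally
        "-c:a", "copy",            # leave the audio track untouched
        "lecture_mirrored.mp4",
    ],
    check=True,
)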
additional steps included starting and stopping the camera recordings, checking throughout the recording process to ensure the instructor's writing on the lightboard stayed within the camera's frame of capture, and transferring the media from the camera's sd card to an external hard drive or to cloud storage. as a result, recording a session using the lightboard required assistance from at least one other individual, usually a library staff or faculty member, from start to finish. the many different moving parts made the whole experience time-consuming and labor-intensive both for the library staff and the lightboard users.

literature review

lightboard technology has been implemented at various higher education institutions since 2014. thanks to dr. peshkin, who made the lightboard an open-source technology and provided the building instructions on his website, many institutions built their own versions of lightboards with variable setups. because the lightboard requires a controlled lighting environment and the writing appears backwards from the perspective of those facing the glass (including the camera), lightboards were used almost exclusively in dedicated studio spaces where the videos were to be recorded. for instance, similar to the university of idaho library studio setup, the complete setup at the university of western australia consists of a lightboard, a camera, lights, markers, a lapel microphone, and a black canvas.2 a budget setup that cost as little as $100 as a removable, tabletop version was also developed.3 cornell university came up with a lightboard and projector setup that can be used in a live 500-person auditorium.4 needless to say, the lightboard technology was adaptable enough to meet various needs on many campuses. several studies show that, among the various types of instructional videos for asynchronous learning, students favor lightboard videos. one unique feature of the lightboard technology, for example, is that it enables instructors to incorporate their gaze and gestures into the instruction. according to a 2015 study, combining gaze and gestures with traditional instructional materials proved to be more effective in directing students' attention.5 in a 2019 study, several researchers analyzed various lightboard cases in the context of learning theories and theoretical frameworks, such as cognitive load theory, cognitive theory of multimedia learning, and social learning theory. the researchers concluded that while more empirical research was needed, the lightboard videos could improve student learning and engagement.6 in another study conducted by researchers at the university of illinois urbana-champaign, students watched two types of recorded lectures—picture-in-picture with the instructor appearing in a corner of the video, or an overlay of the instructor without the background. study results showed that the overlay videos where the instructor interacted with the content had more views and were preferred by the students, likely thanks to the gaze and gestures of the instructor increasing accessibility.7 in classes in which the instructors opted to use the lightboard, students generally responded positively to the lightboard videos.
for example, in two online classes at clayton state university, most students preferred the lightboard lecture over the traditional narrated powerpoint lecture, and "students described it as engaging, more personable, appealing to visual learners, easier to follow and retain the information, and more similar to a conventional live lecture."8 at bond university, in queensland, australia, in a chemistry class where the lightboard videos were incorporated as a learning aid, researchers reported that over a four-year period, students scored higher on exams in courses in which lightboard videos were incorporated as instructional materials.9 in another example, students enrolled in a physics class at san diego state university were exposed to the learning glass, a commercial product that was based on the lightboard technology. students responded in a post-course assessment that they felt more connected to their instructor when the instructor utilized the learning glass, and thus the researchers argued that the learning glass could positively impact stem students' retention rates.10 lastly, at georgia southern university, two researchers conducted a mixed-method study to assess different groups of students' perceptions of lightboard videos. the findings showed that while performing equally well when comparing test scores, the students in the class that incorporated lightboard videos had better understanding, engagement, and satisfaction based on the assessment measures.11 lightboards are not without their drawbacks given the requirements and the limitations of the equipment and the recording conditions. in an engineering class where students used the lightboard for a problem-solving assignment to demonstrate their learning, researchers identified various requirements, including a room of sufficient size, the need for filming equipment, and long post-production processing time.12 other disadvantages of the lightboard included immobility, limited writing surface, and a more rigorous cleaning process.13 the type of content being presented in lightboard videos also required consideration. in a study comparing different types of lecture videos, students showed a strong preference for the learning glass videos and "suggested that this style be used to supplement lecture videos (in the form of practice problems and follow-up videos)."14 this conclusion corroborated another study's finding that a lightboard was useful for step-by-step problem-solving explanations.15 lastly, in a study that examined three different styles of lightboard videos (interview style, multipresenter, and multimedia-enriched), the researchers identified the benefits along with the drawbacks of each style.16 for example, while interview videos highlighted interactions between the presenter and the interviewer, the presenter experienced "difficulty in multitasking between writing notes on the lightboard and attending to the interviewer's questions." having several presenters could also limit the amount of space for them to move around and write on the glass while remaining in frame and created possible distractions of having too many people as well as too much writing on the glass. another potential issue is that not all presenters could be wearing darker-colored clothing for better contrast with the writing.
eglass context

in spring 2021, the manager at the collaboration & classroom technology services (ccts) department at the university of idaho informed the author that they were planning on purchasing several eglass units for the campus to support faculty's instructional video creation. the funds came from the governor's emergency education relief (geer) grant to address the covid-19 pandemic's impact on higher education. initially, the grant was written by several individuals who intended to purchase commercially made lightboards to enhance distance teaching options. while researching for the grant, the team stumbled upon the eglass, which seemed to be easier to use than the lightboard. the pricing was reasonable, so the team decided to purchase several of these devices instead of the two lightboards that were originally recommended. if interested, the library could receive one unit alongside eight other colleges on campus. the author checked out the demo unit at ccts and reported the first impressions as a user to the dean of university of idaho libraries. the latter reasoned that, because the lightboard and the eglass duplicated functionalities and the eglass had more perceived ease of use given its all-in-one package without the lighting and camera being separate, it would be best to replace the lightboard with the eglass. the author contacted the lightboard capstone project faculty member, who chose to rehome the lightboard to the engineering outreach department at the college of engineering. removing the lightboard paved the way for welcoming the eglass to the studio by reclaiming needed room space. the eglass came in two sizes—a 35-inch and a 50-inch diagonal writing surface. the library received a 50-inch unit with the writing surface measuring 45.64 inches long and 27.40 inches tall. the height of the overall unit could be adjusted to 29.37 inches, 31.33 inches, or 33.31 inches. additional accessories that the library received included a desktop computer, two height-adjustable desks, a touchscreen monitor, a webcam, a ring light, peripherals, neon pens, and white cloths for wiping down the writings. once the order of the eglass came through, a ccts team that consisted of several individuals brought the eglass along with two height-adjustable tables to assemble (see figure 2). the assembling of all the equipment took about an hour.

figure 2. ccts team assembling the eglass; disclosure: the shirt logo does not represent any affiliations.

description

similar to the lightboard, the eglass was made of a sheet of glass and a frame, and the instructors could write on the glass using neon markers. however, the eglass had several distinct features and advantages over the lightboard. first and foremost, the eglass had a built-in camera and a recording function that enabled the instructors to start, pause, and stop the recording on their own with a touch of a button. in addition, the eglass internal system flipped the image automatically in real time so that instructors did not need to write backward. therefore, using the eglass would not require additional support from library personnel since the separate camera setup was no longer needed.
the eglass’s built-in lights were also an improvement over the lightboard’s. the lightboard came with one set of lights on the frame that illuminated the writing on the glass, but additional portable lights had to be set up to ensure the instructors were illuminated as well. the eglass came with two sets of lights—the instructor light illuminated the instructor, and the blue glass lights ensured the ink on the glass would glow for better visibility. each set of lights was controlled by a separate knob to adjust the intensity.

moreover, the eglass could be used as a standalone unit for simple tasks that involved writing and drawing on the glass. for example, instructors could start, pause, and stop the recording using the touch buttons located below the writing surface on the frame. instructors could also use the free-to-download eglassfusion software to access additional features, such as taking snapshots; importing powerpoint slides, word documents, pdfs, and other types of media files; removing the imported media’s background color; zooming in and out; and annotating by typing text and drawing rectangles or arrows.

figure 3. a faculty member recording a video with an application overlay.

while the eglass was connected to a desktop computer via a usb cable, instructors could also bring their own devices to connect to the eglass, which supported windows, macos, and chromebook operating systems. with a laptop connected to the eglass, instructors could use the downloaded and installed eglassfusion software to control what they were sharing on their screens. for instance, on their devices, instructors could use video conference software such as zoom and microsoft teams for synchronous online instruction via screen sharing and could switch from their laptops’ camera to the eglass camera as the output video. in addition, students could see the writing and drawings on the glass; the instructor’s face, body, and gestures; and any programs open on the instructor’s laptop on the same screen (see example in figure 3). lastly, instructors could choose to use the eglass while sitting down or standing up, as the eglass was placed on a height-adjustable desk. the desktop computer, touchscreen monitor, webcam, and ring light enabled a one-button studio setup. instructors could open any video recording software when pressing the button to start a recording and use the touchscreen monitor with zoom whiteboard and with camtasia for annotated screencast recording.

outreach

the new equipment setup was completed a few weeks before the start of fall semester 2021. ccts sent an announcement to the university daily newsletter targeted to faculty and staff to advertise that the eglass had been set up at various locations on campus. the author also provided 20 in-person studio orientation sessions, scheduled at 10:00 a.m. and 2:00 p.m. monday–friday during the first two weeks of classes, to campus students, faculty, and staff. prior sign-up was not necessary, so patrons could simply show up at the orientation time. these orientations provided an overview for patrons unfamiliar with the studio or any pieces of the existing or new equipment. among the 36 patrons who attended the orientations, three faculty members were introduced to the eglass and the one-button studio.
several additional informational and educational workshops were conducted to promote awareness of the eglass. in the fall semester, ccts hosted a workshop introducing the eglass. because the limited physical space in the studio could comfortably accommodate fewer than five people, the workshop was hosted in a hybrid format with the in-person location in a room adjacent to the studio. participants could choose to attend either via zoom or in person. if attending in person, participants could visit the studio after the workshop to check out the eglass setup and try out the equipment. workshop attendees noticed that the writing on the eglass was difficult to differentiate from the white wall that served as the background. after the workshop, the author ordered black wallpaper and applied it to the wall facing the eglass to help improve the contrast. in the 2022 spring semester, the author facilitated an online library workshop for five faculty and two staff attendees that introduced the eglass, its core features, its advantages over traditional whiteboard/blackboard or zoom instruction, examples of disciplines where the eglass could be applied to instruction, and best practices.

another event promoting the eglass was the engineering design expo at the university of idaho college of engineering, an annual event that showcases design projects created by students. this event attracted regional k–12 students, community college students, industry partners, and community partners. the makerfaire, an event that featured makerspace technologies and a drone demonstration, took place on the same day as the expo. given the perceived impact of the eglass and its application to stem instruction, marketing the eglass to a stem audience seemed a natural fit. thanks to the ease of assembly, the author staffed a table at the makerfaire with a smaller eglass unit loaned from another campus location. the author demoed the eglass to passersby, including students, faculty, and community members. lastly, the active learning symposium is an annual event hosted by ccts and cetl at the university of idaho. in 50-minute presentations, instructors shared their teaching strategies to promote active learning in their classrooms. the author reached out to one regular eglass user, the computer science department chair, to co-present at the symposium to introduce the eglass and showcase some eglass videos created for a computer science class.

usage

in the 2021–2022 academic year, two faculty members regularly reserved the studio to use the eglass. one faculty member was the chair of the computer science department, and the other was in the animal, veterinary and food sciences department. after attending an orientation to the equipment, setup, and software, the faculty members reserved the space and recorded on their own a few more times without needing support from the author or a staff member. one of the initial goals of replacing the lightboard with the eglass was to free up the library staff time spent supporting faculty recording lectures, and the author believed that the new equipment achieved this goal. about halfway through the fall semester in 2021, the author added a checkbox for patrons to indicate their intended studio usage when making a reservation on the library website.
based on the statistics generated by libcal, in addition to the two faculty members, five students booked the studio to create instructional videos. however, since none of the students reached out to the author directly and the studio was not staffed, it was not possible to confirm whether the students used the eglass or any other pieces of equipment in the studio for video creation. regardless, the overall usage of the eglass was lower than anticipated, and the author believed that several factors contributed. first, the equipment was not properly set up until the end of summer. several faculty who heard of the eglass expressed interest in using it to prepare for fall instruction, but shipping delays prevented the equipment from being delivered and set up in time. moreover, since several other colleges also received the eglass, faculty members who could access a unit at their colleges chose not to visit the library studio location despite its additional equipment and space optimized to improve the user experience. lastly, despite the marketing efforts, the author suspected that the majority of campus was still not aware of the existence of the eglass technology, so additional outreach was probably still needed.

lessons learned

after overseeing the studio with the new eglass equipment for two semesters, the author realized they had underestimated the amount of work needed to promote the eglass—the saying that “if you build it, they will come” does not always ring true. ensuring that the eglass was adopted by more faculty members required a great deal of dedicated effort. identifying several early adopters who saw the value of the technology and were willing to advocate for it by spreading the word to their colleagues was key. even then, the author noticed that the two faculty members who had been using the eglass stopped coming to the studio regularly after several sessions. keeping faculty engaged despite their diminishing interest in using the equipment was an issue that the author did not anticipate or resolve. in the 2022–2023 academic year, the library engaged in an organization-wide reorganization that halted several existing and anticipated work priorities, one of which was conducting studio space and service assessments. in the 2023–2024 academic year, through a collaborative effort with the new department administrator, the author hopes to improve studio and eglass usage by planning promotional initiatives and resuming assessment activities.

the space housing the equipment was another consideration. while it was decided to put the eglass in the studio so that the lightboard could be replaced, the physical unit of the 50-inch eglass took up more space than the original lightboard. occasionally, the author received requests from patrons who wanted to use the studio to record videos using a green screen. while it was still manageable to set up a green screen in the remaining space, the lack of room made patrons’ recording experience feel cramped and awkward. overall, for a 200-square-foot studio that already contained computer desks, audio equipment racks, and portable lights, housing the eglass was less than ideal. moreover, for the studio to be optimized for using the eglass, the lighting, sound, and background required permanent adjustments. for example, after the initial setup, the eglass was facing a white wall in the studio.
ideally, the background needs to be dark to provide contrast with the lighter neon writing on the glass. possible solutions included installing a black backdrop, painting the wall black, or applying black wallpaper. installing a backdrop with curtains was the most expensive and time-consuming option, and painting the wall would require temporarily closing the studio. the author opted to order black wallpaper from amazon.com to minimize the disruption to studio operations during the regular semester. the wallpaper cost less than a hundred dollars and took only an hour to apply, but eventually the adhesive started to wear off. the author decided to remove the wallpaper over the summer and contacted the facilities department to paint the wall black, which required time to remove and restore the equipment in addition to the time needed for the wall to dry.

lighting was another challenge, since the eglass required a light-controlled environment. ideally, all the lights in the room should be turned off for patrons who want to use the eglass so that the writing and drawings on the glass are highly visible. some fluorescent lights in the studio were emergency lights that could not be turned off by flipping the light switches, so the author had to manually disable some of the lights for eglass users. the last space-related challenge was sound. the eglass came with a built-in microphone that did not require a separate microphone setup. however, due to a lack of space, the eglass was placed close to the studio walls, which caused some reverberation and lowered the overall sound quality. the sound could be improved if patrons used a headset with a microphone and connected the headset to the computer dedicated to the eglass. installing acoustic wall panels was another viable option, and the author might consider such an approach if eglass usage grew enough to justify the purchase.

conclusion

the eglass technology at the university of idaho library offered an improved instructional video creation experience to the campus community. thanks to the eglass’s easier setup compared to the lightboard and to the studio space improvements in controlled lighting and the black wall, faculty benefited greatly from having access to a tool that enabled them to create engaging videos for classes delivered in online and hybrid modalities. however, additional dedicated outreach efforts are needed for wider campus adoption. at the university of idaho, seven other colleges on campus owned an eglass alongside the library, and there has not been any coordinated communication to promote the technology across all locations. while marketing emails and newsletters would work well for most new services, it is the author’s opinion that potential users would better understand the applicability of the eglass to their instruction when they are able to see the physical unit in person. more in-person outreach, such as inviting faculty to the studio or attending departmental faculty meetings to show videos made using the eglass, would help.

for other institutions that might be interested in acquiring an eglass or a similar technology, the author would suggest first conducting an environmental scan to determine the campus need. are there faculty on campus who could benefit from this type of technology to achieve their instructional goals?
are there any existing spaces on campus that offer comparable services or resources? if the library administration is interested in acquiring the technology for the library, is there an existing space that would be suitable for the equipment? would the library invest in the room so that the lights could be fully controlled, the sound could be proofed or dampened, and the background could be darkened? would a staff member be assigned as the dedicated person to support and maintain the technology? the author hopes that this case study presents a myriad of ideas for those considering adopting a technology similar to the eglass at their libraries.

endnotes

1 michael peshkin, “lightboard.info,” https://www.lightboard.info/.
2 timothy r. corkish et al., “a how-to guide for making online pre-laboratory lightboard videos,” in advances in online chemistry education, acs symposium series, vol. 1389 (washington, dc: american chemical society, 2021), 77–91, https://doi.org/10.1021/bk-2021-1389.ch006.
3 katrina hay and zachary wiren, “do-it-yourself low-cost desktop lightboard for engaging flipped learning videos,” the physics teacher 57, no. 8 (november 1, 2019): 523–25, https://doi.org/10.1119/1.5131115.
4 erik s. skibinski et al., “a blackboard for the 21st century: an inexpensive light board projection system for classroom use,” journal of chemical education 92, no. 10 (october 13, 2015): 1754–56, https://doi.org/10.1021/acs.jchemed.5b00155.
5 kim ouwehand, tamara van gog, and fred paas, “designing effective video-based modeling examples using gaze and gesture cues,” journal of educational technology & society 18, no. 4 (2015): 78–88.
6 mark lubrick, george zhou, and jingsheng zhang, “is the future bright? the potential of lightboard videos for student achievement and engagement in learning,” eurasia journal of mathematics, science and technology education 15, no. 8 (april 11, 2019): em1735, https://doi.org/10.29333/ejmste/108437.
7 suma bhat, phakpoom chinprutthiwong, and michelle perry, “seeing the instructor in two video styles: preferences and patterns” (paper, international conference on educational data mining, madrid, spain, june 26–29, 2015), https://eric.ed.gov/?id=ed560520.
8 sheryne southard and karen young, “an exploration of online students’ impressions of contextualization, segmentation, and incorporation of light board lectures in multimedia instructional content,” the journal of public and professional sociology 10, no. 1 (january 5, 2018), https://digitalcommons.kennesaw.edu/jpps/vol10/iss1/7.
9 stephanie s. schweiker and stephan m. levonis, “a quick guide to producing a virtual chemistry course for online education,” future medicinal chemistry 12, no. 14 (july 1, 2020): 1289–91, https://doi.org/10.4155/fmc-2020-0103.
10 shawn firouzian, chris rasmussen, and matthew anderson, “adaptations of learning glass solutions in undergraduate stem education,” in proceedings of the 19th annual conference on research in undergraduate mathematics education (pittsburgh, pennsylvania: special interest group of the mathematical association of america on research in undergraduate mathematics education, 2016), 751–60, http://sigmaa.maa.org/rume/rume19v3.pdf.
11 peter d. rogers and diana t. botnaru, “shedding light on student learning through the use of lightboard videos,” international journal for the scholarship of teaching and learning 13, no. 3 (2019), https://eric.ed.gov/?id=ej1235871.
12 kenneth r. hite et al., “effects of lightboard usage on circuit problem skills,” in 2017 ieee frontiers in education conference (fie) proceedings (ieee, 2017), 1–4, https://doi.org/10.1109/fie.2017.8190529.
13 weibing ye, “lightboard and chinese language instruction,” journal of technology and chinese language teaching 7, no. 2 (december 31, 2016): 97–112.
14 ronny c. choe et al., “student satisfaction and learning outcomes in asynchronous online lecture videos,” cbe—life sciences education 18, no. 4 (december 2019): ar55, https://doi.org/10.1187/cbe.18-08-0171.
15 julia vandermolen, kristen vu, and justin melick, “use of lightboard video technology to address medical dosimetry concepts: field notes,” current issues in emerging elearning 4, no. 1 (june 13, 2018), https://scholarworks.umb.edu/ciee/vol4/iss1/6.
16 christoph dominik zimmermann et al., “utilizing the power of blended learning through varied presentation styles of lightboard videos,” in technology-enabled blended learning experiences for chemistry education and outreach, ed. fun man fung and christoph zimmermann (elsevier, 2021), 31–40, https://doi.org/10.1016/b978-0-12-822879-1.00003-2.

communications
afghanistan digital library initiative: revitalizing an integrated library system
yan han and atifa rawan

this paper describes an afghanistan digital library initiative of building an integrated library system (ils) for afghanistan universities and colleges based on open-source software. as one of the goals of the afghan equality digital libraries alliance, the authors applied a systems analysis approach, evaluated different open-source ilss, and customized the selected software to accommodate users’ needs. improvements include arabic and persian language support, user interface changes, call number label printing, and isbn-13 support. to our knowledge, this ils is the first at a large academic library running on open-source software.

the last quarter-century has been devastating for afghanistan, with an uninterrupted period of invasions, civil wars, and oppressive regimes. “since 1979, the education system was virtually destroyed on all levels.
schools and colleges were closed, looted, or physically reduced; student bodies and faculties were emptied by war, migration, and economic hardship; and libraries were gutted.”1 kabul university (ku), for example, was largely demolished by 1994 and completely closed down in 1998. it is universally recognized that afghanistan desperately needs trained faculty, teachers, librarians, and staff. the current state of the higher education system is one of dramatic destruction and deterioration. based on rawan’s assessments of the ku library, most of its collections were damaged or destroyed. she found that there were approximately 60,000 to 70,000 books in english, 2,000 to 3,000 books in persian, and 2,000 theses in persian. none of these collections have manual or online catalog records. the library has eighteen staff members, but not all are fully trained in library activities.2 rebuilding the educational infrastructure in afghanistan is essential.

afghan equality digital libraries alliance

the university of arizona (ua) library has been involved in rebuilding academic libraries in afghanistan since april 2002. in 2005, we were invited to be part of the digital libraries alliance (dla) as part of the afghan equality alliances: 21st century universities for afghanistan initiative funded by usaid and washington state university. dla’s goal is to build the capacity of afghan libraries and librarians to work with open-source digital library platforms, and to provide and enhance access to scholarly information resources and open content that all afghanistan universities can share.

revitalizing the afghan ils

an integrated library system (ils) usually includes several critical components, such as acquisitions, cataloging, catalog (search and find), circulation, and patron management. traditionally it has been the center of any library. recent developments in digital libraries have resulted in distributed systems in libraries, and the ils is treated as one of many digital library systems. it is still critical to have a centralized ils to provide a primary way to access library-owned materials for afghanistan universities and colleges. other services, such as interlibrary loan and other digital library systems, can be further developed to extend libraries’ services to users and communities.

the ua library is working collaboratively with other dla members, including universities around the world and universities in afghanistan. one of the goals is to develop a digital library environment, including a centralized ils for four academic universities in kabul (kabul university, polytechnic university, kabul medical university, and kabul education university). in the future, the ils will include other regional institutions throughout afghanistan. the ils will support 30,000 students and 2,000 faculty in afghan universities and colleges.

overview of the ils market

currently the ils market is primarily dominated by commercial systems, such as innovative interfaces, endeavor, and sirsi. compared with other computing areas, open-source ilss are immature and limited, as there are only a few products available, and most of them do not have the full features of an ils. however, they are providing a valuable alternative to those costly commercial systems.
based on the availability of existing funding, experiences with commercial vendors, and consideration of vendor support and future directions, the authors decided to build a digital library infrastructure with the open concept (open access, open source, and open standards). the decision was widely influenced by globalization, open access, open source, open standards, and increasing user expectations. at the same time, the decision gives us an opportunity to develop and integrate new tools and services for libraries, as suggested by the university of california.3

yan han (hany@u.library.arizona.edu) is systems librarian and atifa rawan (rawana@u.library.arizona.edu) is librarian at the university of arizona libraries, tucson.

koha is probably the most renowned open-source ils. it is a full-featured ils, developed in new zealand and first deployed at horowhenua library trust in 2000. so far koha has been running in a few public and special libraries. the underlying architecture is the linux, apache, mysql, and perl (lamp) stack. building on a similar lamp (linux, apache, mysql, and php) architecture, openbiblio has a relatively short history, releasing its first beta 0.1.0 version in 2002; it is currently in beta version 0.5.1. weblis is an open-source ils based on unesco’s cds/isis database, developed by the institute for computer and information engineering in poland. the software has some ils features, including cataloging, catalog (search and find), loan, and report modules. weblis must run on windows and windows-based web servers, such as xitami/microsoft iis, and the isis database. gnuteca, another open-source ils widely deployed in south american universities, was developed in brazil. as with weblis, it has some ils features, such as cataloging, catalog, and loan; however, the software interface is written in portuguese, which presents a language barrier for u.s. and afghanistan users. the paper “open source integrated library systems” provides a good overview of other systems.4

systems analysis

the authors adopted a systems analysis approach, taking into account afghan collections, users’ needs, and the systems functionality required to perform essential library operations. koha was chosen as the base software due to its functionality, maturity, and support. some of the reasons are:

■ the software architecture is open-source lamp, which is popular, stable, and predominant.
■ our staff have skills in these open software systems.
■ it is a full-featured open-source ils. certain components, such as multiple branch support and user management, are critical.
■ two large public libraries serving populations of 30,000 users in new zealand and the united states have been running their ils on koha for a few years. the software is stable, and most bugs have been fixed.
■ koha has a mailing list that is used by koha developers and users as a communication tool to ask and answer questions.

kabul universities have computer science faculty and students who have the capacity to participate in the development. due to working schedules and locations, we prefer to develop and maintain the system at the ua library.
the technical project team consists of three people: yan han, who is responsible for managing the overall implementation and development of the open-source ils; one part-time (twenty hours per week) student developer whose major task is to develop and manage source code; and a temporary student (ten hours per week for two months) responsible for translating english to farsi and dari. testing tasks, such as unit testing and system testing, are shared by all members of the team.

major challenges

farsi and dari language support

koha version 2.2 cannot correctly handle records in non-western scripts, including farsi and dari records. supporting persian, farsi, and dari records is a very important requirement, as these afghan universities have quite a few persian and dari materials. koha generates a web-based graphical user interface (gui) through perl included templates that use an html meta tag with a western character set (iso-8859-1) to encode characters. browsers such as internet explorer and firefox use the meta tag to decode characters with a predefined character set. therefore, other characters, such as arabic and persian as well as chinese, would not be displayed correctly. the perl templates were identified and modified to allow characters to be encoded in unicode, and this solved the problem. persian and dari characters can be entered into the cataloging module and displayed correctly in the gui. however, we should understand the limitations of this approach when dealing with east asian character sets, such as chinese characters: only frequently used characters can be represented. a project of academia sinica is one of the efforts to deal with 65,000 unique chinese characters.5

farsi/dari gui

as the project is designed for local afghanistan users, there is a need for a farsi and dari gui. the current version of koha does not have such an interface, and we decided to create a new farsi/dari gui for the opac. the koha system’s internal structure is logically arranged; therefore, our development work in translation is not difficult to manage. the translation student translates english words in perl template files into farsi and dari. at the same time he works with the developer to make sure everything is displayed correctly in the opac. figure 1 is a screenshot of the gui.

other improvements

we further developed a spine label printing module and integrated it into the ils, as no such function was provided. the module allows library staff to print one or more standardized labels (1.5 inches high by 1 inch wide) with oclc formats on gaylord lsl 01 paper, which has fifty-six labels per sheet. staff can select an appropriate label slot to start from and print their choice of labels through the web preview feature. this feature eases library staff operations and saves on label paper. isbn-13 replaced isbn-10 after january 1, 2007, and any ils has to be able to handle the new isbn-13. our ils has been improved to handle both isbn standards. thanks to koha’s separation of the gui from its major functionality, interfaces such as fonts and web pages can be modified through the templates and css. a z39.50 service has been configured to allow users to search other libraries’ catalogs.
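to illustrate the character-encoding fix described above, here is a minimal, hypothetical sketch in perl (not koha’s actual template code; the real koha 2.2 templates and variable names differ) of a cgi script whose output uses utf-8 rather than the western iso-8859-1 charset, so that farsi and dari text renders correctly in the browser:

```perl
#!/usr/bin/perl
# minimal sketch: emit a page as utf-8 instead of iso-8859-1 so that
# farsi/dari (and other non-latin) characters display correctly.
use strict;
use warnings;
use utf8;                            # this source file contains utf-8 literals
binmode STDOUT, ':encoding(UTF-8)';  # encode everything we print as utf-8

print "Content-Type: text/html; charset=UTF-8\r\n\r\n";
print <<'HTML';
<html><head>
<!-- before: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head><body>
HTML
print "کتابخانه\n";   # "library" in persian, now rendered correctly
print "</body></html>\n";
```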
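similarly, the isbn-13 support mentioned above reduces to a small check-digit calculation. the following perl sketch shows one common approach (again an illustration, not the code actually added to the ils): an isbn-10 is converted by dropping its old check digit, prefixing 978, and recomputing the check digit with alternating weights of 1 and 3:

```perl
# sketch: convert an isbn-10 to its isbn-13 equivalent
sub isbn10_to_isbn13 {
    my ($isbn10) = @_;
    $isbn10 =~ s/[-\s]//g;                    # strip hyphens and spaces
    my $core = '978' . substr($isbn10, 0, 9); # drop old check digit, add prefix
    my $sum  = 0;
    my @digits = split //, $core;
    for my $i (0 .. $#digits) {               # weights alternate 1, 3, 1, 3, ...
        $sum += $digits[$i] * ($i % 2 ? 3 : 1);
    }
    my $check = (10 - $sum % 10) % 10;        # final check digit
    return $core . $check;
}

print isbn10_to_isbn13('0-306-40615-2'), "\n";  # prints 9780306406157
```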
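the z39.50 service mentioned above can likewise be exercised from perl. as a hedged sketch, assuming the ZOOM-Perl module (the Net::Z3950::ZOOM distribution, which koha-era perl installations commonly used for z39.50 work) and using the library of congress’s public z39.50 target as an example, a search might look like this:

```perl
# sketch: querying a remote z39.50 target with the ZOOM-Perl api
use strict;
use warnings;
use ZOOM;  # provided by the Net::Z3950::ZOOM distribution

# connect to the library of congress z39.50 server (a common public target)
my $conn = ZOOM::Connection->new('z3950.loc.gov:7090/VOYAGER');
$conn->option(preferredRecordSyntax => 'usmarc');

# search by title keyword using a prefix (pqf) query
my $results = $conn->search_pq('@attr 1=4 "digital libraries"');
printf "found %d records\n", $results->size();

# render the first record, if any
print $results->record(0)->render() if $results->size() > 0;
$conn->destroy();
```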
hardware and software support

afghanistan is still developing its fundamental infrastructure: electricity, transportation, and communication. when considering buying hardware for the ils, difficult issues, such as server services and computer parts, have to be solved. even international it companies, such as dell, hp, and ibm, have very limited services and support in afghanistan. regarding software and system support, our strategies are to:

■ maintain and develop the open-source software at the ua library through the project team;
■ run one server in kabul, afghanistan, administered by a local system administrator; and
■ run one server at the ua library, administered by the library’s system administrator.

cost

we estimated that our overall cost for building the open-source system is low and reasonable. the system is currently running on a dell 2800 server ($5,000 for a 3ghz cpu, 4gb ram, and five 73gb hard drives), debian linux with a custom-built kernel (free), apache 2 (free), mysql (free), and perl (free). han spends four hours per week on coordination, communication, and management of the project. the student developer works twenty hours per week on development and maintenance, while the translation student will spend one hundred hours on translation.

conclusion

revitalizing an afghan ils is the first important goal in building digital library initiatives for the afghanistan higher education system. by understanding afghan university libraries, collections, and users, the ua library is working with other dla members to build the open-source ils. the new farsi and dari user interface, language support, and other improvements have been made to meet the needs of afghan universities and colleges. the cost of using and developing existing open-source software is reasonable.

acknowledgments

we thank usaid, washington state university, and other dla members for providing support. this work was supported by usaid and washington state university.

references and notes

1. nazif sharani et al., conference transcription, conference on strategic planning of higher education for afghanistan, indiana university, bloomington, oct. 6–7, 2002.
2. atifa rawan, transformation in afghanistan: rebuilding libraries, paper presented at the azla conference, mesa, ariz., oct. 11–13, 2005.
3. the university of california libraries, rethinking how we provide bibliographic services for the university of california, 2005, http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf.
4. eric anctil and jamshid beheshti, open source integrated library systems: an overview, 2004, www.anctil.org/users/eric/oss4ils.html (accessed nov. 5, 2006).
5. derming juang et al., “resolving the unencoded character problem for chinese digital libraries,” proceedings of the 5th acm/ieee-cs joint conference on digital libraries, jcdl 2005, denver (june 7–11, 2005): 311–19 (new york: acm pr., 2005).

figure 1: afghanistan academic libraries union catalog in farsi/dari

reducing psychological resistance to digital repositories

and mit mandates, and other mandates such as the one instituted at stanford’s school of education, have come to pass, and the registry of open access repository material archiving policies (roarmap) lists more than 120 mandates around the world that now exist.3 while it is too early to tell whether these developments will be successful in getting faculty to deposit their work in digital repositories, they at least establish a precedent that other institutions may follow. how many institutions follow and how effective the mandates will be once enacted remains to be seen.
will all colleges and universities, or even a majority, adopt mandates that require faculty to deposit their work in repositories? what of those that do not? even if most institutions are successful in instituting mandates, will they be sufficient to obtain faculty cooperation? for those institutions that do not adopt mandates, how are they going to persuade faculty to participate in self-archiving, or even in some variation—such as having surrogates (librarians, staff, or graduate assistants) archive the work of faculty? are mandates the only way to ensure faculty cooperation and compliance, or are mandates even necessarily the best way? to begin to adequately address the problem of user resistance to digital repositories, it might help to first gain some insight into the psychology of resistance. the existing literature on user behavior with regard to digital repositories devotes scant attention to the psychology of resistance. in an article entitled “institutional repositories: partnering with faculty to enhance scholarly communication,” johnson discusses the inertia of the traditional publishing paradigm. he notes that this inertia is most evident in academic faculty. this would suggest that the problem of eliciting user cooperation is primarily motivational and that the problem is more one of indifference than active resistance.4 heterick, in his article “faculty attitudes toward electronic resources,” suggests that one reason faculty may be resistant to digital repositories is that they do not fully trust them. in response to a survey he conducted, 48 percent of faculty felt that libraries should maintain paper archives.5 the implication is that digital repositories and archives may never completely replace hard copies in the minds of scholars. in “understanding faculty to improve content recruitment for institutional repositories,” foster and gibbons point out that faculty complain of having too much work already. they resent any additional work that contributing to a digital repository might entail. thus the authors echo johnson in suggesting that faculty resistance

the potential value of digital repositories is dependent on the cooperation of scholars to deposit their work. although many researchers have been resistant to submitting their work, the literature on digital repositories contains very little research on the psychology of resistance. this article looks at the psychological literature on resistance and explores what its implications might be for reducing the resistance of scholars to submitting their work to digital repositories. psychologists have devised many potentially useful strategies for reducing resistance that might be used to address the problem; this article examines these strategies and how they might be applied.

observing the development and growth of digital repositories in recent years has been a bit like riding an emotional roller coaster. even the definition of what constitutes a repository may not be the subject of complete agreement, but for the purposes of this study, a repository is defined as an online database of digital or digitized scholarly works constructed for the purpose of preserving and disseminating scholarly research.
the initial enthusiasm expressed by librarians and advocates of open access toward the potential of repositories to make significant amounts of scholarly research available to anyone with internet access gradually gave way to a more somber appraisal of the prospects of getting faculty and researchers to deposit their work. in august 2007, bailey posted an entry to his digital koans blog titled “institutional repositories: doa?” in which he noted that building digital repository collections would be a long, arduous, and costly process.1 the success of repositories, in his view, will be a function not so much of technical considerations as of attitudinal ones. faculty remain unconvinced that repositories are important, and there is a critical need for outreach programs that point to repositories as an important step in solving the crisis in scholarly communication. salo elaborated on bailey’s post with “yes, irs are broken. let’s talk about it,” on her own blog, caveat lector. salo points out that institutional repositories have not fulfilled their early promise of attracting a large number of faculty who are willing to submit their work. she criticizes repositories for monopolizing the time of library faculty and staff, and she states her belief that repositories will not work without deposit mandates, but that mandates are impractical.2 subsequent events in the world of scholarly communication might suggest that mandates may be less impractical than salo originally thought. since her post, the national institutes of health mandate, the harvard

brian quinn (brian.quinn@ttu.edu) is social sciences librarian, texas tech university libraries, lubbock.

whether or not this was actually the case.11 this study also suggests that a combination of both cognitive and affective processes feed faculty resistance to digital repositories. it can be seen from the preceding review of the literature that several factors have been identified as being possible sources of user resistance to digital repositories. yet the authors offer little in the way of strategies for addressing this resistance other than to suggest workaround solutions such as having nonscholars (e.g., librarians, graduate students, or clerical staff) serve as proxy for faculty and deposit their work for them, or to suggest that institutions mandate that faculty deposit their work. similarly, although numerous arguments have been made in favor of digital repositories and open access, they do not directly address the resistance issue.12 in contrast, psychologists have studied user resistance extensively and accumulated a body of research that may suggest ways to reduce resistance rather than try to circumvent it. it may be helpful to examine some of these studies to see what insights they might offer to help address the problem of user resistance. it should be pointed out that resistance as a topic has been addressed in the business and organizational literature, but has generally been approached from the standpoint of management and organizational change.13 this study has chosen to focus primarily on the psychology of resistance because many repositories are situated in a university setting.
unlike employees of a corporation, faculty members typically have a greater degree of autonomy and latitude in deciding whether to incorporate new work processes and procedures into their existing routines, and the locus of change will therefore be more at an individual level.

■■ the psychology of user resistance

psychologists define resistance as a preexisting state or attitude in which the user is motivated to counter any attempts at persuasion. this motivation may occur on a cognitive, affective, or behavioral level. psychologists thus distinguish between a state of not being persuaded and one in which there is actual motivation to not comply. the source of the motivation is usually an affective state, such as anxiety or ambivalence, which itself may result from cognitive problems, such as misunderstanding, ignorance, or confusion.14 it is interesting to note that psychologists have long viewed inertia as one form of resistance, suggesting paradoxically that a person can be motivated to inaction.15 resistance may also manifest itself in more subtle forms that shade into indifference, suspicion of new work processes or technologies, and contentment with the status quo.

may be attributed at least in part to motivation.6 in another article published a few months later, foster and gibbons suggest that the main reason faculty have been slow to deposit their work in digital repositories is a cognitive one: faculty have not understood how they would benefit by doing so. the authors also mention that users may feel anxiety when executing the sequence of technical steps needed to deposit their work, and that they may also worry about possible copyright infringement.7 the psychology of resistance may thus manifest itself in both cognitive and affective ways. harley and her colleagues talk about faculty not perceiving any reward for depositing their work in their article “the influence of academic values on scholarly publication and communication practices.” this perception results in reduced drive to participate. anxiety is another factor contributing to resistance: faculty fear that their work may be vulnerable to plagiarism in an open-access environment.8 in “towards user responsive institutional repositories: a case study,” devakos suggests that one source of user resistance is cognitive in origin. scholars do not submit their work frequently enough to be able to navigate the interface from memory, so they must reinitiate the learning process each time they submit their work. the same is true for entering metadata for their work.9 their sense of control may also be threatened by any limitations that may be imposed on substituting later iterations of their work for earlier versions. davis and connolly point to several sources of confusion, uncertainty, and anxiety among faculty in their article “institutional repositories: evaluating the reasons for non-use of cornell university’s installation of dspace.” cognitive problems arise from having to learn new technology to deposit work and not knowing copyright details well enough to know whether publishers would permit the deposit of research prior to publication. faculty wonder whether this might jeopardize their chances of acceptance by important journals whose editors might view deposit as a form of prior publication that would disqualify them from consideration.
there is also fear that the complex structure of a large repository may actually make a scholar’s work more difficult to find; faculty may not understand that repositories are not isolated institutional entities but are usually searchable by major search engines like google.10 kim also identifies anxiety about plagiarism and confusion about copyright as sources of faculty resistance in the article “motivating and impeding factors affecting faculty contribution to institutional repositories.” kim found that plagiarism anxiety made some faculty willing to deposit only already-published work and that prepublication material was considered too risky. faculty with no self-archiving experience also felt that many publishers do not allow self-archiving,

more open to information that challenges their beliefs and attitudes and are more open to suggestion.18 thus before beginning a discussion of why users should deposit their research in repositories, it might help to first affirm the users’ self-concept. this could be done, for example, by reminding them of how unbiased they are in their work, or how important it is in their work to be open to new ideas and new approaches, or how successful they have been in their work as scholars. the affirmation should be subtle and not directly related to the repository situation, but it should remind them that they are open-minded individuals who are not bound by tradition and that part of their success is attributable to their flexibility and adaptability. once the users have been affirmed, librarians can then lead into a discussion of the importance of submitting scholarly research to repositories. self-generated affirmations may be even more effective. for example, another way to affirm the self would be to ask users to recall instances in which they successfully took a new approach or otherwise broke new ground or were innovative in some way. this could serve as a segue into a discussion of the repository as one more opportunity to be innovative. once the self-concept has been boosted, the threatening quality of the message will be perceived as less disturbing and will be more likely to receive consideration. a related strategy that psychologists employ to reduce resistance involves casting the user in the role of “expert.” this is especially easy to do with scholars because they are experts in their fields. casting the user in the role of expert can deactivate resistance by putting that person in the persuasive role, which creates a form of role reversal.19 rather than the librarian being seen as the persuader, the scholar is placed in that role. by saying to the scholar, “you are the expert in the area of communicating your research to an audience, so you would know better why the digital repository is an alternative that deserves consideration once you understand how it works and how it may benefit you,” you are empowering the user. casting the user as an expert imparts a sense of control to the user. it helps to disable resistance by placing the user in a position of being predisposed to agree to the role he or she is being cast in, which also makes the user more prone to agree with the idea of using a digital repository.

priming and imaging

one important discovery that psychologists have made that has some bearing on user resistance is that even subtle manipulations can have a significant effect on one’s judgments and actions.
in an interesting experiment, psychologists told a group of students that they were to read an online newspaper, ostensibly to evaluate its design and assess how easy it was to read. half of them read an editorial discussing a public opinion survey of youth

■■ negative and positive strategies for reducing resistance

just as the definition of resistance can be paradoxical, so too may be some of the strategies that psychologists use to address it. perhaps the most basic example is to counter resistance by acknowledging it. when scholars are presented with a message that overtly states that digital repositories are beneficial and desirable, it may simultaneously generate a covert reaction in the form of resistance. rather than simply anticipating this and attempting to ignore it, digital repository advocates might be more persuasive if they acknowledge to scholars that there will likely be resistance, mention some possible reasons (e.g., plagiarism or copyright concerns), and immediately introduce some counterrationales to address those reasons.16 psychologists have found that being up front and forthcoming can reduce resistance, particularly with regard to the downside of digital repositories. they have learned that it can be advantageous to preemptively reveal negative information about something so that it can be downplayed or discounted. thus talking about the weaknesses or shortcomings of digital repositories as early as possible in an interaction may have the effect of making these problems seem less important and weakening user resistance. not only does revealing negative information impart a sense of honesty and credibility to the user, but psychologists have found that people feel closer to people who reveal personal information.17 a librarian could thus describe some of his or her own frustrations in using repositories as an effective way of establishing rapport with resistant users. the unexpected approach of bringing up the less desirable aspects of repositories—whether this refers to the technological steps that must be learned to submit one’s work or the fact that depositing one’s work in a repository is not a guarantee that it will be highly cited—can be disarming to the resistant user. this is particularly true of more resistant users who may have been expecting a strong hard-sell approach on the part of librarians. when suddenly faced with a more candid appeal the user may be thrown off balance psychologically, leaving him or her more vulnerable to information that is the opposite of what was anticipated and to possibly viewing that information in a more positive light. if one way to disarm a user is to begin by discussing the negatives, a seemingly opposite approach that psychologists take is to reinforce the user’s sense of self. psychologists believe that one source of resistance arises when a user’s self-concept—which the user tries to protect from any source of undesired change—has been threatened in one way or another. a stable self-concept is necessary for the user to maintain a sense of order and predictability. reinforcing the self-concept of the user should therefore make the user less likely to resist depositing work in a digital repository. self-affirmed users are

or even possibly collaborating on research. their imaginations could be further stimulated by asking them to think of what it would be like to have their work still actively preserved and available to their successors a century from now.
using the imagining strategy could potentially be significantly more effective in attenuating resistance than presenting arguments based on dry facts.

identification and liking

conscious processes like imagining are not the only psychological means of reducing the resistance of users to digital repositories. unconscious processes can also be helpful. one example of such a process is what psychologists refer to as the “liking heuristic.” this refers to the tendency of users to employ a rule-of-thumb method to decide whether to comply with requests from persons. this tendency results from users constantly being inundated with requests. consequently, they need to simplify and streamline the decision-making process that they use to decide whether to cooperate with a request. the liking heuristic holds that users are more likely to help someone they might otherwise not help if they unconsciously identify with the person. at an unconscious level, the user may think that a person acts like them and dresses like them, and therefore the user identifies with that person and likes them enough to comply with their request. in one experiment that psychologists conducted to see if people are more likely to comply with requests from people that they identify with, female undergraduates were informed that they would be participating in a study of first impressions. the subjects were instructed that they and a person in another room would each learn a little about one another without meeting each other. each subject was then given a list of fifty adjectives and was asked to select the twenty that were most characteristic of themselves. the experimenter then told the participants that they would get to see each other’s lists. the experimenter took the subject’s list and then returned a short time later with what supposedly was the other participant’s list, but was actually a list that the experimenter had filled out to indicate that either the subject had much in common with the other participant’s personality (seventeen of twenty matches), some shared attributes (ten of twenty matches), or relatively few characteristics in common (three of twenty matches). the subject was then asked to examine the list and fill out a survey that probed their initial impressions of the other participant, including how much they liked them. at the end of the experiment, the two subjects were brought together and given credit for participating. the experimenter soon left the room and the confederate participant asked the other participant if she would read and critically evaluate an eight-page paper for an english class. the results of the experiment indicated that the more the participant thought she shared in
the presence of elaboration, which is a precursor to the development of attitudes, suggests that librarians could reduce users’ resistance to digital repositories by first involving them in some form of priming activity immediately prior to any attempt to persuade them. for example, asking faculty to read a brief case study of a scholar who has benefited from involvement in open-access activity might serve as an effective prime. another example might be to listen briefly to a speaker summarizing the individual, disciplinary, and societal benefits of sharing one’s research with colleagues. interventions like these should help mitigate any predisposition toward resistance on the part of users. imagining is a strategy related to priming that psychologists have found to be effective in reducing resistance. taking their cue from insurance salesmen—who are trained to get clients to actively imagine what it would be like to lose their home or be in an accident—a group of psychologists conducted an experiment in which they divided a sample of homeowners who were considering the purchase of cable tv into two groups. one group was presented with the benefits of cable in a straightforward, informative way that described various features. the other group was asked to imagine themselves enjoying the benefits and all the possible channels and shows that they might experience and how entertaining it might be. the psychologists then administered a questionnaire. the results indicated that those participants who were asked to imagine the benefits of cable were much more likely to want cable tv and to subscribe to it than were those who were only given information about cable tv.21 in other words, imagining resulted in more positive attitudes and beliefs. this study suggests that librarians attempting to reduce resistance among users of digital repositories may need to do more than merely inform or describe to them the advantages of depositing their work. they may need to ask users to imagine in vivid detail what it would be like to receive periodic reports indicating that their work had been downloaded dozens or even hundreds of times. librarians could ask them to imagine receiving e-mail or calls from colleagues indicating that they had accessed their work in the repository and were interested in learning more about it, reducing psychological resistance to digital repositories | quinn 71 students typically overestimate the amount of drinking that their peers engage in at parties. these inaccurate normative beliefs act as a negative influence, causing them to imbibe more because they believe that is what their peers are doing. by informing students that almost threequarters of their peers have less than three drinks at social gatherings, psychologists have had some success in reducing excessive drinking behavior by students.23 the power of normative messages is illustrated by a recent experiment conducted by a group of psychologists who created a series of five cards to encourage hotel guests to reuse their towels during their stay. the psychologists hypothesized that by appealing to social norms, they could increase compliance rates. to test their hypothesis, the researchers used a different conceptual appeal for each of the five cards. 
one card appealed to environmental concerns (“help save the environment”), another to environmental cooperation (“partner with us to save the environment”), a third card appealed to the advantage to the hotel (“help the hotel save energy”), a fourth card targeted future generations (“help save resources for future generations”), and a final card appealed to guests by making reference to a descriptive norm of the situation (“join your fellow citizens in helping to save the environment”). the results of the study indicated that the card that mentioned the benefit to the hotel was least effective in getting guests to reuse their towels, and the card that was most effective was the one that mentioned that descriptive norm.24 this research suggests that if users who are resistant to submitting their work to digital repositories were informed that a larger percentage of their peers were depositing work than they realized, resistance may be reduced. this might prove to be particularly true if they learned that prominent or influential scholars were engaged in populating repositories with their work. this would create a social-norms effect that would help legitimize repositories to other faculty and help them to perceive the submission process as normal and desirable. the idea that accomplished researchers are submitting materials and reaping the benefits might prove very attractive to less experienced and less well-regarded faculty. psychologists have a considerable body of evidence in the area of social modeling that suggests that people will imitate the behavior of others in social situations because that behavior provides an implicit guideline of what to do in a similar situation. a related finding is that the more influential people are, the more likely it is for others to emulate their actions. this is even more probable for highstatus individuals who are skilled and attractive and who are capable of communicating what needs to be done to potential followers.25 social modeling addresses both the cognitive dimension of how resistant users should behave and also the affective dimension by offering models that serve as a source of motivation to resistant users to change common with the confederate, the more she liked her. the more she liked the confederate and experienced a perception of consensus, the more likely she was to comply with her request to critique the paper.22 thus, when trying to overcome the resistance of users to depositing their work in a digital repository, it might make sense to consider who it is that is making the request. universities sometimes host scholarly communication symposia that are not only aimed at getting faculty interested in open-access issues, but to urge them to submit their work to the institution’s repositories. frequently, speakers at these symposia consist of academic administrators, members of scholarly communication or open-access advocacy organizations, or individuals in the library field. the research conducted by psychologists, however, suggests that appeals to scholars and researchers would be more effective if they were made by other scholars and those who are actively engaged in research. faculty are much more likely to identify with and cooperate with requests from their own tribe, as it were, and efforts need to be concentrated on getting faculty who are involved in and understand the value of repositories to articulate this to their colleagues. 
researchers who can personally testify to the benefits of depositing their work are most likely to be effective at convincing other researchers of the value of doing likewise and will be more effective at reducing resistance. librarians need to recognize who their potentially most effective spokespersons and advocates are, which the psychological research seems to suggest is faculty talking to other faculty. perceived consensus and social modeling the processes of faculty identification with peers and perceived consensus mentioned above can be further enhanced by informing researchers that other scholars are submitting their work, rather than merely telling researchers why they should submit their work. information about the practices of others may help change beliefs because of the need to identify with other in-group members. this is particularly true of faculty, who are prone to making continuous comparisons with their peers at other institutions and who are highly competitive by nature. once they are informed of the career advantages of depositing their work (in terms of professional visibility, collaboration opportunities, etc.), and they are informed that other researchers have these advantages, this then becomes an impetus for them to submit their work to keep up with their peers and stay competitive. a perception of consensus is thus fostered—a feeling that if one’s peers are already depositing their work, this is a practice that one can more easily agree to. psychologists have leveraged the power of identification by using social-norms research to inform people about the reality of what constitutes normative behavior as opposed to people’s perceptions of it. for example, college 72 information technology and libraries | june 2010 highly resistant users that may be unwilling to submit their work to a repository. rather than trying to prepare a strong argument based on reason and logic, psychologists believe that using a narrative approach may be more effective. this means conveying the facts about open access and digital repositories in the form of a story. stories are less rhetorical and tend not to be viewed by listeners as attempts at persuasion. the intent of the communicator and the counterresistant message are not as overt, and the intent of the message might not be obvious until it has already had a chance to influence the listener. a well-crafted narrative may be able to get under the radar of the listener before the listener has a chance to react defensively and revert to a mode of resistance. in a narrative, beliefs are rarely stated overtly but are implied, and implied beliefs are more difficult to refute than overtly stated beliefs. listening to a story and wondering how it will turn out tends to use up much of the cognitive attentional capacity that might otherwise be devoted to counterarguing, which is another reason why using a narrative approach may be particularly effective with users who are strongly resistant. the longer and more subtle nature of narratives may also make them less a target of resistance than more direct arguments.28 using a narrative approach, the case for submitting work to a repository might be presented not as a collection of dry facts or statistics, but rather as a story. 
the protagonists are the researchers, and their struggle is to obtain recognition for their work and to advance scholarship by providing maximum access to the greatest audience of scholars and to obtain as much access as possible to the work of their peers so that they can build on it. the protagonists are thwarted in their attempts to achieve their ends by avaricious publishers who obtain the work of researchers for free and then sell it back to them in the form of journal and database subscriptions and books for exorbitant prices. these prices far exceed the rate of inflation or the budgets of universities to pay for them. the publishers engage in a series of mergers and acquisitions that swallow up small publishing firms and result in the scholarly publishing enterprise being controlled by a few giant firms that offer unreasonable terms to users and make unreasonable demands when negotiating with them. presented in this dramatic way, the significance of scholar participation in digital repositories becomes magnified to an extent that it becomes more difficult to resist what may almost seem like an epic struggle between good and evil. and while this may be a greatly oversimplified example, it nonetheless provides a sense of the potential power of using a narrative approach as a technique to reduce resistance. introducing a time element into the attempt to persuade users to deposit their work in digital repositories can play an important role in reducing resistance. given that faculty are highly competitive, introducing the idea not only that other faculty are submitting their work but that they are already benefiting as a result makes the their behavior in the desired direction. redefinition, consistency, and depersonalization another strategy that psychologists use to reduce resistance among users is to change the definition of the situation. resistant users see the process of submitting their research to the repository as an imposition at best. in their view, the last thing that they need is another obligation or responsibility to burden their already busy lives. psychologists have learned that reframing a situation can reduce resistance by encouraging the user to look at the same phenomenon in a different way. in the current situation, resistant users should be informed that depositing their work in a digital repository is not a burden but a way to raise their professional profile as researchers, to expose their work to a wider audience, and to heighten their visibility among not only their peers but a much larger potential audience that would be able to encounter their work on the web. seen in this way, the additional work of submission is less of a distraction and more of a career investment. moreover, this approach leverages a related psychological concept that can be useful in helping to dissolve resistance. psychologists understand that inconsistency has a negative effect on self-esteem, so persuading users to believe that submitting their work to a digital repository is consistent with their past behavior can be motivating.26 the point needs to be emphasized with researchers that the act of submitting their work to a digital repository is not something strange and radical, but is consistent with prior actions intended to publicize and promote their work. a digital repository can be seen as analogous to a preprint, book, journal, or other tangible and familiar vehicles that faculty have used countless times to send their work out into the world. 
while the medium might have changed, the intention and the goal are the same. reframing the act of depositing as “old wine in new bottles” may help to undermine resistance. in approaching highly resistant individuals, psychologists have discovered that it is essential to depersonalize any appeal to change their behavior. instead of saying, “you should reduce your caloric intake,” it is better to say, “it is important for people to reduce their caloric intake.” this helps to deflect and reduce the directive, judgmental, and prescriptive quality of the request, thus making it less likely to provoke resistance.27 suggestion can be much less threatening than prescription among users who may be suspicious and mistrusting. reverting to a third-person level of appeal may allow the message to get through without it being immediately rejected by the user. narrative, timing, and anticipation psychologists recommend another strategy to help defuse reducing psychological resistance to digital repositories | quinn 73 technological platforms, and so on. this could be followed by a reminder to users that it is their choice—it is entirely up to them. this reminder that users have the freedom of choice may help to further counter any resistance generated as a result of instructions or inducements to anticipate regret. indeed, psychologists have found that reinstating a choice that was previously threatened can result in greater compliance than if the threat had never been introduced.32 offering users the freedom to choose between alternatives tends to make them more likely to comply. this is because having a choice enables users to both accept and resist the request rather than simply focus all their resistance on a single alternative. when presented with options, the user is able to satisfy the urge to resist by rejecting one option but is simultaneously motivated to accept another option; the user is aware that there are benefits to complying and wants to take advantage of them but also wants to save face and not give in. by being offered several alternatives that nonetheless all commit to a similar outcome, the user is able to resist and accept at the same time.33 for example, one alternative option to self-archiving might be to present the faculty member with the option of an authorpays publishing model. the choice of alternatives allows the faculty member to be selective and discerning so that a sense of satisfaction is derived from the ability to resist by rejecting one alternative. at the same time, the librarian is able to gain compliance because one of the other alternatives that commits the faculty member to depositing research is accepted. options, comparisons, increments, and guarantees in addition to offering options, another way to erode user resistance to digital repositories is to use a comparative strategy. one technique is to first make a large request, such as “we would like you to submit all the articles that you have published in the last decade to the repository,” and then follow this with a more modest request, such as “we would appreciate it if you would please deposit all the articles you have published in the last year.” the original request becomes an “anchor” or point of reference in the mind of the user against which the subsequent request is then evaluated. setting a high anchor lessens user resistance by changing the user’s point of comparison of the second request from nothing (not depositing any work in the repository) to a higher value (submitting a decade of work). 
in this way, a high reference anchor is established for the second request, which makes it seem more reasonable in the newly created context of the higher value.34 the user is thus more likely to comply with the second request when it is framed in this way. using this comparative approach may also work because it creates a feeling of reciprocity in the user. when proposition much more salient. it not only suggests that submitting work is a process that results in a desirable outcome, but that the earlier one’s work is submitted, the more recognition will accrue and the more rapidly one’s career will advance.29 faculty may feel compelled to submit their work in an effort to remain competitive with their colleagues. one resource that may be particularly helpful for working with skeptical faculty who want substantiation about the effect of self-archiving on scholarly impact is a bibliography created by the open citation project titled, “the effect of open access and downloads (hits) on citation impact: a bibliography of studies.”30 it provides substantial documentation of the effect that open access has on scholarly visibility. an additional stimulus might be introduced in conjunction with the time element in the form of a download report. showing faculty how downloads accumulate over time is analogous to arguments that investment counselors use showing how interest on investments accrues and compounds over time. this investment analogy creates a condition in which hesitating to submit their work results in faculty potentially losing recognition and compromising their career advancement. an interesting related finding by psychologists suggests that an effective way to reduce user resistance is to have users think about the future consequences of complying or not complying. in particular, if users are asked to anticipate the amount of future regret they might experience for making a poor choice, this can significantly reduce the amount of resistance to complying with a request. normally, users tend not to ruminate about the possibility of future disappointment in making a decision. if users are made to anticipate future regret, however, they will act in the present to try to minimize it. studies conducted by psychologists show that when users are asked to anticipate the amount of future regret that they might experience for choosing to comply with a request and having it turn out adversely versus choosing to not comply and having it turn out adversely, they consistently indicate that they would feel more regret if they did not comply and experienced negative consequences as a result.31 in an effort to minimize this anticipated regret, they will then be more prone to comply. based on this research, one strategy to reduce user resistance to digital repositories would be to get users to think about the future, specifically about future regret resulting from not cooperating with the request to submit their work. if they feel that they might experience more regret in not cooperating than in cooperating, they might then be more inclined to cooperate. getting users to think about the future could be done by asking users to imagine various scenarios involving the negative outcomes of not complying, such as lost opportunities for recognition, a lack of citation by peers, lost invitations to collaborate, an inability to migrate one’s work to future 74 information technology and libraries | june 2010 submit their work. 
mandates rely on authority rather than persuasion to accomplish this and, as such, may represent a less-than-optimal solution to reducing user resistance. mandates represent a failure to arrive at a meeting of the minds of advocates of open access, such as librarians, and the rest of the intellectual community. understanding the psychology of resistance is an important prerequisite to any effort to reduce it. psychologists have assembled a significant body of research on resistance and how to address it. some of the strategies that the research suggests may be effective, such as discussing resistance itself with users and talking about the negative effects of repositories, may seem counterintuitive and have probably not been widely used by librarians. yet when other more conventional techniques have been tried with little or no success, it may make sense to experiment with some of these approaches. particularly in the academy, where reason is supposed to prevail over authority, incorporating resistance psychology into a program aimed at soliciting faculty research seems an appropriate step before resorting to mandates. most strategies that librarians have used in trying to persuade faculty to submit their work have been conventional. they are primarily of a cognitive nature and are variations on informing and educating faculty about how repositories work and why they are important. researchers have an important affective dimension that needs to be addressed by these appeals, and the psychological research on resistance suggests that a strictly rational approach may not be sufficient. by incorporating some of the seemingly paradoxical and counterintuitive techniques discussed earlier, librarians may be able to penetrate the resistance of researchers and reach them at a deeper, less rational level. ideally, a mixture of rational and less-conventional approaches might be combined to maximize effectiveness. such a program may not eliminate resistance but could go a long way toward reducing it. future studies that test the effectiveness of such programs will hopefully be conducted to provide us with a better sense of how they work in real-world settings. references 1. charles w. bailey jr., “institutional repositories: doa?,” online posting, digital koans, aug. 22, 2007, http://digital -scholarship.org/digitalkoans/2007/08/21/institutional -repositories-doa/ (accessed apr. 21, 2010). 2. dorothea salo, “yes, irs are broken. let’s talk about it,” online posting, caveat lector, sept. 5, 2007, http://cavlec. yarinareth.net/2007/09/05/yes-irs-are-broken-lets-talk-about -it/ (accessed apr. 21, 2010). 3. eprints services, roarmap (registry of open access repository material archiving policies) http://www.eprints .org/openaccess/policysignup/ (accessed july 28, 2009). 4. richard k. johnson, “institutional repositories: partnering the requester scales down the request from the large one to a smaller one, it creates a sense of obligation on the part of the user to also make a concession by agreeing to the more modest request. the cultural expectation of reciprocity places the user in a situation in which they will comply with the lesser request to avoid feelings of guilt.35 for the most resistant users, breaking the request down into the smallest possible increment may prove helpful. by making the request seem more manageable, the user is encouraged to comply. psychologists conducted an experiment to test whether minimizing a request would result in greater cooperation. 
they went door-to-door, soliciting contributions to the american cancer society, and received donations from 29 percent of households. they then made additional solicitations, this time asking, “would you contribute? even a penny will help!” using this approach, donations increased to 50 percent. even though the solicitors only asked for a penny, the amounts of the donations were equal to that of the original request. by asking for “even a penny,” the solicitors made the request appear to be more modest and less of a target of resistance.36 librarians might approach faculty by saying “if you could even submit one paper we would be grateful,” with the idea that once faculty make an initial submission they will be more inclined to submit more papers in the future. one final strategy that psychological research suggests may be effective in reducing resistance to digital repositories is to make sure that users understand that the decision to deposit their work is not irrevocable. with any new product, users have fears about what might happen if they try it and they are not satisfied with it. not knowing the consequences of making a decision that they may later regret fuels reluctance to become involved with it. faculty need to be reassured that they can opt out of participating at any time and that the repository sponsors will guarantee this. this guarantee needs to be repeated and emphasized as much as possible in the solicitation process so that faculty are frequently reminded that they are entering into a decision that they can reverse if they so decide. having this reassurance should make researchers much less resistant to submitting their work, and the few faculty who may decide that they want to opt out are worth the reduction in resistance.37 the digital repository is a new phenomenon that faculty are unfamiliar with, and it is therefore important to create an atmosphere of trust. the guarantee will help win that trust. ■■ conclusion the scholarly literature on digital repositories has given little attention to the psychology of resistance. yet the ultimate success of digital repositories depends on overcoming the resistance of scholars and researchers to reducing psychological resistance to digital repositories | quinn 75 20. curtis p. haugtvedt et al., “consumer psychology and attitude change,” in knowles and linn, resistance and persuasion, 283–96. 21. larry w. gregory, robert b. cialdini, and kathleen m. carpenter, “self-relevant scenarios as mediators of likelihood estimates and compliance: does imagining make it so?” journal of personality & social psychology 43, no. 1 (1982): 89–99. 22. jerry m. burger, “fleeting attraction and compliance with requests,” in the science of social influence: advances and future progress, ed. anthony r. pratkanis (new york: psychology pr., 2007): 155–66. 23. john d. clapp and anita lyn mcdonald, “the relationship of perceptions of alcohol promotion and peer drinking norms to alcohol problems reported by college students,” journal of college student development 41, no. 1 (2000): 19–26. 24. noah j. goldstein and robert b. cialdini, “using social norms as a lever of social influence,” in the science of social influence: advances and future progress, ed. anthony r. pratkanis (new york: psychology pr., 2007): 167–90. 25. dale h. schunk, “social-self interaction and achievement behavior,” educational psychologist 34, no. 4 (1999): 219–27. 26. rosanna e. 
guadagno et al., “when saying yes leads to saying no: preference for consistency and the reverse foot-inthe-door effect,” personality & social psychology bulletin 27, no. 7 (2001): 859–67. 27. mary jiang bresnahan et al., “personal and cultural differences in responding to criticism in three countries,” asian journal of social psychology 5, no. 2 (2002): 93–105. 28. melanie c. green and timothy c. brock, “in the mind’s eye: transportation-imagery model of narrative persuasion,” in narrative impact: social and cultural foundations, ed. melanie c. green, jeffrey j. strange, and timothy c. brock (mahwah, n.j.: lawrence erlbaum, 2004): 315–41. 29. oswald huber, “time pressure in risky decision making: effect on risk defusing,” psychology science 49, no. 4 (2007): 415–26. 30. the open citation project, “the effect of open access and downloads (‘hits’) on citation impact: a bibliography of studies,” july 17, 2009, http://opcit.eprints.org/oacitation -biblio.html (accessed july 29, 2009). 31. matthew t. crawford et al., “reactance, compliance, and anticipated regret,” journal of experimental social psychology 38, no. 1 (2002): 56–63. 32. nicolas gueguen and alexandre pascual, “evocation of freedom and compliance: the ‘but you are free of . . .’ technique,” current research in social psychology 5, no. 18 (2000): 264–70. 33. james p. dillard, “the current status of research on sequential request compliance techniques,” personality & social psychology bulletin 17, no. 3 (1991): 283–88. 34. thomas mussweiler, “the malleability of anchoring effects,” experimental psychology 49, no. 1 (2002): 67–72. 35. robert b. cialdini and noah j. goldstein, “social influence: compliance and conformity,” annual review of psychology 55 (2004): 591–21. 36. james m. wyant and stephen l. smith, “getting more by asking for less: the effects of request size on donations of charity,” journal of applied social psychology 17, no. 4 (1987): 392–400. 37. lydia j. price, “the joint effects of brands and warranties in signaling new product quality,” journal of economic psychology 23, no. 2 (2002): 165–90. with faculty to enhance scholarly communication,” d-lib magazine 8, no. 11 (2002), http://www.dlib.org/dlib/november02/ johnson/11johnson.html (accessed apr. 2, 2008). 5. bruce heterick, “faculty attitudes toward electronic resources,” educause review 37, no. 4 (2002): 10–11. 6. nancy fried foster and susan gibbons, “understanding faculty to improve content recruitment for institutional repositories,” d-lib magazine 11, no. 1 (2005), http://www.dlib.org/ dlib/january05/foster/01foster.html (accessed july 29, 2009). 7. suzanne bell, nancy fried foster, and susan gibbons, “reference librarians and the success of institutional repositories,” reference services review 33, no. 3 (2005): 283–90. 8. diane harley et al., “the influence of academic values on scholarly publication and communication practices,” center for studies in higher education, research & occasional paper series: cshe.13.06, sept. 1, 2006, http://repositories.cdlib.org/ cshe/cshe-13-06/ (accessed apr. 17, 2008). 9. rea devakos, “towards user responsive institutional repositories: a case study,” library high tech 24, no. 2 (2006): 173–82. 10. philip m. davis and matthew j. l. connolly, “institutional repositories: evaluating the reasons for non-use of cornell university’s installation of dspace,” d-lib magazine 13, no. 3/4 (2007), http://www.dlib.org/dlib/march07/davis/03davis .html (accessed july 29, 2009). 11. 
jihyun kim, “motivating and impeding factors affecting faculty contribution to institutional repositories,” journal of digital information 8, no. 2 (2007), http://journals.tdl.org/jodi/ article/view/193/177 (accessed july 29, 2009). 12. peter suber, “open access overview” online posting, open access news: news from the open access environment, june 21, 2004, http://www.earlham.edu/~peters/fos/overview .htm (accessed 29 july 2009). 13. see, for example, jeffrey d. ford and laurie w. ford, “decoding resistance to change,” harvard business review 87, no. 4 (2009): 99–103.; john p. kotter and leonard a. schlesinger, “choosing strategies for change,” harvard business review 86, no. 7/8 (2008): 130–39; and paul r. lawrence, “how to deal with resistance to change,” harvard business review 47, no. 1 (1969): 4–176. 14. julia zuwerink jacks and maureen e. o’brien, “decreasing resistance by affirming the self,” in resistance and persuasion, ed. eric s. knowles and jay a. linn (mahwah, n.j.: lawrence erlbaum, 2004): 235–57. 15. benjamin margolis, “notes on narcissistic resistance,” modern psychoanalysis 9, no. 2 (1984): 149–56. 16. ralph grabhorn et al., “the therapeutic relationship as reflected in linguistic interaction: work on resistance,” psychotherapy research 15, no. 4 (2005): 470–82. 17. arthur aron et al., “the experimental generation of interpersonal closeness: a procedure and some preliminary findings,” personality & social psychology bulletin 23, no. 4 (1997): 363–77. 18. geoffrey l. cohen, joshua aronson, and claude m. steele, “when beliefs yield to evidence: reducing biased evaluation by affirming the self,” personality & social psychology bulletin 26, no. 9 (2000): 1151–64. 19. anthony r. pratkanis, “altercasting as an influence tactic,” in attitudes, behavior and social context: the role of norms and group membership, ed. deborah j. terry and michael a.hogg (mahwah, n.j.: lawrence erlbaum, 2000): 201–26. resource discovery: comparative survey results on two catalog interfaces heather hessel and janet fransen resource discovery: comparative survey results | hessel and fransen 21 abstract like many libraries, the university of minnesota libraries-twin cities now offers a next-generation catalog alongside a traditional online public access catalog (opac). one year after the launch of its new platform as the default catalog, usage data for the opac remained relatively high, and anecdotal comments raised questions. in response, the libraries conducted surveys that covered topics such as perceptions of success, known-item searching, preferred search environments, and desirable resource types. results show distinct differences in the behavior of faculty, graduate student, and undergraduate survey respondents, and between library staff and non-library staff respondents. both quantitative and qualitative data inform the analysis and conclusions. introduction the growing level of searching expertise at large research institutions and the increasingly complex array of available discovery tools present unique challenges to librarians as they try to provide authoritative and clear searching options to their communities. many libraries have introduced next-generation catalogs to satisfy the needs and expectations of a new generation of library searchers. these catalogs incorporate some of the features that make the current web environment appealing: relevancy ranking, recommendations, tagging, and intuitive user interfaces. 
traditional opacs are generally viewed as more complex systems, catering to advanced users and requiring explicit training to extract useful data. some librarians and users also see them as more effective tools for conducting research than next-generation catalogs. academic libraries are thus frequently caught between conflicting requirements and expectations for discovery from diverse sets of searchers.

in 2002, the university of minnesota-twin cities libraries migrated from the notis library system to the aleph500™ system and launched a new web interface based on the aleph online catalog, originally branded as mncat. in 2006, the libraries contracted with the ex libris group as one of three development partners in the creation of a new next-generation search environment called primo. during the development process, the libraries conducted multiple usability studies that provided data to inform the direction of the product. participants in the usability studies generally characterized the primo interface as "clear" and "efficient."1 a year later the university libraries branded primo as mncat plus, rebranded the aleph opac as mncat classic, and introduced mncat plus to the twin cities user community as a beta service. in august 2008, mncat plus was configured as the default search for the twin cities catalog on the libraries' main website, with the libraries continuing to keep a separate link active to the aleph opac.

a new organizational body called the primo management group was created in december 2008 to coordinate support, feedback, and enhancements of the local primo installation. this committee's charge includes evaluating user input and satisfaction, coordinating communication to users and staff, and prioritizing enhancements to the software and the normalization process. when the primo management group began planning its first user satisfaction survey, the group noted that a significant number of library users seemed to prefer mncat classic. therefore, two surveys were developed in response to the group's charge. the two surveys were identical in scope and questions, except that one referenced mncat classic and was targeted to mncat classic searchers (appendix a), while the other referenced mncat plus and was targeted to mncat plus searchers (appendix b). the surveys were designed to produce statistics that could be used as internal benchmarks to gauge library progress in areas of user experience, as well as to assist with ongoing and future planning with regard to discovery tools and features.

research questions

in addition to evaluating user satisfaction and requesting user input, the primo management group also chose to question users about searching behaviors in order to set the direction of future interface work. questions directed toward searching behaviors were informed by the findings from a 2009 university of minnesota libraries report on making resources discoverable.2 the group surveyed respondents about the types of items they expect to find in their searches, their interest in online resources, and the entry point for their discovery experience.
the primo management group crafted the surveys to get answers to the following research questions:

• how often do users view their searching activity as successful?
• how often do users know the title of the item that they are looking for, as opposed to looking for any resource relevant to their topic?
• what search environments do users choose when looking for a book? a journal? anything relevant to a topic?
• how interested are users in finding items that are not physically located at the university of minnesota?
• are there other types of resources that users would find helpful to discover in a catalog search?

although it can be tempting to think of the people using the catalog interfaces as a homogeneous group of "users," large academic libraries serve many types of users. as wakimoto states in "scope of the library catalog in times of transition":

"on the one hand, we have 'net-generation users who are accustomed to the simplicity of the google interface, are content to enter a string of keywords, and want only the results that are available online. on the other hand, we have sophisticated, experienced catalog users who understand the purpose of uniform titles and library of congress classifications and take full advantage of advanced search functions. we need to accommodate both of these user groups effectively."3

the primo management group planned to use demographic information to look for differences among user communities; the surveys therefore requested demographic information such as role (e.g., student) and college of affiliation (e.g., school of dentistry).

in designing the surveys, the group took into account the limitations of this type of survey as well as the availability of other sources of information. for example, the primo management group chose not to include questions about specific interface features because such questions could be answered by analyzing data from system logs. the group was also interested in users' strategies for discovering information, but members felt that this information was better obtained through focus groups or usability studies than through a survey instrument.

research method

the primo management group positioned links to the user surveys in several online locations, with the libraries' home page providing one primary entry point. clicking on the link from the home page presented users with an intermediate page, where they were given a choice of which survey to complete: one based on mncat plus, and the other on mncat classic. if desired, users could choose to complete a separate survey for each of the two systems. links were also provided from within the mncat plus and mncat classic environments, and these links directed users to the relevant version of the survey without the intermediary page. in addition to the survey links in the online environment, announcements were made to staff about the surveys, and librarians were encouraged to publicize the surveys to their constituents around campus. the survey period lasted from october 1 through november 25, 2009. at the time of the surveys, the university of minnesota libraries was running primo version 2 and aleph version 19. because participants were self-selected, the survey results represent a biased sample, are more extreme than the norm, and are not generalizable to the whole university population.
participants were not likely to click the survey link or respond to e-mailed requests unless they had sufficient incentive, such as strong feelings about one interface or the other. thirty percent of respondents provided an e-mail address to indicate that they would be willing to be contacted for focus groups or further surveys, indicating a high level of interest in the public-facing interfaces the libraries employ. in considering a process for repeating this project, more attention would be paid to methodology to address validity concerns.

findings and analysis

findings relevant to each research question are discussed here. six hundred twenty-nine surveys contained at least one response: 476 for mncat plus and 153 for mncat classic.

responses by demographics

as shown in table 1, graduate students were the primary respondents for both mncat plus and mncat classic, followed by undergraduates and faculty members. library staff made up 13 percent of mncat classic respondents and 4 percent of mncat plus respondents, although the actual number of library staff responding was nearly identical (twenty-one for mncat plus, twenty for mncat classic). library staff members were disproportionately represented in these survey responses, and the group analyzed the results to identify categories in which library staff members differed from overall trends. questions about affiliation appeared at the end of the surveys, which may account for the high number of respondents in the "unspecified" category.

mncat classic respondents   frequency     mncat plus respondents    frequency
graduate student            50   (33%)    graduate student          176  (37%)
undergraduate student       31   (20%)    undergraduate student     110  (23%)
library staff               20   (13%)    faculty                   40   (8%)
faculty                     21   (14%)    staff (non-library)       28   (6%)
staff (non-library)         10   (7%)     library staff             21   (4%)
community member            2    (1%)     community member          11   (2%)
(unspecified)               19   (12%)    (unspecified)             90   (19%)
total                       153  (100%)   total                     476  (100%)

table 1. respondents by user population

a comparison of the student survey responses shows that graduate students were overrepresented, while undergraduates were underrepresented, at close to a reverse ratio. of the graduate and undergraduate students who responded, 62 percent were graduate students, even though graduate students account for only 32 percent of the larger student population. conversely, undergraduates represented only 38 percent of the student respondents, even though they account for 68 percent of the graduate and undergraduate total. regrettably, the surveys did not include options for identifying oneself as a non-degree-seeking or professional student, so the analysis of students compared with the overall population in this section includes only graduate students and undergraduates.

differences were also apparent in the representation of all four categories of students within particular college units. at least two college units were underrepresented in the survey responses: the carlson school of management and the college of continuing education. one college unit was overrepresented: 59 percent of the overall student respondents to the mncat classic survey and 47 percent of the mncat plus student respondents indicated that they were housed in the college of liberal arts (cla), yet cla students represent only 32 percent of the total number of students on campus.
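the over- and underrepresentation described above is simply the ratio of a group's share of survey respondents to its share of the campus population. as a minimal illustrative sketch of that arithmetic (the variable names are ours, not the study's; the shares are the ones reported in the text), in python:

```python
# shares reported in the text: graduate students were 62% of student
# respondents but only 32% of the graduate/undergraduate population.
campus_share = {"graduate": 0.32, "undergraduate": 0.68}
survey_share = {"graduate": 0.62, "undergraduate": 0.38}

for group in campus_share:
    ratio = survey_share[group] / campus_share[group]  # >1 means overrepresented
    print(f"{group}: survey {survey_share[group]:.0%} vs campus "
          f"{campus_share[group]:.0%} -> representation ratio {ratio:.2f}")
```

run as written, this reports a ratio of about 1.94 for graduate students and 0.56 for undergraduates, the "close to a reverse ratio" noted above.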
table 2 shows the breakdown of percentages by college or unit and the corresponding breakdown by survey respondent, highlighting where significant discrepancies are evident.

college or unit                              overall % of students   mncat classic respondents   +/-    mncat plus respondents   +/-
carlson school of management                 9%                      0%                          -9%    2%                       -7%
center for allied health                     0%                      2%                          +1%    1%                       0%
col of educ/human development                10%                     9%                          -1%    14%                      +3%
col of food, agr & nat res sci               5%                      4%                          0%     7%                       +2%
coll of continuing education                 8%                      1%                          -7%    1%                       -7%
college of biological sciences               4%                      6%                          +2%    5%                       0%
college of design                            3%                      3%                          0%     3%                       0%
college of liberal arts                      32%                     59%                         +27%   47%                      +15%
college of pharmacy                          1%                      1%                          0%     0%                       -1%
college of veterinary medicine               1%                      1%                          0%     1%                       0%
graduate school                              0%                      0%                          0%     0%                       0%
humphrey inst of publ affairs                1%                      1%                          0%     1%                       0%
institute of technology (now college of
  science & engineering)                     14%                     9%                          -5%    10%                      -4%
law school                                   2%                      1%                          -1%    1%                       0%
medical school                               4%                      2%                          -3%    5%                       0%
school of dentistry                          1%                      1%                          0%     0%                       -1%
school of nursing                            1%                      0%                          -1%    0%                       -1%
school of public health                      2%                      1%                          -1%    3%                       +1%

table 2. student responses by affiliation

faculty and staff together totaled only eighty-nine respondents on the mncat plus survey and fifty-one respondents on the mncat classic survey. in keeping with the graduate and undergraduate student trends, the college of liberal arts (cla) was clearly overrepresented in terms of faculty responses. the cla faculty group represents about 17 percent of the faculty at the university of minnesota, yet over half the faculty respondents on the mncat plus survey were from cla, and over 80 percent of the mncat classic faculty respondents identified themselves as affiliated with cla. faculty groups that were underrepresented include the medical school and the institute of technology.

perceptions of success

a critical area of inquiry for the surveys was user satisfaction and perceptions of success: "do users perceive their searching activity as successful?" asked in both surveys, this question allowed the primo management group to compare respondents' perceived success between the two interfaces. the results show a marked difference: while 86 percent of the mncat classic respondents reported that they are "usually" or "very often" successful at finding what they are looking for, only 62 percent of the mncat plus respondents reported the same perception of success. respondents reported very similar rates of success regardless of school, type of affiliation, or student status.

figure 1. perceptions of success: mncat plus and mncat classic (mncat classic: rarely 4%, sometimes 11%, usually 32%, very often 54%; mncat plus: rarely 14%, sometimes 24%, usually 44%, very often 18%)

these results should be interpreted cautiously. because mncat plus is the libraries' default catalog interface, mncat classic users are a self-selecting group whose members make a conscious decision to bookmark or click the extra link to use the mncat classic interface. one cannot assume that mncat users in general would also have an 86 percent perception of success were they to use mncat classic; familiarity with the tool could play a part in mncat classic users' success. another possible factor in the reported difference in user success is the higher proportion of known-item searching (finding a book by title) occurring in mncat classic.
a user's criteria for success differ when searching for a known item versus conducting a general topical search: it is easier for searchers to determine that they have been successful when they are looking for a specific item. some features of mncat classic, such as the start-of-title and other browse indexes, are well suited to known-item searching and had no direct equivalent in mncat plus, which defaults to relevance-ranked results. (primo version 3 has implemented new features to enhance known-item searching.)

comments received from users suggest that several factors played a role. one mncat classic respondent praised the "precision of the search . . . not just lots of random hits" and noted that mncat classic supports a "[m]ore focused search since i usually already know the title or author." in contrast, a mncat plus respondent commented that the next-generation interface was "great for browsing topics when you do not have a specific title in mind." this comment is consonant with the results of other usability testing done on next-generation catalogs. in "next generation catalogs: what do they do and why should we care?," emanuel describes observed differences between topical and known-item searching: "during the testing, users were generally happy with the results when they searched for a broad term, but they were not happy with results for more specific searches because often they had to further limit to find what they wanted in the first screen of results."4 a common characteristic of next-generation catalogs is that they return a large result set that can then be limited using facets.

training and experience may also explain some of the differences in success. mncat plus enables functionality associated with the functional requirements for bibliographic records (frbr), which is intended to group items with the same core intellectual content in a way that is more intuitive to searchers. however, this feature is unfamiliar to traditional catalog searchers and requires an extra step to discover very specific known items in primo. one mncat plus user expressed dissatisfaction and added, "i'm not sure if it's my lack of training/practice or that the system is not user-friendly." in focus group analyses conducted in 2008, oclc found that "when participants conducted general searches on a topic (i.e., searches for unknown items) that they expressed dissatisfaction when items unrelated to what they were looking for were returned in the results list. end users may not understand how to best craft an appropriate search strategy for topic searches."5

how often do users know the title of the item that they are looking for?

users come to the library with different goals in mind. in "chang's browsing," available in theories of information behavior, chang identified five general browsing themes,6 adapted to discovery by carter.7 for the purposes of the survey, the primo management group grouped those themes into two goals: finding an item when the title is known, and finding anything on a given topic. the primo management group had heard concerns from faculty and staff that they have more difficulty finding an item whose title they know when using mncat plus than they did with mncat classic. the group was interested in knowing how often users search for known items. to explore this topic and its impact on perceptions of success, the surveys included two questions on known-item and topical searching.
the survey results shown in table 3 indicate that a significantly higher proportion of mncat classic respondents (30 percent plus 43 percent = 73 percent) than mncat plus respondents (24 percent plus 29 percent = 53 percent) were "very often" or "usually" searching for known items. it may be that users in search of known items have learned to go to mncat classic rather than mncat plus.

                                                 rarely     sometimes   usually    very often   total
i already know the title of the item i am looking for
  mncat classic                                  7% (11)    19% (29)    30% (46)   43% (66)     152
  mncat plus                                     15% (69)   33% (151)   24% (111)  29% (132)    463
i am looking for any resource relevant to my topic
  mncat classic                                  14% (21)   32% (47)    20% (29)   34% (51)     148
  mncat plus                                     14% (62)   29% (133)   29% (133)  28% (127)    455

table 3. responses to "i already know the title of the item i am looking for"

when the primo management group considered how often researchers in different user roles searched for known items versus anything on a topic, clear patterns emerged, as shown in figure 2. in the mncat plus survey, only 34 percent of undergraduate mncat plus searchers "usually" or "very often" search for a particular item, versus 74 percent of faculty. conversely, 75 percent of undergraduate respondents "usually" or "very often" search for any resource relevant to a topic, versus 37 percent of faculty. graduate student respondents showed interest in both kinds of use. if successful browsing by topic is best achieved using post-search filtering, this may help explain the differences between undergraduate students and faculty. the analysis of usability testing done on other next-generation catalogs described in "next generation catalogs: what do they do and why should we care?" states that "users that did not have extensive searching skills were more likely to appreciate the search first, limit later approach, while faculty members were faster to get frustrated with this technique."8 results for all mncat classic respondents showed a preference for known-item searching, but undergraduate students still indicated that they search more for anything on the topic, and less for known items, than faculty respondents. no significant differences were identified by discipline.

figure 2. searching for a known item vs. any relevant resource

some qualitative comments from survey takers suggest that respondents view the library interface as a place to go to find something already known to exist, e.g., "i never want to search by topic. library catalogs are for looking up specific items." however, with respect to discovering resources for a subject in general, both mncat classic and mncat plus respondents showed that they would also like to find items relevant to their topic (figure 2). there was no significant difference between mncat classic and mncat plus respondents on this question; in both environments, only 14 percent of the users said that they would "rarely" be interested in general results relevant to their topic.
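as a small worked check (not part of the original analysis) of how the percentages in table 3 above derive from the raw response counts, the following python sketch recomputes them; note that the combined figures quoted in the text (73 and 53 percent) sum the per-cell rounded percentages, so recomputing directly from the counts can differ by about a point:

```python
# raw response counts for "i already know the title of the item i am
# looking for," restated from table 3 above.
counts = {
    "mncat classic": {"rarely": 11, "sometimes": 29, "usually": 46, "very often": 66},
    "mncat plus": {"rarely": 69, "sometimes": 151, "usually": 111, "very often": 132},
}

for interface, responses in counts.items():
    total = sum(responses.values())  # 152 and 463, matching the table
    shares = {answer: round(100 * n / total) for answer, n in responses.items()}
    known_item = responses["usually"] + responses["very often"]
    print(interface, shares, f"usually/very often: {100 * known_item / total:.1f}%")
```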
perceptions of success by specific characteristics

for mncat plus, the majority of respondents "somewhat agree" or "strongly agree" that items available online or in a particular collection are easy to find. one-third of the mncat plus respondents had never tried to find an item in a particular format, and over 40 percent had never tried to find an item with a particular isbn/issn. interface features may be a factor here: isbn/issn searching is not a choice in the mncat plus drop-down menu, so users may not know that they can do such a search. a higher percentage of mncat classic respondents than mncat plus respondents "strongly agree" that it is easy to find items by collection, available online, or in a particular format. figure 3 shows results based on particular characteristics.

figure 3. perception of success by characteristic

although the surveys were primarily intended to gather reactions from end users, some interesting data emerged about usage by library staff. as demonstrated in figure 4, library staff respondents were much more likely to have performed the specific types of searches listed in this section than users generally, and reported a much higher rate of perceived success with mncat classic.

figure 4. perception of success by characteristic: library staff

searching by location: local collections and other resources

in a large research institution with several physical library locations and many distinct collections, users need the ability to quickly narrow a search to a particular collection. but even the largest institution cannot collect everything a researcher might need. the primo management group wondered not only whether users felt successful when they looked for an item in a particular collection but also whether users want to see items not owned by the institution as part of their search results. finding items among the many library locations was not a problem for either mncat plus or mncat classic respondents: 72 percent either somewhat or strongly agreed that it is easy to find items in a particular collection using mncat. furthermore, survey respondents for both interfaces agreed that they are interested in items no matter where the items are held, which underlines the value of a service such as worldcat; 73 percent of mncat plus respondents and 78 percent of mncat classic respondents expressed a preference for seeing items held by other libraries, knowing they could request items using an interlibrary loan service if necessary.

preferred search environments

three of the survey questions asked users about their preferred search environments for different searching needs:

• when looking for a particular book
• when looking for a particular journal article
• when searching without a particular title in mind

each survey presented respondents with a list of choices and space to specify other sources not listed. respondents were encouraged to mark as many sources as they regularly use. when searching for a specific book, users of the two catalog environments identified a number of other sources. the top five sources in each survey are listed in table 4.

when i am looking for a specific book, i usually search (check all that apply):

mncat classic respondents (frequency)    mncat plus respondents (frequency)
1. mncat classic (116)                   1. mncat plus (217)
2. worldcat (50)                         2. google (165)
3. amazon (50)                           3. mncat classic (163)
4. google (49)                           4. amazon (160)
5. google books (31)                     5. google books (108)

table 4. search environment for books

qualitative comments indicated that users like being able to connect to amazon and google books in order to look at tables of contents and reviews. they also specifically mentioned barnes and noble, as well as other local libraries.
these results show that mncat plus respondents were more likely to also use mncat classic than vice versa. the data do not suggest why this would be the case, but familiarity with the older interface may play a role. mncat classic respondents were more likely than mncat plus users to return to their own search environment when searching for a particular book (82 percent versus 53 percent). one mncat plus respondent commented, "i didn't know i could still get to mncat classic."

when searching for a specific journal article, users of both systems chose "other databases (jstor, pubmed, etc.)" above all the other choices. even more respondents would likely have marked this choice if not for confusion over the term "other databases": most of the comments mentioned specific databases, even when the respondent had not selected the "other databases" choice. one user commented, "most of these choices would be illogical. you don't list article indexes, that's where i go first." table 5 lists the five responses marked most often for each survey.

when i am looking for a specific journal article, i usually search (check all that apply):

mncat classic respondents (frequency)           mncat plus respondents (frequency)
1. other databases (jstor, pubmed, etc.) (92)   1. other databases (jstor, pubmed, etc.) (232)
2. mncat classic (53)                           2. google scholar (131)
3. google scholar (40)                          3. e-journals list (130)
4. e-journals list (34)                         4. mncat plus (110)
5. google (29)                                  5. mncat plus article search (101)

table 5. search environment for articles

qualitative comments from respondents indicated that interfaces would be more useful if they helped users find online journal articles. this raised some questions with regard to mncat plus, which includes a tab labeled "articles" for conducting federated article searches. mncat plus respondents noted that they used the plus "articles" search almost as much as they did mncat plus itself. other plus comments included:

"i tried to use this for journal articles but it only has some in the database i guess and when i did my search it only found books and no articles. i don't understand it."

"i tried this new one and it came up with wierd [sic] stuff in terms of articles. my professor said to give up and use the regular indexes because i wasn't getting what i needed to do the paper. it wasted my time."

this desire for federated search, coupled with the expressions of dissatisfaction with the existing federated search platform, is consistent with the mixed opinions expressed in other studies, such as sam houston state university's assessment of use of and satisfaction with the webfeat federated search tool. that study found "[f]ederated search use was highest among lower-level undergraduates, and both use and satisfaction declined as student classification rose."9 the new search tools that contain preindexed articles, such as primo central, summon, worldcat local, and ebsco discovery service, may address the frustrations that more experienced searchers express regarding federated search technology.

when researching a topic without a specific title in mind, "google" and "other databases" were nearly equal and ranked first for mncat plus respondents, while "other databases" ranked first for mncat classic respondents. table 6 lists the five responses marked most often for each survey.
table 6. search environment for topics ("when i am researching a topic without a specific title in mind, i usually search")
mncat classic respondents (frequency): 1. other databases (jstor, pubmed, etc.) (84); 2. mncat classic (76); 3. google (63); 4. google scholar (47); 5. worldcat (32)
mncat plus respondents (frequency): 1. google (197); 2. other databases (jstor, pubmed, etc.) (192); 3. google scholar (155); 4. mncat plus (145); 5. mncat classic (101)

significant differences based on school affiliation were evident in the area of preferred search environments for topical research. for example, institute of technology respondents reported using google much more often when researching without a specific title in mind than respondents in other areas. evidence from the health sciences is limited in that only seven percent of all respondents identified themselves as being from this area; however, these limited results show that health sciences respondents relied more on library databases than on google. respondents in the liberal arts relied more on mncat, in either version, than did respondents in the other fields.

desired resource types

one feature of the primo discovery interface is its ability to aggregate records from more than one source. university libraries maintains several internal data sources that are not included in the catalog, and the possibility of including some of these in the mncat plus catalog has been considered many times since primo's release. the primo management group was interested to hear from users whether they would find three types of internal sources useful: research reports and preprints, online media, and archival finding aids. the group also asked users to mark "online journal articles" if they would find article results helpful. the question did not specify whether journal articles would appear integrated with other search results in a mncat "books" search or in a separate search such as that already provided through a metasearch on the mncat plus articles tab.

the surveys asked users what kinds of resources would make mncat more useful. the results for both mncat plus and mncat classic were similar, and response counts for both surveys were ordered as shown in table 7. respondents could mark more than one of the choices.

table 7. desired resource types ("i would find mncat more useful if it helped me find")
online journal articles: mncat classic 65; mncat plus 255
u of m research materials (e.g., research reports, preprints): mncat classic 34; mncat plus 149
online media (e.g., digital images, streaming audio/visual): mncat classic 27; mncat plus 134
archival finding aids: mncat classic 27; mncat plus 90

the primo management group noted that mncat plus respondents chose "online journal articles" more frequently than the other categories even though the mncat plus interface includes an "articles" tab for federated searching. it is unclear whether the respondents were not seeing the "articles" tab in mncat plus and would like to see search results integrated, or whether they were using the "articles" tab and were not satisfied with the results. comments from respondents generally supported the inclusion of a wider range of resources in mncat. however, several respondents also expressed concerns about the trade-offs that might be involved in providing wider coverage.
one user liked the idea of having the databases "all … in one place," but added that "it would have to just give you the stuff that you need." several users cited the varying quality of the material discovered through library sources. one user supported the inclusion of articles "if it included good articles and not the ones i got." a mncat classic respondent gave the variable quality of the material he or she had found through a database search as a reason for leaving the coverage of mncat as it is: "i use the best sources depending on my needs." another mncat classic user expressed doubt that coverage of all disciplines was feasible.

in commenting on the content of mncat, respondents also mentioned specific types of material that they wanted to see (e.g., archives of various countries), as well as difficulties with particular classes of material ("the confusing world of government documents"). one mncat plus user related his or her interest in public domain items to a specific piece of functionality that would enhance their discovery, namely a date sort.

in general, interest in university of minnesota research material was fairly high. however, faculty members ranked university of minnesota research materials last in terms of preference: only twelve faculty respondents chose the option, out of sixty-one total faculty respondents.

conclusions

the data from two surveys, conducted concurrently in 2009 on a traditional opac (mncat classic) and a next-generation catalog (mncat plus), point to differences in the use and perceptions of both systems. there appeared to be fairly strong "brand loyalty" to mncat classic, given that this interface is no longer the default search for the libraries. surveys for both systems suggest a perception of success that is lower than desirable and that there is room to improve the quality of the discovery experience. it is unclear from the data whether the reported perceptions of success were the result of the systems not finding what the user wants, or of the systems not containing what the user wanted to find.

mncat classic respondents were more likely to use worldcat to find a specific book than mncat plus respondents. mncat plus respondents reported using mncat classic, but not vice versa. both sets of surveys described use of amazon and google for discovery. mncat plus respondents reported lower rates of success at finding known items than mncat classic respondents. mncat classic respondents were far more likely to have a specific title in mind that they wanted to obtain; half of the mncat plus respondents reported having a specific title in mind.

the team that examined the survey responses found that the data suggested several key attributes that should be present in the libraries discovery environment. further discussion of the results and suggested attributes was conducted with library staff members in open sessions. results also informed local work on improving discovery interfaces. the results suggested:
- the environment should support multiple discovery tasks, including known-item searching and topical research.
- support for discovery activity should be provided to all primary constituent groups, noting the significant survey response by graduate student searchers.
- users want to discover materials that are not owned by the libraries, in addition to local holdings.
- a discovery environment should make it easy for users to find and access vendor-provided resources, such as jstor and pubmed.

while the results of the 2009 surveys provided a valuable description of usage, the survey team recognized that methodological choices limit the generalizability of the results to a larger population. the team also recognized that a number of questions remained unanswered. some of these outstanding questions present opportunities for future research and suggest that a variety of formats might be useful, including surveys, focus groups, and targeted interviews:
- to what extent do users expect to find integrated search results among different kinds of content, such as articles, databases, indexes, and even large-scale data sets?
- what general search strategies do users use to navigate the complex discovery environment that is available to them, and where are the failure points?
- how much of the current environment requires training, and how much is truly intuitive to users?
- how can the university libraries identify and serve users who did not complete the surveys?
- how useful would users find targeted results based on a particular characteristic such as role, student status, or discipline?

since the surveys were conducted, the university libraries upgraded to primo version 3, which included features to address some of the concerns respondents identified in the surveys, such as known-item searching. primo version 3 allows users to conduct a left-justified title search ("title begins with…"), as well as sort by fields such as title and author. once the new version has been in place long enough for users to develop some comfort with the interface, the primo management group intends to resolve methodological issues and repeat its surveys, measuring users' reactions against the baseline data set in the 2009 surveys.

acknowledgements

we would like to thank the other members of the primo management group, who helped to design and implement the surveys, as well as analyze and communicate the results: chew chiat naun (chair), susan gangl, connie hendrick, lois hendrickson, kristen mastel, r. arvid nelsen, and jeff peterson. we also want to acknowledge the helpful feedback and guidance of the group's sponsor, john butler.

references

1 tamar sadeh, "user experience in the library: a case study," new library world 109, no. 1/2 (2008): 7–24.
2 cody hanson et al., discoverability phase 1 final report (minneapolis: university of minnesota, 2009), http://purl.umn.edu/48258/ (accessed dec. 20, 2010).
3 jina choi wakimoto, "scope of the library catalog in times of transition," cataloging & classification quarterly 47, no. 5 (2009): 409–26.
4 jenny emanuel, "next generation catalogs: what do they do and why should we care?" reference & user services quarterly 49, no. 2 (winter 2009): 117–20.
5 karen calhoun, diane cellentani, and oclc, online catalogs: what users and librarians want: an oclc report (dublin, ohio: oclc, 2009).
6 shan-ju chang, "chang's browsing," in theories of information behavior, ed. karen e. fisher, sandra erdelez, and lynne mckechnie, 69–74 (medford, n.j.: information today, 2005).
7 judith carter, "discovery: what do you mean by that?" information technology & libraries 28, no. 4 (december 2009): 161–63.
8 jenny emanuel, "next generation catalogs: what do they do and why should we care?" reference & user services quarterly 49, no. 2 (winter 2009): 117–20.
9 abe korah and erin dorris cassidy, "students and federated searching: a survey of use and satisfaction," reference & user services quarterly 49, no. 4 (summer 2010): 325–32.

appendix a. mncat classic survey

the library catalog is intended to help you find an item when you know its title, as well as suggest items that are relevant to a given topic. we'd like to know how often you use mncat classic for these different purposes.

1. when i visit mncat classic… (very often / usually / sometimes / rarely)
- i already know the title of the item i am looking for
- i am looking for any resource relevant to my topic

many people use tools other than the library catalog to find books, articles, and other resources. for the different situations below, please tell us what other tools you find helpful.

2. when i am looking for a specific book, i usually search (check all that apply): amazon; mncat classic; other databases (jstor, pubmed, etc.); google; mncat plus; worldcat; google books; mncat plus article search; google scholar; libraries onesearch; other (please specify)

3. when i am looking for a specific journal article, i usually search (check all that apply): amazon; google books; mncat plus article search; citation linker; google scholar; libraries onesearch; e-journals list; mncat classic; other databases (jstor, pubmed, etc.); google; mncat plus; worldcat; other (please specify)

4. when i am researching a topic without a specific title in mind, i usually search (check all that apply): amazon; google scholar; libraries onesearch; e-journals list; mncat classic; other databases (jstor, pubmed, etc.); google; mncat plus; worldcat; google books; mncat plus article search; other (please specify)

now we'd like to know what you think of mncat classic and what new features (if any) you'd like to see.

5. when i use mncat classic, i succeed in finding what i'm looking for (very often / usually / sometimes / rarely)

6. it is easy to find the following kinds of items in mncat classic (strongly agree / somewhat agree / somewhat disagree / strongly disagree / i haven't looked for this with mncat classic)
- an item that is available online
- an item within a particular collection (e.g., wilson library, university archives, etc.)
- an item in a particular physical format (e.g., dvd, map, etc.)
- an item with a specific isbn or issn

7. i would find mncat classic more useful if it helped me find (check all that apply): online journal articles; online media (e.g., digital images, streaming audio/visual); archival finding aids; u of m research material (e.g., research reports, preprints); other (please specify)

8. the worldcat catalog allows you to search the contents of many library collections in addition to the university of minnesota. which of the following best describes your level of interest in this type of catalog?
- yes, i am interested in what other libraries have regardless of where they are, knowing i could request it through interlibrary loan if i want it
- yes, i am interested, but only if i can get the items from a nearby library
- no, i am interested only in what is available at the university of minnesota libraries

please share anything you particularly like or dislike about mncat classic.

9. what i like most about mncat classic is: _______________

10. what i like least about mncat classic is: _______________

we want to understand how different groups of people use mncat classic, as well as other tools, for finding information. please answer the following questions to give us an idea of who you are.

11. how are you affiliated with the university of minnesota? faculty; graduate student; undergraduate student; staff (non-library); library staff; community member

12. with which university of minnesota college or school are you most closely affiliated? allied health programs; food, agricultural and natural resource sciences; pharmacy; biological sciences; law school; public affairs; continuing education; liberal arts; public health; dentistry; libraries; technology (engineering, physical sciences & mathematics); design; management; veterinary medicine; education & human development; medical school; none of these; extension; nursing

13. we are interested in learning more about how you find the materials you need. if you would be willing to be contacted for further surveys or focus groups, please provide your e-mail address: _______________

appendix b. mncat plus survey

the library catalog is intended to help you find an item when you know its title, as well as suggest items that are relevant to a given topic. we'd like to know how often you use mncat plus for these different purposes.

1. when i visit mncat plus… (very often / usually / sometimes / rarely)
- i already know the title of the item i am looking for
- i am looking for any resource relevant to my topic

many people use tools other than the library catalog to find books, articles, and other resources. for the different situations below, please tell us what other tools you find helpful.

2. when i am looking for a specific book, i usually search (check all that apply): amazon; mncat classic; other databases (jstor, pubmed, etc.); google; mncat plus; worldcat; google books; mncat plus article search; google scholar; libraries onesearch; other (please specify)

3. when i am looking for a specific journal article, i usually search (check all that apply): amazon; google books; mncat plus article search; citation linker; google scholar; libraries onesearch; e-journals list; mncat classic; other databases (jstor, pubmed, etc.); google; mncat plus; worldcat; other (please specify)
4. when i am researching a topic without a specific title in mind, i usually search (check all that apply): amazon; google scholar; libraries onesearch; e-journals list; mncat classic; other databases (jstor, pubmed, etc.); google; mncat plus; worldcat; google books; mncat plus article search; other (please specify)

now we'd like to know what you think of mncat plus and what new features (if any) you'd like to see.

5. when i use mncat plus, i succeed in finding what i'm looking for (very often / usually / sometimes / rarely)

6. it is easy to find the following kinds of items in mncat plus (strongly agree / somewhat agree / somewhat disagree / strongly disagree / i haven't looked for this with mncat plus)
- an item that is available online
- an item within a particular collection (e.g., wilson library, university archives, etc.)
- an item in a particular physical format (e.g., dvd, map, etc.)
- an item with a specific isbn or issn

7. i would find mncat plus more useful if it helped me find (check all that apply): online journal articles; online media (e.g., digital images, streaming audio/visual); archival finding aids; u of m research material (e.g., research reports, preprints); other (please specify)

8. the worldcat catalog allows you to search the contents of many library collections in addition to the university of minnesota. which of the following best describes your level of interest in this type of catalog?
- yes, i am interested in what other libraries have regardless of where they are, knowing i could request it through interlibrary loan if i want it
- yes, i am interested, but only if i can get the items from a nearby library
- no, i am interested only in what is available at the university of minnesota libraries

please share anything you particularly like or dislike about mncat plus.

9. what i like most about mncat plus is: _______________

10. what i like least about mncat plus is: _______________

we want to understand how different groups of people use mncat plus, as well as other tools, for finding information. please answer the following questions to give us an idea of who you are.

11. how are you affiliated with the university of minnesota? faculty; graduate student; undergraduate student; staff (non-library); library staff; community member

12. with which university of minnesota college or school are you most closely affiliated? allied health programs; food, agricultural and natural resource sciences; pharmacy; biological sciences; law school; public affairs; continuing education; liberal arts; public health; dentistry; libraries; technology (engineering, physical sciences & mathematics); design; management; veterinary medicine; education & human development; medical school; none of these; extension; nursing

13. we are interested in learning more about how you find the materials you need.
if you would be willing to be contacted for further surveys or focus groups, please provide your e-mail address: _______________

article

explainable artificial intelligence (xai) adoption and advocacy

michael ridley

information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.14683

michael ridley (mridley@uoguelph.ca) is librarian, university of guelph. © 2022.

abstract

the field of explainable artificial intelligence (xai) advances techniques, processes, and strategies that provide explanations for the predictions, recommendations, and decisions of opaque and complex machine learning systems. increasingly, academic libraries are providing library users with systems, services, and collections created and delivered by machine learning. academic libraries should adopt xai as a tool set to verify and validate these resources, and advocate for public policy regarding xai that serves libraries, the academy, and the public interest.

introduction

explainable artificial intelligence (xai) is a subfield of artificial intelligence (ai) that provides explanations for the predictions, recommendations, and decisions of intelligent systems.1 machine learning is rapidly becoming an integral part of academic libraries. xai is a set of techniques, processes, and strategies that libraries should adopt and advocate for to ensure that machine learning appropriately serves librarianship, the academy, and the public interest.

knowingly or not, libraries acquire and provide access to systems, services, and collections infused and directed by machine learning methods, and library users are engaged in information behavior (e.g., seeking, using, managing) facilitated or augmented by machine learning. machine learning in library and information science (lis), as with many other fields, has become ubiquitous. however, this technology is often opaque and complex, yet consequential. there are significant concerns about bias, unfairness, and veracity.2 there are troubling questions about user agency and power imbalances.3 while lis has a long-standing interest in ai and intelligent information systems generally,4 it has only recently turned its attention to xai and how it affects the field and how the field might influence it.5

xai is a critical lens through which to view machine learning in libraries. it is also a set of techniques, processes, and strategies essential to influencing and shaping this still emerging technology:

research libraries have a unique and important opportunity to shape the development, deployment, and use of intelligent systems in a manner consistent with the values of scholarship and librarianship. the area of explainable artificial intelligence is only one component of this, but in many ways, it may be the most important.6

dismissing engagement with xai because it is "highly technical and impenetrable to those outside that community" is neither acceptable nor increasingly possible.7 artificial intelligence is the essential substrate of contemporary information systems, and xai is a tool set for critical assessment and accountability. the details matter and must be understood if libraries are to have a place at the table as xai, and machine learning, evolve and further deepen their effect on lis.
this paper provides an overview of xai with key definitions, a historical context, and examples of the xai techniques, strategies, and processes that form the basis of the field. it considers areas where xai and academic libraries intersect. the dual emphasis is on xai as a toolset for libraries to adopt and xai as an area for public policy advocacy.

what is xai?

xai is plagued by definitional problems.8 some definitions focus solely and narrowly on the technical concepts while others focus only on the broad social and political dimensions. lacking "a theory of explainable ai, with a formal and universally agreed definition of what explanations are,"9 the fundamentals of this field are still being explored, often from different disciplinary perspectives.10 critical algorithm studies position machine learning as socio-techno-informational systems.11 as such, a definition of xai must encompass not just the techniques, as important and necessary as they are, but also the context within which xai operates.

the us defense advanced research projects agency (darpa) description of xai captures the breadth and scope of the field. the purpose of xai is for ai systems to have "the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future"12 and to "enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners."13 xai is needed to:

1. generate trust, transparency, and understanding;
2. ensure compliance with regulations and legislation;
3. mitigate risk;
4. generate accountable, reliable, and sound models for justification;
5. minimize or mitigate bias, unfairness, and misinterpretation in model performance and interpretation; and
6. validate models and validate explanations generated by xai.14

xai consists of testable and unambiguous proofs, various verification and validation methods that assess influence and veracity, and authorizations that define requirements or mandate auditing within a public policy framework.

xai is not a new consideration. explainability has been a preoccupation of computer science since the early days of expert systems in the late twentieth century.15 however, the 2018 introduction of the general data protection regulation (gdpr) by the european union (eu) shifted explainability from a purely technical issue to one with an additional and urgent focus on public policy.16 while the presence of a "right to explanation" in the gdpr is highly contested,17 industry groups and jurisdictions beyond the eu recognized its inevitability, spurring an explosion in xai research and development.18

types of xai

taxonomies of xai types are classified based on their scope and mechanism.19 local explanations interpret the decisions of a machine learning model used in a specific instance (i.e., involving data and context relevant to the circumstance). global explanations interpret the model more generally (i.e., involving all the training data and relevant contexts).
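to make the local/global distinction concrete, here is a minimal sketch (an illustration, not a method from the works cited here; the dataset and variable names are arbitrary) using a linear model in python, where the fitted coefficients act as a global explanation and the per-feature contributions to one prediction act as a local explanation:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()
X, y, names = data.data, data.target, data.feature_names

model = LinearRegression().fit(X, y)

# global explanation: one set of weights describes the model's behavior
# on every input it will ever see
for name, coef in zip(names, model.coef_):
    print(f"global weight for {name}: {coef:.2f}")

# local explanation: decompose a single prediction into the contribution
# each feature makes for this one instance
x = X[0]
contributions = model.coef_ * x
print("prediction:", model.intercept_ + contributions.sum())
for name, c in zip(names, contributions):
    print(f"local contribution of {name}: {c:.2f}")

for a linear model the two views coincide neatly; for the opaque models discussed below, local and global explanations must be approximated by separate techniques.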
in black-box or model-agnostic explanations, only the input and the output of the machine learning model are required, while white-box or model-specific explanations require more detailed information regarding the processing or design of the model.

another way to categorize xai is as proofs, validations, and authorizations. proofs are testable, traceable, and unambiguous explanations demonstrable through causal links, logic statements, or transparent processes. typically, proofs are only available for ai systems that use "inherently interpretable" techniques such as rules, decision trees, or linear regressions.20
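as a minimal sketch of what such a proof can look like (an illustration under assumed data, not an example from the article), a shallow decision tree can print its complete rule set, so any prediction is traceable through an unambiguous chain of threshold tests:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# the printed rules are the model itself: a testable, traceable,
# unambiguous account of every decision it can make
print(export_text(tree, feature_names=list(data.feature_names)))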
validations are explanations that confirm the veracity of the ai system. these verifications occur through testing procedures, reproducibility, approximations and abstractions, and justifications. authorizations are explanations that result from processes in which third parties provide some form of standard, ratification, prohibition, or audit. authorizations might pertain to the ai model, its operation in specific instances, or even the process by which the ai was created. they can be provided by professional groups, nongovernmental organizations, governments and government agencies, and third parties in the public and private sector.

academic libraries can adopt proofs and validations as means to interrogate information systems and resources. this includes collections, which are increasingly machine learning systems themselves or developed with machine learning methods. the recognition of "collections as data" is an important shift in this direction.21 where appropriate, proofs and validations should accompany content and systems derived from machine learning. libraries must also engage with xai as authorizations to assess the public policy implications that exist, are emergent, or are necessary. library advocacy is currently lacking in this area. the requirement for policy and governance frameworks is a reminder that machine learning is "far from being purely mechanistic, it is deeply, inescapably human"22 and that while complex and opaque, "the 'black box' is full of people."23

prerequisites to an xai strategy

three questions are important for any xai strategy:
- what constitutes a good explanation?
- who is the explanation for?
- how will the explanation be provided?

explanations are context specific. the "goodness" of an explanation depends on the needs and objectives of the explainee (a user) and the explainer (an xai). following research from the fields of psychology and cognitive science, keil suggests five reasons why someone wants an explanation: (1) to predict similar events in the future, (2) to diagnose, (3) to assess blame or guilt, (4) to justify or rationalize an action, and (5) for aesthetic pleasure.24 for most people, explanations need not be complete or even fully accurate.25 as a result, who the explanation is for is critical to a good explanation. different audiences have different priorities. system developers are primarily interested in performance explanations, while clients focus on effectiveness or efficacy, professionals are concerned about veracity, and regulators are interested in policy implications. nonexpert, lay users of a system want explanations that build trust and provide accountability.

a good explanation is also affected by its presentation. there are temporal and format considerations. explanations can be provided or available in real time and continuously as the process occurs (hence partial explanations) or post hoc and in summary form. interactive explanations are widely preferred but are not always appropriate or actionable.26 studies have compared textual, visual, and multimodal formats with differing results. familiar textual responses or simple visual explanations such as venn diagrams are often most effective for nonexpert users.27

drawing from philosophy, psychology, and cognitive science, miller recommends four approaches for xai.28 explanations are contrastive: when people want to know the "why" of something, "people do not ask why event p happened, but rather why event p happened instead of some event q." explanations are selected: "humans are adept at selecting one or two causes from a sometimes infinite number of causes to be the explanation." explanations are social: "they are a transfer of knowledge, presented as part of a conversation or interaction, and are thus presented relative to the explainer's beliefs about the explainee's beliefs." finally, miller cautions against using probabilities and statistical relationships and encourages references to causes.

burrell identifies three key barriers to explainability: concealment, the limited technical understanding of the user, and an incompatibility between the user (human) and algorithmic reasoning.29 while concealment is deliberate, it may or may not be justified. protecting ip and trade secrets is acceptable, while obscuring processes to purposively deceive users is not. regulations are a tool to moderate the former and minimize the latter.

the technical limitations of users and the incompatibility between users and algorithms suggest two remedies. first is enhancing algorithmic literacy. algorithmic literacy is "a set of competencies that enables individuals to critically evaluate ai technologies; communicate and collaborate effectively with ai; and use ai as a tool online, at home, and in the workplace."30 libraries have a key role in advancing algorithmic literacy in their communities.31 just as libraries championed information literacy through the promulgation of standards and principles, the provision of diverse educational programming, and the engagement of the broad academic community, so too can libraries be central to efforts to enhance algorithmic literacy. second is a requirement that xai be sensitive to the abilities and needs of different users. a survey of the key challenges and research directions of xai identified 39 issues, including the need to understand and enhance the user experience, match xai to user expertise, and explain the competencies of ai systems to users.32 this is the essence of human-centered explainable ai (hcxai). among hcxai principles are the importance of context (regarding user objectives, decision consequences, timing, modality, and intended audience), the value of using hybrid explanation methods that complement and extend each other, and the power of contrastive examples and approaches.33

proofs and validations

xai that provides proofs or validations can be adopted by libraries to assess and evaluate machine learning utilized in systems, services, and collections. since proofs pertain to already interpretable systems, the four examples provided here focus on validations: feature audit, approximation and abstraction, reproducibility, and xai by ai.

these techniques may require access to, or information about, the machine learning model. this would include such characteristics as the algorithms used, settings of the parameters and hyperparameters, optimization choices, and the training data. while all of these may not normally be available, designers of machine learning systems in consequential settings should expect to provide, indeed be required to provide, such access. similarly, vendors of library content or systems utilizing machine learning should make explanatory proofs and validations available for library inspection.

feature audit

feature audit is an explanatory strategy that attempts to reveal the key features (e.g., characteristics of the data or settings of the hyperparameters used to differentiate the data) that have a primary role in the prediction of the algorithm. by isolating these features, it is possible to explain the key components of the decision. feature audit is a standard technique of linear regression, but it is made more difficult in machine learning because of the complexity of the information space (e.g., billions of parameters and high dimensionality). there are various feature audit techniques,34 but all of them are "decompositional" in that they attempt to reduce the work of the algorithm to its component parts and then use those results as an explanation.35 feature audit can highlight bias or inaccuracy by revealing incongruence between the data and the prediction. more advanced feature audit techniques (e.g., gradient feature auditing) recognize that features can indirectly influence other features and that these features are not easily detectable as separate, influential elements.36 this interaction among features challenges the strict decompositional approach to feature audit and will likely lead to an increased focus on relational analysis among and between elements.
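a simple, widely used form of feature audit is permutation importance: shuffle one feature at a time and measure how much the model's performance degrades. the sketch below (assumed data and model; one decompositional technique among the many cited) ranks the features that dominate a classifier's predictions:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# shuffling a feature breaks its relationship to the outcome; the drop in
# held-out accuracy estimates how much the model relied on that feature
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{data.feature_names[i]}: {result.importances_mean[i]:.4f}")

note that, as the gradient feature auditing work cited above argues, correlated features can hide behind one another in this kind of audit, which is exactly the limitation of strictly decompositional techniques.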
approximation and abstraction

approximation and abstraction are techniques that create a simpler model to explain the more complex model.37 people seek and accept explanations that "satisfice"38 and are coherent with existing beliefs.39 this recognizes that "an explanation has greater power than an alternative if it makes what is being explained less surprising."40 approaches such as "model distillation"41 or the "model agnostic" feature reduction of the local interpretable model-agnostic explanations (lime) tool create a simplified presentation of the algorithmic model.42 this approximation or abstraction may compromise accuracy, but it provides an accessible representation that enhances understandability.
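a minimal sketch of the approximation idea (a global surrogate in the spirit of model distillation; lime itself works instance by instance, and the data here are stand-ins): fit an interpretable tree to the black box's predictions rather than to the ground truth, and report how faithfully the simple model mimics the complex one:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
y_bb = black_box.predict(X)  # the surrogate learns the model, not the data

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

# fidelity: how often the simple model agrees with the complex one;
# some accuracy is traded away for an explanation a person can follow
print("fidelity:", (surrogate.predict(X) == y_bb).mean())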
a different type of approximation or abstraction is a narrative of the machine learning processes utilized that provides documentation sufficient to act as an explanation of the outcomes for a reader. an exemplary case of this is lithium-ion batteries: a machine-generated summary of current research, published by springer nature and written by beta writer, an ai or, more accurately, a suite of algorithms.43 a collaboration of machine learning and human editors, the full production cycle of the book is documented in the introduction.44 in lieu of being able to interrogate the system directly, this detailed account provides an explanation of the system, allowing readers to assess the strengths, limitations, and confidence levels of the algorithmic processes, and offers a model of what might be necessary for future ai-generated texts.45 libraries can utilize this documentation in acquisition or licensing decisions and subsequently make it available as user guides when resources are added to the collection.

reproducibility

replication is a verification strategy fundamental to science. being able to independently reproduce results in different settings provides evidence of veracity and supports user trust. however, documented problems in reproducing machine learning studies have questioned the generalizability of these approaches and undermined their explanatory capacity. for example, an analysis of text mining studies using machine learning for citation screening in the preparation of systematic reviews revealed a lack of key elements needed to enable replicability (e.g., access to research datasets, software environments used, randomization control, and detail on new methods proposed or employed).46 in response, a "reproducibility challenge" was created by the international conference on learning representations (iclr) to validate 2018 conference submissions and has continued in subsequent meetings.47 more rigorous replication, through the availability of all necessary components and the development of standards, will be important to this type of verification.48
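a minimal sketch of the replicability elements the cited analysis found missing (illustrative only, not drawn from that study): pin every source of randomness and publish a manifest of the software environment alongside the results, so a third party can rerun the experiment:

import json
import platform
import random

import numpy as np
import sklearn
from sklearn.ensemble import RandomForestClassifier

SEED = 42
random.seed(SEED)      # python's own rng
np.random.seed(SEED)   # numpy's rng
model = RandomForestClassifier(random_state=SEED)  # the estimator's rng

# a manifest published with the study lets others reproduce the run exactly
manifest = {
    "seed": SEED,
    "python": platform.python_version(),
    "numpy": np.__version__,
    "scikit-learn": sklearn.__version__,
}
print(json.dumps(manifest, indent=2))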
xai by ai

the inherent complexity and opacity of unsupervised learning or reinforcement learning suggests, as xai researcher trevor darrell puts it, "the solution to explainable ai is more ai."49 in this approach to explanation, oversight ai are positioned as intermediaries between an ai and its users:

workers have supervisors; businesses have accountants; schoolteachers have principals. we suggest that the time has come to develop ai oversight systems ("ai guardians") that will seek to ensure that the various smart machines will not stray from the guidelines their programmers have provided.50

while the prospect of ai guardians may be dystopic, oversight systems performing roles that validate, interrogate, and report are common in code-checking tools. generative adversarial networks (gans) have been used to create counterfactual explanations of another machine learning model to enhance explainability.51 with strategic organizational and staffing changes to enhance capabilities, libraries can design and deploy such oversight or adversarial tools with objectives appropriate to the requirements and norms of libraries and the academy.
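the gan-based counterfactual methods cited are elaborate, but the underlying idea can be sketched with a toy stand-in (a linear classifier and assumed data, not the cited technique): find a small change to an instance that flips the model's decision, and offer that change as the explanation ("had this value been different, the outcome would have differed"):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

x = X[0].copy()
print("original prediction:", model.predict([x])[0])

# for a linear model the flip can be computed exactly: move the most
# influential feature just far enough for the decision score to cross zero
j = int(np.argmax(np.abs(model.coef_[0])))
z = model.decision_function([x])[0]
x[j] += -1.05 * z / model.coef_[0][j]  # overshoot 5% past the boundary

print(f"counterfactual: feature {j} changed {X[0][j]:.2f} -> {x[j]:.2f}")
print("new prediction:", model.predict([x])[0])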
authorization

xai that results from authorizations is an area where public policy engagement is needed to ensure that xai, and machine learning, appropriately serve libraries, the academy, and the public at large. three examples are provided: codes and standards, regulation, and audit.

codes and standards

one approach to explanation, supported by the ai industry and professional organizations, is voluntary codes or standards that encourage explanatory capabilities. these nonbinding principles are a type of self-regulation and are widely promoted as a means of assurance.52 the association for computing machinery's statement on algorithms highlights seven principles as guides to system design and use: awareness; access and redress; accountability; explanation; data provenance; auditability; and validation and testing. however, the language used is tentative and conditional. designers are "encouraged" to provide explanations and to "encourage" a means for interrogation and auditing "where harm is suspected" (i.e., a post hoc process). despite this, the statement concludes with a strong position on accountability, if not explainability: "institutions should be held responsible for decisions made by the algorithms that they use, even if it is not feasible to explain in detail how the algorithms produce their results."53

unfortunately, the optimism for self-regulation in explainability is undercut by the poor experience with voluntary mechanisms regarding privacy protection.54 in addition, library associations, library system vendors, and scholarly publishers have been slow to endorse any codes or standards regarding explainability.

regulation

the most common recommendation for ai oversight and authorization to ensure explainability is the creation of a regulatory agency. specific suggestions include a "neutral data arbiter" with investigative powers like the us federal trade commission,55 a food and drug administration "for algorithms,"56 a standing "commission on artificial intelligence,"57 quasi-governmental agencies such as the council of europe,58 and a hybrid agency model combining certification and liability.59 such agencies would have legislated or delegated powers to investigate, certify, license, and arbitrate on matters relating to ai and algorithms, including their design, use, and effects. there are few calls for an international regulatory agency despite digitally porous national boundaries and the global reach of machine learning.60

that almost no such agencies have been created reveals the strength and influence of the large corporations responsible for developing and deploying most machine learning tools and systems.61 reports comparing regulatory approaches to ai among the european union, the united kingdom, the united states, and canada indicate significantly different approaches, but with most proceeding with a "light touch" to avoid competitive disadvantages in a multitrillion-dollar global marketplace.62 with the introduction of the draft artificial intelligence act, the eu became the first major jurisdiction to propose specific ai legislation.63 while the act is expansive on high-risk ai, it is silent on any notion of "explainable" ai, preferring to focus on the less specific idea of "trustworthy artificial intelligence." with this, the eu appears to retreat from the idea of explainability in the gdpr.

an exception to this inertia or backtracking is the development and use of algorithmic impact assessments in both governments and industry. these instruments help prospective users of an algorithmic decision-making system determine levels of explanatory requirements and standards to meet those requirements.64 canada has been a leader in this area, with a protocol covering use of these systems in the federal government.65

some identify due process as a possible, if limited, remedy for explainability.66 however, a landmark us case suggests otherwise. in state v. loomis, regarding the use of compas, an algorithmic sentencing system, the court ruled on the role of explanation in due process:67

the wisconsin supreme court held that a trial court's use of an algorithmic risk assessment in sentencing did not violate the defendant's due process rights even though the methodology used to produce the assessment was disclosed neither to the court nor to the defendant.68

the petition of the loomis case to the us supreme court was denied, so a higher court ruling on this issue is unavailable.69

advocacy for regulations regarding explainability should be a central concern for libraries. without strong regulatory oversight requiring disclosure and accountability, machine learning systems will remain black boxes, and the presence of these consequential systems in the lives of users will be obscured.

audit

a commonly recommended approach to ai oversight and explanation is third-party auditing.70 the use of audit and the principles of auditing are widely accepted in a variety of areas.71 in a library context, auditing of ai can be thought of as a reviewing process to achieve transparency or to determine product compliance. auditing is typically done after system implementation, but it can be accomplished at any stage. it is possible to audit design specifications, completed code, or cognitive models, or to conduct periodic audits of specific decisions.72 the keys to successful audit oversight are clear audit goals and objectives (e.g., what is being audited and for what purpose), acknowledged expertise of the auditors, authority of the auditors to recommend, and authorization of the auditors to investigate. any such auditing responsibility for xai would require the trust of stakeholders such as ai designers, government regulators, and industry representatives, as well as users themselves.

critics of the audit approach have focused on lack of auditor expertise, algorithmic complexity, and the need for approaches that assess the algorithmic system prior to its release.73 while most audit recommendations assume a public agency in this role, an innovative suggestion is a crowdsourced audit (a form of audit study that involves the recruitment of testers to anonymously assess an algorithmic system; an xai form of the "secret shopper").74 this approach resembles techniques used by consumer advocates and might signal the arrival of public activists in the xai arena.

the complexity of algorithms suggests that a precondition for an audit is "auditability."75 this would require that ai be designed in such a way that an audit is possible (i.e., inspectable in some manner) while, presumably, not impairing its predictive performance. sandvig et al. propose regulatory changes because "rather than regulating for transparency or misbehavior, we find this situation argues for 'regulation toward auditability'."76

auditing is not without its difficulties.
there are no industry standards for algorithmic auditing.77 a high-profile development was the recent launch of orcaa (orcaarisk.com), an algorithmic auditing company started by cathy o'neil, a data scientist who has written extensively about the perils of uncontrolled algorithms.78 however, the legitimacy of third-party auditing has been criticized as lacking public transparency and the capacity to demand change.79 while libraries may not be able to create their own auditing capacity, whether collectively or individually, they are encouraged to engage with the emerging algorithmic auditing community to shape auditing practices appropriate for scholarly communication.

xai as discovery

while xai is primarily a means to validate and authorize machine learning systems, another use of xai is gaining attention. since xai can find new information latent in large and complex datasets, discovery is promoted as "one of the most important achievements of the entire algorithmic explainability project."80 alkhateeb asks "can scientific discovery really be automated" while invoking the earlier work of swanson, who mined the medical literature for new knowledge by connecting seemingly unrelated articles through search.81 an emerging reason for libraries to adopt xai may be as a powerful discovery tool.

conclusion

our lives have become "algorithmically mediated,"82 and we are "dependent on computational spectacles to see the world."83 academic libraries are now sites where systems, services, and collections are increasingly shaped and provided by machine learning. the predictions, recommendations, and decisions of machine learning systems are powerful as well as consequential. however, "the danger is not so much in delegating cognitive tasks, but in distancing ourselves from—or in not knowing about—the nature and precise mechanisms of that delegation."84 taddeo notes that "delegation without supervision characterises the presence of trust."85 xai is an essential tool to build that trust.

geoffrey hinton, a central figure in the development of machine learning,86 argues that requiring an explanation from an ai system would be "a complete disaster" and that trust and acceptance should be based on the system's performance, not its explainability.87 this is consistent with the view of many that "if algorithms that cannot be easily explained consistently make better decisions in certain areas, then policymakers should not require an explanation."88 both these views are at odds with the tenets of critical thought and assessment, and both challenge norms of algorithmic accountability.

xai is a dual opportunity for libraries. on one hand, it is a set of techniques, processes, and strategies that enable the interrogation of the algorithmically driven resources that libraries provide to their users. on the other hand, it is a public policy arena where advocacy is necessary to promote and uphold the values of librarianship, the academy, and the public interest in the face of powerful new technologies. many disciplines have engaged with xai as machine learning has impacted their fields.89 xai has been called a "disruptive force" in lis,90 warranting the growing interest in how xai affects the field and how the field might influence it.
endnotes

1 vijay arya et al., "one explanation does not fit all: a toolkit and taxonomy of ai explainability techniques," arxiv:1909.03012 [cs, stat], 2019, http://arxiv.org/abs/1909.03012; shane t. mueller et al., "explanation in human-ai systems: a literature meta-review, synopsis of key ideas and publications, and bibliography for explainable ai," arxiv:1902.01876 [cs], 2019, http://arxiv.org/abs/1902.01876; ingrid nunes and dietmar jannach, "a systematic review and taxonomy of explanations in decision support and recommender systems," user modeling and user-adapted interaction 27, no. 3 (2017): 393–444, https://doi.org/10.1007/s11257-017-9195-0; gesina schwalbe and bettina finzel, "xai method properties: a (meta-) study," arxiv:2105.07190 [cs], 2021, http://arxiv.org/abs/2105.07190.

2 safiya noble, algorithms of oppression: how search engines reinforce racism (new york: new york university press, 2018); frank pasquale, the black box society: the secret algorithms that control money and information (cambridge, mass.: harvard university press, 2015); sara wachter-boettcher, technically wrong: sexist apps, biased algorithms, and other threats of toxic tech (new york: w. w. norton, 2017).

3 abeba birhane et al., "the values encoded in machine learning research," arxiv:2106.15590 [cs], 2021, http://arxiv.org/abs/2106.15590; taina bucher, if ... then: algorithmic power and politics (new york: oxford university press, 2018); sarah myers west, meredith whittaker, and kate crawford, discriminating systems: gender, race, and power in ai (ai now institute, 2019), https://ainowinstitute.org/discriminatingsystems.html.

4 rao aluri and donald e. riggs, "application of expert systems to libraries," ed. joe a. hewitt, advances in library automation and networking 2 (1988): 1–43; ryan cordell, machine learning + libraries: a report on the state of the field (washington dc: library of congress, 2020), https://labs.loc.gov/static/labs/work/reports/cordell-loc-ml-report.pdf; jason griffey, ed., "artificial intelligence and machine learning in libraries," library technology reports 55, no. 1 (2019), https://doi.org/10.5860/ltr.55n1; guoying liu, "the application of intelligent agents in libraries: a survey," program: electronic library and information systems 45, no. 1 (2011): 78–97, https://doi.org/10.1108/00330331111107411; linda c. smith, "artificial intelligence in information retrieval systems," information processing and management 12, no. 3 (1976): 189–222, https://doi.org/10.1016/0306-4573(76)90005-4.

5 jenny bunn, "working in contexts for which transparency is important: a recordkeeping view of explainable artificial intelligence (xai)," records management journal (london, england) 30, no. 2 (2020): 143–53, https://doi.org/10.1108/rmj-08-2019-0038; cordell, "machine learning + libraries"; andrew m. cox, the impact of ai, machine learning, automation and robotics on the information professions (cilip, 2021), http://www.cilip.org.uk/resource/resmgr/cilip/research/tech_review/cilip_–_ai_report_-_final_lo.pdf; daniel johnson, machine learning, libraries, and cross-disciplinary research: possibilities and provocations (notre dame, indiana: hesburgh libraries, university of notre dame, 2020), https://dx.doi.org/10.7274/r0-wxg0-pe06; sarah lippincott, mapping the current landscape of research library engagement with emerging technologies in research and learning (washington dc: association of research libraries, 2020), https://www.arl.org/wp-content/uploads/2020/03/2020.03.25-emerging-technologies-landscape-summary.pdf; thomas padilla, responsible operations. data science, machine learning, and ai in libraries (dublin, oh: oclc research, 2019), https://doi.org/10.25333/xk7z-9g97; michael ridley, "explainable artificial intelligence," research library issues, no. 299 (2019): 28–46, https://doi.org/10.29242/rli.299.3.

6 ridley, "explainable artificial intelligence," 42.

7 bunn, "working in contexts for which transparency is important," 151.

8 sebastian palacio et al., "xai handbook: towards a unified framework for explainable ai," arxiv:2105.06677 [cs], 2021, http://arxiv.org/abs/2105.06677; sahil verma et al., "pitfalls of explainable ml: an industry perspective," in mlsys journe workshop, 2021, http://arxiv.org/abs/2106.07758; giulia vilone and luca longo, "explainable artificial intelligence: a systematic review," arxiv:2006.00093 [cs], 2020, http://arxiv.org/abs/2006.00093.

9 wojciech samek and klaus-robert muller, "towards explainable artificial intelligence," in explainable ai: interpreting, explaining and visualizing deep learning, ed. wojciech samek et al., lecture notes in artificial intelligence 11700 (cham: springer international publishing, 2019), 17.

10 mueller et al., "explanation in human-ai systems."

11 isto huvila et al., "information behavior and practices research informing information systems design," journal of the association for information science and technology, 2021, 1–15, https://doi.org/10.1002/asi.24611.

12 darpa, explainable artificial intelligence (xai) (arlington, va: darpa, 2016), http://www.darpa.mil/attachments/darpa-baa-16-53.pdf.

13 matt turek, "explainable artificial intelligence (xai)," darpa, https://www.darpa.mil/program/explainable-artificial-intelligence.

14 julie gerlings, arisa shollo, and ioanna constantiou, "reviewing the need for explainable artificial intelligence (xai)," in proceedings of the hawaii international conference on system sciences, 2020, http://arxiv.org/abs/2012.01007.

15 william j. clancey, "the epistemology of a rule-based expert system—a framework for explanation," artificial intelligence 20, no. 3 (1983): 215–51, https://doi.org/10.1016/0004-3702(83)90008-5; william swartout, "xplain: a system for creating and explaining expert consulting programs," artificial intelligence 21 (1983): 285–325; william swartout, cecile paris, and johanna moore, "design for explainable expert systems," ieee expert-intelligent systems & their applications 6, no. 3 (1991): 58–64, https://doi.org/10.1109/64.87686.

16 european union, "regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016," 2016, http://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:32016r0679.

17 lilian edwards and michael veale, "slave to the algorithm? why a 'right to explanation' is probably not the remedy you are looking for," duke law & technology review 16 (2017): 18–84; bryce goodman and seth flaxman, "european union regulations on algorithmic decision making and a 'right to explanation'," ai magazine 38, no. 3 (2017): 50–57, https://doi.org/10.1609/aimag.v38i3.2741; margot e. kaminski, "the right to explanation, explained," berkeley technology law journal 34, no. 1 (2019): 189–218, https://doi.org/10.15779/z38td9n83h; sandra wachter, brent mittelstadt, and luciano floridi, "why a right to explanation of automated decision-making does not exist in the general data protection regulation," international data privacy law 7, no. 2 (2017): 76–99, https://doi.org/10.1093/idpl/ipx005.

18 amina adadi and mohammed berrada, "peeking inside the black-box: a survey on explainable artificial intelligence (xai)," ieee access 6 (2018): 52138–60, https://doi.org/10.1109/access.2018.2870052; mueller et al., "explanation in human-ai systems"; vilone and longo, "explainable artificial intelligence."

19 schwalbe and finzel, "xai method properties."

20 or biran and courtenay cotton, "explanation and justification in machine learning: a survey" (international joint conference on artificial intelligence, workshop on explainable artificial intelligence (xai), melbourne, 2017), http://www.cs.columbia.edu/~orb/papers/xai_survey_paper_2017.pdf.

21 padilla, responsible operations.

22 jenna burrell and marion fourcade, "the society of algorithms," annual review of sociology 47, no. 1 (2021): 231, https://doi.org/10.1146/annurev-soc-090820-020800.

23 nick seaver, "seeing like an infrastructure: avidity and difference in algorithmic recommendation," cultural studies 35, no. 4–5 (2021): 775, https://doi.org/10.1080/09502386.2021.1895248.

24 frank c. keil, "explanation and understanding," annual review of psychology 57 (2006): 227–54, https://doi.org/10.1146/annurev.psych.57.102904.190100.

25 donald a. norman, "some observations on mental models," in mental models, ed. dedre gentner and albert l. stevens (new york: psychology press, 1983), 7–14.

26 ashraf abdul et al., "trends and trajectories for explainable, accountable, and intelligible systems: an hci research agenda," in proceedings of the 2018 chi conference on human factors in computing systems, chi '18 (new york: acm, 2018), 582:1–582:18, https://doi.org/10.1145/3173574.3174156; joachim diederich, "methods for the explanation of machine learning processes and results for non-experts," psyarxiv, 2018, https://doi.org/10.31234/osf.io/54eub.

27 pigi kouki et al., "user preferences for hybrid explanations," in proceedings of the eleventh acm conference on recommender systems, recsys '17 (new york, ny: acm, 2017), 84–88, https://doi.org/10.1145/3109859.3109915.

28 tim miller, "explanation in artificial intelligence: insights from the social sciences," artificial intelligence 267 (2019): 3, https://doi.org/10.1016/j.artint.2018.07.007.

29 jenna burrell, "how the machine 'thinks': understanding opacity in machine learning algorithms," big data & society 3, no. 1 (2016), https://doi.org/10.1177/2053951715622512.

30 duri long and brian magerko, "what is ai literacy? competencies and design considerations," in proceedings of the 2020 chi conference on human factors in computing systems, chi '20 (honolulu, hi: association for computing machinery, 2020), 2, https://doi.org/10.1145/3313831.3376727.

31 michael ridley and danica pawlick-potts, "algorithmic literacy and the role for libraries," information technology and libraries 40, no. 2 (2021), https://doi.org/10.6017/ital.v40i2.12963.

32 waddah saeed and christian omlin, "explainable ai (xai): a systematic meta-survey of current challenges and future opportunities," arxiv:2111.06420 [cs], 2021, http://arxiv.org/abs/2111.06420.

33 shane t. mueller et al., "principles of explanation in human-ai systems" (explainable agency in artificial intelligence workshop, aaai 2021), http://arxiv.org/abs/2102.04972.

34 sebastian bach et al., "on pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation," plos one 10, no. 7 (2015): e0130140, https://doi.org/10.1371/journal.pone.0130140; biran and cotton, "explanation and justification in machine learning: a survey"; chris brinton, "a framework for explanation of machine learning decisions" (ijcai-17 workshop on explainable ai (xai), melbourne: ijcai, 2017), http://www.intelligentrobots.org/files/ijcai2017/ijcai-17_xai_ws_proceedings.pdf; chris olah, alexander mordvintsev, and ludwig schubert, "feature visualization," distill, november 7, 2017, https://doi.org/10.23915/distill.00007.
35 edwards and veale, “slave to the algorithm?” 36 philip adler et al., “auditing black-box models for indirect influence,” knowledge and information systems 54 (2018): 95–122, https://doi.org/10.1007/s10115-017-1116-3. 37 alisa bokulich, “how scientific models can explain,” synthese 180, no. 1 (2011): 33–45, https://doi.org/10.1007/s11229-009-9565-1; keil, “explanation and understanding.” 38 herbert a. simon, “what is an ‘explanation’ of behavior?,” psychological science 3, no. 3 (1992): 150–61, https://doi.org/10.1111/j.1467-9280.1992.tb00017.x. 39 norbert schwarz et al., “ease of retrieval as information: another look at the availability heuristic,” journal of personality and social psychology 61, no. 2 (1991): 195–202, https://doi.org/10.1037/0022-3514.61.2.195; paul thagard, “evaluating explanations in law, science, and everyday life,” current directions in psychological science 15, no. 3 (2006): 141– 45, https://doi.org/10.1111/j.0963-7214.2006.00424.x. https://doi.org/10.1016/j.artint.2018.07.007 https://doi.org/10.1177/2053951715622512 https://doi.org/10.1145/3313831.3376727 https://doi.org/doi.org/10.6017/ital.v40i2.12963 http://arxiv.org/abs/2111.06420 http://arxiv.org/abs/2102.04972 https://doi.org/10.1371/journal.pone.0130140 http://www.intelligentrobots.org/files/ijcai2017/ijcai-17_xai_ws_proceedings.pdf https://doi.org/10.23915/distill.00007 https://doi.org/10.1007/s10115-017-1116-3 https://doi.org/10.1007/s11229-009-9565-1 https://doi.org/10.1111/j.1467-9280.1992.tb00017.x https://doi.org/10.1037/0022-3514.61.2.195 https://doi.org/10.1111/j.0963-7214.2006.00424.x information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 14 40 tania lombrozo, “explanatory preferences shape learning and inference,” trends in cognitive sciences 20, no. 10 (2016): 756, https://doi.org/10.1016/j.tics.2016.08.001. 41 sarah tan et al., “detecting bias in black-box models using transparent model distillation,” arxiv:1710.06169 [cs, stat], november 18, 2017, http://arxiv.org/abs/1710.06169. 42 marco tulio ribeiro, sameer singh, and carlos guestrin, “model-agnostic interpretability of machine learning,” arxiv:1606.05386 [cs, stat], 2016, http://arxiv.org/abs/1606.05386. 43 beta writer, lithium-ion batteries: a machine-generated summary of current research (heidelberg: springer nature, 2019), https://link.springer.com/book/10.1007/978-3-03016800-1. 44 henning schoenenberger, christian chiarcos, and niko schenk, preface to lithium-ion batteries; a machine-generated summary of current research, by beta writer, (heidelberg: springer international publishing, 2019). 45 michael ridley, “machine information behaviour,” in the rise of ai: implications and applications of artificial intelligence in academic libraries, ed. sandy hervieux and amanda wheatley (association of college and university libraries, 2022). 46 babatunde kazeem olorisade, pearl brereton, and peter andras, “reproducibility of studies on text mining for citation screening in systematic reviews: evaluation and checklist,” journal of biomedical informatics 73 (2017): 1–13, https://doi.org/10.1016/j.jbi.2017.07.010; babatunde k. olorisade, pearl brereton, and peter andras, “reproducibility in machine learning-based studies: an example of text mining,” in reproducibility in ml workshop (international conference on machine learning, sydney, australia, 2017), https://openreview.net/pdf?id=by4l2pbq-. 
47 joelle pineau, “reproducibility challenge,” october 6, 2017, http://www.cs.mcgill.ca/~jpineau/iclr2018-reproducibilitychallenge.html. 48 benjamin haibe-kains et al., “transparency and reproducibility in artificial intelligence,” nature 586, no. 7829 (2020): e14–e16, https://doi.org/10.1038/s41586-020-2766-y; benjamin j. heil et al., “reproducibility standards for machine learning in the life sciences,” nature methods, august 30, 2021, https://doi.org/10.1038/s41592-021-01256-7. 49 cliff kuang, “can a.i. be taught to explain itself?,” the new york times magazine, november 21, 2017, 50, https://nyti.ms/2hr1s15. 50 amitai etzioni and oren etzioni, “incorporating ethics into artificial intelligence,” the journal of ethics 21, no. 4 (2017): 403–18, https://doi.org/10.1007/s10892-017-9252-2. 51 kamran alipour et al., “improving users’ mental model with attention-directed counterfactual edits,” applied ai letters, 2021, e47, https://doi.org/10.1002/ail2.47. 52 association for computing machinery, statement on algorithmic transparency and accountability (new york: acm, 2017), http://www.acm.org/binaries/content/assets/publicpolicy/2017_joint_statement_algorithms.pdf; alex campolo et al., ai now 2017 report (new https://doi.org/10.1016/j.tics.2016.08.001 http://arxiv.org/abs/1710.06169 http://arxiv.org/abs/1606.05386 https://link.springer.com/book/10.1007/978-3-030-16800-1 https://link.springer.com/book/10.1007/978-3-030-16800-1 https://doi.org/10.1016/j.jbi.2017.07.010 https://openreview.net/pdf?id=by4l2pbqhttp://www.cs.mcgill.ca/~jpineau/iclr2018-reproducibilitychallenge.html https://doi.org/10.1038/s41586-020-2766-y https://doi.org/10.1038/s41592-021-01256-7 https://nyti.ms/2hr1s15 https://doi.org/10.1007/s10892-017-9252-2 https://doi.org/10.1002/ail2.47 http://www.acm.org/binaries/content/assets/public-policy/2017_joint_statement_algorithms.pdf http://www.acm.org/binaries/content/assets/public-policy/2017_joint_statement_algorithms.pdf information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 15 york: ai now institute, 2017); ieee, ethically aligned design: a vision for prioritizing human wellbeing with artificial intelligence and autonomous systems (new york: ieee, 2019), https://standards.ieee.org/content/dam/ieeestandards/standards/web/documents/other/ead1e.pdf. 53 association for computing machinery, statement on algorithmic transparency and accountability, 2. 54 lilian edwards and michael veale, “enslaving the algorithm: from a ‘right to an explanation’ to a ‘right to better decisions’?,” ieee security & privacy 16, no. 3 (2018): 46–54. 55 kate crawford and jason schultz, “big data and due process: toward a framework to redress predictive privacy harms,” boston college law review 55, no. 1 (2014): 93–128. 56 andrew tutt, “an fda for algorithms,” administrative law review 69, no. 1 (2017): 83–123. 57 corinne cath et al., “artificial intelligence and the ‘good society’: the us, eu, and uk approach,” science and engineering ethics, march 28, 2017, https://doi.org/10.1007/s11948-017-9901-7. 58 edwards and veale, “slave to the algorithm?” 59 matthew u. scherer, “regulating artificial intelligence systems: risks, challenges, competencies, and strategies,” harvard journal of law & technology 29, no. 2 (2016): 353– 400. 60 roger brownsword, “from erewhon to alphago: for the sake of human dignity, should we destroy the machines?,” law, innovation and technology 9, no. 1 (january 2, 2017): 117–53, https://doi.org/10.1080/17579961.2017.1303927. 
61 birhane et al., “the values encoded in machine learning research”; ana brandusescu, artificial intelligence policy and funding in canada: public investments, private interests (montreal: centre for interdisciplinary research on montreal, mcgill university, 2021). 62 cath et al., “artificial intelligence and the ‘good society’”; law commission of ontario and céline castets-renard, comparing european and canadian ai regulation, 2021, https://www.lcocdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulationfinal-november-2021.pdf. 63 european commission, “artificial intelligence act,” 2021, https://eur-lex.europa.eu/legalcontent/en/txt/?uri=celex:52021pc0206. 64 dillon reisman et al., algorithmic impact assessment: a practical framework for public agency accountability (new york: ai now institute, 2018), https://ainowinstitute.org/aiareport2018.pdf. 65 treasury board of canada secretariat, “directive on automated decision-making,” 2019, http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf https://doi.org/10.1007/s11948-017-9901-7 https://doi.org/10.1080/17579961.2017.1303927 https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:52021pc0206 https://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:52021pc0206 https://ainowinstitute.org/aiareport2018.pdf http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 16 66 danielle keats citron and frank pasquale, “the scored society: due process for automated predictions,” washington law review 89 (2014): 1–33; scherer, “regulating artificial intelligence systems.” 67 julia angwin et al., “machine bias,” propublica, may 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. 68 “state v. loomis,” harvard law review 130, no. 5 (2017), https://harvardlawreview.org/2017/03/state-v-loomis/. 69 “loomis v. wisconsin,” scotusblog, june 26, 2017, http://www.scotusblog.com/casefiles/cases/loomis-v-wisconsin/. 70 brownsword, “from erewhon to alphago”; campolo et al., ai now 2017 report; ieee, ethically aligned design; pasquale, the black box society: the secret algorithms that control money and information; wachter, mittelstadt, and floridi, “why a right to explanation.” 71 michael power, the audit society: rituals of verification (oxford: oxford university press, 1997). 72 alfred ng, “can auditing eliminate bias from algorithms?,” the markup, february 23, 2021, https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-fromalgorithms. 73 joshua alexander knoll, “accountable algorithms” (phd diss, princeton university, 2015). 
74 christian sandvig et al., “auditing algorithms: research methods for detecting discrimination on internet platforms,” data and discrimination: converting critical concerns into productive inquiry, 2014, http://wwwpersonal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20-%20ica%202014%20data%20and%20discrimination%20preconference.pdf. 75 association for computing machinery, statement on algorithmic transparency and accountability. 76 sandvig et al., “auditing algorithms,” 17. 77 ng, “can auditing eliminate bias from algorithms?” 78 cathy o’neil, weapons of math destruction: how big data increases inequality and threatens democracy (new york: crown, 2016). 79 emanuel moss et al., assembling accountability: algorithmic impact assessment for the public interest (data & society, 2021), https://datasociety.net/wpcontent/uploads/2021/06/assembling-accountability.pdf. 80 david s. watson and luciano floridi, “the explanation game: a formal framework for interpretable machine learning,” synthese (dordrecht) 198, no. 10 (2020): 9214, https://doi.org/10.1007/s11229-020-02629-9. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing https://harvardlawreview.org/2017/03/state-v-loomis/ http://www.scotusblog.com/case-files/cases/loomis-v-wisconsin/ http://www.scotusblog.com/case-files/cases/loomis-v-wisconsin/ https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-from-algorithms https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-from-algorithms http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf https://datasociety.net/wp-content/uploads/2021/06/assembling-accountability.pdf https://datasociety.net/wp-content/uploads/2021/06/assembling-accountability.pdf https://doi.org/10.1007/s11229-020-02629-9 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 17 81 ahmed alkhateeb, “science has outgrown the human mind and its limited capacities,” aeon, april 24, 2017, https://aeon.co/ideas/science-has-outgrown-the-human-mind-and-its-limitedcapacities; don r. swanson, “undiscovered public knowledge,” the library quarterly 56, no. 2 (1986): 103–18; don r. swanson, “medical literature as a potential source of new knowledge.,” bulletin of the medical library association 78, no. 1 (1990): 29–37. 82 jack anderson, “understanding and interpreting algorithms: toward a hermeneutics of algorithms,” media, culture & society 42, no. 7–8 (2020): 1479–94, https://doi.org/10.1177/0163443720919373. 83 ed finn, “algorithm of the enlightenment,” issues in science and technology 33, no. 3 (2017): 24. 84 jos de mul and bibi van den berg, “remote control: human autonomy in the age of computermediated agency,” in law, human agency, and autonomic computing, ed. mireille hildebrandt and antoinette rouvroy (abingdon: routledge, 2011), 59. 85 mariarosaria taddeo, “trusting digital technologies correctly,” minds and machines 27, no. 4 (2017): 565, https://doi.org/10.1007/s11023-017-9450-5. 
president's message: lita forever
andrew k. pace
information technology and libraries | june 2009

andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services, at oclc inc. in dublin, ohio.

i was warned when i started my term as lita president that my time at the helm would seem fleeting in retrospect, and i didn't believe it. i should have. i suppose most advice of that sort falls on deaf ears—advice to children about growing up, advice to newlyweds, advice to new parents. some things you just have to experience. now i am left with that feeling of having worked very hard while not accomplishing nearly enough. it's time to buy myself some more time.

my predecessor, mark beatty, likes to jokingly introduce himself in ala circles as "lita has-been" in reference to his role as lita past-president. i say jokingly because he and i both know it is not true. not only does the past-president continue in an active role on the lita board and executive committee, the past-president has the daunting task of acting as the division's financial officer. just as mark knows well the nature of this elected (but still volunteer) commitment, so michelle frisque, my successor this july, knows that the hard work started as vice-president/president-elect has two challenging years ahead. being elected lita president is for all intents and purposes a three-year term with shifting responsibilities. add to this the possibility of serving on the board beforehand, and it's likely that one could serve less time for knocking over a liquor store. i'm joking, of course—there's nothing punitive about being a lita officer; it's as rewarding as it is challenging.

neither is this intended to be a self-congratulatory screed as my last hurrah in print as lita president. i've referred repeatedly to the grassroots success of lita's board, interest groups, dedicated committees, and engaged volunteers.
the flatness of our division is often emulated by others. i thoroughly enjoy engagement with the lita membership, face-to-face and virtual recruitment of new members and volunteers, and group meetings to discuss moving lita forward. i love that lita is fun. fun and enjoyment, coupled with my dedication to the profession that i love, is why i plan to make the most of my time, even as a has-been.

all those meetings, all that bureaucracy? well, believe it or not, i like the bureaucracy—process works when you learn to work the process—and all those meetings have actually created some excellent feedback for the lita board. changes in ala, changes in the membership, and changes suggested by committees and interest groups all suggest . . . guess what? change.

"change" has been a popular theme these days. i'm in that weird minority of people who does not believe that people don't like to change. i think if the ideas are good, if the destination is worthwhile, then change is possible and even desirable. i'm always geared up for change, for learning from our mistakes, for asking forgiveness on occasion and for permission even less. this is a long-winded way of saying that i think lita is ready for some change: change to the board, change to the committees and interest groups, and changes to our interactions with lita and ala staff. i think ala and the other divisions are anxious for change as well, and i feel confident that lita and its membership can help, even while we change ourselves. don't ask me today what the details of these changes are. all i can say is that i will be there for them, help see them through, and will be there on the other side to assess which changes worked and which didn't.

one thing i hope does not change is the passion and dedication of the leaders, volunteers, and members of this great organization. i only hope that our ranks grow, even in times of financial uncertainty. lita provides a valuable network of colleagues and friends—this network is always valuable, but it is indispensable in times of difficulty. for many, lita represents a second or third divisional membership, but for networking and collegial support, i think we are second to none. i titled my previous column "lita now." i think it's safe for me to say now, "lita forever."

president's message: doing something about life's persistent problems?
mark beatty
information technology and libraries | march 2008

mark beatty (mbeatty@wils.wisc.edu) is lita president 2007/2008 and trainer, wisconsin library services, madison.

currently we librarians seem to be hitching our wagon to the idea of library as community because in part it's what we ourselves want. we've seen that our lita members want more community from our association, so it makes sense to us that our patrons also want community. it's what pew, oclc, and other studies seem to be telling us. the business-wired side of the world is breaking its back to create every form of virtual community it can think of as quickly as possible. apply the appropriate amounts of marketing and then our patrons want those things and expect them from all of their historically important community resources, the library being a prime player in that group. so we strive and strive and strive to not only provide the standard-issue face-to-face community we've always created, but to also create that new highly desired virtual community. either we create a library-specific version, or we at the very least create a way for our patrons to access those communities. hopefully, when our patrons step into those virtual communities, we work to make it possible for them to find libraries there, too. all well and good, but do we have a plan? what's the goal?
what's the end achievement? if, as studies say, patrons with a research need turn to libraries first only one percent of the time, and instead first hit up friends and family fifty or more percent of the time, then where is our significance and place in either the physical or virtual spaces? we know we serve significant numbers in many ways. we have gate counts, circulation records, holds placed, warm bodies in the building—all manner of indicators that show a well-managed and -marketed library is in demand and appreciated. as we run into the terrible head-on crash of community and technology, willy-nilly doing absolutely everything we can to accommodate everyone and everything, because we're librarians and library technologists and that's what we do, do we really have a clue why we're doing it? all fodder for deep thought and many lattes or beers and late-night discussions.

on the lita side, though, we're embarking on doing something about this knot when it comes to serving our members. under the guidance of past-president bonnie postlethwaite we've established an assessment and research committee co-chaired by bonnie and diane bisom. to kick off the committee activities and to help them establish an agenda and direction, lita hired the research firm the wedewer group to work with the lita board and the new committee. stay tuned for reports and announcements from this committee as it works to find answers to some of those questions. and have that latte with a lita colleague as you seek to find some answers yourself. it's all part of building community.

wikiwikiwebs: new ways to communicate in a web environment / brenda chawner and paul h. lewis. information technology and libraries, mar 2006, 25(1), pg. 33.

examining attributes of open standard file formats for long-term preservation and open access
eun g. park and sam oh
information technology and libraries | december 2012

eun g. park (eun.park@mcgill.ca) is associate professor, school of information studies, mcgill university, montreal, canada. sam oh (samoh@skku.edu) is corresponding author and professor, department of library and information science, sungkyunkwan university, seoul, korea.

abstract

this study examines the attributes that have been used to assess file formats in the literature and compiles the most frequently used attributes of file formats to establish open-standard file-format-selection criteria.
a comprehensive review was undertaken to identify the current knowledge regarding file-format-selection criteria. the findings indicate that the most common criteria can be categorized into five major groups: functionality, metadata, openness, interoperability, and independence. these attributes appear to be closely related. additional attributes include presentation, authenticity, adoption, protection, preservation, reference, and others.

introduction

file format is one of the core issues in the fields of digital content management and digital preservation. as many different types of file formats are available for texts, images, graphs, audio recordings, videos, databases, and web applications, the selection of appropriate file formats poses an ongoing challenge to libraries, archives, and other cultural heritage institutions. some file formats appear to be more widely accepted: tagged image file format (tiff), portable document format (pdf), pdf/a, office open xml (ooxml), and open document format (odf), to name a few. many institutions, including the library of congress (lc), possess guidelines on file format applications for long-term preservation strategies that specify requisite characteristics of acceptable file formats (e.g., they are independent of specific operating systems, are independent of hardware and software functions, conform to international standards, etc.).1 the format descriptions database of the global digital format registry is an effort to maintain a detailed representation of information and sustainability factors for as many file formats as possible (the pronom technical registry is another such database).2
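registries like pronom and the gdfr resolve a file to a format largely by matching internal signatures rather than trusting its name. as a rough sketch of that idea (the signature table below is a tiny illustrative subset, not pronom's actual data or matching algorithm), a few well-known magic numbers suffice to tell apart several of the formats named above:

```python
# toy signature-based format identification, in the spirit of registry
# lookups such as pronom. the table lists a few well-known magic numbers
# only; a real registry holds thousands of signatures and match rules.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"II*\x00", "tiff (little-endian)"),
    (b"MM\x00*", "tiff (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "jpeg 2000 (jp2)"),
]

def identify(path: str) -> str:
    """return a coarse format label from the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"
```

extension-based guesses fail silently when files are renamed; leading-byte signatures are one reason registries treat the byte stream, not the filename, as the evidence of format.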
despite these developments, file format selection remains a complex task and prompts many questions that range from the general ("which selection criteria are appropriate?") to the more specific ("are these international standard file formats sufficient for us to ensure long-term preservation and access?" or "how should we define and implement standard file formats in harmony with our local context?"). in this study, we investigate the definitions and features of standard file formats and examine the major attributes used to assess file formats. we discuss relevant issues from the viewpoint of open-standard file formats for long-term preservation and open access.

background on standard file formats

the term file format is generally defined as that which "specifies the organization of information at some level of abstraction, contained in one or more byte streams that can be exchanged between systems."3 according to interpares 2, file format is "the organization of data within files, usually designed to facilitate the storage, retrieval, processing, presentation, and/or transmission of the data by software."4 the premis data dictionary for preservation metadata observes that, technically, file format is "a specific, pre-established structure for the organization of a digital file or bitstream."5 in general, file formats can be divided into two types: access formats and preservation formats.

an access format is "suitable for viewing a document or doing something with it so that users access the on-the-fly converted access formats."6 in comparison, a preservation format is "suitable for storing a document in an electronic archive for a long period"7; it provides "the ability to capture the material into the archive and render and disseminate the information now and in the future."8 while long-term preservation turns on the sustainability of preservation formats, a document in its access format must above all be accessible and available to users, presumably all of the time.

many researchers have discussed file formats and long-term preservation in relation to various types of resources. for example, folk and barkstrom describe and adopt several attributes of file formats that may affect the long-term preservation of scientific and engineering data (e.g., the ease of archival storage, ease of archival access, usability, data scholarship enablement, support for data integrity, and maintainability and durability of file formats).9 barnes suggests converting word processing documents in digital repositories, which are unsuitable for long-term storage, into a preservation format.10 the evaluation by rauch, krottmaier, and tochtermann illustrates the practical use of file formats for 3d objects in terms of long-term reliability.11

others have developed and/or applied numerous criteria in different settings. for instance, sullivan uses a list of desirable properties of a long-term preservation format to explain the purpose of pdf/a from an archival and records management perspective.12 sullivan cites device independence, self-containment, self-describing, transparency, accessibility, disclosure, and adoption as such properties. rauch, krottmaier, and tochtermann's study applies criteria that consist of technical characteristics (e.g., open specification, compatibility, and standardization) and market characteristics (e.g., guarantee duration, support duration, market penetration, and the number of independent producers). rog and van wijk propose a quantifiable assessment method to calculate composite scores of file formats.13 they identify seven main categories of criteria: openness, adoption, complexity, technical protection mechanism, self-documentation, robustness, and dependencies.
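a composite score of this kind reduces to a weighted sum over the seven categories. the sketch below only illustrates the style of calculation; the weights and the per-format scores are invented for the example and are not taken from rog and van wijk's paper:

```python
# illustrative composite scoring over rog and van wijk's seven categories.
# weights and scores are hypothetical; the method, not the numbers, is the point.
WEIGHTS = {
    "openness": 3,
    "adoption": 2,
    "complexity": 1,
    "technical protection mechanism": 2,
    "self-documentation": 1,
    "robustness": 2,
    "dependencies": 2,
}

def composite_score(scores: dict) -> float:
    """weighted sum of per-category scores (each 0-2), scaled to 0-100."""
    best = sum(2 * w for w in WEIGHTS.values())
    got = sum(scores.get(cat, 0) * w for cat, w in WEIGHTS.items())
    return 100 * got / best

# hypothetical assessment of one candidate format
pdf_a = {"openness": 2, "adoption": 2, "complexity": 0,
         "technical protection mechanism": 1, "self-documentation": 2,
         "robustness": 1, "dependencies": 1}
print(f"pdf/a: {composite_score(pdf_a):.0f}/100")
```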
sahu focuses on the criteria developed by the uk's national archives, which include open standards, ubiquity, stability, metadata support, feature set, interoperability, and viability.14 a more comprehensive evaluation by the lc reveals three components—technical factors, quality, and functionality—while placing a particular emphasis on the balance between the first two.15 hodge and anderson use seven criteria for sustainability, similar to the technical factors of the lc study: disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms.16

some institutions adopt another term, standard file formats, to differentiate accepted and recommended file formats from others. according to the david project, "standard file formats owe their status to (official) initiatives for standardizing or to their widespread use."17 standard may be too general a word to specify the elements of file formats. however, there is a recognition that only those file formats accepted and recommended by national or international standards organizations (such as the international standardization organization [iso], the international imaging industry association [i3a], the www consortium, etc.) are genuine standard file formats. for example, iso has announced several standard file formats for images: tiff/it (iso 12639:2004), png (iso/iec 15948:2004), and jpeg 2000 (iso/iec 15444:2003, 2004, 2005, 2007, 2008). for document file formats, pdf/a-1 (iso 19005-1, document file format for long-term preservation) is one example. this format is proprietary to maintain archival and records-management requirements and to preserve the visual appearance and migration needs of electronic documents. office open xml file format (iso/iec 29500-1:2008, information technology—document description and processing languages) is another open standard that can be implemented from microsoft office applications on multiple platforms. odf (iso/iec 26300:2006, information technology—open document format for office applications [opendocument] v1.0) is an xml-based open file format.

regardless of iso-announced standards, some errors in these file formats have been reported. for example, although pdf/a-1 is intended for long-term preservation of and access to documents, studies reveal that the feature-rich nature of pdf can create difficulties in preserving pdf information over time.18 to overcome the barriers of pdf and pdf/a-1, xml technology seems prevalent for digital resources in archiving systems and digital preservation.19 the digital repository community is treating xml technology as a panacea and converting most of its digital resources to xml.

the netherlands institute for scientific information service (nisis) adopts another noteworthy definition of standard file formats. it observes that standard image file formats "are widely accepted, have freely available specifications, are highly interoperable, incorporate no data compression and are capable of supporting preservation metadata."20 this definition implies specific and advanced ramifications for cost-free interoperability and metadata, which closely relate to open access.

open standard is another relevant term to consider in file formats. although perspectives vary greatly between researchers, open standards can be acquired and used without any barrier or cost.21 in other words, open standard products are free from restrictions, such as patents, and are independent of proprietary hardware or software. since the 1990s, open standard has been broadly adopted in many fields and is now an almost compulsory feature in information services.
to follow the national archives' definition, open standard formats are "formats for which the technical specifications have been made available in the public domain."22 in comparison, folk and barkstrom approach open standards from an institutional support perspective, relying on user communities for standards that are widely available and used.23 on a more specific level, stanescu emphasizes independence as the basic selection criterion for file formats.24 others, such as todd, propose determining whether one standard is more open than another by applying criteria: adoption, platform independence, disclosure, transparency, and metadata support.25 other factors considered by todd include reusability and interoperability; robustness, complexity, and viability; stability; and intellectual property (ip) and rights management.26 echoing the lc, hodge and anderson also suggest a list of selection criteria grouped under the banner of "technical factors": disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms.27

researchers agree that open standard file formats are less prone to obsolescence and more reliable than proprietary formats.28 yet close examination of the nisis definition mentioned above reveals that standard file formats are in reality not free, nor do they allow unrestricted access to resources. the three file formats that iso has announced (pdf/a, ooxml, and odf) are proprietary and sometimes costly. they also require the purchase of access to a proprietary standard, although there is an assumption that a standard should be free from legal and financial restrictions. the iso-announced file formats, in short, are only standard file formats, not open standard file formats.

for cultural heritage institutions, questions regarding appropriate selection criteria and the sufficiency of existing international standard file formats for long-term preservation and access remain unanswered. there exists neither a uniform method to compare the specifications of different file formats nor an objective approach to assess format specifications that would ensure long-term preservation and persistent access.

objectives of the study

in this study, we attempt to better define and establish open-standard file-format-selection criteria. to that end, we assess and compile the most frequently used attributes of file formats.

method

we performed a comprehensive review of published articles, institutional reports, and other literature to identify the current knowledge regarding file-format-selection criteria. we included literature that deals with the three standard file formats (pdf, pdf/a, and xml) but excluded the recently announced odf format due to the scarcity of literature on it. of the more than thirty articles initially reviewed, only the twenty-five that use their own clear attributes were included in this study. all of the attributes employed are listed by frequency and grouped according to similarities in meaning (see appendix). the original definitions or descriptions are listed in the second column, and the file formats assessed by each attribute are listed in the third column. where a study gives attributes without specific definitions or descriptions, "no definite term" is inserted.
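the compilation step itself (tallying how often each attribute appears across the reviewed studies and folding near-synonyms into a single criterion) is mechanical; here is a small sketch in which both the sample mentions and the synonym map are invented for illustration:

```python
# sketch of the compilation step: count attribute mentions across studies,
# folding near-synonyms into one criterion. sample data and the synonym
# map are invented; the appendix records the real extractions.
from collections import Counter

SYNONYMS = {
    "metadata support": "metadata",
    "self-documentation": "metadata",
    "self-describing": "metadata",
    "open availability": "openness",
    "disclosure": "openness",
    "platform independence": "independence",
    "no external dependencies": "independence",
}

mentions = [  # (study, attribute as named in that study)
    ("rog & van wijk", "openness"),
    ("rog & van wijk", "self-documentation"),
    ("sahu", "metadata support"),
    ("sullivan", "disclosure"),
    ("todd", "platform independence"),
    ("stanescu", "independence"),
]

counts = Counter(SYNONYMS.get(attr, attr) for _, attr in mentions)
for criterion, n in counts.most_common():
    print(f"{criterion}: {n} mention(s)")
```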
findings

as illustrated in the appendix, the criteria identified by the studies vary. although the requirements and contexts of the studies differ, the most common criteria can be divided into five categories: functionality, metadata, openness, interoperability, and independence.

first, functionality refers to the ability of a format to do exactly what it is supposed to do.29 it is important to distinguish between two broad uses: preservation of document structure and formatting, and preservation of usable content. to preserve document formatting, a "published view" of a given piece of content is critical for distribution. other content, such as database information or device-specific documents, needs to be preserved as well. functionality criteria include various attributes related to format and structure or to the physical and technical specifications of files (e.g., robustness, feature set, viability, color maintenance, clarity, compactness, modularity, compression algorithms, etc.).

second, metadata indicates that a format allows rich descriptive and technical metadata to be embedded in files. metadata can be expressed as metadata support, self-documentation (self-documenting), documentation, content-level (as opposed to presentation-level) description, self-describing, self-describing files, formal description of format, etc.
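self-documentation becomes concrete when the metadata travels inside the file itself. many formats (pdf, tiff, and jpeg, among others) can carry an xmp packet delimited by <?xpacket begin= ... <?xpacket end= markers; the naive stdlib-only scan below assumes the packet is stored uncompressed, and the filename is hypothetical:

```python
# naive check for an embedded xmp metadata packet (self-documentation).
# assumes the packet sits uncompressed in the byte stream; inside a
# compressed container this scan would miss it.
from pathlib import Path

def extract_xmp(path: str) -> str | None:
    data = Path(path).read_bytes()
    start = data.find(b"<?xpacket begin=")
    if start == -1:
        return None                      # no packet marker found
    end = data.find(b"<?xpacket end=", start)
    if end == -1:
        return None                      # opening marker without a close
    tail = data.find(b"?>", end)
    if tail == -1:
        return None
    return data[start:tail + 2].decode("utf-8", errors="replace")

print("self-documenting" if extract_xmp("master.tif") else "no embedded xmp")
```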
this fact means that the document must be freely accessible, without password restrictions or protection, and without any digital rights management scheme. blocking access to a document with a password can lead to serious problems if the password gets lost. in addition, the size and compactness of the document will influence the selection of a file format. fifth, interoperability primarily refers to the ability of a file format to be compatible with other formats and to exchange documents without loss of information.32 specifically, it refers to the ability of a given software to open a document without requiring any special application, plug-in, codec, or proprietary add-on. adherence to open source standards is usually a good indication of the interoperability of a format. in general, an open standard is released after years of bargaining and agreements between major players. supervision by an international standard (such as iso or the w3c) commonly helps propagate the format. in addition to the five categories mentioned above, other attributes are often used. presentation, authenticity, adoption, protection, preservation and reference are such examples. among these attributes, authenticity, although this is the seventh in the appendix, is one of the most important attributes in archives and records management. it refers to the ability to guarantee that a file is what it originally was without any corruption or alteration.33 specific to authenticity is data integrity, which assesses the integrity of the file through an internal mechanism (e.g., png files include byte sequences to validate against errors). another method of validating the authenticity of a document is to look at its traceability,34 that is, the traces left by the original author and those who modified or opened a file. one example is the difference between the creation date, modification date, and access date of any file on a personal computer. these three dates correspond to a moment when someone (often a different person each time) opened the file. other mechanisms may require log information, which is external to the file. another good indication of authenticity is the stability of a format.35 a format that is widely used is more likely to be stable. a stable format is also more likely to cause less data loss and corruption; hence it is a better indicator of authenticity. presentation includes attributes related to presenting and rendering data, expressed as distributing a page image, normal rendering, self-containment, selfcontained, and beyond normal rendering. adoption indicates how popular and widely a file format is adopted by user communities, also represented as popularity, widely used formats, ubiquity, or continuity. protection includes the technical protection mechanism or source verification to protect with security skills. preservation means long-term preservation, institutional support, or ease of transformation and preservation. reference indicates citability, or referential extensibility. among other attributes, transparency is interesting to note because it indicates the degree to which files are open to direct analysis with basic tools and human readability. another important aspect across these criteria is that the terminologies used in the studies may be quite different yet describe the same or similar concepts from different angles. 
for instance, rog and van wijk use openness for standardization and specification without restrictions,36 while examining attributes of open standard file formats for long-term preservation and open access | park and oh 50 several other researchers use open availability to convey the same thing.37 they in turn adopt the term disclosure to express that specification is publicly available.38 discussion and conclusion functionality, metadata, openness, interoperability, and independence appear to be the most important factors when selecting file formats. when file formats for long-term preservation and open access are under discussion, cultural heritage institutions need to consider many issues. despite several efforts, it is still tricky for them to identify the most appropriate file format or even to discern acceptable formats from unacceptable formats. where it is difficult to prevent the creation of a new file format, format selection is not an easy task, both in theory and in practice. it is critical, however, to base the decision on a clear understanding of the purpose for which the document is preserved: access preservation or repurposing preservation. cultural heritage institutions and digital repository communities need to guarantee long-term preservation of digital resources in selected file formats. additionally, users find it necessary to have access to digital information in these file formats. additional consideration involves the level of access users may enjoy (e.g., long-term access, permanent access, open access, persistent access, etc.). when determining international standard file formats, an aspect of open access should be included because it is a well-liked topic. it is necessary to develop a scale or measurement to assess open-standard format specifications to ensure long-term preservation and open access. identifying which attributes are required to be an open-standard file format and which digital format is most apt for the use and sustainability of long-term preservation is a meaningful task. the outcome of our study provides a framework for appropriate strategies when selecting file formats for long-term preservation and access to digital content. we hope that the criteria described in this study will benefit librarians, preservers, record creators, record managers, archivists, and users. we are reminded of todd’s remark that “the most important action is to align the recognition and weighting of criteria with a clear preservation strategy and keep them under review using risk management techniques.”39 the question of how to adopt and implement these attributes can only be answered in the local context and decisions of each cultural heritage institution.40 each institution should consider implementing a file format throughout the entire life cycle of digital resources, with a holistic approach to managerial, technical, procedural, archival, and financial issues for the purpose of long-term preservation and persistent access. the criteria may change over time, as is necessary for any format to adequately serve its purpose. maintaining its quality may be an ongoing task that cultural heritage institutions should take into account at all times. even more importantly, cultural heritage institutions need to establish and implement a set of standard guidelines specific to each context for the selection of open-standard file formats. note: this research was supported by the sungkyunkwan university research fund (2010-2011). 
information technology and libraries | december 2012 51 references and notes 1. library of congress, “sustainability of digital formats: planning for library of congress collections,” www.digitalpreservation.gov/formats/intro/intro.shtml (accessed november 21, 2011). 2. global digital format registry, www.gdfr.info (accessed november 17, 2011); the technical registry pronom, www.nationalarchives.gov.uk/aboutapps/pronom (accessed november 21, 2011). 3. mike folk and bruce r. barkstrom, “attributes of file formats for long-term preservation of scientific and engineering data in digital libraries” (paper presented at the joint conference on digital libraries (jcdl), houston, tx, may 27–31, 2003), 1, www.larryblakeley.com/articles/storage_archives_preservation/mike_folk_bruce_barkstrom2 00305.pdf (accessed november 21, 2011). 4. interpares 2 project glossary, p. 24, www.interpares.org/ip2/ip2_term_pdf.cfm?pdf=glossary (accessed november 21, 2011). 5. premis editorial committee, premis data dictionary for preservation metadata, ver. 2.0, march 2008, p. 195, www.loc.gov/standards/premis/v2/premis-2-0.pdf (accessed november 21, 2011). 6. ian barnes, “preservation of word processing documents,” july 14, 2006, p. 4, http://apsr.anu.edu.au/publications/word_processing_preservation.pdf (accessed november 21, 2011). 7. ibid. 8. gail hodge and nikkia anderson, “formats for digital preservation: a review of alternatives and issues,” information services & use 27 (2007): 46. 9. folk and barkstrom, “attributes of file formats.” 10. barnes, “preservation of word processing documents.” 11. carl rauch, harald krottmaier, and klaus tochtermann, “file-formats for preservation: evaluating the long-term stability of file-formats,” in proceedings of the 11th international conference on electronic publishing 2007 (vienna, austria, june 13–15, 2007): 101–6. 12. susan j. sullivan, “an archival/records management perspective on pdf/a,” records management journal 16, no. 1 (2006): 51–56. 13. judith rog and caroline van wijk, “evaluating file formats for long-term preservation,” 2008, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_2 7022008.pdf (accessed november 21, 2011). http://www.digitalpreservation.gov/formats/intro/intro.shtml http://www.nationalarchives.gov.uk/aboutapps/pronom http://www.larryblakeley.com/articles/storage_archives_preservation/mike_folk_bruce_barkstrom200305.pdf http://www.larryblakeley.com/articles/storage_archives_preservation/mike_folk_bruce_barkstrom200305.pdf http://www.interpares.org/ip2/ip2_term_pdf.cfm?pdf=glossary http://www.loc.gov/standards/premis/v2/premis-2-0.pdf http://apsr.anu.edu.au/publications/word_processing_preservation.pdf http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf examining attributes of open standard file formats for long-term preservation and open access | park and oh 52 14. d. k. sahu, “long term preservation: which file format to use” (paper presented in workshops on open access & institutional repository, chennai, india, may 2–8, 2004), http://openmed.nic.in/1363/01/long_term_preservation.pdf (accessed november 21, 2011). 15. cendi digital preservation task group, “formats for digital preservation: a review of alternatives and issues,” www.cendi.gov/publications/cendi_presformats_whitepaper_03092007.pdf (accessed november 21, 2011). 16. 
16. hodge and anderson, “formats for digital preservation.”
17. david 4 project (digital archiving, guideline and advice 4), “standards for fileformats,” 1, www.expertisecentrumdavid.be/davidproject/teksten/guideline4.pdf (accessed november 21, 2011).
18. sullivan, “an archival/records management perspective on pdf/a”; john michael potter, “formats conversion technologies set to benefit institutional repositories,” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7881&rep=rep1&type=pdf (accessed november 21, 2011).
19. eva müller et al., “using xml for long-term preservation: experiences from the diva project,” in proceedings of the 6th international symposium on electronic theses and dissertations (may 20–24, 2003): 109–16, https://edoc.hu-berlin.de/conferences/etd2003/hansson-peter/html/index.html (accessed november 21, 2011).
20. rene van horik, “image formats: practical experiences” (paper presented in erpanet training, vienna, austria, may 10–11, 2004), 22, www.erpanet.org/events/2004/vienna/presentations/erpatrainingvienna_horik.pdf (accessed november 21, 2011).
21. open standard is related to open access, which comes from the open access movement that allows resources to be freely available to the public and permits any user to use those resources (mainly electronic journals, repositories, databases, software applications, etc.) without financial, legal, or technical barriers. see amy e. c. koehler, “some thoughts on the meaning of open access for university library technical services,” serials review 32, no. 1 (march 2006): 17–21; budapest open access initiative, “read the budapest open access initiative,” www.soros.org/openaccess/read.shtml (accessed november 21, 2011).
22. national archives, “selecting file formats for long-term preservation,” 6, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011).
23. folk and barkstrom, “attributes of file formats.”
24. andreas stanescu, “assessing the durability of formats in a digital preservation environment: the inform methodology,” d-lib magazine 10, no. 11 (november 2004), www.dlib.org/dlib/november04/stanescu/11stanescu.html (accessed november 21, 2011).
25. malcolm todd, “technology watch report: file formats for preservation,” www.dpconline.org/advice/technology-watch-reports (accessed november 21, 2011).
26. ibid.
27. hodge and anderson, “formats for digital preservation.”
28. edward m. corrado, “the importance of open access, open source, and open standards for libraries,” issues in science & technology librarianship (spring 2005), www.library.ucsb.edu/istl/05-spring/article2.html (accessed november 21, 2011); carl vilbrandt et al., “cultural heritage preservation using constructive shape modeling,” computer graphics forum 23, no. 1 (2004): 25–41; marshall breeding, “preserving digital information,” information today 19, no. 5 (2002): 48–49.
29. eun g. park, “xml: examining the criteria to be open standard file format” (paper presented at the interpares 3 international symposium, oslo, norway, september 17, 2010), www.interpares.org/display_file.cfm?doc=ip3_isym04_presentation_3–3_korea.pdf (accessed november 21, 2011).
30. adrian brown, “digital preservation guidance note: selecting file formats for long-term preservation,” www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011); barnes, “preservation of word processing documents”; sahu, “long term preservation”; potter, “formats conversion technologies.”
31. stephen abrams et al., “pdf-a: the development of a digital preservation standard” (paper presented at the 69th annual meeting of the society of american archivists, new orleans, louisiana, august 14–21, 2005), www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011); sullivan, “an archival/records management perspective on pdf/a”; cendi, “formats for digital preservation”; and hodge and anderson, “formats for digital preservation.”
32. national archives, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011); ecma international, “office open xml file formats—ecma-376,” www.ecma-international.org/publications/standards/ecma-376.htm (accessed november 21, 2011).
33. christoph becker et al., “systematic characterisation of objects in digital preservation: the extensible characterisation languages,” www.jucs.org/jucs_14_18/systematic_characterisation_of_objects/jucs_14_18_2936_2952_becker.pdf (accessed november 21, 2011); national archives, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011).
34. folk and barkstrom, “attributes of file formats.”
35. national archives, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011); rog and van wijk, “evaluating file formats for long-term preservation.”
36. rog and van wijk, “evaluating file formats for long-term preservation.”
37. see brown, “digital preservation guidance note: selecting file formats for long-term preservation,” www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011); barnes, “preservation of word processing documents”; sahu, “long term preservation”; potter, “formats conversion technologies.”
38. stephen abrams et al., “pdf-a: the development of a digital preservation standard” (paper presented at the 69th annual meeting of the society of american archivists, new orleans, louisiana, august 14–21, 2005), www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011); sullivan, “an archival/records management perspective on pdf/a”; cendi, “formats for digital preservation”; and hodge and anderson, “formats for digital preservation.”
39. todd, “technology watch report,” 33.
40. evelyn peters mclellan, “selecting digital file formats for long-term preservation: interpares 2 project general study 11 final report,” www.interpares.org/display_file.cfm?doc=ip2_file_formats(complete).pdf (accessed november 21, 2011).

appendix: file format attributes

(each entry lists an attribute, its definition/description with source, and the file formats assessed against it)

1. functionality

robustness: robust against single point of failure, support for file corruption detection, file format stability, backward compatibility and forward compatibility (rog & van wijk, 2008; wijk & rog, 2007). assessed: pdf/a-1 (limited), microsoft word (limited)
robustness: a robust format contains several layers of defense against corruption (frey, 2000). assessed: n/a
feature set: formats supporting the full range of features and functionality (brown, 2003). assessed: n/a
feature set: not defined (sahu, 2006). assessed: n/a
viability: error-detection facilities to allow detection of file corruption (brown, 2003). assessed: png format (yes)
viability: not defined (sahu, 2006). assessed: n/a
support for graphic effects and typography: not defined (cendi, 2007; hodge & anderson, 2007). assessed: tiff_g4 (no)
color maintenance: not defined (cendi, 2007; hodge & anderson, 2007). assessed: tiff_g4 (limited)
clarity: support for high image resolution (cendi, 2007; hodge & anderson, 2007). assessed: tiff_g4 (yes)
quality: how well the format fulfills its task today: (1) low space costs, (2) highly encompassing, (3) robust, (4) simplicity, (5) highly tested, (6) loss-free, (7) supports metadata (clausen, 2004). assessed: n/a
compactness: to minimize storage and i/o costs (folk & barkstrom, 2003). assessed: n/a
simplicity: ease of implementing readers (folk & barkstrom, 2003). assessed: n/a
file corruption detection: to be able to detect that a file has been corrupted; to provide error correction (folk & barkstrom, 2003). assessed: n/a
raw i/o efficiency: formats that are organized for fast sequential access (folk & barkstrom, 2003). assessed: n/a
availability of readers: to maintain ease of data access for readers (folk & barkstrom, 2003). assessed: n/a
ease of subsetting: to process only part of data files (folk & barkstrom, 2003). assessed: n/a
size: to transfer data in large blocks (folk & barkstrom, 2003). assessed: n/a
ability to aggregate many objects in a single file: to keep the archive “name space” as small as possible (folk & barkstrom, 2003). assessed: n/a
ability to embed data extraction software in the files: the files come with read software embedded (folk & barkstrom, 2003). assessed: n/a
ability to name file elements: to work with data by manipulating element names instead of binary offsets or other references (folk & barkstrom, 2003). assessed: n/a
rigorous definition: to be defined in a sufficiently rigorous way (folk & barkstrom, 2003). assessed: n/a
multilanguage implementation of library software: to have multiple implementations of readers for a single format (folk & barkstrom, 2003). assessed: n/a
memory: some formats emphasize the presence or absence of memory (frey, 2000). assessed: tiff (yes)
accuracy: in some cases, the accuracy of the data can be decreased to save memory, e.g., through compression; in the case of a digital master, however, accuracy is very important (frey, 2000). assessed: n/a
speed: the ability to access or display a data set at a certain speed is critical to certain applications (frey, 2000). assessed: n/a
extendibility: a data format can be modified to allow for new types of data and features in the future (frey, 2000). assessed: n/a
modularity: a modular data set definition allows some of its functionality to be upgraded or enhanced without propagating changes through all parts of the data set (frey, 2000). assessed: n/a
plugability: related to modularity; permits the user of an implementation of a data set reader or writer to replace a module with private code (frey, 2000). assessed: n/a
interpretability: not binary formats (barnes, 2006). assessed: rtf (yes), ms word (no), xml (yes)
interpretability: the standard should be written in characters that people can read (lesk, 1995). assessed: n/a
complexity: human readability, compression, variety of features (rog & van wijk, 2008; wijk & rog, 2007). assessed: n/a
complexity: simple raster formats are preferred (puglia et al., 2004). assessed: n/a
compression algorithms: the format uses standard algorithms (puglia et al., 2004). assessed: n/a
accessibility: to prohibit encryption in the file trailer (sullivan, 2006). assessed: pdf/a (yes)
component reuse: not defined (sahu, 2006). assessed: pdf (no), html (limited), sgml (excellent), xml (excellent)
repurposing: not defined (sahu, 1999). assessed: pdf (limited), html (limited), sgml (excellent), xml (excellent)
packaging formats: in general, packaging formats should be acceptable as transfer mechanisms for image file formats (puglia et al., 2004). assessed: zip (yes)
significant properties: the format accommodates high-bit, high-resolution (detail), color accuracy, and multiple compression options (puglia et al., 2004). assessed: n/a
processability: the requirement to maintain a processable version of the record to have any reuse value (brown, 2003). assessed: conversion of a word-processed document into pdf format (no)
searching: not defined (sahu, 2006). assessed: pdf (limited), html (good), sgml (excellent), xml (excellent)
no definite term: to support the automatic validation of document conversions and the evaluation of conversion quality by hierarchically decomposing documents from different sources and representing them in an abstract xml language (becker et al., 2008a; becker et al., 2008b). assessed: xcl (yes)
no definite term: to make transferring data easy (johnson, 1999). assessed: xml (yes)
no definite term: a format that is easy to restore and understand by both humans and machines (müller et al., 2003). assessed: xml (yes)
no definite term: inability to be backed out into a usable format (potter, 2006). assessed: pdfs (no)

2. metadata

self-documentation: self-documenting digital objects that contain basic descriptive, technical, and other administrative metadata (cendi, 2007; hodge & anderson, 2007). assessed: pdf (yes), pdf/a (yes), tiff_g4 (yes), xml (yes)
self-documentation: metadata and technical description of the format embedded (rog & van wijk, 2008; wijk & rog, 2007). assessed: pdf/a-1 (limited), microsoft word (limited)
self-documentation: the ability of a digital format to hold (in a transparent form) metadata beyond that needed for basic rendering of the content (arms & fleischhauer, 2006). assessed: n/a
self-documenting: to contain its own description (abrams et al., 2005). assessed: n/a
documentation: deep technical documentation is publicly and fully available and is maintained for older versions of the format (puglia et al., 2004). assessed: n/a
metadata support: file formats making provision for the inclusion of metadata (brown, 2003). assessed: tiff (yes), microsoft word 2000 (yes)
metadata support: not defined (kenney, 2001). assessed: tiff 6.0 (yes), gif 89a (yes), jpeg (yes), flashpix 1.0.2 (yes), imagepac/photo cd (no), png 1.2 (yes), pdf (yes)
metadata support: not defined (sahu, 2006). assessed: n/a
metadata: the format allows for self-documentation (puglia et al., 2004). assessed: n/a
content-level description: not presentation-level description; structural markup, not formatting (barnes, 2006). assessed: pdf (no), docbook (yes), tei (yes), xhtml (yes), xml (yes)
content-level, not presentation-level, descriptions: where possible, the labeling of items should reflect their meaning, not their appearance (lesk, 1995). assessed: sgml (yes)
self-describing: many different types of metadata are required to decipher the contents of a file (folk & barkstrom, 2003). assessed: n/a
self-describing files: embed metadata in pdf files (sullivan, 2006). assessed: pdf/a (adobe extensible metadata platform required)
formal (bnf or xml-like) description of format: to create new readers solely on the basis of formal descriptions of the file content (folk & barkstrom, 2003). assessed: n/a
no definite term: its self-describing tags identify what your content is all about (johnson, 1999). assessed: xml (yes)
no definite term: a format for strong descriptive and administrative metadata and the complete content of the document (müller et al., 2003). assessed: xml (yes)

3. openness

disclosure: authoritative specification publicly available (abrams et al., 2005). assessed: pdf/a (yes), microsoft word (no)
disclosure: the degree to which complete specifications and tools for validating technical integrity exist and are accessible to those creating and sustaining digital content (cendi, 2007; hodge & anderson, 2007; arms & fleischhauer, 2006). assessed: pdf (yes), pdf/a (yes), tiff_g4 (yes), xml (yes)
disclosure: authoritative specification is publicly available (sullivan, 2006). assessed: pdf/a (yes)
open availability: no proprietary formats (barnes, 2006). assessed: odf (yes), gif (no), pdf (no), rtf (no), microsoft word (no)
open availability: any manufacturer or researcher should have the ability to use the standard, rather than having it under the control of only one company (lesk, 1995). assessed: kodak photocd (no), gif (no)
openness: standardization, restrictions on the interpretation of the file format, reader with freely available source (rog & van wijk, 2008; wijk & rog, 2007). assessed: pdf/a-1 (yes), ms word (no)
openness: a standard is designed to be implemented by multiple providers and employed by a large number of users (frey, 2000). assessed: n/a
openness: formats that are described by publicly available specifications or open-source code can, with some effort, be reconstructed later: (1) open, publicly available specification; (2) specification in public domain; (3) viewer with freely available source; (4) viewer with gpl’ed source; (5) not encrypted (clausen, 2004). assessed: n/a
open-source software or equivalent: to move toward obtaining open-source arrangements for all parts of the file format and associated libraries (folk & barkstrom, 2003). assessed: n/a
open standard: formats for which the technical specification has been made available in the public domain (brown, 2003). assessed: jpeg (yes), pdf (limited), ascii (limited)
open standard: not defined (sahu, 2006). assessed: n/a
standard/proprietary: not defined (kenney, 2001). assessed: tiff 6.0 (yes), gif 89a (yes), jpeg (yes), flashpix 1.0.2 (yes), imagepac/photo cd (no), png 1.2 (yes), pdf (yes)
nonproprietary formats: the specification is independent of a particular vendor (public records office of victoria, 2004). assessed: n/a
no definite term: to avoid vendor lock-in (potter, 2006). assessed: odf (yes)

4. interoperability

interoperability: is the format supported by many software applications/os platforms, or is it linked closely with a specific application (puglia et al., 2004)? assessed: n/a
interoperability: the ability to exchange electronic records with other users and it systems (brown, 2003). assessed: n/a
interoperability: not defined (sahu, 2006). assessed: n/a
data interchange: not defined (sahu, 2006). assessed: pdf (no), html (limited), sgml (excellent), xml (excellent)
compatibility: compatibility with prior versions of data set definitions is often needed for access and migration considerations (frey, 2000). assessed: n/a
stability: compatibility between versions (folk & barkstrom, 2003). assessed: n/a
stability: stable, not subject to constant or major changes over time (brown, 2003). assessed: n/a
stability: the format is supported by current applications and backward compatible, and there are frequent updates to the format or the specification (puglia et al., 2004). assessed: n/a
stability: not defined (sahu, 2006). assessed: n/a
scalability: the design should be applicable both to small and large data sets and to small and large hardware systems (frey, 2000). assessed: n/a
markup compatibility and extensibility: to support a much broader range of applications (ecma, 2008). assessed: xml (yes)
suitability for a variety of storage technologies: the format should not be geared toward any particular technology (folk & barkstrom, 2003). assessed: n/a
no definite term: to allow data to be shared across information systems and remain impervious to many proprietary software revisions (potter, 2006). assessed: openoffice (yes)

5. independence

device independence: can be reliably and consistently rendered without regard to the hardware/software platform (abrams et al., 2005). assessed: pdf/a (yes), tiff (no)
static visual appearance: can be reliably and consistently rendered and printed without regard to the hardware or software platform used (sullivan, 2006). assessed: pdf/a (yes), pdf/x (yes)
static visual appearance: this is a very important aspect for master files because they will most likely be used on various systems (frey, 2000). assessed: n/a
independent implementations: independent implementations help ensure that vendors accurately implement the specification (public records office of victoria, 2004). assessed: n/a
external dependency: degree to which the format is dependent on specific hardware, operating system, or software for rendering or use, and the complexity of dealing with those dependencies in future technical environments (arms & fleischhauer, 2006). assessed: n/a
external dependencies: the degree to which a particular format depends on particular hardware, operating system, or software for rendering or use, and the predicted complexity of dealing with those dependencies in future technical environments (cendi, 2007; hodge & anderson, 2007). assessed: pdf (limited), pdf/a (no), tiff_g4 (no), xml (no)
portability: a format that makes extensive use of specific hardware or operating system features is likely to be unusable when that hardware or operating system falls into disuse; a format that is defined in an independent way will be much easier to use in the future: (1) independent of hardware; (2) independent of operating system; (3) independent of other software; (4) independent of particular institutions, groups, or events; (5) widespread current use; (6) little built-in functionality; (7) single version or well-defined versions (clausen, 2004). assessed: n/a
monitoring obsolescence: information gathered through regular web harvesting can give some information about which file types are approaching obsolescence, at least for the more frequently used types (clausen, 2004). assessed: n/a
no definite term: a human-readable text format and internationalized character sets are supported (müller et al., 2003). assessed: xml (yes)
no definite term: not dependent on specific hardware, specific operating systems, one specific reader, or other external resources (rog & van wijk, 2008; wijk & rog, 2007). assessed: pdf/a-1 (limited), microsoft word (little)
no definite term: the format requires a plug-in for viewing if appropriate software is not available, or relies on external programs to function (puglia et al., 2004). assessed: n/a

6. presentation

distributing page image: not defined (sahu, 2006). assessed: pdf (excellent), html (good), sgml (good), xml (good)
normal rendering: not defined (cendi, 2007; hodge & anderson, 2007). assessed: pdf (yes), pdf/a (limited), tiff_g4 (yes), xml (yes)
presentation: preservation of the record’s original look and feel (brown, 2003). assessed: n/a
self-containment: everything that is necessary to render or print a pdf/a file must be contained within the file (sullivan, 2006). assessed: pdf/a (yes)
self-contained: to contain all resources necessary for rendering (abrams et al., 2005). assessed: n/a
beyond normal rendering: not defined (cendi, 2007; hodge & anderson, 2007). assessed: pdf (yes), pdf/a (yes), tiff_g4 (yes), xml (limited)

7. authenticity

authenticity: the format must preserve the content (data and structure) of the record and any inherent contextual, provenance, referencing, and fixity information (brown, 2003). assessed: n/a
provenance traceability: ability to trace the entire configuration of data production (folk & barkstrom, 2003). assessed: n/a
integrity of layout: not defined (cendi, 2007; hodge & anderson, 2007). assessed: pdf (yes), pdf/a (yes), tiff_g4 (n/a), xml (yes)
integrity of rendering of equations: not defined (cendi, 2007; hodge & anderson, 2007). assessed: pdf (yes), pdf/a (yes), tiff_g4 (n/a), xml (limited)
integrity of structure: not defined (cendi, 2007; hodge & anderson, 2007). assessed: pdf (limited), pdf/a (limited), tiff_g4 (n/a), xml (yes)

8. adoption

adoption: degree to which the format is already used by the primary creators, disseminators, or users of information resources (cendi, 2007; hodge & anderson, 2007). assessed: pdf (yes), pdf/a (yes), tiff_g4 (yes), xml (yes)
adoption: worldwide usage; usage in the cultural heritage sector as an archival format (rog & van wijk, 2008; wijk & rog, 2007). assessed: pdf/a-1 (yes), microsoft word (limited)
adoption: the degree to which the format is already used by the primary creators, disseminators, or users of information resources (arms & fleischhauer, 2006). assessed: n/a
adoption: widespread use may be the best deterrent against preservation risk (abrams et al., 2005). assessed: tiff (yes)
adoption: the format is widely used by the imaging community in cultural institutions (puglia et al., 2004). assessed: n/a
flexibility of implementation: to promote wide adoption (sullivan, 2006). assessed: pdf/a (yes)
popularity: a format that is widely used (folk & barkstrom, 2003). assessed: n/a
widely used formats: it is far more likely that software will continue to be available to render the format (public records office of victoria, 2004). assessed: n/a
ubiquity: popular formats supported by as much software as possible (brown, 2003). assessed: n/a
ubiquity: not defined (sahu, 2006). assessed: n/a
continuity: the file format is mature (puglia et al., 2004). assessed: n/a

9. protection

technical protection mechanism: password protection, copy protection, digital signature, printing protection, and content extraction protection (rog & van wijk, 2008; wijk & rog, 2007). assessed: pdf/a-1 (limited), microsoft word (limited)
technical protection mechanism: implementation of a mechanism such as encryption that prevents the preservation of content by a trusted repository (cendi, 2007; hodge & anderson, 2007). assessed: pdf (yes), pdf/a (no), tiff_g4 (no), xml (no)
no definite term: it must be possible to replicate the content on new media, migrate and normalize it in the face of changing technology, and disseminate it to users at a resolution consistent with network bandwidth constraints (arms & fleischhauer, 2006). assessed: n/a
no definite term: no encryption, passwords, etc. (abrams et al., 2005). assessed: n/a
protection: the format accommodates error detection, correction mechanisms, and encryption options (puglia et al., 2004). assessed: n/a
source verification: cryptographic encoding of files or digital watermarks without overburdening the data centers or archives (folk & barkstrom, 2003). assessed: n/a

10. preservation

preservation: the format contains embedded objects (e.g., fonts, raster images) or links to external objects (puglia et al., 2004). assessed: n/a
long-term institutional support: to ensure the long-term maintenance and support of a data format by placing responsibility for these operations on institutions (folk & barkstrom, 2003). assessed: n/a
ease of transformation/preservation: the format will be supported for fully functional preservation in a repository setting, or the format guarantee can currently only be made at the bitstream (content data) level (puglia et al., 2004). assessed: n/a
no definite term: to create files with either a very high or very low preservation value (becker et al., 2008a; becker et al., 2008b). assessed: pdf (no), tiff (no)

11. reference

citability: a machine-independent ability to reference or “cite” an individual data element in a stable way (folk & barkstrom, 2003). assessed: n/a
referential extensibility: ability to build annotations about new interpretations of the data (folk & barkstrom, 2003). assessed: n/a
no definite term: an open and established notation (müller et al., 2003). assessed: xml (yes)
no definite term: data is easily repurposed via tags or translated to any medium (johnson, 1999). assessed: xml (yes)
no definite term: creating, using, and reusing tags is easy, making the format highly extensible (johnson, 1999). assessed: xml (yes)

12. others

transparency: degree to which the digital representation is open to direct analysis with basic tools, such as human readability using a text-only editor (cendi, 2007; hodge & anderson, 2007). assessed: pdf (limited), pdf/a (limited), tiff_g4 (limited), xml (yes)
transparency: in natural reading order (sullivan, 2006). assessed: pdf/a (yes), microsoft notepad (yes)
transparency: the degree to which the format is already used by the primary creators, disseminators, or users of information resources (arms & fleischhauer, 2006). assessed: n/a
transparency: amenable to direct analysis with basic tools (abrams et al., 2005). assessed: n/a
ample comment space: to allow rich metadata (barnes, 2006). assessed: n/a
no definite term: items should be labeled, as far as possible, with enough information to serve for searching or cataloging (lesk, 1995). assessed: tiff (yes)
no definite term: a digital format may inhibit the ability of archival institutions to sustain content in that format (arms & fleischhauer, 2006). assessed: n/a
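to make the shape of this appendix concrete, the sketch below shows one possible machine-readable encoding of a few rows of the attribute matrix. the data structure, field names, and the handful of rows transcribed are illustrative choices for this example, not part of the original appendix.

```python
# minimal sketch: encoding a few rows of the appendix attribute matrix
# as data, so that assessments can be queried programmatically. only a
# handful of rows are transcribed here for illustration; the structure
# and field names are our own choices, not part of the original study.
from dataclasses import dataclass

@dataclass
class AttributeRow:
    category: str    # e.g., "openness"
    attribute: str   # e.g., "disclosure"
    source: str      # citation for the definition
    assessed: dict   # format name -> rating ("yes", "no", "limited", "n/a")

ROWS = [
    AttributeRow("openness", "disclosure", "abrams et al., 2005",
                 {"pdf/a": "yes", "microsoft word": "no"}),
    AttributeRow("metadata", "self-documentation",
                 "cendi, 2007; hodge & anderson, 2007",
                 {"pdf": "yes", "pdf/a": "yes", "tiff_g4": "yes", "xml": "yes"}),
    AttributeRow("independence", "external dependencies",
                 "cendi, 2007; hodge & anderson, 2007",
                 {"pdf": "limited", "pdf/a": "no", "tiff_g4": "no", "xml": "no"}),
]

def ratings_for(format_name: str) -> dict:
    """collect every attribute rating recorded for one format."""
    return {f"{r.category}/{r.attribute}": r.assessed[format_name]
            for r in ROWS if format_name in r.assessed}

if __name__ == "__main__":
    print(ratings_for("pdf/a"))  # {'openness/disclosure': 'yes', ...}
```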
table bibliography

abrams, stephen, et al. 2005. “pdf-a: the development of a digital preservation standard.” paper presented at the 69th annual meeting of the society of american archivists, new orleans, louisiana, august 14–21. http://www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011).
arms, caroline r., and carl fleischhauer. 2006. “sustainability of digital formats: planning for library of congress collections.” http://www.digitalpreservation.gov/formats/sustain/sustain.shtml (accessed november 21, 2011).
barnes, ian. 2006. “preservation of word processing documents.” http://apsr.anu.edu.au/publications/word_processing_preservation.pdf (accessed november 21, 2011).
becker, christoph, et al. 2008. “a generic xml language for characterising objects to support digital preservation.” in proceedings of the 2008 acm symposium on applied computing, fortaleza, ceara, brazil, march 16–20.
becker, christoph, et al. 2008. “systematic characterization of objects in digital preservation: the extensible characterization language.” journal of universal computer science 14, no. 18: 2936–52.
brown, adrian. 2003. “the national archives. digital preservation guidance note: selecting file formats for long-term preservation.” http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011).
cendi digital preservation task group. 2007. “formats for digital preservation: a review of alternatives and issues.” http://www.cendi.gov/publications/cendi_presformats_whitepaper_03092007.pdf (accessed november 21, 2011).
clausen, lars r. 2004. “handling file formats.” http://netarchive.dk/publikationer/fileformats-2004.pdf (accessed november 21, 2011).
ecma. 2008. “office open xml file formats—part 1.” 2nd ed. http://www.ecma-international.org/publications/standards/ecma-376.htm (accessed november 21, 2011).
folk, mike, and bruce barkstrom. 2003. “attributes of file formats for long-term preservation of scientific and engineering data in digital libraries.” paper presented at the joint conference on digital libraries, houston, tx, may 27–31. http://www.hdfgroup.org/projects/nara/sci_formats_and_archiving.pdf (accessed november 21, 2011).
frey, franziska. 2000. “5. file formats for digital masters.” in guides to quality in visual resource imaging. research libraries group and digital library federation. http://imagendigital.esteticas.unam.mx/pdf/guides.pdf (accessed november 21, 2011).
hodge, gail, and nikkia anderson. 2007. “formats for digital preservation: a review of alternatives and issues.” information services & use 27: 45–63.
johnson, amy helen. 1999. “xml xtends its reach: xml finds favor in many it shops, but it’s still not right for everyone.” computerworld 33, no. 42: 76–81.
lesk, michael e. 1995. “preserving digital objects: recurrent needs and challenges.” in proceedings of the 2nd npo conference on multimedia preservation. brisbane, australia. http://www.lesk.com/mlesk/auspres/aus.html (accessed november 21, 2011).
müller, eva, et al. 2003. “using xml for long-term preservation: experiences from the diva project.” in proceedings of the sixth international symposium on electronic theses and dissertations, berlin, may: 109–16. https://edoc.hu-berlin.de/conferences/etd2003/hansson-peter/pdf/index.pdf (accessed december 8, 2012).
potter, john michael. 2006. “formats conversion technologies set to benefit institutional repositories.” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7881&rep=rep1&type=pdf (accessed november 21, 2011).
public records office of victoria (australia). 2006. “advice on vers long-term preservation formats, pros 99/007 (version 2), specification 4.” department for victorian communities. http://prov.vic.gov.au/wp-content/uploads/2012/01/vers_advice13.pdf (accessed november 21, 2011).
puglia, steven, jeffrey reed, and erin rhodes. 2004. “technical guidelines for digitizing archival materials for electronic access: creation of production master files—raster images.” us national archives and records administration. http://www.archives.gov/preservation/technical/guidelines.pdf (accessed november 21, 2011).
rog, judith, and caroline van wijk. 2008. “evaluating file formats for long-term preservation.” national library of the netherlands. http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011).
sahu, d. k. 2004. “long term preservation: which file format to use.” presentation at workshops on open access & institutional repository, chennai, india, may 2–8. http://openmed.nic.in/1363/01/long_term_preservation.pdf (accessed november 21, 2011).
sullivan, susan j. 2006. “an archival/records management perspective on pdf/a.” records management journal 16, no. 1: 51–56.
van wijk, caroline, and judith rog. 2007. “evaluating file formats for long-term preservation.” presentation at international conference on digital preservation, beijing, china, october 11–12. http://ipres.las.ac.cn/pdf/caroline-ipres2007-11-12oct_cw.pdf (accessed november 21, 2011).

editorial board thoughts
bradford lee eden
musings on the demise of paper

we have been hearing dire predictions about the end of paper and the book since microfiche was hailed as the savior of libraries decades ago. now it seems that technology may finally be catching up with the hype. with the amazon kindle and the sony reader beginning to sell in the marketplace despite their cost (about $360 for the kindle), it appears that a whole new group of electronic alternatives to the print book will be available to users next year. amazon reports that e-book sales quadrupled in 2008 from the previous year. this has many technology firms salivating, hoping that the consumer market is ready to move to digital reading as quickly and profitably as it moved to digital music. some of these new devices and technologies are featured in the march 3, 2009, fortune article by michael v. copeland titled “the end of paper?”1

part of the problem with current readers is the challenge they pose for advertising. because the screen is so small, there isn’t any room to insert ads (i.e., revenue) around the margins of the text. but new readers such as plastic logic, polymer vision, and firstpaper will have larger screens, stronger image resolution, and automatic wireless updates, with color screens and video capabilities just over the horizon. still, working out a business model for newspapers and magazines is the real challenge. how much will readers pay for content? with everything “free” over the internet, consumers have become accustomed to information readily available at no immediate cost. so how much should publishers charge, and how can they make money selling content?

the plastic logic reader weighs less than a pound, is one-eighth of an inch thick, and resembles an 8½ x 11 inch sheet of paper or a clipboard. it will appear in the marketplace next year, using plastic transistors powered by a lithium battery. while not flexible, it is a very durable and break-resistant device. other e-readers will use flexible display technology that allows one to fold up the screen and place the device into a pocket.
much of this technology is fueled by e-ink, the start-up company behind the success of the kindle and the sony reader. they are exploring the use of color and video, but both have problems in terms of reading experience and battery wear; in the long run, however, these issues will be resolved. expense is the main concern: just how much are users willing to pay to read something in digital rather than analog form? amazon has been hugely successful with the kindle, selling more than 500,000 units for just under $400 each in 2007. and with the drop in subscriptions for print magazines and newspapers, advertisers are becoming nervous about their futures. will the “pay by the article” model, like that used for digital music sales, become the norm?

so what do these developments mean for libraries? it means we should probably explore purchasing some of these products when they appear and offer them (with some content) for checkout to our patrons. many of us did something similar when it became apparent that students wanted and needed laptops, and many of us still offer this service today, even though many campuses now require students to purchase their own. offering cutting-edge technology with content related to the transmission and packaging of information is one way for our clientele to see libraries as more than just print materials and a social space. and libraries shouldn’t pay full price (or any price) for these new toys; companies that develop these products are eager to find free research-and-development focus groups to assist them in versioning and upgrading their products for the marketplace. what better avenue than college students?

related to this is the recent announcement by the university of michigan that its university press will become a digital operation run as part of the library.2 decreased university and library budgets have meant that university presses have not been able to sell enough of their monographs to maintain viable business models. the move of a university press into a successful scholarly communication and open-source publishing entity like the university of michigan libraries means that the press will be able to survive, and it also indicates that the newer model of academic libraries as university publishers will have a prototypical example to point out to university administrations. in the long run, these types of partnerships are essential if academic libraries are to survive their own budget cuts in the future.

references
1. michael v. copeland, “the end of paper?” cnnmoney.com, mar. 3, 2009, http://money.cnn.com/2009/03/03/technology/copeland_epaper.fortune/ (accessed june 22, 2009).
2. andrew albanese, “university of michigan press merged with library, with new emphasis on digital monographs,” libraryjournal.com, mar. 26, 2009, http://www.libraryjournal.com/article/ca6647076.html (accessed june 22, 2009).

bradford lee eden (eden@library.ucsb.edu) is associate university librarian for technical services and scholarly communication, university of california, santa barbara.
information technology and libraries | march 2009

paul t. jaeger and zheng yan

one law with two outcomes: comparing the implementation of cipa in public libraries and schools

though the children’s internet protection act (cipa) established requirements for both public libraries and public schools to adopt filters on all of their computers when they receive certain federal funding, it has not attracted a great amount of research into the effects on libraries and schools and the users of these social institutions. this paper explores the implications of cipa in terms of its effects on public libraries and public schools, individually and in tandem. drawing from both library and education research, the paper examines the legal background and basis of cipa, the current state of internet access and levels of filtering in public libraries and public schools, the perceived value of cipa, the perceived consequences of cipa, the differences in levels of implementation of cipa in public libraries and public schools, and the reasons for those dramatic differences. after an analysis of these issues within the greater policy context, the paper suggests research questions to help provide more data about the challenges and questions revealed in this analysis.
the children’s internet protection act (cipa) established requirements for both public libraries and public schools, as a condition for receiving certain federal funds, to adopt filters on all of their computers to protect children from online content deemed potentially harmful.1 passed in 2000, cipa was initially implemented by public schools, but it was not widely implemented in public libraries until the 2003 supreme court decision (united states v. american library association) upholding the law’s constitutionality.2 now that cipa has been extensively implemented for five years in libraries and eight years in schools, it has had time to have significant effects on access to online information and services. while the goal of filtering requirements is to protect children from potentially inappropriate content, filtering also has major educational and social implications, because filters also limit access to other kinds of information and create different perceptions of schools and libraries as social institutions.

curiously, cipa and its requirements have not attracted a great amount of research into the effects on schools, libraries, and the users of these social institutions. much of the literature about cipa has focused on practical issues: either recommendations on implementing filters or accounts of practical experiences with filtering. while those types of writing are valuable to practitioners who must deal with the consequences of filtering, filtering raises major educational and societal issues that merit much greater exploration. relatively small bodies of research have been generated about cipa’s effects in public libraries and public schools,3 but thus far these two strands of research have remained separate. it is the contention of this paper that these two strands of research, when viewed together, have much more value for creating a broader understanding of the educational and societal implications. it would be impossible to see the real consequences of cipa without an integrative picture of its effects on both public schools and public libraries.

in this paper, the implications of cipa will be explored in terms of effects on public libraries and public schools, individually and in tandem. public libraries and public schools are generally considered separate but related public sphere entities because both serve core educational and information-provision functions in society. furthermore, the fact that public schools also contain school library media centers highlights some very interesting points of intersection between public libraries and school libraries in terms of the consequences of cipa: while cipa requires filtering of computers throughout public libraries and public schools, the presence of school library media centers makes the connection between libraries and schools stronger, as do the teaching roles of public libraries (e.g., training classes, workshops, and evening classes).

the legal road to cipa

history

under cipa, public libraries and public schools receiving certain kinds of federal funds are required to use filtering programs to protect children under the age of seventeen from harmful visual depictions on the internet and to provide public notices and hearings to increase public awareness of internet safety. senator john mccain (r-az) sponsored cipa, and it was signed into law by president bill clinton on december 21, 2000.
cipa requires that filters at public libraries and public schools block three specific types of content: (1) obscene material (material that appeals to prurient interests only and is “offensive to community standards”); (2) child pornography (depictions of sexual conduct or lewd exhibitionism involving minors); and (3) material that is harmful to minors (depictions of nudity and sexual activity that lack artistic, literary, or scientific value). cipa focused on “the recipients of internet transmission,” rather than the senders, in an attempt to avoid the constitutional issues that undermined previous attempts to regulate internet content.4

paul t. jaeger (pjaeger@umd.edu) is assistant professor at the college of information studies and director of the center for information policy and electronic government of the university of maryland in college park. zheng yan (zyan@uamail.albany.edu) is associate professor at the department of educational and counseling psychology in the school of education of the state university of new york at albany.

using congressional authority under the spending clause of article i, section 8 of the u.s. constitution, cipa ties the direct or indirect receipt of certain types of federal funds to the installation of filters on library and school computers. therefore each public library and school that receives the applicable types of federal funding must implement filters on all computers in the library and school buildings, including computers that are exclusively for staff use. libraries and schools had to address these issues very quickly because the federal communications commission (fcc) mandated certification of compliance with cipa by funding year 2004, which began in summer 2004.5

each of the three categories of content that filters must block has a specific legal meaning. the first type, obscene materials, is statutorily defined as depicting sexual conduct that appeals only to prurient interests, is offensive to community standards, and lacks serious literary, artistic, political, or scientific value.6 historically, obscene speech has been viewed as being bereft of any meaningful ideas or educational, social, or professional value to society.7 statutes regulating speech as obscene have to do so very carefully and specifically, and speech can only be labeled obscene if the entire work is without merit.8 if speech has any educational, social, or professional importance, even for embodying controversial or unorthodox ideas, it is supposed to receive first amendment protection.9 the second type of content, child pornography, is statutorily defined as depicting any form of sexual conduct or lewd exhibitionism involving minors.10 both of these types of speech have a long history of being regulated and are considered to have no constitutional protections in the united states. the third type of content that must be filtered, material that is harmful to minors, encompasses a range of otherwise protected forms of speech.
cipa defines “harmful to minors” as including any depiction of nudity, sexual activity, or simulated sexual activity that has no serious literary, artistic, political, or scientific value to minors.11 the material that falls into this third category is constitutionally protected speech for adults: it encompasses any depiction of nudity, sexual activity, or simulated sexual activity that has serious literary, artistic, political, or scientific value to adults. along with a range of materials related to literature, art, science, and policy, this third category may take in materials on issues vital to personal well-being, such as safe sexual practices, sexual identity, and even general health care issues such as breast cancer.

in addition to the filtering requirements, section 1731 also prescribes an internet awareness strategy that public libraries and schools must adopt to address five major internet safety issues related to minors. it requires libraries and schools to provide reasonable public notice and to hold at least one public hearing or meeting to address these internet safety issues.

requirements for schools and libraries

cipa includes sections specifying two major strategies for protecting children online (mainly sections 1711, 1712, 1721, and 1732) as well as sections describing various definitions and procedural issues for implementing the strategies (mainly sections 1701, 1703, 1731, 1732, 1733, and 1741).

section 1711 specifies the primary internet protection strategy, filtering, in public schools. specifically, it amends the elementary and secondary education act of 1965 by limiting funding availability for schools under section 254 of the communications act of 1934. through a compliance certification process within a school, under the supervision of the local educational agency, it requires schools to operate a technology protection measure that protects students against access to visual depictions that are obscene, are child pornography, or are harmful to minors under the age of seventeen.

likewise, section 1712 specifies the same filtering strategy in public libraries. specifically, it amends section 224 of the museum and library services act of 1996/2003 by limiting funding availability for libraries under section 254 of the communications act of 1934. through a compliance certification process within a library, under the supervision of the institute of museum and library services (imls), it requires libraries to operate a technology protection measure that protects minors against access to visual depictions that are obscene, child pornography, or harmful to minors under the age of seventeen.

section 1721 requires both libraries and schools to enforce an internet safety policy, combining the safety policy strategy and the filtering technology strategy, as a condition of universal service discounts. specifically, it amends section 254 of the communications act of 1934 and requires both schools and libraries to monitor the online activities of minors, operate a technical protection measure, provide reasonable public notice, and hold at least one public hearing or meeting to address the internet safety policy, through the certification process regulated by the fcc.
section 1732, titled the neighborhood children’s internet protection act (ncipa), amends section 254 of the communications act of 1934 and requires schools and libraries to adopt and implement an internet safety policy. it specifies five types of internet safety issues: (1) access by minors to inappropriate matter on the internet; (2) the safety and security of minors when using e-mail, chat rooms, and other online communications; (3) unauthorized access; (4) unauthorized disclosure, use, and dissemination of personal information; and (5) measures to restrict access to harmful online materials.

from the above summary, it is clear that (1) the two protection strategies of cipa (the internet filtering strategy and the safety policy strategy) were equally enforced in both public schools and public libraries, because they are two of the most important social institutions for children’s internet safety; (2) the implementation mechanism is exactly the same, using the same federal funding as the sole financial incentive (limiting funding availability for schools and libraries under section 254 of the communications act of 1934) through a compliance certification process; and (3) the actual certification procedure differs, with schools certified under the supervision of local educational agencies (such as school districts and state departments of education) and libraries certified under the supervision of the imls.

economics of cipa

the universal service program (commonly known as e-rate) was established by the telecommunications act of 1996 to provide discounts, ranging from 20 to 90 percent, to libraries and schools for telecommunications services, internet services, internal systems, and equipment.12 the program has been very successful, providing approximately $2.25 billion a year to public schools, public libraries, and public hospitals. the vast majority of e-rate funding, about 90 percent, goes to public schools each year, with roughly 4 percent awarded to public libraries and the remainder going to hospitals.13 the emphasis on funding schools results from the large number of public schools and the sizeable computing needs of all of these schools. but even 4 percent of the e-rate funding is quite substantial: public libraries received more than $250 million between 2000 and 2003,14 while schools received about $12 billion in the same period.15

along with e-rate funds, the library services and technology act (lsta) program administered by the imls provides money to each state library agency to use on library programs and services in that state, though these funds are considerably smaller than e-rate funds. the american library association (ala) has noted that the e-rate program has been particularly significant in expanding online access for students and for library patrons in both rural and underserved communities.16 in addition to the effect on libraries, e-rate and lsta funds have significantly affected the lives of individuals and communities. these programs have contributed to the increase in the availability of free public internet access in schools and libraries.
by 2001, more than 99 percent of public school libraries provided students with internet access.17 by 2007, 99.7 percent of public library branches were connected to the internet, and 99.1 percent of public library branches offered public internet access.18 however, only a small portion of libraries and schools used filters prior to cipa.19 since the advent of computers in libraries, librarians typically had used informal monitoring practices for computer users to ensure that nothing age inappropriate or morally offensive was publicly visible.20 some individual school and library systems, such as in kansas and indiana, even developed formal or informal statewide internet safety strategies and approaches.21

why were only libraries and schools chosen to protect children’s online safety?

while there are many social institutions that could have been the focus of cipa, the law places the requirements specifically on public libraries and public schools. if congress was so interested in protecting children from access to harmful internet content, it seems that the law would be more expansive and focused on the content itself rather than filtering access to the content. however, earlier laws that attempted to regulate access to internet content failed legal challenges specifically because they tried to regulate content.

prior to the enactment of cipa, there were a number of other proposed laws aimed at preventing minors from accessing inappropriate internet content. the communications decency act (cda) of 1996 prohibited the sending or posting of obscene material through the internet to individuals under the age of eighteen.22 however, the supreme court found the cda to be unconstitutional, stating that the law violated free speech under the first amendment. in 1998, congress passed the child online protection act (copa), which prohibited commercial websites from displaying material deemed harmful to minors and imposed criminal penalties on internet violators.23 a three-judge panel of the district court for the eastern district of pennsylvania ruled that copa’s focus on “contemporary community standards” violated the first amendment, and the panel subsequently imposed an injunction on copa’s enforcement.

cipa’s force comes from congress’s power under the spending clause; that is, congress can legally attach requirements to funds that it gives out. since cipa is based on economic persuasion—the potential loss of funds for technology—the law can only have an effect on recipients of those funds. while regulating internet access in other venues like coffee shops, internet cafés, bookstores, and even individual homes would provide a more comprehensive shield to limit children’s access to certain online content, these institutions could not be reached under the spending clause. as a result, the burdens of cipa fall squarely on public libraries and public schools.

the current state of filtering

when did cipa actually come into effect in libraries and schools? after overcoming a series of legal challenges that were ultimately decided by the supreme court, cipa came into effect in full force in 2003, though 96 percent of public schools were already in compliance with cipa in 2001. when the court upheld the constitutionality of cipa, the legal challenge by public libraries centered on the way the statute was written.24 the court’s decision states that the wording of the law does not place unconstitutional limitations on free speech in public libraries.
to continue receiving federal dollars directly or indirectly through certain federal programs, public libraries and schools were required to install filtering technologies on all computers. while the case decided by the supreme court focused on public libraries, the decision virtually precludes public schools from making the same or related challenges.25 before that case was decided, however, most schools had already adopted filters to comply with cipa.

as a result of cipa, a public library or public school must install technology protection measures, better known as filters, on all of its computers if it receives
- e-rate discounts for internet access costs,
- e-rate discounts for internal connections costs,
- lsta funding for direct internet costs,26 or
- lsta funding for purchasing technology to access the internet.
(a toy sketch of this eligibility logic appears at the end of this section.)

the requirements of cipa extend to public libraries, public schools, and any library institution that receives lsta and e-rate funds as part of a system, including state library agencies and library consortia. as a result of the financial incentives to comply, almost 100 percent of public schools in the united states have implemented the requirements of cipa,27 and approximately half of public libraries have done so.28

how many public schools have implemented cipa?

according to the latest report by the department of education (see table 1), by 2005, 100 percent of public schools had implemented both the internet filtering strategy and safety policy strategy. in fact, in 2001 (the first year cipa was in effect), 96 percent of schools had implemented cipa, with 99 percent filtering by 2002. when compared to the percentage of all public schools with internet access from 1994 to 2005, internet access became nearly universal in schools between 1999 and 2000 (95 to 98 percent), and one can see that the internet access percentage in 2001 was almost the same as the cipa implementation percentage.

table 1. implementation of cipa in public schools

year           1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2005
access (%)       35    50    65    78    89    95    98    99    99   100   100
filtering (%)     —     —     —     —     —     —     —     96    99    97   100

according to the department of education, the above estimations are based on a survey of 1,205 elementary and secondary schools selected from 63,000 elementary schools and 21,000 secondary and combined schools.29 after reviewing the design and administration of the survey, it can be concluded that these estimations should be considered valid and reliable and that cipa was immediately and consistently implemented in the majority of the public schools since 2001.30

how many public libraries have implemented cipa?

in 2002, 43.4 percent of public libraries were receiving e-rate discounts, and 18.9 percent said they would not apply for e-rate discounts if cipa was upheld.31 since the supreme court decision upholding cipa, the number of libraries complying with cipa has increased, as have the number of libraries not applying for e-rate funds to avoid complying with cipa. however, unlike schools, there is no exact count of how many libraries have filtered internet access. in many cases, the libraries themselves do not filter, but a state library, library consortium, or local or state government system of which they are a part filters access from beyond the walls of the library. in some of these cases, the library staff may not even be aware that such filtering is occurring.
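the four funding conditions listed earlier in this section reduce to a simple disjunction. a minimal sketch in python, with invented labels rather than statutory language:

```python
# hypothetical sketch of cipa's funding trigger as described above; the
# labels are invented shorthand, not statutory terms
CIPA_TRIGGERS = {
    "e-rate: internet access",
    "e-rate: internal connections",
    "lsta: direct internet costs",
    "lsta: internet access technology",
}

def must_filter(funding_received):
    """return True if any funding source received triggers cipa's filtering mandate."""
    return bool(set(funding_received) & CIPA_TRIGGERS)

print(must_filter(["lsta: direct internet costs"]))  # True
print(must_filter(["state construction grant"]))     # False
```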
a number of state and local governments have also passed their own laws to encourage or require all libraries in the state to filter internet access regardless of e-rate or lsta funds.32 in 2008, 38.2 percent of public libraries were filtering access within the library as a result of directly receiving e-rate funding.33 furthermore, 13.1 percent of libraries were receiving e-rate funding as a part of another organization, meaning that these libraries also would need to comply with cipa’s requirements.34 as such, the number of public libraries filtering access is now at least 51.3 percent, but the number will likely be higher as a result of state and local laws requiring libraries to filter as well as other reasons libraries have implemented filters. in contrast, among libraries not receiving e-rate funds, the number of libraries now not applying for e-rate intentionally to avoid the cipa requirements is 31.6 percent.35 while it is not possible to identify an exact number of public libraries that filter access, it is clear that libraries overall have far lower levels of filtering than the 100 percent of public schools that filter access.

e-rate and other program issues

the administration of the e-rate program has not occurred without controversy. throughout the course of the program, many applicants for and recipients of the funding have found the program structure to be obtuse, the application process to be complicated and time consuming, and the administration of the decision-making process to be slow.36 as a result, many schools and libraries find it difficult to plan ahead for budgeting purposes, not knowing how much funding they will receive or when they will receive it.37

there also have been larger difficulties for the program. following revelations about the uses of some e-rate awards, the fcc suspended the program from august to december 2004 to impose new accounting and spending rules for the funds, delaying the distribution of over $1 billion in funding to libraries and schools.38 news investigations had discovered that certain school systems were using e-rate funds to purchase more technology than they needed or could afford to maintain, and some school systems failed to ever use technology they had acquired.39 while the administration of the e-rate program has been comparatively smooth since, the temporary suspension of the program caused serious short-term problems for, and left a sense of distrust of, the program among many recipients.40

filtering issues

during the 1990s, many types of software filtering products became available to consumers, including server-side filtering products (using a list of server-selected blocked urls that may or may not be disclosed to the user), client-side filtering (controlling the blocking of specific content with a user password), text-based content-analysis filtering (removing illicit content of a website using real-time analysis), monitoring and time-limiting technologies (tracking a child’s online activities and limiting the amount of time he or she spends online), and age-verification systems (allowing access to webpages by passwords issued by a third party to an adult).41
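the first of these categories, server-side blocklist filtering, is conceptually simple. the sketch below is a toy model with invented host names; real products layer category databases and real-time content analysis on top of such lists:

```python
from urllib.parse import urlparse

# toy model of a server-side blocklist filter: block any request whose host
# appears on a vendor-supplied list (host names here are invented)
BLOCKED_HOSTS = {"blocked.example", "adult-content.example"}

def is_blocked(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return host in BLOCKED_HOSTS

print(is_blocked("http://blocked.example/page.html"))  # True
print(is_blocked("http://www.loc.gov/"))               # False
```

even in this toy form, the design choice is visible: whoever curates the list decides what is blocked, which is exactly the concern the next paragraph raises.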
but because filtering software companies make the decisions about how the products work, content and collection decisions for electronic resources in schools and public libraries have been taken out of the hands of librarians, teachers, and local communities and placed in the trust of proprietary software products.42 some filtering programs also have specific political agendas, which many organizations that purchase them are not aware of.43 in a study of over one million pages, for every webpage blocked by a filter as advertised by the software vendor, one or more pages were blocked inappropriately, while many of the criteria used by the filtering products go beyond the criteria enumerated in cipa.44 filters have significant rates of inappropriately blocking materials, meaning that filters misidentify harmless materials as suspect and prevent access to harmless items (e.g., one filter blocked access to the declaration of independence and the constitution).45 furthermore, when libraries install filters to comply with cipa, in many instances the filters will frequently be blocking text as well as images, and (depending on the type of filtering product employed) filters may be blocking access to entire websites or even all the sites from certain internet service providers. as such, the current state of filtering technology will create the practical effect of cipa restricting access to far more than just certain types of images in many schools and libraries.46

differences in the perceived value of cipa and filtering

based on the available data, there clearly is a sizeable contrast in the levels of implementation of cipa between schools and libraries. this difference raises a number of questions: for what reasons has cipa been much more widely implemented in schools? is this issue mainly value driven, dollar driven, both, or neither in these two public institutions? why are these two institutions so different regarding cipa implementation while they share many social and educational similarities?

reasons for nationwide full implementation in schools

there are various reasons—from financial, population, social, and management issues to computer and internet availability—that have driven the rapid and comprehensive implementation of filters in public schools.

first, public schools have to implement cipa because of societal pressures and the lobbying of parents to ensure students’ internet safety. almost all users of computers in schools are minors, the group most vulnerable to internet crimes and child pornography. public schools in america have been the focus of public attention and scrutiny for years, and the political and social responsibility of public schools for children’s internet safety is huge. as a result, society has decided these students should be most strongly protected, and cipa was implemented immediately and most widely at schools.

second, in contrast to public libraries (which average slightly less than eleven computers per library outlet), the typical number of computers in public schools ranges from one hundred to five hundred, which are needed to meet the needs of students and teachers for daily learning and teaching. since the number of computers is quite large, the financial incentives of e-rate funding are substantial and critical to the operation of the schools. this situation provides administrators in schools and school districts with the incentive to make decisions to implement cipa as quickly and extensively as possible. furthermore, the amount of money that e-rate provides for schools in terms of technology is astounding. as was noted earlier, schools received over $12 billion from 2000 to 2003 alone. schools likely would not be able to provide the necessary computers for students and teachers without the e-rate funds.
third, the actual implementation procedure differs in schools and libraries: schools are certified under the supervision of the local educational agencies such as school districts and state departments of education; libraries are certified within a library organization under the supervision of the imls. in other words, the certification process at schools is directly and effectively controlled by school districts and state departments of education, following the same fundamental values of protecting children.

the resistance to cipa in schools has been very small in comparison to libraries. the primary concern raised has been the issue of educational equality. concerns have been raised that filters in schools may create two classes of students—ones with only filtered access at school and ones who also can get unfiltered access at home.47

reasons for more limited implementation in libraries

in public libraries, the reasons for implementing cipa are similar to those of public schools in many ways. public libraries provide an average of 10.7 computers in each of the approximately seven thousand public libraries in the united states, which is a lot of technology that needs to be supported. the e-rate and lsta funds are vital to many libraries in the provision of computers and the internet. furthermore, with limited alternative sources of funding, the e-rate and lsta funds are hard to replace if they are not available. given that the public libraries have become the guarantor of public access to computing and the internet, libraries have to find ways to ensure that patrons can access the internet.48

libraries also have to be concerned about protecting and providing a safe environment for younger patrons. while libraries serve patrons of all ages, one of the key social expectations of libraries is the provision of educational materials for children and young adults. children’s sections of libraries almost always have computers in them. much of the content blocked by filters is of little or no educational value. as such, “defending unfiltered internet access was quite different from defending catcher in the rye.”49

nevertheless, many libraries have fought against the filtering requirements of cipa because they believe that it violates the principles of librarianship or for a number of other reasons. in 2008, 31.6 percent of public libraries refused to apply for e-rate or lsta funds specifically to avoid cipa requirements, a substantial increase from the 15.3 percent of libraries that did not apply for e-rate because of cipa in 2006.50 as a result of defending patrons’ rights to free access, the libraries that are not applying for e-rate funds because of the requirements of cipa are being forced to turn down the chance for funding to help pay for internet access in order to preserve community access to the internet. because many libraries feel that they cannot apply for e-rate funds, local and regional discrepancies are occurring in the levels of internet access that are available to patrons of public libraries in different parts of the country.51

for adult patrons who wish to access material on computers with filters, cipa states that the library has the option of disabling the filters for “bona fide research or other lawful purposes” when adult patrons request such disabling. the law does not require libraries to disable the filters for adult patrons, and the criteria for disabling of filters do not have a set definition in the law.
the potential problems in the process of having the filters disabled are many and significant, including librarians not allowing the filters to be turned off, librarians not knowing how to turn the filters off, the filtering software being too complicated to turn off without injuring the performance of the workstation in other applications, or the filtering software being unable to be turned off in a reasonable amount of time.52

it has been estimated that approximately 11 million low-income individuals rely on public libraries to access online information because they lack internet access at home or work.53 the e-rate and lsta programs have helped to make public libraries a trusted community source of internet access, with the public library being the only source of free public internet access available to all community residents in nearly 75 percent of communities in the united states.54 therefore, usage of computers and the internet in public libraries has continued to grow at a very fast pace over the past ten years.55 thus public libraries are torn between the values of providing safe access for younger patrons and broad access for adult patrons who may have no other means of accessing the internet.

cipa, public policy, and further research

while the diverse implementations, effects, and levels of acceptance of cipa across schools and libraries demonstrate the wide range of potential ramifications of the law, surprisingly little consideration is given to major assumptions in the law, including the appropriateness of the requirements to different age groups and the nature of information on the internet.

cipa treats all users as if they are at the same level of maturity and need the same level of protection as a small child, as evidenced by the requirement that all computers in a library or school have filters regardless of whether children use a particular computer. in reality, children and adults interact in different social, physical, and cognitive ways with computers because of different developmental processes.56 cipa fails to recognize that children as individual users are active processors of information and that children of different ages are going to be affected in divergent ways by filtering programs.57 younger children benefit from more restrictive filters while older children benefit from less restrictive filters. moreover, filtering can be complemented by encouragement of frequent positive internet usage and informal instruction to encourage positive use. finally, children of all ages need a better understanding of the structure of the internet to encourage appropriate caution in terms of online safety. the internet represents a new social and cultural environment in which users simultaneously are affected by the social environment and also construct that environment with other users.58

cipa also is based on fundamental misconceptions about information on the internet. the supreme court’s decision upholding cipa represents several of these misconceptions, adopting an attitude that “we know what is best for you” in terms of the information that citizens should be allowed to access.59 it assumes that schools and libraries select printed materials out of a desire to protect and censor rather than recognizing the basic reality that only a small number of print materials can be afforded by any school or library. the internet frees schools and libraries from many of these costs.
furthermore, the court assumes that libraries should censor the internet as well, ultimately upholding the same level of access to information for adult patrons and librarians in public libraries as students in public schools. these two major unexamined assumptions in the law certainly have played a part in the difficulty of implementing cipa and in the resistance to the law. and this does not even address the problems of assuming that public libraries and public schools can be treated interchangeably in crafting legislation.

these problematic assumptions point to a significantly larger issue: in trying to deal with the new situations created by the internet and related technology, the federal government has significantly increased the attention paid to information policy.60 over the past few years, government laws and standards related to information have begun to more clearly relate to social aspects of information technologies such as the filtering requirements of cipa.61 but the social, economic, and political ramifications for decisions about information policy are often woefully underexamined in the development of legislation.62

this paper has documented that many of the reasons for and statistics about cipa implementation are available by bringing together information from different social institutions. the biggest questions about cipa are about the societal effects of the policy decisions:
- has cipa changed the education and information-provision roles of libraries and schools?
- has cipa changed the social expectations for libraries and schools?
- have adult patron information behaviors changed in libraries?
- have minor patron information behaviors changed in libraries?
- have student information behaviors changed in school?
- how has cipa changed the management of libraries and schools?
- will congress view cipa as successful enough to merit using libraries and schools as the means of enforcing other legislation?

but these social and administrative concerns are not the only major research questions raised by the implementation of cipa. future research about cipa not only needs to focus on the individual, institutional, and social effects of the law. it must explore the lessons that cipa can provide to the process of creating and implementing information policies with significant societal implications. the most significant research issues related to cipa may be the ones that help illuminate how to improve the legislative process to better account for the potential consequences of regulating information while the legislation is still being developed. such cross-disciplinary analyses would be of great value as information becomes the center of an increasing amount of legislation, and the effects of this legislation have continually wider consequences for the flow of information through society. it could also be of great benefit to public schools and libraries, which, if cipa is any indication, may play a large role in future legislation about public internet access.

references

1. children’s internet protection act (cipa), public law 106-554.
2. united states v. american library association, 539 u.s. 154 (2003).
3. american library association, libraries connect communities: public library funding & technology access study 2007–2008 (chicago: ala, 2008); paul t. jaeger, john carlo bertot, and charles r. mcclure, “the effects of the children’s internet protection act (cipa) in public libraries and its implications for research: a statistical, policy, and legal analysis,” journal of the american society for information science and technology 55, no. 13 (2004): 1131–39; paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology and libraries 26, no. 2 (2007): 4–14; paul t. jaeger et al., “cipa: decisions, implementation, and impacts,” public libraries 44, no. 2 (2005): 105–9; zheng yan, “limited knowledge and limited resources: children’s and adolescents’ understanding of the internet,” journal of applied developmental psychology (forthcoming); zheng yan, “differences in basic knowledge and perceived education of internet safety between high school and undergraduate students: do high school students really benefit from the children’s internet protection act?” journal of applied developmental psychology (forthcoming); zheng yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?,” developmental psychology 42 (2006): 418–28.
4. martha m. mccarthy, “filtering the internet: the children’s internet protection act,” educational horizons 82, no. 2 (winter 2004): 108.
5. federal communications commission, in the matter of federal–state joint board on universal service: children’s internet protection act, fcc order 03-188 (washington, d.c.: 2003).
6. cipa.
7. roth v. united states, 354 u.s. 476 (1957).
8. miller v. california, 413 u.s. 15 (1973).
9. roth v. united states.
10. cipa.
11. cipa.
12. telecommunications act of 1996, public law 104-104 (feb. 8, 1996).
13. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology & libraries 24, no. 2 (2005): 57–67.
14. ibid.
15. ibid.
16. american library association, “u.s. supreme court arguments on cipa expected in late winter or early spring,” press release, nov. 13, 2002, www.ala.org/ala/aboutala/hqops/pio/pressreleasesbucket/ussupremecourt.cfm (accessed may 19, 2008).
17. kelly rodden, “the children’s internet protection act in public schools: the government stepping on parents’ toes?” fordham law review 71 (2003): 2141–75.
18. john carlo bertot, paul t. jaeger, and charles r. mcclure, “public libraries and the internet 2007: issues, implications, and expectations,” library & information science research 30 (2008): 175–84; charles r. mcclure, paul t. jaeger, and john carlo bertot, “the looming infrastructure plateau?: space, funding, connection speed, and the ability of public libraries to meet the demand for free internet access,” first monday 12, no. 12 (2007), www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2017/1907 (accessed may 19, 2008).
19. mccarthy, “filtering the internet.”
20. leigh s. estabrook and edward lakner, “managing internet access: results of a national survey,” american libraries 31, no. 8 (2000): 60–62.
21. alberta davis comer, “studying indiana public libraries’ usage of internet filters,” computers in libraries (june 2005): 10–15; thomas m. reddick, “building and running a collaborative internet filter is akin to a kansas barn raising,” computers in libraries 20, no. 4 (2004): 10–14.
22. communications decency act of 1996, public law 104-104 (feb. 8, 1996).
23. child online protection act (copa), public law 105-277 (oct. 21, 1998).
24. united states v. american library association.
25. r. trevor hall and ed carter, “examining the constitutionality of internet filtering in public schools: a u.s. perspective,” education & the law 18, no. 4 (2006): 227–45; mccarthy, “filtering the internet.”
26. library services and technology act, public law 104-208 (sept. 30, 1996).
27. john wells and laurie lewis, internet access in u.s. public schools and classrooms: 1994–2005, special report prepared at the request of the national center for education statistics, nov. 2006.
28. american library association, libraries connect communities; john carlo bertot, charles r. mcclure, and paul t. jaeger, “the impacts of free public internet access on public library patrons and communities,” library quarterly 78, no. 3 (2008): 285–301; jaeger et al., “cipa.”
29. wells and lewis, internet access in u.s. public schools and classrooms.
30. ibid.
31. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.”
32. jaeger et al., “cipa.”
33. american library association, libraries connect communities.
34. ibid.
35. ibid.
36. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.”
37. ibid.
38. norman oder, “$40 million in e-rate funds suspended: delays caused as fcc requires new accounting standards,” library journal 129, no. 18 (2004): 16; debra lau whelan, “e-rate funding still up in the air: schools, libraries left in the dark about discounted funds for internet services,” school library journal 50, no. 11 (2004): 16.
39. ken foskett and paul donsky, “hard eye on city schools’ hardware,” atlanta journal-constitution, may 25, 2004; ken foskett and jeff nesmith, “wired for waste: abuses tarnish e-rate program,” atlanta journal-constitution, may 24, 2004.
40. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.”
41. department of commerce, national telecommunication and information administration, children’s internet protection act: study of technology protection measures in section 1703, report to congress (washington, d.c.: 2003).
42. mccarthy, “filtering the internet.”
43. paul t. jaeger and charles r. mcclure, “potential legal challenges to the application of the children’s internet protection act (cipa) in public libraries: strategies and issues,” first monday 9, no. 2 (2004), www.firstmonday.org/issues/issue9_2/jaeger/index.html (accessed may 19, 2008).
44. electronic frontier foundation, internet blocking in public schools (washington, d.c.: 2004), http://w2.eff.org/censorship/censorware/net_block_report (accessed may 19, 2008).
45. adam horowitz, “the constitutionality of the children’s internet protection act,” st. thomas law review 13, no. 1 (2000): 425–44.
46. tanessa cabe, “regulation of speech on the internet: fourth time’s the charm?” media law and policy 11 (2002): 50–61; adam goldstein, “like a sieve: the child internet protection act and ineffective filters in libraries,” fordham intellectual property, media, and entertainment law journal 12 (2002): 1187–1202; horowitz, “the constitutionality of the children’s internet protection act”; marilyn j. maloney and julia morgan, “rock and a hard place: the public library’s dilemma in providing access to legal materials on the internet while restricting access to illegal materials,” hamline law review 24, no. 2 (2001): 199–222; mary minow, “filters and the public library: a legal and policy analysis,” first monday 2, no. 12 (1997), www.firstmonday.org/issues/issue2_12/minnow (accessed may 19, 2008); richard j. peltz, “use ‘the filter you were born with’: the unconstitutionality of mandatory internet filtering for adult patrons of public libraries,” washington law review 77, no. 2 (2002): 397–479.
47. mccarthy, “filtering the internet.”
48. john carlo bertot et al., “public access computing and internet access in public libraries: the role of public libraries in e-government and emergency situations,” first monday 11, no. 9 (2006), www.firstmonday.org/issues/issue11_9/bertot (accessed may 19, 2008); john carlo bertot et al., “drafted: i want you to deliver e-government,” library journal 131, no. 13 (2006): 34–39; paul t. jaeger and kenneth r. fleischmann, “public libraries, values, trust, and e-government,” information technology and libraries 26, no. 4 (2007): 35–43.
49. doug johnson, “maintaining intellectual freedom in a filtered world,” learning & leading with technology 32, no. 8 (may 2005): 39.
50. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities.”
51. jaeger et al., “public libraries and internet access across the united states.”
52. paul t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41.
53. goldstein, “like a sieve.”
54. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities”; jaeger and fleischmann, “public libraries, values, trust, and e-government.”
55. bertot, jaeger, and mcclure, “public libraries and the internet 2007”; charles r. mcclure et al., “funding and expenditures related to internet access in public libraries,” information technology & libraries (forthcoming).
56. zheng yan and kurt w. fischer, “how children and adults learn to use computers: a developmental approach,” new directions for child and adolescent development 105 (2004): 41–61.
57. zheng yan, “age differences in children’s understanding of the complexity of the internet,” journal of applied developmental psychology 26 (2005): 385–96; yan, “limited knowledge and limited resources”; yan, “differences in basic knowledge and perceived education of internet safety”; yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?”
58. patricia greenfield and zheng yan, “children, adolescents, and the internet: a new field of inquiry in developmental psychology,” developmental psychology 42 (2006): 391–93.
59. john n. gathegi, “the public library as a public forum: the (de)evolution of a legal doctrine,” library quarterly 75 (2005): 12.
60. sandra braman, “where has media policy gone? defining the field in the 21st century,” communication law and policy 9, no. 2 (2004): 153–82; sandra braman, change of state: information, policy, & power (cambridge, mass.: mit pr., 2007); charles r. mcclure and paul t. jaeger, “government information policy research: importance, approaches, and realities,” library & information science research 30 (2008): 257–64; milton mueller, christiane page, and brendan kuerbis, “civil society and the shaping of communication-information policy: four decades of advocacy,” information society 20, no. 3 (2004): 169–85.
61. paul t. jaeger, “information policy, information access, and democratic participation: the national and international implications of the bush administration’s information politics,” government information quarterly 24 (2007): 840–59.
62. mcclure and jaeger, “government information policy research.”

president’s column
information technology and libraries | september 2006

being president of a dynamic organization like lita is truly a humbling experience. every day i am awestruck by the dedication, energy, creativity, and excitement exhibited by lita’s members. i see it in everything that lita does, from its stellar publications and communications—including this journal, ital—to its programming and contribution to standards and system development. none of this would be possible without the hard work of all the dedicated members who volunteer their time not only to advancing their own professional development, but also to advancing the profession. thank you all.

for forty years now, lita members have been dedicated to the association’s work, and we have been celebrating our fortieth anniversary throughout 2006. the celebration continues as we prepare to convene in nashville for the ninth lita national forum, october 26–29, 2006. lita has had a long tradition of providing quality conferences. the first, held in 1970, was the conference on interlibrary communications and information networks, more familiarly known as the “airlie conference,” which had published proceedings. the second was a cooperative effort held in 1971 with the library education division and the american society for information science (asis), entitled “directions in education for information science: a symposium for educators.” in later years, lita held three national conferences: baltimore (1983), boston (1988), and denver (1992). in 1996, lita and the library administration and management association (lama) held a joint conference in pittsburgh. while the national conferences were very successful, the idea of a more informal, intimate event to be held annually took form, and in 1998 lita held its first annual national forum. next year we will continue the tradition of successful conference programming as we celebrate the tenth anniversary of the lita national forum in denver.

this year’s theme is “netville in nashville: web services as library services.” we have an exciting lineup of keynote and concurrent-session speakers as well as several poster-session presenters who will stimulate lively discussions in all of the wonderful, informal networking opportunities this small conference offers. the sponsor showcase allows plenty of time for attendees to talk to our valued sponsors and learn more about their products. the two preconference programs offer in-depth experiences: “opensource installfest” and “developing best project management practices for it projects.” lita bloggers will be out in force producing summaries and reactions to it all. one of lita’s strongest membership benefits is the personal networking opportunities it provides. by providing an informal and enjoyable atmosphere, the national forum is one of the best places to network with others dealing with the same issues as you. i hope to see you there.

besides the national forum (just one of lita’s many educational programs), one of the things i like most about lita is its flexibility to quickly accommodate programming to cover the latest issues and trends. lita’s programming at ala annual conferences attracts attendees from all divisions for this reason. every year, the highly successful top technology trends attracts more and more people who come to listen to the experts speak on the latest trends.
the lita interest groups, like the technologies they focus on, also exhibit great flexibility because they can come and go—it’s easy to locate a few other members to create a new group where interested parties can come together for focused discussions or formal presentations. since its inception, lita has had traveling educational programs to provide programming opportunities for people who cannot attend the ala conferences. these in-depth programs, now called the regional institutes, focus on a topic and are offered as long as that issue is relevant. look for new electronic delivery of lita programs in the future.

of course, lita’s publications provide a very lasting educational component. lita launched journal of library automation (jola), the predecessor of ital, in 1968, one year after the formation of the new division of ala. jola and, later, ital have consistently been a place for library information technologists to publish in a peer-reviewed scholarly journal. these well-respected publications have had a wonderful group of editors and editorial boards over the years. we are pleased that ital is now available online for members from the moment of publication. i want to thank all the people who work so hard to produce this publication on a quarterly basis. i also want to thank all the authors who submit their research for publication here and make a lasting contribution to the profession.

all of these programs are just a sampling of what lita provides its members. is it any wonder i am awed by it all? i hope you are as well. i also hope that, in my year as your president, you will communicate with me in an open dialogue on the lita blog, via e-mail, or in person at conferences regarding how lita can better meet your needs as a member. we have been focusing a great deal on our educational goal because that is what we have heard you want out of lita. i encourage you to let me and the rest of the lita board know how we can best deliver a quality set of educational programs.

bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city.

enhancing opac records for discovery
patrick griffis and cyrus ford
patrick griffis (patrick.griffis@unlv.edu) is business librarian and cyrus ford (cyrus.ford@unlv.edu) is special formats catalog librarian, university of nevada las vegas libraries.
information technology and libraries | december 2009

this article proposes adding keywords and descriptors to the catalog records of electronic databases and media items to enhance their discovery. the authors contend that subject liaisons can add value to opac records and enhance discovery of electronic databases and media items by providing searchable keywords and resource descriptions. the authors provide an examination of opac records at their own library, which illustrates the disparity of useful keywords and descriptions within the notes field for media item records versus electronic database records. the authors outline methods for identifying useful keywords for indexing opac records of electronic databases. also included is an analysis of the advantages of using encore’s community tag and community review features to allow subject liaisons to work directly in the catalog instead of collaborating with cataloging staff.

at the university of nevada las vegas (unlv) libraries’ discovery mini-conference, there was a wide range of initiatives and ideas presented. some were large-scale initiatives that focused on designing search platforms and systems as well as information architecture schemas that would enhance library resource discovery.
but there was not much focus on enhancing the representation of library resources within the construct of bibliographic records in the opac. since searching platforms can only be as useful as the information available for searching, and since opac records are the method for representing the majority of library resources, we thought it important that the prominence of opac records and how they represent library resources be considered in the mini-conference. to that end, our presentation focused on enhancing the opac records for nonbook items to support their discoverability as opposed to focusing on search systems and information architecture schemas.

our proposition was that subject liaisons’ expertise could be used to enhance opac records by including their own keyword search terms and descriptive summaries in opac records for electronic databases as well as records of media items. this proposition acts as a moderate approach to initiatives that call for opac records to be opened for user-generated content in that this approach provides subject liaison mediation and expertise to modify records. as such, this approach may serve as an effective stopgap in cases where there is resistance toward permitting social tagging and user descriptions within opac records. such an initiative also is scalable, allowing liaisons to provide as few or as many terms as they want. such an initiative would require collaboration between cataloging staff and subject liaisons.

disparity between media and database records

at unlv libraries, terms included in the notes fields of bibliographic records are indexed for keyword searching. in the case of media items, there is extensive use of notes to include descriptive terms that enhance discoverability for users. for example, notes for films indicate any awards the film has won as well as festivals in which it has been featured (see figure 1). as a result, users can discover films through keyword searches of film awards or film festivals. a film student who is searching “cannes film festival” via a keyword search will generate results that include films owned by unlv libraries that have been featured at that festival. these keyword-searchable notes add value and discoverability for this type of material, and subject liaisons can be a source for such information.

figure 1. the notes field in an opac record of a film item

while it appears that notes in media records are heavily populated with a variety of user-centric information, there is relatively little use of descriptive notes for electronic databases (see figure 2). for databases, notes traditionally include information about access restrictions and mode of access while overlooking information representing the content of the resource. these fields could be utilized for specific terms relating to database content not adequately covered by the library of congress subject headings (lcsh). subject liaisons have specialized knowledge of which databases work best for unique content areas, class assignments, and information needs. this user-centric knowledge can be used to enhance database discovery if liaisons were to provide catalogers with information and descriptors to add to the record.
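to make the proposed record edit concrete, the sketch below adds a festival note to a film record using the open-source pymarc library; the title and note text are invented, and the field syntax assumes the pymarc 4 api (newer pymarc releases wrap subfields in Subfield objects):

```python
from pymarc import Record, Field  # pymarc 4.x api

record = Record()
record.add_ordered_field(
    Field(tag="245", indicators=["0", "0"],
          subfields=["a", "a hypothetical film title"])
)
# liaison-supplied descriptor entered as a general note (marc 500) so that it
# lands in a notes field the opac indexes for keyword searching
record.add_ordered_field(
    Field(tag="500", indicators=[" ", " "],
          subfields=["a", "official selection, cannes film festival"])
)
print(record)
```

the same pattern would carry the liaison-supplied descriptors for database records discussed next.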
as an example, at unlv libraries there is one particular database that provides a strengths, weaknesses, opportunities, and threats (swot) analysis for companies, but that natural language term isn’t found anywhere in the general database summary listing or subject headings. if it were added to a note field as part of a description or as a labeled descriptor, then students could easily find this database to complete their assignments.

figure 2. the notes field in an opac record of an electronic database

this proposal is scalable, allowing liaisons to provide as few or as many key terms as they want, depending on their preference or on the vagaries of a particular database. subject liaisons could opt to add a few major terms from their own knowledge and expertise that they feel will add value for patrons searching the opac. subject liaisons also could mine the index and thesaurus terms of individual databases to identify prominent content areas for individual databases to find useful keywords.

mining electronic database index descriptors

electronic databases typically have subject matter taxonomies developed by experts who assign descriptors to journal articles. subject liaisons could mine these taxonomies to identify predominant descriptors for individual databases to add to the database catalog records. predominance of a subject descriptor could be determined by examining the relative number of articles that are assigned to that descriptor. such a strategy of indexing key predominant subject descriptors identified from database subject matter taxonomies could serve to uncover unique content areas not served with lcsh.

a different application of this strategy could be employed for identifying predominant and emerging research areas for particular groups. subject liaisons could conduct a citation analysis of articles authored by members of a particular research group to record and codify the subject descriptors of each article. once codified, an analysis could determine the most predominant subject descriptors for articles authored by that particular group. this could serve as a baseline for identifying emerging research areas and their terms. both types of analysis have potential to provide useful keyword terms for database records.
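a minimal sketch of the frequency analysis described above, with invented descriptor data standing in for a real citation export:

```python
from collections import Counter

# hypothetical descriptor-per-article pairings harvested from a database's
# thesaurus-indexed records; real input would come from a citation export
descriptors = [
    "swot analysis", "market share", "swot analysis",
    "consumer behavior", "swot analysis", "market share",
]

counts = Counter(descriptors)
# the most frequently assigned descriptors are candidates for the notes field
for term, n in counts.most_common(3):
    print(f"{term}: {n} articles")
```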
using encore’s community features

in 2008, unlv libraries purchased and implemented the innovative interfaces’ encore discovery platform, which provides a google-like interface for searching the public catalog and the ability to narrow results using facets such as location, year, language, and format. encore also includes many display features that showcase the information provided in the bibliographic records. two of encore’s web 2.0 features provide users with the ability to contribute data to records via community tags and community reviews. unlv requires users to enter a valid library barcode number and pin.

subject liaisons could use the community reviews feature to add descriptive summaries of items to encore records independently, without the need for cataloging staff to edit a marc record. however, the content of community reviews are not indexed for searching and thus only add value at the point when a user is determining whether the resources they have retrieved are valuable for them. on the other hand, if a community tag is added to an item, that tag is included in the community tags section of the encore result display and becomes an indexed keyword for searches in encore (see figure 3). if that tag term is searched in encore’s keyword search, the bibliographic record attached to that tag term will be included in the results list under the community tags facet. since these community tags are searchable, subject liaisons can add keywords to encore records without collaboration with cataloging staff. however, this provides limited success because the keyword is included and indexed only in encore records—not in the opac records. also, the community tags facet must be selected from the results display for the encore record tags to be searchable.

figure 3. encore community tag

the case for collaboration

as described above, keywords and descriptions added by subject liaisons into encore records have inherent discovery limitations when compared to a cataloger adding the same information directly to the marc bibliographic record. the advantages of collaboration between subject liaisons and catalogers is clear, and subject librarians at unlv libraries have experienced similar collaboration in efforts in the past.

in 2006, subject librarians at unlv libraries were offered the opportunity to create their own descriptions of electronic resources through an initiative to update the summary descriptions for the electronic databases portion of the libraries’ website. at that time, all existing electronic database summaries were those used by the database publishers. the project provided subject liaisons the option to create custom summary descriptions to represent electronic databases in their own terms. each subject liaison had a document file for their descriptions, and the website editors used them to update the electronic databases list on the libraries’ website. this particular initiative serves as one example of the willingness of subject liaisons to share their subject expertise to enhance the representation of library resources through collaboration with technical services staff. as such, collaboration between subject liaisons and catalogers to allow liaisons to add terms to opac records of electronic databases and media items could prove to be both effective and feasible as an initiative toward enhancing the discovery of library resources.

usability test results for a discovery tool in an academic library
jody condit fagan, meris mandernach, carl s. nelson, jonathan r. paulo, and grover saunders
jody condit fagan (faganjc@jmu.edu) is director, scholarly content systems, meris mandernach (manderma@jmu.edu) is collection management librarian, carl s. nelson (nelsoncs@jmu.edu) is digital user experience specialist, jonathan r. paulo (paulojr@jmu.edu) is education librarian, and grover saunders (saundebn@jmu.edu) is web media developer, carrier library, james madison university, harrisonburg, va.
information technology and libraries | march 2012

abstract

discovery tools are emerging in libraries. these tools offer library patrons the ability to concurrently search the library catalog and journal articles. while vendors rush to provide feature-rich interfaces and access to as much content as possible, librarians wonder about the usefulness of these tools to library patrons. to learn about both the utility and usability of ebsco discovery service, james madison university (jmu) conducted a usability test with eight students and two faculty members. the test consisted of nine tasks focused on common patron requests or related to the utility of specific discovery tool features. software recorded participants’ actions and time on task, human observers judged the success of each task, and a post-survey questionnaire gathered qualitative feedback and comments from the participants. participants were successful at most tasks, but specific usability problems suggested some interface changes for both ebsco discovery service and jmu’s customizations of the tool.
the study also raised several questions for libraries above and beyond any specific discovery-tool interface, including the scope and purpose of a discovery tool versus other library systems, working with the large result sets made possible by discovery tools, and navigation between the tool and other library services and resources. this article will be of interest to those who are investigating discovery tools, selecting products, integrating discovery tools into a library web presence, or performing evaluations of similar systems.

introduction

discovery tools appeared on the library scene shortly after the arrival of next-generation catalogs. the authors of this paper define discovery tools as web software that searches journal-article and library-catalog metadata in a unified index and presents search results in a single interface. this differs from federated search software, which searches multiple databases and aggregates the results. examples of discovery tools include serials solutions summon, ebsco discovery service, ex libris primo, and oclc worldcat local; examples of federated search software include serials solutions webfeat and ebsco integrated search. with federated search software, results rely on the search algorithm and relevance ranking as well as each tool’s algorithms and relevance rankings. discovery tools, which import metadata into one index, apply one set of search algorithms to retrieve and rank results. this difference is important because it contributes to a fundamentally different user experience in terms of speed, relevance, and ability to interact consistently with results.

combining the library catalog, article indexes, and other source types in a unified interface is a big change for users because they no longer need to choose a specific search tool to begin their search. research has shown that such a choice has long been in conflict with users’ expectations.1 federated search software was unable to completely fulfill users’ expectations because of its limited technology.2 now that discovery tools provide a truly integrated search experience, with greatly improved relevance rankings, response times, and increased consistency, libraries can finally begin to meet this area of user expectation. however, discovery tools present new challenges for users: will they be able to differentiate between source types in the integrated results sets? will they be able to limit large results sets effectively? do they understand the scope of the tool and that other online resources exist outside the tool’s boundaries?

the sea change brought by discovery tools also raises challenges for librarians, who have grown comfortable with the separation between the library catalog and other online databases. discovery tools may mask important differences between disciplinary searching, and they do not currently offer discipline-specific strategies or limits.
they also lack authority control, which makes topical precision a challenge. their usual prominence on library websites may direct traffic away from carefully cultivated and organized collections of online resources. discovery tools offer both opportunities and challenges for library instruction, depending on the academic discipline, users’ knowledge, and information-seeking need.

james madison university (jmu) is a predominantly undergraduate institution of approximately 18,000 students in virginia. jmu has a strong information literacy program integrated into the curriculum through the university’s information seeking skills test (isst). the isst is completed before students are able to register for third-semester courses. additionally, the library provides an information literacy tutorial, “go for the gold,” that supports the skills needed for the isst.

jmu launched ebsco discovery service (eds) in august 2010 after participating as a beta development partner in spring and summer 2010. as with other discovery tools, the predominant feature of eds is integration of the library catalog with article databases and other types of sources. at the time of this study, eds had a few differentiating features. first, because of ebsco’s business as a database and journal provider, article metadata was drawn from a combination of journal-publisher information and abstracts and index records. the latter included robust subject indexing (e.g., the medical subject headings in cinahl). the content searched by eds varies by institution according to the institution’s subscription. jmu had a large number of ebsco databases and third-party database subscriptions through ebsco, so the quantity of information searched by eds at jmu is quite large. eds also allowed for extensive customization of the tool, including header navigation links, results-screen layout, and the inclusion of widgets in the right-hand column of the results screen.

jmu libraries developed a custom “quick search” widget based on eds for the library home page (see figure 1), which allows users to add limits to the discovery-tool search and assists with local authentication requirements. based on experience with a pilot test of the open-source vufind next-generation catalog, jmu libraries believed users would find the ability to limit up-front useful, so quick search’s first drop-down menu contained keyword, title, and author field limits; the second drop-down contained limits for books, articles, scholarly articles, “just leo library catalog,” and the library website (which did not use eds). the “just leo library catalog” option limited the user’s search to the library catalog database records but used the eds interface to perform the search. to access the native catalog interface, a link to leo library catalog was included immediately above the search box as well as in the library website header.

figure 1. quick search widget on jmu library homepage
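the widget’s behavior can be pictured as a small mapping from drop-down selections to search parameters. the sketch below is a hypothetical reconstruction: the field codes, limiter values, and url are invented stand-ins, not the real eds profile configuration.

```python
from urllib.parse import urlencode

# invented stand-ins for the widget's two drop-downs; a real eds profile
# defines its own field codes and limiter names
FIELD_CODES = {"keyword": None, "title": "TI", "author": "AU"}
SOURCE_LIMITS = {
    "books": "book",
    "articles": "article",
    "scholarly articles": "scholarly-article",
    "just leo library catalog": "catalog-only",
}

def build_search_url(terms, field="keyword", source=None):
    # prefix the terms with a field code when one applies, then add any limiter
    code = FIELD_CODES.get(field)
    params = {"q": f"{code} {terms}" if code else terms}
    if source in SOURCE_LIMITS:
        params["limit"] = SOURCE_LIMITS[source]
    return "https://search.example.edu/eds?" + urlencode(params)

print(build_search_url("usability testing", field="title", source="scholarly articles"))
```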
specific tasks addressed the use of facets within the discovery tool, patrons' use of date limiters, and the usability of the quick search widget. the usability test also included tasks in which users were asked to locate books and articles using only the discovery tool, then to repeat the task using anything but the discovery tool. this article interprets the usability study's results in the context of other local usability tests and of web-usage data from the first semester of use. some findings were used to implement changes to quick search and the library website and to recommend changes to ebsco; other findings suggested general questions related to discovery-tool software that libraries will need to investigate further.

literature review

literature reviewed for this article included background reading on users and library catalogs, library responses to users' expectations, usability studies in libraries, and usability studies of discovery tools specifically. the first group of articles comprised a discussion of the limitations of traditional library catalogs. the strengths and weaknesses of library catalogs were reported in several academic libraries' usability studies.3 calhoun recognized that library users' preference for google caused a decline in the use and value of library catalogs, and she encouraged library leaders to "establish the catalog within the framework of online information discovery systems."4 this awareness of changes in user expectations, during a time when google set the benchmark for search simplicity, was echoed by numerous authors who recognized the limits of library catalogs and expressed a need for the catalog to be greatly modernized to keep pace with the evolution of the web.5

libraries have responded in several ways to the call for modernization, most notably through investigations of federated searching and next-generation catalogs. several articles have presented usability study results for various federated searching products.6 fagan provided a thorough literature review of faceted browsing and next-generation catalogs.7 western michigan university presented usability study results for the next-generation catalog vufind, revealing that participants took advantage of the simple search box but did not use the next-generation features of tagging, comments, favorites, and sms texting.8 the university of minnesota conducted two usability studies of primo and reported that participants were satisfied with using primo to find known print items, limit by author and date, and find a journal title.9 tod olson conducted a study with graduate students and faculty using the aquabrowser interface, and his participants located sources for their research that they had not previously been able to find.10

the literature also revealed both opportunities and limitations of federated searching and next-generation catalogs. allison presented statistics from google analytics for an implementation of encore at the university of nebraska-lincoln.11 the usage statistics revealed increased use of article databases as well as increased use of narrowing facets such as format, media type, and library location. allison concluded that encore increased users' exposure to the entire collection. breeding concluded that federated searching had various limitations, especially in search speed and interface design, and was thus unable to compete with google scholar.12
usability studies of next-generation catalogs revealed a lack of features necessary to fully incorporate an entire library's collection. breeding also recognized the limitations of next-generation library catalogs and saw discovery tools as the next step in their evolution: "it's all about helping users discover library content in all formats, regardless of whether it resides within the physical library or among its collections of electronic content, spanning both locally owned materials and those accessed remotely through subscriptions."13

the dominant literature related to discovery tools discussed their features,14 reviewed them from a library selector's perspective,15 summarized academic libraries' decisions following selection,16 presented questions related to evaluation after selection,17 and offered a thorough evaluation of common features.18 allison concluded that "usability testing will help clarify what aspects need improvement, what additions will make [the interface] more useful, and how the interface can be made so intuitive that user training is not needed."19 breeding noted, "it will only be through the experience of library users that these products will either prove themselves or not."20

libraries have been adapting techniques from the field of usability testing for over a decade to learn more about user behavior, usability, and user satisfaction with library websites and systems.21 rubin and chisnell and dumas and redish provided authoritative overviews of the benefits and best practices of usability testing.22 in addition, campbell, as well as norlin and winters, offered specific usability methodologies for libraries.23

worldcat local has dominated the usability studies of discovery tools published to date. ward, shadle, and mofield conducted a usability study at the university of washington.24 the first round involved seven undergraduate and three graduate students; its purpose "was to determine how successful uw students would be in using worldcat local to discover and obtain books and journal articles (in both print and electronic form) from the uw collection, from the summit consortium, and from other worldcat libraries."25 although participants were successful at completing these tasks, a few issues arose. users had difficulty with the brief item display because reviews were listed higher than the actual items, and the detailed item display hindered users' ability to distinguish between various editions and formats. a second round of usability testing, not yet published, included tasks related to finding materials on specific subject areas.

boock, chadwell, and reese conducted a usability study of worldcat local at oregon state university.26 the study included four tasks and five evaluative questions; forty undergraduate students, sixteen graduate students, twenty-four library employees, four instructors, and eighteen faculty members took part. they summarized that users found known-title searching easier in the library catalog but found topical searches more effective in worldcat local. the participants preferred worldcat local for its ability to find articles and to search for materials at other institutions. western washington university also conducted a usability study of worldcat local.
they selected twenty-four participants with a wide range of academic experience to complete twenty tasks in both worldcat local and the traditional library catalog.27 the comparison revealed several problems in using worldcat local, including users' inability to determine the scope of the content, confusion over the intermixing of formats, problems with the display of facet options, and difficulty with known-item searches. western washington university decided not to implement worldcat local.

oclc published a thorough summary of several usability studies conducted mostly with academic libraries piloting the tool, including the university of washington; the university of california (berkeley, davis, and irvine campuses); ohio state university; the peninsula library system in san mateo, california; and the free library of urbana and the des plaines public library, both in illinois.28 the report conveyed favorable user interest in searching local, group, and global collections together. users also appreciated the ability to search articles and books together. the authors commented, "however, most academic participants in one test (nine of fourteen) wrongly assumed that journal article coverage includes all the licensed content available at their campuses."29 oclc used the testing results to improve the order of search results, provide clarity about various editions, improve facets for narrowing a search, provide links to electronic resources, and increase the visibility of search terms.

at grand valley state university, doug way analyzed usage statistics after the discovery tool summon was implemented in 2009; the statistics revealed increased use of full-text downloads and link-resolver software but decreased use of core subject databases.30 the usage statistics showed promising results, but way recommended further studies over a longer period to better understand how discovery tools affect entire library collections. north carolina state university libraries released a final report on their usability study of summon.31 the results were similar to those of other discovery-tool studies: users were satisfied with the ability to search the library catalog and article databases with a single search, but they had mixed results with known-item searching and showed confusion about narrowing facets and results ranking. although several additional academic libraries have conducted usability studies of encore, summon, and ebsco discovery service, the results have not yet been published.32

only one published usability study of ebsco discovery service was found. in a study with six participants, williams and foster found users were satisfied and able to adapt to the new system quickly but did not take full advantage of the rich feature set.33 combined with the rapid changes in these tools, the literature illustrates a current need for more usability studies of discovery tools. the necessary focus on specific software implementations, along with differing study designs, makes it difficult to identify common themes. additional usability studies will offer greater breadth and depth to the current dialogue about discovery tools. this article helps fill that gap by presenting results from a usability study of ebsco discovery service.
publishing such usability results for discovery tools will inform institutional decisions, improve user experiences, and advance the tools' content, features, and interface design. in addition, libraries will be better able to modernize their catalogs to meet users' changing needs and expectations and to keep pace with the evolution of the web.

method

james madison university libraries' usability lab features one workstation with two pieces of usability software: techsmith's morae (version 3) (http://www.techsmith.com/morae.asp), which records screen captures of participant actions during usability studies, and the usability testing environment (ute) (version 3), which presents participants with tasks in a web-browser environment. the ute also presents end-of-task questions and measures time on task and task success.

the study of eds, conducted in october 2010, was covered by an institutional review board-approved protocol. participants were recruited through a bulk email sent to all students and faculty. interested respondents were randomly selected to include a variety of grade levels and majors for students, and a variety of years of service and disciplines taught for faculty members. the study included ten participants with a range of experience levels: two freshmen, two sophomores, two juniors, one senior, one graduate student, and two faculty members. three of the participants were from the school of business, one from education, two from the arts and humanities, and two from the sciences; the remaining two had dual majors in the humanities and the sciences. a usability rule of thumb is that at least five users will reveal more than 75 percent of usability issues.34 because the goal was to observe a wide range of user behaviors and usability issues, and to gather data about satisfaction from a variety of perspectives, this study used two users of each grade level plus two faculty participants (for a total of ten) to provide as much heterogeneity as possible.
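the five-user rule of thumb cited above traces to problem-discovery models of usability testing. as a minimal sketch, assuming nielsen and landauer's commonly cited model, in which the proportion of problems found by n users is 1 − (1 − λ)^n, and assuming a typical per-user discovery rate λ of about 0.31 (actual rates vary by system and task set), the expected yield looks like this:

```python
# proportion of usability problems expected to surface with n test users,
# per the nielsen-landauer problem-discovery model: 1 - (1 - lam)^n.
# lam, the chance that one user encounters a given problem, is assumed
# to be 0.31, a commonly cited average; real values vary widely.

def problems_found(n_users, lam=0.31):
    return 1 - (1 - lam) ** n_users

for n in (1, 3, 5, 10):
    print(f"{n:2d} users -> {problems_found(n):.0%} of problems")
# under this assumption, 5 users find ~84% (above the 75 percent rule of
# thumb), and the 10 users in this study would be expected to find ~98%.
```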
student participants were presented with ten pre-study questions, and faculty participants were asked nine (see appendix a). the pre-study questions were intended to gather information about participants' backgrounds, including their time at jmu, their academic discipline, and their experience with the library website, the ebscohost interface, the library catalog, and library instruction. since participants were anonymous, we hoped their answers would help us interpret unusual comments or findings. pre-test results were not used to form comparison groups (e.g., freshmen versus seniors) because such groups would not be representative of their larger populations. these questions were followed by a practice task to help familiarize participants with the testing software.

the study consisted of nine tasks designed to showcase usability issues, show the researchers how users behaved in the system, and measure user satisfaction. appendix b lists the tasks and what each was intended to measure. in designing the test, determining success seemed very objective for some tasks (find a video about a given topic) and more subjective for others (those involving relevance judgments). for this reason, we asked participants to provide satisfaction information on some tasks and not others. in retrospect, for consistency of interpretation, we probably should have asked participants to rate or comment on every task.

all of the tasks were presented in the same order. tasks were completed either by clicking "answer" and answering a question (multiple choice or typed response) or by clicking "finished" after navigating to a particular webpage. participants also had the option to skip the task they were working on and move to the next one; allowing participants to skip a task helps differentiate genuinely incorrect answers from incorrect answers due to frustration or guessing. a time limit of five minutes was set for tasks 1–7, while tasks 8 and 9 were given time limits of eight minutes, after which the participant was timed out. time limits were used to ensure participants were able to complete all tasks within the agreed-upon session. average time on task across all tasks was 1 minute, 35 seconds.

after the tasks were completed, participants were presented with the system usability scale (sus), a ten-item scale using statements of subjective assessment and covering a variety of aspects of system usability.35 sus scores, which provide a numerical score out of 100, are affected by the complexity of both the system and the tasks users may have performed before taking the sus. the sus was followed by a post-test consisting of six open-ended questions, plus one additional question for faculty participants, intended to gather more qualitative feedback about user satisfaction with the system (see appendix a).

a technical glitch with the ute software affected the study in two ways. first, on seven of the ninety tasks, the ute failed to enforce the maximum time limit, and participants exceeding a task's limit were allowed to continue until they completed or skipped the task. one participant exceeded the time limit on task 1, and three of these errors occurred on each of tasks 8 and 9. this problem potentially limits the ability to compare average time on task across tasks; however, since this study used time on task descriptively rather than comparatively, the impact on interpreting results is minimal. the seven instances in which the glitch occurred were included in the average time-on-task data in figure 3 because the times were not extreme and the time limit had been imposed mostly to ensure participants had time to complete all the tasks.

a second problem with the ute was that it randomly and prematurely aborted some users' tasks; when this happened, participants were informed that their time had run out and were moved on to the next task. this problem is more serious because it is unknown how much more time or effort a participant would have spent on the task or whether they would have been more successful. because of this, the results below specify how many participants were affected for each task. although this was unfortunate, the results of the participants who did not experience the problem still provide useful cases of user behavior, especially because this study does not attempt to generalize observed behavior or usability issues to the larger population. although a participant mentioned a few technical glitches to the facilitator during testing, the extent of the software errors was not discovered until after the tests were complete (and the semester was over) because the facilitator did not directly observe participants during sessions.
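for readers unfamiliar with how the sus arrives at its score out of 100, here is a minimal sketch of brooke's standard scoring rule; the response vector is a made-up example, not data from this study. odd-numbered (positively worded) items contribute (response − 1), even-numbered (negatively worded) items contribute (5 − response), and the raw sum is multiplied by 2.5:

```python
# standard sus scoring (brooke): ten statements answered on a 1-5 scale,
# alternating positively and negatively worded items.
def sus_score(responses):
    assert len(responses) == 10
    raw = sum((r - 1) if i % 2 == 1 else (5 - r)
              for i, r in enumerate(responses, start=1))  # odd: r-1, even: 5-r
    return raw * 2.5  # scales the 0-40 raw sum to the familiar 0-100 range

# hypothetical response set, for illustration only:
print(sus_score([4, 2, 4, 2, 3, 3, 4, 2, 3, 2]))  # -> 67.5
```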
results

the participants were asked several pre-test questions to learn about their research habits. all but one participant indicated they used the library website no more than six times per month (see figure 2). common tasks that this study's student participants said they performed on the website included searching for books and articles, searching for music scores, "research using databases," and checking library hours. the two faculty participants mentioned book and database searches, electronic journal access, and interlibrary loan. participants were shown the quick search widget and asked, "how much of the library's resources do you think the quick search will search?" seven participants said "most"; only one person, a faculty member, said it would search "all" the library's resources.

figure 2. monthly visits to library website: < 1 visit (2 participants); 1–3 visits (4); 4–6 visits (3); > 7 visits (1)

when shown screenshots of the library catalog and an ebscohost database, seven participants were sure they had used leo library catalog, and three were not sure. three indicated that they had used an ebsco database before, five had not, and two were not sure. participants were also asked how often they had used library resources for assignments in their major field of study; four said "often," two said "sometimes," one "rarely/never," and one "very often." students were also asked, "has a librarian spoken to a class you've attended about library research?" two said yes, five said no, and one was not sure.

a practice task was administered to ensure participants were comfortable with the workstation and software: "use quick search to search a topic relating to your major/discipline or another topic of interest to you. if you were writing a paper on this topic how satisfied would you be with these results?" no one selected "no opinion" or "very unsatisfied"; sixty percent were "very satisfied" or "satisfied" with their results, and forty percent were "somewhat unsatisfied."

figure 3 shows the time spent on each task, while figure 4 describes participants' success on the tasks.

                                       task 1  task 2  task 3  task 4  task 5  task 6  task 7  task 8  task 9
no. of responses (excluding timeouts)    10      9       5       7       9      10      10       8      10
avg. time on task (seconds)             175*    123     116      97      34     120      92     252*    255*
standard deviation                       212     43      50      49      26      36      51     177     174

*includes time(s) in excess of the set time limit; the excess time was allowed by a software error.

figure 3. average time spent on tasks

the first task ("what was the last thing you searched for when doing a research assignment for class? use quick search to re-search for this.") started participants on the library homepage. participants were then asked, via a text box, to "tell us how this compared to your previous experience." the average time on task was almost 2 minutes; however, one faculty participant took more than 12 minutes on this task; with his or her time removed, the average was 1 minute, 23 seconds. figure 5 shows the participants' search terms and their comments.
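as a back-of-the-envelope check, the figure 3 average for task 1 (175 seconds) and the outlier-removed average reported above (1 minute, 23 seconds) together imply how long that one participant actually spent; the outlier's exact time is inferred here, not reported in the study:

```python
# task 1: solving for the implied time of the single faculty outlier,
# given a 175 s mean over 10 participants and an 83 s mean over the other 9
n, mean_all, mean_rest = 10, 175, 83
outlier = n * mean_all - (n - 1) * mean_rest
print(outlier)  # 1003 s, roughly 16.7 minutes: consistent with "more than
# 12 minutes," and a reminder of why the task 1 standard deviation (212 s)
# exceeds its mean.
```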
how success was determined, by task: task 1, users were only asked to provide feedback; task 2, a valid typed-in response was provided; task 3, number of subtasks completed (out of 3); task 4, number of subtasks completed (out of 2); task 5, a correct multiple-choice answer; task 6, number of subtasks completed (out of 2); task 7, ending the task at the correct web location; tasks 8 and 9, number of subtasks completed (out of 4).

        task 1  task 2    task 3  task 4  task 5     task 6  task 7   task 8   task 9
p01     n/a     correct   3       2       timeout    2       correct  0*       0**
p02     n/a     correct   3*      1       correct    2       correct  0**      3
p03     n/a     correct   0*      1       incorrect  2       correct  4        3
p04     n/a     correct   2       0*      correct    2       skip     3        2
p05     n/a     correct*  2       2       correct    1       correct  4        2
p06     n/a     correct   3*      1       correct    1       correct  3        0**
p07     n/a     correct   2       1*      correct    1       correct  0        2
p08     n/a     correct   2       0*      correct    0       skip     timeout  0**
p09     n/a     correct   2*      skip    correct    2       correct  4        2
p10     n/a     correct   1*      1       correct    2       skip     4        2

note: "timeout" indicates an immediate timeout error; users were unable to take any action on the task.
*user experienced a timeout error while working on the task. this may have affected their ability to complete the task.
**user did not follow directions.

figure 4. participants' success on tasks

p01, faculty, geology. search terms: large low shear wave velocity province. comments: ebsco did a fairly complete job. there were some irrelevant results that i don't remember seeing when i used georef.

p02, faculty, computer information systems & management science (statistics). search terms: student cheating. comments: this is a topic that i am somewhat familiar with the related literature. i was pleased with the diversity of journals that were found in the search. the topics of the articles was right on target. the recency of the articles was great.

p03, graduate student, education. search terms: death of a salesman. comments: there is a lot of variety in the types of sources that quick search is pulling up now. i would still have liked to see more critical sources on the play but i could probably have found more results of that nature with a better search term, such as "death of a salesman criticism."

p04, 1st year, voice performance. search terms: current issues in russia. comments: it was somewhat helpful in the way that it gave me information about what had happened in the past couple months, but not what was happening now in russia.

p05, 3rd year, nursing. search terms: uninsured and health care reform. comments: the quick search gave very detailed articles i thought, which could be good, but were not exactly what i was looking for. then again, i didn't read all these articles either.

p06, 1st year, history. search terms: headscarf law. comments: this search yielded more results related to my topic. i needed other sources for an argument on the french creating law banning religious dress and symbols in school. using other methods with the same keyword, i had an enormous amount of trouble finding articles that pertained to my essay.

p07, 3rd year, english. search terms: jung. comments: i like the fact that it can be so defined to help me get exactly what i need.

p08, 4th year, spanish. search terms: restaurant industry. comments: this is about the same as the last time that i researched this topic.

p09, 2nd year, hospitality. search terms: aphasia. comments: there are many good sources, however there are also completely irrelevant sources.
p10, 2nd year, management. search terms: rogers five types of feedback. comments: there is not many documents on the topic i searched for. this may be because the topic is not popular or my search is not specific/too specific.

figure 5. participants' search terms and comments

the second task started on the library homepage and asked participants to find a video related to early childhood cognitive development. this task was chosen because jmu libraries have significant video collections and because the research team hypothesized users might have trouble, as there was no explicit way to limit to videos at the time. the average time on this task was two minutes, with one person experiencing an arbitrary timeout by the software. participants were judged successful on this task if they found any video related to the topic. all participants were successful, but four entered, then left, the discovery tool interface to complete the task. five participants looked for a video search option in the drop-down menu, and of these, three immediately used something other than quick search when they saw there was no video search option. of those who tried quick search, six opened the source type facet in the eds search results and four selected a source type limit, but only two selected a source type that led directly to success ("non-print resources").

task 3 started participants in eds (see figure 6) and asked them to search on speech pathology, find a way to limit the search results to audiology, and limit the results to peer-reviewed sources. participants spent an average of 1 minute, 40 seconds on this task, with five participants being artificially timed out by the software. success was determined by the researchers' examination of the number of subtasks completed. the three subtasks consisted of searching for the given topic (speech language pathology), limiting the search results to audiology, and further limiting the results to peer-reviewed sources. four participants completed all three subtasks, including two who were timed out. (the times for those who were timed out were not included in time-on-task averages, but they were given credit for success.) five completed just two of the subtasks, failing to limit to peer-reviewed sources, one of them because of a timeout; it was unclear why the remaining four did not attempt to limit the results to "peer reviewed." looking at the actions performed, six of the ten typed "and audiology" into the search keywords to narrow the results, while one found and used "audiology" in the subject facet on the search results page. six participants found and used the "scholarly (peer reviewed) journals" checkbox limiter.

figure 6. ebsco discovery service interface

beginning with the results they had from task 3, task 4 asked participants to find more recent sources and to select the most recent source available. task success was measured by correct completion of two subtasks: limiting the search results to the last five years and finding the most recent source. the average time on task was 1 minute, 14 seconds, with three artificial timeouts. of those who did not time out, all seven were able to limit their sources to be more recent in some way, but only three selected the most recent source.
in addition to being a common research need, this task let the team see how users approached date limiting. three typed the date limit into the left-hand column, two typed it on the advanced search screen, and two used the date slider. two participants used the "sort" drop-down menu to change the sort order to "date descending," which helped them complete the task. other participants changed the dates and then selected the first result, which was not the most recent.

task 5, which started within eds, asked participants to find a way to ask a jmu librarian for help. success was measured by whether they reached the correct url for the ask-a-librarian page; eight of the ten participants were successful. this task took an average of only 31 seconds to complete, and eight of the ten used the ask-a-librarian link at the top of the page. of the two unsuccessful participants, one was timed out, while the other clicked "search modes" for no apparent reason, then clicked back and decided to finish the task.

task 6 started in the eds interface and asked participants to locate the journal yachting and boating world and select the correct coverage dates and online status from a list of four options; participants were deemed successful at two subtasks if they selected the correct option and at one subtask if they chose an option that was partially correct. participants took an average of two minutes on this task; only five answered correctly. during this task, three participants used the ebsco search option "so journal title/source," four used quotation marks, and four searched or re-searched with the "title" drop-down menu option. three chose the correct dates of coverage but were unable to correctly identify the online availability. it is important to note that only searching for and locating the journal title were accomplished within the discovery tool; to see dates of coverage and online availability, users clicked jmu's link resolver button, and the resulting screen was served from serials solutions' article linker product. although some users spent more time than necessary using the eds search options to locate the journal, the real barriers to this task arose in interpreting the serials solutions screen.

task 7, in which participants started in eds, was designed to determine whether users could navigate to a research database outside of eds. users were asked to look up the sculpture genius of mirth and were told the library database camio would be the best place to search. they were instructed to "locate this database and find the sculpture." the researcher reviewed the recordings to determine success on this task, which was defined as using camio to find the sculpture. participants took an average of 1 minute, 32 seconds; seven were observed to complete the task successfully, while three chose to skip it. to accomplish the task, seven participants used the jmu research databases link in the header navigation at some point, but only four began the task by doing this. six participants began by searching within eds.

the final two tasks started on the library homepage and formed a pair: participants were asked to find two books and two recent, peer-reviewed articles (from the last five years) on rheumatoid arthritis.
task 8 asked them to use the library's eds widget, quick search, to accomplish this, and task 9 asked them to accomplish the same task without using quick search. when they found sources, they were asked to enter the four relevant titles in a text-entry box. the average time spent on these tasks was similar: about four minutes per task. comparing these tasks was somewhat complicated because some participants did not follow instructions. user success was determined by the researchers' observation of how many of the four subtasks the user completed: find two books, find two articles, limit to peer-reviewed, and select articles from the last five years (with or without using a limiter); figure 4 shows their success.

looking at the seven users who used quick search on task 8, six limited to "scholarly (peer reviewed) journals," six limited to the last five years, and seven narrowed results using the source type facet. the average number of subtasks completed on task 8 was 3.14 out of 4. looking at the seven users who followed instructions and did not use quick search on task 9, all began with the library catalog and tried to locate articles within it; the average number of subtasks completed on task 9 was 2.29 out of 4. some users tried to locate articles by setting the catalog's material type drop-down menu to "periodicals," and others used the catalog's "periodical" tab, which performed a title keyword search of the e-journal portal. on task 9, only two users eventually chose a research database to find articles. user behavior can only be compared for the six users (all students) who followed instructions on both tasks; a summary is provided in figure 4.

after completing all nine tasks, participants were presented with the system usability scale; eds scored 56 out of 100. following the sus, participants were asked a series of post-test questions. only one of the two faculty members chose to answer the post-test questions. when asked how they would use quick search, all eight students explicitly mentioned class assignments, and the participating faculty member replied "to search for books." two students mentioned books specifically, while the rest used the more generic term "sources" to describe the items they would search for.

when asked "when would you not use this search tool?" the faculty member said, "i would just have to get used to using it. i mainly go to [the library catalog] and then research databases." responses from the six students who answered this question were vague and hard to categorize:

• "not really sure for more general question/learning"
• "when just browsing"
• "for quick answers"
• "if i could look up the information on the internet"
• "when the material i need is broad"
• "basic searching when you do not need to say where you got the info from"

when asked for the advantages of quick search, four participants specifically mentioned the ability to narrow results, three mentioned "speed," three mentioned ease of use, and three mentioned relevance in some way (e.g., "it does a pretty good job associating keywords with sources"). two mentioned the broad coverage, and one compared it to google, "which is what students are looking for." when asked to list disadvantages, the faculty member mentioned that he/she was not sure what part of the library home page was actually "quick search" and was not sure how to get to his/her library account.
three students described quick search as "overwhelming" or "confusing" because of its many features, although one of these also stated, "like anything you need to learn in order to use it efficiently." one student mentioned the lack of an audio recording limit, and another said, "when the search results come up it is hard to tell if they are usable results."

knowing that quick search may not always provide the best results, the research team also asked users what they would do if they were unable to find an item using quick search. a faculty participant said he or she would log into the library catalog and start from there. five students mentioned consulting a library staff member in some fashion. three mentioned moving on from library resources, although not necessarily as their first step. one said, "find out more information on it to help narrow down my search." only one student mentioned the library catalog or any other specific library resource.

when participants were asked whether "quick search" was an appropriate name, seven agreed that it was. of those who did not agree, one commented, "not really, though i don't think it matters," and another, "i think it represents the idea of the search, but not the action. it could be quicker." the only alternative name suggested was "search tool."

web traffic analysis

web traffic through quick search and in eds provides additional context for this study's results. during august–december 2010, quick search was searched 81,841 times from the library homepage. this is an increase over the previous widget in this location, which searched the catalog and received 41,740 searches during the same period in 2009. even adjusting for an approximately 22 percent increase in overall website traffic from 2009 to 2010, this is an increase of 75 percent. interestingly, traffic to the most popular link on the library homepage, research databases, went from 55,891 clicks in 2009 to 30,616 in 2010, a decrease of 55 percent when adjusting for the change in website traffic.

during fall 2010, 28 percent of quick search searches from the homepage were executed using at least one drop-down menu. twelve percent changed quick search's first drop-down menu to something other than the keyword default, with "title" being the most popular option (7 percent of searches), followed by author (4 percent). twenty percent of users changed the second drop-down option; "just articles" and "just books" were the most popular choices, garnering 7 percent and 6 percent of searches, respectively, followed by "just scholarly articles" at 4 percent.

looking at ebsco's statistical reports for jmu's implementation of eds, there were 85,835 sessions and approximately 195,400 searches during august–december 2010. this means about 95 percent of eds sessions were launched using quick search from the homepage. there was an average of 2.3 searches per session, which is comparable to past behavior in jmu's other ebscohost databases.
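as an illustration of the adjustment used in these comparisons, here is a minimal sketch that assumes the adjustment simply deflates 2010 counts by the 22 percent growth in overall site traffic (the exact method is not spelled out above); it reproduces the research databases figure and the session arithmetic:

```python
# year-over-year change in a link's click count, deflated by the ~22%
# growth in overall site traffic (assumed simple proportional adjustment)
def adjusted_change(count_2009, count_2010, site_growth=0.22):
    deflated_2010 = count_2010 / (1 + site_growth)  # remove overall growth
    return deflated_2010 / count_2009 - 1

print(f"{adjusted_change(55891, 30616):+.0%}")  # research databases: -55%
print(f"eds sessions begun via quick search: {81841 / 85835:.0%}")  # ~95%
print(f"searches per eds session: {195400 / 85835:.1f}")  # ~2.3
```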
discussion

the goal of this study was to gather initial data about user behavior, usability issues, and user satisfaction with discovery tools. the task design and technical limitations of the study mean that comparing time on task between participants or tasks would not be particularly illuminating, and while the success rates on tasks are interesting, they are not generalizable to the larger jmu population. instead, this study provided observations of user behavior that librarians can use to improve services, suggested some "quick fixes" to usability issues, and pointed to several research questions. where possible, these observations are supplemented by comparisons between this study and the only other published usability study of eds.36

this study confirmed a previous finding of user studies of federated search software and discovery tools: students have trouble determining what is searched by various systems.37 on the task in which they were asked not to use quick search to find articles, participants tried to search for articles in the library catalog. although all but one of this study's participants correctly answered that quick search did not search "all" library resources, seven thought it searched "most." both "most" and "some" would be considered correct; however, it is worth noting that answering this question more specifically is challenging even for librarians. many journals in subject article indexes and abstracts are included in the eds foundation index; furthermore, jmu's implementation of eds includes all of jmu's ebsco subscription resources as well, making it impractical to assemble a master list of indexed titles. of course, numerous online resources, such as political voting records, ethnographic files, and financial data, may never be included in a discovery tool. users often have access to these resources through their library, but if they do not know the library has a database of financial data, they will certainly not consider this content when answering a question about how much of the library's resources the discovery tool includes.

as discovery tools begin to fulfill users' expectations for a "single search," libraries will need to share best practices for showcasing valuable, useful collections that fall outside the discovery tool's scope or abilities. this is especially critical given the 75 percent increase in homepage traffic to the search widget compared with the 55 percent decrease in homepage traffic to the research databases page. it is important to note that these trends do not mean usage of the library's other research databases fell by 55 percent: though there was no comprehensive examination of usage statistics, spot-checking suggested ebsco and non-ebsco subject databases had both increases and decreases in usage from previous years.

another issue libraries should consider, especially when preparing for instruction classes, is that users do not seem to understand which information needs are suited to a discovery tool versus the catalog or subject-specific databases. several tasks provided additional information about users' mental models of the tool, which may help libraries make better decisions about navigation customizations in discovery-tool interfaces and on library websites. task 7 was designed to discover whether users could find their way to a database outside of eds if they knew they needed a specific one. six participants, including one of the faculty members, began by searching eds for the name of the sculpture and/or the database name.
on task 1, a graduate student who searched on "death of a salesman" and was asked how the quick search results compared to his or her previous experience said, "i would still have liked to see more critical sources on the play but i could probably have found more results of that nature with a better search term, such as 'death of a salesman criticism.'" while true, most librarians would instead suggest a literary criticism database, which would target this information need. librarians may have differing opinions about the best starting point for research, but their rationale would be much different from that of the students in this study.

this study's participants said they would use quick search/eds when doing class work or research but not for general inquiries. if librarians were to list which user information needs are best met by a discovery tool versus a subject-specific database, the types of information needs listed would be much more numerous and diverse, regardless of differences over how to classify them.

in addition to helping users choose between a discovery tool and a subject-specific database, libraries will need to conceptualize how users will move in and out of the discovery tool to other library resources, services, and user accounts. while users had no trouble finding the ask-a-librarian link in the header, it might have been more informative to start users from a search-results page to see whether they would find the right-hand column's ask-a-librarian link or the links to library subject guides and database lists. discovery tools vary in their abilities to connect users with their online library accounts and are changing quickly in this area.

this study also provided some interesting observations about discovery-tool interfaces. the default setting for ebsco discovery service is a single search box; however, this study suggests that while users desire a single search, they are willing to use multiple interface options. this was supported by log analysis of the library's locally developed entry widget, quick search, in which 28 percent of searches included the use of a drop-down menu. on the first usability task, users left quick search's options set to the default; on other tasks, participants frequently used the drop-down menus and limiters in both quick search and eds. for example, on task 2, which asked them to look for videos, five users looked in the quick search format drop-down menu, and on the same task within eds, six users attempted to use the source type facet. use of limiters was similarly observed by williams and foster in their eds usability study.38

one eds interface option that was not obvious to participants was the link to change the sort order. when asked to find the most recent article, only two participants changed the sort option. most others used the date input boxes to limit their search, then selected the first result even though it was not the most recent one. it is unclear whether these participants assumed the first result was the most recent or simply could not figure out how to display the most recent sources.

finding a journal title from a library homepage has long been a difficult task,39 and this study provided no exception, even with the addition of a discovery tool.
it is important to note that the standard eds implementation includes a "publications" or "journals a–z" link in the header, and libraries can customize the text of this link. jmu did not have this type of link enabled during the test, since the hope was that users could find journal titles within the eds results. however, neither eds nor the quick search widget offered a way to limit the search to a journal title at the time of this study. during the usability test, four participants changed the field search drop-down menu to "title" in eds, and three changed it to "so journal title/source," which limits the search to articles within a journal title. while both of these ideas were good, neither resulted in a precise results set in eds for this task unless the user also limited to "jmu catalog only," a nonintuitive option. since the test, jmu has added a "journal titles" option to quick search that launches the user's search in the journal a–z list (provided by serials solutions). in the two months after the change (february and march 2011), only 391 searches, less than 1 percent of all searches, were performed with this option, indicating that while finding a journal title may be an important task, it is not a popular one.

like many libraries with discovery tools, jmu added federated search capabilities to eds using ebscohost integrated search software in an attempt to draw some traffic to databases not included in eds (or not subscribed to through ebsco by jmu), such as mla international bibliography, scopus, and credo reference. links to these databases appeared in the upper-right-hand column of eds during the usability study (see figure 6). usage data from ebsco showed that less than 1 percent of all jmu's eds sessions for fall 2010 included any interaction with this area. likewise, williams and foster observed that their participants did not use their federated search until explicitly asked to do so.40 perhaps users faced with discovery-tool results simply have no motivation to click on additional database results. since the usability test, jmu has replaced the right-hand column with static links to ask-a-librarian, subject guides, and research database lists.

readers may wonder why one of the most common tasks, finding a specific book title, was not included in this usability study; this was because jmu libraries posed that task in a concurrent homepage usability study. in that study, twenty of the twenty-five participants used quick search to find the title "pigs in heaven" and choose the correct call number. eleven of the twenty used the quick search drop-down menu to choose a title search option, further confirming users' willingness to limit up front. the average time on the task was just under a minute, and all participants completed it successfully, so the task was not repeated in the eds usability test. other studies have reported trouble with this type of task;41 much could depend on the item chosen as well as the tool's relevance ranking.

user satisfaction with eds can be summarized from the open-ended post-study questions, the responses to task 1 (figure 5), and the sus scale. answers to the post-study questions indicated participants liked the ability to narrow results, the speed and ease of use, and the relevance of the system's results.
a few participants did describe the system as "overwhelming" or "confusing" because of its many features, which was also reflected in the sus scores. jmu has been using the sus to understand the relative usability of library systems, and it offers a benchmark for system improvement; for example, ebsco discovery service received an sus score of only 37 in spring 2010 (n = 7) but a 56 in this study in fall 2010 (n = 10), suggesting the interface has become more usable. in 2009, jmu libraries also used the sus to test the library catalog's classic interface as well as a vufind interface to the catalog, which received scores of 68 (n = 15) and 80 (n = 14), respectively. the differences between the catalog scores and the eds score point to an important distinction between usability and usefulness, the latter concept encompassing a system's content and capabilities. the library catalog is perhaps a more straightforward tool than a discovery tool and attempts to provide access to a smaller set of information; it has none of the complexity involved in finding article-level or book-chapter information. all else being equal, simpler tools will be more usable. in an experimental study, tsakonas and papatheodorou found that while users did not distinguish between the concepts of usability and usefulness, they preferred attributes composing a useful system over those supporting usability.42 discovery tools, which support more tasks, must make compromises in usability that simpler systems can avoid. in their study of eds, williams and foster also found overall user satisfaction with eds; their participants made positive comments about the interface as well as the usefulness and relevance of the results.43

jmu passed several suggestions on to ebsco based on the test results. ebsco subsequently added "audio" and "video" to the source types, which enabled jmu to add a "just videos at jmu" option to quick search. while it is confusing that the "audio" and "video" source types currently behave differently from the others in eds, in that they limit to jmu's catalog as well as to the source type, this behavior produces what most local users expect. a previous usability study of worldcat local showed users have trouble discriminating between source types in results lists, so the source type facet is important.44 another piece of feedback provided to ebsco was that on the task where users needed to choose the most recent result, only two of our participants sorted by date descending. perhaps the textual appearance of the sort option (instead of a drop-down menu) was not obvious to participants (see figure 6); however, williams and foster did not observe this issue in their study.45

future research

the findings of this study suggest many avenues for future research. libraries will need to revisit the scope of their catalogs and other systems to keep up with users' mental models and information needs. catalogs and subject-specific databases still perform some tasks much better than discovery tools, but libraries will need to investigate how to situate the discovery tool and specialized tools within their web presence in a way that makes sense to users. when should a user be directed to the catalog versus a discovery tool? what items should libraries continue to include in their catalogs?
what role do institutional repositories play in the suite of library tools, and how does the discovery tool connect to (or include) them? how do library websites begin to make sense of the current state of library search systems? above all, are users able to find the best resources for their research needs? although research on searchers' mental models has been extensive,46 librarians' mental models have not been studied as such. yet placing the discovery tool among the library's suite of services will involve compromises between these two models.

another area needing research is how to instruct users to work with the large numbers of results returned by discovery tools. in subject-specific databases, librarians often help users measure the success of their strategy, or even their topic, by the number of results returned: in criminal justice abstracts, 5,000 results means a topic is too broad or the search strategy needs refinement. in a discovery tool, a result set this large will likely have some good results on the first couple of pages if sorted by relevance; however, users will still need to know how to grow or reduce their results sets. participants in this study showed a willingness to use limiters and other interface features, but not always the most helpful ones. when asked to narrow a broad subject on task 3, only one participant chose to use the "subject" facet even when the subtopic, audiology, was clearly available; most added search terms. it will be important for future studies to investigate the best ways for users to narrow large results sets in a discovery tool.

this study also suggested possible areas of investigation for future user studies. one interesting finding related to users' information contexts: when users were asked to search on their last research topic, it did not always match their major. a voice performance student searched on "current issues in russia," and the hospitality major searched on "aphasia." to what extent does a discovery tool help or hinder students who are searching outside their major area of study? one of jmu's reference librarians noted that while he would usually teach a student majoring in a subject how to use that subject's specific indexes, as opposed to a discovery tool, a student outside the major might not need to learn the subject-specific indexes and could be well served by the discovery tool.

future studies could also investigate the usage and usability of discovery-tool features to continue informing library customizations and advice to vendors. for example, this study did not include a task related to logging into a patron account or requesting items, but that would be good to investigate in a follow-up study. another area ripe for further investigation is discovery-tool limiters: this study's participants frequently attempted to use limiters but did not always choose the ones best suited to the task. what are the ideal design choices for making limiters intuitive? this study found almost no use of the embedded federated search add-on: is this true at other institutions? finally, this study and others reveal users' difficulty in distinguishing source types; development and testing of interface enhancements to support this ability would be helpful to many libraries' systems.
conclusion

this usability test of a discovery tool at james madison university did not reveal as many interface-specific findings as it did questions about the role of discovery tools in libraries. users were generally able to navigate through the quick search and eds interfaces and complete tasks successfully. tasks that are challenging in other interfaces, such as locating journal articles and discriminating between source types, continued to be challenging in a discovery-tool interface. this usability test suggested that while some interface features were heavily used, such as drop-down limits and facets, other features, such as federated search results, were not. as discovery tools continue to grow and evolve, libraries should continue to conduct usability tests, both to find usability issues and to understand user behavior and satisfaction. although discovery tools challenge libraries to think not only about access but also about the best research pathways for users, they provide users with a search that more closely matches their expectations.

acknowledgement

the authors would like to thank patrick ragland for his editorial assistance in preparing this manuscript.

correction

april 12, 2018: at the request of the author, this article was revised to remove a link to a website.

references

1. emily alling and rachael naismith, "protocol analysis of a federated search tool: designing for users," internet reference services quarterly 12, no. 1 (2007): 195, http://scholarworks.umass.edu/librarian_pubs/1/ (accessed jan. 11, 2012); frank cervone, "what we've learned from doing usability testing on openurl resolvers and federated search engines," computers in libraries 25, no. 9 (2005): 10; sara randall, "federated searching and usability testing: building the perfect beast," serials review 32, no. 3 (2006): 181–82, doi:10.1016/j.serrev.2006.06.003; ed tallent, "metasearching in boston college libraries—a case study of user reactions," new library world 105, no. 1 (2004): 69–75, doi:10.1108/03074800410515282.

2. s. c. williams and a. k. foster, "promise fulfilled? an ebsco discovery service usability study," journal of web librarianship 5, no. 3 (2011), http://www.tandfonline.com/doi/pdf/10.1080/19322909.2011.597590 (accessed jan. 11, 2012).

3. janet k. chisman, karen r. diller, and sharon l. walbridge, "usability testing: a case study," college & research libraries 60, no. 6 (november 1999): 552–69, http://crl.acrl.org/content/60/6/552.short (accessed jan. 11, 2012); frances c. johnson and jenny craven, "beyond usability: the study of functionality of the 2.0 online catalogue," new review of academic librarianship 16, no. 2 (2010): 228–50, doi:10.1108/00012531011015217; jennifer e. knievel, jina choi wakimoto, and sara holladay, "does interface design influence catalog use? a case study," college & research libraries 70, no. 5 (september 2009): 446–58, http://crl.acrl.org/content/70/5/446.short (accessed jan. 11, 2012);
1 (march 2008): 5–22, http://0-www.ala.org.sapl.sat.lib.tx.us/ala/mgrps/divs/lita/publications/ital/27/1/mi.pdf (accessed jan. 11, 2012).
4. karen calhoun, "the changing nature of the catalog and its integration with other discovery tools," http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed mar. 11, 2011).
5. dee ann allison, "information portals: the next generation catalog," journal of web librarianship 4, no. 1 (2010): 375–89, http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1240&context=libraryscience (accessed jan. 11, 2012); marshall breeding, "the state of the art in library discovery," computers in libraries 30, no. 1 (2010): 31–34; c. p. diedrichs, "discovery and delivery: making it work for users . . . taking the sting out of serials!" (lecture, north american serials interest group, inc. 23rd annual conference, phoenix, arizona, june 5–8, 2008), doi: 10.1080/03615260802679127; ian hargraves, "controversies of information discovery," knowledge, technology & policy 20, no. 2 (summer 2007): 83, http://www.springerlink.com/content/au20jr6226252272/fulltext.html (accessed jan. 11, 2012); jane hutton, "academic libraries as digital gateways: linking students to the burgeoning wealth of open online collections," journal of library administration 48, no. 3 (2008): 495–507, doi: 10.1080/01930820802289615; oclc, "online catalogs: what users and librarians want: an oclc report," http://www.oclc.org/reports/onlinecatalogs/default.htm (accessed mar. 11, 2011).
6. c. j. belliston, jared l. howland, and brian c. roberts, "undergraduate use of federated searching: a survey of preferences and perceptions of value-added functionality," college & research libraries 68, no. 6 (november 2007): 472–86, http://crl.acrl.org/content/68/6/472.full.pdf+html (accessed jan. 11, 2012); judith z. emde, sara e. morris, and monica claassen‐wilson, "testing an academic library website for usability with faculty and graduate students," evidence based library & information practice 4, no. 4 (2009): 24–36, http://kuscholarworks.ku.edu/dspace/bitstream/1808/5887/1/emdee_morris_cw.pdf (accessed jan. 11, 2012); karla saari kitalong, athena hoeppner, and meg scharf, "making sense of an academic library web site: toward a more usable interface for university researchers," journal of web librarianship 2, no. 2/3 (2008): 177–204, http://www.tandfonline.com/doi/abs/10.1080/19322900802205742 (accessed jan. 11, 2012); ed tallent, "metasearching in boston college libraries—a case study of user reactions," new library world 105, no. 1 (2004): 69–75, doi: 10.1108/03074800410515282; rong tang, ingrid hsieh-yee, and shanyun zhang, "user perceptions of metalib combined search: an investigation of how users make sense of federated searching," internet reference services quarterly 12, no. 1 (2007): 211–36, http://www.tandfonline.com/doi/abs/10.1300/j136v12n01_11 (accessed jan. 11, 2012).
7. jody condit fagan, "usability studies of faceted browsing: a literature review," information technology & libraries 29, no.
2 (2010): 58–66, http://web2.ala.org/ala/mgrps/divs/lita/publications/ital/29/2/fagan.pdf (accessed jan. 11, 2012).
8. birong ho, keith kelley, and scott garrison, "implementing vufind as an alternative to voyager's web voyage interface: one library's experience," library hi tech 27, no. 1 (2009): 82–92, doi: 10.1108/07378830910942946 (accessed jan. 11, 2012).
9. tamar sadeh, "user experience in the library: a case study," new library world 109, no. 1 (2008): 7–24, doi: 10.1108/03074800810845976 (accessed jan. 11, 2012).
10. tod a. olson, "utility of a faceted catalog for scholarly research," library hi tech 25, no. 4 (2007): 550–61, doi: 10.1108/07378830710840509 (accessed jan. 11, 2012).
11. allison, "information portals," 375–89.
12. marshall breeding, "plotting a new course for metasearch," computers in libraries 25, no. 2 (2005): 27.
13. ibid.
14. dennis brunning and george machovec, "interview about summon with jane burke, vice president of serials solutions," charleston advisor 11, no. 4 (2010): 60–62; dennis brunning and george machovec, "an interview with sam brooks and michael gorrell on the ebscohost integrated search and ebsco discovery service," charleston advisor 11, no. 3 (2010): 62–65, http://www.ebscohost.com/uploads/discovery/pdfs/topicfile-121.pdf (accessed jan. 11, 2012).
15. ronda rowe, "web-scale discovery: a review of summon, ebsco discovery service, and worldcat local," charleston advisor 12, no. 1 (2010): 5–10; k. stevenson et al., "next-generation library catalogues: reviews of encore, primo, summon and summa," serials 22, no. 1 (2009): 68–78.
16. jason vaughan, "chapter 7: questions to consider," library technology reports 47, no. 1 (2011): 54; paula l. webb and muriel d. nero, "opacs in the clouds," computers in libraries 29, no. 9 (2009): 18.
17. jason vaughan, "investigations into library web scale discovery services," articles (libraries), paper 44 (2011), http://digitalcommons.library.unlv.edu/lib_articles/44.
18. marshall breeding, "the state of the art in library discovery," 31–34; sharon q. yang and kurt wagner, "evaluating and comparing discovery tools: how close are we towards next generation catalog?" library hi tech 28, no. 4 (2010): 690–709.
19. allison, "information portals," 375–89.
20. breeding, "the state of the art in library discovery," 31–34.
21. galina letnikova, "usability testing of academic library websites: a selective bibliography," internet reference services quarterly 8, no. 4 (2003): 53–68.
22. jeffrey rubin and dana chisnell, handbook of usability testing: how to plan, design, and conduct effective tests, 2nd ed.
(indianapolis, in: wiley, 2008); joseph s. dumas and janice redish, a practical guide to usability testing, rev. ed. (portland, or: intellect, 1999).
23. nicole campbell, ed., usability assessment of library-related web sites: methods and case studies (chicago: library & information technology association, 2001); elaina norlin and c. m. winters, usability testing for library web sites: a hands-on guide (chicago: american library association, 2002).
24. jennifer l. ward, steve shadle, and pam mofield, "user experience, feedback, and testing," library technology reports 44, no. 6 (2008): 17.
25. ibid.
26. michael boock, faye chadwell, and terry reese, "worldcat local task force report to lamp," http://hdl.handle.net/1957/11167 (accessed mar. 11, 2011).
27. bob thomas and stefanie buck, "oclc's worldcat local versus iii's webpac: which interface is better at supporting common user tasks?" library hi tech 28, no. 4 (2010): 648–71.
28. oclc, "some findings from worldcat local usability tests prepared for ala annual," http://www.oclc.org/worldcatlocal/about/213941usf_some_findings_about_worldcat_local.pdf (accessed mar. 11, 2011).
29. ibid., 2.
30. doug way, "the impact of web-scale discovery on the use of a library collection," serials review 36, no. 4 (2010): 214–20.
31. north carolina state university libraries, "final summon user research report," http://www.lib.ncsu.edu/userstudies/studies/2010_summon/ (accessed mar. 28, 2011).
32. alesia mcmanus, "the discovery sandbox: aleph and encore playing together," http://www.nercomp.org/data/media/discovery%20sandbox%20mcmanus.pdf (accessed mar. 28, 2011); prweb, "deakin university in australia chooses ebsco discovery service," http://www.prweb.com/releases/deakin/chooseseds/prweb8059318.htm (accessed mar. 28, 2011); university of manitoba, "summon usability: partnering with the vendor," http://prezi.com/icxawthckyhp/summon-usability-partnering-with-the-vendor/ (accessed mar. 28, 2011).
33. williams and foster, "promise fulfilled?"
34. jakob nielsen, "why you only need to test with 5 users," http://www.useit.com/alertbox/20000319.html (accessed aug. 20, 2011).
35. john brooke, "sus: a 'quick and dirty' usability scale," in usability evaluation in industry, ed. p. w. jordan et al. (london: taylor & francis, 1996), http://www.usabilitynet.org/trump/documents/suschapt.doc (accessed apr. 6, 2011).
36. williams and foster, "promise fulfilled?"
37. seikyung jung et al., "libraryfind: system design and usability testing of academic metasearch system," journal of the american society for information science & technology 59, no. 3 (2008): 375–89; williams and foster, "promise fulfilled?"; laura wrubel and kari schmidt, "usability testing of a metasearch interface: a case study," college & research libraries 68, no. 4 (2007): 292–311.
38. williams and foster, "promise fulfilled?"
39. letnikova, "usability testing of academic library websites," 53–68; tom ipri, michael yunkin, and jeanne m.
brown, "usability as a method for assessing discovery," information technology & libraries 28, no. 4 (2009): 181–86; susan h. mvungi, karin de jager, and peter g. underwood, "an evaluation of the information architecture of the uct library web site," south african journal of library & information science 74, no. 2 (2008): 171–82.
40. williams and foster, "promise fulfilled?"
41. ward et al., "user experience, feedback, and testing," 17.
42. giannis tsakonas and christos papatheodorou, "analysing and evaluating usefulness and usability in electronic information services," journal of information science 32, no. 5 (2006): 400–419.
43. williams and foster, "promise fulfilled?"
44. bob thomas and stefanie buck, "oclc's worldcat local versus iii's webpac: which interface is better at supporting common user tasks?" library hi tech 28, no. 4 (2010): 648–71.
45. williams and foster, "promise fulfilled?"
46. tracy gabridge, millicent gaskell, and amy stout, "information seeking through students' eyes: the mit photo diary study," college & research libraries 69, no. 6 (2008): 510–22; yan zhang, "undergraduate students' mental models of the web as an information retrieval system," journal of the american society for information science & technology 59, no. 13 (2008): 2087–98; brenda reeb and susan gibbons, "students, librarians, and subject guides: improving a poor rate of return," portal: libraries and the academy 4, no. 1 (2004): 123–30; alexandra dimitroff, "mental models theory and search outcome in a bibliographic retrieval system," library & information science research 14, no. 2 (1992): 141–56.
appendix a
pre–test 1: please indicate your jmu status (1st year, 2nd year, 3rd year, 4th year, graduate student, faculty, other)
pre–test 2: please list your major(s) or area of teaching (open ended)
pre–test 3: how often do you use the library website? (less than once a month, 1–3 visits per month, 4–6 visits per month, more than 7 visits per month)
pre–test 4: what are some of the most common things you currently do on the library website? (open ended)
pre–test 5: how much of the library's resources do you think the quick search will search? (less than a third, less than half, half, most, all)
pre–test 6: have you used leo? (show screenshot on printout) (yes, no, not sure)
pre–test 7: have you used ebsco? (show screenshot on printout) (yes, no, not sure)
pre–test 8 (student participants only): how often have you used library web resources for course assignments in your major? (rarely/never, sometimes, often, very often)
pre–test 9 (student participants only): how often have you used library resources for course assignments outside of your major? (rarely/never, sometimes, often, very often)
pre–test 10 (student participants only): has a librarian spoken to a class you've attended about library research? (yes, no, not sure)
pre–test 11 (faculty participants only): how often do you give assignments that require the use of library resources? (rarely/never, sometimes, often, very often)
pre–test 12 (faculty participants only): how often have you had a librarian visit one of your classes to teach your students about library research? (rarely/never, sometimes, often, very often)
post–test 1: when would you use this search tool?
post–test 2: when would you not use this search tool?
post–test 3: what would you say are the major advantages of quick search?
post–test 4: what would you say are the major problems with quick search?
post–test 5: if you were unable to find an item using quick search/ebsco discovery service what would your next steps be?
post–test 6: do you think the name "quick search" is fitting for this search tool? if not, what would you call it?
post–test 7 (faculty participants only): if you knew students would use this tool to complete assignments would you alter how you structure assignments and how?
appendix b
practice task: use quick search to search a topic relating to your major/discipline or another topic of interest to you. if you were writing a paper on this topic how satisfied would you be with these results?
purpose: help users get comfortable with the usability testing software. also, since the first time someone uses a piece of software involves behaviors unique to that case, we wanted participants' first use of eds to be with a practice task.
task 1: what was the last thing you searched for when doing a research assignment for class? use quick search to re-search for this. tell us how this compared to your previous experience.
purpose: having participants re-search a topic with which they had some experience and interest would motivate them to engage with results and provide a comparison point for their answer. we hoped to learn about their satisfaction with relevance, quality, and quantity of results. (user behavior, user satisfaction)
task 2: using quick search find a video related to early childhood cognitive development. when you've found a suitable video recording, click answer and copy and paste the title.
purpose: this task aimed to determine whether participants could complete the task, as well as show us which features they used in their attempts. (usability, user behavior)
task 3: search on speech pathology and find a way to limit your search results to audiology. then, limit your search results to peer reviewed sources. how satisfied are you with the results?
purpose: since there are several ways to limit results in eds, we designed this task to show us which limiters participants tried to use, and which limiters resulted in success. we also hoped to learn about whether they thought the limiters provided satisfactory results. (usability, user behavior, user satisfaction)
task 4: you need more recent sources. please limit these search results to the last 5 years, then select the most recent source available. click finished when you are done.
purpose: since there are several ways to limit by date in eds, we designed this task to show us which limiters participants tried to use, and which limiters resulted in success. (usability, user behavior)
task 5: find a way to ask a jmu librarian for help using this search tool. after you've found the correct web page, click finished.
purpose: we wanted to determine whether the user could complete this task, and which pathway they chose to do it. (usability, user behavior)
task 6: locate the journal yachting and boating world. what are the coverage dates? is this journal available in online full text?
purpose: we wanted to determine whether the user could locate a journal by title. (usability)
task 7: you need to look up the sculpture genius of mirth. you have been told that the library database, camio, would be the best place to search for this. locate this database and find the sculpture.
purpose: we wanted to know whether users who knew they needed to use a specific database could find that database from within the discovery tool. (usability, user behavior)
task 8: use quick search to find 2 books and 2 recent peer reviewed articles (from the last 5 years) on rheumatoid arthritis. when you have found suitable sources click answer and copy and paste the titles. click back to webpage if you need to return to your search results.
task 9: without using quick search, find 2 books and 2 recent peer reviewed articles (from the last 5 years) on rheumatoid arthritis. when you have found suitable sources click answer and copy and paste the titles. click back to webpage if you need to return to your search results.
purpose: these two tasks were intended to show us how users completed a common, broad task with and without a discovery tool, whether they would be more successful with or without the tool, and what barriers existed with and without the tool. (usability, user behavior)
president's message
andrew k. pace
information technology and libraries | september 2008
andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio.
welcome to my first ital column as lita president. i've had the good fortune to write a number of columns in the past—in computers in libraries, smart libraries newsletter, and most recently american libraries—and it is a role that i have always cherished. there is just enough space to say what you want, but not all the responsibility of backing it up with facts and figures. in the past, i have worried about having enough to say month after month for an undefined period. now i am daunted by having only one year to address the lita membership and communicate goals and accomplishments of my quickly passing tenure. i am simultaneously humbled and extremely excited to start my presidential year with lita. i have some ambitious agenda items for the division.
i said when i was running that i wanted to make lita the kind of organization that new librarians and it professionals want to join and that seasoned librarians want to stay active in. recruitment to lita is vital, but there is also work to be done to make that recruitment even easier. i am fortunate in following the great work of my predecessors, many of whom i have had the pleasure of serving with on the lita board since 2005. they have set the bar for me and make the coming year as challenging as anything i have done in my career. i also owe a lot to the members who stepped forward to volunteer for committees, liaison appointments, and other volunteer opportunities. i also think it is important for lita members to know just how much the board relies on the faithful and diligent services of the lita staff. at my vice-presidential town meeting, i talked about marketing and communication in terms of list (who), method (how), and message (what and why). not only was this a good way to do some navel gazing on what it means to be a member of lita, it laid some groundwork for the year ahead. i think it is an inescapable conclusion that the lita board needs to take another look at strategic planning (our current plan expires this year). the approach i am going to recommend, however, is not one that tries to distill the collective wisdom of a dozen lita leaders. instead, i hope we can define a methodology by which lita committees, interest groups, and the membership at large are empowered both to do the work of the division and to benefit from it. one of the quirky things that some people know about me is that i actually love bureaucracy. i was pleased to read in the lita bylaws that it is actually my duty as president to "see that the bylaws are observed by the officers and members of the board of directors." i will tell you all that i also interpret this to mean that the president and the board will not act in ways that are not prescribed. the strength of a volunteer organization comes from its volunteers. the best legacy a lita president can provide is to give committees, interest groups, and the membership free rein to create the division's future. as for the board, its main objective is to oversee the affairs of the division during the period between meetings. frankly, we're not so great at this, and it is one of the biggest challenges for any volunteer organization. it is also one of my predecessor's initiatives that i plan to follow through on with his help as immediate past president. participation and involvement—and the ability to follow the work and strategies of the division—should be easier for all of us. so, if i were to put my platform in a nutshell it would be this—recruitment, communication, strategic planning, and volunteer empowerment. i left out fun, because it goes without saying that most of us are part of lita because it's a fun division with great members. this is a lot to get done in one year, but because it will be fun, i'm looking forward to it.
next-generation library catalogs and the problem of slow response time
margaret brown-sica, jeffrey beall, and nina mchale
information technology and libraries | december 2010
margaret brown-sica (margaret.brown-sica@ucdenver.edu) is assistant professor, associate director of technology strategy and learning spaces; jeffrey beall (jeffrey.beall@ucdenver.edu) is assistant professor, metadata librarian; and nina mchale (nina.mchale@ucdenver.edu) is assistant professor, web librarian, university of colorado denver.
response time as defined for this study is the time that it takes for all files that constitute a single webpage to travel across the internet from a web server to the end user's browser. in this study, the authors tested response times on queries for identical items in five different library catalogs, one of them a next-generation (nextgen) catalog. the authors also discuss acceptable response time and how it may affect the discovery process. they suggest that librarians and vendors should develop standards for acceptable response time and use it in the product selection and development processes.
next-generation, or nextgen, library catalogs offer advanced features and functionality that facilitate library research and enable web 2.0 features such as tagging and the ability for end users to create lists and add book reviews. in addition, individual catalog records now typically contain much more data than they did in earlier generations of online catalogs. this additional data can include the previously mentioned tags, lists, and reviews, but a bibliographic record may also contain cover images, multiple icons and graphics, tables of contents, holdings data, links to similar items, and much more. this additional data is designed to assist catalog users in the selection, evaluation, and access of library materials. however, all of the additional data and features have the disadvantage of increasing the time it takes for the information to flow across the internet and reach the end user. moreover, the code that handles all this data is much more complex than the coding used in earlier, traditional library catalogs. slow response time has the potential to discourage both library patrons from using the catalog and library staff from using or recommending it. during a reference interview or library instruction session, a slow response time creates an awkward lull in the process, a delay that decreases confidence in the mind of library users, especially novices who are accustomed to the speed of an open internet search. the two-fold purpose of this study is to define the concept of response time as it relates to both traditional and nextgen library catalogs and to measure some typical response times in a selection of library catalogs. librarians will benefit from knowing what typical and acceptable response times are in online catalogs, and this information will assist in the design and evaluation of library discovery systems. this study also looks at benchmarks in response time and defines what is unacceptable and why. when advanced features and content in library catalogs increase response time to the extent that users become disaffected and use the catalog less, nextgen catalogs represent a step backward, not forward.
in august 2009, the auraria library launched an instance of the worldcat local product from oclc, dubbed worldcat@auraria. the library's traditional catalog—named skyline and running on the innovative interfaces platform—still runs concurrently with worldcat@auraria. because worldcat local currently lacks a library circulation module that the library was able to use, the legacy catalog is still required for its circulation functionality. in addition, skyline contains marc records from the serials solutions 360 marc product. since many of these records are not yet available in the oclc worldcat database, these records are being maintained in the legacy catalog to enable access to the library's extensive collection of online journals. almost immediately upon implementation of worldcat local, many library staff began to express concern about the product's slow response time. they bemoaned its slowness both at the reference desk and during library instruction sessions. several of the reference and instruction librarians even stated that they refused to use it any longer and that they were not recommending it to students and faculty. indeed, many stated that they would only use the legacy skyline catalog from then on. few of the discussions of the product's slow response time, however, evaluated this weakness in the context of its advanced features. therefore we decided to analyze the product's response time in relation to the legacy catalog. we also decided to further our study by examining response time in library catalogs in general, including several different online catalog products from different vendors.
■■ response time
the term response time can mean different things in different contexts. here we use it to mean the time it takes for all files that constitute a single webpage (in the case of testing performed, a permalink to a bibliographic record) to travel across the internet from a web server to the computer on which the page is to be displayed. we do not include the time it takes for the browser to render the page, only the time it takes for the files to arrive at the requesting computer. typically, a single webpage is made of multiple files; these are sent via the internet from a web server and arrive sequentially at the computer where the request was initiated.
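this definition translates directly into a simple measurement harness: fetch the page, fetch each file it references, and sum the transfer times. the sketch below is our own minimal python illustration of that definition, not the tool the authors used (their websitepulse-based method is described under "method" below); the example permalink is hypothetical, and the regex-based resource extraction is a deliberate shortcut rather than a full html parse.

```python
import re
import time
from urllib.parse import urljoin

import requests


def fetch_timed(url):
    """fetch one file and return (elapsed seconds, response)."""
    start = time.perf_counter()
    resp = requests.get(url, timeout=30)
    return time.perf_counter() - start, resp


def page_response_time(permalink):
    """total transfer time for a page plus its component files.

    mirrors the article's definition: the time for all files to arrive,
    excluding browser rendering. files are timed sequentially, as in
    the study's waterfall charts.
    """
    elapsed, resp = fetch_timed(permalink)
    files = [(permalink, elapsed, len(resp.content))]
    total = elapsed
    # crude extraction of images, stylesheets, and scripts from src/href
    for match in re.finditer(
            r'(?:src|href)="([^"]+\.(?:css|js|gif|png|jpg|ico))"', resp.text):
        file_url = urljoin(permalink, match.group(1))
        elapsed, r = fetch_timed(file_url)
        total += elapsed
        files.append((file_url, elapsed, len(r.content)))
    return total, files


if __name__ == "__main__":
    # hypothetical permalink; substitute any catalog record url
    total, files = page_response_time("http://example.org/record=b2433301")
    for url, secs, size in files:
        print(f"{secs:8.4f}s  {size:8d} bytes  {url}")
    print(f"total response time: {total:.4f}s for {len(files)} files")
```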
while the world wide web consortium (w3c) does not set forth any particular guidelines regarding response time, go-to usability expert jakob nielsen states that "0.1 second is about the limit for having the user feel that the system is reacting instantaneously."1 he further posits that 1.0 second is "about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay."2 finally, he asserts that:
10 seconds is about the limit for keeping the user's attention focused on the dialogue. for longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.3
even though this advice dates to 1994, nielsen noted even then that it had "been about the same for many years."4
■■ previous studies
the chief benefit of studying response time is to establish it as a criterion for evaluating online products that libraries license and purchase, including nextgen online catalogs. establishing response-time benchmarks will aid in the evaluation of these products and will help libraries convey the message to product vendors that fast response time is a valuable product feature. long response times will indicate that a product is deficient and suffers from poor usability. it is important to note, however, that sometimes library technology environments can be at fault in lengthening response time as well; in "playing tag in the dark: diagnosing slowness in library response time," brown-sica diagnosed delays in response time by testing such variables as vendor and proxy issues, hardware, bandwidth, and network traffic.5 in that case, inadequate server specifications and settings were at fault. while there are many articles on nextgen catalogs, few of them discuss the issue of response time in relation to their success. search slowness has been reported in library literature about nextgen catalogs' metasearch cousins, federated search products. in a 2006 review of the federated search tools metalib and webfeat, chen noted that "a federated search could be dozens of times slower than google."6 more comments about the negative effects of slow response time in nextgen catalogs can be found in popular library technology blogs. on his blog, mathews posted an article called "5 next gen library catalogs and 5 students: their initial impressions."7 here he shares student impressions of several nextgen catalogs. regarding slow response time mathews notes, "lots of comments on slowness. one student said it took more than ten seconds to provide results. some other comments were: 'that's unacceptable' and 'slow-motion search, typical library.'" nagy and garrison, on lauren's library blog, emphasized that any "cross-silo federated search" is "as slow as the slower silos."8 any search interface is as slow as the slowest database from which it pulls information; however, that does not make users more likely to wait for search results. in fact, many users will not even know—or care—what is happening behind the scenes in a nextgen catalog. the assertion that slow response time makes well-intentioned improvements to an interface irrelevant is supported by an article that analyzes the development of semantic web browsers. frachtenberg notes that users, however,
have grown to expect web search engines to provide near-instantaneous results, and a slow search engine could be deemed unusable even if it provides highly relevant results. it is therefore imperative for any search engine to meet its users' interactivity expectations, or risk losing them.9
this is not just a library issue. users expect a fast response to all web queries, and we can learn from studies on general web response time and how it affects the user experience. huang and fu help explain different user standards when using websites. their research suggests that "hygiene factors" such as "navigation, information display, ease of learning and response time" are more important to people using "utilitarian" sites to accomplish tasks rather than "hedonistic" sites.10 in other words, the importance of response time increases when the user is trying to perform a task—such as research—and possibly even more for a task that may be time sensitive—such as trying to complete an assignment for class.
■■ method
for testing response time in an assortment of library catalogs, we used the websitepulse service (http://www.websitepulse.com). websitepulse provides in-depth website and server diagnostic services that are intended to save e-business customers time and money by reporting errors and web server and website performance issues to clients. a thirty-day free trial is available for potential customers to review the full array of their services; however, the free web page test, available at http://www.websitepulse.com/corporate/alltools.php, met our needs. to use the webpage test, simply select "web page test" from the dropdown menu, input a url—in the case of the testing done for this study, the permalink for one of three books (see, for example, figure 1)—enter the validation code, and click "test it." websitepulse returns a bar graph (figure 2) and a table (figure 3) of the file activity from the server sending the composite files to the end user's web browser. each line represents one of the files that make up the rendered webpage. they load sequentially, and the bar graph shows both the time it took for each file to load and the order in which the files were received. longer segments of the bar graph provide visual indication of where a slow-loading webpage might encounter sticking points—for example, waiting for a large image file or third-party content to load. accompanying the bar graph is a table describing the file transmissions in more detail, including dns, connection, file redirects (if applicable), first and last bytes, file transmission times, and file sizes.
figure 1. permalink screen shot for the record for the title hard lessons in auraria library's skyline catalog
figure 2. websitepulse webpage test bar graph results for skyline (traditional) catalog record
figure 3. websitepulse webpage test table results for skyline (traditional) catalog record
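nielsen's three limits quoted above amount to a small lookup table; the following sketch (our own illustration, with paraphrased labels) shows how measured times might be bucketed against them when evaluating a catalog.

```python
# nielsen's response-time limits, as quoted in the text above
NIELSEN_LIMITS = [
    (0.1, "feels instantaneous"),
    (1.0, "flow of thought uninterrupted, though the delay is noticed"),
    (10.0, "attention still held on the dialogue"),
]


def classify(seconds: float) -> str:
    """bucket a measured response time against nielsen's limits."""
    for limit, label in NIELSEN_LIMITS:
        if seconds <= limit:
            return label
    return "over 10 s: users will switch tasks; progress feedback is needed"


if __name__ == "__main__":
    for t in (0.05, 1.29, 3.50, 11.57):  # sample times in seconds
        print(f"{t:6.2f}s -> {classify(t)}")
```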
■■ findings: skyline versus worldcat@auraria
in figure 2, the bar graph shows a sample load time for the permalink to the bibliographic record for the title hard lessons: the iraq reconstruction experience in skyline, auraria's traditional catalog. load time for the page is 1.1429 seconds total. the record is composed of a total of fourteen items, including image files (gifs), cascading style sheet (css) files, and javascript (js) files. as the graph is read downward, the longer segments of the bars reveal the sticking points. in the case of skyline, the nine image files, two css files, and one js file loaded quickly; the only cause for concern is the red line at item four. this revealed that we were not taking advantage of the option to add a favicon to our iii catalog. the web librarian provided the ils server technician with the same favicon image used for the library's website, correcting this issue. the skyline catalog, judging by this data, falls into nielsen's second range of user expectations regarding response time, which is more than one second, or "about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay."11 further detail is provided in figure 3; this table lists each of the webpage's component files and various times associated with the delivery of each file. the column on the right lists the size in kilobytes of each file. the total size of the combined files is 84.64 kb.
in contrast to skyline's meager 14 files, worldcat local requires 31 items to assemble the webpage (figure 4) for the same bibliographic record. figures 5 and 6 show that this includes 10 css files, 10 javascript files, and 8 image files (gifs and pngs). no item in particular slows down the overall process very much; the longest-loading item is number 13, which is a wait for third-party content, a connection to yahoo!'s user interface (yui) api service. additional third-party content is being requested at items 8, 14, 15, 17, 26, and 27. the third parties include yahoo! api services, the google api service, recaptcha, and addthis. recaptcha is used to provide security within worldcat local with optical character recognition images ("captchas"), and the addthis api is used to provide bookmarking functionality. at number 22, a connection is made to the auraria library web server to retrieve a logo image hosted on the web server. at number 28, the cover photo for hard lessons is retrieved from an oclc server. figure 6 details the complex process by which web browsers assemble the files listed. each connection to third-party content, while relatively short, allows for additional features and functionality but lengthens overall response. as figure 6 shows, the response time is slightly more than 10 seconds, which, according to nielsen, "is about the limit for keeping the user's attention focused on the dialogue."12 while widgets, third-party content, and other web 2.0 tools add desirable content and functionality to the library's catalog, they also slow response time considerably. the total file size for the bibliographic record in worldcat@auraria—compared to skyline's 84.64 kb—is 633.09 kb. as will be shown in the test results below for the catalog and nextgen catalog products, bells and whistles added to traditional catalogs slowed response time considerably, even doubling it in one case. are they worth it? the response of auraria's reference and instruction staff seems to indicate that they are not.
figure 4. permalink screen shot for the record for the title hard lessons in worldcat@auraria
figure 5. websitepulse webpage test bar graph results for worldcat@auraria record
figure 6. websitepulse webpage test table results for worldcat@auraria record
■■ gathering more data: selecting the books and catalogs to study
to broaden our comparison and to increase our data collection, we also tested three other non-auraria catalogs. we designed our study to incorporate a number of variables. we decided to link to bibliographic records for three different books in the five different online catalogs tested. these included skyline and worldcat@auraria as well as three additional online public access catalog products, for a total of two instances of innovative interfaces products, one of a voyager catalog, and one of a sirsidynix catalog. we also selected online catalogs in different parts of the country: worldcat local in ohio; skyline in denver; the library of congress' online catalog (lcoc) in washington, d.c.; the university of texas at austin's (ut austin) online catalog; and the university of southern california's (usc) online catalog, named homer, in los angeles. we also did our testing at different times of the day: one book was tested in the morning, one at midday, and one in the afternoon. websitepulse performs its webpage tests from three different locations in seattle, munich, and brisbane; we selected seattle for all of our tests. we recorded the total time for each permalinked bibliographic record to load as reported by the websitepulse tests; this number appears near the lower right-hand corner of the tables in figures 3, 6, 9, 12, and 15.
we selected three books that were each held by all five of our test sites, verifying that we were searching the same three bibliographic records in each of the online catalogs by looking at the oclc number in the records. each of the catalogs we tested has a permalink feature; this is a stable url that always points to the same record in each catalog. using a permalink approximates conducting a known-item search for that item from a catalog search screen. we saved these links and used them in our searches. the bibliographic records we tested were for these books; the permalinks used for testing follow the books:
book 1: hard lessons: the iraq reconstruction experience. washington, d.c.: special inspector general, iraq reconstruction, 2009 (oclc number 302189848). permalinks used:
■■ worldcat@auraria: http://aurarialibrary.worldcat.org/oclc/302189848
■■ skyline: http://skyline.cudenver.edu/record=b2433301~s0
■■ lcoc: http://lccn.loc.gov/2009366172
■■ ut austin: http://catalog.lib.utexas.edu/record=b7195737~s29
■■ usc: http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=2770895{ckey}
book 2: ehrenreich, barbara. nickel and dimed: on (not) getting by in america. 1st ed. new york: metropolitan, 2001 (oclc number 45243324). permalinks used:
■■ worldcat@auraria: http://aurarialibrary.worldcat.org/oclc/45243324
■■ skyline: http://skyline.cudenver.edu/record=b1870305~s0
■■ lcoc: http://lccn.loc.gov/00052514
■■ ut austin: http://catalog.lib.utexas.edu/record=b5133603~s29
■■ usc: http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=1856407{ckey}
book 3: langley, lester d. simón bolívar: venezuelan rebel, american revolutionary. lanham: rowman & littlefield publishers, c2009 (oclc number 256770509). permalinks used:
■■ worldcat@auraria: http://aurarialibrary.worldcat.org/oclc/256770509
■■ skyline: http://skyline.cudenver.edu/record=b2426349~s0
■■ lcoc: http://lccn.loc.gov/2008041868
■■ ut austin: http://catalog.lib.utexas.edu/record=b7192968~s29
■■ usc: http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=2755357{ckey}
we gathered the data for thirteen days in early november 2009, an active period in the middle of the semester. for each test, we recorded the response time total in seconds. the data is displayed in tables 1–3. we searched bibliographic records for three books in five library catalogs over thirteen days (3 x 5 x 13) for a total of 195 response time measurements. the websitepulse data is calculated to the ten-thousandth of a second, and we recorded the data exactly as it was presented.
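a library wanting to replicate this design without a commercial monitor could automate the daily measurements with a short script. the sketch below is our own approximation, not the authors' websitepulse procedure: it times only the initial html file for each permalink (the per-file harness sketched earlier could be substituted for full waterfall timing), covers only two of the five catalogs for brevity, and the csv filename is hypothetical.

```python
import csv
import datetime
import time

import requests

# two of the study's catalogs and their three book permalinks
PERMALINKS = {
    "skyline": {"book1": "http://skyline.cudenver.edu/record=b2433301~s0",
                "book2": "http://skyline.cudenver.edu/record=b1870305~s0",
                "book3": "http://skyline.cudenver.edu/record=b2426349~s0"},
    "lcoc":    {"book1": "http://lccn.loc.gov/2009366172",
                "book2": "http://lccn.loc.gov/00052514",
                "book3": "http://lccn.loc.gov/2008041868"},
}


def timed_fetch(url: str) -> float:
    """seconds to retrieve one url (initial file only, no sub-resources)."""
    start = time.perf_counter()
    requests.get(url, timeout=30)
    return time.perf_counter() - start


if __name__ == "__main__":
    # run once per day (e.g., from cron) across a thirteen-day window
    today = datetime.date.today().isoformat()
    with open("response_times.csv", "a", newline="") as fh:
        writer = csv.writer(fh)
        for catalog, books in PERMALINKS.items():
            for book, url in books.items():
                writer.writerow([today, catalog, book,
                                 f"{timed_fetch(url):.4f}"])
```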
table 1. response times for book 1 (in seconds)
day        worldcat    skyline    lc        ut austin    usc
1          10.5230     1.3191     2.6366    3.6643       3.1816
2          10.5329     1.2058     1.2588    3.5089       4.0855
3          10.4948     1.2796     2.5506    3.4462       2.8584
4          13.2433     1.4668     1.4071    3.6368       3.2750
5          10.5834     1.3763     3.6363    3.3143       4.6205
6          11.2617     1.2461     2.3836    3.4764       2.9421
7          20.5529     1.2791     3.3990    3.4349       3.2563
8          12.6071     1.3172     3.6494    3.5085       2.7958
9          10.4936     1.1767     2.6883    3.7392       4.0548
10         10.1173     1.5679     1.3661    3.7634       3.1165
11         9.4755      1.1872     1.3535    3.4504       3.3764
12         12.1935     1.3467     4.7499    3.2683       3.4529
13         11.7236     1.2754     1.5569    3.1250       3.1230
average    11.8310     1.3111     2.5105    3.4874       3.3953

table 2. response times for book 2 (in seconds)
day        worldcat    skyline    lc        ut austin    usc
1          10.9524     1.4504     2.5669    3.4649       3.2345
2          10.5885     1.2890     2.7130    3.8244       3.7859
3          10.9267     1.3051     0.2168    4.0154       3.6989
4          13.8776     1.3052     1.3149    4.0293       3.3358
5          10.6495     1.3250     4.5732    3.5775       3.2979
6          11.8369     1.3645     1.3605    3.3152       2.9023
7          11.3482     1.2348     2.3685    3.4073       3.5559
8          10.7717     1.2317     1.3196    3.5326       3.3657
9          11.1694     1.0997     1.0433    2.8096       2.6839
10         19.0694     1.6479     2.5779    4.3595       2.6945
11         12.0109     1.1945     2.5344    3.0848       18.5552
12         12.6881     0.7384     1.3863    3.7873       3.9975
13         11.6370     1.1668     1.2573    3.3211       3.6393
average    12.1174     1.2579     1.9410    3.5791       4.5190

table 3. response times for book 3 (in seconds)
day        worldcat    skyline    lc        ut austin    usc
1          10.8560     1.3345     1.9055    3.7001       2.6903
2          10.1936     1.2671     1.8801    3.5036       2.7641
3          11.0900     1.5326     1.3983    3.5983       3.0025
4          10.9030     1.4557     2.0432    3.6248       2.9285
5          12.3503     1.5972     3.5474    3.6428       4.5431
6          9.1008      1.1661     1.4440    3.4577       3.1080
7          9.6263      1.1240     2.3688    3.1041       3.3388
8          10.9539     1.1944     1.4941    2.8968       3.4224
9          11.0001     1.2805     1.3255    3.3644       2.7236
10         10.2231     1.3778     1.3131    3.3863       3.4885
11         10.1358     1.2476     2.3199    3.4552       2.9302
12         12.0109     1.1945     2.5344    3.0848       18.5552
13         11.5881     1.2596     2.5245    3.8040       3.8506
average    10.7717     1.3101     2.0076    3.4325       4.4112

table 4. averages (in seconds)
book       worldcat    skyline    lc        ut austin    usc
book 1     11.8310     1.3111     2.5105    3.4874       3.3953
book 2     12.1174     1.2579     1.9410    3.5791       4.5190
book 3     10.7717     1.3101     2.0076    3.4325       4.4112
average    11.5734     1.2930     2.1530    3.4997       4.1085

■■ results
the data shows the response times for each of the three books in each of the five online catalogs over the thirteen-day testing period. the raw data was used to calculate averages for each book in each of the five online catalogs, and then we calculated averages for each of the five online catalogs (table 4). the averages show that during the testing period, the response time varied between 1.2930 seconds for the skyline library catalog in denver and 11.5734 seconds for worldcat@auraria, which has its servers in ohio.
university of colorado denver: worldcat@auraria
worldcat@auraria was routinely over nielsen's ten-second limit, sometimes taking as long as twenty seconds to load all the files to generate a single webpage. as previously discussed, this is due to the high number and variety of files that make up a single bibliographic record. the files sent also include cover images, but they are small and do not add much to the total time. after our tests on worldcat@auraria were conducted, the site removed one of the features on pages for individual resources, namely the "similar items" feature. this feature was one of the most file-intensive on a typical page, and its removal should speed up page loads. however, worldcat@auraria had the highest average response time by far of the five catalogs tested.
university of colorado denver: skyline (innovative interfaces)
as previously mentioned, the traditional catalog at auraria library runs on an innovative interfaces integrated library system (ils). testing revealed a missing favicon image file that the web server tries to send each time (item 4 in figure 3); however, this did not negatively affect the response time. the catalog's response time was good, with an average of 1.2930 seconds, giving it the fastest average time among all the test sites in the testing period. as figure 1 shows, however, skyline is a typical legacy catalog that is designed for a traditional library environment.
library of congress: online catalog (voyager)
the average response time for the lcoc was 2.0076 seconds. this was the second fastest average among the five catalogs tested. while, like skyline, the bibliographic record page is sparsely decorated (figure 7), this pays dividends in response time, as there are only two css files and three gif files to load after the html content loads (figure 9). figure 8 shows that initial connection time is the longest factor in load time; however, it is still short enough to not have a negative effect. total file size is 19.27 kb. as with skyline, the page itself (figure 7) is not particularly end-user friendly to nonlibrarians.
figure 7. permalink screen shot for the record for the title hard lessons in the library of congress online catalog
figure 8. websitepulse webpage test bar graph results for library of congress online catalog record
figure 9. websitepulse webpage test table results for library of congress online catalog record
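as a quick cross-check of the reformatted tables above, the published averages can be re-derived from the daily figures; for instance, the skyline column of table 1:

```python
# skyline's thirteen daily times for book 1, transcribed from table 1
skyline_book1 = [1.3191, 1.2058, 1.2796, 1.4668, 1.3763, 1.2461, 1.2791,
                 1.3172, 1.1767, 1.5679, 1.1872, 1.3467, 1.2754]

avg = sum(skyline_book1) / len(skyline_book1)
print(f"skyline, book 1 average: {avg:.4f} s")  # 1.3111, matching table 1
```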
university of texas at austin: library catalog (innovative interfaces)
ut austin, like auraria library, runs an innovative interfaces ils. the library catalog also includes book cover images, one of the most attractive nextgen features (figure 10), and as shown in figure 12, third-party content is used to add features and functionality (items 16 and 17). ut austin's catalog uses a google javascript api (item 16 in figure 12) and librarything's catalog enhancement product, which can add book recommendations, tag browsing, and alternate editions and translations. total content size for the bibliographic record is considerably larger than skyline's and the lcoc's at 138.84 kb. it appears as though inclusion of cover art nearly doubles the response time; item 14 is a script that, while hosted on the ils server, queries amazon.com to return cover image art (figures 11–12). the average response time for ut austin's catalog was 3.4997 seconds. this example demonstrates that response times for traditional (i.e., not nextgen) catalogs can be slowed down by additional content as well.
figure 10. permalink screen shot for the record for the title hard lessons in university of texas at austin's library catalog
figure 11. websitepulse webpage test bar graph results for university of texas at austin's library catalog record
figure 12. websitepulse webpage test table results for university of texas at austin's library catalog record
university of southern california: homer (sirsidynix)
the average response time for usc's homer catalog was 4.1085 seconds, making it the second slowest after worldcat@auraria, and the slowest among the traditional catalogs. this sirsidynix catalog appears to take a longer time than the other brands of catalogs to make the initial connection to the ils; this accounts for much of the slowness (see figures 14 and 15). once the initial connection is made, however, the remaining content loads very quickly, with one exception: item 13 (see figure 15), which is a connection to the third-party provider syndetic solutions, which provides cover art, a summary, an author biography, and a table of contents. while the display of this content is attractive and well integrated into the catalog (figure 13), it adds 1.2 seconds to the total response time. also, as shown in items 14 and 15, usc's homer uses the addthis service to add bookmarking enhancements to the catalog. total combined file size is 148.47 kb, with the bulk of the file size (80 kb) coming from the initial connection (item 1 in figure 15).
figure 13. permalink screen shot for the record for the title hard lessons in homer, the university of southern california's catalog
figure 14. websitepulse webpage test bar graph results for homer, the university of southern california's catalog
figure 15. websitepulse webpage test table results for homer, the university of southern california's catalog
■■ conclusion
an eye-catching interface and valuable content are lost on the end user if he or she moves on before a search is completed. added functionality and features in library search tools are valuable, but there is a tipping point when these features slow down a product's response time to where users find the search tool too slow or unreliable. based on the findings of this study, we recommend that libraries adopt web response time standards, such as those set forth by nielsen, for evaluating vendor search products and creating in-house search products. commercial tools like websitepulse make this type of data collection simple and easy. testing should be conducted for an extended period of time, preferably during a peak period—i.e., during a busy time of the semester for academic libraries. we further recommend that reviews of electronic resources add response time as an evaluation criterion. additional research about response time as defined in this study might look at other search tools, including article databases and especially other metasearch products that collect and aggregate search results from several remote sources. further studies with more of a technological focus could include discussions of optimizing data delivery methods—again, in the case of metasearch tools drawing from multiple remote sources—to reduce response time. finally, product designers should pay close attention to response time when designing information retrieval products that libraries purchase.
■■ acknowledgments
the authors wish to thank shelley wendt, library data analyst, for her assistance in preparing the test data.
references
1. jakob nielsen, usability engineering (san francisco: morgan kaufmann, 1994): 135.
2. ibid.
3. ibid.
4. ibid.
5. margaret brown-sica, "playing tag in the dark: diagnosing slowness in library response time," information technology & libraries 27, no. 4 (2008): 29–32.
6. xiaotian chen, "metalib, webfeat, and google: the strengths and weaknesses of federated search engines compared with google," online information review 30, no. 4 (2006): 422.
7. brian mathews, "5 next gen library catalogs and 5 students: their initial impressions," online posting, may 1, 2009, the ubiquitous librarian blog, http://theubiquitouslibrarian.typepad.com/the_ubiquitous_librarian/2009/05/5-next-gen-library-catalogs-and-5-students-their-initial-impressions.html (accessed feb. 5, 2010).
8. andrew nagy and scott garrison, "next-gen catalogs are only part of the solution," online posting, oct. 4, 2009, lauren's library blog, http://laurenpressley.com/library/2009/10/next-gen-catalogs-are-only-part-of-the-solution/ (accessed feb. 5, 2010).
9. eitan frachtenberg, "reducing query latencies in web search using fine-grained parallelism," world wide web 12, no. 4 (2009): 441–60.
10. travis k. huang and fu fong-ling, "understanding user interface needs of e-commerce web sites," behaviour & information technology 28, no. 5 (2009): 461–69, http://www.informaworld.com/10.1080/01449290903121378 (accessed feb. 5, 2010).
11. nielsen, usability engineering, 135.
12. ibid.
the middle mile: the role of the public library in ensuring access to broadband
marijke visser and mary alice ball
information technology and libraries | december 2010
marijke visser (mvisser@alawash.org) is information technology policy analyst and mary alice ball (maryaliceball@yahoo.com) is former chair, telecommunications subcommittee, office for information technology policy, american library association, washington, dc.
the middle mile: the role of the public library in ensuring access to broadband

marijke visser and mary alice ball

marijke visser (mvisser@alawash.org) is information technology policy analyst and mary alice ball (maryaliceball@yahoo.com) former chair, telecommunications subcommittee, office for information technology policy, american library association, washington, dc.

this paper discusses the role of the public library in ensuring access to the broadband communication that is so critical in today's knowledge-based society. it examines the culture of information in 2010, and then asks what it means if individuals are online or not. the paper also explores current issues surrounding telecommunications and policy, and finally seeks to understand the role of the library in this highly technological, perpetually connected world.

in the last twenty years library collections have evolved from being predominantly print-based to ones that have a significant digital component. this trend, which has a direct impact on library services, has only accelerated with the advent of web 2.0 technologies and participatory content creation. cutting-edge libraries with next generation catalogs encourage patrons to post reviews, contribute videos, and write on library blogs and wikis. even less adventuresome institutions offer a variety of electronic databases licensed from multiple publishers and vendors. the piece of these library portfolios that is at best ignored and at worst vilified is the infrastructure that enables internet connectivity. in 2010, broadband telecommunication is recognized as essential to access the full range of information resources. telecommunications experts articulate their concerns about the digital divide by focusing on first- and last-mile issues of bringing fiber and cable to end users. the library, particularly the public library, represents the metaphorical middle mile providing the public with access to rich information content. equally important, it provides technical knowledge, subject matter expertise, and general training and support to library users.

■■ the culture of information

information today is dynamic. as the internet continues on its fast-paced, evolutionary track, what we call "information" fluctuates with each emerging web-based technology. theoretically a democratic platform, the internet and its user-generated content is in the process of fundamentally altering culture and society. in some circles the changes happen in real time as new web-based applications are developed, adopted, and integrated into the user's daily life. these users are the early adopters; the internet cognoscenti. second tier users appreciate the availability of online resources and use a mix of devices to access internet content but vary in the extent to which they try the latest application or device. the third tier users also vary in the amount they access the internet but have generally not embraced its full potential, from not seeking out readily available resources to not connecting at all.1 regardless of the degree to which they access the internet, all of these users require basic technology skills and a robust underlying infrastructure.

since the introduction of web 2.0, the number and type of participatory web-based applications has continued to grow. many people are eagerly taking part in creating an increasing variety of web-based content because the basic tools to do so are widely available. the amateur, creating and sharing for primarily personal reasons, has the ability to reach an audience of unprecedented size. in turn, the internet audience, or virtual audience, can select from a vast menu of formats, including multimedia and print. with print resources disappearing, it is increasingly likely for an individual to only be able to access necessary material online. web-based resources are unique in that they enable an undetermined number of people, personally connected or complete strangers, to interact with and manipulate the content, thereby creating something new with each interaction and subsequent iteration. many of these new resources and applications require much more bandwidth than traditional print resources. with the necessary technology no longer out of reach, a cross-section of society is affecting the course the twenty-first century is taking vis à vis how information is created, who can create it, and how we share it.2 in turn, who can access web-based content and who decides how it can be accessed become critical questions to answer. as people become more adept at using web-based tools and eager to try new applications, the need for greater broadband will intensify. the economic downturn is having a marked effect on people's internet use. if there was a preexisting problem with inadequate access to broadband, current circumstances exacerbate it to where it needs immediate attention.

access to broadband internet today increases the amount of information and variety of formats available to the user. in turn more content is being distributed as users create and share original content.3 businesses, nonprofits, municipal agencies, and educational institutions appreciate that by putting their resources online they reach a broader segment of their constituency. this approach to reaching an audience works provided the constituents have their own access to the materials, both physically and intellectually. it is one thing to have an internet connection and another to have the skill set necessary to make productive use of it. as reported in job-seeking in u.s. public libraries in 2009, "less than 44% of the top 100 u.s. retailers accept in-store paper applications."4 municipal, state, and federal agencies are increasingly putting their resources online, including unemployment benefit applications, tax forms, and court documents.5 in addition to online documents, the report finds social service agencies may encourage clients to make appointments and apply for state jobs online.6 many of the processes that are now online require an ability to navigate the complexities of the internet at the same time as navigating difficult forms and websites. the combination of the two can deter someone from retrieving necessary resources or successfully completing a critical procedure. while early adopters and policy-makers debate the issues surrounding internet access, the other strata of society, knowingly or not and to varying degrees, are enmeshed in the outcomes of these ongoing discussions because their right to information is at stake.

■■ barriers to broadband access

by condensing internet access issues to focus on the availability of adequate and sustainable broadband, it is possible to pinpoint four significant barriers to access: price, availability, perceived relevance, and technical skill level. the first two barriers are determined by existing telecommunications infrastructure as well as local, state, and federal telecommunications policies. the latter barriers are influenced by individual behaviors. both divisions deserve attention.

if local infrastructure and the internet service provider (isp) options do not support broadband access to all areas within its boundaries, the result will be that some community members can have broadband services at home while others must rely on work or public access computers. it is important to determine what kind of broadband services are available (e.g., cable, dsl, fiber, satellite) and if they are robust enough to support the activities of the community. infrastructure must already be in place or there must be economic incentive for isps to invest in improving current infrastructure or in installing new infrastructure. the geographical location of a community will also influence what kind of internet service is available because of deployment costs. these costs are typically reflected in varying prices to consumers. in addition to the physical layout of an area, current federal telecommunications policies limit the degree to which incentives can be used on the local level.7 encouraging competition between isps, including municipal electric utilities, incumbent local exchange carriers, and national cable companies, for example, requires coordination between local needs and state and federal policies. such coordinated efforts are inherently difficult when taking into consideration the numerous differences between locales. ultimately, though, all of these factors influence the price end users must pay for internet access.

with necessary infrastructure and telecommunications policies in place, there are individual behaviors that also affect broadband adoption. according to the pew study "home broadband adoption 2008," 62 percent of dial-up users are not interested in switching to broadband.8 clearly there is a segment of the population that has not yet found personal relevance to high-speed access to online resources. in part this may be because they only have experience with dial-up connections. depending on dial-up gives the user an inherently inferior experience because bandwidth requirements to download a document or view a website with multimedia features automatically prevent these users from accessing the same resources as a user with a high-speed connection. a dial-up user would not necessarily be aware of this difference. if this is the only experience a user has it might be enough to deter broadband adoption, especially if there are other contributing factors like lack of technical comfort or availability of relevant content.

motivation to use the internet is influenced by the extent to which individuals find content personally relevant. whether it is searching for a job and filling out an application, looking at pictures of grandchildren, using skype to talk to a family member deployed in iraq, researching healthcare providers, updating a personal webpage, or streaming video, people who do these things have discovered personally relevant internet content and applications. understanding the potential relevance of going online makes it more likely that someone would experiment with other applications, thus increasing both the familiarity with what is available and the comfort level with accessing it. without relevant content, there is little motivation for someone not inclined to experiment with internet technology to cross what amounts to a significant hurdle to adoption. anthony wilhelm argues in a 2003 article discussing the growing digital divide that culturally relevant content is critical in increasing the likelihood that non-users will want to access web-based resources.9 the scope of the issue of providing culturally relevant content is underscored in the 2008 pew study, which found that of the 27 percent of adult americans who are not internet users, 33 percent report they are not interested in going online.10 that pew can report similar information five years after the wilhelm article identifies a barrier to equitable access that has not been adequately resolved.

■■ models for sustainable broadband availability

in discussing broadband, the question of what constitutes broadband inevitably arises. gillett, lehr, and osorio, in "local government broadband initiatives," offer a functional definition: "access is 'broadband' if it represents a noticeable improvement over standard dial-up and, once in place, is no longer perceived as the limiting constraint on what can be done over the internet."11 while this definition works in relationship to dial-up, it is flexible enough to apply to all situations by focusing on "a noticeable improvement" and "no longer perceived as the limiting constraint" (added emphasis). ensuring sustainable broadband access necessitates anticipating future demand. short-sighted definitions, applicable at a set moment in time, limit long-term viability of alternative solutions. devising a sustainable solution calls for careful scrutiny of alternative models, because the stakes are so high in the broadband debate. there are many different players involved in constructing information policies. this does not mean, however, that their perspectives are mutually exclusive. in debates with multiple perspectives, it is important to involve stakeholders who are aligned with the ultimate goal: assuring access to quality broadband to anyone going online. what is successful for one community may be entirely inappropriate in another; designing a successful system requires examining and comparing a range of scenarios. existing circumstances may predetermine a particular starting point, but one first step is to evaluate best practices currently in place in a variety of communities to come up with a plan that meets the unique criteria of the community in question. sustainable broadband solutions need to be developed with local constituents in mind and successful solutions will incorporate the realities of current and future local technologies and infrastructure as well as local, state, and federal information policies. presupposing that the goal is to provide the community with the best possible option(s) for quality broadband access, these are key considerations to take into account when devising the plan.

in addition to the technological and infrastructure issues, within a community there will be a combination of ways people access the internet. there will be those who have home access, those who need public access, and those who do not seek access at all. success hinges on understanding that each community is unique, on leveraging its strengths, and on ameliorating its weaknesses. local government can play a significant role in the availability of broadband access. from a municipal perspective, emphasizing the role of broadband as a factor in economic development can help define how the municipality should most effectively advocate for broadband deployment and adoption. gillett offers four initiatives appropriate for stimulating broadband from a local viewpoint. municipal governments can
■■ become leaders in developing locally relevant internet content and using broadband in their own services;
■■ adopt policies that make it easier for isps to offer broadband;
■■ subsidize broadband users and/or isps; or
■■ become involved in providing the infrastructure or services themselves.12
individually or in combination these four initiatives underscore the fact that government awareness of the possibilities for community growth made possible by broadband access can lead to local government support for the initiatives of other local agencies, including nonprofit, municipal, or small businesses. agencies partnering to support community needs can provide evidence to local policy makers that broadband is essential for community success. once the municipality sees the potential for social and economic development, it is more likely to support policies that stimulate broadband buildout. building strong local partnerships will set the stage for the development of a sustainable broadband initiative as the different stakeholders share perspectives that take into account a variety of necessary components. when the time comes to implement a strategy, not only will different perspectives have been included, the plan will have champions to speak for it: the government, isps, public and private agencies, and community members. it is important to know which constituents are already engaged in supporting community broadband initiatives and which should be tapped. the ultimate purpose in establishing broadband internet access in a community is to benefit the individual community members, thereby stimulating local economic development. key players need to represent agencies that recognize the individual voice.

a 2004 study led by strover provides an example of the importance of engaging local community leaders and agencies in developing a successful broadband access project.13 the study looked at thirty-six communities that received state funding to establish community technology centers (ctc). it addressed the effective use and management of ctcs and called attention to the inadequacy of supplying the hardware without community support systems in place. users need a support system that highlights opportunities available via the internet and that provides help when they run into problems. access is more than providing the infrastructure and hardware. the potential users must also find content that is culturally relevant in an environment that supports local needs and expectations. strover found the most successful ctcs were located in places that "actively attracted people for other social and entertaining reasons."14 in other words, the ctcs did not operate in a vacuum devoid of social context. successful adoption of the ctcs as a resource for information was dependent on the targeted population finding culturally relevant content in a supportive environment. an additional point made in the study showed that without strong community leadership, there was not significant use of the ctc even when placed in an already established community center.15 this has significant implications for what constitutes access as libraries plan broadband initiatives.

investments in technology and a national commitment to ensure universal access to these new technologies in the 1990s provide the current policy framework. as suggested by wilhelm in 2003, to continue to move forward the national agenda needs to focus on updating policies to fit new information circumstances as they arise. today's information policy debates should emphasize a similar focus. beyond accelerating broadband deployment into underserved areas, wilhelm suggests there needs to be support for training and content development that guarantees communities will actually use and benefit from having broadband deployed in their area.16 technology training and support for local agencies that provide the public with internet access, as well as opportunities for the individuals themselves, is essential if policies are going to actually lead to useful broadband adoption. individual and agency internet access and adoption require investment beyond infrastructure; they depend on having both culturally relevant content and the information literacy skills necessary to benefit from it.

■■ finding the right solution

though it may have taken an economic crisis to bring broadband discussions into the living room, the result is causing renewed interest in a long-standing issue. many states have formed broadband task forces or councils to address the lack of adequate broadband access at the state level and, on the national front, broadband was a key component of the american recovery and reinvestment act of 2009.17 the issue changes as technologies evolve but the underlying tenet of providing people access to the information and resources they need to be productive members of society is the same. what becomes of the current emphasis on universal broadband depends on selecting the best of the alternative plans according to carefully vetted criteria in order to develop a flexible and forward-thinking course of action. can we let people remain without access to robust broadband and the necessary skill set to use it effectively? no. as more and more resources critical to basic life tasks are accessible only online, those individuals that face challenges to going online will likely be socially and economically disadvantaged when compared to their online counterparts. this potential for an intensifying digital divide is recognized in the federal communication commission's (fcc) national broadband plan (nbp) released in march 2010.18 the nbp states six national broadband goals, the third of which is "every american should have affordable access to robust broadband service, and the means and skills to subscribe if they so choose."19 research conducted for the recommendations in the nbp was comprehensive in scope, including voices from industry, public interest, academia, and municipal and state government. responses to more than thirty public notices issued by the fcc provide evidence of wide concern from a variety of perspectives that broadband access should become ubiquitous if the united states is to be a competitive force in the twenty-first century.

access to essential information such as government, public safety, educational, and economic resources requires a broadband connection to the internet. it is incumbent on government officials, isps, and community organizations to share ideas and resources to achieve a solution for providing their communities with robust and sustainable broadband. it is not necessary to have all users up to par with the early adopters. there is not a one-size-fits-all approach to wanting to be connected, nor is there a one-size-fits-all solution to providing access. what is important is that an individual can go online via a robust, high-speed connection that meets that individual's needs at that moment. what this means for finding solutions is
■■ there needs to be a range of solutions to meet the needs of individual communities;
■■ they need to be flexible enough to meet the evolving needs of these communities as applications and online content continue to change; and
■■ they must be sustainable for the long term so that the community is prepared to meet future needs that are as yet unknown.

solutions to providing broadband internet access will be most successful when they are designed starting at the local level. community needs vary according to local demographics, geography, existing infrastructure, types of service providers, and how state and federal policies mesh with local ordinances. local stakeholders best understand the complex interworking of their community and are aware of who should be included in the decision-making process. including a local perspective will also increase the likelihood that as community needs change, new issues will be brought to the attention of policy makers and agencies who advocate for the individual community members. community agencies that already are familiar with local needs, abilities, and expectations are logical groups to be part of developing a successful local broadband access strategy. the library exemplifies a community resource whose expertise in local issues can inform information policy discussions on local, state, and federal levels. as a natural extension of library service, libraries offer the added value support necessary for many users to successfully navigate the internet. the library is an established community hub for informational resources and provides dedicated staff, technology training opportunities, and no-fee public access computers with an internet connection. libraries in many communities are creating locally relevant web-based content as well as linking to other community resources on their own websites. seeking a partnership with the local library will augment a community broadband initiative.

it is difficult to appreciate the impacts of current information technologies because they change so rapidly there is not enough time to realistically measure the effects of one before it is mixed in with a new innovation. with web-based technologies there is a lag time between what those in the front of the pack are doing online and what those in the rear are experiencing. while there is general consensus that broadband internet access is critical in promoting social and economic development in the twenty-first century, as is evidenced by the national purposes outlined in the nbp, there is not necessarily agreement on benchmarks for measuring the impacts. three anticipated outcomes of providing community access to broadband are
■■ civic participation will increase;
■■ communities will realize economic growth; and
■■ individual quality of life will improve.
when a strategy involves significant financial and energy investments there is a tendency to want palpable results. the success of providing broadband access in a community is challenging to capture. to achieve a level of acceptable success it is necessary to focus on local communities and aggregate anecdotal evidence of incremental changes in public welfare and economic gain. acceptable success is subjective at best but can be usefully defined in context of local constituencies. referring to participation in the development of a vibrant culture, horrigan notes that "while inherently difficult to measure, these kinds of social and cultural capital are important elements in ongoing debates about uses and consequences of broadband access. an ongoing challenge for those interested in the social, economic, and policy consequences of modern information networks will be to keep up with changing notions of what it means to be connected in cyberspace."20

the social contexts in which a broadband plan will be enacted influence the appropriateness of different scenarios and should help guide which ones are implemented. engaging a variety of stakeholders will increase the likelihood of positive outcomes as community members embrace the opportunities provided by broadband internet access. it is difficult, however, to anticipate the outcomes that may occur as users become more familiar with the resources and achieve a higher level of comfort with technology. ramirez states,

the "unexpected outcomes" section of many evaluation reports tends to be rich with anecdotes . . . . the unexpected, the emergent, the socially constructed innovations seem to be, to a large extent, off the radar screen, and yet they often contain relevant evidence of how people embrace technology and how they innovate once they discover its potential.21

community members have the most to gain from having broadband internet access. including them will increase the community's return on its investment as they take advantage of the available resources. ramirez suggests that "participatory, learning, and adaptive policy approaches" will guide the community toward developing communication technology policies that lead to a vibrant future for individuals and community alike.22 as success stories increase, the aggregation of local communities' social and economic growth will lead to a net sum gain for the nation as a whole.

■■ the role of the library

public libraries play an important role in providing internet access to their community members. according to a 2008 study, the public library is the only outlet for no-fee internet access in 72.5 percent of communities nationwide; in rural communities the number goes up to 82.0 percent.23 beyond having desktop or, in some cases, wireless access, public libraries offer invaluable user support in the form of technical training and locally relevant content. libraries provide a secondary community resource for other local agencies who can point their clients to the library for no-fee internet access. in today's economy where anecdotal reports show an increase in library use, particularly internet use, the role of the public library as a stable internet provider cannot be overestimated. to maintain its vital function, however, the library must also resolve infrastructure challenges of its own. because of the increased demand for access to internet resources, public libraries are finding their current broadband services are not able to support the demand of their patrons. the issues are two-fold: increased patron use means there are often neither sufficient workstations nor broadband speeds to meet patron demand. in 2008, about 82.5 percent of libraries reported an insufficient number of public workstations, and about 57.5 percent reported insufficient broadband speeds.24 to add to these already significant issues, the report indicates libraries are having trouble supporting the necessary information technology (it) because of either staff time constraints or the lack of a dedicated it staff.25 public libraries are facing considerable infrastructure management issues at a time when library use is increasing. overcoming the challenges successfully will require support on the local, state, and federal level.

here is where the librarian, as someone trained to become inherently familiar with the needs of her local constituency and ethically bound to provide access to a variety of information resources, needs to insert herself into the debate. librarians need to be ahead of the crowd as the voice that assures content will be readily accessible to those who seek it. today, the elemental policy issue regarding access to information via the internet hinges on connectivity to a sustainable broadband network. to promote equitable broadband access, the librarian needs to be aware of the pertinent information policies in place or under consideration, and be able to anticipate those in the future. additionally, she will need to educate local policy makers about the need for broadband in their community. in some circumstances, the librarian will need to move beyond her local community and raise awareness of community access issues on the state and federal level. the librarian is already able to articulate numerous issues to a variety of stakeholders and can transfer this skill to advocate for sustainable broadband strategies that will succeed in her local community.

there are many strata of internet users, from those in the forefront of early adoption to those not interested in being online at all. the early adopters drive the market, which responds by making resources more and more likely to be primarily available only online. as we continue this trend, the social repercussions increase from merely not being able to access entertainment and news to being unable to participate in the knowledge-based society of the twenty-first century. by folding in added value online access for the community, the library helps increase the likelihood that the community will benefit from broadband being available to the library patrons and by extension to the community as a whole. to realize the internet's full potential, access to it cannot be provided in isolation. an individual must possess skills to navigate the online resources. as users gain an understanding of the potential personal growth and opportunities broadband yields, they will be more likely to seek additional online resources. by stimulating broadband use, the library will contribute to the social and economic health of the community.

if the library is to extend its role as the information hub in the community by providing no-fee access to broadband to anyone who walks through the door, the local community must be prepared to support that role. it requires a commitment to encourage buildout of appropriate technology necessary for the library to maintain a sustainable internet connection. it necessitates that local communities advocate for national information and communication policies that are pro-library. when public policy supports the library's efforts, the local community benefits and society at large can progress. what if the library's own technology needs are not met? the role of the library in its community is becoming increasingly important as more people turn to it for their internet access. without sufficient revenue, the library will have a difficult time meeting this additional demand for services. in turn, in many libraries increased demand for broadband access stretches the limit of it support for both the library staff and the patrons needing help at the computers. what will be the fallout from the library not being able to provide internet services the patrons desire and require? will there be a growing skills difference between people who adopt emerging technologies and incorporate them into their daily lives and those who maintain the technological status quo? what will the social impact be of remaining off line either completely or only marginally? can the library be the bridge between those on the edge, those in the middle, and those at the end? with a strong and well-articulated vision for the future, the library can be the link that provides the community with sustainable broadband.

■■ conclusion

the recent national focus on universal broadband access has provided an opportunity to rectify a lapse in effective information policy. whether the goal includes facilitating meaningful access continues to be more elusive. as government, organizations, businesses, and individuals rely more heavily on the internet for sharing and receiving information, broadband internet access will continue to increase in importance. following the status quo will not necessarily lead to more people having broadband access in the long run. the early adopters will continue to stimulate technological innovation which, in turn, will trickle down the ranks of the different user types. currently, however, the supply of internet resources is unevenly stimulating user demand and the unequal distribution of broadband access has greater potential for significant negative social consequences. staying the course and following a haphazard evolution of broadband adoption may, in fact, renew valid concerns about a digital divide. without an intentional and coordinated approach to developing a broadband strategy, its success is likely to fall short of expectations.

the question of how to ensure that internet content is meaningful requires instituting a plan on a very local level, including stakeholders who are familiar with the unique strengths and weaknesses of their community. strover, in her 2000 article "the first mile," suggests connectivity issues should be viewed from a first mile perspective where the focus is on the person accessing the internet and her qualitative experience rather than from a last mile perspective which emphasizes isp, infrastructure, and market concerns.26 both perspectives are talking about the same physical section of the connection network: the piece that connects the user to the network. according to strover, distinguishing between the first mile and last mile perspectives is more than an arbitrary argument over semantics. instead, a first mile perspective represents a shift "in the values and priorities that shape telecommunications policy."27 by switching to a first mile perspective, connectivity issues immediately take into account the social aspects of what it means to be online. who will bring this perspective to the table? and how will we ascertain what the best approach to supporting the individual voice should be?

the first mile perspective is one the library is intimately familiar with as an organization that traditionally advocates for the first mile of all information policies. the library is in a key position in the connectivity debate because of its inclination to speak for the user and to be aware of the unique attributes and needs of its local community. as part of its mission, the library takes into account the distinctive needs of its user community when it designs and implements its services. a natural outgrowth of this practice is to be keenly aware of the demographics of the community at large. the library can leverage its knowledge and understanding to create an even greater positive impact on the social, educational, and economic community development made possible by broadband adoption. to extend the first mile perspective analogy, in the connectivity debate, the library will play the role of the middle mile: the support system that successfully connects the internet to the consumer. while the target populations for stimulating demand for broadband are really those in the second tier of users, by advocating for the first mile perspective, the library will be advocating for equitable information policies whose implementation has bearing on the early adopters as well. by stimulating demand for broadband within a community, the entire community benefits regardless of where and how the individuals go online.

the effects of the internet are now becoming broadly social enough that there is a general awareness that the internet is not decoration on contemporary society but a challenge to it.28 being connected is no longer an optional luxury; to engage in the twenty-first century it is essential. access to the internet, however, is more than simple connectivity. successful access requires an understanding of the benefits to going online, technological comfort, information literacy, ongoing support and training, and the availability of culturally relevant content. people are at various levels of internet use, from those eagerly anticipating the next iteration of web-based applications to those hesitant to open an e-mail account. this user spectrum is likely to continue. though the starting point may vary depending on the applications that become important to the user in the middle of the spectrum, there will be those out in front and those barely keeping up. the implications of the pervasiveness of the internet are only beginning to be appreciated and understood. because of their involvement at the cutting edge of internet evolution, librarians can help lead the conversations. libraries have always been situated in neutral territory within their communities and closely aligned with the public good. librarians understand the perspective of their patrons and are grounded in their local communities. librarians can therefore advocate effectively for their communities on issues that may not completely be understood or even recognized as mattering. connectivity is an issue supremely important to the library as today access to the full range of information necessitates a broadband connection. libraries have carved out a role for themselves as a premier internet access provider in the continually evolving online culture. as noted by bertot, mcclure, and jaeger, the "role of internet access provider for the community is ingrained in the social perceptions of public libraries, and public internet access has become a central part of community perceptions about libraries and the value of the library profession."29

in times of both economic crisis and technological innovation, there are many unknowns. in part because of these two juxtaposed events, the role of the public library is in flux. additionally, the network of community organizations that libraries link to is becoming more and more complex. it is a time of great opportunity if the library can articulate its role and frame it in relationship to broader society. evolving internet applications require increasing amounts of bandwidth and the trend is to make these bandwidth-heavy applications more and more vital to daily life. one clear path the library community can take is to develop its role as the middle mile connecting the increasing breadth of internet resources to the general public. the broadband debate has moved out of the background of telecommunication policy and into the center of public attention. now is the moment that calls for an information policy advocate who can represent the end user while understanding the complexity of the other stakeholder perspectives. the library undoubtedly has its own share of stakeholders, but over time it is an institution that has maintained a neutral stance within its community, thereby achieving a unique ability to speak for all parties. those who speak for the library are able to represent the needs of the public, work with a diverse group of stakeholders, and help negotiate a sustainable strategy for providing broadband internet access.

references and notes

1. lee rainie, "2.0 and the internet world," internet librarian 2007, http://www.pewinternet.org/presentations/2007/20-and-the-internet-world.aspx (accessed mar. 4, 2009). see also john horrigan, "a typology of information and communication technology users," 2007, www.pewinternet.org/~/media//files/reports/2007/pip_ict_typology.pdf.pdf (accessed feb. 12, 2009).
2. lawrence lessig, "early creative commons history, my version," video blog post, 2008, http://lessig.org/blog/2008/08/early_creative_commons_history.html (accessed jan. 20, 2009). see the relevant passage from 20:53 through 21:50.
3. john horrigan, "broadband: what's all the fuss about?" 2007, p. 1, http://www.pewinternet.org/~/media/files/reports/2007/broadband%20fuss.pdf.pdf (accessed feb. 12, 2009).
4. "job-seeking in us public libraries," public library funding & technology access study, 2009, http://www.ala.org/ala/research/initiatives/plftas/issuesbriefs/brief_jobs_july.pdf (accessed mar. 27, 2009).
5. ibid.
6. ibid.
7. sharon e. gillett, william h. lehr, and carlos osorio, "local government broadband initiatives," telecommunications policy 28 (2004): 539.
8. john horrigan, "home broadband adoption 2008," 10, http://www.pewinternet.org/~/media//files/reports/2008/pip_broadband_2008.pdf (accessed feb. 12, 2009).
9. anthony g. wilhelm, "leveraging sunken investments in communications infrastructure: a policy perspective from the united states," the information society 19 (2003): 279–86.
10. horrigan, "home broadband adoption," 12.
11. gillett, lehr, and osorio, "local government broadband initiatives," 538.
12. ibid., 537–58.
13. sharon strover, gary chapman, and jody waters, "beyond community networking and ctcs: access, development, and public policy," telecommunications policy 28, no. 7/8 (2004): 465–85.
14. ibid., 483.
15. ibid.
16. wilhelm, "leveraging sunken investments in communications infrastructure," 282.
17. see, for example, the virginia broadband round table (http://www.otpba.vi.virginia.gov/broadband_roundtable.shtml), the ohio broadband council (http://www.ohiobroadbandcouncil.org/), and the california broadband task force (http://gov.ca.gov/speech/4596); see www.fcc.gov/recovery/broadband/ for information on broadband initiatives in the american recovery and reinvestment act.
18. federal communication commission, national broadband plan: connecting america, http://www.broadband.gov/ (accessed apr. 11, 2010).
19. ibid.
20. horrigan, "broadband: what's all the fuss about?" 2.
21. ricardo ramirez, "appreciating the contribution of broadband ict with rural and remote communities: stepping stones toward an alternative paradigm," the information society 23 (2007): 86.
22. ibid., 92.
23. denise m. davis, john carlo bertot, and charles r. mcclure, "libraries connect communities: public library funding & technology access study 2007–2008," 35, http://www.ala.org/ala/aboutala/offices/ors/plftas/0708/librariesconnectcommunities.pdf (accessed jan. 24, 2009).
24. john carlo bertot et al., "public libraries and the internet 2008: study results and findings," 11, http://www.ii.fsu.edu/projectfiles/plinternet/2008/everything.pdf (accessed jan. 24, 2009). these numbers represent an increase from the previous year's study, which suggests that libraries, while trying to meet demand, are not able to keep up.
25. ibid.
26. sharon strover, "the first mile," the information society 16, no. 2 (2000): 151–54.
27. ibid., 151.
28. clay shirky, "here comes everybody: the power of organizing without organizations," berkman center for internet & society (2008), video presentation, available at http://cyber.law.harvard.edu/interactive/events/2008/02/shirky (retrieved march 1, 2009).
29. john carlo bertot, charles r. mcclure, and paul t. jaeger, "the impacts of free public internet access on public library patrons and communities," library quarterly 78, no. 3 (2008): 286, http://www.journals.uchicago.edu.proxy.ulib.iupui.edu/doi/pdf/10.1086/588445 (accessed jan. 30, 2009).

can bibliographic data be put directly onto the semantic web?

martha m. yee

martha m. yee (myee@ucla.edu) is cataloging supervisor at the university of california, los angeles film and television archive.

this paper is a think piece about the possible future of bibliographic control; it provides a brief introduction to the semantic web and defines related terms, and it discusses granularity and structure issues and the lack of standards for the efficient display and indexing of bibliographic data.
it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of more frbrized cataloging rules than those about to be introduced to the library community (resource description and access) and in creating an rdf data model for the rules. i am now in the process of trying to model my cataloging rules in the form of an rdf model, which can also be inspected at http://myee.bol.ucla.edu/. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is sophisticated enough yet to deal with our data. this article is an attempt to identify some of those areas and explore whether or not the problems i have encountered are soluble—in other words, whether or not our data might be able to live on the semantic web. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

this paper is a think piece about the possible future of bibliographic control; as such, it raises more complex questions than it answers. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of frbrized descriptive and subject-cataloging rules. here my focus will be on the data model rather than on the frbrized cataloging rules for gathering data to put in the model, although i hope to have more to say about the latter in the future. the intent is not to present you with conclusions but to present some questions about data modeling that have arisen in the course of the experiment. my premise is that decisions about the data model we follow in the future should be made openly and as a community rather than in a small, closed group of insiders. if we are to move toward the creation of metadata that is more interoperable with metadata being created outside our community, as is called for by many in our profession, we will need to address these complex questions as a community following a period of deep thinking, clever experimentation, and astute political strategizing.

■ the vision

the semantic web is still a bewitching midsummer night's dream. it is the idea that we might be able to replace the existing html-based web consisting of marked-up documents—or pages—with a new rdf-based web consisting of data encoded as classes, class properties, and class relationships (semantic linkages), allowing the web to become a huge shared database. some call this web 3.0, with hyperdata replacing hypertext. embracing the semantic web might allow us to do a better job of integrating our content and services with the wider internet, thereby satisfying the desire for greater data interoperability that seems to be widespread in our field. it also might free our data from the proprietary prisons in which it is currently held and allow us to cooperate in developing open-source software to index and display the data in much better ways than we have managed to achieve so far in vendor-developed ils opacs or in giant, bureaucratic bibliographic empires such as oclc worldcat. the semantic web also holds the promise of allowing us to make our work more efficient. in this bewitching vision, we would share in the creation of uniform resource identifiers (uris) for works, expressions, manifestations, persons, corporate bodies, places, subjects, and so on.
at the uri would be found all of the data about that entity, including the preferred name and the variant names, but also including much more data about the entity than we currently put into our work (name-title and title), personal name, corporate name, geographic, and subject authority records. if any of that data needed to be changed, it would be changed only once, and the change would be immediately accessible to all users, libraries, and library staff by means of links down to local data such as circulation, acquisitions, and binding data. each work would need to be described only once at one uri, each expression would need to be described only once at one uri, and so forth. very much up in the air is the question of what institutional structures would support the sharing of the creation of uris for entities on the semantic web. for the data to be reliable, we would need to have a way to ensure that the system would be under the control of people who had been educated about the value of clean and accurate entity definition, the value of choosing "most commonly known" preferred forms (for display in lists of multiple different entities), and the value of providing access under all variant forms likely to be sought. at the same time, we would need a mechanism to ensure that any interested members of the public could contribute to the effort of gathering variants or correcting entity definitions when we have had inadequate information. for example, it would be very valuable to have the input of a textual or descriptive bibliographer applied to difficult questions concerning particular editions, issues, and states of a significant literary work. it would also be very valuable to be able to solicit input from a subject expert in determining the bounds of a concept entity (subject heading) or class entity (classification).

■ the experiment (my project)

to explore these bewitching ideas, i have been conducting an experiment. as part of my experiment, i designed a set of cataloging rules that are more frbrized than is rda in the sense that they more clearly differentiate between data applying to expression and data applying to manifestation. note that there is an underlying assumption in both frbr (which defines expression quite differently from manifestation) and on my part, namely that catalogers always know whether a given piece of data applies at either the expression or the manifestation level. that assumption is open to questioning in the process of the experiment as well. my rules also call for creating a more hierarchical and degressive relationship between the frbr entities work, expression, manifestation, and item, such that data pertaining to the work does not need to be repeated for every expression, data pertaining to the expression does not need to be repeated for every manifestation, and so forth. degressive is an old term used by bibliographers for bibliographies that provide great detail about first editions and less detail for editions after the first. i have adapted this term to characterize my rules, according to which the cataloger begins by describing the work; any details that pertain to all expressions and manifestations of the work are not repeated in the expression and manifestation descriptions.
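to make the degressive idea concrete, here is a small sketch of how such a description might look as rdf triples, written with python's rdflib. the namespace, class names, property names, and the moby dick example are all invented for illustration (this is not yee's actual model, which is at http://myee.bol.ucla.edu): the point is only that work-level facts are stated once on the work node and reached from linked expressions and manifestations, rather than repeated in each description.

```python
from rdflib import Graph, Literal, Namespace, RDF

# hypothetical vocabulary; not the model under discussion
EX = Namespace("http://example.org/bib/")

g = Graph()
g.bind("ex", EX)

work = EX["work/moby-dick"]
expr = EX["expression/moby-dick-french"]
manif = EX["manifestation/moby-dick-french-1941"]

# work-level facts, stated once and never repeated below
g.add((work, RDF.type, EX.Work))
g.add((work, EX.title, Literal("moby dick")))
g.add((work, EX.originalLanguage, Literal("english")))

# the expression records only what changed in content: the translation
g.add((expr, RDF.type, EX.Expression))
g.add((expr, EX.expressionOf, work))
g.add((expr, EX.language, Literal("french")))

# the manifestation records only carrier-level facts, e.g. publication date
g.add((manif, RDF.type, EX.Manifestation))
g.add((manif, EX.manifestationOf, expr))
g.add((manif, EX.publicationDate, Literal("1941")))

print(g.serialize(format="turtle"))
```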
this paper would be entirely too long if i spent any more time describing the rules i am developing, which can be inspected at http://myee.bol.ucla.edu. here, i would like to focus on the data-modeling process and the questions about the suitability of rdf and the semantic web for encoding our data. (by the way, i don't seriously expect anyone to adopt my rules! they are radically different than the rules currently being applied and would represent a revolution in cataloging practice that we may not be up to undertaking in the current economic climate. their value lies in their thought-experiment aspect and their ability to clarify what entities we can model and what entities we may not be able to model.) i am now in the process of trying to model my cataloging rules in the form of an rdf model ("rdf" as used in this paper should be considered from now on to encompass rdf schema [rdfs], web ontology language [owl], and simple knowledge organization system [skos] unless otherwise stated); this model can also be inspected at http://myee.bol.ucla.edu. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is yet sophisticated enough to deal with our data. this article is an attempt to outline some of those areas and explore whether the problems i have encountered are soluble, in other words, whether or not our data might be able to live on the semantic web eventually. i have already heard from rdf experts bruce d'arcus (miami university) and rob styles (developer of talis, a semantic web technology company), whom i cite later, but through this article i hope to reach a larger community. my research questions can be found later, but first some definitions.

■ definition of terms

the semantic web is a way to represent knowledge; it is a knowledge-representation language that provides ways of expressing meaning that are amenable to computation; it is also a means of constructing knowledge-domain maps consisting of class and property axioms with a formal semantics.

rdf is a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats; an rdf metadata model is based on making statements about resources in the form of triples that consist of
1. the subject of the triple (e.g., "new york");
2. the predicate of the triple that links the subject and the object (e.g., "has the postal abbreviation"); and
3. the object of the triple (e.g., "ny").
xml is commonly used to express rdf, but it is not a necessity; it can also be expressed in notation 3 or n3, for example.1

rdfs is an extensible knowledge-representation language that provides basic elements for the description of ontologies, also known as rdf vocabularies. using rdfs, statements are made about resources in the form of
1. a class (or entity) as subject of the rdf triple (e.g., "new york");
2. a relationship (or semantic linkage) as predicate of the rdf triple that links the subject and the object (e.g., "has the postal abbreviation"); and
3. a property (or attribute) as object of the rdf triple (e.g., "ny").

owl is a family of knowledge representation languages for authoring ontologies compatible with rdf.

skos is a family of formal languages built upon rdf and designed for representation of thesauri, classification schemes, taxonomies, or subject-heading systems.
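the "new york" triple from the definitions above can be written out concretely. in this minimal sketch the subject uri and the property uri are invented for illustration; the two print statements show the same single statement in two of the serializations just mentioned, rdf/xml and notation 3.

```python
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/terms/")              # hypothetical vocabulary

g = Graph()
subject = URIRef("http://example.org/place/new-york")    # the subject: new york
predicate = EX.postalAbbreviation                        # the predicate
obj = Literal("ny")                                      # the object

g.add((subject, predicate, obj))

print(g.serialize(format="xml"))   # rdf/xml expression of the triple
print(g.serialize(format="n3"))    # notation 3 expression of the same triple
```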
remember that the fundamental definition of the semantic web is “a way to represent knowledge.” the semantic web is a direct descendant of the attempt to create artificial intelligence, that is, of the attempt to encode enough knowledge of the real world to allow a computer to reason about reality in a way indistinguishable from the way a human being reasons. one of the research questions should probably be whether or not the technology developed to support the semantic web can be used to represent information rather than knowledge. fortunately, we do not need to represent all of human knowledge—we simply need to describe and index resources to facilitate their retrieval. we need to encode facts about the resources and what the resources discuss (what they are “about”), not facts about “reality.” based on our past experience, doing even this is not as simple as people think it is. the question is whether we could do what we need to do within the context of the semantic web. sometimes things that sound simple do not turn out to be so simple in the doing. my research questions are as follows: 1. is it possible for catalogers to tell in all cases whether a piece of data pertains to the frbr expression or the frbr manifestation? 2. is it possible to fit our data into rdf? given that rdf was designed to encode knowledge rather than information, perhaps it is the wrong technology to use for our purposes? 3. if it is possible to fit our data into rdf, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (i.e., providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)? as stated previously, i am not yet ready to answer these questions. i hope to find answers in the course of developing the rules and the model. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work. n other relevant projects other relevant projects include the following: 1. frbr, functional requirements for authority data (frad), funtional requirements for subject authority records (frsar), and frbr-objectoriented (frbroo). all are attempts to create conceptual models of bibliographic entities using an entity-relationship model that is very similar to the class-property model used by rdf.2 2. various initiatives at the library of congress (lc), such as lc subject headings (lcsh) in skos,3 the lc name authority file in skos,4 the lccn permalink project to create persistent uris for bibliographic records,5 and initiatives to provide skos representations for vocabularies and data elements used in marc, premis, and mets. these all represent attempts to convert our existing bibliographic data into uris that stand for the bibliographic entities represented by bibliographic records and authority records; the uris would then be available for experiments in putting our data directly onto the semantic web. 3. the dc-rda task group project to put rda data elements into rdf.6 as noted previously and discussed further later, rda is less frbrized than my cataloging rules, but otherwise this project is very similar to mine. 4. 
4. dublin core's (dc's) work on an rdf schema.7 dublin core is very focused on manifestation and does not deal with expressions and works, so it is less similar to my project than is the dc-rda task group's project (see further discussion later).

why my project?

one might legitimately ask why there is a need for a different model than the ones already provided by frbr, frad, frsar, frbroo, rda, and dc. the frbr and rda models are still tied to the model that is implicit in our current bibliographic data, in which expression and manifestation are undifferentiated. this is because publishers publish and libraries acquire and shelve manifestations. in our current bibliographic practice, a new bibliographic record is made for either a new manifestation or a new expression. thus, in effect, there is no way for a computer to tell one from the other in our current data. despite the fact that frbr has good definitions of expression (change in content) and manifestation (mere change in carrier), it perpetuates the existing implicit model in its mapping of attributes to entities. for example, frbr maps the following to manifestation: edition statements ("2nd rev. ed."); statements of responsibility that identify translators, editors, and illustrators; physical description statements that identify illustrated editions; extent statements that differentiate expressions (the 102-minute version vs. the 89-minute version); etc. thus the frbr definition of expression recognizes that a 2nd revised edition is a new expression, but frbr maps the edition statement to manifestation. in my model, i have tried to differentiate more cleanly data applying to expressions from data applying to manifestations.8

frbr and rda tend to assume that our current bibliographic data elements map to one and only one group 1 entity or class. there are exceptions, such as title, which frbr and rda define at work, expression, and manifestation levels. however, there is a lack of recognition that, to create an accurate model of the bibliographic universe, more data elements need to be applied at the work and expression level in addition to (or even instead of) the manifestation level. in the appendix i have tried to contrast the frbr, frad, and rda models with mine. in my model, many more data elements (properties and attributes) are linked to the work and expression level. after all, if the expression entity is defined as any change in work content, the work entity needs to be associated with all content elements that might change, such as the original extent of the work, the original statement of responsibility, whether illustrations were originally present, whether color was originally present in a visual work, whether sound was originally present in an audiovisual work, the original aspect ratio of a moving image work, and so on.

frbr also tends to assume that our current data elements map to one and only one entity. in working on my model, i have come to the conclusion that this is not necessarily true. in some cases, a data element pertaining to a manifestation also pertains to the expression and the work. in other cases, the same data element is specific to its manifestation, and, in still other cases, the same data element is specific to its expression. this is true of most of the elements of the bibliographic description.
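as an illustration of that last point, here is a minimal rdflib sketch (with an invented ex: vocabulary) showing the same kind of data recorded at more than one frbr level; nothing in rdf itself forces a data element onto the manifestation alone:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()

work = EX.TomSawyerWork
expr = EX.TomSawyer2ndRevEd        # a revised text, i.e., a new expression
manif = EX.TomSawyer2ndRevEd1920   # one particular publication of that expression

# frbr maps the edition statement to manifestation only; nothing in
# rdf prevents recording the change of content at the expression too
g.add((expr, EX.editionStatement, Literal("2nd rev. ed.")))
g.add((manif, EX.embodiesExpression, expr))
g.add((expr, EX.isExpressionOf, work))

# the original extent belongs with the work, since any later change
# in extent signals a new expression
g.add((work, EX.originalExtent, Literal("304 p.")))
```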
frad, in attempting to deal with the fact that our current cataloging rules allow a single person to have several bibliographic identities (or pseudonyms), treats person, name, and controlled access point as three separate entities or classes. i have tried to keep my model simpler and more elegant by treating only person as an entity, with preferred name and variant name as attributes or properties of that entity.

frbroo is focused on the creation process for works, with special attention to the creation of unique works of art and other one-off items found in museums. thus frbroo tends to neglect the collocation of the various expressions that develop in the history of a work that is reproduced and published, such as translations, abridged editions, editions with commentary, etc.

dc has concentrated exclusively on the description of manifestations and has neglected expression and work altogether.

one of the tenets of semantic web development is that, once an entity is defined by a community, other communities can reuse that entity without defining it themselves. the very different definitions of the work and expression entities in the different communities described above raise some serious questions about the viability of this tenet.

assumptions

it should be noted that this entire experiment is based on two assumptions about the future of human intervention for information organization. these two assumptions are based on the even bigger assumption that, even though the internet seems to be an economy based on free intellectual labor, and even though human intervention for information organization is expensive (and therefore at more risk than ever), human intervention for information organization is worth the expense.

assumption 1: what we need is not artificial intelligence but a better human-machine partnership, such that humans can do all of the intellectual labor and machines can do all of the repetitive clerical labor. currently, catalogers spend too much time on the latter because of the poor design of current systems for inputting data. the universal employment provided by paying humans to do the intellectual labor of building the semantic web might be just the stimulus our economy needs.

assumption 2: those who need structured and granular data, and the precise retrieval that results from it, to carry out research and scholarship may constitute an elite minority rather than most of the people of the world (sadly), but that talented and intelligent minority is an important one for the cultural and technological advancement of humanity. it is even possible that, if we did a better job of providing access to such data, we might enable the enlargement of that minority.

granularity and structure issues

as soon as one starts to create a data model, one encounters granularity, or cataloger-data parsing, issues. these issues have actually been with us all along as we developed the data model implicit in aacr2r and marc 21. those familiar with rda, frbr, and frad development will recognize that much of that development is directed at increasing structure and granularity in cataloger-produced data to prepare for moving it onto the semantic web. however, there are clear trade-offs in an increase in structure and granularity.
more structure and more granularity make possible more powerful indexing and more sophisticated display, but they are also more complex and expensive to apply and less likely to be implemented in a standard fashion across all communities; that is, it is less likely that interoperable data would be produced. any switching or mapping that was employed to create interoperable data would produce the lowest common denominator (the simplest and least granular data), and once rendered interoperable, it would not be possible for that data to swim back upstream to regain its lost granularity. data with less structure and less granularity could be easier and cheaper to apply and might have the potential to be adopted in a more standard fashion across all communities, but that data would limit the degree to which powerful indexing and sophisticated display would be possible.

take the example of a personal name: currently, we demarcate surname from forename by putting the surname first, followed by a comma and then the forename. even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is surname and which part is forename in a culture unfamiliar to the cataloger. in other words, the more granularity you desire in your data, the more often the people collecting the data are going to encounter ambiguous situations. another example: currently, we do not collect information about gender self-identification; if we were to increase the granularity of our data to gather that information, we would surely encounter situations in which the cataloger would not necessarily know whether a given creator was self-defined as female or male or of some other gender identity. presently, if we are adding a birth and death date, whatever dates we use are all together in a $d subfield without any separate coding to indicate which date is the birth date and which is the death date (although an occasional "b." or "d." will tell us this kind of information). we could certainly provide more granularity for dates, but that would make the marc 21 format much more complex and difficult to learn. people who dislike the marc 21 format already argue that it is too granular and therefore requires too much of a learning curve before people can use it. for example, tennant claims that "there are only two kinds of people who believe themselves able to read a marc record without referring to a stack of manuals: a handful of our top catalogers and those on serious drugs."9 how much of the granularity already in marc 21 is actually used, either in existing records or, even if present, in indexing and display software? granularity costs money, and libraries and archives are already starving for resources. granularity can only be provided by people, and people are expensive.

granularity and structure also exist in tension with each other. more granularity can lead to less structure (or to more complexity to retain structure along with granularity). in the pursuit of more granularity of data than we have now, rda, attempting to support rdf-compliant xml encoding, has been atomizing data to make it useful to computers, but this will not necessarily make the data more useful to humans. to be useful to humans, it must be possible to group and arrange (sort) the data meaningfully, both for indexing and for display.
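a small sketch of the dates example in rdflib (the properties are invented) shows the trade-off: the granular form supports better machine sorting and filtering, but it demands knowledge the cataloger may not always have:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()
twain = EX.ClemensSamuel

# current practice: one undifferentiated string, as in a marc $d subfield
g.add((twain, EX.dates, Literal("1835-1910")))

# more granular: birth and death separately coded; a machine can now
# sort or filter on either date, but the cataloger must be certain
# which date is which before the data can be recorded at all
g.add((twain, EX.birthDate, Literal("1835")))
g.add((twain, EX.deathDate, Literal("1910")))
```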
the developers of skos refer to the "vast amounts of unstructured (i.e., human readable) information in the web,"10 yet labeling bits of data as to type and recording semantic relationships in a machine-actionable way do not necessarily provide the kind of structure necessary to make data readable by humans and therefore useful to the people the web is ultimately supposed to serve. consider the case of music instrumentation. if you have a piece of music for five guitars and one flute, and you simply code number and instrumentation without any way to link "five" with "guitars" and "one" with "flute," you will not be able to guarantee that a person looking for music for five flutes and one guitar will not be given this piece of music in their results (see figure 1).11 the more granular the data, the less the cataloger can build order, sequencing, and linking into the data; the coding must be carefully designed to allow the desired order, sequencing, and linking for indexing and display to be possible, which might call for even more complex coding. it would be easy to lose information about order, sequencing, and linking inadvertently.

actually, there are several different meanings for the term structure:

1. structure as an object of a record (structure of document?); for example, elings and waibel refer to "data fields . . . also referred to as elements . . . which are organized into a record by a data structure."12
2. structure as the communications layer, as opposed to the display layer or content designation.13
3. structure as the record, field, and subfield.
4. structure as the linking of bits of data together in the form of various types of relationships.
5. structure as the display of data in a structured, ordered, and sequenced manner to facilitate human understanding.
6. data structure as a way of storing data in a computer so that it can be used efficiently (this is how computer programmers use the term).

i hasten to add that i am definitely in favor of adding more structure and granularity to our data when it is necessary to carry out the fundamental objectives of our profession and of our catalogs. i argued earlier that frbr and rda are not granular enough when it comes to the distinction between data elements that apply to expression and those that apply to manifestation. if we could just agree on how to differentiate data applying to the manifestation from data applying to the expression, instead of our current practice of identifying works with headings and lumping all manifestation and expression data together, we could increase the level of service we are able to provide to users a thousandfold.

figure 1a. extract from yee rdf model that illustrates one technique for modeling musical instrumentation at the expression level (using a blank node to group repeated number and instrument type)

figure 1b. example of encoding of musical instrumentation at the expression level based on the above model: "5" with "guitars" and "1" with "flute," each pair grouped under instrumentation of musical expression (original instrumentation of musical expression—number of a particular instrument; original instrumentation of musical expression—type of instrument)
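a minimal sketch in rdflib of the blank-node grouping technique that figure 1 illustrates (the ex: vocabulary is invented); each anonymous node binds one number to one instrument type, so five guitars cannot be mistaken for five flutes:

```python
from rdflib import BNode, Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()
expr = EX.SomeMusicalExpression

for number, instrument in [("5", "guitars"), ("1", "flute")]:
    group = BNode()  # anonymous node grouping one number with one instrument type
    g.add((expr, EX.instrumentation, group))
    g.add((group, EX.numberOfInstruments, Literal(number)))
    g.add((group, EX.typeOfInstrument, Literal(instrument)))

# a query for five flutes must now match number and type on the *same*
# blank node, so this expression is correctly excluded from the results
```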
however, if we are not going to commit to differentiating between expression and manifestation, it would be more intellectually honest for frbr and rda to take the less granular path of mapping all existing bibliographic data to manifestation and expression undifferentiated, that is, to use our current data model unchanged and state this openly. i am not in favor of adding granularity for granularity's sake or for the sake of vague conceptions of possible future use. granularity is expensive and should be used only in support of clear and fundamental objectives.

the goal: efficient displays and indexes

my main concern is that we model and then structure the data in a way that allows us to build the complex displays that are necessary to make catalogs appear simple to use. i am aware that the current orthodoxy is that recording data should be kept completely separate from indexing and display ("the applications layer"). because i have spent my career in a field in which catalog records are indexed and displayed badly by systems people who don't seem to understand the data contained in them, i am a skeptic. it is definitely possible to model and structure data in such a way that desired displays and indexes are impossible to construct. i have seen it happen!

the lc working group report states that "it will be recognized that human users and their needs for display and discovery do not represent the only use of bibliographic metadata; instead, to an increasing degree, machine applications are their primary users."14 my fear is that the underlying assumption here is that users need to (and can) retrieve the single perfect record. this will never be true for bibliographic metadata. users will always need to assemble all relevant records (of all kinds) as precisely as possible and then browse through them before making a decision about which resources to obtain. this is as true in the semantic web, where "records" can be conceived of as entity or class uris, as it is in the world of marc-encoded metadata.

some of the problems that have arisen in the past in trying to index bibliographic metadata for humans are connected to the fact that existing systems do not group all of the data related to a particular entity effectively, such that a user could use any variant name or any combination of variant names for an entity and do a successful search. currently, you can only look for a match among two or more keywords within the bounds of a single manifestation-based bibliographic record or within the bounds of a single heading, minus any variant terms for that entity. thus, when you do a keyword search for two keywords, for example, "clemens" and "adventures," you will retrieve only those manifestations of mark twain's adventures of tom sawyer that have his real name (clemens) and the title word "adventures" co-occurring within the bounded space created by a single manifestation-based bibliographic record. instead, the preferred forms and the variant forms for a given entity need to be bounded for indexing such that the keywords the user employs to search for that entity can be matched using co-occurrence rules that look for matches within a single bounded space representing the entity desired. we will return to this problem in the discussion of issue 3 in the later section "rdf problems encountered."
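a sketch of what such a bounded co-occurrence search might look like in sparql, run here through rdflib, assuming (hypothetically) that every variant creator name and variant title has been linked to a single work uri:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()
work = EX.AdventuresOfTomSawyer

# variant creator names and variant titles, all bounded by one work uri
g.add((work, EX.creatorName, Literal("clemens, samuel")))
g.add((work, EX.creatorName, Literal("twain, mark")))
g.add((work, SKOS.altLabel, Literal("adventures of tom sawyer")))
g.add((work, SKOS.altLabel, Literal("tom sawyer")))

# both keywords must co-occur within the bounded space of a single work
q = """
PREFIX ex: <http://example.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?work WHERE {
  ?work ex:creatorName ?name ;
        skos:altLabel ?title .
  FILTER(CONTAINS(?name, "clemens") && CONTAINS(?title, "adventures"))
}
"""
for row in g.query(q):
    print(row.work)  # the work uri, from which all manifestations hang
```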
the most complex indexing problem has always proven to be the grouping or bounding of data related to a work, since it requires pulling in all variants for the creator(s) of that work as well. otherwise, a user who searches for a work using a variant of the author's name and a variant of the title will continue to fail (as they do in all current opacs), even when the desired work exists in the catalog. if we could create a uri for the adventures of tom sawyer that included all variant names for the author and all variant titles for the work (including the variant title tom sawyer), the same keyword search described above ("clemens" and "adventures") could be made to retrieve all manifestations and expressions of the adventures of tom sawyer, instead of the few isolated manifestations that it would retrieve in current catalogs.

we need to make sure that we design and structure the data such that the following displays are possible:

- display all works by this author in alphabetical order by title, with the sorting element (title) appearing at the top of each work displayed.
- display all works on this subject in alphabetical order by principal author and title (with principal author and title appearing at the top of each work displayed), or by title if there is no principal author (with title appearing at the top of each work displayed).

we must also ensure that we design and structure the data in such a way that our structure allows us to create subgroups of related data, such as instrumentation for a piece of music (consisting of a number associated with each particular instrument), place and related publisher for a certain span of dates on a serial title change record, and the like.

which standards will carry out which functions?

currently, we have a number of different standards to carry out a number of different functions; we can speculate about how those functions might be allocated in a new semantic web–based dispensation, as shown in table 1.

table 1. possible reallocation of current functions in a new semantic web–based dispensation

function: data content, or content guidelines (rules for providing data in a particular element)
  current: defined by aacr2r and marc 21
  future: defined by rda and rdf/rdfs/owl/skos

function: data elements
  current: defined by isbd–based aacr2r and marc 21
  future: defined by rda and rdf/rdfs/owl/skos

function: data values
  current: defined by lc/naco authority file, lcsh, marc 21 coded data values, etc.
  future: defined as ontologies using rdf/rdfs/owl/skos

function: encoding or labeling of data elements for machine manipulation (same as data format?)
  current: defined by iso 2709–based marc 21
  future: defined by rdf/rdfs/xml

function: data structure (i.e., what a record stands for)
  current: defined by aacr2r and marc 21; also frbr?
  future: defined by rdf/rdfs/owl/skos

function: schematization (constraint on structure and content)
  current: marc 21, mods, dcmi abstract model
  future: defined by rdf/rdfs/owl/skos

function: encoding of facts about entity relationships
  current: carried out by matching data value strings (headings found in lc/naco authority file and lcsh, issn's, and the like)
  future: carried out by rdf/rdfs/owl/skos in the form of uri links

function: display rules
  current: ils software, formerly isbd–based aacr2r
  future: the "application layer" or yee rules

function: indexing rules
  current: ils software
  future: sparql, the "application layer," or yee rules

in table 1, data structure is taken to mean what a record represents or stands for; traditionally, a record has represented an expression (in the days of hand-press books) or a manifestation (ever since reproduction mechanisms have become more sophisticated, allowing an explosion of reproductions of the same content in different formats coming from different distributors). rda is record-neutral; rdf would allow uris to be established for any and all of the frbr levels; that is, there would be a uri for a particular work, a uri for a particular expression, a uri for a particular manifestation, and a uri for a particular item. note that i am not using data structure in the sense that a computer programmer does (as a way of storing data in a computer so that it can be used efficiently).

currently, the encoding of facts about entity relationships (see table 1) is carried out by matching data-value character strings (headings or linking fields using issns and the like) that are defined by the lc/naco authority file (following aacr2r rules), lcsh (following rules in the subject cataloging manual), etc. in the future, this function might be carried out by using rdf to link the uri for a resource to the uri for a data value.

display rules (see table 1) are currently defined by isbd and aacr2r but widely ignored by systems, which frequently truncate bibliographic records arbitrarily in displays, supply labels, and the like; rda abdicates responsibility, pushing display out of the cataloging rules. the general principle on the web is to divorce data from display and allow anyone to display the data any way they want.
display is the heart of the objects (or goals) of cataloging: the point is to display to the user the works of an author, the editions of a work, or the works on a subject. all of these goals can be met only if complex, high-quality displays can be built from the data created according to the data model.

indexing rules (see table 1) were once under the control of catalogers (in book and card catalogs) in that users had to navigate through headings and cross-references to find what they wanted; currently, indexing is in the hands of system designers who prefer to provide keyword indexing of bibliographic (i.e., manifestation-based) records rather than provide users with access to the entities they are really interested in (works, authors, and subjects), all represented currently by authority records for headings and cross-references. rda abdicates responsibility, pushing indexing concerns completely out of the cataloging rules. the general principle on the web is to allow resources to be indexed by any web search engines that wish to index them. current web data is not structured at all for either indexing or display.

i would argue that our interest in the semantic web should be focused on whether or not it will support more data structure, as well as more logic in that data structure, to support better indexes and better displays than we have now in manifestation-based ils opacs. crucial to better indexing than we have ever had before are the co-occurrence rules for keyword indexing, that is, the rules for when a co-occurrence of two or more keywords should produce a match. we need to be able to do a keyword search across all possible variant names for the entity of interest, and the entity of interest for the average catalog user is much more likely to be a particular work than a particular manifestation. unfortunately, catalog-use studies have only studied so-called known-item searches without investigating whether a known-item searcher was looking for a particular edition or manifestation of a work or was simply looking for a particular work in order to make a choice as to edition or manifestation once the work was found.
however, common sense tells us that it is a rare user who approaches the catalog with prior knowledge about all published editions of a given work. the more common situation is surely one in which a user desires to read a particular shakespeare play or view a particular david lean film and discovers that the desired work exists in more than one expression or manifestation only after searching the catalog. we need to have the keyword(s) in our search for a particular work co-occur within a bounded space that encompasses all possible keywords that might refer to that particular work entity, including both creator and title keywords.

notice in table 1 the unifying effect that rdf could potentially have; it could free us from the use of multiple standards that can easily contradict each other, or at least not live peacefully together. examples are not hard to find in the current environment. one that has cropped up in the course of rda development concerns family names. presently, the rules for naming families differ depending on whether the family is the subject of a work (and established according to lcsh) or whether the family is responsible for a collection of papers (and established according to rda).

types of data

rda has blurred the distinctions among certain types of data, apparently because there is a perception that on the semantic web the same piece of data needs to be coded only once, and all indexing and display needs can be supported from that one piece of data. i question that assumption on the basis of my experience with bibliographic cataloging. all of the following ways of encoding the same piece of data can still have value in certain circumstances:

- transcribed; in rdf terms, a literal (i.e., any data that is not a uri, a constant value). transcribed data is data copied from an item being cataloged. it is valuable for providing access to the form of the name used on a title page and is particularly useful for people who use pseudonyms, corporate bodies that change name, and so on. transcribed data is an important part of the historical record, and not just for off-line materials; it can be a historical record of changing data on notoriously fluid webpages.
- composed; in rdf terms, also a literal. composed data is information composed by a cataloger on the basis of observation of the item in hand; it can be valuable for historical purposes to know which data was composed.
- supplied; in rdf terms, also a literal. supplied data is information supplied by a cataloger from outside sources; it can be valuable for historical purposes to know which data was supplied and from which outside sources it came.
- coded; in rdf, represented by a uri. coded data would likely transform on the semantic web into links to ontologies that could provide normalized, human-readable identification strings on demand, thus causing coded and normalized data to merge into one type of data. is it not possible, though, that the coded form of normalized data might continue to provide for more efficient searching for computers as opposed to humans? coded data also has great cross-cultural value, since it is not as language-dependent as literals or normalized headings.
- normalized headings (controlled headings); in rdf, represented by a uri. normalized or controlled headings are still necessary to provide users with coherent, ordered displays of thousands of entities that all match the user's search for a particular entity (work, author, subject, etc.).
the reason google displays are so hideous is that, so far, the data searched lacks any normalized display data. if variant language forms of the name for an entity are linked to an entity uri, it should be possible to supply headings in the language and script desired by a particular user.

the rdf model

those who have become familiar with frbr over the years will probably not find it too difficult to transition from the frbr conceptual model to the rdf model. what frbr calls an "entity," rdf calls a "subject" and rdfs calls a "class." what frbr calls an "attribute," rdf calls an "object" and rdfs calls a "property." what frbr calls a "relationship," rdf calls a "predicate" and rdfs calls a "relationship" or a "semantic linkage" (see table 2).

table 2. the frbr conceptual model translated into rdf and rdfs

frbr         | rdf       | rdfs
entity       | subject   | class
attribute    | object    | property
relationship | predicate | relationship/semantic linkage

the difficulty in any data-modeling exercise lies in deciding what to treat as an entity or class and what to treat as an attribute or property. the authors of frbr decided to create a class called expression to deal with any change in the content of a work. when frbr is applied to serials, which change content with every issue, the model does not work well. in my model, i found it useful to create a new entity at the manifestation level, the serial title, to deal with the type of change that is more relevant to serials, the change in title. i also created another new entity at the manifestation level, title-manifestation, to deal with a change of title in a nonserial work that is not associated with a change in content. one hundred years ago, this entity would have been called title-edition. i am also in the process of developing an entity at the expression level, the surrogate, to deal with reproductions of original artworks that need to inherit the qualities of the original artwork they reproduce without being treated as an edition of that original artwork, which ipso facto is unique. these are just examples of cases in which it is not that easy to decide on the classes or entities that are necessary to accurately model bibliographic information. see the appendix for a complete comparison of the classes and entities defined in four different models: frbr, frad, rda, and the yee cataloging rules (ycr). the appendix also shows variation among these models concerning whether a given data element is treated as a class/entity or as an attribute/property. the most notable examples are name and preferred access point, which are treated as classes/entities in frad, as attributes in frbr and ycr, and as both in rda.

rdf problems encountered

my goal for this paper is to institute discussion with data modelers about which of the problems i have observed are insoluble and which are soluble:

1. is there an assumption on the part of semantic web developers that a given data element, such as a publisher name, should be expressed either as a literal or using a uri (i.e., controlled), but never both? cataloging is rooted in humanistic practices that require careful recording of evidence. there will always be value in distinguishing and labeling the following types of data:

- copied as is from an artifact (transcribed)
- supplied by a cataloger
- categorized by a cataloger (controlled)

tim berners-lee (the father of the world wide web and the semantic web) emphasizes the importance of recording not just data but also its provenance for the sake of authenticity.15 for many data elements, therefore, it will be important to be able to record both a literal (transcribed or composed form or both) and a uri (controlled form). is this a problem in rdf?
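nothing in the rdf data model itself appears to forbid recording both forms side by side; a minimal sketch with an invented ex: vocabulary, pairing a transcribed literal with a controlled uri for the same publisher:

```python
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()
manif = EX.SomeManifestation

# transcribed form, copied as is from the title page: a literal
g.add((manif, EX.publisherTranscribed, Literal("printed for j. tonson")))

# controlled form: a uri naming the publisher entity
g.add((manif, EX.publisher, URIRef("http://example.org/JacobTonson")))

# provenance of the controlled statement can itself be recorded
g.add((EX.JacobTonson, EX.source, Literal("supplied by cataloger from an outside source")))
```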
as a corollary, if any data that can be given a uri cannot also be represented by a literal (transcribed and composed data, or one or the other), it may not be possible to design coherent, readable displays of the data describing a particular entity. among other things, cataloging is a discursive writing skill. does rdf require that all data be represented only once, either by a literal or by a uri? or is it perhaps possible that data that has a uri could also have a transcribed or composed form as a property? perhaps it will even be possible to store multiple snapshots of online works that change over time to document variant forms of a name for works, persons, and so on.

2. will the internet ever be fast enough to assemble the equivalent of our current records from a collection of hundreds or even thousands of uris? in rdf, links are one-to-one rather than one-to-many. this leads to a great proliferation of reciprocal links. the more granularity there is in the data, the more linking is necessary to ensure that atomized data elements are linked together. potentially, every piece of data describing a particular entity could be represented by a uri leading out to a skos list of data values. the number of links necessary to pull together all of the data just to describe one manifestation could become astronomical, as could the number of one-to-one links necessary to create the appearance of a one-to-many link, such as the link between an author and all the works of an author. is the internet really fast enough to assemble a record from hundreds of uris in a reasonable amount of time? given the often slow network throughput typical of many of our current internet connections, is it really practical to expect all of these pieces to be pulled together efficiently to create a single display for a single user? we may yet feel nostalgia for the single manifestation-based record that already has all of the relevant data in it (no assembly required). bruce d'arcus points out, however, that

i think if you're dealing with rdf, you wouldn't necessarily be gathering these data in real-time. the uris that are the targets for those links are really just global identifiers. how you get the triples is a separate matter. so, for example, in my own personal case, i'm going to put together an rdf store that is populated with data from a variety of sources, but that data population will happen by script, and i'll still be querying a single endpoint, where the rdf is stored in a relational database.16

in other words, d'arcus essentially will put them all in one place, or in one database that "looks" from a uri perspective to be "one place" where they're already gathered.
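a minimal sketch of the approach d'arcus describes: the triples are harvested ahead of time by script into one local store, so no real-time assembly across the network is needed (the file names are invented):

```python
from rdflib import Graph

# populate one local graph by script, ahead of query time
g = Graph()
for source in ["names.ttl", "works.ttl", "holdings.ttl"]:  # hypothetical data dumps
    g.parse(source, format="turtle")

# queries now run against a single endpoint-like store; the uris in
# the data remain global identifiers, but no per-uri network fetch
# is needed to assemble a display
print(len(g), "triples available locally")
```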
3. is rdf capable of dealing with works that are identified using their creators? we need to treat author both as an entity in its own right and as a property of a work, and in many cases the latter is the more important function for user service. lexical labels, or human-readable identifiers, for works that are identified using both the principal author and the title are particularly problematic in rdf, given that the principal author is an entity in its own right. is rdf capable of supporting the indexing necessary to allow a user to search using any variant of the author's name and any variant of the title of a work in combination and still retrieve all expressions and manifestations of that work, given that the author will have a uri of its own, linked by means of a relationship link to the work uri? is rdf capable of supporting the display of a list of one thousand works, each identified by principal author, in order first by principal author, then by title, then by publication date, given that the preferred heading for each principal author would have to be assembled from the uri for that principal author and the preferred title for each work would have to be assembled from the uri for that work? for fear that this will not, in fact, be possible, i have put a human-readable work-identifier data element into my model that consists of principal author and title when appropriate, even though that means the preferred name of the principal author may not be able to be controlled by the entity record for the principal author. any guidance from experienced data modelers in this regard would be appreciated. according to bruce d'arcus, this is purely an interface or application question that does not require a solution at the data layer.17 since we have never had interfaces or applications that would do this correctly, even though the data is readily available in authority records, i am skeptical about this answer! perhaps bruce's suggestion under item 9 of designating a sortname property for each entity is the solution here as well. my human-readable work identifier, consisting of the name of the principal creator and the uniform title of the work, could be designated the sortname property for the work. it would have to be changed whenever the preferred form of the name for the principal creator changed, however.

4. do all possible inverse relationships need to be expressed explicitly, or can they be inferred? my model is already quite large, and i have not yet defined the inverse of every property as i really should to have a correct rdf model. in other words, for every property there needs to be an inverse property; for example, the property iscreatorof needs to have the inverse property iscreatedby; thus "twain" has the property iscreatorof, while "adventures of tom sawyer" has the property iscreatedby. perhaps users and inputters will not actually have to see the huge, complex rdf data model that would result from creating all the inverse relationships, but those who maintain the model will have to deal with a great deal of complexity. however, since i'm not a programmer, i don't know how the complexity of rdf compares to the complexity of existing ils software.
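a sketch suggesting the inverses need not all be hand-entered: the inverse relationship can be declared once with owl:inverseOf and the missing direction materialized by a small loop (an owl reasoner could infer the same triples); the ex: names are invented:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()

# declare the inverse relationship once, at the schema level
g.add((EX.isCreatorOf, OWL.inverseOf, EX.isCreatedBy))

# instance data recorded in one direction only
g.add((EX.Twain, EX.isCreatorOf, EX.AdventuresOfTomSawyer))

# plain rdflib does not reason over owl, so materialize the inverses
# explicitly; an owl reasoner would infer these triples instead
for prop, inverse in list(g.subject_objects(OWL.inverseOf)):
    for s, o in list(g.subject_objects(prop)):
        g.add((o, inverse, s))

assert (EX.AdventuresOfTomSawyer, EX.isCreatedBy, EX.Twain) in g
```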
5. can rdf solve the problems we are having now because of the lack of transitivity or inheritance in the data models that underlie current ilses, or will rdf merely perpetuate these problems? we have problems now with the data models that underlie our current ilses because of the inability of these models to deal with hierarchical inheritance, such that whatever is true of an entity in the hierarchy is also true of every entity below that entity in the hierarchy. one example is that of cross-references to a parent corporate body that should be held to apply to all subdivisions of that corporate body but never are in existing ils systems. there is a cross-reference from "fbi" to "united states. federal bureau of investigation," but not from "fbi counterterrorism division" to "united states. federal bureau of investigation. counterterrorism division." for that reason, a search in any opac name index for "fbi counterterrorism division" will fail. we need systems that recognize that data about a parent corporate body is relevant to all subdivisions of that parent body. we need systems that recognize that data about a work is relevant to all expressions and manifestations of that work. rdf allows you to link a work to an expression and an expression to a manifestation, but i don't believe it allows you to encode the information that everything that is true of the work is true of all of its expressions and manifestations. rob styles seems to confirm this: "rdf doesn't have hierarchy. in computer science terms, it's a graph, not a tree, which means you can connect anything to anything else in any direction."18 of course, not all links should be this kind of transitive or inheritance link. one expression of work a is linked to another expression of work a by links to work a, but whatever is true of one of those expressions is not necessarily true of the other; one may be illustrated, for example, while the other is not. whatever is true of one work is not necessarily true of another work related to it by a related-work link. it should be recognized that bibliographic data is rife with hierarchy. it is one of our major tools for expressing meaning to our users. corporate bodies have corporate subdivisions, and many things that are true for the parent body are also true for its subdivisions. subjects are expressed using main headings and subject subdivisions, and many things that are true for the main heading (such as variant names) are also true for the heading combined with one of its subdivisions. geographic areas are contained within larger geographic areas, and many things that are true of the larger geographic area are also true for smaller regions, counties, cities, etc., contained within that larger geographic area. for all these reasons, i believe that, to do effective displays and indexes for our bibliographic data, it is critical that we be able to distinguish between a hierarchical relationship and a nonhierarchical relationship.
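rdf core is indeed a graph rather than a tree, as styles says, but a hierarchical walk can be layered on top at the application level; a sketch using rdflib's built-in transitive traversal helper, with an invented ex: vocabulary:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()

# a corporate hierarchy recorded as plain triples
g.add((EX.UnitedStates, EX.hasSubdivision, EX.FBI))
g.add((EX.FBI, EX.hasSubdivision, EX.CounterterrorismDivision))

# a cross-reference recorded once, on the parent body
g.add((EX.FBI, EX.variantName, Literal("fbi")))

# walk the hierarchy transitively: everything below united states,
# however deep (the helper includes the starting node itself)
for body in g.transitive_objects(EX.UnitedStates, EX.hasSubdivision):
    print(body)

# an index built from such a walk could apply the parent's variant
# names to every subdivision, so a search for "fbi counterterrorism
# division" could succeed
```

this recovers transitive traversal, though not the selective inheritance yee calls for; deciding which relationships are hierarchical and which are not would still have to be encoded in the vocabulary itself.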
6. to recognize the fact that the subject of a book or a film could be a work, a person, a concept, an object, an event, or a place (all classes in the model), is there any reason we cannot define subject itself as a property (a relationship) rather than as a class in its own right? in my model, all subject properties are defined as having a domain of resource, meaning there is no constraint as to the class to which these subject properties apply. i'm not sure if there will be any fallout from that modeling decision.

7. how do we distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographic location? sometimes a place is a jurisdiction and behaves like a corporate body (e.g., united states is the name of the government of the united states). sometimes a place is a physical location in which something is located (e.g., the birds discussed in a book about the birds of the united states). to distinguish between the two behaviors, i have defined two different classes for place: place as jurisdictional corporate body and place as geographic area. will this cause problems in the model? will there be times when it prevents us from making elegant generalizations in the model about place per se? there is a similar problem with events. some events are corporate bodies (e.g., conferences that publish papers) and some are a kind of subject (e.g., an earthquake). i have defined two different classes for event: conference or other event as corporate body creator, and event as subject.

8. what is the best way to model a bound-with or an issued-with relationship, or a part–whole relationship in which the whole must be located to obtain the part? the bound-with relationship is actually between two items containing two different works, while the issued-with relationship is between two manifestations containing two different works (see figure 2). is this a work-to-work relationship? will designating it a work-to-work relationship cause problems for indicating which specific items or manifestation-items of each work are physically located in the same place? this question may also apply to those part–whole relationships in which the part is physically contained within the whole and both are located in the same place (sometimes known as analytics). one thing to bear in mind is that in all of these cases the relationship between two works does not hold between all instances of each work; it holds only for those particular instances that are contained in the particular manifestation or item that is bound with, issued with, or part of the whole. however, if the relationship is modeled as a work-1-manifestation to work-2-manifestation relationship, or a work-1-item to work-2-item relationship, care must be taken in the design of displays to pull in enough information about the two or more works so as not to confuse the user.

9. how do we express the arrangement of elements that have a definite order? i am having trouble imagining how to encode the ordering of data elements that make up a larger element, such as the pieces of a personal name. this is really a desire to control the display of those atomized elements so that they make sense to human beings rather than just to machines. could one define a property such as natural language order of forename, surname, middle name, patronymic, matronymic, and/or clan name of a person, given that the ideal order of these elements might vary from one person to another? could one define properties such as sorting element 1, sorting element 2, sorting element 3, etc., and assign them to the various pieces that will be assembled to make a particular heading for an entity, such as an lcsh heading for a historical period? (depending on the answer to the question in item 11, it may or may not be possible to assign a property to a property in this fashion.) are there standard sorting rules we need to be aware of (in unicode, for example)? are there other rdf techniques available to deal with sorting and arrangement? bruce d'arcus suggests that, instead of coding the name parts, it would be more useful to designate sortname properties;19 might it not be necessary to designate a sortname property for each variant name as well, for cases in which variants need to appear in sorted displays? and wouldn't these sortname properties complicate maintenance over time as preferred and variant names changed?
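a small sketch of the sortname idea d'arcus suggests (the property is invented): each entity carries an explicitly ordered sorting form, so display order need not be reconstructed from atomized name parts:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()

g.add((EX.Twain, EX.sortName, Literal("clemens, samuel, 1835-1910")))
g.add((EX.Dickens, EX.sortName, Literal("dickens, charles, 1812-1870")))

# sorting for display happens in the application, keyed on the single
# sortname property rather than on the pieces of the name
people = sorted(set(g.subjects(EX.sortName, None)),
                key=lambda p: str(g.value(p, EX.sortName)))
```

as yee notes, every such sortname would have to be maintained as preferred forms change, which is the cost of this design.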
10. how do we link related data elements in such a way that effective indexing and displays are possible? some examples: number and kind of instrument (e.g., music written for two oboes and three guitars); multiple publishers, frequencies, subtitles, editors, etc., with date spans for a serial title change (or will it be necessary to create a new manifestation for every single change in subtitle, publisher name, place of publication, etc.?). the assumption seems to be that there will be no repeatable data elements. based on my somewhat limited experience with rdf, it appears that there are record equivalents (every data element, property or relationship, pertaining to a particular entity with a uri), but there are no field or subfield equivalents that allow the sublinking of related pieces of data about an entity. indeed, rob styles goes so far as to argue that ultimately there is no notion of a "record" in rdf.20 it is possible that blank nodes might be able to fill in for fields and subfields in some cases for grouping data, but there are dangers involved in their use.21 to a cataloger, it looks as though the plan is for rdf data to float around loose without any requirement that there be a method for pulling it together into coherent displays designed for human beings.

11. can a property have a property in rdf? as an example of where it might be useful to define a property of a property, robert maxwell suggests that date of publication is really an attribute (property) of the published-by relationship (another property).22 another example: in my model, a variant title for a serial is a property. can that property itself have the property type of variant title to encompass things like spine title, key title, etc.? another example appeared in item 9, in which it was suggested that it might be desirable to assign sort-element properties to the various elements of a name property.

12. how do we document record display decisions? there is no way to record display decisions in rdf itself; it is completely display-neutral. we could not safely commit to a particular rdf-based data model until a significant amount of sample bibliographic data had been created and open-source indexing and display software had been designed and user-tested on that data. it may be that we will need to supplement rdf with some other encoding mechanism that allows us to record display decisions along with the data. current cataloging rules are about display as much as they are about content designation. isbd concerns the order in which the elements should be displayed to humans. the cataloging objectives concern display to users of such entity groups as the works of an author, the editions of a work, and the works on a subject.

13. can all bibliographic data be reduced to either a class or a property with a finite list of values? another way to put this is to ask whether all that catalogers do could be reduced to a set of pull-down menus. cataloging is the art of writing discursive prose as much as it is the ability to select the correct value for a particular data element. we must deal with ambiguous data ("presented by joe blow" could mean that joe created the entire work, produced it, distributed it, sponsored it, or merely funded it). we must sometimes record information without knowing its exact meaning. we must deal with situations that have not been anticipated in advance. it is not possible to list every possible kind of data and every possible value for each type of data up front before any data is gathered. it will always be necessary to provide a plain-text escape hatch. the bibliographic world is a complex, constantly changing world filled with ambiguity.

figure 2. examples of part–whole relationships. how might these best be expressed in rdf?

issued-with relationship: a copy of charlie chaplin's 1917 film the immigrant can be found on a videodisc compilation called charlie chaplin, the early years along with two other chaplin films. this compilation was published and collected by many different libraries and media centers. if a user wants to view this copy of the immigrant, he or she will first have to locate charlie chaplin, the early years, then look for the desired film at the beginning of the first videodisc in the set. the issued-with relationship between the immigrant and the other two films on charlie chaplin, the early years is currently expressed in the bibliographic record by means of a "with" note: first on charlie chaplin, the early years, v. 1 (62 min.) with: the count – easy street.

bound-with relationship: the university of california, los angeles film & television archive has acquired a reel of 16 mm. film from a collector who strung five warner bros. cartoons together on a single reel of film. we can assume that no other archive, library, or media collection will have this particular compilation of cartoons, so the relationship between the five cartoons is purely local in nature. however, any user at the film & television archive who wishes to view one of these cartoons will have to request a viewing appointment for the entire reel and then find the desired cartoon among the other four on the reel. the bound-with relationship among these cartoons is currently expressed in a holdings record by means of a "with" note: fourth on reel with: daffy doodles – tweety pie – i love to singa – along flirtation walk.
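one possible (and only one possible) modeling of the issued-with case in figure 2, sketched in rdflib with an invented ex: vocabulary: the relationship is asserted between the manifestation-level resources rather than between the works, so it does not leak to other instances of the immigrant:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")  # invented vocabulary, for illustration only
g = Graph()

immigrant = EX.TheImmigrantWork
compilation = EX.CharlieChaplinTheEarlyYears  # the videodisc set

# the specific manifestation of the immigrant on the videodisc
disc_copy = EX.TheImmigrantOnEarlyYearsDisc1
g.add((disc_copy, EX.isManifestationOf, immigrant))

# issued-with holds between manifestations, not between works, so
# other manifestations of the immigrant are unaffected
g.add((disc_copy, EX.issuedWith, EX.TheCountOnEarlyYearsDisc1))
g.add((disc_copy, EX.partOf, compilation))
```

as yee cautions, a display built on such data would still need to pull in enough information about both works to avoid confusing the user.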
what are the next steps?

in a sense, this paper is a first crude attempt at locating unmapped territory that has not yet been explored. if we were to decide as a community that it would be valuable to move our shared cataloging activities onto the semantic web, we would have a lot of work ahead of us. if some of the rdf problems described above are insoluble, we may need to work with semantic web developers to create a more sophisticated version of rdf that can handle the transitivity and complex linking required by our data. we will also need to encourage a very complex existing community to evolve institutional structures that would enable a more efficient use of the internet for the sharing of cataloging and other metadata creation. this is not just a technological problem, but also a political one. in the meantime, the experiment continues. let the thinking and learning begin!

references and notes

1. "notation3, or n3 as it is more commonly known, is a shorthand non-xml serialization of resource description framework models, designed with human-readability in mind: n3 is much more compact and readable than xml rdf notation. the format is being developed by tim berners-lee and others from the semantic web community." wikipedia, "notation 3," http://en.wikipedia.org/wiki/notation_3 (accessed feb. 19, 2009).
2. frbr review group, www.ifla.org/vii/s13/wgfrbr/; frbr review group, franar (working group on functional requirements and numbering of authority records), www.ifla.org/vii/d4/wg-franar.htm; frbr review group, frsar (working group, functional requirements for subject authority records), www.ifla.org/vii/s29/wgfrsar.htm; frbroo, frbr review group, working group on frbr/crm dialogue, www.ifla.org/vii/s13/wgfrbr/frbr-crmdialogue_wg.htm.
3. library of congress, response to on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 24, 39, 40, www.loc.gov/bibliographic-future/news/lcwgrptresponse_dm_053008.pdf (accessed mar. 25, 2009).
4. ibid., 39.
5. ibid., 41.
6. dublin core metadata initiative, dcmi/rda task group wiki, http://www.dublincore.org/dcmirdataskgroup/ (accessed mar. 25, 2009).
7. mikael nilsson, andy powell, pete johnston, and ambjorn naeve, expressing dublin core metadata using the resource description framework (rdf), http://dublincore.org/documents/2008/01/14/dc-rdf/ (accessed mar. 25, 2009).
8. see for example table 6.3 in frbr, which maps to manifestation every kind of data that pertains to expression change with the exception of language change. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998): 95, http://www.ifla.org/vii/s13/frbr/frbr.pdf (accessed mar. 4, 2009).
9. roy tennant, "marc must die," library journal 127, no. 17 (oct. 15, 2002): 26.
10. w3c, skos simple knowledge organization system reference, w3c working draft 29 august 2008, http://www.w3.org/tr/skos-reference/ (accessed mar. 25, 2009).
11. the extract in figure 1 is taken from my complete rdf model, which can be found at http://myee.bol.ucla.edu/ycrschemardf.txt.
12. mary w. elings and gunter waibel, "metadata for all: descriptive standards and metadata sharing across libraries, archives and museums," first monday 12, no. 3 (mar. 5, 2007), http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1628/1543 (accessed mar. 25, 2009).
13. oclc, a holdings primer: principles and standards for local holdings records, 2nd ed. (dublin, ohio: oclc, 2008), 4, http://www.oclc.org/us/en/support/documentation/localholdings/primer/holdings%20primer%202008.pdf (accessed mar. 25, 2009).
14. the library of congress working group, on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 30, http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf (accessed mar. 25, 2009).
15. talis, sir tim berners-lee talks with talis about the semantic web: transcript of an interview recorded on 7 february 2008, http://talis-podcasts.s3.amazonaws.com/twt20080207_timbl.html (accessed mar. 25, 2009).
16. bruce d'arcus, e-mail to author, mar. 18, 2008.
17. ibid.
18. rob styles, e-mail to author, mar. 25, 2008.
19. bruce d'arcus, e-mail to author, mar. 18, 2008.
20. rob styles, e-mail to author, mar. 25, 2008.
21. w3c, "section 2.3, structured property values and blank nodes," in rdf primer: w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-primer/#structuredproperties (accessed mar. 25, 2009).
22. robert maxwell, frbr: a guide for the perplexed (chicago: ala, 2008).
appendix. entity/class and attribute/property comparisons

entities/classes in rda, frbr, frad compared to yee cataloging rules (ycr)

rda, frbr, and frad | ycr
group 1: work | work
group 1: expression | expression; surrogate
group 1: manifestation | manifestation; title-manifestation; serial title
group 1: item | item
group 2: person | person; fictitious character; performing animal
group 2: corporate body | corporate body; corporate subdivision; place as jurisdictional corporate body; conference or other event as corporate body creator; jurisdictional corporate subdivision
family (rda and frad only) | (none)
group 3: concept | concept
group 3: object | object
group 3: event | event or historical period as subject
group 3: place | place as geographic area
(none) | discipline
(none) | genre/form
name | (none)
identifier | (none)
controlled access point | (none)
rules (frad only) | (none)
agency (frad only) | (none)

attributes/properties in frbr compared to frad

work
  frbr: title of the work; form of work; date of the work; other distinguishing characteristics; intended termination; intended audience; context for the work; medium of performance (musical work); numeric designation (musical work); key (musical work); coordinates (cartographic work); equinox (cartographic work)
  frad: form of work; date of the work; medium of performance; subject of the work; numeric designation; key; place of origin of the work; original language of the work; history; other distinguishing characteristic

expression
  frbr: title of the expression; form of expression; date of expression; language of expression; other distinguishing characteristics; extensibility of expression; revisability of expression; extent of the expression; summarization of content; context for the expression; critical response to the expression; use restrictions on the expression; sequencing pattern (serial); expected regularity of issue (serial); expected frequency of issue (serial); type of score (musical notation); medium of performance (musical notation or recorded sound); scale (cartographic image/object); projection (cartographic image/object); presentation technique (cartographic image/object); representation of relief (cartographic image/object); geodetic, grid, and vertical measurement (cartographic image/object); recording technique (remote sensing image); special characteristic (remote sensing image); technique (graphic or projected image)
  frad: form of expression; date of expression; language of expression; technique; other distinguishing characteristic

surrogate
  (no attributes listed)
attributes/properties in frbr compared to frad

work
  frbr: title of the work; form of work; date of the work; other distinguishing characteristics; intended termination; intended audience; context for the work; medium of performance (musical work); numeric designation (musical work); key (musical work); coordinates (cartographic work); equinox (cartographic work)
  frad: form of work; date of the work; medium of performance; subject of the work; numeric designation; key; place of origin of the work; original language of the work; history; other distinguishing characteristic

expression
  frbr: title of the expression; form of expression; date of expression; language of expression; other distinguishing characteristics; extensibility of expression; revisability of expression; extent of the expression; summarization of content; context for the expression; critical response to the expression; use restrictions on the expression; sequencing pattern (serial); expected regularity of issue (serial); expected frequency of issue (serial); type of score (musical notation); medium of performance (musical notation or recorded sound); scale (cartographic image/object); projection (cartographic image/object); presentation technique (cartographic image/object); representation of relief (cartographic image/object); geodetic, grid, and vertical measurement (cartographic image/object); recording technique (remote sensing image); special characteristic (remote sensing image); technique (graphic or projected image)
  frad: form of expression; date of expression; language of expression; technique; other distinguishing characteristic

surrogate
  (no frbr or frad attributes listed)

manifestation
  frbr: title of the manifestation; statement of responsibility; edition/issue designation; place of publication/distribution; publisher/distributor; date of publication/distribution; fabricator/manufacturer; series statement; form of carrier; extent of the carrier; physical medium; capture mode; dimensions of the carrier; manifestation identifier; source for acquisition/access authorization; terms of availability; access restrictions on the manifestation; typeface (printed book); type size (printed book); foliation (hand-printed book); collation (hand-printed book); publication status (serial); numbering (serial); playing speed (sound recording); groove width (sound recording); kind of cutting (sound recording); tape configuration (sound recording); kind of sound (sound recording); special reproduction characteristic (sound recording); colour (image); reduction ratio (microform); polarity (microform or visual projection); generation (microform or visual projection); presentation format (visual projection); system requirements (electronic resource); file characteristics (electronic resource); mode of access (remote access electronic resource); access address (remote access electronic resource)
  frad: edition/issue designation; place of publication/distribution; publisher/distributor; date of publication/distribution; form of carrier; numbering

title-manifestation
  (no frbr or frad attributes listed)

serial title
  (no frbr or frad attributes listed)

item
  frbr: item identifier; fingerprint; provenance of the item; marks/inscriptions; exhibition history; condition of the item; treatment history; scheduled treatment; access restrictions on the item
  frad: location of item

person
  frbr: name of person; dates of person; title of person; other designation associated with the person
  frad: dates associated with the person; title of person; other designation associated with the person; gender; place of birth; place of death; country; place of residence; affiliation; address; language of person; field of activity; profession/occupation; biography/history

fictitious character
  (no frbr or frad attributes listed)

performing animal
  (no frbr or frad attributes listed)

corporate body
  frbr: name of the corporate body; number associated with the corporate body; place associated with the corporate body; date associated with the corporate body; other designation associated with the corporate body
  frad: place associated with the corporate body; date associated with the corporate body; other designation associated with the corporate body; type of corporate body; language of the corporate body; address; field of activity; history

corporate subdivision; place as jurisdictional corporate body; conference or other event as corporate body creator; jurisdictional corporate subdivision
  (no frbr or frad attributes listed)

family
  frad: type of family; dates of family; places associated with family; history of family

concept
  frbr: term for the concept
  frad: type of concept

object
  frbr: term for the object
  frad: type of object; date of production; place of production; producer/fabricator; physical medium

event
  frbr: term for the event
  frad: date associated with the event; place associated with the event
place
  frbr: term for the place
  frad: coordinates; other geographical information

discipline
  (no frbr or frad attributes listed)

genre/form
  (no frbr or frad attributes listed)

name
  frad: type of name; scope of usage; dates of usage; language of name; script of name; transliteration scheme of name

identifier
  frad: type of identifier; identifier string; suffix

controlled access point
  frad: type of controlled access point; status of controlled access point; designated usage of controlled access point; undifferentiated access point; language of base access point; script of base access point; script of cataloguing; transliteration scheme of base access point; transliteration scheme of cataloguing; source of controlled access point; base access point; addition

rules
  frad: citation for rules; rules identifier

agency
  frad: name of agency; agency identifier; location of agency

attributes/properties in rda compared to ycr

work
  rda: title of the work; form of work; date of work; place of origin of work; medium of performance; numeric designation; key; signatory to a treaty, etc.; other distinguishing characteristic of the work; original language of the work; history of the work; identifier for the work; nature of the content; coverage of the content; coordinates of cartographic content; equinox; epoch; intended audience; system of organization; dissertation or theses information
  ycr: key identifier for work; language-based identifier (preferred lexical label); variant language-based identifier (alternate lexical label); language-based identifier (preferred lexical label) for work; language-based identifier for work (preferred lexical label) identified by principal creator in combination with uniform title; language-based identifier (preferred lexical label) for work identified by title alone (uniform title); supplied title for work; variant title for work; original language of work; responsibility for work; original publication statement of work; dates associated with work; original publication/release/broadcast date of work; copyright date of work; creation date of work; date of first recording of a work; date of first performance of a work; finding date of naturally occurring object; original publisher/distributor/broadcaster of work; places associated with work; original place of publication/distribution/broadcasting for work; country of origin of work; place of creation of work; place of first recording of work; place of first performance of work; finding place of naturally occurring object; original method of publication/distribution/broadcast of work; serial or integrating work original numeric and/or alphabetic designations—beginning; serial or integrating work original chronological designations—beginning; serial or integrating work original numeric and/or alphabetic designations—ending; serial or integrating work original chronological designations—ending; encoding of content of work; genre/form of content of work; original instrumentation of musical work; instrumentation of musical work—number of a particular instrument; instrumentation of musical work—type of instrument; original voice(s) of musical work; voice(s) of musical work—number of a particular type of voice; voice(s) of musical work—type of voice; original key of musical work; numeric designation of musical work; coordinates of cartographic work; equinox of cartographic work; original physical characteristics of work; original extent of work; original dimensions of work; mode of issuance of work; original aspect ratio of moving image work; original image format of moving image work; original base of work; original materials applied to base of work; work summary; work contents list; custodial history of work; creation of archival collection; censorship history of work; note about relationship(s) to other works
expression
  rda: content type; date of expression; language of expression; other distinguishing characteristic of the expression; identifier for the expression; summarization of the content; place and date of capture; language of the content; form of notation; accessibility content; illustrative content; supplementary content; colour content; sound content; aspect ratio; format of notated music; medium of performance of musical content; duration; performer, narrator, and/or presenter; artistic and/or technical credits; scale; projection of cartographic content; other details of cartographic content; awards
  ycr: key identifier for expression; language-based identifier (preferred lexical label) for expression; variant title for expression; nature of modification of expression; expression title; expression statement of responsibility; edition statement; scale of cartographic expression; projection of cartographic expression; publication statement of expression; place of publication/distribution/release/broadcasting for expression; place of recording for expression; publisher/distributor/releaser/broadcaster for expression; publication/distribution/release/broadcast date for expression; copyright date for expression; date of recording for expression; numeric and/or alphabetic designations for serial expressions; chronological designations for serial expressions; performance date for expression; place of performance for expression; extent of expression; content of expression; language of expression text; language of expression captions; language of expression sound track; language of sung or spoken text of expression; language of expression subtitles; language of expression intertitles; language of summary or abstract of expression; instrumentation of musical expression; instrumentation of musical expression—number of a particular instrument; instrumentation of musical expression—type of instrument; voice(s) of musical expression; voice(s) of musical expression—number of a particular type of voice; voice(s) of musical expression—type of voice; key of musical expression; appendages to the expression; expression series statement; mode of issuance for expression; notes about expression

surrogate
  ycr: [under development]
manifestation
  rda: title; statement of responsibility; edition statement; numbering of serials; production statement; publication statement; distribution statement; manufacture statement; copyright date; series statement; mode of issuance; frequency; identifier for the manifestation; note; media type; carrier type; base material; applied material; mount; production method; generation; layout; book format; font size; polarity; reduction ratio; sound characteristics; projection characteristics of motion picture film; video characteristics; digital file characteristics; equipment and system requirements; terms of availability
  ycr: key identifier for manifestation; publication statement of manifestation; place of publication/distribution/release/broadcast of manifestation; manifestation publisher/distributor/releaser/broadcaster; manifestation date of publication/distribution/release/broadcast; carrier edition statement; carrier piece count; carrier name; carrier broadcast standard; carrier recording type; carrier playing speed; carrier configuration of playback channels; process used to produce carrier; carrier dimensions; carrier base materials; carrier generation; carrier polarity; materials applied to carrier; carrier encoding format; intermediation tool requirements; system requirements; serial manifestation illustration statement; manifestation standard number; manifestation isbn; manifestation issn; manifestation publisher number; manifestation universal product code; notes about manifestation

title-manifestation
  ycr: key identifier for title-manifestation; variant title for title-manifestation; title-manifestation title; title-manifestation statement of responsibilities; title-manifestation edition statement; publication statement of title-manifestation; place of publication/distribution/release/broadcasting of title-manifestation; publisher/distributor/releaser, broadcaster of title-manifestation; date of publication/distribution/release/broadcast of title-manifestation; title-manifestation series; title-manifestation mode of issuance; notes about title-manifestation; title-manifestation standard number

serial title
  ycr: key identifier for serial title; variant title for serial title; title of serial title; serial title statement of responsibility; serial title edition statement; publication statement of serial title; place of publication/distribution/release/broadcast of serial title; publisher/distributor/releaser/broadcaster of serial title; date of publication/distribution/release/broadcast of serial title; serial title beginning numeric and/or alphabetic designations; serial title beginning chronological designations; serial title ending numeric and/or alphabetic designations; serial title ending chronological designations; serial title frequency; serial title mode of issuance; serial title illustration statement; notes about serial title; serial title issn-l

item
  rda: preferred citation; custodial history; immediate source of acquisition; identifier for the item; item-specific carrier characteristics
  ycr: key identifier for item; item barcode; item location; item call number or accession number; item copy number; item provenance; item condition; item marks and inscriptions; item exhibition history; item treatment history; item scheduled treatment; item access restrictions
person
  rda: name of the person; preferred name for the person; variant name for the person; date associated with the person; title of the person; fuller form of name; other designation associated with the person; gender; place of birth; place of death; country associated with the person; place of residence; address of the person; affiliation; language of the person; field of activity of the person; profession or occupation; biographical information; identifier for the person
  ycr: key identifier for person; language-based identifier (preferred lexical label) for person; clan name of person; forename/given name/first name of person; matronymic of person; middle name of person; nickname of person; patronymic of person; surname/family name of person; natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of person; affiliation of person; biography/history of person; date of birth of person; date of death of person; ethnicity of person; field of activity of person; gender of person; language of person; place of birth of person; place of death of person; place of residence of person; political affiliation of person; profession/occupation of person; religion of person; variant name for person

fictitious character
  ycr: [under development]

performing animal
  ycr: [under development]

corporate body
  rda: name of the corporate body; preferred name for the corporate body; variant name for the corporate body; place associated with the corporate body; date associated with the corporate body; associated institution; other designation associated with the corporate body; language of the corporate body; address of the corporate body; field of activity of the corporate body; corporate history; identifier for the corporate body
  ycr: key identifier for corporate body; language-based identifier (preferred lexical label) for corporate body; dates associated with corporate body; field of activity of corporate body; history of corporate body; language of corporate body; place associated with corporate body; type of corporate body; variant name for corporate body

corporate subdivision
  ycr: [under development]

place as jurisdictional corporate body
  ycr: [under development]
conference or other event as corporate body creator
  ycr: [under development]

jurisdictional corporate subdivision
  ycr: [under development]

family
  rda: name of the family; preferred name for the family; variant name for the family; type of family; date associated with the family; place associated with the family; prominent member of the family; hereditary title; family history; identifier for the family

concept
  rda: term for the concept; preferred term for the concept; variant term for the concept; type of concept; identifier for the concept
  ycr: key identifier for concept; language-based identifier (preferred lexical label) for concept; qualifier for concept language-based identifier; variant name for concept

object
  rda: name of the object; preferred name for the object; variant name for the object; type of object; date of production; place of production; producer/fabricator; physical medium; identifier for the object
  ycr: key identifier for object; language-based identifier (preferred lexical label) for object; qualifier for object language-based identifier; variant name for object

event
  rda: name of the event; preferred name for the event; variant name for the event; date associated with the event; place associated with the event; identifier for the event
  ycr: key identifier for event or historical period as subject; language-based identifier (preferred lexical label) for event or historical period as subject; beginning date for event or historical period as subject; ending date for event or historical period as subject; variant name for event or historical period as subject

place
  rda: name of the place; preferred name for the place; variant name for the place; coordinates; other geographical information; identifier for the place
  ycr: key identifier for place as geographic area; language-based identifier (preferred lexical label) for place as geographic area; qualifier for place as geographic area; variant name for place as geographic area

discipline
  ycr: key identifier for discipline; language-based identifier (preferred lexical label) (name or classification number or symbol) for discipline; translation of meaning of classification number or symbol for discipline

genre/form
  ycr: key identifier for genre/form; language-based identifier (preferred lexical label) for genre/form; variant name for genre/form

name
  rda: scope of usage; dates of usage

identifier; controlled access point; rules; agency
  (no attributes listed)

note: in rda, the following attributes have not yet been assigned to a particular class or entity: extent, dimensions, terms of availability, contact information, restrictions on access, restrictions on use, uniform resource locator, status of identification, source consulted, cataloguer's note, and undifferentiated name indicator. name is being treated as both a class and a property. identifier and controlled access point are treated as properties rather than classes in both rda and ycr.
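(another editorial sketch, prompted by the note above: the same concept can be modeled as a class, as frad does with identifier, or as a property, as rda and ycr do. the namespace uris are hypothetical and do not reproduce yee's actual model.)

```python
# illustrative sketch of the class-versus-property distinction noted above.
# namespace uris are hypothetical.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

FRAD = Namespace("http://example.org/frad#")  # hypothetical
YCR = Namespace("http://example.org/ycr#")    # hypothetical

g = Graph()

# frad treats identifier as an entity (a class in its own right)
g.add((FRAD.Identifier, RDF.type, RDFS.Class))

# rda and ycr treat identifier as a property of the thing described,
# e.g. ycr's "key identifier for work" attribute
g.add((YCR.keyIdentifierForWork, RDF.type, RDF.Property))
g.add((YCR.keyIdentifierForWork, RDFS.domain, YCR.Work))

print(g.serialize(format="turtle"))
```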
success factors and strategic planning: rebuilding an academic library digitization program

cory lampert and jason vaughan

information technology and libraries | september 2009

cory lampert (cory.lampert@unlv.edu) is digitization projects librarian and jason vaughan (jason.vaughan@unlv.edu) is director, library technologies, university of nevada las vegas.

this paper discusses a dual approach of case study and research survey to investigate the complex factors in sustaining academic library digitization programs. the case study involves the background of the university of nevada, las vegas (unlv) libraries' digitization program and elaborates on the authors' efforts to gain staff support for this program. a related survey was administered to all association of research libraries (arl) members, seeking to collect baseline data on their digital collections, to understand their respective administrative frameworks, and to gather feedback on both obstacles and positive inputs affecting their success. results from the survey, combined with the authors' local experience, point to several potential success factors including staff skill sets, funding, and strategic planning.

establishing a successful digitization program is a dialog and process already undertaken or currently underway at many academic libraries. in 2002, according to an institute of museum and library services report, "thirty-four percent of academic libraries reported digitization activities within the past 12 months." nineteen percent expected to be involved in digitization work in the next twelve months, and forty-four percent beyond twelve months.1 more current statistics from a subsequent study in 2004 reflected that digitization work has both continued and expanded, with half of all academic libraries performing digitization activities.2 fifty-five percent of arl libraries responded to a survey informing part of the 2006 association of research libraries (arl) study managing digitization activities; of these, 97 percent indicated engagement in digitization.3 the 2008 ithaka study key stakeholders in the digital transformation in higher education found that nearly 80 percent of large academic libraries either already have or plan to have digital repositories.4 with digitization becoming the norm in many institutions, the time is right to consider what factors contribute to the success and rapid growth of some library digitization programs while other institutions find digitization challenging to sustain.

the evolution of digitization at the unlv libraries is doubtless a journey many institutions have undertaken. over the past couple of years, those responsible for the program at the unlv libraries have had the opportunity to revitalize it and to help collaboratively address some key philosophical questions that had not been systematically asked before, let alone answered. associated with this was a concerted focus on engaging other, less involved staff. one goal was to help educate them about academic digitization programs. another was to provide an opportunity for input on key questions related to the program's strategic direction. as a subsequent action, the authors conducted a survey of other academic libraries to better understand what factors have contributed to their programs' success as well as challenges that have proven problematic. many questions asked of our library staff in the planning and reorganization process were also asked in the survey of other academic libraries. while the unlv libraries have undertaken what are felt to be the proper structural steps and have begun to author policies and procedures geared toward an efficient operation, the authors wanted to better understand the experiences, key players, and underlying philosophies of other institutional libraries as these pertain to their own digitization programs.
the following article provides a brief context relating the background of the unlv libraries' digitization program and elaborates on the authors' efforts toward educating library colleagues and gaining staff buy-in for unlv's digitization program—a process that countless other institutions have no doubt experienced, led, or suffered. the survey administered to arl members dealt with many topics similar to those that arose during the authors' initial planning and later conversations with library staff; as such, survey questions and responses are integrated into the following discussion.

the authors administered a twenty-six-question survey to the 123 members of the arl. the focus of this survey was different from the previously mentioned arl study managing digitization activities, though several of the questions overlapped to some degree. in addition to demographic or concrete factual questions, the unlv libraries digitization survey had several questions focused on perceptions—that is, staff support, administrative support, challenges, and benefits. areas of overlap with the earlier arl survey are mentioned in the appropriate context. though unlv is not a member of the arl, we consider ourselves a research library, and, regardless, arl membership was a convenient way to provide some structure to the survey. survey responses were collected for a forty-five-day period from mid-june to late july 2008. by visiting every arl library's website, the authors identified the individuals who appeared to be the "leaders" of each arl digitization program. identifying these leaders proved tricky and revealed the numerous program structures in place, differences between institutions in promoting their collections, and so on. the authors did not start with the presumption that all arl libraries have a digitization program, but most either seemed to have a formally organized digitization program with staffing or at least had digitized and made available something, even if only a single collection. we e-mailed a survey announcement and a link to the survey to the targeted individuals, with instructions to forward the message to a colleague if they themselves had been incorrectly identified, and sent a follow-up reminder a month later. responses were anonymous, and respondents were allowed to skip questions; thus the number of responses for the twenty-six questions making up the survey ranged from a low of thirty (24.4 percent) to a high of forty-four (35.8 percent). the average number of responses for each question was 39.8, yielding an overall response rate of 32.4 percent. questions were of three types: multiple choice (select one answer), multiple choice (mark all that apply), and open text. in addition, some of the multiple choice questions allowed additional open text comments. survey responses appear in appendix a.
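(the response-rate figures above are plain ratios against the 123 arl members; a quick editorial check of the arithmetic, using only numbers stated in the text:)

```python
# verify the response-rate arithmetic reported above
population = 123  # arl member libraries surveyed

for label, n in (("low", 30), ("high", 44), ("average", 39.8)):
    print(f"{label}: {n} responses = {n / population:.1%}")

# prints: low: 30 responses = 24.4%
#         high: 44 responses = 35.8%
#         average: 39.8 responses = 32.4%
```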
context of the unlv libraries' digitization program

"digital collection," for the purpose of the unlv library digitization survey, was defined as a collection of library or archival materials converted to machine-readable format to provide electronic access or for preservation purposes; typically, digital collections are library-created digital copies of original materials presented online and organized to be easily searched. they may offer features such as full-text search, browsing, zooming and panning, side-by-side comparison of objects, and export for presentation and reuse. one question the survey asked was "what year do you feel your library published its first 'major' digital collection?" responses ranged from 1990 to 2007; the average of all responses was 2001. the earlier arl study found 2000 as the year most respondents began digitization activities.5 mirroring this chronology, the unlv libraries have been active in designing digital projects and digitizing materials from library collections since the late 1990s. technical web design expertise was developed in the cataloging unit (later renamed bibliographic and metadata services), and some of the initial efforts were to create online galleries and exhibits of visual materials from special collections, such as the jeanne russell janish (1998) exhibit.6 subsequently, the unlv libraries purchased the contentdm digital collection management software, providing both back-end infrastructure and front-end presentation for digital collections. later, the first digitization project with search functionality was created in partnership with special collections and was funded by a unlv planning initiative award received in 1999. the early las vegas (2003) project focused on las vegas historical material and was designed to let users search, retrieve, and manipulate results, using contentdm software to query a database.7 unlv's development corresponds with regional developments in utah in 2001, when "the largest academic institutions in utah were just beginning to develop digital imaging projects."8 data from the 2004 imls study showed that, in the twelve months prior to the study's release in 2004, the majority of larger academic libraries had digitized between one and five hundred images for online presentation.9

in terms of staffing, digitization efforts occur in a wide variety of configurations, from large departments to solo librarians managing volunteers. for institutions with recognized digitization staff, great variations exist in where in the organizational chart digitization staff are placed. boock and vondracek's research revealed that, of departments involved in digitization, special collections, archives, technical services, and newly created digital library units are where digitization activities most commonly take place.10 a majority of respondents to the arl study indicated that some or all activities associated with digitization are distributed across various units in the library.11 in 2003, the unlv libraries created a formal department within the knowledge access management division—web and digitization services (wds)—initially comprising five staff focused on the development of the unlv libraries' public website, the development of web-based applications and databases to manage and efficiently present information resources, and the digitization and online presentation of library materials unique to the unlv libraries' collections and of potential interest to a wider audience.
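(an editorial aside: contentdm repositories such as the one described here also expose their collections to harvesters over the standard oai-pmh protocol, the usual route into regional aggregations. a minimal harvesting sketch follows; the base url is hypothetical, while the verb and metadata prefix are standard oai-pmh.)

```python
# minimal oai-pmh harvesting sketch. the base url is hypothetical; the
# ListRecords verb and oai_dc metadata prefix are standard oai-pmh.
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/oai/oai.php"  # hypothetical endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

with urllib.request.urlopen(
        BASE_URL + "?verb=ListRecords&metadataPrefix=oai_dc") as resp:
    tree = ET.parse(resp)

# print the dublin core title of each harvested record
for record in tree.iter(OAI + "record"):
    title = record.find(".//dc:title", NS)
    if title is not None:
        print(title.text)
```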
augmenting their efforts were individuals in other departments helping with metadata standards, content selection, and associated systems technical support. the unlv library digitization survey showed that the majority (78 percent) of libraries that responded have at least one full-time staff member whose central job responsibility is to support digitization activities. this should not imply the existence of a fully staffed digitization program; the 2006 imls study found that 74.1 percent of larger academic libraries described themselves as lacking sufficiently skilled technology staff to accomplish technology-related activities.12

central to any digitization program should be some structure for how projects are proposed and subsequently prioritized. to help guide the priorities of unlv's infant wds department, a digital projects advisory committee was formed to help solicit and prioritize project ideas, and subsequently track the development of approved projects. this committee's work could be judged as having mixed success, partly because it met too infrequently, struggled with conflicting philosophical positions on digitization, and was confronted with the reality that the staff needed to help bring approved ideas to fruition simply were not in place, with too many other library priorities drawing attention away from digitization. an evaluation of the lessons learned from these early years can be found in brad eden's article.13 the unlv library digitization survey had several questions related to management and prioritization for digital projects and shows that, despite the challenges of a committee-based decision-making structure, when a formal process is in place at all, 42.1 percent of survey respondents used a committee versus a single decision maker (23.7 percent) as the body to which projects are proposed for production. a follow-up question asked "how are approved projects ultimately prioritized?" the most popular response (54.1 percent) indicated "by a committee for review by multiple people," followed by "no formal process" (27 percent). "by a single decision maker" was selected by 18.9 percent of the respondents. the earlier arl study asked a somewhat related question: "who makes decisions about the allocation of staff support for digitization efforts? check all that apply." out of seven possible responses, the three most popular were "head of centralized unit," "digitization team/committee/working group," and "other person"; the other person was most often in an administrative capacity, such as a dean, director, or department head.14

administrative support for a program was another variable the unlv library digitization survey investigated. the survey asked respondents to rate, on a scale of one to five, "how would you characterize current support for digitization by your library's administration?" more than 40 percent of responses indicated "consistent support," followed by 31 percent of respondents indicating "very strong support, top priority," 14.3 percent ranking support as neutral, and 14.2 percent claiming "minimal support" or "very little support, or some resistance." it was also clear from responses to some of the other questions that the dean or director's support (or lack thereof) can have dramatic effects on the digitization program. 2005 brought change to the unlv libraries in the form of a new dean.
well-suited to lead the digitization program, she came from california, a state heavily engaged in and at the forefront of digitization within the library and larger academic environment. one of her initiatives was a retooling of the digitization program at the unlv libraries, and her enthusiasm reflects a growing awareness among administrators of the benefits of digitization.

reorganization, library staff engagement, and decision making

in 2006, two new individuals joined the unlv libraries' web and digitization services department: the digitization projects librarian (filling a vacancy) and the web technical support manager (a new position). a bit later, the systems department (providing technical support for the web and digitization servers, among other things) and the wds department were combined into a single unit and renamed library technologies. collectively, these changes brought new and engaged staff into the digitization program and combined under one division many of the individuals responsible for digital collection creation and support. perhaps more subtly, this arrangement also provided formal acknowledgement of the importance of, and the desire for, publishing digital collections. with the addition of new staff and a reorganization, a piece still missing was a revitalized group of library stakeholders to help solicit, prioritize, and manage the creation of digital collections, along with an overall vision guiding the program. while the technical expertise, knowledge of metadata and imaging standards, and deep-rooted knowledge of digitization programs and concepts existed within the library technologies staff, other knowledge did not: primarily in-depth knowledge of the unlv libraries' special collections and a track record of deep engagement with college faculty and the educational curriculum. similar to other organizations, the unlv libraries had not only created a new unit, but was also poised to introduce cross-departmental project groups that would collaborate on digitization activities. in their study of arl and greater western library association (gwla) libraries, boock and vondracek found that this was the most commonly used organizational structure.15

knowledge of the concepts of a digitization program and what is involved in digitizing and sustaining a collection was not widespread among other library colleagues. acknowledged, but not guaranteed up front for the unlv libraries, was the likely eventual re-formation of a group of interested and engaged library stakeholders charged to solicit, prioritize, and provide oversight of the unlv libraries' digitization program. for various reasons, the authors wanted to garner staff buy-in to the highest degree possible. apart from wanting less informed colleagues to understand the benefits of a digitization program, it was also likely that such colleagues would help solicit projects through their liaison work with programs of study across campus. one unlv library digitization survey question asked, "how would you characterize support for digitization in your library by the majority of those providing content for digitization projects?" "consistent support" was indicated by 65.9 percent of respondents; 15.9 percent indicated "very strong support, top priority," 13.6 percent indicated neutrality, and 4.6 percent indicated either minimal support or even some resistance.
to help garner staff buy-in and set the stage for revitalizing the unlv libraries' digitization efforts, we began laying the groundwork to educate and engage library staff in the benefits of a digitization program. this work included language successfully woven into the unlv libraries' strategic plan and an authored white paper posing engaging questions to the larger library audience related to the strategic direction of the program. finally, we planned and executed two digitization workshops for library staff.

the strategic plan

one unlv library digitization survey question asked, "is the digitization program or digitization activities referenced in your library's strategic plan?" a total of 63.4 percent indicated yes, with an additional 22 percent indicating no specific references, but rather implied references. only 7.3 percent indicated that the digitization program was not referenced in any manner in the strategic plan, while, surprisingly, three responses (7.3 percent) indicated that their library does not have a strategic plan. the unlv libraries' strategic plan is an important document authored with wide feedback from library staff, and it exemplifies the participatory decision-making process in place in the library. the current iteration of the strategic plan covers 2007–9 and includes various goals with supporting strategies and action items.16 in addition, all action items have associated assessment metrics and library staff responsible for championing them. departmental annual reports explicitly reference progress toward strategic plan goals. as such, the appearance of goals related to the digitization program in the strategic plan is, to some degree, a clear indication of staff buy-in acknowledging the significance of the program. fortunately, digitization efforts figure prominently in several goals, strategies, and action items, including the following:

- increasingly provide access to digital collections and services to support instruction, research, and outreach while improving access to the unlv libraries' print and media collections.

- provide greater access to digital collections while continuing to build and improve access to collections in all formats to meet the research and teaching needs of the university. supporting strategies: identify collections to digitize that are unique to unlv and that have a regional, national, and international research interest; create digital projects utilizing and linking collections; develop and adapt metadata and scanning standards that conform to national standards for all formats; provide content and metadata for regional and national digital projects; continue to develop expertise in the creation and management of digital collections and information; and collaborate with faculty, students, and others outside the library in developing and presenting digital collections.

- be a comprehensive resource for the documentation, investigation, and interpretation of the complex realities of the las vegas metropolitan area and provide an international focal point for the study of las vegas as a unique urban and cultural phenomenon. supporting strategies: facilitate real and digital access to materials and information that document the historical, cultural, social, and environmental setting of las vegas and its region by identifying, collecting, preserving, and managing information and materials in all formats;
identify unique collections that strengthen current collections of national and international significance in urban development and design, gaming, entertainment, and architecture; develop new access tools and enhance the use of current bibliographic and metadata utilities to provide access to physical and digital collections; and develop web-based digital projects and exhibits based upon the collections.

a capital campaign case statement associated with the strategic plan lists several gift opportunities that would benefit various aspects of the unlv libraries; several of these include gift ideas related to the digitization of materials.

the white paper

another important step in laying the groundwork for the digitization program was a comprehensive white paper authored by the recently hired digitization projects librarian. the finished paper was originally given to the dean of libraries and thereafter to the administrative cabinet, and eventually distributed to all library staff. the outline of this white paper is provided as appendix b. the purpose of the white paper was multifaceted. after a brief historical context, the white paper addressed perhaps the single most important aspect of a digitization program—program planning—developing the strategic goals of the program, selecting and prioritizing projects through a formal decision-making process, and managing initiatives from idea to reality through efficient project teams. this first topic, addressing the core values of the program, had a strong educational purpose for the entire library staff—the ultimate audience of the paper. as part of its educational goal, the white paper enumerated the various strengths of digitization and why an institution would want to sustain a digitization program (providing greater worldwide access to unique materials, promoting and supporting education and learning when integrated with the curriculum, etc.). it defined distinctions between an ephemeral digital exhibit and a long-term published and maintained collection. it discussed the various components of a digital collection—images, multimedia, metadata, indexing, thematic presentation (and the preference to be unbiased), integration with other digital collections and the library website, etc. it posed important questions on sustainability and assessment, and defined concepts such as refreshing of data and migration of data to help set the stage for future philosophical discussions. given the myriad reasons one might want to publish a digital collection, checked by the reality that all the reasons and advantages may not be realized or given equal importance, the white paper listed several scenarios and asked if each was a strong underlying goal for our program—in short, true or false:

- "the libraries are interested in digitizing select unique items held in our collection and providing access to these items in new formats."

- "the libraries are interested in digitizing whole runs of an information resource for access in new formats."

- "the libraries should actively pursue funding to support major digitization initiatives."

- "the libraries should take advantage of the unique publicity, promotion, and marketing opportunities afforded by a digital project/program."

continuing with its purpose of defining the boundaries of the new program, the paper asked questions related to audience, required skill sets, and resources.
the second primary topic introduced the selection and prioritization of the items and ideas suggested for digitization. it posed questions related to content criteria (why does this idea warrant consideration? would complex or unique metadata be required from a subject specialist?) and listed various potential evaluative measures of project ideas (should we do this if another library is already doing a very similar project?). technical criteria considerations were enumerated, touching on interoperability of collections in different formats, technical infrastructure considerations, and so on. multiple simultaneous ideas beg for prioritization, and the white paper proposed a formal review process and the library staff and skill sets that would help make such a process successful. the third primary topic focused on the details of carrying an approved idea to reality and strengthened the educational purpose of the white paper. it described the general planning steps for an approved project and included a list of typical steps involved in most digital projects—scanning; creating metadata, indexes, and controlled vocabulary; coding and designing the web interface; loading records into the unlv libraries' contentdm system; publicizing the launch of the project; and assessing the project after completion. one unlv library digitization survey question related to thirteen such skills the unlv libraries identified as critical for a successful digitization program. the question asked respondents to rate skill levels possessed by personnel at their library, based on a five-point scale (from one to five: "no expertise," "very limited expertise," "working knowledge/enough to get by," "advanced knowledge," and "tremendous expertise"). neither "no expertise" nor "very limited expertise" garnered the highest number of responses for any of the skills. the overall rating average across all thirteen skills was 3.79 out of 5. the skills with the highest rating averages were "metadata creation/cataloging" at 4.4 and "digital imaging/document scanning/post image processing/photography" at 4.27. the skills with the lowest rating averages were "marketing and promotion" at 2.95, followed by "multimedia formats" at 3.33.

the unlv libraries' white paper contained several appendixes that provided some of its richest content. with the educational thrust completed, the appendixes drew a roadmap of "where do we want to go from here?" this roadmap suggested the revitalization of an overarching digital projects advisory committee, potential members of the committee, and functions of the committee. the committee would be responsible for soliciting and prioritizing ideas and tracking the progress of approved ideas to publication. the appendixes also proposed project teams (which would exist for each project), likely members of the project teams, and the functions of the project team in completing day-to-day digitization activities. the liaison between the digital projects advisory committee and a project team would be the digitization projects librarian, who would always serve on both. the last page of the white paper provided an illustration highlighting the various steps proposed in the lifecycle of a digital project—from concept to reality.

digitization workshops

several months after the white paper had been shared, the next step in restructuring the program and building momentum was sponsoring two forums on digitization.
the first one occurred in november 2006 and included two speakers brought in for the event: roy tennant (formerly user services architect with the california digital library and now with oclc) and ann lally (head of the digital initiatives program at the university of washington libraries). this session consisted of a two-hour presentation and q&a to which all library staff were invited, followed by two breakout sessions. all three sessions were moderated by the digitization projects librarian. questions from these sessions are provided in appendix c. the breakout sessions were each targeted to specific departments in the unlv libraries. the first focused on providing access to digital collections (definitions of digital libraries, standards, designing useful metadata, accessibility and interoperability, etc.). the second focused on components of a well-built digital library (goals of a digitization program, content selection criteria, collaboration, evaluation and assessment, etc.). colleagues from other libraries in nevada were invited, and the forum was well attended and highly praised. the sessions were recorded and later made available on dvd for library staff unable to attend. this initial forum accomplished two important goals. first, it was an all-staff meeting offering a chance to meet, explore ideas, and learn from two well-known experts in the field. second, it offered a more intimate chance to talk about the technical and philosophical aspects of a digitization program for those individuals in the unlv libraries associated with such tasks. as a momentum-building opportunity for the digitization program, the forum was successful.

the second workshop occurred in april 2007. to gain initial feedback on several digitization questions and to help focus this second workshop, we sent out a survey to several dozen library staff—those who would likely play some role at some point in the digitization program. the survey contained questions focused on several thematic areas: defining digital libraries, boundaries of the digitization program, users and audience, digital project design, and potential projects and ideas. it contained thirteen questions consisting of open-ended response questions, questions where the respondent ranked items on a five-point scale, and "select all that apply" questions. we distributed the survey to invitees to the second workshop, approximately three dozen individuals; of those, eighteen (about 50 percent) responded to most of the questions. the survey was closely tied to the white paper and meant to gauge early opinions on some of the questions posed by that paper. whereas the first workshop included some open q&a, the second session was structured as a hands-on workshop to answer some of the digitization questions and to illustrate the complexity of prioritizing projects. the second workshop began with a status update on the retooling of the unlv libraries' digitization program. this was followed by an educational component focused on a diagram detailing the workflow of a typical digitization project and who is involved, emphasizing the considerable planning and effort needed to bring an idea to reality. in addition, we discussed project types and how digital projects can vary widely in scope, content, and purpose. finally, we shared general results from the aforementioned survey to help set the stage for the structured hands-on exercises.
the outline for this second workshop is provided in appendix d. one question of the unlv library digitization survey asked, "on a scale of 1 to 5, how important are each of the factors in weighing whether to proceed with a proposal for a new digital collection project, or enhancement of an existing project?" eight factors were listed, and the five-point scale was used (from one to five: "not important," "less important," "neutral," "important," and "vitally important"). the average rating for all eight factors was 3.66. the two most important factors were "collection includes unique items" (4.49 average rating) and "collection includes items for which there is a preservation concern or to make fragile items more accessible to the public" (3.95 average rating). the factors with the lowest average ratings were "collection includes integration of various media into a themed presentation" (2.54 average rating) followed by "collection involves a whole run of an information resource (such as an entire manuscript, newspaper run, etc.)" (3.39 average rating). the earlier arl survey asked a somewhat related question, "what is/has been the purpose of these digitization efforts? check all that apply." of the six possible responses (which differed somewhat from those in the unlv library digitization survey), the most frequent responses were "improved access to library collections," "support for research," and "preservation."17 the earlier survey also asked the question, "what are the criteria for selecting material to be digitized? check all that apply." the most frequent responses were "subject matter," "material is part of a collection being digitized," and "rarity or uniqueness of the item(s)."18

the first exercise of the second digitization workshop focused on digital collection brainstorming. the authors provided a list of ten project examples and asked each of the six tables (with four colleagues each) to prioritize the ideas. afterward, a speaker from each table presented the prioritizations and defended their rankings. this exercise successfully illustrated to peers in attendance that different groups of people have different ideas about what is important and what constitutes prime materials for digitization; the rankings from the various tables were quite divergent. a related question asked of the arl libraries in the unlv library digitization survey was "from where have ideas originated for existing, published digital collections at your library?" and offered six choices. respondents could mark multiple items. the most chosen answer (92.7 percent) was "special collections, archives, or library with a specialized collection or focus." the least chosen answer (51.2 percent) was "an external donor, friend of the library, community user, etc." for the second part of the workshop exercise, each table came up with their own digital collection ideas, defined the audience and content of the proposal, and defended and
fourteen unique and varied ideas were proposed, most of which were tightly focused on las vegas and nevada, such as “history of las vegas,” “unlv yearbooks,” “las vegas gambling and gamblers,” and “african american entertainers in las vegas.” other proposals were less tied to the area, such as a “botany collection,” “movie posters,” “children’s literature,” “architecture,” and “federal land management.” this exercise successfully showed that ideas for digital collections stretch across a broad spectrum, as broad as the individual brainchilden themselves. finally, in the last digitization workshop exercise, each table came up with specialties, roles, and skills of candidates who could potentially serve on the proposed committee, and defended their rationale—in other words, committee success factors. this exercise generated nineteen skills seen as beneficial by one or more of the group tables. at the end of the workshop, we asked if others had alternate ideas to the proposed committee. none surfaced, and the audience thought such a committee should be reestablished. this second workshop concluded with a brief discussion on next steps—drafting a charge for the committee, choosing members, and a plug for the expectation of subject liaisons working with their respective areas to help better identify opportunities for collaboration on digital projects across campus. n toward the future digital projects currently maintained by the unlv libraries include both static web exhibits in the tradition of unlv’s first digitization efforts, as well as several searchable contentdm–powered collections. the unlv libraries have also sought to continue collaborative efforts, participating as project partners for the western waters digital library (phase 1) and continuing in a regional collaboration as a hosting partner in the mountain west digital library. partnerships were shown in the unlv library digitization survey to garner increased buy-in for projects, with one respondent commenting that faculty partnerships had been “the biggest factor for success of a digital library project.” institutional priorities at unlv libraries reflect another respondent’s comment regarding “interesting archival collections” as a success factor. one recently launched unlv collection is the showgirls collection (2006), focused on a themed collection of historical material about las vegas entertainment history.19 another recently launched collection, the nevada test site oral history project (2008), recounts the memories of those affiliated with and affected by the nevada test site during the era of cold war nuclear testing and includes searchable transcripts, selected audio and video clips, and scanned photographs and images.20 with general library approval, the restructured digitization projects advisory committee was established in july 2007 with six members drawn from library technologies, special collections, the subject specialists, and at large. the advisory committee has drafted and gained approval for several key documents to help govern the committee’s future work. this includes a collection development policy for digitization projects and a project proposal form to be completed by the individual or group proposing an idea for a digital collection. at the time of writing, the committee is just now at the point of advertising the project proposal form and process, and time will tell how successful these documents prove. 
in the unlv library digitization survey, 65.4 percent responded that a digitization mission statement or collection development policy was in place at their institution. one goal is to "ramp up" the number of digitization projects underway at any one time at unlv. many items in the special collections are ripe for digitization. many of these are uncataloged, and digitizing such collections would help promote these hidden treasures. related to ramping up production, one unlv library digitization survey question asked, "on average over the past three years, approximately how many new digital collections are published each year?" responses ranged from zero new collections to sixty. the average number of new collections added each year was 6.4 for the thirty-two respondents who gave exact numerical answers. while this is perhaps double the unlv libraries' current rate of production, it illustrates that increasing production is an achievable goal.

staffing and funding for the unlv libraries' digitization program have both seen increases over the past several years. a new application developer was hired, and a new graphics/multimedia specialist filled an existing vacancy. together, these staff have helped with projects such as modifying contentdm templates, graphic design, and multimedia creation related to digital projects, in addition to working on other web-based projects not necessarily related to the digitization program. another position has a job focus shifted toward usability for all things web-based, including digitization projects. in terms of funding, the two most recent projects at the unlv libraries are both the result of successful grants. the recently launched nevada test site oral history project was the result of two grants from the u.s. departments of education and energy. subsequently, a $95,000 lsta grant proposal seeking to digitize key items related to the history of southern nevada from 1900 to 1925 was funded for 2008–9, with the resulting digital collection publicly launched in may 2009. this collection, southern nevada: the boomtown years, contains more than 1,500 items from several institutions, focused on the heyday of mining-town life in southern nevada during the early twentieth century.21 this grant funded four temporary positions: a metadata specialist, an archivist, a digital projects intern, and an education consultant to help tie the digitized collection into the k–12 curriculum. grants will likely play a large role in the unlv libraries' future digitization activities. the unlv library digitization survey asked, "has your institution been the recipient of a grant or gift whose primary focus was to help efforts geared toward digitization of a particular collection or to support the overall efforts of the digitization program?" the question sought to determine if grants had played a role, and if so, whether it was primarily large grants (defined as > $100,000), small grants (< $100,000), or both. the largest share of responses (46.2 percent) indicated that a combination of both small and large grants had been received in support of a project or the program. an additional 25.6 percent indicated that large grants had played a role, and 23.1 percent indicated that one or more small grants had played a role. two respondents (5.1 percent) indicated that no grants had been received or that they had not applied for any grants.
the earlier arl survey asked the question, "what was/is the source of the funds for digitization activities? check all that apply." of seven possible responses, "grant" was the second most frequent response, trailing only "library."22 with an eye toward the future, the survey administered to arl libraries asked two blunt questions summarizing the overall thrust of the survey. one of the final open-ended survey questions asked, "what are some of the factors that you feel have contributed to the success of your institution's digitization program?" forty respondents offered answers ranging from a single item to several. several responses along the same general themes surfaced and could be organized into rough clusters. in general, support from library administration was mentioned by a dozen respondents, with such statements as "consistent interest on the part of higher level administration," "having support for the digitization program at an administrative level from the very beginning," "good support from the library administration," "support of the dean," and, mentioned multiple times in the same precise language, "support from library administration." faculty collaboration and interest across campus was mentioned by ten respondents, evidenced by statements such as "strong collaboration with faculty partners," "support of faculty and other partners," "interest from faculty," "heavily involving faculty in particular . . . ensures that we can have continued funding since the faculty can lobby the provost's office," and "grant writing partnerships with faculty." passionate individuals involved with the program and/or support from other staff in the libraries were mentioned by ten respondents, with comments such as "program management is motivated to achieve success," "a strong department head," "individual staff member's dedication to a project," "commitment of the people involved," "team work, different departments and staff willing to work together," and "supportive individuals within the library." having "good" content to digitize was mentioned by seven respondents, with statements such as "good content," "collection strength," "good collections," and "availability of unique source materials." strategic plan or goals integration was mentioned in several responses, such as "strong financial commitment from the strategic plan" and "mainstreaming the work of digital collection building into the strategic goals of many library departments." successful grants and donor cultivation were mentioned by four respondents. other responses stood alone, such as one respondent's one-word answer—"luck"—and responses such as "nimbleness, willingness, and creativity" and "a vision for large-scale production, and an ability to achieve it." the final unlv library digitization survey question asked, "what are the biggest challenges for your institution's digitization program?" thirty-nine respondents provided feedback, and again, several variations on a theme emerged.
the most common response, unsurprisingly, was "not enough staffing," mentioned by eighteen respondents, with responses such as "lack of support for staffing at all necessary levels," "the real problem is people, we don't have enough staff," "limited by staff," and "we need more full-time people." following this was a likely related response, "funding," mentioned by another nine respondents, with statements such as "funding for external digitization," "identifying enough funding to support conversion," "we could always use more money," and, succinctly, "money." related to staffing specifically, six responses focused on technical staff or support from technical staff, such as "need more it (information technology) staff," "need support from existing it staff," "not enough application development staff," and "limited technical expertise." prioritization and demand issues surfaced in six responses, with responses such as "prioritizing efforts now that many more requests for digital projects have been submitted," "prioritization," "can't keep up with demand," and "everyone wants to digitize everything." workflow was mentioned in four responses, such as "workflow bottlenecks," "we need to simplify the process of getting materials into the repository," and "it takes far longer to describe an object than to digitize it, thus creating bottlenecks." "not enough space" was mentioned by three respondents, and "maintaining general librarywide staff support for the program" was mentioned by two respondents. the unlv libraries will keep in mind the experiences of our colleagues, as few, if any, libraries are likely immune to similar issues.

■ conclusions

the unlv library digitization survey revealed, not surprisingly, that not all libraries, even those of high stature, are created equal. many have struggled to some extent in growing and sustaining their digitization programs. some have numerous published projects; others have few or perhaps none. administrative and collegial support varies, as does funding. additional questions remain to be tackled at the unlv libraries. how precisely will we define success for the digitization program? by the number of published collections? by the number of successful grants executed? by the number of image views or metadata record accesses? by the frequency of press coverage and word-of-mouth praise from colleagues? ideas abound, but no definitive answers exist as of yet. at the larger level, other questions are looming. as libraries continue to promote themselves as relevant in the digital age and as a (or the) central partner in student learning, to what degree will libraries' digital collections be tied into the educational curriculum, whether at their own affiliated institutions or with k–12 in their own states as well as beyond? clearly the profession is changing, with library schools creating courses and certificate programs in digitization. discussions about the integration of various information silos, metadata crosswalking, and item exposure in other online systems used by students will continue. library digitized collections are primary resources involved in such discussions. while these questions persist, we hope that, at a minimum, the unlv libraries have established the foundational structure for a successful digitization program.

references
1. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2002 report," may 23, 2002, www.imls.gov/publications/techdig02/2002report.pdf (accessed mar. 1, 2009).
2. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report," jan. 2006, www.imls.gov/resources/techdig05/technology%2bdigitization.pdf (accessed mar. 1, 2009).
3. rebecca mugridge, managing digitization activities, spec kit 294 (washington, d.c.: association of research libraries, 2006): 11.
4. ross housewright and roger schonfeld, "ithaka's 2006 studies of key stakeholders in the digital transformation in higher education," aug. 18, 2008, www.ithaka.org/research/ithakas%202006%20studies%20of%20key%20stakeholders%20in%20the%20digital%20transformation%20in%20higher%20education.pdf (accessed mar. 1, 2009).
5. ibid.
6. university of nevada, las vegas university libraries, "jeanne russell janish, botanical illustrator: landscapes of china and the southwest," oct. 17, 2006, http://library.unlv.edu/speccol/janish/index.html (accessed mar. 1, 2009).
7. university of nevada, las vegas university libraries, "early las vegas," http://digital.library.unlv.edu/early_las_vegas/earlylasvegas/earlylasvegas.html (accessed mar. 1, 2009).
8. kenning arlitsch and jeff jonsson, "aggregating distributed digital collections in the mountain west digital library with the contentdm multi-site server," library hi tech 23, no. 2 (2005): 221.
9. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report."
10. michael boock and ruth vondracek, "organizing for digitization: a survey," portal: libraries and the academy 6, no. 2 (2006), http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v006/6.2boock.pdf (accessed mar. 1, 2009).
11. mugridge, managing digitization activities, 12.
12. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report."
13. brad eden, "managing and directing a digital project," online information review 25, no. 6 (2001), www.emeraldinsight.com/insight/viewpdf.jsp?contenttype=article&filename=html/output/published/emeraldfulltextarticle/pdf/2640250607.pdf (accessed mar. 1, 2009).
14. mugridge, managing digitization activities, 32–33.
15. boock and vondracek, "organizing for digitization: a survey."
16. university of nevada, las vegas university libraries, "university libraries strategic goals and objectives," june 1, 2005, www.library.unlv.edu/about/strategic_goals.pdf (accessed mar. 1, 2009).
17. mugridge, managing digitization activities, 20.
18. ibid., 48.
19. university of nevada, las vegas university libraries, "showgirls," http://digital.library.unlv.edu/showgirls/ (accessed mar. 1, 2009).
20. university of nevada, las vegas university libraries, "nevada test site oral history project," http://digital.library.unlv.edu/ntsohp/ (accessed mar. 1, 2009).
21. university of nevada, las vegas university libraries, "southern nevada: the boomtown years," http://digital.library.unlv.edu/boomtown/ (accessed may 15, 2009).
22. mugridge, managing digitization activities, 40.

appendix a. unlv library digitization survey responses

1. is the digitization program or digitization activities referenced in your library's strategic plan?
answer options (41 responses total) | response percent | response count
yes | 63.4 | 26
no | 7.3 | 3
not specifically, but implied | 22.0 | 9
our library doesn't have a strategic plan | 7.3 | 3

2. how would you characterize current support for digitization by your library's administration?
answer options (42 responses total) | response percent | response count
very strong support, top priority | 31.0 | 13
consistently supportive | 40.5 | 17
neutral | 14.3 | 6
minimal support | 7.1 | 3
very little support, or some resistance | 7.1 | 3

3. how would you characterize support for digitization in your library by the majority of those providing content for digitization projects (i.e., regardless of whether those providing content have as a primary or a minor responsibility provisioning content for digitization projects)?
answer options (44 responses total) | response percent | response count
very strong support, top priority | 15.9 | 7
consistently supportive | 65.9 | 29
neutral | 13.6 | 6
minimal support | 2.3 | 1
very little support, or some resistance | 2.3 | 1

4. what year do you feel your library published its first "major" digital collection? "major" is defined as the first project deemed to have permanence and to be sustained, with associated metadata, etc. if you do not know, you may estimate or type "unknown."
responses ranged from 1990 to 2007.

5. to date, approximately how many digital collections has your library published? (please do not include ephemeral exhibits that may have existed in the past but no longer are present or sustained.)
responses ranged from 1 to 1,000s. the great majority of responses were under 100; four responses were between 100 and 200, and one response was "1,000s."

6. on average over the past 3 years, approximately how many new digital collections are published each year?
all but two responses ranged from 0 to 10. one response was 13, one was 60.

7. what hosting platform(s) do you use for your digital collections (e.g., contentdm, etc.)?

8. does your institution have an institutional repository (e.g., dspace)?
answer options (41 responses total) | response percent | response count
yes | 73.2 | 30
no | 26.8 | 11

9. if the answer was "yes" in question 8, is your institutional repository using the same software as your digital collections?
answer options (30 responses total) | response percent | response count
yes | 26.7 | 8
no | 73.3 | 22

10. is there an individual at your library whose central job responsibility is the development, oversight, and management of the library's digitization program? (for purposes of this survey, central job responsibility means that 50 percent or more of the employee's time is dedicated to digitization activities.)
answer options (38 responses total) | response percent | response count
yes | 78.9 | 30
no | 21.1 | 8

11. are there regular, full-time staff at your library who have as their primary or one of their primary job responsibilities support of the digitization program? for this question, a primary job responsibility means that at least 20 percent of their normal time is spent on activities directly related to supporting the digitization program or development of a digital collection.
(mark all that apply)
answer options (39 responses total) | response percent | response count
digital imaging/document scanning, post-image processing, photography | 82.1 | 32
metadata creation/cataloging | 79.5 | 31
archival research of documents included in a collection(s) | 28.2 | 11
administration of the hosting server | 53.8 | 21
grant writing/donor cultivation/program or collection marketing | 23.1 | 9
project management | 61.5 | 24
multimedia formats | 25.6 | 10
database design and data manipulation | 53.8 | 21
maintenance, customization, and/or configuration of digital asset management software or features within that software (e.g., contentdm) | 64.1 | 25
programming languages | 30.8 | 12
web design and development | 71.8 | 28
usability | 25.6 | 10
marketing and promotion | 28.2 | 11
none of the above | 2.6 | 1

12. approximately how many individuals not on the full-time library staff payroll (i.e., student workers, interns, fieldworkers, volunteers) are currently working on digitization projects?
answers ranged from 0 to "approximately 46." the majority of responses (24) fell between 0 and 10 workers; twelve responses indicated more than 10; several responses indicated "unknown."

13. has your library funded staff development, training, or conference opportunities that directly relate to your digitization program and activities for one or more library staff members?
answer options (41 responses total) | response percent | response count
yes, frequently, one or more staff have been funded by library administration for such activities | 48.8 | 20
yes, occasionally, one or more staff have been funded by library administration for such activities | 51.2 | 21
no, to the best of my knowledge, no library staff member has been funded for such activities | 0.0 | 0

14. where does the majority of digitization work take place?
answer options (41 responses total) | response percent | response count
centralized in the library (majority of content digitized using library staff and equipment in one department) | 48.8 | 20
decentralized (majority of content digitized in multiple library departments or outside the library by other university entities) | 12.2 | 5
through vendors or outsourcing | 7.3 | 3
hybrid of approaches depending on project | 31.7 | 13

15. on a scale of 1 to 5 (1 being least important and 5 being vitally important), how important is each of the factors in weighing whether to proceed with a proposal for a new digital collection project or enhancement of an existing project?
answer options (41 responses total) | not important | less important | neutral | important | vitally important | rating average | response count
collection includes item(s) for which there is a preservation concern or to make fragile item(s) more accessible to the public | 0 | 1 | 9 | 22 | 9 | 3.95 | 41
collection includes unique items | 0 | 0 | 1 | 19 | 21 | 4.49 | 41
collection involves a whole run of an information resource (e.g., an entire manuscript, newspaper run, etc.) | 2 | 5 | 11 | 21 | 2 | 3.39 | 41
collection includes the integration of various media (i.e., images, documents, audio) into a themed presentation | 7 | 11 | 17 | 6 | 0 | 2.54 | 41
collection has a direct tie to educational programs and initiatives (e.g., university courses, statewide education programs, or k–12 education) | 3 | 3 | 6 | 17 | 12 | 3.78 | 41
collection supports scholarly communication and/or management of institutional content | 1 | 4 | 7 | 21 | 8 | 3.76 | 41
collection involves a collaboration with university colleagues | 1 | 3 | 9 | 18 | 10 | 3.83 | 41
collection involves a collaboration with entities external to the university (e.g., public libraries, historical societies, museums) | 2 | 4 | 11 | 19 | 5 | 3.51 | 41

16. from where have ideas originated for existing, published digital collections at your library? in other words, have one or more digital collections been the brainchild of one of the following? (mark all that apply)
answer options (41 responses total) | response percent | response count
library subject liaison or staff working with teaching faculty on a regular basis | 75.6 | 31
library administration | 65.9 | 27
special collections, archives, or library with a specialized collection or focus | 92.7 | 38
digitization program manager | 63.4 | 26
university staff or faculty member outside the library | 68.3 | 28
an external donor, friend of the library, community user, etc. | 51.2 | 21

17. to whom are new projects first proposed to be evaluated for digitization consideration?
answer options (38 responses total) | response percent | response count
to an individual decision-maker | 23.7 | 9
to a committee for review by multiple people | 42.1 | 16
no formal process | 34.2 | 13

18. how are approved projects ultimately prioritized?
answer options (37 responses total) | response percent | response count
by a single decision-maker | 18.9 | 7
by a committee for review by multiple people | 54.1 | 20
by departments or groups outside of the library | 0.0 | 0
no formal process | 27.0 | 10

19. are digitization program mission statements, selection criteria, or specific prioritization procedures in use?
answer options (40 responses total) | response percent | response count
yes, one or more of these forms of documentation exist detailing process | 67.5 | 27
yes, some criteria are used but no formal documentation exists | 25.0 | 10
no documented process in use | 7.5 | 3

20. what general evaluation criteria do you employ to measure how successful a typical digital project is? (mark all that apply)
answer options (39 responses total) | response percent | response count
log analysis showing utilization/record views of digital collection items | 69.2 | 27
analysis of feedback or survey responses associated with the digital collection | 38.5 | 15
publicity generated by, or citations referencing, digital collection | 46.2 | 18
e-commerce sales or reproduction requests for digital images | 12.8 | 5
we have no specific evaluation measures in use | 33.3 | 13

21. has your institution been the recipient of a grant or gift whose primary focus was to help efforts geared toward digitization of a particular collection or to support the overall efforts of the digitization program?
answer options (39 responses total) | response percent | response count
we have received one or more smaller grants or donations (each of which was $100,000 or less) to support a digital collection/program | 23.1 | 9
we have received one or more larger grants or donations (each of which was greater than $100,000) to support a digital collection/program | 25.6 | 10
we have received a mix of small and large grants or donations to support a digital collection/program | 46.2 | 18
we have been unsuccessful in receiving grants or have not applied for any grants—grants and/or donations have not played any role whatsoever in supporting a digital collection or our digitization program | 5.1 | 2

22. how would you rate the overall level of buy-in for collaborative digitization projects between the library and external partners (an external partner is someone not on the full-time library staff payroll, such as other university colleagues, colleagues from other universities, etc.)?
answer options (41 responses total) | response percent | response count
excellent | 41.5 | 17
good | 39.0 | 16
neutral | 4.9 | 2
minimal | 7.3 | 3
low or none | 0.0 | 0
not applicable—our library has not yet published or attempted to publish a collaborative digital project involving individuals outside the library | 7.3 | 3

23. when considering the content available for digitization, which of the following statements apply? (mark all that apply)
answer options (40 responses total) | response percent | response count
at my institution, there is a lack of suitable library collections for digitization | 0.0 | 0
content providers regularly contact the digitization program with project ideas | 52.5 | 21
the main source of content for new digitization projects comes from special collections, archives, other libraries with specialized collections (maps, music, etc.), or local cultural organizations (historical societies, museums) | 87.5 | 35
the main source of content for new digitization projects comes from born digital materials (such as dissertations, learning objects, or faculty research materials) | 32.5 | 13
content digitization is mainly limited by available resources (lack of staffing, space, equipment, expertise) | 47.5 | 19
obtaining good content for digitization can be challenging | 7.5 | 3

24. various types of expertise are important in collaborative digitization projects. please rate the level of your local library staff's expertise in the following areas (1–5 scale, with 1 having no expertise and 5 having tremendous expertise).
answer options (41 responses total) | no expertise | very limited expertise | working knowledge/enough to "get by" | advanced knowledge | tremendous expertise | n/a | rating average | response count
digital imaging/document scanning, post-image processing, photography | 0 | 1 | 3 | 21 | 16 | 0 | 4.27 | 41
metadata creation/cataloging | 0 | 0 | 2 | 20 | 18 | 0 | 4.40 | 40
archival research of documents included in a collection | 0 | 2 | 6 | 15 | 16 | 2 | 4.15 | 41
administration of the hosting server | 1 | 2 | 7 | 16 | 15 | 0 | 4.02 | 41
grant writing/donor cultivation | 1 | 4 | 13 | 13 | 8 | 2 | 3.59 | 41
project management | 0 | 1 | 9 | 23 | 8 | 0 | 3.93 | 41
multimedia formats | 0 | 5 | 21 | 10 | 4 | 1 | 3.33 | 41
database design and data manipulation | 0 | 4 | 9 | 14 | 13 | 1 | 3.90 | 41
digital asset management software (e.g., contentdm) | 3 | 0 | 5 | 21 | 11 | 0 | 3.93 | 40
programming languages | 4 | 3 | 14 | 9 | 11 | 0 | 3.49 | 41
web design and development | 2 | 1 | 13 | 10 | 15 | 0 | 3.85 | 41
usability | 1 | 7 | 12 | 13 | 8 | 0 | 3.49 | 41
marketing and promotion | 2 | 11 | 17 | 7 | 3 | 1 | 2.95 | 41

25. what are some of the factors that you feel have contributed to the success of your institution's digitization program?
survey responses were quite diverse because respondents were speaking to their own perceptions and institutional experience. the general trends of the responses are discussed in the body of the paper.

26. what are the biggest challenges for your institution's digitization program?
survey responses were quite diverse because respondents were speaking to their own perceptions and institutional experience. the general trends of the responses are discussed in the body of the paper.

appendix b. white paper organization

i. introduction
ii. current status of digitization projects at the unlv libraries
iii. topic 1: program planning
a. are there boundaries to the libraries' digitization program? what should the program support?
b. what resources are needed to realize program goals?
c. who is the user or audience?
d. when selecting and designing future projects, how can high-quality information be presented in online formats incorporating new features while remaining unbiased and accurate in service provision?
e. to what degree do digitization initiatives need their own identity versus heavily integrating with the libraries' other online components, such as the general website?
f. how do the libraries plan on sustaining and evaluating digital collections over time?
g. what type of authority will review projects at completion? how will the project be evaluated and promoted?
iv. topic 2: initiative selection and prioritization
a. project selection: what content criteria should projects fall within in order to be considered for digitization, and what is the justification for conversion of the proposed materials?
b. project selection: what technical criteria should projects fall within in order to be considered for digitization?
c. project selection: how does the project relate to, interact with, or complement other published projects and collections available globally, nationally, and locally?
d. project selection and prioritization: after a project meets all selection criteria, resources may need to be evaluated before the proposal reaches final approval. what information needs to be discussed in order to finalize the selection process, select between qualified project candidates, and begin the prioritization process for approved proposals?
e. project prioritization: should we develop a formal review process?
v. topic 3: project planning
a. what are the planning steps that each project requires?
b. who will be responsible for the different steps in the project plan and department workload?
c. how can the libraries provide rich metadata and useful access points?
d. what type of web design will each project require?
e. what type of communication needs to exist between groups during the project?
vi. concluding remarks
vii. related links and resources cited
viii. white paper appendixes
a. working list of advisory committee functions and project workgroup functions
b. contentdm software: roles and expertise
c. project team workflow
d. contentdm elements

appendix c. first workshop questions

general questions
1. how do you define a digital library? do the terms "repository," "digital project," "exhibit," or "online collection" connote different things? if so, what are the differences, similarities, and boundaries for each?
2. what factors have contributed to a successful digitization program at your institution? did anything go drastically wrong? were there any surprises? what should new digitization programs be cautious and aware of?
3. what is the role, specifically, of the academic library in creating digital collections? how is digitization tied to the mission of your institution?
4. why digitize and for whom? do digital libraries need their own mission statement or philosophy because they differ from physical collections? should there be boundaries to what is digitized?
5. what standards are most widely in use at this time? what does the future hold? are there new standards you are interested in?

technical questions, metadata questions
1. what are some of the recommended components of digital library infrastructure that should be in place to support a digitization program (equipment, staff, planning, technical expertise, content expertise, etc.)?
2. what are the relationships between library digitization initiatives, the library website, the campus website or portal, and the web? in what ways do these information sources overlap, interoperate, or require boundaries?
3. how do you decide on what technology to use? what is the decision-making process when implementing a new technology?
4. standards are used in various ways during digitization. what is the importance of using standards, and are there areas where standards should be relaxed, or not used at all? how do digitization programs deal with evolving standards?
5. preservation isn't talked about as much as it used to be. what's your solution or strategy to the problem of preserving digital materials?
6. will embedded metadata ever be the norm for digital objects, or will we continue to rely on collection management systems like contentdm to link digital objects to their associated metadata?

collections and design questions
1. how do you decide what should be included in a digital library? does the digital library need a collection development policy and if so, what type? how are projects prioritized at your institution?
2. how do you decide who your user is? are digital libraries targeting mobile users or other users with unique needs? what value-added material complements and enhances digital collections (i.e., item-level metadata records, guided searches, narrative or scholarly content, teaching material, etc.)?
3. how should digital libraries be assessed and evaluated? how do you gauge the success of a digital collection, exhibit, or library? what has been proven and disproved in the short time that libraries have been doing digital projects?
4. what role do digital libraries play in marketing the library? how do you market your digital collections? are there any design criteria that should be considered for the web presence of digital libraries (should the digital library look like the library website, the campus website, or have a unique look and feel)?
5. do you have any experience partnering with teaching faculty to create digital collections? how are collaborations initiated? are such collaborations a priority? what other types of collaborations are you involved in now? how do you achieve consensus with a diverse group of collaborators? to what degree is centralization important or unnecessary?

appendix d. second workshop outline

1. introduction—purpose/focus of the meeting
a. to talk about next steps in the digitization program
b. quick review of the current status and where the program has been
c. serve to further educate participants on the steps involved in taking a project idea to reality
d. goals for participants: understand types of projects and project prioritization; engage in activities on ideas and prioritization; talk about process and discuss committee; open forum
2. staff digitization survey discussion
a. "defining digital libraries"
b. "boundaries to the digitization program"
c. "users and audience"
d. "digital project design"
e. "potential projects and ideas"
3. first group exercise: digital project idea ranking and defense of ranking
4. second group exercise: digital project idea brainstorming and defense of ideas brainstormed
5. concept/proposal for a digitization advisory committee
6. conclusion and next steps

frbrization of a library catalog: better collocation of records, leading to enhanced search, retrieval, and display
timothy j. dickey (dickeyt@oclc.org) is a post-doctoral researcher, oclc office of programs and research, dublin, ohio.

the functional requirements for bibliographic records (frbr)'s hierarchical system defines families of bibliographic relationship between records and collocates them better than most extant bibliographic systems. certain library materials (especially audio-visual formats) pose notable challenges to search and retrieval; the first benefits of a frbrized system would be felt in music libraries, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections. this report will summarize the benefits of frbr to next-generation library catalogs and opacs, and will review the handful of ils and catalog systems currently operating with its theoretical structure. editor's note: this article is the winner of the lita/ex libris writing award, 2007.
the following review addresses the challenges and benefits of a next-generation online public access catalog (opac) according to the functional requirements for bibliographic records (frbr).1 after a brief recapitulation of the challenges posed by certain library materials—specifically, but not limited to, audiovisual materials—this report will present frbr's benefits as a means of organizing the database and public search results from an opac.2 frbr's hierarchical system of records defines families of bibliographic relationship between records and collocates them better than most extant bibliographic systems; it thus affords both library users and staff a more streamlined navigation between related items in different materials formats and among editions and adaptations of a work. in the eight years since the frbr report's publication, a handful of working systems have been developed. the first benefits of such a system to an average academic library system would be felt in a branch music library, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections.

■ current search and retrieval challenges

the difficulties faced first, but not exclusively, by music users of most integrated library systems fall into two related categories: issues of materials formats, and issues of cataloging, indexing, and marc record structure. music libraries must collect, catalog, and support materials in more formats than anyone else; this makes their experience of the most common ils modules—circulation, reserves, and acquisitions—by definition more complicated.

the study of music continues to rely on the interrelated use of three distinct information formats—scores (the notated manifestation of a composer's or improviser's thought), recordings (realizations in sound, and sometimes video, of such compositions and improvisations), and books and journals (intellectual thought regarding such compositions and improvisations)—music libraries continue to require . . . collections that integrate [emphasis mine] these three information formats appropriately.3

put a different way, "relatedness is a pervasive characteristic of music materials."4 this is why frbr's model of bibliographic relationships offers benefits that will first impact the music collection.5 at present, however, musical formats pose search and retrieval challenges for most ils users, and the problem is certainly replicated with microforms and video recordings. the marc codes distinguish between material formats, but they support only one category for sound recordings, lumping together cd, dvd audio, cassette tape, reel-to-reel tape, and all other types.6 this single "sound recording" definition is easily reflected in opacs (such as those powered by innovative interfaces' millennium and ex libris' aleph 500) and union catalogs (such as worldcat.org).7 however, the distinction between sound recording formats is embedded in subfields of the 007 field, which presently cannot be indexed by many library automation systems because the subfields are not adjacent. an even more central challenge derives from the fact that music sound recordings—like journals and essay collections—contain within each item more than one work. thus, for one of the central material formats collected by a music library (as well as by a public library or other academic branches), users routinely find themselves searching for a distinct subset of the item record.
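the indexing gap described above is not for want of data: the 007 control field does record the carrier distinctions in fixed character positions, so the information is programmatically recoverable whenever a system exposes the raw field. the following minimal sketch (python) decodes a raw 007 string into a carrier label; it assumes position meanings as commonly documented for marc 21 sound recordings (verify the codes against the current specification), and the function name and sample values are hypothetical illustrations, not any vendor's indexing code.

    # decode the marc 21 007 physical-description field for sound recordings.
    # position 00 = category of material ('s' = sound recording);
    # position 01 = specific material designation; position 03 = speed.
    # codes below follow common marc 21 documentation; verify before use.
    SMD = {"d": "sound disc", "s": "sound cassette", "t": "sound-tape reel"}
    SPEED = {"b": "33 1/3 rpm (LP)", "f": "1.4 m/s (CD)"}

    def describe_sound_carrier(f007: str) -> str:
        """Turn a raw 007 string into a human-readable carrier label."""
        if not f007 or f007[0] != "s":
            return "not a sound recording"
        smd = SMD.get(f007[1], "other/unspecified carrier") if len(f007) > 1 else "unspecified carrier"
        speed = SPEED.get(f007[3]) if len(f007) > 3 else None
        return smd + (", " + speed if speed else "")

    # hypothetical 007 values for a compact disc and a cassette
    print(describe_sound_carrier("sd fsngnnmmned"))  # sound disc, 1.4 m/s (CD)
    print(describe_sound_carrier("ss lsnjlcmpnue"))  # sound cassette

a marc-processing toolkit such as pymarc can supply the raw 007 strings from bibliographic records; once the field is exposed, the decoding itself is trivial, which underscores that the barrier lies in the ils indexing layer rather than in the metadata.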
perversely, though music catalogers do tend to include analytic added-entries for the subparts of a cd recording or printed score, and major ils vendors are learning to index them, aacr2 guidelines set arbitrary cutoff points of about fifteen tracks on a sound recording and three performable units within a score.8 subsets of essay collections and journal runs are routinely exposed to users' searches by indexing and abstracting services and major databases, but subsets of libraries' music collections depend upon catalogers to exploit the marc records for user access.9 in light of these pervasive bibliographic relationships, catalogers of music (again, with parallels in other subjects) have developed a distinctive approach to the marc metadata schema. in particular, they—with their colleagues in literature, fine arts, and theology—rely upon the 700t field for uniform work titles, and upon careful authority control.10 however, once again, many major ils portals have spotty records in affording access to library collections via these data. innovative interfaces' millennium, though it clearly leads other major library products in this market, frequently frustrates music librarians (it is, of course, not alone in doing so).11 its automatic authority control feature works poorly with (necessary) music authority records.12 and even though innovative has been one of the first vendors to add a database index to the 700t field, partly in response to concerns expressed to the company by the music librarians' user group, millennium apparently does not allow for an appropriate level of follow-through on searching.13 an initial search by name of a major composer, for instance, yields a huge and cluttered result set containing all indexed 700t fields.14 the results do helpfully include the appropriate see also references, but those references disappear in a subsidiary (limited) search. in addition, the subsidiary display inexplicably changes to an unhelpful arrangement of generic 245 fields ("mozart, symphonies"; "mozart, operas, excerpts"). similar challenges will be faced by other parts of an academic or large public library collection, including the literature collections (for works such as shakespeare's plays), fine arts (for images and artists' works), and theology (for works whose uniform title is in latin). the opac interfaces of other major ils vendors fare little better. the same search (for "mozart") on the emory university library catalog (with an ils by sirsidynix) similarly yields a rich results set of more than one thousand records, and poses similar problems in refining the search.15 in the case of this opac, an index of 700t fields also exists, but it may be searched only from inside a single record; as with millennium, sirsidynix's interface will then group the next set of results confusingly by 245 fields.
the library corporation's carl-x apparently does not contain a 700t index; the simple "mozart" search returns a much-simplified set of only 97 results organized by 245a fields, and thus offers a more concise set of results but avoids the most incisive index for audio-visual materials.16 ex libris offers a somewhat more helpful display of its more restricted results; unfortunately for the present comparison, though the detailed results set does list the "format" of all mozart-authored items, the same term—"music"—is used for sound recordings, musical scores, and score excerpts, with no attempt logically to group the results around individual works.17 no 700t index appears present.

■ the frbr paradigm: review of literature and theory

from the earliest library catalogs in the modern age, the tools of bibliographic organization have sought to afford users both access to the collection and collocation of related materials. anglo-american cataloging practice has traditionally served the first function by main entries and alternate access points and the second function by classification systems. however, as knowledge increases in scope and complexity, the systems of bibliographic control have needed to evolve. as early as the 1950s, theories were developing that sought to distinguish between the intellectual content of a work and its often manifold physical embodiments.18 the 1961 paris international conference on cataloging principles first reified within the cataloging community a work-item distinction, though even the 1988 publication of the anglo-american cataloging rules, 2nd ed., "continued to demonstrate confusion about the nature . . . of works."19 meanwhile, extensive research into the nature of bibliographic relationships groped toward a consensus definition of the entity-types that could encompass such relationships.20 ed o'neill and diane vizine-goetz examined some one hundred editions of smollett's the expedition of humphry clinker over a two-hundred-year span of publication history to propose a hierarchical set of definitions to define entity levels.21 the theoretical entities include the intellectual content of a work—which, in the case of audio-visual works, may not even exist in any printed format—the various versions, editions, and printings in which that intellectual content manifests itself, and the specific copies of each manifestation which a library may hold.22 research has discovered such clusters of bibliographically related entities for as much as 50 percent or more of all the intellectual works in any given library catalog, and as many as 85 percent of the works in a music catalog.23 this work laid the foundation for frbr (and, once again, incidentally underscored the breadth of its applicability to, and beyond, music catalogs). the theoretical framework of frbr is most concisely set forth in the final report of the ifla study group. the long-awaited publication traces its genesis to the 1990 stockholm seminar and the resultant 1992 founding of the ifla study group on functional requirements for bibliographic records. the study group set out to develop:

a framework that identifies and clearly defines the entities of interest to users of bibliographic records, the attributes of each entity, and the types of relationships that operate between entities . . . a conceptual model that would serve as the basis for relating specific attributes and relationships . . . to the various tasks that users perform when consulting bibliographic records.
the study makes no a priori assumptions about the bibliographic record itself, either in terms of content or structure.24 in other words, the intention of the group's deliberations and the final report is to present a model for understanding bibliographic entities and the relationships between them to support information organization tools. it specifically adopts an approach that defines classes of entities based upon how users, rather than catalogers, approach bibliographic records—or, by natural extension, any system of metadata. the frbr hierarchical entities comprise a fourfold set of definitions:

■ work: "a distinct intellectual or artistic creation";
■ expression: "the intellectual or artistic realization of a work" in any combination of forms (including editions, arrangements, adaptations, translations, performances, etc.);
■ manifestation: "the physical embodiment of an expression of a work"; and
■ item: "a single exemplar of a manifestation."25

examples of these hierarchical levels abound in the bibliographic universe, but frequently music offers the quickest examples:

■ work: mozart's die zauberflöte (the magic flute)
■ work: puccini's la bohème
    ■ expression: the composer's complete musical score (1896)
        ■ manifestation: edition of the score printed by ricordi in 1897
    ■ expression: an english-language edition for piano and voices
    ■ expression: a performance by mirella freni, luciano pavarotti, and the berlin philharmonic orchestra (october 1972)
        ■ manifestation: a recording of this performance released on 33 1/3 rpm sound discs in 1972 by london records
        ■ manifestation: a re-release of the same performance on compact disc in 1987 by london records
            ■ item: the copy of the compact disc held by the columbus metropolitan library
            ■ item: the copy of the compact disc held by the university of cincinnati

in fact, lis research has tended to demonstrate what music librarians have always understood—that relatedness among items and complexity of families is most prevalent in audio-visual collections. even before the ifla report had been penned, sherry vellucci had set out the task: "to create new catalog structures that better serve the needs of the music user community, it is important first to understand the exact nature and complexity of the materials to be described in the catalog."26 even limiting herself to musical scores alone (that is, no recordings or monographs), vellucci found that more than 94.8 percent of her sample exhibited at least one bibliographic relationship with another entity in the collection; she further related this finding to the very "inherent nature of music, which requires performance for its aural realization," as opposed to, for example, monographic book printing.27 vellucci and others have frequently commented on how the relatedness of manifestations—in different formats, arrangements, and abridgements—of musical works continues to be a problem for information retrieval in the world of music bibliography.28

musical works have been variously and industriously described by musicologists and music bibliographers. yet, in the information retrieval domain [and, i might add, under both aacr and aacr2] . . . systems for bibliographic information retrieval . . . have been designed with the document as the key entity, and works have been dismissed as too abstract . . .29

the work is the access point many users will bring—in their minds, and thus in their queries—to a system.
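the hierarchy just enumerated maps naturally onto a nested data model. the sketch below (python) encodes the la bohème example; the class and attribute names are illustrative assumptions rather than any standard schema, but they show how collocation falls out of the structure once the relationships are made explicit.

    # a minimal data model of the frbr hierarchy, using the la bohème example
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Item:                 # a single exemplar of a manifestation
        holding_institution: str

    @dataclass
    class Manifestation:        # the physical embodiment of an expression
        description: str
        items: List[Item] = field(default_factory=list)

    @dataclass
    class Expression:           # a realization of the work (edition, performance, etc.)
        description: str
        manifestations: List[Manifestation] = field(default_factory=list)

    @dataclass
    class Work:                 # a distinct intellectual or artistic creation
        title: str
        creator: str
        expressions: List[Expression] = field(default_factory=list)

    boheme = Work(
        title="la bohème",
        creator="puccini",
        expressions=[
            Expression(
                "performance by freni, pavarotti, and the berlin philharmonic (october 1972)",
                manifestations=[
                    Manifestation("1972 london records 33 1/3 rpm sound discs"),
                    Manifestation(
                        "1987 london records compact disc re-release",
                        items=[Item("columbus metropolitan library"),
                               Item("university of cincinnati")],
                    ),
                ],
            )
        ],
    )

    # every holding of any version of the work is reachable from the work record
    for expr in boheme.expressions:
        for man in expr.manifestations:
            for it in man.items:
                print(boheme.title, "->", man.description, "->", it.holding_institution)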
users intend, however, to discover, identify, and obtain specific manifestations of that work. very recently, research has begun to demonstrate that the frbr model can offer specific advantages to music retrieval in cases such as these: "the description of bibliographic data in a frbr-based database leads to less redundancy and a clearer presentation of the relationships which are implicit in the traditional databases found in libraries today."30 explorations of the theory in view of the benefits to other disciplines, such as audio-visual and other graphic materials, maps, oral literature, and rare books, have appeared in the literature as well.31 the admitted weakness of the frbr theory, of course, is that it remains a theory at its inception, with still precious few working applications.

■ frbr applications

working implementations of frbr in catalogs, opacs, and ilss are still relatively few but promise much for the future. the frbr theoretical framework has remained an area of intense research at oclc, which has even led to some prototype applications and, very recently, deployment in the worldcat local interface.32 a scattered few other researchers have crafted frbr catalogs and catalog displays for their own ends; the library of congress has a prototype as well. innovative, the leading academic ils vendor, announced a frbr feature for 2005 release, yet shelved the project for lack of a beta-testing partner library.33 ex libris' primo discovery tool, one other complete ils (by visionary technologies for library systems, or vtls), and the national library of australia have each deployed operational frbr applications.34 the number of projects testifies to the high level of interest among the cataloging and information science communities, while the relatively small number of successful applications testifies to the difficulties faced. oclc has engaged in a number of research projects and prototypes in order to explore ways that frbrization of bibliographic records could enhance information access. oclc research frequently notes the potential streamlining of library cataloging by frbrization; in addition, they have experienced "superior presentation" and "more intuitive clustering" of search results when the model is incorporated into systems.35 work-level definitions stand behind such oclc research prototypes as audience level, dewey browser, fictionfinder, xisbn, and live search. in every case, researchers determined that, though it was very difficult to automate any identification of expressions, application of work-level categories both simplifies and improves search result sets.36 an algorithm common to several of these applications is freely available as an open source application, and now as a public interface option in oclc's worldcat local.37 the algorithm creates an author/title key to cluster worksets (often at a higher level than the frbr work, as in the case of the two distinct works that are the book and screenplay for gone with the wind). in the public search interface, the results sets may be grouped at the work level; users may then execute a more granular search for "all editions," an option that then displays the group of expressions linked to the work record.
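the clustering idea behind these prototypes is simple enough to sketch. the following (python) is a deliberately simplified approximation for illustration only, not oclc's published algorithm; the normalization rules and the sample records are assumptions.

    # cluster bibliographic records into worksets by a normalized author/title
    # key, approximating the approach described above: author from the 1xx
    # field, title from a uniform title (130/240) when present, else the 245
    import re
    from collections import defaultdict

    def normalize(s: str) -> str:
        """lowercase, strip punctuation, collapse whitespace"""
        return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", s.lower())).strip()

    def workset_key(record: dict) -> str:
        author = record.get("100", "")
        title = record.get("130") or record.get("240") or record.get("245", "")
        return normalize(author) + "/" + normalize(title)

    records = [  # hypothetical, heavily simplified marc records
        {"100": "Mozart, Wolfgang Amadeus", "240": "Zauberflote", "245": "The magic flute"},
        {"100": "Mozart, Wolfgang Amadeus", "240": "Zauberflote", "245": "Die Zauberflote [sound recording]"},
        {"100": "Mozart, Wolfgang Amadeus", "245": "Symphonies nos. 40-41"},
    ]

    worksets = defaultdict(list)
    for rec in records:
        worksets[workset_key(rec)].append(rec)

    for key, cluster in worksets.items():
        print(key, "->", len(cluster), "record(s)")  # the two zauberflote records cluster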
unfortunately, as the software does not use 700t fields (its intention is to travel up the entity hierarchy, and it uses the 1xx, 24x, and 130 fields), its usefulness in solving the above challenges may not be immediate. a somewhat similar application (though merrilee proffitt declares it not to be a frbr product) was redlightgreen, a user interface for the ex-rlg union catalog based upon quasi-frbr clustering.38 the reports from designers of other automated systems offer interesting commentaries on the process. the team building an automatically frbrized database and user interface for austlit—a new union collection of australian literature among eight academic libraries and the national library of australia—acknowledged some difficulty with non-monographic works such as poems, though the majority of their database consisted of simpler work-manifestation pairs.39 based on strongly positive user feedback ("the presentation of information about related works [is] both useful and comprehensible"), a similar application was attempted on the australian national music gateway musicaustralia; it is unclear whether the project was shelved due to difficulties in automating the frbrization process.40 one recent application created for the perseus digital library adopts a somewhat different approach.41 rather than altering previously created marc records to allow hierarchical relationships to surface, this team created new records using crosswalks between marc and, for instance, mods, for work-level records. they claim some moderate level of success, though once again their discussion of the process is more illuminating than their product. mimno and crane successfully allowed a single manifestation-level record to link upwards to many expressions, a necessary analytic feature especially for dealing with sound recordings. they also demonstrated in practice the difficulty of searching elements from different levels of the hierarchy at the same time (such as work title and translator), a complication predicted by yee.42 three ils vendors have released products that use the frbr model: portia (visualcat), ex libris (primo), and vtls (virtua).43 the first product, a cataloging utility from a smaller player in the vendor market, claims to incorporate frbr into its metadata capture, yet the information available does not explain how, nor does it offer an opac to exploit it. the 2007 release of ex libris' primo offers what the company calls "frbr groupings" of results.44 this discovery tool is not itself an ils, but promises to interoperate with major existing ils products to consolidate search results. it remains unclear at this time how ex libris' "standard frbr algorithms" actually group records; the single deployment at the danish royal library allows searching for more records with the same title, for instance, but does not distinguish between translations of the same work.45 vtls, on the other hand, has since 2004 offered a complete product that has the potential to modify existing marc records—via local linking tags in the 001 and 004 fields—to create frbr relationships.46 their own studies agreed with oclc that a subset, roughly 18 percent, of existing catalog records (most heavily concentrated in music collections) would benefit from the process, and they thus allow for "mixed" catalogs, with only subsets (or even individually selected records) to be frbrized.
the company's own information suggests relatively simple implementation by library catalogers, coupled with robust functionality for users, and may be the leading edge of the next generation of catalog products.

■ frbr solutions

the ifla study group, following its user-centered approach, set out a list of specific tasks that users of a computer-aided catalog should be able to accomplish:

■ to find all manifestations embodying certain criteria, or to find a specific manifestation given identifying information about it;
■ to identify a work, and to identify expressions and manifestations of that work;
■ to select among works, among expressions, and among manifestations; and
■ to obtain a particular manifestation once selected.

it seems clear that the frbr model offers a framework of relationships that can aid each task. unfortunately, none of the currently available commercial solutions may be in themselves completely applicable for a single library. the oclc work-set algorithm is open source, as well as easily available through worldcat local, but it works only to create super-work records; it also ignores the 700t field so crucial to many of the issues noted above. none of the other home-grown applications may have code available to an institution. the virtua module from vtls offers a very tempting solution, but may require a change of vendor.47 either adapting one of these solutions or designing a local application, then, raises the question: what would the ideal system entail? catalog frbrization will transpire in two segments: enhancing the existing catalog so that bibliographic relationships surface in the retrieval phase, and designing or adapting a new interface and display to reflect the relationships.48 the first task may prove the more formidable, due to the size of even a modest catalog database and the difficulties often observed in automating such a task; while the librarians constructing the austlit system found a relatively high percentage of records could be transferred en masse, the oclc research team had difficulty automatically pinpointing expressions from current marc records.49 despite current technology trends toward users' application of tags, reviews, and other metadata, a task as specialized as adding bibliographic relationships to the catalog demands specialized cataloging professionals.50 the best approach within a current library structure may be to create a single new position to head the project and to act as liaison with cataloging staff in the various branches and with vendor staff, if applicable. each library branch may judge on its own the proportions of records to frbrize, beginning with high-traffic works and authors, those for whom search results tend to be the most overwhelming and confusing to users. each branch can be responsible for allocation of cataloging staff effort to the process, and will thus have specialist oversight of subsets of the database. three technical solutions to actually changing the database structure have been attempted in the literature to date: incrementally improving the existing marc records to better reflect bibliographic relationships, adding local linking tags, and simply creating new metadata schemas; the second option is sketched below.
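the linking-tag option can be made concrete with a short sketch (python). the record structures below are simplified assumptions modeled loosely on the vtls mechanism discussed next, in which each record carries its own identifier in the 001 field and a child record points to its parent's identifier through the 004 field; real field semantics are vendor-specific.

    # link records into a frbr hierarchy with local linking tags: 001 holds a
    # record's own identifier, 004 the identifier of its parent record.
    # illustrative only; an actual ils defines its own linking conventions.
    records = [
        {"001": "w001", "level": "work", "title": "la bohème / puccini"},
        {"001": "e001", "004": "w001", "level": "expression",
         "title": "freni/pavarotti/berlin philharmonic performance, 1972"},
        {"001": "m001", "004": "e001", "level": "manifestation",
         "title": "london records compact disc, 1987"},
    ]

    by_parent = {}
    for rec in records:
        by_parent.setdefault(rec.get("004"), []).append(rec)

    def print_tree(parent_id=None, indent=0):
        """walk the 004 -> 001 links and render the hierarchy as a tree"""
        for rec in by_parent.get(parent_id, []):
            print(" " * indent + rec["level"] + ": " + rec["title"])
            print_tree(rec["001"], indent + 2)

    print_tree()

the same parent links support the tree-style result display recommended below, since the hierarchy can be walked upward or downward from any record.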
the vtls solution of adding local linking tags seems most appropriate; relationships between records are created and maintained via unique identifiers and linking statements in the 001 and 004 fields.51 oclc's open source software could expedite the creation of work-level records, and the creation of expression-level records will be made easier by the large amount of bibliographic information already present in the current catalog. wherever possible, cataloging staff also should take the opportunity to verify or create links to authority files so as to enhance retrieval.52 creating a new catalog display option could be accomplished via additions to current opac coding, either by adopting worldcat local or by designing parts of a new local interface. it need not even require a complete revision; the single site (ucl) currently deploying vtls' frbrized interface maintains a mixed catalog and offers, once again, a highly intuitive model.53 when a searcher comes across a bibliographic record for which frbr linking is available, they may click a link to open a new display screen. we should strive, however, to use simple interface statements such as "view all different kinds of holdings," "this work has x editions, in y languages," or "this version of the work has been published z times" (both the oclc prototype and the austlit gateway offer such helpful and user-friendly statements). though the foundational work of both tillett and smiraglia focused upon taxonomies of relationships, the hierarchical structure of the ifla proposal should remain at the forefront of the display, with a secondary organization by type of relationship or type of entity. rather than adopting a design that automatically refreshes at each click, a tree organization of the display should be more user-friendly, allowing users to maintain a visual sense of the organization they are encountering (see appendix for screenshots of this type of tree display).54 format information should be included in the display, as an indication of a user's primary category, as well as a distinction among expressions of a work. with these changes, the library catalog will begin to afford its users better access to many of its core collections. frbrization of even part of the catalog—concentrating on high-incidence authors, as identified by subject specialists—will allow it better to reflect, and collocate, items within the families of bibliographic relationships that have been acknowledged a part of library collections for decades. this increased collocation will begin to counteract the pitfalls of mere keyword searching on the part of users, especially in conjunction with renewed authority work. finally, frbr offers a display option in a revamped opac that is at the same time simpler than current result lists and more elegant in its reflection of relatedness among items. each feature should better enable the users of our catalog to find, select, and obtain appropriate resources, and will bring our libraries into the next generation of cataloging practice.

references and notes
1. ifla committee on the functional requirements for bibliographic records, final report (munich: k. g. saur, 1998); see also http://www.ifla.org/vii/s13/wgfrbr/bibliography.htm (accessed mar. 10, 2007).
my thanks to jennifer hambrick, nancy lensenmayer, and joan lippincott for their helpful comments on earlier drafts. the curricular assignment asked for a library automation proposal in a specific library setting; the original review contained a set of recommendations concerning frbr through the lens of a (fictional) medium-sized academic library system, that of st. hildegard of bingen catholic university. as will be noted below, the branch music library typically serves a small population of music majors (graduate and undergraduate) within such an institution, but also a large portion of the student body that uses the library’s collection to support their music coursework and arts distribution requirements. any music library’s proportion of the overall system’s holdings may be relatively small, but will include materials in a diverse set of formats: monographs, serials, musical scores, sound recordings in several formats (cassette tapes, lps, cds, and streaming audio files), and a growing collection of video recordings, likewise in several formats (vhs, laser discs, and dvd). it thus offers an early test case for difficulties with an automated library system.
3. dan zager, “collection development and management,” notes—quarterly journal of the music library association 56, no. 3 (march 2000): 569.
4. sherry l. vellucci, “music metadata and authority control in an international context,” notes—quarterly journal of the music library association 57, no. 3 (mar. 2001): 541.
5. the opac of the university of huddersfield library system famously first deployed a search option for related items (“did you mean . . . ?”); http://www.hud.ac.uk/cls (accessed july 10, 2007). frbr not only offers the related-item search, but also logically groups related works throughout the library catalog.
6. allyson carlyle demonstrated empirically that users value an object’s format as one of the first distinguishing features: “user categorization of works: toward improved organization of online catalog displays,” journal of documentation 55, no. 2 (mar. 1999): 184–208, at 197.
7. millennium will feature heavily in the following discussion, both because of its position leading the academic library automation market (being adopted wholesale by, for instance, the ohio statewide academic library consortium) and because it was the subject of the original paper.
8. see alastair boyd, “the worst of both worlds: how old rules and new interfaces hinder access to music,” caml review 33, no. 3 (nov. 2005), http://www.yorku.ca/caml/review/33-3/both_worlds.htm (accessed mar. 12, 2007); michael gorman and paul w. winkler, eds., anglo-american cataloging rules, 2nd ed. (chicago: ala, 1988).
9. in the past few years, a small subset of the search literature has described technical efforts to develop search engines that can query by musical example; see j. stephen downie, “the scientific evaluation of music information retrieval systems: foundations and future,” computer music journal 28, no. 2 (summer 2004): 12–23. a company called melodis corporation has recently announced a successful launch of a query-by-humming search engine, though a verdict from the music community is still out; http://www.midomi.com (accessed jan. 31, 2007).
10. see vellucci, “music metadata and authority control in an international context”; richard p. smiraglia, “uniform titles for music: an exercise in collocating works,” cataloging and classification quarterly 9, no. 3 (1989): 97–114;
steven h. wright, “music librarianship at the turn of the century: technology,” notes—quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97. each author builds upon the foundational work of barbara tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss., university of california at los angeles, 1987).
11. “at conferences, [my colleagues] are always groaning if they are a voyager client,” interview with an academic music librarian by the author, feb. 9, 2007.
12. several prominent music librarians discovered that innovative’s system had such a feature only when they found instances of the automatic system changing carefully crafted music authority records; mark sharff (washington university in st. louis) and deborah pierce (university of washington), postings to the innovative music users’ group electronic discussion list, oct. 6, 2006, archive accessed feb. 1, 2007.
13. music librarians are the only subset of millennium users to have formed their own innovative users’ group. sirsidynix has a separate users’ group for stm librarians, and ex libris hosts a law librarians’ users’ group, two other groups whose interaction with the ils poses discipline-specific challenges.
14. searches were tested on the ohio state university libraries’ opac, http://library.osu.edu (accessed mar. 10, 2007).
15. http://www.emory.edu/libraries.cfm (accessed june 27, 2007).
16. searches performed on the library of oklahoma state university, http://www.library.okstate.edu (accessed june 27, 2007); tlc has considered making frbrization a possible feature of its product. it offers some concatenation of “intellectually similar bibliographic records,” and “tlc continues to monitor emerging frbr standards”; don kaiser, personal communication to the author, july 8, 2007. i was unable to reach representatives of sirsidynix on this issue.
17. searches performed on the mit library catalog, powered by aleph 500, http://libraries.mit.edu (accessed june 27, 2007).
18. eva verona, “literary unit versus bibliographic unit [1959],” in foundations of descriptive cataloging, ed. michael carpenter and elaine svenonius, 155–75 (littleton, colo.: libraries unlimited, 1985), and seymour lubetzky, principles of cataloging, final report phase i: descriptive cataloging (los angeles: institute for library research, 1969), are usually credited with the foundational work on such theories; see richard p. smiraglia, the nature of “a work”: implications for the organization of knowledge (lanham, md.: scarecrow, 2001), 15–33, to whom the following overview is indebted.
19. anglo-american cataloging rules, cited in smiraglia, the nature of “a work,” 33.
20. among the many library and information science thinkers contributing to this body of research, the most prominent have been patrick wilson, “the second objective,” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 5–16 (san diego: academic publ., 1989); edward t. o’neill and diane vizine-goetz, “bibliographic relationships: implications for the function of the catalog,” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 167–79 (san diego: academic publ., 1989); barbara ann tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging”
(ph.d. diss., university of california, los angeles, 1987); eadem, “bibliographic relationships,” in relationships in the organization of knowledge, ed. carol a. bean and rebecca green, 19–35 (dordrecht: kluwer, 2001) (a summary of her dissertation findings appears on 19–20); martha m. yee, “manifestations and near-equivalents: theory with special attention to moving-image materials,” library resources and technical services 38, no. 3 (1994): 227–55.
21. o’neill and vizine-goetz, “bibliographic relationships”; see also edward t. o’neill, “frbr: application of the entity-relationship model to humphrey clinker,” library resources and technical services 46, no. 4 (oct. 2002): 150–59.
22. theorists in music semiotics who have more or less profoundly influenced music librarians’ view of their materials include jean-jacques nattiez, music and discourse: toward a semiology of music, trans. carolyn abbate (princeton, n.j.: princeton univ. pr., 1990), and lydia goehr, the imaginary museum of musical works (new york: oxford univ. pr., 1992). see also smiraglia, the nature of “a work,” 64. for a concise overview of how semiotic theory has influenced thinking about literary texts, see w. c. greetham, theories of the text (oxford: oxford univ. pr., 1999), 276–325.
23. studies have found families of derivative bibliographic relationships in 30.2 percent of all worldcat records, 49.9 percent of records in the catalog of georgetown university library, 52.9 percent in the burke theological library (union theological seminary), 57.9 percent of theological works in the new york university library, and 85.4 percent in the sibley music library at the eastman school of music (university of rochester). see smiraglia, the nature of “a work,” 87, who cites richard p. smiraglia and gregory h. leazer, “derivative bibliographic relationships: the work relationship in a global bibliographic database,” journal of the american society for information science 50 (1999): 493–504; richard p. smiraglia, “authority control and the extent of derivative bibliographic relationships” (ph.d. diss., university of chicago, 1992); richard p. smiraglia, “derivative bibliographic relationships among theological works,” proceedings of the 62nd annual meeting of the american society for information science (medford, n.j.: information today, 1999): 497–506; and sherry l. vellucci, “bibliographic relationships among musical bibliographic entities: a conceptual analysis of music represented in a library catalog with a taxonomy of the relationships” (d.l.s. diss., columbia university, 1994).
24. ifla, final report, 2–3.
25. ibid., 16–23.
26. sherry l. vellucci, bibliographic relationships in music catalogs (lanham, md.: scarecrow, 1997), 1.
27. ibid., 238, 251.
28. vellucci, “music metadata”; richard p. smiraglia, “musical works and information retrieval,” notes: quarterly journal of the music library association 58, no. 4 (june 2002). patrick le boeuf notes that users of music collections often use the single word “score” to indicate any one of the four frbr entities; “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco,” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 103–23, at 105–06 (new york: haworth, 2005).
29. smiraglia, “musical works and information retrieval,” 2.
30. marte brenne, “storage and retrieval of musical documents in a frbr-based library catalogue” (master’s thesis, oslo university college, 2004), 79.
see also john anderies, “enhancing library catalogs for music,” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; powerpoint presentation accessed mar. 12, 2007, from http://academics.hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt; boyd, “the worst of both worlds.”
31. see the extensive bibliography compiled by ifla, cataloging division: “frbr bibliography,” http://www.ifla.org/vii/s13/wgfrbr/bibliography.htm (accessed mar. 10, 2007).
32. the first ils deployment of the worldcat local application using frbr is with the university of washington libraries: http://www.lib.washington.edu (accessed june 27, 2007).
33. innovative interfaces, inc., “millennium 2005 preview: frbr support,” inn-touch (june 2004), 9. interestingly, the one-page advertisement for the new service chose a musical work, puccini’s opera la bohème, to illustrate how the sorting would work. innovative interfaces booth staff at the ala national conference, washington, d.c., june 24, 2007, told the author the company has moved in a different development direction now (investing more heavily in faceted browsing).
34. denmark’s det kongelige bibliotek has been the first ex libris partner library to deploy primo, http://www.kb.dk/en (accessed july 10, 2007). the vtls system has been operating since 2004 at the université catholique de louvain, http://www.bib.ucl.ac.be (accessed mar. 15, 2007). for austlit, see http://www.austlit.edu.au (accessed mar. 14, 2007).
35. rick bennett, brian f. lavoie, and edward t. o’neill, “the concept of a work in worldcat: an application of frbr,” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. work-level records allow manifestation and item records to inherit labor-intensive subject classification metadata; eric childress, “frbr and oclc research,” paper presented at the university of north carolina-chapel hill, apr. 10, 2006, http://www.oclc.org/research/presentations/childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007).
36. thomas b. hickey, edward t. o’neill, and jenny toves, “experiments with the ifla functional requirements for bibliographic records (frbr),” d-lib 8, no. 9 (sept. 2002), http://www.dlib.org/dlib/september02/hickey/09hickey.html (accessed mar. 12, 2007).
37. thomas b. hickey and jenny toves, “frbr work-set algorithm,” apr. 2005 report, http://www.oclc.org/research/projects/frbr/default.htm (accessed mar. 12, 2007); algorithm available at http://www.oclc.org/research/projects/frbr/algorithm.htm. on worldcat local, see above, note 32.
38. merrilee proffitt, “redlightgreen: frbr between a rock and a hard place,” http://www.ala.org/ala/alcts/alctsconted/presentations/proffitt.pdf (accessed mar. 12, 2007). redlightgreen has been discontinued, and some of its technology has been incorporated into worldcat local.
39. http://www.austlit.edu.au (accessed mar. 14, 2007); unfortunately it is a subscription database at this time, and thus unavailable for operational comparison. see marie-louise ayres, “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia,” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54, http://www.nla.gov.au/nla/staffpaper/2005/ayres1.html (accessed mar. 12, 2007).
40. ibid.
41. see david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib 11, no. 10 (oct. 2005), http://www.dlib.org/dlib/october05/crane/10crane.html (accessed mar. 12, 2007).
42. ibid. see also martha m. yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 3 (2005): 77–95, http://repositories.cdlib.org/postprints/715 (accessed mar. 12, 2007).
43. portia, “visualcat overview,” http://www.portia.dk/pubs/visualcat/present/visualcatoverview20050607.pdf (accessed mar. 14, 2007); vtls, inc., “virtua,” http://www.vtls.com/brochures/virtua.pdf (accessed mar. 14, 2007).
44. http://www.exlibrisgroup.com/primo_orig.htm (accessed july 10, 2007).
45. syed ahmed, personal communication to the author, july 10, 2007; searches run july 10, 2007, on http://www.kb.dk/en. the library’s holdings of manifestations of mozart’s singspiel opera the magic flute run to four different groupings on this catalog: one under the title “die zauberflöte,” one under the title “la flute enchantée: opéra fantastique en 4 actes,” and two separate groups under the title “tryllefløtjen.”
46. “vtls announces first production use of frbr,” http://www.vtls.com/corporate/releases/2004/6.shtml (accessed mar. 14, 2007). unfortunately, though this press release indicates commitments on the part of the université catholique de louvain and vaughan public libraries (ontario, canada) to use fully frbrized catalogs, only the first is operating in this mode as of july 2007, and with only a subset of its catalog adapted.
47. virtua is not interoperable, for instance, with any of innovative’s other ils modules, which continue to dominate a number of larger academic consortia; john espley, vtls inc. director of design, personal communication to the author, mar. 15, 2007.
48. see allyson carlyle, “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays,” library resources and technical services 41, no. 2 (1997): 79–100.
49. even at the work level, yee distinguished fully eight different places in a marc record in which the identity of a work may be located; “frbrization,” 79–80.
50. gregory leazer and richard p. smiraglia imply that cataloger-based “maps” of bibliographic relationships are inadequate; “bibliographic families in the library catalog: a qualitative analysis and grounded theory,” library resources and technical services 43, no. 4 (1999): 191–212. the cataloging failures they describe, however, are more a result of inadequacies in the current rules and practice, and do not really prove that catalogers have failed in the task of creating useful systems.
51. vinood chacra and john espley, “differentiating libraries through enriched user searching: frbr as the next dimension in meaningful information retrieval,” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007).
52. see yee, “frbrization.”
53. http://www.bib.ucl.ac.be (accessed mar. 15, 2007).
54. not only does the ex libris primo application need clickthroughs, it creates a new window for an extra step before presenting a new group of records.

bibliography

anderies, john. “enhancing library catalogs for music.” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; http://academics.hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt (accessed mar. 12, 2007).
ayres, marie-louise. “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia.” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54; http://www.nla.gov.au/nla/staffpaper/2005/ayres1.html (accessed mar. 12, 2007).
bennett, rick, brian f. lavoie, and edward t. o’neill. “the concept of a work in worldcat: an application of frbr.” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60.
boyd, alastair. “the worst of both worlds: how old rules and new interfaces hinder access to music.” caml review 33, no. 3 (nov. 2005); http://www.yorku.ca/caml/review/33-3/both_worlds.htm (accessed mar. 12, 2007).
brenne, marte. “storage and retrieval of musical documents in a frbr-based library catalogue.” master’s thesis, oslo university college, 2004.
carlyle, allyson. “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays.” library resources and technical services 41, no. 2 (1997): 79–100.
______. “user categorization of works: toward improved organization of online catalog displays.” journal of documentation 55, no. 2 (mar. 1999): 184–208.
chacra, vinood, and john espley. “differentiating libraries through enriched user searching: frbr as the next dimension in meaningful information retrieval.” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007).
childress, eric. “frbr and oclc research.” paper presented at the university of north carolina-chapel hill, apr. 10, 2006; http://www.oclc.org/research/presentations/childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007).
hickey, thomas b., and edward o’neill. “frbrizing oclc’s worldcat.” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 239–51. new york: haworth, 2005.
hickey, thomas b., and jenny toves. “frbr work-set algorithm.” apr. 2005 report; http://www.oclc.org/research/frbr (accessed mar. 12, 2007).
hickey, thomas b., edward t. o’neill, and jenny toves. “experiments with the ifla functional requirements for bibliographic records (frbr).” d-lib 8, no. 9 (sept. 2002); http://www.dlib.org/dlib/september02/hickey/09hickey.html (accessed mar. 12, 2007).
ifla study group on the functional requirements for bibliographic records. functional requirements for bibliographic records: final report. munich: k. g. saur, 1998.
layne, sara shatford. “subject access to art images.” in introduction to art image access: issues, tools, standards, strategies, ed. murtha baca, 1–18. los angeles: getty research institute, 2002.
leazer, gregory, and richard p. smiraglia. “bibliographic families in the library catalog: a qualitative analysis and grounded theory.” library resources and technical services 43, no. 4 (1999): 191–212.
le boeuf, patrick. “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco.” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 103–23. new york: haworth, 2005.
markey, karen. subject access to visual resources collections: a model for computer construction of thematic catalogs. new york: greenwood, 1986.
mimno, david, and gregory crane. “hierarchical catalog records: implementing a frbr catalog.” d-lib 11, no. 10 (oct. 2005); http://www.dlib.org/dlib/october05/crane/10crane.html (accessed mar. 12, 2007).
o’neill, edward t. “frbr: application of the entity-relationship model to humphrey clinker.” library resources and technical services 46, no. 4 (oct. 2002): 150–59.
o’neill, edward t., and diane vizine-goetz. “bibliographic relationships: implications for the function of the catalog.” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 167–79. san diego: academic publ., 1989.
proffitt, merrilee. “redlightgreen: frbr between a rock and a hard place.” paper presented at the 2004 ala annual conference, orlando, fla.; http://www.ala.org/ala/alcts/alctsconted/presentations/proffitt.pdf (accessed mar. 12, 2007).
smiraglia, richard p. bibliographic control of music, 1897–2000. lanham, md.: scarecrow and music library association, 2006.
______. “content metadata: an analysis of etruscan artifacts in a museum of archaeology.” cataloging and classification quarterly 40, no. 3/4 (2005): 135–51.
______. “musical works and information retrieval.” notes: quarterly journal of the music library association 58, no. 4 (june 2002): 747–64.
______. the nature of “a work”: implications for the organization of knowledge. lanham, md.: scarecrow, 2001.
______. “uniform titles for music: an exercise in collocating works.” cataloging and classification quarterly 9, no. 3 (1989): 97–114.
tillett, barbara ann. “bibliographic relationships.” in relationships in the organization of knowledge, ed. carol a. bean and rebecca green, 19–35. dordrecht: kluwer, 2001.
vellucci, sherry l. bibliographic relationships in music catalogs. lanham, md.: scarecrow, 1997.
______. “music metadata and authority control in an international context.” notes—quarterly journal of the music library association 57, no. 3 (mar. 2001): 541–54.
wilson, patrick. “the second objective.” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 5–16. san diego: academic publ., 1989.
wright, h. s. “music librarianship at the turn of the century: technology.” notes: quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97.
yee, martha m. “frbrization: a method for turning online public finding lists into online public catalogs.” information technology and libraries 24, no. 3 (2005): 77–95; http://repositories.cdlib.org/postprints/713 (accessed mar. 12, 2007).
______. “manifestations and near-equivalents: theory with special attention to moving-image materials.” library resources and technical services 38, no. 3 (1994): 227–55.
zager, daniel. “collection development and management.” notes: quarterly journal of the music library association 56, no. 3 (2000): 567–73.

appendix: examples of a frbrized tree display

a search on also sprach zarathustra on the online public access catalog of the université catholique de louvain, with results frbrized (a vtls opac). selecting the first work yields a screen which, when frbrized, yields a list of expressions. any part of the tree may be expanded to display manifestations, and item-level records follow.

president’s column
mark beatty

welcome to my first ital president’s column. each president only gets a year to do these columns, so expectations must be low all around. my hope is to stimulate some thinking and conversation that results in lita members’ ideas being exchanged, and to create real opportunities to implement those ideas.
my first column i thought i would keep short and sweet, discussing just a few of the ideas that have been rattling around in my head since the 2007 midwinter lita town meeting, ideas that have been enhanced by a number of discussions among librarians over the last six months. with any luck, these thoughts might have some bearing on what any of those ideas could mean to our organization. first off, i don’t think i can express how weird this whole presidential appellation is to me. i am extremely proud to be associated with lita, and honored and surprised at being elected. i come from a consortia environment and an extremely flat organization. solving problems is often a matter of throwing all the parties in a room together and hashing it out until solutions are arrived at. i’ve been a training librarian for quite a while now, and pragmatic approaches to problem solving are my central focus. i’m a consortia wrangler, a trainer, and a technology pusher, and i hope my approach is, and will be, to listen hard and then see what can be accomplished. so in my own way, i find being president kind of on the embarrassing side. it’s like not knowing what to do with your hands when you’re speaking in public.

at the lita town meeting (http://litablog.org/2007/06/17/lita-town-meeting-2007-report/) it was pretty obvious that members want community in all its various forms, face-to-face in multiple venues and online in multiple venues. it’s also pretty obvious from the studies done by pew internet and american life and by oclc that our users, and in particular our younger users, really want community. the web 2.0 and library 2.0 movements are responses to that desire. as a somewhat flippant observation, we spent a generation educating our kids to work in groups, and now we shouldn’t be surprised that they want to work and play in groups. many of us work effectively in collaborative groups every day. we find it exciting, productive, and even fun. it’s an environment that we would like to create for our patrons, in-house and virtually. it’s what we would like to see in our association.

having been to every single top tech trends program and listened to the lita trendsters, one theme that often comes up is that complaining about the systems our vendors deliver can at times be pointless, because they simply deliver what we ask for. there is of course a corollary to this. once a system is in the marketplace, adding functionality often becomes centered on the low-hanging fruit. as a fictitious example, a vendor might easily add the ability to change the colors of the display for the patron, but adding a shelf list browse might take serious coding to create. so through discussions and rfps, we ask for and get the pretty colors while the browsing function waits, a form of procrastination. so then does innovation come only when all the low-hanging fruit has finally been plucked, and there’s nothing else to procrastinate on? as social organizations (libraries, ala, lita, and other groups), it appears that we have plucked all the low-hanging fruit of web 1.0. e-mail and static web pages have been done to death. as a pragmatist, what concerns me most is implementation. what delivery systems should and can we adopt and develop to fulfill the promise of services we’d like? can we ensure that barriers to participation are either eliminated or so low as to include everyone?
i like to think that web 2.0 is innovation toward mirroring how we personally want to work and play and how we want our social structures to perform. so how can we make lita mirror how we want to work and play? i do know it’s not just making everything a wiki. mark beatty (mbeatty@wils.wisc.edu) is lita president 2007/2008 and trainer, wisconsin library services, madison.

editorial: first have something to say
john webb

i think that writing editorials in my job as the new editor of information technology and libraries (ital) is going to be a real piece of cake. all i have to do, dear readers, is to quote (with proper attribution) walt crawford, the title of whose book i repeat as the title of this, my inaugural editorial.1 and then quote other sages of our profession, using only as many of their words as is fitting and proper to make my editorials relevant to the concerns of our membership and readers, and as few of my own words as i can, to repay the confidence that the library and information technology association (lita) has placed in me, and to avoid muddling the ideas of those to whom i shall be indebted. those of you reading this will note that i have already fallen prey to the conceit of all scholarly journal editors: that their readers, of course, after surveying the tables of contents, dive wide-eyed first into the editorials. of course. to paraphrase a technologist of an earlier era, “when in the course of human events, it becomes necessary for” a new editor to take on the responsibility for the stewardship of ital, “a decent respect to the opinions of mankind requires that” he “should declare the causes which impel” him to accept that responsibility and, further, to write editorials. i quote, of course, from the first paragraph of the declaration of independence adopted by the “thirteen united states of america” july 4, 1776. in this, my first editorial, i, too, shall put forth for the examination of the members of lita and the readers of ital my goals and hopes for the journal that i am now honored to lead. these goals and hopes are shared by the members of the ital editorial board, whose names appear in the masthead of this journal.

ital is a double-blind refereed journal that currently has a manuscript acceptance rate of 50 percent. it began in 1968 as the journal of library automation (jola), the journal of the information science and automation division (isad) of ala, and its first editor was fred kilgour. in 1978 isad became lita, and in 1982 the journal title was changed to reflect the expanding role of information technology in libraries, an expansion that continues to accelerate; ital is no longer the only professional journal within ala whose pages are dominated by our accelerating use of information technologies, as tools to manage the services we provide our users and as tools we use ourselves to accomplish our daily duties.

i write part of this editorial in the skies over the middle section of the united states as i return home from the seventh national lita forum held in st. louis, october 7–10. at the forum, i heard presentations, visited poster sessions, and talked with colleagues from forty-four states and six countries who had something to say and said it well. i hope that some of them may submit manuscripts to ital so that all the members of lita and all the readers of the journal will profit as well from some of what the attendees of the forum heard and saw.
i attended the forum forewarned by previous ital editors to carry plenty of business cards, and i went armed with a pocketful. i think i distributed enough that, if pieced together, their blank sides would provide sufficient writing space for at least one manuscript! in an attempt to fulfill the jeffersonian promise above, i hereby list a few of my goals for the beginning of my term as editor. i must emphasize that these goals of mine supplement but do not supplant the purposes of the journal as stated on the first page and on the ital web site (www.ala.org/lita/litapublications/ital/italinformation.htm); likewise, they do not supplant the goals of my predecessors. in no particular order:

i hope to increase the number of manuscripts received from our library and information schools. their faculty and doctoral students are some of the incubators of new and exciting information technologies that may bear fruit for future library users. however, not all research turns up maps on which “x marks the spot.” exploration is interesting, even vital, for the journey, for the search itself, and our graduate faculties and students have something to say.

i hope to increase the submission of manuscripts that describe relevant sponsored research. in the earlier volumes, jola had an average of at least one article per issue, maybe more, describing the results of funded research. ital can and should be a source that information-technology researchers consider as a vehicle for the publication of their results. two articles in this issue result from sponsored research.

in fact, i hope to increase the number of manuscripts that describe any relevant research or cutting-edge developments. much of the exploration undertaken by librarians improving and strengthening their services involves research or problems solved on both small scales and large. neither the officers of lita, the referees, the readers, nor i are interested in very many “how i run my library good” articles. we all want to read a statement of the problem(s), the hypotheses developed to explore the issues surrounding the problem(s), the research methods, the results, the assessment of the outcomes, and, when feasible, a synthesis of how the research methods or results may be generalized.

i hope to increase the number of articles with multiple authors. libraries are among society’s most cooperative institutions, and librarians are members of one of the most cooperative of professions. the work we do is rarely that of solitary performers, whether it be research or the design and implementation of complex systems to serve our users. writing about that should not be solitary either.
i hope to publish think-pieces from leaders in our field. i hope to publish more articles on the management of information technologies. i hope to increase the number of manuscripts that provide retrospectives. libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. we should, upon occasion, remember and reflect upon our development as an information-technology profession. i hope to work with the editorial board, the lita publications committee, and the lita board to find a way, and soon, to facilitate the electronic publication of articles without endangering, but in fact enhancing, the absolutely essential financial contribution that the journal provides to the association.

in short, i want to make ital a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of lita as a professional division of ala. to accomplish my goals, i need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the lita officers. i need all lita members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by fred kilgour and his colleagues, who were our predecessors in volume 1, number 1, march 1968, of our journal.

john webb (jwebb@wsu.edu) is assistant director for digital services/collections, washington state university libraries, pullman, and editor of information technology and libraries.

reference

1. walt crawford, first have something to say: writing for the library profession (chicago: ala, 2003).

editorial board thoughts: issue introduction
discovery: what do you mean by that?
judith carter

mwuah ha ha ha haaa! finally it’s my turn. i hold the power of the editorial. (can you tell i’m writing this around halloween?) seriously now, i’ve been intimately and extensively involved with information technology and libraries for eleven years, yet this is the first time i’ve escaped from behind the editing scenes to address the readership directly. as managing editor for seven of the eleven volumes (18–22 and 27–28) and an editorial board member reviewing manuscripts (vols. 23–26), i am honored marc agreed to let me be guest editor for this theme issue. this issue is a compilation of presentations from the discovery mini-conference held at the university of nevada las vegas (unlv) libraries in the spring of 2009. the first article by jennifer fabbi gives the full chronology and framework of the project, but i have the pleasure of introducing this issue and topic by virtue of my role as guest editor, as well as my own participation in the mini-conference before i left unlv in july 2009.

■ what is discovery?

when the dean of libraries, patricia iannuzzi, announced that unlv would have a libraries-wide, poster-session-style discovery mini-conference, jennifer fabbi and i decided we wanted to be part of it. we had already been exploring various aspects of discovery as part of an organizational focus, as well as following up on a particular event that happened earlier in the year.
while serving on a search committee, we posed a question to all the candidates: “what do you see the library catalog looking like in the future? what do you see as the relationship between the library catalog and other access or discovery tools?” one of the candidates had such a unique answer that it got us thinking: are we all talking about the same thing when we discuss discovery? the mini-conference gave us the opportunity to explore the idea further. an all-library summit that preceded the mini-conference announcement had focused on users finding known items. we knew that discovery was so much more and that it depended on the users’ needs. of course, first we went to multiple online dictionaries to look up the meanings of “discovery” and found the following definitions:

■ something learned or found; something new that has been learned or found
■ the process of learning something; the fact or process of finding out about something for the first time
■ the process of finding something; the process or act of finding something or somebody unexpectedly or after searching

we also looked at famous quotes about discovery. these were some of our favorites:

a discovery is said to be an accident meeting a prepared mind. —albert szent-gyorgyi

education is a progressive discovery of our own ignorance. —will durant

next, a colleague recommended we look at chang’s browsing theory.1 this theory covered the broad spectrum of how users seek information and showed a more serendipitous view than the former focus on known-item search. obviously, browsing implies a physical interaction with a collection, so we reframed the themes to fit discovery in the “every-library” electronic information environment. chang’s five browsing themes, adapted to discovery:

■ looking for a specific item, to locate
■ looking for something with common characteristics, to find “more like this”
■ keeping up-to-date, to find out what’s new in a field, topic, or intellectual area
■ learning or finding out, to define or form a research question
■ goal-free, to satisfy curiosity or be entertained.2

all interesting information, but a little theoretical for a visual presentation. to make these themes more concrete and visual, i suggested we apply them to personas as described in one of my favorite books, the inmates are running the asylum.3 this encourages programmers to create a user with a full backstory and then design a product for that user’s needs. to do this in an entertaining way, we identified five types of users we’ve encountered in our libraries and described an information-seeking need for each. i then created some colorful and representational characters using a well-known, alliteratively named candy’s website. our five characters were

1. mina, stylishly dressed and always carrying a cell phone, is an undergraduate who rarely uses the library. she has a sociology class library assignment to find information on the cell phone habits of generation x.
2. ms. lvite lives in the las vegas area and contributes to the library. she is a regular from the community who likes to dig into everything the library owns about small mining towns in nevada.
3. dr. prof is a faculty member with a slightly outdated wardrobe but a thirst for knowledge. he wants to know what books have been published in his field of quantum bowtie mechanics by any of his colleagues across the country.
4. phdead tired is a slightly mussed grad student who is always in the library clutching a cup of coffee. he needs to narrow down his dissertation topic.
5. duuuuude is an energetic, sociable young man who likes to hang out in the library with his friends. he has some time to kill on the computer.

on our poster, we asked the discovery mini-conference attendees to place cutouts of our personas on a pie chart divided into the five themes of discovery. jennifer and i expected certain placements and were pleasantly surprised when our attendees challenged our assumptions with alternate possibilities. another section of the poster related discovery behaviors to specific electronic discovery tools. we provided a few and asked the attendees to add others (see table 1).

table 1. relating discovery behaviors to electronic discovery tools

user wants to find a specific item. provide the user: search by title, author, or call number (e.g., libraries’ webopac); search a database. other tools:* worldcat; flickr; google books.

user wants to find items with common characteristics. provide the user: items linked by subject headings, format, or other elements; tag clouds; federated search for article databases (e.g., webopac, encore, article databases). other tools:* flickr; summon; twine; delicious.

user wants to be kept up-to-date. provide the user: recently added items by subject; integration of blogs for news or updates (e.g., new books list, libguides, encore “recently added”). other tools:* blogs; rss feeds; apple itunes; amazon readers advisory; authors’/musicians’ websites; newspapers online.

user wants to learn more about something. provide the user: general information that provides context, reviews (e.g., wikipedia, google, encore community reviews). other tools:* dissertation abstracts; encyclopedias; database of databases (for context); peer to peer: delicious, social tagging.

user wants to satisfy curiosity or be entertained. provide the user: surfing the web, multimedia, social networking (e.g., google, youtube, facebook). other tools:* myspace; world of warcraft; second life; podcasts; wikipedia “random article” feature.

* ideas generated at the discovery mini-conference

while talking with each attendee, we provided a bookmark listing the five discovery behaviors (with colorful character personas) and suggested they keep them in mind as they visited the other conference sessions. we challenged them to identify which user behaviors the other presenters’ systems or services were targeting. the message jennifer and i hoped to convey with our poster was this: the way we think about discovery, or the users’ goals in finding information, drives the discovery systems we have or will create. as you read through this issue, i hope you’ll see some new ways to think about discovery, and that those ways will fuel this audience’s potential to create new tools. what follows is a textual walk around our mini-conference. taken as individual articles, each might not look like what you are used to seeing in ital. taken as a whole that grew out of the process, these articles are what makes this a special issue. as i said before, jennifer fabbi provides the background and process for the discovery mini-conference. then, alex dolski describes a prototype multipac discovery system he created and demonstrated, and he discusses the issues surrounding the design of such a system. tom ipri, michael yunkin, and jeanne brown, as members of the usability working group, had already been conducting testing on unlv libraries’ website.
they share their methods, findings, and results with us. thomas sommer presents a look at what the special collections department has implemented to aid discovery of its unique materials. wendy starkweather and eva stowers used the mini-conference as an opportunity to research how other libraries are providing discovery opportunities to students via smartphones. patrick griffis describes his work with free screen-capture tools to build pathfinders that promote resource discovery. patrick griffis and cyrus ford each looked at enhancing catalog records, so they combined their two presentations here to describe ways to enrich the online catalog to better aid our users’ success.

judith carter (jcarter.mls@gmail.com) is head of technical services at marquette university raynor memorial libraries, milwaukee, wi, and managing editor of ital.

references

1. shan-ju chang, “chang’s browsing,” in theories of information behavior, by karen e. fisher, sanda erdelez, and lynne mckechnie (medford, n.j.: information today, 2005): 69–74.
2. ibid., 71–72.
3. alan cooper, the inmates are running the asylum (indianapolis, ind.: sams, 1999). personas are described in chapter 9.

figure 1. “initial thoughts” and “five general themes of discovery behavior” panel from the discovery mini-conference poster

wikis in libraries
matthew m. bejune

wikis have recently been adopted to support a variety of collaborative activities within libraries. this article and its companion wiki, librarywikis (http://librarywikis.pbwiki.com/), seek to document the phenomenon of wikis in libraries. this subject is considered within the framework of computer-supported cooperative work (cscw). the author identified thirty-three library wikis and developed a classification schema with four categories: (1) collaboration among libraries (45.7 percent); (2) collaboration among library staff (31.4 percent); (3) collaboration among library staff and patrons (14.3 percent); and (4) collaboration among patrons (8.6 percent). examples of library wikis are presented within the article, as is a discussion of why wikis are primarily utilized within categories i and ii and not within categories iii and iv. it is clear that wikis have great utility within libraries, and the author urges further application of wikis in libraries.

in recent years, the popularity of wikis has skyrocketed. wikis were invented in the mid-1990s to help facilitate the exchange of ideas between computer programmers. the use of wikis has gone far beyond the domain of computer programming, and now it seems as if every google search contains a wikipedia entry. wikis have entered into the public consciousness. so, too, have wikis entered into the domain of professional library practice. the purpose of this research is to document how wikis are used in libraries. in conjunction with this article, the author has created librarywikis (http://librarywikis.pbwiki.com/), a wiki to which readers can submit additional examples of wikis used in libraries. the article will proceed in three sections. the first section is a literature review that defines wikis and introduces computer-supported cooperative work (cscw) as a context for understanding wikis. the second section documents the author’s research and presents a schema for classifying wikis used in libraries. the third section considers the implications of the research results.

■ literature review

what’s a wiki?

wikipedia (2007a) defines a wiki as:

a type of web site that allows the visitors to add, remove, edit, and change some content, typically without the need for registration.
it also allows for linking among any number of pages. this ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring.

wikis have been around since the mid-1990s, though it is only recently that they have become ubiquitous. in 1995, ward cunningham launched the first wiki, wikiwikiweb (http://c2.com/cgi/wiki), which is still active today, to facilitate the exchange of ideas among computer programmers (wikipedia 2007b). the launch of wikiwikiweb was a departure from the existing model of web communication, where there was a clear divide between authors and readers. wikiwikiweb elevated the status of readers, if they so chose, to that of content writers and editors. this model proved popular, and the wiki technology used on wikiwikiweb was soon ported to other online communities, the most famous example being wikipedia.

on january 15, 2001, wikipedia was launched by larry sanger and jimmy wales as a complementary project for the now-defunct nupedia encyclopedia. nupedia was a free, online encyclopedia with articles written by experts and reviewed by editors. wikipedia was designed as a feeder project to solicit new articles for nupedia that were not submitted by experts. the two services coexisted for some time, but in 2003 the nupedia servers were shut down. since its launch, wikipedia has undergone rapid growth. at the close of 2001, wikipedia’s first year of operation, there were 20,000 articles in eighteen language editions. as of this writing, there are approximately seven million articles in 251 languages, fourteen of which have more than 100,000 articles each. as a sign of wikipedia’s growth, when this manuscript was first submitted four months earlier, there were more than five million articles in 250 languages.

author’s note: sources in the previous two paragraphs come from wikipedia. the author acknowledges the concerns within the academy regarding the practice of citing wikipedia within scholarly works; however, it was decided that wikipedia is arguably an authoritative source on wikis and itself. nevertheless, the author notes that there were changes (insubstantial ones) to the cited wikipedia entries between when the manuscript was first submitted and when it was revised four months later.

wikis and cscw

wikis facilitate collaborative authoring and can be considered one of the technologies studied under the domain of cscw. in this section, cscw is explained and it is shown how wikis fit within this framework. cscw is an area of computer science research that considers the application of computer technology to support cooperative (also referred to as collaborative) work. the term was first coined in 1984 by irene greif (1988) and paul cashman to describe a workshop they were planning on the support of people in work environments with computers.

matthew m. bejune (mbejune@purdue.edu) is an assistant professor of library science at purdue university libraries. he also is a doctoral student at the graduate school of library and information science, university of illinois at urbana-champaign.

over the years there have been a number of review articles that describe cscw in greater detail, including bannon and schmidt (1991), rodden (1991), schmidt and bannon (1992), sachs (1995), dourish (2001), ackerman (2002), olson and olson (2002), dix, finlay, abowd, and beale (2004), and shneiderman and plaisant (2005). publication in the field of cscw primarily occurs through conferences.
the first conference on cscw was held in 1986 in austin, texas. since then, the conference has been held biennially in the united states. proceedings are published by the association for computing machinery (acm, http://www.acm.org/). in 1991, the first european conference on computer supported cooperative work (ecscw) was held in amsterdam. ecscw also is held biennially, in odd-numbered years. ecscw proceedings are published by springer (http://www.ecscw.uni-siegen.de/). the primary journal for cscw is computer supported cooperative work: the journal of collaborative computing. publications also appear within publications of the acm and chi, the conference on human factors in computing.

cscw and libraries

as libraries are, by nature, collaborative work environments (library staff working together and with patrons), and as digital libraries and computer technologies become increasingly prevalent, there is a natural fit between cscw and libraries. the following researchers have applied cscw to libraries. twidale et al. (1997) published a report sponsored by the british library research and innovation centre that examined the role of collaboration in the information-searching process to inform how information systems design could better address and support collaborative activity. twidale and nichols (1998) offered ethnographic research of physical collaborative environments, in a university library and an office, to aid the design of digital libraries. they wrote two reviews of cscw as applied to libraries; the first (twidale and nichols 1998) was more comprehensive than the second (twidale and nichols 1999). sánchez (2001) discussed collaborative environments designed and prototyped for digital library environments.

classification of collaboration

technologies that facilitate collaborative work are typically classified within cscw across two continua: synchronous versus asynchronous, and co-located versus remote. if put together in a two-by-two matrix, there are four possibilities: (1) synchronous and co-located (same time, same place); (2) synchronous and remote (same time, different place); (3) asynchronous and remote (different time, different place); and (4) asynchronous and co-located (different time, same place). this classification schema was first proposed by johansen et al. (1988). nichols and twidale (1999) mapped work applications within the realm of cscw in figure 1. wikis are not present in the figure, but their absence is not an indication that they are not cooperative work technologies. rather, wikis were not yet widely in use at the time cscw was considered by nichols and twidale. the author has added wikis to nichols and twidale’s graphical representation in figure 2. interestingly, wikis are border-crossers, fitting within two quadrants: the upper right, asynchronous and co-located; and the lower right, asynchronous and remote. wikis are asynchronous in that they do not require people to be working together at the same time. they are both co-located and remote in that people working collaboratively may or may not be working in the same place. it is also interesting to note that library technologies can likewise be mapped using johansen’s schema. nichols and twidale (1999) also mapped this, and figure 3 illustrates the variety of collaborative work that goes on within libraries.

figure 1. classification of cscw applications
synchronous and co-located: meeting rooms
synchronous and remote: distributed meetings; muds and moos; shared drawing; video conferencing; collaborative writing
asynchronous and co-located: team rooms
asynchronous and remote: organizational memory; workflow; web-based applications; collaborative writing

figure 2. classification of cscw applications including wikis
synchronous and co-located: meeting rooms
synchronous and remote: distributed meetings; muds and moos; shared drawing; video conferencing; collaborative writing
asynchronous and co-located: team rooms; wikis
asynchronous and remote: organizational memory; workflow; web-based applications; collaborative writing; wikis

figure 3. classification of collaborative work within libraries: personal help; reference interview; issue of book on loan; face-to-face interactions; use of opacs; database search; video conferencing; telephone; notice boards; post-it notes; memos; documents for study; social information filtering; e-mail, voicemail; distance learning; postal services
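johansen’s schema lends itself to a simple data representation; the sketch below encodes the matrix from figures 1 and 2 as a dictionary keyed by (time, place) pairs, which makes wikis’ double placement in the two asynchronous quadrants explicit. the encoding itself is just one plausible choice, offered for illustration.

```python
# johansen's time/place matrix as a dict keyed by (time, place);
# the application lists come from figures 1-2 above, with wikis added.
matrix = {
    ("synchronous", "co-located"): ["meeting rooms"],
    ("synchronous", "remote"): ["distributed meetings", "muds and moos",
                                "shared drawing", "video conferencing",
                                "collaborative writing"],
    ("asynchronous", "co-located"): ["team rooms", "wikis"],
    ("asynchronous", "remote"): ["organizational memory", "workflow",
                                 "web-based applications",
                                 "collaborative writing", "wikis"],
}

def quadrants_of(app):
    """return every (time, place) quadrant in which an application appears."""
    return [cell for cell, apps in matrix.items() if app in apps]

print(quadrants_of("wikis"))
# -> [('asynchronous', 'co-located'), ('asynchronous', 'remote')]
```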
■ method

in order to discover the widest variety of wikis used in libraries, the author searched for examples of wikis used in libraries within three areas: the lis literature, the library success wiki, and messages posted on three professional electronic discussion lists. when examples were found, they were logged and classified according to a schema created by the author. results are presented in the next section.

the first area searched was the lis literature. the author utilized the wilson library literature and information science database. there were two main types of articles: ones that argued for the use of wikis in libraries, and ones that were case studies of wikis that had been implemented.

the second area searched was library success: a best practices wiki (http://www.libsuccess.org/) (see figure 4), created by meredith farkas, distance learning librarian at norwich university. as the name implies, it is a place for people within the library community to share their success stories. posting to the wiki is open to the public, though registration is encouraged. there are many subject areas on the wiki, including management and leadership, readers’ advisory, reference services, information literacy, and so on. there also is a section about collaborative tools in libraries (http://www.libsuccess.org/index.php?title=collaborative_tools_in_libraries), in which examples of wikis in libraries are presented. within this section there is a presentation about wikis made by farkas (2006) titled wiki world (http://www.libsuccess.org/index.php?title=wiki_world), from which examples were culled.

figure 4. library success: a best practices wiki (http://www.libsuccess.org/)

figure 5. wiki world (http://www.libsuccess.org/index.php?title=wiki_world)

the third area searched comprised professional electronic discussion list messages from web4lib, dig_ref, and libref-l. the web4lib electronic discussion list (tennant 2005) is “for the discussion of issues relating to the creation, management, and support of library-based world wide web servers, services, and applications.” the list is moderated by roy tennant and the web4lib advisory board and was started in 1994. the dig_ref electronic discussion list is a forum for “people and organizations answering the questions of users via the internet” (webjunction n.d.). the list is hosted by the information institute of syracuse, school of information studies, syracuse university, and was created in 1998. the libref-l electronic discussion list is “a moderated discussion of issues related to reference librarianship” (balraj 2005). established in 1990, it is operated out of kent state university and moderated by a group of list owners. these three electronic discussion lists were selected for two reasons. first, the author is a subscriber to each, and prior to the research noted there were messages about wikis in libraries. second, based on the descriptions of each list stated above, the selected lists reasonably covered the discussion of wikis in libraries within the professional library electronic discussion lists. one year of messages, november 15, 2005, through november 14, 2006, was analyzed for each list.
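as the next paragraph explains, candidate messages were identified by keyword searching an archive of list mail. a minimal sketch of that kind of filter is given below; it assumes the messages live in one mbox file per list, which is a hypothetical setup (the author actually searched within microsoft outlook), and the file paths are placeholders.

```python
# a hedged sketch of keyword-filtering discussion-list archives for "wiki";
# mbox files and paths are hypothetical, not the author's actual workflow.
import mailbox
from datetime import datetime
from email.utils import parsedate_to_datetime

LISTS = {"web4lib": "web4lib.mbox", "dig_ref": "dig_ref.mbox",
         "libref-l": "libref-l.mbox"}
START, END = datetime(2005, 11, 15), datetime(2006, 11, 14)

def matching_messages(path, keyword="wiki"):
    """yield subjects of in-window messages mentioning the keyword."""
    for msg in mailbox.mbox(path):
        try:
            sent = parsedate_to_datetime(msg["date"]).replace(tzinfo=None)
        except (TypeError, ValueError):
            continue  # skip messages with missing or malformed dates
        if not (START <= sent <= END):
            continue
        body = b"" if msg.is_multipart() else (msg.get_payload(decode=True) or b"")
        text = (msg["subject"] or "") + " " + body.decode("utf-8", "ignore")
        if keyword in text.lower():
            yield msg["subject"]

for name, path in LISTS.items():
    print(name, sum(1 for _ in matching_messages(path)))
```

like the original search, a filter of this kind favors recall over precision; the narrowing from 513 candidate messages to thirty-nine relevant ones described below was done by reading each message, not by software.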
messages about wikis in libraries were identified through keyword searches against the author's personal archive of electronic discussion list messages collected over the years. an alternative method would have been to search the web archive of each list, but the author found it easier to search within his mail client, microsoft outlook. the word "wiki" was found in 513 messages: 354 in web4lib, 91 in dig_ref, and 68 in libref-l. this approach had high recall, as discourse about wikis frequently included the use of the word "wiki," though low precision, as there were many results that were not about wikis used in libraries. common false hits included messages about the nature study (giles 2005) that compared wikipedia to encyclopedia britannica, and messages that included the word "wiki" but were simply referring to wikis generally, not to examples of wikis used within libraries. from the list of 513 messages, the author read each message and came up with a much shorter list of thirty-nine messages about wikis in libraries: thirty-two in web4lib, three in dig_ref, and four in libref-l.

figure 2. classification of cscw applications including wikis (figure 1 with wikis added to both asynchronous quadrants)

figure 3. classification of collaborative work within libraries (library activities such as personal help, the reference interview, face-to-face interactions, use of opacs, database searching, video conferencing, telephone, notice boards, post-it notes, memos, documents for study, social information filtering, e-mail and voicemail, distance learning, and postal services, mapped into the same four quadrants)

figure 4. library success: a best practices wiki (http://www.libsuccess.org/)

■ results

classification of the results

after all wiki examples had been collected, it became clear that there was a way to classify the results. in farkas's (2006) presentation about wikis, she organized wikis in two categories: (1) how libraries can use wikis with their patrons; and (2) how libraries can use wikis for knowledge sharing and collaboration. this schema, while it accounts for two types of collaboration, is not granular enough to represent the types of collaboration found within the wiki examples identified. as such, it became clear that another schema was needed. twidale and nichols (1998) identified three types of collaboration within libraries: (1) collaboration among library staff; (2) collaboration between a patron and a member of staff; and (3) collaboration among library users. their classification schema mapped well to the examples of wikis that were identified; however, it too was not granular enough, as it did not distinguish between collaboration among library staff intraorganizationally and extraorganizationally, the two most common types of wiki usage found in the research (see appendix). to account for these types of collaboration, which are common not only to wiki use in libraries but to all professional library practice, the author modified twidale and nichols's schema (see figure 6).
the improved schema also uniformly represents entities across the categories—"library staff" and "member of staff" are both referred to as "library staff," and "patrons" and "library users" are both referred to as "patrons." examples of wikis used in libraries for each category are provided to better illustrate the proposed classification schema.

figure 5. wiki world (http://www.libsuccess.org/index.php?title=wiki_world)

figure 6. four types of collaboration within libraries: (1) collaboration among libraries (extra-organizational); (2) collaboration among library staff (intra-organizational); (3) collaboration among library staff and patrons; (4) collaboration among patrons

■ collaboration among libraries

the library instruction wiki (http://instructionwiki.org/main_page) is an example of a wiki that is used for collaboration among libraries (figure 7). it appears as though the wiki was originally set up to support library instruction within oregon—it is unclear whether this was associated with a particular type of library, say academic or public—but now the wiki supports library instruction in general. the wiki describes itself as "a collaboratively developed resource for librarians involved with or interested in instruction. all librarians and others interested in library instruction are welcome and encouraged to contribute." the tagline for the wiki is "stop reinventing the wheel" (library instruction wiki 2006). from this wiki there is a list of library instruction resources that includes the following: handouts, tutorials, and other resources to share; teaching techniques, tips, and tricks; class-specific web sites and handouts; glossary and encyclopedia; bibliography and suggested reading; and instruction-related projects, brainstorms, and documents. within the handouts, tutorials, and other resources to share section, the author found a wide variety of resources from libraries across the country. similarly, there were a number of suggestions to be found under the teaching techniques, tips, and tricks section.

another example of a wiki used for collaboration among libraries is the library success wiki (http://www.libsuccess.org/), one of the sources of examples of wikis used in this research. adding to earlier descriptions of this wiki as presented in this paper, library success seems to be one of the most frequently updated library wikis and perhaps the most comprehensive in its coverage of library topics.

■ collaboration among library staff

the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/) is an example of a wiki used for collaboration among library staff (figure 8). this wiki is a knowledge base containing more than one thousand information technology services (its) documents. its documents support the information technology needs of the library organization. examples include answers to commonly asked questions, user manuals, and instructions for a variety of computer operations. in addition to being a repository of its documents, the wiki also serves as a portal to other wikis within the university of connecticut libraries. there are many other wikis connected to library units; teams; software applications, such as the libraries' ils; libraries within the university of connecticut libraries; and other university of connecticut campuses.
the health science library knowledge base, stony brook university (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome) is another example of a wiki that is used for collaboration among library staff (figure 9). the wiki is described as "a space for the dynamic collaboration of the library staff, and a platform of shared resources" (health sciences library 2007). on the wiki there are the following content areas: news and announcements; hsl departments; projects; troubleshooting; staff training resources, working papers, and support materials; and community activities, scholarship, conferences, and publications.

■ collaboration among library staff and patrons

there are only a few examples of wikis used for collaboration among library staff and patrons to cite as exemplars. one example is the st. joseph county public library (sjcpl) subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page), seen in figure 10. this wiki is a collection of resources and services in print and electronic formats to assist library patrons with subject area searching. as the wiki is published by library staff for public consumption, it has more of a professional feel than wikis from the first two categories. pages have images, and the content is structured to look like a standard web page. though the wiki looks like a web page, a number of edit links still follow each section of text on the wiki. while these links bear importance for those editing the wiki—library staff only in this case—they undoubtedly puzzle library patrons who think that they have the ability to edit the wiki when, in fact, they do not.

another example of collaboration between library staff and patrons that takes a similar approach is the usc aiken gregg-graniteville library web site (http://library.usca.edu/), shown in figure 11. as with the sjcpl subject guides, this wiki looks more like a web site than a wiki. in fact, the usc aiken wiki conceals its true identity as a wiki even more so than the sjcpl subject guides. the only evidence that the web site is a wiki is a link at the bottom of each page that says "powered by pmwiki." pmwiki (http://pmwiki.org/) is a content management system that utilizes wiki technology on the back end to manage a web site while retaining the look and feel of a standard web site. it seems that the benefits of using a wiki in such a way are shared content creation and management.

■ collaboration among patrons

as there are only three examples of wikis used for collaboration among patrons, all examples will be highlighted in this section. the first example is wiki worldcat (http://www.oclc.org/productworks/wcwiki.htm), sponsored by oclc. wiki worldcat launched as a pilot project in september 2005. the service allows users of open worldcat, oclc's web version of worldcat, to add book reviews to item records. though this wiki does not have many book reviews in it, even for contemporary bestsellers, it gives a taste of how a wiki could be used to facilitate collaboration among patrons. a second example is the biz wiki from ohio university libraries (http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page) (see figure 12). the biz wiki is a collection of business information resources available through ohio university.
the wiki was created by chad boeninger, reference and instruction librarian, as an alternate form of a subject guide or pathfinder. what separates this wiki from those in the third category, collaboration among library staff and patrons, is that the wiki is editable by patrons as well as librarians. similarly, butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref) is a wiki that has reviews of reference resources created by butler librarians, faculty, staff, and students (see figure 13).

figure 9. health sciences library knowledge base (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome)

figure 10. sjcpl subject guides (http://libraryforlife.org/subjectguides/index.php/main_page/)

figure 11. usc aiken gregg-graniteville library (http://library.usca.edu/)

full results

thirty-three wikis were identified. two wikis were classified in two categories each. the full results are available in the appendix. table 1 illustrates how wikis were not uniformly distributed across the four categories: category i had 45.7 percent, category ii had 31.4 percent, category iii had 14.3 percent, and category iv had 8.6 percent. nearly 80 percent of all examples were found within categories i and ii.

table 1. classification summary
category                                             no.    %
i: collaboration among libraries                      16   45.7
ii: collaboration among library staff                 11   31.4
iii: collaboration among library staff and patrons     5   14.3
iv: collaboration among patrons                        3    8.6
total:                                                35  100.0

as seen in some of the examples in the previous section, wikis were utilized for a variety of purposes. here is a short list of purposes for which wikis were utilized: sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews.

wiki software utilization is summarized in tables 2 and 3. mediawiki is the most popular software utilized by libraries (33.3 percent), followed by unknown (30.3 percent), pbwiki (12.1 percent), pmwiki (12.1 percent), seedwiki (6.1 percent), twiki (3 percent), and xwiki (3 percent). if the values for unknown are removed from the totals (table 3), mediawiki is utilized in almost half (47.8 percent) of all library wiki applications.

table 2. software totals
wiki software    no.     %
mediawiki         11   33.3
unknown           10   30.3
pbwiki             4   12.1
pmwiki             4   12.1
seedwiki           2    6.1
twiki              1    3.0
xwiki              1    3.0
total:            33  100.0

table 3. software totals without unknowns
wiki software    no.     %
mediawiki         11   47.8
pbwiki             4   17.4
pmwiki             4   17.4
seedwiki           2    8.7
twiki              1    4.3
xwiki              1    4.3
total:            23  100.0

■ discussion

with a wealth of examples of wikis in categories i and ii and a dearth of examples of wikis in categories iii and iv, the library community seems to be more comfortable using wikis to collaborate within the community, but less comfortable using wikis to collaborate with library patrons or to enable collaboration among patrons. the research results pose two questions: why are wikis predominantly used for collaboration within the library community? and why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another?

why are wikis predominantly used for collaboration within the library community?

this is perhaps the easier of the two questions to explain. there is a long legacy of cooperation and collaboration intraorganizationally and extraorganizationally within libraries. one explanation for this is the shared budgetary climate within libraries: all too often there is insufficient money, staffing, and resources to offer desired levels of service. librarians work together to overcome these barriers. prominent examples include cooperative cataloging, interlibrary lending, and the formation of consortia to negotiate pricing.
another explanation can be found in the personal characteristics of library professionals. librarianship is a service profession that consequently attracts service-minded individuals who are interested in helping others, whether they are library patrons or fellow colleagues. a third reason is the role of library associations, such as the international federation of library associations and institutions, the american library association, the special libraries association, and the medical library association, as well as many others at the international, national, state, and local levels, and the work that is done through these associations at annual conferences and throughout the year. libraries use wikis to collaborate intraorganizationally and extraorganizationally because collaboration is what they do most naturally.

figure 12. ohio university libraries biz wiki (http://www.library.ohiou.edu/subjects/bizwiki)

figure 13. butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref)

why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another?

the reasons why libraries are only minimally using wikis to collaborate with patrons and for patron collaboration are more difficult to ascertain. however, given the untapped potential of wikis, the proposed answers to this question are more important and may lead to future implementations of wikis in libraries. here are four possible explanations, some more speculative than others.

first, perhaps one reason lies in the way libraries are conceived by library patrons and librarians alike. a strong case can be made for libraries as places of collaborative work, and the author takes this position. however, historically libraries have been repositories of information, and this remains a pervasive and difficult concept to change—libraries are frequently seen simply as places to get books. in this scenario, the librarian is a gatekeeper that a patron interacts with to get a book—that is, if the patron interacts with a librarian at all. it is also worth noting that the relationship is one-way—the patron needs the assistance of the librarian, but not the other way around. viewed in these terms, this is not a collaborative situation. for libraries to use wikis for the purpose of collaborating with library patrons might demand the reconceptualization of libraries by library patrons and librarians. similarly, this extreme conceptualization of libraries does not consider patrons working with one another, even though it is an activity that occurs formally and informally within libraries, not to mention with the emergence of interdisciplinary and multidisciplinary work. if wikis are to be used to facilitate collaboration between patrons, the conceptualization of the library by library patrons and librarians must be expanded.

second, there may be fears within the library community about authority, responsibility, and liability. libraries have long held the responsibility of ensuring the authority of the bibliographic catalog. if patrons are allowed to edit the library wiki, there is potential for negatively affecting the authority of the wiki and even the perceived authority of the library. likewise, there is potential liability in allowing patrons to post to the library wiki.
similar concerns have been raised in the past about other collaborative technologies, such as blogs, bulletin boards, mailing lists, and so on, all aspects of the library 2.0 movement. if libraries are fully to realize library 2.0 as described by casey and savastinuk (2006), miller (2006), and courtney (2007), these issues must be considered.

third, perhaps it is a matter of fit. it might be the case that wikis are utilized in categories i and ii and not within categories iii and iv because the tools are better suited to support the types of activities within categories i and ii. consider some of the activities listed earlier: supporting association work, collecting software documentation, supporting conferences, creating digital repositories, creating intranets, and creating knowledge bases. each of these illustrates a wiki that is utilized for the creation of a resource with multiple authors and readers, tasks that are well suited to wikis. wikipedia is a great example of a wiki with clear, shared tasks for multiple authors and multiple readers and a sense of persistence over time. in contrast, relationships between library staff and patrons do not typically lead to the shared creation of resources. while it is true that the relationship between patron and librarian in the context of a patron's research assignment can be collaborative depending on the circumstances, authorship is not shared but is possessed by the patron. in addition, research assignments in the context of undergraduate coursework are short-lived and seldom go beyond the confines of a particular course. in terms of patrons working together with other patrons, there is the precedent of group work; however, groups often produce projects or papers that share the characteristics of nongroup research assignments listed above. this, of course, does not mean that wikis are not suitable for collaboration within categories iii and iv, but perhaps the opportunities for collaboration are fewer, or they stretch the imagination of the types and ways of doing collaborative work.

fourth, perhaps it is a matter of "not yet." while the research has shown that libraries are not utilizing wikis in categories iii and iv, this may be because it is too soon. it should be noted that wikis are still new technologies. it might be the case that librarians are experimenting in safer contexts so they will gain experience prior to trying more public projects where their expertise will be needed. if this explanation is true, it is expected that more examples of wikis in libraries will soon emerge. as they do, the author hopes that all examples of wikis in libraries, new and old, will be added to the companion wiki to this article, librarywikis (http://librarywikis.pbwiki.com/).

■ conclusion

it appears that wikis are here to stay, and that their utilization within libraries is only just beginning.
this article documented the current practice of wikis used in libraries using cscw as a framework for discussion. the author located examples of wikis in three places: within the lis literature, on the library success wiki, and within messages from three professional electronic discussion lists. thirty-three examples of wikis were identified and classified using a classification schema created by the author. the schema has four categories: (1) collaboration among libraries; (2) collaboration among library staff; (3) collaboration among library staff and patrons; and (4) collaboration among patrons. wikis were used for a variety of purposes, including sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. by and large, wikis were primarily used to support collaboration among library staff intraorganizationally and extraorganizationally, with nearly 80 percent (45.7 percent and 31.4 percent, respectively) of the examples so identified, and less so in the support of collaboration among library staff and patrons (14.3 percent) and collaboration among patrons (8.6 percent). mediawiki was by far the most common software, used in almost half (47.8 percent) of the examples whose software could be identified. it is clear that there are plenty of examples of wikis utilized in libraries, and more to be found each day. the profession is now faced with extending the use of this technology, and it remains for the future to show how wikis will continue to be used within libraries.

works cited

ackerman, mark s. 2002. the intellectual challenge of cscw: the gap between social requirements and technical feasibility. in human-computer interaction in the new millennium, ed. john m. carroll, 179–203. new york: addison-wesley.

balraj, leela, et al. 2005. libref-l. kent state university libraries. http://www.library.kent.edu/page/10391 (accessed june 12, 2007). archive is available at this link as well.

bannon, liam j., and kjeld schmidt. 1991. cscw: four characters in search of a context. in studies in computer supported cooperative work, ed. john m. bowers and steven d. benford, 3–16. amsterdam: elsevier.

casey, michael e., and laura c. savastinuk. 2006. library 2.0. library journal 131, no. 14: 40–42. http://www.libraryjournal.com/article/ca6365200.html (accessed june 12, 2007).

courtney, nancy. 2007. library 2.0 and beyond: innovative technologies and tomorrow's user (in press). westport, conn.: libraries unlimited.

dix, alan, et al. 2004. socio-organizational issues and stakeholder requirements. in human computer interaction, 3rd ed., 450–74. upper saddle river, n.j.: prentice hall.

dourish, paul. 2001. social computing. in where the action is: the foundations of embodied interaction, 55–97. cambridge, mass.: mit press.

farkas, meredith. 2006. wiki world. http://www.libsuccess.org/index.php?title=wiki_world (accessed june 12, 2007).

giles, jim. 2005. internet encyclopaedias go head to head. nature 438: 900–901. http://www.nature.com/nature/journal/v438/n7070/full/438900a.html (accessed june 12, 2007).

greif, irene, ed. 1988. computer supported cooperative work: a book of readings. san mateo, calif.: morgan kaufmann publishers.
health sciences library, state university of new york, stony brook. 2007. health sciences library knowledge base. http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome (accessed june 12, 2007).

johansen, robert, et al. 1988. groupware: computer support for business teams. new york: free press.

library instruction wiki. 2006. http://instructionwiki.org/main_page (accessed june 12, 2007).

miller, paul. 2006. coming together around library 2.0. d-lib magazine 12, no. 4. http://www.dlib.org/dlib/april06/miller/04miller.html (accessed june 12, 2007).

nichols, david m., and michael b. twidale. 1999. computer supported cooperative work and libraries. vine 109: 10–15. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/vine.html (accessed june 12, 2007).

olson, gary m., and judith s. olson. 2002. groupware and computer-supported cooperative work. in the human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, ed. julie a. jacko and andrew sears, 583–95. mahwah, n.j.: lawrence erlbaum associates, inc.

rodden, tom t. 1991. a survey of cscw systems. interacting with computers 3, no. 3: 319–54.

sachs, patricia. 1995. transforming work: collaboration, learning, and design. communications of the acm 38: 227–49.

sánchez, j. alfredo. 2001. hci and cscw in the context of digital libraries. in chi '01 extended abstracts on human factors in computing systems. conference on human factors in computing systems, seattle, wash., mar. 31–apr. 5, 2001.

schmidt, kjeld, and liam j. bannon. 1992. taking cscw seriously: supporting articulation work. computer supported cooperative work 1, no. 1/2: 7–40.

shneiderman, ben, and catherine plaisant. 2005. collaboration. in designing the user interface: strategies for effective human-computer interaction, 4th ed., 408–50. reading, mass.: addison wesley.

tennant, roy. 2005. web4lib electronic discussion. webjunction.org. http://lists.webjunction.org/web4lib/ (accessed june 12, 2007). archive is available at this link as well.

twidale, michael b., et al. 1997. collaboration in physical and digital libraries. report no. 64, british library research and innovation centre. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/bl/report/ (accessed june 12, 2007).

twidale, michael b., and david m. nichols. 1998a. using studies of collaborative activity in physical environments to inform the design of digital libraries. technical report cseg/11/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/cscw98.html (accessed june 12, 2007).

twidale, michael b., and david m. nichols. 1998b. a survey of applications of cscw for digital libraries. technical report cseg/4/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/survey.html (accessed june 12, 2007).

webjunction. n.d. dig_ref electronic discussion list. http://www.vrd.org/dig_ref/dig_ref.shtml (accessed june 12, 2007).

wikipedia. 2007a. wiki. http://en.wikipedia.org/wiki/wiki (accessed april 29, 2007).

wikipedia. 2007b. wikiwikiweb. http://en.wikipedia.org/wiki/wikiwikiweb (accessed april 29, 2007).

appendix.
wikis in libraries

key: i = collaboration among libraries; ii = collaboration among library staff; iii = collaboration among library staff and patrons; iv = collaboration among patrons

category | description | location | wiki software
i | library success: a best practices wiki—a wiki capturing library success stories. covers a wide variety of topics. also features a presentation about wikis (http://www.libsuccess.org/index.php?title=wiki_world) | http://www.libsuccess.org/ | mediawiki
i | wiki for school library association in alaska | http://akasl.pbwiki.com/ | pbwiki
i | wiki to support reserves direct, free, open-source software for managing academic reserves materials developed by emory university | http://www.reservesdirect.org/wiki/index.php/main_page | mediawiki
i | sunyla new tech wiki—a place for state university of new york (suny) librarians to share how they are using information technologies to interact with patrons | http://sunylanewtechwiki.pbwiki.com/ | pbwiki
i | wiki for librarians and faculty members to collaborate across campuses; being used with distance learning instructors and small groups | message from robin shapiro on the [dig_ref] electronic discussion list dated 10/18/2006 | unknown
i | discusses setting up three wikis in the last month: "one to support a pre-conference workshop, another for behind-the-scenes conferences planning by local organizers, and one for conference attendees to use before they arrived and during the sessions" (30) | fichter, darlene. 2006. using wikis to support online collaboration in libraries. information outlook 10, no. 1: 30–31 | unknown
i | unofficial wiki for the american library association 2005 annual conference | http://meredith.wolfwater.com/wiki/index.php?title=main_page | mediawiki
i | unofficial wiki for the 2005 internet librarian conference | http://ili2005.xwiki.com/xwiki/bin/view/main/webhome | xwiki
i | wiki for the canadian library association (cla) 2005 annual conference | http://wiki.ucalgary.ca/page/cla | mediawiki
i | wiki for south carolina library association | http://www.scla.org/governance/homepage | pmwiki
i | wiki set up to support national discussion about institutional repositories in new zealand | http://wiki.tertiary.govt.nz/~institutionalrepositories | pmwiki
i | the oregon library instruction wiki used for sharing information about library instruction | http://instructionwiki.org/ | mediawiki
i | personal repositories online wiki environment (prowe)—an online repository sponsored by the open university and the university of leicester that uses wikis and blogs to encourage the open exchange of ideas across communities of practice | http://www.prowe.ac.uk/ | unknown
i | lis wiki—space for collecting articles and general information about library and information science | http://liswiki.org/wiki/main_page | mediawiki
i | making of modern michigan—a wiki to support a state-wide digital library project | http://blog.lib.msu.edu/mmmwiki/index.php/main_page | unknown (behind firewall)
i | wiki used as a web content editing tool in a digital library initiative sponsored by emory university, the university of arizona, virginia tech, and the university of notre dame | http://sunylanewtechwiki.pbwiki.com/ | pbwiki
ii | wiki at suny stony brook health sciences library used as knowledge base | http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome; presentation can be found at http://ms.cc.sunysb.edu/%7edachase/wikisinaction.htm | twiki
ii | wiki at york university used internally for committee work; exploring how to use wikis as a way to collaborate with users | message from mark robertson on the web4lib electronic discussion list dated 10/13/2006 | unknown
ii | wiki for internal staff use at the university of waterloo; access control is used to restrict parts of the wiki to groups | message from chris gray on the web4lib electronic discussion list dated 08/09/2006 | unknown
ii | wiki at the university of toronto for internal communications, technical problems, and as a document repository | message from stephanie walker on the libref-l electronic discussion list dated 10/28/2006 | unknown
ii | wiki used for coordination and organization of the portable professor program, which appears to be a collaborative information literacy program for remote faculty | http://tfpp-committee.pbwiki.com/ | pbwiki
ii | the university of connecticut libraries' staff wiki, which is a repository of information technology services documents | http://wiki.lib.uconn.edu/wiki/main_page | mediawiki
ii | wiki used at binghamton university libraries for staff intranet; features pages for committees, documentation, policies, newsletters, presentations, and travel reports | screenshots can be found at http://library.lib.binghamton.edu/presentations/cil2006/cil%202006_wikis.pdf | mediawiki
ii | wiki used at the information desk at miami university | described in withers, rob. "something wiki this way comes." c&rl news 66, no. 11 (2005): 775–77 | unknown
ii | use of wiki as knowledge base to support reference service | http://oregonstate.edu/~reeset/rdm/ | unknown
ii | university of minnesota libraries staff web site in wiki form | https://wiki.lib.umn.edu/ | pmwiki
ii | wiki used to support the mit engineering and science libraries b-team; the wiki may no longer be active, but is still available | http://www.seedwiki.com/wiki/b-team | seedwiki
iii | a wiki that is a subject guide at st. joseph county public library in south bend, indiana | http://www.libraryforlife.org/subjectguides/index.php/main_page | mediawiki
iii | wiki used at the aiken library, university of south carolina, as a content management system (cms) | http://library.usca.edu/main/homepage | pmwiki
iii | doucette library of teaching resources wiki—a repository of resources for education students | http://wiki.ucalgary.ca/page/doucette | mediawiki
iv | wiki worldcat (wikid) is an oclc pilot project (now defunct) that allowed users to add reviews to open worldcat records | http://www.oclc.org/productworks/wcwiki.htm | unknown
iii and iv | wikiref lists reviews of reference resources—databases, books, web sites, etc.—created by butler librarians, faculty, staff, and students | http://www.seedwiki.com/wiki/butler_wikiref; reported in matthies, brad, jonathan helmke, and paul slater. using a wiki to enhance library instruction. indiana libraries 25, no. 3 (2006): 32–34 | seedwiki
iii and iv | wiki used as a subject guide at ohio university | http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page; presentation about the wiki: http://www.infotoday.com/cil2006/presentations/c101-102_boeninger.pps | mediawiki

communications

returning classification to the catalog

robert n. bland and mark a. stoffan
the concept of a classified catalog, or using classification as a form of subject access, has been almost forgotten by contemporary librarians. recent developments indicate that this is changing as libraries seek to enhance the capabilities of their online catalogs. the western north carolina library network (wncln) has developed a "classified browse" feature for its shared online catalog that makes use of library of congress classification. while this feature is not expected to replace keyword searching, it offers both novice and experienced library users another way of identifying relevant materials.

classification to modern librarians is almost exclusively a tool for organizing and arranging books (or other physical media) on shelves. the role of classification as a form of subject access to collections through the public catalog—the concept of the classified catalog—has been almost forgotten. from a review of the literature, it does not appear that any major u.s. library has supported a classified catalog since boston university libraries closed its classified catalog in 1973.1 to be sure, nearly all online catalogs nowadays have some form of what is called a "call number search" or a "shelf list browsing capability" that is based on classification, but this is a humble and little-used feature because it requires that a call number (or at least a call number stem) be known and entered by the user, when no verbal index to the classification is available online. this search methodology provides nothing in the way of the systematic and hierarchical arrangement and display of subject classes, complete with accompanying verbal descriptions, that the classified catalog seeks to accomplish. but as karen markey put it in her recent review of classification and the online catalog, "to this day, the only way in which most end users experience classification online is through their online catalog's shelf list browsing capability."2

there are signs that this situation is changing. the recently released endeca-based catalog at north carolina state university libraries uses library of congress classification (lcc) in a prominent way to provide for browsing of the collection without the need for the user to enter any search terms at all.3 the lcc outline is presented on the main search entry screen with verbal captions describing the classes, allowing users to navigate through several layers of the outline to retrieve, with a click of the mouse, bibliographic records for materials assigned to those classes. in a converse way, the new online catalog being developed by the florida center for library automation uses lc classification as a kind of back end to keyword searching. following a keyword search, a user can limit the results set by confining it to a designated lcc range chosen again from an online display of the lcc outline.4 both of these catalogs use three levels of the lcc outlines: from the most general single-letter classes (q for sciences, for example), through the two-letter classes for more specific subjects (qc for physics, qd for chemistry), to an even finer granularity with designated numeric ranges within the two-letter classes identifying specific subdisciplines (qd241–qd441 for organic chemistry).
the western north carolina library network (wncln) has been experimenting with classification as a retrieval tool in the public catalog for some time,5 and it has just implemented the first version of what we call a classified catalog browse in our innovative millennium system.6 like the two catalogs just mentioned, the classified catalog browse is based on software that is external to the ils software and integrated with that software through linking and webpage designs. also, like the previously discussed catalogs, it is based on scanning and incorporating into the catalog the lcc outlines as published by the library of congress.

robert n. bland (bland@unca.edu) is associate university librarian for technical services, university of north carolina at asheville. mark stoffan (mstoffan@fsu.edu) is associate director for library technology at florida state university, tallahassee.

figure 1. level 1 of lc classification in wncln webpac

the wncln catalog goes a step further, however, in bringing the entire lc classification online down to the individual class number level—at least that portion of the classification that is actually used in our catalog. this is done through extracting class numbers and associated subject headings from bibliographic and authority records in our catalog and building an online classification display with descriptive captions (a verbal index) from these bibliographic and authority records. the result is a hierarchical display (to continue the example from above) not only of qd241–qd441 for organic chemistry but, within this, qd271 for chromatographic analysis, qd273 for organic electrochemistry, and so on. the design of our interface presents this as a fourth level to which the user can "drill" down, beginning with q for sciences, qd for chemistry, qd241–qd441 for organic chemistry, and finally qd271 for chromatographic analysis (figures 1–4). from this fourth level, the user can click an associated link to execute a search of the catalog by the class number in question using the call number search function of the ils (figure 5); a second link for that class number will present the same list of titles but sorted by "most popular" (i.e., the items that have been checked out most frequently) from a separate but linked external database (figure 6); a third link will search the catalog by the associated subject heading for the class (figure 7); and finally a fourth link will show other subject headings that have been used in the catalog with this specific class number (figure 8).
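as a rough sketch of the four-level drill-down and the four per-class links just described, the hierarchy can be modeled as nested mappings. the class numbers and captions below come from the chemistry example in the text; the link urls are invented placeholders, not wncln's actual ones.

```python
# hypothetical model of the drill-down; captions follow the text's
# chemistry example, but the urls are placeholders.
HIERARCHY = {
    "q": {"caption": "sciences", "children": {
        "qd": {"caption": "chemistry", "children": {
            "qd241-qd441": {"caption": "organic chemistry", "children": {
                "qd271": {"caption": "chromatographic analysis", "children": {}},
                "qd273": {"caption": "organic electrochemistry", "children": {}},
            }},
        }},
    }},
}

def links_for_class(class_number, subject_heading):
    """build the four per-class links described above (placeholder urls)."""
    return {
        "call number search": f"/search?call={class_number}",
        "most popular": f"/popular?class={class_number}",
        "subject search": f"/search?subject={subject_heading}",
        "related subjects": f"/related?class={class_number}",
    }

print(links_for_class("qd271", "chromatographic analysis"))
```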
what does having the lc classification online in our catalog accomplish for our users? part of the point of our project is to answer this very question. chan and others7 have theorized that incorporation of the classification system into the catalog as a retrieval tool can provide enhanced subject access that is not possible through standard alphabetical subject headings and keyword searching alone. early studies by markey and others at oclc seem to have confirmed this with an online version of the dewey decimal classification.8 since (as far as we know) the library of congress classification has not really been tested as an online retrieval tool in a live catalog up to now, our implementation will serve as a kind of test bed for this hypothesis.

figure 2. level 2 of lc classification in wncln webpac

figure 3. level 3 of lc classification in wncln webpac

how actual users in fact exploit this feature is of course something that only experience will tell. a cursory look, however, would seem to indicate definite advantages to this approach. first of all, many studies indicate that two of the major sources of failure with subject retrieval in online systems are misspellings and poor choice of search terms by users. no matter how far we may try to go with keyword searching and relevance ranking, no online library retrieval system is likely to do much with "napolyan's fites" when what the user is looking for are books on the military campaigns of the emperor napoleon. with the classification system and verbal index online, most of these problems are eliminated, since users can navigate to a subject of choice without ever entering a search term. moreover, given the design of the verbal index based on library of congress subject headings, the user is led to actual subject headings used in the catalog, which should provide for precise retrieval beyond what is ordinarily possible with keywords even when entered correctly, and (importantly) a retrieval set that is always greater than zero. the infamous and frustrating problem of "no hits" is eliminated.

figure 4. level 4 of lc classification in wncln webpac

figure 5. call number search display in wncln

secondly, the great attraction of the classified catalog approach is that it arranges subjects in a hierarchical fashion based on integral connections among the topics in a way that cannot be accommodated in an alphabetic subject approach because of the vagaries of spelling. the topics "violence," "social conflict," and "conflict management," for example, obviously spread out in an alphabetical subject list, are collocated in the classified catalog under the class "hm1106–hm1171 interpersonal relations" (figure 9), allowing the user to find references to materials all in one place in the catalog just as the classification system arranges the books on these subjects all in one place on the library shelves. alphabetical subject indexes, of course, attempt to ameliorate this problem by means of cross references, but there is clearly a limit to how far one can go with this approach.

finally, the classified catalog provides an efficient way for collection development staff to review specific subject areas and to make better-informed purchasing decisions regarding the collections. in the wncln design, the classes at the bottom level of the hierarchy are linked to the catalog by call number and subject headings, and each class carries an indication of the number of items assigned that class number. the classes are also linked to an external database that shows the frequency of circulation of items in the class as well as title and date of publication. a quick review of this list can inform a bibliographer of circulation rates as well as the currency of materials in the class.

as mentioned, the captions that are displayed with the lcc hierarchy in the wncln catalog are extracted from subject headings and authority records present in our catalog. readers familiar with lc marc record services may wonder why we took this approach to building the verbal index rather than using the information available in the lc marc classification records.
machine-readable records for lc classification are now available in marc format. these files include records for each individual class number with a corresponding verbal caption. while we did experiment with using these files, cost and complexity determined that we go another direction. the lc classification files are huge, containing hundreds of thousands of classification numbers that we do not now and probably never would use in our wncln catalog simply because we (unlike lc) have no materials on these subjects. while these records could be filtered out by matching against lc class numbers that are found in our catalog and discarding non-matches, this would add yet another level of processing to an already complex process, as would handling the lc table subdivisions that are used in the lc schedules and that are separate from the standard class numbers. secondly, the lc marc classification files require a subscription costing several thousand dollars per year, as well as a substantial payment for the retrospective file needed to begin building the database of class numbers.

on the other hand, extracting the verbal index from subject headings and authority records in our own catalog adds no cost to our processing. these headings and authority records are created and maintained, of course, as a standard part of the cataloging process, and accordingly only headings and authority records that match materials owned by our libraries are included. the description or caption that is finally assigned to a class number is determined by a computer program that analyzes both authority records and bibliographic records found in our catalog that are assigned the class number in question, with the subject heading that is used most frequently as a primary subject generally being the one selected as the caption for the class. these class numbers with associated subject headings are then processed by another program, which eventually builds html files representing the classification with links to the catalog and the external "most used" database as alluded to above. these standard html files, along with the files representing the first three levels of the lcc outline, are then loaded onto our web server to display the classification system online.

figure 6. most used titles display

figure 7. subject search display in wncln

figure 8. related subjects display in wncln

figure 9. collocation of terms in the classified catalog
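the caption-selection step described above lends itself to a simple frequency count. the sketch below is a minimal, hypothetical rendering of that logic—not the wncln program itself—assuming each bibliographic record supplies a class number and its primary subject heading.

```python
# minimal sketch of caption selection: for each class number, use the
# subject heading most frequently assigned as a primary subject.
from collections import Counter, defaultdict

def build_captions(bib_records):
    """bib_records: dicts with 'class_number' and 'primary_subject' keys.
    returns a {class_number: caption} verbal index."""
    headings = defaultdict(Counter)
    for record in bib_records:
        headings[record["class_number"]][record["primary_subject"]] += 1
    return {
        class_number: counts.most_common(1)[0][0]
        for class_number, counts in headings.items()
    }

records = [
    {"class_number": "qd271", "primary_subject": "chromatographic analysis"},
    {"class_number": "qd271", "primary_subject": "chromatographic analysis"},
    {"class_number": "qd271", "primary_subject": "chemistry, analytic"},
]
print(build_captions(records))  # {'qd271': 'chromatographic analysis'}
```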
a second advantage of this approach is that using the actual subject heading as the caption or description for the class makes it possible to use that caption as a direct link to a subject search in the catalog, as shown in the illustration in figure 4. a disadvantage is that the captions from the lcc files are designed to retain the hierarchy that is represented in the printed schedules in a visual way by formatting and indenting; captions derived from subject headings do not retain this feature. we have tried to accommodate this in our display of the schedules by replicating the class number ranges from the outline in the appropriate place in the full display of the schedules, thereby building a hierarchy from these ranges as genus and the individual class numbers as species. this does not manage to retain the full hierarchy of the lc schedules as shown in the printed schedules or as represented in lc's online classification web product, but it is, we hope, an adequate surrogate for the purpose intended. in fact, in most cases, the captions derived from the extracted subject and authority headings match quite nicely the captions included in the actual lcc schedules, as shown in a comparison from the psychology classification of the hierarchy as it appears in our classified catalog browse and as it appears online in lc's classification web product (figures 10 and 11). what is missing in our representation of the classification is not so much the subject content of the classes but the notes and information about literary form that are included in the actual lcc schedules. thus, our lcc online is not a strict image of the lcc as it would appear in printed or electronic form based on the hierarchies and captions devised by the lc. nor, for that matter—despite our terminology—is it a true classified catalog, since only one classification (that used in the call number) is assigned to each item, whereas in a true classified catalog multiple classifications may be assigned to an item. it is nevertheless an online presentation of the lcc with links to our catalog that seeks to enhance subject access by exploiting the power of the classification system to organize materials by integral subject classes and to show relationships among subjects by a hierarchical arrangement of classes as genus, species, and subspecies. and, perhaps just as importantly, it is an implementation that requires no additional cataloging effort on the part of our staff, nor any additional costs for data or processing other than the investment we have made in development of the software and the small amount of time required weekly to update the files.

we do not expect that the classified catalog browse will replace keyword or subject searching as the primary means of subject access to our collections. we do believe that it promises to be a powerful and effective complement to our standard ils searches that may improve subject searching for both the novice and the experienced user.

figure 10. class captions in the wncln webpac

figure 11. class captions in lc's classification web

references

1. margaret hindle hazen, "the closing of the classified catalog at boston university," library resources and technical services 18 (1974): 221–26.

2. karen markey, joan s. mitchell, and diane vizine-goetz, "forty years of classification online: final chapter or future unlimited?" cataloging and classification quarterly 42 (2006): 1–63.

3. north carolina state university libraries, "ncsu libraries online catalog," north carolina state university, www.lib.ncsu.edu/catalog (accessed mar. 23, 2007).

4. florida center for library automation, "state university libraries of florida–endeca," board of governors, state of florida, http://catalog.fcla.edu (accessed mar. 23, 2007).

5. the western north carolina library network is a consortium consisting of the libraries of appalachian state university, the university of north carolina at asheville, and western carolina university.

6. western north carolina library network, "library catalog," western north carolina library network, http://wncln.wncln.org (accessed mar. 23, 2007).
7. lois mai chan, "library of congress classification as an online retrieval tool: potentials and limitations," information technology and libraries 5 (1986): 181–92.

8. karen markey and anh demeyer, dewey decimal classification online project: evaluation of a library schedule and index integrated into the subject searching capabilities of an online catalog: final report to the council on library resources (dublin, ohio: oclc, 1986), report no. oclc/opr/rr-86/1.

metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1

jennifer bowen

the extensible catalog (xc) project at the university of rochester will design and develop a set of open-source applications to provide libraries with an alternative way to reveal their collections to library users. the goals and functional requirements developed for xc reveal generalizable needs for metadata to support a next-generation discovery system. the strategies that the xc project team and xc partner institutions will use to address these issues can contribute to an agenda for attention and action within the library community to ensure that library metadata will continue to support online resource discovery in the future.

library metadata, whether in the form of marc 21 catalog records or in a variety of newer metadata schemas, has served its purpose for library users by facilitating their discovery of library resources within online library catalogs (opacs), digital libraries, and institutional repositories. however, libraries now face the challenge of making this wealth of legacy catalog data function adequately within next-generation web discovery environments. approaching this challenge will require:

■ an understanding of the metadata itself and a commitment to deriving as much value from it as possible;
■ a vision for the capabilities of future technology;
■ an understanding of the needs of current (and, where possible, future) library users; and
■ a commitment to ensuring that lessons learned in this area inform the development of both future library systems and future metadata standards.

the university of rochester's extensible catalog (xc) project will bring these various perspectives together to design and develop a set of open-source, collaboratively built next-generation discovery tools for libraries. the xc project team seeks to make the best possible use of legacy library metadata, while also informing the future development of discovery metadata for libraries. during phase 1 of the xc project (2006–2007), the xc project team created a plan for developing xc and defined the goals and initial functional requirements for the system.
this paper outlines the major metadata-related issues that the xc project team and xc partner institutions will need to address to build the xc system during phase 2. it also describes how the xc team and xc partners will address these issues, and concludes by presenting a number of issues for the broader library community to consider. while this paper focuses on the work of a single library project, the goals and functional requirements developed for the xc project reveal many generalizable needs for metadata to support a next-generation discovery system.1 the metadata-related goals of the xc project—to facilitate the use of marc metadata outside an integrated library system (ils), to combine marc metadata with metadata from other sources in a single discovery environment, and to facilitate new functionality (e.g., faceted browsing, user tagging)—are very similar to the goals of other library projects and commercial vendor discovery software. the issues described in this paper thus transcend their connection to the xc project and can be considered general needs for library discovery metadata in the near future. in addition to informing the library community about the xc project and encouraging comment on that work, the author hopes that identifying and describing metadata issues that are important for xc—and that are likely to be important for other projects as well—will encourage the library community to set these issues as high priorities for attention and action within the next few years.

jennifer bowen (jbowen@library.rochester.edu) is director of metadata management at the university of rochester river campus libraries, new york, and is co-principal investigator for the extensible catalog project.

■ the extensible catalog project

the university of rochester's vision for the extensible catalog (xc) is to design and develop a set of open-source applications that provide libraries with an alternative way to reveal their collections to library users. xc will provide easy access to all resources (both digital and physical collections) and will enable library content to be revealed through other web applications that libraries may already be using. xc will be released as open-source software, so it will be available for free download, and libraries will be able to adopt, customize, and extend the software to meet their local needs. the xc project is a collaborative effort between partner institutions that will serve a variety of roles in its development. phase 1 of the xc project, funded by the andrew w. mellon foundation and carried out by the university of rochester river campus libraries between april 2006 and june 2007, resulted in the creation of a project plan for the development of xc. during xc phase 1, the xc project team recruited a number of other institutions that will serve as xc partners and who have agreed to contribute resources toward building and implementing xc during phase 2. xc phase 2 (october 2007 through june 2009) is supported through additional funding from the andrew w. mellon foundation, the university of rochester, and xc partners.
during phase 2, the xc project team, assisted by xc partners, will deploy the xc software and make it available as open-source software.2 through its various components, the xc system will provide a platform for local development and experimentation that will ultimately allow libraries to manage and reveal their metadata through a variety of web applications such as web sites, institutional repositories, and content management systems. a library may choose to create its own customized local interface to xc, or use xc's native user interface "as is." the native xc interface will include web 2.0 functionality, such as tagging and faceted browsing of search results that will be informed by frbr (functional requirements for bibliographic records)3 and frad (functional requirements for authority data)4 conceptual models. the xc software will handle multiple metadata schemas, such as marc 215 and dublin core,6 and will be able to serve as a repository for both existing and future library metadata. in addition, xc will facilitate the creation and incorporation of user-created metadata, enabling such metadata to be enhanced, augmented, and redistributed in a variety of ways. the xc project team has designed a modular architecture for xc, as shown in the simplified schematic in figure 1. xc will bring together metadata from a variety of sources (integrated library systems, digital repositories, etc.), apply services to that metadata, and display it in a usable way in the web environments where users expect to find it.7 xc's architecture will allow institutions that implement the software to take advantage of innovative models for shared metadata services, which will be described in this paper.

xc phase 1 activities

during the now-completed xc phase 1, the xc project team focused on six areas of activity:

1. survey and understand existing research on user practices.
2. gauge library demand for the xc system.
3. anticipate and prepare for the metadata requirements of the new system.
4. learn about and build on related projects.
5. experiment with and incorporate useful, freely available code.
6. build a community of interest.

the xc project team carried out a variety of research activities to inform the overall goals and high-level functional requirements for xc. this research included a literature search and ongoing monitoring of discussion lists and blogs, to allow the team to keep up with the most current discussions taking place about next-generation library discovery systems and related technologies and projects.8 the xc team also consulted regularly with prospective partners and other knowledgeable colleagues who are engaged in defining the concept of a next-generation library discovery system. in order to gauge library demand for the xc system, the team also conducted a survey of interested institutions.9 this paper reports the results of the third area of activity during xc phase 1—anticipating and preparing for the metadata requirements of the new system—and looks ahead to plans to develop the xc software during phase 2.

xc goals and metadata functional requirements

the goals of the xc project have significant implications for the metadata functionality of the system, with each goal suggesting specific high-level functional requirements for how the system can achieve that particular goal. the five goals are:

- goal 1: provide access to all library resources, digital and non-digital.
- goal 2: bring metadata about library resources into a more open web environment.
- goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing.
- goal 4: conduct user research to inform system development.
- goal 5: publish the xc code as open-source software.

[figure 1. xc system diagram]

an overview of each xc goal and its related high-level metadata requirements appears below. each requirement is then discussed in more detail, with a plan for how the xc project team will address that requirement when developing the xc software.

goal 1: provide access to all library resources, digital and non-digital

working alongside a library's current integrated library system (ils) and its other web applications, xc will strive to bring together access to all library resources, thus eliminating the data silos that are now likely to exist between a library's opac and its various digital repositories and commercial databases. this goal suggests two fairly obvious metadata requirements (requirements 1 and 2).

requirement 1—the system must be capable of acquiring and managing metadata from multiple sources: ilss, digital repositories, licensed databases, etc.

a typical library currently has metadata pertaining to its collections residing in a variety of separate online systems: marc data in an ils, metadata in various schemas in digital collections and repositories, citation data in commercial databases, and other content on library web sites. a library that implements xc may want to populate the system with metadata from several online environments to simplify access to all types of resources. to achieve goal 1, xc must be capable of acquiring and managing metadata from all of these sources. each online environment and type of metadata presents its own challenges.

repurposing marc data

repurposing marc metadata from an existing ils will be one of the biggest metadata tasks for a next-generation discovery system such as xc. in planning xc, we have assumed that most libraries will keep their current ils for the next few years or perhaps migrate to a newer commercial or open-source ils. in either case, most libraries will likely continue to rely on an ils's staff functionality to handle materials acquisition, cataloging, circulation, etc. for the short term. relying upon an ils as a processing environment does not, however, mean that a library must use the opac portion of that ils as its means of resource discovery for users. xc will provide other options for resource retrieval by using web services to interact with the ils in the background.10 to repurpose ils metadata and enable it to be used in various web discovery environments, xc will harvest a copy of marc metadata records from an institution's ils using the open archives initiative protocol for metadata harvesting (oai-pmh).11 using web services and standard protocols such as oai-pmh offers not only a short-term solution for reusing metadata from an ils, but can also be used in both the short and long term to harvest metadata from any system that is oai-pmh harvestable, as will be discussed further below. while harvesting metadata from existing systems into xc creates duplication of metadata between an ils and xc, this duplication actually has significant benefits. xc will handle metadata updates through automated harvesting services that minimize additional work for library staff, other than for setting up and managing the automated services themselves.
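to make the harvesting model above concrete, the sketch below shows what an incremental oai-pmh harvest of marcxml might look like, using only the python standard library. this is an illustration, not xc code: the endpoint url is invented and the metadata prefix varies by repository; only the protocol mechanics (the listrecords verb, resumption tokens, and the "from" argument for picking up changed records) come from the oai-pmh specification.

```python
# a minimal sketch of an incremental oai-pmh harvest, assuming a
# hypothetical ils endpoint that exposes marcxml via the protocol.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, metadata_prefix="marc21", from_date=None):
    """yield (identifier, metadata element) pairs, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:  # incremental harvest: only records changed since last run
        params["from"] = from_date
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            root = ET.fromstring(resp.read())
        for record in root.iter(OAI + "record"):
            header = record.find(OAI + "header")
            identifier = header.findtext(OAI + "identifier")
            metadata = record.find(OAI + "metadata")  # None for deleted records
            yield identifier, metadata
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no more pages to fetch
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# usage (hypothetical endpoint and storage function):
# for oai_id, marcxml in harvest("https://ils.example.edu/oai", from_date="2008-01-01"):
#     store(oai_id, marcxml)
```

the "from" argument is what keeps scheduled re-harvests cheap: after an initial full harvest, each run requests only records added or changed since the previous one, which matches the automated update services described above.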
the internal xc metadata cache can be easily regenerated from the original repositories and services when necessary, such as to enable future changes to the internal xc metadata schema. the xc system architecture also makes use of internal metadata duplication among xc's components, which allows these components to communicate with each other using oai-pmh. this built-in metadata redundancy will also enable xc to communicate with external services using this standard protocol. it is important to distinguish the deliberate metadata redundancies built into the xc architecture from the type of metadata redundancies that have been singled out for elimination in the library of congress working group on the future of bibliographic control draft report (recommendation 1.1)12 and previously in the university of california (uc) libraries bibliographic services task force's final report.13 these other "negative" redundancies result from difficulties in sharing metadata among different environments and cause significant additional staff expense for libraries to enrich or recreate metadata locally. xc's architecture actually solves many of these problems by facilitating the sharing of enriched metadata among xc users. xc can also adapt as the library community begins to address the types of costly metadata redundancies mentioned in the above reports, such as between the oclc worldcat database14 and copies of that marc data contained within a library's ils, because xc will be capable of harvesting metadata from any source that uses a standard api.15

metadata from digital repositories and other free sources

xc will harvest metadata from various digital collections and repositories, using oai-pmh, and will maintain a copy of the harvested metadata within the xc metadata cache, as shown in figure 1. the metadata services hub architecture provides flexibility and possible economy for xc users by offering the option for multiple xc institutions to share a single metadata hub, thus allowing participating institutions to take full advantage of the hub's capabilities to aggregate and augment metadata from multiple sources. while the procedure for harvesting metadata from an external repository is not technologically difficult in itself, managing the flow of metadata coming from multiple sources and aggregating that metadata for use in xc will require the development of sophisticated software. to address this, the xc project team is partnering with established experts in bibliographic metadata aggregation to develop the metadata services portion of the xc architecture. the team from cornell university that has developed the software behind the national science digital library's metadata management system (nsdl/mms)16 is advising the xc team in the development of the xc metadata services hub, which will be built on top of the basic nsdl/mms software. the xc metadata services hub will coordinate metadata services into a reusable task grouping that can be started on demand or scheduled to run regularly. this xc component will harvest xml metadata and combine metadata records that refer to equivalent resources (based on uniform resource identifier [uri], if available, or other unique identifier) into what the cornell team describes as a "mudball." each mudball will contain the original metadata, the sources for the metadata, and the references to any services used to combine metadata into the mudball.
the mudball may also contain metadata that is the result of further automated processing or services to improve quality or to explicitly identify relationships between resources. hub services could potentially record the source of each individual metadata statement within each mudball, which would then allow a metadata record to be redelivered in its original or in an enriched form when requested.17 by allowing for the capture of provenance data for each data element, the hub could potentially provide much more granular information about the origin of metadata—and much more flexibility for recombining metadata—than is possible in most marc-based environments. after using the redeployed nsdl/mms software as the foundation for the xc metadata hub, the xc project team will develop additional hub services to support xc's functional requirements. xc-specific hub services will accommodate incoming marc data (including marc holdings data for non-digital resources); basic authority control; mappings from marc 21, marcxml,18 and dublin core to an internal xc schema defined within the xc application profile (described below); and other services to facilitate the functionality of the xc user environments (see discussion of requirement 5, below). finally, the xc hub services will make the metadata available for harvesting from the hub by the xc client integration applications.

metadata for licensed content

for a next-generation discovery system such as xc to provide access to all library resources, it will need to provide access to licensed content, such as citation data and full-text databases. metasearch technology provides one option for incorporating access to licensed content into xc. unfortunately, various difficulties with metasearch technology19 and usability issues with some metasearch products20 make metasearch technology a less-than-ideal solution. an alternative approach would bring metadata from licensed content directly into a system such as xc. the metadata services hub architecture for xc is capable of handling the ingest and processing of metadata supplied by commercial content providers by adding additional services to handle the necessary schema transformations and to control access to the licensed content. the more difficult issue with licensed content may be to obtain the cooperation of commercial vendors to ingest their metadata into xc. pursuing individual agreements with vendors to negotiate rights to ingest their metadata is beyond the original scope of xc's phase 2 project. however, the xc team will continue to monitor ongoing developments in this area, especially the work of the ethicshare project, which uses a system architecture very similar to that of xc.21 it remains our goal to build a system that will facilitate the inclusion of licensed content within xc in situations where commercial providers have made it available to xc users.

requirement 1 summary

when considering needed functionality for a next-generation discovery system, the ability to ingest and manage metadata from a variety of sources is of paramount importance. unlike a current ils, where we often think of metadata as mostly static unless it is supplemented by new, updated, and deleted records, we should instead envision the metadata in a next-generation system as being in constant motion, moving from one environment to another and being harvested and transformed on a scheduled basis. the metadata services hub architecture of the xc system will accommodate and facilitate such constant movement of metadata.
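the hub's "mudball" grouping and statement-level provenance described above can be pictured with a small data structure. the sketch below is purely illustrative: the record shapes, identifiers, and function names are invented rather than taken from nsdl/mms or xc, and it shows only the core idea that merged statements each remember their source, so an original record can be redelivered on request.

```python
# an illustrative (hypothetical) "mudball" structure: records from several
# repositories that refer to the same resource are merged, but every
# statement keeps a note of where it came from, so the original record
# (or an enriched one) can be redelivered on request.
from collections import defaultdict

def merge_into_mudballs(records):
    """records: iterable of (source, identifier, {element: [values]}) tuples.
    returns {identifier: mudball}, where a mudball maps each element to a
    list of (value, source) pairs -- statement-level provenance."""
    mudballs = defaultdict(lambda: defaultdict(list))
    for source, identifier, elements in records:
        for element, values in elements.items():
            for value in values:
                mudballs[identifier][element].append((value, source))
    return mudballs

def redeliver(mudball, source):
    """reconstruct the record as a single source originally supplied it."""
    return {el: [v for v, s in pairs if s == source]
            for el, pairs in mudball.items()
            if any(s == source for _, s in pairs)}

# usage: an ils and a digital repository both describe the same resource
mudballs = merge_into_mudballs([
    ("ils", "urn:x-example:1", {"title": ["river campus maps"]}),
    ("repository", "urn:x-example:1", {"title": ["river campus maps"],
                                       "subject": ["rochester (n.y.)"]}),
])
print(redeliver(mudballs["urn:x-example:1"], "ils"))
```

keeping provenance at the statement level, rather than the record level, is what makes the redelivery and enrichment scenarios above possible.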
requirement 2—the system must handle multiple metadata schemas.

an extension of requirement 1 will be the necessity for a next-generation system such as xc to handle metadata from multiple schemas, as the system harvests those schemas from various sources.

library metadata priorities

as a part of the xc survey of libraries described earlier in this paper, the xc team queried respondents about what metadata schemas they currently use or plan to use in the near future. many responding libraries indicated that they expect to increase their use of non–marc 21 metadata within the next three years, although no library indicated the intention to completely move away from marc 21 within that time period. nevertheless, the idea of a "marc exit strategy" has been discussed in various circles.22 the architecture of xc will enable libraries to move beyond the constraints of a marc-based system without abandoning their ils, and will provide an opportunity for libraries to stage their "marc exit strategy" in a way that suits their purposes. libraries also indicated that they plan to move away from homegrown schemas toward accepted standards such as mets,23 mods,24 mads,25 premis,26 ead,27 vra core,28 and dublin core.29 several responding libraries plan to move toward a wider variety of metadata schemas in the near future, and will focus on using xml-based schemas to facilitate interoperability and metadata harvesting. to address the needs of these libraries in the future, xc's metadata services will contain a variety of transformation services to handle a variety of schemas. taking into account the metadata schemas mentioned the most often among survey respondents, the software developed during phase 2 of the xc project will support harvested metadata in marc 21, marcxml, and dublin core (including qualified dublin core).30

metadata crosswalks and mapping

one respondent to the xc survey offered the prediction that "reuse of existing metadata and transformation of metadata from one format to another will become commonplace and routine."31 xc's internal metadata transformations must be designed with this in mind, to facilitate making these activities "commonplace and routine." fortunately, many maps and crosswalks already exist that potentially can be incorporated into a next-generation system such as xc.32 the metadata services hub architecture for xc can function as a standard framework for applying a variety of existing crosswalks within a single, shared environment. following "best practices" for crosswalking metadata, such as those developed by the digital library federation (dlf),33 will be extremely important in this environment. as the dlf guidelines describe, metadata schema transformation is not as straightforward as it might first appear to be. while the dlf guidelines advise always crosswalking from a more robust schema to a simpler one, sometimes in a series of steps, such mapping will often result in "dumbing down" of metadata, or loss of granularity. this is a particularly important concern for the xc project because a large percentage of the metadata handled by xc will be rich legacy marc 21 metadata, and we hope to maintain as much of that richness as possible within the xc system. in addition to simply mapping one data element in a schema to its closest equivalent in another, it is essential to ensure that the underlying metadata models of the two schemas being crosswalked are compatible.
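the "dumbing down" just described can be illustrated with a toy marc 21-to-simple dublin core crosswalk. the tag-to-element choices below echo commonly published marc-to-dublin core mappings, but the flattened record shape and the helper names are simplifications invented for this example.

```python
# a toy crosswalk from a flattened marc 21 record to simple dublin core,
# illustrating the loss of granularity the dlf guidelines warn about:
# distinct subject field types (600/610/650/651) collapse into one
# undifferentiated dc:subject element, and subfield structure is lost.
MARC_TO_DC = {
    "245": "title",      # title statement -> dc:title
    "100": "creator",    # main entry, personal name -> dc:creator
    "260": "publisher",  # imprint -> dc:publisher (simplified to one value)
    "600": "subject", "610": "subject",
    "650": "subject", "651": "subject",  # four subject tags -> one element
}

def crosswalk(marc_fields):
    """marc_fields: list of (tag, value) pairs with subfields already joined."""
    dc = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element:  # unmapped tags are silently dropped -- another loss
            dc.setdefault(element, []).append(value)
    return dc

record = [("245", "library metadata in transition"),
          ("650", "metadata"), ("651", "rochester (n.y.)")]
print(crosswalk(record))
# {'title': [...], 'subject': ['metadata', 'rochester (n.y.)']}
# the topical/geographic distinction carried by 650 vs. 651 is gone.
```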
the authors of the framework for a bibliographic future draft document define multiple layers of such models that need to be considered,34 and offer a general high-level comparison between the frbr data model35 and the dcmi (dublin core metadata initiative) abstract model (dcam).36 more detailed comparisons of models are also taking place as a part of the development of the new metadata content standard, resource description and access (rda).37 the developers of rda have issued documents offering a detailed mapping of rda elements to rda's underlying model (frbr)38 and analyzing the relationship between rda elements, the dcmi abstract model, and the metadata framework.39 as a result of a meeting held april 30–may 1, 2007, a joint dcmi/rda task group is now undertaking the collaborative work necessary to carry out the following tasks:

- develop an rda element vocabulary.
- develop an rda/dublin core application profile based on frbr and frad.
- disclose rda value vocabularies using rdf/rdfs/skos.40

these efforts hold much potential to provide a more rigorous way to communicate about metadata across multiple communities and to increase the compatibility of different metadata schemas and their underlying models. such compatibility will be essential to enabling the functionality of future discovery systems such as xc.

an xc metadata application profile

the xc project team will define a metadata application profile for xc as a way to document decisions made about data elements, content standards, and crosswalking used within the system. the use of an application profile can facilitate metadata migration, harvesting, and other automated processes, and presents an approach to metadata that is more flexible and responsive to local needs than simply adopting someone else's metadata guidelines.41 application profiles facilitate the use of multiple schemas because elements can be selected for inclusion from more than one existing schema, or additional elements can be created and defined locally.42 because the xc system will incorporate harvested metadata from a variety of sources, the use of an application profile will be essential to support xc's complex system requirements. the dcmi community has published guidelines for creating a dublin core application profile (dcap), which is defined more specifically as: "[a] form for documenting which terms a given application uses in its metadata, with what extensions or adaptations, and specifying how those terms relate both to formal standards such as dublin core as well as to less formally defined element sets and vocabularies."43 the announcement of plans to develop an rda/dublin core application profile illustrates the important role that application profiles are beginning to take to facilitate the interoperability of metadata schemas. the planned rda/dc application profile will "translate" rda into a standard structure that will allow it to be related more easily to other metadata element sets. unfortunately, the rda/dc application profile will likely not be completed in time for it to be incorporated into the first release of the xc software in mid-2009.
nevertheless, we intend to use the existing definitions of rda elements to inform the development of the xc application profile.44 this will allow us to anticipate any future incompatibilities between the rda/dc and the xc application profiles, and ensure that xc will be well-positioned to take advantage of rda-based metadata when rda is implemented. this process may have the reciprocal benefit of also informing the developers of rda of any rda elements that may be difficult to implement within a next-generation system such as xc. the potential value of rda to the xc project—in terms of providing a consistent approach to bibliographic and authority metadata and facilitating frbr-related user functionality—is very significant. it is hoped that at some point xc can become an early adopter of rda and provide a mechanism through which libraries can move their legacy marc 21 metadata into a system that is compatible with an emerging international metadata standard.

goal 2: bring metadata about library resources into a more open web environment

xc will reveal library metadata not only through its own separate interface (either the out-of-the-box xc interface or an interface designed by the local library), but will also allow library metadata to be revealed through other web applications. the latter approach will bring library resources directly to web locations that library users are already visiting, rather than attempting to entice users to visit an additional library-specific web location. making library metadata work effectively in the broader web environment (outside the well-defined boundaries of an ils or repository) will require the following requirements 3 and 4:

requirement 3—metadata must conform to the standards of the new web environments as well as to that of the system from which it originated.

achieving requirement 3 will require library metadata in future systems to perform a dual function: to conform both to existing library standards and to web standards and conventions. one way to achieve this is to ensure that the two types of standards themselves are compatible. coyle and hillmann have argued persuasively for changes in the direction of rda development to allow metadata created using rda to function in the broader web environment. these changes include the need to follow a clearly refined, high-level metadata model, to create data elements that can be manipulated by machines, and to move toward the use of uris instead of textual identifiers.45 after the announcement of the outcomes of the rda/dc data modeling meeting, the two authors are considerably more optimistic about rda functioning as a standard within the broader web environment.46 this discourse concerning rda shows but a piece of the process through which long-established library metadata standards need to be reexamined to make library metadata understandable to both humans and machines on the web. moving away from aacr2 toward rda, and ultimately toward incorporating standard web conventions into library metadata, can be a difficult process for those involved in creating and maintaining library standards. nevertheless, transforming library metadata standards in this way is essential to fulfill the requirements necessary for next-generation library discovery systems.

requirement 4—metadata must function effectively within the new web environments as well as within the system from which it originated.
not only must metadata for a next-generation system follow the conventions and standards used in the broader web, but the data also needs to be able to function effectively in a broader web environment. this is a slightly different proposition from requirement 3, and will necessitate testing the metadata standards themselves to ensure that they enable library metadata to function effectively. the xc project will provide direct experience with using library metadata in two types of web environments: content management systems and learning management systems.

library metadata in a content management system

as shown in the xc architecture diagram in figure 1, the xc project team will build one of the primary user environments for xc on top of the open-source content management system, drupal.47 the xc drupal module will allow us to respond to many of the needs expressed by libraries in their responses to the xc survey48 by supplying:

- a web application server with a back-end database;
- a user interface with web 2.0 features;
- library-controlled web pages that will treat library metadata as a native data type;
- a metadata interface for enhancing or correcting metadata in the system; and
- an administrative interface.

the xc team will bring library metadata into the drupal content management system (cms) as a native content type within that environment, creating a drupal "node" for each metadata record. this will allow xc to take advantage of many native features of the drupal cms, such as a taxonomy system.49 building xc interfaces on top of the drupal cms will also give us an opportunity to collaborate with partner libraries that are already active participants in the drupal user community. xc's architecture will allow the possibility of developing additional user environments on top of other content management systems. bringing library metadata into these new environments will provide many new opportunities for libraries to manipulate their metadata and present it to users without being constrained by the limitations of the current generation of library systems. such opportunities will then inform the future requirements for library metadata in such environments.

library metadata in a learning management system

figure 1 illustrates two examples of xc user environments through learning management systems: xc interfaces to both the blackboard learning system50 and sakai.51 much exciting work is being done at other institutions to bring library content into these web applications.52 xc will build on projects such as these to reveal library metadata for non-licensed library resources from an ils through learning management systems. specifically, we plan to develop the capability for libraries to make the display of library metadata context-sensitive within the learning management system. for example, searching or browsing on a page for a particular academic course could be configured to reflect the subject area of the course (e.g., chemistry) and automatically present library resources related to that subject.53 this capability will build upon the experiences gained by the university of rochester through its work to develop its "course resources" system.54 such xc functionality will be integrated directly into the learning management system, rather than simply providing a link out to a separate library system.
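the course-page scenario just described can be pictured with a tiny sketch. everything here is hypothetical: the course codes, the subject mapping, and the query shape are invented solely to show how a learning management system page configured with a subject area could scope searches to that subject.

```python
# a hypothetical sketch of context-sensitive display: a course page in the
# learning management system carries a configured subject area, and searches
# launched from that page are automatically scoped to it.
COURSE_SUBJECTS = {
    "CHM 131": "chemistry",
    "HIS 200": "united states -- history",
}

def course_scoped_query(course_code, user_terms):
    """combine the user's search terms with the subject configured for the course."""
    subject = COURSE_SUBJECTS.get(course_code)
    if subject is None:
        return {"q": user_terms}  # no configuration: fall back to an ordinary search
    return {"q": user_terms, "facet.subject": subject}

print(course_scoped_query("CHM 131", "spectroscopy"))
# {'q': 'spectroscopy', 'facet.subject': 'chemistry'}
```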
again, we hope that our efforts to bring library metadata into these new environments will encourage libraries to engage in further work to integrate library resources into broader web environments and inform future requirements for library metadata in these environments.

goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing

new functionality for users will require that metadata fulfill more sophisticated functions in a next-generation system than it may have done in an ils or repository, in order to provide more intuitive searching and navigation. the system will also need to capture and incorporate metadata generated through tagging, user-contributed reviews, etc. such new functionality creates the need for requirements 5 and 6.

requirement 5—metadata must support functionality to facilitate intuitive searching and navigation, such as faceted browsing and frbr-informed results groupings.

enabling faceting and clustering

much research has already been done regarding the design of faceted search interfaces in general.55 when considered along with user research conducted at other institutions56 and to be conducted during the development of xc, this data provides a strong foundation for the design of a faceted browse environment. the xc project team has already gained firsthand experience with developing faceted browsing through the development of the "c4" prototype interface during phase 1 of the xc project.57 to enable faceting within xc, we will also pay particular attention to what others have discovered through designing faceted interfaces on top of legacy marc 21 metadata. specific lessons learned from those involved with north carolina state university's endeca-based catalog,58 vanderbilt university's primo implementation,59 and plymouth state university's scriblio system60 provide valuable guidance for the xc project team as we design facets for the xc system. ideally, a mechanism should be developed to enable these discoveries to feed back into the development of metadata and encoding standards, so that changes to existing standards can be considered to facilitate faceting in the future. several new system implementations have used library of congress subject headings (lcsh) and lc subdivisions from marc 21 records as the basis for deriving facets. the xc "c4" prototype interface provides facets for topic, genre, and region that are based simply upon one or more marc 21 6xx tags.61 north carolina state university's endeca-based system has enabled facets for topic, genre, region, and era using lcsh subdivisions as well, but this has necessitated a "massive cleanup" of subdivisions, as described by charley pennell.62 oclc's fast (faceted application of subject terminology) project may provide another option for enabling such facets.63 a library could populate its marc 21 data with fast headings, based upon the existing lcsh in the records, and then use the fast headings as the basis for generating facets. it remains to be seen whether fast will offer significant benefit over lcsh itself when it comes to faceting, however, since fast headings are generated directly from lcsh.
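the subdivision-based approach described above can be sketched in a few lines. the subfield-to-facet mapping below follows marc 21 subject-field conventions in a simplified way ($a and $x treated as topical, $z geographic, $y chronological, $v form); the function name and record shape are invented, and real records would need the kind of cleanup pennell describes.

```python
# a simplified sketch of deriving facets from marc 21 6xx subject fields,
# along the lines of the c4 and ncsu approaches discussed above.
SUBFIELD_FACET = {"a": "topic", "x": "topic", "z": "region",
                  "y": "era", "v": "genre"}

def derive_facets(subject_fields):
    """subject_fields: list of 6xx fields, each a list of (code, value) pairs."""
    facets = {"topic": set(), "region": set(), "era": set(), "genre": set()}
    for field in subject_fields:
        for code, value in field:
            facet = SUBFIELD_FACET.get(code)
            if facet:
                facets[facet].add(value.strip(" ."))
    return facets

# a 650 field: "chemistry, organic -- united states -- history -- 19th century"
field_650 = [("a", "chemistry, organic"), ("z", "united states"),
             ("x", "history"), ("y", "19th century")]
print(derive_facets([field_650]))
```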
while marc 21 metadata has some known difficulties where faceting and clustering are concerned (such as those involving lcsh), the xc system will encounter additional difficulties when implementing these technologies with less robust metadata schemas such as simple dublin core, and especially across metadata from a variety of schemas. the development of web services to augment batches of metadata records in an automated manner holds some promise for improving the creation of facets from other metadata schemas. within the xc system, such services could be added to the metadata services hub and run against ingested metadata. while designing extensive services of this type is beyond the scope of the next phase of xc software development, we will encourage others to develop such services for xc. another (but much less desirable) approach to augmenting metadata is for a metadata specialist to manually edit one record or group of records. the xc cataloging interface, built within the drupal cms, will allow record-by-record editing of metadata when necessary. while we see this editing interface as essential functionality for xc, we anticipate that libraries will want to use this feature sparingly. in many cases it will be preferable to correct or augment metadata within its original repository (e.g., the institution's ils) and then re-harvest the corrected metadata, rather than correcting it manually within xc itself. because of the expense of manual metadata augmentation and correction, libraries will be well-advised to rely upon insights gained through user research to assess the value of this type of work. for example, a library might decide to edit individual metadata records only when the correction or augmentation will support specific system functionality that is of high priority for the institution's users.

implementing frbr results groupings

to incorporate logical groupings of search results based upon the frbr64 and frad65 data models over sets of diverse metadata within xc, we will encounter difficulties similar to those we face with faceting and clustering. various analyses of the marc 21 formats have dealt extensively with the relationship between frbr and marc 21,66 and others have written specifically about methodology for frbrizing a marc-based catalog.67 in addition, various tools and web services are available that can potentially facilitate this process.68 even with this extensive body of work to draw upon, however, the success of our implementation of frbr-based functionality will depend upon both the quality and completeness of the system's metadata. metadata in xc that originated as dublin core records may need significant augmentation to be incorporated effectively into frbrized results displays. to maximize the ability of the system to support frbr/frad results groupings, we may need to supplement automated grouping of resources with a combination of additional services for the metadata services hub, and with cataloger-generated metadata correction and augmentation, as described above.69 the xc team will use the results of user research carried out during the next phase of the xc project to inform our decision-making regarding which frbr-informed results groupings users find helpful, and then assess what specific metadata augmentation services are needed for xc. providing frbr-informed groupings of related records in search results will be easier when the underlying metadata incorporates principles of authority control.
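as a rough illustration of frbr-informed grouping, the sketch below clusters records on a normalized author-plus-title key, loosely in the spirit of the oclc work-set algorithm cited in the notes. it is not xc code: the record shape is invented, and a production implementation would also need uniform titles, authority-controlled name forms, and handling for translations.

```python
# a deliberately simplified sketch of frbr-style work grouping: records
# are clustered on a normalized author + title key, so variant transcriptions
# of the same work fall into one results group.
import re
from collections import defaultdict

def work_key(author, title):
    """normalize case, punctuation, and whitespace into a grouping key."""
    norm = lambda s: re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return (norm(author), norm(title))

def group_by_work(records):
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec["author"], rec["title"])].append(rec)
    return works

records = [
    {"author": "Smith, Jane", "title": "A History of Rochester", "format": "print"},
    {"author": "smith, jane", "title": "a history of rochester.", "format": "microform"},
]
for key, group in group_by_work(records).items():
    print(key, "->", len(group), "records")  # both records fall into one work set
```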
of course, the vast majority of the non-marc metadata that will be ingested into xc will not be under authority control. again, this situation suggests the need for additional services or functionality to improve existing metadata within the xc metadata hub, the xc cataloging interface, or both. as an experiment in developing services to facilitate authority control, the xc project team carried out a pilot project in partnership with a group of software engineering students from the rochester institute of technology (rit) during phase 1 of xc. the rit students designed a basic name access control tool that can be used across disparate metadata schemas in an environment such as xc. the tool can ingest marc 21 authority and bibliographic records as well as dublin core records, provide automated matching, and facilitate a cataloger's handling of problem reports.70 the xc project team will implement the automated portion of the tool as a web service within the xc hub, and the "cataloger facilitation" portion of the tool within the xc cataloging user interface. institutions that use xc can then incorporate additional tools to facilitate authority control into xc as they are needed and developed. in addition to providing a test case for developing xc metadata services, the rit pilot project proved valuable by providing an opportunity for student software developers and catalogers to discuss the functional requirements of a cataloging tool. not only did the experience enable the developers to understand the needs of the system's intended users, but it also presented an opportunity for the engineering students to demonstrate technological possibilities that the catalogers—who work almost exclusively with legacy ils technology—may not have envisioned before participating in the project.

requirement 6—the system must manage user-generated metadata resulting from user tagging, submission of reviews, etc.

because users now expect web-based tools to offer web 2.0 functionalities, the xc project has as one of its basic goals to incorporate these functionalities into xc's user environments. the results of the xc survey rank tools to support the finding, gathering, use, and reuse of scholarly content (e.g., rss feeds, blogs, tagging, user reviews) eighth out of a list of twenty new desirable opac features.71 we expect to learn much more about the usefulness of web 2.0 technology within a next-generation system through the user research that we will carry out during phase 2 of the xc project. the xc system will capture metadata generated by users from any one of the system's user environments (e.g., the drupal-based interface, learning management system integration) and harvest it back into the system's metadata services hub for processing.72 the xc application profile will incorporate user-generated metadata, mapped into its own carefully defined metadata elements. this will allow us to capture and manage this metadata as discrete content, without inadvertently mixing it with other metadata created by library staff or ingested from other sources.

goal 4: conduct user research to inform system development

user research will be essential to informing the design and functionality of the xc software. to align xc's functional requirements as closely as possible with user needs, the xc project team will practice a user-centered design methodology that takes an iterative approach to defining the system's functional requirements.
since we will engage concurrently in the processes of user research and software design, we will not fully determine the system requirements for xc until a significant amount of user research has been done. a complete picture of the demands upon metadata within xc will thus emerge as we gain information from our user research.

goal 5: publish the xc code as open-source software

central to the vision of the xc project is sharing the xc software freely throughout the library community and beyond. our hope is that others will use all or part of the xc software, modify it, and improve it to meet their own needs. new requirements for the metadata within xc are likely to arise as this process takes place. other future changes to the xc software will also be needed to ensure the software's continued compatibility with various metadata standards and schemas. these changes will all affect the system requirements for xc over time.

addressing goals 4 and 5

while goals 1 through 3 for the xc project result in specific high-level functional requirements for the system's discovery metadata that can be addressed and discussed as xc is being developed, goals 4 and 5 present general challenges that must be addressed in the future. goal 4 is likely to fuel the need to update the xc software over time as the needs of users change. goal 5 provides a challenge to managing that updating process in a collaborative environment. these two goals suggest an additional general requirement for the system's metadata:

requirement 7—the system's metadata must be extensible to facilitate future enhancements and updates.

enabling future user needs

developing xc using a user-centered design process in which user research and software design occur simultaneously will enable us to design and build a system that is as responsive as possible to the needs of users who are seeking library resources. however, user needs will change during the life of the xc software. these needs must be assessed and addressed, and then weighed against the desires of individual institutions that use xc and request specific system enhancements. to carry forward the xc project's commitment to serving users, we will develop a governance model for the xc community that brings the needs of future users into the decision-making process by providing a method for continuing to determine and capture user needs. in addition, we will consciously cultivate a commitment to user research among members of the xc community. because the xc software will be released as open source, we can also encourage xc partners to develop whatever additional functionality they need for their own institutions and make these enhancements available to the entire community of xc users. this approach is very different from the enhancement process in place for most commercial systems, and xc partner institutions may need to adjust to it.

enabling future metadata standards

as current metadata standards are revised and new standards and schemas are created, xc must be able to accommodate these changes. new crosswalks will allow new metadata schemas to be mapped to the xc internal schema in the future, as sketched below. the xc application profile can be updated with the addition of new data elements as needed. the drupal-based xc user environment will also allow institutions that use xc to create new internal data types to incorporate additional types of metadata.
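as a hypothetical sketch of this kind of extensibility, the fragment below registers per-schema transforms in a lookup table, so that support for an additional schema means adding one function rather than editing the hub's core. the schema names, element choices, and internal-schema shape are all invented for illustration.

```python
# a hypothetical registry of schema transforms: each incoming schema maps
# to one function that converts its records into the internal schema, and
# new schemas are supported by registering one more function.
TRANSFORMS = {}

def register(schema_name):
    """decorator: associate a transform function with an incoming schema."""
    def wrap(fn):
        TRANSFORMS[schema_name] = fn
        return fn
    return wrap

@register("oai_dc")
def dc_to_internal(record):
    return {"title": record.get("title", []), "agent": record.get("creator", [])}

@register("marc21")
def marc_to_internal(record):
    return {"title": record.get("245", []), "agent": record.get("100", [])}

def to_internal(schema_name, record):
    if schema_name not in TRANSFORMS:
        raise ValueError(f"no transform registered for schema {schema_name!r}")
    return TRANSFORMS[schema_name](record)

# adding a future schema (e.g., mods) means registering one more function.
print(to_internal("oai_dc", {"title": ["sample"], "creator": ["a. author"]}))
```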
as the development of the semantic web moves forward73 and enables smart linking between existing authority files and vocabularies,74 xc's architecture can make use of the resulting web services, either by incorporating them through the xc metadata services hub or through the native xc user interface as part of a user search query.

further considerations

the above discussion of the goals and requirements for xc has revealed a number of issues related to the development of next-generation discovery systems that are unfortunately beyond the scope of the next phase of the xc project. we therefore offer them as a possible agenda for future work by the broader library community:

1. explore the wider usefulness of web-based metadata services and the need for an automated metadata services coordinator to control these functions. libraries are already comfortable with basic "services" that are performed on metadata by an outside agency: for example, a library may send copies of its marc records to a vendor for authority processing or enrichment with tables of contents or other data elements. the library community should encourage vendors and others to develop these and other metadata enrichment options as automated web services.

2. study the advantages of using statement-level metadata provenance, as used in the nsdl metadata management system and considered for use within the xc metadata services hub, and explore whether there are ways that marc 21 could move toward allowing more granularity in recording and sharing metadata provenance.

3. to facilitate access to licensed library resources, encourage the development of more robust metasearch technology and standards so that technological limitations do not hinder system performance and search result usability. if this is not successful, libraries and content providers must work together to enable metadata for licensed resources to be revealed within open discovery environments such as xc and ethicshare.75 this second scenario would enable libraries to directly address usability issues with the display of licensed content, which may make it a more desirable longer-term solution than attempting to improve metasearch technology.

4. the administrative bodies of the two groups represented on the dcmi/rda task group (i.e., the dublin core metadata initiative and the rda committee of principals) have a responsibility to take the lead in funding this group's work to develop and maintain the rda/dc application profile and its related registries and vocabularies. beyond this, however, the broader library community must recognize that this work is essential to ensure that future library metadata standards will function in the broader web environment, and offer additional administrative and financial support for it in the coming years.

5. to ensure that library standards work effectively outside of traditional library systems, catalogers and metadata experts must develop ongoing, collaborative working relationships with system developers. such collaboration will necessitate educating each group of experts about the domain of the other.

6. libraries should experiment with using metadata in new environments and use the lessons learned from this activity to inform the metadata standards development process.
while current library automation environments by and large do not provide opportunities for this, the extensible catalog will provide a flexible platform where experimentation can take place.76 xc will make experimentation as risk-free as possible by ensuring that the original metadata brought into the system can be re-harvested in its original form, thus minimizing concerns about possible data corruption. xc will also minimize the investment needed for a library to engage in this experimentation because it will be released as open-source software.

7. to facilitate new functionality for next-generation library discovery environments, libraries must share their new expertise in this area with each other. for example, library professional organizations (such as ala and its associations) should form discussion groups and committees devoted to sharing lessons learned from the implementation of faceted interfaces and web 2.0 technologies, such as tagging and folksonomies. such groups should develop a "best practices" document outlining a preferred way to define facets from marc 21 data that can be used by any library implementing faceting on top of its legacy metadata.

8. the library community should discuss and encourage mechanisms for pooling and sharing user-generated metadata among libraries and other interested institutions.

conclusions

to present library resources via the web in a manner that users now expect, library metadata must function in ways that have never been required of it before. making library metadata function effectively within the broader web environment will require that libraries take advantage of the combined knowledge of experts in the areas of cataloging/metadata and system development who share a common vision for serving library users. the challenges to making legacy library metadata and newer metadata for digital resources interact effectively in the broader web environment are significant, and work must begin now to ensure that we can preserve the investment that libraries have made in their legacy metadata. while the recommendations within this report are the result of planning to develop one particular library discovery system—the extensible catalog (xc)—these lessons can inform the development of other systems as well. the actual development of xc will continue to add to our knowledge in this area. while it may be tempting to wait and see what commercial vendors offer as their next generation of commercial discovery products, such a passive approach may jeopardize the future viability of library metadata. projects such as the extensible catalog can serve as a vehicle for moving forward by providing an opportunity for libraries to experiment and then take informed action to move the library community toward a next generation of resource discovery systems.

acknowledgments

phase 1 of the extensible catalog project was funded through a grant from the andrew w. mellon foundation. this paper is in partial fulfillment of that grant, originally funded on april 1, 2006, and concluding on june 30, 2007. the author acknowledges the contributions of the entire university of rochester extensible catalog project team to the content of this paper, and especially thanks david lindahl, barbara tillett, and konstantin gurevich for reading and offering suggestions on drafts of this paper.

references and notes

1.
despite the use of the word "catalog" within the name of the extensible catalog project, this paper will avoid using the word "catalog" in the phrase "next-generation catalog" because this may misleadingly convey the idea of a catalog as solely a single, separate web destination for library users. instead, terms such as "discovery environment" and "discovery system" will be preferred. 2. the xc blog provides a list of xc partners, describes their roles in xc phase 2, and provides links to reports that represent the outcomes of xc phase 1. "xc (extensible catalog): an open-source online system that will unify access to traditional and digital library resources," www.extensiblecatalog.info (accessed october 4, 2007). 3. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998), www.ifla.org/vii/s13/frbr/frbr.pdf (accessed july 23, 2007). 4. ifla working group on functional requirements and numbering of authority records (franar), "functional requirements for authority data: a conceptual model," april 1, 2007, www.ifla.org/vii/d4/franar-conceptualmodel2ndreview.pdf (accessed july 23, 2007). 5. library of congress, network development and marc standards office, "marc 21 formats," april 18, 2005, www.loc.gov/marc/marcdocz.html (accessed september 3, 2007). 6. "dublin core metadata element set, version 1.1," december 20, 2004, http://dublincore.org/documents/dces (accessed september 3, 2007). 7. university of rochester river campus libraries, "extensible catalog phase 2" (grant proposal submitted to the andrew w. mellon foundation, july 11, 2007). 8. "literature list," extensible catalog blog, www.extensiblecatalog.info/?page_id=17 (accessed august 27, 2007). 9. a summary of the results of this survey is available on the xc blog. nancy fried foster et al., "extensible catalog survey report," july 20, 2007, www.extensiblecatalog.info/wp-content/uploads/2007/07/xc%20survey%20report.pdf (accessed july 23, 2007). 10. lorcan dempsey has written of the need for a service layer for libraries that would facilitate the "de-coupling" of resource retrieval from back-end processing. lorcan dempsey, "a palindromic ils service layer," lorcan dempsey's weblog, january 20, 2006, http://orweblog.oclc.org/archives/000927.html (accessed august 24, 2007). 11. "open archives initiative protocol for metadata harvesting v. 2.0," www.openarchives.org/oai/openarchivesprotocol.html (accessed august 27, 2007). 12. library of congress, working group on the future of bibliographic control, "report on the future of bibliographic control: draft for public comment," november 30, 2007, www.loc.gov/bibliographic-future/news/lcwg-report-draft-11-30-07-final.pdf (accessed december 30, 2007). 13. university of california libraries bibliographic services task force, "rethinking how we provide bibliographic services for the university of california," final report, 34, http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf (accessed august 24, 2007). 14. "[worldcat.org] search for an item in libraries near you," www.worldcat.org (accessed august 24, 2007). 15. oclc's plan to create additional apis to worldcat as part of its worldcat grid project is a welcome development that may enable oclc members to harvest metadata directly from worldcat into a system such as xc in the future.
see the following blog posting for an early description of oclc's plans, which have not been formally unveiled by oclc as of this writing: bess sadler, "the librarians and the chocolate factory: oclc developer network day," solvitur ambulando, october 3, 2007, www.ibiblio.org/bess/?p=88 (accessed december 30, 2007). 16. "metadata management system," nsdl registry, september 20, 2006, http://metadataregistry.org/wiki/index.php/metadata_management_system (accessed july 23, 2007). 17. diane hillmann, stuart sutton, and jon phipps, "nsdl metadata improvement and augmentation services" (grant proposal submitted to the national science foundation, 2007). 18. library of congress, network development and marc standards office, "marcxml: marc 21 xml schema," july 26, 2006, www.loc.gov/standards/marcxml (accessed september 3, 2007). 19. andrew k. pace, "category: metasearch," hectic pace, http://blogs.ala.org/pace.php?cat=150 (accessed august 27, 2007). see in particular the following blog entries: "metameta," july 25, 2006; "more meta," september 29, 2006; "preaching to the publishers," october 31, 2006; "even more meta," july 11, 2007; and "still here," august 21, 2007. 20. david lindahl, "metasearch in the users' context," the serials librarian 51, no. 3/4 (2007): 220–222. 21. ethicshare, a collaborative project of the university of minnesota, georgetown university, indiana university–bloomington, indiana university–purdue university indianapolis, and the university of virginia, is addressing this challenge as part of its plan to develop a sustainable online environment for the practical ethics community. the architecture of the proposed ethicshare system has many similarities to that of xc, but the project focuses specifically upon ingesting citation metadata from a variety of sources, including commercial providers. see cecily marcus, "ethicshare planning phase final report," july 2007, www.lib.umn.edu/about/ethicshare/university%20of%20minnesota_ethicshare_final_report.pdf (accessed august 27, 2007). 22. roy tennant used this phrase in "marc exit strategies," library journal 127, no. 19 (november 15, 2002), www.libraryjournal.com/article/ca256611.html?q=tennant+exit (accessed july 23, 2007); karen coyle presented her vision for moving beyond marc to a more flexible, identifier-based record structure that will facilitate a range of library functions in "future considerations: the functional library systems record," library hi tech 22, no. 2 (2004). 23. library of congress, network development and marc standards office, "mets: metadata encoding and transmission standard official web site," august 23, 2007, www.loc.gov/standards/mets (accessed september 3, 2007). 24. library of congress, network development and marc standards office, "mods: metadata object description schema," august 22, 2007, www.loc.gov/standards/mods (accessed september 3, 2007). 25. library of congress, network development and marc standards office, "mads: metadata authority description schema," february 2, 2007, www.loc.gov/standards/mads (accessed september 3, 2007). 26. "premis: preservation metadata maintenance activity," july 31, 2007, www.loc.gov/standards/premis (accessed september 3, 2007). 27. library of congress, network development and marc standards office, "ead: encoded archival description version 2002 official site," august 17, 2007, www.loc.gov/ead (accessed september 3, 2007). 28.
visual resources association, "vra core: welcome to the vra core 4.0," www.vraweb.org/projects/vracore4 (accessed september 3, 2007). 29. "dublin core metadata element set, version 1.1." 30. other xml-compatible schemas, such as mods and mads, will also be supported initially in xc if they are first converted into marcxml or qualified dublin core. in the future, we plan to allow these other schemas to be harvested directly into xc. 31. foster et al., "extensible catalog survey report," july 20, 2007, 15. the original comment was submitted by meg bellinger in yale university's response to the xc survey. 32. patricia harpring et al., "metadata standards crosswalks," in introduction to metadata: pathways to digital information (getty research institute, n.d.), www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html (accessed august 29, 2007); see also carol jean godby, jeffrey a. young, and eric childress, "a repository of metadata crosswalks," d-lib magazine 10, no. 12 (december 2004), www.dlib.org/dlib/december04/godby/12godby.html (accessed july 23, 2007). 33. digital library federation, "crosswalkinglogic," june 22, 2007, http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/crosswalkinglogic (accessed august 28, 2007). 34. karen coyle et al., "framework for a bibliographic future," may 2007, http://futurelib.pbwiki.com/framework (accessed july 23, 2007). 35. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 36. andy powell et al., "dcmi abstract model," dublin core metadata initiative, june 4, 2007, http://dublincore.org/documents/abstract-model (accessed august 29, 2007). 37. joint steering committee for development of rda, "rda: resource description and access: background," july 16, 2007, www.collectionscanada.ca/jsc/rda.html (accessed august 29, 2007). 38. joint steering committee for development of rda, "rda-frbr mapping," june 14, 2007, www.collectionscanada.ca/jsc/docs/5rda-frbrmapping.pdf (accessed august 29, 2007). 39. joint steering committee for development of rda, "rda element analysis," june 14, 2007, www.collectionscanada.ca/jsc/docs/5rda-elementanalysis.pdf (accessed august 28, 2007). a revised version of the document was issued on december 16, 2007, at www.collectionscanada.gc.ca/jsc/docs/5rda-elementanalysisrev.pdf (accessed december 30, 2007). 40. "data model meeting: british library, london 30 april–1 may 2007," www.bl.uk/services/bibliographic/meeting.html (accessed july 23, 2007). the task group has outlined its work plan, including deliverables, on its wiki at http://dublincore.org/dcmirdataskgroup (accessed october 4, 2007). 41. emily a. hicks, jody perkins, and margaret beecher maurer, "application profile development for consortial digital libraries," library resources and technical services 51, no. 2 (april 2007). 42. makx dekkers, "application profiles, or how to mix and match metadata schemas," cultivate interactive, january 2001, www.cultivate-int.org/issue3/schemas (accessed august 29, 2007). 43. thomas baker et al., "dublin core application profile guidelines," september 3, 2005, http://dublincore.org/usage/documents/profile-guidelines (accessed october 8, 2007). 44. joint steering committee for development of rda, "rda element analysis." 45. karen coyle and diane hillmann, "resource description and access (rda): cataloging rules for the 20th century," d-lib magazine 13, no. 1/2 (jan./feb.
2007), www.dlib.org/dlib/ january07/coyle/01coyle.html (accessed august 24, 2007). 46. karen coyle, “astonishing announcement: rda goes 2.0,” coyle’s information, may 3, 2007, http://kcoyle.blogspot .com/2007/05/astonishing-announcement-rda-goes-20.html (accessed august 29, 2007). 18 information technology and libraries | june 2008 47. “drupal.org,” http://drupal.org (accessed august 30, 2007). 48. foster et al., “extensible catalog survey report,” 14. 49. “taxonomy: a way to organize your content,” drupal.org, http://drupal.org/handbook/modules/taxonomy (accessed september 12, 2007). 50. “blackboard learning system,” www.blackboard.com/ products/academic_suite/learning_system/index.bb (accessed august 31, 2007). 51. “sakai: collaboration and learning environment for education,” http://sakaiproject.org (accessed august 31, 2007). 52. for example, the library into blackboard project at california state fullerton has developed a toolkit for faculty that brings openurl resolver functionality into blackboard to create linked citations to resources. see “putting the library into blackboard: a toolkit for cal state fullerton faculty,” 2005, www .library.fullerton.edu/librarytoolkit/default.shtml (accessed august 31, 2007); and susan tschabrun, “putting the library into blackboard: using the sfx openurl generator to create a toolkit for faculty.” the sakaibrary project at indiana university and the university of michigan are working to integrate licensed library content into sakai using metasearch technology. see “sakaibrary: integrating licensed library resources with sakai,” june 28, 2007, www.dlib.indiana.edu/projects/sakai (accessed august 31, 2007). 53. university of rochester river campus libraries, “extensible catalog phase 2.” 54. susan gibbons, “library course management systems: an overview,” library technology reports 41, no. 3 (may/june 2005): 34–37. 55. marti a. hearst, “design recommendations for hierarchical faceted search interfaces,” august 2006, http:// flamenco.berkeley.edu/papers/faceted-workshop06.pdf (accessed august 31, 2007). 56. kristin antelman, emily lynema, and andrew k. pace, “toward a twenty-first century library catalog,” information technology and libraries 25, no. 3 (september 2006): 128–138. 57. “c4,” https://www.library.rochester.edu/c4 (accessed september 28, 2007). as of the time of this writing, the c4 prototype is available to the public. however, the prototype is no longer being developed, and this prototype may cease to be available at some point in the future. 58. charley pennell, “forward to the past: resurrecting faceted search @ ncsu libraries,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.lib.ncsu.edu/endeca/ presentations/200706-facetedcatalogs-pennell.ppt (accessed august 31, 2007). 59. mary charles lasater, “authority control meets faceted browse: vanderbilt and primo,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.ala.org/ala/lita/litamembership/ litaigs/authorityalcts/2007annualfiles/marycharleslasater.ppt (accessed august 31, 2007). 60. casey bisson, “faceting and clustering: an implementation report based on scriblio,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), http://oz.plymouth.edu/~cbisson/ presentations/alaannual_2-2007june24.pdf (accessed august 31, 2007). 61. 
“subject access fields (6xx),” in marc 21 concise format for bibliographic data (2006), www.loc.gov/marc/bibliographic/ ecbdsubj.html (accessed september 28, 2007). 62. pennell, “forward to the past: resurrecting faceted search@ ncsu libraries.” 63. “fast: faceted application of subject terminology,” www.oclc.org/research/projects/fast (accessed august 31, 2007). 64. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 65. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data.” 66. library of congress, network development and marc standards office, “functional analysis of the marc 21 bibliographic and holding formats,” april 6, 2006, www.loc. gov/marc/marc-functional-analysis/functional-analysis.html (accessed august 31, 2007); martha m. yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 2 (june 2005): 77–95; pat riva, “mapping marc 21 linking entry fields to frbr and tillett’s taxonomy of bibliographic relationships,” library resources and technical services 48, no. 2 (april 2004): 130–143. 67. trond aalberg, “a process and tool for the conversion of marc records to a normalized frbr implementation,” in digital libraries: achievements, challenges and opportunities (berlin/heidelberg: springer, 2006), 283–292; christian monch and trond aalberg, “automatic conversion from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2003): 405–411; david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib magazine 11, no. 10 (october 2005), www .dlib.org/dlib/october05/crane/10crane.html (accessed august 24, 2007). 68. trond aalberg, frank berg haugen, and ole husby, “a tool for converting from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2006), 453–456; “frbr work-set algorithm,” www .oclc.org/research/software/frbr/default.htm (accessed august 31, 2007); “xisbn (web service),” www.worldcat .org/affiliate/webservices/xisbn/app.jsp (accessed august 31, 2007). 69. for example, marc 21 data may need to be augmented to extract data attributes related to frbr works and expressions that are not explicitly coded within a marc 21 bibliographic record (such as a date associated with a work coded within a general note field); or to “sort out” the fields in a marc 21 bibliographic record for a single resource that contains various works and/or expressions (e.g. ,a sound recording with multiple tracks), to associate the various fields (performer access points, analytical entries, subject headings, etc.) with the appropriate work or expression. 70. while the rit-developed tool is not publicly available at the time of this writing, it is our intent to post it to sourceforge (www.sourceforge.net) in the near future. the final report of the rit project is available at http://docushare.lib.rochester.edu/ docushare/dsweb/get/document-27362 (accessed january 2, 2008). metadata to support next-generation library resource discovery | bowen 19 71. foster et al., “extensible catalog survey report.” 72. note the arrow pointing to the left in figure 1 between the user environments and the metadata services hub. 73. jane greenberg and eva mendez, knitting the semantic web (binghamton, ny: haworth information press, 2007). 
this volume, co-published simultaneously as cataloging and classification quarterly 43, no. 3/4, contains a wealth of articles that explore the role that libraries can, and should, play in the development of the semantic web. 74. corey a. harper and barbara b. tillett explore various methods for making these controlled vocabularies available in “library of congress controlled vocabularies and their application to the semantic web,” cataloging and classification quarterly 43, no. 3/4 (2007): 63. the development of skos (simple knowledge organization system), a semantic web language for representing controlled structured vocabularies, will also be valuable for xc. see alistair miles and jose r. perez-aguiera, “skos: simple knowledge organisation for the web,” cataloging and classification quarterly 43, no. 3/4 (2007). 75. marcus, “ethicshare planning phase final report.” 76. the talis platform provides another promising environment for experimentation and development. see “talis platform: semantic web application platform,” talis, www.talis.com/platform (accessed september 2, 2007).

editorial: beginnings
marc truitt

as i write these lines in late february, the first hints of spring on the alberta prairie are manifest. alternatively, perhaps it’s just that the longer and warmer days are causing me to “think spring.” there are no signs yet of early bulbs—at least, none that i can detect with around a foot of snow in most places—but the sun is now rising at 7:30 a.m. and not setting until 6 p.m., a dramatic change from the barely seven hours of daylight typical of december and january. and while none but the hardiest souls are yet outside in shorts and shirt-sleeves, somehow, daytime highs that hover around freezing seem downright pleasant in comparison with the minus thirties (not counting the wind chill) we were experiencing even a couple of weeks ago. yes, spring is in the air, even if the calendar says it is still nearly a month away. . . .

so what, you may fairly ask, does the weather in edmonton have to do with ital? this is my first issue of ital as editor, and it may not surprise you to hear that i’ve been thinking quite a bit about what might be the right theme and tone for my first column. while i’ve been associated with the journal for quite a while—first as a board member, and more recently as managing editor—my role has always been comfortably limited to background tasks such as refereeing papers and production issues. now, that is about to change; i am stepping a bit out of my comfort zone. it’s about beginnings.

i follow with some awe in the footsteps of a long line of editors of ital (and jola, its predecessor). i’ve been honored to serve under—and to learn a great deal from—the last two, dan marmion and john webb. you, the readers of ital, and i are fortunate to have as returning managing editor judith carter, who preceded me and taught me the skills required for that post; i hasten to emphasize that she is definitely not responsible for the things i did not do right in the job! regular readers of ital will recall that john webb often referred humorously and admiringly to the members of the ital editorial board as his “junkyard dogs;” he claimed that they kept him honest. with the addition of a couple of fine new members, i’m confident that they will continue to do so in my case! okay, with that as preface, enough about me . . . let’s talk about ital.
■ what’s inside this issue

ital content has traditionally represented an eclectic blend of the best mainstream and leading/bleeding edge of library technology. we strive to be reflective of the broad, major issues of concern to all librarians, as well as alert to interesting applications that may be little more than a blip at the edge of our collective professional radar screen. our audience is not limited to those actively working in library technology, although they certainly form ital’s core readership; rather, we seek to identify and publish content that will be relevant to all with an interest in or need to know about how technology is affecting our profession. thus, some articles will resonate with staff seeking new ways to use web 2.0 technologies to engage our readers, while other articles will be of interest to those interested in better exploiting the four decades’ worth of bibliographic metadata that forms the backbone of our integrated library systems.

the current issue of ital is no exception in this regard. we lead off with two papers that reflect the renewed interest of the past several years in the role and improvement of the library online catalog. jia mi and cathy weng review opac interfaces, searching functionality, and results displays to address the question of why the current opac is ineffective and what we can do to revitalize it. timothy dickey, in a contribution that received the 2007 lita/ex libris student writing award,1 summarizes the challenges and benefits of a frbr approach to current and “next-gen” library catalogs. interestingly, as will become clear at the end of this column, dickey’s is not the first prize-winning frbr study to appear in the pages of ital.

online learning has long been a subject of interest both to librarians and to the education sector as a whole. whereas the focus of many previous studies has been on the techniques and efficacy of online learning systems, though, connie haley’s paper takes a rather different approach, describing and exploring factors that characterize the preference of learners for online training, as compared with more traditional in-person techniques. in gary wan’s and zao liu’s investigation of content-based information retrieval (cbir) in digital libraries, the authors describe and argue for systems that will enable identification of images and audio clips by automated comparison against digital libraries of image and audio files. finally, wooseob jeong prototypes an innovative application for enhancing web access by the visually impaired. jeong’s application makes use of force feedback, an inexpensive, proven technology drawn from the world of video gaming.

marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

■ some ideas about where we are going

a change of editorship is always one of those good opportunities for thinking about how we might improve, or of different directions we might explore. with that in mind, here are a couple of things we’re either going to try, or that we’re considering: different voices. ital’s format has long included provision for two “opinion” columns, one by the editor, and another by the president of lita. from time to time, past editors have given over their columns for guest editorials.
however, there are many other voices that could enrich ital’s pages, and the existing structure doesn’t really have a “place” for the regular airing of these voices. beginning with the june 2008 issue, ital will include a regular column contributed by members of the board, on a rotating basis. the column will be about any topic related to technology and libraries that is on the author’s mind. i’m thinking about how we might expand this to include a similar column contributed by ital readers. while such reader contributions may lack the currency of a weblog, i think that they would make for thought-provoking commentary.

oh, and there’s that “currency thing.” in recent years, those of us who bring you ital have—as have those responsible for other ala publications—discussed at length the whole question of when and how to move to a sustainable model of electronic publishing that will address the needs of readers. this issue is of course especially important in the case of a technology-focused journal, where content tends to age rapidly. unfortunately, for various reasons, we’re not yet at the stage where we can go completely and solely electronic. a recent conversation with one board member, though, surfaced an idea that i think in the meantime has merit: essentially, we might create a preprint site for papers that have been accepted and edited for future publication in ital. we might call it something such as ital express, and its mission would be to get content awaiting publication out and accessible. is this a “done-deal”? no, at this stage, it’s just an intriguing idea, and i’d be interested in hearing your views about it . . . or anything else related to ital, for that matter. you can e-mail me at marc.truitt@ualberta.ca.

■ and finally, congratulations dept.

last week, martha yee, of the film and television archive at the university of california, los angeles, received the alcts cataloging and classification section’s margaret mann citation for 2008. martha was “recognized for her outstanding contributions to the practice of cataloging and her interest in cataloging education . . . [and her] professional contributions[, which] have included active participation in ala and alcts and numerous publications.” of particular note, the citation specifically singled out her work in the areas of “frbr, opac displays, shared cataloging and other important issues, [in which] yee is making a significant contribution to the discussions that are leading the development of our field.” surely among the most important of these is her paper “frbrization: a method for turning online public finding lists into online public catalogs,” which appeared in the june 2005 issue of ital (p. 77–95). archived at the ital site, d-list, the cdl e-scholarship repository, and elsewhere, this seminal contribution has become one of the most accessed and cited works on frbr. we at ital are proud to have provided the original venue for this paper and congratulate martha on being named recipient of the margaret mann award.

usability as a method for assessing discovery
tom ipri, michael yunkin, and jeanne m. brown

the university of nevada las vegas libraries engaged in three projects that helped identify areas of its website that had inhibited discovery of services and resources. these projects also helped generate staff interest in the usability working group, which led these endeavors. the first project studied student responses to the site.
the second focused on a usability test with the libraries’ peer research coaches and resulted in a presentation of those findings to the libraries staff. the final project involved a specialized test, the results of which also were presented to staff. all three of these projects led to improvements to the website and will inform a larger redesign.

usability testing has been a component of the university of nevada las vegas (unlv) libraries web management since our first usability studies in 2000.1 usability studies are a widely used and relatively standard set of tools for gaining insight into web functionality. these tests can explore issues such as the effectiveness of interactive forms or the complexity of accessing full-text articles from third-party databases. they can explore aesthetic and other emotional responses to a site. in addition, they can provide an opportunity to collect input concerning satisfaction with the layout and logic of the site. they can reveal mistakes on the site, such as coding errors, incorrect or broken links, and problematic wording. they also allow us to engage in testing issues of discovery to isolate site elements that facilitate or hamper discovery of the libraries’ resources and services.

the libraries’ usability working group seized upon two library-wide opportunities to highlight findings of the past year’s studies. the first was the discovery summit, in which the staff viewed videos of staff attempting finding exercises on the homepage and discussed the finding process. the second was the discovery mini-conference, an outgrowth of a new evaluation framework and the libraries’ strategic plan. through a poster display, the working group highlighted areas dealing with discovery of library resources. the mini-conference allowed us to leverage library-wide interest in the topic of effective information-finding on the web to draw wider attention to usability’s importance in identifying the likelihood of our users discovering library resources independently.

the usability working group engaged in three projects to help identify areas of the website that inhibited discovery and to generate staff interest in the process of usability. all three of these projects led to improvements to the website and will inform a larger redesign. the first project is an ongoing effort to study student responses to the site. the second was to administer a usability test with the libraries’ peer research coaches and present those findings to the libraries’ staff. the final project was requested by the dean of libraries and involved a specialized test, the results of which also were presented to staff.

n student studies

the usability working group began its ongoing evaluation of unlv libraries’ website by conducting two series of tests: one with five undergraduate students and one with five graduate students. not surprisingly, most students self-reported that the main reason they come to the libraries’ site is to find books and journal articles for assignments. the group created a set of fourteen tasks that were based on common needs for completing assignments:

1. find a journal article on the death penalty. (note: if students go somewhere other than the library, guide them back.)
2. find what floor the book the catcher in the rye is on.
3. find the most current issue of the journal popular mechanics.
4. identify a way to ask a question from home.
5. find a video on global warming.
6. you need to write a bibliography for a paper. find something on the website that would help you.
7. find out what lied library’s hours were for july 4.
8. find the libraries’ tutorial on finding books in the library.
9. the library offers workshops on how to use the library. find one you can take.
10. find a library-recommended website in business.
11. find out what books are checked out on this card.
12. find instructions for printing from your personal laptop.
13. your sociology professor, dr. lampert, has placed something on reserve for your class. please find the material.
14. your professor wants you to read the book efficiency and complexity in grammars by john a. hawkins. find a copy of the book for your assignment. (the moderator will prompt if the person stops at the catalog.)

tom ipri (tom.ipri@unlv.edu) is head, media and computer services; michael yunkin (michael.yunkin@unlv.edu) is web content manager/usability specialist; and jeanne m. brown (jeanne.brown@unlv.edu) is head, architecture studies library and assessment librarian, university of nevada las vegas libraries.

the results of these tests revealed that the site was not as conducive to discovery as was hoped. the libraries are planning on a complete redesign of the site in the near future; however, the results of these first two series of usability tests were compelling enough to prompt an intermediary redesign to improve some of the areas that were troublesome to students. that said, the tests also found certain parts of the old site (figure 1) to be very effective:

1. all participants used the tabbed box in the center of the page, which gives them access to the catalog, serials lists, databases, and reserves.
2. all students quickly found the “ask a librarian” link when prompted to find a way to ask a question from home.
3. most students found the libraries’ hours, partly because of the “hours” tab at the top of the page and partly because of multiple access points.
4. many participants used the “site search” tab to navigate to the search page, but few actually used it to conduct searches. they effectively used the site map information also included on the search page.

the usability tests also revealed some variables that undermined the goal of discoverability.

1. due to the various sources of library-related information (website, catalog, vendor databases), navigation posed problems for students. although not a specific question in the usability tests, the results show students often struggled to get back to the libraries’ home page to start a new question.
2. students often expected to find different content under “help and instruction” than what was there.
3. students used the drop down boxes as a last resort. often, they would expand a drop down box and quickly navigate away without selecting anything from the list.
4. with some exceptions, students mainly ignored the tabs across the top of the home page.
5. although students made good use of the tabbed box in the center of the page, many could not distinguish between “journals” and “articles & databases.”
6. similarly, students easily found the “reserves” tab but could not make sense of the difference between “electronic reserves (e-reserves)” and “other reserves.”
7. no student found business resources via the “subject guides” drop down menu at the bottom of the home page.

n peer-coach test and staff presentation

unlv libraries employs peer research coaches, undergraduate students who serve as frontline research mentors to their peers.
the usability working group administered the same test they used with the first group of undergraduate and graduate students to the peer research coaches. although these students are trained in library research, they still struggled with some of the usability tasks. the usability working group presented the findings of the peer research coach tests to the staff. the peer research coaches are highly regarded in the libraries, so staff were surprised that they had so much difficulty navigating the site; this presentation was the first time many of the staff had seen the results of usability studies of the site. the shocking nature of these results generated a great deal of interest among the staff regarding the work of the usability working group.

figure 1. unlv libraries’ original website design

n the dean’s project

in january 2009, the dean of libraries asked the usability working group for assistance in planning for the discovery summit. initially, she requested to view the video from some of the usability tests with the goal of identifying discovery-oriented problems on the libraries’ website. soon after, the dean tasked the group with performing a new set of usability tests using three subjects: a librarian, a library employee with little research or web expertise, and a faculty researcher. each participant was asked to complete three tasks, first using the libraries’ website, then using google. the tasks were based on items found in the libraries’ special collections:

1. find a photograph available in unlv libraries of the basic magnesium mine in henderson, nevada.
2. find some information about the baneberry nuclear test. are there any documents in unlv libraries about the lawsuit associated with the test?
3. find some information about the local greenpeace chapter. are there any documents in unlv libraries about the las vegas chapter?

the dean viewed those videos and chose the most interesting clips for a presentation at the discovery summit. prior to this meeting, the libraries’ staff were instructed to try completing the tasks on their own so that they might see the potential difficulties users must overcome and to compare the user experience provided by our website with that provided by google. at the discovery summit, the dean presented the staff with a number of clips from these special usability tests, giving the staff an opportunity to see where users familiar with the libraries’ collections stumble. the staff also were shown several clips of undergraduates using the website to perform basic tasks, such as finding journal articles or videos in the libraries, with varying degrees of success. these clips helped illustrate the various difficulties users encounter when attempting to discover library holdings, including unfamiliar search interfaces, library jargon, and a lack of clear relationships between the catalog and other databases. this discussion helped set the stage for the discovery mini-conference.

n initial changes to the site

unlv libraries’ website is in the process of being redesigned, and the results of the usability studies are being used to inform that process. however, because of the seriousness of some of the issues, some changes are being implemented into an intermediary design (figure 2).
the new homepage

n combines article and journal searching into one tab and removes the word “databases” from the page entirely;
n adds a website search to the tabbed box;
n adds a “music & video” search option;
n makes better use of the picture on the page by incorporating rotating advertisements in that area;
n widens the page, allowing more space on the rest of the site’s templates;
n breaks the confusing “help & instruction” page into two more specific pages: “help” and “using the libraries”; and
n adds the main library and the branch library hours to the homepage.

this new homepage is just the beginning of our efforts to improve discovery through the libraries’ website. the usability working group already has plans to do a card sort for the “using the library” category to further refine the content and language of that section. the group plans to test the initial changes to the site to ensure that they are improving discovery.

reference

1. jennifer church, jeanne brown, and diane vanderpol, “walking the web: usability testing of navigational pathways at the university of nevada las vegas libraries,” in usability assessment of library-related web sites: methods and case studies, ed. nicole campbell (chicago: ala, 2001).

figure 2. unlv libraries’ new website design

content management for the virtual library
ed salazar

traditional, larger libraries can rely on their physical collection, coffee shops, and study rooms as ways to entice patrons into their library. yet virtual libraries merely have their online presence to attract students to resources. this can only be achieved by providing a fully functional site that is well designed and organized, allowing patrons to navigate and locate information easily. one such technology significantly improving the overall usefulness of web sites is a content management system (cms). although the cms is not a novel technology per se, it is a technology smaller libraries cannot afford to ignore. in the fall of 2004, the northcentral university electronic learning resources center (elrc), a small, virtual library, moved from a static to a database-driven web site. this article explains the importance of a cms for the virtual or smaller library and describes the methodology used by elrc to complete the project.

state of the virtual library

the northcentral university electronic learning resource center (elrc), a virtual library, recently moved from a static to a database-driven web site in 2004.1 before this, the site consisted of 450 static pages and continued to multiply due to the creation and expansion of northcentral university (ncu) programs. to provide the type of service demanded by our internet-savvy patrons, the elrc felt it needed to evolve to the next stage of web management and design.
ncu, with a current enrollment of roughly twenty-one hundred full-time students, is one of many for-profit virtual universities (including the university of phoenix, capella, and walden, among others) seeking to carve a niche in the education market by offering professional degrees entirely online.2 in the past few years, distance education has experienced exponential growth, causing virtual universities to flourish, but forcing on their libraries the challenge of keeping pace.3 typically, virtual libraries are manned by a limited staff comprised of one or two librarians who are responsible for all facets of the library, including interlibrary loan, virtual reference, library instruction, and web site management, among other library duties.4 web site management, as expected, becomes cumbersome when a site exceeds two hundred or more static pages and a clear and structured system is not in place to maintain a proliferating number of web pages.

because virtual, for-profit libraries do not rely on public funding and taxes, they tend not to be as concerned about autonomy as public or state libraries, which must find ways to stay within budget and curtail expenses. on the same note, some academic libraries prefer to maintain a local area network (lan), while other libraries may not have the staff, resources, or need for such a system. thus, for some virtual libraries, such as elrc, the incorporation of technology takes on a more dependent role. that is, where some libraries are encouraged to explore open source applications and create homegrown tools, the virtual, smaller-staffed library finds itself more or less reliant on its university’s information technology (it) department.5

virtual libraries address the needs of distance education students, who demand an equivalent, if not surpassing, level of service and instruction as they would expect to find at physical libraries.6 meeting these needs requires a great deal of creativity, ingenuity, and a strong technical background. recent trends in developing technologies such as mylibrary, learning objects, blogs, virtual chat, and federated searching have broadened the scope of possibilities for the smaller-staffed, virtual library. in particular, a content management system (cms) utilizes a combination of tools that provide numerous advantages, as outlined below:

1. the creation of templates that maintain a consistent design throughout the site
2. the convenience of adding, updating, and deleting information from a single, online location
3. the creation and maintenance of interactive pages or learning objects
4. the implementation of a simple editing interface that eliminates knowledge of extensible hypertext markup language/hypertext markup language (xhtml/html) by library staff

simply defined, a cms is comprised of a database; server pages such as active server page (asp), personal home page (php), or coldfusion; a web server—for example, internet information server (iis), personal web server (pws), or apache; and an editing tool to manage web content.7 these resources vary in price, but for a virtual library integrated into a larger university, it is ideal to implement applications and software supported by the university. for the autonomous academic library, this may differ.
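to make that definition concrete, here is a minimal sketch of the pattern such a cms implements: page content lives in a database table, and a server-side template merges one row into a shared page shell on each request. the elrc built this with asp, vbscript, and sql server; the sketch below uses python and its built-in sqlite3 module purely as illustration, and the pages table and its columns are hypothetical rather than the elrc's actual schema.

import sqlite3

# a hypothetical "pages" table standing in for the cms database
conn = sqlite3.connect(":memory:")
conn.execute(
    """create table pages (
           id        integer primary key,
           parent_id integer,   -- parent-child structure, e.g. for breadcrumbs
           title     text not null,
           body      text not null)"""
)
conn.execute(
    "insert into pages values (1, null, 'course guides', 'guide content goes here...')"
)

# the shared shell lives in one place, so a single change restyles every page
HEADER = "<html><body><h1>{title}</h1>"
FOOTER = "</body></html>"

def render(page_id):
    # merge one row of content into the template, as a server page would
    title, body = conn.execute(
        "select title, body from pages where id = ?", (page_id,)
    ).fetchone()
    return HEADER.format(title=title) + body + FOOTER

print(render(1))

the parent_id column anticipates the parent-child page relationships and breadcrumb feature discussed later; nothing else about the sketch is specific to any one cms.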
there are advantages and disadvantages for using proprietary and nonproprietary software, and it is left to the library, virtual or physical, to determine the type of resources needed to meet the goals and mission of the university.8 although the scope of this article focuses on the creation of tools for a homegrown cms, some libraries may wish to explore commercial cms packages that include additional services such as technical support. these cms packages will vary in price and services depending on the vendor and the needs of the library.9

ed salazar (esalazar@ncu.edu) is reference/web librarian at northcentral university.

elrc transformed

in fall 2004, a group that consisted of two librarians, the education chair, and a programmer convened to discuss the redesign of the elrc web site, which had become increasingly difficult to manage. specifically, the amount of duplicated content, inconsistent design and layout, and unstructured architecture of the site posed severe navigational and organizational problems. the group selected and compared other academic library sites to determine a desired design and theme for the new elrc site. discussions also involved the addition of features such as a site search and breadcrumbs, which the group felt were essential. as a result, the creation of a homegrown cms using proprietary software became the route of choice to meeting the increasing demands of patrons and the need to expand the site. because ncu utilizes microsoft (ms) information system products, it was agreed ms or ms-compatible applications would be used to create the cms, which consisted of sql server, iis, asp, visual basic script (vbscript), jspell iframe, and ms visual interdev. ms visual interdev and jspell iframe supplanted our previous web editor, ms frontpage, which seemed to generate superfluous code and thus made it difficult to debug or alter the design and layout of pages. also, using jspell iframe eliminated the need for future ncu librarians to possess an expertise in xhtml/html. with these pieces in place, the arduous task of culling content from static pages and entering it into a database was begun.

the database

the sql server database helped in organizing and structuring content, and allowed for the creation of templates and administration (admin) pages.10 in addition, the database played an integral part in creating the search, breadcrumb, and site map features the group so desperately wanted. a significant amount of time was spent weeding the site for information that had become obsolete or irrelevant to elrc. it should be noted that the group originally attempted to use access for a database but stumbled across several problems, one being the inability to maintain a stable and reliable connection to the database.

the templates

with the database nearly complete, the programmer began creating asp templates in ms visual interdev. these templates basically serve as the shell of the web page, preserving the design and layout elements of the page while extracting unique content based on a user’s request. in essence, a single template can produce hundreds of pages consistent in design. likewise, a single change to the template can alter the entire design of the site. for the elrc, seven templates were created for more than 450 pages. figure 1 shows the elrc course guides template.
figure 2 shows the public view of the elrc course guide template. changes to the templates are done using ms visual interdev, which offers a user-friendly environment for managing web pages. ms visual interdev also includes helpful features, such as highlighting code errors for easy debugging, and the ability to access, create, and maintain stable connections to databases.11 in addition, the ms visual interdev editor recognizes commonly used asp commands, allowing the user to save time by utilizing keyboard shortcuts when programming. besides creating templates, asp server-include files and cascading style sheets (css) were incorporated, allowing for the easy modification of code on a single file instead of each and every page or template. this, in particular, is time-efficient when having to add or change database connections or design elements. also, the elrc took extra precaution to ensure that style elements met the accessibility requirements and standards set forth by the world wide web consortium (w3c), as well as tested the site on other browsers, such as firefox and netscape.12

as the site continues to grow and expand, so may the need for additional templates. creation or replication of templates is simple, requiring a basic understanding of programming and the re-assigning of new variables in the code to match added or modified tables. there is some speculation in the near future of migrating the site to the asp.net environment for added functionality and security. if and when that time comes, the elrc will be ready. at present, ncu is not considering the use of open source code or applications (the exception being the apache web server); this is primarily due to available technical support, security, and intuitiveness of use associated with commercial software. in addition, the ncu information system was built using commercial software and a complete transition to open source, at the moment, is not possible or desirable.

with the templates complete, the elrc began running a prototype of the new site, making it accessible to students and faculty from a link on the old site. a survey was created that allowed users to comment on the new site. one detail of importance to note is that the survey duplicated a prior survey done on the old site in 2003 in order to provide the elrc with comparative data.

the admin pages

the next phase of the project required the creation of admin pages, which would allow content to be quickly added, updated, and deleted on the site. these pages, like the templates, were created in ms visual interdev; display content is housed within the database on the web, thus allowing it to be changed on the fly. figure 3 shows all of the web pages for the elrc within a table. what is particularly convenient about the admin edit pages is the incorporation of the jspell iframe editor, which serves as the frontend editor to the site. the reason for using jspell iframe, as stated earlier, is its ease of use: the simple tool bar provides the basic, essential tools necessary for creating content without the daunting number of buttons and menu selections other editors tend to have. also, jspell iframe is reasonably priced and does not entail a complex installation or require any space on local hard drives; instead, the program is maintained on the server. consequently, all that is required is the insertion of the jspell iframe javascript code into the web pages.
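the add, update, and delete operations behind admin pages like these reduce to a handful of parameterized sql statements. continuing the hypothetical pages table from the earlier sketch (again in illustrative python, not the elrc's actual asp code), the handlers might look like this:

def add_page(conn, parent_id, title, body):
    # mirrors the admin "add" form: insert a row and return its new id
    cur = conn.execute(
        "insert into pages (parent_id, title, body) values (?, ?, ?)",
        (parent_id, title, body),
    )
    conn.commit()
    return cur.lastrowid

def update_page(conn, page_id, body):
    # mirrors the admin "edit" form: the embedded editor posts revised body text
    conn.execute("update pages set body = ? where id = ?", (body, page_id))
    conn.commit()

def delete_page(conn, page_id):
    # mirrors the delete button on the edit page
    conn.execute("delete from pages where id = ?", (page_id,))
    conn.commit()

because the handlers write to the same table the public templates read from, an edit saved in the admin interface is live on the site immediately, which is the "changed on the fly" behavior described above.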
in addition to jspell iframe, fields within admin edit pages are or can be pre-populated by content in the database. for instance, the title or display order of links can be easily edited or changed. longer text fields comprised of paragraphs are created or modified using jspell iframe. deleting a page is simple, requiring only the click of a delete button on the bottom, righthand corner. figure 4 shows jspell iframe embedded within an admin edit page.

the admin add page is straightforward. information is entered into the fields appearing on a form page, and the proper page type designation is selected from a drop-down menu. yet, more importantly, the admin add and the admin edit pages can filter information to specific users for security purposes and library needs. figure 5 shows an admin add page. figure 6 shows an admin edit page.

the admin pages were designed with flexibility in mind. main column headings may be sorted, as seen in figure 3, allowing one to locate a particular page. the sorting feature also displays the inner structure of the database that, in turn, identifies parent-child relationships between pages in the elrc, which is useful and necessary when adding pages to the elrc site. due to the careful thought used in creating the admin pages, they have proven to be extremely effective and useful in maintaining a library web site. each and every change to the site can be made on the web, allowing content to be edited remotely and eliminating the need for installing and maintaining expensive editing software on local and remote machines.

usability testing

with the site completed, the elrc felt it important to perform usability
the project took nearly six months to complete and required the expertise of a programmer. although programming may be outside the requisites of a distance librarian, managing the site is not. a general understanding of control statements and sql is all that is needed. for the distance librarian who spends almost all of his or her time online, these skills can be acquired on the job or by taking introductory programming courses at a local college. in the hope that the site will continue to expand in concert with the growing body of ncu students, recently the elrc added a writing center and blog. with the entire site now being database driven, adding, updating, deleting content is done effortlessly. ideally, students and faculty will play a greater role in the development of the elrc site as a result of the changes. involving patrons with the site can play an integral, beneficial role in their academic pursuits. figure 3. web pages for elrc within a table figure 4. jspell iframe editor embedded within an admin edit page 174 information technology and libraries | september 2006 conclusion the elrc at ncu encourages other virtual or smaller libraries to explore their resources for improving their library web sites, which involves understanding campus resources and personnel. with the ever-burgeoning growth of technological resources, every library—small or large, virtual or physical, public or private—can empower itself to meet the needs of internet-savvy students. it is only a matter of being aware of the resources and putting them to good use. references and notes 1. the ncu elrc web site is comprised of three separate sites: the public site www.ncu.edu/elrc (accessed dec. 2, 2004), the mentors site http://mentors .ncu.edu/elrc (accessed dec. 2, 2004), and the learners site http://learners.ncu .edu/elrc (accessed dec. 2, 2004). although similar in design, each site is tailored to meet the needs of each individual group as well as protect ncu’s resources, services, and information. access to subscription resources and personal information is available upon authentication of the user to the site. 2. for a detailed overview of virtual libraries, see valerie a. akuna, “virtual universities: the new higher education paradigm,” estrella mountain college, http://students.estrellamountain .edu/drakuna/virtualuniversities.htm (accessed feb. 15, 2005). 3. u.s. department of education, national center for education statistics, “the condition of education 2004,” distance education at postsecondary institutions, http://nces.ed.gov/pubsearch/ pubsinfo.asp?pubid=2004077 (accessed feb. 8, 2005). 4. for more information on the role of the virtual librarian in a virtual university, see jan zastrow, “going the distance: academic librarians in the virtual university,” university of hawaii–kapiolani community college, http://library.kcc .hawaii.edu/~illdoc/de/depaper.htm (accessed jan. 29, 2005). 5. for an overview on developing an open source cms, please see mark dahl, “content management strategy for a college library web site,” information technology and libraries 23, no. 1 (2004). 6. for a detailed discussion on distance education and virtual libraries, see smiti gandhi, “academic librarians and distance education: challenges and opportunities,” reference & user services quarterly 43, no. 2 (2003). 7. for detailed information on using asp pages for managing databases, see xiaodong li and john paul fullerton, “create, edit, and manage web database content using active server pages,” library hi tech 20, no. 
3 (2002); see also, bryan h. davidson, “database driven, dynamic content delivery: providing and managing access to online resources using microsoft access and active server pages,” oclc systems and services 17, no. 1 (2001).
8. for advantages and disadvantages of open source and proprietary software, see john carroll, “open source versus proprietary: both have advantages,” special to cnet asia, http://asia.cnet.com/builder/program/work/0,39009380,39181451,00.htm (accessed feb. 4, 2004); see also, stephen shankland, “study: open-source databases going mainstream,” cnet, http://ecoustics-cnet.com.com/study+open-source+databases+going+mainstream/2100-7344_3-5171543.html (accessed feb. 4, 2004).
9. for information on commercial content management vendors and prices, see cms watch, www.cmswatch.com/cms/vendors (accessed feb. 15, 2005).
10. “sql server 2000 product overview,” microsoft windows server system, www.microsoft.com/sql/evaluation/overview/default.asp (accessed feb. 15, 2005).
11. for a review on visual interdev, see maggie biggs, “visual studio 6.0 demonstrates improved integration,” infoworld 20, no. 35 (1998), www.infoworld.com/cgi-bin/displaytc.p1?/reviews/980831vstudio6.htm (accessed feb. 4, 2004).
12. “checklist of checkpoints for web content accessibility guidelines 1.0,” w3c, www.w3.org/tr/wai-webcontent/full-checklist.html (accessed feb. 1, 2005).
13. jspell iframe 2004, www.jspell.com/iframe-spell-checker.html (accessed dec. 2, 2004).

in march 2003 the university of mississippi libraries made our metasearch tool publicly available. after a year of working with this product and integrating it into the library web site, a wide variety of libraries interested in our implementation process and experiences began to call. libraries interested in this product have included consortia, public, and academic libraries in the united states, mexico, and europe. this article was written in an effort to share the recommendations and concerns given. much of the advice is general and could be applied to many of the metasearch tools available. google scholar and other open web initiatives that could impact the future of metasearching are also discussed.

many libraries are looking for ways to facilitate the discovery process for users. implementing a one-stop search product that does not require database-specific knowledge is one of the paths libraries are choosing.1 as these search engines are made available to patrons, the burden of design falls to the library as well as to the product developers. most library users may be familiar with a few databases, but the vast majority of electronic resources remain unrevealed. using a metasearch product, a single search is broadcast out to similar and divergent electronic resources, and search results are returned and typically mixed together. metasearch results are returned in real time and link the user to the native interface. although there are many products that support one-stop searching, the university of mississippi libraries chose to purchase innovative interfaces’ metafind product because it tied into a digital initiative partnership with innovative.
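the broadcast model just described is easy to picture in code. the sketch below illustrates only the general pattern, with stub connectors standing in for real resource connections (metafind's internals are not public in this form): one query fans out to every profiled resource in parallel, and results are mixed together as they arrive.

from concurrent.futures import ThreadPoolExecutor, as_completed

# stand-in connectors; a real tool would speak to each resource's own interface
def search_catalog(query):
    return ["catalog result for " + query]

def search_database_a(query):
    return ["database a result for " + query]

def search_database_b(query):
    return ["database b result for " + query]

CONNECTORS = [search_catalog, search_database_a, search_database_b]

def broadcast(query, timeout=10.0):
    # send the single search to every resource at once and interleave
    # whatever comes back, in arrival order, within the timeout
    results = []
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        futures = [pool.submit(fn, query) for fn in CONNECTORS]
        for future in as_completed(futures, timeout=timeout):
            results.extend(future.result())
    return results

print(broadcast("james meredith"))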
some of the possibilities of the types of resources you can search include: n library catalogs n licensed databases n locally created databases n full text from journals and newspapers n digital collections n selected web sites internet search engines the simplicity of google searching is very appeal­ ing to users. in fact, users have come to expect this kind of empowering tool. at the university of mississippi, students use and have been using google for research. as google scholar went public, it became evident that university faculty also use it for the same reasons. it was apparent from the university of mississippi libraries’ 2003 libqual+ survey results that users would like more personal control than the library was offering (table 1). unintentionally elaborate mazes are created and users become lost in a quagmire of choices. as indicated by our libqual+ survey results, our users want easy­to­use tools that allow them to find informa­ tion on their own, and they want information to be easily accessible for independent use. these are clearly two areas that many libraries are struggling to improve for their patrons. the question is how to go about it. based on several changes made between 2003 and 2005, which included implementing a metasearch tool, the adequacy mean improved for both questions and for undergradu­ ates as well as graduate students and faculty (table 2). the adequacy mean compares the minimum level of ser­ vice that a user expects with the level of service that they perceive. in table 1, the negative adequacy mean figures indicate that the library was not meeting users’ minimum level of service for these two questions or that the per­ ceived level of service was lower than the minimal level of service. table 2 compares the adequacy mean from 2005 with 2003 and indicates a notable, positive change in adequacy mean for each question and with each group. n design perspectives and tension generally, there are conflicts within libraries regarding the question of how to improve access for patrons and allow for independent discovery. for those leading a metasearch implementation, these tensions are important to understand. in implementing new technologies, there are key development issues that may decrease internal acceptance until they are addressed. however, one may also find that there are some underlying fears regarding this technology. although the following cross­subculture comparisons simply do not do justice to each of the valid perspectives, these brief descriptions highlight the types of perspectives one might encounter when considering or implementing a metasearch product. expert searchers prefer native interfaces and all of the functionalities of the native interface. they are typically unhappy with the “dumbed­down” or clunky searching of a metasearch utility. they would prefer for patrons to be taught the ins and outs of the database they should be using for their research. this presupposes that the students either know which database to use, will spend time inves­ tigating each database on their own, or that they will ask for assistance. however, there are clearly native interface 44 information technology and libraries | june 2007 metasearching and beyond: implementation experiences and advice from an academic library gail herrera gail herrera (gherrera@olemiss.edu) is assistant dean for technical services & automation and associate professor at the university of mississippi. 
metasearching and beyond | herrera 45 functionalities—such as lim­ iting to full text—that, while wonderful to patrons, are not consistent across resources or a part of the metasearch standard. users would cer­ tainly benefit if limiting to full­text was ubiquitous among vendors and if there were some way to determine full­text availability within metasearch tools. results ranking is another issue that expert searchers may bring to the table. currently, there is a niso metasearch initiative that is striving to standard­ ize metasearching.2 another downside for the expert searcher is that there is no browse function. those who are in administrative or manage­ rial positions working with electronic resources see metasearching as an opportunity to reveal these resources to users who might not otherwise discover them. for example, many users have learned to search ebsco’s academic search premier not realizing that key articles on a local civil rights figure such as james meredith are also available in america: history & life, jstor, and lexisnexis. metasearching removes the need for the user to spend additional time choosing databases that seem relevant and searching them indi­ vidually. from a financial perspective, if a library is pay­ ing for these electronic resources, they should be using them as much as possible. and while the university of mississippi libraries generally target the undergraduate audience with our metasearch tool, the james meredith search is a good example of how a metasearch tool might reveal other databases with information that a serious researcher could then further investigate by link­ ing through the citation to the native interface. those associated with library instruction may also be uncomfortable with metasearching. in fact within a short time of implementing the product, several instructors conveyed their fear that in making searching so simple, they would no longer have a job as the product developed. generally, it seems that users are always in need of instruc­ tion although the type of instruction and the tools continue to change. it is an understandable fear and one that would be wise to acknowledge for those embarking on a metasearch implementation. while metasearch can be an empowering tool for users, you may also encounter some emotional reactions among library employees. from an information literacy point of view, frost has noted that metasearching is “a step backward” and “a way of avoiding the learning process.”3 it is true that in providing an easy search tool, the library is not endeavoring to teach all students intermedi­ ate or advanced information retrieval knowledge or skills. however, it is important to provide tools that meet users at their level of expertise and as previously noted, this is an area identified in need of improvement. for those working at public service points such as the reference desk, metasearching is an adjustment. many times those working with patrons tend to use databases with which they are more familiar or in which they feel more confident. federated search tools may reveal resources that are typically less used and therefore unfa­ miliar to library employees. training may then become an issue worthy of addressing not just for the metasearch interface and design but also for the less­used resources. for those involved in technical support, this product may range from exciting to exasperating. 
the amount of time your technical support personnel have to dedicate to your metasearch project should be a major factor when investigating the available products. just like any other technological investment, you are either going to (1) purchase the technology and outsource management or (2) obtain a lesser price from a vendor for the tool and invest in developing it yourself. there is also a middle ground, but this cost-shifting is important to keep in mind. regardless of your approach, it is critical to include the technical support person on your implementation team and to keep in mind the kind of time investment that is available when reviewing prices. along with developing this product, one may also find oneself investing additional time and money in infrastructural upgrades such as the proxy server, network equipment, or dns servers.

table 1. 2003 libqual adequacy mean

                                                                  undergrad   grad   faculty
easy-to-use access tools that allow me to find things on my own      -.10     -.30     -.29
making information easily accessible for independent use             .37     -.09      .03

table 2. positive change in libqual adequacy mean from 2003 to 2005

                                                                  undergrad   grad   faculty
easy-to-use access tools that allow me to find things on my own       .53      .46      .24
making information easily accessible for independent use              .22      .22      .45

in addition to these perspectives, there is a general tension in library web site design philosophies between how librarians would like patrons to use their services and what patrons want. the traditional design based on educating users and having users navigate to information "our way" has definitely receded over the past several years, with attention being paid increasingly to usability. as usability studies give librarians increasing information, libraries are moving toward designing for our users based on their approaches and needs rather than on how librarians would have them work. depending on where one's library is in this spectrum of design philosophy, implementing a metasearch tool may be harder or easier. judy luther summed up the situation well: "for many searchers, the quality of the results matter less than the process—they just expect the process to be quick and easy."4 moving toward this lofty goal is to some extent dictated by the abilities and inabilities of the technologies chosen. as a technologist, the general rule seems to be that the easier navigation is made for users, the more complex the technical structure becomes.

■ metasearch categories

in arranging categories of searches for a metasearch product, some libraries group their electronic resources by subject, and others use categories that reflect full-text availability. the university of mississippi libraries use both. the most commonly used category is our full-text category. this full-text category was set as the default on our most popular search box, located on our articles and databases web page (figure 1). since limiting to full-text materials is not a standard, the category was defined by the percentage of full text each resource contains. this is an important distinction to understand, because a user may receive results that are not full text, but the majority of results will likely be full text. at our library, if a resource contains more than 50 percent full text, it is included in the full-text category.
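the 50 percent rule is simple enough to state in code. the sketch below is a minimal illustration of the categorization logic, not the vendor's actual profiling mechanism; the resource names and percentages are hypothetical.

```python
# minimal sketch of the full-text category rule described above:
# a resource joins the default full-text category when more than
# 50 percent of its content is full text. names and figures are
# hypothetical, not actual vendor profile data.

FULL_TEXT_THRESHOLD = 50  # percent

resources = [
    {"name": "academic search premier", "full_text_pct": 80},
    {"name": "america: history & life", "full_text_pct": 40},
    {"name": "jstor",                   "full_text_pct": 100},
]

def in_full_text_category(resource):
    """return true when the resource qualifies for the default category."""
    return resource["full_text_pct"] > FULL_TEXT_THRESHOLD

full_text_category = [r["name"] for r in resources if in_full_text_category(r)]
print(full_text_category)  # ['academic search premier', 'jstor']
```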
other categories included in this implementation are ready reference, library catalogs, digital collections, limited resources, publicly available databases, and broad subject categories. one electronic resource may be included in the full-text category, in a broad subject category such as "arts and humanities," and also have its own individual category in order to mix and match individual resources on subject guides using a tailor-made search box. the limited resource category contains resources that should be searchable using the metasearch tool but that have a limited number of simultaneous users. if such a resource were included in the heavily used default full-text category, it would be tied up too much. investigating resources with only one or two simultaneous users at the beginning of the project may help you avoid error messages and user frustration. one might wonder, "why profile limited resources then?" there may be specific search boxes on subject guides where librarians decide to add that individual but limited resource. it might also be necessary to shorten the time-out period for limited-user resources. along those same lines, having pay-per-search resources profiled could be expensive and is not recommended. since the initial implementation, migrating away from per-search resources has become a priority.

within the first few months of implementation, free resources such as pubmed and askeric were moved to a new "publicly available" category. the reason is that, since there is no authentication involved, these results return very quickly and are always the first results a user sees. while they are important resources, our intent was really to reveal our subscription resources. this approach allows users to search these free resources if specifically chosen, but they are not included in the default full-text category. it does still allow subject librarians to mix and match these free individual resources on subject guide search boxes.

■ response time

of all of the issues with our metasearch tool, response time has been the most challenging. there are so many issues when it comes to tracking down sluggish response that it can be extremely difficult to know where to start. if one's metasearch software is not locally hosted, response time could involve the library network, campus network, off-campus network provider, and the vendor's network, not to mention the networks of all the electronic resources users are searching. when one adds the other variable of authentication, the picture becomes even more overwhelming and difficult to troubleshoot.

figure 1. metasearch tailored search box with full text category selected

for authentication, the university of mississippi libraries purchased innovative's web access management module (wam), which is based on the ezproxy software. as the use of our electronic resources from on campus and off campus has grown, the incidence of network issues has risen. in working with our campus telecommunications group, the pursuit of ever-greater bandwidth has become a priority. troubleshooting has included tracking down troublesome switch settings and firewall settings, as well as campus dns and vendor dns issues. if your network administrators use packet shapers, this may be another hurdle. clearly, our metasearch product has placed a significant load increase on the proxy server.
in looking at proxy statistics, 24 percent of total proxy hits were from the metasearch product (figure 2). with this in mind, one may find the load on one's proxy server increasing very dramatically during peak usage and may need to plan for upgrades accordingly. even with improvements and tweaks along the way, response time is still an issue and one of the highest hurdles in selling a metasearch product internally and externally. one metasearch statistical module includes response time information for individual resources along with usage data. the response time information would be very helpful in troubleshooting and in working with electronic resource vendors. usage tracking is another criterion to consider in reviewing metasearch products.

■ response time and tailored search boxes

during implementation, one of the first discussions to have is who will be the target audience for this product. at this institution, undergraduates were the target audience and, more specifically, those looking for three to five articles for a paper. while our metasearch software has a master screen showing all of the resources divided into the main categories, facing users with more than sixty check boxes was not a good solution (figure 3). this master screen is good for demonstrating categories to library staff, for showing the overall functionality of the technology, and also for quickly checking all of your resources for connectivity errors. from early conversations with students, keeping basic users far away from this busy screen is a good goal. remember, the purpose is to give them an easy starting point. the best way to keep users in a simple search box is to construct search boxes and hand-pick either individual resources or categories, keeping in mind the context of the web page. for example, the articles and databases page has a simple search box that searches for articles. subject guide boxes search individual electronic resources selected by the subject librarian. the university of mississippi libraries also have a large collection from the american institute of certified public accountants (aicpa); the search box on that page searches our catalog, which contains aicpa books, along with the aicpa digital collection. some libraries are interested in developing a standard metasearch box to display as a widget or standing content area throughout their web site. this is interesting and worth considering. however, matching the web page content with appropriate resources has been our approach. as the standards and technology develop, this may be worth further consideration depending on usability findings.

for the most commonly used search box on the articles and databases page (figure 1), the default category checked is the full-text articles category. donna fyer stated that, "for the average end user, the less decision making, the better."5 this certainly rings true for our users. originally, a simple metasearch box was placed on the library homepage. the library catalog and the basic metasearch box were both displayed. this seemed confusing for users since both products have search capabilities. with the next web site redesign, the basic metasearch box moved from the library homepage to the articles and journals web page. this was a successful place for the article quick search box to reside since the default was set to search the full-text category.
there were some concerns that users might be typing journal titles into the search box, but these were rare instances and not necessarily inappropriate uses. the next redesign eventually moved this search box to the articles and databases page, where it remains. on the articles and databases page, the simple search box (figure 1) by default searches the full-text category and the title keyword index. the index category with the label "article citations" can also be checked by the user. the majority of metasearches begin with this search box, and most users do not change the default settings for the resources or the index.

figure 2. total proxy hits vs. metafind proxy hits

■ subject guide search boxes

in addition to the "article quick search" box, subject librarians slowly became interested in a search box for their subject guides as the possibilities were demonstrated. in order to do this, the vendor was asked to profile each resource with its own unique value in order to mix and match individual resources. while the idea of searching resources by subject category sounds useful and appealing, sometimes universal design begets universal discord. even with a steering committee involved, it is hard for everyone to agree on what resources should be in each of the main subject categories: arts and humanities, science and engineering, business and economics, and social science. some libraries have put a lot of time and effort into creating a large number of subject categories. the master search screen (figure 3) displays several of this library's categories but not the broad subject categories noted above. these general subject categories are brought out in the multipurpose interface called the "library search engine" (figure 4). the library search engine design is a collection of the categories and resources showing the full functionality of our metasearch tool. the subject categorization approach within our metasearch interface is a good way to show the multifunctionality of the product but remains relatively unused by patrons. by giving each resource its own value, subject librarians have the flexibility to select specific resources and/or categories for their subject guides. it is worth noting that this required additional setup from our vendor and was not part of the original implementation.

after a few months of testing with the initial implementation, willing subject librarians chose individual resources for their tailored search boxes. once a simple search box has been constructed, it can be easily copied, with minor modifications, to make search boxes for those requesting them. while progress was slow to add these boxes to subject guides, after about a year there was growing interest. in setting these up, subject librarians have several choices to make. first of all, they choose the resources that will be searched. for example, the biology subject guide search box searches academic search premier, bioone, and jstor by default. basicbiosis and pubmed are also available but are not checked by default; users can check these boxes if they also wish to search those resources. choosing the resources to include in the search box, as well as setting which resources are checked by default, is the most important decision. the subject librarian is also encouraged to assist in evaluating the number of hits per resource returned.
with response time being a critical factor, determining the number of hits per resource should involve testing and should take into consideration the overall number of resources being searched.

■ relevance

selecting the default index is another decision in setting up search boxes. again, users are google-oriented and tend to go with whatever is set as the default option. out of the box, our metasearch tool defaults to the keyword index, or keyword search. the issue of relevancy is a hot topic for metasearch products; it typically comes up in metasearch discussions and is also listed as an issue in the niso metasearch initiative. from the technical side of the equation, results are displayed to the user as soon as they are retrieved. this allows users to begin examining the results immediately.

figure 3. master screen display (partial screenshot)
figure 4. library search engine subject categories

adding a relevancy algorithm as a step would mean all of the results would have to be returned, ranked, and then displayed. with response time being a key issue, a faster response is more important than relevance. another consideration is whether the metasearch results are displayed to the user interfiled or by electronic resource, where each resource returns results based on its own relevancy rankings. one way to increase relevance is to change the default index from keyword to title keyword. for our students, bringing back keywords in the title made the results more relevant. this is the default index used for our article search on the articles and databases web page. subject librarians have the choice of indexes they prefer when blending resources. one caveat in using title keyword is that there are resources that do not support title keyword searching; for other resources, title keyword is not an appropriate index. for example, wilson biographies does not have a title keyword search—it makes perfect sense that a biography database would not support title keyword searching. in these cases, the search may fail and note that the index is not supported. to accommodate this type of exception, the profile for wilson biographies needed to be changed to have the title keyword search mapped to a basic keyword search. while this does not make the results as relevant as the other search results, it keeps errors from appearing and allows results to be retrieved.

■ results per source and per page

for metafind, there are also two minor controls that can work as hidden values unseen by the patron or as components within the search box for users to manipulate. the first control is the number of hits to return per resource. if a subject librarian is only searching two or three resources in a tailored search box, he or she probably will want to set this number higher. if there are many resources, this number should be lower in order to keep response time reasonable. the second control is the number of results to return per page. in general, it is important to adjust these controls after testing the response for the resources selected. while users typically keep the default settings, showing these two controls gives the user a visual clue that the metasearch tool is not retrieving all of the results from each resource; instead, it is only retrieving the first twenty-five, for example.
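the index mapping and per-source limits described in the last two sections can be pictured together as a per-resource profile. the sketch below is a hedged illustration of the general idea, not innovative's actual configuration format; the field names and values are invented for the example.

```python
# sketch of per-resource search profiles: each profile declares which
# indexes it supports, and the broker falls back from title keyword to
# plain keyword when a resource (like the biography example above)
# lacks a title keyword index. per-resource hit caps keep response
# time reasonable. field names are illustrative, not a vendor schema.

profiles = {
    "academic search premier": {"indexes": {"kw", "ti_kw"}, "max_hits": 25},
    "wilson biographies":      {"indexes": {"kw"},          "max_hits": 25},
}

def choose_index(profile, requested="ti_kw"):
    """use the requested index if supported; otherwise map to keyword."""
    return requested if requested in profile["indexes"] else "kw"

def plan_search(requested="ti_kw"):
    """return (resource, index, cap) triples a broker could dispatch."""
    return [
        (name, choose_index(p, requested), p["max_hits"])
        for name, p in profiles.items()
    ]

for name, index, cap in plan_search():
    print(f"{name}: search index={index}, return first {cap} hits")
```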
■ implementation advice

one of the most important pieces of advice: if the vendor is doing the resource profiling, it is extremely important to have a date in one's contract or rfp by which all of the profiling is to be completed. from this library's experience, the profiling of a resource can take a very long time, and this is a critical point to include in the contract. one might also consider adding to the contract the cost and turnaround time for profiling new resources after the initial implementation. the more resources profiled, the more useful the product; however, one also needs to pay attention to response time. if the plan is to profile one's own resources or connectors, librarians should be mindful of the time involved and ask other libraries with the same product about their time investments. being able to work with vendors who will provide an opportunity to evaluate the product "live" is preferable.

in deciding whom to target for an implementation team, consider representatives from reference, collection development, and systems. it is also very important to include whoever manages electronic resource access/subscriptions, as well as a web manager. judging from other libraries' presentations, excluding any of these representatives can seriously undermine the implementation. buy-in is essential to success. additionally, giving librarians as many options as possible, such as control over what types of resources are in their search boxes as well as the number of hits per resource, makes the product more appealing.

■ questions to ask

once the implementation team is set, interviewing references for the products under consideration is an important part of the process. unstructured conversations with references really allow librarians to explore together what the group wants and how its needs fit with the services the vendor offers. a survey of questions via e-mail is another possibility; in choosing this method, be sure to leave some room for open comments. regardless of the approach, it is important to spend some time asking questions. a list of recommended questions:

■ who is responsible for setting up each resource—the vendor or you?
■ how much time does it typically take to set up a new resource, and what is the standard cost to add a new resource?
■ is there a list or database of already-established profiles for electronic resources for this product?
■ how much time would you estimate that it took to implement the product?
■ will you be able to edit all of the public web pages yourself, or will you be using vendor support staff to make changes? if the vendor support staff has to make some of the changes, how responsive are they?
■ can you easily mix and match individual resources for subject guides, departmental pages, or other kinds of web pages? or do you only have the option to set up global categories?
■ is your installation local, or does the vendor host it? are there response issues?
■ is there an administrative module to allow you to maintain categories, resource values, and configuration options?
■ how much time goes into managing the product monthly? and who manages the product at your library?
■ what kind of statistical information does the vendor provide?
■ how satisfied are you with the training, implementation support, and technical documentation?
■ how does the vendor handle broken resources or subscription changes?

as with most technologies, there are upfront and hidden costs.
it is important to determine what hidden costs are involved and whether you have the resources to support all of them. sometimes libraries choose the least expensive product; however, this approach can lead libraries down the path of hidden costs. for example, if the product is less expensive but your library is responsible for setting up new electronic resources, managing all of the pages, and finding ways to monitor and troubleshoot performance outside of the tools provided, the hidden expenditures in time and training may be more costly in the end than purchasing the premium metasearch tool. in essence, one must pay for the product one way or another. the big question is, where are the resources to support the product? if one's library has more it/web personnel than money, the lower-cost product may be the way to go, but be sure to check with other libraries to see if they have been able to successfully clear this hurdle. additionally, if your library has more one-time money than yearly subscription money, this may dictate the details of the rfp, and your library may lean toward a purchase rather than an annual subscription.

■ metasearch summary

clearly, students want a simple starting place for their research. implementing a metasearch tool to meet this need can be a hard sell internally for many reasons. at this institution, response time has been the overriding critical issue. response has lagged due to server and network issues that have been difficult to track down and improve. however, authentication is truly the most time-consuming and complex part of the equation. some federated search tools actually search locally stored information, which helps with response. while these are not truly metasearch tools and are not performing real-time searches, this approach may yield more stability with faster response.

over the years in implementing new services such as the library web site, illiad, electronic resources, and off-campus authentication, new services have often been adopted at a much faster rate by library users than by library employees. typically, there will be early adopters who use the services immediately based on need. it then takes general users about a year to adopt a new service. iii's metasearch technology has been available for the past four years. however, our implementation is evolving with each web site redesign. still, it is used regularly.

the university of mississippi libraries has been providing access to its electronic resources in two distinct ways: (1) providing urls on web pages to the native interface of the electronic resource, and (2) metasearching. as the library moves forward in developing digital collections and the number of electronic resources profiled for metasearching increases, it is possible that this kind of global discovery tool will compete in popularity with the library catalog. providing such information-mining tools to patrons without addressing their rough edges will cause endless frustration. response times, record retrieval order, and licensing and profiling issues are all obstacles to providing a successful metasearch infrastructure. retrieval inconsistency and the ad hoc retrieval order of records are very unsettling for librarians. however, this is the kind of tool to which web users have become accustomed, and it certainly seems to fill a need that, to date, has gone unmet where library electronic resources are concerned.

■ open web developments

one other trend appearing is scholarly research discovery tools on the open web.
enter google scholar, along with other similar initiatives such as windows live academic search. google scholar beta was released in november 2004 and very soon after began an initiative to work with libraries and their openurl resolvers.6 this bridging between an open web tool and libraries is an interesting development. a fair amount has been written about google scholar to date, although the project is still in its beta phase. what does google scholar have to do with metasearching? good question. it remains to be seen how much scholarly information will become searchable via google scholar. for now, the jury is still out as to whether google scholar will begin to encroach upon the traditional territory of the indexing and abstracting world. if sufficient content becomes available on the open web, whether from publishers or vendors allowing their content to be included, then the authentication piece that directly affects response time may be overcome. in using google scholar or other such open web portals, searching happens instantly. when a user follows the openurl resolver to get to the full text, that is where authentication enters the picture, removing its negative impact on searching. the tradeoff is that there are many issues involved in openurl linking and in the standardization of the metadata needed to provide consistent access.

there are many parallels between what google scholar is attempting to offer and what the promises of metasearching have been. for metasearching, undergraduate students looking for their three to five articles for a paper are considered our target audience. for in-depth searching, metasearching does have limitations, but for the casual searcher looking for a few full-text articles, it works well. interestingly, similar recommendations are being made for google scholar.7 however, opinions differ on this point. roy tennant went so far as to indicate it is a step forward in access for those users without access to licensed databases, but he remained reserved in his opinion regarding its usefulness for those with access.8

google scholar also throws in a few bonuses. while we provide access to open access (oa) materials in our opac for specific collections such as the directory of open access journals, these same resources have not been included in our metasearch discovery tool. google scholar searches these open repositories of scholarly information, although there is some concern over the automatic inclusion of materials such as syllabi and undergraduate term papers from institutional repositories.9 google scholar also provides a useful citation feature and relevancy ranking. google scholar recognizes the user's preference for full-text access and provides a visual cue in the brief results when article full text is available. this functionality is not currently available from our metasearch software but would be extremely helpful to users. on the downside, some of google scholar's linking policies make it difficult for libraries to extend services beyond full-text articles to their users. another notable development among subscription indexing services is the ability to reveal content to web search engines; ebsco's initiative is called ebscohost connection.10

in implementing metasearching, libraries have debated about providing access to free versus subscription resources. for our purposes, free resources were not included in the most commonly used search, the full-text category.
there are those who would argue against this decision, and they have very good points. in fact, it has already been noted that some libraries use google scholar to verify incomplete interlibrary loan citations quickly.11 in watching the development of google scholar, it seems possible that this free tool, which uncovers free open access resources and institutional repository materials, may not necessarily be a competitive product but a very complementary one.

■ impact on the opac

what will this mean for the "beloved" opac? for a very long time, users have expected more of the library catalog than it has provided. while the library catalog is typically appreciated by library personnel, its usefulness for finding materials other than books has been hard for general users to understand. many libraries, including the university of mississippi, have been loading records from their electronic resources in hopes of making the library catalog more useful. the current conversation regarding digital library creation also begs the question, "what is the library catalog?" although the library catalog serves as a searchable inventory of what the library owns, it is simply a pointing mechanism, whether it points the user to a shelf, a building, or a url. in our endeavor to provide instant gratification and full text, and given the user's desire for information regardless of format, the library catalog is beginning to take a backseat. it was clear four years ago in planning digital collections that a metasearch tool would be needed to tie together subscription resources, digital collections, publicly available resources, and the library catalog.

it will be interesting to see whether patrons choose to use the formal tools provided by the library or the informal tools developing on the open web, such as google scholar, to perform their research. more than likely, discovery and access will happen through many avenues. while this may complicate the big picture for those in library instruction, it is important to meet users on the open web. one's best intentions and designs are presented to users, but they may choose unintended paths. librarians should watch the paths users are taking and build upon them. sometimes even one's best attempts fall short, as pointed out clearly in karen schneider's latest series, "how opacs suck."12 still, it is important to acknowledge design shortcomings and keep forging ahead. dale flecker, who spoke at the taiga forum, recommended not spending years trying to "get it right" before implementing, but instead considering ourselves in perpetual beta and simply implementing and iterating.13 in other words, do not try to make the service perfect before implementing it; most libraries do not have the time and resources to do this. instead, find ways to gain continual feedback and constantly adjust and develop.

students are familiar with internet search engines and do not want to choose between resources. access to a simple resource discovery tool is an important service for users. unfortunately, authentication, product design and management, and licensing restrictions tend to be stumbling blocks to providing fast and comprehensive access. regarding the metasearch tool used at the university of mississippi libraries, development partnerships have already been formed between the vendor and a few libraries to improve upon many of the issues discussed.
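the openurl linking discussed above, the mechanism by which a tool such as google scholar hands a citation to a library's resolver, boils down to passing citation metadata as key/value pairs. the sketch below builds an openurl 1.0-style (kev) link with python's standard library; the resolver address is a made-up placeholder, and the citation values are purely illustrative.

```python
# sketch of building an openurl 1.0 (kev) link for an article citation,
# the kind of link an open web tool can hand to a library's resolver.
# the resolver base url is a hypothetical placeholder.

from urllib.parse import urlencode

RESOLVER = "https://resolver.example.edu/openurl"  # hypothetical

citation = {
    "url_ver": "Z39.88-2004",
    "ctx_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.genre": "article",
    "rft.atitle": "metasearching and beyond",
    "rft.jtitle": "information technology and libraries",
    "rft.volume": "26",
    "rft.issue": "2",
    "rft.date": "2007",
}

link = RESOLVER + "?" + urlencode(citation)
print(link)  # resolver decides whether the library has full-text access
```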
innovative is developing a next-generation metasearch product called research pro that leverages ajax technology. while efforts are made to participate in discussions and develop our existing tools, it is also important to pay attention to other developments such as google scholar. at this point, google scholar is in beta, but this kind of free searching could turn the current infrastructure on its ear to the benefit of patrons. the efforts to meet users on the open web and reveal scholarly content are definitely worth keeping an eye on.

references

1. roland dietz and kate noerr, "one-stop searching bridges the digital divide," information today 21, no. 7 (2004): s24.
2. niso metasearch initiative, http://www.niso.org/committees/ms_initiative.html (accessed may 8, 2006).
3. william j. frost, "do we want or need metasearching?" library journal 129, no. 6 (2004): 68.
4. judy luther, "trumping google? metasearching's promise," library journal 128, no. 16 (2003): 36.
5. donna fyer, "federated search engines," online 28, no. 2 (2004): 19.
6. jill e. grogg and christine l. ferguson, "openurl linking with google scholar," searcher 13, no. 9 (2005): 39–46.
7. mick o'leary, "google scholar: what's in it for you?" information today 22, no. 7 (2005): 35–39.
8. roy tennant, "is metasearching dead?" library journal 130, no. 12 (2005): 28.
9. o'leary, "google scholar."
10. what is ebscohost connection?, http://support.epnet.com/knowledge_base/detail.php?id=2716 (accessed may 10, 2006).
11. laura bowering mullen and karen a. hartman, "google scholar and the library web site: the early response by arl libraries," college & research libraries 67, no. 2 (2006): 106–22.
12. karen g. schneider, "how opacs suck," ala techsource, http://www.techsource.ala.org/blog/karen+g./schneider/100003/ (accessed may 10, 2006).
13. dale flecker, "my goodness, life is different," presentation to the taiga forum, mar. 27–28, 2006, http://www.taigaforum.org/pres/fleckerlifeisdifferenttaiga20060327.ppt (accessed may 10, 2006).

editorial: ala and our carbon footprint

marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

obligatory disclaimer: before proceeding, i want to state very clearly that—as with anything i write in this space that is not explicitly attributed to someone other than myself—the reflections that follow are my own thoughts and views. they in no way are intended to represent the views, either official or personal, of lita or ala officials or employees.

while i am writing these lines just a week or so after the end of the american library association (ala) midwinter meeting, by the time you see them the ala annual conference in chicago will be just days away. i've been reflecting (stewing?) for some time now about the question of ala conferences: why do i attend, and what do i get from these gatherings? is the vendor/exhibitor "tail" wagging the ala/attendee "dog"? is attendance responsible in a time of straitened budgets? and, most recently, what is the environmental cost of attendance? for the moment, i'd like to consider only one of these.

we all know that flying is, from an environmental perspective, enormously wasteful and destructive. yet, for attendance at ala and most other professional conferences, air travel is the only practical means, unless one is fortunate enough to live in the area or ala holds the event in a place such as new york, chicago, philadelphia, or washington, each of which can boast credible commuter rail service.
sadly, in most other places trains are really not an option; how many of us can imagine being able to take a long-distance amtrak train to an ala conference? so i wondered what it costs the environment for all of us to go to an ala conference. the following admittedly broad-side-of-barn figures for the recently completed midwinter meeting in denver are real eye-openers (you may not like my assumptions, but we have to assume some things, and after all, i'm only trying to get an order-of-magnitude number):

a. number of paid attendees at midwinter meeting 2009: 9,8501
b. "fudge" figure for those who didn't fly (local attendees or those close enough to use other means of transport): 1,000
c. total number of attendees who flew (a − b): 8,850
d. average co2 produced per attendee for a round trip to denver (in metric tons): .36352
e. total metric tons of co2—the "carbon footprint"—for all attendees who flew to denver (c × d): 3,217

i'm guessing this is a conservative number; still, the total "carbon footprint" of all who flew to the midwinter meeting was more than 3,000 metric tons of co2.3 that seems to me to be a giant's footprint indeed for what we are told is primarily a "business meeting." and this, of course, represents only that portion of the footprint that one identifies with air travel . . . enumerating the actual footprint would require taking into account many other sources of waste, with the resulting total being far larger.

is it just me, or does this seem to be an extravagance these days? given that the vast majority of our "business meetings" can be transacted through video conference, teleconference, e-mail, or similar technological means, how do we continue to justify the indulgence of attending such conferences as the planet warms to temperature levels not observed in thousands of years? at a minimum, i would suggest that it's high time we—individually or as a profession—began to think hard about compensating for our excess by purchasing carbon credits. i personally think of them as "bleeding heart environmentalism," that is, little more than a means for us "haves" to assuage our guilt about our profligate ways. but even offset payments would be better than nothing. the obvious way to handle this would be for ala to add a modest ($5–10) surcharge to the meeting registration fee, with the resulting proceeds dedicated to an approved beneficiary. let's see . . . my "carbon footprint" for flying to midwinter meeting 2009 is .38 metric tons. i can purchase an "offset" for about $5 and apply it to any of several worthy causes shown on the carbonfootprint.com website. ah, i feel better already . . .

. . . or not.

■ more midwinter meeting fallout

one of the more interesting sessions i attended at the midwinter meeting was a sleeper bearing the title "redefining technical services workflows with oclc." led by karen calhoun, oclc's vice president of worldcat and metadata services, a panel that included robin fradenburgh of the university of texas and my university of alberta colleagues kathy carter and sharon marshall described several innovative oclc services aimed at "improv[ing] efficiency and enhanc[ing] access to library materials."4 calhoun's overview, "reinventing technical services," nicely summarized many of the issues facing technical services (ts) operations today,
including declining staff counts and the desire by library administrators to reclaim for patron use the space currently occupied by ts operations. she then reviewed recent studies about our patrons' changing preferences for research tools—i.e., the question that has often been cast as "google versus the catalog." precisely how workflow and organizational efficiencies in ts (whether or not they come from oclc) can alter our users' research habits is a bit beyond me, but i'll leave it to you to decide. the presentations are available to view at http://www.oclc.org/us/en/multimedia/2009/ala_mw_redefining_technical_services.htm; do listen to the presentations and decide for yourself.

in any case, calhoun's talk, and an earlier comment made by a colleague and long-time friend of mine, got me thinking again about "the catalog." my friend, when asked at another program held just before the midwinter meeting, had said that the ts efficiency she would like most to institute would be "to stop cataloguing new (trade) books." instead, we should put our limited cataloging resources where they might best be used, that is, in making rare and unique local resources discoverable. whoa!, i thought at the time. how might we do this?

as calhoun talked about our users' preference for discovery outside of the catalog, my mind wandered back to my friend's comment. worldcat local? probably not, since it would still involve "cataloging" books, and it doesn't seem likely to be any more appealing to the google- and amazon-focused user than are our opacs already. but what about amazon? i can envision a "catalog" search that begins at amazon's already metadata-rich site, enhanced with links to local holdings of all the things listed there—amazoncat local, if you will. blue-skying a bit more, i can imagine amazon's business model for offering this kind of service. not only would there be even more eyeballs on its site than there are now, but a library considering such a service might offer in return that some or all of its acquisitions be sourced to amazon. conceivably, amazon could even offer a shelf-ready service, in which it provided the materials already barcoded, marked, and ready to park on our shelves. hmmm . . . open the box, shelve the already-in-the-"catalog" books, and pay the invoice. sounds pretty simple, no? things are rarely that simple, and i know that. there would be complexities aplenty, but who knows?

am i serious? i make this proposal because i come from a background that respects and values the work of catalogers and other ts staff. part of me wants the idea to be tried and found wanting, so that some of those who argue that library cataloging is "dead" might then come to a different view. but, either way, what we'd need would be a sizable institution willing to try it and see. who wants to be the pilot site? amazoncat local, anyone?

references and notes

1. libraryjournal.com, "with economy sputtering, ala midwinter attendance dips sharply," www.libraryjournal.com/index.asp?layout=talkbackcommentsfull&talk_back_header_id=6582196&articleid=ca6632569#129349 (accessed feb. 5, 2009). according to libraryjournal.com, the count on saturday, january 24, was 9,850, including 7,689 registrants (of whom 498 were on-site registrants) and 2,161 exhibitors.
2. i used the carbon footprint calculator at www.carbonfootprint.com/calculator.aspx (accessed feb.
5, 2009) to compute the co2 footprint in metric tons for one round-trip flight between denver and each of the following cities: atlanta (.40), boston (.58), chicago (.30), dallas (.22), houston (.29), los angeles (.27), miami (.57), minneapolis (.23), new york–jfk (.54), philadelphia (.52), phoenix (.19), pittsburgh (.43), salt lake city (.23), san diego (.27), san francisco (.31), seattle (.34), and washington, d.c. (.49). i then averaged these for an "average trip" production of .3635 metric tons.
3. according to wikipedia, one metric ton equals 2,204.6226 lbs. or 1.102 u.s. tons. thus, 3,217 metric tons equals approximately 3,545 u.s. tons. wikipedia, "tonne," http://en.wikipedia.org/wiki/tonne (accessed feb. 5, 2009).
4. oclc, redefining technical services workflows with oclc, www.oclc.org/us/en/multimedia/2009/ala_mw_redefining_technical_services.htm (accessed feb. 25, 2009).
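as a coda to the editorial above, its order-of-magnitude estimate is simple enough to reproduce; the sketch below just restates steps (a) through (e) with the figures given in the text and notes.

```python
# reproduce the editorial's broad-side-of-barn footprint estimate:
# (attendees who flew) x (average metric tons of co2 per round trip).
# all figures come straight from the text and notes above.

paid_attendees = 9850              # midwinter meeting 2009 count (note 1)
did_not_fly = 1000                 # "fudge" figure for locals and others
avg_tons_per_round_trip = 0.3635   # mean of the 17 city figures in note 2

flew = paid_attendees - did_not_fly           # 8,850
total_tons = flew * avg_tons_per_round_trip   # about 3,217 metric tons

print(f"{flew} flyers -> {total_tons:,.0f} metric tons of co2")
print(f"about {total_tons * 1.102:,.0f} u.s. tons")  # note 3 conversion
```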
assessing sufficiency and quality of bandwidth for public libraries

john carlo bertot and charles r. mcclure

john carlo bertot (jbertot@fsu.edu) is the associate director of the information use management and policy institute and professor at the college of information, florida state university; and charles r. mcclure (cmcclure@ci.fsu.edu) is the director of the information use management and policy institute (www.ii.fsu.edu) and francis eppes professor of information studies at the college of information, florida state university.

based on data collected as part of the 2006 public libraries and the internet study, the authors assess the degree to which public libraries provide sufficient and quality bandwidth to support the library's networked services and resources. the topic is complex due to the arbitrary assignment of a number of kilobits per second (kbps) used to define bandwidth. such arbitrary definitions to describe bandwidth sufficiency and quality are not useful. public libraries are indeed connected to the internet and do provide public-access services and resources. it is, however, time to move beyond connectivity type and speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. a secondary, but important, issue is the extent to which libraries, particularly in rural areas, have access to broadband telecommunications services.

the biennial public libraries and the internet studies, conducted since 1994, describe public library involvement with and use of the internet.1 over the years, the studies showed the growth of public-access computing (pac) and internet access provided by public libraries to the communities they serve. internet connectivity rose from 20.9 percent to essentially 100 percent in less than ten years; the average number of public-access computers per library increased from an average of two to nearly eleven; and bandwidth rose to the point where 63 percent of public libraries had connection speeds of greater than 769kbps in 2006. this dramatic growth, replete with related information technology challenges, occurred in an environment of challenges—among them budgetary and staffing—that public libraries face in maintaining traditional services as well as networked services.

one challenge is the question of bandwidth sufficiency and quality. the question is complex because typically an arbitrary number describes the number of kbps used to define "broadband." as will be seen in this paper, such arbitrary definitions to describe bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term "high speed" for connections of 200kbps in at least one direction.2 there are three problematic issues with this definition:

1. it specifies unidirectional bandwidth, meaning that a 200kbps download but a much slower upload (e.g., 56kbps) would fit this definition;
2. regardless of direction, bandwidth of 200kbps is neither high speed nor does it allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly; and
3. the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high-use, multiple-workstation public-access context.
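the third objection, that single-user definitions ignore the multiple-workstation context, can be made concrete with a little arithmetic. the sketch below divides a connection evenly among busy workstations; an even split is a deliberate simplification of real traffic patterns.

```python
# sketch of why a single-user "high speed" threshold fails in a public-
# access setting: divide the connection among simultaneously active
# workstations. even splitting simplifies real network behavior.

def per_workstation_kbps(connection_kbps, busy_workstations):
    """effective bandwidth per workstation under an even split."""
    return connection_kbps / busy_workstations

FCC_HIGH_SPEED_KBPS = 200  # the one-direction definition cited above

for link_kbps in (769, 1544):  # the survey cutoff and a t1 line
    share = per_workstation_kbps(link_kbps, busy_workstations=10)
    verdict = "meets" if share >= FCC_HIGH_SPEED_KBPS else "falls below"
    print(f"{link_kbps}kbps link / 10 users -> {share:.0f}kbps each "
          f"({verdict} the 200kbps single-user bar)")
```

with ten busy workstations, even a full t1 leaves each user well under the 200kbps single-user bar, which is the paper's point about context.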
in addition to connectivity speed, there are many questions related to public library pac and internet access that can affect bandwidth sufficiency—from budget and sustainability, staffing and support, to the services public libraries offer through their technology infrastructure, and the impacts of connectivity and pac on the communities that libraries serve. one key question, however, is: what is quality pac and internet bandwidth for public libraries? and, in attempting to answer that question, what are measures and benchmarks of quality internet access? this paper provides data from the 2006 public libraries and the internet study to foster discussion and debate around determining quality pac and internet access.3 bandwidth and connectivity data at the library outlet or branch level are presented in this article. the bandwidth measures are not systemwide but rather at the point of service delivery in the branch.

■ the bandwidth issue

there are a number of factors that affect the sufficiency and quality of bandwidth in a pac and internet service context. examples of factors that influence actual speed include:

■ number of workstations (public-access and staff) that simultaneously access the internet;
■ provision of wireless access that shares the same connection;
■ ultimate connectivity path—that is, a direct connection to the internet that is truly direct, or one that goes through regional or other local hops (that may have aggregated traffic from other libraries or organizations) out to the internet;
■ type of connection and bandwidth that the telecommunications company is able to supply the library;
■ operations (surfing, e-mail, downloading large files, streaming content) being performed by users of the internet connection;
■ switching technologies;
■ latency effects that produce packet loss, jitter, and other forms of noise throughout a network;
■ local settings and parameters, known or unknown, that impede transmission or bog down the delivery of internet-based content;
■ range of networked services (databases, videoconferencing, interactive/real-time services) to which the library is linked;
■ if networked, the speed of the network on which the public-access workstations reside; and
■ general application resource needs, protocol priority, and other general factors.

thus, it is difficult to precisely answer "how much bandwidth is enough" within an evolving and dynamic context of public access, use, and infrastructure. putting public-access internet use into a more typical application-and-use scenario, however, may provide some indication of adequate bandwidth. for example (a sketch converting these sizes into transfer times follows the list):

■ a typical three-minute digital song is 3mb;
■ a typical digital photo is about 2mb; and
■ a typical powerpoint presentation is about 10mb.
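the sketch below turns those sizes into rough transfer times when three patrons share a t1, and counts videoconference streams; an even split of the line again simplifies real behavior, and the raw stream counts assume no other traffic, which is why the text's counts below are lower.

```python
# sketch of simultaneous transfers sharing a t1 line (about 1.5 megabits
# per second), and how many videoconference streams fit on the same line.

T1_KBPS = 1544  # t1 capacity in kilobits per second

def transfer_seconds(size_mb, share_kbps):
    """seconds to move size_mb megabytes at share_kbps kilobits/sec."""
    return (size_mb * 8 * 1000) / share_kbps

jobs = {"powerpoint (10mb)": 10, "three songs (9mb)": 9, "four photos (8mb)": 8}
share = T1_KBPS / len(jobs)  # three patrons splitting the line evenly
for name, size in jobs.items():
    print(f"{name}: ~{transfer_seconds(size, share):.0f} seconds")

for stream_kbps in (512, 384):  # videoconference rates cited in the text
    raw = T1_KBPS // stream_kbps
    # the article's lower counts (two and three) leave headroom for the
    # rest of the library's traffic; this is the no-other-traffic ceiling.
    print(f"{stream_kbps}kbps streams per bare t1: at most {raw}")
```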
public libraries increasingly serve as access points to e-government services and resources, e.g., social services, disaster relief, health care.4 these services can range from the simple completion of a web-based form (low bandwidth consumption) to more interactive services (high bandwidth consumption). and, as access points to continuing education and online degree programs, public libraries need to offer adequate broadband to enable users to access services and resources that increasingly depend on streaming technologies that consume greater bandwidth.

■ bandwidth and pac in public libraries today

as table 1 demonstrates, public libraries continue to increase their bandwidth, with 63.3 percent of public libraries reporting connection speeds of 769kbps or greater. this compares to 47.7 percent of public libraries reporting connection speeds of greater than 769kbps in 2004. there are disparities between rural and urban public libraries, with rural libraries reporting substantially fewer instances of connection speeds of greater than 1.5mbps in 2006. on the one hand, the increase in connectivity speeds between 2004 and 2006 is a positive step. on the other, 16.1 percent of public libraries report that their connection speeds are insufficient to meet patron demands all of the time, and 29.4 percent indicate that their connection speeds are insufficient to meet patron demands some of the time. thus, nearly half of public libraries indicate that their connection speeds are insufficient to meet patron demands some or all of the time.

in terms of public-access computers, the average number of workstations that public libraries provide is 10.7 (table 2). urban libraries have an average of 17.1 workstations, as compared to rural libraries, which report an average of 7.1 workstations.

a closer look at bandwidth and pac

for the next sections, the data offer two key views for analysis purposes: (1) workstations—divided into libraries with ten or fewer public-access workstations and libraries with more than ten public-access workstations (given that the average number of public-access workstations in libraries is roughly ten); and (2) bandwidth—divided into libraries with 769kbps or less and libraries with greater than 769kbps (an arbitrary indicator of broadband for a public library context).
in looking across bandwidth and public-access workstations (table 3), overall 31.8 percent of public libraries have connection speeds of less than 769kbps, while 63.3 percent have connection speeds of greater than 769kbps. a majority of public libraries—68.5 percent—have ten or fewer workstations, while 30.9 percent have more than ten workstations. in general, rural libraries have fewer workstations and lower bandwidth as compared to suburban and urban libraries. indeed, 75.2 percent of urban libraries with fewer than ten workstations have connection speeds of greater than 769kbps, as compared to 45.2 percent of rural libraries.

when examining pac capacity, it is clear that public libraries have capacity issues at least some of the time in a typical day (tables 4 through 6). only 14.6 percent of public libraries report that they have sufficient numbers of workstations to meet patron demands at all times (table 6), while nearly as many, 13.7 percent, report that they consistently are unable to meet patron demands for public-access workstations (table 4). a full 71.7 percent indicate that they are unable to meet patron demands during certain times in a typical day (see table 5). in other words, 85.4 percent of public libraries report that they are unable to meet patron demand for public-access workstations some or all of the time during a typical day—regardless of the number of workstations available and the type of library.

the disparities between rural and urban libraries are notable. in general, urban libraries report more difficulty in meeting patron demands for public-access workstations. of urban public libraries, 27.8 percent report that they consistently have difficulty in meeting patron demand for workstations, as compared to 11.0 percent of suburban and 10.6 percent of rural public libraries (table 4). by contrast, 6.6 percent of urban libraries report sufficient workstations to meet patron demand all the time, as compared to 18.9 percent of rural libraries (table 6).

when reviewing the adequacy-of-connection-speed data by number of workstations, bandwidth, and metropolitan status, a more robust and descriptive picture emerges.
table 1. public library outlet maximum speed of public-access internet services by metropolitan status and poverty (each cell: percent ± margin, weighted n; weighted missing values, n=1,497)

less than 56kbps: urban 0.7% ±0.8% (n=18); suburban 0.4% ±0.6% (n=17); rural 3.7% ±1.9% (n=275); low poverty 2.0% ±1.4% (n=245); medium poverty 2.7% ±1.6% (n=61); high poverty 2.6% ±1.6% (n=5); overall 2.1% ±1.4% (n=311)
56kbps–128kbps: urban 2.5% ±1.6% (n=67); suburban 5.4% ±2.3% (n=264); rural 15.2% ±3.6% (n=1,132); low poverty 9.9% ±3.0% (n=1,237); medium poverty 9.5% ±2.9% (n=216); high poverty 5.3% ±2.2% (n=10); overall 9.8% ±3.0% (n=1,463)
129kbps–256kbps: urban 2.7% ±1.6% (n=72); suburban 6.8% ±2.5% (n=332); rural 11.1% ±3.1% (n=829); low poverty 8.5% ±2.8% (n=1,067); medium poverty 7.3% ±2.6% (n=166); high poverty (no cell reported); overall 8.2% ±2.8% (n=1,233)
257kbps–768kbps: urban 9.1% ±2.9% (n=241); suburban 10.4% ±3.1% (n=504); rural 13.4% ±3.4% (n=1,002); low poverty 12.5% ±3.3% (n=1,557); medium poverty 8.4% ±2.8% (n=190); high poverty (no cell reported); overall 11.7% ±3.2% (n=1,747)
769kbps–1.5mbps: urban 33.6% ±4.7% (n=889); suburban 40.0% ±4.9% (n=1,945); rural 31.0% ±4.6% (n=2,310); low poverty 34.3% ±4.8% (n=4,286); medium poverty 34.6% ±4.8% (n=788); high poverty 38.1% ±4.9% (n=70); overall 34.4% ±4.8% (n=5,144)
greater than 1.5mbps: urban 49.4% ±5.0% (n=1,304); suburban 31.6% ±4.7% (n=1,533); rural 19.9% ±4.0% (n=1,488); low poverty 27.4% ±4.5% (n=3,423); medium poverty 35.5% ±4.8% (n=808); high poverty 50.5% ±5.0% (n=93); overall 28.9% ±4.5% (n=4,324)
don't know: urban 1.9% ±1.4% (n=50); suburban 5.4% ±2.3% (n=263); rural 5.7% ±2.3% (n=427); low poverty 5.5% ±2.3% (n=685); medium poverty 2.1% ±1.4% (n=48); high poverty 3.5% ±1.8% (n=6); overall 4.9% ±2.2% (n=739)

table 2. average number of public library outlet graphical public-access internet terminals by metropolitan status and poverty*

urban: low poverty 14.7; medium poverty 20.9; high poverty 30.7; overall 17.9
suburban: low poverty 12.8; medium poverty 9.7; high poverty 5.0; overall 12.6
rural: low poverty 7.1; medium poverty 6.7; high poverty 8.1; overall 7.1
overall: low poverty 10.0; medium poverty 13.3; high poverty 26.0; overall 10.7

* note that most library branches defined as "high poverty" are in general part of library systems with multiple branches, not single-building systems. by and large, library systems connect and provide pac and internet services systemwide.

while overall 53.5 percent of public libraries indicate that their connection speeds are adequate to meet demand, some parsing of this figure reveals more variation (tables 7 through 10):

■ libraries with connection speeds of 769kbps or less are more likely to report that their connection speeds are insufficient to meet patron demand at all times, with 24.0 percent of rural libraries, 25.8 percent of suburban libraries, and 25.4 percent of urban libraries so reporting (table 7).
■ libraries with connection speeds of 769kbps or less are more likely to report that their connection speeds are insufficient to meet patron demand at some times, with 35.0 percent of rural libraries, 38.1 percent of suburban libraries, and 53.4 percent of urban libraries so reporting (table 8).
■ libraries with connection speeds of greater than 769kbps also report bandwidth-sufficiency issues, with 12.0 percent of rural libraries, 10.5 percent of suburban libraries, and 14.0 percent of urban libraries indicating that their connection speeds are insufficient all of the time (table 7), and 20.3 percent of rural libraries, 29.5 percent of suburban libraries, and 30.0 percent of urban libraries indicating that their connection speeds are insufficient some of the time (table 8).
■ libraries that have ten or fewer workstations tend to rate their bandwidth as more sufficient, whether at 769kbps or less or at greater than 769kbps (tables 7, 8, and 10).

thus, in looking at the data, it is clear that libraries with fewer workstations indicate that their connection speeds are more sufficient to meet patron demand.
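before turning to tables 3 through 10, a quick sanity check in python of how the 63.3 percent broadband figure cited earlier follows from the overall column of table 1:

overall = {
    "less than 56kbps": 2.1, "56kbps-128kbps": 9.8, "129kbps-256kbps": 8.2,
    "257kbps-768kbps": 11.7, "769kbps-1.5mbps": 34.4,
    "greater than 1.5mbps": 28.9, "don't know": 4.9,
}
broadband = overall["769kbps-1.5mbps"] + overall["greater than 1.5mbps"]
print(f"{broadband:.1f}% of outlets report 769kbps or greater")  # 63.3%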
table 3. public library public-access workstations and speed of connectivity by metropolitan status (each cell: 769kbps or less / greater than 769kbps; missing: 7.6%, n=1,239)

10 or fewer workstations: rural 48.4% (n=2,929) / 45.2% (n=2,737); suburban 30.1% (n=891) / 63.2% (n=1,872); urban 21.6% (n=269) / 75.2% (n=937)
more than 10 workstations: rural 22.0% (n=307) / 75.5% (n=1,053); suburban 12.0% (n=225) / 85.1% (n=1,595); urban 9.6% (n=130) / 89.8% (n=1,221)
total: rural 43.4% (n=3,242) / 50.9% (n=3,802); suburban 23.0% (n=1,116) / 71.6% (n=3,474); urban 15.1% (n=399) / 83.0% (n=2,194)

table 4. fewer public library public-access workstations than patrons wishing to use them by metropolitan status (missing: 2.9%, n=473)

10 or fewer workstations: rural 10.5% (n=681); suburban 10.8% (n=339); urban 23.6% (n=300); total 12.1% (n=1,321)
more than 10 workstations: rural 10.8% (n=158); suburban 11.4% (n=220); urban 31.2% (n=430); total 16.9% (n=808)
total: rural 10.6% (n=845); suburban 11.0% (n=562); urban 27.8% (n=748); total 13.7% (n=2,157)

table 5. fewer public library public-access workstations than patrons wishing to use them at certain times during a typical day by metropolitan status (missing: 2.9%, n=473)

10 or fewer workstations: rural 68.8% (n=4,444); suburban 74.5% (n=2,347); urban 69.1% (n=880); total 70.5% (n=7,670)
more than 10 workstations: rural 78.1% (n=1,139); suburban 80.2% (n=1,548); urban 62.8% (n=866); total 74.5% (n=3,553)
total: rural 70.5% (n=5,605); suburban 76.7% (n=3,905); urban 65.6% (n=1,764); total 71.7% (n=11,273)

table 6. sufficient public library public-access workstations available for patrons wishing to use them by metropolitan status (missing: 2.9%, n=473)

10 or fewer workstations: rural 20.6% (n=1,331); suburban 14.7% (n=464); urban 7.4% (n=94); total 17.4% (n=1,889)
more than 10 workstations: rural 11.0% (n=161); suburban 8.4% (n=163); urban 6.0% (n=83); total 8.5% (n=406)
total: rural 18.9% (n=1,501); suburban 12.3% (n=627); urban 6.6% (n=177); total 14.6% (n=2,304)

table 7. public library connection speed insufficient to meet patron needs by metropolitan status (each cell: 769kbps or less / greater than 769kbps)

10 or fewer workstations: rural 25.4% (n=668) / 12.1% (n=297); suburban 27.4% (n=233) / 9.8% (n=173); urban 15.4% (n=34) / 10.2% (n=90)
more than 10 workstations: rural 11.6% (n=34) / 11.4% (n=108); suburban 19.2% (n=41) / 11.3% (n=168); urban 25.4% (n=32) / 17.1% (n=199)
total: rural 24.0% (n=705) / 12.0% (n=408); suburban 25.8% (n=274) / 10.5% (n=341); urban 18.7% (n=72) / 14.0% (n=293)

table 8. public library connection speed insufficient to meet patron needs at some times by metropolitan status (each cell: 769kbps or less / greater than 769kbps)

10 or fewer workstations: rural 34.1% (n=898) / 19.3% (n=474); suburban 37.1% (n=315) / 29.0% (n=511); urban 50.0% (n=130) / 27.0% (n=238)
more than 10 workstations: rural 43.2% (n=127) / 22.5% (n=214); suburban 42.3% (n=90) / 30.3% (n=450); urban 60.3% (n=76) / 32.0% (n=374)
total: rural 35.0% (n=1,025) / 20.3% (n=694); suburban 38.1% (n=405) / 29.5% (n=961); urban 53.4% (n=206) / 30.0% (n=626)

table 9. public library connection speed is sufficient to meet patron needs by metropolitan status (each cell: 769kbps or less / greater than 769kbps)

10 or fewer workstations: rural 38.9% (n=1,025) / 68.3% (n=1,675); suburban 35.0% (n=297) / 60.2% (n=1,062); urban 34.6% (n=90) / 62.9% (n=556)
more than 10 workstations: rural 45.2% (n=133) / 66.1% (n=628); suburban 38.5% (n=82) / 54.9% (n=817); urban 14.3% (n=18) / 50.9% (n=594)
total: rural 39.5% (n=1,158) / 67.5% (n=2,306); suburban 35.7% (n=379) / 57.9% (n=1,886); urban 28.0% (n=108) / 56.0% (n=1,168)
table 10. public library connection speed insufficient to meet patron needs some or all of the time by metropolitan status (each cell: 769kbps or less / greater than 769kbps)

10 or fewer workstations: rural 59.5% (n=1,566) / 31.4% (n=771); suburban 64.6% (n=549) / 38.8% (n=684); urban 65.4% (n=170) / 37.1% (n=328)
more than 10 workstations: rural 54.8% (n=161) / 33.9% (n=322); suburban 61.5% (n=131) / 41.6% (n=618); urban 85.7% (n=108) / 49.1% (n=573)
total: rural 24.0% (n=1,025) / 32.3% (n=1,102); suburban 64.0% (n=680) / 40.0% (n=1,302); urban 72.0% (n=278) / 44.0% (n=919)

■ discussion and selected issues

the data presented point to a number of issues related to the current state of public library pac and internet-access adequacy in terms of available public-access computers and bandwidth. the data also provide a foundation upon which to discuss the nature of quality and sufficient pac and internet access in a public library environment. while public libraries indicate an increased ability to meet patron bandwidth demand when providing fewer publicly available workstations, public libraries indicate that they have difficulty in meeting patron demand for public-access computers.

growth of wireless connections

in 2004, 17.9 percent of public library outlets offered wireless access, and a further 21.0 percent planned to make it available. outlets in urban and high-poverty areas were most likely to have wireless access. the majority of libraries (61.2 percent), however, neither had wireless access nor had plans to implement it in 2004. as table 11 demonstrates, the number of public library outlets offering wireless access has roughly doubled, from 17.9 percent to 36.7 percent, in two years. furthermore, 23.1 percent of outlets that do not currently have it plan to add wireless access in the next year. thus, if libraries follow through with their plans to add wireless access, 61.0 percent of public library outlets in the united states will have it by 2007.

the implications of the rapid growth of the public library's provision of wireless connectivity (as shown in table 11) for bandwidth requirements are significant. either libraries added wireless capabilities within their current overall bandwidth, or they obtained additional bandwidth to support the increased demand created by the service. if the former, then wireless access created an even greater burden on an already problematic bandwidth capacity and may have actually reduced the overall quality of connectivity in the library. if the latter, libraries had to shoulder the burden of increased expenditures for bandwidth. either scenario required additional technology infrastructure, support, and expenditures.
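a hedged illustration in python of the first scenario (the user counts are hypothetical, not survey figures): adding wireless patrons to a fixed connection dilutes what every user gets.

def per_user_kbps(total_kbps, wired_users, wireless_users):
    # naive even split across all simultaneous users of one shared line
    return total_kbps / (wired_users + wireless_users)

t1 = 1544  # kilobits per second
print(f"{per_user_kbps(t1, 10, 0):.0f}kbps per user before wireless")  # ~154kbps
print(f"{per_user_kbps(t1, 10, 8):.0f}kbps per user with 8 laptops")   # ~86kbps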
sufficient and quality connections

the notion of a sufficient and quality public library connection to the internet is a moving target and depends on a range of factors and local conditions. for purposes of discussion in this paper, the authors used 769kbps to differentiate "slower" from "faster" connectivity. if, however, 1.5mbps or greater had been used to define faster connectivity speeds, then only 28.9 percent of public libraries would meet the criterion of "faster" connectivity (see table 1). and in fact, simply because 28.9 percent of public libraries report connection speeds of 1.5mbps or faster does not mean that they have sufficient or quality bandwidth to meet the computing needs of their users, their staff, their vendors, and their service providers. some public libraries may need 10mbps to meet the pac needs of their users as well as internal staff and management computing needs.

the library community needs to become more educated and knowledgeable about what constitutes sufficient and quality connectivity for the communities that each library serves. a first step is to understand clearly the nature and type of the library's connectivity. the next step is to conduct an internal audit that, at a minimum (a sketch of how such an audit might be tallied appears at the end of this section):

■ identifies the range of networked services the library provides, both to users and for the operation of the library;
■ identifies the typical bandwidth consumption of these services;
■ determines the demands users place on the bandwidth in terms of the services they use;
■ determines peak bandwidth-usage times;
■ identifies the impact of high-consumption networked services used at these peak-usage times;
■ anticipates the bandwidth demands of newer services and resources that users will want to access through the library's infrastructure—myspace.com, youtube.com—regardless of whether or not the library is the direct provider of such services; and
■ determines what broadband services are available to the library, the costs of these services, and the "fit" of these services to the needs of the library.

based on this and related information from such an audit, library administration can better determine the degree to which the bandwidth is sufficient in speed and quality.

■ planning for sufficient and quality bandwidth

knowing the current condition of existing bandwidth in the library is not the same as successful technology planning and management to ensure that the library has, in fact, bandwidth that is sufficient in speed and quality. once an audit such as has been suggested is completed, careful planning for bandwidth deployment in the library is essential. it appears, however, that much of the current management and planning for networked services is based first on what bandwidth is available, as opposed to the bandwidth that is needed to provide the necessary services and resources in a networked environment. this stance puts public libraries in a reactive rather than a proactive position regarding the provision of networked services.

most public library planning approaches stress the importance of conducting some type of needs assessment as a precursor to any type of planning.5 further, technology plans should include such things as goals, objectives, services provision, and evaluation as they relate to bandwidth and the appropriate bandwidth needed. recent library technology planning guides, however, give little attention to the management, planning, and evaluation of bandwidth as it relates to the provision of networked services.

it must be noted that some public libraries may be prevented from accessing higher bandwidth due to high cost, a lack of bandwidth alternatives, or other local factors that determine access to advanced telecommunications in their areas. in such circumstances, the audit may serve to inform the public service/utilities commissions, the fcc, and others of the need for deployment of advanced telecommunications services in these areas.
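as promised above, a minimal sketch in python of tallying such an audit; the service names and kbps figures are assumptions for illustration, not recommendations from the authors.

services = {
    "public-access workstations (10 @ 150kbps)": 1500,
    "staff workstations (4 @ 100kbps)": 400,
    "wireless patrons (6 @ 200kbps)": 1200,
    "videoconference session": 384,
    "ils and vendor traffic": 256,
}
available_kbps = 1544  # e.g., one t1 line

peak_demand = sum(services.values())
print(f"peak demand {peak_demand}kbps vs. {available_kbps}kbps available")
if peak_demand > available_kbps:
    # the audit's point: quantify the gap before negotiating for more bandwidth
    print(f"shortfall of {peak_demand - available_kbps}kbps at peak; plan accordingly")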
■ bandwidth planning in a community context

the audit and planning processes that have been described are critical activities for libraries. it is essential, however, for these processes to occur in the larger community context. investments in technology infrastructure are increasingly a community-wide resource that serves multiple functions—emergency services, community access, local government agencies, to name a few. it is in this larger context that library pac and internet access occurs. moreover, there is a convergence of technology and service needs. for example, public libraries increasingly serve as agents of e-government and as disaster-relief providers.6 first responders rely on the library's infrastructure when theirs is destroyed, as hurricane katrina and other storms demonstrated. local, state, and federal government agencies rely on broadband and on pac and internet access (wired or wireless) to deliver e-government services.

thus, at their core, libraries, emergency services, government agencies, and others have similar needs. pooling resources, planning jointly, and looking across needs may yield economies of scale, better service, and a more robust community technology infrastructure. emergency providers need access to reliable broadband and communications technologies in general, and in emergency situations in particular. libraries need access to high-quality broadband and pac technologies. both need access to wireless technologies.

as broadcast networks relinquish ownership of the 700mhz frequency used for analog television in february 2009, and as this frequency is distributed to municipalities for emergency services, now is an excellent time for libraries to engage in community technology planning for e-government, disaster planning and relief efforts, and pac and internet services. by working with the larger community to build a technology infrastructure, the library and the entire community benefit.

■ availability of high-speed connectivity

one key consideration not known at this time is the extent to which public libraries—particularly those in rural areas—even have access to high-speed connections. many rural communities are served not by the large telecommunications carriers, but rather by small, privately owned and run local exchange carriers. iowa and wisconsin, for example, are each served by more than eighty exchange carriers. as such, public libraries are limited in capacity and services to what these exchange carriers offer and make available. thus, in some areas, dsl service may be the only form of high-speed connectivity available to libraries.

table 11. public-access wireless internet connectivity availability in public library outlets by metropolitan status and poverty

currently available: urban 42.9% ±4.9% (n=1,211); suburban 42.5% ±4.9% (n=2,240); rural 30.7% ±4.6% (n=2,492); low poverty 38.0% ±4.8% (n=5,165); medium poverty 28.1% ±4.5% (n=679); high poverty 53.8% ±5.0% (n=99); overall 36.7% ±4.8% (n=5,943)
not currently available and no plans to make it available within the next year: urban 23.1% ±4.2% (n=651); suburban 29.7% ±4.6% (n=1,562); rural 49.2% ±5.0% (n=3,988); low poverty 37.4% ±4.8% (n=5,091); medium poverty 44.4% ±4.9% (n=1,072); high poverty 21.0% ±4.1% (n=39); overall 38.3% ±4.9% (n=6,201)
not currently available, but there are plans to make it available within the next year: urban 30.6% ±4.6% (n=864); suburban 26.0% ±4.4% (n=1,369); rural 18.6% ±3.9% (n=1,509); low poverty 22.5% ±4.2% (n=3,063); medium poverty 26.2% ±4.4% (n=633); high poverty 25.3% ±4.4% (n=46); overall 23.1% ±4.2% (n=3,742)
and, as suggested earlier, dsl may or may not be considered high speed given the needs of the library and the demands of its users.

communities that lack high-quality broadband services from telecommunications carriers may want to consider building a municipal wireless network that meets the community's broadband needs for emergency, disaster, and public-access settings. as a community engages in community-wide technology planning, it may become evident that local telecommunications carriers do not meet the broadband needs of the community. such communities may need to build their own networks, based on identified technology-plan needs.

■ knowledge of networked-services connectivity needs

patrons may not attempt to use high-bandwidth services at the public library because they know from previous visits that the library cannot provide acceptable connectivity speeds to access those services—thus, they quit trying, limiting the usefulness of the public library. in addition, librarians may have inadequate knowledge or information to determine when bandwidth is or is not sufficient to meet the demands of their users. indeed, the survey and site visits revealed that some librarians did not know the connection speeds that linked their library to the internet.

consequently, libraries are in a dilemma: increase both the number of workstations and the bandwidth to meet demand, or provide less service in order to operate within the constraints of the current connectivity infrastructure. and yet, roughly 45 percent of public libraries indicate that they have no plans to add workstations within the next two years; the average number of workstations has been around ten for the last three surveys (2002, 2004, and 2006); and 80 percent of public libraries indicate that space limitations affect their ability to add workstations.7 hence, for many libraries, adding workstations is not an option.

■ missing the mark?

the networked environment is such that there are multiple uses of bandwidth within the same library—for example, public internet access, staff access, wireless access, integrated library system access. we are now in the web 2.0 environment, an interactive web that allows for content uploading by users (e.g., blogs, youtube.com, myspace.com, gaming). streaming content, not text, is increasingly the norm. there are portable devices that allow for text, video, and voice messaging. increasingly, users desire and prefer wireless services.

this is a new environment in which libraries provide public access to networked services and resources. it is an enabling environment that puts users fully in the content seat—from creation to design to organization to access to consumption. and users have choices, of which the public library is only one, regarding the information they choose to access. it is an environment of competition, advanced applications, bandwidth intensity, and the high-quality computers necessary to access graphically intense content.

the impacts of this new and substantially more complex environment on libraries are potentially significant. as user expectations rise, combined with the provision of high-quality services by other providers, libraries are in a competitive, service- and resource-rich information environment.
providing "bare minimum" pac and internet access can have two detrimental effects: (1) it relegates libraries to places of last resort, and (2) it further digitally divides those who have public-access computers and internet access only through their public libraries. it is critical, therefore, for libraries to chart a high-end course regarding pac and internet access, not access that is merely perceived by librarians to be acceptable.

■ additional research

the context in which issues regarding quality pac and sufficient connectivity speeds reside is complex and rapidly changing. research questions to explore include:

■ is it possible to define quality pac and internet access in a public library context?
■ if so, what are the attributes included in the definition?
■ can these attributes be operationalized and measured?
■ assuming measurable results, what strategies can the library, policy, research, and other interested communities employ to move public libraries toward quality pac and internet access?
■ should there be standards for sufficient connectivity and quality pac in public libraries?
■ how can public librarians be better informed regarding the planning and deployment of sufficient and quality bandwidth?
■ what is the role of federal and state governments in supporting adequate bandwidth deployment for public libraries?8
■ to what extent is broadband deployment and availability truly universal as per the universal service provisions (section 254) of the telecommunications act of 1996 (p.l. 104-104)?

these questions are a beginning point for a larger set of activities that need to occur in the research, practitioner, and policy-making communities.

■ obtaining sufficient and quality public-library bandwidth

arbitrary connectivity-speed targets, e.g., 200kbps or 769kbps, do not in and of themselves ensure quality pac and sufficient connectivity speeds. public libraries are indeed connected to the internet and do provide public-access services and resources. it is time to move beyond connectivity-type and -speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries.

given the widespread connectivity now provided by most public libraries, there continue to be increased demands for more and better networked services. these demands come from governments that expect public libraries to support a range of e-government services, from residents who want to use free wireless connectivity at the public library, and from patrons who need to download music or view streaming videos (to name but a few). simply providing more or better connectivity will not, in and of itself, address all of these diverse service needs. increasingly, pac support will require additional public-librarian knowledge, resources, and services. sufficient and quality bandwidth is a key component of those services.

the degree to which public libraries can provide such enhanced networked services (requiring exceptionally high bandwidth that is both sufficient and of high quality) is unclear. mounting a significant effort now to better understand existing bandwidth use and to plan for future needs and requirements in individual public libraries is essential. in today's networked environment, libraries must stay competitive in the provision of networked services.
doing so will require sufficient and high-quality connectivity and bandwidth.

■ acknowledgements

the authors gratefully acknowledge the support of the bill & melinda gates foundation and the american library association for the 2006 public libraries and the internet study. data from that study have been incorporated into this paper.

references

1. information institute, public libraries and the internet (tallahassee, fla.: information use management and policy institute, 2006). all studies conducted since 1994 are available at http://www.ii.fsu.edu/plinternet (accessed march 1, 2007).
2. u.s. federal communications commission, high speed services for internet access: status as of december 31, 2005 (washington, d.c.: fcc, 2006), available at http://www.fcc.gov/bureaus/common_carrier/reports/fcc-state_link/iad/hspd0604.pdf (accessed mar. 1, 2007).
3. j. c. bertot et al., public libraries and the internet 2006 (tallahassee, fla.: information use management and policy institute, forthcoming), available at http://www.ii.fsu.edu/plinternet (accessed mar. 1, 2007).
4. j. c. bertot et al., "drafted: i want you to deliver e-government," library journal 131, no. 13 (aug. 2006): 34–37.
5. c. r. mcclure et al., planning and role setting for public libraries: a manual of options and procedures (chicago: ala, 1987); e. himmel and w. j. wilson, planning for results: a public library transformation process (chicago: ala, 1997).
6. j. c. bertot et al., "drafted: i want you to deliver e-government"; p. t. jaeger et al., "the policy implications of internet connectivity in public libraries," government information quarterly 23, no. 1 (2006): 123–41.
7. j. c. bertot et al., public libraries and the internet 2006.
8. jaeger et al., "the policy implications of internet connectivity in public libraries."

usability of the vufind next-generation online catalog
jennifer emanuel
information technology and libraries | march 2011

jennifer emanuel (emanuelj@illinois.edu) is digital services and reference librarian, university of illinois at urbana-champaign.

the vufind open-source, next-generation catalog system was implemented by the consortium of academic and research libraries in illinois as an alternative to the webvoyage opac system. the university of illinois at urbana-champaign began offering vufind alongside webvoyage in 2009 as an experiment in next-generation catalogs. using a faceted search discovery interface, it offered numerous improvements to the uiuc catalog and focused on limiting results after searching rather than limiting searches up front. library users have praised vufind for its web 2.0 feel and features. however, there are issues, particularly with catalog data.

vufind is an open-source, next-generation catalog overlay system developed by villanova university library; it was released to the public as a beta in 2007 and as version 1.0 in 2008.1 as of july 2009, four institutions had implemented vufind as a primary catalog interface, and many more are either beta testing or internally testing it.2 more information about vufind, including the technical requirements and compatible opacs, is available on the project website (http://www.vufind.org). in illinois, the state consortium of academic and research libraries in illinois (carli) released a beta installation of vufind in 2008 on top of its webvoyage catalog database. the carli installation of vufind is a base installation with minor customizations for the carli catalog environment. some libraries in illinois use vufind as an alternative to their online catalog, including the university of illinois at urbana-champaign (uiuc), which currently advertises vufind as a more user-friendly and faster version of the library catalog. as part of its evaluation of next-generation catalog systems, uiuc decided to conduct hands-on usability testing during the spring of 2009.

the carli catalog environment is very complex and comprises 153 member libraries throughout illinois, ranging from tiny academic libraries to the very large uiuc library. currently, 76 libraries use a centrally managed webvoyage system referred to as i-share. i-share is composed of a union catalog containing the holdings of all 76 libraries as well as individual institution catalogs. library users heavily use the union catalog because of a strong culture of sharing materials among member institutions. carli's vufind installation uses the records of the entire union catalog but has library-specific views. each of these views is unique to the member library, but every library uses the same interface to view records throughout i-share.

vufind incorporates many of the interactive web and social media technologies that the public uses online, including features from online booksellers and commercial search engines. the vufind search page is simple, containing only a single search box and a dropdown menu that gives users the option to search all fields or to search by title, author, subject, or isbn/issn (see figure 1). to combine searches using boolean logic or to limit to a particular language or format, the user must use the advanced search feature (see figure 2). the results page displays results vertically, with each result containing basic item information, such as title, author, call number, location, and item availability, and a graphical icon displaying the material's format. the results page also has a column on the right side displaying "facets," links that allow a user to refine a search and browse results using catalog data contained within the result set (see figure 3). vufind also contains a variety of web 2.0 features, such as the ability to tag items, create a list of favorite items, leave comments about an item, and cite an item, as well as links to google book previews and extensive author biographies mined from the internet. corresponding to the beginning of the vufind trial at uiuc, the university library purchased reviews, synopses, and cover images from syndetic solutions to further enhance both vufind and the existing webvoyage catalog.

[figures: figure 1. vufind default search; figure 2. vufind advanced search; figure 3. facets in vufind]
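facets are the heart of the limit-after-searching approach described above. as a language-agnostic illustration (a python sketch with made-up records, not vufind's actual code), a facet is just a count of field values across the current result set, and applying one filters that set:

from collections import Counter

results = [
    {"title": "an inconvenient truth", "format": "film/video", "year": 2006},
    {"title": "climate change 2007", "format": "e-book", "year": 2008},
    {"title": "the weather makers", "format": "book", "year": 2006},
]

def facet_counts(records, field):
    # what the right-hand facet column shows, e.g. format -> counts
    return Counter(r[field] for r in records)

def narrow(records, field, value):
    # clicking a facet keeps the same search, just filtered
    return [r for r in records if r[field] == value]

print(facet_counts(results, "format"))      # Counter({'film/video': 1, 'e-book': 1, 'book': 1})
print(narrow(results, "format", "e-book"))  # the lone 2008 e-book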
an additional appealing aspect of vufind was its speed; the carli installation of webvoyage is slow to load and prone to timing out while conducting searches. the uiuc library first provided vufind (http://www.library.illinois.edu/vufind) at the beginning of the 2008 fall semester and expected the trial to run through the end of the spring 2009 semester. use statistics show that throughout the fall semester (september through december), there were approximately six thousand unique visitors each month, producing a total of more than thirty-eight thousand visits. spring statistics show use averaging more than ten thousand visitors a month, an increase most likely due to word of mouth.

librarians at both uiuc and carli were interested in what users thought about vufind, especially in relation to the usability of the interface. with this in mind, the library launched several forms of assessment during the spring semester. the first was a quantitative survey based on yale's vufind usability testing.3 the second was a more extensive qualitative usability test that had users conducting sample searches in the interface and telling the facilitator their opinions. this article discusses the hands-on usability portion of the study; survey responses that support the results presented herein will be reported in a separate venue. while this article only discusses vufind at a single institution, it does offer a generalized view of next-generation catalogs and of how library users use such a catalog compared to a traditional online catalog.

■■ literature review

librarians have complained about the usability of online catalogs since they were first created.4 when amazon.com became the go-to site for books and book information in the early 2000s, librarians and their users began to harshly criticize both opac interfaces and metadata standards.5 ever since north carolina state university announced a partnership with the commercial search corporation endeca in 2006, librarians have been interested in the next generation of library catalogs and, more broadly, in discovery systems designed to help users discover library materials, not simply find them.6 as a result, the past five years have been filled with commercial opac providers releasing next-generation library interfaces that overlay existing library catalog information and require an up-front investment by libraries to improve search capabilities. as these systems are inherently commercial and require a significant investment of capital, several open-source, next-generation catalog projects have emerged, such as vufind, blacklight, scriblio, and the extensible catalog project.7 these interfaces are often developed at one institution with its users in mind and then modified and adapted by other institutions to meet local needs. however, because they can be locally customized, libraries with significant technical expertise can have a unique interface that commercial vendors cannot compete with.
one cannot discuss next-generation catalogs without mentioning the metadata that underlie opac systems. some librarians view the interface as only part of the problem of library catalogs and point to cataloging and metadata practices as the larger underlying problem. many librarians view traditional cataloging using machine-readable cataloging (marc), which has been used since the 1960s, as outdated because it was developed with nearly fifty-year-old technology in mind.8 however, because marc is so common and allows cataloging with a fine degree of granularity, current opac systems still use it. librarians have developed additional cataloging standards, such as dublin core (dc), metadata object description schema (mods), and functional requirements for bibliographic records (frbr), but none of these has achieved widespread adoption for cataloging printed materials. newly developed catalog projects, such as extensible catalog, are beginning to integrate these new metadata schemas, but others continue to use marc.9

many librarians also advocate integrating folksonomy, or user tagging, into library catalogs. folksonomy is used by many websites, most notably flickr, delicious, and librarything, each of which stores user-submitted content that is tagged with self-selected keywords that allow for easy retrieval and discovery.10 vufind integrates tagging into individual item records but does not pull tags from other sources; rather, users must tag items individually.

additionally, next-generation catalogs offer a search mechanism that focuses on discovery rather than simply searching for library materials. users, accustomed to new ways of searching both on the internet and through commercial library indexing and abstracting databases, now search in a fundamentally different style than they did when opacs first became a part of library services. the online catalog is now just one of many tools that library users use to locate information, and it now covers fewer resources than it did ten to fifteen years ago. library users are accustomed to using a single search box, as with google; they also use nonlibrary online tools to find information about books and no longer view library catalogs as the primary place to look for books.11 as users are no longer accustomed to the controlled language and particular searching methods of library catalogs, having moved to discovering materials online, libraries must adapt to new ways of obtaining information and focus not on teaching users how to locate library materials, but on giving them the tools to discover on their own.12 vufind is one option among many in the genre of next-generation or discovery-catalog tools.
■■ methods

the study employed fifteen subjects who participated in individual, hands-on usability test sessions lasting an average of thirty minutes. i recruited volunteers through several methods, including posting to a university faculty and staff e-mail discussion list, posting to an e-mail discussion list aimed at graduate students, and placing flyers in the undergraduate library. all means of recruitment stated that the library sought volunteer subjects to perform a variety of sample searches in a possible new library catalog interface. i also informed subjects that there was a gift card as a thank-you for their time. all subjects had to sign a human-subjects statement of informed consent approved by the university of illinois institutional review board. i sought a diverse sample, and therefore accepted the first five volunteers from each of the following pools: faculty and staff, graduate students, and undergraduate students. i felt that these three user groups were distinct enough to warrant separate pools. five users per group was chosen because of jakob nielsen's statement that five users will find 85 percent of usability problems and that fifteen users will discover all usability problems.13 although i did not specifically aim to recruit a diverse sample, the sample showed large diversity in areas including age, library experience, and academic discipline. all subjects stated they had some experience searching the library's online catalog and were eager to see changes made to it.

the test used was developed from a statewide usability test of different catalog interfaces used in illinois. the test was adapted using the same sample searches but was customized to the features and uses of vufind (see appendix). the vufind test was kept similar to the original test to allow a comparison of other catalog interfaces to vufind for internal evaluation purposes. i designed the test to allow subjects to perform a progressively complicated series of sample searches using the catalog while the moderator pointed out various features of the catalog interface. subjects were asked what they thought about the search result sets and for their opinions of the interface and navigation; they also were asked to perform specific tasks using vufind. the tasks were common library-catalog tasks on topics familiar to undergraduate-level students. they ranged from a keyword search for "global warming" to a more complicated search for a specific compact disc by the artist prince. the tasks also included using the features associated with creating and using an account with vufind, such as adding tags and creating a favorite-items list. through completing the test, subjects got an overview of vufind and were then asked to draw conclusions about their experience and compare it to other library catalogs they had used.

the tests were performed in a small meeting room with one workstation set up with an installation of the morae software, a microphone, and a web camera. morae is a very powerful software program developed by techsmith that records the screen with which the user is interacting, as well as environmental audio and video. although the study did not use all the features of the morae software, it was invaluable to the researcher to be able to review the entire testing experience with the same detail as when the test actually occurred in person. the study was carried out with the researcher sitting next to the workstation, asking subjects to perform a task from the script while morae recorded all of their actions. once all fifteen subjects completed the test, the researcher watched the resulting videos and coded the answers into various themes on the basis of both broad subject categories and individual question answers. the researcher then gathered the codes into categories and used them to further analyze and gain insight into both the useful features of and the problems with the vufind interface.
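an aside on the sample size: the five-user figure cited above comes from nielsen's problem-discovery model. a quick python sketch of that model (the 0.31 discovery rate is nielsen and landauer's published average, not a figure from this study):

LAMBDA = 0.31  # average probability that one test user encounters a given problem

def problems_found(n_users, lam=LAMBDA):
    return 1 - (1 - lam) ** n_users

for n in (1, 5, 15):
    print(n, f"{problems_found(n):.0%}")  # 1: 31%; 5: ~84% (nielsen rounds to 85); 15: ~100%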
■■ analysis

participants generally liked vufind and preferred it to the current webvoyage system. when asked to choose which catalog they would rather use, only one person, a faculty member, stated he would still use webvoyage. this faculty member thought most of his searches were too advanced for the vufind interface and needed options that vufind did not have, such as limiting a search to an individual library or call-number searching. this user did, however, specify that vufind would be easier to use for a fast and simple search. other users all responded very favorably to vufind, liking it better than any other online catalog they had used, with most stating that they wanted it as a permanent addition to the library. the most common responses to vufind were that the layout is easier on the eyes and displays data much better than the webvoyage catalog; there were no comments about actual search results. several users stated that it was nice to be able to do a broad search and then have all limiting options presented to them as facets, allowing them both to limit after searching and to browse through a large number of search results. one user, an undergraduate student, stated she liked vufind because it "was new" and she always wants to try out new things on the internet.
the first section of the usability test asked users to examine both the basic and advanced search options. users easily recognized how the interface functioned and liked having a single search box as the basic interface, noting that it looked more like a web search engine. they also recognized all of the dropdown menu options and agreed that the options included what they most often searched. however, four users wanted a keyword search. even though there is no keyword search in webvoyage and there is an "all fields" menu option, participants seemed to think of the one-box search universally as a keyword search and wanted that to be the default search option. one participant, an international graduate student, remarked that "keyword" is better understood by international students than "all fields" because, internationally, a field is not a search field but a scholarly field such as education or engineering. in the advanced search, all users thought the search options were clear and liked having icons to depict the various media formats. however, two users did remark that it would be useful to be able to limit by year on the advanced search page. the advanced search also is where the user can select one of seven languages, all of them western languages, including latin and russian. two users, both international graduate students, stated that more languages would be beneficial, especially asian and additional slavic languages. the university of illinois has separate libraries for asian and slavic materials, and these two participants said it would be useful to have search options that include the languages served by those libraries.

the first task participants were asked to do was an "all fields" search for "climate change." they were instructed to look at the results page and an individual record to give feedback on how they liked the layout and what they thought of the search results. upon looking at the results, all participants thought they were relevant, though there were questions as to how results were deemed relevant to the search statement as well as how they were ranked.

participants were then asked to look at the right sidebar of the results page, which contains the facets. most users did not understand the term "facets," with faculty and staff understanding the term more than graduate and undergraduate students did. one faculty member who understood the term noted that "facets are like a diamond with different sides or ways of viewing something." when asked what would be a better name for the limiting options, several users suggested calling the facets "categories" or renaming the column "refine search," "narrow search," or "sort your search."

participants were then asked to find how to see results from other i-share libraries. only two faculty members found i-share results quickly, and just half of the remaining participants were able to find the option at all. when asked what would make the option easier to find, most said they liked the wording, but the option needed to stand out more, perhaps with a different-colored link or bolder type. two users thought having the location integrated as a facet would be the most useful way of presenting it. participants, however, quickly took to using the facets when asked to use the climate change search results to find an electronic book published in 2008. no user had problems with this task, and several remarked that using facets was a lot easier than limiting to format and year before searching.

the next task for participants was to open and examine a single record within their original climate change results (see figures 4 and 5). participants liked the layout, including the cover image with some brief title information and a tabbed bar below showing additional information, such as a more detailed description, holdings information, a table of contents, reviews, comments, and a link to request the item. several users remarked that they liked having information contained under tabs, but vufind rendered each tab as a new webpage, which made going back to previous tabs or to the results page cumbersome. the only problem users had with the information contained within the tabs was the "staff view," which contained the marc record. most users looked at the marc record with confusion, including one graduate student who said, "if the staff view is of no use to the user, why even have it there?" one other useful feature of individual records in vufind is a link to an overlay window containing full citation information for the item in both apa and mla formats. users were able to find this "cite this" link and liked having that information available. however, several participants noted that citation information would be much more beneficial if it could be easily exported to refworks or other bibliographic software.

the next several searches used progressively higher-level research skills and showed problems with both vufind and the catalog record data.
the first search asked participants to do an "all fields" search for james joyce. all were able to complete the search, but there was notable confusion as to which records were written by james joyce and which were items about him. about half of the first-page results for this search did not list an author on the results page. vufind appears to pull the author field on the results page from the 100 field in the marc record, so if the 700 field is used instead, for an editor, this information is not displayed on the results page. individual records do substitute the 700 field if the 100 field is not present; this should also be the case on the initial results screen. several users thought it was strange that the results page often did not list the author while an author was listed in the individual record. additionally, when asked to use the facets to limit to items in which james joyce is the author, no participant had any problems, though several pointed out that there were three facets using his name—joyce, james; joyce, james avery; and joyce, j. a.—because of inconsistencies in cataloging (see figure 6).

[figures: figure 4. results set; figure 5. record display; figure 6. author facet; figure 7. format facet]
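a simplified sketch in python of the author-display fallback the passage suggests; plain dicts stand in for parsed marc records, and the record shown is hypothetical, not from the uiuc catalog:

def display_author(marc_fields):
    # marc_fields maps a tag ("100" main entry, "700" added entry) to a name
    if marc_fields.get("100"):
        return marc_fields["100"]
    if marc_fields.get("700"):
        return marc_fields["700"]  # e.g., an editor entered only as an added entry
    return ""  # nothing to display on the results page

edited_volume = {"245": "a companion to james joyce", "700": "smith, jane, ed."}
print(display_author(edited_volume))  # falls back to the 700 field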
participants were next asked to search for an audio recording by the artist prince using the basic (single) search box. most participants did an "all fields" search for prince and attempted to use the facets to limit to a particular format. all but one were confident that they had achieved the proper result, but there was confusion about the format. some participants were confused as to what format an audio recording was because the corresponding facet was for a music recording; a couple of users thought "audio recording" could be a spoken-word recording. most participants preferred that the format facets point to a single actual physical format, such as a record, cassette, or compact disc (see figure 7). physical formats appeared to resonate more with users than the broad cataloging term "music recording." a more specific format type (i.e., compact disc) is contained in the call number and should be straightforward to pull out as a facet. it appears vufind pulls the format information from marc field 245 subfield $h (medium) rather than from the call number (which at illinois can specify the format), the 300 physical description field, or another field, such as a notes field, that some institutions may use to specify the exact format.
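a hedged sketch in python of the concrete-format facet participants asked for, preferring the call number or the 300 field over the generic 245 $h medium; the call-number convention and carrier list are illustrative assumptions, not uiuc's actual rules:

def format_facet(call_number, field_300, medium_245h):
    carriers = ["compact disc", "videocassette", "dvd", "cassette", "cd"]
    for source in (call_number, field_300):
        text = source.lower()
        for carrier in carriers:
            if carrier in text:
                return carrier
    return medium_245h or "unknown"  # fall back to the broad 245 $h medium

print(format_facet("cd 781.66 pri", "1 sound disc : digital ; 4 3/4 in.", "[sound recording]"))
# -> "cd": the specific carrier patrons wanted, instead of "music recording"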
however, when participants were asked to use the facets further to find prince's first album, 1978's for you, limitations of vufind became more apparent. each participant used a different method to search for this album, and none actually found the item either locally or in i-share, though multiple copies are available in both. most participants initially tried limiting by date because they were given that information. however, vufind's facets focus on eras rather than specific years, which participants found frustrating, as many items can fall under a broad era. also, the era facets brought up many more eras than one would associate with an audio recording, such as the 15th century. granted, the 15th-century facet probably brings up music that originated in that era, not music recorded then, but participants wanted the date to correspond to when an item was initially published or released. it appears that vufind pulls the era-facet information from the subject headings and ignores the copyright or issue year. to users, the era facets are not useful for most of their search needs; users would rather limit by copyright date or the original date of issue.

another search that further highlighted problems searching for multimedia in vufind was the title search participants did for gone with the wind. everyone thought this search brought up relevant results, but when asked to determine whether the uiuc library had a copy of the dvd, many users expressed confusion. once again, the confusion stemmed from the inability to limit to a specific format. participants could use the facets to limit to a film or video, but not to a specific format. several participants stated that they needed specific formats because, when they are doing a comparable search, they only want to find dvds. however, because all film formats are linked together under "film/video," they had to go into individual records and examine the call number to determine the exact format. most participants stated clearly that "dvd" needed to be its own format facet and that entering a record to find the format required too much effort. participants also expressed frustration that the call number was the only place to determine the specific format and believed that this information should be contained in the brief item information, not buried in the tabbed areas.

the frustrations with the lack of specific formats were also evident when participants were asked to do an advanced search for a dvd on public speaking. all users initially thought the advanced search limiter for film/video was sufficient when they first looked at the advanced search options. however, when presented with an actual search ("public speaking"), they found that there should be more options and specific format choices up front within the advanced search.

another search that participants conducted was an author search for jack london. they then used the facets to find the book white fang. this search was chosen because the resulting records are mostly for older materials that often do not contain a lot of the additional information that newer records contain. participants looked at a specific record and then were asked what they thought of the information displayed. most answered that they would like as much information as possible, but were accepting of missing information. several participants stated that most people already know this book and thus did not need additional information. however, when pressed as to what information they would like added to the record, several users stated that a summary would be the most useful. additionally, several users asked for more information about both the reviews and the comments that could be seen in the various records participants were asked to examine. many of the participants wanted more information as to where the reviews came from, because this was not clear. they also wanted to know whether the reviews or comments from catalog users were moderated by a librarian. for the most part, participants liked having reviews inside the catalog records, but they liked having a summary even more. several users, all graduate students, expressed concern about the objectivity of having reviews in the catalog, especially because it was not clear who wrote the review, and feared that reviews may interject bias that has no place in a library catalog record. one of these participants stated, "if i wanted reviews, i would just go to amazon. i don't expect reviews, which can be subjective, to be in a library catalog—that is too commercial." several undergraduate participants stated that reviews helped them decide whether the book would be useful to them.
the final task of the usability test asked participants to create an account with vufind, because it is not connected to our user database. most users had no problems finishing this task, though they found some problems with the interface. first, it was not clear that users had to create an account and could not log in with their library number as they did in the library's opac. second, the default field asks users for their barcode, which is not a term used at uiuc (users are assigned a library number). once logged in, participants were satisfied with the menu options and how their account information was displayed. finally, participants were asked, while logged in, to search for a favorite book and add it to their favorites list. all users liked the favorites-list feature, and many already knew of ways they could use it, but several wished they could create multiple lists and have the ability to arrange lists in folders.

■■ discussion

participants thought favorably of the vufind interface and would use it again. they liked the layout of information much more than the current webvoyage interface and thought it was much easier to look at. they also commented repeatedly that the color scheme (yellow and grey) was easier on the eyes than the blues of the primary library opac. vufind also had more visual elements, such as cover images and icons representing format types, which participants commented on favorably. when asked to compare vufind to both the webvoyage catalog and amazon, only one participant indicated a preference for amazon, while the rest preferred vufind. the user who specified amazon, a faculty member, stated that that was where he always started searching for books; he would then search for specific titles in the library catalog to check availability. other participants who made comments about amazon stated that it was commercial and more about marketing materials, while the library catalog just provided the basic information needed to evaluate materials without attempting to sell them to you. several participants also stated they checked amazon for book information but generally did not like it because of its commercial nature; because vufind provides much of the same information as amazon, they will use vufind first in the future. participants also thought amazon was for a popular, not scholarly, audience, making it not useful for academic purposes. most users did not have much to say about the webvoyage opac, except that it was overwhelming, had too many words on the results screen, and was not pleasantly visual. participants were also asked to compare vufind, amazon, and webvoyage on visual preference. again, participants believed that vufind had the best layout. they liked that vufind had a very clean and uncluttered interface and that the colors were few and easy on the eye. they also commented on the visuals contained in the records (cover art and icons) and on vufind's vertical orientation for displaying records (webvoyage has a horizontal orientation). they also liked how the facets were displayed, though two users thought the facets would be better situated on the left side of the results because they scan websites from left to right. the one thing mentioned several times was vufind's lack of the star-rating system that amazon uses to quickly rate an item. participants thought such a system might be better than reviews because it allows users to quickly scan an item's rating without having to read through multiple reviews. when asked to rate the ease of use of vufind, with 1 being easy and 5 being difficult, participants rated it an average of 1.92. faculty rated the ease at 1.6, graduate students at 1.75, and undergraduates at 2.8. undergraduates were more likely to get frustrated at media searching and thought that some of the facets related to media items were confusing, which they used to explain their lower ratings. however, when asked if they would rather use vufind over the current library catalog (webvoyage), all but one participant enthusiastically stated they would use vufind. most users stated that although vufind was not perfect, it was still much better than the other library catalog because of the better layout, visuals, and ability to limit results. the only user who specified they would still rather use the webvoyage catalog believed it had more options for advanced search, such as call number searching, which vufind lacked.
there are, however, several changes that came out of usability testing that could make vufind more useful to our users. some of these are easy to implement on a local level, and others would improve the base build of vufind. a number of issues arose from usability testing, but the largest are the lack of refworks integration, the simplicity of the favorites-list feature, the difficulty of linking to other i-share library holdings, and the difficulties in using the facet categories.

■■ implications

i intend to continue to perform similar usability tests on next-generation catalogs on a trial basis to examine one aspect regarding the future of online catalogs at uiuc. uiuc is looking at various catalog interfaces, of which vufind is one option, to see which best meets the needs of our users. users stated multiple times during testing that they find the current webvoyage interface to be very frustrating and will accept nearly anything that is an improvement, even if the new interface has some usability issues. vufind is not perfect for all searches, as shown by its lack of a call number search and its limitations in searching for multimedia, but it does provide a more intuitive interface for most patrons. the future of vufind at uiuc is still open. development is currently stalled because of a lack of developer updates and internal staffing constraints at both uiuc and carli. however, because vufind is open-source and the only ongoing cost is that of server maintenance, both carli and the library continue to offer it as an option for searching the catalog. both carli and uiuc are closely examining other options for catalog interfaces that would provide patrons with a better search experience, but they have taken no further action either to permanently adopt vufind or to demo other options. despite its limitations, vufind is still a viable option for libraries with substantial technology expertise that are interested in a next-generation catalog interface at a low price. although it does have limitations, it has a better out-of-the-box interface than traditional opacs and should be considered alongside commercial options by any library thinking of adopting a catalog interface overlay. this usability test focused on one institution's installation of vufind, and its findings may or may not apply to other installations and other institutional needs. it would be interesting to study an installation of vufind at a smaller, nonresearch institution, where users have different searching needs and expectations related to a library's opac.
references

1. john houser, "the vufind implementation at villanova university," library hi tech 27, no. 1 (2009): 96–105.
2. vufind, "vufind: about," http://www.vufind.org/about.php (accessed sept. 10, 2009).
3. kathleen bauer, "yale university vufind test—undergraduates," http://www.library.yale.edu/libepub/usability/studies/summary_undergraduate.doc (accessed mar. 20, 2010).
4. christine borgman, "why are online catalogs still hard to use?" journal of the american society for information science 47, no. 7 (1996): 493–503.
5. georgia briscoe, karen selden, and cheryl rae nyberg, "the catalog versus the home page: best practices for connecting to online resources," law library journal 95, no. 2 (2003): 151–74.
6. kristin antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology & libraries 25, no. 3 (2006): 128–39.
7. marshall breeding, "library technology guides: discovery layer interfaces," http://www.librarytechnology.org/discovery.pl?sid=20100322930450439 (accessed mar. 2010).
8. karen m. spicher, "the development of the marc format," cataloging & classification quarterly 21, no. 3/4 (1996): 75–90.
9. jennifer bowen, "metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1," information technology & libraries 27, no. 2 (2008): 6–19.
10. tom steele, "the new cooperative cataloging," library hi tech 27, no. 1 (2009): 68–77.
11. ian rowlands and david nicholas, "understanding information behaviour: how do students and faculty find books?" journal of academic librarianship 34, no. 1 (2008): 3–15.
12. ja mi and cathy weng, "revitalizing the library opac: interface, searching, and display challenges," information technology & libraries 27, no. 1 (2008): 5–22.
13. jakob nielsen, "why you only need to test with 5 users," http://www.useit.com/alertbox/20000319.html (accessed mar. 20, 2010).

appendix. vufind usability study logging sheets

i. the look and feel of vufind

a. basic screen (the vufind main page)
1) is it obvious what to do? yes _____ no _____; what were you trying to do?
2) open the drop-down box and examine the options. do you recognize these options? yes _____ no _____ some _____ (if some, find out what the patron was expecting and get suggestions for improvement.)
comments:

b. click on the advanced search option—take a minute to allow the participants to look around the screen
1) examine each of the advanced search options
a) are the advanced search options clear? yes _____ no _____
b) are the advanced search options helpful? yes _____ no _____
2) examine the limits fields; open the drop-down menu boxes
a) are the limits clearly identified? yes _____ no _____
b) are the pictures helpful? yes _____ no _____
c) are the drop-down menu box options clear? yes _____ no _____
comments:

ii. (back to the) basic search field

a. enter the phrase—climate change (search all fields)—and examine the search results
1) do the records retrieved appear to be relevant to your search statement? yes _____ no _____ don't know _____
2) what information would you like to see in the record? how should it be displayed?
3) examine the right sidebar. are the "facets" clear? yes _____ no _____ some, not all _____
4) if you want to view items from other libraries in your search results, can you find the option? yes _____ no _____
5) can you find an electronic book published in 2008? yes _____ no _____ don't know _____
comments:

b. click on the first book record in the original climate change search results
1) is information about the book clearly represented? yes _____ no _____
2) is it clear where to find the item? yes _____ no _____
3) look at the tags. do you understand what this feature is? yes _____ no _____
comments:

c. look at the brief item information provided on the screen
1) is the information displayed useful in determining the scope and content of the item?
yes _____ no _____
2) are the topics in the record useful for finding additional information on the topic? yes _____ no _____
comments:

d. click on each button below the brief record information
1) is this information useful? yes _____ no _____
2) are the names for the tabs accurate? what should they be named?

e. can you easily determine where the item is located and how to request it? yes _____ no _____
comments:

f. go back to the basic search box and enter the author james joyce (all fields) as a new search
1) is it easy to distinguish items by james joyce from items about james joyce? yes _____ no _____
2) using the facets, can you find only titles with james joyce as author? yes _____ no _____
3) can you find out how to cite an item? yes _____ no _____
comments:

g. now try to find an audio recording by the artist prince using basic search. were you successful? yes _____ no _____

h. find the earliest prince recording ("for you," 1978). is it in the local collection? yes _____ no _____ if not, can you get a copy?
comments:

iii. in the advanced search screen:

a. use the title drop-down to find the item gone with the wind
1) were you successful? yes _____ no _____ not sure _____
2) can you locate a dvd of the same title? yes _____ no _____
3) are copies of the dvd available in the university of illinois library? yes _____ no _____
comments:

b. use the author drop-down in the advanced search to locate titles by jack london. using the facets, find and open the record for the jack london novel white fang. explore each of the description, holdings, and comments tabs:
1) is this information useful? yes _____ no _____
2) would you change the names of the tabs or the information on them?
3) other than your local library copy of white fang, can you find copies at other libraries? yes _____ no _____
comments:

c. using the advanced search, find a dvd on public speaking (hint: use the limit box to select the film/video format). are there instructional videos in the university of illinois library? yes _____ no _____
1) identify the author that's responsible for one of the dvds
2) can you easily find other works by this author? yes _____ no _____
comments:

iv. exploring the account features:

a. click on login in the upper right corner of the page. on the next page, create an account. is it clear how to create an account? yes _____ no _____
b. once you have your account and are logged in to vufind, look at the menu on the right-hand side. is it clear what each of the menu items is? yes _____ no _____
c. while still logged in, do a search for your favorite book and add it to your favorites list. is this tool useful? would you consider using it? yes _____ no _____
comments:

v. comparing vufind to other resources:

a. open three browser windows (this is easiest in firefox by entering ctrl-t for each new window) with 1) your library catalog, 2) vufind, and 3) amazon.com. enter global warming in the basic search window of each website. based on your initial reactions, which service appears the best for most of your uses? library catalog _____ vufind _____ amazon _____
comments:

c. do you have a preference in the display formats? library catalog _____ vufind _____ amazon _____
comments:

debriefing

now that you have used vufind, how would you rate it—on a scale from 1–5, from easy to confusing to use? comments? how does it compare to other library catalogs you've used?
if vufind and your home library catalog were available side-by-side, which would you use first? why?

are you familiar with any of these other products: aquabrowser _____ googlebooks _____ microsoft live search _____ librarything _____ amazon.com _____ other preferred service _____

that's it! thank you for participating in our usability study. you will be receiving one other survey through email; we appreciate your opinions on the vufind product.

modeling a library website redesign process: developing a user-centered website through usability testing

danielle a. becker and lauren yannotta

information technology and libraries | march 2013

abstract

this article presents a model for creating a strong, user-centered web presence by pairing usability testing and the design process. four rounds of usability testing were conducted throughout the process of building a new academic library website. participants were asked to perform tasks using a talk-aloud protocol. tasks were based on guiding principles of web usability that served as a framework for the new site. results from this study show that testing throughout the design process is an effective way to build a website that not only reflects user needs and preferences, but can be easily changed as new resources and technologies emerge.

introduction

in 2008 the hunter college libraries launched a two-year website redesign process driven by iterative usability testing. the goals of the redesign were to:

• update the design to position the library as a technology leader on campus;
• streamline the architecture and navigation;
• simplify the language used to describe resources, tools, and services; and
• develop a mechanism to quickly incorporate new and emerging tools and technologies.

based on the perceived weaknesses of the old site, the libraries' web committee developed guiding principles that provided a framework for the development of the new site. the guiding principles endorsed solid information architecture, clear navigation systems, strong visual appeal, understandable terminology, and user-centered design. this paper will review the literature on iterative usability testing, user-centered design, and think-aloud protocol and the implications moving forward. it will also outline the methods used for this study and discuss the results. the model used, building the design based on the guiding principles and using the testing to uphold those principles, led to the development of a strong, user-centered site that can be easily changed or adapted to accommodate new resources and technologies. we believe this model is unique and can be replicated by other academic libraries undertaking a website redesign process.

danielle a. becker (dbe0003@hunter.cuny.edu) is assistant professor/web librarian, and lauren yannotta (lyannotta@hotmail.com) was assistant professor/instructional design librarian, hunter college libraries, new york, new york.

background

the goals of the research were to (1) determine the effectiveness of the hunter college libraries website, (2) discover how iterative usability testing resulting in a complete redesign impacts how students perceive the usability of a college library website, and (3) reveal student information-seeking habits.
a formal usability test was conducted both on the existing hunter college libraries website (appendix a) and on successive drafts of the redesign (appendix b) with twenty users over an eighteen-month period. the testing occurred before the website redesign began, while the website was under construction, and after the site was launched. the participants were selected through convenience sampling and informed that participation was confidential. the intent of the usability test was to uncover flaws in the navigation and terminology of the current website and, as the redesign process progressed, to incorporate the users' feedback into the new website's design so that it closely matched their wants and needs. the redesign of the website began with a complete inventory of the existing webpages. an analysis of the website identified key information, links, units within the department, and placement of information in the information architecture of the website. we identified six core goals that we felt were the most important for all users of the library's website:

1. users should be able to locate high-level information within three clicks (see the sketch after this list).
2. eliminate library jargon from the navigational system, using concise language.
3. improve the readability of the site.
4. design a visually appealing site.
5. create a site that is easily changeable and expandable.
6. market the libraries' services and resources through the site.
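the three-click goal can be checked mechanically as well as observed in testing. below is a minimal sketch (hypothetical; it assumes a crawler has already reduced the site to an adjacency list of urls) that uses breadth-first search to flag pages deeper than three clicks from the homepage.

from collections import deque

def pages_deeper_than(links, home, max_clicks=3):
    """return pages reachable from `home` only in more than
    `max_clicks` clicks.

    `links` maps each url to the urls it links to; depth counts
    clicks from the homepage, so depth 0 is the homepage itself."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:  # first visit is the shallowest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return sorted(p for p, d in depth.items() if d > max_clicks)

# toy site: home -> about -> staff -> bios -> history (4 clicks deep)
site = {"home": ["about"], "about": ["staff"],
        "staff": ["bios"], "bios": ["history"]}
print(pages_deeper_than(site, "home"))  # -> ['history']

pages the crawl never reaches at all never enter the depth map, so an orphan-page check would be a separate pass.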
literature review

in 2010, oclc compiled a report, "the digital information seeker," which found that 84 percent of users begin their information searches with search engines, while only 1 percent begin on a library website. search engines are preferred because of speed, ease of use, convenience, and availability.1 similar studies, such as those by emde et al. and gross and sheridan, have shown that students are not using library websites to do their research.2 gross and sheridan assert in their article on undergraduate search behavior that "although students are provided with library skills sessions, many of them still struggle with the complex interfaces and myriad of choices the library website provides."3 this research shows the importance of creating streamlined websites that will compete for our students' attention. in building a new website at the hunter college libraries, we thought the best way to do this was through user-centered design. web designers both inside and outside the library have recognized the importance of user-centered design. nielsen advises that website structure should be driven by the tasks the users came to the site to perform.4 he asserts that the amount of graphics on webpages should be minimized, because graphics often affect page download times, and that gratuitous graphics (including text rendered as images) should be eliminated altogether.5 he also contends it is important to ensure that page designs are accessible to all users regardless of platform or newness of technology.6 in their article "how do i find an article? insights from a web usability study," cockrell and jayne cited instances when researchers concluded that library terminology contributed to patrons' difficulties when using library websites, thus highlighting the importance of understandable terminology. hulseberg and monson found in their investigation of student-driven taxonomy for library website design that "by developing our websites based on student-driven taxonomy for library website terminology, features, and organization, we can create sites that allow students to get down to the business of conducting research."7 performing usability testing is one way to confirm user-centered design. in his book don't make me think!, krug insists that usability testing can provide designers with invaluable input that, taken together with experience, professional judgment, and common sense, makes design choices easier.8 ipri, yunkin, and brown, in their article "usability as a method for assessing discovery," emphasize the important role usability testing has in capturing the emotional and aesthetic responses users have to websites, along with expressions of satisfaction with the layout and logic of the site. even the discovery of basic mistakes, such as incorrect or broken links and ineffective wording, can negatively affect discovery of library resources and services.9 in their literature review for their usability testing of an academic library website case study, battleson, booth, and weintrop summarize dumas and redish's discussion of the five facets of formal usability testing: (1) the goal is to improve the usability of the interface, (2) testers should represent real users, (3) testers perform real tasks, (4) user behavior and commentary are observed and recorded, and (5) data are analyzed to recognize problems and suggest solutions. they conclude that when usability testing is "applied to website interfaces, this test method not only results in a more usable site, but also allows the site design team to function more efficiently, since it replaces opinion with user-centered design."10 this allows the designers to evaluate the results and identify problems with the design being tested.11 usability experts nielsen and tahir contend that the earlier and more frequently usability tests are conducted, the more impact the results will have on the final design of the website, because the results can be incorporated throughout the design process. they conclude it is better to conduct frequent, smaller studies with a maximum of five users. they assert, "you will always have discovered so many blunders in the design that it will be better to go back to the drawing board and redesign the interface than to discover the same usability problems several more times with even more users."12
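the five-user recommendation rests on a simple saturation model. in nielsen and landauer's often-cited estimate, each tester independently uncovers a fixed proportion \(\lambda\) of an interface's problems (about 0.31 in their data), so the share of problems found by \(n\) testers is

\[
P(n) = 1 - (1 - \lambda)^n, \qquad P(5) = 1 - (1 - 0.31)^5 \approx 0.84
\]

roughly 85 percent of problems surface by the fifth tester, which is why several small rounds with fixes in between uncover more than one large round of the same total size. the 0.31 figure is an average across their projects, not a constant.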
based on the strength of the literature, we decided to use iterative testing for our usability study. krug points out that testing is an iterative process: designers need to create, test, fix based on test results, and then test again.13 according to the united states department of health and human services report "research-based web design and usability guidelines," conducting before-and-after studies when revising a website will help designers determine whether changes actually made a difference in the usability of the site.14 manzari and trinidad-christensen, in their evaluation of user-centered design for a library website, note that iterative testing is when a product is tested several times during development, allowing users' needs to be incorporated into the design. in their study, their aim was that the final draft of their website would closely match the users' information needs while remaining consistent, easy to learn, and efficient.15 battleson, booth, and weintrop report that there is "a consensus in the literature that usability testing be an iterative process, preferably one built into a web site's initial design."16 they explain that "site developers should test for usability, redesign, and test again—these steps create a cycle for maintaining, evaluating and continually improving a site."17 george used iterative testing in her redesign of the carnegie mellon university libraries website and concluded that it was "necessary to provide user-centered services via the web site."18 cobus, dent, and ondrusek used six students to usability test their pilot study; eight students then participated in the first round of testing, after which librarians modified the prototype and tested fourteen students in the second and final round. after the second round of testing they used the results to analyze the user recordings and deliver the findings and proposed "fixes" to the prototype pages to the web editor.19 mcmullen's redesign of the roger williams university library website was able to "complete the usability-refinement cycle" twice before finalizing the website design.20 but continued refinements were needed, leading to another round of usability tests to identify and correct problem areas.21 bauer-graham, poe, and weatherford did a comparative study of a library website's usability via a survey and then redesigned the website after evaluating the survey's results. after waiting a semester, they distributed another survey to determine the functionality of the current site. the survey had participants view the previous design and the current design in a side-by-side comparison to determine how useful the changes made to the site were.22 in their article "how do i find an article? insights from a web usability study," cockrell and jayne suggest having participants use a web interface to perform specified tasks while a tester observes, noting the choices made and where mistakes occur, and using a "think aloud" protocol. they found that modifying the website through an ongoing, iterative process of testing, refining, and retesting its component parts improves functionality.23 in conducting our usability testing we used a think-aloud protocol to capture the participants' actions. van den haak, de jong, and schellens define think-aloud protocol as a method that asks users to complete a set of tasks and to constantly verbalize their thoughts while working on the tasks. the usefulness of this method of testing lies in the fact that the data collected reflect the actual use of the thing being tested and not the participants' judgments about its usability. instead, the test follows the individual's thoughts during the execution of the tasks.24 nielsen states that think-aloud protocol "may be the single most valuable usability engineering method. . . .
one gets a very direct understanding of what parts of the [interface/user] dialog cause the most problems, because the thinking aloud method shows how users interpret each individual interface item."25 turnbow's article "usability testing for web redesign: a ucla case study" states that using the think-aloud protocol provides crucial real-time feedback on potential problems in the design and organization of a website.26 cobus, dent, and ondrusek used the think-aloud protocol in their usability study: they encouraged participants to talk out loud as they answered the questions, audiotaped their comments, and captured their on-screen navigation using camtasia.27 this information was used to successfully reorganize hunter college library's website.

method

an interactive draft of hunter college libraries' redesigned website was created before the usability study was conducted. in spring 2009, the authors created the protocol for the usability testing. a think-aloud protocol was agreed upon for testing both the old site and the drafts of the new site, including a series of post-test questions that would allow participants to share their demographic information and give subjective feedback on the drafts of the site. draft questions were written, and we conducted mock usability tests on each other. after several drafts we revised our questions and performed pilot tests on an mlis graduate student and two undergraduate student library assistants with little experience with the current website. we ascertained from these pilot tests that we needed to slightly revise the wording of several questions to make them more understandable to all users. we made the revisions and eliminated a question that was redundant. all recruitment materials and finalized questions were submitted to the institutional review board (irb) for review and went through the certification process. after receiving approval we secured a private room to conduct the study. participants were recruited using a variety of methods: signs were posted throughout the library, an e-mail was sent out to several hunter college distribution lists, and a tent sign was erected in the lobby of the library. participants were required to be students or faculty and were offered a $10.00 barnes & noble gift card as an incentive. applicants were accepted on a rolling basis. twenty students participated in the web usability study (appendix c). no faculty responded to our requests for participation, so a decision was made to focus this usability test on students rather than faculty, because students comprise our core user base. another usability test will be conducted in the future that will focus on faculty to determine how their academic tasks differ from undergraduates' when using the library website. the redesigned site is malleable, which makes revisions and future changes in the design a predicted outcome of future usability tests. tests were scheduled for thirty-minute intervals. we conducted four rounds of testing using five participants per round. the two researchers switched questioner and observer roles after each round of testing. each participant was asked to think aloud while they completed the tasks and navigated the website. both researchers took notes during the tests to ensure detailed and accurate data were collected. each participant was asked to review the irb forms detailing their involvement in the study, and they were asked to consent at that time.
their consent was implied if they participated in the study after reading the form. the usability test consisted of fifteen task-oriented questions. the questions were identical when testing the old site and the new draft site. the first round tested only the old site, while the following three rounds tested only the new draft site. we tested both sites because we believed that comparing the two would reveal whether the new site improved performance. the questions (appendix d) were not changed after they were initially finalized and remained the same throughout all four rounds of the usability study. participants were reminded at the onset of the test and throughout the process that the design and usability of the site(s) were being tested, not their searching abilities. the tests were scheduled for an hour each, allowing participants to take the tests without time restrictions and without being timed. as a result, the participants were encouraged to take as much time as they needed to answer the questions, but were also allowed to skip questions if they were unable to locate answers. initially the tests were recorded using camtasia software, which allowed us to record participants' navigation trails through their mouse movements and clicks. but after the first round of testing, we decided that observing and taking notes was adequate documentation, and we stopped using the software. after the participants completed the tests we asked them user-preference questions to get a sense of their habits and their candid opinions of the new draft of the website. these questions were designed to elicit ideas for useful links to include on the website and also to gauge the visual appeal of the site.

results

table 1. percent of tasks answered correctly

task                                                  old site   new site
find a book using online library catalog                80%        86%
find library hours                                     100%       100%
get help from a librarian using questionpoint           40%        93%
find a journal article                                  20%        66%
find reference materials                                 0%         7%
find journals by title                                  40%        66%
find circulation policies                               60%        53%
find books on reserve                                   80%        73%
find magazines by title                                  0%        73%
find the library staff contact information              60%       100%
find contact information for the branch libraries       40%       100%

discussion

hunter college libraries' website was due for a redesign because the site was dated in its appearance and did not allow new content to be added quickly and easily. as a result, a decision was made to build the new site using a content management system (cms) to make it easily expandable and simple to update. this study tested simple tasks to determine how to structure the information architecture and to reinforce the guiding principles of the redesigned website.

task successes and failures

the high percentage of participants succeeding at finding books on the redesigned website using the online library catalog and easily finding library hours reinforced our guiding principles of understandable terminology and clear navigational systems. krug contends that navigation educates the user on the site's contents through its visible hierarchy. the result is a site that guides users through their options and instills confidence in the website and its designers.28 we found this to be true in the way our users easily found the hours and catalog links on the prototype of our library website.
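the percentages in table 1 are simple per-task success rates. a minimal sketch of the aggregation (hypothetical log format; one boolean per participant per task) shows how a round's observations roll up into the table:

def success_rates(outcomes):
    """compute percent-correct per task from logged outcomes.

    `outcomes` maps a task name to a list of booleans, one per
    participant (True = completed the task)."""
    return {task: round(100 * sum(results) / len(results))
            for task, results in outcomes.items()}

# old site: one round of five participants (hypothetical log)
old = {"find library hours": [True] * 5,
       "find a journal article": [True, False, False, False, False]}
print(success_rates(old))
# -> {'find library hours': 100, 'find a journal article': 20}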
the users of the old site knew where to look for this information because they were accustomed to navigating it. given that the prototype was a complete departure from the navigation and design of the old site, it was crucial that the labels and links in the prototype were clear and understandable, or our design would fail. we made "hours" the first link under the "about" heading and "cuny+/books" the first link under the "find" heading, and as a result both our terminology and our structure were a success with participants. on the old website, users rarely used the libraries' online chat client. despite our efforts to remind students of its usefulness, the website did not place the link in a reasonably visible location on the home page. on the old site, only 40 percent of participants located the link, as it was on the bottom left of the screen and easy to overlook. on the new site, by contrast, the "ask a librarian" link was prominently featured at the top of the screen. these results upheld the guiding principles of solid information architecture and understandable terminology. they also supported nielsen's assertion that "site design must be aimed at simplicity above all else, with as few distractions as possible and with a very clear information architecture and matching navigation tools."29 since the launch of the redesigned site, use of the questionpoint chat client has more than doubled. finding a journal article on a topic was always problematic for users of the old library website. the participants we tested were familiar with the site, and 80 percent erroneously clicked on "journal title list" when the more appropriate link would have been "databases" if they did not have an exact journal title in mind. although we taught this in our information literacy courses, it was challenging to get the information across. to address this on the new site, "databases" was changed to "databases/articles" and categorized under the heading "find." the participants using the new site had greater success with the new terminology; 66 percent correctly chose "databases/articles." this question revealed an inconsistency with the guiding principles of understandable terminology and clear navigation systems on the old site. these issues were addressed by adding the word "articles" after "databases" on the new site to clarify what resources could be found in a database, and also by placing the link under the heading "find" to further explain the action a student would be taking by clicking on the "databases/articles" link. finding reference materials was challenging for users of the old site, as none of the participants clicked on the intended link, "subject guides." in an effort to increase usage of the research guides, the library not only purchased the libguides tool but also changed the wording of the link to "topic guides." as we neared the end of our study we observed that only one participant knew to click on the "topic guides" link for research assistance. the participants suggested calling it "research guides" instead of "topic guides," and we changed it. unfortunately, the usability study was completed before we were able to further test the effectiveness of the rewording of this link. anecdotally, the rewording appears to be more understandable to users, as the research guides are getting more usage (based on hit counts) than the previous guides.
the rewording of these guides adhered to both the understandable-terminology and user-centered-design principles. these results supported nielsen's assertion that the most important material should be presented up front, using the inverted pyramid principle: "users should be able to tell in a glance what the page is about and what it can do for them."30 our results also supported the hhs report, which states that terminology "plays a large role in the user's ability to find and understand information. many terms are familiar to designers and content writers, but not to users."31 we concluded that rewriting the link based on student feedback reduces the use of library jargon. although librarians are "subject specialists" and "subject liaisons" and are familiar with those labels and that terminology, our students were looking for the word "research" instead of "subject," so they were not connecting with the library's libguides. as previously discussed, students using the old site thought the link "journal title list" would give them access to the library's database holdings. when asked to find a specific journal title, the correct answer to this question on the old site was "journal title list," and only 40 percent of the participants answered correctly. in another change to terminology in the new site, both links were placed under the heading "find," and, after testing of the first prototype, "journal title list" was changed to "list of journals and magazines." in the following tests 66 percent of the participants were able to answer correctly. the difference in success in finding circulation policies between the old site and the prototype site was slight, only 7 percent. this can be attributed to the fact that participants on the old site could click on multiple links to get to the correct page, and they were familiar enough with the site to know that. in the prototype there were several paths as well, some direct, some indirect. testing the wording of this link supported the understandable-terminology principle more than the old website's "library policies" link did; yet to be true to our user-centered-design principle, we needed to reword it once more. therefore, after the test was completed and the website was launched, we reworded the link to "checkout policies," which uses the same terminology users are familiar with because they check out books at our checkout desk. the remaining tasks, which consisted of locating information such as books on reserve, magazines by title, library staff contact information, and branch information, were all met with higher success rates on the prototype site because in the redesign process the links were reworded to support the understandable-terminology and user-centered-design principles.

participant feedback: qualitative

the usability testing process informed the redesign of our website in many specific ways. if the layout of the site had not tested well with participants, we planned to create another prototype.
in their evaluation of colorado state university libraries' digital collections and the western waters digital library websites, zimmerman and paschal describe first impressions of a website as the determining factor in whether users return: if the impression is positive, they will return and continue to explore.32 when given an opportunity to give feedback on the design of the website, the participants commented:

• "there were no good library links at the bottom before and there wasn't the ask a librarian link either which i like a lot."
• "the old site was too difficult to navigate, new site has a lot of information, i like the different color schemes for the different things."
• "it is contemporary and has everything i need in front of me."
• "cool."
• "helpful."
• "straightforward."
• "the organization is easier for when you want to find things."
• "interactivity and rollovers make it easy to use."
• "intuitive, straight-forward and i like the simplicity of the colors."
• "more professional, more aesthetically pleasing than the old site."
• "the four menu options (about, find, services, help) break the information down easily."

additional research conducted by nathan, yeow, and murugesan claims that attractiveness (referring to the aesthetic appeal of a website) is the most important factor influencing customer decision-making and affects the usability of the website.33 not only that, but users feel better when using a more attractive product. fortunately, the feedback from our participants revealed that the website was visually appealing and the navigation scheme was clear and easy to understand.

other changes made to the libraries' website because of usability testing

participants commented that they expected to find library contact information on the bottom of the homepage, so the bottom of the screen was modified to include this information as well as a "contact us" link. participants did not realize that the "about," "find," "services," and "help" headings were also links, so we modified them to be underlined when hovered over. there were also adjustments to the gray color bars on the top of the page: participants thought they were too bright, so they were darkened to make the labels easier to read. participants also commented that they wanted links to various public libraries in new york city under the "quick links" section of the homepage. we designed buttons for brooklyn public library, queens public library, and the new york public library and reordered this list to move these links closer to the top of the "quick links" section.

conclusion

conducting a usability study of hunter college libraries' existing website and the various stages of the redesigned website prototypes was instrumental in developing a user-centered design. approaching the website redesign in stages, with guidance from iterative user testing and influenced by the participants' comments, gave the web librarian and the web committee an opportunity to incorporate the findings of the usability study into the design of the new website. rather than basing design decisions on assumptions about users' needs and information-seeking behaviors, we were able to incorporate what we'd learned from the library literature and the users' behavior into our evolving designs.
this strategy resulted in a redesigned website that, with continued testing, user feedback, and updating, has aligned with the guiding principles we developed at the onset of the redesign project. the one unexpected outcome of this study is the discovery that, no matter how well a library website is designed, users will still need to be educated in how to use the site, with an emphasis on developing strong information literacy skills.

references

1. "the digital information seeker: report of the findings from selected oclc, rin, and jisc user behaviour projects," oclc research, ed. lynn silipigni-connaway and timothy dickey (2010): 6, www.jisc.ac.uk/publications/reports/2010/digitalinformationseekers.aspx.
2. judith emde, lea currie, frances a. devlin, and kathryn graves, "is 'good enough' ok? undergraduate search behavior in google and in a library database," university of kansas scholarworks (2008), http://hdl.handle.net/1808/3869; julia gross and lutie sheridan, "web scale discovery: the user experience," new library world 112, no. 5/6 (2011): 236, doi: 10.1108/03074801111136275.
3. ibid., 238.
4. jakob nielsen, designing web usability (indianapolis: new riders, 1999), 198.
5. ibid., 134.
6. ibid., 97.
7. barbara j. cockrell and elaine a. jayne, "how do i find an article? insights from a web usability study," journal of academic librarianship 28, no. 3 (2002): 123, doi: 10.1016/s0099-1333(02)00279-3.
8. steve krug, don't make me think! a common sense approach to web usability, 2nd ed. (berkeley, ca: new riders, 2006), 135.
9. tom ipri, michael yunkin, and jeanne brown, "usability as a method for assessing discovery," information technology & libraries 28, no. 4 (2009): 181, doi: 10.6017/ital.v28i4.3229.
10. brenda battleson, austin booth, and jane weintrop, "usability testing of an academic library web site: a case study," journal of academic librarianship 27, no. 3 (2001): 189–98, doi: 10.1016/s0099-1333(01)00180-x.
11. ibid.
12. jakob nielsen and marie tahir, "keep your users in mind," internet world 6, no. 24 (2000): 44.
13. steve krug, don't make me think! a common sense approach to web usability, 135.
14. research-based web design and usability guidelines, ed. ben schneiderman (washington: united states dept. of health and human services, 2006), 190.
15. laura manzari and jeremiah trinidad-christensen, "user-centered design of a web site for library and information science students: heuristic evaluation and usability testing," information technology & libraries 25, no. 3 (2006): 163, doi: 10.6017/ital.v25i3.3348.
16. battleson, booth, and weintrop, "usability testing of an academic library web site," 190.
17. ibid.
18. carole a. george, "usability testing and design of a library website: an iterative approach," oclc systems & services 21, no. 3 (2005): 178, doi: 10.1108/10650750510612371.
19. laura cobus, valeda dent, and anita ondrusek, "how twenty-eight users helped redesign an academic library web site," reference & user services quarterly 44, no. 3 (2005): 234–35.
20. susan mcmullen, "usability testing in a library web site redesign project," reference services review 29, no. 1 (2001): 13, doi: 10.1108/00907320110366732.
21. ibid.
22. john bauer-graham, jodi poe, and kimberly weatherford, "functional by design: a comparative study to determine the usability and functionality of one library's web site," technical services quarterly 21, no. 2 (2003): 34, doi: 10.1300/j124v21n02_03.
23. cockrell and jayne, "how do i find an article?," 123.
24. maaike van den haak, menno de jong, and peter jan schellens, "retrospective vs. concurrent think-aloud protocols: testing the usability of an online library catalogue," behavior & information technology 22, no. 5 (2003): 339.
25. battleson, booth, and weintrop, "usability testing of an academic library web site," 192.
26. dominique turnbow et al., "usability testing for web redesign: a ucla case study," oclc systems & services 21, no. 3 (2005): 231, doi: 10.1108/10650750510612416.
27. cobus, dent, and ondrusek, "how twenty-eight users helped redesign an academic library web site," 234.
28. krug, don't make me think!, 59.
29. nielsen, designing web usability, 164.
30. ibid., 111.
31. schneiderman, research-based web design and usability guidelines, 160.
32. don zimmerman and dawn bastian paschal, "an exploratory evaluation of colorado state universities libraries' digital collections and the western waters digital library web sites," journal of academic librarianship 35, no. 3 (2009): 238, doi: 10.1016/j.acalib.2009.03.011.
33. robert j. nathan, paul h. p. yeow, and sam murugesan, "key usability factors of service-oriented web sites for students: an empirical study," online information review 32, no. 3 (2008): 308, doi: 10.1108/14684520810889646.

appendix a. hunter college libraries' old website

appendix b. hunter college libraries' new website

appendix c. test participant profiles

participant  sex     academic standing  major                      library instruction session?  how often in the library
1            female  senior             history                    yes                           every day
2            female  sophomore          psychology                 no                            every day
3            male    junior             nursing                    no                            1/week
4            female  junior             studio art                 no                            5/week
5            female  senior             accounting                 yes                           2–3/week
6            male    freshman           undeclared                 yes                           1/week
7            female  freshman           undeclared                 no                            every day
8            male    senior             music                      yes                           3–4/week
9            male    freshman           physics/english            no                            every day
10           female  senior             english lit/media studies  no                            1/week
11           female  junior             fine arts/geography        yes                           2–3/week
12           male    sophomore          computer science           yes                           every day
13           male    sophomore          econ/psychology            yes                           6 hours/week
14           female  senior             math/econ                  yes                           2–3/week
15           female  senior             art                        yes                           every day
16           male    n/a*               pre-nursing                no                            daily
17           female  senior**           econ                       didn't remember               3/week
18           male    senior             pre-med                    yes                           2/week
19           female  grad               art history                yes                           3/week
20           male    grad               education (tesol)          no                            every day

note: *this student was at hunter fulfilling prerequisites and already had a bachelor of arts degree from another college. **this student had just graduated.

appendix d. test questions/tasks

• what is the first thing you noticed (or looked at) when you launched the hunter libraries homepage?
• what's the second?
• if your instructor assigned the book to kill a mockingbird, what link would you click on to see if the library owns that book?
• when does the library close on wednesday night?
• if you have a problem researching a paper topic and are at home, where would you go to get help from a librarian?
• where would you click if you needed to find two journal articles on "homelessness in america"?
• you have to write your first sociology paper and want to know what databases, journals, and websites would be good resources for you to begin your research. where would you click?
• does hunter library subscribe to the e-journal journal of communication?
• how long can you check out a book for?
• how would you find items on reserve for professor doyle's liibr100 class?
• does hunter library have the latest issue of rolling stone magazine?
• what is the e-mail for louise sherby, dean of libraries?
• what is the phone number for the social work library?
• you are looking for a guide to grammar and writing on the web. does the library's webpage have a link to such a guide?
• your friend is a hunter student who lives near brooklyn college. she says that she may return books she borrowed from the brooklyn college library to hunter library. is she right? where would you find out?
• this website is easy to navigate (agree, agree somewhat, disagree somewhat, disagree)?
• this website uses too much jargon (agree, agree somewhat, disagree somewhat, disagree)?
• i use the hunter library's website (agree, agree somewhat, disagree somewhat, disagree)?

editorial board thoughts: tools of the trade

sharon farnel

information technology and libraries | march 2012

as i was trying to settle on a possible topic for this, my second "editorial board thoughts" piece, i was struggling to find something that i'd like to talk about and that ital readers would (i hope) find interesting. i had my "eureka!" moment one day as i was coming out of a meeting, thinking about a conversation that had taken place around tools. now, by tools, i'm referring not to hardware, but to those programs and applications that we can and do use to make our work easier. the meeting was of our institutional repository team, and the tools discussion specifically focused on data cleanup and normalization, citation integration, and the like. i had just recently returned from a short conference where i had heard mentioned or seen demonstrated a few neat applications that i thought had potential. a colleague also had just returned from a different conference, excited by some of the things that he'd learned about. and all of the team members had, in recent days, seen various e-mail messages about new tools and applications that might be useful in our environment. we mentioned and discussed briefly some of the tools that we planned to test. one of the tools had already been test-driven by a couple of us and looked promising; another seemed like it might solve several problems, and so was bumped up the testing priority list. during the course of the conversation, it became clear that each of us had a laundry list of tools that we wanted to explore at greater depth. and it also became clear that, as is so often the case, the challenge was finding the time to do so. as we were talking, my head was full of images of an assembly line, widgets sliding by so quickly that you could hardly keep up. i started thinking how you could stand there forever, overwhelmed by the variety and number of things flying by at what seemed like warp speed.
alternatively, if you ever wanted to get anywhere, do anything, or be a part of it all, you just had to roll up your sleeves and grab something. the meeting drew to a close, and we all left with a sense that we needed to find a way of tackling the tools-testing process, of sharing what we learn and what we know, all in the hope of finding a set of tools that we, as a team, could become skilled with. i personally felt a little disappointed at not having managed to get around to all of the tools i'd earmarked for further investigation. but i also felt invigorated at the thought of being able to share the load of testing and researching. if we could coordinate ourselves, we might be able to test-drive even more tools, increasing the likelihood we'd stumble on the few that would be just right! we'd taken furtive steps towards this in the past, but nothing coordinated enough to make it really stick and be effective. i started wondering how other individuals and institutions manage not only to keep up with all of the new and potentially relevant tools that appear at an ever-increasing pace, but more so how they manage to determine which they will become expert at and use going forward. (although i was excited at what we were thinking of doing, i was quite sure that others were likely far ahead of us in this regard!) it made me realize that at some point i—and we—need to stop being bystanders to the assembly line, watching the endless parade of tools pass us by. we need to simply grab on to a tool and take it for a spin. if it works for what we need, we stick with it. if it doesn't, we put it back on the line and grab a different one. but at some point we have to take a chance and give something a shot. we've decided on a few methods we'll try for taking full advantage of the tool-rich environment in which libraries exist today. our metadata team has set up a "test bench," a workstation that we can all use and share for trying new tools. a colleague is going to organize monthly brown-bag talks at which team members can demonstrate tools that they've been working with and that they think have potential uses in our work. and we're also thinking of starting an informal, public blog where we can post, among other things, about new tools we've tried or are trying, what we're finding works and how, and what doesn't and why. we hope these and other initiatives will help us all stay abreast or even slightly ahead of new developments, be flexible in incorporating new tools into our workflows when it makes the most sense, and build skills and expertise that benefit us and that can be shared with others. so, i ask you, our ital readers: how do you manage the assembly line of tools? how do you gather information on them, and when do you decide to take one off and give it a whirl? how do you decide when something is worth keeping, or when something isn't quite the right fit and gets placed back on the line? why not let us know by posting on the italica blog (http://ital-ica.blogspot.com/)? or, even better, why not write about your experience and submit it to ital? we're always on the lookout for interesting and instructional stories on the tools of our trade!

sharon farnel (sharon.farnel@ualberta.ca) is metadata and cataloguing librarian, university of alberta, edmonton, alberta, canada.
president’s message
andrew k. pace
information technology and libraries | december 2008

in my first column, i mentioned that the lita board’s main objective is “to oversee the affairs of the division during the period between meetings.” of course, oversight requires communication. sometimes this is among board members, or it’s an e-mail update, or a post to the lita-l discussion list, or even the articles in this journal. regardless, i see the cornerstone of “between-meeting oversight” as keeping the membership fully (or even partially) engaged from january through june and july through december.

as a mea culpa for the board, but without placing the blame on any one individual, i am willing to concede that the board has not done an adequate job of engaging the membership between american library association (ala) meetings. while ala itself is addressing this problem with recommendations for virtual participation and online collaboration, lita should be at the forefront of setting the benchmark for virtual communication, participation, education, planning, and membership development.

in an attempt to posit some solutions, as opposed to finding someone to blame, i first thought of the lita committees. which one should be responsible for communicating lita opportunities and events to the membership using twenty-first-century technology? education? membership? web coordinating? program planning? publications? in the end, i was left with the choice of two evils: merge all the committees into one so that they can do everything or create a new committee to deal with the perceived problem. knowing that neither of those solutions will suffice, i’d like to put the onus back on the membership. maybe i’m trying to be a 2.0 librarian—crowdsourcing the problem, that is, taking the task that might have been done by an individual or committee and asking for more of a community-driven solution.

in the past, lita focused on the necessary technologies for crowdsourcing—discussion lists, blogs, and wikis—as if the technology alone could solve the problem. the bigwig task force and web coordinating committee have shouldered the burden of both implementing the technology and gaining philosophical consensus on its use—a daunting task that can easily appear chaotic. now that the technology is commoditized (and generally embraced by ala at large and other divisions as well), perhaps it is time to embrace the philosophy of crowdsourcing.

maybe it’s just because i have had cloud computing and web-scale architectures on the brain too much lately (having decided that it is impossible to serve two masters—job and volunteer work—i shall forever endeavor to find the overlap between the two), but i sincerely believe that repeating the mantra that lita’s strength is its membership is not mere rhetorical lip service. ebay is better for sellers because there are so many buyers; it is better for buyers because there are so many sellers. google docs works for sharing documents better than a corporate wiki or microsoft sharepoint because it breaks down the barriers of domains, allowing the participants to determine who shares responsibility for producing something. barcamps are rising in popularity not only because of a content focus on open data, open source, and open access, but because of the participatory, user-generated style of the meetings. as a division of ala, lita has two challenges: leading the efforts of educating the membership, other divisions, and ala about impending sea changes in information technology, and embracing these technologies itself.
we must eat our own dog food, as the saying goes. perhaps it is more fitting to suggest that lita must not only focus on getting technology to work, but on putting technology to work.

in the next few months, the lita board will be tackling lita’s strategic plan, which expires in 2008. that means it is time not only to review the strategy—to educate, to serve, to reach out—but also to assess the tactics employed to fulfill that strategy. you are probably reading this column in or after the month in which the strategic plan ends, which does not mean that we will be coasting into the ala midwinter meeting. on the contrary, i sincerely hope to gather enough information from committees, task forces, members, and nonmembers for the lita leadership to come up with something strategically meaningful going into the next decade.

one year isn’t nearly long enough to see something this big through to completion. just as national politicians begin reelection campaigns as soon as they are elected, i suspect that ala divisional presidents begin thinking about their legacy within the first couple months of office, if not before. but i hope, at least, to establish some groundwork, including a platform strategy that will allow the membership to maintain a connection with the board and with other members—to crowdsource solutions on a scale that has not been attempted in the past and that will solidify our future. and when we have a plan, you can trust that we will use all the available methods at our disposal to promote it and solicit your feedback.

andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio.

public libraries and internet access across the united states: a comparison by state 2004–2006
paul t. jaeger, john carlo bertot, charles r. mcclure, and miranda rodriguez
information technology and libraries | june 2007

paul t. jaeger (pjaeger@umd.edu) is an assistant professor at the college of information studies at the university of maryland; john carlo bertot (bertot@ci.fsu.edu) is professor and associate director of the information use management and policy institute, college of information, florida state university; charles r. mcclure (cmcclure@ci.fsu.edu) is francis eppes professor and director of the information use management and policy institute, college of information, florida state university; and miranda rodriguez (mrodrig08@umd.edu) is a graduate student in the college of information studies at the university of maryland.

drawing upon findings from a national survey of u.s. public libraries, this paper examines trends in internet and public computing access in public libraries across states from 2004 to 2006. based on library-supplied information about levels and types of internet and public computing access, the authors offer insights into the network-based content and services that public libraries provide. examining data from 2004 to 2006 reveals trends and accomplishments in certain states and geographic regions. this paper details and discusses the data, identifies and analyzes issues related to internet access, and suggests areas for future research.

this article presents findings from the 2004 and 2006 public libraries and the internet studies detailing the different levels of internet access available in public libraries in different states.1 at this point, 98.9 percent of public library branches are connected to the internet and 98.4 percent of connected public library branches offer public internet access.2 however, the types of access and the quality of access available are not uniformly distributed among libraries or among the libraries in various states. while the data at the national level paint a portrait of the internet and public computing access provided by public libraries overall, studies of these differences among the states can help reveal successes and lessons that may help libraries in other states to increase their levels of access. the need to continue to increase the levels and quality of internet and public computing access in public libraries is not an abstract problem.
the services and content available on the internet continue to require greater bandwidth and computing capacity, so public libraries must address ever-increasing technological demands on the internet and computing access that they provide.3 public libraries are also facing increased external pressure on their internet and computing access. as patrons have come to rely on the availability of internet and computing access in public libraries, so too have government agencies. many federal, state, and local government agencies now rely on public libraries to facilitate citizens’ access to e-government services, such as applying for the federal prescription drug plans, filing taxes, and many other interactions with the government.4 further, public libraries also face increased demands to supply public access computing in times of natural disasters, such as the major hurricanes of 2004 and 2005.5 as a result, both patrons and government agencies depend on the internet and computing access provided by public libraries, and each group has different, but interrelated, expectations of what kinds of access public libraries should provide. however, the data indicate that public libraries are at capacity in meeting some of these expectations, while some libraries lack the funding, technology-support capacity, space, and infrastructure (e.g., power, cabling) to reach the expectations of each respective group.

as public libraries (and the internet and public computing access they provide) continue to fill more social roles and expectations, a range of new ideas and strategies can be considered by public libraries to identify successful methods for providing access that is high quality and sufficient to meet the needs of patrons and community. the goals of the public libraries and the internet studies have been to help provide an understanding of the issues and needs of libraries associated with providing internet-based services and resources.

the 2006 public libraries and the internet study employed a web-based survey approach to gather both quantitative and qualitative data from a sample of the 16,457 public library outlets in the united states.6 a sample was drawn to accurately represent metropolitan status (roughly equating to a designation of urban, suburban, or rural libraries), poverty levels (as derived through census data), state libraries, and the national picture, producing a sample of 6,979 public library outlets.7 the survey received a total of 4,818 responses for a response rate of 69 percent. the data in this article, unless otherwise noted, are drawn from the 2004 and 2006 public libraries and the internet studies.8

while the survey received responses from libraries in all fifty states, there were not enough responses in all states from which to present state-level findings. the study was able to provide state-level analysis for thirty-five states (including washington, d.c.) in 2004 and forty-four states at the outlet level (including washington, d.c.) and forty-two states at the system level (including washington, d.c.) in 2006. in addition, there was some variance in states with adequate responses between the 2004 and 2006 studies. a full listing of the states is available in the final reports of the 2004 and 2006 studies at http://www.ii.fsu.edu/plinternet_reports.cfm. thus, the findings below reflect
only those states for which both the 2004 and 2006 studies were able to provide analysis.

■ public libraries and the internet across the states

overview of 2004 to 2006

as the public library and the internet studies have been ongoing since 1994, the questions asked in the biennial studies have evolved along with the provision of internet access in libraries. the questions have varied between surveys, but there have been consistent questions that allow for longitudinal analysis at the national level. the 2004 study introduced the analysis of the data at both the national and the state levels. with both the 2004 and 2006 studies providing data at the state level, some longitudinal analysis at the state level is now possible.

overall, there were a number of areas of consistent data across the states from 2004 to 2006. most states had fairly similar, if not identical, percentages of library outlets offering public internet access between 2004 and 2006. for the most part, changes were increases in the percentage of library outlets offering patron access. further, the average number of hours open per week in 2004 (44.5) and in 2006 (44.8) were very similar, as were the percentages of library outlets reporting increases in hours per week, decreases in hours per week, and no changes in hours per week. while these numbers are consistent, it is not known whether this average number of hours open, or the distribution of the hours open across the week, is sufficient to meet patron needs in most communities. data across the states also indicated that physical space is the primary reason for the inability of libraries to add more workstations within the library building. there was also consistency in the findings related to upgrades and replacement schedules.

changes and continuities from 2004 to 2006

while the items noted above show some areas of stability in the internet access provided by public libraries across the states, insights are possible in the areas of change for libraries overall or in the libraries that are leading in particular areas. table 1 details the states with the highest average number of hours open per public library outlet in 2004 and 2006.

table 1. highest average number of hours open in public library outlets by state in 2004 and 2006

      2004                          2006
  1.  new jersey        54.8    1.  ohio              55.7
  2.  ohio              54.6    2.  new jersey        55.6
  3.  florida           52.4    3.  florida           52.3
  4.  virginia          51.3    4.  virginia          52.3
  5.  south carolina    49.0    5.  indiana           51.9
  6.  utah              48.0    6.  pennsylvania      50.6
  7.  new mexico        47.4    7.  washington, d.c.  50.6
  8.  rhode island      47.3    8.  maryland          50.0
  9.  alabama           46.9    9.  connecticut       49.8
 10.  new york          46.2   10.  illinois          49.5
      national          44.5        national          44.8

between 2004 and 2006, the national average for the number of hours open increased slightly from 44.5 hours per week to 44.8 hours per week. this increase is reflected in the numbers for the individual states in 2006, which are generally slightly higher than the numbers for the individual states in 2004. for example, the top state in 2006 averaged 55.7 hours per outlet each week, while the top state in 2004 averaged 54.8 hours. the top four states—ohio, new jersey, florida, and virginia—were the same in both years, though with the top two switching positions.
this demonstrates a continuing commitment in these four states by state and local government to ensure wide access to public libraries. these states are also ones with large populations and state budgets, presumably fueling the commitment and facilitating the ability to keep libraries open for many hours each week. while the needs of patrons in other states are no less significant, the data indicate that states with larger populations and higher budgets, not surprisingly, may be best positioned to provide the highest levels of access to public libraries for state residents.

the other six states in the 2006 top ten were not in the 2004 top ten. the primary reason for this is that the six states in 2006 increased their hours more than other states. note that the fifth-ranked state in 2004, south carolina, averaged 49 hours per outlet each week, which is less than the tenth-ranked state in 2006, illinois, at 49.5 hours. simply by maintaining the average number of hours open per outlet between 2004 and 2006, south carolina fell from fifth to out of the top ten. these differences are reflected in the fact that there is nearly a ten-hour difference from first place to tenth place in 2004, yet only a six-hour discrepancy exists from first place to tenth in 2006. these numbers suggest that hours of operation may change frequently for many libraries, indicating the need for future evaluations of operational hours in relation to meeting patron demand.

table 2 displays the states with the highest average number of public access workstations per public library in 2004 and 2006.

table 2. highest average number of public access workstations in public library outlets by state in 2004 and 2006

      2004                          2006
  1.  florida           22.6    1.  florida           21.7
  2.  kentucky          18.8    2.  indiana           17.5
  3.  new jersey        15.5    3.  nevada            15.7
  4.  georgia           14.0    4.  michigan          14.8
  5.  utah              13.0    5.  maryland          14.6
  6.  rhode island      12.6    6.  georgia           14.4
  7.  indiana           12.3    7.  arizona           14.1
  8.  texas             11.9    8.  california        14.0
  9.  california        11.8    9.  new jersey        13.8
 10.  south carolina    11.7   10.  virginia          13.0
      new york          11.7
      national          10.4        national          10.7

the national averages between 2004 and 2006 also showed a slight increase from 10.4 workstations in 2004 to 10.7 workstations in 2006. a key reason for this slow growth in the number of workstations appears to have a great deal to do with limitations of physical space in libraries; in spite of increasing demands, space constraints often limit computer capacity.9 unlike table 1, the comparisons between 2004 and 2006 in table 2 do not show across-the-board increases from 2004 to 2006. in fact, florida had the highest average of workstations per library outlet in both 2004 and 2006, but the average number decreased from 22.6 in 2004 to 21.7 in 2006. it is interesting to note that florida has a significantly higher number of workstations than the next highest state in both 2004 and 2006. in contrast, many of the states in the lower half of the top ten in 2004 had substantially lower average numbers of workstations in 2004 than in 2006. in 2004 there were an average of seven more computers in spot two than spot ten; in 2006, there were only an average of four more computers from spot two to ten. the large increases in the number of workstations in some states, like nevada, michigan, and maryland, indicate sizeable changes in budget, numbers of outlets, and/or population size. also of note is the significant drop of the average number of workstations in kentucky, declining from 18.8 in 2004 to fewer than 13 in 2006.
a possible explanation is that, since kentucky libraries have been leaders in adopting wireless technologies (see table 3), the demand for workstations has decreased as libraries have added wireless access. five states appear in the top ten of both years—florida, indiana, georgia, california, and new jersey. the average number of workstations in indiana, california, and georgia increased from 2004 to 2006, while the average number of workstations in florida and new jersey decreased between 2004 and 2006. some of the decreases in workstations can be accounted for by increases in the availability of wireless access in public libraries, as libraries with wireless access may feel less need to add more networked computers, relying on patrons to bring their own laptops. such a strategy, of course, will not increase access for patrons who cannot afford laptops. some libraries have sought to address this issue by having laptops available for loan within the library building.

the states listed in table 3 had the highest average levels of wireless connectivity in public library outlets in 2004 and 2006.

table 3. highest levels of public access wireless internet connectivity in public library outlets by state in 2004 and 2006

      2004                          2006
  1.  kentucky         47%      1.  virginia         63.8%
  2.  new mexico       38.6%    2.  connecticut      56.6%
  3.  new hampshire    31.6%    3.  indiana          56.6%
  4.  virginia         30.8%    4.  rhode island     53.9%
  5.  texas            26.4%    5.  kentucky         52.0%
  6.  kansas           25.8%    6.  new jersey       50.9%
  7.  new jersey       22.8%    7.  maryland         49.8%
  8.  rhode island     22.5%    8.  illinois         48.3%
  9.  florida          21.9%    9.  california       47.8%
 10.  new york         19.6%   10.  massachusetts    47.8%
      national         17.9%        national         37.4%

the differences between the numbers in 2004 and 2006 reveal the dramatic increases in the availability of wireless internet access in public libraries. the national average in 2004 was 17.9 percent, but in 2006, the national average had more than doubled to 37.4 percent of public libraries offering wireless internet access. this sizeable increase is reflected in the changes in the states with the highest levels of wireless access. every position in the ratings in table 3 shows a dramatic jump from 2004 to 2006. the top position increased from 47 percent to 63.8 percent. the tenth position increased from 19.6 percent to 47.8 percent, an increase of nearly two-and-a-half times. these increases show how much more prominent wireless internet access has become in the services that public libraries offer to their communities and to their patrons.

four states appear on both the 2004 and 2006 lists—virginia, kentucky, rhode island, and new jersey. these four states all showed increases, but the rises in some
other states were significant enough to reduce kentucky from the top-ranked state in 2004 to the fifth ranked, in spite of the fact that the number of public libraries in kentucky offering wireless access increased from 47 percent to 52 percent. in both years, a majority of the states in the top ten were located along the east coast. further, high levels of wireless access may be linked in some states to areas of high population density or the strong presence of technology-related sectors in the state, as in california and virginia. smaller states with areas of dense populations, such as connecticut, rhode island, and maryland, are also among the leaders in wireless access.

tables 4 and 5 provide contrasting pictures regarding the number of public access internet workstations in public libraries by state in 2004 and 2006. table 4 shows the states with the highest percentages of libraries that consistently have fewer workstations than are needed by patrons, while table 5 shows the states with the highest percentages of libraries that consistently have sufficient workstations to meet patron needs. of note is the fact that, unlike the preceding three tables, there appears to be no significant geographical clustering of states in tables 4 and 5.

nationally, the percentage of libraries that consistently have insufficient workstations to meet patron needs declined from 15.7 percent in 2004 to 13.7 percent in 2006, a change that is within the margin of error (+/- 3.4 percent) of the question on the 2006 survey. due to the size of the change, it is not known if the national decline was a real improvement or simply a reflection of the margin of error. washington, d.c., oregon, new mexico, idaho, and california appear on the lists for both 2004 and 2006 in table 4. washington, d.c. had the highest percentage of libraries reporting insufficient workstations in both years, though there was a significant decrease from 100 percent of libraries in 2004 to 69 percent of libraries in 2006. in this case, the significant drop represents major strides forward to providing sufficient access to patrons in washington, d.c. similarly, though california features on both lists, the percentages dropped from 44.9 percent in 2004 to 22.2 percent in 2006, a decline of more than half. states like these are obviously making efforts to address the need for increased workstations. overall, eight out of ten positions in table 4 remained constant or saw a declining percentage from 2004 to 2006, indicating a national decrease in libraries with insufficient workstations. in sharp contrast, fewer than 20 percent of nevada libraries in 2004 reported insufficient workstations, placing well out of the top ten. however, in 2006 nevada ranked second, with 51.5 percent of public libraries reporting insufficient workstations to meet patron demand. with nevada’s rapidly growing population, it appears that the demand for internet access in public libraries may not be keeping pace with the population growth.
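to make the margin-of-error reasoning concrete, here is a minimal sketch of the textbook formula for a sample proportion at 95 percent confidence. it is an illustration only: the 13.7 percent figure and the 4,818 responses come from the study as reported above, but this naive simple-random-sampling calculation will not reproduce the published +/- 3.4 percent, which reflects the study’s stratified sample design and state-level subsamples.

    import math

    def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
        # half-width of a 95 percent confidence interval for a proportion,
        # assuming a simple random sample of size n
        return z * math.sqrt(p * (1 - p) / n)

    # 2006 national figure: 13.7 percent of responding libraries consistently
    # have insufficient workstations; 4,818 responses per the methodology above
    moe = margin_of_error(0.137, 4818)
    print(f"naive srs margin of error: +/- {moe:.1%}")  # roughly +/- 1.0%
    # the published +/- 3.4 percent is wider; stratification and state-level
    # estimates reduce the effective sample size behind each figure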
the percentage of public libraries reporting sufficient workstations to consistently meet patron demands increased slightly at the national level from 14.1 percent in 2004 to 14.6 percent in 2006, again well within the margin of error (+/- 3.5 percent) of the 2006 question. however, in table 5, the top ten positions in 2006 all feature lower percentages than the same positions in 2004. in 2004 the top-ranked state had 53.2 percent of libraries able to consistently meet patron needs for internet access, but the top-ranked state in 2006 had only 31 percent of libraries able to consistently meet patron access needs.

table 4. public library outlet public access workstation availability by state in 2004 and 2006—consistently have fewer workstations than are needed

      2004                            2006
  1.  washington, d.c.  100%      1.  washington, d.c.  69.9%
  2.  california        44.9%     2.  nevada            51.5%
  3.  florida           36%       3.  oregon            34.8%
  4.  new mexico        30.7%     4.  new mexico        31.9%
  5.  oregon            30.4%     5.  tennessee         30.4%
  6.  utah              29.2%     6.  alaska            27.8%
  7.  south carolina    28.4%     7.  idaho             26%
  8.  kentucky          24.1%     8.  california        22.2%
  9.  alabama           21.5%     9.  new york          21.4%
 10.  idaho             21.1%    10.  rhode island      19%
      national          15.7%         national          13.7%

table 5. public library outlet public access workstation availability by state in 2004 and 2006—always have a sufficient number of workstations to meet demand

      2004                            2006
  1.  wyoming           53.2%     1.  louisiana         31%
  2.  alaska            34.9%     2.  new hampshire     30.4%
  3.  kansas            32.2%     3.  north carolina    28.4%
  4.  rhode island      31.4%     4.  arkansas          26.2%
  5.  new hampshire     29.7%     5.  wyoming           25.2%
  6.  south dakota      25.2%     6.  mississippi       24.4%
  7.  georgia           25%       7.  missouri          23.6%
  8.  arkansas          24.8%     8.  vermont           22.2%
  9.  vermont           32.7%     9.  nevada            20.9%
 10.  virginia          22.4%    10.  pennsylvania      17.9%
                                      west virginia     17.9%
      national          14.1%         national          14.6%

four states—new hampshire, arkansas, wyoming, and vermont—appear on both the 2004 and 2006 lists. the national increase in the sufficiency of the number of workstations to meet patron access needs and decreases in all of the top-ranked states between 2004 and 2006 seems incongruous. this situation results, however, from a decrease in range of differences among the states from 2004 to 2006, so that the range is compressed and the percentages are more similar among the states. further, in some states, the addition of wireless access may have served to increase the overall sufficiency of the access in libraries, possibly leveling the differences among states. nevertheless, the national average of only 14.6 percent of public libraries consistently having sufficient numbers of workstations to meet patron access needs is clearly a major problem that public libraries must work to address. comparing the 2006 data of tables 4 and 5 demonstrates that patron demands for internet access are being met neither evenly nor consistently across the states.

nationally, the percentage of public library systems with increases in the information technology budgets from the previous year dropped dramatically from 36.1 percent in 2004 to 18.6 percent in 2006. as can be seen in table 6, various national, state, and local budget crunches have significantly reduced the percentages of public library systems with increases in information technology budgets. when inflation is taken into account, a stationary information technology budget represents a net decrease in funds available in real dollar terms, so the only public libraries that are not actually having reductions in their information technology budgets are those with increases in such budgets.
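the real-dollar point can be illustrated with a few lines of arithmetic; the budget figure and the inflation rate below are assumptions chosen for illustration, not study data.

    def real_change(nominal_before: float, nominal_after: float, inflation: float) -> float:
        # percentage change between two budget years after deflating the later
        # year's nominal dollars by cumulative inflation over the period
        real_after = nominal_after / (1 + inflation)
        return (real_after - nominal_before) / nominal_before

    budget = 50_000.0                   # hypothetical flat it budget, 2004 and 2006
    inflation_2004_2006 = 1.03**2 - 1   # assumed 3 percent per year for two years
    print(f"{real_change(budget, budget, inflation_2004_2006):.1%}")  # about -5.7%
    # a budget that is merely flat buys roughly 5.7 percent less after two years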
since internet access and the accompanying hardware necessary to provide it are clearly a key aspect of information technology budgets, decreases in these budgets will have tangible impacts on the ability of public libraries to provide sufficient internet access. virtually every position on table 6 has a decrease of 20 percent to 30 percent from 2004 to 2006, with the largest decrease being from 84.2 percent in 2004 to 48.3 percent in 2006 in the second position. five states—delaware, kentucky, florida, rhode island, and south carolina—are listed for both 2004 and 2006, though every one of these states registered a decrease from 2004 to 2006. no drop was more dramatic than south carolina’s from 84.2 percent in 2004 to 31 percent in 2006. overall, though, the declining information technology budgets and continuing increases in demands for information technology access among patrons creates a very difficult situation for libraries.

table 6. highest levels of public library system overall internet information technology budget increases by state in 2004 and 2006

      2004                            2006
  1.  florida           87.5%     1.  delaware          60%
  2.  south carolina    84.2%     2.  kentucky          48.3%
  3.  rhode island      67.5%     3.  maryland          47.6%
  4.  delaware          64.9%     4.  wyoming           45.7%
  5.  new jersey        61.5%     5.  louisiana         40%
  6.  north carolina    55.5%     6.  florida           38%
  7.  virginia          53.6%     7.  rhode island      33.3%
  8.  kentucky          53.2%     8.  south carolina    31%
  9.  new mexico        49.3%     9.  arkansas          27.5%
 10.  kansas            49%      10.  california        27.3%
      national          36.1%         national          18.6%

public libraries and the internet in 2006

along with questions that were asked on both the 2004 and 2006 public libraries and the internet studies, the survey included new questions on the 2006 study to account for social changes, alterations of the policy environment, and the maturation of internet access in public libraries. several findings from the new questions on the 2006 study were noteworthy among the state data.

the states listed in table 7 had the highest percentage of public library systems with increases in total operating budget over the previous year in 2006.

table 7. highest levels of public library system total operating budget increases by state in 2006

  1.  maryland          85.7%
  2.  delaware          80%
  3.  rhode island      76.4%
  4.  idaho             74.5%
  5.  kentucky          73.6%
  6.  connecticut       68.6%
  7.  virginia          62.8%
  8.  new hampshire     62.5%
  9.  north carolina    61.6%
 10.  wyoming           60.9%
      national          45.1%

nationally, 45.1 percent of public library systems had some increase in their overall budget, which includes funding for staff, physical structures, collection development, and many other costs, along with technology. at the state level, three northeastern states clearly led the way, with more than 75 percent of library systems in maryland, delaware, and rhode island benefiting from an increase in the overall operating budget. also of note is the fact that two fairly rural and sparsely populated western states—idaho and wyoming—were among the top ten.

five of the states in the top ten in highest percentages of increases in operating budget in 2006 were also among the top ten in highest percentages of increases in information technology budgets in 2006. comparing table 7 with table 6 reveals that delaware, kentucky, maryland, rhode island, and wyoming are on both lists. in these states, increases in information technology budgets seem to have accompanied larger increases in the overall 2006 budget.
an interesting point to ponder in comparing table 6 with table 7 is the large discrepancy between average increases in information technology budgets (18.6 percent) and overall budgets (45.1 percent) at the national level. as internet access is becoming more vital to public libraries in the content and services they provide to patrons, it seems surprising that a much smaller portion of library systems would receive an increase in information technology budgets than in overall budgets.

one growing issue with the provision of internet access in public libraries is the provision of access at sufficient connection speeds. more and more internet content and services are complex and require large amounts of bandwidth, particularly content involving audio and video components. fortunately, as demonstrated in table 8, 53.5 percent of libraries nationally indicate that their connection speed is sufficient at all times to meet patron needs. in contrast, only 16.1 percent of public libraries nationally indicate that their connection speed is insufficient to meet patron needs at all times.

table 8. highest percentages of public library outlets where public access internet service connection speed is sufficient at all times or insufficient by state in 2006

      sufficient to meet patron          insufficient to meet
      needs at all times                 patron needs
  1.  georgia           80.5%        1.  virginia          35%
  2.  new hampshire     70.6%        2.  north carolina    28.1%
  3.  iowa              64.2%        3.  alaska            27.3%
  4.  illinois          64%          4.  delaware          26.9%
  5.  ohio              63.9%        5.  mississippi       26.6%
  6.  indiana           63.6%        6.  missouri          24.3%
  7.  vermont           63.5%        7.  rhode island      23.1%
  8.  oklahoma          62.8%        8.  oregon            22.4%
  9.  louisiana         61.7%        9.  connecticut       21.5%
 10.  wisconsin         61.5%       10.  arkansas          21.2%
      national          53.5%            national          16.1%

georgia has the highest percentage of libraries that always have sufficient connection speed at 80.5 percent. in the case of georgia, the statewide library network is most likely a key part of ensuring the majority of libraries have sufficient access speed. many of the other states that have the highest percentages of public libraries with sufficient connection speeds are located in the middle part of the country. the state with the highest percentage of libraries with insufficient connection speed to meet patron demands is virginia, with 35 percent of libraries. curiously, virginia consistently ranks in the top ten of tables 1–3. though virginia libraries have some of the longest hours open, some of the highest numbers of workstations, and some of the highest levels of wireless access, they still have the highest percentage of libraries with insufficient connection speed. only five states had more than 25 percent of libraries with connection speeds insufficient to meet the needs of patrons at all times. this issue is significant now in these states, as these libraries lack the necessary connection speeds. however, it will continue to escalate as an issue as content and services on the internet continue to evolve and become more complex, thus requiring greater connection speeds.

comparing table 8 with table 4 (consistently have fewer workstations than are needed) and table 5 (always have a sufficient number of workstations to meet demand) reveals some parallels. alabama and rhode island are among the top ten states both for connection speed being consistently insufficient to meet patron needs (table 8) and consistently having fewer workstations than are needed (table 4). conversely, vermont and louisiana are among the top ten states both for connection speed being sufficient to meet patron needs at all times (table 8) and always having a sufficient number of workstations to meet demand (table 5).

table 9 displays the two leading types of internet connection providers for public libraries and the states with the highest percentages of libraries using each. nationally, 46.4 percent of public libraries rely on an internet service provider (isp) for internet access. in the states listed in table 9, three-quarters or more of libraries use an isp, with more than 90 percent of libraries in kentucky and iowa using an isp.
the next most common means of connection for public libraries is through a library cooperative or library network, with 26.2 percent of libraries nationally using these means. in such cases, member libraries rely on their established network to serve as the connector to the internet. the library network approach seems to be most effective in geographically small states, the top three on the list being three of the smallest of the states—rhode island, delaware, and west virginia—with more than 75 percent of libraries in each of these states connecting through a network. nationally, the remaining approximately 25 percent of libraries connect through a network managed by a nonlibrary entity or by other means.

table 9. highest levels of types of internet connection provider for public library outlets by state in 2006

      internet service provider          library cooperative or network
  1.  kentucky          93.5%        1.  rhode island      84.7%
  2.  iowa              90.9%        2.  delaware          79.5%
  3.  new hampshire     83.8%        3.  west virginia     77.9%
  4.  vermont           81.1%        4.  wisconsin         71.2%
  5.  oklahoma          80.6%        5.  massachusetts     54.7%
      wyoming           80.6%        6.  minnesota         52.5%
  7.  idaho             80.2%        7.  ohio              48.9%
  8.  montana           78.9%        8.  georgia           45.1%
  9.  tennessee         78.4%        9.  mississippi       41.2%
 10.  alabama           74.6%       10.  connecticut       38.5%
      national          46.4%            national          26.2%

the highest percentages of public library systems receiving each kind of e-rate discount are presented in table 10. e-rate discounts are an important source of technology funding for many public libraries across the country, with more than $250,000,000 in e-rate discounts distributed to libraries between 2000 and 2003.10 nationally in 2006, 22.4 percent of public library systems received discounts for internet connectivity, 39.6 percent for telecommunications services, and 4.4 percent for internal connection costs. mississippi and louisiana appear in the top five for each of the three types of discounts. minnesota and west virginia are each in the top five for two of the three lists. many of the states benefiting the most from e-rate funding in 2006 have large rural populations spread out over a geographically dispersed area, indicating the continuing importance of e-rate discounts in bringing internet connections to rural public libraries. maryland and west virginia are both included in the telecommunications service column of table 10 due to proportionally large areas of these smaller states that are rural. the importance of the telecommunications discounts in certain states is evident from the fact that more than 75 percent of public library systems in all five states listed received such discounts. in comparison, only one state has more than 75 percent of library systems receiving discounts for internet connectivity, while no state has 30 percent of library systems receiving discounts for internal connection costs, with the latter reflecting the manner in which e-rate funding is calculated.

table 10. highest percentages of public library systems receiving e-rate discounts by category and state in 2006

      internet connectivity        telecommunications services     internal connection costs
  1.  louisiana      89.2%     1.  mississippi      92.6%      1.  mississippi      29.6%
  2.  indiana        70.8%     2.  south carolina   89.4%      2.  minnesota        22.6%
  3.  mississippi    63%       3.  louisiana        79.5%      3.  arizona          19.3%
  4.  minnesota      50.5%     4.  west virginia    79.1%      4.  west virginia    14.2%
  5.  tennessee      44.7%     5.  maryland         76.2%      5.  louisiana        12.3%
      national       22.4%         national         39.6%          national         4.4%

in spite of the penetration of the internet into virtually every public library in the united states and the general expectations that internet access will be publicly available in every library, not all public libraries offer information technology training for patrons. nationally, 21.4 percent of public library outlets do not offer technology training. table 11 lists the states with the highest percentages of public library outlets not offering information technology training.
table 11. highest levels of public library systems not offering patron information technology training services by state in 2006

  1.  louisiana         48.7%
  2.  mississippi       40.7%
  3.  arkansas          39.6%
  4.  alaska            36%
  5.  arizona           34.8%
  6.  georgia           34.5%
  7.  new hampshire     32.8%
  8.  south carolina    31.1%
  9.  tennessee         30%
 10.  idaho             29%
      national          21.4%

six of the ten states listed are located in the southeastern part of the country. the lack of resources or an adequate number of staff to provide training is a leading concern in these states. not offering patron training may be strongly linked to lacking economic resources to do so. for example, the two states with the highest percentage of public libraries not offering patron training—mississippi and louisiana—are also the two states in the top five recipients of each kind of e-rate funding listed in table 10. if the libraries in states like these are economically struggling just to provide internet access, it seems likely that providing accompanying training might be difficult as well. a further difficulty is that there is little public or private funding available specifically for training.

■ discussion of issues

the similarities and differences among the states indicate that the evolution of public access to the internet in public libraries is not necessarily an evenly distributed phenomenon, as some states appear to be consistent leaders in some areas and other states appear to consistently trail in others. while the national picture is one primarily of continued progress in the availability and quality of internet access available to library patrons, the progress is not evenly distributed among the states.11 libraries in different states struggle with or benefit from different issues. some public libraries are limited by state and local budgetary limitations, while other libraries are seeking alternate funding sources through grant writing and building partnerships with the corporate world. some face barriers to providing access due to their geographical location or small service population. it may also be the case that the libraries in some states do not perceive that patrons desire increased access. other public libraries are able to provide high-end access as a result of having strong local leadership, sufficient state and local funding, well-developed networks and cooperatives, and a proactive state library.

though the discussion of the “digital divide” has become much less frequent, the state data seem to indicate that there are gaps in levels of access among libraries in different states. while every state has very successful individual libraries in terms of providing quality internet access and individual libraries that could be doing a better job, the state data indicate that library patrons in different parts of the country have variations in the levels and quality of access available to them. uniformity across all states clearly will never be feasible, though, as different states and their patrons have different needs.
for example, tables 1, 2, and 3 all display features that indicate high-level internet access in public libraries—high numbers of hours open, high numbers of public access workstations, and high levels of wireless internet access. three states—maryland, new jersey, and virginia—appear in the top ten in these three lists for 2006. further, connecticut, florida, illinois, and indiana each appear in the top ten of two of these three lists. these states clearly are making successful efforts at the state and local levels to guarantee widespread access to public libraries and the internet access they provide. gaps in access are also evident among different regions of the country.

the highest percentages of library systems with increases in total operating budgets were concentrated in states along the east coast, with seven of the states listed in table 7 being mid-atlantic or northeastern states. in contrast, the highest percentages of library systems relying on e-rate funding in table 10 were concentrated in the midwest and the southeast. further, the numbers in tables 6 and 7 showed far greater increases in the total operating budgets than in the information technology budgets in all regions of the country. as a result, public libraries in all parts of the united states may need to seek alternate sources of funding specifically for information technology costs. as can be seen in table 3, the leading states in adoption of wireless technology are concentrated in the northeast and mid-atlantic. in table 11, southern states, particularly louisiana and mississippi, had many of the highest percentages of libraries not offering any internet training to patrons. it is important to note with data from the gulf states, however, that the effects of hurricane katrina may have had a large impact on the results reported.

one key difference in a number of states seems to be the presence of a state library actively working to coordinate access issues. this particular study was not able to address such issues, but evidence indicates that the state library can play a significant role in ensuring sufficiency of internet access in public libraries in a state. maine, west virginia, and wisconsin all have state libraries that apply and distribute funds at the statewide level to ensure all public libraries, regardless of size or geography, have high-end connections to the internet. the state library of west virginia, for example, applied for e-rate funding for telecommunications costs on a statewide basis and received 79.1 percent funding in 2006, using such funding to cover not only connection costs for public libraries, but also to provide it and network support to libraries.

another example of a successful statewide effort to provide sufficient internet access can be found in maryland. in the early 1990s, maryland public library administrators agreed to let the state library use library services and technology act (lsta) funds to build the sailor network, connecting all public libraries in the state.12 this network predates the e-rate program by a number of years, but having an established statewide network has helped the state library to coordinate
applications, funding, and services among the libraries of the state. the state budget in maryland also provides other types of funding to support the state library, the library systems, and the library outlets in providing internet access. in states such as georgia, maryland, maine, west virginia, and wisconsin, the provision of internet access in public libraries is shaped not only by library outlets and library systems, but by the state libraries as well. in these and other states, the efforts of the state library appear to be reflected in the data from this study.

a final area for discussion is the degree to which librarians understand how much bandwidth is required to meet the needs of library users, how to measure actual bandwidth that is available in the library, and how to determine the degree to which that bandwidth is sufficient. indeed, many providers advertise that their connection speeds are “up to” a certain speed when in fact they deliver considerably less.13 the authors have offered an analysis of determining the quality and sufficiency of bandwidth elsewhere.14 suffice to say that there is considerable confusion as to “how good is good enough” bandwidth connection quality. these types of issues frame understandings of how connected libraries in different states are and whether those connections are sufficient to meet the needs of patrons.
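as a rough illustration of how a library might check its own effective bandwidth against what is advertised, the sketch below times a single download and divides the measured throughput across concurrent workstations. this is not the measurement methodology of the studies cited above: the test-file url, advertised speed, and workstation count are hypothetical, and a serious assessment would repeat the test at different times of day with a file large enough to swamp connection-setup overhead.

    import time
    import urllib.request

    def measured_throughput_mbps(url: str) -> float:
        # download the url once and return effective megabits per second
        start = time.monotonic()
        with urllib.request.urlopen(url) as response:
            size_bytes = len(response.read())
        elapsed = time.monotonic() - start
        return (size_bytes * 8) / (elapsed * 1_000_000)

    TEST_FILE = "http://example.org/100mb.bin"  # hypothetical test file
    advertised_mbps = 1.544                     # e.g., a t-1 line
    workstations = 10                           # assumed concurrent public access seats

    actual = measured_throughput_mbps(TEST_FILE)
    print(f"measured: {actual:.2f} mbps ({actual / advertised_mbps:.0%} of advertised)")
    print(f"per-workstation share: {actual / workstations:.2f} mbps")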
■ future research

while the experience of individual patrons in particular libraries will vary widely in terms of whether the access available is sufficient to meet their information needs, the fact that the state data indicate variations in the levels and quality of access among some states and regions of the country is worthy of note. an important area of subsequent research will be to investigate these differences, determine the reasons for them, and develop strategies to alleviate these apparent gaps in access. investigating these differences requires consideration of local and situational factors that may affect access in one library but perhaps not in another. for example, one public library may have access to an internet provider that offers higher speed connectivity that is not available in another location. the range of the possible local and situational factors affecting access and services is extensive. a preliminary list of the factors that contribute to being a successfully networked public library is described in greater detail in the 2006 study.15 however, additional investigation into the degree to which these factors affect access, quality of service, and user satisfaction needs to be continued.

the personal experience of the authors in working with various state library agencies suggests the need for additional research that explores relationships among those states ranked highest in areas such as connectivity and workstations with programs and services offered by the state library agencies. one state library, for example, has a specific program that works directly with individual public libraries to assist them in completing the various e-rate forms. is there a link between that state library providing such assistance and the state’s public libraries receiving more e-rate discounts per capita than other states? this is but one example where investigating the role of the state library and comparing those roles and services to the rankings may be useful. perhaps a number of “best practices” could be identified that would assist the libraries in other states in improving access and services.

in terms of research methods, future research on the topics identified in this article may need to draw upon strategies other than a national survey and on-site focus groups/interviews. the 2006 study, for the first time, included site visits and interviews and produced a wealth of data that supplemented the national survey data.16 on-site analysis of actual connection speeds in a sample of public libraries is but one example. the degree to which survey respondents know the connection speeds at specific workstations is unclear. simply because a t-1 line comes in the front door, it is not necessarily the speed available at a particular workstation. other methods such as log file analysis or user-based surveys of networked services (as opposed to surveys completed by librarians) may offer insights that could augment the national survey data.

other approaches such as policy analysis may also prove useful in better understanding access, connectivity, and services on a state-by-state basis. there has been no systematic description and analysis of state-based laws and regulations that affect public library internet access, connectivity, and services. the authors are aware of some states that ensure a minimum bandwidth will be provided to each public library in the state and pay for such connectivity. such is not true in other states. thus, a better understanding of how state-based policies and regulations affect access, connectivity, and services may identify strategies and policies that could be used in other states to increase or improve access, connectivity, and services.

the data discussed in this article also point to many other important needs in future research. the libraries in certain states frequently rank high in the tables, indicating that those states are better able to sustain their libraries in terms of finances and usage. however, additional factors may also be key in the differences among the states. future research needs to consider the internet access in public libraries in different states in relation to other services offered by libraries and to uses of the internet connectivity in libraries, including types of online content and services available, types of training available, community outreach, other collection issues, staffing in relation to technology, and other factors.

■ conclusion

internet and public computing access is almost universally available in public libraries in the united states, but there are differences in the amounts of access, the kinds of access, and sufficiency of the access available to meet patron demands.
now that virtually every public library has an internet connection, provides internet access to patrons, and offers a range of public computing access, the attention of public libraries must refocus on ensuring that every library can provide sufficient internet and computing access to meet patron needs. the issues to address include being open to the public a sufficient number of hours, having enough internet access workstations, having adequate wireless access, and having sufficient speed and quality of connectivity to meet the needs of patrons. if a library is not able to provide sufficient access now, the situation will only continue to grow more difficult as the content and services on the internet continue to be more demanding of technical and bandwidth capacity.

public libraries must also focus on increasing provision of internet access in light of federal, state, and local governments recently adding yet another significant level of services to public libraries by “requesting” that they provide access to and training in using numerous e-government services. such e-government services include social services, prescription drug plans, health care, disaster support, tax filing, resource management, and many other activities.17

the maintenance of traditional services, the addition and expansion of public access computing and networked services, and now the addition of a range of e-government services tacitly required by federal, state, and local governments, in combination, risk stretching public library resources beyond their ability to keep up. to avoid such a situation, public libraries, library systems, and state governments must learn from the library outlets, systems, and states that are more successfully providing sufficient internet access to their patrons and their communities. among these leaders, there are likely models for success that can be identified for the benefit of other outlets, systems, and states. beyond the lessons that can be learned from the most connected, however, there are also practical and logistical issues that remain beyond the control of an individual library and sometimes the entire state, such as geographical and economic factors.

ultimately, the analysis of state data offered here suggests that much can be learned from one state that might assist another state in terms of improving connectivity, access, and services. while the data suggest a number of significant discrepancies among the various states, it may be that a range of best practices can be identified from those more highly ranked states that could be employed in other states to improve access, connectivity, and services. staff at the various state library agencies may wish to discuss these findings and develop strategies that can then improve access nationwide.

providing access to the internet is now as established a role for public libraries as providing access to books. patrons and communities, and now government organizations, rely on the fact that internet access will be available to everyone who needs it. while there are other points of access to the internet in some communities, such as school media centers and community technology centers, the public library is often the only public access point available in many communities.18 public libraries across the states must continually work to make sure the access they provide meets all of these needs.
■ acknowledgements

the 2004 and 2006 public libraries and the internet studies were funded by the american library association and the bill & melinda gates foundation. drs. bertot, mcclure, and jaeger served as the co-principal investigators of the study. more information on these studies is available at http://www.ii.fsu.edu/plinternet/.

references and notes

1. john carlo bertot, charles r. mcclure, and paul t. jaeger, public libraries and the internet 2004: survey results and findings (tallahassee, fla.: information institute, 2005), http://www.ii.fsu.edu/plinternet_reports.cfm; john carlo bertot et al., public libraries and the internet 2006: study results and findings (tallahassee, fla.: information institute, 2006), http://www.ii.fsu.edu/plinternet_reports.cfm (accessed mar. 31, 2007).
2. bertot et al., public libraries and the internet 2006.
3. john carlo bertot and charles r. mcclure, “assessing the sufficiency and quality of bandwidth for public libraries,” information technology and libraries 26, no. 1 (2007): 14–22.
4. john carlo bertot et al., “drafted: i want you to deliver e-government,” library journal 131, no. 13 (2006): 34–39; john carlo bertot et al., “public access computing and internet access in public libraries: the role of public libraries in e-government and emergency situations,” first monday 11, no. 9 (2006), http://www.firstmonday.org/issues/issue11_9/bertot/ (accessed mar. 31, 2007).
5. ibid.; paul t. jaeger et al., “the 2004 and 2005 gulf coast hurricanes: evolving roles and lessons learned for public libraries in disaster preparedness and community services,” public library quarterly (in press).
6. there are actually nearly 17,000 service outlets in the united states. however, the sample frame eliminated bookmobiles as well as library outlets that the study team could neither geocode nor calculate poverty measures. additional information on the methodology is available in the study report at http://www.ii.fsu.edu/plinternet/ (accessed mar. 31, 2007).
7. bertot et al., public libraries and the internet 2006.
8. bertot, mcclure, and jaeger, public libraries and the internet 2004; bertot et al., public libraries and the internet 2006. the 2004 survey instrument is available at http://www.ii.fsu.edu/projectfiles/plinternet/plinternet_appendixa.pdf. the 2006 survey instrument is available at http://www.ii.fsu.edu/projectfiles/plinternet/2006/appendix1.pdf (accessed mar. 31, 2007).
9. bertot et al., public libraries and the internet 2006.
10. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology and libraries 24, no. 2 (2005): 57–67.
11. bertot, mcclure, and jaeger, public libraries and the internet 2004; bertot et al., public libraries and the internet 2006; john carlo bertot, charles r. mcclure, and paul t. jaeger, “public libraries struggle to meet internet demand: new study shows libraries need support to sustain online services,” american libraries 36, no. 7 (2005): 78–79.
12. john carlo bertot and charles r. mcclure, sailor assessment final report: findings and future sailor development (baltimore, md.: division of library development and services, 1996).
13. matt richtel and ken belson, “not always full speed ahead,” new york times, nov. 18, 2006.
14. bertot and mcclure, “assessing the sufficiency,” 14–22.
15. bertot et al., public libraries and the internet 2006.
16. ibid.
bertot et al., “drafted: i want you to deliver e-government”; bertot et al., “public access computing and internet access in public libraries”; jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 18. paul t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. google scholar and 100 percent availability of information jeffrey pomerantz information technology and libraries | june 2006 52 jeffrey pomerantz (pomerantz@unc.edu) is assistant professor in the school of information and library science, university of north carolina at chapel hill. this paper discusses google scholar as an extension of kilgour’s goal to improve the availability of information. kilgour was instrumental in the early development of the online library catalog, and he proposed passage retrieval to aid in information seeking. google scholar is a direct descendent of these technologies foreseen by kilgour. google scholar holds promise as a means for libraries to expand their reach to new user communities, and to enable libraries to provide quality resources to users during their online search process. editor’s note: this article was submitted in honor of the fortieth anniversaries of lita and ital. fred kilgour would probably approve of google scholar. kilgour wrote that the paramount goal of his professional career was “improving the availability of information.”1 he wrote about his goal of achieving this increase through shared electronic cataloging, and even argued that shared electronic cataloging would move libraries toward the goal of 100 percent availability of information.2 throughout much of kilgour’s life, 100 percent availability of information meant that all of a library’s books would be on the shelves when a user needed them. in proposing shared electronic cataloging—in other words, online union catalogs—kilgour was proposing that users could identify libraries’ holdings without having to travel to the library to use the card catalog. this would make the holdings of remote libraries as visible to users as the holdings of their local library. kilgour went further than this, however, and also proposed that the full text of books could be made available to users electronically.3 this would move libraries toward the goal of 100 percent availability of information even more than online union catalogs. an electronic resource, unlike physical items, is never checked out; it may, in theory, be simultaneously used by an unlimited number of users. where there are restrictions on the number of users of an electronic resource—as with subscription services such as netlibrary, for example—this is not a necessary limitation of the technology, but rather a limitation imposed by licensing and legal arrangements. kilgour understood that his goal of 100 percent availability of information would only be reached by leveraging increasingly powerful technologies. the existence of effective search tools and the usability of those tools would be crucial so that the user would be able to locate available information without assistance.4 to achieve this goal, therefore, kilgour proposed and was instrumental in the early development of much library automation: he was behind the first uses of punched cards for keeping circulation records, he was behind the development of the first online union catalog, and he called for passage retrieval for information seeking at a time when such systems were first being developed.5 this development and application of technology was all directed toward the goal of improving the availability of information.
kilgour stated that the goal of these proposed information-retrieval and other systems was “to supply the user with the information he requires, and only that information.”6 shared catalogs and electronically available text have the effect of removing both spatial and temporal barriers between the user and the material being used. when the user can access materials “from a personal microcomputer that may be located in a home, dormitory, office, or school,” the user no longer has to physically go to the library.7 this is a spatial barrier when the library is located at some distance from the user, or if the user is physically constrained in some way. even if the user is perfectly able-bodied, however, and located close to a library, electronic access still eliminates a temporal barrier: accessing materials online is frequently faster and more convenient than physically going to the library. electronic access enables 100 percent availability of information in two ways: by ensuring that the material is available when the user wants it, and by lowering or removing any actual or perceived barriers to the user accessing the material. ■ library automation weise writes that “for at least the last twenty to thirty years, we [librarians] have done our best to provide them [users] with services so they won’t have to come to the library.”8 the services that weise is referring to are the ability for users to search for and gain access to the full text of materials online. libraries of all types have widely adopted these services: for example, at the author’s own institution, the university of north carolina at chapel hill, the libraries have subscriptions to approximately seven hundred databases and provide access to more than 32,000 unique periodical titles; many of these subscriptions provide access to the full text of materials.9 additionally, the state library of north carolina provides a set of more than one hundred database subscriptions to all academic and public libraries around the state; any north carolina resident with a library card may access these databases.10 several other states have similar programs. by providing users with remote access to materials, libraries have created an environment in which it is possible for users to be remote from the library. or rather, as lipow points out, it is the library that is remote from the user, yet the user is able to seek and find information.11 this adoption of technology by libraries has had the effect of enabling and empowering users to seek information for themselves, without either physically going to a library or seeking a librarian’s assistance. the increasing sophistication of freely available tools for information seeking on the web has accelerated this trend. in many cases, users may seek information for themselves online without making any use of a library’s human-intermediated or other traditional services. (certainly, providing access to electronic collections may be considered a service of the library, but it is a service that may not require the user either to be physically in the library or to communicate with a librarian.)
even technically unsophisticated users may use a search engine and locate information that is “good enough” to fulfill their information needs, even if it is not the ideal or most complete information for those purposes.12 thus, for better or worse, the physical library is no longer the primary focus for many information seekers. part of this movement by users toward self-sufficiency in information seeking is due to the success of the web search engine, and to the success of google in particular. recent reports from the pew internet and american life project shed a great deal of light on users’ use of these tools. rainie and horrigan found that “on a typical day at the end of 2004, some 70 million american adults logged onto the internet.”13 fallows found that “on any given day, 56% of those online use search engines.”14 fallows, rainie, and mudd found that of their respondents, “47% say that google is their top choice of search engine.”15 from these figures, it can be roughly estimated that more than 39 million people use search engines, and more than 18 million use google, on any given day—and that is only within the united states. this trend seems quite dark for libraries, but it actually has its bright side. it is important to make a distinction here between use of a search engine and use of a reference service or other library service. there is some evidence that users’ questions to library reference services are becoming more complex.16 why this is occurring is less clear, but it may be hypothesized that users are locating information that is good enough to answer their own simple questions using search engines or other internet-based tools. the definition of “good enough” may differ considerably between a user and a librarian. nevertheless, one function of the library is education, and as with all education, the ultimate goal is to make the student self-sufficient in self-teaching. in the context of the library, this means that one goal is to make the user self-sufficient in finding, evaluating, and using information resources. if users are answering their own simple questions, and asking the more difficult questions, then it may be hypothesized that the widespread use of search engines has had a role in raising the level of debate, so to speak, in libraries. rather than providing instruction to users on simply using search engines, librarians may now assume that some percentage of library users possess this skill, and may focus on teaching higher-level information-literacy skills to users (www.ala.org/ala/acrl/acrlstandards/informationliteracycompetency.htm). simple questions that users may answer for themselves using a search engine, and complex questions requiring a librarian’s assistance to answer, are not opposites, of course, but rather two ends of a spectrum of question complexity. while the advance of online search tools may enable users to seek and find information for themselves at one end of this spectrum, it seems unlikely that such tools will enable users to do the same across the entire spectrum any time soon, perhaps ever. the author believes that there will continue to be a role for librarians in assisting users to find, evaluate, and use information. it is also important to make another distinction here, between the discovery of resources and access to those resources. libraries have always provided mechanisms for users to both discover and access resources.
neither the card catalog nor the online catalog contains the full text of the materials cataloged; rather, these tools are means to enable the user to discover the existence of resources. the user may then access these resources by visiting the library. search engines, similar to the card and online catalogs, are tools primarily for discovery of resources: search-engine databases may contain cached copies of web pages, but the original (and most up-to-date) version of the web page resides elsewhere on the web. thus, a search engine enables the user to discover the existence of web pages, but the user must then access those web pages elsewhere. the author believes that there will continue to be a role for libraries in providing access to resources—regardless of where the user has discovered those resources. in order to ensure that libraries and librarians remain a critical part of the user’s information-seeking process, however, libraries must reappropriate technologies for online information seeking. search engines may exist separate from libraries, and users may use them without making use of any library service. however, libraries are already the venue through which users access much online content—newspapers, journals, and other periodicals; reference sources; genealogical materials—even if many users do not physically come to the library or consult a librarian when using them. it is possible for libraries to add value to search technologies by providing a layer of service available to those using them. ■ google scholar one such technology for online information seeking to which libraries are already adding value, and that could add value to libraries in turn, is google scholar (scholar.google.com). google scholar is a specialty search tool, obviously provided by google, which enables the user to search for scholarly literature online. this literature may be on the free web (as open-access publications become more common and as scholars increasingly post preprint or post-print copies of their work on their personal web sites), or it may be in subscription databases.17 users may access literature in subscription databases in one of two ways: (1) if the user is affiliated with an institution that subscribes to the database, the user may access it via whatever authentication method is in place at the institution (e.g., ip authentication, a proxy server), or (2) if the user is not affiliated with such an institution, the user may pay for access to individual resources on a pay-per-view basis. there is not sufficient space here to explore the details of google scholar’s operation, nor is that the point of this paper; for excellent discussions of the operation of google scholar, see gardner and eng, and jacsó.18 pace draws a distinction between federated searching and metasearching: federated search tools compile and index all resources proactively, prior to any user’s actual search, in a just-in-case approach to users’ searching.19 metasearch tools, on the other hand, search all resources on the fly at the time of a user’s search, in a just-in-time approach to users’ searching. google scholar is a federated search tool—as, indeed, are all of google’s current services—in that the database that the user searches is compiled prior to the user’s actual search. in this, google scholar is a direct descendent of kilgour’s work to develop shared online library catalogs.
a shared library catalog is a union catalog: it is a database of libraries’ physical holdings, compiled prior to any actual user’s search. google scholar is also a union catalog, though a catalog of publishers’ electronic offerings provided by libraries, rather than of libraries’ physical holdings. it should be noted, however, that while this difference is an important one for libraries and publishers, it might not be understood or even relevant for many users. many of the resources indexed in google scholar are also available in full text. this fact allows google scholar to also move in the direction of kilgour’s goal of making passage retrieval possible for scholarly work. by using google’s core technology—the search engine and the inverted index that is created when pages are indexed by a search engine—google scholar enables full-text searching of scholarly work. as mentioned above, when users search google scholar, they retrieve a set of links to the scholarly literature retrieved by the search. google scholar also makes use of google’s link-analysis algorithms to analyze the network of citations between publications—instead of the network of hyperlinks between web pages, as google’s search engine more typically analyzes. a “cited by” link is included with each retrieved link in google scholar, stating how many other publications cite the publication listed. clicking on this “cited by” link performs a preformulated search for those publications. this citation-analysis functionality resembles that of one of the most widely used databases in the scholarly community: the isi web of science (wos) database (scientific.thomson.com/products/wos). wos enables users to track citations between publications. this functionality has wide use in scholarly research, but until google scholar, it was largely unknown outside of the scholarly community. with the advent of google scholar, however, this functionality may be employed by any user for any research. further, there is a plugin for the firefox browser (www.mozilla.com/firefox) that displays, for every record on the page of retrieved results, an icon that links to the appropriate record in the library’s opac (google scholar does not, however, currently provide this functionality natively20). this provides a link from google scholar to the materials that the library holds in its collection. when the item is a book, for example, this link to the opac enables users to find the call number of the book in their local library. when the item is a journal, it enables them to find both the call number and any database subscriptions that index that journal title. periodicals are often indexed in multiple databases, so libraries with multiple database subscriptions often have multiple means of accessing electronic versions of journal titles. a library user may access a periodical via any or all of these individual subscriptions without using google scholar—but to do so, the user must know which database to use, which means knowing either the topical scope of a database or which specific journals are indexed in it. as a more centralized means of accessing this material, many users may prefer a link in google scholar to the library’s opac. google scholar thus fulfills, in large part, kilgour’s vision of shared electronic cataloging.
in turn, shared cataloging goes a long way toward achieving kilgour’s vision of 100 percent availability of information by allowing a user to discover the existence of information resources. however, discovery of resources is only half of the equation: the other half is access to those resources. and it is here that libraries may position themselves as a critical part of the information-seeking process. search engines may enable users to discover information resources on their own, without making use of a library’s services, but it is the library that provides the “last mile” of service, enabling users to gain access to many of those resources. ■ conclusion google scholar is the topic of a great deal of debate, both in the library arena and elsewhere.21 unlike union catalogs and many other online resources used in libraries, google scholar does not disclose what materials it includes: as of this writing, google has not released information about which publishers, titles, and dates are indexed.22 google is known to engage in self-censorship—or self-filtering, depending on what coverage one reads—and so potentially conflicts with the american library association’s freedom to read statement (www.ala.org/ala/oif/statementspols/ftrstatement/freedomreadstatement.htm).23 google is a commercial entity and, as such, a primary motivation of google must be profit, and only secondarily meeting the information needs of library users. for all of these and other reasons, there is considerable debate among librarians about whether it is appropriate for libraries to provide access to google scholar. despite this debate, however, users are using google scholar. google scholar is simply the latest tool to enable users to seek information for themselves; it isn’t the first and it won’t be the last. google scholar holds a great deal of promise for libraries due to the combination of google’s popularity and ease of use, and the resources held by or subscribed to by libraries to which google scholar points. as kesselman and watstein suggest, “libraries and librarians need to have a voice” in how tools such as google scholar are used, given that “we are the ones most passionate about meeting the information needs of our users.”24 given that library users are using google scholar, it is to libraries’ benefit to see that it is used well. google scholar is the latest tool in a long history of information-seeking technologies that increasingly realize kilgour’s goal of achieving 100 percent availability of information. google scholar does not provide access to 100 percent of information resources in existence, but it enables discovery of information resources, and allows for the possibility that these resources will be discoverable by the user 100 percent of the time. google scholar may be in the vanguard of a new way of integrating library services into users’ everyday information-seeking habits. as taylor tells us, people have their own individual sources to which they go to find information, and libraries—for many people—are not at the top of their lists.25 google, however, is at the top of the list for a great many people.26 properly harnessed by libraries, therefore, google scholar has the potential to bring users to library resources when they are seeking information. google scholar may not bring users physically to the library.
instead, what google scholar can do is bring users into contact with resources provided by the library. this is an important distinction, because it reinforces a change that libraries have been undergoing since the advent of the online database: that of providing access to materials that the library may not own. ownership of materials potentially allows for a greater measure of control over the materials and their use. ownership in the context of libraries has traditionally meant ownership of physical materials, and physical materials by nature restrict use, since the user must be physically collocated with the materials, and use of materials by one user precludes use by other users for the duration. providing access to materials, on the other hand, means that the library may have less control over materials and their use, but this potentially allows for wider use of these materials. by enabling users to come into contact with library resources in the course of their ordinary web searches, google scholar has the potential to ensure that libraries remain a critical part of the user’s information-seeking process. it benefits google when a library participates with google scholar, but it also benefits the library and the library’s users: the library is able to provide users with a familiar and easy-to-use path to materials. this is (for lack of a better term) a “spoonful of sugar” approach to seeking and finding information resources: by using an interface that is familiar to users, libraries may provide quality information sources in response to users’ information seeking. green wrote that “a librarian should be as unwilling to allow an inquirer to leave the library with his question unanswered as a shop-keeper is to have a customer go out of his store without making a purchase.”27 a modern version of this might be that a librarian should be as unwilling to allow an inquirer to abandon a search with his question unanswered. google scholar and online tools like it have the potential to draw users away from libraries; however, these tools also have the potential to usher in a new era of service for libraries: an expansion of the reach of libraries to new users and user communities; a closer integration with users’ searches for information; and the provision of quality resources to all users, in response to all information needs. google scholar and online tools like it have the potential to enable libraries to realize kilgour’s goals of improving the availability of information, and to provide 100 percent availability of information. these are goals on which all libraries can agree. ■ acknowledgements many thanks to lisa norberg, instruction librarian, and timothy shearer, systems librarian, both at the university of north carolina at chapel hill, for many extensive conversations about google scholar, which approached coauthorship of this paper. this paper is dedicated to the memory of kenneth d. shearer. references and notes 1. frederick g. kilgour, “historical note: a personalized prehistory of oclc,” journal of the american society for information science 38, no. 5 (1987): 381. 2. frederick g. kilgour, “future of library computerization,” in current trends in library automation: papers presented at a workshop sponsored by the urban libraries council in cooperation with the cleveland public library, alex ladenson, ed. (chicago: urban libraries council, 1981), 99–106; frederick g.
kilgour, “toward 100 percent availability,” library journal 114, no. 19 (1989): 50–53. 3. kilgour, “toward 100 percent availability.” 4. frederick g. kilgour, “lack of indexes in works on information science,” journal of the american society for information science 44, no. 6 (1993): 364; frederick g. kilgour, “implications for the future of reference/information service,” in collected papers of frederick g. kilgour: oclc years, lois l. yoakam, ed. (dublin, ohio: oclc online computer library center, inc., 1984): 9–15. 5. frederick g. kilgour, “a new punched card for circulation records,” library journal 64, no. 4 (1939): 131–33; kilgour, “historical note”; frederick g. kilgour and nancy l. feder, “quotations referenced in scholarly monographs,” journal of the american society for information science 43, no. 3 (1992): 266–70; gerald salton, j. allan, and chris buckley, “approaches to passage retrieval in full-text information systems,” in proceedings of the 16th annual international acm sigir conference on research and development in information retrieval (new york: acm pr., 1993), 49–58. 6. kilgour, “implications for the future of reference/information service,” 95. 7. kilgour, “toward 100 percent availability,” 50. 8. frieda weise, “being there: the library as place,” journal of the medical library association 92, no. 1 (2004): 10, www.pubmedcentral.nih.gov/articlerender.fcgi?artid=314099 (accessed apr. 9, 2006). 9. it is difficult to determine precise figures, as there is considerable overlap in coverage; several vendors provide access to some of the same periodicals. 10. north carolina’s database subscriptions are via the nc live service, www.nclive.org (accessed apr. 9, 2006). 11. anne g. lipow, “serving the remote user: reference service in the digital environment,” paper presented at the ninth australasian information online and on disc conference and exhibition, sydney, australia, 19–21 jan. 1999, www.csu.edu.au/special/online99/proceedings99/200.htm (accessed apr. 9, 2006). 12. j. janes, “academic reference: playing to our strengths,” portal: libraries and the academy 4, no. 4 (2004): 533–36, http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v004/4.4janes.html (accessed apr. 9, 2006). 13. lee rainie and john horrigan, a decade of adoption: how the internet has woven itself into american life (washington, d.c.: pew internet & american life project, 2005), 58, www.pewinternet.org/ppf/r/148/report_display.asp (accessed apr. 9, 2006). 14. deborah fallows, search engine users (washington, d.c.: pew internet & american life project, 2005), i, www.pewinternet.org/pdfs/pip_searchengine_users.pdf (accessed apr. 9, 2006). 15. deborah fallows, lee rainie, and graham mudd, data memo on search engines (washington, d.c.: pew internet & american life project, 2004), 3, www.pewinternet.org/ppf/r/132/report_display.asp (accessed apr. 9, 2006). 16. laura bushallow-wilber, gemma devinney, and fritz whitcomb, “electronic mail reference service: a study,” rq 35, no. 3 (1996): 359–69; carol tenopir and lisa a. ennis, “reference services in the new millennium,” online 25, no. 4 (2001): 40–45. 17. alma swan and sheridan brown, open access self-archiving: an author study (truro, england: key perspectives, 2005), www.jisc.ac.uk/uploaded_documents/open%20access%20self%20archiving-an%20author%20study.pdf (accessed apr. 9, 2006). 18. susan gardner and susanna eng, “gaga over google?
scholar in the social sciences,” library hi tech news 8 (2005): 42–45; péter jacsó, “google scholar: the pros and the cons,” online information review 29, no. 2 (2005): 208–14. 19. andrew pace, “introduction to metasearch . . . and the niso metasearch initiative,” presentation to the openurl and metasearch workshop, sept. 19–21, 2005, www.niso.org/news/events_workshops/openurl-05-ppts/2-1-pace.ppt (accessed apr. 9, 2006). 20. this plugin was developed by peter binkley, digital initiatives technology librarian at the university of alberta. see www.ualberta.ca/~pbinkley/gso (accessed apr. 9, 2006). 21. see, for example, gardner and eng, “gaga over google?”; jacsó, “google scholar”; m. kesselman and s. b. watstein, “google scholar and libraries: point/counterpoint,” reference services review 33, no. 4 (2005): 380–87. 22. jacsó, “google scholar.” 23. anonymous, “google censors itself for china,” bbc news, jan. 25, 2006, http://news.bbc.co.uk/2/hi/technology/4645596.stm (accessed apr. 9, 2006); a. mclaughlin, “google in china,” google blog, jan. 27, 2006, http://googleblog.blogspot.com/2006/01/google-in-china.html (accessed apr. 9, 2006). 24. kesselman and watstein, “google scholar and libraries,” 386. 25. robert s. taylor, “question-negotiation and information seeking in libraries,” college & research libraries 29, no. 3 (1968): 178–94. 26. fallows, rainie, and mudd, data memo on search engines. 27. samuel s. green, “personal relations between librarians and readers,” american library journal 1, no. 2–3 (1876): 79. editorial the authors of “the state of rfid applications in libraries,” that appeared in the march 2006 issue, inadvertently included two sentences that are near quotations from a commentary by peter warfield and lee tien in the april 8, 2005 issue of the berkeley daily planet. on page 30, immediately following footnote 24, the authors wrote: “the eugene public library reported ‘collision’ problems on very thin materials and on videos as well as false readings from the rfid security gates. collision problems mean that two or more tags are close enough to cancel the signals, making them undetectable by the rfid checkout and security systems.” warfield and tien wrote: “the eugene (ore.) public library reported ‘collision’ problems on very thin materials and on videos as well as ‘false readings’ from the rfid security gates. (collision problems mean that two or more tags are close enough to ‘cancel the signals,’ according to an american library association publication, making them undetectable by the rfid checkout and security systems.)” (accessed may 16, 2006, www.berkeleydailyplanet.com/article.cfm?archivedate=04-08-05&storyid=21128). the authors’ research notes indicated that it was a near quotation, but this fact was lost in the writing of the article. the article referee, the copy editors, and i did not question the authors because earlier in the same paragraph they wrote about the eugene public library experience and referred (footnote 23) to an earlier article in the berkeley daily planet. the authors and i apologize for this unfortunate error. **** july 1, 2006 marked the merger of rlg and oclc. by the time this editorial appears, many words will already have been spoken and written about this monumental, twenty-first-century library event. i know what i think the three very important immediate effects of the merger will be. first, it is a giant step toward the realization of a global library bibliographic database.
second, taking advantage of rlg’s unique and successful programs and integrating them and their development philosophy as “rlg-programs,” while working alongside oclc research, seems a step so important for the future development of library technology that it cannot be overemphasized. third, and very practically, incorporating redlightgreen into open worldcat will give the library world a product that users might prefer over a search of google books or amazon. i requested and received quotes about the merger from the principals that i might put into this editorial that won’t appear until four months after the may 3 announcement. jay jordan, president and ceo, oclc, remarked: “we have worked cooperatively with rlg on a variety of projects over the years. since we announced our plans to combine, staff from both organizations have been working together to develop plans and strategies to integrate systems, products, and services. over the past several months, staff members have demonstrated great mutual respect, energy, and enthusiasm for the potential of our new relationship and what it means for the organizations we serve. there is much work to be done as we complete this transition. clearly, we are off to a good start.” betsy wilson, chair, oclc board of trustees, and dean of libraries, university of washington, wrote: “the response from our constituencies has been overwhelmingly supportive. over the past several months, we have finalized appointments for the twelve-person program council, which reports to . . . oclc through a standing committee called the rlg board committee. we are starting to build agendas for our new alliance. the members of this group from the rlg board are: james neal, vice president for information services and university librarian, columbia university; nancy eaton, dean of university libraries and scholarly communication, penn state university (and former chair of the oclc board); and carol mandel, dean of libraries, new york university. from oclc the members are elisabeth niggemann, director, die deutsche bibliothek; jane ryland, senior scientist, internet2; and betsy wilson, dean of university libraries, university of washington.” and from james michalko, currently president and ceo of rlg, and by the time you read this, vice president, rlg-programs development, oclc: “we are combining the practices of rlg and oclc in a very powerful way—by putting together the traditions of rlg and oclc we are creating a robust new venue for research institutions and new capacity that will provide unique and beneficial outcomes to the whole community.” by now, all lita members and ital readers know that in 1967, fred kilgour founded oclc and was the founding editor of the journal of library automation (jola—vol. 1, no. 1 was published in march 1968), which, with but a mild outcry from serials librarians, changed its title to information technology and libraries in 1982. this afternoon (6/15/06), i called fred. he and his wife eleanor reminisced about the earliest days, and then i asked him for his comments on the oclc-rlg merger. because he had had the first words about both oclc and jola, as it were, i told him that i would like for him to have the last. and this is what he said: “at long last!” fred kilgour died on july 31, 2006, aged 92.
a tribute posted by alane wilson of oclc may be read at http://scanblog.blogspot.com/2006/07/frederick-g-kilgour-1914-2006.html editorial: a confession, a speculation, and a farewell john webb john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university, and editor of information technology and libraries. mets as an intermediary schema for a digital library of complex scientific multimedia richard gartner information technology and libraries | september 2012 24 abstract the use of the metadata encoding and transmission standard (mets) schema as a mechanism for delivering a digital library of complex scientific multimedia is examined as an alternative to the fedora content model (fcm). using mets as an “intermediary” schema, where it functions as a template that is populated with content metadata on the fly using extensible stylesheet language transformations (xslt), it is possible to replicate the flexibility of structure and granularity of fcm while avoiding its complexity and often substantial demands on developers. of the many possible approaches to structuring complex data for delivery via the web, two divergent philosophies appear to predominate. one, exemplified by such standards as the metadata encoding and transmission standard (mets)1 or the digital item declaration language (didl),2 relies on the structured packaging of the multiple components of a complex object within “top-down” hierarchies. the second, of which the fedora content model (fcm) is perhaps a prime example,3 takes the opposite approach of disaggregating structural units into atomistic objects, which can then be recombined according to the requirements of a given application.4 neither is absolute in its approach—mets, for instance, allows cross-hierarchy linkages, and many fcm models are designed hierarchically—but the distinction is clear. many advantages are validly claimed for the fcm approach to structuring digital data objects. individual components, not constrained to hierarchies, may be readily reused in multiple representations with great flexibility.5 complex interobject relationships may be encoded using semantic linkages,6 a potentially much richer approach to expressing these than the structural relationships of xml can allow. multiple levels of granularity, from that of the collection as a whole down to its lowest-level components, can readily be modelled, allowing interobject relationships to be encoded as easily as intercomponent ones.7 such models, particularly the rdf-based fedora content model, are very powerful and flexible, but can often lead to complexity and consequently considerable demands on system development before they can be implemented. in addition, despite the theoretical interoperability offered by rdf, in practice the exchange and reuse of content models has proved somewhat limited because considerable work is usually required to re-create and validate a content model created elsewhere.8 this article examines whether it is possible to replicate the advantages of this approach to structuring data within the constraints of the more rigid mets standard. the data used for this analysis is a set of digital objects that result from biological nanoimaging experiments, the interrelationships of which present complex problems when they are delivered online.
richard gartner (richard.gartner@kcl.ac.uk) is a lecturer in library and information science, king’s college, london. the method used is an unconventional use of a mets template as an intermediary schema;9 this allows something of the flexibility of the fcm approach while retaining the relative simplicity of the more structured mets model. a nanoimaging digital library and its metadata requirements the collection analysed for this study derives from biological nanoimaging experiments undertaken at the randall division of cell and molecular biophysics at king’s college london. biological nanoimaging is a relatively new field of research that aims to unravel the biological processes at work when molecules interact in living cells; this is done by using optical techniques that can resolve images down to the molecular level. it has particular value in the study of how diseases progress and has great potential to help predict the effects of drugs on the physiology of human organs. as part of the biophysical repositories in the lab (bril) project at king’s college london,10 a digital library is being produced to meet the needs of practitioners of live cell protein studies. although the material being made available here is highly specialised, and the user base is restricted to a specialist cohort of biologists, the challenges of this library are similar to those of any collection of digital objects: in particular, the metadata strategy employed must be able to handle the delivery of complex, multifile objects as efficiently as, for example, a library of digitized books has to manage the multiplicity of image files that make up a single digital volume. the digital library itself is hosted on the widely used fedora repository platform; as a result, it is employing fcm as the basis of its data modelling. the purpose of this analysis is to ascertain whether mets can also be used for the complex models required by this data and to compare its potential viability as an architecture for this type of application with fcm. a particular challenge of this collection is that the raw images from which it is constituted require combining and processing before they are delivered to the user. a further challenge is that the library encompasses images from a variety of experiments, all of which combine these files in different ways and employ different software for processing them. some measure of the complexity of these requirements can be gathered from figure 1 below, which illustrates the processes involved in delivering the digital objects for two types of experiments. figure 1. architecture for two experiment types. the images created by two experiments, bleach and actin_5, are shown here: it will be seen that the bleach experiment is divided into two subtypes (here called 2grating and apotone). each type or subtype of experiment has its own requirements for combining the images it produces before they are displayed. for the subtype 2grating, for instance, two images, each generated using a different camera grating, are processed in parallel (indicated by the brackets); these are then combined using the software package process-non-clem (shown by the hexagonal symbol) to produce a display image in tiff format.
the subtype apotone requires three grating images and a further image with background information to be processed in parallel by the software process-apotone; in this case, the background image provides data to be subtracted from the combined three grating images to produce the final tiff for display. actin_5 experiments are entirely different: they produce still images that need to be processed sequentially (shown by the braces) to produce a video. encoding the bril architecture in mets this architecture, although complex, is readily handled within mets in a manner analogous to that of more conventional collections. as in any mets file, the structure of the experiments, including their subexperiments, is encoded using nested division (<div>) elements in the structural map (example 1a).

<div TYPE="experiment" LABEL="bleach">
  <div TYPE="subexperiment" LABEL="2grating">
    [subsidiary <div>s containing image information]
  </div>
  <div TYPE="subexperiment" LABEL="apotone">
    [subsidiary <div>s containing image information]
  </div>
</div>
<div TYPE="experiment" LABEL="actin_5">
  [subsidiary <div>s containing image information]
</div>

example 1a. sample experiment-level structural map
within these containing divisions, subsidiary <div> elements are used to map the combination of images necessary to deliver the content for each type. mets allows the specification of the parallel or sequential structuring of files using its <par> and <seq> elements respectively. the parallel processing of the apotone subtype, for instance, could be encoded as shown in example 1b.
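in outline, such an encoding wraps a <par> element inside the division’s <fptr> (file pointer) and lists one <area> pointer per raw image; a minimal sketch, in which the id and fileid values are illustrative assumptions rather than the project’s actual identifiers, might read:

<div ID="apotone" TYPE="subexperiment" LABEL="apotone">
  <fptr>
    <par>
      <!-- fileid values are illustrative, not the project's own -->
      <area FILEID="grating1"/>
      <area FILEID="grating2"/>
      <area FILEID="grating3"/>
      <area FILEID="background"/>
    </par>
  </fptr>
</div>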
example 1b. sample parallel structure for raw image files to be combined using a process specified in associated metadata (behavior section)

each division of the structural map of this type may in its turn be attached to a specific software item in the mets behavior section to designate the application through which it should be processed: the parallel set of images in example 1b, for instance, would be linked to the process-apotone software using the code in example 1c.

<behaviorSec>
  <behavior STRUCTID="apotone">
    <!-- the href is abridged; it locates the process-apotone software -->
    <mechanism LOCTYPE="URL" xlink:href="process-apotone"/>
  </behavior>
</behaviorSec>

example 1c. sample mets behavior mechanism for a specification of image processing

this approach is straightforward, and mets is capable of encoding all of the requirements of this data model, although at the cost of large file sizes and a degree of inflexibility. this may be no problem when the principal rationale behind the creation of this metadata is preservation: linking all of the project metadata in a coherent, albeit monolithic, structure of this kind especially benefits its usage as an open archival information system (oais) archival information package (aip), one of the key functions for which mets was designed. problems are likely to arise, however, when this approach is scaled up in a delivery system to include the millions of data objects that this project may potentially produce. the large size of the mets files that this approach necessitates makes their on-the-fly processing for delivery much slower than a system that uses aggregations of the smaller files required by the fcm model and so processes only metadata at the granularity necessary for the delivery of each object. such flexibility is much harder to achieve within mets, although mechanisms that currently exist for aggregating diverse objects within mets may seem to offer some degree of solution to this problem. complex relationships under mets underlying the mets structural map is an assumed ontology of digital objects that encodes a long-established view of text as an ordered hierarchy of content objects;11 this model accounts for the map’s use of hierarchical nesting and the ordinality of the object’s components. the rigidity of this model is alleviated to some extent by the facility within mets to encode structural links that cut across these hierarchies. these links, which join nodes at any level of the structural map, are particularly useful for encoding hyperlinks within webpages,12 and so are often used for archiving websites. various attempts have been made to extend the functionality of the structural map and structural links sections to allow more sophisticated aggregations and combinations of components beyond the boundaries of a single digital object, in a manner analogous to the flexible granularity of fcm. mets itself offers the possibility of aggregating other mets files through its <mptr> (mets pointer) element: this element, always a direct child of a <div> element in the structural map, references a mets file that contains metadata on the digital content represented by this <div>. for example, two complex digital objects could be represented at a higher collection level, as shown in example 2.
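such a collection-level map needs only one <mptr> per <div>; a minimal sketch, in which the type values and file names are illustrative assumptions, might read:

<div TYPE="collection">
  <div TYPE="object">
    <!-- file names are illustrative -->
    <mptr LOCTYPE="URL" xlink:href="object1.mets.xml"/>
  </div>
  <div TYPE="object">
    <mptr LOCTYPE="URL" xlink:href="object2.mets.xml"/>
  </div>
</div>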
example 2. use of the mets <mptr> element

this feature has found some use in such projects as the echo depository, which uses it to register digital objects at various stages of their ingest into, and dissemination from, a repository;13 it is also recommended by the paradigm project as a method for archiving born-digital content, such as emails.14 nonetheless, its usage remains fairly limited; of all the mets profiles registered on the library of congress’s central repository, for instance, echo dep at the time of writing remains the only project to employ this feature.15 an important reason for its limited take-up is that its potential for more sophisticated uses than merely populating a division of the structural map is severely limited by its place in the mets schema. the <mptr> element can only be used as a direct child of its parent <div>: it cannot, for instance, be located in <par> or <seq> elements to indicate that the objects referenced in its subordinate mets files should be processed in parallel or in sequence (as is required by the different experiment types in figure 1), nor may the contents of these files be processed by the sophisticated partitioning features of the <area> element, which allows subsidiary parts of a <div> to be addressed directly. a more sophisticated approach to combining digital object components is to employ open archives initiative object reuse and exchange (oai-ore) aggregations,16 which express more complex relationships at greater levels of granularity than the <mptr> method allows. mcdonough’s examination of the possibility of aligning the two standards concludes that it is indeed possible, although at the cost of eliminating the mets behavior section and removing much of the flexibility of mets’s structural links, both side effects of oai-ore’s requirement that resource maps must form a connected rdf graph.17 in addition, converting between mets and oai-ore may not be lossless, depending on the design of the mets document.18 neither approach therefore seems ideal for an application of this type, the former because of the limited ways in which the <mptr> element can be deployed outside its parent <div> and its subsidiaries, the latter because of its removal of the functionality of the behavior section, which is essential for the delivery of material such as this. mets as an intermediary schema an alternative approach adopted here uses the technique of employing mets files as intermediary schemas to act as templates from which mets-encoded packages for delivery can be generated. intermediary xml schemas are intermediary in the sense that they are designed not to act as final metadata containers for archiving or delivery, but as mediating encoding mechanisms from which data or metadata in these final forms can be generated by xslt transformations: one example is cerif4ref, a heavily constrained xml schema used to encode research management information from which metadata in the complex common european research information format (cerif) data model can be generated.19 the cerif4ref schema attempts to emulate the architectural processing features of sgml,20 which are absent from xml; these allowed simpler document type definitions (dtds) to be compiled for specific applications, which could then be mapped to established, more complex, sgml models. instead of architectural processing, cerif4ref uses xslt to carry out this processing, allowing a simpler scheme tailored to the requirements of an application to be combined with the benefits of a more interoperable but highly complex model that is difficult to implement in its standard form. instead of using this technique for constraining encoding to a simpler model and generating more complex data structures from it, the intermediary schema technique may be used to define templates, similar to a content model, from which the final mets files to be delivered can be constructed. as is the case with cerif4ref, xslt is used for these transformations, and the xslt files form an integral part of the application. in this way, a series of templates, beginning with the highest-level abstractions, are used to generate their more concrete subsidiaries, until a final version used for dissemination is generated. the core of this application is a mets file, which acts as a template for the data delivery requirements for each type of experiment. figure 2 demonstrates the components necessary for defining these for the 2grating experiment subtype detailed previously in figure 1.
figure 2. defining an experiment subtype in mets

the data model for the delivery of these objects is defined in the structural map (b): as can be seen here, a series of nested <div> elements is used to define the relationship of experiment subtypes to types, and then to define, at the lowest level of this structure, the model for delivering the objects themselves. in this example, two files are to be processed in parallel; these are defined by <area> elements within the <par> (parallel) element. in a standard mets file, the fileid attributes of these <area> elements would reference <file> elements within the mets file section (a): in this case, however, they reference empty file group (<fileGrp>) elements, which are populated with <file> elements when this template undergoes its xslt transformation. the final component of this template is the mets behavior section (c), in which the applications required to process the digital objects are defined. two behavior sections are shown in this example: the first is used to invoke the xslt transformation by which this mets template file is to be processed, the second to define the software necessary to co-process the two image files for delivery. both indicate the divisions of the structural map whose components they process by their structid attributes: the first references the map as a whole because it applies recursively to the mets file itself, the second references the experiment for which it is needed. when delivering a digital object, it is then necessary to process this template mets file to generate the final version used to encode its metadata in full. the xslt used to do this co-processes the template and a separate mets file defined for each object containing all of its relevant metadata: this latter file is used to populate the empty sections of the template, in particular the file section. figure 3 provides an illustration of the xslt fragment that carries out this function.

figure 3. the xslt fragment that populates the file section of the template

the xslt transformation file is invoked with the sample parameter, which contains the number of the sample to be rendered: this is used to generate the filename for the document() function, which selects the relevant mets file containing metadata for the image itself. the <file> element within this file, which corresponds to the required image, is then integrated into the relevant <fileGrp> element in the template file, populating it with its subcomponents, including the <FLocat> element, which contains the location of the file itself.
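in outline, the fragment amounts to an identity transform with one overriding template for the empty file groups; the parameter name, file-naming convention, and matching logic below are illustrative assumptions rather than the project’s actual code:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mets="http://www.loc.gov/METS/">

  <!-- number of the sample to be rendered, passed in at invocation -->
  <xsl:param name="sample"/>

  <!-- identity template: copy the mets template through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- populate each empty file group from the per-sample mets file,
       located via document(); the file-naming convention is illustrative -->
  <xsl:template match="mets:fileGrp">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="document(concat('sample', $sample, '.xml'))//mets:fileGrp[@ID = current()/@ID]/*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

a stylesheet of this shape could be invoked with any command-line xslt 1.0 processor, for example xsltproc --stringparam sample 42 populate.xsl template.xml (the tool and file names are, again, illustrative).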
in the case of the actin_5 experiment, which generates a video file from a sequence of still images, the processes involved are slightly more complicated. because the number of still images to be processed will vary for each sample, it is not possible to specify the template for the delivery version of the sequence explicitly within a <seq> element as is done for the other experiments. instead, it is necessary to define a further mets file (the “sequence file”) in which the sequence for a given sample is defined. in this case, the architecture is shown in figure 4.

figure 4. populating sequentially processed file section with xslt

in this case the <fileGrp> element in the mets template file acts as a placeholder only and does not encode even the skeletal information present for the parallel-processed tiff files in figure 3. similarly, the structural map <div> for this experiment indicates only that this section is a sequence but does not enumerate the files themselves even in template form. both of these sections are populated when the file is processed by the xslt transformation to import metadata from the mets “sequence file,” in which the file inventory (a) and sequential structure (b) for a given sample are listed. the xslt file populates the file section and structural map directly from this file, replacing the skeletal sections in the template with their counterparts from the sequence file. through this relatively simple xslt transformation, the final delivery version of the mets file is readily generated for either content model. this file can itself then be delivered on the fly (for instance, as a fedora disseminator); this is done by using a further xslt file to process the complex digital object components using the mechanism associated with each experiment in the mets behavior section. given the relatively small size of all of the files involved, this processing can be done more quickly than would be possible using a fully aggregated mets approach. in the laboratory environment in particular, where the fast rendering and delivery of these images is needed so as not to impede workflows, this has major advantages. although the project aimed to examine specifically the use of fedora for the delivery of this complex material, and so employed fcm as the basis of its metadata strategy, the technique examined in this article proved itself a viable alternative that made far fewer demands on developer time. the small number of xslt stylesheets required to render and deliver the mets files were written within a few hours; the development time to program the delivery of the rdf-based metadata that formed the fcm required several weeks. processing xml using xslt disseminators in fedora is very fast, and so using this method instead of processing rdf introduces no discernible delays in object delivery. conclusions this approach to delivering complex content appears to offer the benefits of the alternative approaches outlined above in a simpler manner than either currently allows. it offers much greater flexibility than the mets <mptr> element, which can only populate a complete structural map division. when compared to the fcm approach, this strategy, which relies solely on relatively simple xslt transformations for processing the metadata, requires less developer time but offers a similar degree of flexibility of structure and granularity. it also avoids much of the rigidity of the oai-ore approach by not requiring the use of connected rdf graphs, and so frees up the behavior section to define the processing mechanisms needed to deliver these objects. using the intermediary schema technique in this way therefore offers a means of combining the advantages of well-defined, interoperable metadata schemes with the practicalities of delivering digital content in an efficient manner that makes limited demands on development. as such, it represents a viable alternative to the previous attempts to handle complex aggregations within mets discussed above.
however, these systems were also restrictive—especially as the nature of the work began to change—largely in response to the growth of electronic and digital resources, which they were not designed to manage. new library systems—the second (or next) generation—are needed to effectively manage the processes of acquiring, describing, and making available all library resources. this article examines the state of library systems today and describes the features needed in a next-generation ils. the authors also examine some of the next-generation ilss currently in development that purport to meet the changing needs of libraries.

references

1 library of congress, "metadata encoding and transmission standard (mets) official web site," 2011, http://www.loc.gov/standards/mets (accessed august 1, 2011).
2 organisation internationale de normalisation, "iso/iec jtc1/sc29/wg11: coding of moving pictures and audio," 2002, http://mpeg.chiariglione.org/standards/mpeg-21/mpeg-21.htm (accessed august 1, 2011).
3 fedora commons, "the fedora content model architecture (cma)," 2007, http://fedora-commons.org/documentation/3.0b1/userdocs/digitalobjects/cmda.html (accessed december 9, 2011).
4 carl lagoze et al., "fedora: an architecture for complex objects and their relationships," international journal on digital libraries 6, no. 2 (2005): 130.
5 ibid., 127.
6 ibid., 135.
7 ibid.
8 rishi sharma, fedora interoperability review (london: centre for e-research, 2007), http://wwwcache1.kcl.ac.uk/content/1/c6/04/55/46/fedora-report-v1.pdf.3 (accessed august 1, 2011).
9 richard gartner, "intermediary schemas for complex xml publications: an example from research information management," journal of digital information 12, no. 3 (2011), https://journals.tdl.org/jodi/article/view/2069 (accessed august 1, 2011).
10 centre for e-research, "bril," n.d., http://bril.cerch.kcl.ac.uk (accessed august 1, 2011).
11 s. j. derose et al., "what is text, really," journal of computing in higher education 1, no. 2 (1990): 6.
12 digital library federation, "<mets>: metadata encoding and transmission standard: primer and reference manual," 2010, www.loc.gov/standards/mets/metsprimerrevised.pdf, 77 (accessed august 1, 2011).
13 bill ingram, "echo dep mets profile for master mets documents," n.d., http://dli.grainger.uiuc.edu/echodep/mets/drafts/mastermetsprofile.xml (accessed august 1, 2011).
14 susan thomas, "using mets for the preservation and dissemination of digital archives," n.d., www.paradigm.ac.uk/workbook/metadata/mets-altstruct.html (accessed august 1, 2011).
15 library of congress, "mets profiles: metadata encoding and transmission standard (mets) official web site," 2011, http://www.loc.gov/standards/mets/mets-profiles.html (accessed december 6, 2011).
16 open archives initiative, "open archives initiative object reuse and exchange," n.d., www.openarchives.org/ore (accessed december 12, 2011).
17 jerome mcdonough, "aligning mets with the oai-ore data model," jcdl '09: proceedings of the 9th acm/ieee-cs joint conference on digital libraries (new york: association for computing machinery, 2009): 328.
18 ibid., 329.
19 gartner, "intermediary schemas."
20 gary simons, "using architectural processing to derive small, problem-specific xml applications from large, widely-used sgml applications," summer institute of linguistics electronic working papers (chicago: summer institute of linguistics, 1998), www.silinternational.org/silewp/1998/006/silewp1998-006.html (accessed august 1, 2011).

editorial board thoughts: libraries as makerspace? tod colegrove information technology and libraries | march 2013 2

patrick "tod" colegrove (pcolegrove@unr.edu), a lita member, is head of the delamare science & engineering library at the university of nevada, reno.

recently there has been tremendous interest in "makerspace" and its potential in libraries: from middle school and public libraries to academic and special libraries, the topic seems very much top of mind. a number of libraries across the country have been actively expanding makerspace within the physical library and exploring its impact; as head of one such library, i can report that reactions to the associated changes have been quite polarized. those from the supported membership of the library have been uniformly positive, with new and established users as well as principal donors immediately recognizing and embracing its potential to enhance learning and catalyze innovation; interestingly, the minority of individuals who recoil at the idea have been either long-term librarians or library staff members. i suspect the polarization may be more a function of confusion over what makerspace actually is. this piece offers a brief overview of the landscape of makerspace—a glimpse into how its practice can dramatically enhance traditional library offerings, revitalizing the library as a center of learning.

been happening for thousands of years . . .

dale dougherty, founder of make magazine and maker faire, at the "maker monday" event of the 2013 american library association midwinter meeting framed the question simply: "whether making belongs in libraries or whether libraries can contribute to making." more than one audience member may have been surprised when he continued, "it's already been happening for hundreds of years—maybe thousands."1 the o'reilly/darpa makerspace playbook describes the overall goals and concept of makerspace (emphasis added): "by helping schools and communities everywhere establish makerspaces, we expect to build your makerspace users' literacy in design, science, technology, engineering, art, and math. . . . we see making as a gateway to deeper engagement in science and engineering but also art and design. makerspaces share some aspects of the shop class, home economics class, the art studio and science lab. in effect, a makerspace is a physical mashup of these different places that allows projects to integrate these different kinds of skills."2 building users' literacies across multiple domains and a gateway to deeper engagement? surely these are core values of the library; one might even suspect that to some degree libraries have long been makerspace. a familiar example of maker activity in libraries might include digital media: still/video photography and audio mastering and remixing.
the youmedia network, funded by the macarthur foundation through the institute of museum and library services, is a recent example of such an effort aimed at creating transformative spaces; engaged in exploring, expressing, and creating with digital media, youth are encouraged to "hang out, mess around, and geek out." a more pedestrian example is found in supporting users learning computer programming skills for the first time or refreshing them. as recently as the 1980s, the singular option the library had was to maintain a collection of print texts. through the 1990s and into the early 2000s, that support improved dramatically as publishers distributed code examples and ancillary documents on accompanying cd or dvd media, saving the reader the effort of manually typing in code examples. the associated collections grew rapidly, even as the overhead of maintaining and weeding a collection that was ever more rapidly rendered obsolete grew further. today, e-book versions combined with the ready availability of computer workstations within the library, and the rapidly growing availability of web-based tutorials and support communities, render a potent combination that customers of the library can use to quickly acquire the ability to create or "make" custom applications. with the migration of the supporting print collections online, the library can contemplate further support in the physical spaces opened up. open working areas and whiteboard walls can further amplify the collaborative nature of such making; the library might even consider adding popular hardware development platforms to its collection of lendable technology, enabling those interested to check out a development kit rather than purchase their own. after all, in a very real sense that is what libraries do—and have done, for thousands of years: buy sometimes expensive technology tailored to the needs and interests of the local community and make it available on a shared basis.

makerspace: a continuum

along with outreach opportunities, the exploration of how such examples can be extended to encompass more of the interests supported by the library is the essence of the maker movement in libraries. makerspace encompasses a continuum of activity that includes "co-working," "hackerspace," and "fab lab"; the common thread running through each is a focus on making rather than merely consuming. it is important to note that although the terms are often incorrectly used as if they were synonymous, in practice they are very different. a fab lab, for example, is about fabrication: fully realized, it is a workshop designed around the personal manufacture of physical items, typically equipped with computer-controlled equipment such as laser cutters, multiple-axis computer numerical control (cnc) milling machines, and 3d printers. in contrast, a "hackerspace" is more focused on computers and technology, attracting computer programmers and web designers, although interests begin to overlap significantly with the fab lab for those interested in robotics. co-working space is a natural evolution for participants of the hackerspace: a shared working environment offering much of the benefit of the social and collaborative aspects of the informal hackerspace, while maintaining a focus on work.
as opposed to the hobbyist who might be attracted to a hackerspace, co-working space attracts independent contractors and professionals who may otherwise work from home. it is important to note that it is entirely possible for a single makerspace to house all three subtypes and be part hackerspace, fab lab, and co-working space. can it be a library at the same time? to some extent, these activities are likely already ongoing within your library, albeit informally; by recognizing and embracing the passions driving those participating in the activity, the library can become central to the greater community of practice. by serving the community's needs more directly, the library multiplies its opportunities for outreach even as it develops a laser-sharp focus on the needs of that community. depending on constraints and the community of support, the library may also be well served by forming collaborative ties with other local makerspaces; having local partners can dramatically improve the options available to the library in day-to-day practice and better inform the library as it takes well-chosen incremental steps. with hackerspace/co-working/fab lab resources aligned with the traditional resources of the library, engagement with one can lead naturally to the other in an explosion of innovation and creativity.

renaissance

in addition to supporting the work of the solitary reader, "today's libraries are incubators, collaboratories, the modern equivalent of the seventeenth-century coffeehouse: part information market, part knowledge warehouse, with some workshop thrown in for good measure."3 consider some of the transformative synergies that are already being realized in libraries experimenting with makerspace across the country:

• a child reading about robots able to go hands-on with robotics toolkits, even borrowing the kit for an extended period of time along with the book that piqued the interest; surely such access enables the child to develop a powerful sense of agency from early childhood, including a perception of self as being productive and much more than a consumer.

• students or researchers trying to understand or make sense of a chemical model or novel protein strand able not only to visualize and manipulate the subject on a two-dimensional screen, but to relatively quickly print a real-world model and tangibly explore the subject from all angles.

• individuals synthesizing knowledge across disciplinary boundaries able to interact with members of communities of practice in a non-threatening environment; learning, developing, and testing ideas—developing rapid prototypes in software or physical media, with a librarian at the ready to assist with resources and dispense advice regarding intellectual property opportunities or concerns.

the american library association estimates that as of this printing there are approximately 121,169 libraries of all kinds in the united states today; if even a small percentage recognize and begin to realize the full impact that makerspace in the library can have, the future looks bright indeed.

references

1. dale dougherty, "the new stacks: the maker movement comes to libraries" (presentation at the midwinter meeting of the american library association, seattle, washington, january 28, 2013), http://alamw13.ala.org/node/10004.
2. michele hlubinka et al., makerspace playbook, december 2012, accessed february 13, 2013, http://makerspace.com/playbook.

3. alex soojung-kim pang, "if libraries did not exist, it would be necessary to invent them," contemplative computing, february 6, 2012, http://www.contemplativecomputing.org/2012/02/if-libraries-did-not-exist-it-would-be-necessary-to-invent-them.html.

president's message: moving forward karen j. starr information technology and libraries | september 2010 102

lita committees and interest groups are being asked to step up to the table and develop action plans to implement the strategies the lita membership has identified as crucial to the association's ongoing success. members of the board are liaisons to each of the committees, and there is a board liaison to the interest groups. these individuals will work with committee chairs, interest group chairs, and the membership to implement lita's plan for the future. the committee and interest group chairs are being asked to contribute those action plans by the 2011 ala midwinter meeting. they will be compiled and made available to all lita and ala members for their use through the lita website (http://lita.org) and ala connect (http://connect.ala.org).

what is in it for you? lita is known for its leadership opportunities, continuing education, training, publications, expertise in standards and information policy, and knowledge and understanding of current and cutting-edge technologies. lita provides you with opportunities to develop leadership skills that you can use in your job and lifelong career. the skills of working within a group of individuals to implement a program, influence standards and policy, collaborate with other ala divisions, and publish can be taken home to your library. your participation documents your value as an employee and your commitment to lifelong learning. in today's work environment, employers look for staff with proven skills who have contributed to the good of the organization and the profession. lita needs your participation in developing and implementing continuing education programs, publishing articles and books, and illustrating by your actions why others want to join the association. how can you do that? volunteer for a committee, help develop a continuing education program, write an article, write a book, be a role model for others with your lita participation, and recruit. what does your association gain? a solid structure to support its members in accomplishing the mission, vision, and strategic plan they identified as core for years to come. look for opportunities to participate and develop those skills. we will be working with committee and interest group chairs to develop meeting-management tool kits over the next year, create opportunities to participate virtually, identify emerging leaders of all types, collaborate with other divisions, and provide input on national information policy and standards through ala's office for information technology policy and other similar organizations. if you want to be involved, be sure to let lita committee and interest group chairs, the board, and your elected officers know.

cloud computing. web 3.0 or the semantic web. google editions. books in copyright and books out of copyright. born digital.
digitized material. the reduction of stanford university's engineering library book collection by 85 percent. the publishing paradigm most of us know, and have taken for granted, has shifted. online databases came, and we managed them. then cd-roms showed up and mostly went away. and along came the internet, which we helped implement, use, and now depend on. how we deal with the current shifts happening in information and technology during the next five to ten years will say a great deal about how the library and information community reinvents itself for its role in the twenty-first century. this shift is different, and it will create both opportunities and challenges for everyone, including those who manage information and those who use it.

as a reflection of the shifts in the information arena, lita is facing its own challenges as an association. it has had a long and productive role in the american library association (ala) dating back to 1966. the talent among the association members is amazing, solid, and a tribute to the individuals who belong to and participate in lita. lita's members are leaders to the core and recognized as standouts within ala as they push the edge of what information management means, and can mean. for the past three years, lita members, the board, and the executive committee have been working on a strategic plan for lita. that process has been described in michelle frisque's "president's message" (ital v. 29, no. 2) and elsewhere. the plan was approved at the 2010 ala annual conference in washington, d.c. a plan is not cast in concrete. it is a dynamic, living document that provides the fabric that drives the association. why is this process important now more than ever? we are all dealing with the current recession. libraries are retrenching. people face challenges participating in the library field on various levels. the big information players on the national and international level are changing the playing field. as members, each of us has an opportunity to affect the future of information and technology locally, nationally, and internationally. this plan is intended to ensure lita's role as a "go to" place for people in the library, information, and technology fields well into the twenty-first century. karen j. starr (kstarr@nevadaculture.org) is lita president 2010–11 and assistant administrator for library and development services, nevada state library and archives, carson city.

detection of information requirements of researchers using bibliometric analyses to identify target journals vadim nikolaevich gureyev, nikolai alekseevich mazov information technology and libraries | december 2013 66

abstract

bibliometric analyses were used to identify journals that are representative of the authors' research institutes. methods to semiautomatically collect data on an institute's publications and the journals they cite are described. citation analyses of lists of articles and their citations can help librarians to quickly identify the preferred journals in terms of the number of publications, and the most frequently cited journals. librarians can use these data to generate a list of journals to which an institute should subscribe.

background

recent developments in information technology have had a significant impact on the research activities of scientific libraries. such tools have provided new insights into the workload and duties of librarians in research libraries.
in the present study, we performed bibliometric analyses to determine the information needs of researchers and to determine whether they are satisfied with the journal subscriptions available at their institutes. such analyses are important because of limited funding for subscriptions, increases in the cost of electronic resources, and the publication of new journals, especially open-access journals. bibliometric analyses are more accessible and less labor-intensive when using specially designed web services and software. several databases of citation data are accessible online. the leading publishers of these databases, including thomson reuters and elsevier, promote their products, such as the web of science (wos) and scopus, with travelling and online seminars to increase the number of skilled users. of note, the number of articles devoted to bibliometric analysis has increased about 4-fold since 2000 (see figure 1).

vadim nikolaevich gureyev (gureyev@vector.nsc.ru) is leading bibliographer, information analysis department, state research center of virology and biotechnology vector, novosibirsk, russia. nikolai alekseevich mazov (mazovna@ipgg.sbras.ru) is head of the information and library center, trofimuk institute of petroleum geology and geophysics sb ras, novosibirsk, russia.

figure 1. growth of publications devoted to informetric analysis. data were generated from the wos using the following request: «topic=((bibliometric* or informetric* or webometric* or scientometric*) and (stud* or analys*))».

bibliometric analysis appears to be the most objective method for use by librarians; it shows high objectivity even when compared with peer review.1 citation analysis can be used to select target journals because it accurately reflects the needs of researchers and can reveal current scientific trends. it also allows librarians to evaluate the effectiveness of each journal, the significance of each journal to the institute, and the minimum archival depth.2 citation analysis is particularly useful when generating a list of journals for subscription and when deciding whether to subscribe to specific journals.3 in the present study, we performed citation analyses of scientific papers that were published by researchers at src vb "vector" (biology and medicine) and ipgg sb ras (geosciences). we analyzed the groups of journals that published articles from these two institutes and compared the characteristics of the cited and citing journals. many journals publish articles covering the fields associated with the two institutes (biology and medicine, and geosciences), and journals in these fields tend to have the highest impact factors of all fields. therefore, the methods applied in this study and the results may be generalized to other research libraries.

study design

sources. we analyzed articles published in journals or books by researchers at src vb "vector" and ipgg, together with the references cited in these articles. we limited the articles to those published in 2006–2010 (ipgg) or 2007–2011 (src vb "vector"). we did not analyze monographs, theses, or conference proceedings (including those that were published in journals), because our aim was to optimize the list of subscribed journals.
to collect comprehensive data regarding these publications, we used four overlapping sources. (1) the russian science citation index (sci) was used to retrieve articles based on the profile of each researcher; the "bound and unbound publications in one list" option was switched off. (2) thomson reuters sci expanded was used to examine the profile of each researcher; the "conference proceedings citation index" option was switched off. (3) scopus was used to retrieve the publications of each researcher. (4) each head of department provided us with the articles published by each member of the research group within the last five years.

along with publications in which the affiliation was clearly stated, we also analyzed articles in which the authors' affiliation was not stated, in which the authors reported a superior organization such as a governmental ministry, or in which authors from either institute attributed the work to another affiliation (if they worked at two or more organizations). the translated and original versions of the same article were treated as a single article, and the english version was used in our analyses. for journals that published the original russian article and an english translation, we analyzed the latter.

citations. citations from the published articles were analyzed to identify the most frequently cited journals. we ignored references that lacked sufficient information and references included in footnotes. cited monographs, theses, and conference proceedings (including those that were published in journals) were also ignored. for citations published in russian with an english translation, we analyzed the translated version, even if the authors originally cited the russian version. we preferred the translated versions because they are included in the wos database and can be processed automatically. for example, the wos indexes articles from russian geology and geophysics (print issn 1068-7971) but not the russian-language version, geologiya i geofizika (print issn 0016-7886). journals that had been renamed were treated as one journal, and the current/most recent name was used in the analysis. however, journals that had split into multiple titles were analyzed separately, and the journal's name at the time the cited article was published was used in the analysis. for this study, we first retrieved the journal name and the year the cited article was published. we then expanded on this information by recording the journal publisher, journal accessibility (i.e., subscription method, paper or electronic), open/pay-per-view access, embargo status, and journal length. we ignored the accessibility of individual articles that had been deposited on an author's personal website or in an institutional repository.

results and discussion

table 1 summarizes the publication activities of both institutes.

a.

year   number of articles   russian sci* (%)   wos (%)   scopus* (%)   nowhere indexed (%)   number of journals**
2007   118                  94.9               28.8      54.2          5                     66
2008   84                   96.4               41.6      51.1          3.5                   57
2009   82                   97.5               39        52.4          2.4                   58
2010   100                  94.0               41        61            6                     60
2011   105                  91.4               25.7      55.2          8.5                   50
b.

year   number of articles   russian sci* (%)   wos (%)   scopus* (%)   nowhere indexed (%)   number of journals**
2007   188                  79.8               43.1      43.1          21                    82
2008   218                  96.8               39.4      41.7          3                     88
2009   259                  93.0               39.0      37.8          7                     87
2010   250                  84.4               31.2      29.6          5                     102
2011   267                  70.4               30.4      30.0          29                    97

*the russian sci and scopus indexed some articles twice, particularly those published in russian with an english translation; therefore, some articles have different timelines and citations. these duplicates were analyzed as one article.
**number of journals in this field, excluding translated journals.

table 1. publication activity and the presence of articles in the main bibliographic databases in the fields of biomedicine (a; 2007–2011) and geoscience (b; 2007–2011).

table 1 shows that the two institutes have a stable publication history relative to other russian scientific institutes in terms of publication activity, publishing approximately 150 articles per year. therefore, our results can be generalized to other institutes in these fields. collecting this information may seem a daunting task, especially for librarians who have not conducted such analyses before. we used three databases and contacted the heads of department directly. however, our data indicate that it is sufficient to use the free-of-charge russian sci, an extensive index of russian scientific articles that includes almost all of the articles published by russian researchers in russian and international journals. nevertheless, it is essential to review the profile of each author. when searching for articles by affiliation, the proportion of articles retrieved ranged from 28% to 51%, and the number of publications retrieved tended to decrease over time. this phenomenon may be caused by a deficient system for identifying affiliations, owing to differences in the spelling of the affiliation name (in our case, more than 70 variants have been used), attribution of the research to a superior organization, and the fact that two or more affiliations may have the same name.4 furthermore, recent studies1,5 confirmed that information about authors should be collated by their affiliations, rather than by performing searches in bibliographic databases.

it seems paradoxical that the wos and scopus databases index russian articles more quickly than the russian sci: subscribing to the same print and electronic journals, we noted that print editions are published before electronic ones. nevertheless, this seems reasonable based on this 2-year retrospective analysis. therefore, routine analysis of russian articles can be partly automated by efficient searches of the russian sci.

table 2 presents the citation details.

citing year   number of references   number of cited journals   average number of references per article
2007          1830                   492                        15.5
2008          1354                   472                        16.1
2009          1536                   558                        18.7
2010          1591                   471                        15.9
2011          1613                   484                        15.4

table 2. number of cited articles, cited journals, and mean number of references per article in the biomedical field.

references from articles not indexed in wos were manually extracted, which takes time and effort. references from articles indexed in wos, including russian articles translated into english, were analyzed semiautomatically. for this purpose, we used endnote software developed by thomson reuters; endnote web is a free alternative that could also be used. the references cited in each article were exported into endnote and then arranged according to the chosen parameters to simplify our analyses.
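where endnote's xml export is available, this arranging step can also be scripted. the following xslt 1.0 sketch counts citations per journal using muenchian grouping; it assumes (this is an assumption, not the authors' documented workflow) that each exported record carries its journal title in titles/secondary-title:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- index every exported record by its journal title -->
  <xsl:key name="by-journal" match="record" use="titles/secondary-title"/>

  <xsl:template match="/">
    <!-- visit one representative record per journal, most cited first -->
    <xsl:for-each select="//record[generate-id() =
        generate-id(key('by-journal', titles/secondary-title)[1])]">
      <xsl:sort select="count(key('by-journal', titles/secondary-title))"
          data-type="number" order="descending"/>
      <xsl:value-of select="titles/secondary-title"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(key('by-journal', titles/secondary-title))"/>
      <xsl:text> citations&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

run over each year's export, a stylesheet of this shape yields ranked per-journal citation counts of the kind summarized in tables 3 and 4.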
of note, about 35% of the russian articles indexed in wos accounted for 80% of all the references cited in the articles. two possible reasons for this are (1) the greater number of articles published in translated and international journals, and (2) russian researchers' adoption of the western citing culture.6 this finding suggests that it is possible to avoid labor-intensive routine work and to use the automated services developed by thomson reuters to collate and analyze up to 80% of all references.

the authors of articles in the geosciences field cited 1,000 journals, including 750 western journals and 250 russian journals. the biomedical articles cited 1,339 journals, of which 1,168 were western and 171 russian. we analyzed about 8,000 references cited by authors from each institute over five years. the references were divided into three equal groups. the most frequently cited russian journals and book series are listed in table 3.

biological and medical sciences

journal/book series                                               percent of references   cumulative total (%)
problems of virology                                              16.94                   16.94
molecular biology                                                 6.44                    23.38
biotechnology in russia                                           6.07                    29.45
doklady biological sciences                                       5.09                    34.54
atmospheric and oceanic optics                                    4.42                    38.96
annals of the russian academy of medical sciences                 4.04                    43
journal of microbiology, epidemiology and immunobiology           3.82                    46.82
molecular genetics microbiology and virology                      2.92                    49.74
bulletin of experimental biology and medicine                     2.77                    52.51
russian journal of bioorganic chemistry                           2.69                    55.2
problems of tuberculosis                                          2.62                    57.82
biochemistry (moscow)                                             1.79                    59.61
pharmaceutical chemistry journal                                  1.72                    61.33
infectious diseases                                               1.57                    62.9
bulletin siberian branch of russian academy of medical sciences   1.2                     64.1
russian journal of genetics                                       1.12                    65.22

geosciences

journal/book series                                               percent of references   cumulative total (%)
russian geology and geophysics                                    34.99                   34.99
doklady earth sciences                                            18.18                   53.17
geochemistry international                                        6.49                    59.66
petrology                                                         2.72                    62.38
geotectonics                                                      2.55                    64.93
geology of ore deposits                                           2.23                    67.16
national geology                                                  1.98                    69.14
stratigraphy and geological correlation                           1.96                    71.1
izvestiya. physics of the solid earth                             1.55                    72.65
proceedings of all-union mineralogic society                      1.45                    74.1
bulletin of the russian academy of sciences: geology              1.42                    75.52
lithology and mineral resources                                   1.42                    76.94
oil and gas geology                                               1.26                    78.2
russian journal of pacific geology                                0.82                    79.02
chemistry for sustainable development                             0.69                    79.71
physics of the solid state                                        0.68                    80.39

table 3. characteristics of the 16 most frequently cited russian journals and the journals in the second group, listed in order of number of citations. the journals in the shaded region of the original table account for one-third of all citations. the translated titles of each journal, or the official translated titles for journals without translated variants, are given.

table 3 shows that two-thirds of all citations were published in only 9% (16/171) of the cited russian biomedical journals. this statistic is even more pronounced in the field of geosciences, as 6% (16/250) of russian journals published 80% of the cited articles.
comparing the data between the two institutes, it is notable that the results are consistent. the only difference evident to us is that the geoscience researchers tended to cite more russian journals, whereas biomedical researchers preferred to cite the international literature. the greater concentration of citations in select journals in the geosciences field can be explained by the smaller number of citations. in the biomedical field, we observed a strong trend toward abundant citation, resulting in a wider distribution of citations in each article; the journals with the highest impact factors in biology and medicine confirmed our observation. figures 2 and 3 show the correlations between citations and publication activity in russian journals.

figure 2. correlation between publication activity (red) and citations (blue) in the biomedical field (in %) for the data shown in table 3. timescale: 2007–2011.

figure 3. correlation between publication activity (red) and citations (blue) in the geosciences field (in %) for the data shown in table 3. timescale: 2006–2010.

the citing and cited journals are often the same journals, and publication activity is highly correlated with citation activity. this is more apparent in the geosciences field, where russian geology and geophysics is the most frequently cited journal, as it published about two-thirds of all cited articles. this is unsurprising because it is published by our institute and is the main multidisciplinary russian journal in the field of geosciences. the most frequently cited international journals are listed in table 4.

biological and medical sciences

journal/book series                                               percent of references   cumulative total (%)
journal of virology                                               6.03                    6.03
proceedings of the national academy of sciences of the usa        3.36                    9.39
virology                                                          3.15                    12.54
vaccine                                                           2.77                    15.31
journal of biological chemistry                                   2.4                     17.71
journal of general virology                                       2.4                     20.11
nature                                                            2.04                    22.15
science                                                           1.94                    24.09
journal of clinical microbiology                                  1.94                    26.03
emerging infectious diseases                                      1.89                    27.92
nucleic acids research                                            1.59                    29.51
journal of infectious diseases                                    1.38                    30.89
journal of molecular biology                                      1.35                    32.24
journal of immunology                                             1.24                    33.48
journal of medical virology                                       1.19                    34.67
virus research                                                    0.86                    35.53
new england journal of medicine                                   0.86                    36.39
archives of virology                                              0.83                    37.22
antiviral research                                                0.75                    37.97
lancet                                                            0.73                    38.7
cell                                                              0.65                    39.35
applied and environmental microbiology                            0.6                     39.95
biochemistry                                                      0.59                    40.54
journal of experimental medicine                                  0.59                    41.13
febs letters                                                      0.56                    41.69

geosciences

journal/book series                                               percent of references   cumulative total (%)
earth planetary science letters                                   6.46                    6.46
geochimica et cosmochimica acta                                   6.28                    12.74
contributions to mineralogy and petrology                         5.67                    18.41
journal of geophysical research                                   4.9                     23.31
nature                                                            3.67                    26.98
american mineralogist                                             3.53                    30.51
journal of petrology                                              3.22                    33.73
lithos                                                            2.58                    36.31
chemical geology                                                  2.29                    38.6
geology                                                           2.01                    40.61
tectonophysics                                                    1.94                    42.55
economic geology                                                  1.93                    44.48
science                                                           1.87                    46.35
journal of crystal growth                                         1.56                    47.91
canadian mineralogist                                             1.48                    49.39
russian geology and geophysics                                    1.35                    50.74
european journal of mineralogy                                    1.32                    52.06
geophysics                                                        1.02                    53.08
geophysical research letters                                      1.02                    54.1
journal of metamorphic geology                                    0.98                    55.08
journal of geology                                                0.93                    56.01
international geology review                                      0.91                    56.92
physical review. ser. b                                           0.9                     57.82
precambrian research                                              0.9                     58.72
mineralogical magazine                                            0.88                    59.6

table 4. characteristics of the 25 most frequently cited international journals and the journals within the second group, listed in terms of number of citations. the journals in the shaded area of the original table account for one-third of all citations.

the distribution of citations to international journals was similar to that observed for russian journals, with a greater citation density in journals in the geosciences field. notably, in the geosciences field two-thirds of all citations were to articles published in just 25 journals; in the biomedical field, two-thirds of all citations were to articles published in 100 journals. only 1.3% (15/1,168) of the cited journals contained one-third of the cited articles in the biomedical field; the corresponding value for journals in the geosciences field was 0.9% (7/750). the correlations between citation activity and publication activity are shown in figures 4 and 5.

fig. 4. correlation between publication activity (red) and citations (blue) for biomedical journals (in %) for the data shown in table 4. timescale: 2007–2011.

fig. 5. correlation between publication activity (red) and citations (blue) for journals in the geosciences field (in %) for the data shown in table 4. timescale: 2006–2010.

as illustrated in figures 4 and 5, the distribution of citations to international journals was broader than for russian journals, where there are only one to four frequently cited journals. this is probably because there are fewer russian journals than international journals. figures 4 and 5 also reveal a difference between the two disciplines: geoscience researchers published their articles in top-cited international journals, whereas biomedical researchers rarely published their research in highly cited journals. this may be due to the greater number of biomedical journals or to a lower rate of publication, because relatively few articles were published in the major multidisciplinary journals, such as nature or science, or in specialized journals, such as the journal of virology.

conclusion

citation analysis enabled rapid identification of the most frequently cited journals that are essential to academic researchers. in the biomedical field, we found that 16 russian and 100 international journals published two-thirds of all cited articles in the last five years. in the field of geosciences, we identified 4 russian and 25 international journals that were essential to researchers in this field. interestingly, there were four times as many essential russian and international journals in the biomedical field as in the geosciences field. the journals that published the researchers' articles partially correlated with the cited journals in the geosciences field, but this correlation was less obvious for biomedical journals. it is important to note that all aspects of this study were performed by librarians using tools that were available in both institutes; we did not require any additional facilities or the assistance of any researchers. we believe our method is one of the most objective and accessible approaches available to scientific libraries for selecting target journals. we used our results to optimize our list of subscribed periodicals.
in addition to journal acquisition, our methods and results may be applied to other tasks performed by research libraries. for example, it is possible to study the citing and cited half-lives of journals and compare the results with those reported in the journal citation reports. this allows researchers in specific institutes to determine whether they are citing cutting-edge or obsolete literature in their studies. the results can also be used to determine whether the subjects of the cited articles are relevant to the institute's field of research. finally, the results can be used to compare the list of the most frequently cited international journals within a particular field with the list of journals that are most frequently cited by a research institute.

perspectives

in this study, we revealed some differences in the correlation between citing and cited journals in two distinct fields, namely geosciences and biomedical science. notably, this correlation was greater for journals in the geosciences field. to determine the factors underlying this phenomenon, it will be interesting to extend our study to a greater number of disciplines. it will also be interesting to compare the data for cited journals with their usage statistics.

references

1. a.f.j. van raan, "the use of bibliometric analysis in research performance assessment and monitoring of interdisciplinary scientific developments," technikfolgenabschätzung – theorie und praxis 1, no. 12 (2003): 20–29.

2. n.a. slashcheva, yu.v. mokhnacheva, and t.n. kharybina, "study of information requirement of scientists from pushchino scientific center ras in central center library" (2008), http://dspace.nbuv.gov.ua:8080/dspace/bitstream/handle/123456789/31392/20-slascheva.pdf?sequence=1 (accessed january 21, 2013).

3. nikolay a. mazov, "estimation of a flow of scientific publications in research institute on the basis of bibliometric citation analysis," information technologies in social researches, no. 16 (2011): 25–30.

4. leo egghe and ronald rousseau, "citation analysis," in introduction to informetrics: quantitative methods in library, documentation and information science (amsterdam: elsevier science publishers, 1990): 217–218.

5. "bibliometrics: publication analysis as a tool for science mapping and research assessment" (2008), http://ki.se/content/1/c6/01/79/31/introduction_to_bibliometrics_v1.3.pdf (accessed january 21, 2013).

6. a.e. warshawsky and v.a. markusova, "estimation of efficiency of russian scientists should be corrected" (2009), http://strf.ru/organization.aspx?catalogid=221&d_no=17296 (accessed january 21, 2013).
tending a wild garden: library web design for persons with disabilities r. todd vandenbark

r. todd vandenbark (todd.vandenbark@utah.edu) is web services librarian, eccles health sciences library, university of utah, salt lake city.

nearly one-fifth of americans have some form of disability, and the accessibility guidelines and standards that apply to libraries are complicated, unclear, and difficult to achieve. understanding how persons with disabilities access web-based content is critical to accessible design. recent research supports the use of a database-driven model for library web development. existing technologies offer a variety of tools to meet disabled patrons' needs, and resources exist to assist library professionals in obtaining and evaluating product accessibility information from vendors. librarians in charge of technology can best serve these patrons by proactively updating and adapting services as assistive technologies improve.

in march 2007, eighty-two countries signed the united nations' convention on the rights of persons with disabilities, including canada, the european community, and the united states.
the convention’s purpose was “to promote, protect and ensure the full and equal enjoyment of all human rights and fundamental freedoms by all persons with disabilities, and to promote respect for their inherent dignity.”1 among the many provisions for assuring respect and equal treatment of people with disabilities (pwd) under the law, signatories agreed to take appropriate measures:

(g) to promote access for persons with disabilities to new information and communications technologies and systems, including the internet; and

(h) to promote the design, development, production and distribution of accessible information and communications technologies and systems at an early stage, so that these technologies and systems become accessible at minimum cost.

in addition, the convention seeks to guarantee equal access to information by doing the following:

(c) urging private entities that provide services to the general public, including through the internet, to provide information and services in accessible and usable formats for persons with disabilities; and

(d) encouraging the mass media, including providers of information through the internet, to make their services accessible to persons with disabilities.2

because the internet and its design standards are evolving at a dizzying rate, it is difficult to create websites that are both cutting-edge and standards-compliant. this paper evaluates the challenge of web design as it relates to individuals with disabilities, explores current standards, and offers recommendations for accessible development. examining the provision of it for this demographic is vital: according to the u.s. census bureau, the u.s. public includes about 51.2 million noninstitutionalized people living with disabilities, 32.5 million of whom are severely disabled. this means that nearly one-fifth of the u.s. public faces some physical, mental, sensory, or other functional impairment (18 percent in 2002).3 because a library’s mandate is to make its resources accessible to everyone, it is important to attend to the special challenges faced by patrons with disabilities and to offer appropriate services with those special needs in mind.

current u.s. regulations, standards, and guidelines

in 1990 congress enacted the americans with disabilities act (ada), the first comprehensive legislation mandating equal treatment under the law for pwd. the ada prohibits discrimination against pwd in employment, public services, public accommodations, and telecommunications. title ii of the ada mandates that all state governments, local governments, and public agencies provide access for pwd to all of their activities, services, and programs. since school, public, and academic libraries are under the purview of title ii, they must “furnish auxiliary aids and services when necessary to ensure effective communication.”4 though the law predates widespread use of the internet, its intent points toward the adoption and adaptation of appropriate technologies to allow persons with a variety of disabilities to access electronic resources in the way that is most effective for them.

changes to section 508 of the 1973 rehabilitation act enacted in 1998 and 2000 introduced the first standards for “accessible information technology recognized by the federal government.”5 many state and local governments have since passed laws applying the standards of section 508 to government agencies and related services.
according to the access board, the independent federal agency charged with assuring compliance with a variety of laws regarding services to pwd, information and communication technology (ict)

includes any equipment or interconnected system or subsystem of equipment that is used in the creation, conversion, or duplication of data or information. the term electronic and information technology includes, but is not limited to, telecommunications products (such as telephones), information kiosks and transaction machines, world wide web sites, multimedia, and office equipment such as copiers and fax machines.6

the access board further specifies guidelines for “web-based intranet and internet information and applications,” which are directly relevant to the provision of such services in libraries.7 what follows is a detailed examination of these standards, with examples to assist in understanding and implementation.

(a) a text equivalent for every non-text element shall be provided.

assistive technology cannot yet describe what pictures and other images look like; it requires meaningful text-based information associated with each picture. if an image directs the user to do something, the associated text must explain the purpose and meaning of the image. this way, someone who cannot see the screen can understand and navigate the page successfully. this is generally accomplished by using the “alt” and “longdesc” attributes for images. used improperly, however, these aids can clutter a page. the current versions of the most popular screen-reader software do not limit the amount of “alt” text they can read, but freedom scientific’s jaws 6.x divides the “alt” attribute into distinct chunks of 125 characters each (excluding spaces) and reads them separately as if they were separate graphics.8 this can be confusing to the end user. longer content can be put into a separate text file that is linked to using the “longdesc” attribute.

when a page contains audio or video files, a text alternative needs to be provided. for audio files such as interviews, lectures, and podcasts, a link to a transcript of the audio file must be immediately available. for video clips such as those on youtube, captions must accompany the clip.
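as a concrete illustration of subparagraph (a), the markup below pairs an image with a short “alt” text and a linked longer description, and pairs an audio file with an immediately available transcript link; the file names and wording are invented for this sketch:

    <!-- hypothetical image: short text equivalent plus a linked long description -->
    <img src="storytime.jpg"
         alt="children gathered for storytime in the reading room"
         longdesc="storytime-description.html">

    <!-- hypothetical podcast: the transcript link sits next to the audio link -->
    <a href="lecture.mp3">listen to the lecture (mp3)</a>
    <a href="lecture-transcript.html">read a transcript of the lecture</a>

a screen reader announces the “alt” text in place of the image; keeping that text well under the 125-character chunk size noted above avoids having it read as several separate graphics.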
(b) equivalent alternatives for any multimedia presentation shall be synchronized with the presentation.

this means that captions for video must be real-time and synchronized with the actions in the video, not contained solely in a separate transcript.

(c) web pages shall be designed so that all information conveyed with color is also available without color, for example from context or markup.

while color can be used, it cannot be the sole source or indicator of information. imagine an educational website offering a story problem presented in black and green print, in which the answer to the problem can be deciphered only from the green letters. this would be inaccessible to students who have certain forms of color-blindness as well as to those who use screen-reader software.

(d) documents shall be organized so they are readable without requiring an associated style sheet.

the introduction of cascading style sheets (css) can improve accessibility because css allows the separation of presentation from content. however, not all browsers fully support css, so webpages need to be designed so that any browser can read them accurately. the content needs to be organized so that it can be read and understood with css formatting turned off.

(e) redundant text links shall be provided for each active region of a server-side image map, and (f) client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape.

an image map can be thought of as a geometrically defined and arranged group of links to other content on a site; a clickable map of the fifty u.s. states is a familiar example. a server-side image map appears to a screen reader only as a set of coordinates, whereas client-side maps can include information about where each link leads through “alt” text. the best practice is to use only client-side image maps and to make sure the “alt” text is descriptive and meaningful.

(g) row and column headers shall be identified for data tables, and (h) markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers.

correct table coding is critical. each table should use the table’s “summary” attribute to provide a meaningful description of its content and arrangement, headers should be coded using the table header (“th”) tag, and the “scope” attribute should specify whether a header applies to a row or a column.
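a minimal sketch of such coding, with invented data, might look like this:

    <!-- hypothetical data table: the summary describes its content and arrangement -->
    <table summary="computer-lab hours for each branch library, one branch per row">
      <tr>
        <th scope="col">branch</th>
        <th scope="col">weekday hours</th>
        <th scope="col">weekend hours</th>
      </tr>
      <tr>
        <th scope="row">main library</th>
        <td>9 a.m.-9 p.m.</td>
        <td>10 a.m.-6 p.m.</td>
      </tr>
    </table>

with headers marked up this way, screen-reader software can announce the relevant row and column headers as the user moves from cell to cell, preserving the table’s logic for someone who cannot see its layout.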
if the table’s content is complex, it may be necessary to provide an alternative presentation of the information. it is best to rely on css for page layout, taking into consideration the directions in subparagraph (d) above.

(i) frames shall be titled with text that facilitates frame identification and navigation.

frames are a deprecated feature of html, and their use should be avoided in favor of css layout.

(j) pages shall be designed to avoid causing the screen to flicker with a frequency greater than 2 hz and lower than 55 hz.

lights with flicker rates in this range can trigger epileptic seizures. blinking or flashing elements on a webpage should be avoided until browsers provide the user with the ability to control flickering.

(k) a text-only page, with equivalent information or functionality, shall be provided to make a web site comply with the provisions of this part, when compliance cannot be accomplished any other way. the content of the text-only page shall be updated whenever the primary page changes.

complex content that is entirely visual in nature may require a separate text-only page, such as a page showing the english alphabet in american sign language. this requirement also serves as a stopgap measure for existing sites that require reworking for accessibility. some consider this approach to be the web’s version of separate-but-equal services, however, and argue it should be avoided.9 offering a text-only alternative site can increase the sense of exclusion that pwd already feel. such versions of a website also tend not to be equivalent to the parent site, leaving out promotions or advertisements. finally, a text-only version increases the workload of web development staff, making it more costly than creating a single, fully accessible site in the first place.

(l) when pages utilize scripting languages to display content, or to create interface elements, the information provided by the script shall be identified with functional text that can be read by assistive technology.

scripting languages such as javascript allow for more interactive content on a page while reducing the number of times the computer screen needs to be refreshed. if functional text is not available, the screen reader attempts to read the script’s code, which outputs as a meaningless jumble of characters. using redundant text links avoids this result.

(m) when a web page requires that an applet, plug-in, or other application be present on the client system to interpret page content, the page must provide a link to a plug-in or applet that complies with [subpart b: technical standards] §1194.21(a) through (l).

web developers need to ascertain whether a given plug-in or applet is accessible before requiring their webpage’s visitors to use it. when using applications such as quicktime or realaudio, it is important to provide an accessible link on the same page that allows users to install the necessary plug-in.

(n) when electronic forms are designed to be completed on-line, the form shall allow people using assistive technology to access information, field elements, and functionality required for completion and submission of the form, including all directions and cues.

if scripts used in the completion of the form are inaccessible, an alternative method of completing the form must be made immediately available. each element of a form needs to be labeled properly using the