On the Recursive Definition of a Format for Communication

Leonid N. Sumarokov: Head, Research Department, International Center for Scientific Information, Moscow, USSR

A recursive presentation of a communication format is discussed and a form of pertinent notation proposed. Recursive notation permits presentation of an interchange format in more general terms than heretofore published, and expands application possibilities.

The development of the forms of exchange of information among documentation systems, and particularly the development of the technique of recording machine-readable bibliographic data on magnetic tape, has led to the requirement for the adoption of an agreement on a standard for a format for communication. Thus, the problem of a format for communication reflects the existing tendency toward ensuring compatibility among formats. At the present time the greatest impact on world information practice has been caused by the American National Standards Institute (ANSI) standard for bibliographic information interchange on magnetic tape (1) and the several implementations of that standard: MARC, INIS, COSATI, and others. It should be noted that, despite numerous existing peculiarities, in principle there is no difference in structure among the formats.

One of the most important requisites for a communication format is universality. The practice of processing large quantities of information has emphasized the flexibility of the above-mentioned formats; their use has permitted identification of huge numbers of documentary materials in various forms, thereby creating the impression that the structure of the format has been developed to such an extent that it can be canonized for any application. It must be said that support or rejection of this impression can be based only upon future experience in the application of a communication format. Nevertheless, it appears expedient to generalize about the structure of a communication format by making a few preliminary remarks and thereby contributing toward expanding the sphere of its application.

The remarks deal with the following. In the existing systems for interchanging information on magnetic tape, the document is the object of identification. With the development of data banks, the characteristics of the objects to be identified may prove to be so varied, even though presented in the proper documentary form, that their uniform presentation will cause difficulties. (Actually, examples can be given of data banks in which data appear in the capacity of objects: information regarding firms, rivers, products of the electrical engineering industry, etc.) Furthermore, even if it is possible to identify in principle a certain object with the aid of the format, one must distinguish between the question of possible identification in principle and that of the optimal (or rational) form of identification in view of the limitations of a certain system.

The recursive notation of a communication format is presented below. Certain definitions and ideas in general are used as source material for such a notation, following the American standard for bibliographic information interchange on magnetic tape (1).
It must be conceded that the use of one term or another for defining individual elements of a notation, as well as the general structure of the entire notation, is not the principal subject of discussion here; this means that any change, either in definition or, to a certain extent, in the structure of the notation, will not affect the proposed form of the notation. Consequently, this article does not pretend to describe a certain universal structure for a communication format. It has a different purpose, viz., to point out the wider perspectives that unfold by applying a recursive presentation of notations in formats, admitting an object of any hierarchical depth.

For the following symbols, explanations can be found in the ANSI standard (1):

R = record
L = leader
DR = directory
T = tag
D = data, or data elements
FT = field terminator, or field separator
RT = record terminator, or record separator

The concept TT used below, standing for tag terminator, is analogous to FT and RT. So also is the concept SF, meaning specific fields for defining contents, which did not appear in the proposed notation although utilized in actual formats. The following symbols are also used:

TG = tag, generalized
F = field
DF = data field
BF = bibliographic fields

Utilization of a special bracket notation (analogous to the form used in algorithmic languages) enables R to be defined in the form of the following consecutive structure:

1) R = [L] [DR] [SF] [BF]

The symbols written in brackets after the equal sign maintain the relationship of priority. Further, the recursive universal tag TG is defined as follows:

2) TG = [T; TT]

Such a notation indicates that the expression in brackets is T or TT. The recursiveness of the notation indicates that TG may be t1 t2 ... tp : TT, where p is any whole number larger than or equal to one. (Obviously p defines the depth of the hierarchic description in accordance with the given characteristic.) Finally:

3) F = [TG] [D];
4) DF = [F; FT];
5) BF = [DF; RT].

Thus, the general notation of the format is expressed by 1), in which the element BF, which constitutes the basic part of the so-called alternate fields, is expressed recursively with the aid of the system 2)-5). As is evident, the quantity of F in DF and of DF in BF, as well as the number of subscripts of TG, can be an arbitrary whole number, changing from notation to notation.
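To make the recursive definition concrete, the following short Python sketch models one reading of expressions 1) through 5): a generalized tag TG is a chain of tags t1 t2 ... tp closed by a tag terminator, a field F is a generalized tag followed by its data, and the bibliographic fields BF are a sequence of fields, each closed by a field terminator, with a record terminator at the end. The class names, the terminator characters, and the sample tag values are illustrative assumptions only; they are not taken from the article or from the ANSI standard.

# Illustrative sketch of the recursive notation 1)-5) above.
# The terminator characters are placeholders, not the control
# characters assigned by the ANSI standard.
from dataclasses import dataclass
from typing import List

TAG_TERMINATOR = ":"      # TT (placeholder)
FIELD_TERMINATOR = "|"    # FT (placeholder)
RECORD_TERMINATOR = "#"   # RT (placeholder)

@dataclass
class GeneralizedTag:
    # TG = [T;TT]: a chain of tags t1 t2 ... tp (p >= 1) closed by TT.
    parts: List[str]

    def encode(self) -> str:
        return "".join(self.parts) + TAG_TERMINATOR

@dataclass
class Field:
    # F = [TG][D]: a generalized tag followed by its data element.
    tag: GeneralizedTag
    data: str

    def encode(self) -> str:
        return self.tag.encode() + self.data

def encode_bibliographic_fields(fields: List[Field]) -> str:
    # DF = [F;FT] and BF = [DF;RT]: each field closed by FT,
    # the whole sequence closed by RT.
    return "".join(f.encode() + FIELD_TERMINATOR for f in fields) + RECORD_TERMINATOR

# Example: a field whose tag has hierarchic depth p = 2 (values are invented).
example = Field(tag=GeneralizedTag(["100", "a"]), data="Sumarokov, Leonid N.")
print(encode_bibliographic_fields([example]))   # 100a:Sumarokov, Leonid N.|#

Under this reading, the element BF in expression 1) can carry objects of any hierarchic depth simply by lengthening the tag chain, which is the generalization the article argues for.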
Reference

1. "USA Standard for a Format for Bibliographic Information Interchange on Magnetic Tape," Journal of Library Automation, 2 (June 1969), 53-65.

Editorial: First Have Something to Say
John Webb

John Webb (jwebb@wsu.edu) is Assistant Director for Digital Services/Collections, Washington State University Libraries, Pullman, and Editor of Information Technology and Libraries.

I think that writing editorials in my job as the new editor of Information Technology and Libraries (ITAL) is going to be a real piece of cake. All I have to do, dear readers, is to quote (with proper attribution) Walt Crawford, the title of whose book I repeat as the title of this, my inaugural editorial.1 And then quote other sages of our profession, using only as many of their words as is fitting and proper to make my editorials relevant to the concerns of our membership and readers, and as few of my own words as I can, to repay the confidence that the Library and Information Technology Association (LITA) has placed in me, and to avoid muddling the ideas of those to whom I shall be indebted. Those of you reading this will note that I have already fallen prey to the conceit of all scholarly journal editors: that their readers, of course, after surveying the tables of contents, dive wide-eyed first into the editorials. Of course.

To paraphrase a technologist of an earlier era, "when in the course of human events, it becomes necessary for" a new editor to take on the responsibility for the stewardship of ITAL, "a decent respect to the opinions of mankind requires that" he "should declare the causes which impel" him to accept that responsibility and, further, to write editorials. I quote, of course, from the first paragraph of the Declaration of Independence adopted by the "thirteen united States of America" July 4, 1776. In this, my first editorial, I, too, shall put forth for the examination of the members of LITA and the readers of ITAL my goals and hopes for the journal that I am now honored to lead. These goals and hopes are shared by the members of the ITAL Editorial Board, whose names appear in the masthead of this journal.

ITAL is a double-blind refereed journal that currently has a manuscript acceptance rate of 50 percent. It began in 1968 as the Journal of Library Automation (JOLA), the journal of the Information Science and Automation Division (ISAD) of ALA, and its first editor was Fred Kilgour. In 1978 ISAD became LITA, and in 1982 the journal title was changed to reflect the expanding role of information technology in libraries, an expansion that continues to accelerate, so that ITAL is no longer the only professional journal within ALA whose pages are dominated by our accelerating use of information technologies as tools to manage the services we provide our users and as tools we use ourselves to accomplish our daily duties.

I write part of this editorial in the skies over the middle section of the United States as I return home from the seventh national LITA Forum, held in St. Louis, October 7-10. At the Forum, I heard presentations, visited poster sessions, and talked with colleagues from forty-four states and six countries who had something to say and said it well. I hope that some of them may submit manuscripts to ITAL so that all the members of LITA and all the readers of the journal will profit as well from some of what the attendees of the Forum heard and saw. I attended the Forum forewarned by previous ITAL editors to carry plenty of business cards, and I went armed with a pocketful. I think I distributed enough that, if pieced together, their blank sides would provide sufficient writing space for at least one manuscript!

In an attempt to fulfill the Jeffersonian promise above, I hereby list a few of my goals for the beginning of my term as editor. I must emphasize that these goals of mine supplement but do not supplant the purposes of the journal as stated on the first page and on the ITAL web site (www.ala.org/lita/litapublications/ital/italinformation.htm); likewise, they do not supplant the goals of my predecessors. In no particular order:

I hope to increase the number of manuscripts received from our library and information schools. Their faculty and doctoral students are some of the incubators of new and exciting information technologies that may bear fruit for future library users. However, not all research turns up maps on which "X marks the spot." Exploration is interesting, even vital, for the journey, for the search itself, and our graduate faculties and students have something to say.

I hope to increase the submission of manuscripts that describe relevant sponsored research. In the earlier volumes, JOLA had an average of at least one article per issue, maybe more, describing the results of funded research.
ITAL can and should be a source that information-technology researchers consider as a vehicle for the publication of their results. Two articles in this issue result from sponsored research. In fact, I hope to increase the number of manuscripts that describe any relevant research or cutting-edge developments. Much of the exploration undertaken by librarians improving and strengthening their services involves research or problems solved on both small scales and large. Neither the officers of LITA, the referees, the readers, nor I are interested in very many "how I run my library good" articles. We all want to read a statement of the problem(s), the hypotheses developed to explore the issues surrounding the problem(s), the research methods, the results, the assessment of the outcomes, and, when feasible, a synthesis of how the research methods or results may be generalized.

I hope to increase the number of articles with multiple authors. Libraries are among society's most cooperative institutions, and librarians are members of one of the most cooperative of professions. The work we do is rarely that of solitary performers, whether it be research or the design and implementation of complex systems to serve our users. Writing about that should not be solitary either.

I hope to publish think-pieces from leaders in our field. I hope to publish more articles on the management of information technologies. I hope to increase the number of manuscripts that provide retrospectives. Libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. We should, upon occasion, remember and reflect upon our development as an information-technology profession.

I hope to work with the Editorial Board, the LITA Publications Committee, and the LITA Board to find a way, and soon, to facilitate the electronic publication of articles without endangering, but in fact enhancing, the absolutely essential financial contribution that the journal provides to the association. In short, I want to make ITAL a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of LITA as a professional division of ALA.
To accomplish my goals, I need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the LITA officers. I need all LITA members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by Fred Kilgour and his colleagues, who were our predecessors in volume 1, number 1, March 1968, of our journal.

Reference

1. Walt Crawford, First Have Something to Say: Writing for the Library Profession (Chicago: ALA, 2003).

Wireless Networks in Medium-Sized Academic Libraries | Barnett-Ellis and Charnigo

__ Problems with unauthorized people accessing the Internet through the wireless network
__ Problems with restricted parts of the network being accessed by unauthorized users
__ Other

3. How were security problems resolved?

Benefits of Use of Network

1. What have been the biggest benefits of wireless technology? Check all that apply.
__ User satisfaction
__ Increased access to the Internet and online sources
__ Flexibility and ease due to lack of wires
__ Has improved technical services (use for library functions)
__ Has aided in bibliographic instruction
__ Provides access beyond the library building
__ Allows students to roam the stacks while accessing the network
__ Other

2. How would you describe current usage of the network?
__ Heavy
__ Moderate
__ Low

3. In your opinion, has this technology been worth the benefit-cost ratio thus far?
__ Yes
__ No
__ Not sure

4. What advice would you give to librarians considering this technology?

Editorial Board Thoughts
Kyle Felker

Editor's note: We have an excellent editorial board for this journal, and with this issue we've decided to begin a new column. In each issue of ITAL, one of our board members will reflect on some question related to technology and libraries. We hope you find this new feature thought-provoking. Enjoy!

Kyle Felker (felkerk@wlu.edu) is an ITAL Editorial Board member, 2007-09, and Technology Coordinator at Washington and Lee University Library in Lexington, Virginia.

Any librarian who has been following the professional literature at all in the past ten years knows that there has been an increasing emphasis on user-centeredness in the design and creation of library services. Librarians are trying to understand and even anticipate the needs of users to a degree that's perhaps unprecedented in the history of our profession. It's no mystery as to why. We now live in a world where global computer networks link users directly with information in such a way that often no middleman is required. Users are exploring information on their own terms, at their own convenience, sometimes even using technologies and systems that they themselves have designed or contributed to.

At the same time, most libraries are feeling a financial pinch. Resources are tight, and local governments, institutions of higher education, and corporations are all scrutinizing their library operations more closely, asking "What have you done for me lately?" The unspoken coda is "It better be something good, or I'm cutting your funding." The increasing need to justify our existence, together with our desire to build more relevant services, is driving an increased interest in assessment. How do we know when we've built a successful service? How do we define "success"? And, perhaps most importantly, in a world filled with technologies that are "here today, gone tomorrow," how do we decide which ones are appropriate to build into enduring and useful services?

As a library technologist, it's this last question that concerns me the most. I'm painfully aware of how quickly new technologies develop, mature, and fade silently into that good night with nary a trace. It's like watching protozoa under a microscope. Which of these can serve as the foundation for real, useful services? It's obvious to me that if I'm going to choose well, it's vital that I place these services in context, and not my context but the user context. In order to do that, I need to understand the users. How do they do their work? What are they most concerned with? How do they think about the library in relation to the research process? How do they use technology as part of that process? How does that process fit into the larger context of the assignment? To answer questions like these, librarians often turn to basic marketing techniques such as the survey or the focus group. Whether we are aware of it or not, the emphasis on user-centered design is making librarians into marketers.
This is a new role for us, and one that most of us have not had the training to cope with. Since most of us haven't been exposed to marketing as a discipline of study, we don't think of what we do as marketing, even when we use marketing techniques. But that's what it is. So whether we know it or not, marketing, particularly market research, is important to us.

Marketing as a discipline is in the process of undergoing some major changes right now. Recent research in sociology, psychology, and neuroscience has uncovered some new and often startling insights into how human beings think and make decisions. Marketers are struggling to incorporate these new models into their research methods, and to change their own thinking about how they discover what people want. I recently collided with this change when my own library decided to do a focus group to help us redesign our website. Since we have a school of business, I asked one of our marketing professors for help. Her advice? Don't do it. As she put it: "You and the users would just be trading ignorances." She then gave me a reading list, which included How Customers Think by Gerald Zaltman, which I now refer to as "the book that made marketing sexy."1

Zaltman's book pulls together a lot of the recent research on how people think, make choices, and remember. Some of it is pretty mind-blowing:

• 95 percent of human reasoning is unconscious. It happens at a level we are barely aware of.
• We think in images much more than we do in language.
• Social context, emotion, and reason are all involved in the decision-making process. Without emotion, we literally are unable to make choices.
• All human beings use metaphors to explain and understand the world around them. Metaphor is the bridge between the rational and emotional parts of the decision-making process.
• Memory is not a collection of immutable snapshots we carry around in our heads. It's much more like a narrative or story, one that we change just by remembering it. Our experiences of the past and present are inextricably linked; one is constantly influencing the other.

Heady stuff. If you follow many of these ideas to their logical conclusions, you end up questioning the value of many traditional marketing techniques, such as surveys and focus groups. For example, if the social context in which a decision is made is important, then surveys are often going to yield false data, since the context in which the person is deciding to tick off this or that box is very different from the context in which they actually decide to use or not use your service or product. Asking users "what services would be useful" in a focus group won't be effective because you are only interviewing the users' rational thought process; it's at least as important to find out how they feel about the service, your library, the task itself, and how they perceive other people's feelings on the subject.

Zaltman proposes a number of very different marketing techniques to get a more complete picture of user decision making:

• Use lengthy, one-on-one interviews. Interviewing the unconscious is tricky and takes trust; it's something you can't do in a traditional focus group setting.
• Use images. We think in images, and images are a richer field for bringing unconscious attitudes to the surface.
• Use metaphor. Invite interviewees to describe their feelings and experiences in metaphor. Explore the metaphors they come up with to more fully understand all the context.

If this sounds more like therapy than marketing to you, then your initial reaction is pretty similar to mine. But the techniques follow logically from the research Zaltman presents. How many of us have done user assessment and launched a new service, only to find a less than warm reception for it? How many of us have had users tell us they want something, only to see it go unused when it's implemented? Zaltman's model offers potential explanations for why this happens, and methods for avoiding it.

Lest you think this has nothing to do with technology, let me offer an example: library Facebook/MySpace profile pages. There's been a lot of debate on how effective and appropriate these are. It seems to me that we can't gauge how receptive users are to this unless we understand how they feel about and think about those social spaces. This is exactly the sort of insight that new marketing techniques purport to offer us. In fact, if the research is right, and there is a social and emotional component to every choice a person makes, then that applies to every choice a user makes with regard to the library, whether it's the choice to ask a question at the reference desk, the choice to use the library website, or the choice to vote on a library bond issue.

Librarians are doing a lot of things we never imagined we'd ever need or want to do. Web design. Archival digitization. Tagging. Perhaps it's also time to acknowledge that what we do has an important marketing component, and to think of ourselves as marketers (at least part time). I'm sold enough on Zaltman's ideas that I'm willing to try them out at my own institution, and I encourage you to do the same.

Reference

1. Zaltman, Gerald. How Customers Think: Essential Insights into the Mind of the Market (Boston, Mass.: Harvard Business School Press, 2003).

Editorial
Marc Truitt

Marc Truitt (marc.truitt@ualberta.ca) is Associate Director, Bibliographic and Information Technology Services, University of Alberta Libraries, Edmonton, Alberta, Canada, and Editor of ITAL.

Welcome to 2009! It has been unseasonably cold in Edmonton, with daytime "highs" (I use the term loosely) averaging around -25°C (that's -13°F, for those of you ITAL readers living in the States) for much of the last three weeks. Factor in wind chill (a given on the Canadian prairies), and you can easily subtract another 10°C. As a result, we've had more than a few days and nights where the adjusted temperature has been much closer to -40°, which is the same in either Celsius or Fahrenheit. While my boss and chief librarian is fond of saying that "real Canadians don't even button their shirts until it gets to minus forty," I've yet to observe such a feat of derring-do by anyone at much less than twenty below. Even your editor's two Labrador retrievers, who love cooler weather, are reluctant to go out in such cold, with the result that both humans and pets have been coping with bouts of cabin fever since before Christmas.

So, When Is It "Too Cold" for a Server Room?

Why, you may reasonably ask, am I belaboring ITAL readers with the details of our weather?
Over the weekend we experienced near-simultaneous failures of both cooling systems in our primary server room (SR1), which meant that nearly all of our library IT services, including our OPAC (which we host for a consortium of twenty area libraries), a separate OPAC for Edmonton Public Library, our website, and access to licensed e-resources, e-mail, files, and print servers, had to be shut down. Temperature readings in the room soared from an average of 20-22°C (68-71.5°F) to as much as 37°C (98.6°F) before settling out at around 30°C (86°F). We spent much of the weekend and the beginning of this week relocating servers to all manner of places while the cooling system gets fixed. I imagine that next we may move one into each staff person's under-heated office, where they'll be able to perform double duty as high-tech foot warmers! All of this happened, of course, while the temperature outside the building hovered between -20° and -25°C.

This is not the first time we've experienced a failure of our cooling systems during extremely cold weather. Last winter we suffered a series of problems with both the systems in SR1 and in our secondary room a few feet away. The issues we had then were not the same as those we're living through now, but they occurred, as now, at the coldest time of the year. This seeming dichotomy of an overheated server environment in the depths of winter is not a matter of accident or coincidence; indeed, while it may seem counterintuitive, the fact is that many, if not all, of our cooling woes can be traced to the cold outside. The simple explanation is that extreme cold weather stresses and breaks things, including HVAC systems.

As we've tried to analyze this incident, it appears likely that our troubles began when the older of our two systems in SR1 developed a coolant leak at some point after its last preventive maintenance servicing in August. Fall was mild here, and we didn't see the onset of really severe cold weather until early to mid-December. Since the older system is mainly intended for failover of the newer one, and since both systems last received routine service recently, it is possible that the leak could have developed at any time since, although my supposition is that it may itself be a result of the cold. In any case, all seemed well because the newer cooling system in SR1 was adequate to mask the failure of the older unit, until it suffered a controller board failure that took it offline last weekend. But, with the failure of the new system on Saturday, all IT services provided from this room had to be brought down. After a night spent trying to cool the room with fans and a portable cooling unit, we succeeded in bringing the two OPACs and other core services back online by Sunday, but the coolant leak in the old system was not repaired until midday Monday. Today is Friday, and we've limped along all week on about 60 percent of the cooling normally required in SR1. We hope to have the parts to repair the newer cooling system early next week (fingers crossed!).

Some interesting lessons have emerged from this incident, and while probably not many of you regularly deal with -30°C winters, I think them worth sharing in the hope that they are more generally applicable than our winter extremes are:

1. Document your servers and the services that reside on them. We spent entirely too much time in the early hours of this event trying to relate servers and services.
We in information technology (IT) may think of shutting down or powering up servers "Fred," "Wilma," "Betty," and "Barney," but, in a crisis, what we generally should be thinking of is whether or not we can shut down e-mail, file-and-print services, or the integrated library system (ILS) (and, if the latter, whether we shut down just the underlying database server or also the related staff and public services). Perhaps your servers have more obvious names than ours, in which case, count yourself fortunate. But ours are not so intuitively named (there is a perfectly good reason for this, by the way), and with distributed applications, where the database may reside here, the application there, and the web front end yet somewhere else, I'd be surprised if your situation isn't as complex as ours. And bear in mind that documentation of dependencies goes two ways: not only do you want to know that "Barney" is hosting the ILS's Oracle database, but you also want to know all of the servers that should be brought up for you to offer ILS-related services.

2. Prioritize your services. If your cooling system (or other critical server-room utility) were suddenly operating at only 50 percent of your normal required capacity, how would you quickly decide which services to shut down and which to leave up? I wrote in this space recently that we've been thinking about prioritized services in the context of disaster recovery and business continuity, but this week's incident tells me that we're not really there yet. Optimally, I think that any senior member of my on-call staff should be empowered in a given critical situation to bring down services on the basis of a predefined set of service priorities (a minimal sketch of such a priority map follows this list).

3. Virtualize, virtualize, virtualize. If we are at all typical of large libraries in the Association of Research Libraries (and I think we are), then it will come as no surprise that we seem to add new services with alarming frequency. I suspect that, as with most places, we tend to try to keep things simple at the server end by hosting new services on separate, dedicated servers. The resulting proliferation of new servers has led to ever-greater strains on power, cooling, and network infrastructures in a facility that was significantly renovated less than two years ago. And I don't see any near-term likelihood that this will change. We are, consequently, in the very early days of investigating virtualization technology as a means of reducing the number of physical boxes and making much better use of the resources, especially processor and RAM, available to current-generation hardware. I'm hoping that someone among our readership is farther along this path than we are and will consider submitting to ITAL a "how we done it" on virtualization in the library server room very soon!

4. Sometimes low-tech solutions work . . . No one here has failed to observe the irony of an overheated server room when the temperature just steps away is 30° below. Our first thought was how simple and elegant a solution it would be to install ducting, an intake fan, and a damper to the outside of the building. Then, the next time our cooling failed in the depths of winter, voila, we could solve the problem with a mere turn of the damper control.

5. . . . And sometimes they don't. Not quite, it seems. When asked, our university facilities experts told us that an even greater irony than the one we currently have would be the requirement for CAN$100,000 in equipment to heat that -30°C outside air to around freezing, so that we wouldn't freeze pipes and other indoor essentials if we were to adopt the "low-tech" approach and rely on Mother Nature. Oh, well . . .
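Lessons 1 and 2 above amount to keeping a small, machine-readable inventory that maps services to the servers they depend on and ranks them by shutdown priority. The Python sketch below is one hypothetical way to record such a map and query it in both directions; every server name, service name, priority value, and the proportional shed rule are invented for illustration and are not drawn from the incident described above.

# Hypothetical service inventory: which servers each service depends on,
# and a shutdown priority (larger number = shut down sooner). All names
# and numbers are invented examples.
from typing import Dict, List

SERVICES: Dict[str, dict] = {
    "opac":             {"servers": ["db01", "app01", "web01"], "priority": 1},
    "ils-staff-client": {"servers": ["db01", "app01"],          "priority": 2},
    "e-resource-proxy": {"servers": ["web02"],                  "priority": 2},
    "email":            {"servers": ["mail01"],                 "priority": 3},
    "file-and-print":   {"servers": ["fs01"],                   "priority": 4},
}

def servers_for(service: str) -> List[str]:
    # Dependencies read one way: the servers that must be up for this service.
    return SERVICES[service]["servers"]

def services_on(server: str) -> List[str]:
    # Dependencies read the other way: the services affected if this server goes down.
    return [name for name, info in SERVICES.items() if server in info["servers"]]

def shutdown_candidates(capacity_fraction: float) -> List[str]:
    # When cooling (or power) is at a fraction of normal capacity, suggest which
    # services to shed first: lowest-priority services go first, and roughly the
    # lost fraction of services is shed. The proportional rule is only a
    # placeholder for whatever policy a site actually adopts.
    ranked = sorted(SERVICES, key=lambda s: SERVICES[s]["priority"], reverse=True)
    to_shed = round(len(ranked) * (1.0 - capacity_fraction))
    return ranked[:to_shed]

print(services_on("db01"))          # ['opac', 'ils-staff-client']
print(shutdown_candidates(0.5))     # ['file-and-print', 'email']

Even a flat inventory like this, kept current, answers in seconds the two questions the incident raised: what is running on a given box, and what should come down first when capacity drops.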
In Memoriam

Most of the snail mail I receive as editor consists of advertisements and press releases from various firms providing IT and other services to libraries. But a few months ago a thin, hand-addressed envelope, postmarked Pittsburgh with no return address, landed on my desk. Inside were two slips of paper clipped from a recent issue of ITAL and taped together. On one was my name and address; the other was a mailing label for Jean A. Guasco of Pittsburgh, an ALA life member and ITAL subscriber. Beside her name, in red felt-tip pen, someone had written simply "deceased."

I wondered about this for some time. Who was Ms. Guasco? Where had she worked, and when? Had she published or otherwise been active professionally? If she was a life member of ALA, surely it would be easy to find out more. It turns out that such is not the case, the wonders of the Internet notwithstanding. My obvious first stop, Google, yielded little other than a brief notice of her death in a Pittsburgh-area newspaper and an entry from a digitized September 1967 issue of Special Libraries that identified her committee assignment in the Special Libraries Association and the fact that she was at the time the chief librarian at McGraw-Hill, then located in New York. As a result of checking WorldCat, where I found a listing for her master's thesis, I learned that she graduated from the now-closed School of Library Service at Columbia University in 1953. If she published further, there was no mention of it on Google. My subsequent searches under her name in the standard online LIS indexes drew blanks.

From there, the trail got even colder. McGraw-Hill long ago forsook New York for the wilds of Ohio, and it seems that we as a profession have not been very good at retaining for posterity our directories of those in the field. A friend managed to find listings in both the 1982-83 and 1984-85 volumes of Who's Who in Special Libraries, but all these did was confirm what I already knew: Ms. Guasco was an ALA life member, who by then lived in Pittsburgh. I'm guessing that she was then retired, since her death notice gave her age as eighty-six years. Of her professional career before that, I'm sad that I must say I was able to learn no more.

President's Message: UX Thinking and the LITA Member Experience
Rachel Vacek

Rachel Vacek (revacek@uh.edu) is LITA President 2014-15 and Head of Web Services, University Libraries, University of Houston, Houston, Texas.

My mind has been occupied lately with user experience (UX) thinking in both the web world and in the physical world around me. I manage a web services department in an academic library, and it's my department's responsibility to contemplate how best to present website content so students can easily search for the articles they are looking for, or so faculty can quickly navigate to their favorite database. In addition to making these tasks easy and efficient, we want to make sure that users feel good about their accomplishments. My department has to ensure that the other systems and services that are integrated throughout the site are located in meaningful places and can be used at the point of need.
Additionally, the site's graphic and interaction design must not only contribute to but also enhance the overall user experience. We care about usability, graphic design, and the user interfaces of our library's web presence, but these are just subsets of the larger UX picture. For example, a site can have a great user interface and design, but if a user can't get to the actual information she is looking for, the overall experience is less than desirable.

Jesse James Garrett is considered to be one of the founding fathers of user-centered design, the creator of the pivotal diagram defining the elements of user experience, and the author of the book The Elements of User Experience. He believes that "experience design is the design of anything, independent of medium, or across media, with human experience as an explicit outcome, and human engagement as an explicit goal."1 In other words, applying a UX approach to thinking involves paying attention to a person's behaviors, feelings, and attitudes about a particular product, system, or service. Someone who does UX design, therefore, focuses on building the relationship between people and the products, systems, and services with which they interact. Garrett provides a roadmap of sorts for us by identifying and defining the elements of a web user experience, some of which are the visual, interface, and interaction design, the information architecture, and user needs.2 In time, these come together to form a cohesive, holistic approach to impacting our users' overarching experience across our library's web presence. Paying attention to these more contextual elements informs the development and management of a web site.

Let's switch gears for a moment. Prior to winning the election and becoming the LITA Vice-President/President-Elect, I reflected on my experiences as a new LITA member, before I became really engaged within the association. I endeavored to remember how I felt when I had joined LITA in 2005. Was I welcomed and informed, or did I feel distant and uninformed? Was the path clear to getting involved in interest groups and committees, or were there barriers that prevented me from getting engaged? What was my attitude about the overall organization? How were my feelings about LITA impacted?

Luckily, there were multiple times when I felt embraced by LITA members, such as participating in BIGWIG's Social Media Showcase, teaching pre-conferences, hanging out at the happy hours, and attending the forums. I discovered ample networking opportunities, and around every corner there always seemed to be a way to get involved. I attended as many LITA programs at Annual and Midwinter conferences as I could, and in doing so ran into the same crowds of people over and over again. Plus, the sessions I attended always had excellent content and friendly, knowledgeable speakers. Over time, many of these members became some of my friends and most trusted colleagues. Unfortunately, I'm confident that not every LITA member or prospective member has had similar, consistent, or as engaging experiences as I've had, or as many opportunities to travel to conferences and network in person. We all have different expectations and goals that color our personal experiences in interacting with LITA and its members. One of my goals as LITA president is to enhance the member experience.
I want to apply the user experience design concepts that I'm so familiar with to effect change and improve the overall experience for current members and those who are on the fence about joining. To be clear, when I say LITA member, I am including board members, committee members and chairs, interest group members and chairs, representatives, and those just observing on the sidelines. We are all LITA members and deserve to have a good experience no matter our level within the organization.

So what does "member experience" really mean? Don Norman, author of The Design of Everyday Things and the man credited with coining the phrase "user experience," explains that "user experience encompasses all aspects of the end-user's interaction with the company, its services, and its products."3 Therefore, I would say that the LITA member experience encompasses all aspects of a member's interaction with the association, including its programming, educational opportunities, publications, events, and even other members.

I believe that there are several components that define a good member experience. First, we have to ensure quality, coherence, and consistency in programming, publications, educational opportunities, communications and marketing, conferences, and networking opportunities. Second, we need to pay attention to our members' needs and wants as well as their motivations for joining. This means we have to engage with our members more on a personal level, discover their interests and strengths, and help them get involved in LITA in ways that benefit the association as well as assist them in reaching their professional goals. Third, we need to be welcoming and recognize that first impressions are crucial to gaining new members and retaining current ones. Think about how you felt and what you thought when you received a product that really impressed you, or when you started an exciting new job, or even used a clean and usable web site. If your initial impression was positive, you were more likely to connect with the product, environment, or website. If prospective and relatively new LITA members experience a good first impression, they are more likely to join or renew their membership. They feel like they are part of a community that cares about them and their future. That experience becomes meaningful.

Finally, the fourth component of a good member experience is that we need to stop looking at the tangible benefits that we provide to members as the only things that matter. Sure, it's great to get discounts on workshops and webinars or be able to vote in an election and get appointed to a committee, but we can't continue to focus on these offerings alone. We need to assess the way we communicate through email, social media, and our web page and determine whether it adds to or detracts from the member experience. What is the first impression someone might have in looking at the content and design of LITA's web page? Do the presenters for our educational programs feel valued? Does ITAL contain innovative and useful information? Is the process for joining LITA, or volunteering to be on a committee, simple, complex, or unbearable? What kinds of interactions do members have with the LITA Board or the LITA staff? These less tangible interactions are highly contextual and can add to or detract from our current and prospective members' abilities to meet their own goals, measure satisfaction, or define success.
As LITA President, and with the assistance of the Board of Directors, there are several things we have done or intend to do to help LITA embrace UX thinking:

• We have implemented a chair and vice-chair model for committees so that there is a smoother transition and the vice-chair can learn the responsibilities of the chair role prior to being in that role.
• We have established a new Communications Committee that will create a communication strategy focused on communicating LITA's mission, vision, goals, and relevant and timely news to LITA membership across various communication channels.
• We are encouraging our committees to create more robust documentation.
• We are creating richer documentation that supports the workings of the board.
• We are creating documentation and training materials for LITA representatives to complement the materials we have for committee chairs.
• We have disbanded committees that no longer serve a purpose at the LITA level and whose concerns are now addressed in groups higher within ALA.
• The Assessment and Research Committee is preparing to do a membership survey. The last one was done in 2007.
• We are going to be holding a few virtual and in-person LITA "kitchen table conversations" in the fall of 2014 to assist with strategic planning and to discuss how LITA's goals align with ALA's strategic goals of information policy, professional development, and advocacy.
• The Membership Development Committee is exploring how to more easily and frequently reach out to, engage, appreciate, acknowledge, and highlight current and prospective members. They will work closely with the Communications Committee.

I believe that we've arrived at a time where it's crucial that we employ UX thinking at a more pragmatic and systematic level and treat it as our strategic partner when exploring how to improve LITA and help the association evolve to meet the needs of today's library and information professionals. Garrett summarizes my argument nicely. He says, "What makes people passionate, pure and simple, is great experiences. If they have great experience with your product [and] they have great experiences with your service, they're going to be passionate about your brand, they're going to be committed to it. That's how you build that kind of commitment."4 I personally am very passionate about and committed to LITA, and I truly believe that our UX efforts will positively impact your experience as a LITA member.

References

1. http://uxdesign.com/events/article/state-of-ux-design-garrett/203. Garrett said this in a presentation entitled "State of User Experience" that he gave during UX Week 2009, a popular conference for UX designers.
2. http://www.jjg.net/elements/pdf/elements.pdf
3. http://www.nngroup.com/articles/definition-user-experience/
4. http://www.teresabrazen.com/podcasts/what-the-heck-is-user-experience-design. Garrett said this in a podcast interview with Teresa Brazen, "What the Heck Is User Experience Design??!! (And Why Should I Care?)"
News and Announcements

Programmers Discussion Group Meets: PL/I, the MARC Format, and Holdings

Twenty-two computer programmers, analysts, and managers met on June 29 in San Francisco for the formative meeting of the LITA/ISAS Programmers Discussion Group. In an informal and informative hour, the group established ground rules, started a mailing list, planned the topic for Midwinter 1982, and found out more about practices in fifteen library-related installations.

Programming Language Usage

What programming languages are used, and used primarily, at the installations? Nine languages turned up, excluding database management systems (and lumping all "assembly" languages together), but one language accounted for more than one-half of the responses:

Language                        Users   Primary
PL/I                              14      13
Assembler/assembly languages       8       5
COBOL                              4       2
Pascal                             3       1
BASIC                              1       1
C                                  1       1
MIIS (a MUMPS dialect)             1       -
FORTRAN                            0       -
SNOBOL                             0       -

(Note: some installations use more than one "primary" language.)

A second round of hands showed only four users with no use of PL/I.

MARC Format Usage

These questions were asked on an agency-by-agency basis. One agency made no use of the MARC communications format. None of those receiving MARC-format tapes were unable to recreate the format. Eight of the fifteen agencies made significant internal-processing use of the MARC communications format structure, including the leader, directory, and character storage patterns; this question was made more explicit to try to narrow the answers. Thus, the MARC communications format is used as a processing format in a significant number of institutions. Only three agencies use ASCII internally; most use of MARC takes place within EBCDIC. (All but three agencies were using IBM 360/370-equivalent computers; the parallel is clear.)

Computer Usage

As noted, all but three agencies use IBM equivalents in the mainframe range; three of those use plug-compatible equipment such as Magnuson and Amdahl. The other major computers are CDC, DEC/VAX, and Data General Eclipse systems. Smaller computers in use include DG, DEC 11/70, Datapoint, and IBM Series/1 units.

Home Terminals and Computers

Four of those present currently have home terminals. Three have home computers.

Future Plans for the Discussion Group

The Midwinter 1982 topic will be "holdings," with some emphasis on dealing with holdings formats in various technical processing systems (such as OCLC, UTLAS, WLN, RLIN). An announcement and mailing list will go to all those on the mailing list, as will an October/November mailing with questions sent to the chair. Those interested should send their names and addresses to Walt Crawford, RLG, Jordan Quad, Stanford, CA 94305. It is anticipated that papers on the topic may be ready by Midwinter; questions and comments are welcomed. Note: there will be no set speakers or panelists; this will be a true discussion group. The topic for the Philadelphia meeting will be set at Midwinter 1982. - Walt Crawford, Chair, The Research Libraries Group, Inc.

Channel 2000

A test of a viewdata system called Channel 2000 was conducted by OCLC in Columbus, Ohio, during the last quarter of 1980.
An outgrowth of the OCLC Research Department's Home Delivery of Library Services program, Channel 2000 was developed and tested to investigate technical, business, market, and social issues involved in electronic delivery of information using videotex technology.

Data Collection

Throughout the test, data were collected in three ways. Transaction logs were maintained, recording the keystrokes of each user during the test, thus allowing future analyses and reconstruction of the test sessions. Questionnaires requesting demographic information, life-style, opinion leadership, and attitudes toward Channel 2000 were collected from each user in each household before, during, and after the test. Six focus-group interviews were held and audiotaped to obtain specific user responses to the information services.

Attitudes toward Library Services

Forty-six percent of the respondents agreed that Channel 2000 saved time in getting books from the library. Responding to other questions, 29 percent felt that they would rather go to a traditional library than order books through Channel 2000, and 38 percent of the users felt that Channel 2000 had no effect on their library attendance. Forty-one percent of the Channel 2000 test group felt that their knowledge of library services increased as a result of the Channel 2000 test. In addition, 16 percent of the respondents stated that they spent more time reading books than they did before the test. Eighty-two percent of the respondents felt that public libraries should spend tax dollars on services such as Channel 2000. Although this might suggest that library viewdata services should be tax-based, subsequent focus-group interviews indicated that remote use of these services should be paid for by the individual, whereas on-site use should be "free." Sixty-three percent of the test population stated that they would probably subscribe to and pay for a viewdata library service, if the services were made available to them off-site.

Purchase Intent

Respondents were asked to rank-order the seven Channel 2000 services according to the likelihood that they would pay money to have that service in their home. A mean score was calculated for each Channel 2000 service, and the following list shows the rank order of preference.

1. Video Encyclopedia: locate any of 32,000 articles in the new Academic American Encyclopedia via one of three easy look-up indexes.
2. Video Catalog: browse through the video card catalog of the Public Libraries of Columbus and Franklin County, and select books to be mailed directly to your home.
3. Home Banking: pay your bills; check the status of your checking and savings accounts; look up the balance of your Visa credit card; look up your mortgage and installment loans; get current information on Bank One interest rates.
4. Public Information: become aware of public and legislative information in Ohio.
5. Columbus Calendar: check the monthly calendar of events for local educational and entertainment happenings.
6. Math That Counts!: teach your children basic mathematics, including counting and simple word problems.
7. Early Reader: help your children learn to read by reinforcing word relationships.

The final report, mailed to all OCLC member libraries, was published as Channel 2000: Description and Findings of a Viewdata Test Conducted by OCLC in Columbus, Ohio, October-December 1980. Dublin, Ohio: Research Department, Online Computer Library Center, Inc., 1981. 21p.
NOTIS Software Available

At the 1981 ALA Annual Conference in San Francisco, the Northwestern University Library announced the availability of version 3.2 of the NOTIS computer system. Intended for medium and large research libraries or groups of libraries, NOTIS provides comprehensive online integrated-processing capabilities for cataloging, acquisitions, and serials control. Patron access by author and title has been in operation for more than a year, and version 3.2 adds subject-access capability as well as other new features. An improved circulation module and other enhancements are under development for future release. Although NOTIS, which runs on standard IBM or IBM-compatible hardware, has been in use by the National Library of Venezuela for several years, Northwestern only recently decided to actively market the software, and provided a demonstration at the ALA conference. A contract has been signed with the University of Florida, and several other installations are expected within a few months. Further information on NOTIS may be obtained from the Northwestern University Library, 1935 Sheridan Rd., Evanston, IL 60201.

Bibliographic Access & Control System

The Washington University School of Medicine Library announces its computer-based online catalog/library control system, known as the Bibliographic Access & Control System (BACS). The system is now in operation and utilizes MARC cataloging records obtained from OCLC since 1975, serials records from the PHILSOM serials control network, and machine-readable patron records. Features of interest in the system are:

1. Patron access by author, title, subject, call number, or combination of keywords. The public-access feature has been in operation since May 1981. Online instructions support system use, minimizing staff intervention. A user survey indicates a high degree of satisfaction with the system.
2. Low-cost public access terminal with a specially designed overlay board.
3. Barcode-based circulation system featuring the usual functions, including recalls for high-demand items, overdue notices, suspension of circulation privileges, etc.
4. Cataloging records loaded from OCLC MARC records by tape and from a microcomputer interface at the OCLC printer port. Authority control available on three levels: (a) controlled authority, i.e., MeSH or LC; (b) library-specific assigned authority; and (c) word list available to the user.
5. Full cataloging functions online, including editing, deleting, and entering records.
6. Serials control from the PHILSOM system. PHILSOM is an online distributed computer network that currently controls serials for sixteen medical school libraries. PHILSOM features rapid online check-in, claims, fiscal control, union lists, and management reports.
7. Five possible displays of the basic bibliographic record, varying from a brief record for the public access terminal to complete information for cataloging and reference staff.
8. Two levels of documentation available online.

The software is available to interested libraries, bibliographic utilities, or commercial firms. Contact: Washington University School of Medicine Library, 4580 Scott, St. Louis, MO 63110; (314) 454-3711.

. . . the desperation from a downtime situation. Great Neck Library is also planning to use the Apples for other functions, which, it is hoped, will be implemented soon.

Multimedia Catalog: COM and Online
Kenneth J. Bierman: Tucson Public Library, Tucson, Arizona.

Like many public libraries, the Tucson Public Library (TPL) is closing its card catalog and implementing a vendor-supplied microform catalog. Unlike most of these other libraries, however, the TPL microform catalog will not include location or holding information. The indication of where copies of a particular title are actually available (i.e., which of the fifteen possible branch locations) will be available only by accessing a video display terminal connected to the online circulation and inventory control system.

Conceptually, the TPL catalog will be in two parts, with each part intended to serve different functions.1 The microform catalog (copies available in both film and fiche format) will fulfill the bibliographic function of the catalog. This catalog will contain bibliographic description and provide the traditional access points of author, title, and subject. The online catalog (online terminals are in place at all reference desks, and a few public access terminals will also be available) will fulfill the finding or locating function of the catalog. This catalog will contain very brief bibliographic description, will only be searchable by author, title, author/title, and call number, and will contain the current status of every copy of every title in the library system (i.e., on shelf, checked out, at bindery, reported missing, etc.).

Why did the Tucson Public Library make this decision? There are two major reasons:

1. Accuracy. The location information, if provided in the microform catalog, would always be inaccurate and out of date. Assuming that the locations listed in the latest edition of the microform catalog were completely accurate when the catalog was first issued (an unrealistic assumption to begin with, as anyone who has ever worked with location information at a public library with many branches well knows!), the location information would become increasingly less accurate with each day because of the large number of withdrawals, transfers, and added-copy transactions that occur (more than 100,000 a year). In addition, at any given time, one-quarter to one-third of the materials in busy branches are not on the shelf because they are either checked out or waiting to be reshelved. Thus, the microform catalog would indicate that these materials were available at specific branches when a significant percentage would in fact not be available at any given time. In short, even in the best of circumstances, easily half of the location information would be incorrect in telling a user where a copy of a title was actually available at that moment.

2. Cost. A study done at the Tucson Public Library indicated that close to half of the staff time of the cataloging department was spent dealing with location and holding information. This time includes handling transfers, withdrawals, and added copies. All of this record keeping is already being done as a part of the online circulation and inventory control system (the Tucson Public Library has no card shelflist containing copy and location information but rather relies completely on the online file for this type of information). To "duplicate" the information in the microform catalog would cost an estimated $40,000 to $60,000 a year, and the information in the microform catalog would never be accurate or up to date, for the reasons outlined above.

Figure 1 is a brief summary of how the bibliographic system will work.
would the system in figure 1 be improved if holdings were included in the microform catalog? on the surface, the obvious answer is yes-more information is communications 111 known-item search (37 percent of tpl catalog use according to catalog use survey conducted at the tpl in 1971) user searches microform catalog by author and/or title. if user does not find desired bibliographic entry, user either leaves unsatisfied or goes to desk (or public access terminal) for help. if user finds the desired bibliographic entry, he/she writes down call number (or author for fiction) and proceeds to shelf. if user finds book on shelf he/she checks it out. if user does not find book on shelf, user either leaves unsatisfied or goes to desk (or public access terminal) to obtain holdings information or ask for help (put on reserve, borrow from another library, possible purchase of additional copies, etc.). subject search (63 percent of tpl catalog use by public according to catalog use survey conducted at the tpl in 1971) user searches microform catalog. user writes down call number(s) and proceeds to shelf. if user finds appropriate material(s), he/she checks it out. if user does not find appropriate material he/she leaves unsatisfied or goes to desk for help (reference interview, etc.) . fig. 1. summary of how system will work. always better. but, if we examine the situation in depth, perhaps not. let us look at some hypothetical situations. if the user is doing a search and does not find the desired entry/entries in the microform catalog, it makes no difference whether holdings are included in the catalog. the user will still either leave unsatisfied or go to the desk for help. if the user is doing a known-item search and finds the desired item and notes, and the agency he/she is at is listed as a holding agency, he/she will proceed to the shelf. if the desired material is found, fine . if not (because the material is checked out, reported missing, or withdrawn), he/she will either leave unsatisfied or go to the desk (or public access terminal) for help. if the user is doing a known-item search and finds the desired item in the microform catalog but notes that the agency is not listed as a holding agency, what are his/her choices? the user can go away unsatisfied without checking the shelves (although there may be a copy on the shelf because a copy may have been added to that agency since the microform catalog was last recumulated) or he/she can go to the desk (or public access terminal) to obtain help; here he/she will have access to the "real" holdings information--on the online system. the user could notice from the holdings in the microform catalog that another branch has the item and drive to the other branch. however, when the user gets there he/she may discover that the item is not available-information that could have been found in the online system at the original branch if he/she had gone to the desk (or public access terminal). · the purpose of the above exercise is to demonstrate that in all cases the user is still going to require access to the online catalog in order to determine holdings more accurately. with time, this access will become increasingly self-service through public access terminals. 
from the user's point of view, providing inaccurate holdings in the microform catalog does very little good and can actually do harm by leaving the impression that, if a library is listed as a holding library, that library will have the item (a false conclusion because of checkouts, reported missings, and withdrawals) or leaving the impression that if a library is not listed as a holding library, that library will not have the item (a false conclusion because a copy could have been added recently but that fact is not yet reflected in the microform catalog) . if the user is doing a subject search, holdings are of less value in the catalog 112 journal of library automation vol. 14/2 june 1981 anyway because he is primarily getting suggested classification numbers in order to browse. the tucson public library could not have made the above decisions if it did not have a complete online file of all its holdings (including even reference materials that never circulate). but since this data did exist (after a five-year bar-coding effort) and since more than forty online terminals were already in place throughout the library system to access the online file, the decision not to include locations or holdings in the microform catalog seemed reasonable . in the longer-range future (1990?), it is very likely that the entire catalog will be available online . in the meantime, the tucson public library did not want to divide its resources maintaining two location records, but rather wanted to concentrate resources in maintaining one accurate record of locations available as widely as possible throughout the library system (by installing more online terminals for staff and public use). was this decision a sound one? we don't know. the microform catalog has not yet been introduced for public use. by the end of this year we should have some preliminary answers to this question. references 1. robin w. macdonald and j. mcree elrod, "an approach to developing computer catalogs," college & research libraries 34:202-8 (may 1973). a structure code for machine readable library catalog record formats herbert h. hoffman: santa ana college, santa ana, california. libraries house many types of publications in many media, mostly print on paper, but also pictures on paper, print and pictures on film, recorded sound on plastic discs, and others. these publications are of interest to people because they contain recorded information. more precisely said, because they contain units of intellectual, artistic, or scholarly creation that collectively can be called "works." one could say simply that library materials consist of documents that are stored and cataloged because they contain works. the structure of publications into documents (or "books") and works, the clear distinction between the concept of the information container as opposed to the contents, deserves more attention than it has received so far from bibliographers and librarians. the importance of the distinction between books and works has been hinted at by several theoreticians, notably lubetzky . however, the idea was never fully developed. the cataloging implications of the structural diversity among documents were left unexplored. as a consequence, librarians have never disentangled the two terms book and work . 
from the paris principles and the marc formats to the new second edition of the anglo-american cataloguing rules, the terms book and work are used loosely and interchangeably, now meaning a book, now a work proper, now part of a work, now a group of books. such ambiguity can be tolerated as long as each person involved knows at each step which definition is appropriate when the term comes up. but as libraries ease into the age of electronic utilities and computerized catalogs based on records read by machine rather than interpreted by humans, a considerably greater measure of precision will have to be introduced into library work. as one step toward that goal an examination of the structure of publications will be in order. the items that are housed in libraries, regardless of medium, are of two types. they are either single documents, or they are groups of two or more documents. items that contain two or more documents are either finite items (all published at once, or with a first and a last volume identified) or they are infinite items (periodicals, intended to be continued indefinitely at intervals). schematically, these three types of bibliographic items in libraries can be represented as shown in figure 1. it should be noted that all publications, all documents, all bibliographic items in li technical note help: the automated binding records control system an interesting new aspect of library automation has been the appearance of commercial ventures established to provide for an effective use of the new ideas and techniques of automation and related fields. some of these ventures have offered the latest in information science research and development techniques, such as systems analysis, management planning, and operations research. others have offered services based on new procedures, for example, computer-produced book catalogs, selective dissemination of information services, indexing and abstracting activities, mechanized acquisitions, and catalog card production systems. one innovation is a new technique devised for libraries to reduce the clerical effort required to prepare materials for binding and to maintain the necessary related records. the technique is called help, the heckman electronic library program. it was developed by the heckman bindery of north manchester, indiana, with the cooperation of the purdue university libraries. it was recognized by heckman's management that the processing of 10,000 to 20,000 periodicals weekly and the maintenance of over 250,000 binding patterns would soon become too unwieldy and costly unless more efficient procedures were developed. it was additionally realized that any new system should also be designed as a means to aid libraries with their interminable record-keeping problems. the latter purpose could be accomplished by providing a library with detailed and accurate information regarding each periodical it binds, and by simplifying the library's method of preparing binding slips for the bindery. in the fall of 1969, after a detailed analysis, the heckman bindery management began the development and programming of a computerized binding pattern system. this system was a result of a team effort involving management, sales, and production departments. john pilkington, data processing manager, directed the installation of the system and earl beal performed the necessary programming functions.
in december of 1971 approximately 700 libraries were using the system, and about 100,000 binding patterns were in the data file. as the system was developed, a library's binding pattern data were converted to machine-readable form which then made it possible for the bindery automatically to provide nearly complete binding slips for each periodical title bound. in addition, the system provides an up-to-date pattern record for the libraries' files, and the bindery maintains the resultant data bank of pattern records as the library notifies it of additions, changes, and deletions. in this manner, the bindery expects to establish an efficient method for purging the file of out-of-date information. the system revolves around four forms: the binding pattern index card, the binding slip, the variable posting sheet, and the binding historical record. the binding pattern index card (figure 1) is a 5" x 8 1/2" card, pink in color, which is a computer printout. one of these cards is retained in the library as its pattern record for each set of each periodical bound by the library. the data given on the card are essentially the same as those maintained by most libraries in their manual pattern files, except that more detail is provided by the help system, and the library does not maintain the record-the bindery does-in machine-readable form. as changes are made to the patterns, the library clerk simply crosses out the old data on the appropriate binding slip and writes in the new data. when the bindery receives the binding slip, a new index card is produced, among other records, and forwarded to the library with the returned shipment of bound volumes. the system also provides for one-time changes that do not affect the pattern record. the data contained on the index cards include the library account number, the library branch or department code, the pattern number, color, type size, stamping position, title (vertical or horizontal spine positions), labels, call number, library imprint, and collating instructions. the collating instructions, which are listed in the instruction manual provided by the bindery, are given as a series of numeric codes. asterisks are used to indicate the end of a print line.
fig. 1. binding pattern index card.
the binding slips are also 5" x 8 1/2" forms, but they are four-part multiple forms, of which three parts are sent to the bindery with the periodical to be bound, and one part, a card form, is retained by the library as its "at bindery" record. the information required by the binding slip is essentially the same as that included on the index card. the library, however, must provide the variable data such as volume number(s), date(s), month(s), or whatever information is required to identify a specific volume. the variable posting sheet (figure 2) is an 8 1/2" x 11" form that is used by the library when it sends several volumes or copies of a volume to the bindery at the same time. since the bindery cannot determine beforehand the number of physical volumes of a title a library will want to send for binding at a given time, it sends to the library only one printed-out binding slip to be used for the next volume of a given serial. if multiple volumes of a set are to be bound, the library clerk provides the variable information for the first volume by using the single binding slip, and the variable data for each additional volume of the same title are posted by the clerk on the posting sheet. the bindery will automatically produce from its pattern data bank the binding slips necessary for binding the additional volumes that are listed on the posting sheet.
fig. 2. variable posting sheet.
the binding historical record (figure 3) is a form provided for the use of the library if it desires a permanent record of every volume bound. the use of this form is not required by the system; it is simply a convenience record for the library binding staff. the form is printed on the back of the pattern index card. spaces are provided for volume, year, and date sent to the bindery, and most of the back of the card is available for posting.
fig. 3. binding historical record.
all data fields are of fixed length with the maximum size of the records at 328 characters. some of the data formats are shown in figure 4. a few of the data fields in the example need additional explanation. the fifth field labeled "print" refers to the color of the spine stamping, i.e., gold, black, or white. the "trim #1 & 2" fields are for bindery use only, and indicate volume size within certain groups for printing purposes. the "spine" field is also for bindery use, and it indicates the size of type that can be used according to the width of the spine. "product no." refers to certain types of publications such as magazines, matched sets, or items which will be pamphlet (inexpensively) bound.
fig. 4. data formats.
one additional form used in the system is for heckman's internal operations. that is a data input form known as the "pattern printing setup" (figure 5). this form is used by the bindery's input clerks to prepare new binding patterns for conversion to machine-readable form. the data prescribed by the form is much like that required by the binding pattern index card, except that data tags are shown for keypunching purposes.
fig. 5. pattern printing setup.
the system operates on an ibm system 3 computer with two 5445 disk drives and a 1403nl printer. the disk drives provide a total of 40,000,000 characters of on-line storage in addition to the 7,500,000 usable characters provided by the system 3 itself. five 5496 data recorders are used for data conversion. the programs are written in rpg2. the development of computer-oriented commercial services for libraries suggests that, perhaps if librarians wait long enough, they will not have to automate their libraries as commercial ventures will do it for them. the rapid appearance of systems-analysis firms, commercial and societal abstracting and indexing services, management and planning consulting groups, and data processing service bureaus tends to bear this theory out. at the very least, libraries will not be able to automate internally without providing for the incorporation of such ready services into their systems. when a service such as help is made available at no additional charge, there is no way for libraries to avoid automation. donald p.
hammer donald p. hammer is associate director for library and information systems, university of massachusetts library, amherst. at the time the system described in this article was developed, mr. hammer was the head of libraries systems development at purdue university. tutorial andrew darby and ron gilmour adding delicious data to your library website social bookmarking services such as delicious offer a simple way of developing lists of library resources. this paper outlines various methods of incorporating data from a delicious account into a webpage. we begin with a description of delicious linkrolls and tagrolls, the simplest but least flexible method of displaying delicious results. we then describe three more advanced methods of manipulating delicious data using rss, json, and xml. code samples using php and javascript are provided. andrew darby (adarby@ithaca.edu) is web services librarian, and ron gilmour (rgilmour@ithaca.edu) is science librarian at ithaca college library, ithaca, new york. one of the primary components of web 2.0 is social bookmarking. social bookmarking services allow users to store bookmarks on the web where they are available from any computer and to share these bookmarks with other users. even better, these bookmarks can be annotated and tagged to provide multiple points of subject access. social bookmarking services have become popular with librarians as a means of quickly assembling lists of resources. since anything with a url can become a bookmark, such lists can combine diverse resource types such as webpages, scholarly articles, and library catalog records. it is often desirable for the data stored in a social bookmarking account to be displayed in the context of a library webpage. this creates consistent branding and a more professional appearance. delicious (http://delicious.com/), one of the most popular social bookmarking tools, allows users to extract data from their accounts and to display this data on their own websites. delicious offers multiple ways of doing this, from simply embedding html in the target webpage to interacting with the api.1 in this paper we will begin by looking at the simplest methods for users uncomfortable with programming, and then move on to three more advanced methods using rss, json, and xml. our examples use php, a cross-platform scripting language that may be run on either linux/unix or windows servers. while it is not possible for us to address the many environments (such as cmses) in which websites are constructed, our code should be adaptable to most contexts. this will be especially simple in the many popular php-based cmses such as drupal, joomla, and wordpress. it should be noted that the process of tagging resources in delicious requires little technical expertise, so the task of assembling lists of resources can be accomplished by any librarian. the construction of a website infrastructure (presumably by the library's webmaster) is a more complex task that may require some programming expertise. linkrolls and tagrolls the simplest way of sharing links is to point users directly to the desired delicious page.
figure 1. delicious linkroll page.
to share all the items labeled "biology" for the user account "iclibref," one could disseminate the url http://delicious.com/iclibref/biology.
the obvious downside is that the user is no longer on your website, and they may be confused by their new location and what they are supposed to do there. linkrolls, a utility available from the delicious site, provides a number of options for generating code to display a set of bookmarked links, including what tags to display, the number, the type of bullet, and the sorting criterion (see figure 1).2 this utility creates simple html code that can be added to a website. a related tool, tagrolls, creates the ubiquitous delicious tag cloud.3 for many librarians, this will be enough. with the embedded linkroll code, and perhaps a bit of css styling, they will be satisfied with the results. however, delicious also offers more advanced methods of interacting with data. for more control over how delicious data appears on a website, the user must interact with delicious through rss, json, or xml. rss like most web 2.0 applications, delicious makes its content available as rss feeds. feeds are available at a variety of levels, from the delicious system as a whole down to a particular tag in a particular account. within a library context, the most useful types of feeds will be those that point to lists of resources with a given tag. for example, the request http://feeds.delicious.com/rss/iclibref/biology returns the rss feed for the "biology" tag of the "iclibref" account, with each item carrying fields such as the following: title: darwin's dangerous idea (evolution 1); date: 2008-04-09T18:40:00Z; link: http://icarus.ithaca.edu/cgi-bin/pwebrecon.cgi?bbid=237870; creator: iclibref; description: this episode interweaves the drama in key moments of darwin's life with documentary sequences of current research, linking past to present and introducing major concepts of evolutionary theory. 2001; subject: biology. to display delicious rss results on a website, the webmaster must use some rss parsing tool in combination with a script to display the results. the xml_rss package provides an easy way to read rss using php.4 the code for such an operation might look like this:
<?php
require_once "XML/RSS.php";
$rss =& new XML_RSS("http://feeds.delicious.com/rss/iclibref/biology");
$rss->parse();
foreach ($rss->getItems() as $item) {
    echo "<li><a href=\"" . $item['link'] . "\">" . $item['title'] . "</a></li>";
}
?>
this code uses xml_rss to parse the rss feed and then prints out a list of linked results. rss is designed primarily as a current awareness tool. consequently, a delicious rss feed only returns the most recent thirty-one items. this makes sense from an rss perspective, but it will not often meet the needs of librarians who are using delicious as a repository of resources. despite this limitation, the delicious rss feed may be useful in cases where currency is relevant, such as lists of recently acquired materials. json a second method to retrieve results from delicious is using javascript object notation or json.5 as with the rss feed method, a request with credentials goes out to the delicious server. the response returns in json format, which can then be processed using javascript. an example request might be http://feeds.delicious.com/v2/json/iclibref/biology. by navigating to this url, the json response can be observed directly. a json response for a single record (formatted for readability) looks like this: delicious.posts = [ {"u":"http:\/\/icarus.ithaca.edu\/cgi-bin\/pwebrecon.cgi?bbid=237870", "d":"darwin's dangerous idea (evolution 1)", "t":["biology"], "dt":"2008-04-09T06:40:00Z", "n":"this episode interweaves the drama in key moments of darwin's life with documentary sequences of current research, linking past to present and introducing major concepts of evolutionary theory. 2001"} ]; it is instructive to look at the json feed because it displays the information elements that can be extracted: "u" for the url of the resource, "d" for the title, "t" for a comma-separated list of related tags, "n" for the note field, and "dt" for the timestamp. to display results in a webpage, the feed is requested using javascript (a script element whose src points at the json url); then the json objects must be looped through and displayed as desired. alternately, as in the script described below, the json objects may be placed into an array for sorting, so that all of the available data is displayed with each item in its own paragraph and the links are sorted alphabetically. (a server-side php sketch along these lines appears just below.) while rss returns a maximum of thirty-one entries, json allows a maximum of one hundred. the exact number of items returned may be modified through the count parameter at the end of the url. at the ithaca college library, we chose to use json because at the time, delicious did not offer the convenient tagrolls, and the results returned by rss were displayed in reverse chronological order and truncated at thirty-one items. currently, we have a single php page that can display any delicious result set within our library website template. librarians generate links with parameters that designate a page title, a comma-delimited list of desired tags, and whether or not item descriptions should be displayed. for example, www.ithacalibrary.com/research/delish_feed.php?label=biology%20films&tag=biology,biologyi&notes=yes will return a page that looks like figure 2. the advantage of this approach is that librarians can easily generate webpages on the fly and send the url to their faculty members or add it to a subject guide or other webpage. the php script only has to read the "$_get" variables from the url and then query delicious for this content. xml delicious offers an application programming interface (api) that returns xml results from queries passed to delicious through https.
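where the display must be generated on the server rather than in the visitor's browser, the json feed described above can also be handled in php. the following sketch is not part of the original tutorial; it assumes that the v2 json feed for the "iclibref" account and "biology" tag can be fetched with file_get_contents (allow_url_fopen enabled), that the count parameter behaves as described, and that the u, d, t, n, and dt keys match the sample response. the helper name by_title and the trimming of the delicious.posts = [ ... ] wrapper are illustrative only.
<?php
// fetch up to one hundred bookmarks for the "biology" tag of the "iclibref" account
// (account, tag, and count parameter are taken from the discussion above)
$url = "http://feeds.delicious.com/v2/json/iclibref/biology?count=100";
$raw = file_get_contents($url);

// if the response arrives wrapped as "delicious.posts = [ ... ];",
// keep only the bracketed array before decoding
$start = strpos($raw, "[");
$end   = strrpos($raw, "]");
if ($start !== false && $end !== false) {
    $raw = substr($raw, $start, $end - $start + 1);
}

$posts = json_decode($raw, true);   // decode into associative arrays
if (!is_array($posts)) {
    exit("could not decode the delicious feed");
}

// sort alphabetically by title ("d"), as the javascript version described above does
function by_title($a, $b) {
    return strcasecmp($a["d"], $b["d"]);
}
usort($posts, "by_title");

// print each bookmark as a linked paragraph followed by its note
foreach ($posts as $post) {
    $link  = htmlspecialchars($post["u"]);
    $title = htmlspecialchars($post["d"]);
    $note  = isset($post["n"]) ? htmlspecialchars($post["n"]) : "";
    echo "<p><a href=\"" . $link . "\">" . $title . "</a><br />" . $note . "</p>\n";
}
?>
because the work happens on the server, this variant does not depend on javascript being enabled in the visitor's browser, although the hundred-item ceiling of the json feed still applies. returning to the xml api: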
for instance, the request https://api.del.icio.us/v1/posts/recent?&tag=biology returns an xml document listing the fifteen most recent posts tagged as "biology" for a given account. unlike either the rss or the json methods, the xml api offers a means of retrieving all of the posts for a given tag by allowing requests such as https://api.del.icio.us/v1/posts/all?&tag=biology. this type of request is labor-intensive for the delicious server, so it is best to cache the results of such a query for future use. this involves the user writing the results of a request to a file on the server and then checking to see if such an archived file exists before issuing another request. a php utility called deliciousposts, which provides caching functionality, is available for free.6 note that the username is not part of the request and must be supplied separately. unlike the public rss or json feeds, using the xml api requires users to log in to their own account. from a script, this can be accomplished using the php curl functions:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $queryurl);
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$posts = curl_exec($ch);
curl_close($ch);
this code logs into a delicious account, passes it a query url, and makes the results of the query available as a string in the variable $posts. the content of $posts can then be processed as desired to create web content. one way of doing this is to use an xslt stylesheet to transform the results into html, which can then be printed to the browser:
/* create a new dom document from your stylesheet */
$xsl = new DOMDocument;
$xsl->load("mystylesheet.xsl");
/* set up the xslt processor */
$xp = new XSLTProcessor;
$xp->importStylesheet($xsl);
/* create another dom document from the contents of the $posts variable */
$doc = new DOMDocument;
$doc->loadXML($posts);
/* perform the xslt transformation and output the resulting html */
$html = $xp->transformToXml($doc);
echo $html;
conclusion delicious is a great tool for quickly and easily saving bookmarks. it also offers some very simple tools such as linkrolls and tagrolls to add delicious content to a website. but to exert more control over this data, the user must interact with the delicious api or feeds. we have outlined three different ways to accomplish this: rss is a familiar option and a good choice if the data is to be used in a feed reader, or if only the most recent items need be shown. json is perhaps the fastest method, but requires some basic scripting knowledge and can only display one hundred results. the xml option involves more programming but allows an unlimited number of results to be returned. all of these methods facilitate the use of delicious data within an existing website. references 1. delicious, tools, http://delicious.com/help/tools (accessed nov. 7, 2008). 2. linkrolls may be found from your delicious account by clicking settings > linkrolls, or directly by going to http://delicious.com/help/linkrolls (accessed nov. 7, 2008). 3. tagrolls may be found from your delicious account by clicking settings > tagrolls, or directly by going to http://delicious.com/help/tagrolls (accessed nov. 7, 2008). 4. martin jansen and clay loveless, "pear::package::xml_rss," http://pear.php.net/package/xml_rss (accessed nov. 7, 2008). 5. introducing json, http://json.org (accessed nov. 7, 2008). 6. ron gilmour, "deliciousposts," http://rongilmour.info/software/deliciousposts (accessed nov. 7, 2008).
communications tam dalrymple "just-in-case" answers: the twenty-first century vertical file this article discusses the use of oclc's questionpoint service for managing electronic publications and other items that fall outside the scope of oclc library's opac and web resources pages, yet need to be "put somewhere." the local knowledge base serves as both a collection development tool and as a virtual vertical file, with records that are easy to enter, search, update, or delete. we do not deliberately collect for the vertical file, but add to it day by day the useful thing which turns up. these include clippings from newspapers, excerpts from periodicals . . . broadsides that are not injured by folding . . . anything that we know will be used if available. —wilson bulletin, 1919 information that "will be used if available" sounds like the contents of the internet.1 as with libraries everywhere, the oclc library has come to depend on the internet as an almost limitless resource. and like libraries everywhere, it has confronted the advantages and disadvantages of that scope. this means that in addition to using the opac and oclc library's webpages, oclc library staff have used a mix of bookmarks, del.icio.us tags, and post-it® notes to keep track of relevant, authoritative, substantive, and potentially reusable information. much has been written about the use of questionpoint's transaction management capabilities and of the important role of knowledge bases in providing closure to an inquiry. in contrast, this article will look at questionpoint's use as a management tool for future questions, for items that fall outside the scope of oclc library's opac and web resources pages yet need to be "put somewhere." the questionpoint local knowledge base is just the spot for these new vertical file items. about oclc library oclc is the world's largest nonprofit membership computer library service and research organization. more than 69,000 libraries in 112 countries and territories around the world use oclc services to locate, acquire, catalog, lend, and preserve library materials. oclc library was established in 1977 to provide support for oclc's mission. the collection concentrates on library, information and computer sciences, business management, and has special collections that include the papers of frederick g. kilgour and archives of the dewey decimal classification™. oclc library has a distinct clientele to which it offers a complete range of services—print and electronic collections, reference, interlibrary loan—within its subject areas. because of the nature of the organization, the library supports long-term and collaborative research, such as that done by oclc programs and research staff, as well as the immediate information needs of product management and marketing staff. oclc library also provides information to oclc's other service areas, such as finance and human resources. while most oclc library acquisitions are done on demand, oclc library selects and maintains an extensive collection of periodicals, journals, and reference resources, most of them online and accessible—along with the opac—to oclc employees worldwide from the library's webpages (see figure 1).
often, however, oclc staff, like those of many organizations, are too busy to consult these resources themselves and thus depend on the library. oclc library staff pursue the answers to such research questions through its collections and look to enhance the collections with “anything that we know will be” of use. one of the challenges is keeping track of the “anything” that falls outside the library’s primary collections scope; questionpoint helps with that task. traditional uses of questionpoint questionpoint is a service that provides question management tools aimed at increasing the visibility of reference services and making them more efficient. oclc library uses many of those tools, but there are significant ones it does not use (for example, chat). and although the library’s questionpoint-based aska link is visible by default on the front page of the corporate intranet as well as on oclc library–specific pages, less than than 8 percent of questions over the last year were received through that link. one reason for this low use may be that for most of oclc library’s history, e-mail has been the primary contact method, and so it remains. even when the staff need clarification of a question, they automatically opt for telephone or e-mail messaging. working with a web form and question-and-answer software has not caught on as a replacement for these more established methods. however, questionpoint remains tam dalrymple (dalrympt@oclc.org) is senior information specialist at oclc, dublin, ohio. 26 information technology and libraries | december 200826 information technology and libraries | december 2008 the reference “workspace.” when questions come in through e-mail or phone, librarians enter them into questionpoint, using it to add notes and keep track of sources checked. completed transactions are added to the local knowledge base. (because their questions involve proprietary matters, many special libraries do not add their answers to the global knowledge base, and oclc library is no exception. the local knowledge base is accessible only by oclc library staff.) not surprisingly, most of the questions received are about libraries, museums, and other cultural institutions, their collections, users, and staff. this means that the likelihood of reuse of the information in the oclc library knowledge base is relatively high, and makes the local knowledge base an early stop in the reference process. though statistics vary widely by individual institutions and type of library—and though some libraries have opted not to use the knowledge base—the average ratio for all questionpoint libraries is about one knowledge base search for every three questions received. in contrast, in the past year oclc library staff averaged 4.2 local knowledge base searches for every three questions received. the view of the questionpoint knowledge base as a repository of answers to questions that have been asked is a traditional one. oclc library’s use of the questionpoint knowledge base in anticipation of information needs of its clients—as a way of collection development—is distinctive. in many respects this use creates an updated version of the oldfashioned vertical file. nontraditional uses of questionpoint just-in-case the vertical file has a quirky place in the annals of librarianship. it has been the repository for facts and information too good to throw away but not quite good enough to catalog. h. w. 
wilson still offers its vertical file index, a specialized subject index to pamphlets issued on topics often unavailable in book form, which began in 1932. by now, except for special collections, the internet has practically relegated the vertical file to the backroom with the card platens and electric erasers. oclc library now uses its questionpoint knowledge base to manage information that once might have gone into a vertical file: the authoritative reports, studies, .org sites, and other resources that are often not substantive enough to catalog, but too good to hide away in a single staff member’s bookmarks. the questionpoint knowledge base provides a place for these resources; more importantly, questionpoint provides fast, efficient ways to collect, tag, manage, and use them. questionpoint allows development of such collections with powerful capabilities that allow for future retrieval and use of the information, and it does so without the incredibly time-consuming processes of the past. a 1909 description of such processes describes in detail the inefficiency of yore: in the public library [sic] of newark, n.j., material is filed in folders made of no. 1 tag manila paper, cut into pieces about 11x18 inches in size. one end is so turned up against the others as to make a receptacle 11x19 1/2 inches. the front fold is a half inch shorter than the back one, and this leaves a margin exposed on the back one, whereon the subject of that folder is written.2 thus a major benefit of using questionpoint to manage these resources is saving time. because questionpoint is a routine part of oclc library’s workflow, it allows the addition of items directly to the figure 1. oclc library intranet homepage introducing zoomify image | smith 27“just in case” answers: the twenty-first-century vertical file | dalrymple 27 knowledge base quickly and with a minimum of fuss. there is initially no need to make the entry “pretty,” but only to describe the resource briefly, add the url, and tag it (see figure 2). unlike a physical vertical file, tagging items in the knowledge base allows items to be “put” in multiple places. staff can also add comments that characterize the authoritativeness of a resource. occasionally librarians come across articles or resources that might address multiple questions. instead of burying the data in one overarching knowledge base record, staff can make an entry for each aspect of the resource. an example of this is www .galbithink.org/libraries/analysis. htm, a page created by douglas galbi, senior economist with the federal communications commission (see figure 3). the site provides statistics, including historical statistics, on u.s. public libraries. rather than describe these generically with a tag like “library statistics”—not very useful in any case—each source can be added separately to the questionpoint knowledge base. for example, the item “audiovisual materials in u.s. public libraries” can be assigned specific tags—audiovisual, av, videos—that will make the data more accessible in the future. in other words, librarians use the faq model of asking and answering just one question at a time. an important element in adding “answers” to oclc library’s knowledge base is the ability to provide context. with questionpoint, librarians can not only describe what the resource is, but why it may be of future use. 
and just the act of adding information to the knowledge base serves as a valuable mnemonic— “i’ve seen that somewhere.” records added to the knowledge base in this way can be easily updated with information about newer editions or better sources. equally valuable is the ability to edit and add keywords when the resource becomes useful for unforeseen questions. sharing information with staff the knowledge base also serves as a more formal collection development tool. when librarians run across potentially valuable resources, they can send a description and a link to a product manager who may find it of use. library staff use questionpoint’s keyword capability to add tags of people’s names and job titles to facilitate ongoing current awareness. employees may provide feedback suggesting an item be added to the figure 3. a page with diverse facts and figures: www.galbithink.org/libraries/analysis.htm figure 2. a sample questionpoint entry, this for a report by the national endowment for the arts 28 information technology and libraries | december 200828 information technology and libraries | december 2008 permanent print collection, or linked to from the library website. oclc library strives to inform users without subjecting them to information overload. when a 2007 survey of oclc staff found the library’s rss feeds seldom used, librarians began to send e-mails directly to individuals and teams. the reaction of oclc staff indicates that such personal messages, with content summaries that allow recipients to quickly evaluate the contents, are more often read than oclc library rss feeds—especially if items sent continue to be valuable. requirements that enable this kind of sharing include knowledge of company goals, staff needs, and product initiatives. to keep up-todate, librarians meet regularly with other oclc staff, and monitor organizational changes. attendance at oclc’s members council meetings provides information on hot topics that help identify resources for future use. while oclc’s growth as a global organization has brought challenges in maintaining awareness of the full range of organization needs, the questionpoint knowledge base offers a practical way to manage increased volume. maintaining resources of potential interest to staff with questionpoint has another benefit: it helps keep librarians aware of internal experts who can help the library with questions, and in many cases allows the library to connect staff with mutual interests to one another. this has become especially important as oclc has grown and its services continue to integrate with one another. conclusions beyond its usefulness as a system to receive, manage, and answer inquiries, questionpoint is providing a way to facilitate access to online resources that addresses the particular needs of oclc library’s constituency. it is fast and easy to use: a standard part of the daily workflow. it enables direct links to sources and accommodates tagging those sources with the names of people and projects, as well as subjects. it serves as part of the library’s collection management and selection system. using questionpoint in this way has some potential drawbacks. “just in case” acquisition of virtual resources entails some of the risks of traditional acquisitions: acquiring resources that are seldom used, creating a database of resources that are difficult to retrieve, and perhaps the necessity of “weeding” or updating obsolete items. with company growth comes the issue of scalability, as well. 
but for now, the benefits have far outweighed the risks. most of the items added have been identified for and shared with at least one staff member, so the effort has provided immediate payoff. the knowledge base serves as a collection development tool, helping to identify items that can be cataloged and added to the permanent collection; the record in the knowledge base can serve as a reminder to check for later editions; and the knowledge base records are easy to update or even delete. the questionpoint virtual vertical file helps oclc library manage and share those useful things that "just turn up." references 1. "the vertical file for pamphlets and miscellany," wilson bulletin 1, no. 16 (june 1919): 351. 2. kate louise roberts, "vertical file," public libraries 12 (oct. 1907): 316–17. editor's notes goodbye jola it is with mixed emotions that we note that this is the last issue of the journal of library automation. the first issue appeared in march 1968, just shortly after this editor had graduated from library school. under the editorships of frederick g. kilgour and susan k. martin, jola established itself as a major source of information about developments in library automation. this is also the last issue of the first volume produced by a new editorial board. the current editors are especially indebted to eileen mahoney of ala's central publication unit, whose experience, patience, and wise counsel contributed materially to making this last volume one we are all proud of. hello ital please welcome volume 1, number 1 of information technology and libraries when its bright new face appears on your doorstep in march. it will look very familiar to you. the new name reflects many of the shifts in emphasis that have gradually been introduced in recent years as changing technologies have encouraged a broadening of jola's original scope. we plan to introduce some minor changes to increase ital's utility, but see these as evolutionary. we continue to solicit comments and suggestions on how the journal can better serve your needs. synchronicity in our september issue, we initiated a new section, "reports and working papers," in which we reproduce documents we believe deserve a wider readership than their original distribution. we were amused to note a similar innovation in the august bulletin of the american society for information science. we would welcome comments on the usefulness (or wastefulness) of the new section. standards standards continue to be a major concern in our field. we hope those of you involved with acquisitions systems will find the communications by sandy paul and jim long in this issue useful. we encourage you to participate in standards development efforts when possible. please try to use developed standards whenever they are applicable to your work. the isbn, san (standard address number), sln (standard library number), and other standard numbers will become increasingly important as our systems become more interdependent in this shrinking world. book reviews basic fortran iv programming, by donald h. ford. homewood, illinois: richard d. irwin, inc., 1971. 254 pp. $7.95. fortran texts are now quite plentiful, so the main question in the reviewer's mind is: what does this book have to offer that no other book has? regrettably the answer must be nothing. there are many other good fortran books available. this has very little to distinguish it.
that is not to say that it is not a good book. the quality of the book is good, the text is very readable, and there has been very good attention to the examples and proofreading. the book is suit able for an introductory course, or for self study. it does not go completely into all the features of the language, as these are usually best left to the specific manuals relating to the machines available. the book does bring the student to a level where he will be able to use those manuals and the level where he will need to use those manuals. the book does come to the level necessary for the person who writes his programs with professional assistance. the author has chosen ansi basic fortran iv to be discussed in the book. in particular he relates this to the ibm/360 and 370 computers. this is a common language and is available on most machines with only minor modifications. this was a good choice for the level of book he intended to write, since he didn't want to go into the advanced features of the language. the author goes quickly to the heart of the matter in fortran programming, so that the reader can start using the computer right away. the basic material is well covered and gives a good introduction to the more advanced features which are available on most machines. the examples are well chosen so that they do not require any specialized knowledge ; therefore the emphasis can be put on the programming aspects of the examples. he also has very good end-ofchapter problems, ranging in difficulty from straight repetition of text material to programming problems which will require a considerable amount of individual work. he has a good discussion of mixed mode arithmetic, one of the more difficult topics of fortran to explain. he also has a good discussion of input/output operations, and an explanation of formatting which is very good. this again is a difficult area of the language and has been well explained. discussing each of the statement types in fortran, he begins by giving the general form of the statement in a standardized way, which is very good for introductory purposes and for review and reference. the index in the book doesn't single these out, so somebody who wanted to use the book as a reference should make a self-index of these particular areas of the book where the general forms and statements are given. this is a good feature of the book. robert f. mathis book reviews 171 films: a marc format; specifications for magnetic tapes containing catalog records for motion pictures, filmstrips, and other pictorial media intended for pro;ection. washington: marc development office, 1970. 65 pp. $0.65. this latest format issued by the marc development office is similar in organiza tion to the previously issued formats, describing in tum the leader, record directory, control fields , and variable fields. three appendices give the variable field tags , indicators, and subfield codes applicable to this format , categories of films , and a sample record in the marc format. in addition to the motion pictures and filmstrips specified in the subtitle, the coverage of this format includes slides, transparencies, video tapes, and electronic video recordings. data elements describing these last two have not been defined completely as the marc development office feels that further investigation is needed in these areas. the bibliographic level for this format is for monograph material, i.e., material complete at time of issue or to be issued in a known number of parts . 
since most of the material covered by this format is entered under title, main entry fields ( 100, 110, 111, 130 ) have not been described. this exclusion also covers the equivalent fields in the 400s and 800s. main entry and other fields not listed in this format but required by a user can be obtained from books: a marc format. this format describes two kinds of data: that generally found on an lc printed card and that needed to describe films in archival collections. only the first category will be distributed in machine readable form on a regular basis. one innovation introduced in this format that can only be applauded by marc users is the adoption of the bnb practice of using the second indicator of title fields (241, 245, 440, 840, but not 740 where the second indicator had previously been assigned a different function) to specify the number of characters at the beginning of the entry which are to be ignored in filing. it is to be hoped that in the future this practice will be applied to books, serials, and other types of works as well as to films. judith hopkins u.k. marc pmiect, edited by a. e. jeffreys and t. d. wilson. newcastle upon tyne: oriel press, 1970. 116 pp. 25s. this volume, which reports the proceedings of a conference on the u.k. marc project held in march 1969, may be of as much interest in the usa as in britain; although the intake of british libraries is much smaller and the money available for experiments much less, the problems of developing and using marc effectively within these constraints are for this very reason of special interest. 172 journal of library automation vol. 4/3 september, 1971 a. j. wells opened the conference with a paper introducing u.k. marc and closed it with a paper stating its relationship to the british national bibliography. points of interest are the need for standardisation among libraries (not smprisingly, this theme occurs throughout) and the differences between u.k. marc and l.c. marc (the latter being the odd one out, in its departures from aacr 67). disappointingly, no hint is given of additional national bibliographical products that might come from marc, such as cumulated and updated bibliographies on given subjects, or listings of children's books, etc. richard coward, with his usual clarity and conciseness, explains the planning and format of u.k. marc, in which he has been so centrally involved. as he says, "we have the technology to produce a marc service but we really need a higher level of technology to use it at anything like its full potential." r. bayly's paper on "user programs and package deals" is disappointing, dealing only with icl 1900 computers, and not comprehensively or clearly even with them. two papers discuss the problems of actually using marc: e. h. c . driver's "why marc?", which concludes that "the most efficient use of marc will be made by large library sys tems or groups of libraries," and f . h. ayres' "marc in a special library environment," which concludes that eventually all libraries will use the marc tape. mr. ayres discusses the proposed use of marc at a wre aldermaston, and also gives a general (and highly optimistic ) blueprint of the sort of way marc could be used in an all-through selection, acquisition and cataloging system. (the four american experimental uses of marc reviewed by c. d. batty-at toronto, yale, rice and indianaare probably well enough known in the usa and canada.) 
keith davidson's discussion of filing problems is first class, and his paper is just as topical as when it was written, because little progress has been made since then. peter lewis, in "marc and the future in libraries," makes the point that whereas bnb cards provided a ready-made product for libraries, marc tapes will merely offer them a set of parts to put together themselves. of special interest to american audiences may be derek austin's paper, "subject retrieval in the u.k. marc," since the precis system to which it forms an introduction may represent a major breakthrough in machine-manipulable subject indexing. marc and its uses constitute one of the most rapidly developing areas of librarianship. regular conferences of this standard are needed to review progress from time to time. maurice b. line

google scholar and 100 percent availability of information, by jeffrey pomerantz (pomerantz@unc.edu), assistant professor in the school of information and library science, university of north carolina at chapel hill. this paper discusses google scholar as an extension of kilgour's goal to improve the availability of information. kilgour was instrumental in the early development of the online library catalog, and he proposed passage retrieval to aid in information seeking. google scholar is a direct descendent of these technologies foreseen by kilgour. google scholar holds promise as a means for libraries to expand their reach to new user communities, and to enable libraries to provide quality resources to users during their online search process. editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital. fred kilgour would probably approve of google scholar. kilgour wrote that the paramount goal of his professional career is "improving the availability of information."1 he wrote about his goal of achieving this increase through shared electronic cataloging, and even argued that shared electronic cataloging will move libraries toward the goal of 100 percent availability of information.2 throughout much of kilgour's life, 100 percent availability of information meant that all of a library's books would be on the shelves when a user needed them. in proposing shared electronic cataloging—in other words, online union catalogs—kilgour was proposing that users could identify libraries' holdings without having to travel to the library to use the card catalog. this would make the holdings of remote libraries as visible to users as the holdings of their local library. kilgour went further than this, however, and also proposed that the full text of books could be made available to users electronically.3 this would move libraries toward the goal of 100 percent availability of information even more than online union catalogs. an electronic resource, unlike physical items, is never checked out; it may, in theory, be simultaneously used by an unlimited number of users. where there are restrictions on the number of users of an electronic resource—as with subscription services such as netlibrary, for example—this is not a necessary limitation of the technology, but rather a limitation imposed by licensing and legal arrangements. kilgour understood that his goal of 100 percent availability of information would only be reached by leveraging increasingly powerful technologies.
the existence of effective search tools and the usability of those tools would be crucial so that the user would be able to locate available information without assistance.4 to achieve this goal, therefore, kilgour proposed and was instrumental in the early development of much library automation: he was behind the first uses of punched cards for keeping circulation records, he was behind the development of the first online union catalog, and he called for passage retrieval for information seeking at a time when such systems were first being developed.5 this development and application of technology was all directed toward the goal of improving the availability of information. kilgour stated that the goal of these proposed information-retrieval and other systems was "to supply the user with the information he requires, and only that information."6 shared catalogs and electronically available text have the effect of removing both spatial and temporal barriers between the user and the material being used. when the user can access materials "from a personal microcomputer that may be located in a home, dormitory, office, or school," the user no longer has to physically go to the library.7 this is a spatial barrier when the library is located at some distance from the user, or if the user is physically constrained in some way. even if the user is perfectly able-bodied, however, and located close to a library, electronic access still eliminates a temporal barrier: accessing materials online is frequently faster and more convenient than physically going to the library. electronic access enables 100 percent availability of information in two ways: by ensuring that the material is available when the user wants it, and by lowering or removing any actual or perceived barriers to the user accessing the material. ■ library automation weise writes that "for at least the last twenty to thirty years, we [librarians] have done our best to provide them [users] with services so they won't have to come to the library."8 the services that weise is referring to are the ability for users to search for and gain access to the full text of materials online. libraries of all types have widely adopted these services: for example, at the author's own institution, the university of north carolina at chapel hill, the libraries have subscriptions to approximately seven hundred databases and provide access to more than 32,000 unique periodical titles; many of these subscriptions provide access to the full text of materials.9 additionally, the state library of north carolina provides a set of more than one hundred database subscriptions to all academic and public libraries around the state; any north carolina resident with a library card may access these databases.10 several other states have similar programs. by providing users with remote access to materials, libraries have created an environment in which it is possible for users to be remote from the library.
or rather, as lipow points out, it is the library that is remote from the user, yet the user is able to seek and find information.11 this adoption of technology by libraries has had the effect of enabling and empowering users to seek information for themselves, without either physically going to a library or seeking a librarian’s assistance. the increasing sophistication of freely available tools for information seeking on the web has accelerated this trend. in many cases, users may seek information for themselves online without making any use of a library’s human-intermediated or other traditional services. (certainly, providing access to electronic collections may be considered to be a service of the library, but this is a service that may not require the user either to be physically in the library or to communicate with a librarian.) even technically unsophisticated users may use a search engine and locate information that is “good enough” to fulfill their information needs, even if it is not the ideal or most complete information for those purposes.12 thus, for better or worse, the physical library is no longer the primary focus for many information seekers. part of this movement by users toward self-sufficiency in information seeking is due to the success of the web search engine, and to the success of google in particular. recent reports from the pew internet and american life project shed a great deal of light on users’ use of these tools. rainie and horrigan found that “on a typical day at the end of 2004, some 70 million american adults logged onto the internet.”13 fallows found that “on any given day, 56% of those online use search engines.”14 fallows, rainie, and mudd found that of their respondents, “47% say that google is their top choice of search engine.”15 from these figures, it can be roughly estimated that more than 39 million people use search engines, and more than 18 million use google on any given day—and that is only within the united states. this trend seems quite dark for libraries, but it actually has its bright side. it is important to make a distinction here between use of a search engine and use of a reference service or other library service. there is some evidence that users’ questions to library reference services are becoming more complex.16 why this is occurring is less clear, but it may be hypothesized that users are locating information that is good enough to answer their own simple questions using search engines or other internet-based tools. the definition of “good enough” may differ considerably between a user and a librarian. nevertheless, one function of the library is education, and as with all education, the ultimate goal is to make the student self-sufficient in self-teaching. in the context of the library, this means that one goal is to make the user self-sufficient in finding, evaluating, and using information resources. if users are answering their own simple questions, and asking the more difficult questions, then it may be hypothesized that the widespread use of search engines has had a role in raising the level of debate, so to speak, in libraries. rather than providing instruction to users on simply using search engines, librarians may now assume that some percentage of library users possess this skill, and may focus on teaching higher-level information-literacy skills to users (www.ala.org/ala/acrl/ acrlstandards/informationliteracycompetency.htm). 
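the rough totals quoted in the paragraph above (more than 39 million search-engine users and more than 18 million google users on a given day) follow directly from the three pew figures; the short sketch below simply chains the percentages together, a simplifying assumption made here for illustration rather than one the pew reports state themselves:

```python
# back-of-the-envelope arithmetic behind the estimates quoted above
online_adults_per_day = 70_000_000   # rainie and horrigan: u.s. adults online on a typical day
share_using_search = 0.56            # fallows: share of those online who use a search engine
share_choosing_google = 0.47         # fallows, rainie, and mudd: search users naming google as top choice

search_engine_users = online_adults_per_day * share_using_search     # about 39.2 million
google_users = search_engine_users * share_choosing_google           # about 18.4 million

print(f"estimated search-engine users per day: {search_engine_users:,.0f}")
print(f"estimated google users per day:        {google_users:,.0f}")
```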
simple questions that users may answer for themselves using a search engine, and complex questions requiring a librarian's assistance to answer, are not opposites, of course, but rather two ends of a spectrum of the complexity of questions. while the advance of online search tools may enable users to seek and find information for themselves at one end of this spectrum, it seems unlikely that such tools will enable users to do the same across the entire spectrum any time soon, perhaps ever. the author believes that there will continue to be a role for librarians in assisting users to find, evaluate, and use information. it is also important to make another distinction here, between the discovery of resources and access to those resources. libraries have always provided mechanisms for users to both discover and access resources. neither the card catalog nor the online catalog contains the full text of the materials cataloged; rather, these tools are means to enable the user to discover the existence of resources. the user may then access these resources by visiting the library. search engines, similar to the card and online catalogs, are tools primarily for discovery of resources: search-engine databases may contain cached copies of web pages, but the original (and most up-to-date) version of the web page resides elsewhere on the web. thus, a search engine enables the user to discover the existence of web pages, but the user must then access those web pages elsewhere. the author believes that there will continue to be a role for libraries in providing access to resources—regardless of where the user has discovered those resources. in order to ensure that libraries and librarians remain a critical part of the user's information-seeking process, however, libraries must reappropriate technologies for online information seeking. search engines may exist separate from libraries, and users may use them without making use of any library service. however, libraries are already the venue through which users access much online content—newspapers, journals, and other periodicals; reference sources; genealogical materials—even if many users do not physically come to the library or consult a librarian when using them. it is possible for libraries to add value to search technologies by providing a layer of service available to those using them. ■ google scholar one such technology for online information seeking to which libraries are already adding value, and that could add value to libraries in turn, is google scholar (scholar.google.com). google scholar is a specialty search tool, obviously provided by google, which enables the user to search for scholarly literature online. this literature may be on the free web (as open-access publications become more common and as scholars increasingly post preprint or post-print copies of their work on their personal web sites), or it may be in subscription databases.17 users may access literature in subscription databases in one of two ways: (1) if the user is affiliated with an institution that subscribes to the database, the user may access it via whatever authentication method is in place at the institution (e.g., ip authentication, a proxy server), or (2) if the user is not affiliated with such an institution, the user may pay for access to individual resources on a pay-per-view basis.
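the proxy route mentioned in (1) is commonly implemented by rewriting the publisher's url so that the request passes through the library's proxy server, which authenticates the user and then forwards the request from a licensed address range. the sketch below is illustrative only: the proxy host, login path, and publisher url are invented, and real installations each have their own url pattern.

```python
from urllib.parse import quote

# hypothetical proxy host and login path; substitute an institution's actual values
PROXY_PREFIX = "https://libproxy.example.edu/login?url="

def proxied(target_url: str) -> str:
    """rewrite a publisher url so the request is routed through the library proxy."""
    return PROXY_PREFIX + quote(target_url, safe="")

# an off-campus user follows the rewritten link instead of the raw publisher link
print(proxied("https://www.example-publisher.com/journals/article/12345"))
# -> https://libproxy.example.edu/login?url=https%3A%2F%2Fwww.example-publisher.com%2Fjournals%2Farticle%2F12345
```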
there is not sufficient space here to explore the details of google scholar’s operation, and anyway that is not the point of this paper; for excellent discussions of the operation of google scholar, see gardner and eng, and jacsó.18 pace draws a distinction between federated searching and metasearching: federated search tools compile and index all resources proactively, prior to any user’s actual search, in a just-in-case approach to users’ searching.19 metasearch tools, on the other hand, search all resources on the fly at the time of a user’s search, in a just-in-time approach to users’ searching. google scholar is a federated search tool—as, indeed, are all of google’s current services—in that the database that the user searches is compiled prior to the user’s actual search. in this, google scholar is a direct descendent of kilgour’s work to develop shared online library catalogs. a shared library catalog is a union catalog: it is a database of libraries’ physical holdings, compiled prior to any actual user’s search. google scholar is also a union catalog, though a catalog of publishers’ electronic offerings provided by libraries, rather than of libraries’ physical holdings. it should be noted, however, that while this difference is an important one for libraries and publishers, it might not be understood or even relevant for many users. many of the resources indexed in google scholar are also available in full text. this fact allows google scholar to also move in the direction of kilgour’s goal of making passage retrieval possible for scholarly work. by using google’s core technology—the search engine and the inverted index that is created when pages are indexed by a search engine—google scholar enables full-text searching of scholarly work. as mentioned above, when users search google scholar, they retrieve a set of links to the scholarly literature retrieved by the search. google scholar also makes use of google’s linkanalysis algorithms to analyze the network of citations between publications—instead of the network of hyperlinks between web pages, as google’s search engine more typically analyzes. a cited by link is included with each retrieved link in google scholar, stating how many other publications cite the publication listed. clicking on this cited by link performs a preformulated search for those publications. this citation-analysis functionality resembles the functionality of one of the most common and widely used scholarly databases in the scholarly community: the isi web of science (wos) database (scientific .thomson.com/products/wos). wos enables users to track citations between publications. this functionality has wide use in scholarly research, but until google scholar, it has been largely unknown outside of the scholarly community. with the advent of google scholar, however, this functionality may be employed by any user for any research. further, there is a plugin for the firefox browser (www.mozilla.com/firefox) that displays an icon for every record on the page of retrieved results that links to the appropriate record in the library’s opac (google scholar does not, however, currently provide this functionality natively20). this provides a link from google scholar to the materials that the library holds in its collection. when the item is a book, for example, this link to the opac enables users to find the call number of the book in their local library. 
when the item is a journal, it enables them to find both the call number and any database subscriptions that index that journal title. periodicals are often indexed in multiple databases, so libraries with multiple-database subscriptions often have multiple means of accessing electronic versions of journal titles. a library user may access a periodical via any or all of these individual subscriptions without using google scholar—but to do so, the user must know which database to use, which means knowing either the topical scope of a database or knowing which specific journals are indexed in a database. as a more centralized means of accessing this material, many users may prefer a link in google scholar to the library's opac. google scholar thus fulfills, in large part, kilgour's vision of shared electronic cataloging. in turn, shared cataloging goes a long way toward achieving kilgour's vision of 100 percent availability of information by allowing a user to discover the existence of information resources. however, discovery of resources is only half of the equation: the other half is access to those resources. and it is here that libraries may position themselves as a critical part of the information-seeking process. search engines may enable users to discover information resources on their own, without making use of a library's services, but it is the library that provides the "last mile" of service, enabling users to gain access to many of those resources. ■ conclusion google scholar is the topic of a great deal of debate, both in the library arena and elsewhere.21 unlike union catalogs and many other online resources used in libraries, it is unknown what materials are included in google scholar, since as of this writing google has not released information about which publishers, titles, and dates are indexed.22 google is known to engage in self-censorship—or self-filtering, depending on what coverage one reads—and so potentially conflicts with the american library association's freedom to read statement (www.ala.org/ala/oif/statementspols/ftrstatement/freedomreadstatement.htm).23 google is a commercial entity and, as such, its primary motivation must be profit, with meeting the information needs of library users only a secondary concern. for all of these and other reasons, there is considerable debate among librarians about whether it is appropriate for libraries to provide access to google scholar. despite this debate, however, users are using google scholar. google scholar is simply the latest tool to enable users to seek information for themselves; it isn't the first and it won't be the last. google scholar holds a great deal of promise for libraries due to the combination of google's popularity and ease of use, and the resources held by or subscribed to by libraries to which google scholar points. as kesselman and watstein suggest, "libraries and librarians need to have a voice" in how tools such as google scholar are used, given that "we are the ones most passionate about meeting the information needs of our users."24 given that library users are using google scholar, it is to libraries' benefit to see that it is used well. google scholar is the latest tool in a long history of information-seeking technologies that increasingly realize kilgour's goal of achieving 100 percent availability of information.
google scholar does not provide access to 100 percent of information resources in existence; but rather enables discovery of information resources, and allows for the possibility that these resources will be discoverable by the user 100 percent of the time. google scholar may be on the vanguard of a new way of integrating library services into users’ everyday information-seeking habits. as taylor tells us, people have their own individual sources to which they go to find information, and libraries—for many people—are not at the top of their lists.25 google, however, is at the top of the list for a great many people.26 properly harnessed by libraries, therefore, google scholar has the potential to bring users to library resources when they are seeking information. google scholar may not bring users physically to the library. instead, what google scholar can do is bring users into contact with resources provided by the library. this is an important distinction, because it reinforces a change that libraries have been undergoing since the advent of the online database: that of providing access to materials that the library may not own. ownership of materials potentially allows for a greater measure of control over the materials and their use. ownership in the context of libraries has traditionally meant ownership of physical materials, and physical materials by nature restrict use, since the user must be physically collocated with the materials, and use of materials by one user precludes use of those materials by other users for the duration of the use. providing access to materials, on the other hand, means that the library may have less control over materials and their use, but this potentially allows for wider use of these materials. by enabling users to come into contact with library resources in the course of their ordinary web searches, google scholar has the potential to ensure that libraries remain a critical part of the user’s information-seeking process. it benefits google when a library participates with google scholar, but it also benefits the library and the library’s users: the library is able to provide users with a familiar and easy-to-use path to materials. this is (for lack of a better term) a “spoonful of sugar” approach to seeking and finding information resources: by using an interface that is familiar to users, libraries may provide quality information sources in response to users’ information seeking. green wrote that “a librarian should be as unwilling to allow an inquirer to leave the library with his question unanswered as a shop-keeper is to have a customer go out of his store without making a purchase.”27 a modern version of this might be that a librarian should be as unwilling to allow an inquirer to abandon a search with his question unanswered. google scholar and online tools like it have the potential to draw users away from libraries; however, these tools also have the potential to usher in a new era of service for libraries: an expansion of the reach of libraries to new users and user communities; a closer integration with users’ searches for information; and the provision of quality resources to all users, in response to all information needs. google scholar and online tools like it have the potential to enable libraries to realize kilgour ’s goals of improving the availability of information, and to provide 100 percent availability of information. these are goals on which all libraries can agree. 
■ acknowledgements many thanks to lisa norberg, instruction librarian, and timothy shearer, systems librarian, both at the university of north carolina at chapel hill, for many extensive conversations about google scholar, which approached coauthorship of this paper. this paper is dedicated to the memory of kenneth d. shearer. references and notes 1. frederick g. kilgour, "historical note: a personalized prehistory of oclc," journal of the american society for information science 38, no. 5 (1987): 381. 2. frederick g. kilgour, "future of library computerization," in current trends in library automation: papers presented at a workshop sponsored by the urban libraries council in cooperation with the cleveland public library, alex ladenson, ed. (chicago: urban libraries council, 1981), 99–106; frederick g. kilgour, "toward 100 percent availability," library journal 114, no. 19 (1989): 50–53. 3. kilgour, "toward 100 percent availability." 4. frederick g. kilgour, "lack of indexes in works on information science," journal of the american society for information science 44, no. 6 (1993): 364; frederick g. kilgour, "implications for the future of reference/information service," in collected papers of frederick g. kilgour: oclc years, lois l. yoakam, ed. (dublin, ohio: oclc online computer library center, inc., 1984): 9–15. 5. frederick g. kilgour, "a new punched card for circulation records," library journal 64, no. 4 (1939): 131–33; kilgour, "historical note"; frederick g. kilgour and nancy l. feder, "quotations referenced in scholarly monographs," journal of the american society for information science 43, no. 3 (1992): 266–70; gerald salton, j. allan, and chris buckley, "approaches to passage retrieval in full-text information systems," in proceedings of the 16th annual international acm sigir conference on research and development in information retrieval (new york: acm pr., 1993), 49–58. 6. kilgour, "implications for the future of reference/information service," 95. 7. kilgour, "toward 100 percent availability," 50. 8. frieda weise, "being there: the library as place," journal of the medical library association 92, no. 1 (2004): 10, www.pubmedcentral.nih.gov/articlerender.fcgi?artid=314099 (accessed apr. 9, 2006). 9. it is difficult to determine precise figures, as there is considerable overlap in coverage; several vendors provide access to some of the same periodicals. 10. north carolina's database subscriptions are via the nc live service, www.nclive.org (accessed apr. 9, 2006). 11. anne g. lipow, "serving the remote user: reference service in the digital environment," paper presented at the ninth australasian information online and on disc conference and exhibition, sydney, australia, 19–21 jan. 1999, www.csu.edu.au/special/online99/proceedings99/200.htm (accessed apr. 9, 2006). 12. j. janes, "academic reference: playing to our strengths," portal: libraries and the academy 4, no. 4 (2004): 533–36, http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v004/4.4janes.html (accessed apr. 9, 2006). 13. lee rainie and john horrigan, a decade of adoption: how the internet has woven itself into american life (washington, d.c.: pew internet & american life project, 2005), 58, www.pewinternet.org/ppf/r/148/report_display.asp (accessed apr. 9, 2006). 14. deborah fallows, search engine users (washington, d.c.: pew internet & american life project, 2005), i, www.pewinternet.org/pdfs/pip_searchengine_users.pdf (accessed apr. 9, 2006). 15.
deborah fallows, lee rainie, and graham mudd, data memo on search engines (washington, d.c.: pew internet & american life project, 2004), 3, www.pewinternet.org/ppf/r/132/report_display.asp (accessed apr. 9, 2006). 16. laura bushallow-wilber, gemma devinney, and fritz whitcomb, "electronic mail reference service: a study," rq 35, no. 3 (1996): 359–69; carol tenopir and lisa a. ennis, "reference services in the new millennium," online 25, no. 4 (2001): 40–45. 17. alma swan and sheridan brown, open access self-archiving: an author study (truro, england: key perspectives, 2005), www.jisc.ac.uk/uploaded_documents/open%20access%20self%20archiving-an%20author%20study.pdf (accessed apr. 9, 2006). 18. susan gardner and susanna eng, "gaga over google? scholar in the social sciences," library hi tech news 8 (2005): 42–45; péter jacsó, "google scholar: the pros and the cons," online information review 29, no. 2 (2005): 208–14. 19. andrew pace, "introduction to metasearch . . . and the niso metasearch initiative," presentation to the openurl and metasearch workshop, sept. 19–21, 2005, www.niso.org/news/events_workshops/openurl-05-ppts/2-1-pace.ppt (accessed apr. 9, 2006). 20. this plugin was developed by peter binkley, digital initiatives technology librarian at the university of alberta. see www.ualberta.ca/~pbinkley/gso (accessed apr. 9, 2006). 21. see, for example, gardner and eng, "gaga over google?"; jacsó, "google scholar"; m. kesselman and s. b. watstein, "google scholar and libraries: point/counterpoint," reference services review 33, no. 4 (2005): 380–87. 22. jacsó, "google scholar." 23. anonymous, google censors itself for china, bbc news, jan. 25, 2006, http://news.bbc.co.uk/2/hi/technology/4645596.stm (accessed apr. 9, 2006); a. mclaughlin, "google in china," google blog, jan. 27, 2006, http://googleblog.blogspot.com/2006/01/google-in-china.html (accessed apr. 9, 2006). 24. kesselman and s. b. watstein, "google scholar and libraries," 386. 25. robert s. taylor, "question-negotiation and information seeking in libraries," college & research libraries 29, no. 3 (1968): 178–94. 26. fallows, rainie, and mudd, data memo on search engines. 27. samuel s. green, "personal relations between librarians and readers," american library journal 1, no. 2–3 (1876): 79.

book reviews the proceedings of the international conference on training for information work, rome, italy, 15th–19th november 1971, edited by georgette lubock. joint publication of the italian national information institute, rome and the international federation for documentation, the hague; f.i.d. publ. 486; sept. 1972, rome, 510 p. let's face it: there is something about any proceedings that elicits a very personal reaction in many of us: "here are papers that either a) got their authors a trip to the conference city; b) tell how we did good at our place; or c) unabashedly present h.b.i.'s (half-baked ideas)." i personally like proceedings that have many papers under category c); such papers make me think (or laugh). the great majority of papers in these rome proceedings fall basically under category b), i.e., 'how we done it good,' and some quite obviously under a), i.e., 'have paper will travel'; well, it was rome, italy, after all. however, there is a smattering of papers that fall under c), i.e., h.b.i.'s. so for those interested in the topic, these proceedings offer among other things some food for speculative thought.
for these other things let us start at the beginning. the contents consist of prefatory sections, one opening address, sixty-six papers, a set of twenty brief conclusions, three closing addresses, a summary of work at the conference, an author index, and a list of participants and authors' addresses. the papers are organized according to two major sessions: one on "training of information specialists" (nine invited and forty-two submitted papers) and another on "training of information users" (six invited and nine submitted papers). the larger number of papers on training of specialists vs. training of users probably represents a good assessment of real education interests in the field. the conference was truly international: authors came from four continents, twenty countries and four international organizations. most represented were: italy as host country with fifteen papers, usa with eight, great britain with seven, and france with six papers. the concern for information science education is indeed worldwide; however, if the presented papers are any measure, such education is in big trouble, because one is left with the impression that information science education is in some kind of limbo: the bases, relations, and directions are muddled or nonexistent. but then isn't all contemporary higher education in big trouble, and in limbo? the conceptions of what information science education is all about differ so widely from paper to paper that the question of this difference in itself could be a subject of the next conference. it is my impression that the differences are due to a) widely disparate preconceptions of the nature of "information problems," and b) incompetence of a number of authors in relation to the subjects. accomplishments in some other field or, even worse, a high administrative title does not necessarily make for competence in information science education. the proceedings offer a fascinating picture of information science education by countries and by various facets. they also offer frustration due to unbelievably unhygienic semantic conditions in the treatment of concepts, including a confusion from the outset of "training" and "education." the first business of the field should be toward clearing its own semantic pollution; such a conclusion can be derived even after a most cursory examination of the papers. my own choices for the three most interesting papers are:
- v. slamecka and p. zunde, "science and information: some implications for the education of scientists" (usa);
- s. j. malan, "the implications for south african education in library science in the light of developments in information science" (south africa);
- w. kunz and h. w. j. rittel, "an educational system for the information sciences" (germany).
the editing of the proceedings is exemplary; the editors and conference organizers worked hard and conscientiously. the proceedings also provide the best single source published so far from which one could gain a wide international overview not only of information science education but also of information science itself, including implicitly the problems the field faces. in this lies the main worth of the proceedings. tefko saracevic

computer processing of library files at durham university; an ordering and cataloging facility for a small collection using an ibm 360/67 machine. by r. n. oddy. durham, england: university library, 1971. 202p. £1.75.
the task of the book is to guide the reader in the use of the lfp (library file processing) system developed by the durham university library. the lfp system orders items and prints book catalogs in various sequences for a small collection of items with the aid of an electronic digital computer. the system is batch with card input and printed output; the programs are written in pl/1. "the lfp system was designed to be flexible and easy to operate for small files, and is less suitable for files larger than 10,000 items because there are then other problems which it does not attempt to solve." (p. 10). the book fulfills its assigned task well; it is an excellent example of explanations and instructions for the personnel charged with the day-to-day operations of the particular system described. the book includes excellent introductory chapters on job control language, how computers operate, file maintenance, etc. outside of the durham university library, however, the book has little use except as a model of a well done operations guide. kenneth j. bierman

"discovery" focus as impetus for organizational learning, by jennifer l. fabbi (jennifer.fabbi@unlv.edu), special assistant to the dean at the university of nevada las vegas libraries. the university of nevada las vegas libraries' focus on the concept of discovery and the tools and processes that enable our users to find information began with an organizational review of the libraries' technical services division. this article outlines the phases of this review and subsequent planning and organizational commitment to discovery. using the theoretical lens of organizational learning, it highlights how the emerging focus on discovery has provided an impetus for genuine learning and change. the university of nevada las vegas (unlv) libraries' focus on the concept of discovery and the tools and processes that enable our users to find information stemmed from the confluence of several initiatives. however, a significant path that is directly responsible for the increased attention on discovery leads through one unit in unlv libraries—technical services. this unit, consisting of the materials ordering and receiving (acquisitions) and bibliographic and metadata services (cataloging) departments, had been without a permanent director for three years when i was asked to take the interim post in april 2008. while the initial expectation was that i would work with the staff to continue to keep technical services functioning while we performed our third search for a permanent director, it became clear after three months that, because of nevada's budgetary limitations, we would not be able to go forward with a search at that time. as all personnel searches in unlv libraries were frozen, managers and staff across the divisions moved quickly to reassign staff with the aim of mitigating the effects of staff vacancies. there was division between the library administrators as to what the solution would be for technical services: split up the division—for which we had had trouble recruiting and retaining a leader in the past—and divvy up its functions among other divisions in the libraries, or continue to hold down the fort while conducting a review of technical services that would inform what it might become in the future.
other organizations have taken serious looks at, and provided roadmaps of, how the focus of their technical services operations will change in the future.1 the latter route was chosen, and the review—eventually dubbed revisioning technical services—led directly to the inquiries and activities documented in this ital special issue. detailing the process of revisioning technical services and using the theoretical lens of organizational learning, i will demonstrate how the libraries' emerging focus on the concept of discovery has provided an impetus for genuine learning and change.

■ organizational learning

in images of organization, morgan devotes a chapter to theories of organizational development that characterize organizations using the metaphor of the brain.2 based on the principles of modern cybernetics, argyris and schön provide a framework for thinking about how organizations can learn to learn.3 while many organizations have become adept at single-loop learning—the ability to scan the environment, set objectives, and monitor their own general performance in relation to existing operating norms—these types of systems are generally designed to keep the organization "on course." double-loop learning, on the other hand, is a process of learning to learn, which depends on being able to take a "double look" at the situation by questioning the relevance of operating norms (see figure 1).

[figure 1. single- and double-loop learning. source: learning-org discussion pages, "single and double loop learning," learning-org dialog on learning organizations, http://www.learning-org.com/graphics/lo23374singledll.jpg (accessed aug. 11, 2009).]

bureaucratized organizations have fundamental organizing principles, including management hierarchy and subunit goals that are seen as ends in themselves, which can actually obstruct the learning process. to become skilled in the art of double-loop learning, organizations must avoid getting trapped in single-loop processes, especially those created by "traditional management control systems" and the "defensive routines" of organizational members.4 according to morgan, cybernetics suggests that learning organizations must develop capacities that allow them to do the following:5
- scan and anticipate change in the wider environment to detect significant variations by
  - embracing views of potential futures as well as of the present and the past;
  - understanding products and services from the customer's point of view; and
  - using, embracing, and creating uncertainty as a resource for new patterns of development.
- develop an ability to question, challenge, and change operating norms and assumptions by
  - challenging how they see and think about organizational reality using different templates and mental models;
  - making sure strategic development does not run ahead of organizational reality; and
  - developing a culture that supports change and risk taking.
- allow an appropriate strategic direction and pattern of organization to emerge by
  - developing a sense of vision, norms, values, limits, or "reference points" to guide behavior, including the ability to question the limits being imposed;
  - absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation; and
  - placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals.
unlv libraries' revisioning technical services process and the resulting organizational focus on discovery are outlined below, and the elements identifying unlv libraries as a learning organization throughout this process are highlighted (see appendix a).

■ revisioning technical services

this review of technical services was a process consisting of several distinct steps over many months, and each step was informed by the data and opinions gained in the prior steps:
- phase 1: technical services baseline, focusing on the nature of technical services work at unlv libraries, in the library profession, and factors that affect this work now and in the future
- phase 2: organizational call to action, engaging the entire organization in shared learning and input
- phase 3: summit on discovery, shifting significantly away from technical services and toward the concept of discovery of information and the experience of our users

technical services baseline. the first phase of the process, which i called the "technical services baseline," included a face-to-face meeting with me and all technical services staff. we talked openly about the challenges that we faced, options on the table for the division and why i thought that taking on this review would be the best course to pursue, and goals of the review. outcomes of the process were guided by the dean of libraries, were written by me, and received input from technical services staff, resulting in the following goals:
1. collect input about the kinds of skills and leadership we would like to see in our new technical services director. (while creating these goals, we were given the go-ahead to continue our search for a new director.)
2. investigate the organization of knowledge at a broad level—what is the added value that libraries provide?
3. increase overall knowledge of professional issues in technical services and what is most meaningful for us at unlv.
4. encourage technical services staff to consider current and future priorities.
after establishing these goals, i began to document information about the process on unlv libraries' staff website (figure 2) so that all staff could follow its progress.

[figure 2. project's wiki page on staff website.]

with the feedback i received at the face-to-face meeting and guided by the stated goals of the process, i gave technical services staff a series of three questions to answer individually:
1. what do you think the major functions of technical services are? examples are "cataloging physical materials" and "ordering and paying for all resources purchased from the collections budget."
2. what external factors—in librarianship and otherwise—should we be paying the most attention to in terms of their effect on technical services work? examples are "the ways that users look for information" and "reduction of print book and serials budgets." feel free to do a little research on this question and provide the sources of the information that you find.
3. what are the three highest priority/most important tasks on your to-do list right now?
eighteen of twenty staff members responded to the questions. i then analyzed the twenty pages of feedback according to two specific criteria: (1) i paid special attention to phrases that indicated an individual's beliefs, values, or philosophies to identify potential sources of conflict as we moved through the process; and (2) i looked for priority tasks listed that were not directly related to the individual's job duties, as many of them were indicators of work stress or anxiety related to perceived impending change. during this phase, organizational learning was initiated through the process of challenging how technical services staff and others viewed technical services as a unit in the organization, and through the creation of shared reference points to guide our future actions. while beginning a dialogue about a variety of future management options for technical services work functions may have raised levels of anxiety within the organization, it also invited administration and staff to question the status quo and consider alternative modes of operation within the context of efficiency.6 in addition to thinking about current realities and external influences, staff were asked to participate in generating outcomes to guide the review process. these shared goals helped to develop a sense of coherence for what started out as a very loose assignment—a review that would inform what the unit might become in the future.

organizational call to action. the next phase of the process, "a call to action," required library-wide involvement and input. while i knew that this phase would involve a library staff survey, i also desired that all staff responding to the survey have a basic knowledge of some of the issues that are facing library technical services today. using input from the two technical services department heads, i selected two readings for all library staff: bothmann and holmberg's chapter on strategic planning for electronic resource management addresses many of the planning, policy, and workflow issues that unlv libraries has experienced,7 and coyle's article on information organization and the future of the library catalog offers several ideas for ensuring that valuable information is visible to our users in the information environments they are using.8 i also asked the library staff to visit the university of nebraska–lincoln's "encore catalog search" (http://iris.unl.edu) and go through the discovery experience by performing a guided search and a search on a topic of their choice. they were then asked to ponder what collections of physical or digital resources we currently own at the libraries that are not available from the library catalog. after completing these steps, i directed library staff to a survey of questions related to the importance of several items referenced in the articles in terms of the following unlv libraries priorities:
- creating a single search interface for users pulling together information from the traditional library catalog as well as other resources (e.g., journal articles, images, archival materials)
- considering non–marc records in the library catalog for the integration of nontraditional library and nonlibrary resources into the catalog
- linking to access points for full-text resources from the catalog
- creating ways for the catalog to recommend items to users
- creating metadata for materials not found in the catalog
- creating "community" within the library catalog
- implementing an electronic resource management system (erms) to help manage the details related to subscriptions to electronic content
- implementing federated searching so that users can search across multiple electronic resource interfaces at once
- making electronic resource license information available to library staff and patrons
there also were several questions asking library staff to prioritize many of the functions that technical services already undertakes to some extent:
- cataloging specialized or unique materials
- cataloging and processing gift collections
- ensuring that full-text electronic access is represented accurately in the catalog
- claiming and binding print serials
- ordering and receiving physical resources
- ordering and receiving electronic resources
- maintaining and communicating acquisitions budget and serials data
the survey asked technical services staff to "think of your current top three priority to-do items. in light of what you read and what you think is important for us to focus on, how do you think your work now will have changed in five years?" all other library staff members were asked to respond to the following:
1. please list two ways that technical services supports your work now.
2. please list two things you would like technical services to start doing in support of your work now.
3. please list two things you think technical services can stop doing now.
4. please list two things technical services will need to begin doing to support your work in the next five years.
finally, the survey included ample opportunity for additional comments. fifty-eight staff members (over half of all library staff) completed the readings, activity, and survey. i analyzed the information to inform the design of subsequent phases of revisioning technical services. the dean of libraries' direct reports then reviewed the design. in addition, many library staff contributed additional readings and links to library catalogs and other websites to add to the revisioning technical services staff webpage. throughout this phase, the organization was invited into the learning process through engagement with shared reference points, the ability to question the status quo, and the ability to embrace views of potential futures as well as of the present and the past.9 the careful selection of shared readings and activities created coherence among the staff in terms of thinking about the future, but these ideas also raised many questions about the concept of discovery and what route unlv libraries might take. the survey allowed library staff to better understand current practices in technical services, to prioritize new ideas against these practices, and to think about future options and their potential impact on their individual work as well as the collective work of the libraries.

summit on discovery. in the third phase of this process, "the discovery summit," focus began to shift significantly from technical services as an organizational unit to the concept of discovery and what it means for the future of unlv libraries.
during this half-day event, employing a facilitator from off campus, the dean of libraries and i designed a program to fulfill the following desired outcome: through a process of focused inquiry, observation, and discussion, participants will more fully understand the discovery experience of unlv libraries users. the event was open to all library staff members; however, individuals were required to rsvp and complete an activity before the day of the event. (the facilitator worked specifically with the technical services staff at a retreat designed to prepare for upcoming interviews for technical services director candidates.) participants were each sent a "summit matrix" (see appendix b) ahead of time, which asked them to look for specific pieces of information by doing the following:
1. search for the information requested with three discovery tools as your starting points: the libraries' catalog, the libraries' website, and a general internet search engine (like google).
2. for each discovery tool, rate the information that you were able to find in terms of "ease of discovery" on a scale of 1 (lowest ease—few results) to 5 (highest ease—best results).
3. document the thoughts and feelings you had and/or process you went through in searching for this information.
4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to?
the information that staff members were asked to search for using each discovery tool was mostly specific to the region of southern nevada, such as, "i heard that henderson (a city in southern nevada) started as a mining community. does unlv libraries have any books about that?" and "find any photograph of the gay pride parade in las vegas that you can look at in unlv libraries." during the summit, the approximately sixty participants were asked to discuss their experiences searching for the matrix information, including any affective component to their experience, and they were asked to specify criteria for their definition of "ease of discovery." next, we showed end-user usability-testing video footage of a unlv professor, a human resources employee, and a unlv librarian going through similar discovery exercises. after each video, we discussed these users' experiences—their successes, failures, and frustrations—and the fact that even our experts were unable to discover some of this information. finally, we facilitated a robust brainstorming session on initiatives we could undertake to improve the discovery experience of our users. [editor's note: read more about this usability testing in "usability as a method for assessing discovery" on page 181 of this issue.] during the wrap-up of the discovery summit, the final phase of this initial process, the discovery mini-conference was introduced. a call for proposals for library staff to introduce or otherwise present discovery concepts to other library staff was distributed. this call tied together the revisioning technical services process to date and also connected the focus on discovery to the libraries' upcoming strategic planning process. this strategic planning process, outlining broad directions for the libraries to focus on for the next two years, would be the first time we would use our newly created evaluation framework. we focused on the concepts of discovery, access, and use, all tied together through an emphasis on the user.
all library staff members were invited to submit a poster session or other visual display on various themes related to discovery of information to add to our collective and individual knowledge bases and to better understand our colleagues' philosophies and positions on discovery. in addressing one of six mini-conference themes listed below, all drawn directly from the revisioning technical services survey results, potential participants were asked to consider the question, "what are your ideas for ways to improve how users find library resources?"
- single search interface (federated searching, harvester-type platform, etc.)
- open source vs. vendor infrastructure
- information-seeking behavior of different users
- social networking and web 2.0 features as related to discovery
- describing primary sources and other unique materials for discovery
- opening the library catalog for different record types and materials
proposals could include any of these perspectives:
- an environmental scan with a summary of what you learn
- a visual representation of what you would consider improvement or success
- a position for a specific approach or solution that you advocate
ultimately, we had seventeen distinct projects involving twenty-four staff members for the afternoon mini-conference. it was attended by approximately seventy additional staff members from unlv libraries as well as representatives from institutions that share our innovative system. we collected feedback on each project in written form and electronically after the mini-conference. mini-conference content was documented on its own wiki pages and in this special issue of ital. during this phase of the revisioning technical services process, there was an emphasis on understanding our services from the customers' point of view, a hallmark of a learning organization.10 during the discovery summit, we aimed to transform frustration and uncertainty over the user experience of the services we are providing into a motivation to embrace potential futures. the mini-conference utilized the discovery themes that had evolved throughout the revisioning technical services process to provide a cohesive framework for library staff members to share their knowledge and ideas about discovery systems and to question the status quo.

■ organizational ownership of discovery: strategic planning and beyond

through the phases of the revisioning technical services process outlined above, it should be evident how the concept of discovery, highlighted during the process, moved from being focused on technical services to being owned by the entire organization. while the vocabulary of discovery had previously been owned by pockets of staff throughout unlv libraries, it has now become a common lexicon for all. the libraries' evaluation framework, which includes discovery, had set the stage for our upcoming organizational strategic plan. just prior to the discovery summit, the dean of libraries' direct reports group began to discuss how it would create a strategic plan for the 2009–11 biennium. it became increasingly apparent how important a focus on discovery would be in this process, and that we needed to time our planning right, allowing the organization and ourselves time to become familiar with the potential activities we might commit to in this area before locking into a strategic plan.
“discovery” focus as impetus for organizational learning | fabbi 169 the dean’s direct reports group first spent time crafting a series of strategic directions to focus on in the two-year time period we were planning for. rather than give the organization specific activities to undertake, the strategic directions were meant to focus our new initiatives—and in a way to limit that activity to those that would move us past the status quo. of the sixteen directions, one stemmed directly from the organization’s focus on discovery: “improve discoverability of physical and electronic resources in empowering users to be self sufficient; work toward an interface and system architecture that incorporates our resources, internal and external, and allows the user to access them from their preferred starting point.” an additional direction also touched on the discovery concept: “monitor and adapt physical and virtual spaces to ensure they respond to and are informed by next-generation technologies, user expectations, and patterns in learning, social interactions, and research collaboration; encourage staff to experiment with, explore, and share innovative and creative applications of technology.” through their division directors and standing committees, all library staff members were subsequently given the opportunity to submit action items to the strategic plan within the framework of the strategic directions. the effort was made by the dean of libraries for this part of the process to coincide with the discovery mini-conference, a time when many library staff members were being exposed to a wide variety of potential activities that we might take as an organization in this area. one of the major action items that made it into the strategic plan was for the dean’s direct reports to charge an oversight task force with the investigation and recommendation of a systems or systems that would foster increased, unified discovery of library collections. the charge of this newly created discovery task force includes a set of guiding principles for the group in recommending a discovery solution that n creates a unified search interface for users pulling together information from the library catalog as well as other resources (e.g., journal articles, images, archival materials); n enhances discoverability of as broad a spectrum of library resources as possible; n is intuitive: minimizes the skills, time, and effort needed by our users to discover resources; n supports a high level of local customization (such as accommodating branding and usability considerations); n supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure); n demonstrates commitment to sustainability and future enhancements; and n is informed by preferred starting points of the user. in setting forth these guiding principles, the work of the discovery task force is informed by the organization’s discovery values, which have evolved over a year of organizational learning. in the timing of the strategic planning process and the emphasis of the plan, we made sure that the organization’s strategic development did not run ahead of organizational reality and also have worked to develop a culture that supports change and risk taking.11 the strategic discovery direction and pattern of organizational focus has been allowed to emerge throughout the organizational learning process. 
as evidenced in both the strategic plan directions and guiding principles laid out in the charge of the discovery task force, the organization has begun to absorb the basic philosophy that will guide appropriate objectives in this area and has focused more on this guiding philosophy than on the active pursuit of one right answer as it continues to learn. n conclusion using the theoretical lens of organizational learning, i have documented how unlv libraries’ emerging focus on the concept of discovery has provided an impetus for learning and change (see appendix a). our experience throughout this process supports the theory that organizational intelligence evolves over time and in reference to current operating norms.12 argyris and schön warn that a top-down approach to management focusing on control and clearly defined objectives encourages singleloop learning.13 had unlv libraries chosen a more management-oriented route at the beginning of this process, it most likely would have yielded an entirely different result. in this case, genuine organizational learning proved to be action based and ever-emerging, and while this is known to introduce some level of anxiety into an organization, the development of the ability to question, challenge, and potentially change operating norms has been worth the cost.14 i believe that while any single idea we have broached in the discovery arena may not be completely unique, it is the entire process of organizational learning that is significant and applicable to many information and technology-related areas of interest. references 1. karen calhoun, the changing nature of the catalog and its integration with other discovery tools (washington, d.c.: library 170 information technology and libraries | december 2009 scan and anticipate change in the wider environment to detect significant variations by n embracing views of potential futures as well as of the present and the past (revisioning phase 1: technical services questions); n understanding products and services from the customer’s point of view (revisioning phase 3: summit); and n using, embracing, and creating uncertainty as a resource for new patterns of development (revisioning phase 1: meeting; phase 3: summit). develop an ability to question, challenge, and change operating norms and assumptions by n challenging how they see and think about organizational reality using different templates and mental models (revisioning phase 2: survey); n making sure strategic development does not run ahead of organizational reality (strategic planning process; discovery task force charge); and n developing a culture that supports change and risk taking (strategic planning process). allow an appropriate strategic direction and pattern of organization to emerge by n developing a sense of vision, norms, values, limits, or “reference points” to guide behavior, including the ability to question the limits being imposed (revisioning phase 1: outcomes; phase 2: shared readings, activity; strategic planning process; discovery task force charge); n absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation (strategic planning process, discovery task force charge); and n placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals (strategic planning process, discovery task force charge). of congress, 2006), http://www.loc.gov/catdir/calhoun-report -final.pdf (accessed aug. 
12, 2009); bibliographic services task force, rethinking how we provide bibliographic services for the university of california (univ. of california libraries, 2005), http://libraries.universityofcalifornia.edu/sopag/bstf/final .pdf (accessed aug. 12, 2009). 2. gareth morgan, images of organization (thousand oaks, calif.: sage, 2006). 3. chris argyris and donald a. schön, organizational learning ii: theory, method, and practice (reading, mass.: addison wesley, 1996). 4. morgan, images of organization, 87. 5. morgan, images of organization, 87–97. 6. ibid. 7. robert l. bothmann and melissa holmberg, “strategic planning for electronic management,” in electronic resource management in libraries: research and practice, ed. holly yu and scott breivold, 16–28 (hershey, pa.: information science reference, 2008). 8. karen coyle, “the library catalog: some possible futures,” the journal of academic librarianship 33, no. 3 (2007): 414–16. 9. morgan, images of organization. 10. ibid. 11. ibid. 12. ibid. 13. argyris and schön, organizational learning ii. 14. morgan, images of organization. appendix a. tracking unlv libraries’ discovery focus across characteristics of organizational learning “discovery” focus as impetus for organizational learning | fabbi 171 please complete the following and bring to the summit on discovery—february 24: 1. search for the information requested in each row of the table below with three discovery tools as your starting points: the libraries catalog, the libraries website, and a general internet search engine (like google). 2. for each discovery tool, rate the information that you were able to find in terms of “ease of discovery” on a scale of 1 (lowest ease) to 5 (highest ease). 3. document the thoughts and feelings you had and/ or process you went through in searching for this information in the space provided. 4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to? appendix b. summit matrix what am i looking for? libraries catalog libraries website google thoughts, etc., on what i discovered what’s all the fuss about frazier hall? why is it important? does unlv libraries have any documents about the history of the university that reference it? it’s black history month and my professor wants me to find an oral history about african americans in las vegas that is available in unlv libraries. i heard that henderson started as a mining community. does unlv libraries have any books about that? find any photograph of the gay pride parade in las vegas that you can look at in unlv libraries. 106 information technology and libraries | september 2009 michelle frisquepresident’s message michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, north western university, chicago. b y the time you read this column i will be lita president, however, as i write this i still have a couple of weeks left in my vice-presidential year. i have been warned by so many that my presidential year will fly by, and i am beginning to understand how that could be. i can’t believe i am almost done with my first year. i have enjoyed it and sometimes been overwhelmed by it—especially when i began the process of appointing lita volunteers to committees and liaison roles. i didn’t realize how many appointments there were to make. i want to thank all of the lita members who volunteered. you really helped make the appointment process easier. 
as a volunteer organization, lita relies on you, and once again many of you have stepped up. thank you. during the appointment process i was introduced to many lita members whom i had not yet met. i enjoyed being introduced to you virtually, and i look forward to meeting you in person in the coming year. i also want to thank the lita office. they were there whenever i needed them. without their assistance i would not have been able to successfully complete the appointment process. over the last year i have been working closely with this year’s lita emerging leaders, lisa thomas and holly tomren. i have really enjoyed the experience. their enthusiasm and energy is contagious. i wish every lita member could have been at this year’s lita camp in columbus, ohio, on may 8. during one of the lightning round sessions, lisa went to the podium and gave an impassioned speech about the benefits of belonging to a professional organization like lita. if there was a person in the audience that was not yet a lita member, i am sure they joined immediately afterward. she really captured the essence of why i became active in lita and why i continue to stay so involved in this organization so many years later. i can honestly say that as much as i have given to lita, i have received so much more in return. that is the true benefit of lita membership. over the last year, the lita board has had some great discussions with lita members and leaders. those conversations will continue as we start the work of drafting a new strategic plan. i want to create a strategic plan that will chart a meaningful path for the association and its members for the next several years. i want it to provide direction but also be flexible enough to adapt to changes in the information technology association landscape. as andrew pace mentioned in his last president’s message, changes will be coming. while we still aren’t sure exactly what those changes are, we know that it is time to seriously look at the current organizational structure of lita to make sure it best fits our needs today while continuing to remain flexible enough to meet our needs tomorrow. when i think of the organizational changes we are exploring, i can’t help but think of the houses i see on my favorite home improvement shows. lita has good bones. the structure and foundation are solid and well built, and as long as the house is well cared for, should last for years to come. however, like all houses, improvements need to be made over time to keep up with the market. the lita structure and foundation will be the same. when you drive up to the house you will still recognize the lita structure. when you walk in the door my hope is that you will still get that same homey feeling you had before, maybe with a few “oohs” and “aahs” thrown in as you notice the upgrades and enhancements. as the year progresses we will know more. i will use this column and other communication avenues to keep you informed of our plans and to gather your input. i would like to close my first column by thanking you for giving me this opportunity to serve you as the lita president. i am honored and humbled by the trust you have placed in me, and i am ready to start my presidential year. i hope it does not go by too quickly. i want to savor the experience. now let’s get started! the importance of identifying and accommodating e-resource usage data for the presence of outliers. alain r. 
lamothe

alain r. lamothe (alomothe@laurentian.ca) is associate librarian, department of library and archives, laurentian university, sudbury, ontario, canada.

abstract

this article presents the results of a quantitative analysis examining the effects of abnormal and extreme values on e-journal usage statistics. detailed are the step-by-step procedures designed specifically to identify and remove these values, termed outliers. by greatly deviating from other values in a sample, outliers distort and contaminate the data. between 2010 and 2011, e-journal usage at laurentian university's j. n. desmarais library spiked because of illegal downloading. the identification and removal of outliers had a noticeable effect on e-journal usage levels: they represented more than 100,000 erroneously downloaded articles in 2010 and nearly 200,000 in 2011.

introduction

this article was written with two purposes in mind. first, it presents and discusses the results of a quantitative analysis that assessed how outlier values can influence usage statistics. second, and more important, it details the step-by-step procedures designed specifically to identify outliers and reduce their impact on the data. outliers are abnormal values that result in the corruption or contamination of data by artificially increasing or reducing average values.1 an outlier can thus be defined as a value that appears to greatly deviate from all other values in the sample,2 as an observation that seems to be inconsistent with the rest of the dataset,3 or as a very extreme observation requiring special attention because of the potential impact it may have on a summary of the data.4 outliers occur frequently in measurement data.5

the presence of outliers in usage data can significantly and negatively impact libraries. for libraries having e-resource subscription pricing based on usage statistics, the presence of outliers can contribute to unwarranted increases in subscription rates. for libraries that integrate e-resource usage statistics into their collection development and management practices, the presence of outliers can affect decisions on purchase, retention, or elimination of particular e-resources. evaluators can be fooled into thinking that a particular e-resource is heavily used and must be kept. further, the presence of extreme outliers is often the result of a malicious system intrusion,6 as was experienced by the j. n. desmarais library of laurentian university in sudbury, ontario, canada.7

between june 2010 and may 2011, e-journal usage at the j. n. desmarais library spiked after a four-year period of stable annual usage levels.8 between 2006 and 2010, the total number of full-text articles downloaded from the library's e-journal collection ranged between 640,000 and 720,000 annually, with an average of 700,000 articles downloaded per year. but in 2010 that number dramatically increased to more than 857,000 full-text articles downloaded. this was followed by an additional 870,000 full-text articles downloaded in 2011. then, as suddenly and inexplicably as the increase had occurred, usage levels returned to the same quantities recorded in the years prior to 2010. a total of 716,000 full-text articles were downloaded in 2012.
during this period of spiking usage the library received notifications and warnings from certain ejournal vendors of abnormally large numbers of full-text articles being downloaded over a relatively short period of time from the laurentian university ezproxy server’s ip address. this level of usage was a breach of license agreements. these vendors then proceeded to temporarily block laurentian university’s ezproxy access until they obtained assurances from the university that the offending accounts were no longer active. this action on the vendors’ part prevented any further suspected illegal downloading from occurring but also barred laurentian university students, staff, and faculty from authorized off-campus access. but not all vendors operated in this fashion and, unknown to the library at the time, full-text articles continued to be downloaded from other vendor sites in excessive amounts. either they were not monitoring excessive usage or they did not have the technical means to do so. regardless, in some cases certain e-journal titles recorded downloads thousands of times higher than normal. in some cases dozens of articles were being downloaded in seconds. the situation continued until late spring 2011, at which point it was discovered that confidential proxy account login information had been posted illegally on the web. with the login information of all compromised accounts now available, proxy managers were able to block their access at once, thereby ending the period of illegal downloading of laurentian university licensed material. web robots were suspected to have been involved. web robots, also referred to as internet bots or www robots, are automated software applications that run tasks on the web much as search engines do.9 they send requests to web servers to procure resources.10 some robots are developed with malicious intent and are designed to download entire websites for the purpose of copying the site,11 for autonomous logins to send spam,12 or for autonomous logins to steal confidential or copyright protected material.13 web robots specifically designed for the illegal procurement of copyright protected content are obviously of particular concern for libraries. unlawful downloading of full-text content occurs for many reasons. studies have clearly demonstrated that excessively high prices of digital content is a major drive for illegal downloads.14 misunderstanding and misinterpretation of copyright laws in addition to the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 33 unfamiliarity with and general apathy toward these same copyright laws further contribute to unlawful downloading of protected material.15 many students are unaware that the transmission of downloaded articles violates copyright laws and license agreements and often misunderstand the fair use aspect of copyright as meaning that the acquisition and distribution of licensed content for the purpose of education is allowed.16 in the minds of these students, distribution is permitted provided it is not for profit. 
librarians have also reported students systematically downloading all articles from recent journal issues not for the purpose of distribution or sale but rather to build their own personal collection.17 they are more concerned with obtaining resources quickly and completely rather than legally.18 aggravating the situation are students who firmly believe that by paying tuition they have permission to do whatever they wish with their institutions’ e-resources.19 some of these same students even use web robots to download as much as possible thereby saving them time and energy.20 they consider the downloaded item as their personal property. in fact, calluzzo and cante found that students displayed an ethical sense to personal property but became neutral if the property belonged to an enterprise.21 and solomon and o’brien found that 71 percent of students believed illegal copying to be a socially and ethically acceptable behavior.22 the j. n. desmarais library integrates e-resource usage into its collection development policy. as stated in the library’s collection development policy, “if the cost-per-use of an online resource is greater than the cost of an interlibrary loan for three consecutive years, this resource will be reviewed for cancellation.”23 in fact, this practice has been enforced for the past several years and has saved the library a considerable sum of money.24 for this reason, it is extremely important not to assume the accuracy of usage values without carefully examining the data. the artificial inflation of usage numbers could substantially cost the library if it was believed that an e-resource was beginning to experience an improvement in usage when, in actuality, it was not the case. the decision to keep this resource could cost the library tens of thousands of dollars before it was realized that the high number of searches or downloads recorded were not reflective of actual usage but were rather the result of data recording errors or illegal activity. regrettably, libraries will continue to deal with the consequences of copyright infringement, even if the library itself is not at fault. it is, however, important to recognize and understand that publishers are businesses and like any business, expect financial gain.25 even though e-resource piracy is currently very small, the risk of it becoming the single greatest threat to the industry is quite real. both music and film industries have been greatly affected by piracy for nearly two decades, and everyone witnessed the damaging effect it had. publishers have learned from this and will not allow it to happen to them as well.26 unfortunately for all parties involved, the nature of e-resources has made them extremely easy to copy.27 information technology and libraries | june 2014 34 method the following methodology will detail the step-by-step procedures to identify and deal with suspected outliers. all data manipulation and calculations were executed in microsoft excel for mac 2011 (version 14.3.2). all tables and figures were generated using the same version of excel. the first step is to identify suspected outliers by visually examining an entire usage dataset. a dataset is defined as a collection of related data corresponding to the contents of a single database table in which each column represents a particular variable and each row, a given member of the dataset in question.28 for this reason, the term dataset will be referred to in this paper as a grouping of data from any single spreadsheet. 
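the same first pass can also be scripted rather than done entirely by eye. the sketch below, in python, assumes the jr1 report has been saved as a spreadsheet with one row per journal title and one column per month; the file name, column layout, the pandas library, and the simple "many times the typical month" screen are all assumptions for illustration and are not the procedure used in this study, which relies on visual inspection followed by formal testing.

    import pandas as pd

    MONTHS = ["january", "february", "march", "april", "may", "june",
              "july", "august", "september", "october", "november", "december"]

    def flag_candidates(path, factor=10):
        # read one jr1-style spreadsheet: one row per journal, one column per month
        jr1 = pd.read_excel(path, index_col=0)
        candidates = []
        for title, row in jr1[MONTHS].iterrows():
            typical = row.median()
            for month, value in row.items():
                # flag months that dwarf the title's typical month; a rough screen
                # only, so every flagged value still needs to be tested formally
                if typical > 0 and value > factor * typical:
                    candidates.append((title, month, int(value)))
        return candidates

    # e.g. flag_candidates("jr1_2010.xlsx") would surface polymer's october
    # value of 15,123 against a typical month of a few dozen downloads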
each spreadsheet contains the number of full-text articles downloaded per year per vendor. each dataset was downloaded from vendors' sites as jr1 counter-compliant reports, which detail the number of successful full-text articles downloaded per month and per journal for a given year. all vendors provided jr1 counter-compliant reports that were downloaded as excel spreadsheets. each spreadsheet, or dataset, contained the list of e-journal titles and the number of articles downloaded for each title per month (see table 1). each dataset was then visually inspected in its entirety for suspected outliers.

journal title: january february march april may june july august september october november december
polymer: 12 15 26 33 38 64 39 5 13 15,123* 109 44
surface and coatings technology: 3 1 2 1 22 17 17 0 12 3,771* 5,428* 601
international journal of radiation oncology: 11 18 35 22 17 6,436* 176 13 25 29 24 19
journal of catalysis: 0 1 5 1 2 2 16 4 0 2 6,693* 1

table 1. sample from a 2010 jr1 counter-compliant report indicating the number of articles downloaded per journal over a twelve-month period. suspected outliers are marked with an asterisk.

since it was impractical to include the entire spreadsheet, table 1 provides an excerpt from a 2010 jr1 counter-compliant report containing five suspected outliers, each marked for identification with an asterisk. the first of these extreme values belongs to the title polymer and was recorded in october. compared to the other values for polymer, it stands out dramatically at 15,123 articles downloaded. the second and third extreme values belong to surface and coatings technology and are recorded for the months of october (3,771 downloads) and november (5,428 downloads). the fourth is the 6,436 articles downloaded in june from international journal of radiation oncology and the fifth from journal of catalysis in november (6,693 downloads). these five values greatly deviate from the other values recorded for each e-journal title. for polymer, the next highest value is 109 articles downloaded in november 2010, making the suspected outlier almost 14,000 percent greater.

now that the suspected outliers have been identified, they must be compared quantitatively to the rest of the values recorded for their corresponding titles, and only for their corresponding titles. for example, to test the probability that the value of 15,123 downloads recorded in october 2010 for polymer is indeed an outlier, the comparison must include all other 2010 polymer monthly values plus all other available polymer values. this is achieved by copying all 2010 polymer monthly values into a separate blank spreadsheet and then adding all other polymer monthly values from all other available years to that same spreadsheet (see table 2). this new spreadsheet can be labeled dataset 2, with dataset 1 being the original jr1 report downloaded from the vendor. suspected usage outliers from an e-journal need to be compared to other usage values of that particular title because each e-journal tends to be used differently. testing for an outlier by comparing it to the values of all other e-journals in the collection would be inaccurate; it would be like comparing apples to oranges.
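a short python sketch of this step (pandas, the yearly file names, and the jr1 layout of one row per title and one column per month are assumptions): it pulls every available monthly value for a single title into one list, the programmatic equivalent of the separate spreadsheet described above.

    import pandas as pd

    def build_dataset_2(title, paths):
        # gather every monthly value recorded for one e-journal title across
        # all available jr1 spreadsheets (one report per year)
        values = []
        for path in paths:
            jr1 = pd.read_excel(path, index_col=0)   # dataset 1 for that year
            if title in jr1.index:
                row = pd.to_numeric(jr1.loc[title], errors="coerce").dropna()
                values.extend(int(v) for v in row)
        return values

    # e.g. build_dataset_2("polymer", ["jr1_2009.xlsx", "jr1_2010.xlsx",
    #                                  "jr1_2011.xlsx", "jr1_2012.xlsx"])
    # would return the 48 monthly values laid out in table 2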
journal title and year: january february march april may june july august september october november december
polymer 2009: 27 14 35 22 15 28 24 19 11 8 13 7
polymer 2010: 12 15 26 33 38 64 39 5 13 15,123* 109 44
polymer 2011: 113 159 638 345 52 57 94 70 39 36 221 65
polymer 2012: 130 4 98 24 27 18 13 16 18 25 9 5

table 2. combining polymer's usage values from all available jr1 counter-compliant reports. the suspected outlier is marked with an asterisk.

table 2 provides the number of articles downloaded for the title polymer over a four-year period; these were the only jr1 reports available from the vendor. when the suspected outlier of 15,123 downloads is compared visually to the rest of the values in dataset 2, it again appears extreme: the next highest value is 638 articles downloaded during march 2011, making the suspected outlier 2,200 percent greater than the next highest value in the dataset. all further outlier testing and accommodating will be performed on this table.

the dixon q test was chosen to test for outliers. it is simple to use and designed to test for a small number of outliers in a dataset.29 the q value is calculated as the gap between the suspected outlier and the next value, divided by the range of values in the dataset: q = (outlier − next value) / (largest − smallest). the gap is the absolute difference between the outlier and the closest number to it. to facilitate the calculation, the data should be arranged in order of increasing value, with the smallest value at the front of the sequence and the largest value at the end. for example, using the data in table 2, the values are arranged beginning with 4, 5, 5, 7, . . . , 345, 638, and finally ending with 15,123. the calculation would thus be (15,123 − 638) / (15,123 − 4) = 0.9581. the calculated q value will be represented by the symbol qvalue from this point onward, making qvalue = 0.9581.

the next step is to compare the calculated qvalue to the critical values for q determined by verma and quiroz-ruiz.30 critical values correspond to a particular significance level and represent cutoff values that lead to the acceptance or rejection of a null hypothesis.31 the null hypothesis refers to the position in which there is no statistically significant relationship between two variables.32 the alternate hypothesis would thus be the existence of a relationship between two variables.33 if the calculated value is less than the critical value, the null hypothesis is accepted.34 on the other hand, if the calculated value is greater than the critical value, the null hypothesis is rejected.35 if the null hypothesis is rejected, then the alternate hypothesis must be accepted. here, the null hypothesis can be stated as "the suspected outlier is not an outlier," and the alternate hypothesis as "the suspected outlier is an outlier." therefore, if the null hypothesis is rejected, the suspected outlier is, in fact, considered to be an outlier.
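the q statistic itself is a one-line calculation. the short python sketch below, using the values from table 2, reproduces the figure of 0.9581 computed above before the formal comparison with the critical value.

    # the 48 monthly values for polymer taken from table 2 (2009-2012)
    polymer = [27, 14, 35, 22, 15, 28, 24, 19, 11, 8, 13, 7,         # 2009
               12, 15, 26, 33, 38, 64, 39, 5, 13, 15123, 109, 44,    # 2010
               113, 159, 638, 345, 52, 57, 94, 70, 39, 36, 221, 65,  # 2011
               130, 4, 98, 24, 27, 18, 13, 16, 18, 25, 9, 5]         # 2012

    def dixon_q_largest(values):
        # q = (largest - next largest) / (largest - smallest)
        ordered = sorted(values)
        return (ordered[-1] - ordered[-2]) / (ordered[-1] - ordered[0])

    q_value = dixon_q_largest(polymer)
    print(round(q_value, 4))   # 0.9581, matching the worked example above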
verma and quiroz-ruiz have calculated the critical value for q for a sample size of 48 and at a 95 percent confidence level to be qcritical = 0.2241.36 although operating at a 99 percent confidence level is a more conservative approach, it increases the likelihood of retaining a value that contains an error.37 operating at a 95 percent confidence level provides a reasonable compromise.38 if the calculated value is greater than the critical value, then the suspected outlier is confirmed to be an outlier. therefore, testing for the suspected outlier of 15,123, the q value was calculated to be qvalue = 0.9581. with qvalue = 0.9581 > qcritical = 0.2241, the null hypothesis is rejected and it must be accepted that 15,123 is an outlier.

once it is determined with statistical certainty that the suspected outlier is indeed an outlier, it needs to be replaced with the median calculated from all values found in dataset 2. for the case of polymer, the median was calculated to be 27 from all values in table 2. replacing an outlier with the median has proven to be quite effective in accommodating the data because it introduces less distortion to the dataset.39 extreme values are therefore replaced with values more consistent with the rest of the data.40

journal title and year: january february march april may june july august september october november december
polymer 2009: 27 14 35 22 15 28 24 19 11 8 13 7
polymer 2010: 12 15 26 33 38 64 39 5 13 27* 109 44
polymer 2011: 113 159 638 345 52 57 94 70 39 36 221 65
polymer 2012: 130 4 98 24 27 18 13 16 18 25 9 5

table 3. the identified outlier is replaced with the median (marked with an asterisk).

table 3 represents the number of full-text articles downloaded for polymer after the outlier had been replaced with the median. the confirmed outlier of 15,123 articles downloaded, recorded in october 2010, is replaced with the median of 27 (marked with an asterisk). this then becomes the accepted value for the number of articles downloaded from polymer in october 2010, and the outlier is discarded. the new value of 27 articles downloaded in october 2010 replaces the extreme value of 15,123 in the original 2010 jr1 report (see table 4). this is the final step.

journal title: january february march april may june july august september october november december
polymer: 12 15 26 33 38 64 39 5 13 27* 109 44
surface and coatings technology: 3 1 2 1 22 17 17 0 12 3,771 5,428 601
international journal of radiation oncology: 11 18 35 22 17 6,436 176 13 25 29 24 19
journal of catalysis: 0 1 5 1 2 2 16 4 0 2 6,693 1

table 4. sample from a 2010 jr1 counter-compliant report indicating the number of articles downloaded per journal over a twelve-month period. polymer's identified outlier is replaced with the median calculated from table 2 (marked with an asterisk).

once the first outlier is corrected, the same procedures need to be followed for the other suspected outliers marked in table 1. if it is determined that they are outliers, they are replaced with their associated median values. although the steps and calculations used to identify and correct for outliers are relatively simple to follow, it is admittedly a very lengthy and time-consuming process. but in the end, it is well worth the effort.

results and discussion

table 5 details the changes in the overall number of articles downloaded from j. n. desmarais library e-journals that resulted from the elimination of outliers.
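before turning to those results, the test decision and the accommodation step can be pulled together in one short python sketch. the critical value and the median of 27 are the ones reported above; the function itself is only an illustration and not the spreadsheet workflow used in this study.

    from statistics import median

    Q_CRITICAL = 0.2241   # verma and quiroz-ruiz, n = 48, 95 percent confidence level

    def accommodate_largest(values, q_critical=Q_CRITICAL):
        # test the largest value with the dixon q test and, if it is confirmed
        # as an outlier, return a copy of the data with it replaced by the median
        ordered = sorted(values)
        q_value = (ordered[-1] - ordered[-2]) / (ordered[-1] - ordered[0])
        if q_value > q_critical:          # reject the null hypothesis
            return [median(values) if v == ordered[-1] else v for v in values]
        return list(values)

    # for the polymer values above, qvalue = 0.9581 > 0.2241, so 15,123 is
    # replaced by the median of 27, the corrected value written back into the
    # 2010 jr1 report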
the column titled "recorded downloads" details the number of articles downloaded between 2000 and 2012, inclusively, prior to outlier testing. the column titled "corrected downloads" represents the number of articles downloaded during the same period of time but after the outliers had been positively identified and the data cleaned. the affected values are marked with an asterisk.

year: recorded downloads / corrected downloads
2000: 806 / 806
2001: 1,034 / 1,034
2002: 1,015 / 1,015
2003: 4,890 / 4,890
2004: 72,841 / 72,841
2005: 251,335 / 251,335
2006: 640,759 / 640,759
2007: 731,334 / 731,334
2008: 710,043 / 710,043
2009: 725,019 / 725,019
2010: 857,360 / 757,564*
2011: 869,651 / 696,973*
2012: 716,890 / 716,890

table 5. comparison of the recorded number of articles downloaded to the corrected number of articles downloaded, over a thirteen-year period.

all data from all available years were tested for outliers. only data recorded in 2010 and 2011 tested positive for outliers. replacing outliers with the median values for the affected journal titles dramatically reduced the total number of downloaded articles (see table 5). between 2007 and 2009, inclusively, the actual number of full-text articles downloaded from the library's e-journal collection totaled between 710,043 and 731,334 annually (see table 5). the annual average for those three years is 722,132 articles downloaded. but in 2010 that number dramatically increased to 857,360 downloaded articles, which was followed by 869,651 downloaded articles in 2011 (see table 5). the elimination of outliers from the 2010 data resulted in the number of downloads dropping from 857,360 to 757,564, a difference of 99,796 downloads, or 12 percent. similarly, in 2011, the number of articles downloaded decreased from 869,651 to 696,973 once outliers were replaced with median values. this represents a reduction of 172,678 downloaded articles, or 20 percent. a staggering 20 percent of articles downloaded in 2011 can therefore be considered erroneous and, in all likelihood, the result of illicit downloading.

figure 1 is a graphical representation of the change in the number of articles downloaded before and after the identification of outliers and their replacement by median values. the line "recorded downloads" clearly indicates a surge in usage between 2010 and 2011, with usage then returning to levels recorded prior to the 2010 increase. the line "corrected downloads" depicts a very different picture: the plateau in usage that began in 2007 continues through 2012. evidently, the observed spike in usage was artificial and the result of the presence of outliers in certain datasets. if the data had not been tested for outliers, it would have appeared that usage had substantially increased in 2010, and it would have been incorrectly assumed that usage was on the rise once more. instead, the corrected data bring usage levels for 2010 and 2011 back in line with the plateau that had begun in 2007 and reflect a more realistic picture of usage rates at laurentian university.

figure 1. comparing the recorded number of articles downloaded to the corrected number of articles downloaded over a thirteen-year period.

accuracy in any data gathering is always extremely important, but accuracy in e-resource usage levels is critical for academic libraries.
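the arithmetic behind those percentages is easy to re-run whenever a new year of data is added; the short python sketch below uses only the 2010 and 2011 totals from table 5 and reproduces the differences and shares cited above.

    recorded  = {2010: 857_360, 2011: 869_651}
    corrected = {2010: 757_564, 2011: 696_973}

    for year in (2010, 2011):
        removed = recorded[year] - corrected[year]
        print(year, removed, f"{removed / recorded[year]:.0%}")

    # 2010: 99,796 erroneous downloads removed, about 12 percent of the recorded total
    # 2011: 172,678 erroneous downloads removed, about 20 percent of the recorded total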
academic libraries having e-journal subscription rates based either entirely or partly on usage can be greatly affected if usage numbers have been artificially inflated. it can lead to unnecessary increases in cost. since it was determined that outliers were present only during the period in which the library had found itself under “attack,” it can be assumed that the vast majority, if not all, of the extreme usage values were a result of illegal downloading. it would therefore be a shame to need to pay higher costs because of inappropriate or illegal downloading of licensed content. accurate usage data is also important for academic libraries that integrate usage statistics into their collection development policy for the purpose of justifying the retention or cancellation of a particular subscription. the j. n. desmarais library is such a library. as indicated earlier, if the cost-per-download of a subscription is consistently greater than the cost of an interlibrary loan for three or more years, it is marked for cancellation. at the j. n. desmarais library, the average cost of an interlibrary loan had been previously calculated to be approximately can$15.00.42 therefore, subscriptions recording a “cost-per-download” greater than the can$15.00 target for more than three years can be eliminated from the collection. information technology and libraries | june 2014 40 any artificial increase in the number of downloads would have as result to artificially lower the cost-per-use ratio. this would reinforce the illusion that a particular subscription was used far more than it really was and lead to the false belief that it would be less expensive to retain rather than rely on interlibrary loan services. the true cost-per-use ratio may be far greater than initially calculated. the unnecessary retention of a subscription could prevent the acquisition of another, more relevant, one. for example, after adjusting the number of articles downloaded from sciencedirect in 2011, the cost-per-download ratio increased from can$0.74 to can$1.59, a 53 percent increase. for the j. n. desmarais library, this package was obviously not in jeopardy of being cancelled. but a 53 percent change in the cost-per-use ratio for borderline subscriptions would definitely have been affected. it must also be stated that none of the library’s subscriptions having experienced extreme downloading found themselves in the position of being cancelled after the usage data had been corrected for outliers. regardless, it is important to verify all usage data prior to any data analysis to identify and correct for outliers. once the outlier detection investigation has been completed and any extreme values replaced by the median, there would be no further need to manipulate the data in such a fashion. the identification of outliers is a one-time procedure. the corrected or cleaned datasets would then become the official datasets to be used for any further usage analyses. conclusions outliers can have a dramatic effect on the analysis of any dataset. as demonstrated here, the presence of outliers can lead to the misrepresentation of usage patterns. they can artificially inflate average values and introduce severe distortion to any dataset. fortunately, they are fairly easy to identify and remove. the following steps were used to identify outliers in jr1 countercompliant reports: 1. identify possible outliers: visually inspect the values recorded in a jr1 report dataset (dataset 1) and mark any extreme values. 2. 
for each suspected outlier identified, take the usage values for the affected e-journal title and incorporate them into a separate blank spreadsheet (dataset 2). incorporate into dataset 2 all other usage values for the affected journal from all available years. it is important that dataset 2 contain only those values for the affected journal. 3. test for the outlier: perform dixon q test on the suspected outlier to confirm or disprove existence of the outlier. 4. if the suspected outlier tests as positive, calculate the median of dataset 2. 5. replace the outlier in dataset 1 with the median calculated from dataset 2. 6. perform steps 1 through 5 for any other suspected outliers in dataset 1. 7. the corrected values in dataset 1 will become the official values and will be used for all subsequent usage data analysis. the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 41 the identification and removal of outliers had a noticeable effect on the usage statistics for j. n. desmarais library’s e-journal collection. outliers represented over 100,000 erroneous downloaded articles in 2010 and nearly 200,000 in 2011. a total of 20 percent of recorded downloads in 2011 were anomalous, and in all likelihood a result of illicit downloading after laurentian university’s ezproxy server was breached. new technologies have made digital content easily available on the web, which has caused serious concern for both publishers43 and institutions of higher learning, which have been experiencing an increase is illicit attacks.44 the history of napster supports the argument that users “will freely steal content when given the opportunity.”45 since web robot traffic will continue to grow in pace with the internet, it is critical that this traffic be factored into the performance and protection of any web servers.46 references 1. victoria j. hodge and jim austin, “a survey of outlier detection methodologies,” artificial intelligence review 85 (2004): 85–126, http://dx.doi.org/10.1023/b:aire.0000045502.10941.a9; patrick h. menold, ronald k. pearson, and frank allgöwer, “online outlier detection and removal,” in proceedings of the 7th mediterranean conference on control and automation (med99) haifa, israel—june 28-30, 1999 (haifa, israel: ieee, 1999): 1110–30. 2. hodge and austin, “a survey of outlier detection methodologies,” 85–126. 3. vic barnett and toby lewis, outliers in statistical data (new york: wiley, 1994). 4. hodge and austin, “a survey of outlier detection methodologies,” 85–126; r. s. witte and j. s. witte, statistics (new york: wiley, 2004); menold et al., “online outlier detection and removal,” 1110–30. 5. menold et al., “online outlier detection and removal,” 1110–30. 6. hodge and austin, “a survey of outlier detection methodologies,” 85–126. 7. laurentian university (sudbury, canada) is classified as a medium multi-campus university. total 2012 full-time student population was 6,863, of which 403 were enrolled in graduate programs. in addition, 2012 part-time student population was 2,652 with 428 enrolled in graduate programs. also in 2012, the university employed 399 full-time teaching and research faculty members. academic programs cover a multiple of fields in the sciences, social sciences, and humanities and offers 60 undergraduate, 17 master’s, and 7 doctoral degrees. 8. alain r. 
lamothe, “factors influencing usage of an electronic journal collection at a mediumsize university: an eleven-year study,” partnership: the canadian journal of library and information practice and research 7, no. 1 (2012), https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.u36phvmsy0j. https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.u36phvmsy0j information technology and libraries | june 2014 42 9. ben tremblay, “web bot—what is it? can it predict stuff?” daily common sense: scams, science and more (blog), january 24, 2008, http://www.dailycommonsense.com/web-botwhat-is-it-can-it-predict-stuff/. 10. derek doran and swapna s. gokhale, “web robot detection techniques: overview and limitations,” data mining and knowledge discovery 22 (2011): 183–210, http://dx.doi.org/10.1007/s10618-010-0180-z. 11. c. lee giles, yang sun, and isaac g. councill, “measuring the web crawler ethics,” in www 2010 proceedings of the 19th international conference on world wide web (raleigh, nc: international world wide web conferences steering committee, 2010): 1101–2, http://dx.doi.org/10.1145/17772690.1772824. 12. shinil kwon, kim young-gab, and sungdeok cha, “web robot detection based on patternmatching technique,” journal of information science 38 (2012): 118–26, http://dx.doi.org/10.1177/0165551511435969. 13. david watson, “the evolution of web application attacks,” network security (2007): 7–12, http://dx.doi.org/10.1016/s1353-4858(08)70039-4. 14. eric kin-wai lau, “factors motivating people toward pirated software,” qualitative market research 9 (2006): 404–19, http://dx.doi.org/1108/13522750610689113. 15. huan-chueh wu et al., “college students’ misunderstanding about copyright laws for digital library resources,” electronic library 28 (2010): 197–209, http://dx.doi.org/10.1108/02640471011033576. 16. ibid. 17. ibid. 18. emma mcculloch, “taking stock of open access: progress and issues,” library review 55 (2006): 337–43; c. patra, “introducing e-journal services: an experience,” electronic library 24 (2006): 820–31. 19. wu et al., “college students’ misunderstanding about copyright laws for digital library resources,” 197–209. 20. ibid. 21. vincent j. calluzzo and charles j. cante, “ethics in information technology and software use,” journal of business ethics 51 (2004): 301–12, http://dx.doi.org/10.1023/b:busi.0000032658.12032.4e. 22. s. l. solomon and j. a. o’brien “the effect of demographic factors on attitudes toward software piracy,” journal of computer information systems 30 (1990): 41–46. 23. j. n. desmarais library, “collection development policy” (sudbury, on: laurentian university, 2013), http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/ http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/ http://dx.doi.org/10.1007/s10618-010-0180-z http://dx.doi.org/10.1145/17772690.1772824 http://dx.doi.org/10.1177/0165551511435969 http://dx.doi.org/10.1016/s1353-4858(08)70039-4 http://dx.doi.org/1108/13522750610689113 http://dx.doi.org/10.1108/02640471011033576 http://dx.doi.org/10.1023/b:busi.0000032658.12032.4e the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 43 http://biblio.laurentian.ca/research/sites/default/files/pictures/collection%20development %20policy.pdf. 24. lamothe, “factors influencing usage”; alain r. 
lamothe, “electronic serials usage patterns as observed at a medium-size university: searches and full-text downloads,” partnership: the canadian journal of library and information practice and research 3, no. 1 (2008), https://journal.lib.uoguelph.ca/index.php/perj/article/view/416#.u364kvmsy0i. 25. martin zimerman, “e-books and piracy: implications/issues for academic libraries,” new library world 112 (2011): 67–75, http://dx.doi.org/10.1108/03074801111100463. 26. ibid. 27. peggy hageman, “ebooks and the long arm of the law,” econtent (june 2012), http://www.econtentmag.com/articles/column/ebookworm/ebooks-and-the-long-arm-ofthe-law--82976.htm. 28. “dataset, n.,” oed online, (oxford, uk: oxford university press, 2013), http://www.oed.com/view/entry/261122?redirectedfrom=dataset; “dataset—definition,” ontotext, http://www.ontotext.com/factforge/dataset-definition; w. paul vogt, “data set,” dictionary of statistics and methodology: a nontechnical guide for the social sciences (london, uk: sage, 2005); allan g. bluman, elementary statistics—a step by step approach (boston: mcgraw-hill, 2000). 29. david b. rorabacher, “statistical treatment for rejection of deviant values: critical values of dixon’s ‘q’ parameter and related subrange ratios at the 95% confidence level,” analytical chemistry 63 (1991): 139–45; r. b. dean and w. j. dixon, “simplified statistics for small numbers of observations,” analytical chemistry 23 (1951): 636–38, http://dx.doi.org/10.1021/ac00002a010. 30. surenda p. verma and alfredo quiroz-ruiz, “critical values for six dixon tests for outliers in normal samples up to sizes 100, and applications in science and engineering,” revista mexicana de ciencias geologicas 23 (2006): 133–61. 31. robert r. sokal and f. james rohlf, biometry (new york: freeman, 2012); j. h. zar, biostatistical analysis (upper saddle river, nj: prentice hall, 2010). 32. “null hypothesis,” accessscience (new york: mcgraw-hill education, 2002), http://www.accessscience.com. 33. ibid. 34. “critical value,” accessscience, (new york: mcgraw-hill education, 2002), http://www.accessscience.com. 35. ibid. 36. verma and quiroz-ruiz, “critical values for six dixon tests for outliers,” 133–61. http://biblio.laurentian.ca/research/sites/default/files/pictures/collection%20development%20policy.pdf http://biblio.laurentian.ca/research/sites/default/files/pictures/collection%20development%20policy.pdf https://journal.lib.uoguelph.ca/index.php/perj/article/view/416#.u364kvmsy0i http://dx.doi.org/10.1108/03074801111100463 http://www.econtentmag.com/articles/column/ebookworm/ebooks-and-the-long-arm-of-the-law--82976.htm http://www.econtentmag.com/articles/column/ebookworm/ebooks-and-the-long-arm-of-the-law--82976.htm http://www.oed.com/view/entry/261122?redirectedfrom=dataset http://www.ontotext.com/factforge/dataset-definition http://dx.doi.org/10.1021/ac00002a010 http://www.accessscience.com/ http://www.accessscience.com/ information technology and libraries | june 2014 44 37. rorabacher, “statistical treatment for rejection of deviant values,” 139–45. 38. ibid. 39. jaakko astola and pauli kuosmanen, fundamentals of nonlinear digital filtering (new york: crc, 1997); jaakko astola, pekka heinonen, and yrjö neuvo, “on root structures of median and median-type filters,” ieee transactions of acoustics, speech, and signal processing 35 (1987): 1199–201; l. ling, r. yin, and x. 
wang, “nonlinear filters for reducing spiky noise: 2dimensions,” ieee international conference on acoustics, speech, and signal processing 9 (1984): 646–49; n. j. gallagher and g. wise, “a theoretical analysis of the oroperties of median filters,” ieee transactions of acoustics, speech, and signal processing 29 (1981): 1136–41. 40. menold et al., “online outlier detection and removal,” 1110–30. 41. ibid. 42. lamothe, “factors influencing usage”; lamothe, “electronic serials usage patterns.” 43. paul gleason, “copyright and electronic publishing: background and recent developments,” acquisitions librarian 13 (2001): 5–26, http://dx.doi.org/10.1300/j101v13n26_02. 44. tena mcqueen and robert fleck jr., “changing patterns of internet usage and challenges at colleges and universities,” first monday 9 (2004), http://firstmonday.org/issues/issue9_12/mcqueen/index.html. 45. robin peek, “controlling the threat of e-book piracy,” information today 18, no. 6 (2001): 42. 46. gleason, “copyright and electronic publishing,” 5–26. http://dx.doi.org/10.1300/j101v13n26_02 http://firstmonday.org/issues/issue9_12/mcqueen/index.html 112 journal of library automation vol. 14/2 june 1981 anyway because he is primarily getting suggested classification numbers in order to browse. the tucson public library could not have made the above decisions if it did not have a complete online file of all its holdings (including even reference materials that never circulate). but since this data did exist (after a five-year bar-coding effort) and since more than forty online terminals were already in place throughout the library system to access the online file, the decision not to include locations or holdings in the microform catalog seemed reasonable . in the longer-range future (1990?), it is very likely that the entire catalog will be available online. in the meantime, the tucson public library did not want to divide its resources maintaining two location records, but rather wanted to concentrate resources in maintaining one accurate record of locations available as widely as possible throughout the library system (by installing more online terminals for staff and public use). was this decision a sound one? we don't know. the microform catalog has not yet been introduced for public use. by the end of this year we should have some preliminary answers to this question. references 1. robin w. macdonald and j. mcree elrod, "an approach to developing computer catalogs," college & research libraries 34:202--8 (may 1973). a structure code for machine readable library catalog record formats herbert h. hoffman: santa ana college, santa ana, california. libraries house many types of publications in many media, mostly print on paper, but also pictures on paper, print and pictures on film, recorded sound on plastic discs, and others. these publications are of interest to people because they contain recorded information. more precisely said, because they contain units of intellectual, artistic, or scholarly creation that collectively can be called "works." one could say simply that library materials consist of documents that are stored and cataloged because they contain works. the structure of publications into documents (or "books") and works, the clear distinction between the concept of the information container as opposed to the contents, deserves more attention than it has received so far from bibliographers and librarians. 
the importance of the distinction between books and works has been hinted at by several theoreticians, notably lubetzky. however, the idea was never fully developed. the cataloging implications of the structural diversity among documents were left unexplored. as a consequence, librarians have never disentangled the two terms book and work. from the paris principles and the marc formats to the new second edition of the anglo-american cataloguing rules, the terms book and work are used loosely and interchangeably, now meaning a book, now a work proper, now part of a work, now a group of books. such ambiguity can be tolerated as long as each person involved knows at each step which definition is appropriate when the term comes up. but as libraries ease into the age of electronic utilities and computerized catalogs based on records read by machine rather than interpreted by humans, a considerably greater measure of precision will have to be introduced into library work. as one step toward that goal an examination of the structure of publications will be in order. the items that are housed in libraries, regardless of medium, are of two types. they are either single documents, or they are groups of two or more documents. items that contain two or more documents are either finite items (all published at once, or with a first and a last volume identified) or they are infinite items (periodicals, intended to be continued indefinitely at intervals). schematically, these three types of bibliographic items in libraries can be represented as shown in figure 1. fig. 1. three types of bibliographic items: top, single-document item; center, finite multiple-document item; bottom, infinite multiple-document item. it should be noted that all publications, all documents, all bibliographic items in libraries, can be assigned to one of these three structures. there are no exceptions. all bibliographic items, furthermore, contain works. an item may contain one single work. but an item may also contain several works. schematically, the two situations can be represented as shown in figure 2. fig. 2. top, single-work document (example: a typical novel); bottom, multiple-work document (example: a collection of plays). an item that is composed of several documents and contains several works may have one work in each document, or several per document. schematically, the two possibilities can be represented as shown in figure 3. fig. 3. top, one work per document; bottom, several works per document. it is possible, of course, for an item to be composed of several documents but to contain only one work. figure 4 is a schematic representation of this case. fig. 4. multivolume work (example: a very long novel in two volumes). mixed structures are also possible, as in the schematic shown in figure 5. fig. 5. finite multi-document item containing many works, mixed structure. ignoring the mixed structure that is only a combination of two "pure" structures, the foregoing information can be combined into a table that shows seven possible publication types that differ from each other in terms of structure (figure 6). fig. 6. publication types (seven types, a through g). all bibliographic items, whether composed of one document or many, are known by a title. these titles can be called item titles. in the case of a single-document item (structures a and c), item title and document title are, of course, identical. but in the case of some multiple-document items (publications of types d, e, f, and g, for example), two possibilities exist: the documents that make up the item may or may not have their own individual document titles. for purposes of the bibliographer or cataloger, items that consist of several documents bearing individual document titles can be described under one of two principles. the entire item can be treated as a unit. elsewhere i have coined a term for this treatment: the set description principle.1 but it is also possible to treat each document as a separate publication, to describe it under the book description principle. if we combine all these considerations we find that we can assign to each bibliographic item that is added to a library's collection one of the thirteen codes shown in figure 7. fig. 7. structure codes: 1 = a, book description; 2 = c, book; 3 = b, set; 4 = d, book; 5 = d, set, with individual document titles; 6 = d, set, without individual document titles; 7 = f, book; 8 = f, set; 9 = e, book; 10 = e, set, with individual document titles; 11 = e, set, without individual document titles; 12 = g, book; 13 = g, set. how can these codes be useful? taking a look into the future, let us imagine an online catalog system supported by a database that contains the records of a library's holdings. the records in such a database are entered in a definite format. in this format, whatever it will be called, there will be data fields for titles, authors, physical descriptions, subject headings, document numbers, and much else. i propose that to these fields one other be added: the structure code. the structure code would add a new dimension to the retrieval of recorded information. here are a few specific examples. consider a search for material on subject x. qualify the search argument by structure codes 1, 3, 7, and 12. result: the search will yield only major monographic works, defined as items of types a, b, f, and g. note that subject x assigned to such items is a true subject heading. the materials retrieved in this example would all be works dealing specifically with the topic x. but the same term assigned to an item coded, say, 6, would not be a true subject heading. the term here would only give a broad general summary of what the works in the item are about. the structure code adds sophistication to the retrieval process by enabling a searcher to distinguish between specific subject designators and mere summary subject headings. a search that excludes codes 2, 4, 5, and 6 limits output to materials that are not just collections of essays. the stratagem used in card catalogs to reach the same result is the qualification of a subject heading by terms denoting format, such as the subdivisions congresses or addresses, essays, lectures. this method of qualifying subject headings has never been done consistently, however. the proposed structure code would ensure uniform treatment of all affected publications. qualify the search by codes 9, 10, 11, 13 and all periodicals can be excluded. in the card catalog, format qualifications such as periodicals, or societies, periodicals, etc., or yearbooks are sometimes added to subject headings to reach similar results. again, the structure code would introduce uniformity and consistency.
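as a rough sketch of how the proposed code could be used in a machine-readable record, consider the fragment below. the record layout, field names, and search function are hypothetical illustrations only; the code values follow the assignments in figure 7 (1 = type a, 3 = b, 7 = f, 12 = g, and so forth), not any existing record format.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class CatalogRecord:
    title: str
    subjects: List[str]
    structure_code: int  # 1-13, assigned as a by-product of cataloging

def search(records: List[CatalogRecord], subject: str,
           include: Optional[Set[int]] = None,
           exclude: Set[int] = frozenset()) -> List[CatalogRecord]:
    """retrieve records on a subject, qualified by structure codes."""
    return [r for r in records
            if subject in r.subjects
            and r.structure_code not in exclude
            and (include is None or r.structure_code in include)]

catalog = [
    CatalogRecord("a monograph on subject x", ["subject x"], 1),
    CatalogRecord("collected essays touching on subject x", ["subject x"], 6),
    CatalogRecord("a journal of subject x studies", ["subject x"], 10),
]

# only major monographic works (types a, b, f, g = codes 1, 3, 7, 12)
monographs = search(catalog, "subject x", include={1, 3, 7, 12})

# everything except collections of essays (codes 2, 4, 5, 6)
no_collections = search(catalog, "subject x", exclude={2, 4, 5, 6})

# everything except periodicals (codes 9, 10, 11, 13)
no_periodicals = search(catalog, "subject x", exclude={9, 10, 11, 13})
```

a single coded, searchable element qualified this way reproduces, uniformly, what card catalogs attempt with format subdivisions such as congresses or yearbooks.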
present-day card catalogs list publications only. they do not list the individual works that may be contained in publications. if an analytic catalog were to be built into a computerized system at some time in the future, the structure code would be a great help in the redesign, because it makes it easy to spot items that need analytics, namely those that contain embedded works, or codes 2, 4, 5, 6, 8, 9, 10, 11, and 13. a searcher working with such an analytic catalog could use the code to limit output to manageable stages: first all items of type c, for example; then broadening the search to include those of type d; and so forth, until enough relevant material has been found. the structure code would also be useful in the displayed output. if codes 5 or 8 appeared together with a bibliographic description on the screen, this would tell the catalog user that the item retrieved is a set of many separately titled documents. a complete list of those titles can then be displayed to help the searcher decide which of the documents are relevant for him. in the card catalog this is done by means of contents notes. not all libraries go to the trouble of making contents notes, though, and not all contents notes are complete and reliable. the structure code would ensure consistency and completeness of contents information at all times. codes 10 and 13 in a search output, analogously, would tell the user that the item is a serial with individual issue titles. there is no mechanism in the contemporary card catalog to inform readers of those titles. codes 4 and 7 would tell that the document is part of a finite set, and so forth. it has been the general experience of database designers that a record cannot have too many searchable elements built into its format. no sooner is one approach abandoned "because nobody needs it," than someone arrives on the scene with just that requirement. it can be anticipated, then, that once the structure code is part of the standard record format, catalog users will find many other ways to work the code into search strategies. it can also be anticipated that the proposed structure code, by adding a factor of selectivity, will help catalogers because it strengthens the authority-control aspect of machine-readable catalog files. if two publications bear identical titles, for example, and one is of structure 1, the other of structure 6, then it is clear that they cannot possibly be the same items. however, if they are of structures 1 and 7, respectively, extra care must be taken in cataloging, for they could be different versions of the same work. determination of the structure of an item is a by-product of cataloging, for no librarian can catalog a book unless he understands what the structure of that book is: one or more works, one or more documents per item, open or closed set, and so forth. it would therefore be very cheap at cataloging time to document the already-performed structure analysis and express this structure in the form of a code. references 1. herbert h. hoffman, descriptive cataloging in a new light: polemical chapters for librarians (newport beach, calif.: headway publications, 1976), p.43. revisions to contributed cataloging in a cooperative cataloging database judith hudson: university libraries, state university of new york at albany. introduction oclc is the largest bibliographic utility in the united states.
one of its greatest assets is its computerized database of standardized cataloging information. the database, which is built on the principle of shared cataloging, consists of cataloging records input from library of congress marc tapes and records contributed by member libraries. oclc standards in order to provide records contributed by member libraries that are as usable as those input from marc tapes, it is im based on data collected as part of the 2006 public libraries and the internet study, the authors assess the degree to which public libraries provide sufficient and quality bandwidth to support the library's networked services and resources. the topic is complex due to the arbitrary assignment of a number of kilobits per second (kbps) used to define bandwidth. such arbitrary definitions to describe bandwidth sufficiency and quality are not useful. public libraries are indeed connected to the internet and do provide public-access services and resources. it is, however, time to move beyond connectivity type and speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. a secondary, but important issue is the extent to which libraries, particularly in rural areas, have access to broadband telecommunications services. the biennial public libraries and the internet studies, conducted since 1994, describe public library involvement with and use of the internet.1 over the years, the studies showed the growth of public-access computing (pac) and internet access provided by public libraries to the communities they serve. internet connectivity rose from 20.9 percent to essentially 100 percent in less than ten years; the average number of public access computers per library increased from an average of two to nearly eleven; and bandwidth rose to the point where 63 percent of public libraries have connection speeds of greater than 769kbps (kilobits per second) in 2006. this dramatic growth, replete with related information technology challenges, occurred in an environment of challenges—among them budgetary and staffing—that public libraries face in maintaining traditional services as well as networked services. one challenge is the question of bandwidth sufficiency and quality. the question is complex because typically an arbitrary number describes the number of kbps used to define "broadband." as will be seen in this paper, such arbitrary definitions to describe bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term "high speed" for connections of 200kbps in at least one direction.2 there are three problematic issues with this definition: 1. it specifies unidirectional bandwidth, meaning that a 200kbps download, but a much slower upload (e.g., 56kbps) would fit this definition; 2. regardless of direction, bandwidth of 200kbps is neither high speed nor does it allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly. 3.
the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high­use multiple­ workstation public­access context. in addition to connectivity speed, there are many ques­ tions related to public library pac and internet access that can affect bandwidth sufficiency—from budget and sus­ tainability, staffing and support, to services public librar­ ies offer through their technology infrastructure, and the impacts of connectivity and pac on the communities that libraries serve. one key question, however, is what is quality pac and internet bandwidth for public libraries? and, in attempting to answer that question, what are measures and benchmarks of quality internet access? this paper provides data from the 2006 public libraries and the internet study to foster discussion and debate around determining quality pac and internet access.3 bandwidth and connectivity data at the library outlet or branch level are presented in this article. the band­ width measures are not systemwide but rather at the point of service delivery in the branch. ■ the bandwidth issue there are a number of factors that affect the sufficiency and quality of bandwidth in a pac and internet service context. examples of factors that influence actual speed include: ■ number of workstations (public­access and staff) that simultaneously access the internet; ■ provision of wireless access that shares the same con­ nection; ■ ultimate connectivity path—that is, a direct connec­ tion to the internet that is truly direct, or one that goes through regional or other local hops (that may have aggregated traffic from other libraries or orga­ nizations) out to the internet; john carlo bertot and charles r. mcclure assessing sufficiency and quality of bandwidth for public libraries john carlo bertot (jbertot@fsu.edu) is the associate director of the information use management and policy institute and professor at the college of information, florida state university; and charles r. mcclure (cmcclure@ci.fsu.edu) is the director of the information use management and policy institute (www .ii.fsu.edu) and francis eppes professor of information studies at the college of information, florida state university. article title | author 15assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 15 ■ type of connection and bandwidth that the telecom­ munications company is able to supply the library; ■ operations (surfing, e­mail, downloading large files, streaming content) being performed by users of the internet connection; ■ switching technologies; ■ latency effects that affect packet loss, jitter, and other forms of noise throughout a network; ■ local settings and parameters, known or unknown, that impede transmission or bog down the delivery of internet­based content; ■ range of networked services (databases, videoconfer­ encing, interactive/real­time services) to which the library is linked; ■ if networked, the speed of the network on which the public­access workstations reside; and ■ general application resource needs, protocol priority, and other general factors. thus, it is difficult to precisely answer “how much bandwidth is enough” within an evolving and dynamic context of public access, use, and infrastructure. putting public­access internet use into a more typi­ cal application­and­use scenario, however, may provide some indication of adequate bandwidth. 
for example: ■ a typical three-minute digital song is 3mb; ■ a typical digital photo is about 2mb; and ■ a typical powerpoint presentation is about 10mb. if one person in a public library were to e-mail a powerpoint presentation at the same time that another person downloaded multiple songs, and another was exchanging multiple pictures, even a library with a t1 line (1.5mbps—megabits per second) would experience a temporary network slowdown during these operations. this does not take into account many other new high-bandwidth-consuming applications such as cnn streaming-video channel; uploading and accessing content to a wiki, blog, or youtube.com; or streaming content such as cbs's webcasting the 2006 ncaa basketball tournament. an increasingly used technology in various settings is two-way internet-based video conferencing. with an installed t1 line, a library could support two 512kbps or three 384kbps videoconferences, depending on the amount of simultaneous traffic on the network—which, in a public access context, would be heavy. indeed, the 2006 public libraries and the internet study indicated a near continuous use of public-access workstations by patrons (only 14.6 percent of public libraries indicated that they always had a sufficient number of workstations available for patron use). public libraries increasingly serve as access points to e-government services and resources, e.g., social services, disaster relief, health care.4 these services can range from the simple completion of a web-based form (low-bandwidth consumption) to more interactive services (high-bandwidth consumption). and, as access points to continuing education and online degree programs, public libraries need to offer adequate broadband to enable users to access services and resources that increasingly can depend on streaming technologies that consume greater bandwidth. ■ bandwidth and pac in public libraries today as table 1 demonstrates, public libraries continue to increase their bandwidth, with 63.3 percent of public libraries reporting connection speeds of 769kbps or greater. this compares to 47.7 percent of public libraries reporting connection speeds of greater than 769kbps in 2004. there are disparities between rural and urban public libraries, with rural libraries reporting substantially fewer instances of connection speeds of greater than 1.5mbps in 2006. on the one hand, the increase in connectivity speeds between 2004 and 2006 is a positive step. on the other, 16.1 percent of public libraries report that their connection speeds are insufficient to meet patron demands all of the time, and 29.4 percent indicate that their connection speeds are insufficient to meet patron demands some of the time. thus, nearly half of public libraries indicate that their connection speeds are insufficient to meet patron demands some or all of the time. in terms of public access computers, the average number of workstations that public libraries provide is 10.7 (table 2). urban libraries have an average of 17.1 workstations, as compared to rural libraries, which report an average of 7.1 workstations.
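to make the application-and-use scenario above concrete, the arithmetic can be spelled out. file sizes are quoted in megabytes while a t1 line carries roughly 1.5 megabits per second, so an eight-to-one conversion applies; the figures below assume three songs, three photos, ideal throughput, and no competing traffic, none of which a busy public-access context would ever see.

```python
T1_MEGABITS_PER_SECOND = 1.544   # nominal t1 capacity, ideal conditions
BITS_PER_BYTE = 8

# traffic from the scenario above, in megabytes (counts are assumptions)
traffic_mb = {
    "one powerpoint presentation": 10,
    "multiple songs (three at 3mb each)": 9,
    "multiple photos (three at 2mb each)": 6,
}

total_megabytes = sum(traffic_mb.values())
total_megabits = total_megabytes * BITS_PER_BYTE
seconds = total_megabits / T1_MEGABITS_PER_SECOND

print(f"{total_megabytes} mb of traffic is {total_megabits} megabits")
print(f"on a fully dedicated t1 that takes about {seconds / 60:.1f} minutes")
# 25 mb -> 200 megabits -> roughly two minutes during which every other user
# of the shared connection experiences the slowdown described above

# the videoconferencing figures follow the same logic: the streams must fit
# within the t1's roughly 1,544 kbps of capacity
assert 2 * 512 <= 1544 and 3 * 384 <= 1544
```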
a closer look at bandwidth and pac for the next sections, the data offer two key views for analysis purposes: (1) workstations—divided into libraries with ten or fewer public­access workstations and libraries with more than ten public­access worksta­ tions (given that the average number of public­access workstations in libraries is roughly ten); and (2) band­ width—divided into libraries with 769kbps or less and libraries with greater than 769kbps (an arbitrary indicator of broadband for a public library context). in looking across bandwidth and public­access work­ stations (table 3), overall 31.8 percent of public libraries have connection speeds of less than 769kbps while 63.3 percent have connection speeds of greater than 769kbps. a majority of public libraries—68.5 percent—have ten or fewer workstations, while 30.9 percent have more than ten workstations. in general, rural libraries have fewer workstations and lower bandwidth as compared to sub­ urban and urban libraries. indeed, 75.2 percent of urban 16 information technology and libraries | march 200716 information technology and libraries | march 2007 libraries with fewer than ten workstations have connec­ tion speeds of greater than 769kbps, as compared to 45.2 percent of rural libraries. when examining pac capacity, it is clear that public libraries have capacity issues at least some of the time in a typical day (tables 4 through 6). only 14.6 percent of public libraries report that they have sufficient numbers of workstations to meet patron demands at all times (table 6), while nearly as many, 13.7 percent, report that they consistently are unable to meet patron demands for public­access workstations (table 4). a full 71.7 percent indicate that they are unable to meet patron demands during certain times in a typical day (see table 5). in other words, 85.4 percent of public libraries report that they are unable to meet patron demand for public­access workstations some or all of the time during a typical day—regardless of number of workstations available and type of library. the disparities between rural and urban libraries are notable. in general, urban libraries report more difficulty in meeting patron demands for public­access workstations. of urban public libraries, 27.8 percent report that they consistently have difficulty in meeting patron demand for workstations, as compared to 11.0 percent of suburban and 10.6 percent of rural public libraries (table 4). by contrast, 6.6 percent of urban libraries report sufficient workstations to meet patron demand all the time as compared to 18.9 percent of rural libraries (table 6). when reviewing the adequacy of speed of connectiv­ ity data by the number of workstations, bandwidth, and metropolitan status, a more robust and descriptive pic­ table 1. 
public library outlet maximum speed of public-access internet services by metropolitan status and poverty metropolitan status poverty level maximum speed urban suburban rural low medium high overall less than 56kbps 0.7% ±0.8% (n=18) 0.4% ±0.6% (n=17) 3.7% ±1.9% (n=275) 2.0% ±1.4% (n=245) 2.7% ±1.6% (n=61) 2.6% ±1.6% (n=5) 2.1% ±1.4% (n=311) 56kbps– 128kbps 2.5% ±1.6% (n=67) 5.4% ±2.3% (n=264) 15.2% ±3.6% (n=1,132) 9.9% ±3.0% (n=1,237) 9.5% ±2.9% (n=216) 5.3% ±2.2% (n=10) 9.8% ±3.0% (n=1,463) 129kbps– 256kbps 2.7% ±1.6% (n=72) 6.8% ±2.5% (n=332) 11.1% ±3.1% (n=829) 8.5% ±2.8% (n=1,067) 7.3% ±2.6% (n=166) 8.2% ±2.8% (n=1,233) 257kbps–768kbps 9.1% ±2.9% (n=241) 10.4% ±3.1% (n=504) 13.4% ±3.4% (n=1,002) 12.5% ±3.3% (n=1,557) 8.4% ±2.8% (n=190) 11.7% ±3.2% (n=1,747) 769kbps– 1.5mbps 33.6% ±4.7% (n=889) 40.0% ±4.9% (n=1,945) 31.0% ±4.6% (n=2,310) 34.3% ±4.8% (n=4,286) 34.6% ±4.8% (n=788) 38.1% ±4.9% (n=70) 34.4% ±4.8% (n=5,144) greater than 1.5mbps 49.4% ±5.0% (n=1,304) 31.6% ±4.7% (n=1,533) 19.9% ±4.0% (n=1,488) 27.4% ±4.5% (n=3,423) 35.5% ±4.8% (n=808) 50.5% ±5.0% (n=93) 28.9% ±4.5% (n=4,324) don’t know 1.9% ±1.4% (n=50) 5.4% ±2.3% (n=263) 5.7% ±2.3% (n=427) 5.5% ±2.3% (n=685) 2.1% ±1.4% (n=48) 3.5% ±1.8% (n=6) 4.9% ±2.2% (n=739) weighted missing values, n=1,497 table 2. average number of public library outlet graphical publicaccess internet terminals by metropolitan status and poverty* poverty level metropolitan status low medium high overall urban 14.7 20.9 30.7 17.9 suburban 12.8 9.7 5.0 12.6 rural 7.1 6.7 8.1 7.1 overall 10.0 13.3 26.0 10.7 * note that most library branches defined as “high poverty” are in general part of library systems with multiple branches and not single building systems. by and large, library systems connect and provide pac and internet services systemwide. article title | author 17assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 17 ture emerges. while overall, 53.5 percent of public librar­ ies indicate that their connection speeds are adequate to meet demand, some parsing of this figure reveals more variation (tables 7 through 10): ■ libraries with connection speeds of 769kpbs or less are more likely to report that their connection speeds are insufficient to meet patron demand at all times, with 24.0 percent of rural libraries, 25.8 percent of suburban libraries, and 25.4 percent of urban libraries so reporting (table 7). ■ libraries with connection speeds of 769kpbs or less are more likely to report that their connection speeds are insufficient to meet patron demand at some times, with 35.0 percent of rural libraries, 38.1 per­ cent of suburban libraries, and 53.4 percent of urban libraries so reporting (table 8). ■ libraries with connection speeds of greater than 769kbps also report bandwidth­sufficiency issues, with 12.0 percent of rural libraries, 10.5 percent of suburban libraries so reporting; and 14.0 percent of urban librar­ ies indicating that their connection speeds are insuf­ ficient all of the time (table 7); 20.3 percent of rural libraries, 29.5 percent of suburban libraries, and 30.0 percent of urban libraries indicating that their connec­ tion speeds are insufficient some of the time (table 8). ■ libraries that have ten or fewer workstations tend to rate their bandwidth as more sufficient at either 769kbps or less or greater than 769kbps (tables 7, 8, and 10). 
thus, in looking at the data, it is clear that libraries with fewer workstations indicate that their connection speeds are more sufficient to meet patron demand. table 3. public library public-access workstations and speed of connectivity by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 48.4% n=2,929 45.2% n=2,737 30.1% n=891 63.2% n=1,872 21.6% n=269 75.2% n=937 more than 10 workstations 22.0% n=307 75.5% n=1,053 12.0% n=225 85.1% n=1,595 9.6% n=130 89.8% n=1,221 total 43.4% n=3,242 50.9% n=3,802 23.0% n=1,116 71.6% n=3,474 15.1% n=399 83.0% n=2,194 missing: 7.6% (n=1,239) table 4. fewer public library public-access workstations than patrons wishing to use them by metropolitan status rural suburban urban total 10 or fewer workstations 10.5% n=681 10.8% n=339 23.6% n=300 12.1% n=1,321 more than 10 workstations 10.8% n=158 11.4% n=220 31.2% n=430 16.9% n=808 total 10.6% n=845 11.0% n=562 27.8% n=748 13.7% n=2,157 missing: 2.9% (n=473) table 5. fewer public library public-access workstations than patrons wishing to use them at certain times during a typical day by metropolitan status rural suburban urban total 10 or fewer workstations 68.8% n=4,444 74.5% n=2,347 69.1% n=880 70.5% n=7,670 more than 10 workstations 78.1% n=1,139 80.2% n=1,548 62.8% n=866 74.5% n=3,553 total 70.5% n=5,605 76.7% n=3,905 65.6% n=1,764 71.7% n=11,273 missing: 2.9% (n=473) table 6. sufficient public library public-access workstations available for patrons wishing to use them by metropolitan status rural suburban urban total 10 or fewer workstations 20.6% n=1,331 14.7% n=464 7.4% n=94 17.4% n=1,889 more than 10 workstations 11.0% n=161 8.4% n=163 6.0% n=83 8.5% n=406 total 18.9% n=1,501 12.3% n=627 6.6% n=177 14.6% n=2,304 missing: 2.9% (n=473) 18 information technology and libraries | march 200718 information technology and libraries | march 2007 table 7. public library connection speed insufficient to meet patron needs by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 25.4% n=668 12.1% n=297 27.4% n=233 9.8% n=173 15.4% n=34 10.2% n=90 more than 10 workstations 11.6% n=34 11.4% n=108 19.2% n=41 11.3% n=168 25.4% n=32 17.1% n=199 total 24.0% n=705 12.0% n=408 25.8% n=274 10.5% n=341 18.7% n=72 14.0% n=293 table 8. public library connection speed insufficient to meet patron needs at some times by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 34.1% n=898 19.3% n=474 37.1% n=315 29.0% n=511 50.0% n=130 27.0% n=238 more than 10 workstations 43.2% n=127 22.5% n=214 42.3% n=90 30.3% n=450 60.3% n=76 32.0% n=374 total 35.0% n=1,025 20.3% n=694 38.1% n=405 29.5% n=961 53.4% n=206 30.0% n=626 table �. public library connection speed is sufficient to meet patron needs by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 38.9% n=1,025 68.3% n=1,675 35.0% n=297 60.2% n=1,062 34.6% n=90 62.9% n=556 more than 10 workstations 45.2% n=133 66.1% n=628 38.5% n=82 54.9% n=817 14.3% n=18 50.9% n=594 total 39.5% n=1,158 67.5% n=2,306 35.7% n=379 57.9% n=1,886 28.0% n=108 56.0% n=1,168 table 10. 
public library connection speed insufficient to meet patron needs some or all of the time by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 59.5% n=1,566 31.4% n=771 64.6% n=549 38.8% n=684 65.4% n=170 37.1% n=328 more than 10 workstations 54.8% n=161 33.9% n=322 61.5% n=131 41.6% n=618 85.7% n=108 49.1% n=573 total 24.0% n=1,025 32.3% n=1,102 64.0% n=680 40.0% n=1,302 72.0% n=278 44.0% n=919 article title | author 1�assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 1� ■ discussion and selected issues the data presented point to a number of issues related to the current state of public library pac and internet­access adequacy in terms of available public access computers and bandwidth. the data also provide a foundation upon which to discuss the nature of quality and sufficient pac and internet access in a public library environment. while public libraries indicate increased ability to meet patron bandwidth demand when providing fewer publicly avail­ able workstations, public libraries indicate that they have difficulty in meeting patron demand for public access computers. growth of wireless connections in 2004, 17.9 percent of public library outlets offered wire­ less access, and a further 21.0 percent planned to make it available. outlets in urban and high­poverty areas were most likely to have wireless access. the majority of librar­ ies (61.2 percent), however, neither had wireless access nor had plans to implement it in 2004. as table 11 demon­ strates, the number of public library outlets offering wire­ less access has roughly doubled from 17.9 percent to 36.7 percent in two years. furthermore, 23.1 percent of outlets that do not currently have it plan to add wireless access in the next year. thus, if libraries follow through with their plans to add wireless access, 61.0 percent of public library outlets in the united states will have it by 2007. the implications of the rapid growth of the public library’s provision of wireless connectivity (as shown in table 11) on bandwidth requirements are significant. either libraries added wireless capabilities through their current overall bandwidth, or they obtained additional bandwidth to support the increased demand created by the service. if the former, then wireless access created an even greater burden on an already problematic band­ width capacity and may have actually reduced the overall quality of connectivity in the library. if the latter, libraries then had to shoulder the burden of increased expendi­ tures for bandwidth. either scenario required additional technology infrastructure, support, and expenditures. sufficient and quality connections the notion of sufficient and quality public library con­ nection to the internet is a moving target and depends on a range of factors and local conditions. for purposes of discussion in this paper, the authors used 769kbps to differentiate “slower” from “faster” connectivity. if, how­ ever, 1.5mbps or greater had been used to define faster connectivity speeds, then only 28.9 percent of public libraries would meet the criterion of “faster” connectiv­ ity (see table 1). and in fact, simply because 28.9 percent of public libraries report connection speeds of 1.5mbps or faster does not also mean that they have sufficient or quality bandwidth to meet the computing needs of their users, their staff, their vendors, and their service provid­ ers. 
some public libraries may need 10mbps to meet the pac needs of their users as well as the internal staff and management computing needs. the library community needs to become more edu­ cated and knowledgeable about what constitutes sufficient and quality connectivity in their library for the communi­ ties that they serve. a first step is to understand clearly the nature and type of the connectivity of the library. the next step is to conduct an internal audit that minimally: ■ identifies the range of networked services the library provides both to users as well as for the operation of the library; ■ identifies the typical bandwidth consumption of these services; ■ determines the demands of users on the bandwidth in terms of services they use; ■ determines peak bandwidth­usage times; ■ identifies the impact of high­consumption networked services used at these peak­usage times; ■ anticipates bandwidth demands of newer services and resources that users will want to access through the library’s infrastructure—myspace.com, youtube. com—regardless of whether or not the library is the direct provider of such services; and ■ determines what broadband services are available to the library, the costs of these services, and the “fit” of these services to the needs of the library. based on this and related information from such an audit, library administration can better determine the degree to which the bandwidth is sufficient in speed and quality. ■ planning for sufficient and quality bandwidth knowing the current condition of existing bandwidth in the library is not the same as successful technology plan­ ning and management to ensure that the library has, in fact, bandwidth that is sufficient in speed and quality. once an audit such as has been suggested is completed, careful planning for bandwidth deployment in the library is essential. it appears, however, that currently much of the management and planning for networked services is based first on what bandwidth is available as opposed to the bandwidth that is needed to provide the necessary services and resources in a networked environment. this stance puts public libraries in a reactive condition rather than a proactive condition regarding provision of net­ worked services. 20 information technology and libraries | march 200720 information technology and libraries | march 2007 most public library planning approaches stress the importance of conducting some type of needs assessment as a precursor to any type of planning.5 further, technology plans should include such things as goals, objectives, ser­ vices provision, and evaluation as they relate to bandwidth and the appropriate bandwidth needed. recent library technology planning guides, however, give little attention to the management, planning, and evaluation of band­ width as it relates to provision of networked services. it must be noted that some public libraries may be prevented from accessing higher bandwidth due to high cost, lack of availability of bandwidth alternatives, or other local factors that determine access to advanced telecommunications in their areas. in such circumstances, the audit may serve to inform the public service/utilities commissions, fcc, and others of the need for deploy­ ment of advanced telecommunications services in these areas. ■ bandwidth planning in a community context the audit and planning processes that have been described are critical activities for libraries. it is essential, however, for these processes to occur in the larger community con­ text. 
investments in technology infrastructure are increas­ ingly a community­wide resource that services multiple functions—emergency services, community access, local government agencies, to name a few. it is in this larger context that library pac and internet access occurs. moreover, there is a convergence of technology and service needs. for example, public libraries increasingly serve as agents of e­government and disaster­relief providers.6 first responders rely on the library’s infrastructure when theirs is destroyed, as hurricane katrina and other storms demonstrated. local, state, and federal government agen­ cies rely on broadband and pac and internet access (wired or wireless) to deliver e­government services. thus, at their core, libraries, emergency services, gov­ ernment agencies, and others have similar needs. pooling resources, planning jointly, and looking across needs may yield economies of scale, better service, and a more robust community technology infrastructure. emergency providers need access to reliable broadband and commu­ nications technologies in general, and in emergency situ­ ations in particular. libraries need access to high­quality broadband and pac technologies. both need access to wireless technologies. as broadcast networks relinquish ownership of the 700 mhz frequency used for analog television in february 2009, and this frequency is distributed to municipali­ ties for emergency services, now is an excellent time for libraries to engage in community technology planning for e­government, disaster planning and relief efforts, and pac and internet services. by working with the larger community to build a technology infrastructure, the library and the entire community benefit. ■ availability to high-speed connectivity one key consideration not known at this time is the extent to which public libraries—particularly those in rural areas—even have access to high­speed connec­ tions. many rural communities are served not by the large telecommunications carriers, but rather by small, privately owned­and­run local exchange carriers. iowa and wisconsin, for example, are each served by more than eighty exchange carriers. as such, public libraries are limited in capacity and services to what these exchange table 11. public-access wireless internet connectivity availability in public library outlets by metropolitan status and poverty metropolitan status poverty level provision of public-access wireless internet services urban suburban rural low medium high overall currently available 42.9% ± 4.9% (n=1,211) 42.5% ± 4.9% (n=2,240) 30.7% ± 4.6% (n=2,492) 38.0% ± 4.8% (n=5,165) 28.1% ±4.5% (n=679) 53.8% ± 5.0% (n=99) 36.7% ± 4.8% (n=5,943) not currently available and no plans to make it available within the next year 23.1% ± 4.2% (n=651) 29.7% ± 4.6% (n=1,562) 49.2% ± 5.0% (n=3,988) 37.4% ± 4.8% (n=5,091) 44.4% ± 4.9% (n=1,072) 21.0% ± 4.1% (n=39) 38.3% ± 4.9% (n=6,201) not currently available, but there are plans to make it available within the next year 30.6% ± 4.6% (n=864) 26.0% ± 4.4% (n=1,369) 18.6% ± 3.9% (n=1,509) 22.5% ± 4.2% (n=3,063) 26.2% ± 4.4% (n=633) 25.3% ± 4.4% (n=46) 23.1% ± 4.2% (n=3,742) article title | author 21assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 21 carriers offer and make available. thus, in some areas, dsl service may be the only form of high­speed connec­ tivity available to libraries. 
and, as suggested earlier, dsl may or may not be considered high speed given the needs of the library and the demands of its users. communities that lack high­quality broadband ser­ vices by telecommunications carriers may want to con­ sider building a municipal wireless network that meets the community’s broadband needs for emergency, disas­ ter, and public­access settings. as a community engages in community­wide technology planning, it may become evident that local telecommunications carriers do not meet the broadband needs of the community. such com­ munities may need to build their own networks, based on identified technology­plan needs. ■ knowledge of networked services connectivity needs patrons may not attempt to use high­bandwidth services at the public library because they know from previous visits that the library cannot provide acceptable connec­ tivity speeds to access that service—thus, they quit trying to access that service, limiting the usefulness of the pub­ lic library. in addition, librarians may have inadequate knowledge or information to determine when bandwidth is or is not sufficient to meet the demands of their users. indeed, the survey and site visits revealed that some librarians did not know the connection speeds that linked their library to the internet. consequently, libraries are in a dilemma: increase both the number of workstations and the bandwidth to meet demand; or provide less service in order to operate within the constraints of current connectivity infrastruc­ ture. and yet, roughly 45 percent of public libraries indi­ cate that they have no plans to add workstations within the next two years; the average number of workstations has been around ten for the last three surveys (2002, 2004, and 2006); and 80 percent of public libraries indicate that space limitations affect their ability to add workstations.7 hence, for many libraries, adding workstations is not an option. ■ missing the mark? the networked environment is such that there are multi­ ple uses of bandwidth within the same library—for exam­ ple, public internet access, staff access, wireless access, integrated library system access. we are now in the web 2.0 environment, which is an interactive web that allows for content uploading by users (e.g., blogs, mytube.com, myspace.com, gaming). streaming content, not text, is increasingly the norm. there are portable devices that allow for text, video, and voice messaging. increasingly, users desire and prefer wireless services. this is a new environment in which libraries provide public access to networked services and resources. it is an enabling environment that puts users fully in the content seat—from creation to design to organization to access to consumption. and users have choices, of which the public library is only one, regarding the information they choose to access. it is an environment of competition, advanced applications, bandwidth intensity, and high­quality com­ puters necessary to access the graphically intense content. the impacts of this new and substantially more com­ plex environment on libraries are potentially significant. as user expectations rise, combined with the provision of high­quality services by other providers, libraries are in a competitive and service­ and resource­rich informa­ tion environment. 
providing “bare minimum” pac and internet access can have two detrimental effects in that they: (1) relegate libraries to places of last resort, and (2) further digitally divide those who only have public­access computers and internet access through their public librar­ ies. it is critical, therefore, for libraries to chart a high­end course regarding pac and internet access, and not access that is merely perceived to be acceptable by the librarians. ■ additional research the context in which issues regarding quality pac and sufficient connectivity speeds to internet access reside is complex and rapidly changing. research questions to explore include: ■ is it possible to define quality pac and internet access in a public library context? ■ if so, what are the attributes included in the defini­ tion? ■ can these attributes be operationalized and mea­ sured? ■ assuming measurable results, what strategies can the library, policy, research, and other interested communities employ to impact public library move­ ment toward quality pac and internet access? ■ should there be standards for sufficient connectivity and quality pac in public libraries? ■ how can public librarians be better informed regard­ ing the planning and deployment of sufficient and quality bandwidth? ■ what is the role of federal and state governments in supporting adequate bandwidth deployment for public libraries?8 ■ to what extent is broadband deployment and avail­ ability truly universal as per the universal service 22 information technology and libraries | march 200722 information technology and libraries | march 2007 (section 254) of the telecommunications act of 1996 (p.l. 104­104)? these questions are a beginning point to a larger set of activities that need to occur in the research, practitioner, and policy­making communities. ■ obtaining sufficient and quality public-library bandwidth arbitrary connectivity speed targets, e.g., 200kbps or 769kbps, do not in and of themselves ensure quality pac and sufficient connectivity speeds. public libraries are indeed connected to the internet and do provide public­ access services and resources. it is time to move beyond connectivity­type and ­speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. given the widespread connectivity now provided from most public libraries, there continue to be increased demands for more and better networked services. these demands come from governments that expect public libraries to support a range of e­government services, from residents who want to use free wireless connectivity from the public library, to patrons who need to download music or view streaming videos (to name but a few). simply providing more or better connectivity will not, in and of itself, address all of these diverse service needs. increasingly, pac support will require additional public librarian knowledge, resources, and services. sufficient and quality bandwidth is a key component of those services. the degree to which public libraries can provide such enhanced networked services (requiring exceptionally high bandwidth that is both sufficient and of high quality) is unclear. mounting a significant effort now to better understand existing bandwidth use and plan for future needs and requirements in individual public libraries is essential. in today’s networked envi­ ronment, libraries must stay competitive in the provision of networked services. 
such will require sufficient and high­quality connectivity and bandwidth. ■ acknowledgements the authors gratefully acknowledge the support of the bill & melinda gates foundation and the american library association for support of the 2006 public libraries and the internet study. data from that study have been incorpo­ rated into this paper. references 1. information institute, public libraries and the internet (tal­ lahassee, fla.: information use management and policy insti­ tute, 2006). all studies conducted since 1994 are available at: http://www.ii.fsu.edu/plinternet (accessed march 1, 2007). 2. u.s. federal communications commission, high speed services for internet access: status as of december 31, 2005 (wash­ ington, d.c.: fcc, 2006), available at http://www.fcc.gov/ bureaus/common_carrier/reports/fcc­state_link/iad/ hspd0604.pdf (accessed mar. 1, 2007). 3. j. c. bertot et al., public libraries and the internet 2006 (tal­ lahassee, fla.: information use management and policy insti­ tute, forthcoming), available at http://www.ii.fsu.edu/plinternet (accessed mar. 1, 2007). 4. j. c. bertot et al., “drafted: i want you to deliver e­ government,” library journal 131, no. 13 (aug. 2006): 34–37. 5. c. r. mcclure et al., planning and role setting for public libraries: a manual of options and procedures (chicago: ala, 1987); e. himmel and w. j. wilson, planning for results: a public library transformation process (chicago, ala, 1997). 6. j. c. bertot et al., “drafted: i want you to deliver e­gov­ ernment.”; p. t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. 7. j. c. bertot et al., public libraries and the internet 2006. 8. jaeger et al., “the policy implications of internet connec­ tivity in public libraries.” fagan 140 information technology and libraries | september 2006 visual search interfaces have been shown by researchers to assist users with information search and retrieval. recently, several major library vendors have added visual search interfaces or functions to their products. for public service librarians, perhaps the most critical area of interest is the extent to which visual search interfaces and text-based search interfaces support research. this study presents the results of eight full-scale usability tests of both the ebscohost basic search and visual search in the context of a large liberal arts university. l ike the web, online library research database interfaces continue to evolve. even with the smaller scope of library research databases, users can still suffer from information overload and may have difficulty in processing large results sets. web search-engine research has shown that the number of searchers viewing only the first results page has increased from 29 percent in 1997 to 73 percent in 2002 for united states-based web searchengines users.1 additionally, the mean number of results viewed per query in 2001 was 2.5 documents.2 this may indicate either increasing relevance in search results or an increase in simplistic web interactions. visual alternatives to search interfaces attempt to address some of the problems of information retrieval within large document sets. 
while research and development of visual search interfaces began well before the advent of the web, current research into visual web interfaces has continued to expand.3 within librarianship, the most visual interface research seems to focus on those that could be applied to large-scale digital library projects.4 although library products often have more metadata and organizational structure than the web, search engine-style interfaces adapted for field searching and boolean operators are still the most frequent approach to information retrieval.5 yet research has shown that visual interfaces to digital libraries offer great benefit to the user. zaphiris emphasizes the advantage of shifting the user’s mental load “from slow reading to faster perceptual processes such as visual pattern recognition.”6 according to borner and chen, visual interfaces can help users better understand search results and the interrelation of documents within the result set, and refine their search.7 in their discussion of the function of “overviews” in visual interfaces, greene and his colleagues say that overviews can help users make better decisions about potential relevance, and “extract gist more accurately and rapidly than traditional hit lists provided by search engines.”8 several library database vendors are implementing visual interfaces to navigate and display search results. serials solutions’ new federated search product, centralsearch, uses technology from vivisimo that “organizes search results into titled folders to build a clear, concise picture for its users.”9 ulrich’s fiction connection web site has used aquabrowser to help one “discover titles similar to books you already enjoy.”10 the queens library has also implemented aquabrowser to provide a graphical interface to its entire library’s collections.11 xreferplus maps search results to topics by making visual connections between terms.12 comabstracts, from cios, uses a similar concept map, although one cannot launch a search directly from the tool. groxis chose a circular style for its concept-mapping software, grokker. partnerships between groxis and stanford university began as early as 2004, and grokker is now being implemented at stanford university libraries academic and information resources.13 ebsco and groxis announced their partnership in march 2006.14 the ebscohost interface now features a visual search tab as an option that librarians can choose to leave on (by default) or turn off in ebsco’s administrator module. figure 1 shows a screenshot of the visual search interface. within the context of library research databases, visual searching likely provides a needed alternative from traditional, text-based searching. to test this hypothesis, james madison university libraries (jmu libraries) decided to conduct eight usability sessions with ebscohost’s new visual search, in coordination with ebsco and groxis. while this is by no means the first published usability test of vendor interfaces, the literature understandably reveals a far greater number of usability tests on in-house projects such as library web sites and customized catalog interfaces than on library database interfaces.15 it is hoped that by observing users try both the ebsco basic search and visual search, more understanding will be gained about user search behavior and the potential benefits of a visual approach. 
usability testing of a large, multidisciplinary library database: basic search and visual search. jody condit fagan (faganjc@jmu.edu) is digital services librarian at carrier library, james madison university, harrisonburg, virginia. ฀ method the usability sessions were conducted at jmu, a large liberal arts university whose student population is mostly drawn from virginia and the northeastern region. only 10 percent of the students are from minority groups. jmu requires that all freshmen pass the online information skills seeking test (isst) before becoming a sophomore, and the libraries developed a web tutorial, "go for the gold," to prepare students for the isst. therefore, usability-test participants were largely white, from the northeastern united states, and had exposure to basic information literacy instruction. jmu libraries' usability lab is a small conference room with one computer workstation equipped with morae software.16 audio and video recordings of user speech and facial expressions, along with "detailed application and computer system data," are captured by the software and combined into a searchable recording session for the usability tester to review. a screenshot of the morae analysis tool is shown in figure 2. the usability test script was developed in collaboration with representatives of ebsco and groxis. ebsco provided access to the beta version of visual search for the test, and groxis provided financial incentives for student participants. the test sessions and the results analysis, however, were conducted solely by the researcher and librarian facilitators. the visual search development team was provided with the results and video clips after analysis. usability study participants were recruited by posting an announcement to the jmu students' web portal. a $25 gift certificate was offered as an incentive, and more than 140 students submitted a participation interest form. these were sorted by the number of years the student(s) had been at jmu to try to get as many novice users as possible. because so much of today's student work is conducted in groups, four groups of two, as well as four individual sessions, were scheduled, for a total of twelve students. jmu librarians who had received both human-subjects training and an introduction to facilitation served as facilitators to the usability sessions. their role was to watch the time and ask open-ended questions to keep the student participants talking about what they were doing. the major research question it was hoped would be answered by the tests was, "to what extent does ebsco's basic search interface and visual search interface support student research?" since the tests could not evaluate the entire research process, it was decided to focus on the development of the research topic. specifically, the goal was to find out how well each interface supported the intellectual process of the students in coming up with a topic, narrowing their topic, and performing searches on their chosen subtopics. an additional goal was to determine how well users were able to find and use the interface widgets and how satisfied the students felt after using the interfaces. the overall session was structured in this order: a pretest survey about the students' research experience; a series of four tasks performed with ebscohost's basic search; a series of three tasks performed with ebscohost's visual search; and a posttest interview.
both basic and visual search interfaces were used with academic search premier. each of the eight sessions was recorded in entirety by the morae software, and each recording was viewed in entirety. to try to gain some quantitative data, the researcher measured the time it took to complete each task. however, due to variables such as facilitator involvement and interaction between group members, the numbers did not lend themselves to comparison. also, it would not have been clear whether greater numbers indicated a positive or negative sign. taking longer to come up with subtopics, for example, could as easily be a sign of exploration and interested inquiry as it might be of frustration or failure. as such, the data are mostly qualitative in nature. figure 1. screenshot of ebscohost’s visual search figure 2. screenshot of morae recorder analysis tool 142 information technology and libraries | september 2006 ฀ results the student participants were generally underclassmen. two of the students, group 2, were in their third year at jmu. all others were in their first or second year. while students were drawn from a wide variety of majors, it is regrettable that there was not stronger representation from the humanities. when asked, “what do you normally use to do research?” six students answered an unqualified “google.” three other students mentioned internet search engines in their response. only two students gave the brand or product names of library research databases: one said, “pubmed, wilsonomnifile, and ebsco,” while the other, a counseling major, mentioned psycinfo and cinahl. when shown a screenshot of basic search, half of the students said they had used an ebsco database before. all of the participants said they had never before used a visual search interface. the full results from the individual pretest interviews are shown in figures 3 and 4. to begin the usability test, the facilitator started internet explorer and loaded the ebscohost basic search, which was set to have a single input box. the scripts for each task are listed in figure 5. note that task 4 was only featured in the basic search portion of the test. for task 1 on the basic search—coming up with a general topic—all of the participants began by using their own topics rather than choosing from the list of ideas. also, although they were asked to “spend some time on ebsco to come up with a possible general topic,” all but group 6 fulfilled this by simply thinking of a topic (sometimes after some discussion within the groups of two) and typing it in. with the exception of group 6, the size of the result set did not inspire topic changes. figure 6 summarizes the students’ searches and relative success on task 1. in retrospect, the tests might have yielded more straightforward findings if the students had been directed to choose from the provided list of topics, or even to use the same topic. however, part of the intention was to determine whether either interface was helpful in guiding the students’ topic development. it was hoped that by defining the scenario as writing a paper for class, their topic selection would reflect the realities of student research. however, it probably would have been better to have used the same topic for each session. task 2 asked participants to identify three subtopics, and task 3 asked them to refine their search to one subtopic and limit it to the past two years. a summary of these tasks appears in figure 7. 
a surprising finding during task 2 was that students did go past the first page of results: four groups did so, while two groups did not get enough results to fill more than one page, and the remaining two groups chose not to look past the first page. this contrasts with jansen and spink's findings, in which 73 percent of web searchers only view the first results page.17 figure 3. results from pretest interview, groups 1–4. figure 4. results from pretest interview, groups 5–8. another pleasant surprise was that students spent some time actually reading through results when they were searching for ways to narrow their topic. five groups scanned through both titles and abstracts, which requires clicking on the article titles to display the citation view. one of these five additionally chose to open full-text articles and look at the references to determine relevance. two groups scanned through the results pages only, but looked at both article titles and the subjects in the left-hand column. group 5 seemed to only scan the titles in the results list. this user behavior is also quite different from that found with web search-engine users. in one recent study by jansen and spink, more than 90 percent of the time, search-engine users viewed five or fewer documents per query.18 the five groups that chose to view the citation/abstract view by clicking on the title (groups 1, 2, 3, 4, and 6) identified subtopics that were significantly more interesting and plausible than the general topic they had come up with. from looking at their results, these groups were clearly identifying their subtopics from reading the abstracts and titles rather than just brainstorming. although group 2 had the weakest subtopics, going from the world baseball classic to specific players' relationships to the classic and the home-run derby, they were working with a results set of only eleven items. the three groups that relied on scanning only the results list succeeded to an extent, but as a whole, the new subtopics would be much less satisfying to the scenario's hypothetical professor. after scanning the titles on two pages of results, group 5 (an individual) ended up brainstorming her subtopics (prevention, intervention, and what an eating disorder looks like) based on her knowledge of the topic rather than drawing from the results. group 7 (a group of two) identified their subtopic (sand dunes) from the left-hand column on the results list. group 8 (an individual) picked up his subtopics (steroids in sports, president bush's stance on steroids, and softball) from reading keywords in the article titles on the first page of results. since the subjects in the left-hand column were a new addition to basic search, the use of this area was also noted. four groups used the subjects in the left-hand column without prompting. two groups saw the subjects (i.e., ran the mouse over them) but did not use them. the remaining two groups made no action related to the subjects. a worrisome finding of tasks 2 and 3 was that most students had trouble with the default search being set to phrase-searching rather than to a boolean and. this can easily be seen in looking at the number of results the students came up with when they tried to refine their topics (figure 7).
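to make the distinction concrete, here is a minimal sketch, using invented titles and hypothetical code (not ebsco's implementation), of how phrase searching and a boolean and of the same terms treat the same small set of records:

```python
import re

# illustrative sketch only: phrase searching vs. a boolean and of the same terms.
titles = [
    "eating disorders prevention programs in schools",
    "prevention of eating disorders in adolescents",
    "media images, body image, and the prevention of eating disorders",
]

def tokens(text):
    # break a string into lowercase word tokens
    return re.findall(r"[a-z]+", text.lower())

def phrase_match(query, text):
    # phrase searching: the terms must appear together, in order
    return query.lower() in text.lower()

def and_match(query, text):
    # boolean and: each term may appear anywhere in the record
    words = set(tokens(text))
    return all(term in words for term in tokens(query))

query = "eating disorders prevention"
print(sum(phrase_match(query, t) for t in titles))  # 1 -- only the exact wording matches
print(sum(and_match(query, t) for t in titles))     # 3 -- all three contain every term
```

phrase searching returns only records containing the exact wording, which is why the students' multi-word topics often produced few or no results.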
even though most students had some limiter still in effect (full text, last two years) when they first tried their new refined search, it was the phrase-searching that really hurt them. luckily, this is a customizable setting in ebsco's administrator module, and it is recommended that libraries enable the "proximity" expander to be set "on" by default, which will automatically combine search terms with and. figure 5. tasks posed for each portion of the usability test. figure 6. task 1, coming up with a general topic using basic search. task 4, finding a "recent article in the economist about the october earthquake in kashmir," was designed to test the usability of the ebscohost publication search and limiter. it was listed as optional in case the facilitator was worried that time was an issue. four of the student groups—1, 2, 5, and 7—were posed the task. of these four groups, three relied entirely on the publication limiter on the refine search panel. group 1 chose to use the publication search. all four groups quickly and successfully completed this task. ■ additional questions during basic search tasks at various points during the three tasks in ebsco's basic search, the students were asked to limit their results set to only full-text results, to find one peer-reviewed article, and to limit their search to the past two years. seven out of the eight student groups had no problem finding and using the ebscohost "refine search" panel, including the full-text check box, date limiter, and peer-reviewed limiter. group 7 did not find the refine search panel or use its limiters until specifically guided by the facilitator near the end. this group had found other ways to apply limits: they used the "books/monographs" tab on the results list to limit to full text, and the results-list sorting function to limit to the past two years. after having seen the refine search panel, group 7 did use the "peer reviewed" check box to find their peer-reviewed article. toward the end of the basic search portion, students were asked to "save three of their results for later." three groups demonstrated full use of the folder. an additional three groups started to use the folder and viewed it, but did not print, save, or e-mail. it is unclear whether they knew how to do so and just did not follow through, or whether they thought they had safely stored the items. two students did not use the folder at all, acting individually on items. one group used the "save" function but did not save each article. ■ visual search as in task 1 with basic search, students did not discover general topics by using the interface, but simply typed in a topic of interest. only two groups, 1 and 8, chose to try the same topic again. in the interests of processing time, visual search limits the search to the first 250 results retrieved. since jmu has set the default sort to display results in chronological order, the most recent 250 results were returned during these usability tests. figure 8 shows the students' original search terms using visual search, the actions they took while looking for subtopics, and the subtopics they identified. additionally, if the subtopics they identified matched words on the screen, the location of those words is noted. three of the groups (1, 2, and 5) identified subtopics when looking at the labels on topic and subtopic circles. group 3 identified subtopics while looking at article titles as well as the subtopic circles.
the members of group 6 identified subtopics while looking at the citation view and reading the abstract and full text, as well as rolling over article titles with their mice. it was not entirely clear where the student in group 4 got his subtopics from. two of the three subtopics did not seem to be represented in the display of the results set. his third subtopic was one of the labels from a subtopic circle. figure 7. basic search, tasks 2 and 3, coming up with subtopics. groups 7 and 8 both struggled with finding their subtopics. group 7 simply had a narrow topic ("jackalope"), and group 8 misspelled "steroids" and got few results for that reason. lacking many clusters, both groups tried typing additional terms into the title keyword box on the filter panel, resulting in fewer or zero results. for task 3, students were asked to limit their search to the last two years and to refine their search to a chosen subtopic (figure 9). particularly because the results set is limited to 250, it would have been better to have separated these two tasks: first to have the students narrow the content, then perhaps the date of the search. three groups, all groups of two, used the date limit first (2, 6, and 8). three groups (1, 3, and 6) narrowed the content of their search by typing a new search or additional keywords into the main search box. groups 2 and 4 narrowed the content of their search by clicking on the subtopic circles. note that this does not change the count of the number of results displayed in the filter panel. groups 5 and 7 tried typing keywords into the title keyword filter panel and also clicking on circles. both groups fared better with the latter approach. group 8 typed an additional keyword into the filter panel box to narrow his search. while five of the groups announced the subtopic to which they wanted to narrow their search before beginning to narrow their topic, groups 2, 7, and 8 began to interact with the interface and experiment with subtopics before choosing one. while groups 2 and 8 arrived at a subtopic and identified it, group 7 tried many experiments, but since their original topic (jackalope) was already narrow, they were not ultimately successful in identifying or searching on a subtopic. as with basic search, students were asked to save three articles for later. five of the groups (2, 4, 5, 6, and 8) used the "add to folder" function, which appears in the citation view on the right-hand side of the screen. of these, three groups proceeded to "folder has items." of these groups, two chose the "save" function. two groups used either "save" or "e-mail" to preserve individual items, rather than using the folder. one group experienced system slowness and was not able to load the full-record view in time to determine whether they would be able to save items for later. a concern is that students may not realize that, whether used in folder view or on individual items, the "save" button really just formats the records. the user must still use a browser function to save the formatted page. no student performed this function. figure 8. visual search, tasks 1 and 2, coming up with a general topic. figure 9. visual search, task 3, searching on subtopic (before date limit, if possible). several students had some trouble with the mechanics of the filter panel, shown in figure 10.
seven of the eight groups found and used the filter panel, originally hidden from view, without assistance. however, some users were not sure how the title keyword box related to the main search box. at least two groups typed the same search string into the title keyword box that they had already entered into the main search box. also, users were not sure whether they needed to click the search button after using the date limiter. however, in no case was a student unable to quickly recover from these areas of confusion. ■ results of posttest interview at the end of the entire usability session, participants were asked several questions while looking at screenshots of each interface. a full list of posttest interview questions can be found in figure 11. when speaking about the strengths of basic search, seven of eight groups talked about the search options, such as field searching and limiters. the individual in group 1 mentioned "the ability to search in fields, especially for publications and within publications." one of the students in group 3 mentioned that "i thought it was easier to specify the search for the full text and the peer reviewed—it had a separate page for that." the student in group 4 added, "they give you all the filter options as opposed to the other one." five of the eight groups also mentioned familiarity with the type of interface as a strength of basic search. since jmu has had access to ebsco databases for less than a year, and half of the students admitted they had not used ebsco, it seemed their comments were about the style of interface more than their experience with the interface. the student in group 1 commented, "seems like the standard search engine." group 2 noted, "it was organized in a way that we're used to more," and group 3 said, "it's more traditional so it's more similar to other programs." half of the groups mentioned that basic search was clear or organized. group 6 explained, "it was nice how it was really clearly set out . . . like, everything's in a line." not surprisingly, visual search's strengths surrounded the grouping of subtopics: seven of eight groups made some comment about this. the student in group 4 said, "it groups the articles for you better. it kinda like gives you the subtopics when you get into it and search it and that's pretty cool." the student in group 8 stated, "you can look and see an outline of where you want to go . . . it's easy to pinpoint it on screen like that's where i want to go with my research." some of the other strengths mentioned about visual search were showing a lot of information on one screen without scrolling (group 7) and the colorful nature of the interface. a student in group 2 added, "i like the circles and squares—the symbols register easily." the only three weaknesses listed for basic search in response to the first question were: "not having a spot to put in words not to search for" (group 1); that, like internet search engines, basic search should have "a clip from the article that has the keyword in it, the line before and the line after" (group 6); and that basic search might be too broad, because "unless you narrow it, [you have to] type in keywords to narrow it down yourself" (group 7). figure 10. visual search filter panel. figure 11. posttest interview questions. with regard to weaknesses of visual search, half of the groups had some confusion about the content, partially due to the limited number of results.
a student from group 7 declared, “it may not have as many results. . . . if you typed in ‘school’ on the other one, it might have . . . 8,000 pages [but] on this you have . . . 50 results.” the student in group 5 agreed, saying that with visual search, “they only show you a certain number of articles.” the student in group 1 said, “it’s kind of confusing when it breaks it up into the topics for you. it may be helpful for some other people, but for the way my mind works i like just having all my results displayed out like on the regular one.” half of the groups also made some comment that they were just not used to it. six of the groups were asked which one they would choose if they had class in one hour. (it is not clear why the facilitator did not ask this question of groups 3 and 8.) four groups (1, 2, 5, and 7) indicated basic search. one student in group 2 said, “i think it’s easier to use, but i don’t trust it.” the other in group 2 added, “it’s new and we’re not quite sure because every other search engine is you just type in words and it’s not graphical.” both students in group 7 commented that the familiarity of basic search was the reason they would use it for class in one hour. both groups 2 and 7 would later say that they liked the visual search interface better. two groups (4 and 6) chose visual search for the “class in one hour” scenario. the student in group 4 commented, “because it does cool things for you, makes it easier to find. otherwise you’re going through by title.” both these groups would later also say that they liked the visual search interface better. the students were also asked to describe two scenarios, one in which they would use basic search and one in which they would use visual search. four of the groups (1, 3, 5, and 6) said they would use basic search when they knew what information they needed. seven of the eight groups said they would use visual search for broad topics. all the students’ responses are given in figure 12. when asked which interface they preferred, the groups split evenly. comments from the four who preferred basic search (1, 3, 5, and 8) centered on the familiarity of the interface. the student in group 5 added, “the regular one . . . i like to get things done.” all four of these students had said they had used an ebsco database before. the two students who could list library research databases by name were both in this group. of the four who preferred visual search (2, 4, 6, and 7), three groups had never used ebsco before, though one of the students in group 7 thought he’d used it in the library web tutorial. group 2 commented, “it seemed like it had a lot more information . . . cool . . . futuristic.” the student in group 4 said, “it’s kind of like a little game. . . . like you’re trying to find the hidden piece.” group 7 commented that visual search was colorful and intriguing. the students in group 6 both stated “the visual one” in unison. one student said that visual search was more “[eye-catching] . . . it keeps you focused at what you are doing, i felt, instead of . . . words . . . you get to look at colors” and added later that it was “fun.” the other students in group 6 said, “i’m a very visual learner. so to see instead of having to read the categories, and say oh this is what makes sense, i see the circles like ‘abilities test’ or ‘academic achievement’ and i automatically know that’s what it is . . . and i can see how many articles are in it . . . 
and you click on it and it zooms in and you have all of them there." the second student went on to add, "i've been teaching my mom how to use technology and the visual search would be so much easier for her to get, because it just looks like someone drew it on there like this is a general category and then it breaks it down." other comments given during the free-comment portion of the survey included a request to have the filters from basic search appear on visual search (especially peer-reviewed); curiosity about when visual search would become available (at the time it was in beta test); and a suggestion to have general-education writing students write their first paper using visual search. figure 12. examples of two situations: one in which you would be more likely to use visual search, and one in which you would be more likely to use ebsco. ■ discussion this evaluation is limited both because most students chose different topics for each search interface, and because they only had time to research one topic in each interface. therefore, there could be an infinite number of scenarios in which they would have performed differently. however, this study does show that, for some students, or for some search topics, visual search will help students in a way that basic search may not. one hypothesis of this study was that within the context of library research databases, visual searching would provide a needed alternative to traditional, text-based searching. the success of the students was observed in three areas: the quality of the subtopics they identified after interacting with their search results; the improvement of the chosen subtopic over their chosen general topic; and the quality of the results they found for their subtopic search. the researcher made a best effort to compare topics and results sets and decide which interface helped the student groups to perform better. in addition, qualities that each interface seemed to contribute to the students' search process were noted (figure 13). these qualities were determined by reviewing the video recordings and examining the ways in which either interface seemed to support the attitudes and behaviors of the students as they conducted their research tasks. when considering all three of these areas, four groups did not, overall, require visual search as an alternative to basic search (1, 3, 4, and 7). two of these groups (4 and 7) seemed to benefit from more focus when using the basic search interface. although visual search lent them more interaction and exploration (which may be why they said they preferred visual search), it seems the focus was more important to their performance. for the other two groups (1 and 3), basic search really supported the depth of inquiry and high interest in finding results. these two groups confirmed that they preferred basic search. for two groups (6 and 8), visual search seemed an equally viable alternative to basic search. for group 6, both interfaces seemed to support the group's desire to explore; they said they preferred visual search. for the student in group 8, basic search seemed to orient him to the goal of finding results, while visual search supported a more exploratory approach. since, in his case, this exploratory approach did not turn out well in the area of finding results, it is not surprising that he ended up preferring basic search.
the remaining two groups (2 and 5) performed better with visual search, upholding the hypothesis that an alternate search is needed. group 2 seemed bored and uninterested in the search process when using basic search even though they chose a topic of personal interest: "world baseball classic." visual search caught their attention and sparked interest in the impersonal topic "global warming." group 2 spent more time exploring while using the visual search interface, and in the posttest survey admitted that they preferred the visual search interface. the student in group 5 said she preferred basic search, and as a self-described psycinfo user, seemed comfortable with the interface. yet for this test scenario, visual search made her think of new ideas and supported more real exploration during the search process. among the three areas, basic search appeared to have the upper hand in both the quality of the subtopics identified by the students and the improvement of the chosen subtopics over the general topics. this is at least partially explained by the limitation of visual search to the most recent 250 results. that is, as the students explored the visual search results, choosing subtopics would not relaunch a search on that subtopic, which would have engendered more and perhaps better subtopics. in the third area, the quality of the results set for the chosen topic, visual search seemed to have the upper hand if only because of the phrase-searching limitation present in jmu's administrative settings for basic search. that is, students were often finding few or no results on their chosen subtopics in basic search. figure 13. strengths of basic search and visual search in quality of subtopics, most improved topic, and result sets. this study also had findings that seem to transcend these interfaces and the underlying database. first, libraries should strongly consider changing their database default searching from phrase searching to a boolean and, if possible. (this is possible in ebsco using the administrative module.) second, most students did not have trouble finding or using the interface widgets to perform limiting functions, with the one exception being some confusion about the relationship between the visual search filters and main search box. third, unlike what some research into web search behavior has found, students may well travel beyond the first page of results and view more than just a few documents when determining relevance. finally, the presence of subject terms in both interfaces proved to be an aid to understanding results sets. this study also pointed out some improvements that could be made to visual search. first, it would be great if visual search returned more than 250 results in the initial set, or at least provided an overview of the size, type, and extent of objects using available metadata.19 however, even with today's high-speed connections, result-set size will need to be balanced with performance. perhaps, as students click on subtopics, the software could rerun the search so that the results set does not stay limited to the original 250. on a minor note, for both basic and visual search, greater care should be taken to make sure users understand how the save function works and to alert them to the need to use the browser function to complete the process. it should be noted that ebsco has not stopped developing visual search, and many of these improvements may well be on their way.
ebsco says it will be adding more support for limiters, display preferences, and contextual text result-list viewing at some point in the future. these feature sets can currently be viewed on grokker.com. an important area for future research is user behavior in library subscription databases. while these usability tests provide a qualitative evaluation of a specific interface, it would be worthwhile to have a more reliable understanding of students' searching behavior in library databases across similar interfaces. since public service librarians deal primarily with users who have self-identified as needing help, their experience does not always describe the behavior of all users. furthermore, studies of web search behavior may not apply directly to searching in research databases. specifically, students' use of subject terms in both interfaces could be explored. half of the student groups in this study chose to use the basic search subject clusters in the left-hand column on the results page, despite the fact that they had never seen them before (this was a beta-test feature). is this typical? would this strategy hold up to a variety of research topics? another interesting question is the use of a single search box versus several search boxes arrayed in rows (to assist in constructing boolean and field searching). in the ebsco administrative module, librarians can choose either option. based on research rather than anecdotal evidence, which is best? another option is the default sort: historically, at jmu libraries, this has been a chronological sort. does this cause problems for relevance-thinking students? finally, the issue of collaboration in student research using library research databases would be a fascinating topic. certainly, these usability recordings could be reviewed with a mind to capturing the differences between individuals and groups of two, but there may be better designs for a more focused study of this topic. ■ conclusion if you take away one conclusion from this study, let it be this: do not hesitate to try visual search with your users! information providers must balance investments in cutting-edge technology with the demands of their users. libraries and librarians, of course, are a key user group for information providers. a critical need in librarianship is to become familiar with the newest technology solutions, particularly with regard to searching, in order to provide vendors with informed feedback about which technologies to pursue. by using and teaching new visual search alternatives, librarians will be poised to influence the further development of alternatives to text-based searching. references and notes 1. bernard j. jansen and amanda spink, "how are we searching the world wide web? a comparison of nine search engine transaction logs," special issue, information processing and management 42, no. 1 (2006): 257. 2. bernard j. jansen and amanda spink, "an analysis of web documents retrieved and viewed," in proceedings of the 4th international conference on internet computing (las vegas, 2003), 67. 3. aravindan veerasamy and nicholas j. belkin, "evaluation of a tool for visualization of information retrieval results," sigir forum (acm special interest group on information retrieval) (1996): 85–93; katy börner and javed mostafa, "jodl special issue on information visualization interfaces for retrieval and analysis," international journal on digital libraries 5, no.
1 (2005): 1–2; ozgur turetken and ramesh sharda, "clustering-based visual interfaces for presentation of web search results: an empirical investigation," information systems frontiers 7, no. 3 (2005): 273–97. 4. stephen greene et al., "previews and overviews in digital libraries: designing surrogates to support visual information seeking," journal of the american society for information science 51, no. 4 (2000): 380–93; panayiotis zaphiris et al., "exploring the use of information visualization for digital libraries," new review of information networking 10, no. 1 (2004): 51–69. 5. katy börner and chaomei chen, eds., visual interfaces to digital libraries, 1st ed. (berlin; new york: springer, 2003), 243. 6. zaphiris et al., "exploring the use of information visualization for digital libraries," 51–69. 7. börner and chen, visual interfaces to digital libraries, 243. 8. greene et al., "previews and overviews in digital libraries," 380–93. 9. "vivisimo corporate profile," in vivisimo, http://vivisimo.com/html/about (accessed apr. 19, 2006). 10. "aquabrowser library—fiction connection," www.fictionconnection.com/ (accessed apr. 19, 2006). 11. "queens library—aquabrowser library," http://aqua.queenslibrary.org/ (accessed apr. 19, 2006). 12. "xrefer—research mapper," www.xrefer.com/research (accessed apr. 19, 2006). 13. "stanford 'groks,'" http://speaking.stanford.edu/back_issues/soc67/library/stanford_groks.html (accessed apr. 19, 2006); "grokker at stanford university," http://library.stanford.edu/catdb/grokker/ (accessed apr. 19, 2006). 14. "ebsco has partnered with groxis to deliver an innovative visual search feature as part of ebsco," www.groxis.com/service/grokker/pr29.html (accessed apr. 19, 2006). 15. michael dolenko, christopher smith, and martha e. williams, "putting the user into usability: developing customer-driven interfaces at west group," in proceedings of the national online meeting 20 (medford, n.j.: learned information, 1999), 81–90; e. t. morley, "usability testing: the silverplatter experience," cd-rom professional 8, no. 3 (1995); ron stewart, vivek narendra, and axel schmetzke, "accessibility and usability of online library databases," library hi tech 23, no. 2 (2005): 265–86; nicholas tomaiuolo, "deconstructing questia: the usability of a subscription digital library," searcher 9, no. 7 (2001): 32–39; b. hamilton, "comparison of the different electronic versions of the encyclopaedia britannica: a usability study," electronic library 21, no. 6 (2003): 547–54; heather l. munger, "testing the database of international rehabilitation research: using rehabilitation researchers to determine the usability of a bibliographic database," journal of the medical library association (jmla) 91, no. 4 (2003): 478–83; frank cervone, "what we've learned from doing usability testing on openurl resolvers and federated search engines," computers in libraries 25, no. 9 (2005): 10–14; alexei oulanov and edmund f. y. pajarillo, "usability evaluation of the city university of new york cuny+ database," electronic library 19, no. 2 (2001): 84–91; steve brantley, annie armstrong, and krystal m. lewis, "usability testing of a customizable library web portal," college & research libraries 67, no. 2 (2006): 146–63; carole a. george, "usability testing and design of a library web site: an iterative approach," oclc systems & services 21, no. 3 (2005): 167–80; leanne m.
vandecreek, "usability analysis of northern illinois university libraries' web site: a case study," oclc systems & services 21, no. 3 (2005): 181–92; susan goodwin, "using screen capture software for web-site usability and redesign buy-in," library hi tech 23, no. 4 (2005): 610–21; laura cobus, valeda frances dent, and anita ondrusek, "how twenty-eight users helped redesign an academic library web site," reference & user services quarterly 44, no. 3 (2005): 232–46. 16. "morae usability testing for software and web sites," www.techsmith.com/morae.asp (accessed apr. 19, 2006). 17. jansen and spink, "an analysis of web documents retrieved and viewed," 67. 18. ibid. 19. greene et al., "previews and overviews in digital libraries," 381. wikis in libraries matthew m. bejune (mbejune@purdue.edu) is an assistant professor of library science at purdue university libraries. he also is a doctoral student at the graduate school of library and information science, university of illinois at urbana-champaign. wikis have recently been adopted to support a variety of collaborative activities within libraries. this article and its companion wiki, librarywikis (http://librarywikis.pbwiki.com/), seek to document the phenomenon of wikis in libraries. this subject is considered within the framework of computer-supported cooperative work (cscw). the author identified thirty-three library wikis and developed a classification schema with four categories: (1) collaboration among libraries (45.7 percent); (2) collaboration among library staff (31.4 percent); (3) collaboration among library staff and patrons (14.3 percent); and (4) collaboration among patrons (8.6 percent). examples of library wikis are presented within the article, as is a discussion of why wikis are primarily utilized within categories i and ii and not within categories iii and iv. it is clear that wikis have great utility within libraries, and the author urges further application of wikis in libraries. in recent years, the popularity of wikis has skyrocketed. wikis were invented in the mid-1990s to help facilitate the exchange of ideas between computer programmers. the use of wikis has gone far beyond the domain of computer programming, and now it seems as if every google search contains a wikipedia entry. wikis have entered into the public consciousness. so, too, have wikis entered into the domain of professional library practice. the purpose of this research is to document how wikis are used in libraries. in conjunction with this article, the author has created librarywikis (http://librarywikis.pbwiki.com/), a wiki to which readers can submit additional examples of wikis used in libraries. the article will proceed in three sections. the first section is a literature review that defines wikis and introduces computer-supported cooperative work (cscw) as a context for understanding wikis. the second section documents the author's research and presents a schema for classifying wikis used in libraries. the third section considers the implications of the research results. ■ literature review what's a wiki? wikipedia (2007a) defines a wiki as: a type of web site that allows the visitors to add, remove, edit, and change some content, typically without the need for registration. it also allows for linking among any number of pages. this ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring. wikis have been around since the mid-1990s, though it is only recently that they have become ubiquitous.
in 1995, ward cunningham launched the first wiki, wikiwikiweb (http://c2.com/cgi/wiki), which is still active today, to facilitate the exchange of ideas among computer programmers (wikipedia 2007b). the launch of wikiwikiweb was a departure from the existing model of web communication, where there was a clear divide between authors and readers. wikiwikiweb elevated the status of readers, if they so chose, to that of content writers and editors. this model proved popular, and the wiki technology used on wikiwikiweb was soon ported to other online communities, the most famous example being wikipedia. on january 15, 2001, wikipedia was launched by larry sanger and jimmy wales as a complementary project for the now-defunct nupedia encyclopedia. nupedia was a free, online encyclopedia with articles written by experts and reviewed by editors. wikipedia was designed as a feeder project to solicit new articles for nupedia that were not submitted by experts. the two services coexisted for some time, but in 2003 the nupedia servers were shut down. since its launch, wikipedia has undergone rapid growth. at the close of 2001, wikipedia's first year of operation, there were 20,000 articles in eighteen language editions. as of this writing, there are approximately seven million articles in 251 languages, fourteen of which have more than 100,000 articles each. as a sign of wikipedia's growth, when this manuscript was first submitted four months earlier, there were more than five million articles in 250 languages. author's note: sources in the previous two paragraphs come from wikipedia. the author acknowledges the concerns within the academy regarding the practice of citing wikipedia within scholarly works; however, it was decided that wikipedia is arguably an authoritative source on wikis and itself. nevertheless, the author notes that there were changes—insubstantial ones—to the cited wikipedia entries between when the manuscript was first submitted and when it was revised four months later. wikis and cscw wikis facilitate collaborative authoring and can be considered one of the technologies studied under the domain of cscw. in this section, cscw is explained and it is shown how wikis fit within this framework. cscw is an area of computer science research that considers the application of computer technology to support cooperative, also referred to as collaborative, work. the term was first coined in 1984 by irene greif (1988) and paul cashman to describe a workshop they were planning on the support of people in work environments with computers. over the years there have been a number of review articles that describe cscw in greater detail, including bannon and schmidt (1991), rodden (1991), schmidt and bannon (1992), sachs (1995), dourish (2001), ackerman (2002), olson and olson (2002), dix, finlay, abowd, and beale (2004), and shneiderman and plaisant (2005). publication in the field of cscw primarily occurs through conferences. the first conference on cscw was held in 1986 in austin, texas. since then, the conference has been held biennially in the united states. proceedings are published by the association for computing machinery (acm, http://www.acm.org/).
in 1991, the first european conference on computer supported cooperative work (ecscw) was held in amsterdam. ecscw also is held biennially, in odd-numbered years. ecscw proceedings are published by springer (http://www.ecscw.uni-siegen.de/). the primary journal for cscw is computer supported cooperative work: the journal of collaborative computing. publications also appear within publications of the acm and chi, the conference on human factors in computing. cscw and libraries as libraries are, by nature, collaborative work environments—library staff working together and with patrons—and as digital libraries and computer technologies become increasingly prevalent, there is a natural fit between cscw and libraries. the following researchers have applied cscw to libraries. twidale et al. (1997) published a report sponsored by the british library research and innovation centre that examined the role of collaboration in the information-searching process to inform how information systems design could better address and support collaborative activity. twidale and nichols (1998) offered ethnographic research of physical collaborative environments—in a university library and an office—to aid the design of digital libraries. they wrote two reviews of cscw as applied to libraries—the first was more comprehensive (twidale and nichols 1998) than the second (twidale and nichols 1999). sánchez (2001) discussed collaborative environments designed and prototyped for digital library environments. classification of collaboration technologies that facilitate collaborative work are typically classified within cscw across two continua: synchronous versus asynchronous, and co-located versus remote. if put together in a two-by-two matrix, there are four possibilities: (1) synchronous and co-located (same time, same place); (2) synchronous and remote (same time, different place); (3) asynchronous and remote (different time, different place); and (4) asynchronous and co-located (different time, same place). this classification schema was first proposed by johansen et al. (1988). nichols and twidale (1999) mapped work applications within the realm of cscw in figure 1. wikis are not present in the figure, but their absence is not an indication that they are not cooperative work technologies. rather, wikis were not yet widely in use at the time cscw was considered by nichols and twidale. the author has added wikis to nichols and twidale's graphical representation in figure 2. interestingly, wikis are border-crossers fitting within two quadrants: asynchronous and co-located, and asynchronous and remote. wikis are asynchronous in that they do not require people to be working together at the same time. they are both co-located and remote in that people working collaboratively do not need to be working in the same place. it is also interesting to note that library technologies also can be mapped using johansen's schema. nichols and twidale (1999) also mapped this, and figure 3 illustrates the variety of collaborative work that goes on within libraries. figure 1. classification of cscw applications. synchronous and co-located: meeting rooms. synchronous and remote: distributed meetings, muds and moos, shared drawing, video conferencing, collaborative writing. asynchronous and co-located: team rooms. asynchronous and remote: organizational memory, workflow, web-based applications, collaborative writing. figure 2. classification of cscw applications including wikis. as in figure 1, with wikis added to both the asynchronous and co-located quadrant and the asynchronous and remote quadrant. figure 3. classification of collaborative work within libraries. the same time/place matrix applied to library activities, including personal help, the reference interview, issuing a book on loan, face-to-face interactions, use of opacs, database searching, video conferencing, the telephone, notice boards, post-it notes, memos, documents for study, social information filtering, e-mail, voicemail, distance learning, and postal services. ■ method in order to discover the widest variety of wikis used in libraries, the author searched for examples of wikis used in libraries within three areas—the lis literature, the library success wiki, and messages posted on three professional electronic discussion lists. when examples were found, they were logged and classified according to a schema created by the author. results are presented in the next section.
the first area searched was within the lis literature. the author utilized the wilson library literature and information science database. there were two main types of articles: ones that argued for the use of wikis in libraries, and ones that were case studies of wikis that had been implemented. the second area searched was within library success: a best practices wiki (http://www.libsuccess.org/) (see figure 4), created by meredith farkas, distance learning librarian at norwich university. as the name implies, it is a place for people within the library community to share their success stories. posting to the wiki is open to the public, though registration is encouraged. there are many subject areas on the wiki, including management and leadership, readers' advisory, reference services, information literacy, and so on. there also is a section about collaborative tools in libraries (http://www.libsuccess.org/index.php?title=collaborative_tools_in_libraries), in which examples of wikis in libraries are presented. within this section there is a presentation about wikis made by farkas (2006) titled wiki world (http://www.libsuccess.org/index.php?title=wiki_world), from which examples were culled. figure 4. library success: a best practices wiki (http://www.libsuccess.org/). figure 5. wiki world (http://www.libsuccess.org/index.php?title=wiki_world). the third area that was searched was professional electronic discussion list messages from web4lib, dig_ref, and libref-l. the web4lib electronic discussion list (tennant 2005) is "for the discussion of issues relating to the creation, management, and support of library-based world wide web servers, services, and applications." the list is moderated by roy tennant and the web4lib advisory board and was started in 1994. the dig_ref electronic discussion list is a forum for "people and organizations answering the questions of users via the internet" (webjunction n.d.). the list is hosted by the information institute of syracuse, school of information studies, syracuse university, and was created in 1998. the libref-l electronic discussion list is "a moderated discussion of issues related to reference librarianship" (balraj 2005). established in 1990, it is operated out of kent state university and moderated by a group of list owners. these three electronic discussion lists were selected for two reasons. first, the author is a subscriber to each electronic discussion list and, prior to the research, had noted that there were messages about wikis in libraries. second, based on the descriptions of each electronic discussion list stated above, the selected electronic discussion lists reasonably covered the discussion of wikis in libraries within the professional library electronic discussion lists. one year of messages, november 15, 2005, through november 14, 2006, was analyzed for each list. messages about wikis in libraries were identified through keyword searches against the author's personal archive of electronic discussion list messages collected over the years.
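as a minimal sketch of this kind of keyword screening (the messages and code below are invented for illustration; the actual searching was done in the author's microsoft outlook archive, as described next), a match on the string "wiki" casts a wide net that still has to be read through by hand:

```python
# hypothetical sketch: screening discussion-list messages for the word "wiki".
messages = [
    {"list": "web4lib", "body": "has anyone set up a wiki for library instruction handouts?"},
    {"list": "dig_ref", "body": "reminder: virtual reference training session next week"},
    {"list": "libref-l", "body": "the nature study compared wikipedia with britannica"},
]

hits = [m for m in messages if "wiki" in m["body"].lower()]
print(len(hits))  # 2 -- high recall, but the wikipedia message is a false hit, so each
                  # hit must still be read to find wikis actually used in libraries
```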
an alternative method would have been to search the web archive of each list, but the author found it easier to search within his mail client, microsoft outlook. the word "wiki" was found in 513 messages: 354 in web4lib, 91 in dig_ref, and 68 in libref-l. this approach had high recall, as discourse about wikis frequently included the use of the word "wiki," though low precision, as there were many results that were not about wikis used in libraries. common false hits included messages about the nature study (giles 2005) that compared wikipedia to encyclopedia britannica, and messages that included the word "wiki" but were simply referring to wikis, though not examples of wikis used within libraries. from the list of 513 messages, the author read each message and came up with a much shorter list of thirty-nine messages about wikis in libraries: thirty-two in web4lib, three in dig_ref, and four in libref-l. ■ results classification of the results after all wiki examples had been collected, it became clear that there was a way to classify the results. in farkas's (2006) presentation about wikis, she organized wikis in two categories: (1) how libraries can use wikis with their patrons; and (2) how libraries can use wikis for knowledge sharing and collaboration. this schema, while it accounts for two types of collaboration, is not granular enough to represent the types of collaboration found within the wiki examples identified. as such, it became clear that another schema was needed. twidale and nichols (1998) identified three types of collaboration within libraries: (1) collaboration among library staff; (2) collaboration between a patron and a member of staff; and (3) collaboration among library users. their classification schema mapped well to the examples of wikis that were identified; however, it too was not granular enough, as it did not distinguish between intraorganizational and extraorganizational collaboration among library staff, the two most common types of wiki usage found in the research (see appendix). to account for these types of collaboration, which are common not only to wiki use in libraries but to all professional library practice, the author modified twidale and nichols's schema (see figure 6). figure 6. four types of collaboration within libraries: 1. collaboration among libraries (extra-organizational); 2. collaboration among library staff (intra-organizational); 3. collaboration among library staff and patrons; 4. collaboration among patrons. the improved schema also uniformly represents entities across the categories—library staff and member of staff are referred to as "library staff," and patrons and library users are referred to as "patrons." examples of wikis used in libraries for each category are provided to better illustrate the proposed classification schema.
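as a hypothetical illustration (not code from the original study), the schema amounts to assigning each wiki to one of the four categories and tallying the assignments, which is how a distribution like the one in table 1 can be produced; the example assignments below are drawn from wikis discussed in this article:

```python
from collections import Counter

# hypothetical sketch: tallying example wikis by the four categories in figure 6.
wikis = {
    "library instruction wiki": "i: collaboration among libraries",
    "library success: a best practices wiki": "i: collaboration among libraries",
    "university of connecticut libraries' staff wiki": "ii: collaboration among library staff",
    "health sciences library knowledge base": "ii: collaboration among library staff",
    "sjcpl subject guides": "iii: collaboration among library staff and patrons",
    "wiki worldcat": "iv: collaboration among patrons",
}

counts = Counter(wikis.values())
total = sum(counts.values())
for category, n in sorted(counts.items()):
    print(f"{category}: {n} ({100 * n / total:.1f} percent)")
```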
■ collaboration among libraries the library instruction wiki (http://instructionwiki.org/main_page) is an example of a wiki that is used for collaboration among libraries (figure 7). figure 7. library instruction wiki (http://instructionwiki.org/). it appears as though the wiki was originally set up to support library instruction within oregon—it is unclear if this was associated with a particular type of library, say academic or public—but now the wiki supports library instruction in general. the wiki is self-described as: a collaboratively developed resource for librarians involved with or interested in instruction. all librarians and others interested in library instruction are welcome and encouraged to contribute. the tagline for the wiki is "stop reinventing the wheel" (library instruction wiki 2006). from this wiki there is a list of library instruction resources that include the following: handouts, tutorials, and other resources to share; teaching techniques, tips, and tricks; class-specific web sites and handouts; glossary and encyclopedia; bibliography and suggested reading; and instruction-related projects, brainstorms, and documents. within the handouts, tutorials, and other resources to share section, the author found a wide variety of resources from libraries across the country. similarly, there were a number of suggestions to be found under the teaching techniques, tips, and tricks section. another example of a wiki used for collaboration among libraries is the library success wiki (http://www.libsuccess.org/), one of the sources of examples of wikis used in this research. adding to earlier descriptions of this wiki as presented in this paper, library success seems to be one of the most frequently updated library wikis and perhaps the most comprehensive in its coverage of library topics. ■ collaboration among library staff the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/) is an example of a wiki used for collaboration among library staff (figure 8). figure 8. the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/). this wiki is a knowledge base containing more than one thousand information technology services (its) documents. its documents support the information technology needs of the library organization. examples include answers to commonly asked questions, user manuals, and instructions for a variety of computer operations. in addition to being a repository of its documents, the wiki also serves as a portal to other wikis within the university of connecticut libraries. there are many other wikis connected to library units; teams; software applications, such as the libraries' ils; libraries within the university of connecticut libraries; and other university of connecticut campuses. the health sciences library knowledge base, stony brook university (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome) is another example of a wiki that is used for collaboration among library staff (figure 9). figure 9. health sciences library knowledge base (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome). the wiki is described as "a space for the dynamic collaboration of the library staff, and a platform of shared resources" (health sciences library 2007).
on the wiki there are the following content areas: news and announcements; hsl departments; projects; troubleshooting; staff training resources, working papers and support materials; and community activities, scholarship, conferences, and publications. ■ collaboration among library staff and patrons there are only a few examples of wikis used for collaboration among library staff and patrons to cite as exemplars. one example is the st. joseph county public library (sjcpl) subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page), seen in figure 10. figure 10. sjcpl subject guides (http://libraryforlife.org/subjectguides/index.php/main_page/). this wiki is a collection of resources and services in print and electronic formats to assist library patrons with subject area searching. as the wiki is published by library staff for public consumption, it has more of a professional feel than wikis from the first two categories. pages have images, and the content is structured to look like a standard web page. though the wiki looks like a web page, there still remain a number of edit links that follow each section of text on the wiki. while these tags bear importance for those editing the wiki—library staff only in this case—they undoubtedly puzzle library patrons who think that they have the ability to edit the wiki when, in fact, they do not. another example of collaboration between library staff and patrons that takes a similar approach is the usc aiken gregg-graniteville library web site (http://library.usca.edu/) in figure 11. figure 11. usc aiken gregg-graniteville library (http://library.usca.edu/). as with the sjcpl subject guides, this wiki looks more like a web site than a wiki. in fact, the usc aiken wiki conceals its true identity as a wiki even more so than the sjcpl subject guides. the only evidence that the web site is a wiki is a link at the bottom of each page that says "powered by pmwiki." pmwiki (http://pmwiki.org/) is a content management system that utilizes wiki technology on the back end to manage a web site while retaining the look and feel of a standard web site. it seems that the benefits of using a wiki in such a way are shared content creation and management. ■ collaboration among patrons as there are only three examples of wikis used for collaboration among patrons, all examples will be highlighted in this section. the first example is wiki worldcat (http://www.oclc.org/productworks/wcwiki.htm), sponsored by oclc. wiki worldcat launched as a pilot project in september 2005. the service allows users of open worldcat, oclc's web version of worldcat, to add book reviews to item records. though this wiki does not have many book reviews in it, even for contemporary bestsellers, it gives a taste of how a wiki could be used to facilitate collaboration among patrons. a second example is the biz wiki from ohio university libraries (http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page) (see figure 12). figure 12. ohio university libraries biz wiki (http://www.library.ohiou.edu/subjects/bizwiki). the biz wiki is a collection of business information resources available through ohio university. the wiki was created by chad boeninger, reference and instruction librarian, as an alternate form of a subject guide or pathfinder. what separates this wiki from those in the third category, collaboration among library staff and patrons, is that the wiki is editable by patrons as well as librarians.
similarly, butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref) is a wiki that has reviews of reference resources created by butler librarians, faculty, staff, and students (see figure 13). figure 13. butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref). full results thirty-three wikis were identified. two wikis were classified in two categories each. the full results are available in the appendix. table 1 illustrates how wikis were not uniformly distributed across the four categories: category i had 45.7 percent, category ii had 31.4 percent, category iii had 14.3 percent, and category iv had 8.6 percent. table 1. classification summary: i, collaboration among libraries, 16 (45.7 percent); ii, collaboration among library staff, 11 (31.4 percent); iii, collaboration among library staff and patrons, 5 (14.3 percent); iv, collaboration among patrons, 3 (8.6 percent); total, 35 (100.0 percent). nearly 80 percent of all examples were found within categories i and ii. as seen in some of the examples in the previous section, wikis were utilized for a variety of purposes. here is a short list of purposes for which wikis were utilized: sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. wiki software utilization is summarized in tables 2 and 3. mediawiki is the most popular software utilized by libraries (33.3 percent), followed by unknown (30.3 percent), pbwiki (12.1 percent), pmwiki (12.1 percent), seedwiki (6.1 percent), twiki (3 percent), and xwiki (3 percent). if the values for unknown are removed from the totals (table 3), mediawiki is utilized in almost half (47.8 percent) of all library wiki applications. table 2. software totals: mediawiki, 11 (33.3 percent); unknown, 10 (30.3 percent); pbwiki, 4 (12.1 percent); pmwiki, 4 (12.1 percent); seedwiki, 2 (6.1 percent); twiki, 1 (3 percent); xwiki, 1 (3 percent); total, 33 (100 percent). table 3. software totals without unknowns: mediawiki, 11 (47.8 percent); pbwiki, 4 (17.4 percent); pmwiki, 4 (17.4 percent); seedwiki, 2 (8.7 percent); twiki, 1 (4.3 percent); xwiki, 1 (4.3 percent); total, 23 (100.0 percent). ■ discussion with a wealth of examples of wikis in categories i and ii and a dearth of examples of wikis in categories iii and iv, the library community seems to be more comfortable using wikis to collaborate within the community, but less comfortable using wikis to collaborate with library patrons or to enable collaboration among patrons. the research results pose the questions: why are wikis predominantly used for collaboration within the library community? and why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another? why are wikis predominantly used for collaboration within the library community? this is perhaps the easier of the two questions to explain. there is a long legacy of cooperation and collaboration intraorganizationally and extraorganizationally within libraries. one explanation for this is the shared budgetary climate within libraries. all too often there is not enough money, staff, or resources to offer desired levels of service. librarians work together to overcome these barriers. prominent examples include cooperative cataloging, interlibrary lending, and the formation of consortia to negotiate pricing. another explanation can be found in the personal characteristics of library professionals. librarianship is a service profession that consequently attracts service-minded individuals who are interested in helping others, whether they are library patrons or fellow colleagues.
a third reason is the role of library associations, such as the international federation of library associations and institutions, the american library association, the special libraries association, and the medical library association, as well as many others at the international, national, state, and local levels, and the work that is done through these associations at annual conferences and throughout the year. libraries use wikis to collaborate intraorganizationally and extraorganizationally because collaboration is what they do most naturally.
figure 12. ohio university libraries biz wiki (http://www.library.ohiou.edu/subjects/bizwiki)
figure 13. butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref)
why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another? the reasons why libraries are only minimally using wikis to collaborate with patrons and for patron collaboration are more difficult to ascertain. however, due to the untapped potential of using wikis, the proposed answers to this question are more important and may lead to future implementations of wikis in libraries. here are four possible explanations, some more speculative than others. first, perhaps one of the reasons is the result of the way in which libraries are conceived by library patrons and librarians alike. a strong case can be made for libraries as places of collaborative work, and the author takes this position. however, historically libraries have been repositories of information, and this remains a pervasive and difficult concept to change—libraries are frequently seen simply as places to get books. in this scenario, the librarian is a gatekeeper that a patron interacts with to get a book—that is, if the patron interacts with a librarian at all. it is also worth noting that the relationship is one-way—the patron needs the assistance of the librarian, but not the other way around. viewed in these terms, this is not a collaborative situation. for libraries to use wikis for the purpose of collaborating with library patrons, it might demand the reconceptualization of libraries by library patrons and librarians. similarly, this extreme conceptualization of libraries does not consider patrons working with one another, even though it is an activity that occurs formally and informally within libraries, not to mention with the emergence of interdisciplinary and multidisciplinary work. if wikis are to be used to facilitate collaboration between patrons, the conceptualization of the library by library patrons and librarians must be expanded. second, there may be fears within the library community about authority, responsibility, and liability. libraries have long held the responsibility of ensuring the authority of the bibliographic catalog. if patrons are allowed to edit the library wiki, there is potential for negatively affecting the authority of the wiki and even the perceived authority of the library. likewise, there is potential liability in allowing patrons to post to the library wiki. similar concerns have been raised in the past about other collaborative technologies, such as blogs, bulletin boards, mailing lists, and so on, all aspects of the library 2.0 movement. if libraries are fully to realize library 2.0 as described by casey and savastinuk (2006), miller (2006), and courtney (2007), these issues must be considered.
table 1. classification summary
category                                              no.      %
i: collaboration among libraries                       16   45.7
ii: collaboration among library staff                  11   31.4
iii: collaboration among library staff and patrons      5   14.3
iv: collaboration among patrons                         3    8.6
total:                                                 35  100.0
table 2. software totals
wiki software    no.      %
mediawiki         11   33.3
unknown           10   30.3
pbwiki             4   12.1
pmwiki             4   12.1
seedwiki           2    6.1
twiki              1    3
xwiki              1    3
total:            33    100
table 3. software totals without unknowns
wiki software    no.      %
mediawiki         11   47.8
pbwiki             4   17.4
pmwiki             4   17.4
seedwiki           2    8.7
twiki              1    4.3
xwiki              1    4.3
total:            23  100.0
third, perhaps it is due to a matter of fit. it might be the case that wikis are utilized in categories i and ii and not within categories iii and iv because the tools are better suited to support the types of activities within categories i and ii. consider some of the activities listed earlier: supporting association work, collecting software documentation, supporting conferences, creating digital repositories, creating intranets, and creating knowledge bases. each of these illustrates a wiki that is utilized for the creation of a resource with multiple authors and readers, tasks that are well-suited to wikis. wikipedia is a great example of a wiki with clear, shared tasks for multiple authors and multiple readers and a sense of persistence over time. in contrast, relationships between library staff and patrons do not typically lead to the shared creation of resources. while it is true that the relationship between patron and librarian in the context of a patron’s research assignment can be collaborative depending on the circumstances, authorship is not shared but is possessed by the patron. in addition, research assignments in the context of undergraduate coursework are short-lived and seldom go beyond the confines of a particular course. in terms of patrons working together with other patrons, there is the precedent of group work; however, groups often produce projects or papers that share the characteristics of nongroup research assignments listed above. this, of course, does not mean that wikis are not suitable for collaboration within categories iii and iv, but perhaps the opportunities for collaboration are fewer or that they stretch the imagination of the types and ways of doing collaborative work. fourth, perhaps it is a matter of “not yet.” while the research has shown that libraries are not utilizing wikis in categories iii and iv, this may be because it is too soon. it should be noted that wikis are still new technologies. it might be the case that librarians are experimenting in safer contexts so they will gain experience prior to trying more public projects where their expertise will be needed. if this explanation is true, it is expected that more examples of wikis in libraries will soon emerge. as they do, the author hopes that all examples of wikis in libraries, new and old, will be added to the companion wiki to this article, librarywikis (http://librarywikis.pbwiki.com/). ■ conclusion it appears that wikis are here to stay, and that their utilization within libraries is only just beginning. this article documented the current practice of wikis used in libraries using cscw as a framework for discussion. the author located examples of wikis in three places: within the lis literature, on the library success wiki, and within messages from three professional electronic discussion lists.
thirty­ three examples of wikis were identified and classified using a classification schema created by the author. the schema has four categories: (1) collaboration among librar­ ies; (2) collaboration among library staff; (3) collaboration among library staff and patrons; and (4) collaboration among patrons. wikis were used for a variety of purposes, including for sharing information, supporting associa­ tion work, collecting software documentation, supporting conferences, facilitating librarian­to­faculty collaboration, creating digital repositories, managing web content, creat­ ing intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. by and large, wikis were primarily used to support collaboration among library staff intraorganiza­ tionally and extraorganizationally, with nearly 80 percent (45.7 percent and 31.4 percent respectively) of the examples so identified, and less so in the support of collaboration among library staff and patrons (14.3 percent) and col­ laboration among patrons (8.6 percent). a majority of the examples of wikis utilized the mediawiki software (47.8 percent). it is clear that there are plenty of examples of wikis utilized in libraries, and more to be found each day. it is at this time that the profession is faced with extending the use of this technology, and it is to the future to see how wikis will continue to be used within libraries. works cited ackerman, mark s. 2002. the intellectual challenge of cscw: the gap between social requirements and technical feasibil­ ity. in human-computer interaction in the new millennium, ed. john m. carroll, 179–203. new york: addison­wesley. balraj, leela, et al. 2005 libref­l. kent state university librar­ ies. http://www.library.kent.edu/page/10391 (accessed june 12, 2007). archive is available at this link as well. bannon, liam j., and kjeld schmidt. 1991. cscw: four charac­ ters in search for a context. in studies in computer supported cooperative work. ed. john m. bowers and steven d. benford, 3–16. amsterdam: elsevier. casey, michael e., and laura c. savastinuk. 2006. library 2.0. library journal 131, no. 14: 40–42. http://www.libraryjournal. com/article/ca6365200.html (accessed june 12, 2007). courtney, nancy. 2007. library 2.0 and beyond: innovative technologies and tomorrow’s user (in press). westport, conn.: libraries unlimited. dix, alan, et al. 2004. socio­organizational issues and stake­ holder requirements. in human computer interaction, 3rd ed., 450–74. upper saddle river, n.j.: prentice hall. dourish, paul. 2001. social computing. in where the action is: the foundations of embodied interaction, 55–97. cambridge, mass: mit pr. article title | author 35wikis in libraries | bejune 35 farkas, meredith. 2006. wiki world. http://www.libsuccess. org/index.php?title=wiki_world (accessed june 12, 2007). giles, jim. 2005. internet encyclopaedias go head to head. nature 438: 900–01. http://www.nature.com/nature/journal/v438/ n7070/full/438900a.html (accessed june 12, 2007). greif, irene, ed. 1988. computer supported cooperative work: a book of readings. san mateo, calif.: morgan kaufmann publishers. health sciences library, state university of new york, stony brook. 2007. health sciences library knowledge base. http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/ webhome (accessed june 12, 2007). johansen, robert, et al. 1988. groupware: computer support for business teams. new york: free press. library instruction wiki. 2006. 
http://instructionwiki.org/ main_page (accessed june 12, 2007). miller, paul. 2006. coming together around library 2.0. dlib magazine 12, no. 4. http://www.dlib.org/dlib/april06/ miller/04miller.html (accessed june 12, 2007). nichols, david m., and michael b. twidale. 1999. com­ puter supported cooperative work and libraries. vine 109: 10–15. http://www.comp.lancs.ac.uk/computing/research/ cseg/projects/ariadne/docs/vine.html (accessed june 12, 2007). olson, gary m., and judith s. olson. 2002. groupware and com­ puter­supported cooperative work. in the human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, ed. julie a. jacko and andrew sears, 583–95. mahwah, n.j.: lawrence erlbaum associates, inc.. rodden, tom t. 1991. a survey of cscw systems. interacting with computers 3, no. 3: 319–54. sachs, patricia. 1995. transforming work: collaboration, learn­ ing, and design. communications of the acm 38: 227–49. sánchez, j. alfredo. 2001. hci and cscw in the context of digi­ tal libraries. in chi ‘01 extended abstracts on human factors in computing systems. conference on human factors in computing systems. seattle, wash., mar. 31–apr. 5 2001. schmidt, kjeld, and liam j. bannon. 1992. taking cscw seri­ ously: supporting articulation work. computer supported cooperative work 1, no. 1/2: 7–40. shneiderman, ben, and catherine plaisant. 2005. collaboration. in designing the user interface: strategies for effective humancomputer interaction, 4th ed., 408–50. reading, mass.: addison wesley. tennant, roy. 2005. web4lib electronic discussion. webjunc­ tion.org. http://lists.webjunction.org/web4lib/ (accessed june 12, 2007). archive is available at this link as well. twidale, michael b., et al. 1997. collaboration in physical and digital libraries. report no. 64, british library research and innovation centre. http://www.comp.lancs.ac.uk/ computing/research/cseg/projects/ariadne/bl/report/ (accessed june 12, 2007). twidale, michael b., and david m. nichols. 1998a. using studies of collaborative activity in physical environments to inform the design of digital libraries. technical report cseg/11/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/ projects/ariadne/docs/cscw98.html (accessed june 12, 2007). twidale, michael b., and david m. nichols. 1998b. a survey of applications of cscw for digital libraries. technical report cseg/4/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/ projects/ariadne/docs/survey.html (accessed june 12, 2007). webjunction. n.d. dig_ref electronic discussion list. http:// www.vrd.org/dig_ref/dig_ref.shtml (accessed june 12, 2007). wikipedia. 2007a. wiki. http://en.wikipedia.org/wiki/wiki (accessed april 29, 2007). wikipedia. 2007b. wikiwikiweb. http://en.wikipedia.org/ wiki/wikiwikiweb (accessed april 29, 2007). 36 information technology and libraries | september 200736 information technology and libraries | september 2007 appendix. wikis in libraries i = collaboration between libraries ii = collaboration between library staff iii = collaboration between library staff and patrons iv = collaboration between patrons category description location wiki software i library success: a best practices wiki—a wiki capturing library success stories. covers a wide variety of topics. also features a presentation about wikis http://www.libsuccess. 
org/index.php?title=wiki_world http://www.libsuccess.org/ mediawiki i wiki for school library association in alaska http://akasl.pbwiki.com/ pbwiki i wiki to support reserves direct. free, open­source software for managing academic reserves materials developed by emory university. http://www.reservesdirect.org/ wiki/index.php/main_page mediawiki i sunyla new tech wiki—a place for state university of new york (suny) librarians to share how they are using information technologies to interact with patrons http://sunylanewtechwiki.pbwiki. com/ pbwiki i wiki for librarians and faculty members to collaborate across campuses. being used with distance learning instructors and small groups message from robin shapiro. on [dig_ref] electronic discussion list dated 10/18/2006. unknown i discusses setting up three wikis in last month: “one to sup­ port a pre­conference workshop, another for behind­the­ scenes conferences planning by local organizers, and one for conference attendees to use before they arrived and during the sessions” (30). fichter, darlene. 2006. using wikis to support online collaboration in libraries. information outlook 10, no.1: 30­31. unknown i unofficial wiki to the american library association 2005 annual conference http://meredith.wolfwater.com/ wiki/index.php?title=main_page mediawiki i unofficial wiki to the 2005 internet librarian conference http://ili2005.xwiki.com/xwiki/bin/ view/main/webhome xwiki i wiki for the canadian library association (cla) 2005 annual conference http://wiki.ucalgary.ca/page/cla mediawiki i wiki for south carolina library association http://www.scla.org/governance/ homepage pmwiki i wiki set up to support national discussion about institutional repositories in new zealand http://wiki.tertiary.govt.nz/ ~institutionalrepositories pmwiki i the oregon library instruction wiki used for sharing infor­ mation about library instruction http://instructionwiki.org/ mediawiki i personal repositories online wiki environment (prowe)— an online repository sponsored by the open university and the university of leicester that uses wikis and blogs to encourage the open exchange of ideas across communities of practice http://www.prowe.ac.uk/ unknown article title | author 37wikis in libraries | bejune 37 category description location wiki software i lis wiki—space for collecting articles and general informa­ tion about library and information science http://liswiki.org/wiki/main_page mediawiki i making of modern michigan—a wiki to support a state­wide digital library project http://blog.lib.msu.edu/mmmwiki/ index.php/main_page unknown (behind firewall) i wiki used as a web content editing tool in a digital library initiative sponsored by emory university, the university of arizona, virginia tech, and the university of notre dame http://sunylanewtechwiki.pbwiki .com/ pbwiki ii wiki at suny stony brook health sciences library used as knowledge base http://appdev.hsclib.sunysb.edu/ twiki/bin/view/main/webhome; presentation can be found at: http:// ms.cc.sunysb.edu/%7edachase/ wikisinaction.htm twiki ii wiki at york university used internally for committee work. exploring how to use wikis as a way to collaborate with users message from mark robertson. on web4lib electronic discussion list dated 10/13/2006. unknown ii wiki for internal staff use at the university of waterloo. they utilize access control to restrict parts of the wiki to groups message from chris gray. on web4lib electronic discussion list dated 08/09/2006. 
unknown ii wiki at the university of toronto for internal communica­ tions, technical problems, and as a document repository message from stephanie walker. on libref­l electronic discussion list dated 10/28/2006. unknown ii wiki used for coordination and organization of portable professor program, which appears to be a collaborative infor­ mation literacy program for remote faculty http://tfpp­committee.pbwiki.com/ pbwiki ii the university of connecticut libraries’ staff wiki which is a repository of information technology services documents http://wiki.lib.uconn.edu/wiki/ main_page mediawiki ii wiki used at binghamton university libraries for staff intranet. features pages for committees, documentation, policies, newsletters, presentations, and travel reports screenshots can be found at http://library.lib.binghamton.edu/ presentations/cil2006/cil%202006 _wikis.pdf mediawiki ii wiki used at the information desk at miami university described in: withers, rob. “something wiki this way comes.” c&rl news 66, no. 11 (2005): 775–77. unknown ii use of wiki as knowledge base to support reference service http://oregonstate.edu/~reeset/ rdm/ unknown ii university of minnesota libraries staff web site in wiki form https://wiki.lib.umn.edu/ pmwiki ii wiki used to support the mit engineering and science libraries b­team. the wiki may no longer be active, but is still available http://www.seedwiki.com/wiki/b­ team seedwiki iii a wiki that is subject guide at st. joseph county public library in south bend, indiana http://www.libraryforlife.org/ subjectguides/index.php/main_page mediawiki 3� information technology and libraries | september 20073� information technology and libraries | september 2007 category description location wiki software iii wiki used at the aiken library, university of south carolina as a content management system (cms) http://library.usca.edu/main/ homepage pmwiki iii doucette library of teaching resources wiki—a repository of resources for education students http://wiki.ucalgary.ca/page/ doucette mediawiki iv wiki worldcat (wikid) is an oclc pilot project (now defunct) that allowed users to add reviews to open worldcat records http://www.oclc.org/product­ works/wcwiki.htm unknown iii and iv wikiref lists reviews of reference resources—databases, books, web sites, etc. —created by butler librarians, faculty, staff, and students. http://www.seedwiki.com/wiki/ butler_wikiref; reported in matthies, brad, jonathan helmke, and paul slater. using a wiki to enhance library instruction. indiana libraries 25, no. 3 (2006): 32–34. seedwiki iii and iv wiki used as a subject guide at ohio university http://www.library.ohiou.edu/sub­ jects/bizwiki/index.php/main_page; presentation about the wiki: http://www.infotoday.com/cil2006/ presentations/c101­102_boeninger .pps mediawiki lib-s-mocs-kmc364-20140601053820 book reviews proceedings of the conference on interlibrary communications and information networks, edited by joseph becker, sponsored by the american library association and the u.s. office of education, bureau of libraries and educational technology held at airlie house, warrenton, virginia, september 28, 1970-0ctober 2, 1970. chicago: american library association, 1971. 347p to see how rapidly the field of library networking and communications has moved in recent times, one need only try to review a conference on the subject some years after it was held. 
what was fresh, imaginative, innovative, or blue-sky has become accepted or gone beyond; errors in thinking or bad guesses as to the future have been shown up; and the blue sky has been divided into lower stratospheres and outer space for ease of working. under these circumstances one can only review such proceedings as history. the assumptions on which the conference was based were the traditional ones of librarians and information scientists-that access to information should be the right of anyone without regard to geographical or economic position, and that pooling of resources (here by networking operations) is one of the best ways to reach that goal. since 1970 both of these assumptions have been questioned, but at the time of the conference there were no opposing voices. the final conclusions, of course, were based on these assumptions. national systems were recommended, both governmental and private, with the establishment of a public corporation (such as the corporation for public broadcasting) as the central stimulator, coordinator, and regulator, to be served by input from a large number of groups. funding, the attendees decided, should be pluralistic, from public, private, and foundation sources (are there any others?), but with the federal government bearing the largest burden of support. since it is deemed desirable to give the widest chance for all individuals to use these networks, it was recommended that fee-forservice prices should be kept low through subventions of the telecommunications costs by libraries and information centers. and since new techniques and methods need to be learned, both education and research in the field must be strengthened and enlarged. since the basic components of networks of libraries and information centers was conceived as being: 1. bibliographic access to media 2. mediation of user request to information book reviews 245 3. delivery of media to users 4. education traditional questions of bibliographic description, the most useful form of public services (including such things as interviewing requestors, seeking information on the existence of answers, locating the answers physically, providing them, evaluating them and obtaining feedback), as well as the best ways to set up networks were discussed at length. moreover, since new technologies have sometimes been touted as the answer to many of these problems, a whole section on network technology was included. such subjects as telecommunications, cable television, and computers were examined; here most of the recommendations still remain to be carried out. the organization proposed for these networks again plowed old ground. the conferees felt that one should use the tremendous national and disciplinary resources already established (the library of congress, the national library of medicine, the national agricultural library, chemical abstracts, etc.); there should be a coordinating body to minimize duplication of effort and assure across-the-board coverage; the systems must be sold to legislators if public money is to be provided; and more research on the best networking operations is necessary. above all in almost every section of the report and in the preface the then-new national commission on libraries and information science was referred to as the great savior. together with requests for public money, it might be said, this was the thread binding all sections of the conference together. was this conference necessary? 
could it have brought forth something more useful than the gentle spoof in irwin pizer's poem "hiawatha's network?" it was undoubtedly very inspiring for those at the conferenceall 100 of them-who probably learned more over the cocktail glass and dinner plate than at the formal sessions, and who learned as they grappled with the difficulties of consensus-making. but need the proceedings have been published? is everything ever said at a meeting always worth preserving? how about the concept of ephemera rather than total recall? would not a short summary of the recommendations have sufficed? estelle brodman 16 information technology and libraries | march 2009 mathew j. miles and scott j. bergstrom classification of library resources by subject on the library website: is there an optimal number of subject labels? the number of labels used to organize resources by subject varies greatly among library websites. some librarians choose very short lists of labels while others choose much longer lists. we conducted a study with 120 students and staff to try to answer the following question: what is the effect of the number of labels in a list on response time to research questions? what we found is that response time increases gradually as the number of the items in the list grow until the list size reaches approximately fifty items. at that point, response time increases significantly. no association between response time and relevance was found. i t is clear that academic librarians face a daunting task drawing users to their library’s web presence. “nearly three-quarters (73%) of college students say they use the internet more than the library, while only 9% said they use the library more than the internet for information searching.”1 improving the usability of the library websites therefore should be a primary concern for librarians. one feature common to most library websites is a list of resources organized by subject. libraries seem to use similar subject labels in their categorization of resources. however, the number of subject labels varies greatly. some use as few as five subject labels while others use more than one hundred. in this study we address the following question: what is the effect of the number of subject labels in a list on response times to research questions? n literature review mcgillis and toms conducted a performance test in which users were asked to find a database by navigating through a library website. they found that participants “had difficulties in choosing from the categories on the home page and, subsequently, in figuring out which database to select.”2 a review of relevant research literature yielded a number of theses and dissertations in which the authors compared the usability of different library websites. jeng in particular analyzed a great deal of the usability testing published concerning the digital library. the following are some of the points she summarized that were highly relevant to our study: n user “lostness”: users did not understand the structure of the digital library. n ambiguity of terminology: problems with wording accounted for 36 percent of usability problems. n finding periodical articles and subject-specific databases was a challenge for users.3 a significant body of research not specific to libraries provides a useful context for the present research. 
miller’s landmark study regarding the capacity of human short-term memory showed as a rule that the span of immediate memory is about 7 ± 2 items.4 sometimes this finding is misapplied to suggest that menus with more than nine subject labels should never be used on a webpage. subsequent research has shown that “chunking,” which is the process of organizing items into “a collection of elements having strong associations with one another, but weak associations with elements within other chunks,”5 allows human short-term memory to handle a far larger set of items at a time. larson and czerwinski provide important insights into menuing structures. for example, increasing the depth (the number of levels) of a menu harms search performance on the web. they also state that “as you increase breadth and/or depth, reaction time, error rates, and perceived complexity will all increase.”6 however, they concluded that a “medium condition of breadth and depth outperformed the broadest, shallow web structure overall.”7 this finding is somewhat contrary to a previous study by snowberry, parkinson, and sisson, who found that when testing structures of 2^6, 4^3, 8^2, and 64^1 (2^6 means two menu items per level, six levels deep; all four structures present the same sixty-four terminal items), the 64^1 structure grouped into categories proved to be advantageous in both speed and accuracy.8 larson and czerwinski recommended that “as a general principle, the depth of a tree structure should be minimized by providing broad menus of up to eight or nine items each.”9 zaphiris also corroborated that previous research concerning depth and breadth of the tree structure was true for the web. the deeper the tree structure, the slower the user performance.10 he also found that response times for expandable menus are on average 50 percent longer than sequential menus.11 both the research and current practices are clear concerning the efficacy of hierarchical menu structures. thus it was not a focus of our research. the focus instead was on a single-level menu and how the number and characteristics of subject labels would affect search response times. n background in preparation for this study, library subject lists were collected from a set of thirty library websites in the united states, canada, and the united kingdom. we selected twelve lists from these websites that were representative of the entire group and that varied in size from small to large. to render some of these lists more usable, we made slight modifications. there were many similarities between label names. n research design participants were randomly assigned to one of twelve experimental groups. each experimental group would be shown one of the twelve lists that were selected for use in this study. roughly 90 percent of the participants were students. the remaining 10 percent of the participants were full-time employees who worked in these same departments.
mathew j. miles (milesm@byui.edu) is systems librarian and scott j. bergstrom (bergstroms@byui.edu) is director of institutional research at brigham young university–idaho in rexburg.
the twelve lists ranged in number of labels from five to seventy-two:
group a: 5 subject labels
group b: 9 subject labels
group c: 9 subject labels
group d: 23 subject labels
group e: 6 subject labels
group f: 7 subject labels
group g: 12 subject labels
group h: 9 subject labels
group i: 35 subject labels
group j: 28 subject labels
group k: 49 subject labels
group l: 72 subject labels
each participant was asked to select a subject label from a list in response to eleven different research questions. the questions are listed below:
1. which category would most likely have information about modern graphical design?
2. which category would most likely have information about the aztec empire of ancient mexico?
3. which category would most likely have information about the effects of standardized testing on high school classroom teaching?
4. which category would most likely have information on skateboarding?
5. which category would most likely have information on repetitive stress injuries?
6. which category would most likely have information about the french revolution?
7. which category would most likely have information concerning walmart’s marketing strategy?
8. which category would most likely have information on the reintroduction of wolves into yellowstone park?
9. which category would most likely have information about the effects of increased use of nuclear power on the price of natural gas?
10. which category would most likely have information on the electoral college?
11. which category would most likely have information on the philosopher immanuel kant?
the questions were designed to represent a variety of subject areas that library patrons might pursue. each subject list was printed on a white sheet of paper in alphabetical order in a single column, or double columns when needed. we did not attempt to test the subject lists in the context of any web design. we were more interested in observing the effect of the number of labels in a list on response time independent of any web design. each participant was asked the same eleven questions in the same order. the order of questions was fixed because we were not interested in testing for the effect of order and wanted a uniform treatment, thereby not introducing extraneous variance into the results. for each question, the participant was asked to select a label from the subject list under which they would expect to find a resource that would best provide information to answer the question. participants were also instructed to select only a single label, even if they could think of more than one label as a possible answer. participants were encouraged to ask for clarification if they did not fully understand the question being asked. recording of response times did not begin until clarification of the question had been given. response times were recorded unbeknownst to the participant. if the participant was simply unable to make a selection, that was also recorded. two people administered the exercise. one recorded response times; the other asked the questions and recorded label selections. relevance rankings were calculated for each possible combination of labels within a subject list for each question. for example, if a subject list consisted of five labels, for each question there were five possible answers. two library professionals—one with humanities expertise, the other with sciences expertise—assigned a relevance ranking to every possible combination of question and labels within a subject list.
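as a concrete illustration of the bookkeeping this implies, the sketch below tabulates two raters’ scores for a few question–label pairs and combines them into a single relevance value per pair; the labels, scores, and column names are hypothetical placeholders, not data from the study.

# a minimal sketch (not the authors' code): combining two raters' relevance
# judgments into one score per question-label pair. all values are hypothetical.
import pandas as pd

ratings = pd.DataFrame([
    {"question": 1, "label": "art & architecture", "rater": "humanities", "score": 4},
    {"question": 1, "label": "art & architecture", "rater": "sciences",   "score": 3},
    {"question": 1, "label": "business",           "rater": "humanities", "score": 1},
    {"question": 1, "label": "business",           "rater": "sciences",   "score": 1},
])

# average the two professionals' scores for each question-label combination
relevance = (
    ratings.groupby(["question", "label"], as_index=False)["score"]
    .mean()
    .rename(columns={"score": "relevance"})
)
print(relevance)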
the rankings were then averaged for each question–label combination. n results the analysis of the data was undertaken to determine whether the average response times of participants, adjusted by the different levels of relevance in the subject list labels that prevailed for a given question, were significantly different across the different lists. in other words, would the response times of participants using a particular list, for whom the labels in the list were highly relevant to the question, be different from students using the other lists for whom the labels in the list were also highly relevant to the question? a separate univariate general linear model analysis was conducted for each of the eleven questions. the analyses were conducted separately because each question represented a unique search domain. the univariate general linear model provided a technique for testing whether the average response times associated with the different lists were significantly different from each other. this technique also allowed for the inclusion of a covariate—relevance of the subject list labels to the question—to determine whether response times at an equivalent level of relevance were different across lists. in the analysis model, the dependent variable was response time, defined as the time needed to select a subject list label. the covariate was relevance, defined as the perceived match between a label and the question. for example, a label of “economics” would be assessed as highly relevant to the question, what is the current unemployment rate? the same label would be assessed as not relevant for the question, what are the names of four moons of saturn? the main factor in the model was the actual list being presented to the participant. there were twelve lists used in this study. the statistical model can be summarized as follows:
response time = list + relevance + (list × relevance) + error
the general linear model required that the following conditions be met: first, data must come from a random sample from a normal population. second, all variances within each of the groupings are the same (i.e., they have homoscedasticity). an examination of whether these assumptions were met revealed problems both with normality and with homoscedasticity. a common technique—logarithmic transformation—was employed to resolve these problems. accordingly, response-time data were all converted to common logarithms. an examination of assumptions with the transformed data showed that all questions but three met the required conditions. the three questions (5, 6, and 7) were excluded from subsequent analysis.
figure 1. the overall average of average search times for the eight questions for all experimental groups (i.e., lists)
n conclusions the series of graphs in the appendix show the average response times, adjusted for relevance, for eight of the eleven questions for all twelve lists (i.e., experimental groups). three of the eleven questions were excluded from the analysis because of heteroscedasticity. an inspection of these graphs shows no consistent pattern in response time as the number of the items in the lists increase. essentially, this means that, for any given level of relevance, the number of items of the list does not affect response time significantly.
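a minimal sketch of the kind of per-question model described above, under the assumption that the trial data sit in a table with one row per participant–question trial (the file name and column names are illustrative, not the authors’):

# a rough sketch (not the authors' code) of the univariate general linear model
# described above: common-log response time as the dependent variable, list as
# the main factor, relevance as the covariate, plus their interaction, fit
# separately for each question.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

trials = pd.read_csv("trials.csv")          # hypothetical file: one row per participant-question trial
trials["log_rt"] = np.log10(trials["rt"])   # common-logarithm transform, as in the study

for question, block in trials.groupby("question"):
    model = smf.ols("log_rt ~ C(list_id) * relevance", data=block).fit()
    print(question)
    print(sm.stats.anova_lm(model, typ=2))  # tests the list effect at an equivalent level of relevance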
it seems that for a single question, characteristics of the categories themselves are more important than the quantity of categories in the list. the response times using a subject list with twenty-eight labels is similar to the response times using a list of six labels. a statistical comparison of the mean response time for each classification of library resources by subject on the library website | miles and bergstrom 19 group with that of each of the other groups for each of the questions largely confirms this. there were very few statistically significant different comparisons. the spikes and valleys of the graphs in the appendix are generally not significantly different. however, when the average response time associated with all lists is combined into an overall average from all eight questions, a somewhat clearer picture emerges (see figure 1). response times increase gradually as the number of the items in the list increase until the list size reaches approximately fifty items. at that point, response time increases significantly. no association was found between response time and relevance. a fast response time did not necessarily yield a relevant response, nor did a slow response time yield an irrelevant response. n observations we observed that there were two basic patterns exhibited when participants made selections. the first pattern was the quick selection—participants easily made a selection after performing an initial scan of the available labels. nevertheless, a quick selection did not always mean a relevant selection. the second pattern was the delayed selection. if participants were unable to make a selection after the initial scan of items, they would hesitate as they struggled to determine how the question might be reclassified to make one of the labels fit. we did not have access to a high-tech lab, so we were unable to track eye movement, but it appeared that the participants began scanning up and down the list of available items in an attempt to make a selection. the delayed selection seemed to be a combination of two problems: first, none of the available labels seemed to fit. second, the delay in scanning increased as the list grew larger. it’s possible that once the list becomes large enough, scanning begins to slow the selection process. a delayed selection did not necessarily yield an irrelevant selection. the label names themselves did not seem to be a significant factor affecting user performance. we did test three lists, each with nine items and each having different labels, and response times were similar for the three lists. a future study might compare a more extensive number of lists with the same number of items with different labels to see if label names have an effect on response time. this is a particular challenge to librarians in classifying the digital library, since they must come up with a few labels to classify all possible subjects. creating eleven questions to span a broad range of subjects is also a possible weakness of the study. we had to throw out three questions that violated the assumptions of the statistical model. we tried our best to select questions that would represent the broad subject areas of science, arts, and general interest. we also attempted to vary the difficulty of the questions. a different set of questions may yield different results. references 1. steve jones, the internet goes to college, ed. 
mary madden (washington, d.c.: pew internet and american life project, 2002): 3, www.pewinternet.org/pdfs/pip_college_report.pdf (accessed mar. 20, 2007). 2. louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62, no. 4 (2001): 361. 3. judy h. jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, new brunswick, new jersey): 38–42. 4. george a. miller, “the magical number seven plus or minus two: some limits on our capacity for processing information,” psychological review 63, no. 2 (1956): 81–97. 5. fernand gobet et al., “chunking mechanisms in human learning,” trends in cognitive sciences 5, no. 6 (2001): 236–43. 6. kevin larson and mary czerwinski, “web page design: implications of memory, structure and scent for information retrieval” (los angeles: acm/addison-wesley, 1998): 25, http://doi.acm.org/10.1145/274644.274649 (accessed nov. 1, 2007). 7. ibid. 8. kathleen snowberry, mary parkinson, and norwood sisson, “computer display menus,” ergonomics 26, no 7 (1983): 705. 9. larson and czerwinski, “web page design,” 26. 10. panayiotis g. zaphiris, “depth vs. breath in the arrangement of web links,” www.soi.city.ac.uk/~zaphiri/papers/hfes .pdf (accessed nov. 1, 2007). 11. panayiotis g. zaphiris, ben shneiderman, and kent l. norman, “expandable indexes versus sequential menus for searching hierarchies on the world wide web,” http:// citeseer.ist.psu.edu/rd/0%2c443461%2c1%2c0.25%2cdow nload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/ cache/papers/cs/22119/http:zszzszagrino.orgzszpzaphiriz szpaperszszexpandableindexes.pdf/zaphiris99expandable.pdf (accessed nov. 1, 2007). 20 information technology and libraries | march 2009 appendix. 
response times by question by group
[eight charts, one each for questions 1, 2, 3, 4, 8, 9, 10, and 11, plotting average log response time for group a (5 items) through group l (72 items)]
margaret brown-sica. tutorial. playing tag in the dark: diagnosing slowness in library response time. in this article the author explores how the systems department at the auraria library (which serves more than thirty thousand primarily commuting students at the university of colorado–denver, the metropolitan state college of denver, and the community college of denver) diagnosed and analyzed slow response time when querying proprietary databases. issues examined include vendor issues, proxy issues, library network hardware, and bandwidth and network traffic. “why is everything so slow?” this is the question that library systems departments often have the most trouble answering. it is also easy to dismiss because it is often the fault of factors beyond the control of library staff. what usually prompts these questions are the experiences of the reference librarians.
when these librarians are trying to help students at the reference desk, it is very frustrating when databases seem to respond to queries slowly, files take forever to load onto the computer screen, and all the while the line in front of the desk get continues to grow. or the library gets calls from students using databases and the catalog from their homes who complain that searching library resources takes too long, and that they are getting frustrated and using google instead. this question is so painful because libraries spend so much of their shrinking budgets on high quality information in the form of expensive proprietary databases, and it is all wasted if users have trouble using them. in this case the problem seemed to be how slow the process of searching for information and downloading documents from databases was. for lack of a better term, the auraria library called this the “response time” problem. this article will discuss the various ways the systems (technology) department of the auraria library, which serves the university of colorado–denver, metropolitan state college of denver, and the community college of denver, tried to identify problems and improve database response time. the systems department defined “response time” as the time it took for a person to send a query from a computer at home or in the library to a proprietary information database and receive a response back, or how long it took to load a selected fulltext article from a database. when a customer sets out to use a database in the library, the query to the database could be slowed down by many different factors. the first is the proxy, in our case innovative interfaces’ inc. web access management (iii wam), a product that authenticates the user via the iii api (application program interface) product. to do this the query travels over network hardware, switches, and wires to the iii server and back again. then the query goes to the database’s server, which may be almost anywhere in the world. hardware problems at the database vendor’s end can affect this transfer. in the case of auraria library this transfer can be influenced by traffic on the library’s network, the university’s network, and any other place in between. this could also be hampered by the amount of memory in the computer where the query originates, by the amount of tasks being performed by that computer, etc. the bandwidth of the network and its speed can also have an effect. basically, the bottlenecks needed to be found and fixed. bottlenecks are described by webopedia as “the delay in transmission of data through the circuits of a computer’s microprocessor or over a tcp/ip network. the delay typically occurs when a system’s bandwidth cannot support the amount of information being relayed at the speed it is being processed. there are, however, many factors that can create a bottleneck in a system.”1 literature review there is not a lot on database response slowness in library literature, probably because the issue overlaps with computer science and really is not one problem but a possibility of one of several problems. the issue is figuring out where the problem lies. gerhan and mutula examined technical reasons for network slowness, performing bandwidth testing at a library in botswana and one in the united states using the same computer, and giving several suggestions for testing, fixing technical problems, and issues to examine. 
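in practice this kind of bandwidth check reduces to timing the transfer of a payload of known size; a rough, illustrative sketch follows (the url is a placeholder, and this is not the meter or method used in the studies cited here):

# a rough bandwidth spot check for illustration only: time the download of a
# file of known size and report approximate throughput. the url is a placeholder.
import time
import urllib.request

TEST_URL = "https://example.org/test-file.bin"   # placeholder: any file of fixed, known size

start = time.perf_counter()
with urllib.request.urlopen(TEST_URL, timeout=30) as response:
    payload = response.read()
elapsed = time.perf_counter() - start

kilobits = len(payload) * 8 / 1000
print(f"downloaded {len(payload)} bytes in {elapsed:.2f} s, about {kilobits / elapsed:.0f} kbps")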
gerhan and mutula concluded that bandwidth and insufficient network infrastructure were the main culprits in their situation. they studied both bandwidth and bandwidth “squeeze.” looking for the bandwidth “squeeze” means looking along the internet’s “journey of many stages through routers and exchange points, each successively farther removed from the user.”2 bandwidth bottlenecks could occur at any one or more of those stages in the query’s transmission. the following four sections parse that lengthy pathway and examine how each may contribute to delays. badue et al. in their article “basic issues on the processing of web queries,” described web queries, load balancing, and how they function.3 bertot and mcclure’s “assessing sufficiency and quality of bandwidth for public libraries” is based on data collected as part of the 2006 public libraries and the internet study and provides a very straightforward approach for checking specific areas for problems.4 it outlines why basic data such as bandwidth readings may not give the complete picture. it also gives a nice outline of factors involved such as local settings and parameters, ultimate connectivity path, application resource needs, and protocol priority. azuma, okamoto, hasegawa, and masayuki’s “design, implementation and evaluation of resource management system for internet servers” was very helpful in understanding the role and function of proxy servers and problems they can present.5 vendor issues this is a very thorny topic because it is out of the library’s control, and also because the library has so many databases. the systems department asked the reference staff to send reports of problems listing the type of activity attempted, time and dates, the names of the database, the problem and any error messages encountered. a few that seemed to be the slowest were selected for special examination. one vendor worked extensively with the library and in the end it was believed that there were problems at their end in load balancing, which eventually seemed to be fixed. that company was in the middle of a merger and that may have also been an issue. we also noted that a database that uses very large image files, artstor, was hard to use because it was so slow. this company sent the library an application that simulated the databases’ use and was supposed to test to see if bandwidth at auraria library was sufficient for that database. according to the test, it was. databases that consistently were perceived as the slowest were those that had the largest documents and pictures, such as those that used primarily pdfs and visual material. this, with the results of the testing, pointed to a problem independent of vendor issues. bandwidth and network traffic the systems department decided to do bandwidth testing on the library’s public and staff computers after reading gerhan and mutula’s article about the university of botswana. the general perception is that bandwidth is often the primary problem in network slowness, as well as the problems with databases that use larger files.
margaret brown-sica (margaret.brown-sica@ucdenver.edu) is head of technology and distance education support, auraria library, serving the university of colorado–denver, metropolitan state college of denver, and the community college of denver.
several of the computers were tested in several successive days during what is usually the busiest time for the network, between noon and 2 p.m. the results were good, averaging about 3000 kilobytes per second (kbps). for this test we used the cnet bandwidth meter, which downloads an image to your computer, measures the time of the download, and compares it to the maximum speeds offered by other internet service providers.6 there are several bandwidth meters available on the internet. when the network administrator checked the switches for network traffic, they showed low traffic, almost always less than 20 percent of capacity. this was confusing: if the problem was neither with the bandwidth nor the vendors, what was causing the slow network performance? one of the university network administrators was consulted to see if any factor in their sphere could be having an effect on our network. we knew that the main university network had implemented a bandwidth shaper to regulate bandwidth. “these devices limit bandwidth . . . by greedy applications, guarantee minimum throughput for users, groups or protocols, and better utilize widearea connections by smoothing out bursty traffic.”7 it was thought that perhaps this might be incorrectly prioritizing some of the library’s traffic. this was a dead end, though—the network administrators had stopped using the device. if the bandwidth was good and the traffic was manageable, then the problem appeared to not be at the library. however, according to bertot and mcclure, the bandwidth question is complex because typically an arbitrary number describes the number of kbps used to define “broadband.” . . . such arbitrary definitions to describe bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term “high speed” for connections of 200kbps in at least one direction. there are three problematic issues with this definition: 1. it specifies unidirectional bandwidth, meaning that a 200kbps download, but a much slower upload (e.g., 56kbps) would fit this definition; 2. regardless of direction, bandwidth of 200kbps is neither high speed nor does it allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly. 3. the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high-use multiple-workstation public-access context.8 proxy issues auraria library uses the iii wam proxy server product. there were several things that pointed to the introducing zoomify image | smith 31playing tag in the dark: diagnosing slowness in library response time | brown-sica 31 proxy being an issue. one was that the systems department had been experimenting with invoking the proxy in the library building in order to collect more accurate statistics and found that complaints about speed seemed to have started around the same time as this experiment. but if the bandwidth was not showing inadequacy and the traffic was light, why was this happening? the answer is better explained by azuma et al.: needless to say, busy web servers must have many simultaneous http sessions, and server throughput is degraded when effective resource management is not considered, even with large network capacity. 
web proxy servers must also accommodate a large number of tcp connections, since they are usually prepared by isps (internet service providers) for their customers. furthermore, proxy servers must handle both upward tcp connections (from proxy server to web servers) and downward tcp connections (from client hosts to proxy server). hence, the proxy server becomes a likely spot for bottlenecks to occur during web document transfers, even when the bandwidth of the network and web server performance are adequate.9 testing was done from on campus and off campus, with and without using the proxy server. the results showed that the connection was faster without the proxy. when testing was done from the health sciences library at the university of colorado with the same type of server and proxy, the response time was much faster. the difference between auraria library and the other library is that the community auraria library serves (the community college of denver, metropolitan state college, and the university of colorado–denver) has a much larger user population who overwhelmingly use databases from home, therefore taxing the proxy server. the other library belonged to a smaller campus, but the hardware was the same. the proxy was immediately dropped for on-campus users, and that resulted in some responsetime improvements. a conference call was set up with the proxy vendor to determine if improvements in response time might be attained by changing from a proxy server to ldap (lightweight directory access protocol) authentication. the response given was that although there might be other benefits, increased response time was not one of them. library network hardware it was evident that the biggest bottleneck was the proxy, so the systems department decided to take a closer look at iii’s hardware. the switch that regulated traffic between the network and the server that houses our integrated library system, part of which is the proxy server, was discovered to have been set at “halfduplex.” half-duplex refers to the transmission of data in just one direction at a time. for example, a walkie-talkie is a half-duplex device because only one party can talk at a time. in contrast, a telephone is a full-duplex device because both parties can talk simultaneously. duplex modes often are used in reference to network data transmissions. some modems contain a switch that lets you select between halfduplex and full-duplex modes. the correct choice depends on which program you are using to transmit data through the modem.10 when this setting was changed to full duplex response time increased. there was also concern that this switch had not been functioning as well as it could. the switch was replaced, and this also improved response time. in addition, the old server purchased through iii was a generic server that had specifications based on the demands of the ils software and didn’t into consideration the amount of traffic going to the proxy server. auraria library, which serves a campus of more than thirty thousand full-time equivalent students, is a library with one of the largest commuter student populations in the country. a new server had been scheduled to be purchased in the near future, so a call was made to the ils vendor to talk about our hypothesis and requirements. the vendor agreed that the library should change the specification on the new server to make sure it served the library’s unique demands. 
a server will be purchased with increased memory and a second processor in the hope of keeping these problems from happening again in the next few years. also, the cabling between the switch and the server was changed to better facilitate heavy traffic. conclusion although it is sometimes a daunting task to try to discover where problems occur in the library’s database response time because there are so many contributing factors and because librarians often do not feel that they have enough technical knowledge to analyze such problems, there are certain things that can be examined and analyzed. it is important to look at how each library is unique and may be inadequately served by current bandwidth and hardware configurations. it is also important not to be intimidated by computer science literature and to trust patterns of reported problems. the auraria library systems department was fortunate to also be able to compare problems with colleagues at other libraries and test in those libraries, which revealed issues that were unique and therefore most likely due to a problem at the library end. it is important to keep learning about how your system functions and to try to diagnose the problem by slowly looking at one piece at a time. though no one ever seems to be completely satisfied with the speed of their network, the employees of auraria library, especially those who work with the public, have been pleased with the increased speed they are experiencing when using proprietary databases. having improved on the response-time issue, other problems that are not caused by the proxy hardware have been illuminated, such as browser configuration, which may be hampering certain databases—something that had been attributed to the network. references 1. webopedia, s.v. “bottleneck,” www.webopedia.com/term/b/bottleneck.html (accessed oct. 8, 2008). 2. david r. gerhan and stephen mutula, “bandwidth bottlenecks at the university of botswana,” library hi tech 23, no. 1 (2005): 102–17. 3. claudine badue et al., “basic issues on the processing of web queries,” sigir forum; 2005 proceedings (new york: association for computing machinery, 2005): 577–78. 4. john carlo bertot and charles r. mcclure, “assessing sufficiency and quality of bandwidth for public libraries,” information technology and libraries 26, no. 1 (mar. 2007): 14–22. 5. kazuhiro azuma, takuya okamoto, go hasegawa, and murata masayuki, “design, implementation and evaluation of resource management system for internet servers,” journal of high speed networks 14, no. 4 (2005): 301–16. 6. “cnet bandwidth meter,” http://reviews.cnet.com/internet-speed-test (accessed oct. 8, 2008). 7. michael j. demaria, “warding off wan gridlock,” network computing, nov. 15, 2002, www.networkcomputing.com/showitem.jhtml?docid=1324f3 (accessed oct. 8, 2008). 8. bertot and mcclure, “assessing sufficiency and quality of bandwidth for public libraries,” 14. 9. azuma, okamoto, hasegawa, and masayuki, “design, implementation and evaluation of resource management system for internet servers,” 302. 10. webopedia, s.v. “half-duplex,” www.webopedia.com/term/h/half_duplex.html (accessed oct. 8, 2008). editorial: computing in the “cloud”: silver lining or stormy weather ahead? marc truitt cloud computing. remote hosting. software as a service (saas). outsourcing.
terms that all describe various parts of the same it elephant these days. the sexy ones—cloud computing, for example—emphasize new age-y, “2.0” virtues of collaboration and sharing with perhaps slightly mystic overtones: exactly where and what is the “cloud,” after all? others, such as the more utilitarian “remote hosting” and “outsourcing,” appeal more to the bean counters and sustainabilityminded among us. but they’re really all about the same thing: the tradeoff between cost and control. that the issue increasingly resonates with it operations at all levels these days can be seen in various ways. i’ll cite just a few: n at the meeting of the lita heads of library technology (holt) interest group at the 2009 ala annual conference in chicago, two topics dominated the list of proposed holt programs for the 2010 annual conference. one of these was the question of virtualization technology, and the other was the whole white hat–black hat dichotomy of the cloud.1 practically everyone in the room seemed to be looking at—or wanting to know more about—the cloud and how it might be used to benefit institutions. n my institution is considering outsourcing e-mail. all of it—to google. times are tough, and we’re being told that by handing e-mail over to the googleplex, our hardware, licensing, evergreening, and technical support fees will total zero. zilch. with no advertising. heady stuff when your campus hosts thirty-plus central and departmental mail servers, at least as many blackberry servers, and total costs in people, hardware, licensing, and infrastructure are estimated to exceed can$1,000,000 annually. n in the last couple of days, library electronic discussion lists such as web4lib have been abuzz— or do we now say a-twitter?—about amazon’s orwellian kindle episode, in which the firm deleted copies of 1984 and animal farm from subscribers’ kindle e-book readers without their knowledge or consent.2 indeed, amazon’s action was in violation of its own terms of service, in which the company “grants [the kindle owner] the non-exclusive right to keep a permanent copy of the applicable digital content and to view, use, and display such digital content an unlimited number of times, solely on the device or as authorized by amazon as part of the service and solely for [the kindle owner ’s] personal, noncommercial use.”3 all of this has me thinking back to the late 1990s marketing slogan of a manufacturer of consumer-grade mass storage devices—remember removable hard drives? iomega launched its advertising campaign for the 1 gb jaz drive with the catch-line “because it’s your stuff.” ultimately, whether we park it locally or send it to the cloud, i think we need to remember that it is our stuff. what i fear is that in straitened times, it becomes easy to forget this as we struggle to balance limited staff, infrastructure, and budgets. we wonder how we’ll find the time and resources to do all the sexy and forward-looking things, burdened as we are with the demands of supporting legacy applications, “utility” services, and a huge and constantly growing pile of all kinds of content that must be stored, served up, backed up (and, we hope, not too often, restored), migrated, and preserved. the buzz over the cloud and all its variants thus has a certain siren-like quality about it. the notion of signing over to someone else’s care—for little or no apparent cost—our basic services and even our own content (our stuff) is very appealing. 
the song is all the more persuasive in a climate where we’ve moved from just the normal bad news of merely doing more with less to a situation where staff layoffs are no longer limited to corporate and public libraries, but indeed extend now to our greatest institutions.4 at the risk of sounding like a paranoid naysayer to what might seem a no-brainer proposition, i’d like to suggest a few test questions for evaluating whether, how, and when we send our stuff into the cloud: 1. why are we doing this? what do we hope to gain? 2. what will it cost us? bear in mind that nothing is free—except, in the open-source community, where free beer is, unlike kittens, free. if, for example, the borg offer to provide institutional mail without advertisements, there is surely a cost somewhere. the borg, sensibly enough, are not in business to provide us with pro bono services. 3. what is the gain or loss to our staff and patrons in terms of local customization options, functionality, access, etc.? 4. how much control do we have over the service offered or how our content is used, stored, repurposed, or made available to other parties? 5. what’s the exit strategy? what if we want to pick up and move elsewhere? can we reclaim all of our stuff easily and portably, leaving no sign that we’d ever sent it to the cloud? we are responsible for the services we provide and for the content with which we have been entrusted. we cannot shrug off this duty by simply consigning our services and our stuff to the cloud. to do so leaves us vulnerable to an irreparable loss of credibility with our users; eventually some among them would rightly ask, “so what is it that you folks do, anyway?” we’re responsible for it—whether it’s at home or in the cloud—because it’s our stuff. it is our stuff, right? marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. references and notes 1. i should confess, in the interest of full disclosure, that it was eli neiburger of the ann arbor district library who suggested “hosted services as savior or slippery slope” for next year’s holt program. i’ve shamelessly filched eli’s topic, if not his catchy title, for this column. thanks, eli. also, again in the interest of full disclosure, i suggested the virtualization topic, which eventually won the support of the group. finally, some participants in the discussion observed that virtualization technology and hosting are in many ways two sides of the same topical coin, but i’ll leave that for others to debate. 2. brad stone, “amazon erases orwell books from kindle,” new york times, july 17, 2009, http://www.nytimes.com/2009/07/18/technology/companies/18amazon.html?_r=1 (accessed july 21, 2009). 3. amazon.com, “amazon kindle: license agreement and terms of use,” http://www.amazon.com/gp/help/customer/display.html?nodeid=200144530 (accessed july 21, 2009). 4. “budget cutbacks announced in libraries, center for professional development,” stanford university news, june 10, 2009, http://news.stanford.edu/news/2009/june17/layoffs-061709.html (accessed july 22, 2009); “harvard libraries cuts jobs, hours,” harvard crimson (online edition), june 26, 2009, http://www.thecrimson.com/article.aspx?ref=528524 (accessed july 22, 2009). president’s message cindi trainor (cindiann@gmail.com), lita president 2013-14 and community specialist & trainer for springshare, llc. hi, litans!
forum 2013 i'm excited that 2014 is almost here. last month saw a very successful forum in louisville, in my home state of kentucky. there were 243 people in attendance, and about half of those were first-time attendees. it's also typical of our yearly conference that there are a large number of attendees from the surrounding area; this is one of the reasons that it travels around the country. louisville's forum was the last of a few in the "middle" of the country--these included st. louis, atlanta, and columbus. next year, forum will move back out west, to albuquerque, nm. the theme for next year's conference will be "transformation: from node to network." see the lita blog (http://litablog.org/2013/11/call-for-proposals-2014-lita-forum/) for the call for proposals for concurrent sessions, poster sessions, and pre-conference workshops. goals of the organization at the board meeting in the fall, we took a stab at updating lita's major goal areas. the strategic plan had not been updated since 2010, so we felt it was time to update the goal areas, at least for the short term. the goals that we agreed upon will carry us through annual conference 2015 and will give us time to mount a more complete planning process in the meantime. they are: • collaboration & networking: foster collaboration and encourage networking among our members and beyond so the full potential of technologies in libraries can be realized. • education & sharing of expertise: offer education, publications, and events to inspire and enable members to improve technology integration within their libraries. • advocacy: advocate for meaningful legislation, policies, and standards that positively impact the current and future capabilities of libraries and that promote equitable access to information and technology. • infrastructure: improve lita’s organizational capacity to serve, educate, and create community for its members. midwinter activities in other governance news, the board will have an online meeting in january 2014, prior to the midwinter conference. our one-hour meeting will be spent asking and answering questions of those who typically submit written reports for board meetings: the vice-president, the president, and the executive director. as always, look to ala connect for these documents, which are posted publicly. we welcome your comments, as well as your attendance at any of our open meetings. our midwinter meeting schedule is: • the week of january 13: online meeting, time and date tba • saturday, january 25, 1:30–4:30 p.m., pcc 107a • monday, january 27, 1:30–4:30 p.m., pcc 115a as always, midwinter will also hold a lita happy hour (sunday, 6-8 pm, location tba), the top tech trends panel (sunday, 10:30 a.m., pcc 204a), and our annual membership meeting, the lita town meeting (monday 8:30 a.m., pcc 120c). we look forward to seeing you, in philadelphia or virtually. make sure to check the midwinter scheduler (http://alamw14.ala.org/scheduler) for all the details, including the forthcoming happy hour location. it's the best party^h^h^h^h^h networking event at midwinter! i would be remiss if i did not mention lita's committees and igs and their midwinter meetings. many will be meeting saturday morning at 10:30 a.m. (pcc 113abc)--so you can table-hop if you like.
expressing interest at midwinter is a great way to get involved. can't make it to philadelphia? no problem! fill out the online form to volunteer for a committee, or check out the connect groups of our interest groups. some of the igs meet virtually before midwinter; some committees and igs also invite virtual participation at midwinter itself. join us! http://alamw14.ala.org/scheduler 4 information technology and libraries | march 2005 the challenges encountered in building the international children’s digital library (icdl), a freely available online library of children’s literature are described. these challenges include selecting and processing books from different countries, handling and presenting multiple languages simultaneously, and addressing cultural differences. unlike other digital libraries that present content from one or a few languages and cultures, and focus on either adult or child audiences, icdl must serve a multilingual, multicultural, multigenerational audience. the research is presented as a case study for addressing these design criteria; current solutions and plans for future work are described. t he internet is a multilingual, multicultural, multigenerational environment. while once the domain of english-speaking, western, adult males, the demographics of the internet have changed remarkably over the last decade. as of march 2004, english was the native language of only 35 percent of the total world online population. as of march 2004, asia, europe, and north america each make up roughly 30 percent of internet usage worldwide.1 in the united states, women and men now use the internet in approximately equal numbers, and children and teenagers use the internet more than any other age group.2 creators of online digital libraries have recognized the benefit of making their content available to users around the world, not only for the obvious benefits of broader dissemination of information and cultural awareness, but also as tools for empowerment and strengthening community.3 creating digital libraries for children has also become a popular research topic as more children access the internet.4 the international children’s digital library (icdl) project seeks to combine these areas of research to address the needs of both international and intergenerational users.5 ■ background and related work creating international software is a complex process involving two steps: internationalization, where the core functionality of the software is separated from localized interface details, and localization, where the interface is customized for a particular audience.6 the localization step is not simply a matter of language translation, but involves technical, national, and cultural aspects of the software.7 technical details such as different operating systems, fonts, and file formats must be accommodated. national differences in language, punctuation, number formats, and text direction must be handled properly. finally, and perhaps most challenging, cultural differences must be addressed. hofstede defines culture as “the collective mental programming of the mind which distinguishes the members of one group or category of people from another.”8 these groups might be defined by national, regional, ethnic, religious, gender, generation, social class, or occupation differences. by age ten, most children have learned the value system of their culture, and it is very difficult to change. hofstede breaks culture into four components: values, rituals, heroes, and symbols. 
these components manifest themselves everywhere in software interfaces, from acceptable iconic representations of people, animals, and religious symbols to suitable colors, phrases, jokes, and scientific theories.9 however, as hoft notes, culture is like an iceberg: only 10 percent of the characteristics of a culture are visible on the surface.10 the rest are subjective, unspoken, and unconscious. it is only by evaluating an interface with users from the target culture that designers can understand if their software is acceptable.11 developers of online digital libraries have had to contend with international audiences for many years, and the marc and oclc systems have reflected this concern by including capabilities for transliteration and diacritical characters (accents) in various languages.12 however, it is only more recently, with the development of international character-set standards and web browsers that recognize these standards, that truly international digital libraries have emerged. the international children’s digital library: a case study in designing for a multilingual, multicultural, multigenerational audience hilary browne hutchinson, anne rose, benjamin b. bederson, ann carlson weeks, and allison druin hilary browne hutchinson (hilary@cs.umd.edu) is a faculty research assistant in the institute for advanced computer studies and a ph.d. student in the department of computer science. anne rose (rose@cs.umd.edu) is a faculty research assistant in the institute for advanced computer studies. benjamin b. bederson (bederson@cs.umd.edu) is an associate professor in the department of computer science and the institute for advanced computer studies and director of the human-computer interaction laboratory. ann carlson weeks (acweeks@umd.edu) is professor of the practice in the college of information studies. allison druin (allisond@umiacs.umd.edu) is an assistant professor in the college of information studies and the institute for advanced computer studies. all authors are affiliated with the university of maryland-college park and the human-computer interaction laboratory. greenstone, an open-source software project based in new zealand, allows people to create online digital libraries in their native language and culture.13 oclc recently completed a redesign of firstsearch, a web-based bibliographic and full-text retrieval service, to accommodate users with different software, languages, and disabilities.14 researchers at virginia tech redesigned citidel, an online collection of computer-science technical reports, to create an online community that allows users to translate their interface into different languages.15 researchers have also realized that beyond accessibility, digital libraries have enormous potential for empowerment and building community, especially in developing countries. witten et al. and downie describe the importance of community involvement when creating a digital library for a particular culture, both to empower users and to make sure the culture is accurately reflected.16 even more than accurately reflecting a culture, a digital library also needs to be understood by the culture. duncker notes that a digital-library interface metaphor based on a traditional physical library was incomprehensible to the maori culture in new zealand, who are not familiar with the conventions of western libraries.17 in addition to international libraries, a number of researchers have focused on creating digital libraries for children.
recognizing that children have difficulty with spelling, reading, and typing, as well as traditional categorization methods such as the dewey decimal system, a number of researchers have created more child-friendly digital libraries.18 pejtersen created the bookhouse interface with a metaphor of rooms in a house to support different types of searching.19 külper et al. designed the bücherschatz interface for children who are eight to ten years old using a treasure-hunt metaphor.20 druin et al. designed the querykids interface for young children to find information about animals.21 theng et al. used the greenstone software to create an environment for older children to write and share stories.22 the icdl project seeks to build on and combine research in both international and children’s digital libraries. as a result, icdl is more ambitious than other digital library projects in a number of respects. first, it is designed for a broader audience. while the digital libraries already described target one or a few cultures or languages, icdl’s audience includes potentially every culture and language in the world. second, the content is not localized. part of the library’s goal is to expose users to books from different cultures, so it would be counterproductive to present books only in a user’s native language. as a result, the interface not only supports multiple languages and cultures, but it also supports them simultaneously, frequently on the same screen. third, icdl’s audience not only includes a broad group of adults from around the world, but also children from three to thirteen years of age. to address these challenges, a multidisciplinary, multilingual, multicultural, and multigenerational team was created, and the development was divided into several stages. in the first stage, completed in november 2002, a java-based, english-only version of the library was created that addressed the searching and reading needs of children. in the second stage, completed in may 2003, an html version of the software was developed that addressed the needs of users with minimal technology. in the third stage, completed in may 2004, the metadata for the books in the library were translated into their native languages, allowing users to view these metadata in the language of their choice. the final stage, currently in progress, involves translating the interface to different languages and adjusting some of the visual design of the interface according to the cultural norms of the associated language being presented. in this paper, the research is presented as a case study, describing the solutions implemented to address some of these challenges and plans for addressing ongoing ones. ■ icdl project description the icdl project was initiated in 2002 by the university of maryland and the internet archive with funding from the national science foundation (nsf) and the institute for museum and library services (imls). today, the projects continues at the university of maryland. the goals of the project include: ■ creating a collection of ten thousand children’s books in one hundred languages; ■ collaborating with children as design partners to develop new interfaces for searching, browsing, reading, and sharing books in the library; and ■ evaluating the impact of access to multicultural materials on children, schools, and libraries. the project has two main audiences: children three to thirteen years of age and the adults who work with them, as well as international scholars who study children’s literature. 
the project draws together a multidisciplinary team of researchers from computer science, library science, education, and art backgrounds. the research team is also multigenerational—team members include children seven to eleven years of age, who work with the adult members of the team twice a week during the school year and for two weeks during the summer to help design and evaluate software. using the methods of cooperative inquiry, including brainstorming, lowtech prototyping, and observational note taking, the team has researched, designed, and built the library’s category structure, collection goals, and searching and reading interfaces.23 the international children’s digital library | hutchinson, rose, bederson, weeks, and druin 5 6 information technology and libraries | march 2005 the research team is also multilingual and multicultural. adult team members are native or fluent speakers of a number of languages besides english, and are working with school children and their teachers and librarians in the united states, new zealand, honduras, and germany to study how different cultures use both physical and digital libraries. the team is also working with children and their teachers in the united states, hungary, and argentina to understand how children who speak different languages can communicate and learn about each other’s cultures through sharing books. finally, an advisory board of librarians from around the world advises the team on curatorial and cultural issues, and numerous volunteers translate book and web-site information. ■ icdl interface description icdl has four search tools for accessing the current collection of approximately five hundred books in thirty languages: simple, advanced, location, and keyword. all are implemented with java servlet technology, use only html and javascript on the client side, and can run on a 56k modem. these interfaces were created during the first two development phases. the team visited physical libraries to observe children looking for books, developed a category hierarchy of kid-friendly terms based on these findings, and designed different tools for reading books.24 using the simple interface (figure 1), users can search for books using colorful buttons representing the most popular search categories. the advanced interface (figure 2), allows users to search for books in a compact, text-link-based interface that contains the entire librarycategory hierarchy. by selecting the location interface (figure 3), users can search for books by spinning a globe to select a continent. finally, with the keyword interface, users search for books by typing in a keyword. younger children seem to prefer the simplicity and fun of the location interface, while older children enjoy browsing the kid-friendly categories, such as colors, feelings, and shapes.25 all of these methods search the library for books with matching metadata. users can then read the book using a variety of book readers, including standard html pages and more elaborate java-based tools developed by the icdl team that present book pages in comic or spiral layouts (figures 4–6). in addition to the public interface, icdl also includes a private web site that was developed for book contributors to enter bibliographic metadata about the books they provide to the library (figures 7 and 8). 
using the metadata interface, contributors can enter information about their books in the native language of the book, and optionally translate or transliterate this information into english or latin-based characters. the design of icdl is driven by its audience, which includes users, contributors, and volunteers of all ages from around the world—more than six hundred thousand unique visitors from more than two hundred countries (at last count). as a result, books written in many different languages for users of different ages and cultural backgrounds must be collected, processed, stored, and presented. the rest of this paper will describe some of the challenges encountered and that are still being encountered in the development process, including selecting and processing a more diverse collection of books, handling different character sets and fonts, and addressing differences in cultural, religious, social, and political interpretation. figure 1. icdl simple interface. figure 2. icdl advanced interface. ■ book selection and processing the first challenge in the icdl project is obtaining and managing content. collecting books from around the world is a challenge because national libraries, publishers, and creators (authors and illustrators) all have different rules regarding copyrights. the goal is to identify and obtain award-winning children’s books from around the world, for example, books on the white ravens list, which are also made available to icdl users (www.icdlbooks.org/servlet/whiteravens).26 however, unsolicited books are received, frequently in languages the team cannot read. as a result, members of the advisory board and various children’s literature organizations in different countries are relied on to review these books. these groups help determine whether books are relevant and acceptable in the culture they are from, and whether they are appropriate for the three-to-thirteen age group. these groups are eager to help; including them in the process is an effective way to build the project and the community surrounding it. in addition to collecting and scanning books, bibliographical metadata in the native language of the book (title, creator[s], publisher, abstract) are also collected via the web-based metadata form filled out by the book contributors. it was decided to base the icdl metadata specification on the dublin core because of its international background, ability to be understood by nonspecialists, and the possibilities to extend its basic elements to meet icdl’s specific needs (see www.icdlbooks.org/metadata/specification for more details).27 contributors who provide metadata have the option of translating them to english; they also can transliterate them to latin characters, if necessary. regardless of what language or languages they provide, they are asked to provide information that they create themselves, such as the abstract, in a format that is easily understandable by children. figure 3. icdl location interface. figure 4. icdl standard book reader. figure 5. icdl comic book reader. figure 6. icdl spiral book reader. simple, short sentences make the information easy for children to read, and easier to translate to other languages. the metadata provided allow the team to catalog the books for browsing according to the various categories and to index the books for keyword searching.
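to make the shape of these records concrete, the python sketch below shows what a dublin core-based record of the kind just described might hold, using the where's the bear? example discussed later in this article; the icdl:-prefixed extension fields and their names are assumptions for illustration, not the project's actual element names.

# a sketch of a dublin core-based record with optional translated fields; the
# icdl:-prefixed extensions are hypothetical illustrations only.
record = {
    "dc:title": "where's the bear?",
    "dc:creator": "j. harris",
    "dc:publisher": "the j. paul getty museum",
    "dc:date": "1997",
    "dc:language": "eng",
    "dc:description": "a short abstract written in simple sentences for children.",
    "icdl:title_translated": "",        # filled in by volunteers when the original is not english
    "icdl:description_translated": "",  # likewise optional, as is a transliterated creator name
}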
even though translation to english is optional, the english-speaking metadata team needs the metadata in english in order to catalog the books. since many contributors do not have the time or ability to provide all of this information, volunteers who speak different languages are relied on to check the metadata that get submitted, and translate or transliterate them as necessary. this method allows information to be collected from contributors without overwhelming them, and also helps build and maintain the volunteer community. figure 7. icdl metadata interface with spanish metadata. figure 8. icdl metadata interface with japanese metadata. ■ handling different character sets the metadata form allows contributors to provide information from the comfort of an operating system and keyboard in their native language, but this flexibility requires software that can handle many different character sets. for example, english uses a latin character set; russian uses a cyrillic character set; and an arabic character set is used for persian/farsi. fortunately, there exists a single character set called unicode, an international, cross-platform standard that contains a unique encoding for nearly every character in every language.28 unfortunately, not all software supports unicode as yet. in the first stage of implementation in icdl, metadata information was collected only in english, so unicode compliance was not a problem. however, in the next phase of development, which included collecting and presenting metadata in the native language of all of the books, the software had to be adjusted to use unicode because icdl supports potentially every language in the world. the open-source mysql database, recently upgraded to allow storage of unicode data, was already in use for storing metadata. icdl’s web applications run on apache http and tomcat web servers, both of which are freely available and unicode-compliant. however, both the web site and the database had to be internationalized and localized to separate the template for metadata presentation from the content in different languages. a unicode-compliant database driver was necessary for passing information between the database and the web site. both the public and metadata web-site applications are written using freely available java servlet technology. the java language is unicode-compliant, but some adjustments had to be made to icdl’s servlet code to force it to handle data using unicode. to allow users to conduct keyword searches for books in the public interface, apache’s freely available lucene search engine is used to create indices of book metadata, which can then be searched. lucene is unicode-compliant, but a separate index for each language had to be created, requiring users to select a search language. this requirement was necessary for two reasons: (1) to avoid confusion over the same words with different meanings (bra means good in swedish); and (2) different languages have different rules for stopwords to ignore (the, of, a in english), truncation of similar words (cats has the same root as cat in english), and separation of characters (chinese does not put white space between symbols). lucene has text analyzers for a variety of languages that support these different conventions. for languages that lucene does not support, icdl volunteers translated english stopwords, and simple text analyzers were created by the team.
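to illustrate the kind of simple per-language analysis just described, the following python sketch applies a language-specific stopword list before adding tokens to a per-language index. it is a plain-python stand-in for what lucene's analyzers do, not icdl code; the tiny stopword lists and function names are illustrative assumptions.

from collections import defaultdict

# illustrative stopword lists only; real lists would be supplied per language by volunteers
STOPWORDS = {
    "eng": {"the", "of", "a"},
    "swe": {"och", "en", "att"},
}

def analyze(text, language):
    """lowercase, split on whitespace, and drop that language's stopwords."""
    stop = STOPWORDS.get(language, set())
    return [t for t in text.lower().split() if t not in stop]

# one index per language, so that e.g. swedish "bra" never collides with the
# english word spelled the same way
indexes = defaultdict(lambda: defaultdict(set))

def add_to_index(language, book_id, text):
    for token in analyze(text, language):
        indexes[language][token].add(book_id)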
finally, html headers created by the java servlets had to be modified to indicate that the content being delivered to users was in unicode. most current browsers and operating systems recognize and handle web pages properly delivered in unicode. for those that do not, help pages were created that explain how to configure common browsers to use unicode, and how to upgrade older browsers that do not support unicode. by making the icdl systems fully unicode-compliant, contributors from all over the world can enter metadata about books in an easily accessible html form using their native languages, and the characters are properly transmitted and stored in the icdl database. volunteers can then use the same form to translate or transliterate the metadata as necessary. finally, this information can be presented to our users when they look at books. for example the book where’s the bear? (harris, 1997) is written in six different languages.29 the original metadata came in english, but icdl volunteers translated them to italian, japanese, french, spanish, and german. users looking at the preview page for this book in the library have the opportunity to change the display language of the book to any one of these languages using a pull-down menu (figures 9 and 10). currently, only the book metadata language can be changed, but in the next stage of development, all of the surrounding interface text (navigation, labels) will be translated to different languages as well. the plan for doing this is to take a similar approach to the citidel and greenstone projects by creating a web site where volunteers can translate words and phrases from the icdl interface into their native language.30 like the creators of citidel, the team believes that machine-based translation would not provide good enough results. unfortunately, the resources do not exist for the team to do the translating themselves. encouraging volunteers to translate the site will help enlarge and enrich the icdl community. for languages that do not receive volunteer translation, translation services are an affordable alternative. ■ character-set complications several issues have arisen as a result of collecting multilingual metadata in many character sets. first, different countries use different formats for dates and times, so contributors are allowed to specify the calendar used when they enter date information (muslim or julian). second, not only do different countries use different formats for numbers, the numbers themselves are also different. for example, the arabic numbers for 1, 2, 3 are even though java is unicode-compliant, it treats numbers as latin characters, necessitating the storing of latin versions of any non-latin numbers used internally by the software for calculations, such as bookpage count. a third issue is that some of the metadata, such as author and illustrator names, need to be transliterated so their values can be displayed when the metadata are shown in a latin-based language. ideally, the transliteration standards used for a language need to be consistent so that the same values are always transliterated the same way. unfortunately, the team has found no practical way to enforce this, except to state the standard to be used in icdl metadata specification. when different standards are used, it makes comparison of equal items much more difficult. for example, the same persian/farsi creator has been figure 10. where’s the bear? in japanese figure 9. where’s the bear? 
in english the international children’s digital library | hutchinson, rose, bederson, weeks, and druin 9 10 information technology and libraries | march 2005 transliterated as both “hormoz riyaahi” and “hormoz riahi.” it cannot be assumed that a person is the same just because the name is the same (john smith), and when a name is in a character set that the team cannot understand, this problem becomes more challenging. finally, there was the question of how to handle differences in character-set length and direction in the interface. different languages use different numbers of characters to present the same text. icdl screens had to be designed in such a way that the metadata in languages with longer or shorter representations than the english version would still fit. the team anticipates having to make additional interface changes to accommodate longer labels and navigational aids when the remainder of the interface is translated. the fact also had to be considered that, while most languages are read left to right, a few (arabic and hebrew) are read right to left. as a result, screens were designed so that book metadata were reasonably presented in either direction. currently, only the text is displayed right to left, but eventually the goal is to mirror the entire interface to be oriented right to left when content is shown in right-to-left languages. for the problem of how to handle the arrows for turning pages in right-to-left languages—since these arrows could be interpreted as either “previous” and “next” or “left” and “right”—“previous” and “next” were chosen for consistency, so they work the same way in leftto-right books and right-to-left books. ■ font complications while most current browsers and operating systems recognize unicode characters, whether or not the characters are displayed properly depends on whether users have appropriate fonts installed on their computers. for instance, a user looking at where’s the bear? and choosing to display the metadata in japanese will see the japanese metadata only if the computer has a font installed that includes japanese characters. otherwise, depending on the browser and operating system, he may see question marks, square boxes, or nothing at all instead of the japanese characters. the good news is that many users will never face this problem. the interface for icdl is presented in english (until it is translated to other languages). since most operating systems come with fonts that can display english characters, the team has metadata in english (always presented first by default) for nearly all the books. users who choose to display book metadata in another language are likely to do so because they actually can read that language, and therefore are likely to have fonts installed for displaying that language. furthermore, many commonly used software packages, such as microsoft office, come with fonts for many languages. as a result, many users will have fonts installed for more languages than just those required for the native language of their operating system. of course, fonts will still be a problem for other users, such as those with new computers that have not yet been configured with different fonts or those using a public machine at a library. these users will need to install fonts so they can view book metadata, and eventually the entire interface, in other languages. to assist these users, help pages have been created to assist users with the process of installing a font on various operating systems. 
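returning to the numeral issue noted under character-set complications above, the conversion of non-latin decimal digits to latin form for internal arithmetic (for example, a book-page count entered in arabic-indic numerals) can be sketched as follows in python. this illustrates the general approach only and is not the adjustment actually made in icdl's java servlets.

import unicodedata

def to_latin_digits(text):
    """replace every decimal-digit character with its latin (ascii) equivalent."""
    out = []
    for ch in text:
        value = unicodedata.digit(ch, None)   # None for non-digit characters
        out.append(str(value) if value is not None else ch)
    return "".join(out)

# arabic-indic digits for 1, 2, 3 become "123"
assert to_latin_digits("\u0661\u0662\u0663") == "123"
# a page count stored in native digits can then be used in calculations
page_count = int(to_latin_digits("\u0662\u0664\u0664"))   # -> 244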
■ issues of interpretation while technical issues have been a major challenge for icdl, a number of nontechnical issues relating to interpretation have also been encountered. first, until the interface has been translated into different languages, visual icons are crucial for communicating information to young children who cannot read, and to users who do not speak english. however, certain pictorial representations may not be understood by all cultures, or worse, may offend some cultures. for example, one icon showing a boy sticking out his tongue had to be redesigned when it was learned this was offensive in the chinese culture. the team has also redesigned other icons, such as those using stars as the rating system for popular books. the original icons used five-sided stars, which are religiously significant, so they were changed to more neutral sevenor eight-sided stars. as the team continues to internationalize the interface, there will likely be a need to change other icons that are difficult to represent in a culturally neutral way when the interface is displayed in different languages. for instance, it is a real challenge to create icons for categories such as mythology or super heroes, since the symbols and stories for these concepts differ by culture. icons for such categories as funny, happy, and sad are also complicated because certain common american facial and hand representations have different, sometimes offensive, meanings in different cultures. what is considered funny in one culture (a clown) may not be understood well by another culture. different versions of such icons may have to be created, depending on the language and cultural preferences of users. the team relies on its multicultural members, volunteers, and advisory board to highlight these concerns. religious, social, and political problems of interpretation have also been encountered. icdl’s collection develops unevenly as relationships are built with various publishers and libraries. as a result, there are currently many arabic books and only a few hebrew books; this has generated multiple e-mails from users concerned that icdl is taking a political stance on the arab-israeli conflict. to address this concern, the team is currently working to develop a more balanced collection. many books published in hong kong are received from contributors in either hong kong or china who want their own country to be credited with publication. to address this concern, it was decided to credit the publication country as “hong kong/china” to avoid offending either party. finally, some books have been received with potentially objectionable content. some of these are historical books involving presentation of content that is now considered derogatory. some include subject matter that may be deemed appropriate by some cultures but not by others. some include information that may be too sophisticated for children three to thirteen years of age in any culture. while careful not to include books that are inappropriate for children in this age group, the team does not want to censor books whose content is subjectively offensive. instead, such contributors are consulted to make sure they were aware of icdl collection-development guidelines. if they believe that a book is historically or culturally appropriate, the book is included. 
a statement is also provided at the bottom of all the book pages indicating that the books in the library come from diverse cultures and historical periods and may not be appropriate for all users of the library. ■ conclusions and lessons learned designing a digital library for an international, intergenerational audience is a challenging process, but it is hugely rewarding. the team is continually amazed with feedback from users all over the world expressing thanks that books are made available from their countries, from teachers who use the library as a resource for lesson planning, from parents who have discovered a new way to read with their children, and from children who are thrilled to discover new favorite books that they cannot get in their local library. thus, the first recommendation the team can make based on experience is that creating international digital-library resources for children is a rich and rewarding area of research that others should continue to explore. a second important lesson learned is that an international, intergenerational team is an absolute necessity. simply having users and testers from other countries is not enough; their input is valuable, but it comes too late in the design process to influence major design changes. team members from different cultural backgrounds offer perspectives that an american-only team simply would not think to consider. similarly, team members who are children understand how children like to look for and read books, and what interface tools are difficult or easy, and fun or not fun. enthusiastic advisors and volunteers are also a crucial resource. the icdl team does not have the time, money, or resources to address all of the issues that surface, and advisors and volunteers are key resources in the development process. bringing together as diverse a team as possible is highly recommended. the goals of educational enrichment and international understanding in an international library make it an attractive resource for people to want to help, so assembling such a team is not as difficult as it sounds. beyond the human resources, the technical resources involved in making icdl an international environment necessitate the examination and adjustment of software and interfaces at every level. unlike many digital libraries that only focus on one or a few languages, icdl must be simultaneously multilingual, multicultural, and multigenerational. as a result, a third lesson is that freely available and open-source technologies are now available for making the necessary infrastructure meet these criteria. with varying degrees of complexity, the team was able to get all the pieces to work together properly. the more difficult challenge, unfortunately, falls on icdl’s users, who may need to install new fonts to view metadata in different languages. however, as computer and browser technologies advance to reflect more global applications, this problem is expected to lessen and eventually disappear. having technical staff capable of searching for and integrating open-source tools with international support to handle these technical issues is highly recommended, as well as usability staff versed in the nuances of different operating systems and browsers. finally, the more subjective issue of cultural interpretation has proven to be the most interesting challenge. 
it is one that will likely not disappear as icdl’s collection grows and the next stage of development is embarked on for translating the interface to support other languages and cultures. the fourth lesson learned is that culture pervades every aspect of both the visual design and the content of the interface, and that it is necessary to examine one’s own biased cultural assumptions to ensure respect of others. however, with the enthusiasm that continues to be seen in the icdl team members, advisors, volunteers, and users, future design challenges will be able to be addressed with their help. the final recommendation is to actively seek feedback from team members, volunteers, and users from different backgrounds about the cultural appropriateness of all aspects of your software. it may not be possible to address all cultures in your audience right away, but it is important to have a framework in place so that these issues are addressed eventually. the international children’s digital library | hutchinson, rose, bederson, weeks, and druin 11 12 information technology and libraries | march 2005 ■ acknowledgments icdl is a large project with many people who make it the wonderful resource that it has become. we thank them all for their continued hard work, as well as our many volunteers and our generous contributors. we would especially like to thank nsf for our information technology research grant, and imls for our national leadership grant. without this generous funding, our research would not be possible. references 1. internet world stats. accessed mar. 9, 2005, www.internet worldstats.com 2. national telecommunications and information administration (2004). “a nation online: entering the broadband age.” accessed mar. 9, 2005, www.ntia.doc.gov/reports/anol/index. html. 3. i. witten et al., “the promise of digital libraries in developing countries,” communications of the acm 44, no. 5 (2001): 82–85; j. downie, (2003). “realization of four important principles in cross-cultural digital library development,” workshop paper for jcdl 2003. accessed dec. 16, 2004, http://music -ir.org/~jdownie/jcdl03_workshop_downie_dun.pdf. 4. p. busey and t. doerr, “kid’s catalog: an information retrieval system for children,” youth services in libraries 7, no. 1 (1993): 77–84; u. külper, u. schulz, and g. will, “bücherschatz—a prototype of a children’s opac,” information services and use no. 17 (1997): 201–14; a. druin et al., “designing a digital library for young children: an intergenerational partnership,” in proceedings of the acm/ieee-cs joint conference on digital libraries (new york: association for computing machinery, 2001), 398–405. 5. a. druin, “what children can teach us: developing digital libraries for children with children,” library quarterly (in press). accessed dec. 16, 2004, www.icdlbooks.org. 6. a. marcus, “global and intercultural user-interface design,” in j. jacko and a. sears, eds., the human-computer interaction handbook (mahwah, n.j.: lawrence erlbaum assoc., 2002), 441–63. 7. t. fernandes, global interface design (boston: ap professional, 1995). 8. g. hofstede, cultures and organizations: software of the mind (new york: mcgraw-hill, 1991). 9. fernandes, global interface design. 10. n. hoft, “developing a cultural model,” in e. del galdo and j. nielsen, eds., international user interfaces (new york: wiley, 1996), 41–73. 11. j. nielsen, “international usability engineering,” in e. del galdo, and j. nielsen, eds., international user interfaces (new york: wiley, 1996), 1–13. 12. c. 
borgman, “multimedia, multicultural, and multilingual digital libraries: or, how do we exchange data in 400 languages?” d-lib magazine 3 (june 1997). 13. i. witten et al., “greenstone: a comprehensive opensource digital library software system,” in proceedings of digital libraries 2000 (new york: association for computing machinery, 2000), 113–21. 14. g. perlman, “the firstsearch user interface architecture: universal access for any user, in many languages, on any platform,” in cuu 2000 conference proceedings (new york: association for computing machinery, 2000), 1–8. 15. s. perugini et al., “enhancing usability in citidel: multimodal, multilingual, and interactive visualization interfaces,” in proceedings of jcdl ‘04 (new york: association for computing machinery, 2004), 315–24. 16. witten et al., “the promise of digital libraries in developing countries”; downie, “four important principles.” 17. e. duncker, “cross-cultural usability of the library metaphor,” in proceedings of jcdl ‘02 (new york: association for computing machinery, 2002), 223–30. 18. p. moore, and a. st. george, “children as information seekers: the cognitive demands of books and library systems,” school library media quarterly 19 (1991): 161–68; p. solomon, “children’s information retrieval behavior: a case analysis of an opac,” journal of the american society for information science and technology 44, no. 5 (1993): 245–64; busey and doerr, “kid’s catalog,” 77–84. 19. a. pejtersen, “a library system for information retrieval based on a cognitive task analysis and supported by an iconbased interface,” acm conference on information retrieval (new york: association for computing machinery, 1989), 40–47. 20. külper et al., “bücherschatz—a prototype of a children’s opac,” 201–14. 21. druin et al., “designing a digital library,” 398–405. 22. y. theng et al., “dynamic digital libraries for children,” in proceedings of the joint conference on digital libraries (new york: association for computing machinery, 2001), 406–15. 23. a. druin, “cooperative inquiry: developing new technologies for children with children,” in proceedings of human factors in computing (new york: association for computing machinery, 1999), 592–99. 24. j. hourcade et al., “the international children’s digital library: viewing digital books online,” interacting with computers 15 (2003): 151–67. 25. k. reuter and a. druin, “bringing together children and books: an initial descriptive study of children’s book searching and selection behavior in a digital library,” in proceedings of american society for information science and technology conference (in press). 26. international youth library, the white ravens 2004. available for purchase at. www.ijb.de/index2.html (accessed dec. 16, 2004). 27. dublin core metadata initiative. accessed dec. 16, 2004, www.dublincore.org. 28. unicode consortium (2004). accessed dec. 16, 2004, www.unicode.org. 29. j. harris, where’s the bear? (los angeles: the j. paul getty museum, 1997). 30. perugini et al., “enhancing usability in citidel,” 315–24. lib-mocs-kmc364-20140106083930 198 an algorithm for compaction of alphanumeric data william d. schieber, george w. thomas: central library and documentation branch, international labour office, geneva, switzerland description of a technique for compressing data to be placed in computer auxiliary storage. the technique operates on the principle of taking two alphabetic characters frequently used in combination and replacing them with one unused special character code. 
such une-for-two replacement has enabled the ilo to achieve a rate of compression of 43.5% on a data base of approximately 40,000 bibliographic records. introduction this paper describes a technique for compacting alphanumeric data of the type found in bibliographic records. the file used for experimentation is that of the central library and documentation branch of the international labour office, geneva, where approximately 40,000 bibliographic records are maintained on line for searches done by the library for its clients. work on the project was initiated in response to economic pressure to conserve direct-access storage space taken by this particularly large file. in studying the problem of how to effect compaction, several alternatives were considered. the first was a recursive bit-pattern recognition technique of the type developed by demaine ( 1,2), which operates mdependently of the data to be compressed. this approach was rejected because of the apparent complexity of the coding and decoding algorithms, and also because early analyses indicated that further development of the second type of approach might ultimately yield higher compression ratios. compaction of alphanumeric datajschieber and thomas 199 the second type of approach involves the replacement, by shorter nondata strings, of longer character strings known to exist with a high frequency in the data. this technique is data dependent and requires an analysis of what is to be encoded. one such method is to separate words into their component parts: prefixes, stems and suffixes; and to effect compression by replacing these components with shorter codes. there have been several successful algorithms for separating words into their components. salton ( 3) has done this in connection with his work on automatic indexing. resnikoff and dolby ( 4,5) have also examined the problem of word analysis in english for computational linguistics. although this method appears to be viable as the basis of a compaction scheme, it was here excluded because ilo data was in several languages. moreover, dolby and resnikoff's encoding and decoding routines require programs that perform extensive word analysis and dictionary look-up procedures that ilo was not in a position to develop. the actual requirements observed were twofold: that the analysis of what strings were to be encoded be kept relatively simple, and that the encoding algorithm must combine simplicity and speed presumably by minimizing the amount of dictionary look-up required to encode and decode the selected string. one of the most straightforward examples of the use of this technique is the work done by snyderman and hunt ( 6 ) that involves replacement of two data characters by single unused computer codes. however, the algorithm used by them does not base the selection of these two-character pairs (called "digrams") on their frequency of occurrence in the data. the technique described here is an attempt to improve and extend the concept by encoding digrams on the basis of frequency. the possibility of encoding longer character strings is also examined. three other related discussions of data compaction appear in papers by myers et al. (7) and by demaine and his colleagues (8,9). the compression technique the basic technique used to compact the data file specifies that the most-frequently occurring digrams be replaced by single unused specialcharacter codes. on an eight-bit character machine of the type used, there are a total of 256 possible character codes (bytes ) . 
of this total only a small number are allocated to graphics (that is, characters which can be reproduced by the computer's printer). in addition, not all of the graphics provided for by the computer manufacturer appear in the user's data base. thus, of the total code set, a large portion may go unused. characters that are unallocated may be used to represent longer character strings. the most elementary form of substitution is the replacement of specific digrams. if these digrams can be selected on the basis of frequency, the compression ratio will be better than if selection is done independent of frequency. this requires a frequency count of all digrams appearing in the data, and a subsequent ranking in order of decreasing frequency. once the base character set is defined, and the digrams eligible for replacement are selected, the algorithm can be applied to any string of text. the algorithm consists of two elements: encoding and decoding. in encoding, the string to be encoded is examined from left to right. the initial character is examined to determine if it is the first of any encodable digram. if it is not, it is moved unchanged to the output area. if it is a possible candidate, the following character is checked against a table to verify whether or not this character pair can be replaced. if replacement can be effected, the code representing the digram is moved to the output area. if not, the algorithm then moves on to treat the second character in precisely the same way as the first. the algorithm continues, character by character, until the entire string has been encoded. following is a step-by-step description of the encoding element. 1) load the length of the string into a counter. 2) set a pointer to the first character in the string. 3) check to determine whether the character pointed to can occur in combination. if the character does not occur in combination, point to the next character and repeat step 3. 4) if the character can occur in combination, check the following character in a table of valid combinations with the first character. if the digram cannot be encoded, advance the pointer to the next character and return to step 3. 5) if the digram is codable, move the preceding non-codable characters (if any) to the output area, followed by the internal storage code for the digram. 6) decrease the string length counter by one, advance the pointer two positions beyond its current value, and return to step 3. in the following example assume that only three digrams are defined as codable: ab, bc, and de. assume also that the clear text to be encoded is the six-character string abcdef. after encoding, the coded string would appear as: ab c de f (in the original figure a horizontal line over a pair represents a coded pair, and a dot marks a single, non-combined character). the encoded string above is of length four. note that although bc was defined as an encodable digram, it did not combine in the example above because the digram ab was already encoded as a pair. the characters c and f do not combine, so they remain uncoded. note also that if the digram ab had not been defined as codable, the resultant combination would have been different in this case: a bc de f. the decoding algorithm serves to expand a compressed string so that the record can be displayed or printed. as in the encoding routines, decoding of the string goes from left to right. bytes in the source string are examined one by one.
if the code represents a single character, the print code for that character is moved to the output string. if the code represents a digram, the digram is moved to the output string. decoding proceeds byte by byte as follows until the end of the string is reached: 1) load the string length into a counter. 2) set a pointer to the first byte in the record. 3) test the character. if the code represents a single character, point to the next source byte and retest. 4) if the code represents a digram, move all bytes (if any) up to the coded digram, and move in the digram. 5) increase the length value by one, point to the next source byte, and continue with step 3. application of the technique. the algorithm, when used on the data base of approximately 40,000 records, was found to yield 43.5% compaction. the file contains bibliographic records of the type shown in figure 1. fig. 1. sample record from test file (a bibliographic entry for "the data bank society: organizations, computers and social freedom," london, george allen and unwin, 1970, followed by a short abstract in which descriptors such as /social research/, /data bank/, and /computer/ are set off by slashes). each record contains a bibliographic segment as well as a brief abstract containing descriptors placed between slashes for computer identification. a large amount of blank space appears on the printed version of these records; however, the uncoded machine-readable copy does not contain blanks, except between words and as filler characters in the few fields defined as fixed-length. the average length of a record is 535 characters (10). the valid graphics appearing in the data are shown in table 1, along with the percentage of occurrence of each character throughout the entire file. table 1. single-character frequency: the percentage of occurrence of each of the valid graphics, headed by the blank at 14.87% and e at 7.63%, and tapering off to q at 0.08% and miscellaneous special characters at 0.01%. as might be expected, the blank occurs most frequently in the data because of its use as a word separator. the slash occurs more frequently than is normal because of its special use as a descriptor delimiter. it should also be noted that the data contains no lower-case characters. this is advantageous to the algorithm because it considerably lessens the total number of possible digram combinations. as a result, a larger proportion of the file is codable in the limited set chosen as codable pairs, and the absence of 26 graphics allows the inclusion of 26 additional coded pairs. in the file used for compaction there are 58 valid graphics. allowing one character for special functions leaves 197 unallocated character codes (of a total of 256 possible). a digram frequency analysis was performed on the entire file and the digrams ranked in order of decreasing frequency.
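as a concrete illustration of the frequency count and of the encoding and decoding elements just described, the following python sketch builds a digram table from a corpus and applies the left-to-right substitution. it is not the authors' ibm 360/40 assembler implementation; the use of private-use codepoints to stand in for the unused byte values, and the helper names, are assumptions made only for the example.

```python
from collections import Counter

def top_digrams(texts, max_codes=197):
    """count adjacent character pairs across a corpus and keep the most frequent."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - 1):
            counts[text[i:i + 2]] += 1
    return [pair for pair, _ in counts.most_common(max_codes)]

def build_tables(digrams):
    """give each eligible digram a one-character substitute (private-use codepoints here)."""
    encode = {pair: chr(0xE000 + i) for i, pair in enumerate(digrams)}
    decode = {code: pair for pair, code in encode.items()}
    return encode, decode

def compact(text, encode):
    """left-to-right scan: emit the substitute code for a codable digram, else copy one character."""
    out, i = [], 0
    while i < len(text):
        if text[i:i + 2] in encode:   # steps 3-5: a codable digram starts here
            out.append(encode[text[i:i + 2]])
            i += 2                    # step 6: skip past both characters
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def expand(coded, decode):
    """byte-by-byte expansion of a compacted string back to clear text."""
    return "".join(decode.get(ch, ch) for ch in coded)

encode, decode = build_tables(["ab", "bc", "de"])
coded = compact("abcdef", encode)
print(len(coded), expand(coded, decode))   # 4 abcdef
```

applied to the worked example above (codable digrams ab, bc, and de and clear text abcdef), the sketch reproduces the length-four coded string.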
from this list the first 197 digrams were selected as those which were eligible for replacement by single-character codes. table 2 shows these "encodable" digrams arranged by lead character. the algorithm was programmed in assembler language for use on an ibm 360/40 computer. the encoding element requires approximately 8,000 bytes of main storage; the decoding element requires approximately 2,000 bytes. in order to obtain data on the amount of computer time required to encode and decode the file, the following tests were performed. to find the encoding time, the file was loaded from tape to disk. the tape copy of the file was uncoded, the disk copy compacted. loading time for 41,839 records was 52 minutes and 51 seconds. the same tape-to-disk operation without encoding took 28:08. the time difference (24:43) represents encoding time for 41,839 records, or .035 seconds per record. a decoding test was done by unloading the previously coded disk file to tape. the time taken was 41:52, versus a time of 20:20 for unloading an uncompacted file. the time difference (21:32) represents decoding time for 41,839 records, or .031 seconds per record. the compaction ratio, as indicated above, was 43.5 percent. for purposes of comparison, the algorithm developed by snyderman and hunt (6) was tested and found to yield a compaction ratio of 32.5% when applied to the same data file. table 2. most frequently occurring digrams, arranged by lead character (for example, under a: ab ac ad ag ai al am an ap ar as at; under e: ea ec ed ef el em en ep er es et ev; under n: na nc nd ne ng ni no ns nt; many of the remaining eligible pairs combine a letter with the blank or with punctuation). possible extension of the algorithm. currently the compression technique encodes only pairs of characters. there might be good reason to extend the technique to the encoding of longer strings, provided a significantly higher compaction ratio could be achieved without undue increase in processing time. one could consider encoding trigrams, quadrigrams, and up to n-grams. the english word "the", for example, may occur often enough in the data to make it worth coding. the arguments against encoding longer strings are several. prime among these is the difficulty of deciding what is to be encoded. doing an analysis of digrams is a relatively straightforward affair, whereas an analysis of trigrams and longer strings is considerably more costly because there are more combinations. furthermore, if longer strings are to be encoded, the algorithms for encoding and decoding become more complex and time-consuming to employ. one approach to this type of extension is to take a particular type of character string, namely a word, and to encode certain words which appear frequently. a test of this technique was made to encode particular words in the data: descriptors. all descriptors (about 1,200 in number) appear specially marked by slashes in the abstract field of the record.
each descriptor (including the slashes) was replaced by a two-character code. after replacement, the normal compaction algorithm was applied to the record. a compaction ratio of 56.4% was obtained when encoding a small sample of twenty records (10,777 characters). the specific difficulty anticipated in this extension is the amount of either processing time or storage space which the decoding routines would require. if the look-up table for the actual descriptor values were to be located on disk, the time to retrieve and decode each record might be rather long. on the other hand, if the look-up table were to be in main storage at the time of processing, its size might exclude the ability to do anything else, particularly when on-line retrieval is done in an extremely limited amount of main storage area. a partial solution to this problem might be to keep the look-up tables for the most frequently occurring terms in main storage and the others on disk. at present further analysis is being done to determine the value of this approach. conclusions. the compaction algorithm performs relatively efficiently given the type of data used in the text data base (i.e., data without lower-case alphabetics, having a limited number of special characters, in primarily english text). the times for decoding individual records (.031 sec/record) indicate that on a normal print or terminal display operation, no noticeable increase in access time will be incurred. however, several types of problems are encountered when treating other kinds of data. since the algorithm works on the basis of replacing the most-frequently occurring n-grams by single-byte codes, the compaction ratio is dependent on the number of codes that can be "freed up" for n-gram representation. the more codes that can be reallocated to n-grams, the better the compaction. data which would pose complications to the algorithm, as currently defined, can be separated for discussion as follows: 1) data containing both upper- and lower-case characters (as well as a limited set of special characters), and 2) data which might possibly contain a wide variety of little-used special graphics. if lower-case characters are used, a possible way to encode data using this technique is to hark back to the time-honored method of representing lower-case characters with the upper-case codes, and upper-case characters by their value preceded by a single shift code (e.g., #access for access). the shift-code-plus-blank digram would undoubtedly figure relatively high on the frequency list, making it eligible as an encodable digram. the second problem occurs when one attempts to compact data having a large set of graphics. a good example of this is bibliographic data containing a wide variety of little-used characters of the type now being provided for in the marc tapes (11) issued by the u.s. library of congress (such as the icelandic thorn). normally, representation of these graphics is done by allocating as many codes as required from the possible 256-code set. since the compaction ratio is dependent on the number of unallocated internal codes, a possible solution to this dilemma might be to represent little-used graphics by multi-byte codes, which would free the codes for representation of frequently occurring n-grams. further, it is noticeable that the more homogeneous the data, the higher the compression ratio. this means that data all in one language will encode better than data in many languages.
there is, unfortunately, no ready solution to this problem, given the constraints of this algorithm. in dealing with heterogeneous data one must be prepared to accept a lower compression factor. without doubt, the ability to effect a savings of around 40% in storage space is significant. the price for this ability is computer processing time, and the more complex the encoding and decoding routines, the more time is required. there is a calculable break-even point at which it becomes economically more attractive to buy x amount of additional storage space than to spend the equivalent cost on data compaction. yet at the present cost of direct-access storage, compaction may be a possible solution for organizations with large data files. references 1. marron, b. a.; demaine, p. a. d.: "automatic data compression," communications of the acm, 10 (november 1967), 711-715. 2. demaine, p. a. d.; kloss, k.; marron, b. a.: the solid system iii: alphanumeric compression (washington, d.c.: national bureau of standards, 1967) (technical note 413). 3. salton, g.: automatic information organization and retrieval (new york: mcgraw-hill, 1968). 4. resnikoff, h. l.; dolby, j. l.: "the nature of affixing in written english," mechanical translation, 8 (march 1965), 84-89. 5. resnikoff, h. l.; dolby, j. l.: "the nature of affixing in written english," mechanical translation, 9 (june 1966), 23-33. 6. snyderman, martin; hunt, bernard: "the myriad virtues of text compaction," datamation (december 1, 1970), 36-40. 7. myers, w.; townsend, m.; townsend, t.: "data compression by hardware or software," datamation (april 1966), 39-43. 8. demaine, p. a. d.; kloss, k.; marron, b. a.: the solid system ii: numeric compression (washington, d.c.: national bureau of standards, 1967) (technical note 413). 9. demaine, p. a. d.; marron, b. a.: "the solid system i: a method for organizing and searching files," in schecter, g. (ed.): information retrieval: a critical view (washington, d.c.: thompson book co., 1967). 10. schieber, w.: isis (integrated scientific information system; a general description of an approach to computerized bibliographical control) (geneva: international labour office, 1971). 11. books: a marc format; specification of magnetic tapes containing monographic catalog records in the marc ii format (washington, d.c.: library of congress, information systems office, 1970). news and announcements. redi or not ... "public libraries and the remote electronic delivery of information (redi)," a working meeting, was held in columbus, ohio, on monday and tuesday, march 23 and 24, 1981. the meeting, jointly sponsored by the public library of columbus and franklin county (ohio) and oclc, inc., considered the issues that public libraries must examine before becoming involved in electronic information services. subjects explored included technology, communications, information providers, information users, social implications, and financial, legal, and regulatory responsibilities. tom harnish, program director of oclc's home delivery of library services program, was moderator of the two-day event. participants at the conference represented a variety of public libraries from throughout the u.s., including new york, georgia, texas, california, colorado, and illinois. don hammer represented lita at the meeting; mary jo lynch of the ala office for research also attended.
"geographic distances, " said harnish, "were the only points of separation among the meeting participants . there was an overwhelming agreement on the concerns for the future of libraries and universal access to information in the electronic age . " on the second day of the conference it became apparent that the redi agenda could not be properly dealt with in two days. "we need an organization which will address these issues on an ongoing basis," said richard sweeney, executive director of plcfc . "librarians at the conference agreed to promote and lead the development of the electronic library . to that end, this group is seeking recognition by ala as a membership initiative group with a special interest in the electronic library." the group's founders prepared the following mission statement for the membership initiative group: to ensure that information delivered electronically remains accessible to the general public, the electronic library association shall promote participation and leadership in the remote electronic delivery of information* (redi) by publicly supported libraries and nonprofit organizations . goals of the organization are to: • identify services and information that are best suited to remote electronic delivery; • plan , fund, and develop working demonstrations of library redi services ; • communicate the availability of electronic library services to the user community; · • inform the library profession of trends, specific events , and future directions of redi; • create coalitions with organizations in allied fields ·of interest. public libraries and nonprofit organizations with information interests, such as information and referral groups, are invited to join the electronic library association . the group plans to meet at the ala annual conference in san francisco. meeting details will be announced as soon as they are available . it was the goal of the "public libraries and the remote electronic delivery of information " meeting to provide th e fram e work within which to address the myriad issues in redi. the electronic library group will validate the role of libraries in technology .... redi or not here we come. *information delivered electronically where and when it is needed, in the library and elsewhere (home/office/off-site). 122 journal of library automation vol. 14/2 june 1981 arl adopts plan for improving access to microforms a plan aimed at improving bibliographic access to materials in microform by building a nationwide database of machinereadable records for individual titles in microform sets was approved in principle by the arl board of directors on january 30, 1981. the plan concentrates on monograph collections, and is aimed at providing records for individual titles in both current and retrospective sets. records add~d to the database will also aid cooperative efforts in preservation microfilming. elements of the plan include: • inputting of records conforming to accepted north american standards to the major bibliographic utilities by libraries and microform publishers; • d~ve_lopment of "profile matching" by the b1blwgraphic utilities permitting the cataloging of all individual titles in a series or microform collection with single operation; • cooperative cataloging of current and retrospective microform sets by libraries and publishers; • compensation for publishers who input acceptable bibliographic records to the bibliographic utilities to offset loss of revenue from card set sales. 
cooperation among libraries publishers, networks, and others ha's been stressed throughout the development of the plan, and initiatives on a number of fronts are necessary and encouraged in order to accomplish the goal of improved bibliographic access to microforms. arl will s_eek outside funding for a program coordmator to facilitate implementation of the elements outlined above, and recruitment for the one-year position will begin short!~ . the coordinator, advised by a committee of librarians (from arl and ~on-arl institutions) and microform publ~shers, will work with libraries, publishers, and the bibliographic utilities to help get the plan off the ground. the plan is the result of a one-year study funded by a grant from the national endowment for the humanities and conducted for arl by richard boss of information systems consultants, inc. during the course of the year, he interviewed librarians , microform publishers, representatives of the bibliographic utilities, and others interested in bibliographic access to microforms, gradually building the plan from elements on which there was agreement and discarding ideas that were not widely accepted. the effort to build a consensus among the various interested parties was aided by the advisory committee, comprising both arl librarians and microform publishers, which assisted and advised throughout the course of the project. arl will publish the study this spring. arl sponsorship of this project and its follow-up reflects the long-standing commitment the association has had to improving access to microforms . two earlier arl studies on improving bibliographic access contributed to the development of standards for descriptive cataloging of microforms, reinforced the importance of microforms for preserving and disseminating scholarly materials, and identified some of the problem areas that the current study has addressed . today, as the amount of materials in microform in arl libraries continues to grow-arl libraries hold more than 146,660,000 units of microform-improving access to these materials has taken on even greater urgency. the association of research libraries is an organization of major research libraries in the united states and canada. members include the larger university libraries, the national libraries of both countries and a number of public and special librar~ ies with substantial research collections . there are at present 111 institutional members . battelle studies using computers to access unpublished technical information engineers may be able to use computers to store, call up, and otherwise display some technical information not currently published in professional journals as a result of a study recently begun by battelle's columbus laboratories. in a four-month study sponsored by the american society of mechanical engineers (asme), battelle researchers are examining ways to use computers as an alternative to publications for communicating with the technical community. asme is a technical and educational organization with a membership of 100,000 individuals, including 17,000 student members. it conducts one of the largest technical publishing operations in the world, which includes codes, stanc dards, and operating principles for industry. according to battelle's gabor j. kovacs, certain types of information traditionally are not covered in monthly or quarterly technical journals, yet they often have widespread appeal among engineers. 
"recent advances in computer and telecommunications technologies, coupled with rapidly rising publication costs and postal rates, have created an ideal environment for organizations to consider using computers as an alternative mode of communication," kovacs said . "data bases can be used to maintain information that is impractical for conventional publication, and it is now possible to use them for many other types of communication as well." during the study, researchers will determine the feasibility of using a computer database to disseminate to asme members such information as short articles dealing with design and applications data, news and announcements 123 catalog data, and teleconference messages. with the help of the asme, battelle specialists will define the information requirements for such a system. while technology is sufficiently advanced to accommodate virtually any type of information, costs can become prohibitive unless practical compromises are made, kovacs said . as part of the study, battelle researchers also will analyze the costs associated with systems of varying capabilities. researchers then will define several alternative database systems, which will include such attributes as: • online, interactive retrieval features • simple-to-use retrieval language • user-aid features • a minimum of seventy-five simultaneous users • ability to send, store, and broadcast messages • compatibility with a variety of hard copy and crts (cathode ray tube terminals) • sixteen or more hours per day availability to accommodate different time zones • a minimum of thirty-characters-persecond transmission rates two of these alternative system de .signs-one representing a minimum capability and the other a maximum capability-then will be selected for further evaluation by battelle and the asme. 2 information technology and libraries | september 2008 andrew k. pacepresident’s message andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. w elcome to my first ital column as lita president. i’ve had the good fortune to write a number of columns in the past—in computers in libraries, smart libraries newsletter, and most recently american libraries—and it is a role that i have always cherished. there is just enough space to say what you want, but not all the responsibility of backing it up with facts and figures. in the past, i have worried about having enough to say month after month for an undefined period. now i am daunted by only having one year to address the lita membership and communicate goals and accomplishments of my quickly passing tenure. i am simultaneously humbled and extremely excited to start my presidential year with lita. i have some ambitious agenda items for the division. i said when i was running that i wanted to make lita the kind of organization that new librarians and it professionals want to join and that seasoned librarians wanted to be active in. recruitment to lita is vital, but there is also work to be done to make that recruitment even easier. i am fortunate in following up the great work of my predecessors, many of whom i have had the pleasure of serving with on the lita board since 2005. they have set the bar for me and make the coming year as challenging as anything i have done in my career. i also owe a lot to the membership who stepped forward to volunteer for committees, liaison appointments, and other volunteer opportunities. 
i also think it is important for lita members to know just how much the board relies on the faithful and diligent services of the lita staff. at my vice presidential town meeting, i talked about marketing and communication in terms of list (who), method (how), and message (what and why). not only was this a good way to do some navel gazing on what it means to be a member of lita, it laid some groundwork for the year ahead. i think it is an inescapable conclusion that the lita board needs to take another look at strategic planning (which expires this year). the approach i am going to recommend, however, is not one that tries to connote the collective wisdom of a dozen lita leaders. instead, i hope we can define a methodology by which lita committees, interest groups, and the membership at large are empowered both to do the work of the division and to benefit from it. one of the quirky things that some people know about me is that i actually love bureaucracy. i was pleased to read in the lita bylaws that it is actually my duty as president to "see that the bylaws are observed by the officers and members of the board of directors." i will tell you all that i also interpret this to mean that the president and the board will not act in ways that are not prescribed. the strength of a volunteer organization comes from its volunteers. the best legacy a lita president can provide is to give committees, interest groups, and the membership free rein to create the division's future. as for the board, its main objective is to oversee the affairs of the division during the period between meetings. frankly, we're not so great at this, and it is one of the biggest challenges for any volunteer organization. it is also one of my predecessor's initiatives that i plan to follow through on with his help as immediate past president. participation and involvement, and the ability to follow the work and strategies of the division, should be easier for all of us. so, if i were to put my platform in a nutshell it would be this: recruitment, communication, strategic planning, and volunteer empowerment. i left out fun, because it goes without saying that most of us are part of lita because it's a fun division with great members. this is a lot to get done in one year, but because it will be fun, i'm looking forward to it. the use of automatic indexing for authority control. martin dillon: university of north carolina at chapel hill; rebecca c. knight: wichita state university, wichita, kansas; margaret f. lospinuso: university of north carolina at chapel hill; and john ulmschneider: national library of medicine. thesaurus-based automatic indexing and automatic authority control share common ground as word-matching processes. to demonstrate the resemblance, an experimental system utilizing automatic indexing as its core process was implemented to perform authority control on a collection of bibliographic records. details of the system are given and results discussed. the benefits of exploiting the resemblance between the two systems are examined. introduction. it is not often realized how close the relationship is between automatic indexing using a thesaurus, on the one hand, and automatic authority control, on the other. making the connection is worthwhile for many reasons. the first has to do with terminology.
though one would be naive to hope for a reduction in specialized vocabulary, it is helpful to appreciate that what is called a thesaurus in one application is referred to as an authority file in the other; that the two have virtually the same structure, similar working parts, and play the same role in controlling the content of fields in a bibliographic file in their creation and, at least potentially, during retrievals by users. a second reason emerges in system development. below we discuss the various ways that a library can implement authority control. they range from a fully manual system, where the authority file exists only in card form, to online, automatic authority management. there are intermediate points as well. for each of the automated implementations, the system investment in software can be great. recognition of the close parallel in function of these two library needs allows for parallel development of software for any of these stages. a third reason looks to the future. successful system-patron interaction ought not to depend upon a patron's knowledge of the authorized entry forms currently in use for a library. first, the concept of a controlled vocabulary is far too narrow: authority control should encompass all fields available for searching. but the patron need not be aware of complicating details: substitutions of recognized variants for authorized forms ought to be carried out automatically during patron retrievals (with due regard, of course, for the intent of the patron). this article describes a project in authority control in a specialized system environment, one that is increasingly typical in many of its features. the file of records is relatively small, currently below 10,000, and has a potential for growth not exceeding 100,000. the collection, derived from the annabel morris buchanan collection of american religious tune books at the university of north carolina (chapel hill) music library, has many similarities with standard book collections, but its details vary greatly and cataloging conventions have been developed locally. its use for scholarly research is similar to that for any standard collection of bibliographic records. a great many such nonstandard collections exist: the morgue file in a newspaper, machine-readable data files, even properties marketed by cooperatives of real estate agencies. developing automated retrieval systems for such collections is a similar enterprise, sharing similar goals and problems. in particular, all require extensive authority control similar to that required by a tune-book collection. the important feature of the method of authority control described here, one that makes it likely to be of interest to others, is its use of the same structures and software that are used for general vocabulary control. the three major software components we will refer to below are: thesaurus maintenance, automatic indexing, and automatic updating. these components antedated our effort to implement a similar system for authority control. when the problems that dealt with authority control per se were investigated, it was discovered that the system already available for subject control could be used exactly as it stood for authority control as well. initial experiments confirmed this relationship.
1 authority control and automatic indexing. automatic authority control has been approached largely as a unique problem requiring special software development for its implementation. but authority control shares common ground with automatic subject indexing. both are term-matching activities based on a list of preferred terms plus a much larger list of match terms. each preferred term is tied to a number of match terms, but each match term is tied to only one preferred term. in the indexing environment, document text is examined for certain terms; these "free text" (uncontrolled vocabulary) terms are tied to equivalent (controlled vocabulary) terms in a thesaurus. when an uncontrolled vocabulary term is encountered in a document, its associated controlled vocabulary term is posted to the document as a descriptor. in authority control, document text is also examined for certain terms, e.g., author names. these "free-text" author names (i.e., names just as they appear on a title page) are tied to their authoritative name form (controlled vocabulary) in an authority file. when a "free-text" author name is encountered, the authoritative name is posted to the document or book (i.e., assigned as a heading or entry point). an automatic authority control system, then, is realizable by applying standard automatic subject-indexing software, which exploits the resemblance between the two processes. the input would consist of a thesaurus (in this case, an authority file) and bibliographic records; the indexing discovers matches between the list of possible terms in the thesaurus (variants of author names) and the "free-text" terms (title-page author names), and posts the appropriate controlled thesaurus terms (authoritative author name form) whenever a match occurs. (see figure 1.) fig. 1. authority control by indexing: the thesaurus (authority file) and the bibliographic records feed a matching-and-posting step that produces updated records. the tune-book project. an experimental version of an authority control system using automatic indexing was implemented to test the feasibility of automatic indexing as the core process for authority control. the goal was automatic authority control for the buchanan collection index, the first step in work on a more comprehensive project, an index of american religious tune books, in particular the shape-note tune books. for the study of american cultural and musical history it is important to be able to trace the dissemination of these hymn tunes and texts, but the absence of a comprehensive index of american hymn tune books severely constrains such studies. many factors have discouraged scholars from constructing an index, among them the magnitude of the repertory. using computers to sort, file, and print reduces many of the problems associated with the size of the repertory, but does not address those created by the diverse forms of names and texts used by the tune-book compilers. correct hymn titles and especially accurate composer attributions were not important to the compilers of the tune books. consequently, although many tune-book compilers did attempt to indicate who had composed the work, the names of the composers appeared in various forms. for example, the name "israel holdroyd" might appear as simply "holdrad" or "holdrayd" with no first name given, or a first initial might be added, or an abbreviated first name, such as "is."
might be used with one of several forms of the family name. automatic authority control over these names is necessary to the study of this collection, since only automatic means can address the problems of magnitude encountered in approaching the index as a whole. the database now contains about 6,000 records for these tune books. they are stored in marc format with variable-length fields giving a variety of information about each tune . creation of the authority file a thesaurus of authority records for the buchanan collection was manually created and placed in an online file. the initial authority file comprises a selection of composers whose names are present in conflicting forms in the present database. these were obtained by analyzing the file sorted by tune names, noting those tunes for which it appeared that the name of the same composer was given in more than one form. all forms of the name found were entered on cards along with the name of the tune (or tunes) through which the relationship was established . we used an explicit algorithm as a guide in determining which names were actually forms of the same name (see appendix for details). this process resulted in a list of 266 distinct composers, each with one to four different name forms. all were compared with the list sorted by composers, noting additional forms. these names were then checked in several reference works, and authoritative forms (with dates) were established when possible. implementation software systems file processing for the tune records and the authority thesaurus was 272 journal of library automation vol. 14/4 december 1981 accomplished using a local software product, bibliographic/marc processing system (bps). bps is a general-purpose software package for the manipulation of marc-format records. this experiment used bps subsystems for creation of marc-format records, sorting and formatting, and file updating (i.e., updating a master file with the contents of a transaction file). the automatic indexing program used here was intended as part of a thesaurus-based document query system. 2 it is compatible with bps, but utilizes generalized automatic indexing principles-its compatibility depends only on properly formatted thesaurus and bibliographic records. it includes file-processing programs for the thesaurus (authority file) and the bibliographic records (tune records) and a matching program that performs the indexing. posting of the authoritative name forms to the proper marc record is done with standard bps updating procedures using output from the matching program. automatic authority control process as input the system uses a thesaurus and the text of fields selected from marc-format document records. the thesaurus consists of pairs of terms: the first of each pair is the term searched for in a document, the second is the authority term assigned to the document, whenever the first term is found. figure 2 gives examples. the text may be abstracts, titles, or the contents of any field selected from the documents for authority control. in this case, the text is derived from the composer field; for authority work in general, any field requiring authority control would be input. the first step in authority control is as follows. the text sample and a stop-word list are input to the initial text-processing program. the incomau'ihcrity fcri'i cole, j_ i cvle, joh~ 1774-1855 clarkf", thos. 1 clark, thomas \:ol e!' , ~ eo. i cuzens, 9. / cuzens, benjamin ilall , ::;_ bi ba 11 , r. 
fholraj / hcld r oyd , israel aolroyd i hcldroyd, israel fig. 2. thesaurus/authority file format . automatic indexing/dillon , et al. 273 ing text (in this case, composer names) is separated into individual words. the stop-word list is used to remove designated words from the input, which in authority control might be titles of address and so onterms such as "miss," "elder," or "reverend." (automatic indexing uses the stop-word list to eliminate similarly noncontributory terms, such as conjunctions and prepositions.) the processing program can also convert plurals to singulars if desired. the purpose of this option in automatic indexing is to pare down variants in order to increase matches by standardizing term forms. however, plurals are not converted in authority control, since names are usually distinguished from one another by their full forms. the processing produces a list of individual terms. each term is given once along with the number of words in the term, then broken up with the document number attached to each piece. the thesaurus authority records are edited by the thesaurus processing program into specially formatted matched pairs of variant and authoritative forms. input is the match-term/variant-term file (figure 2) and the same stop-word list used for document processing. the stop-word list eliminates all unwanted words in the list of variant name forms. output is a file containing all possible name forms (variants), the number of terms in each name and their positions in the name, and the authoritative name form, as in figure 3. next the two files are used as input to a matching program that creates an inverted file of the processed document text, then compares each match term from the prepared thesaurus with the inverted file. a match is discovered according to one of the following criteria: 1. exact match: match term and document term are the same words, in the same order, and adjacent. 2. stop word exact match: words are the same in match term and in document term, and in order, but deleted stop words may intervene between words in the document term. 3. any order match: term must be the same words and adjacent (i.e., without intervening words) and may be in any order. va!'iani twc!ld s ~:utiv~ auti:-ci\ily ?cs: no fch hlstin'js, 'ihos. 2 1 2 rastinq~ , tl:hii.l s 17~4-l tl7 _ hastl.nqs, l h:>s :le i 1 2 rds tl nq.< , th.:>llll s 17cl~ 1 -!72 holde a':! l!ol:lccyd , l s cd: ab-1054, .\3-166q, ad-1248, aq-133b, ••• fig. 4. update file. results table 1 gives some statistics on the experimental runs. in the 5, 788 bibliographic records, 760 distinct composer names were present, the remainder (one composer per record) being duplicate forms; many of these are simply "anon," where the composer was not known. earlier test runs on a subset of the file had fewer duplicates, and additions to the full database show few new composer name forms. thus the database is nearing a stable state with an exhaustive list of composers; this stability contribtable 1. 
implementation statistics f ile statistics: total number of bibliograp hi c records number of composer names in biblio reco rds ave rage number of compositions per composer tota l number of authorit y na me forms (in authority file) tota l number of variant and authority names (in authority file) run statisti cs: total number of variant thesauru s names matched total numbe r of variant thesaurus n am es unmatched average number of documents per match ed ter m average number of docume nts per term total number of reeords updated b y authority form 5,788 760 13.2 266 599 372 213 5.87 3.61 2, 110 276 journal of library automation vol. 14/4 december 1981 jqc 10: af1 14 7 .\nt ho l o.:; y ; 'i h <:> ~ n ion ilih jl on y i mjrin : : sel~cted ty ;ecr qe y~njr~ckson tune na:1e: i e::-usa lem firs: lin~:je~us, my all tc h~~v•n is gone, pcn: walk e r, william 18 09 -187 5 cc.'1p!)3:':r: loi al k e r, \ojr • joc i d: aa-1353 "antholo.:;y: the sacred harp imprinl': oy 3. f. lthite, e . j. king [and d.p. white}--4th ed.---atalnta : d. p. byrd, 1870 tune name: the hilt cf zion frgsr ~ine:the hill cf zion yield s , pc~: white, benjamin franklin 1800-1879 coi1po ser: white, b. f. )ot: id: afl -1100 anthology: the culcia;er imprint : or, 'ihe new york coll~ction of ~acred music 1 by i. b. woccbury. --neli york f. j. huntington tune name: carson first line:jesus an1 shall it ever be, pcn: bradbury, williaa; batchelder 1816-1868 composer: er, w. !l. fig. 5. updated records. utes to decreasing errors and fewer unmatched composer names in the automated authority control process. the total numbe r of thesaurus records matched applies to variant forms, authoritative forms (matching occurs for these also) , and for those few forms that have no variants. the unmatched terms (213) are largely variants not in the database but gleaned from reference sources in anticipation of their occurrence, and authority forms, most of which do not occur in the database. the 2, 110 matched represent the total number of composer names matched of the originals, 788 names. most of the unmatched names are the "anon" entries (more than 2 ,000); the remainder are unanticipated forms not detected in the initial manual construction of the authority file. these unanticipated forms become new variants added to the authority file as described above. conclusions automated authority control as presented here has a number of advantages, either for libraries with their own processing facilities or for the management of information collections outside the standard library environment. unifying the processes of subject control and authority control by using the same procedures and software for both simplifies the tasks of automatic indexing/dillon, et al. 277 systems personnel and information managers. where catalog access is online, the patron benefits by applying subject access facilities to other searches. ideally, substitutions for all variants would occur automatically, accompanied by an alerl lo the patron where it was felt necessary. at a minimum, the same command structure would be available for referencing names as would be normally available for consulting an online thesaurus. in either case, the difficulties of the patron are reduced, both in comprehending how the system works, and in acquiring a facility for using system commands. references 1. 
gordon ellyson jessee, "authority control: a study of the concept and its implementation using an automated indexing system" (master's paper, school of library science, university of north carolina at chapel hill, 1980). 2. margaret s. strode, "automatic indexing using a thesaurus" (master's thesis, department of computer science, university of north carolina at chapel hill, 1977). appendix rules for decisions on similar names the following conditions may exist: a = identical tune name b = identical surname c = identical first initial d = same first letter of surname and close match of the rest of the surname. (55 percent match of latters in content, not in order. such a similarity is presumed to represent a similarity in sound. ) e = similar tune name (same criteria as in d for percentage of match). exception: words "new" and "old" cancel any presumed relation between similar tune names. f = information in cmp subfield x field is identical in content the following combinations of conditions indicate the same person, expressed in decreasing order of reliability: l. a&b 2. b&c 3. a&d 4. c&d 5. b&e 6. c&d&e 7. d&e 8. f&(bord) note: points seven and eight are regarded as tentative, and matches using these combinations are flagged for later checking. martin dillon is associate professor of library science at the university of north carolina at chapel hill. rebecca c. knight is administrative services librarian at wichita state university, wichita, kansas. margaret f. lospinuso is music librarian at the university of north carolina at chapel hill. john ulmschneider is library associate at the national library of medicine. microsoft word ital_december_gerrity_final.docx   editor’s comments bob gerrity     information  technologies  and  libraries  |  september  2013   3     this  month’s  issue   we  have  an  eclectic  mix  of  content  in  this  issue  of  information  technology  and  libraries.   lita  president  cindi  trainor  provides  highlights  of  the  recent  lita  forum  in  louisville  and   planned  lita  events  for  the  upcoming  ala  midwinter  meeting  in  philadelphia,  including  the  lita   town  meeting,  the  always-­‐popular  top  tech  trends  panel,  and  the  association’s  popular   “networking  event”  on  sunday  evening.     ital  editorial  board  member  jerome  yavarkosky  describes  the  significant  benefits  that   immersive  technologies  can  offer  higher  education.  the  advent  of  massive  open  online  courses   (moocs)  would  seem  to  present  an  ideal  framework  for  the  development  of  immersive  library   services  to  support  learners  who  may  otherwise  lack  access  to  quality  library  resources  and   services.   responsive  web  design  is  the  topic  of  a  timely  article  by  hannah  gascho  rempel  and  laurie  m.   bridges,  who  examine  what  tasks  library  users  actually  carry  out  on  a  library  mobile  website  and   how  this  has  informed  oregon  state  university  libraries’  adoption  of  a  responsive  design   approach  for  their  website.   piotr  praczyk,  javier  nogueras-­‐iso,  and  salvatore  mele  present  a  method  for  automatically     extracting  and  processing  graphical  content  from  scholarly  articles  in  pdf  format  in  the  field  of   high-­‐energy  physics.  the  method  offers  potential  for  enhancing  access  and  search  services  and   bridging  the  semantic  gap  between  textual  and  graphical  content.   
elizabeth  thorne  wallington  describes  the  use  of  mapping  and  graphical  information  systems   (gis)  to  study  the  relationship  between  public  library  locations  in  the  st.  louis  area  and  the   socioeconomic  attributes  of  the  populations  they  serve.  the  paper  raises  interesting  questions   about  how  libraries  are  geographically  distributed  and  whether  they  truly  provide  universal  and   equal  access.     vadim  gureyev  and  nikolai  mazov  present  a  method  for  using  bibliometric  analysis  of  the   publication  output  of  two  research  institutes  as  a  collection-­‐development  tool,  to  identify  journals   most  important  for  researchers  at  the  institutes.       bob  gerrity  (r.gerrity@uq.edu.au)  is  university  librarian,  university  of  queensland,  australia.       editor’s comments bob gerrity   editor’s  comments  |  gerrity       4         editorial board thoughts | eden 109 editorial board thoughts bradford lee eden musings on the demise of paper w e have been hearing the dire predictions about the end of paper and the book since microfiche was hailed as the savior of libraries decades ago. now it seems that technology may be finally catching up with the hype. with the amazon kindle and the sony reader beginning to sell in the marketplace despite the cost (about $360 for the kindle), it appears that a whole new group of electronic alternatives to the print book will soon be available for users next year. amazon reports that e-book sales quadrupled in 2008 from the previous year. this has many technology firms salivating and hoping that the consumer market is ready to move to digital reading as quickly and profitably as the move to digital music. some of these new devices and technologies are featured in the march 3, 2009, fortune article by michael v. copeland titled “the end of paper?”1 part of the problem with current readers is their challenges for advertising. because the screen is so small, there isn’t any room to insert ads (i.e., revenue) around the margins of the text. but new readers such as plastic logic, polymer vision, and firstpaper will have larger screens, stronger image resolution, and automatic wireless updates, with color screens and video capabilities just over the horizon. still, working out a business model for newspapers and magazines is the real challenge. and how much will readers pay for content? with everything “free” over the internet, consumers have become accustomed to information readily available for no immediate cost. so how much to charge and how to make money selling content? the plastic logic reader weighs less than a pound, is one-eighth of an inch thick, and resembles an 8½ x 11 inch sheet of paper or a clipboard. it will appear in the marketplace next year, using plastic transistors powered by a lithium battery. while not flexible, it is a very durable and break-resistant device. other e-readers will use flexible display technology that allows one to fold up the screen and place the device into a pocket. much of this technology is fueled by e-ink, a start-up company that is behind the success of the kindle and the reader. they are exploring the use of color and video, but both have problems in terms of reading experience and battery wear. in the long run, however, these issues will be resolved. expense is the main concern: just how much are users willing to pay to read something in digital rather than analog? 
amazon has been hugely successful with the kindle, selling more than 500,000 for just under $400 in 2007. and with the drop in subscriptions for analog magazines and newspapers, advertisers are becoming nervous about their futures. or will the “pay by the article” model, like that used for digital music sales, become the norm? so what should or do these developments mean for libraries? it means that we should probably be exploring the purchase of some of these products when they appear and offering them (with some content) for checkout to our patrons. many of us did something similar when it became apparent that laptops were wanted and needed by students for their use. many of us still offer this service today, even though many campuses now require students to purchase them anyway. offering cutting-edge technology with content related to the transmission and packaging of information is one way for our clientele to see libraries as more than just print materials and a social space. and libraries shouldn’t pay full price (or any price) for these new toys; companies that develop these products are dying to find free research and development focus groups that will assist them in versioning and upgrading their products for the marketplace. what better avenue than college students? related to this is the recent announcement by the university of michigan that their university press will now be a digital operation to be run as part of the library.2 decreased university and library budgets have meant that university presses have not been able to sell enough of their monographs to maintain viable business models. the move of a university press to a successful scholarly communication and open-source publishing entity like the university of michigan libraries means that the press will be able to survive, and it also indicates that the newer model of academic libraries as university publishers will have a prototypical example to point out to their university’s administration. in the long run, these types of partnerships are essential if academic libraries are to survive their own budget cuts in the future. references 1. michael v. copeland, “the end of paper?” cnnmoney .com, mar. 3, 2009, http://money.cnn.com/2009/03/03/ technology/copeland_epaper.fortune/ (accessed june 22, 2009). 2. andrew albanese, “university of michigan press merged with library, with new emphasis on digital monographs,” libraryjournal.com, mar. 26, 2009, http://www .libraryjournal.com/article/ca6647076.html (accessed june 22, 2009). bradford lee eden (eden@library.ucsb.edu) is associate university librarian for technical services and scholarly communication, university of california, santa barbara. levan opensearch and sru | levan 151 not all library content can be exposed as html pages for harvesting by search engines such as google and yahoo!. if a library instead exposes its content through a local search interface, that content can then be found by users of metasearch engines such as a9 and vivísimo. the functionality provided by the local search engine will affect the functionality of the metasearch engine and the findability of the library’s content. this paper describes that situation and some emerging standards in the metasearch arena that choose different balance points between functionality and ease of implementation. editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital. 
฀ the content provider’s dilemma consider the increasingly common situation in which a library wants to expose its digital content to its users. suppose it knows that its users prefer search engines that search the contents of many sites simultaneously, rather than site-specific engines such as the one on the library’s web site. in order to support the preferences of its users, this library must make its contents accessible to search engines of the first type. the easiest way to do this is for the library to convert its contents to html pages and let the harvesting search engines such as google and yahoo! collect those pages and provide searching on them. however, a serious problem with harvesting search engines is that they place limits on how much data they will collect from any one site. google and yahoo! will not harvest a 3-million-record book catalog, even if the library can figure out how to turn the catalog entries into individual web pages. an alternative to exposing library content to harvesting search engines as html pages is to provide a local search interface and let a metasearch engine combine the results of searching the library’s site with the results from searching many other sites simultaneously. users of metasearch engines get the same advantage that users of harvesting search engines get (i.e., the ability to search the contents of many sites simultaneously) plus those users get access to data that the harvesting search engines do not have. the issue for the library is determining how much functionality it must provide in its local search engine so that the metasearch engine can, in turn, provide acceptable functionality to its users. the amount of functionality that the library provides will determine which metasearch engines will be able to access the library’s content. metasearch engines, such as a9 and vivísimo, are search engines that take a user’s query, send it to other search engines, and integrate the responses.1 the level of integration usually depends on the metasearch engine’s ability to understand the responses it receives from the various search engines it has queried. if the response is html intended for display on a browser, then the metasearch engine developers have to write code to parse through the html looking for the content. in such a case, the perceived value of the content determines the level of effort that the metasearch engine developers put into the parsing task; low-value content will have a low priority for developer time and will either suffer from poor integration or be excluded. for metasearch engines to work, they need to know how to send a search to the local search engine and how to interpret the results. metasearch engines such as vivísimo and a9 have staffs of programmers who write code to translate the queries they get from users into queries that the local search engines can accept. metasearch engines also have to develop code to convert all the responses returned by the local search engines into some common format so that those results can be combined and displayed to the user. this is tedious work that is prone to breaking when a local search engine changes how it searches or how it returns its response. the job of the metasearch engine is made much simpler if the local search engine supports a standard search interface such as sru (search and retrieve url) or opensearch. ฀ what does a metasearch engine need in order to use a local search engine? the search process consists of two basic steps. first, the search is performed. 
second, records are retrieved. to do a search, the metasearch engine needs to know:
1. the location of the local search engine
2. the form of the queries that the local search engine expects
3. how to send the query to the local search engine
to retrieve records, the metasearch engine needs to know:
4. how to find the records in the response
5. how to parse the records
opensearch and sru: a continuum of searching
ralph levan
ralph levan (levan@oclc.org) is a research scientist at oclc online computer library center in dublin, ohio.
four protocols
this paper will discuss four search protocols: opensearch, opensearch 1.1, sru, and the metasearch xml gateway (mxg).2 opensearch was initially developed for the a9 metasearch engine. it provides a mechanism for content providers to notify a9 of their content. it also allows rss (really simple syndication) browsers to display the results of a search.3 opensearch 1.1 has just been released. it extends the original specification based on input from a number of organizations, microsoft being prominent among them. sru was developed by the z39.50 community.4 recognizing that their standard (now eighteen years old) needed updating, they simplified it and created a new web service based on an xml encoding carried over http. the mxg protocol is the product of the niso metasearch initiative, a committee of metasearch engine developers, content providers, and users.5 mxg uses sru as a starting place, but eases the requirement for support of a standard query grammar.
functionality versus ease of implementation
a library rarely has software developers. the library's area of expertise is, first of all, the management of content and, secondarily, content creation. librarians use tools developed by other organizations to provide access to their content. these tools include the library's opac, the software provided to search any licensed content, and the software necessary to build, maintain, and access local digital repositories. for a library, ease of adoption of a new search protocol is essential. if support for the search protocol is built into the library's tools, then the library will use it. if a small piece of code can be written to convert the library's existing tools to support the new protocol, the library may do that. similarly, the developers of the library's tools will want to expend the minimum effort to support a new search protocol. the tool developer's choice of search protocol to support will depend on the tension between the functionality needed and the level of effort that must be expended to provide and maintain it. if low functionality is acceptable, then a small development effort may be acceptable. high functionality will require a greater level of effort. the developers of the search protocols examined here recognize this tension and are modifying their protocols to make them easier to implement. the new opensearch 1.1 will make it easier for some local search-engine providers to implement by easing some of the functionality requirements of version 1.0.
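to ground the discussion, here is a minimal sketch of what a search request looks like from the metasearch engine's side when the local search engine speaks sru; the endpoint url is hypothetical, the request parameters (operation, version, query, maximumRecords, recordSchema) come from the sru specification, and the query itself is written in cql.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SRU_BASE = "https://catalog.example.edu/sru"   # hypothetical local search engine

def sru_search(cql_query, max_records=10):
    """send a cql query to an sru server and return the raw xml response."""
    params = {
        "operation": "searchRetrieve",   # sru request parameters
        "version": "1.1",
        "query": cql_query,              # e.g. 'title = "metasearch" and creator = "smith"'
        "maximumRecords": str(max_records),
        "recordSchema": "dc",            # ask for records as simple dublin core
    }
    url = SRU_BASE + "?" + urlencode(params)
    with urlopen(url) as response:
        return response.read()

# example: records_xml = sru_search('title = "metasearch"')
#
# the equivalent opensearch request is nothing more than the url template from
# the target's description record with the user's terms substituted in, e.g.
#   https://repo.example.edu/search?q=metasearch&format=rss   (hypothetical)
```

the point of the sketch is the division of labor the protocols imply: the hard part, translating the user's query into the cql string, sits with whoever writes the client or the content provider's cql translator, while the http request itself is trivial.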
similarly, the niso metasearch committee has defined mxg, a variant of sru that eases some of the requirements of sru.6 ฀ search protocol basics once again, the five basic pieces of information that a metasearch engine needs in order to communicate effectively with a local search engine are: (1) local search engine location, (2) the query-grammar expected, (3) the request encoding, (4) the response encoding, and (5) the record encoding. the four protocols provide these pieces of information to one degree or another (see table 1). the four protocols expose a site’s searching functionality and return responses in a standard format. all of these protocols have some common properties. they expect that the content provider will have a description record that describes the search service. all of these services send searches via http as simple urls, and the responses are sent back as structured xml. to ease implementation, opensearch 1.1 allows the content provider to return html instead of xml. all four protocols use a description record to describe the local search engine. the opensearch protocols define what a description record looks like, but not how it is retrieved. the location of the description record is discovered by some means outside the protocol (a priori knowledge). the description record specifies the location of the local search engine. the sru protocols define what a description record looks like and specifies that it can be obtained from the local search engine. the location of the local search engine is provided by a means outside the protocol (a priori knowledge again). each protocol defines how to formulate the search url. opensearch does this by having the local search-engine provider supply a template of the url in the description record. sru does this by defining the url. opensearch and mxg do not define how to formulate the query. the metasearch engine can either pass the user’s query along to the local search engine unchanged or reformulate the query based on information about the local search engine’s query language that it has gotten by outside means (more a priori knowledge). in the first case, the metasearch engine has to hope that some magic will happen and the local search engine will do something useful with the query. in the latter case, the metasearch engine’s staff has to develop a query translator. sru specifies a standard query grammar: cql (common query language).7 this means that the metasearch engine only has to write one translator for all the sru local search engines in the world. but it also means that all the sru local search engines have to support the cql query grammar. since there are no local search engines that support cql as their native query grammar, the content provider is left with the task of translating cql queries into their native query grammar. the query translation task has moved from the metasearch engine to the content provider. opensearch and sru | levan 153 opensearch 1.0, mxg, and sru define the structure of the query response. in the case of opensearch, the response is returned as an rss message, with a couple of extra elements added. mxg and sru define an xml schema for their responses. opensearch 1.1 allows the local search engine to return the response as unstructured html. this moves the requirement of creating a standard response from the content provider and leaves the metasearch engine with the much tougher task of finding the content embedded in html. 
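by way of contrast with screen-scraping html, the sketch below shows how little code a structured response requires. the url template is hypothetical and would come from a target's opensearch description record ({searchTerms} is the substitution point that record defines), and channel, item, title, and link are ordinary rss elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus
from urllib.request import urlopen

# hypothetical template taken from a target's opensearch description record
TEMPLATE = "https://repo.example.edu/search?q={searchTerms}&format=rss"

def opensearch_results(user_query):
    """fill in the url template, fetch the rss response, and return (title, link) pairs."""
    url = TEMPLATE.replace("{searchTerms}", quote_plus(user_query))
    with urlopen(url) as response:
        tree = ET.parse(response)
    items = tree.getroot().findall("./channel/item")
    return [(item.findtext("title", ""), item.findtext("link", "")) for item in items]

# the same few lines work for any target that returns the standard structure;
# a target that returns raw html instead needs its own hand-written, and
# fragile, screen-scraper.
```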
if the metasearch engine doesn't write code to parse the response, then all it can do is display the response. it will not be able to combine the response from the local search engine with the responses from other engines. sru and mxg require that records be returned in xml and that the local search engine must specify the schema for those records in the response. this leaves the content provider with the task of formatting the records according to the schema of their choice, a task that the content provider is probably best able to do. in turn, the metasearch engine can convert the returned records into some common format so that the records from multiple local search engines can be combined into a single response. because the records are encoded in xml, it is assumed that standard xml formatting tools can be used for the conversion. opensearch does not define how records should be structured. the opensearch response has a place for the title of the record and a url that points to the record. the structure of the record is undefined. this leaves the metasearch engine with the task of parsing the record that is returned. again, the effort moves from the content provider to the metasearch engine. if the metasearch engine does not or cannot parse the records, then it can at least display the records in some context, but it cannot combine them with the records from another local search engine.
conclusion
these protocols sit on a spectrum of complexity, trading the content provider's complexity for that of the search engine. however, with lessened complexity for the metasearch engine comes increased functionality for the user. metasearch engines have to choose what content providers they will search. those that provide a high level of functionality can be easily combined with their existing local search engines. content providers with a lower level of functionality will either need additional development by the metasearch engine or will not be searched. not all metasearch engines require the same level of functionality, nor will they be prepared to accept content with a low level of functionality. content providers, such as digital libraries and institutional repositories, will have to choose the functionality they need to support to reach the metasearch engines they desire.
table 1. comparison of requirements of four metasearch protocols for effective communication with local search engines
protocol feature               opensearch 1.1   opensearch 1.0   mxg        sru
local search engine location   a priori         a priori         a priori   a priori
request encoding               defined          defined          defined    defined
response encoding              none             rss              xml        xml
record encoding                none             none             xml        xml
query grammar                  none             none             none       cql
references and notes
1. joe barker, "meta-search engines," in finding information on the internet: a tutorial (u.c. berkeley: teaching library internet workshops, aug. 23, 2005 [last update]), www.lib.berkeley.edu/teachinglib/guides/internet/metasearch.html (accessed may 8, 2006).
2. a9.com, "opensearch specification," http://opensearch.a9.com/spec/ (accessed may 8, 2006); a9.com, "opensearch 1.1," http://opensearch.a9.com/spec/1.1/ (accessed may 8, 2006).
3. mark pilgrim, "what is rss?" o'reilly xml.com, dec. 18, 2002, www.xml.com/pub/a/2002/12/18/dive-into-xml.html (accessed may 8, 2006).
4. the library of congress network development and marc standards office, "z39.50 maintenance agency page," www.loc.gov/z3950/agency/ (accessed may 8, 2006).
5. national information standards organization, "niso metasearch initiative," www.niso.org/committees/ms_initiative.html (accessed may 8, 2006).
6. niso metasearch initiative task group 3, "niso metasearch xml gateway implementors guide, version 0.2," may 16, 2005, [microsoft word document] www.lib.ncsu.edu/nisomi/images/0/06/niso_metasearch_initiative_xml_gateway_implementors_guide.doc (accessed may 8, 2006); the library of congress, "sru: search and retrieve via url; sru version 1.1 13 february 2004," www.loc.gov/standards/sru/index.html (accessed may 8, 2006).
7. the library of congress, "common query language; cql version 1.1 13th february 2004," [web page] www.loc.gov/standards/sru/cql/index.html (accessed may 8, 2006).
information discovery insights gained from multipac, a prototype library discovery system
alex a. dolski
at the university of nevada las vegas libraries, as in most libraries, resources are dispersed into a number of closed "silos" with an organization-centric, rather than patron-centric, layout. patrons frequently have trouble navigating and discovering the dozens of disparate interfaces, and any attempt at a global overview of our information offerings is at the same time incomplete and highly complex. while consolidation of interfaces is widely considered to be desirable, certain challenges have made it elusive in practice. multipac is an experimental "discovery," or metasearch, system developed to explore issues surrounding heterogeneous physical and networked resource access in an academic library environment. this article discusses some of the reasons for, and outcomes of, its development at the university of nevada las vegas (unlv).
the case for multipac
fragmentation of library resources and their interfaces is a growing problem in libraries, and unlv libraries is no exception. electronic information here is scattered across our innovative webpac; our main website; our three branch library websites; remote article databases; local custom databases; local digital collections; special collections; other remotely hosted resources (such as libguides); and others. the number of these resources, as well as the total volume of content offered by the libraries, has grown over time (figure 1), while access provisions have not kept pace in terms of usability. in light of this dilemma, the libraries and various units within have deployed finding and search tools that provide browsing and searching access to certain subsets of these resources, depending on criteria such as the type of resource; its place within the libraries' organizational structure; its place within some arbitrarily defined topical categorization of library resources; the perceived quality of its content; and its uniqueness relative to other resources. these tools tend to be organization-centric rather than patron-centric, as they are generally provisioned in relative isolation from each other without placing as much emphasis on the big picture (figure 2). the result is, from the patron's perspective, a disaggregated mass of information and scattered finding tools that, to varying degrees, each accomplishes its own specific goals at the expense of macro-level findability.
currently, a comprehensive search for a given subject across as many library resources as possible might involve visiting a half-dozen interfaces or more—each one predicated upon awareness of each individual interface, its relation to the others, and the characteristics of its specific coverage of the corpus of library content.
figure 1. "silos" in the library
figure 2. organization-centric resource provisioning
alex a. dolski (alex.dolski@unlv.edu) is web & digitization application developer at the university of nevada las vegas libraries.
our library website serves as the de facto gateway to our electronic, networked content offerings. yet usability studies have shown that findability, when given our website as a starting point, is poor. undoubtedly this is due, at least in part, to interface fragmentation. test subjects, when given a task to find something and asked to use the library website as a starting point, fail outright in a clear majority of cases.1 multipac is a technical prototype that serves as an exploration of these issues. while the system itself breaks no new technical ground, it brings to the forefront critical issues of metadata quality, organizational structure, and long-term planning that can inform future actions regarding strategy and implementation of potential solutions at unlv and elsewhere. yet it is only one of numerous ways that these issues could be addressed.2 in an abstract sense, multipac is biased toward principles of simplification, consolidation, and unification. in theory, usability can be improved by eliminating redundant interfaces, consolidating search tools, and bringing together resource-specific features (e.g., opac holdings status) in one interface to the maximum extent possible (figure 3). taken to an extreme, this means being able to support searching all of our resources, regardless of type or location, from a single interface; abstracting each resource from whatever native or built-in user interface it might offer; and relying instead on its data interface for querying and result-set gathering. thus multipac is as much a proof-of-concept as it is a concrete implementation.
background: how multipac became what it is
multipac came about from a unique set of circumstances. from the beginning, it was intended as an exploratory project, with no serious expectation of it ever being deployed. our desire to have a working prototype ready for our discovery mini-conference meant that we had just six weeks of development time, which was hardly sufficient for anything more than the most agile of development models. the resulting design, while foundationally solid, was limited in scope and depth because of time constraints. another option, instead of developing multipac, would have been to demonstrate an existing open-source discovery system. the advantage of this approach is that the final product would have been considerably more advanced than anything we could have developed ourselves in six weeks. on the other hand, it might not have provided a comparable learning opportunity.
survey of similar systems
were its development to continue, multipac would find itself among an increasingly crowded field of competitors (table 1). a number of library discovery systems already exist, most backed by open-source or commercially available back-end search engines (table 2), which handle the nitty-gritty, low-level ingestion, indexing, and retrieval. these lists of systems are by no means comprehensive and do not include notable experimental or research systems, which would make them much longer.
table 1. some popular existing library discovery systems
name                 company/institution        commercial status
aquabrowser          serials solutions          commercial
blacklight           university of virginia     open-source (apache)
encore               innovative interfaces      commercial
extensible catalog   university of rochester    open-source (mit/gpl)
libraryfind          oregon state university    open-source (gpl)
metalib              ex libris                  commercial
primo                ex libris                  commercial
summon               serials solutions          commercial
vufind               villanova university       open-source (gpl)
worldcat local       oclc                       commercial
table 2. some existing back-end search servers
name                        company/institution    commercial status
endeca                      endeca technologies    commercial
idol                        autonomy               commercial
lucene                      apache foundation      open-source (apache)
search server               microsoft              commercial
search server express       microsoft              free
solr (superset of lucene)   apache foundation      open-source (apache)
sphinx                      sphinx technologies    open-source (gpl)
xapian                      community              open-source (gpl)
zebra                       index data             open-source (gpl)
architecture
in terms of how they carry out a search, metasearch applications can be divided into two main groups: distributed (or federated) search, in which searches are "broadcast" to individual resources that return results in real time (figure 4); and harvested search, in which searches are carried out against a local index of resource contents (figure 5).3 both have advantages and disadvantages beyond the scope of this article. multipac takes the latter approach. it consists of three primary components: the search server, the user interface, and the metadata harvesting system (figure 6).
figure 3. patron-centric resource provisioning
figure 4. the federated search process
figure 5. the harvested search process
figure 6. the three main components of multipac
search server
after some research, solr was chosen as the search server because of its ease of use, proven library track record, and http-based representational state transfer (rest) application programming interface (api), which improves network-topological flexibility, allowing it to be deployed on a different server than the front-end web application—an important consideration in our server environment.4 jetty—a java web application server bundled with solr—proved adequate and convenient for our needs. the metadata schema used by solr can be customized. we derived ours from the unqualified dublin core metadata element set (dcmes),5 with a few fields removed and some fields added, such as "library" and "department," as well as fields that support various multipac features, such as thumbnail images and primary record urls. dcmes was chosen for its combination of generality, simplicity, and familiarity. in practice, the solr schema is for finding purposes only, so whether it uses a standard schema is of little importance.
user interface
the front-end multipac system is written in php 5.2 in a model-view-controller design based on classical object design principles.
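as an illustration of the search server's role, the sketch below shows the kind of request a front end such as multipac's might send to solr's http api. the host, port, and core name are hypothetical, the facet and fq parameters are standard solr request parameters, and the field names (library, department, format) follow the locally extended dcmes-derived schema described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_SELECT = "http://localhost:8983/solr/multipac/select"  # hypothetical host and core

def search(query, scope_fq=None, rows=20):
    """run a faceted solr search, optionally scoped to a subset of resources."""
    params = [
        ("q", query),
        ("rows", str(rows)),
        ("wt", "json"),               # ask solr for a json response
        ("facet", "true"),
        ("facet.field", "library"),   # locally added, facetable schema fields
        ("facet.field", "department"),
        ("facet.field", "format"),
    ]
    if scope_fq:                      # e.g. 'library:"special collections"'
        params.append(("fq", scope_fq))
    with urlopen(SOLR_SELECT + "?" + urlencode(params)) as response:
        return json.load(response)

# results = search("las vegas springs", scope_fq='library:"special collections"')
# results["response"]["docs"] holds the matching records;
# results["facet_counts"]["facet_fields"] holds the facet lists.
```

scoping a search form to one unit's resources then amounts to fixing the fq value rather than standing up a separate search interface.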
to support modularity, new resources can be added as classes that implement a resource-class interface. the multipac html user interface is composed of five views: search, browse, results, item, and list, which exist to accommodate the finding process illustrated in figure 7. each view uses a custom html template that can be easily styled by nonprogrammer web designers. (needless to say, judging by figures 8–12, they haven’t been.) most dynamic code is encapsulated within dedicated “helper” methods in an attempt to decouple the templates from the rest of the system. output formats, like resources, are modular and decoupled from the core of the system. the html user interface is one of several interfaces available to the multipac system; others include xml and json, which effectively add web services support to all encompassed resources—a feature missing from many of the resources’ own built-in interfaces.6 n search view search view (figure 8) is the simplest view, serving as the “front page.” it currently includes little more than a brief introduction and search field. the search field is not complicated; it is, in fact, possible to include search forms on any webpage and scope them to any subset of resources on the basis of facet queries. for example, a search form could be scoped to las vegas–related resources in special collections, which would satisfy the demand of some library departments for custom search engines tailored to their resources without contributing to the “interface fragmentation” effect discussed in the introduction. (this would require a higher level of metadata quality than we currently have, which will be discussed in depth later.) because search forms can be added to any page, this view is not essential to the multipac system. to improve simplification, it could be easily removed and replaced with, for example, a search form on the library homepage. n browse view browse view (figure 9) is an alternative to search view, intended for situations in which the user lacks a “concrete target” (figure 7). as should be evident by its appearance, figure 7. the information-finding process supported by multipac figure 8. the multipac search view page 176 information technology and libraries | december 2009 this is the least-developed view, simply displaying facet terms in an html unordered list. notice the facet terms in the format field; this is malprocessed, marc– encoded information resulting from a quick-and-dirty extensible stylesheet language (xsl) transformation from marcxml to solr xml. n results view the results page (figure 10) is composed of three columns: 1. the left column displays a facet list—a feature generally found to be highly useful for results-gathering purposes.7 the data in the list is generated by solr and transformed to an html unordered list using php. the facets are configurable; fields can be made “facetable” in the solr schema configuration file. 2. the center column displays results for the current search query that have been provided by solr. thumbnails are available for resources that have them; generic icons are provided for those that do not. currently, the results list displays item title and description fields. some items have very rich descriptions; others have minimal descriptions or no descriptions at all. this happens to be one of several significant metadata quality issues that will be discussed later. 3. 
the right column displays results from nonindexed resources, including any that it would not be feasible to index locally, such as google, our article databases, and so on. multipac displays these resources as collapsed panes that expand when their titles are clicked and initiate an ajax request for the current search query. in a situation in which there might be twenty or more “panes” to load, performance would obviously suffer greatly if each one had to be queried each time the results page loaded. the on-demand loading process greatly speeds up the page load time. currently, the right column includes only a handful of resource panes—as many as could be developed in six weeks alongside the rest of the prototype. it is anticipated that further development would entail the addition of any number of panes—perhaps several dozen. the ease of developing a resource pane can vary greatly depending on the resource. for developerfriendly resources that offer a useful javascript object notation (json) api, it can take less than half an hour. for article databases, which vendors generally take great pains to “lock down,” the task can entail a two-day marathon involving trial-and-error http-request-token authentication and screen-scraping of complex invalid html. in some cases, vendor license agreements may prohibit this kind of use altogether. there is little we can do about this; clearly, one of multipac’s severest limitations is its lack of adeptness at searching these types of “closed” remote resources. n item view item view (figure 11) provides greater detail about an individual item, including a display of more metadata fields, an image, and a link to the item in its primary context, if available. it is expected that this view also would include holdings status information for opac resources, although this has not been implemented yet. the availability of various page features is dependent on values encoded in the item’s solr metadata record. for example, if an image url is available, it will be displayed; if not, it won’t. an effort was made to keep the view logic separate from the underlying resource to improve code and resource maintainability. the page template itself does not contain any resource-dependent conditionals. n list view list view (figure 12), essentially a “favorites” or “cart” view, is so named because it is intended to duplicate the list feature of unlv libraries’ innovative millennium figure 9. the multipac browse view page information discovery insights gained from multipac | dolski 177 opac. the user can click a button in either results view or item view to add items to the list, which is stored in a cookie. although currently not feature-rich, it would be reasonable to expect the ability to send the list as an e-mail or text message, as well as other features. n metadata harvesting system for metadata to be imported into solr, it must first be harvested. in the harvesting process, a custom script checks source data and compares it with local data. it downloads new records, updates stale records, and deletes missing records. not all resources support the ability to easily check for changed records, meaning that the full record set must be downloaded and converted during every harvest. in most cases, this is not a problem; most of our resources (the library catalog excluded) can be fully dumped in a matter of a few seconds each. in a production environment, the harvest scripts would be run automatically every day or so. 
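a rough sketch of the shape such a harvest script might take is given below. fetch_source_records() stands in for whatever dump or api a particular resource offers (an oai-pmh feed, a sql query, an xml export), the solr update url and core name are hypothetical, the field names again follow the dcmes-derived schema, and the sample record values are illustrative only.

```python
import json
from urllib.request import Request, urlopen

SOLR_UPDATE = "http://localhost:8983/solr/multipac/update?commit=true"  # hypothetical core

def fetch_source_records():
    """placeholder for a resource-specific dump; yields dicts keyed by a stable id."""
    return [
        {"id": "catalog:000001", "title": "goldfield: boom town of nevada",
         "description": "", "library": "main library"},   # illustrative record only
    ]

def post_json(payload):
    """send a json payload to solr's update handler and return the http status."""
    request = Request(
        SOLR_UPDATE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return response.status

def harvest(known_ids):
    """push the current source dump to solr and delete records gone from the source."""
    seen, docs = set(), []
    for rec in fetch_source_records():
        seen.add(rec["id"])
        docs.append({
            "id": rec["id"],
            "title": rec.get("title", ""),
            "description": rec.get("description", ""),
            "library": rec.get("library", ""),
        })
    post_json(docs)                        # a bare json array adds or updates documents
    gone = sorted(known_ids - seen)
    if gone:
        post_json({"delete": gone})        # delete-by-id for records no longer at the source
    return len(docs), len(gone)
```

run on a schedule, a script of this shape keeps the index in step with its source without manual intervention, which is all the production scenario described above really requires.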
in practice, every resource is different, necessitating a different harvest script. the open archives initiative protocol for metadata harvesting (oai-pmh) is the protocol that first jumps to mind as being ideal for metadata harvesting, but most of our resources do not support it. ideally, we would modify as many of them as possible to be oai–compliant, but that would still leave many that are out of our hands. either way, a substantial number of custom harvest scripts would still be required. for demonstration purposes, the multipac prototype was seeded with sample data from a handful of diverse resources: 1. a set of 16,000 marc records from our library catalog, which we converted to marcxml and then to solr xml using xsl transformations 2. our locally built las vegas architects and buildings database, a mysql database containing more than 10,000 rows across 27 tables, which we queried and dumped into xml using a php script 3. our locally built special collections database, a smaller mysql database, which we dealt with the same way 4. our contentdm digital collections, which we downloaded via oai-pmh and transformed using another custom xsl stylesheet there are typically a variety of conversion options for each resource. because of time constraints, we simply chose what we expected would be the quickest route for each, and did not pay much attention to the quality of the conversion. n how multipac answers unlv libraries’ discovery questions multipac has essentially proven its capability of solving interface multiplication and fragmentation issues. figure 10. the multipac results view page 178 information technology and libraries | december 2009 by adding a layer of abstraction between resource and patron, it enables us to reference abstract resources instead of their specific implementations—for example, “the library catalog” instead of “the innopac catalog.” this creates flexibility gains with regard to resource provision and deployment. this kind of “pervasive decoupling” can carry with it a number of advantages. first, it can allow us to provide custom-developed services that vendors cannot or do not offer. second, it can prevent service interruptions caused by maintenance, upgrades, or replacement of individual back-end resources. third, by making us less dependent on specific implementations of vendor products—in other words, reducing vendor “lock-in”—it can potentially give us leverage in vendor contract negotiations. because of the breadth of information we offer from our website gateway, we as a library are particularly sensitive about the continued availability of access to our resources at stable urls. when resources are not persistent, patrons and staff need to be retrained, expectations need to be adjusted, and hyperlinks—scattered all over the place—need to be updated. by decoupling abstract resources from their implementations, multipac becomes, in effect, its own persistent uri system, unifying many library resources under one stable uri schema. in conjunction with a url rewriting system on the web server, a resource-based uri schema (figure 13) would be both powerful and desirable.8 n lessons learned in the development of multipac the lessons learned in the development of multipac fall into three main categories, listed here in order of importance. metadata quality considerations quality metadata—characterized by unified schemas; useful crosswalking; and consistent, thorough description—facilitates finding and gathering. 
in practice, a surrogate record is as important as the resource it describes. below a certain quality threshold, its accompanying resource may never be found, in which case it may as well not exist. surrogate record quality influences relevance ranking and can mean the difference between the most relevant result appearing on page 1 or page 50 (relevance, of course, being a somewhat disputed term). solr and similar systems will search all surrogates, including those that are of poor quality, but the resulting relevancy ranking will be that much less meaningful. figure 13. example of an implementation-based vs. resource-based uri implementation-based http://www.library.unlv.edu/arch/archdb2/index.php/projects/view/1509 resource-based (hypothetical) http://www.library.unlv.edu/item/483742 figure 11. the multipac item view page figure 12. the multipac list view page information discovery insights gained from multipac | dolski 179 metadata quality can be evaluated on several levels, from extremely specific to extremely broad (figure 14). that which may appear to be adequate at one level may fail at a higher level. using this figure as an example, multipac requires strong adherence to level 5, whereas most of our metadata fails to reach level 4. a “level 4 failure” is illustrated in table 3, which compares sample metadata records from four different multipac resources. empty cells are not necessarily “bad”— not all metadata elements apply to all resources—but this type of inconsistency multiplies as the number of resources grows, which can have negative implications for retrieval. suggestions for improving metadata quality the results from the multipac project suggest that metadata rules should be applied strictly and comprehensively according to library-wide standards that, at our libraries, have yet to be enacted. surrogate records must be treated as must-have (rather than nice-to-have) features of all resources. resources that are not yet described in a system that supports searchable surrogate records should be transitioned to one that does; for example, html webpages should be transitioned to a content management system with metadata ascription and searchability features (at unlv, this is planned). however, it is not enough for resources to have high-quality metadata if not all schemas are in sync. there exist a number of resources in our library that are well-described but whose schemas do not mesh well with other resources. different formats are used; different descriptive elements figure 14. example scopes of metadata application and evaluation, from broad (top) to specific table 3. comparing sample crosswalked metadata from four different unlv libraries resources library catalog digital collections special collections database las vegas architects & buildings database title goldfield: boom town of nevada map of tonopah mining district, nye county, nevada 0361 : mines and mining collection flamingo hilton las vegas creator paher, stanley w. booker & bradford call number f849.g6p34 contents (item-level description of contents) format digital object photo collections database record language eng eng eng coverage tonopah mining district (nev.) ; ray mining district (nev.) 
description (omitted for brevity) publisher nevada publications university of nevada las vegas libraries unlv architecture studies library subject (lcsh omitted for brevity) (lcsh omitted for brevity) 180 information technology and libraries | december 2009 are used; and different interpretations, however subtle, are made of element meanings. despite the best intentions of everyone involved with its creation and maintenance, and despite the high quality of many of our metadata records when examined in isolation, in the big picture, multipac has demonstrated—perhaps for the first time—how much work will be needed to upgrade our metadata for a discovery system. would the benefits make the effort worthwhile? would the effort be implementable and sustainable given the limitations of the present generation of “silo” systems? what kind of adjustments would need to be made to accommodate effective workflows, and what might those workflows look like? these questions still await answers. of note, all other open-source and vendor systems suffer from the same issues, which is a key reason that these types of systems are not yet ascendant in libraries.9 there is much promise in the ability of infrastructural standards like frbr, skos, rda, and the many other esoteric information acronyms to pave the way for the next generation of library discovery systems. organizational considerations electronic information has so far proved relatively elusive to manage; some of it is ephemeral in existence, most of it is constantly changing, and all of it is from diverse sources. attempts to deal with electronic resources—representing them using catalog surrogate records, streamlining website portals, farming out the problem to vendors—have not been as successful as they have needed to be and suffer from a number of inherent limitations. multipac would constitute a major change in library resource provision. our library, like many, is for the most part organized around a core 1970s–80s ils–support model that is not well adapted to a modern unified discovery environment. next-generation discovery is trending away from assembly-line-style acquisition and processing of primarily physical resources and toward agglomerating interspersed networked and physical resource clouds from onand offsite.10 in this model, increasing responsibilities are placed on all content providers to ensure that their metadata conforms to site-wide protocols that, at our library, have yet to be developed. n conclusion in deciding how to best deal with discovery issues, we found that a traditional product matrix comparison does not address the entire scope of the problem, which is that some of the discoverability inadequacies in our libraries are caused by factors that cannot be purchased. sound metadata is essential for proper functioning of a unified discovery system, and descriptive uniformity must be ensured on multiple levels, from the element level to the institution level. technical facilitators of improved discoverability already exist; the responsibility falls on us to adapt to the demands of future discovery systems. the specific discovery tool itself is only a facilitator, the specific implementation of which is likely to change over time. what will not change are library-wide metadata quality issues that will serve any tool we happen to deploy. 
the multipac project brought to light important library-wide discoverability issues that may not have been as obvious before, exposing a number of limitations in our existing metadata as well as giving us a glimpse of what it might take to improve our metadata to accommodate a next-generation discovery system, in whatever form that might take. references 1. unlv libraries usability committee, internal library website usability testing, las vegas, 2008. 2. karen calhoun, “the changing nature of the catalog and its integration with other discovery tools.” report prepared for the library of congress, 2006. 3. xiaoming liu et al., “federated searching interface techniques for heterogeneous oai repositories,” journal of digital information 4, no. 2 (2002). 4. apache software foundation, apache solr, http://lucene .apache.org/solr/ (accessed june 11, 2009). 5. dublin core metadata initiative, “dublin core metadata element set, version 1.1,” jan. 14, 2008, http://dublincore.org/ documents/dces/ (accessed june 25, 2009). 6. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, jan. 20, 2006, http://orweblog.oclc .org/archives/000927.html (accessed july 15, 2009). 7. tod a. olson, “utility of a faceted catalog for scholarly research,” library hi tech 4, no. 25 (2007): 550–61. 8. tim berners-lee, “hypertext style: cool uris don’t change,” 1998, http://www.w3.org/provider/style/uri (accessed june 23, 2009). 9. bowen, jennifer, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology and libraries 2, no. 27 (june 2008): 6–19. 10. calhoun, “the changing nature of the catalog.” microsoft word 5699-11611-7-ce.docx geographic  information  and  technologies   in  academic  libraries:  an  arl  survey  of   services  and  support       ann  l.  holstein     information  technology  and  libraries  |  march  2015             38   abstract   one  hundred  fifteen  academic  libraries,  all  current  members  of  the  association  of  research  libraries   (arl),  were  selected  to  participate  in  an  online  survey  in  an  effort  to  better  understand  campus   usage  of  geographic  data  and  geospatial  technologies,  and  how  libraries  support  these  uses.  the   survey  was  used  to  capture  information  regarding  geographic  needs  of  their  respective  campuses,  the   array  of  services  they  offer,  and  the  education  and  training  of  geographic  information  services   department  staff  members.  the  survey  results,  along  with  review  of  recent  literature,  were  used  to   identify  changes  in  geographic  information  services  and  support  since  1997,  when  a  similar  survey   was  conducted  by  arl.  this  new  study  has  enabled  recommendations  to  be  made  for  building  a   successful  geographic  information  service  center  within  the  campus  library  that  offers  a  robust  and   comprehensive  service  and  support  model  for  all  geographic  information  usage  on  campus.   introduction   in  june  1992,  the  arl  in  partnership  with  esri  (environmental  systems  research  institute)   launched  the  gis  (geographic  information  systems)  literacy  project.  
this  project  sought  to   “introduce,  educate,  and  equip  librarians  with  the  skills  necessary”  to  become  effective  gis  users   and  to  learn  how  to  provide  patrons  with  “access  to  spatially  referenced  data  in  all  formats.”1   through  the  implementation  of  a  gis  program,  libraries  can  provide  “a  means  to  have  the   increasing  amount  of  digital  geographic  data  become  a  more  useful  product  for  the  typical   patron.”2     in  1997,  five  years  after  the  gis  literacy  project  began,  a  survey  was  conducted  to  elucidate  how   arl  libraries  support  patron  gis  needs.  the  survey  was  distributed  to  121  arl  members  for  the   purpose  of  gathering  information  about  gis  services,  staffing,  equipment,  software,  data,  and   support  these  libraries  offered  to  their  patrons.  seventy-­‐two  institutions  returned  the  survey,  a  60%   response  rate.  at  that  time,  nearly  three-­‐quarters  (74%)  of  the  respondents  affirmed  that  their   library  administered  some  level  of  gis  services.3  this  indicates  that  the  gis  literacy  project  had  an   evident  positive  impact  on  the  establishment  of  gis  services  in  arl  member  libraries.   since  then,  it  has  been  recognized  that  the  rapid  growth  of  digital  technologies  has  had  a   tremendous  effect  on  gis  services  in  libraries.4  we  acknowledge  the  importance  of  assessing     ann  l.  holstein  (ann.holstein@case.edu)  is  gis  librarian  at  kelvin  smith  library,  case  western   reserve  university,  cleveland,  ohio.     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   39   how  geographic  services  in  academic  research  libraries  have  further  evolved  over  the  past  17   years  in  response  to  these  advancing  technologies  as  well  as  the  increasingly  demanding   geographic  information  needs  of  their  user  communities.     method   for  this  study,  115  academic  libraries,  all  current  members  of  arl  as  of  january  2014,  were   invited  to  participate  in  an  online  survey  in  an  effort  to  better  understand  campus  usage  of   geographic  data  and  geospatial  technologies  and  how  libraries  support  these  uses.  similar  in   nature  to  the  1997  arl  survey,  the  2014  survey  was  designed  to  capture  information  regarding   geographic  needs  of  their  respective  campuses,  the  array  of  services,  software.  and  support  the   academic  libraries  offer,  and  the  education  and  training  of  geographic  information  services   department  staff  members.  our  aim  was  to  be  able  to  determine  the  range  of  support  patrons  can   anticipate  at  these  libraries  and  ascertain  changes  in  gis  library  services  since  the  1997  survey.   a  cross-­‐sectional  survey  was  designed  and  administered  using  qualtrics,  an  online  survey  tool.  it   was  distributed  in  january  2014  via  email  to  the  person  identified  as  the  subject  specialist  for   mapping  and/or  geographic  information  at  each  arl  member  academic  library.  when  the  survey   closed  after  two  weeks,  54  institutions  had  responded  to  the  survey.  this  accounts  for  47%   participation.  responding  institutions  are  listed  in  the  appendix.   
results
software and technologies
we were interested in learning about what types of geographic information software and technologies are currently being offered at academic research libraries. results show that 100% of survey respondents offer gis software/mapping technologies at their libraries, 36% offer remote sensing software (to process and analyze remotely sensed data such as aerial photography and satellite imagery), and 36% offer global positioning system (gps) equipment and/or software. nearly all (98%) said that their libraries provide esri arcgis software, with 83% also providing access to google maps and google earth, and 35% providing qgis (previously known as quantum gis). smatterings of other gis, remote-sensing, and gps products are also offered by some of the libraries, although not in large numbers (see table 1 for full listing). the fact that nearly all survey respondents offer arcgis software at their libraries comes as no surprise. arcgis is the most commonly provided mapping software available in academic libraries, and in 2011 it was determined that 2,500 academic libraries were using esri products.5 esri software was most popular in 1997 as well, undoubtedly because esri offered free software and training to participants of the gis literacy project.6
table 1. geographic information software/mapping technologies provided at arl member academic libraries (2014)
software/technology   type             % of providing libraries
esri arcgis           gis              98
google maps/earth     gis              83
qgis                  gis              35
autocad               gis              19
erdas imagine         remote sensing   19
grass                 gis              15
envi                  remote sensing   15
geoda                 gis              6
pci geomatica         remote sensing   6
garmin map source     gps              6
simplymap             gis              4
trimble terrasync     gps              4
google maps and google earth, launched in 2005, have quickly become very popular mapping products used at academic libraries—a close second only to esri arcgis. in addition to being free, their ease of use, powerful visualization capabilities, and "customizable map features and dynamic presentation tools" make them attractive alternatives to commercial gis software products.7 since 1997, many software programs have fallen out of favor. mapinfo, idrisi, maptitude, and sammamish data finder/geosight pro were gis software programs listed in the 1997 survey results that are not used today at arl member academic libraries.8 instead, open-source software such as qgis, grass, and geoda is growing in popularity; it is free to use, and its source code may be modified as needed. gps equipment lending can be very beneficial to students and campus researchers who need to collect their own field-research locational data. the 2014 survey found that 30% of respondents loan recreational gps equipment at their libraries and 10% loan mapping-grade gps equipment.
the  high  cost  of  mapping-­‐grade  gps  equipment  (several  thousand  dollars)  may  be  a  barrier  for   some  libraries;  however,  this  is  the  type  of  equipment  recommended  in  best-­‐practice  methods  for   gathering  highly  accurate  gps  data  for  research.  in  addition  to  expense,  complexity  of  operation  is   another  consideration.  while  it  is  “fairly  simple  to  use  a  recreational  gps  unit,”  a  certain  level  of   advanced  training  is  required  for  operating  mapping-­‐grade  gps  equipment.9  a  designated  staff   member  may  need  to  take  on  the  responsibility  of  becoming  the  in-­‐house  gps  expert  and  routinely   offer  training  sessions  to  those  interested  in  borrowing  mapping-­‐grade  gps  equipment.     location     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   41   at  36%  of  responding  libraries,  the  geographic  information  services  area  is  located  where  the   paper  maps  are  (map  department/services);  19%  have  separated  this  area  and  designated  it  as  a   geospatial  data  center,  gis,  or  data  services  department;  13%  integrate  it  with  the  reference   department;  and  just  4%  of  libraries  house  the  gis  area  in  government  documents.  table  2  lists  all   reported  locations  for  this  service  area.  not  surprisingly,  in  1997,  government  documents  (39%)   was  just  as  popular  a  location  for  this  service  area  as  within  the  map  department  (43%).10   libraries  identified  government  documents  as  a  natural  fit,  keeping  gis  services  within  close   proximity  to  spatial  data  sets  recently  being  distributed  by  government  agencies,  most  notably  the   us  government  printing  office  (gpo).  these  agencies  had  made  the  decision  to  distribute  “most   data  in  machine  readable  form,”11  including  the  1990  census  data  as  topographically  integrated   geographic  encoding  and  referencing  (tiger)  files.12  gis  technologies  were  needed  to  access  and   most  effectively  use  information  within  these  massive  spatial  datasets.     location   %  of  libraries  (1997)   %  of  libraries  (2014)   map  department/services   43   36   government  documents   39   4   reference   10   13   geospatial  data  center,  gis,  or  data  services   3   19   not  in  any  one  location   -­‐   9   digital  scholarship  center   -­‐   6   combined  area  (i.e.,  map  dept.  &  gov.  docs.)   -­‐   6   table  2.  location  of  the  geographic  information  services  area  within  the  library  (1997  and  2014)   at  59%  of  responding  libraries,  geographic  information  software  is  available  on  computer   workstations  in  a  designated  area,  such  as  within  the  map  department.  however,  many  do  not   restrict  users  by  location  and  have  the  software  available  on  all  computer  workstations   throughout  the  library  (37%)  or  on  designated  workstations  distributed  throughout  the  library   (33%).  a  small  percentage  (7%)  loan  laptops  to  patrons  with  the  software  installed,  allowing  full   mobility  throughout  the  entire  library  space.   staffing   most  professional  staff  working  in  the  geographic  information  services  department  hold  one  or   more  postbaccalaureate  advanced  degrees.  
of  113  geographic  services  staff  at  responding   libraries,  65%  had  obtained  an  ma/ms,  mls/mlis,  or  phd;  43%  have  one  advanced  degree,  while   22%  have  two  postbaccalaureate  degrees.  half  (50%)  hold  an  mls/mlis,  31%  hold  an  ma/ms,   and  6%  hold  a  phd.  nearly  one-­‐third  (31%)  have  obtained  a  ba/bs  as  their  highest  educational   degree,  3%  had  a  two-­‐year  technical  degree,  and  2%  had  only  earned  a  ged  or  high  school   diploma.  in  1997,  84%  of  gis  librarians  and  specialists  at  arl  libraries  had  an  mls  degree.13  at   that  time,  the  incumbent  was  most  often  recruited  from  within  the  library  to  assume  this  new  role,     information  technology  and  libraries  |  march  2015   42   whereas  today’s  gis  professionals  are  just  as  likely  to  come  from  nonlibrary  backgrounds,   bringing  their  expertise  and  advanced  geographic  training  to  this  nontraditional  librarian  role.     figure  1.  highest  educational  degree  of  geographic  services  staff  (2014)   on  average,  this  department  is  staffed  by  two  professional  staff  members  and  three  student  staff.   student  employees  can  be  a  terrific  asset,  especially  if  they  have  been  previously  trained  in  gis.   students  are  likely  to  be  recruited  from  departments  that  are  the  heaviest  gis  users  at  the   university  (i.e.,  geography,  geology).  some  libraries  have  implemented  “co-­‐op”  programs  where   students  can  receive  credit  for  working  at  the  gis  services  area.  these  dual-­‐benefit  positions  are   quite  lucrative  to  students.14     campus  users   in  a  typical  week  during  the  course  of  a  semester,  responding  libraries  each  serve  approximately   sixteen  gis  users,  four  remote  sensing  users,  and  three  gps  users.  these  users  may  obtain   assistance  from  department  staff  either  in-­‐person  or  remotely  via  phone  or  email.     on  average,  undergraduate  and  graduate  students  compose  the  majority  (75%)  of  geographic   service  users  (32%  and  43%,  respectively).  faculty  members  compose  14%  of  the  users,  followed   by  staff  (including  postdoctoral  researchers)  at  7%.  some  institutions  also  provide  support  to   public  patrons  and  alumni  (4%  and  1%,  respectively).  in  1997,  it  was  estimated  that  on  average,   63%  of  gis  users  were  students,  22%  were  faculty,  8%  were  staff,  and  8%  were  public.15   ged/hs   2%   2yr  tech   3%   ba/bs   31%   ma/ms/mlis   58%   phd   6%     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   43     figure  2.  comparison  of  the  percentage  of  geographic  service  users  by  patron  status  (1997  and   2014)   the  top  three  departments  that  use  gis  software  at  arl  campuses  are  environmental   science/studies,  urban  planning/studies,  and  geography.  the  most  frequent  remote  sensing   software  users  come  from  the  departments  of  environmental  science/studies,  geography,  and   archaeology.  gps  equipment  loan  and  software  usage  is  most  popular  with  the  departments  of   environmental  science/studies,  geography,  biology/ecology  and  archaeology  (see  table  3  for  full   listing).  some  departments  are  heavy  users  of  all  geographic  technologies,  while  others  have   shown  interest  in  only  one.  
for example, the departments of psychology and medicine/dentistry have used gis but have expressed little or no interest in using remote-sensing or gps technologies.

support and services

the campus community is supported by library staff in a variety of ways with regard to gis, remote-sensing, and gps technology and software use. nearly all (94%) libraries provide assistance using the software for specific class assignments and projects, and 78% are able to provide more in-depth research project consultations. more than one-quarter (27%) of reporting libraries will make custom gis maps for patrons, although there may be a charge depending on the library, project, and patron type (10%). most (90%) offer basic use and troubleshooting support; however, just 39% offer support for software installation, and 55% offer technical support for problems such as licensing issues and turning on extensions. the campus computing center or information technology services (its) at arl institutions most likely fields some of the software installation and technical issues rather than the library, thus accounting for the lower percentages.

a variety of software training may be offered to the campus community through the library; 80% of responding libraries make visits to classes to give presentations and training sessions, 69% host workshops, 47% provide opportunities for virtual training courses and tutorials, and 4% offer certificate training programs.

department | gis | remote sensing | gps
anthropology | 24 | 10 | 8
archaeology | 24 | 14 | 13
architecture | 24 | 1 | 6
biology/ecology | 32 | 10 | 13
business/economics | 23 | 1 | 3
engineering | 18 | 9 | 11
environmental science/studies | 41 | 22 | 16
forestry/wildlife/fisheries | 21 | 12 | 10
geography | 35 | 22 | 15
geology | 31 | 12 | 10
history | 27 | 2 | 2
information sciences | 14 | 1 | 0
nursing | 8 | 1 | 2
medicine/dentistry | 9 | 0 | 0
political science | 25 | 3 | 5
psychology | 4 | 0 | 0
public health/epidemiology/biostatistics | 30 | 3 | 9
social work | 2 | 0 | 1
sociology | 22 | 0 | 3
soil science | 17 | 5 | 4
statistics | 8 | 3 | 0
urban planning/studies | 36 | 7 | 9

table 3. number of arl libraries reporting frequent users of gis, remote-sensing, or gps software and technologies from a campus department (2014)

often, the library is not the only place people can go to obtain software support and training on campus. most (86%) responding libraries state that their university offers credit courses, and 41% of campuses have a gis computer lab located elsewhere on campus that may be utilized. its is available for assistance at 29% of the universities, and continuing education offers some level of training and support at 14% of campuses.

data collection and access

most (85%) of responding libraries collect geographic data and allocate an annual budget for it.
“libraries that have invested money in proprietary software and trained staff members will tend to also develop and maintain their own collection of data resources.”16 of those collecting data, 26% spend less than $1,000 annually, 15% spend between $1,000 and $2,499, 17% spend between $2,500 and $5,000, while 41% spend more than $5,000. in 1997, 79% of libraries spent less than $2,000 annually, and only 9% spent more than $5,000.17

figure 3. annual budget allocations for geographic data (2014)

a dramatic shift has occurred over the years in budget allocations for data sets. no longer are academic libraries just collecting free government data sets, as was typically the case in 1997; they are investing much more of their materials budgets in building up geographic data collections for their users.

data is made accessible to campus users in a variety of ways. a majority (84%) offer data via remote access or download from a networked campus computer, using a virtual private network (vpn) or login. more than half (62%) of responding libraries provide access to data from workstations within the library, and 64% lend cd-roms.

roughly one-quarter (26%) of responding libraries provide users with storage for their data. of those, 29% have a dedicated geographic data server, 14% use the main library server, 29% point users to the university server or institutional repository, and 36% allow users to store their data directly onto a library computer workstation hard drive.

internal use of gis in libraries

geographic information technologies may be used internally to help patrons navigate the library’s physical collections and efficiently locate print materials. of the survey respondents, 60% use gis for map or air photo indexing, 27% use the technology to create floor maps of the library building, and 15% use it to map the library’s physical collections. “the use of gis in mapping library collections is one of the non-traditional but useful applications of gis.”18 gis can be used to link library materials to simulated views of floor maps through location codes.19 this enables patrons to determine the exact location of library material by providing them with item “location details such as stacks, row, rack, shelf numbers, etc.”20 the gis system can become a useful tool for collection management and can be a tremendous time-saver for patrons, especially those unfamiliar with the cataloging system or collection layout.

discussion

recommendations for building a successful geographic information service center

the geographic information services area is often a blend of the traditional and modern. it can extend to paper maps, atlases, gps equipment, software manuals, large-format scanners, printers, and gis.
gis services may include a cluster of computers with gis software installed, an accessible collection of gis data resources, and assistance available from the library staff. the question for academic libraries today is no longer “whether to offer gis services but what level of service to offer.”21 every university has different gis needs, and the library must decide how it can best support these needs. there is no set formula for building a geographic information service center because each institution “has a different service mission and user base.”22 every library’s gis service program will be designed with its unique institutional needs in mind; however, each will incorporate some combination of hardware, software, data, and training opportunities provided by at least one knowledgeable staff member.23

“gis represents a significant investment in hardware, software, staffing, data acquisition, and ongoing staff development. either new money or significant reallocation is required.”24 establishing new gis services in the library, or enhancing existing ones, requires the “serious assessment of long-term support and funding needs.”25 commitment of the university as a whole, or at least support from senior administration, “library administration, and related campus departments,” is crucial to its success.26 receiving “more funding will mean more staff, better trained staff, a more in-depth collection, better hardware and software, and the ability to offer multiple types of gis services.”27

once funding for this endeavor has been secured, it is of utmost importance to recruit a gis professional to manage the geographic information service center. to be most effective in this position, the incumbent should possess a graduate degree in gis or geography; however, depending on what additional responsibilities would be required of the candidate (e.g., reference or cataloging), a second degree in library science is strongly recommended. this staff member should possess mapping and gis skills, which include experience with esri software and remote sensing technologies. employees in this position may be given job titles such as “gis specialists, gis/data librarians, gis/map librarians, digital cartographers, spatial data specialists, and gis coordinators.”28

with the new staff member on board, hereafter referred to as “gis specialist,” decisions such as what software to provide, which data sets to collect, and what types of training and support to offer to the campus can be made. consulting with research centers and academic departments that currently use or are interested in using gis and remote sensing technologies is a good place to learn about software, data, and training needs and to determine the focus and direction of the geographic information services department.29 campus users often come from academic departments that “have neither staff nor facilities to support gis,” and “may only consist of one or two faculty and a few graduate students.
these gis users need access to software, data, and expertise from a centralized, accessible source of research assistance, such as the library.”30

at minimum, esri arcgis, google maps, and google earth should be supported, with additional remote sensing or open source gis software depending on staff expertise and known campus needs. when purchasing commercial software licenses, such as for esri arcgis, discounts for educational institutions are usually available. additionally, negotiating campus-wide software licenses may be a good option to consider, as the costs are usually far less than purchasing individual or floating licenses. costs for campus-wide licensing are typically determined by the number of full-time equivalent (fte) students enrolled at the university.

facilitating “access to educational resources such as software tools and applications, how-to guides for data and software,” and tutorials is crucial.31 the gis specialist must be familiar with how gis software can be used by many disciplines, the availability of “training courses or tutorials, sources of extensible gis software, and hundreds of software and application books.”32 tutorials may be provided directly from a software vendor (e.g., esri virtual campus) or developed in-house by the gis specialist. creating “gis tutorials on short, task-based techniques such as georeferencing or geocoding” and making them readily available online or as a handout may save staff time otherwise spent repeatedly explaining these techniques to patrons.33

geospatial data collection development is a core function of the geographic information services department. to effectively develop the data collection, the gis specialist must fully comprehend the needs of the user community as well as possess a “fundamental understanding of the nature and use of gis data.”34 this is often referred to as “spatial literacy.”35 it is crucial to keep abreast of “recent developments, applications, and data sets.”36

the gis specialist will spend much more time searching for and acquiring geographic data sets than selecting and purchasing traditional print items such as maps, monographs, and journals for the collection. a budget should be established annually for the purchase of all geographic materials, both print and digital. a great challenge for the specialist is to acquire data at the lowest cost possible. while a plethora of free data is available online from government agencies and nonprofit organizations, other data, available only from private companies, may be quite expensive because of the high production costs. a collection development policy should be created that indicates the types of materials and data collected and specifies geographic regions, formats, and preferred scales.37 the needs of the user community must be carefully considered when establishing the policy.
the expertise of the gis specialist is needed not only to help patrons locate the appropriate geographic data, but also to use the software to process, interpret, and analyze it. “only the few library patrons that have had gis experience are likely to obtain any level of success without intervention by library staff”;38 thus, for any mapping program installed on a library computer, “staff must have working knowledge of the program” and must be able to provide support to users.39 furthermore, the gis specialist must be able to train patrons to use the software to complete common tasks such as file format conversion, data projection, data manipulation, and geoprocessing. these geospatial technologies involve a steep learning curve, and unfortunately “hands-on training options outside the university are often cost-prohibitive” for many.40 the campus community requires training opportunities that are both convenient and inexpensive.

teaching hands-on geospatial technology workshops, from basic to advanced, is fundamental to educating the campus community. workshops will “vary from institution to institution, with some offering students an introduction to mapping and others focusing on specific features of the program, such as georeferencing, geocoding, and spatial analysis. some also offer workshops that are theme specific,” such as “working with census data” or “digital elevation modeling.”41 custom workshops or training sessions can be developed to meet a specific campus need, tailored for a specific class in consultation with an instructor, or designed especially for other library staff.

today’s geographic information service center

the academic map librarian from the 1970s or 1980s would hardly recognize today’s geographic information service center. what was once a room of map cases and shelves of atlases and gazetteers is now a bustling geospatial center. computers, powerful gis and remote-sensing technologies, gps devices, digital maps, and data are now available to library patrons. every library surveyed provides gis software to campus users, and 85% also actively collect gis and remotely sensed data. with the assistance of expertly trained library staff, users with no or limited experience using geospatial technologies can analyze spatial data sets and create custom maps for coursework, projects, and research. nearly all surveyed libraries (94%) have staff that can assist students specifically with software use for class assignments and projects, while 90% provide assistance with more generalized use of the software. a majority of libraries also offer a variety of software training sessions and workshops and give presentations to the campus community. all this is made possible through the library’s commitment to this service area and the availability of highly trained professional staff, most of whom hold a master’s or doctoral degree. the library has truly established itself as the go-to location on campus for spatial mapping and analysis.
this role has only strengthened in the years since the launch of the arl gis literacy project in 1992.

references

1. d. kevin davie et al., comps., spec kit 238: the arl geographic information systems literacy project (washington, dc: association of research libraries, office of leadership and management services, 1999), 16.
2. ibid., 3.
3. ibid., i.
4. abraham parrish, “improving gis consultations: a case study at yale university library,” library trends 55, no. 2 (2006): 328, http://dx.doi.org/10.1353/lib.2006.0060.
5. eva dodsworth, getting started with gis: a lita guide (new york: neal-schuman, 2012), 161.
6. davie et al., spec kit 238, i.
7. eva dodsworth and andrew nicholson, “academic uses of google earth and google maps in a library setting,” information technology & libraries 31, no. 2 (2012): 102, http://dx.doi.org/10.6017/ital.v31i2.1848.
8. davie et al., spec kit 238, 8.
9. gregory h. march, “surveying campus gis and gps users to determine role and level of library services,” journal of map & geography libraries 7, no. 2 (2011): 170–71, http://dx.doi.org/10.1080/15420353.2011.566838.
10. davie et al., spec kit 238, 5.
11. george j. soete, spec kit 219: transforming libraries issues and innovation in geographic information systems (washington, dc: association of research libraries, office of management services, 1997), 5.
12. camila gabaldón and john repplinger, “gis and the academic library: a survey of libraries offering gis services in two consortia,” issues in science and technology librarianship 48 (2006), http://dx.doi.org/10.5062/f4qj7f8r.
13. davie et al., spec kit 238, 5.
14. soete, spec kit 219, 9.
15. davie et al., spec kit 238, 10.
16. dodsworth, getting started with gis, 165.
17. davie et al., spec kit 238, 9.
18. d. n. phadke, geographical information systems (gis) in library and information services (new delhi: concept, 2006), 36–37.
19. ibid., 13.
20. ibid., 74.
21. rhonda houser, “building a library gis service from the ground up,” library trends 55, no. 2 (2006): 325, http://dx.doi.org/10.1353/lib.2006.0058.
22. melissa lamont and carol marley, “spatial data and the digital library,” cartography and geographic information systems 25, no. 3 (1998): 143, http://dx.doi.org/10.1559/152304098782383142.
23. carolyn d. argentati, “expanding horizons for gis services in academic libraries,” journal of academic librarianship 23, no. 6 (1997): 463, http://dx.doi.org/10.1559/152304098782383142.
24. soete, spec kit 219, 11.
25. carol cady et al., “geographic information services in the undergraduate college: organizational models and alternatives,” cartographica 43, no. 4 (2008): 249, http://dx.doi.org/10.3138/carto.43.4.239.
26. houser, “building a library,” 325.
27. r. b. parry and c. r. perkins, eds., the map library in the new millennium (chicago: american library association, 2001), 59–60.
28. patrick florance, “gis collection development within an academic library,” library trends 55, no. 2 (2006): 223, http://dx.doi.org/10.1353/lib.2006.0057.
29. houser, “building a library,” 325.
30. ibid., 323.
31. ibid., 322.
32. parrish, “improving gis,” 329.
33. ibid., 336.
34. florance, “gis collection development,” 222.
35. soete, spec kit 219, 6.
36. dodsworth, getting started with gis, 165.
37. soete, spec kit 219, 8.
38. gabaldón and repplinger, “gis and the academic library.”
39. dodsworth, getting started with gis, 164.
40. houser, “building a library,” 323.
41. dodsworth, getting started with gis, 161–62.

appendix

responding institutions

arizona state university libraries
auburn university libraries
boston college libraries
university of calgary libraries and cultural resources
university of california, los angeles, library
university of california, riverside, libraries
university of california, santa barbara, libraries
case western reserve university libraries
colorado state university libraries
columbia university libraries
university of connecticut libraries
cornell university library
dartmouth college library
duke university library
university of florida libraries
georgetown university library
university of hawaii at manoa library
university of illinois at chicago library
university of illinois at urbana-champaign library
indiana university libraries bloomington
johns hopkins university libraries
university of kansas libraries
mcgill university library
university of manitoba libraries
university of maryland libraries
massachusetts institute of technology libraries
university of miami libraries
university of michigan library
michigan state university libraries
university of nebraska–lincoln libraries
new york university libraries
university of north carolina at chapel hill libraries
north carolina state university libraries
northwestern university library
university of oregon libraries
university of ottawa library
university of pennsylvania libraries
pennsylvania state university libraries
purdue university libraries
queen’s university library
rice university library
university of south carolina libraries
university of southern california libraries
syracuse university library
university of tennessee, knoxville, libraries
university of texas libraries
texas tech university libraries
university of toronto libraries
tulane university library
vanderbilt university library
university of waterloo library
university of wisconsin–madison libraries
yale university library
york university libraries

using ajax to empower dynamic searching

judith wusteman and pádraig o’hiceadha

judith wusteman (judith.wusteman@ucd.ie) is a lecturer in the ucd school of information and library studies, university college dublin, ireland.

the use of ajax, or asynchronous javascript + xml, can result in web applications that demonstrate the flexibility, responsiveness, and usability traditionally found only in desktop software. to illustrate this, a repository metasearch user interface, ojax, has been developed. ojax is simple, unintimidating but powerful.
it attempts to minimize upfront user investment and provide immediate dynamic feedback, thus encouraging experimentation and enabling enactive learning. this article introduces the ajax approach to the development of interactive web applications and discusses its implications. it then describes the ojax user interface and illustrates how it can transform the user experience.

with the introduction of the ajax development paradigm, the dynamism and richness of desktop applications become feasible for web-based applications. ojax, a repository metasearch user interface, has been developed to illustrate the potential impact of ajax-empowered systems on the future of library software.1 this article describes the ajax method, highlights some uses of ajax technology, and discusses the implications for web applications. it goes on to illustrate the user experience offered by the ojax interface.

■ ajax

in february 2005, the term ajax acquired an additional meaning: asynchronous javascript + xml.2 the concept behind this new meaning, however, has existed in various forms for several years. ajax is not a single technology but a general approach to the development of interactive web applications. as the name implies, it describes the use of javascript and xml to enable asynchronous communication between browser clients and server-side systems.

as explained by garrett, the classic web application model involves user actions triggering a hypertext transfer protocol (http) request to a web server.3 the latter processes the request and returns an entire hypertext markup language (html) page. every time the client makes a request to the server, it must wait for a response, thus potentially delaying the user. this is particularly true for large data sets. but research demonstrates that response times of less than one second are required when moving between pages if unhindered navigation is to be facilitated through an information space.4

the aim of ajax is to avoid this wait. the user loads not only a web page, but also an ajax engine written in javascript. users interact with this engine in the same way that they would with an html page, except that instead of every action resulting in an http request for an entire new page, user actions generate javascript calls to the ajax engine. if the engine needs data from the server, it requests this asynchronously in the background. thus, rather than requiring the whole page to be refreshed, the javascript can make rapid incremental updates to any element of the user interface via brief requests to the server. this means that the traditional page-based model used by web applications can be abandoned; hence, the pacing of user interaction with the client becomes independent of the interaction between client and server.

xmlhttprequest is a collection of application programming interfaces (apis) that use http and javascript to enable transfer of data between web servers and web applications.5 initially developed by microsoft, xmlhttprequest has become a de facto standard for javascript data retrieval and is implemented in most modern browsers. it is commonly used in the ajax paradigm. the data accessed from the http server is usually in extensible markup language (xml) but another format, such as javascript object notation, could be used.6

applications of ajax

google is the most significant user of ajax technology to date.
most of its recent innovations, including gmail, google suggest, google groups, and google maps, employ the paradigm.7 the use of ajax in google suggest improves the traditional google interface by offering real-time suggestions as the user enters a term in the search field. for example, if the user enters xm, google suggest might offer refinements such as xm radio, xml, and xmods. experimental ajax-based auto-completion features are appearing in a range of software.8 shanahan has applied the same ideas to the amazon online bookshop.9 his experimental site, zuggest, extends the concept of auto-completion: as the user enters a term, the system automatically triggers a search without the need to hit a search button.

the potential of ajax to improve the responsiveness and richness of library applications has not been lost on the library community.10 several interesting experiments have been tried. at oclc, for example, a “suggest-like service,” based on controlled headings from the worldwide union catalog, worldcat, has been implemented.11 ajax has also been used in the oclc deweybrowser.12 the main page of this browser includes four iframes, or inline frames, three for the three levels of dewey decimal classification and a fourth for record display.13 the use of ajax allows information in each iframe to be updated independently without having to reload the entire page.

implications of ajax

there have been many attempts to enable asynchronous background transactions with a server. among alternatives to ajax are flash, java applets, and the new breed of xml user-interface language formats such as xml user interface language (xul) and extensible application markup language (xaml).14 these all have their place, particularly languages such as xul. the latter is ideal for use in mozilla extensions, for example. combinations of the above can be, and are being, used together; xul and ajax are both used in the firefox extension version of google suggest.15 the main advantage of ajax over these alternative approaches is that it is nonproprietary and is supported by any browser that supports javascript and xmlhttprequest—hence, by any modern browser.

it could be validly argued that complex client-side javascript is not ideal. in addition to the errors to which complex scripting can be prone, there are accessibility issues. best practice requires that javascript interaction adds to the basic functionality of web-based content, which must remain accessible and usable without the javascript.16 an alternative non-javascript interface to gmail was recently implemented to deal with just this issue. a move away from scripting would, in theory, be a positive step for the web. in practice, however, procedural approaches continue to be more popular; attempts to supplant them, as epitomized by xhtml 2.0, simply alienate developers.17

it might be assumed that the use of ajax technology would result in a heavier network load due to an increase in the number of requests made to the server. this is a misconception in most cases. indeed, ajax can dramatically reduce the network load of web applications, as it enables them to separate data from the graphical user interface (gui) used to display it.
for example, each results page presented by a traditional search engine delivers not only the results data but also the html required to render the gui for that page. an ajax application could deliver the gui just once and, after that, deliver data only. this would also be possible via the careful use of frames; the latter could be regarded as an ajax-style technology but without all of ajax’s advantages.

■ from client-server to soa

the dominant model for building network applications is the client/server approach, in which client software is installed as a desktop application and data generally reside on a server, usually in a database.18 this can work well in a homogeneous single-site computing environment. but institutions and consortia are likely to be heterogeneous and geographically distributed. pcs, macs, and cell phones will all need access to the applications, and linux may require support alongside windows. even if an organization standardizes solely on windows, different versions of the latter will have to be supported, as will multiple versions of those ubiquitous dynamic link libraries (dlls). indeed, the problems of obtaining and managing conflicting dlls have spawned the term “dll hell.”19

in web applications, a standard client, the browser, is installed on the desktop but most of the logic, as well as the data, resides on the server. of course, the browser developers still have to worry about “dll hell,” but this need not concern the rest of us. “speed must be the overriding design criterion” for web pages.20 but the interactivity and response times possible with client/server applications are still not available to traditional web applications. this is where ajax comes in: it offers, to date, the best of the web application and client/server worlds. much of the activity is moved back to the desktop via client-side code. but the advantages of web applications are not lost: the browser is still the standard client.

service-oriented architecture (soa) is an increasingly popular approach to the delivery of applications to heterogeneous computing environments and geographically dispersed user populations.21 soa refers to the move away from monolithic applications toward smaller, reusable services with discrete functionality. such services can be combined and recombined to deliver different applications to users. web services is an implementation of soa principles.22 the term describes the use of technologies such as xml to enable the seamless interoperability of web-based applications. ajax enables web services and hence enables soa principles. thus, the adoption of ajax facilitates the move toward soa and all the advantages of reuse and integration that this offers.

■ arc

arc is an experimental open-source metasearch package available for download from the sourceforge open-source foundry.23 it can be configured to harvest open archives initiative protocol for metadata harvesting (oai-pmh)-compliant data from multiple repositories.24 the harvested results are stored in a relational database and can be searched using basic web forms. arc’s advanced search form is illustrated in figure 1.

■ applying ajax to the search gui

the use of ajax has the potential to narrow the gulf between the responsiveness of guis for web applications and those for desktop applications. the flexibility, usability, and richness of the latter are now possible for the former.
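the request cycle underlying this approach can be illustrated with a short javascript sketch. the sketch below is not code from ojax or from any of the libraries discussed in this article; the /search url, the <record> xml elements carrying a title attribute, and the results element id are hypothetical placeholders used only to show the asynchronous xmlhttprequest pattern described above.

// a minimal sketch of the ajax pattern: the page is never reloaded;
// a background request fetches xml and only one part of the interface
// is redrawn. the url "/search" and the element id "results" are
// hypothetical, not part of ojax.
function runSearch(term) {
  var request = new XMLHttpRequest();
  request.open("GET", "/search?q=" + encodeURIComponent(term), true); // true = asynchronous
  request.onreadystatechange = function () {
    if (request.readyState === 4 && request.status === 200) {
      // incremental update: only the results element changes
      var records = request.responseXML.getElementsByTagName("record");
      var target = document.getElementById("results");
      target.innerHTML = "";
      for (var i = 0; i < records.length; i++) {
        var item = document.createElement("li");
        item.textContent = records[i].getAttribute("title");
        target.appendChild(item);
      }
    }
  };
  request.send(null);
}

because the request is asynchronous, the user can keep interacting with the page while the reply is awaited; only the results element is redrawn when it arrives.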
the ojax gui, illustrated in figure 2, has been developed to demonstrate how ajax can improve the richness of arc-like guis. ojax, including full source code, is available under the open-source apache license and is hosted on sourceforge.25 ojax comprises a client-side gui, implemented in javascript and html, and server-side metasearch web services, implemented in java. the web services connect directly to a metasearch database created by arc from harvested repositories. the database connectivity leverages several libraries from the apache jakarta project, which provides open-source java solutions.26

■ development process

the ojax gui was developed iteratively using agile software development methods.27 features were added incrementally and feedback gained from a proxy user. in order to gain an in-depth understanding of the system and the implications for the remainder of the gui, features were initially built from scratch, using object-oriented javascript. they were then rebuilt using three open-source javascript libraries: prototype, script.aculo.us, and rico.28 prototype provides base ajax capability. it also includes advanced functionality for object-oriented javascript, such as multiple inheritance. the other two libraries are built on top of prototype. the script.aculo.us library specializes in dynamic effects, such as those used in auto-completion. the rico library, developed by sabre, provides other key javascript effects—for example, dynamic scrollable areas and dynamic sorting.29

■ storyboard

one of the aims of the national information standards organization (niso) metasearch initiative is to enable all library users to “enjoy the same easy searching found in web-based services like google.”30 adopting this approach, ojax incorporates the increasingly common concept of the search bar, popularized by the google toolbar.31 ojax aims to be as simple, uncluttered, and unthreatening as possible. the goal is to reflect the simple-search experience while, at the same time, providing the power of an advanced search. thus, the user interface has been kept as simple as possible while maintaining equivalent functionality with the arc advanced search interface. all arc functionality, with the exception of the grouping feature, is provided. to help the intuitive flow of the operation, the fields are set out as a sentence:

find [term(s)] in [all archives] from [earliest year] until [this year] in [all subjects]

tool tips are available for text-entry fields. by default, searching is on author, title, and abstract. these fields map to the creator, title, and description dublin core metadata fields harvested from the original repositories.32 the search can be restricted by deselecting unwanted fields. arc supports both mysql and oracle databases.33 mysql has been chosen for ojax as mysql is an open-source database. boolean search syntax has been implemented in ojax to allow for more powerful searching. the syntax is similar to that used by google in that it identifies and/or and exact phrase functionality by +/and “ ”. hence it preserves the user’s familiarity with basic google search syntax.

figure 1. arc’s advanced search form

figure 2. the ojax metasearch user interface
however, it is not as powerful as the full google search syntax; for example, it does not support query modifiers such as intitle:.34 the focus of this research is the application of ajax to the search gui and not the optimization of the power or expressive capability of the underlying search engine. however, the implementation of an alternative back end that uses a full-text search engine, such as apache lucene, would improve the expressive power of advanced queries.35 full-text search expressiveness is likely to be key to the usability of ojax, ensuring its adequacy for the advanced user without alienating the novice.

■ unifying the user interface

one of the main aims of ojax is the unification of the user interface. instead of offering distinct options for simple and advanced search and for refining a completed search, the interface is sufficiently dynamic to make this unnecessary. the user need never navigate between pages because all options, both simple and advanced, are available from the same page. and all results are made available on that same page in the form of a scrollable list. the only point at which a new page is presented is when the resource identifier of a result is clicked. at this stage, a pop-up window, external to the ojax session, displays the full metadata for that resource. this page is generated by the external repository from which the record was originally harvested.

simple and advanced search options are usually kept separate because most users are unwilling or unable to use the latter.36 furthermore, the design of existing search-user interfaces is based on the assumption that the retrieval of results will be sufficiently time-consuming that users will want to have selected all options beforehand. with ojax, however, users do not have to make a complete choice of all the options they might want to try before they see any results. as data are entered, answers flow to accommodate them. because the interface is so dynamic and responsive and because users are given immediate feedback, they do not have to be concerned about wasting time due to the wrong choice of search options. users iterate toward the search results they require by manipulating the results in real time. the reduced level of investment that users must make before they achieve any return from the system should encourage them to experiment, hence promoting enactive learning.

■ auto-completion

in order to provide instant feedback to the user, the search-terms field and the subject field use ajax to auto-complete user entries. figure 3 illustrates the result of typing smith in the search-terms field. a list is automatically dropped down that itemizes all matches and the number of their occurrences. users select the term they want, the entire field is automatically completed, and a search is triggered.

the arc system denormalizes some of the harvested data before saving them in its database. for example, it merges all the author fields into one single field, each name separated by a bar character. to enable the ojax auto-completion feature, it was necessary to renormalize the names. a new table is used to store each name in a separate row; names are referenced by the resource identifier. to enable this, arc’s indexing code was updated so that it creates this table as it indexes records extracted from the oai-pmh feed. in its initial implementation, ojax uses a simple algorithm for auto-completion.
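the auto-completion behaviour just described can be sketched in a few lines of javascript. this is not the ojax or script.aculo.us implementation: the /complete url, the <match> elements with term and count attributes, and the element ids are all assumptions made for illustration, and runSearch is the function sketched earlier.

// a sketch of auto-completion: each keystroke asks the server for matching
// terms and their occurrence counts, shows them in a drop-down list, and
// completes the field and triggers a search when one is chosen.
function attachAutoComplete(fieldId, listId) {
  var field = document.getElementById(fieldId);
  var list = document.getElementById(listId);
  field.onkeyup = function () {
    var prefix = field.value;
    if (prefix.length < 2) { list.innerHTML = ""; return; }
    var request = new XMLHttpRequest();
    request.open("GET", "/complete?prefix=" + encodeURIComponent(prefix), true);
    request.onreadystatechange = function () {
      if (request.readyState !== 4 || request.status !== 200) { return; }
      list.innerHTML = "";
      var matches = request.responseXML.getElementsByTagName("match");
      for (var i = 0; i < matches.length; i++) {
        var term = matches[i].getAttribute("term");
        var count = matches[i].getAttribute("count");
        var option = document.createElement("li");
        option.textContent = term + " (" + count + ")";
        option.onclick = (function (t) {
          return function () {
            field.value = t;      // complete the entire field
            list.innerHTML = "";
            runSearch(t);         // trigger a search, as in the earlier sketch
          };
        })(term);
        list.appendChild(option);
      }
    };
    request.send(null);
  };
}

a production version would also debounce keystrokes and support keyboard navigation of the list; those details are omitted here for brevity.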
future work will involve developing a more complex heuristic that will return results more closely satisfying user requirements.

■ auto-search

as already mentioned, a central theme of ojax is the attempt to reduce the commitment necessary from users before they receive feedback on their actions. one way in which dynamic feedback is provided is the triggering of an immediate search whenever an entire option has been selected. examples of entire options include choice of an archive or year and acceptance of a suggested auto-completion. in addition, the following heuristics are used to identify when a user is likely to have finished entering a search term and, thus, when a search should be triggered:

1. entering a space character in the search-terms field or subject field
2. tabbing out of a field after having modified its contents
3. five seconds of user inactivity for a modified field

the third heuristic aims to catch some of the edge cases that the other heuristics may miss. it is assumed likely that a term has been completed if a user has made no edits in the last five seconds. as each term will be separated by a space, it is only the last term in a search phrase that is likely not to trigger an auto-search via the first heuristic. users can click the search button whenever they wish, but they should never have to click it. the zuggest system abandons the search button entirely; ojax retains it, mainly in order to avoid confounding user expectations.37 while a search is in progress, the search button is greyed out and acquires a red border. this is particularly useful in alerting the user that a search has been automatically triggered.

this is the only feature of ojax that may have an impact on network load in terms of slightly higher traffic. however, the increased number of requests is offset by a reduction in the size of each response because the gui is not downloaded with it. for example, initiating a search in arc results in an average response size of 57.32k. the response is in the form of a complete html page. initiating a search in ojax results in an average response size of 7.96k. the latter comprises a web service response in xml. in other words, more than seven ojax auto-searches would have to be triggered before the size of the initial search result in arc was exceeded.

■ dynamic archive list

the use of ajax enables a static html page to contain a small component of dynamic data without the entire page having to be dynamically generated on the server. ojax illustrates this: the contents of the drop-down box listing the searchable archives are not hard-coded in the html page. rather, when the page is loaded, an ajax request for the set of available archives is generated. this is a useful technique; static html pages can be cached by browsers and proxy servers, and only the dynamic portion of the data, perhaps those used to personalize the page, need be downloaded at the start of a new session.
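a sketch of this load-time request is given below. again, this is not the ojax code: the /archives url, the <archive> elements, and the archiveSelect element id are hypothetical placeholders used only to illustrate fetching the one dynamic portion of an otherwise static, cacheable page.

// a sketch of the dynamic archive list: the page itself is static and
// cacheable; only the small list of searchable archives is requested
// when the page loads and poured into the drop-down box.
function loadArchiveList() {
  var request = new XMLHttpRequest();
  request.open("GET", "/archives", true);
  request.onreadystatechange = function () {
    if (request.readyState !== 4 || request.status !== 200) { return; }
    var select = document.getElementById("archiveSelect");
    var archives = request.responseXML.getElementsByTagName("archive");
    for (var i = 0; i < archives.length; i++) {
      var option = document.createElement("option");
      option.value = archives[i].getAttribute("id");
      option.text = archives[i].getAttribute("name");
      select.appendChild(option);
    }
  };
  request.send(null);
}
window.onload = loadArchiveList;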
■ dynamic scrolling

searches commonly produce thousands of results. typical systems, such as google and arc, make these results available via a succession of separate pages, thus requiring users to navigate between them. finding information by navigating multiple pages can take longer than scrolling down a single page, and users rarely look beyond the second page of search results.38 to avoid these problems and to encourage users to look at more of the available results, those results could be made available in one scrollable list. but, in a typical non-ajax application, accessing a scrollable list of, say, two thousand items would require the entire list to be downloaded via one enormous html page. this would be a huge operation; if it did not crash the browser, it would, at least, result in a substantial wait for the user.

the rico library provides a feature to enable dynamic scrollable areas. it uses ajax to fetch more records from the server when the user begins to scroll off the visible area. this is used in the display of search results in ojax, as illustrated in figure 4. to the user, it appears that the scrollable list is seamless and that all 4,678 search results are already downloaded. in fact, only 386 have been downloaded. the rest are available at the server. as the user scrolls further down, say to item 396, an ajax request is made for the next ten items. any item downloaded is cached by the ajax engine and need not be requested again if, for example, the user scrolls back up the list.

a dynamic information panel is available to the right of the scroll bar. it shows the current scroll position in relation to the beginning and end of the results set. in figure 4, the information panel indicates that there are 4,678 results for this particular search and that the current scroll position is at result number 386. this number updates instantly during scrolling, preserving the illusion that all results have been downloaded and providing users with dynamic feedback on their progress through the results set. this means that users do not have to wait for the main results window to refresh to identify their current position.

figure 3. auto-completion in the search terms field

figure 4. display of search results and dynamic information panel

■ auto-expansion of results

ojax aims to provide a compact display of key information, enabling users to see multiple results simultaneously. it also aims to provide simple access to full result details without requiring navigation to a new web page. in the initial results display, only one line each of the title, authors, and subject fields, and two lines of the abstract, are shown for each item. as the cursor is placed on the relevant field, the display expands to show any hidden detail in that field. at the same time, the background color of the field changes to blue. when the cursor is placed on the bar containing the resource identifier, all display fields for that item are expanded, as illustrated in figure 5. this expansion is enabled via simple cascading style sheet (css) features. for example, the following css declaration hides all but the first line of authors:

#searchresults td div { overflow:hidden; height: 1.1em }

when the cursor is placed on the author details, the overflow becomes visible and the display field changes its dimensions to fit the text inside it:

#searchresults td div:hover { overflow:visible; height:auto }
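the scroll-triggered fetching that rico provides for the results list could be approximated in plain javascript along the following lines. this is a sketch of the general technique rather than rico's api: the /results url with offset and limit parameters, the <record> elements, and the resultPane and resultList element ids are assumptions made for illustration. fetched batches are remembered so that scrolling back up never re-requests them.

// a sketch of dynamic scrolling: when the user scrolls near the end of the
// rows already fetched, the next batch of ten results is requested; offsets
// already requested are remembered so nothing is downloaded twice.
var fetchedOffsets = {};
function fetchBatch(offset) {
  if (fetchedOffsets[offset]) { return; }   // already requested or cached
  fetchedOffsets[offset] = true;
  var request = new XMLHttpRequest();
  request.open("GET", "/results?offset=" + offset + "&limit=10", true);
  request.onreadystatechange = function () {
    if (request.readyState !== 4 || request.status !== 200) { return; }
    var list = document.getElementById("resultList");
    var records = request.responseXML.getElementsByTagName("record");
    for (var i = 0; i < records.length; i++) {
      var row = document.createElement("li");
      row.textContent = (offset + i + 1) + ". " + records[i].getAttribute("title");
      list.appendChild(row);
    }
  };
  request.send(null);
}
// when the visible area comes within about 50 pixels of the loaded content,
// ask for the next batch, starting after the rows already rendered.
document.getElementById("resultPane").onscroll = function () {
  if (this.scrollTop + this.clientHeight >= this.scrollHeight - 50) {
    var loaded = document.getElementById("resultList").getElementsByTagName("li").length;
    fetchBatch(loaded);
  }
};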
■ sorting results

another method used by ojax to minimize upfront user investment is to provide initial search results before requiring the user to decide on sort options. because results are available so quickly and because they can be re-sorted so rapidly, it is not necessary to offer pre-search selection of sort options. ajax facilitates rapid presentation of results; after a re-sort, only those on the first screen must be downloaded before they can be presented to the user. results may be sorted by title, author, subject, abstract, and resource identifier. these options are listed on the gray bar immediately above the results list. clicking one of these options sorts the results in ascending order; an upward-pointing arrow appears to the right of the sort option chosen, as illustrated in figure 6. clicking on the option again sorts in descending order and reverses the direction of the arrow. clicking on the arrow removes the sort; the results revert to their original order. functionality for the sort feature is provided by the rico javascript library. server-side implementation supports these features by caching search results so that it is not necessary to regenerate them via a database query each time.

figure 5. auto-expansion of all fields for item number 386

figure 6. results being sorted in ascending order by title

■ search history

several experimental systems—for example, zuggest—have employed ajax to facilitate a search-history feature. a similar feature could be provided for ojax. a button could be added to the right of the results list. when chosen, it could expand a collapsible search-history sidebar. as the cursor was placed on one of the previous searches listed in the sidebar, a call out, that is, a speech bubble, could be displayed. this could provide further information such as the number of matches for that search and a summary of the search results clicked on by the user. clicking one of the previous searches would restore those search results to the main results window. this feature would take advantage of the ajax persistent javascript engine to maintain the history. its use could help counter concerns about ajax technology “breaking” the back button; the feature could be implemented so that the back button returned the user to the previous entry in the search history.39 in fact, this implementation of back-button functionality could be more useful than the implementation in google, where hitting the back button is likely to take the user to an interim results page; for example, it might simply take the user from page 3 of results to page 2 of results.

■ scrapbook

users browsing through search results on ojax would require some simple method of maintaining a record of those resource details that interested them. ajax could enable the development of a useful scrapbook feature to which such resource details could be copied and stored in the persistent javascript engine. ojax could further leverage a shared bookmark web service, such as del.icio.us or furl, to save the scrapbook for use in future sessions and to share it with other members of a research or interest group.40

■ potential developments for ojax

as well as searching a database of harvested metadata, the ojax user interface could also be used to search an oai-pmh-compliant repository directly. with appropriate implementation, all of ojax’s current features could be made available, apart from auto-completion. a recent development has enabled the direct indexing of repositories by google using oai-pmh.41 the latter provides google with additional metadata that can be searched via the google web services apis. the current ojax web services could be replaced by the google apis, thus eliminating the need for ojax to host any server-side components. hence, ojax could become an alternative gui for google searching.

■ conclusion

ojax demonstrates that the use of ajax can enable features in web applications that, until now, have been restricted to desktop applications.
in ojax, it facilitates a simple, nonthreatening, but powerful search user interface. page navigation is eliminated; dynamic feedback and a low initial investment on the part of users encourage experimentation and enable enactive learning. the use of ajax could similarly transform other web applications aimed at library patrons. however, ajax is still maturing, and the barrier to entry for developers remains high. we are a long way from an ajax button appearing in dreamweaver. reusable, well-tested components, such as rico, and software frameworks, such as ruby on rails, sun’s j2ee framework, and microsoft’s atlas, will help to make ajax technology accessible to a wider range of developers.42 as with all new technologies, there is a temptation to use ajax simply because it exists. as ajax matures, it is important that its focus does not become the enabling of “cool” features but remains the optimization of the user experience.

references and notes

1. ojax homepage, http://ojax.sourceforge.net (accessed apr. 5, 2006).
2. j. j. garrett, “ajax: a new approach to web applications,” feb. 18, 2005, www.adaptivepath.com/publications/essays/archives/000385.php (accessed nov. 11, 2005).
3. ibid.
4. j. nielsen, “the need for speed,” alertbox, mar. 1, 1997, www.useit.com/alertbox/9703a.html (accessed nov. 11, 2005).
5. dynamic html and xml: the xmlhttprequest object, http://developer.apple.com/internet/webcontent/xmlhttpreq.html (accessed apr. 5, 2006).
6. javascript object notation, wikipedia definition, http://en.wikipedia.org/wiki/json (accessed apr. 5, 2006).
7. google gmail, http://mail.google.com (accessed apr. 5, 2006); google suggest, www.google.com/webhp?complete=1&hl=en (accessed apr. 5, 2006); google groups, http://groups.google.com (accessed apr. 5, 2006); google maps, http://maps.google.com (accessed apr. 5, 2006).
8. p. binkley, “ajax and auto-completion,” quædam cuiusdam blog, may 18, 2005, www.wallandbinkley.com/quaedam/?p=27 (accessed nov. 11, 2005).
9. francis shanahan, zuggest, www.francisshanahan.com/zuggest.aspx (accessed apr. 5, 2006).
10. a. rhyno, “ajax and the rich web interface,” librarycog blog, apr. 10, 2005, http://librarycog.uwindsor.ca:8087/artblog/librarycog/1113186562 (accessed nov. 11, 2005); r. tennant, “tennant’s top tech trend tidbit,” lita blog, june 22, 2005, http://litablog.org/?p=35 (accessed nov. 11, 2005).
11. t. hickey, “ajax and web interfaces,” outgoing blog, mar. 31, 2005, http://outgoing.typepad.com/outgoing/2005/03/web_application.html (accessed nov. 11, 2005).
12. oclc deweybrowser, http://ddcresearch.oclc.org/ebooks/fileserver (accessed apr. 5, 2006).
13. hickey, “ajax and web interfaces.”
14. j. wusteman, “from ghostbusters to libraries: the power of xul,” library hi tech 23, no. 1 (2005a), www.ucd.ie/wusteman/ (accessed nov. 11, 2005); cover pages, microsoft extensible application markup language (xaml), http://xml.coverpages.org/ms-xaml.html (accessed apr. 5, 2006).
15. google extensions for firefox, http://toolbar.google.com/firefox/extensions/index.html (accessed apr. 5, 2006).
16. c. adams, “ajax: usable interactivity with remote scripting,” sitepoint, jul. 13, 2005, www.sitepoint.com/article/remote-scripting-ajax (accessed nov. 11, 2005).
17. xhtml 2.0, w3c working draft, may 27, 2005, www.w3.org/tr/2005/wd-xhtml2-20050527 (accessed apr. 5, 2006).
18. client/server model, http://en.wikipedia.org/wiki/client/server (accessed apr. 5, 2006).
19. dll hell, http://en.wikipedia.org/wiki/dll_hell (accessed apr. 5, 2006).
20. j. nielsen, “the need for speed.”
21. service-oriented architecture, http://en.wikipedia.org/wiki/service-oriented_architecture (accessed apr. 5, 2006).
22. j. wusteman, “realizing the potential of web services,” oclc systems & services: international digital library perspectives 22, no. 1 (2006): 5–9.
23. arc—a cross archive search service, old dominion university digital library research group, http://arc.cs.odu.edu (accessed apr. 5, 2006); niso metasearch initiative, www.niso.org/committees/ms_initiative.html (accessed apr. 5, 2006); arc download page, sourceforge, http://oaiarc.sourceforge.net (accessed apr. 5, 2006).
24. open archives initiative protocol for metadata harvesting, www.openarchives.org/oai/openarchivesprotocol.html (accessed apr. 5, 2006).
25. ojax download page, sourceforge, http://sourceforge.net/projects/ojax (accessed apr. 5, 2006).
26. apache jakarta project, http://jakarta.apache.org (accessed apr. 5, 2006); apache jakarta commons dbcp, http://jakarta.apache.org/commons/dbcp (accessed apr. 5, 2006); apache jakarta commons dbutils, http://jakarta.apache.org/commons/dbutils (accessed apr. 5, 2006).
27. agile software development definition, wikipedia, http://en.wikipedia.org/wiki/agile_software_development (accessed apr. 5, 2006).
28. prototype javascript framework, http://prototype.conio.net (accessed apr. 5, 2006); script.aculo.us, http://script.aculo.us (accessed apr. 5, 2006); rico, http://openrico.org/rico/home.page (accessed apr. 5, 2006).
29. sabre, www.sabre.com (accessed apr. 5, 2006).
30. niso metasearch initiative, www.niso.org/committees/ms_initiative.html (accessed apr. 5, 2006).
31. google toolbar, http://toolbar.google.com (accessed apr. 5, 2006).
32. dublin core metadata initiative, http://dublincore.org (accessed apr. 5, 2006).
33. mysql, www.mysql.com (accessed apr. 5, 2006).
34. google help center, advanced operators, www.google.com/help/operators.html (accessed apr. 5, 2006).
35. apache lucene, http://lucene.apache.org (accessed apr. 5, 2006).
36. j. nielsen, “search: visible and simple,” alertbox, may 13, 2001, www.useit.com/alertbox/20010513.html (accessed nov. 11, 2005).
37. francis shanahan, zuggest.
38. j. r. baker, “the impact of paging versus scrolling on reading online text passages,” usability news 5, no. 1 (2003), http://psychology.wichita.edu/surl/usabilitynews/51/paging_scrolling.htm (accessed nov. 11, 2005); j. nielsen, “search: visible and simple.”
39. j. j. garrett, “ajax: a new approach to web applications.”
40. del.icio.us, http://del.icio.us (accessed apr. 5, 2006); furl, www.furl.net (accessed apr. 5, 2006).
41. google sitemaps (beta) help, www.google.com/webmasters/sitemaps/docs/en/other.html (accessed apr. 5, 2006).
42. ruby on rails, www.rubyonrails.org (accessed apr. 5, 2006); java 2 platform, enterprise edition (j2ee), http://java.sun.com/j2ee (accessed apr. 5, 2006); m. lamonica, “microsoft gets hip to ajax,” cnet news.com, june 27, 2005, http://news.com.com/microsoft+gets+hip+to+ajax/2100-1007_3-5765197.html (accessed nov. 11, 2005).

lib-mocs-kmc364-20140106084018

title-only entries retrieved by use of truncated search keys

frederick g. kilgour, philip l. long, eugene b. liederman, and alan l. landgraf: the ohio college library center, columbus, ohio.

an experiment testing utility of truncated search keys as inquiry terms in an on-line system was performed on a file of 16,792 title-only bibliographic entries.
use of a 3,3 key yields eight or fewer entries 99.0% of the time. a previous paper (1) established that truncated derived search keys are efficient in retrieval of entries from a name-title catalog. this paper reports a similar investigation into the retrieval efficiency of truncated keys for extracting entries from an on-line, title-only catalog; it is assumed that entries retrieved would be displayed on an interactive terminal. earlier work by ruecking (2), nugent (3), kilgour (4), dolby (5), coe (6), and newman and buchinski (7) consisted of investigations of search keys designed to retrieve bibliographic entries from magnetic tape files. the earlier paper in this series and the present paper investigate retrieval from on-line files in an interactive environment. similarly, the work of rothrock (8) inquired into the efficacy of derived truncated search keys for retrieving telephone directory entries from an on-line file. since the appearance of the previous paper, the ohio state university libraries have developed and activated a remote catalog access and circulation control system employing a truncated derived search key similar to those described in the earlier paper. however, osu adopted a 4,5 key consisting of the first four characters of the main entry and the first five characters of the title excluding initial articles and a few other nonsignificant words. whereas the osu system treats the name and title as a continuous string of characters, the experiments reported in this and the previous paper deal only with the first word in the name and title, articles always being excluded. the bell system has also recently activated a large traffic experiment in the san francisco bay area. the master file in this system contains 1,300,000 directory entries. the system utilizes truncated derived keys like those investigated in the present experiments. materials and methods the file used in this experiment was described in the earlier paper (1), except that this experiment investigates the title-only entries. the same programs used in the name-title investigation were used in this experiment; the title-only entries were edited so that the first word of the title was placed in the name field and the remaining words in the title field. as was the case formerly, it was necessary to clean up the file. single-word titles often carried in the second or title field such expressions as one year subscription or vol 16 1968. in addition there were spurious character strings that were not titles, and in such cases the entire entry was removed from the file. thereby, the original 17,066 title entries were reduced to 16,792. the truncated search keys derived from these title-only entries consist of the initial characters of the first word of the title and of the second word of the title. if there was no second word, blanks were employed. if either the first or second word contained fewer characters than the key to be derived, the key was left-justified and padded out with blanks. to obtain a comparison of the effectiveness of truncated search keys derived from title-only entries as related to keys derived from name-title entries, a name-title entry file of the same number of entries (16,792) was constructed. a series of random numbers larger than the number of entries in the original name-title file (132,808) was generated and one of the numbers was added to each of the 132,808 name-title entries in sequence.
next the file was sorted by number so that a randomized file was obtained. then the first 16,792 name-title entries were selected. the same program analyzed keys derived from this file. results table 1 presents the maximum number of entries to be expected in 99% of replies for the file of 16,792 title-only entries as well as for the name-title file containing the same total of entries. for example, when a large number of random requests are put to the title-only file using a 3,3 search key, the prediction is that 99.0% of the time, eight or fewer replies will be returned. however, in the case of the name-title file, only two replies will be returned 99.3% of the time. the 3,3 key produced only thirteen replies (.12% of the total number of 3,3 keys) containing twenty-one or more entries. the highest number of entries for a single reply for the 3,3 key was 235 ("jou,of" derived from journal of). the next highest number was 88 ("adv,in" for advances in).

table 1. maximum number of entries in 99% of replies

              title-only entries                name-title entries
search key    maximum entries    percent        maximum entries    percent
              per reply          of time        per reply          of time
2,2                                99.1              7               99.0
2,3                                99.1              4               99.6
2,4               11               99.0              3               99.5
3,2                9               99.1              3               99.2
3,3                8               99.0              2               99.3
3,4                8               99.1              2               99.5
4,2                8               99.1              2               99.2
4,3                7               99.0              2               99.6
4,4                7               99.1              2               99.7

discussion the two words from which the keys are derived in name-title entries constitute a two-symbol markov string of zero order, since the name string and title string are uncorrelated. however, the two words from which keys are derived in the title-only entry are first order markov strings, since they are consecutive words from the title string and are correlated. the consequence of these two circumstances on the effectiveness of derived keys is clearly presented in table 1. the keys from name-title entries consistently produce fewer maximum entries per reply. therefore, it is desirable to derive keys from zero order markov strings wherever possible. the ohio state university libraries contain over two and a quarter million volumes, but on 9 february 1971 there were only 47,736 title-only main entries in the catalog. the file used in the present experiment is 35% of the size of the osu file. since 99% of the time the 3,3 key yields eight or fewer titles, it is clear that such a key will be adequate for retrieval for library on-line, title-only catalogs. the 3,3 key also possesses the attractive quality of eliminating the majority of human misspelling as pointed out in the earlier paper (1). there remains, however, the unsolved problem of the efficient retrieval of such titles as those beginning with "journal of" and "advances in". it appears that it will be necessary to devise a special algorithm for those relatively few titles that produce excessively high numbers of entries in replies. in the previous investigation it was found that a 3,3 key yielded five or fewer replies 99.08% of the time from a file of 132,808 name-title entries. table 1 shows that for a file of only 16,792 entries the 3,3 key produces two or fewer replies 99.3% of the time. these two observations suggest that as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio, since the maximum number of entries rose from two to five while the total size of the file increased from one to approximately eight.
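the key-derivation and tallying procedure underlying these figures can be restated compactly in modern terms. the following sketch is purely illustrative (it is not the original analysis programs, and the sample titles are invented): it derives an m,n truncated key from the first two words of a title, padding short or missing words with blanks as described above, and counts how many entries share each key.

from collections import Counter

def derive_key(first_word, second_word, m=3, n=3):
    # truncate each word to the key length; a short or missing word is
    # left-justified and padded out with blanks, as in the methods section
    part1 = (first_word or "").lower()[:m].ljust(m)
    part2 = (second_word or "").lower()[:n].ljust(n)
    return part1 + "," + part2

def reply_sizes(titles, m=3, n=3):
    # tally how many entries fall under each derived key; the size of each
    # tally is the number of entries that would come back in one reply
    counts = Counter()
    for title in titles:
        words = title.split()
        first = words[0] if words else ""
        second = words[1] if len(words) > 1 else ""
        counts[derive_key(first, second, m, n)] += 1
    return counts

# invented sample titles, initial articles already removed
sample = ["journal of library automation", "journal of documentation",
          "advances in librarianship", "data processing"]
print(reply_sizes(sample))
# Counter({'jou,of ': 2, 'adv,in ': 1, 'dat,pro': 1})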
further research must be done in this area to determine the relative behavior of derived truncated keys as their associated file sizes vary. conclusion this experiment has produced evidence that a series of truncated search keys derived from a first order markov word string in a bibliographic description yields a higher number of maximum entries per reply than does a series derived from a zero order markov string. however, the results indicate that the technique is nonetheless sufficiently efficient for application to large on-line library catalogs. use of a 3,3 search key yields eight or fewer entries 99.0% of the time from a file of 16,792 title-only entries. acknowledgment this study was supported in part by national agricultural library contract 12-03-01-5-70 and by office of education contract oec-0-72-2289 (506). references 1. f. g. kilgour; p. l. long; e. b. leiderman: "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7 (1970), pp. 79-82. 2. f. h. ruecking, jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation 1 (december 1968), 227-38. 3. w. r. nugent: "compression word coding techniques for information retrieval," journal of library automation 1 (december 1968), 250-60. 4. f. g. kilgour: "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science 5 (1968), pp. 133-36. 5. j. l. dolby: "an algorithm for variable-length proper-name compression," journal of library automation 3 (december 1970), 257-75. 6. m. j. coe: "mechanization of library procedures in the medium-sized medical library: x. uniqueness of compression codes for bibliographic retrieval," bulletin of the medical library association 58 (october 1970), 587-97. 7. w. l. newman; e. j. buchinski: "entry/title compression code access to machine readable bibliographic files," journal of library automation 4 (june 1971), 72-85. 8. h. i. rothrock, jr.: computer-assisted directory search; a dissertation in electrical engineering (philadelphia: university of pennsylvania, 1968). lib-mocs-kmc364-20131012112749 147 who rules the rules? "why can't the english teach their children how to speak?" wondered henry higgins, implying that a lack of widely and consistently followed rules of usage created linguistic backwardness and anarchy. higgins' question might be rephrased today as: "when will the code teach its founders how to catalog?" the library of congress has historically fitted catalog codes to its own practices rather than following them slavishly. the best example is the lamentable policy of superimposition: continued use of preestablished forms of names that are not in compliance with the paris principles or aacr1. this was a cause of widespread confusion and complaint and the practice was eventually discontinued ... well, sort of discontinued. the various interpretations of aacr1, the inclusion of new rules, and pressure for further modifications eventually led to the drafting of aacr2, a code that was supposed to end variance and controversial practices. one might assume that including lc as a principal author of the new text and an lc official as one of the editors might result in a code that it could actually follow. judging by the spate of exceptions and interpretations made so far (more than 300), this has not been the case.
in the place of superimposition, we have new impositions known as "compatible headings." they may not be readily ascertained according to the rules, but have been granted a sort of bibliographic squatter's rights. although it would be simpler for catalogers to follow the rules consistently, they must instead check several cataloging service bulletins and name authorities to see whether lc has determined that a given personal, corporate, or serial name is already "compatible" with aacr2. this can result in cataloging delays, higher processing costs, and inconsistent entries. aacr2 and uncertainties regarding its application by lc have been widely credited with lower cataloging productivity. this is not to imply that lc is behaving in a strictly arbitrary or capricious manner vis-a-vis the code. they can be seen as caught on the horns of a trilemma, with vast internal needs and increasing external demands competing for a shrinking budget. president reagan may have whispered sweet nothings during national library week, but during budget hearings it became clear that libraries are not as "truly needy" as impoverished generals and interior decorators. decisions to depart from aacr2 have been based primarily on cost factors. the decision by the rtsd catalog code revision committee and the joint steering committee not to consider cost and implementation factors has led both to widespread opposition to the code resulting in a one-year delay in implementation, and to the modifications that lc has made and is making. some variations such as using "dept." for "department" and "house" for "house of representatives" make fiscal and common sense. many other lc changes are simply bibliographic nit-picking, minor irritants to catalogers who must flip back and forth between the text of aacr2 and half a dozen bulletins to settle a minor point of description. why didn't lc representatives attempt to say, "wait a minute-we just can't do that now," while the code was being considered rather than after it was published? anyway, considering that lc was starting up a whole new catalog and closing the old one, one wonders why rules not to be applied retrospectively had to be tinkered with to such an extent. major questions still to be resolved include not only the compatible-name quandary, but the treatment of serials, microform reproductions, establishment of corporate names and determination of when works "emanate from" corporate bodies, and the romanization of slavic names. the decision to use title entry for serials and monographic series even in the case of generic titles has been controversial. there are, of course, exceptions to the rules, and there will be differences in how uncertain catalogers construct complex entries with parenthetical modifiers. unfortunately, rules establishing entries for serials have sometimes been muddied rather than clarified in the bulletin. consider the example in the winter 1981 issue wherein the bulletin of the engineering station of west virginia university is entered under "bulletin," while the same publication for the entire university is entered under "west virginia university bulletin." also, consider the complex cross-reference structure required to direct users between the two files, both of which may well be split again, historically, between author/title and title main entry. this is a special problem in the case of large monographic series generated by corporate bodies.
the lc position on microform reproductions of previously published works is clearer, but is still a point of controversy. they have decided to provide the imprint and collation (er, make that "publication, distribution, etc., area" and "physical description area") of the original work, with a description of the microform in a note. in other words, they're sticking to aacr1. the rtsd ccs committee on cataloging: description and access is currently trying to resolve this conflict, one in which many research libraries have sided with lc. this body is also trying to unravel the mystique of "corporate emanation" introduced in aacr2. another sore point has been the lc decision to follow an alternative rule, which prefers commonly known forms of romanized names over those established via systematic romanization. that lc is correctly following the spirit of the general principle for personal names is little comfort to research libraries with large slavic collections. how are other libraries responding to the murky form of aacr2? some are closing old card catalogs and continuing them with com or temporary card supplements. some of these are establishing cross-reference links between variant forms of names between catalogs, while others are not. some are keeping their catalogs open and shifting files, while others are splitting files. some are shifting some files and splitting others. aacr2 was intended to provide headings that could be easily ascertained by the user. ironically, the temporary result is scrambled catalogs: access systems involving multiple lookups and built-in confusion. until most bibliographic records are in machine-readable form under reliable authority control this will continue to be the case. authority control, it would seem, has long been an idea whose time has come but whose application is yet to be realized. the cooperative efforts of the library of congress and the major bibliographic utilities to establish reliable automated authority control will do much to ameliorate the problems presented by aacr2. it would also be helpful if lc, perhaps with the financial assistance of other libraries, networks, and foundations, would publish what might be called aacr2½-not a new edition of the code but one accurately reflecting actual lc practice. finally, future code makers would be wise to consider cost and other implementation factors in their deliberations. professor higgins, ever the optimist, would rather sing "wouldn't it be loverly" than hear another verse of "i did it my way." james r. dwyer editor's notes title change it often seems that the only things that change their names as often as library publications are standards organizations. not to be left out, jola will be called information technology and libraries beginning with volume 1, number 1, the march 1982 issue. this name was approved by the lita board in san francisco this june as more accurately reflecting the true scope of the journal. new section with this issue, we are initiating a new section: "reports and working papers." this is intended to help disseminate documents of particular interest to the jola readership. we solicit suggestions of documents, often developed as working papers for a specific purpose or group but of interest and value to our readership. in general, documents in this section are neither refereed nor edited. mitch i take great personal pleasure in publishing mike malinconico's speech upon presenting the 1981 lita award to mitch freedman.
readers' comments we do continue to solicit suggestions about the journal but receive few. is anybody reading it? if you have any thoughts about what we should or shouldn't do, we would welcome your sharing them. editorial | truitt 3 marc truitteditorial i doubt that many of the blog people are in the habit of sustained reading of complex texts. —michael gorman, 2005 s o, three plus years after the fact, why am i opening with michael gorman’s unfortunate characterization of those he labeled “blog people”? i have no interest in reopening this debate, honestly! but the problem with generalizations, however unfair, is that at their heart there is just enough substance to make them “stick”—to give them a grain or two of credibility. gorman’s words struck a chord in me that existed before his charge and has continued to exist to this day. the substance in gorman’s words had little to do with these “blog people” as such; rather, my interest was piqued by the implications in his remark about how we all deal with “complex texts” and the “sustained reading” of the same. in a time of wide availability of full-text electronic articles, it has become so easy and tempting to cherry pick the odd phrase here or there, without study of the work as a whole. how has scholarship especially been changed by the ease with which we can reduce works to snippets without having considered their overall context? i’m not arguing that scholarly research and writing hasn’t always been at least in part about finding the perfect juicy quotation around which we then weave our own theses. many of us well recall the boxes of 3x5” citation and 5x8” quotation files that we or our patrons laboriously assembled through weeks, months, and years of detailed research. but if the style of compiling these files that i witnessed (and indeed did) is any guide, their existence was the product of precisely that “sustained reading of complex texts” of which gorman spoke. my vague, nagging sense is that what is changing is this style of approaching whole texts. i wondered then about how much scholarly research today is driven by keyword searches of digitized texts that then essentially produce “virtual quotation files” without our having had to struggle with their context in the whole of the original source text? fast forward three years. lately, several articles touching on our changing ways of interacting with resources have appeared in both scholarly and popular venues, and these have served to underline my sense that we are missing something because of our growing lack of engagement with whole texts. writing in the july/august issue of the atlantic monthly, nicholas carr asks “is google making us stupid?” drawing an analogy to the scene in the film 2001: a space odyssey, in which astronaut dave bowman disables supercomputer hal’s memory circuits, carr says i can feel it, too. over the past few years i’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory. my mind isn’t going—so far as i can tell—but it’s changing. i’m not thinking the way i used to think. i can feel it most strongly when i’m reading. immersing myself in a book or a lengthy article used to be easy. my mind would get caught up in the narrative or the turns of the argument, and i’d spend hours strolling through long stretches of prose. that’s rarely the case anymore. now my concentration often starts to drift after two or three pages. 
i get fidgety, lose the thread, begin looking for something else to do. i feel as if i’m always dragging my wayward brain back to the text. the deep reading that used to come naturally has become a struggle.1 carr goes on to explain that “what the net seems to be doing is chipping away my capacity for concentration and contemplation. my mind now expects to take in information the way the net distributes it: in a swiftly moving stream of particles. once i was a scuba diver in the sea of words. now i zip along the surface like a guy on a jet ski.”2 carr’s nagging fear found similar expression among some tech-savvy participants of library online forums; one of the more interesting comments appeared on the web4lib electronic discussion list. in a discussion of the article, tim spalding of librarything observed that he himself had experienced what he dubbed “the google effect” and noted something is lost. . . . human culture often advances by externalizing pieces of our mental life—writing externalizes memory, calculators externalize arithmetic, maps, and now gps, externalize way-finding, etc. each shift changes the culture. and each shift comes with a cost. nobody memorizes texts anymore, nobody knows the times tables past ten or twelve and nobody can find their way home from the stars and the side of the tree the moss grows on.3 meanwhile, another article appeared on a closely related topic, this time in the journal science. james a. evans observed that, because “scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse,” the easy availability of electronic resources was resulting in an “ironic change” for scientific marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 4 information technology and libraries | september 2008 scholarship, in that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. the forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.4 evans’s research highlights an additional irony: an unintended benefit to the scholarly process in the paperbased world was “poor indexing,” since it encouraged browsing through less relevant, older, or more marginal literature. this browsing had the effect of “facilitat[ing] broader comparisons and led researchers into the past. modern graduate education parallels this shift in publication—shorter in years, more specialized in scope, culminating less frequently in a true dissertation than an album of articles.”5 what is one to make of all of this? at the outset, i wish to state clearly that i am not some sort of anti e-text luddite. electronic texts are a fact of life, and are becoming moreso every day. even though they are in their infancy as a medium, they’ve already transformed the landscape of bibliographic access. my interest is not with the tool, but with the manner in which we are using it. i began by suggesting that i share with gorman a concern about how we increasingly engage with “complex texts” today. 
unlike him, though, my concern is not limited only to the so-called blog people (whomever they may be), but indeed, it includes all of us. with the explosion in easily accessible electronic texts, our ideas and habits concerning interaction with these texts are changing, sometimes in unintended ways. in a recent informal survey i conducted of my colleagues at work, i asked, “have you ever read an e-book (not just a journal article) from (virtual) cover to (virtual) cover?” for those whose answer was affirmative, i also asked, “how many such books have you read in their entirety?” out of twenty-odd responses, three individuals answered that yes, they had had occasion to read an entire e-book (for a total of six books among the three “yes” respondents, which seemed surprisingly high to me). of greater interest, though, were those who chose to question the premise of the survey, arguing that people don’t “read” e-books the way that they read paper ones. it does make one wonder, then, how amazon thinks it possesses a viable business model in the kindle e-book reader, for which it currently lists an astounding 140,000+ available e-books. clearly, some e-books are being read as whole texts, by some people, for some purposes. but i suspect that’s another story.6 carr and evans use slightly differing imagery to describe a similar phenomenon. carr closes with a reference back to the death of 2001’s hal, saying, “as we come to rely on computers to mediate our understanding of the world, it is our own intelligence that flattens into artificial intelligence.”7 evans, on the other hand, compares contemporary scientific researchers to newton and darwin, each of whom produced works that “not only were engaged in current debates, but wove their propositions into conversation with astronomers, geometers, and naturalists from centuries past.” twenty-first-century scientists and scholars, by contrast, are able because of readily available electronic resources “to frame and publish their arguments more efficiently, [but] they weave them into a more focused—and more narrow—past and present.” 8 perhaps the most succinct statement, though, comes from librarything’s tim spalding, who summarized the problem thusly: “we advance by becoming dumber.”9 an ital research and publishing opportunity for an inquisitive and enterprising scholar, perhaps? i’d welcome the manuscript! shameless plugs department. by the time you read this, we at ital will have launched our new blog, italica (http://ital-ica.blogspot.com). italica addresses a need we on the ital editorial board have long sensed; that is, an area for “letters to the editor,” updates to articles, supplementary materials we can’t work into the journal—you name it. one of the most important features of italica will be a forum for readers’ conversations with our authors: we’ll ask authors to host and monitor discussion for a period of time after publication so that you’ll then have a chance to interact with them. italica is currently a pilot project. for our first issue we will have begun with a discussion hosted by jennifer bowen, whose article “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase i” was published in the june 2008 issue of ital. for our second italica, we plan to expand coverage and discussion to include all articles and other features in the september issue you now have in hand. italica is sure to become a stimulating supplement to and forum for topics originating in ital. 
we look forward to seeing you there! references and notes extract. michael gorman, “revenge of the blog people!” library journal (feb. 15, 2005) www.libraryjournal.com/article/ ca502009.html (accessed july 21, 2008). 1. nicholas carr, “is google making us stupid?” the atlantic monthly 301 (july/aug. 2008) www.theatlantic.com/ doc/200807/google (accessed july 23, 2008). editor’s column | truitt 5 2. ibid. 3. tim spalding, “re: ‘is google making us stupid? what the internet is doing to our brains,’” web4lib discussion list post, june 19, 2008, http://article.gmane.org/gmane.education .web4lib/12349 (accessed july 24, 2008). 4. james a. evans, “electronic publication and the narrowing of science and scholarship,” science (july 18, 2008) www .sciencemag.org/cgi/content/full/321/5887/395 (accessed july 24, 2008). emphasis added. 5. ibid. 6. as of 5:30pm (est), july 24, 2008, amazon’s website listed 145,591 “kindle books.” www.amazon.com/s/qid=1216934603/ ref=sr_hi?ie=utf8&rs=154606011&bbn=154606011&rh=n%3a1 54606011&page=1. 7. carr, “is google making us stupid?” 8. evans, “electronic publication and the narrowing of science of scholarship.” 9. spalding, “re: ‘is google making us stupid?’” lib-mocs-kmc364-20140106084043 211 name-title entry retrieval from a marc file philip l. long, head, automated systems research and development and frederick g. kilgour, director: ohio college library center, columbus, ohio a test of validity of earlier findings on 3,3 search-ke y retrieval from an in-process file for retrieval from a marc file. probability of number of entries retrieved per reply is essentially the same for both files. this study was undertaken to test the applicability of previous findings on retrieval of name-title entries from a technical processing system fil e ( 1 ) to retrieval from a marc file; the technique for retrieval employs truncated 3,3 search keys. materials and methods the study cited above employed a file of 132,808 name-title entries obtained from the yale university library's machine aided technical processing system. bibliographic control was not maintained for the generation of records in this file , with the result that the file contained errors that simulated errors in the requests library users put to catalogs. the marc file employed in the present study contains 121,588 name-title entries that are nearly error free. whereas the marc file possesses few records bearing foreign titles, the yale file has a significantly higher percentage of such titles, as would be expected for a large university library. initial articles were deleted in yale titles, but only english articles in marc titles because the language of foreign language titles is not identified in marc. 212 journal of library automation vol. 4/4 december, 1971 design of the program used to analyze the marc file was the same as that for the program employed in the previous study. however, the new program runs on a xerox data systems sigma 5 computer. the test employed the 3,3 search key to make possible comparison with previous results. results table 1 presents the percentage of time that up to five replies can be expected, assuming equal likelihood of key choice. inspection of the table reveals that there is no significant difference between the findings from the yale and the marc files. table 1. 
probability of number of entries per reply using 3,3 search key

number of replies    cumulative probability percentage
                     yale file       marc file
1                      78.58           79.98
2                      92.75           93.28
3                      96.83           96.93
4                      98.40           98.26
5                      99.08           98.91

discussion the same result was expected for the marc file as had been obtained earlier from the yale file. possible influences that might have led to different results were the existence of errors in the yale file, a significant proportion of foreign titles in the yale file as compared to the nearly all-english marc file, and the inability to mechanically delete the initial articles in the few foreign language marc titles. it is most unlikely that the effects of these differences are masking one another. conclusion the findings of a previous study on the effectiveness of retrieval of entries from a large bibliographic file (1) by use of a truncated 3,3 search key have been confirmed for a similarly large marc file. reference 1. kilgour, frederick g.; long, philip l.; leiderman, eugene b.: "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science, 7 (1970), 79-81. lib-mocs-kmc364-20131012114038 278 circulation systems past and present* maurice j. freedman: school of library service, columbia university, new york city. *this article is adapted from a speech delivered at rutgers university. manuscript received november 1980; revised may 1981; accepted july 1981. a review of the development of circulation systems shows two areas of change. the librarian's perception of circulation control has shifted from a broad service orientation to a narrow record-keeping approach and recently back again. the technological development of circulation systems has evolved from manual systems to the online systems of today. the trade-offs and deficiencies of earlier systems in relation to the comprehensive services made possible by the online computer are detailed. in her 1975 library technology reports study of automated circulation control systems, barbara markuson contrasted what she called "older" and "more recent" views of the circulation function. the "older" or traditional view was that circulation control centered on conservation of the collection and recordkeeping. the "more recent" attitude encompasses "all activities related to the use of library materials."1 it appears that this latter outlook is not as new as markuson had suggested. in 1927, jennie m. flexner's circulation work in public libraries described the work of circulation as the "activity of the library which through personal contact and a system of records supplies the reader with the [materials] wanted."2 flexner went on to characterize four major functions of circulation as follows: (1) the staff must know the books in the collection, and have a working familiarity with them. (2) the staff must know the readers; their wants, interests, etc. (3) the circulation staff must fully understand the library mission and policies and work harmoniously with those in related departments. (4) the circulation department has its own particular duty to perform .... effective routines and techniques must be established by the library and mastered by the staff if the distribution of books is to be properly accomplished and the public is to have the fullest use of the resources of the institution.
the library must be able to locate books, on the shelves or in circulation; to know who is using material and how the reader can be traced, if he is misusing or unduly withholding the books drawn. 3 the function of circulation has not changed since flexner's description. even within the context of online circulation systems, it is absolutely essential that the circulation system be seen in as broad a context as possible. it is not merely an electromechanical phenomenon staffed by automatonclerks. circulation services involve that function which is ultimately one of the most fundamental: the satisfactory bringing together of the library user and the materials sought by that person. it follows, then, that the mechanism and means of delivery and control of the service are only a small part, and certainly not the most important part of the circulation function. knowing your collection, your readers, and clearly knowing your library's mission are crucial prerequisites for the effective circulation of library materials. an examination of the history of circulation systems and their evolution to the present state reveals the change in outlook from a narrow view of the circulation function to a broader view. let us begin by establishing the basic elements of record keeping, upon which circulation control is based. there are three categories of records: 1. for the collection of materials, books, tapes, microforms, etc., comprising the library. 2. for the readers or users of the library service. 3. for the wedding or concatenation of the first two, i.e., the library user's use or borrowing of the library's materials. a minimal circulation model is a set of procedures or recordkeeping with respect to only the third category, i.e., records of the materials held by the library user outside of the library. a total or complete system would then be one that provides for all three categories. using these criteria to judge the level of control provided by the various circulation systems of the past, let us review. the earliest method of circulation control was the chain method. in this case, "circulation" is not an accurate term; "use" of materials is more appropriate, as the collection did not circulate. books were chained to the wall and the user did not take the material outside of the library. the minimal circulation model is not met, and records were not required. several hundred years later, the ledger system's first iteration involved a simple notation into a ledger. the identification of the book-call number and/or author and title-and the borrower's identification were recorded. upon the return of the book, the borrower or the receiving clerk initialed the ledger entry or otherwise indicated the return of the item. minimal circulation control is met. a more developed or sophisticated ledger system exceeded this minimal circulation model. the new ledger had each page headed by a different 280 journal of library automation vol. 14/4 december 1981 borrower or registration number. consequently, a given user had all of his or her charges recorded on the given page indicated by the user's number. the economy of not having to write the borrower's name for every transaction was made possible through the creation of a file of patron records linked to the ledger page by common registration numbers. in effect, this was our first "automation." 
the use of a master file in support of anumbered page provided information that had previously been handwritten every time someone wished to borrow books from the library. the new ledger system also allowed for a more orderly control of charges. only the borrower's number was needed to get at the page of transactions relating to that borrower, as opposed to the former methoda benchmark method, in a sensein which the transactions were chronologically entered and had no other ordering whatsoever. even with the improved ledger system, though, the only ordering was by borrower number and date of issue to the borrower. there was no arrangement that provided for sequencing or finding the books borrowed. the need to identify borrowed books led to the dummy system. every book had a concomitant dummy book (or large card) that had a ruled sheet of paper with the book identification information on it and the borrower's name and/or number. when a user wished to borrow a book, the dummy was pulled from a file and the borrower information was written on the sheet of paper. the dummy was then filed on the shelf occupying the space formerly occupied by the book itself. when the book was returned, it was reshelved, the dummy removed, and the circulation transaction was crossed out. this system is interesting in that it provides for a complete inventory control. either all items are on the shelf in proper sequence or a physical surrogate or record for circulating items is substituted and placed in proper sequence. one has instant and, in effect, "online" access to the presence or absence of materials if one has the call number and can go to the shelf. unlike most systems that can only tell whether or not the book is present, the dummy system tells who has the book and when it was charged. in terms of a minimal model, this system provided less and more than the ledger system. if a reader wanted a list of books he or she borrowed, the reader would have to view every dummy and see if the listed item was charged to him or her. in contrast, the ledger system served such a request well, though every page of the ledger might have to be examined to find out who had borrowed a book not found on the shelf. leaping past several systems, let us now discuss the newark system , the overwhelmingly prevalent system in the united states today (if we include the mechanical or electromechanical versions of dickman, gaylord (the manual, not automated), and demeo). the newark system incorporated the best features of the systems already mentioned. a separate registration file was kept which provided both alphabetic access by patron and numeric access by patron registration circulation systems/freedman 281 number. consequently, the recording of the borrower's identification during circulation transactions only involved the notation of the number. for book identification, a card and matching pocket were placed in each book with the call number and/or author-title identification information. the circulation transaction involved the removal of the card from the pocket and the entering on it, ala dummy system, the date of the transaction and the borrower number. the cards for all of the books borrowed on a given day were aggregated and filed in shelflist sequence in a tray headed by the date of the transactions. resorting to computer jargon, the major or primary sort of the book cards (read circulation cards) was by date, but the minor sort was by call number. 
consequently, if one wanted to know the status of a given book and one had the call number, it would not take too long to search, even with a file as large as the one in the main branch of newark public library, by looking for the item in all of the different days' charges. when a book was returned, the clerk noted from the date-of-issue card inserted in the book's pocket, the tray in which to search, and the matching call number on the pocket which was used for discharging the book, i.e., removing the charge card from the tray and replacing it in the book. the combination of the books on the shelf plus the cards in the different trays in shelflist order constituted a complete inventory. additionally, the trays of cards comprised a comprehensive record of all current charges, i.e., all transactions by date, call number, and borrower, with borrower number pointing to fuller information in the registration file. looking back at our basic model, the newark system offered not just the minimum-a record of the item and the borrower who took it-but also introduced a major step toward inventory control. there was an inventory sequence involved, or, more accurately, several inventory sequences-one for each given collection (or day) of circulation transactions. what was still missing was a record by borrower of what was charged to him or her. in the original newark system, the borrower's card had entered upon it dates of issue and return of items. this way, even if the library could not tell the user what items (s)he had, the user's card would reflect the number of items outstanding. the handling of reserves, renewals, and overdue notices occurred as follows: a colored clip or some indicator on a circulation card would be used to indicate a reserve. a renewal would be handled the same as a return except the person would wait while the charge card was pulled from the appropriately dated tray, and assuming that no reserves had been placed on the circulation card, the book would be recharged (i.e., renewed) to the borrower. overdues automatically presented themselves by default. cards left in a tray after a predetermined number of days represented charges for which overdues were to be sent. the tray was taken to the registration file and the numerically sequenced registration cards for the delinquent borrowers removed so that notices could be prepared and sent. then the 282 journal of library automation vol. 14/4 december 1981 registration slips and circulation cards had to be refiled at the completion of the process. essentially, most subsequent systems are variants on the newark system. the mcbee key-sort system involves the use of cards with prepunched holes around the edges, one of which can be notched to indicate the date an item is due. the cards are arranged by call number creating a single sequence. the insertion of a knitting needle .like device through a given hole will allow all of the books overdue for a given date to fall free of the deck. this system is like the newark system in that it has inventory and date access, but unlike newark it places a horrible burden on the borrower. each card has (written by the borrower) the borrower's name and address and the call number, author, and title of the book. thus, the library is saved the labor of creating circulation cards and maintaining registration records for every patron-all of the information needed is on the charge card. but here, as marvin scilken has pointed out, the burden of the library's tasks are merely passed on to the users. 
this point should be emphasized. the next system to be considered is the photo-charge system. microphotos are taken of the borrower's card, which has the name and address on it, the book card (as in the newark book identification card), and a sequentially numbered date-of-issue or date-due slip . again, as with the mcbee, since the photo record includes the borrower's name and address, one can throw away registration files. also, a list or range of transaction numbers is kept by date used. since the numbered date-of-issue slip is placed in the book at the time of charging, and one removes it when the book is returned, it is a simple step to cross off or remove the number on the slip from its corresponding duplicate on the list of numbers for that day's transactions. overdue transactions are found by searching for unchecked transaction numbers on the numerically sequenced microfilm. this system does meet the criterion of the minimal model, a record of the user's use of the item. in terms of labor intensity, one has eliminated the maintenance of charge-card files and registration files by a single microfilm record. reserves, though , are terribly time-consuming with the photo-charge system: each returned book, before it can be returned to the shelf or renewed, must be searched against a call-numbered sequence of reserve cards. academic libraries would not use this kind of system because call-number access is a necessity, especially in relation to recalls of longloaned items . the elimination of paper files is what so commended this system to public libraries over the newark-based systems. but, as was noted, one has virtually no way of determining who took a book out or when it is due back except, in principle, by searching all of the reels of microfilm. some variants on this microfilm system were developed. bro-dart marketed a system that thermographically produced eye-readable records instead of microimages . such was the state of circulation systems before computers began to be used. the following-a discussion of the involvement of computers-can circulation systems/freedman 283 be separated by the type of hardware: main frames, minicomputers, and microcomputers. the main-frame computer has been used primarily in the past as a processing unit for batches of circulation transactions collected and fed to it via punched cards, terminals, or minicomputers. call number and author and title (albeit brief) and user identification number, were captured for each transaction. in the 1960s and into the early 1970s, this information would be batch-processed by the computer and a variety of reports would be produced. what the computer does, then, is keeps track of numbers, their ranges, and the dates of the ranges. but the computer can do much more than this. it is capable, as none of the nonautomated systems were, of rearranging the data input and then comparing and tabulating them as desired and appropriate. consequently, the fact that the call number, author, and title are stored by the machine means that lists or files can be arranged by any of these elements. the same goes for date of transaction. as to borrower identification number, a master file much like the newark registration file is kept (only now in its machine-readable form), and the computer does the comparing at high speed instead of the clerk taking the charge record and going to the numeric file to find the name and address of the borrower. 
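the high-speed comparison described here is, in essence, a batch match of outstanding charge records against the registration master file. the fragment below is schematic only; the record layouts and sample data are invented, and no particular vendor's system is implied.

from datetime import date

# outstanding charges: (call number, borrower number, date due)
charges = [
    ("025.3 f61", 4217, date(1981, 6, 15)),
    ("629.8 k29", 1130, date(1981, 7, 1)),
]

# registration master file keyed by borrower number
patrons = {
    4217: ("j. smith", "12 elm st."),
    1130: ("m. jones", "40 oak ave."),
}

def overdue_notices(charges, patrons, today):
    # match each overdue charge with the patron master file, replacing the
    # clerk's trip to the numeric registration file
    for call_number, borrower_no, due in charges:
        if due < today:
            name, address = patrons[borrower_no]
            yield f"{name}, {address}: {call_number} was due {due}"

for notice in overdue_notices(charges, patrons, date(1981, 6, 30)):
    print(notice)   # j. smith, 12 elm st.: 025.3 f61 was due 1981-06-15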
of course, the computer can then readily and quickly print out overdue notices with an obvious absence of clerical support and labor intensity. as we all know, the rate of increase of labor costs in increasing, and the rate of increase of computer costs is decreasing. two kinds of large computer systems have been used. the batchoriented one, which either kept track of items in circulation only (the absence system-only items absent from the collection were tracked), or one that kept track of the entire collection (the inventory system). 4 normally, identification numbers were used for patrons in either system. although relatively rare in academic and public libraries, the mainframe-based online system is also in use. ohio state university is famous for its online system. what is meant here is that all transactions are immediately recorded and all files are instantly updated. printing is still necessary for overdue notices, but printed circulation lists are not necessary because of the online answers to queries regarding books or patrons now possible through terminals distributed to appropriate locations. the minicomputers came on the scene in two stages. clsi's entrance in 1973 utilized one of the early minicomputers, quite small by today's standards. for relatively small libraries that had not begun to dream of having their own computers, it became possible to have an entire inventory (in abbreviated form) and an entire patron file online. consequently, all of the access power of the newark system, and none of its labor intensity, was available online and much more besides. few libraries could afford the main-frame system of ohio state, but many could pay for clsi's, and indeed they did. in the last few years, minicomputers have grown several magnitudes 284 journal of library automation vol. 14/4 december 1981 above the capacity and speed of main-frame computers of the 1960s. consequently, such firms as dataphase, systems control, geac, gaylord, and others offer these larger minis, which can now support online the needs of large branch systems with inventories of hundreds of thousands of books. incidentally, clsi, with a new mini line, can do this now as well. both the miniand maxi-based systems do all of the basic work originally outlined: the whole inventory can be accessed online or with printed lists arranged by author, title, or call number (and, presently, some vendors offer online subject access and cross-references); access can also be made by patron's name. further, the basic transactionitem, borrower, and date-is recorded and checked for holds or delinquency before it is accepted. without overly extolling the present state of the art, it should be said that all of the information identified as important in the earliest systems is now not only available in a far quicker and more usable fashion, it can be manipulated by the machine in a variety of ways to meet and serve management objectives not considered practicable in the past. peter simmons showed how collection development could be aided by automatically generating purchase orders when reserves exceeded a specified acceptable level. 5 all kinds of statistical data regarding collection and patron use can be generated that could not have been possible in a manual mode. while at the university of southwestern louisiana, william mcgrath was able to adjust book budget allocations in terms of collection use and undergraduate major in a most interesting fashion. 6 the net result was an empirically based expenditure of book funds. 
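simmons' idea can be reduced to a single test over the reserve counts a circulation system already accumulates; the toy fragment below (threshold and data invented) flags titles whose outstanding reserves exceed an acceptable level so that an added-copy order can be generated.

# invented reserve counts per call number and an arbitrary threshold
reserves = {"025.3 f61": 7, "629.8 k29": 2, "821.9 e42": 11}
threshold = 5

order_candidates = [call for call, holds in reserves.items() if holds > threshold]
print(order_candidates)   # ['025.3 f61', '821.9 e42']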
now the microcomputer or microprocessor is the newly emerging phenomenon , and in many respects it is not unlike the minicomputer of the early 1970s. it is being used to perform single data-recording functions, and is also being seen as the link to the larger computer . so we have moved from chained books to microcomputers the size of a desk top. originally, a great deal of information was captured at great expense and laboriously maintained. certainly the handwritten and typed records of the newark system, although relatively comprehensive, were obtained and preserved at great cost. and, despite it all , there were real limitations of access . the succeeding mcbee and photo-charging systems appreciably cut out-of-pocket costs to the library, but either passed labor directly on to the user, or eliminated access altogether. book or patron access are virtually impossible with the photo-charging method. simply put, that system tells what is overdue, and that's all. the entry in the 1960s of the computer radically altered the ground rules. now all sequences of encoded elements are possible, and management information can be derived. important statistical data pertaining to collection use and library users can be obtained by further manipulating the data accumulated in the circulation process. it is now possible for all but the smallest and the very largest libraries to have access to and control circulation systems/freedman 285 of their materials through the current range of minicomputers on the market. jennie flexner told us that circulation had to be more than maintenance and record keeping of loan and borrower transactions. through the advances of the computer technology and its application to circulation control, we have finally seen what seems to be an optimization of the recordkeeping process and, by extension, an improvement in circulation service. if instantaneous access to patron files, inventory files, and outstanding transaction files through a variety of modes and computer-developed management data does not constitute that optimization, it will have to dountil the real thing comes along. acknowledgment the author is deeply indebted to susan e. bourgault for her editorial assistance. references 1. barbara evans markuson, "automated circulation control," library technology reports quly and sept., 1975), p.6. 2. jennie m. flexner, circulation work in public libraries (chicago: american library assn., 1927), p.l. 3. ibid., p.2. 4. robert mcgee, "two types of design for online circulation systems," journal of library automation 5:185 (sept. 1972). 5. peter simmons, collection development and the computer (vancouver, b.c.: univ. of british columbia, 1971), 60p. 6. william e. mcgrath, "a pragmatic allocation formula for academic and public libraries with a test for its effectiveness," library resources & technical services 19:356-69 (fall1975). maurice j. freedman is an associate professor at the school of library service, columbia university, new york city. 6 information technology and libraries | june 2008 metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1 jennifer bowen the extensible catalog (xc) project at the university of rochester will design and develop a set of open-source applications to provide libraries with an alternative way to reveal their collections to library users. the goals and functional requirements developed for xc reveal generalizable needs for metadata to support a next-generation discovery system. 
the strategies that the xc project team and xc partner institutions will use to address these issues can contribute to an agenda for attention and action within the library community to ensure that library metadata will continue to support online resource discovery in the future. library metadata, whether in the form of marc 21 catalog records or in a variety of newer metadata schemas, has served its purpose for library users by facilitating their discovery of library resources within online library catalogs (opacs), digital libraries, and institutional repositories. however, libraries now face the challenge of making this wealth of legacy catalog data function adequately within next-generation web discovery environments. approaching this challenge will require: n an understanding of the metadata itself and a commitment to deriving as much value from it as possible; n a vision for the capabilities of future technology; n an understanding of the needs of current (and, where possible, future) library users; and n a commitment to ensuring that lessons learned in this area inform the development of both future library systems and future metadata standards. the university of rochester ’s extensible catalog (xc) project will bring these various perspectives together to design and develop a set of open-source, collaboratively built next-generation discovery tools for libraries. the xc project team seeks to make the best possible use of legacy library metadata, while also informing the future development of discovery metadata for libraries. during phase 1 of the xc project (2006–2007), the xc project team created a plan for developing xc and defined the goals and initial functional requirements for the system. this paper outlines the major metadatarelated issues that the xc project team and xc partner institutions will need to address to build the xc system during phase 2. it also describes how the xc team and xc partners will address these issues, and concludes by presenting a number of issues for the broader library community to consider. while this paper focuses on the work of a single library project, the goals and functional requirements developed for the xc project reveal many generalizable needs for metadata to support a next-generation discovery system.1 the metadata-related goals of the xc project—to facilitate the use of marc metadata outside an integrated library system (ils), to combine marc metadata with metadata from other sources in a single discovery environment, and to facilitate new functionality (e.g., faceted browsing, user tagging)—are very similar to the goals of other library projects and commercial vendor discovery software. the issues described in this paper thus transcend their connection to the xc project and can be considered general needs for library discovery metadata in the near future. in addition to informing the library community about the xc project and encouraging comment on that work, the author hopes that identifying and describing metadata issues that are important for xc—and that are likely to be important for other projects as well—will encourage the library community to set these issues as high priorities for attention and action within the next few years. n the extensible catalog project the university of rochester’s vision for the extensible catalog (xc) is to design and develop a set of open-source applications that provide libraries with an alternative way to reveal their collections to library users. 
xc will provide easy access to all resources (both digital and physical collections) and will enable library content to be revealed through other web applications that libraries may already be using. xc will be released as open-source software, so it will be available for free download, and libraries will be able to adopt, customize, and extend the software to meet their local needs. the xc project is a collaborative effort between partner institutions that will serve a variety of roles in its development. phase 1 of the xc project, funded by the andrew w. mellon foundation and carried out by the university of rochester river campus libraries between april 2006 and june 2007, resulted in the creation of a project plan for the development of xc. during xc phase 1, the xc project team recruited a number of other institutions that will serve as xc partners and who have agreed to contribute resources toward building and implementing xc during phase 2. xc phase 2 (october 2007 through june 2009) is supported through additional funding from the andrew w. mellon foundation, the university of rochester, and xc partners. (jennifer bowen, jbowen@library.rochester.edu, is director of metadata management at the university of rochester river campus libraries, new york, and is co-principal investigator for the extensible catalog project.) during phase 2, the xc project team, assisted by xc partners, will deploy the xc software and make it available as open-source software.2 through its various components, the xc system will provide a platform for local development and experimentation that will ultimately allow libraries to manage and reveal their metadata through a variety of web applications such as web sites, institutional repositories, and content management systems. a library may choose to create its own customized local interface to xc, or use xc’s native user interface “as is.” the native xc interface will include web 2.0 functionality, such as tagging and faceted browsing of search results that will be informed by frbr (functional requirements for bibliographic records)3 and frad (functional requirements for authority data)4 conceptual models. the xc software will handle multiple metadata schemas, such as marc 215 and dublin core,6 and will be able to serve as a repository for both existing and future library metadata. in addition, xc will facilitate the creation and incorporation of user-created metadata, enabling such metadata to be enhanced, augmented, and redistributed in a variety of ways. the xc project team has designed a modular architecture for xc, as shown in the simplified schematic in figure 1. xc will bring together metadata from a variety of sources (integrated library systems, digital repositories, etc.), apply services to that metadata, and display it in a usable way in the web environments where users expect to find it.7 xc's architecture will allow institutions that implement the software to take advantage of innovative models for shared metadata services, which will be described in this paper. n xc phase 1 activities during the now-completed xc phase 1, the xc project team focused on six areas of activity: 1. survey and understand existing research on user practices. 2. gauge library demand for the xc system. 3. anticipate and prepare for the metadata requirements of the new system. 4. learn about and build on related projects. 5. experiment with and incorporate useful, freely available code. 6.
build a community of interest. the xc project team carried out a variety of research activities to inform the overall goals and high-level functional requirements for xc. this research included a literature search and ongoing monitoring of discussion lists and blogs, to allow the team to keep up with the most current discussions taking place about next-generation library discovery systems and related technologies and projects.8 the xc team also consulted regularly with prospective partners and other knowledgeable colleagues who are engaged in defining the concept of a next-generation library discovery system. in order to gauge library demand for the xc system, the team also conducted a survey of interested institutions.9 this paper reports the results of the third area of activity during xc phase 1—anticipating and preparing for the metadata requirements of the new system—and looks ahead to plans to develop the xc software during phase 2. n xc goals and metadata functional requirements the goals of the xc project have significant implications for the metadata functionality of the system, with each goal suggesting specific high-level functional requirements for how the system can achieve that particular goal. the five goals are: n goal 1: provide access to all library resources, digital and non-digital. n goal 2: bring metadata about library resources into a more open web environment. n goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing. n goal 4: conduct user research to inform system development. n goal 5: publish the xc code as open-source software. figure 1. xc system diagram 8 information technology and libraries | june 2008 an overview of each xc goal and its related high-level metadata requirements appears below. each requirement is then discussed in more detail, with a plan for how the xc project team will address that requirement when developing the xc software. n goal 1: provide access to all library resources, digital and non-digital working alongside a library’s current integrated library system (ils) and its other web applications, xc will strive to bring together access to all library resources, thus eliminating the data silos that are now likely to exist between a library’s opac and its various digital repositories and commercial databases. this goal suggests two fairly obvious metadata requirements (requirements 1 and 2). requirement 1—the system must be capable of acquiring and managing metadata from multiple sources: ilss, digital repositories, licensed databases, etc. a typical library currently has metadata pertaining to its collections residing in a variety of separate online systems: marc data in an ils, metadata in various schemas in digital collections and repositories, citation data in commercial databases, and other content on library web sites. a library that implements xc may want to populate the system with metadata from several online environments to simplify access to all types of resources. to achieve goal 1, xc must be capable of acquiring and managing metadata from all of these sources. each online environment and type of metadata present their own challenges. repurposing marc data repurposing marc metadata from an existing ils will be one of the biggest metadata tasks for a next-generation discovery system such as xc. in planning xc, we have assumed that most libraries will keep their current ils for the next few years or perhaps migrate to a newer commercial or open-source ils. 
in either case, most libraries will likely continue to rely on an ils’s staff functionality to handle materials acquisition, cataloging, circulation, etc. for the short term. relying upon an ils as a processing environment does not, however, mean that a library must use the opac portion of that ils as its means of resource discovery for users. xc will provide other options for resource retrieval by using web services to interact with the ils in the background.10 to repurpose ils metadata and enable it to be used in various web discovery environments, xc will harvest a copy of marc metadata records from an institution’s ils using the open archives initiative protocol for metadata harvesting (oai-pmh).11 using web services and standard protocols such as oaipmh offers not only a short-term solution for reusing metadata from an ils, but can also be used in both the shortand long-term to harvest metadata from any system that is oai-pmh harvestable, as will be discussed further below. while harvesting metadata from existing systems into xc creates duplication of metadata between an ils and xc, this actually has significant benefits. xc will handle metadata updates through automated harvesting services that minimize additional work for library staff, other than for setting up and managing the automated services themselves. the internal xc metadata cache can be easily regenerated from the original repositories and services when necessary, such as to enable future changes to the internal xc metadata schema. the xc system architecture also makes use of internal metadata duplication among xc’s components, which allows these components to communicate with each other using oaipmh. this built-in metadata redundancy will also enable xc to communicate with external services using this standard protocol. it is important to distinguish the deliberate metadata redundancies built into the xc architecture from the type of metadata redundancies that have been singled out for elimination in the library of congress working group on the future of bibliographic control draft report (recommendation 1.1)12 and previously in the university of california (uc) libraries bibliographic services task force’s final report.13 these other “negative” redundancies result from difficulties in sharing metadata among different environments and cause significant additional staff expense for libraries to enrich or recreate metadata locally. xc’s architecture actually solves many of these problems by facilitating the sharing of enriched metadata among xc users. xc can also adapt as the library community begins to address the types of costly metadata redundancies mentioned in the above reports, such as between the oclc worldcat database14 and copies of that marc data contained within a library’s ils, because xc will be capable of harvesting metadata from any source that uses a standard api.15 metadata from digital repositories and other free sources xc will harvest metadata from various digital collections and repositories, using oai-pmh, and will maintain a copy of the harvested metadata within the xc metadata cache, as shown in figure 1. the metadata services hub architecture provides flexibility and possible economy for xc users by offering the option for multiple xc institutions to share a single metadata hub, thus allowing participating institutions to take full advantage of the hub’s capabilities to aggregate and augment metadata from multiple sources. 
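as a concrete illustration of the harvesting step just described, the sketch below walks an oai-pmh listrecords response and follows resumption tokens until the repository is exhausted. it is a minimal sketch using only the python standard library; the endpoint url, the default metadata prefix, and the idea of writing each record into a local cache are assumptions made for the example, not details of the xc design.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest(base_url, metadata_prefix="marc21"):
    """yield (identifier, metadata element) pairs from an oai-pmh repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        for record in tree.findall(".//oai:record", OAI):
            header = record.find("oai:header", OAI)
            identifier = header.findtext("oai:identifier", namespaces=OAI)
            yield identifier, record.find("oai:metadata", OAI)
        token = tree.findtext(".//oai:resumptionToken", namespaces=OAI)
        if not token:          # an absent or empty token ends the list
            break
        params = {"verb": "ListRecords", "resumptionToken": token}

# hypothetical endpoint; any oai-pmh-capable ils or repository would do:
# for oai_id, metadata in harvest("https://ils.example.edu/oai"):
#     cache.store(oai_id, metadata)

a scheduled job built around a loop of this kind is one plausible shape for the automated harvesting services mentioned above; the more interesting work happens in what the hub does with the records afterward.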
while the procedure for harvestmetadata to support next-generation library resource discovery | bowen 9 ing metadata from an external repository is not technologically difficult in itself, managing the flow of metadata coming from multiple sources and aggregating that metadata for use in xc will require the development of sophisticated software. to address this, the xc project team is partnering with established experts in bibliographic metadata aggregation to develop the metadata services portion of the xc architecture. the team from cornell university that has developed the software behind the national science digital library’s metadata management system (nsdl/mms)16 is advising the xc team in the development of the xc metadata services hub, which will be built on top of the basic nsdl/mms software. the xc metadata services hub will coordinate metadata services into a reusable task grouping that can be started on demand or scheduled to run regularly. this xc component will harvest xml metadata and combine metadata records that refer to equivalent resources (based on uniform resource identifier [uri], if available, or other unique identifier) into what the cornell team describes as a “mudball.” each mudball will contain the original metadata, the sources for the metadata, and the references to any services used to combine metadata into the mudball. the mudball may also contain metadata that is the result of further automated processing or services to improve quality or to explicitly identify relationships between resources. hub services could potentially record the source of each individual metadata statement within each mudball, which would then allow a metadata record to be redelivered in its original or in an enriched form when requested.17 by allowing for the capture of provenance data for each data element, the hub could potentially provide much more granular information about the origin of metadata—and much more flexibility for recombining metadata—than is possible in most marcbased environments. after using the redeployed nsdl/mms software as the foundation for the xc metadata hub, the xc project team will develop additional hub services to support xc’s functional requirements. xc-specific hub services will accommodate incoming marc data (including marc holdings data for non-digital resources); basic authority control; mappings from marc 21, marcxml,18 and dublin core to an internal xc schema defined within the xc application profile (described below); and other services to facilitate the functionality of the xc user environments (see discussion of requirement 5, below). finally, the xc hub services will make the metadata available for harvesting from the hub by the xc client integration applications. metadata for licensed content for a next-generation discovery system such as xc to provide access to all library resources, it will need to provide access to licensed content, such as citation data and full-text databases. metasearch technology provides one option for incorporating access to licensed content into xc. unfortunately, various difficulties with metasearch technology19 and usability issues with some metasearch products20 make metasearch technology a less-than-ideal solution. an alternative approach would bring metadata from licensed content directly into a system such as xc. 
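returning to the "mudball" aggregation described above, the following sketch shows one way such a grouping might be represented: harvested records that share a resource identifier are folded into a single structure that keeps every original record, its source, and a provenance note for each derived statement. the class name, field names, and the use of simple dictionaries are illustrative assumptions, not the nsdl/mms or xc data model.

from dataclasses import dataclass, field

@dataclass
class Mudball:
    """aggregated metadata about one resource, with per-statement provenance."""
    resource_id: str                                   # uri or other shared identifier
    originals: list = field(default_factory=list)      # (source, raw record) pairs
    statements: list = field(default_factory=list)     # (property, value, provenance)

def aggregate(records):
    """fold harvested records that share an identifier into mudballs."""
    mudballs = {}
    for source, resource_id, parsed in records:
        ball = mudballs.setdefault(resource_id, Mudball(resource_id))
        ball.originals.append((source, parsed))
        for prop, value in parsed.items():
            # note where each individual statement came from, so the hub can
            # later redeliver the record in original or enriched form
            ball.statements.append((prop, value, {"source": source}))
    return list(mudballs.values())

# hypothetical input: (source name, shared identifier, simple property dict)
sample = [
    ("ils",        "urn:example:123", {"title": "walden", "creator": "thoreau, henry david"}),
    ("repository", "urn:example:123", {"title": "walden", "subject": "nature"}),
]
for ball in aggregate(sample):
    print(ball.resource_id, len(ball.originals), "source records")

keeping provenance at the statement level, as the cornell design allows, is what would make it possible to hand a record back later in either its original or an enriched form.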
the metadata services hub architecture for xc is capable of handling the ingest and processing of metadata supplied by commercial content providers by adding additional services to handle the necessary schema transformations and to control access to the licensed content. the more difficult issue with licensed content may be to obtain the cooperation of commercial vendors to ingest their metadata into xc. pursuing individual agreements with vendors to negotiate rights to ingest their metadata is beyond the original scope of xc’s phase 2 project. however, the xc team will continue to monitor ongoing developments in this area, especially the work of the ethicshare project, which uses a system architecture very similar to that of xc.21 it remains our goal to build a system that will facilitate the inclusion of licensed content within xc in situations where commercial providers have made it available to xc users. requirement 1 summary when considering needed functionality for a next-generation discovery system, the ability to ingest and manage metadata from a variety of sources is of paramount importance. unlike a current ils, where we often think of metadata as mostly static unless it is supplemented by new, updated, and deleted records, we should instead envision the metadata in a next-generation system as being in constant motion, moving from one environment to another and being harvested and transformed on a scheduled basis. the metadata services hub architecture of the xc system will accommodate and facilitate such constant movement of metadata. requirement 2—the system must handle multiple metadata schemas. an extension of requirement 1 will be the necessity for a next-generation system such as xc to handle metadata from multiple schemas, as the system harvests those schemas from various sources. library metadata priorities as a part of the xc survey of libraries described earlier in this paper, the xt team queried respondents about what metadata schemas they currently use or plan to use in the near future. many responding libraries indicated that they expect to increase their use of non–marc 21 metadata within the next three years, although no library indicated the intention to completely move away from 10 information technology and libraries | june 2008 marc 21 within that time period. nevertheless, the idea of a “marc exit strategy” has been discussed in various circles.22 the architecture of xc will enable libraries to move beyond the constraints of a marc-based system without abandoning their ils, and will provide an opportunity for libraries to stage their “marc exit strategy” in a way that suits their purposes. libraries also indicated that they plan to move away from homegrown schemas toward accepted standards such as mets,23 mods,24 mads,25 premis,26 ead,27 vra core,28 and dublin core.29 several responding libraries plan to move toward a wider variety of metadata schemas in the near future, and will focus on using xmlbased schemas to facilitate interoperability and metadata harvesting. to address the needs of these libraries in the future, xc’s metadata services will contain a variety of transformation services to handle a variety of schemas. 
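to make the idea of a transformation service concrete, the sketch below registers one transform per incoming schema and shows a deliberately tiny marcxml-to-dublin-core mapping. the handful of tag mappings follows common marc-to-dublin-core crosswalk practice but is nowhere near complete, and the function and table names are assumptions for the example only.

import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# a tiny, deliberately incomplete (tag, subfield) -> dc element mapping
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}

def marcxml_to_dc(record):
    """transform one marcxml <record> element into a simple dc-style dict."""
    dc = {}
    for datafield in record.findall("marc:datafield", MARC_NS):
        tag = datafield.get("tag")
        for subfield in datafield.findall("marc:subfield", MARC_NS):
            element = MARC_TO_DC.get((tag, subfield.get("code")))
            if element:
                dc.setdefault(element, []).append((subfield.text or "").strip())
    return dc

# the hub can register one transform per incoming schema and dispatch on it;
# a real oai_dc service would still map elements rather than pass them through
TRANSFORMS = {
    "marcxml": marcxml_to_dc,
    "oai_dc":  lambda record: record,
}

def transform(schema, record):
    return TRANSFORMS[schema](record)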
taking into account the metadata schemas mentioned the most often among survey respondents, the software developed during phase 2 of the xc project will support harvested metadata in marc 21, marcxml, and dublin core (including qualified dublin core).30 metadata crosswalks and mapping one respondent to the xc survey offered the prediction that “reuse of existing metadata and transformation of metadata from one format to another will become commonplace and routine.”31 xc’s internal metadata transformations must be designed with this in mind, to facilitate making these activities “commonplace and routine.” fortunately, many maps and crosswalks already exist that potentially can be incorporated into a next-generation system such as xc.32 the metadata services hub architecture for xc can function as a standard framework for applying a variety of existing crosswalks within a single, shared environment. following “best practices” for crosswalking metadata, such as those developed by the digital library federation (dlf),33 will be extremely important in this environment. as the dlf guidelines describe, metadata schema transformation is not as straightforward as it might first appear to be. while the dlf guidelines advise always crosswalking from a more robust schema to a simpler one, sometimes in a series of steps, such mapping will often result in “dumbing down” of metadata, or loss of granularity. this is a particularly important concern for the xc project because a large percentage of the metadata handled by xc will be rich legacy marc 21 metadata, and we hope to maintain as much of that richness as possible within the xc system. in addition to simply mapping one data element in a schema to its closest equivalent in another, it is essential to ensure that the underlying metadata models of the two schemas being crosswalked are compatible. the authors of the framework for a bibliographic future draft document define multiple layers of such models that need to be considered,34 and offer a general highlevel comparison between the frbr data model35 and the dcmi (dublin core metadata initiative) abstract model (dcam).36 more detailed comparisons of models are also taking place as a part of the development of the new metadata content standard, resource description and access (rda).37 the developers of rda have issued documents offering a detailed mapping of rda elements to rda’s underlying model (frbr)38 and analyzing the relationship between rda elements, the dcmi abstract model, and the metadata framework.39 as a result of a meeting held april 30–may 1, 2007, a joint dcmi/rda task group is now undertaking the collaborative work necessary to carry out the following tasks: n develop an rda element vocabulary. n develop an rda/dublin core application profile based on frbr and frad. n disclose rda value vocabularies using rdf/ rdfs/skos.40 these efforts hold much potential to provide a more rigorous way to communicate about metadata across multiple communities and to increase the compatibility of different metadata schemas and their underlying models. such compatibility will be essential to enabling the functionality of future discovery systems such as xc. an xc metadata application profile the xc project team will define a metadata application profile for xc as a way to document decisions made about data elements, content standards, and crosswalking used within the system. 
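one way to keep such an application profile usable by software as well as by people is to record it as machine-readable configuration that the hub and the user environments consult. the fragment below is a sketch under that assumption; the element names, obligation values, and the small validation routine are invented for illustration and are not the xc application profile itself.

# a fragment of an application profile kept as configuration; element names,
# source schemas, and obligation values are illustrative assumptions only.
XC_PROFILE = {
    "title": {
        "source_elements": ["marc21:245$a", "dc:title"],
        "obligation": "mandatory",
        "repeatable": False,
    },
    "subject": {
        "source_elements": ["marc21:650$a", "dc:subject"],
        "obligation": "optional",
        "repeatable": True,
        "vocabulary": "lcsh",          # or fast, local terms, etc.
    },
    "user_tag": {
        "source_elements": [],         # created inside the system, never harvested
        "obligation": "optional",
        "repeatable": True,
        "provenance": "user-generated",
    },
}

def validate(record):
    """report profile violations instead of silently dropping data."""
    problems = []
    for name, rule in XC_PROFILE.items():
        values = record.get(name, [])
        if rule["obligation"] == "mandatory" and not values:
            problems.append(f"missing mandatory element: {name}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"non-repeatable element repeated: {name}")
    return problems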
the use of an application profile can facilitate metadata migration, harvesting, and other automated processes, and presents an approach to metadata that is more flexible and responsive to local needs than simply adopting someone else’s metadata guidelines.41 application profiles facilitate the use of multiple schemas because elements can be selected for inclusion from more than one existing schema, or additional elements can be created and defined locally.42 because the xc system will incorporate harvested metadata from a variety of sources, the use of an application profile will be essential to support xc’s complex system requirements. the dcmi community has published guidelines for creating a dublin core application profile (dcap), which is defined more specifically as: [a] form for documenting which terms a given application uses in its metadata, with what extensions or adaptations, and specifying how those terms relate both to formal standards such as dublin core as well as to less formally defined element sets and vocabularies.43 metadata to support next-generation library resource discovery | bowen 11 the announcement of plans to develop an rda/ dublin core application profile illustrates the important role that application profiles are beginning to take to facilitate the interoperability of metadata schemas. the planned rda/dc application profile will “translate” rda into a standard structure that will allow it to be related more easily to other metadata element sets. unfortunately, the rda/dc application profile will likely not be completed in time for it to be incorporated into the first release of the xc software in mid-2009. nevertheless, we intend to use the existing definitions of rda elements to inform the development of the xc application profile.44 this will allow us to anticipate any future incompatibilities between the rda/dc and the xc application profiles, and ensure that xc will be wellpositioned to take advantage of rda-based metadata when rda is implemented. this process may have the reciprocal benefit of also informing the developers of rda of any rda elements that may be difficult to implement within a next-generation system such as xc. the potential value of rda to the xc project—in terms of providing a consistent approach to bibliographic and authority metadata and facilitating frbr-related user functionality—is very significant. it is hoped that at some point xc can become an early adopter of rda and provide a mechanism through which libraries can move their legacy marc 21 metadata into a system that is compatible with an emerging international metadata standard. n goal 2: bring metadata about library resources into a more open web environment xc will reveal library metadata not only through its own separate interface (either the out-of-the-box xc interface or an interface designed by the local library), but will also allow library metadata to be revealed through other web applications. the latter approach will bring library resources directly to web locations that library users are already visiting, rather than attempting to entice users to visit an additional library-specific web location. making library metadata work effectively in the broader web environment (outside the well-defined boundaries of an ils or repository) will require the following requirements 3 and 4: requirement 3—metadata must conform to the standards of the new web environments as well as to that of the system from which it originated. 
achieving requirement 3 will require library metadata in future systems to perform a dual function: to conform to both existing library standards as well as to web standards and conventions. one way to achieve this is to ensure that the two types of standards themselves are compatible. coyle and hillmann have argued persuasively for changes in the direction of rda development to allow metadata created using rda to function in the broader web environment. these changes include the need to follow a clearly refined, high-level metadata model, to create data elements that can be manipulated by machines, and to move toward the use of uris instead of textual identifiers.45 after the announcement of the outcomes of the rda/dc data modeling meeting, the two authors are considerably more optimistic about rda functioning as a standard within the broader web environment.46 this discourse concerning rda shows but a piece of the process through which long-established library metadata standards need to be reexamined to make library metadata understandable to both humans and machines on the web. moving away from aacr2 toward rda, and ultimately toward incorporating standard web conventions into library metadata, can be a difficult process for those involved in creating and maintaining library standards. nevertheless, transforming library metadata standards in this way is essential to fulfill the requirements necessary for next-generation library discovery systems. requirement 4—metadata must function effectively within the new web environments as well as within the system from which it originated. not only must metadata for a next-generation system follow the conventions and standards used in the broader web, but the data also needs to be able to function effectively in a broader web environment. this is a slightly different proposition from requirement 3, and will necessitate testing the metadata standards themselves to ensure that they enable library metadata to function effectively. the xc project will provide direct experience with using library metadata in two types of web environments: content management systems and learning management systems. library metadata in a content management system as shown in the xc architecture diagram in figure 1, the xc project team will build one of the primary user environments for xc on top of the open-source content management system, drupal.47 the xc drupal module will allow us to respond to many of the needs expressed by libraries in their responses to the xc survey48 by supplying: n a web application server with a back-end database; 12 information technology and libraries | june 2008 n a user interface with web 2.0 features; n library-controlled web pages that will treat library metadata as a native data type; n a metadata interface for enhancing or correcting metadata in the system; and n an administrative interface. the xc team will bring library metadata into the drupal content management system (cms) as a native content type within that environment, creating a drupal “node” for each metadata record. this will allow xc to take advantage of many native features of the drupal cms, such as a taxonomy system.49 building xc interfaces on top of the drupal cms will also give us an opportunity to collaborate with partner libraries that are already active participants in the drupal user community. xc’s architecture will allow the possibility of developing additional user environments on top of other content management systems. 
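since drupal itself is written in php and its module api is not reproduced here, the sketch below only illustrates, in a cms-neutral way, what "one node per metadata record" might look like: each aggregated record becomes a node-like structure whose subject values feed the cms taxonomy system for browsing. the content type name and field layout are assumptions for the example.

# cms-neutral sketch of "one node per metadata record"; field names and the
# content type are hypothetical, not the drupal (php) api.
def record_to_node(resource_id, dc_record):
    """map an aggregated record onto a generic cms node structure."""
    return {
        "type": "xc_metadata_record",            # hypothetical content type name
        "title": (dc_record.get("title") or ["untitled"])[0],
        "fields": {
            "creator": dc_record.get("creator", []),
            "date": dc_record.get("date", []),
            "source_id": resource_id,
        },
        # feeding subjects into the cms taxonomy system enables browse pages
        "taxonomy": {"subject": dc_record.get("subject", [])},
    }

node = record_to_node("urn:example:123",
                      {"title": ["walden"], "subject": ["nature"]})
print(node["title"], node["taxonomy"]["subject"])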
bringing library metadata into these new environments will provide many new opportunities for libraries to manipulate their metadata and present it to users without being constrained by the limitations of the current generation of library systems. such opportunities will then inform the future requirements for library metadata in such environments. library metadata in a learning management system figure 1 illustrates two examples of xc user environments through learning management systems: xc interfaces to both the blackboard learning system50 and sakai.51 much exciting work is being done at other institutions to bring library content into these web applications.52 xc will build on projects such as these to reveal library metadata for non-licensed library resources from an ils through learning management systems. specifically, we plan to develop the capability for libraries to make the display of library metadata context-sensitive within the learning management system. for example, searching or browsing on a page for a particular academic course could be configured to reflect the subject area of the course (e.g., chemistry) and automatically present library resources related to that subject.53 this capability will build upon the experiences gained by the university of rochester through its work to develop its “course resources” system.54 such xc functionality will be integrated directly into the learning management system, rather than simply providing a link out to a separate library system. again, we hope that our efforts to bring library metadata into these new environments will encourage libraries to engage in further work to integrate library resources into broader web environments and inform future requirements for library metadata in these environments. n goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing new functionality for users will require that metadata fulfill more sophisticated functions in a next-generation system than it may have done in an ils or repository, in order to provide more intuitive searching and navigation. the system will also need to capture and incorporate metadata generated through tagging, user-contributed reviews, etc. such new functionality creates the need for requirements 5 and 6. requirement 5—metadata must support functionality to facilitate intuitive searching and navigation, such as faceted browsing and frbrinformed results groupings. enabling faceting and clustering much research has already been done regarding the design of faceted search interfaces in general.55 when considered along with user research conducted at other institutions56 and to be conducted during the development of xc, this data provides a strong foundation for the design of a faceted browse environment. the xc project team has already gained firsthand experience with developing faceted browsing through the development of the “c4” prototype interface during phase 1 of the xc project.57 to enable faceting within xc, we will also pay particular attention to what others have discovered through designing faceted interfaces on top of legacy marc 21 metadata. specific lessons learned from those involved with north carolina state university’s endeca-based catalog,58 vanderbilt university’s primo implementation,59 and plymouth state university’s scriblio system60 provide valuable guidance for the xc project team as we design facets for the xc system. 
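as a simplified illustration of facet derivation from marc 21 6xx fields, in the spirit of the c4 prototype and the ncsu work cited above, the sketch below splits subject strings into topic, genre, era, and region facets by subdivision subfield. the subfield-to-facet assignments are a simplification, and, as the ncsu experience suggests, real headings need far more cleanup than this.

# simplified facet derivation from marc 21 6xx fields; the assignments below
# are an assumption for illustration, not a recommended best practice.
SUBFIELD_FACETS = {
    "a": "topic",       # main heading
    "x": "topic",       # general (topical) subdivision
    "v": "genre",       # form subdivision
    "y": "era",         # chronological subdivision
    "z": "region",      # geographic subdivision
}

def facets_from_6xx(datafields):
    """datafields: iterable of (tag, [(subfield code, value), ...]) tuples."""
    facets = {"topic": set(), "genre": set(), "era": set(), "region": set()}
    for tag, subfields in datafields:
        if not tag.startswith("6"):
            continue
        for code, value in subfields:
            facet = SUBFIELD_FACETS.get(code)
            if facet and value:
                facets[facet].add(value.strip(" .;"))
    return facets

example = [("650", [("a", "Walking"), ("z", "Massachusetts"),
                    ("v", "Early works to 1800.")])]
print(facets_from_6xx(example))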
ideally, a mechanism should be developed to enable these discoveries to feed back into the development of metadata and encoding standards, so that changes to existing standards can be considered to facilitate faceting in the future. several new system implementations have used library of congress subject headings (lcsh) and lc subdivisions from marc 21 records as the basis for deriving facets. the xc “c4” prototype interface provides facets for topic, genre, and region that are based simply upon one or more marc 21 6xx tags.61 north carolina state university’s endeca-based system has enabled facets for topic, genre, region, and era using lcsh subdivisions as well, but this has necessitated a “massive cleanup” of subdivisions, as described by charley pennell.62 oclc’s fast (faceted application of subject terminology) project may provide another option for enabling such facets.63 a library could populate its marc 21 data with fast headings, based metadata to support next-generation library resource discovery | bowen 13 upon the existing lcsh in the records, and then use the fast headings as the basis for generating facets. it remains to be seen whether fast will offer significant benefit over lcsh itself when it comes to faceting, however, since fast headings are generated directly from lcsh. while marc 21 metadata has some known difficulties where faceting and clustering are concerned (such as those involving lcsh), the xc system will encounter additional difficulties when implementing these technologies with less robust metadata schemas such as simple dublin core, and especially across metadata from a variety of schemas. the development of web services to augment batches of metadata records in an automated manner holds some promise for improving the creation of facets from other metadata schemas. within the xc system, such services could be added to the metadata services hub and run against ingested metadata. while designing extensive services of this type is beyond the scope of the next phase of xc software development, we will encourage others to develop such services for xc. another (but much less desirable) approach to augmenting metadata is for a metadata specialist to manually edit one record or group of records. the xc cataloging interface, built within the drupal cms, will allow recordby-record editing of metadata when necessary. while we see this editing interface as essential functionality for xc, we anticipate that libraries will want to use this feature sparingly. in many cases it will be preferable to correct or augment metadata within its original repository (e.g., the institution’s ils) and then re-harvest the corrected metadata, rather than correcting it manually within xc itself. because of the expense of manual metadata augmentation and correction, libraries will be well-advised to rely upon insights gained through user research to assess the value of this type of work. for example, a library might decide to edit individual metadata records only when the correction or augmentation will support specific system functionality that is of high priority for the institution’s users. implementing frbr results groupings to incorporate logical groupings of search results based upon the frbr64 and frad65 data models over sets of diverse metadata within xc, we will encounter similar difficulties that we face with faceting and clustering. 
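below is a minimal sketch of the kind of hub augmentation service suggested above, here attaching a rough work-level grouping key derived from creator and title to each record. the normalization rules are deliberately naive assumptions; the example's failure to merge two editions of the same work is the point, since grouping quality depends on the completeness and authority control of the underlying metadata.

import re
import unicodedata

def normalize(text):
    """crude normalization for matching: strip accents, case, punctuation."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def work_key(record):
    """derive a rough work-level grouping key from creator + title.
    a naive stand-in for the frbrization methods discussed below; records
    without authority-controlled headings will group poorly."""
    creator = (record.get("creator") or [""])[0]
    title = (record.get("title") or [""])[0]
    return normalize(creator) + "|" + normalize(title)

def group_by_work(records):
    groups = {}
    for record in records:
        groups.setdefault(work_key(record), []).append(record)
    return groups

editions = [
    {"creator": ["Thoreau, Henry David"], "title": ["Walden"]},
    {"creator": ["Thoreau, Henry David, 1817-1862"],
     "title": ["Walden; or, Life in the woods"]},
]
print(len(group_by_work(editions)))   # 2: naive keys miss the match, hence the
                                      # need for augmentation and authority work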
various analyses of the marc 21 formats have dealt extensively with the relationship between frbr and marc 21,66 and others have written specifically about methodology for frbrizing a marc-based catalog.67 in addition, various tools and web services are available that can potentially facilitate this process.68 even with this extensive body of work to draw upon, however, the success of our implementation of frbr-based functionality will depend upon both the quality and completeness of the system’s metadata. metadata in xc that originated as dublin core records may need significant augmentation to be incorporated effectively into frbrized results displays. to maximize the ability of the system to support frbr/frad results groupings, we may need to supplement automated grouping of resources with a combination of additional services for the metadata services hub, and with cataloger-generated metadata correction and augmentation, as described above.69 the xc team will use the results of user research carried out during the next phase of the xc project to inform our decision-making regarding what frbr-informed results grouping users find helpful, and then assess what specific metadata augmentation services are needed for xc. providing frbr-informed groupings of related records in search results will be easier when the underlying metadata incorporates principles of authority control. of course, the vast majority of the non-marc metadata that will be ingested into xc will not be under authority control. again, this situation suggests the need for additional services or functionality to improve existing metadata within the xc metadata hub, the xc cataloging interface, or both. as an experiment in developing services to facilitate authority control, the xc project team carried out a pilot project in partnership with a group of software engineering students from the rochester institute of technology (rit) during phase 1 of xc. the rit students designed a basic name access control tool that can be used across disparate metadata schemas in an environment such as xc. the tool can ingest marc 21 authority and bibliographic records as well as dublin core records, provide automated matching, and facilitate a cataloger’s handling of problem reports.70 the xc project team will implement the automated portion of the tool as a web service within the xc hub, and the “cataloger facilitation” portion of the tool within the xc cataloging user interface. institutions that use xc can then incorporate additional tools to facilitate authority control into xc as they are needed and developed. in addition to providing a test case for developing xc metadata services, the rit pilot project proved valuable by providing an opportunity for student software developers and catalogers to discuss the functional requirements of a cataloging tool. not only did the experience enable the developers to understand the needs of the system’s intended users, but it also presented an opportunity for the engineering students to demonstrate technological possibilities that the catalogers—who work almost exclusively with legacy ils technology—may not have envisioned before participating in the project. requirement 6—the system must manage usergenerated metadata resulting from user tagging, submission of reviews, etc. 
because users now expect web-based tools to offer web 2.0 functionalities, the xc project has as one of its basic 14 information technology and libraries | june 2008 goals to incorporate these functionalities into xc’s user environments. the results of the xc survey rank tools to support the finding, gathering, use, and reuse of scholarly content (e.g., rss feeds, blogs, tagging, user reviews) eighth out of a list of twenty new desirable opac features.71 we expect to learn much more about the usefulness of web 2.0 technology within a next-generation system through the user research that we will carry out during phase 2 of the xc project. the xc system will capture metadata generated by users from any one of the system’s user environments (e.g., drupal-based interface, learning management system integration) and harvest it back into the system’s metadata services hub for processing.72 the xc application profile will incorporate user-generated metadata, mapped into its own carefully defined metadata elements. this will allow us to capture and manage this metadata as discrete content, without inadvertently mixing it with other metadata created by library staff or ingested from other sources. n goal 4: conduct user research to inform system development user research will be essential to informing the design and functionality of the xc software. to align xc’s functional requirements as closely as possible with user needs, the xc project team will practice a user-centered design methodology that takes an iterative approach to defining the system’s functional requirements. since we will engage concurrently in the processes of user research and software design, we will not fully determine the system requirements for xc until a significant amount of user research has been done. a complete picture of the demands upon metadata within xc will thus emerge as we gain information from our user research. n goal 5: publish the xc code as open-source software central to the vision of the xc project is sharing the xc software freely throughout the library community and beyond. our hope is that others will use all or part of the xc software, modify it, and improve it to meet their own needs. new requirements for the metadata within xc are likely to arise as this process takes place. other future changes to the xc software will also be needed to ensure the software’s continued compatibility with various metadata standards and schemas. these changes will all affect the system requirements for xc over time. addressing goals 4 and 5 while goals 1 through 3 for the xc project result in specific high-level functional requirements for the system’s discovery metadata that can be addressed and discussed as xc is being developed, goals 4 and 5 present general challenges that must be addressed in the future. goal 4 is likely to fuel the need to update the xc software over time as the needs of users change. goal 5 provides a challenge to managing that updating process in a collaborative environment. these two goals suggest an additional general requirement for the system’s metadata requirement 7: requirement 7—the system’s metadata must be extensible to facilitate future enhancements and updates. enabling future user needs developing xc using a user-centered design process in which user research and software design occur simultaneously will enable us to design and build a system that is as responsive as possible to the needs of users that are seeking library resources. 
however, user needs will change during the life of the xc software. these needs must be assessed and addressed, and then weighed against the desires of individual institutions that use xc and who request specific system enhancements. to carry forward the xc project’s commitment to serving users, we will develop a governance model for the xc community that brings the needs of future users into the decision-making process by providing a method for continuing to determine and capture user needs. in addition, we will consciously cultivate a commitment to user research among members of the xc community. because the xc software will be released as open source, we can also encourage xc partners to develop whatever additional functionality they need for their own institutions and make these enhancements available to the entire community of xc users. this approach is very different from the enhancement process in place for most commercial systems, and xc partner institutions may need to adjust to this approach. enabling future metadata standards as current metadata standards are revised and new standards and schemas are created, xc must be able to accommodate these changes. new crosswalks will allow new metadata schemas to be mapped to the xc internal schema in the future. the xc application profile can be updated with the addition of new data elements as needed. the drupal-based xc user environment will also allow institutions that use xc to create new internal data types to incorporate additional types of metadata. as the development of the semantic web moves forward73 and enables smart linking between existing authority files and vocabularies,74 xc’s architecture can make use of the resulting web services, either by incorporating them metadata to support next-generation library resource discovery | bowen 15 through the xc metadata services hub or through the native xc user interface as part of a user search query. n further considerations the above discussion of the goals and requirements for xc has revealed a number of issues related to the development of next-generation discovery systems that are unfortunately beyond the scope of the next phase of the xc project. we therefore offer them as a possible agenda for future work by the broader library community: 1. explore the wider usefulness of web-based metadata services and the need for an automated metadata services coordinator to control these functions. libraries are already comfortable with basic “services” that are performed on metadata by an outside agency: for example, a library may send copies of its marc records to a vendor for authority processing or enrichment with tables of contents or other data elements. the library community should encourage vendors and others to develop these and other metadata enrichment options as automated web services. 2. study the advantages of using statement-level metadata provenance, as used in the nsdl metadata management system and considered for use within the xc metadata services hub, and explore whether there are ways that marc 21 could move toward allowing more granularity in recording and sharing metadata provenance. 3. to facilitate access to licensed library resources, encourage the development of more robust metasearch technology and standards so that technological limitations do not hinder system performance and search result usability. 
if this is not successful, libraries and content providers must work together to enable metadata for licensed resources to be revealed within open discovery environments such as xc and ethicshare.75 this second scenario will enable libraries to directly address usability issues with the display of licensed content, which may make it a more desirable longer-term solution than attempting to improve metasearch technology. 4. the administrative bodies of the two groups represented on the dcmi/rda task group (i.e., the dublin core metadata initiative and the rda committee of principals) have a responsibility to take the lead in funding this group’s work to develop and maintain the rda/dc application profile and its related registries and vocabularies. beyond this, however, the broader library community must recognize that this work is essential to ensure that future library metadata standards will function in the broader web environment, and offer additional administrative and financial support for it in the coming years. 5. to ensure that library standards work effectively outside of traditional library systems, catalogers and metadata experts must develop ongoing, collaborative working relationships with system developers. such collaboration will necessitate educating each group of experts about the domain of the other. 6. libraries should experiment with using metadata in new environments and use the lessons learned from this activity to inform the metadata standards development process. while current library automation environments by and large do not provide opportunities for this, the extensible catalog will provide a flexible platform where experimentation can take place.76 xc will make experimentation as risk-free as possible by ensuring that the original metadata brought into the system can be reharvested in its original form, thus minimizing concerns about possible data corruption. xc will also minimize the investment needed for a library to engage in this experimentation because it will be released as open-source software. 7. to facilitate new functionality for next-generation library discovery environments, libraries must share their new expertise in this area with each other. for example, library professional organizations (such as ala and its associations) should form discussion groups and committees devoted to sharing lessons learned from the implementation of faceted interfaces and web 2.0 technologies, such as tagging and folksonomies. such groups should develop a “best practices” document outlining a preferred way to define facets from marc 21 data that can be used by any library implementing faceting on top of its legacy metadata. 8. the library community should discuss and encourage mechanisms for pooling and sharing usergenerated metadata among libraries and other interested institutions. n conclusions to present library resources via the web in a manner that users now expect, library metadata must function in ways that have never been required of it before. making library metadata function effectively within the broader web environment will require that libraries take advantage of the combined knowledge of experts in the areas of cataloging/metadata and system development who share a 16 information technology and libraries | june 2008 common vision for serving library users. 
the challenges to making legacy library metadata and newer metadata for digital resources interact effectively in the broader web environment are significant, and work must begin now to ensure that we can preserve the investment that libraries have made in their legacy metadata. while the recommendations within this report are the result of planning to develop one particular library discovery system—the extensible catalog (xc)—these lessons can inform the development of other systems as well. the actual development of xc will continue to add to our knowledge in this area. while it may be tempting to wait and see what commercial vendors offer as their next generation of commercial discovery products, such a passive approach may jeopardize the future viability of library metadata. projects such as the extensible catalog can serve as a vehicle for moving forward by providing an opportunity for libraries to experiment and to then take informed action to move the library community toward a next generation of resource discovery systems. acknowledgments phase 1 of the extensible catalog project was funded through a grant from the andrew w. mellon foundation. this paper is in partial fulfillment of that grant, originally funded on april 1, 2006, and concluding on june 30, 2007. the author acknowledges the contributions of the entire university of rochester extensible catalog project team to the content of this paper, and especially thanks david lindahl, barbara tillett, and konstantin gurevich for reading and offering suggestions on drafts of this paper. references and notes 1. despite the use of the word “catalog” within the name of the extensible catalog project, this paper will avoid using the word “catalog” in the phrase “next-generation catalog” because this may misleadingly convey the idea of a catalog as solely a single, separate web destination for library users. instead, terms such as “discovery environment” and “discovery system” will be preferred. 2. the xc blog provides a list of xc partners, describes their roles in xc phase 2, and provides links to reports that represent the outcomes of xc phase 1. “xc (extensible catalog): an opensource online system that will unify access to traditional and digital library resources,” www.extensiblecatalog.info (accessed october 4, 2007). 3. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998), www.ifla.org/vii/s13/frbr/ frbr.pdf (accessed july 23, 2007). 4. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data: a conceptual model,” april 1, 2007, www.ifla.org/vii/d4/franar-conceptualmodel2ndreview.pdf (accessed july 23, 2007). 5. library of congress, network development and marc standards office, “marc 21 formats,” april 18, 2005, www.loc .gov/marc/marcdocz.html (accessed september 3, 2007). 6. “dublin core metadata element set, version 1.1,” december 20, 2004, http://dublincore.org/documents/dces (accessed september 3, 2007). 7. university of rochester river campus libraries, “extensible catalog phase 2,” (grant proposal submitted to the andrew w. mellon foundation, july 11, 2007). 8. “literature list,” extensible catalog blog, www. extensiblecatalog.info/?page_id=17 (accessed august 27, 2007). 9. a summary of the results of this survey is available on the xc blog. 
nancy fried foster et al., “extensible catalog survey report,” july 20, 2007, www.extensiblecatalog.info/wp-content/ uploads/2007/07/xc%20survey%20report.pdf (accessed july 23, 2007). 10. lorcan dempsey has written of the need for a service layer for libraries that would facilitate the “de-coupling” of resource retrieval from back-end processing. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, january 20, 2006, http://orweblog.oclc.org/archives/000927. html (accessed august 24, 2007). 11. “open archives initiative protocol for metadata harvesting v. 2.0,” www.openarchives.org/oai/openarchivesprotocol. html (accessed august 27, 2007). 12. library of congress, working group on the future of bibliographic control, “report on the future of bibliographic control: draft for public comment,” november 30, 2007, www .loc.gov/bibliographic-future/news/lcwg-report-draft-11-3007-final.pdf (accessed december 30, 2007). 13. university of california libraries bibliographic services task force, “rethinking how we provide bibliographic services for the university of california,” final report, 34, http://libraries. universityofcalifornia.edu/sopag/bstf/final.pdf (accessed august 24, 2007). 14. “[worldcat.org] search for an item in libraries near you,” www.worldcat.org (accessed august 24, 2007). 15. oclc’s plan to create additional apis to worldcat as part of its worldcat grid project is a welcome development that may enable oclc members to harvest metadata directly from worldcat into a system such as xc in the future. see the following blog posting for an early description of oclc’s plans, which have not been formally unveiled by oclc as of this writing: bess sadler, “the librarians and the chocolate factory: oclc developer network day,” solvitur ambulando, october 3, 2007, www.ibiblio.org/bess/?p=88 (accessed december 30, 2007). 16. “metadata management system,” nsdl registry, september 20, 2006, http://metadataregistry.org/wiki/index.php/ metadata_management_system (accessed july 23, 2007). 17. diane hillmann, stuart sutton, and jon phipps, “nsdl metadata improvement and augmentation services,”(grant proposal submitted to the national science foundation, 2007). 18. library of congress, network development and marc standards office, “marcxml: marc 21 xml schema,” july 26, 2006, www.loc.gov/standards/marcxml (accessed september 3, 2007). metadata to support next-generation library resource discovery | bowen 17 19. andrew k. pace, “category: metasearch,” hectic pace, http://blogs.ala.org/pace.php?cat=150 (accessed august 27, 2007). see in particular the following blog entries: “metameta,” july 25, 2006; “more meta,” september 29, 2006; “preaching to the publishers,” oct 31, 2006; “even more meta,” july 11, 2007; and “still here,” august 21, 2007. 20. david lindahl, “metasearch in the users’ context,” the serials librarian 51, no. 3/4 (2007): 220–222. 21. ethicshare, a collaborative project of the university of minnesota, georgetown university, indiana university–bloomington, indiana university–purdue university indianapolis, and the university of virginia, is addressing this challenge as part of its plan to develop a sustainable online environment for the practical ethics community. the architecture of the proposed ethicshare system has many similarities to that of xc, but the project focuses specifically upon ingesting citation metadata from a variety of sources, including commercial providers. 
see cecily marcus, “ethicshare planning phase final report,” july 2007, www.lib.umn.edu/about/ethicshare/university%20 of%20minnesota_ethicshare_final_report.pdf (accessed august 27, 2007). 22. roy tennant used this phrase in “marc exit strategies,” library journal 127, no. 19 (november 15, 2002), www.libraryjournal.com/article/ca256611.html?q=tennant+exit (accessed july 23, 2007); karen coyle presented her vision for moving beyond marc to a more flexible, identifier-based record structure that will facilitate a range of library functions in “future considerations: the functional library systems record,” library hi tech 22, no. 2 (2004). 23. library of congress, network development and marc standards office, “mets: metadata encoding and transmission standard official web site,” august 23, 2007, www.loc.gov/ standards/mets (accessed september 3, 2007). 24. library of congress, network development and marc standards office, “mods: metadata object description schema,” august 22, 2007, www.loc.gov/standards/mods (accessed september 3, 2007). 25. library of congress, network development and marc standards office, “mads: metadata authority description schema,” february 2, 2007, www.loc.gov/standards/mads (accessed september 3, 2007). 26. “premis: preservation metadata maintenance activity,” july 31, 2007, www.loc.gov/standards/premis (accessed september 3, 2007). 27. library of congress, network development and marc standards office, “ead: encoded archival description version 2002 official site,” august 17, 2007, www.loc.gov/ead (accessed september 3, 2007). 28. visual resources association, “vra core: welcome to the vra core 4.0,” www.vraweb.org/projects/vracore4 (accessed september 3, 2007). 29. “dublin core metadata element set, version 1.1.” 30. other xml-compatible schemas, such as mods and mads, will also be supported initially in xc if they are first converted into marc xml or qualified dublin core. in the future, we plan to allow these other schemas to be harvested directly into xc. 31. foster et al., “extensible catalog survey report,” july 20, 2007, 15. the original comment was submitted by meg bellinger in yale university’s response to the xc survey. 32. patricia harpring et al., “metadata standards crosswalks,” in introduction to metadata: pathways to digital information (getty research institute, n.d.), www.getty.edu/research/ conducting_research/standards/intrometadata/crosswalks. html (accessed august 29, 2007); see also carol jean godby, jeffrey a. young, and eric childress, “a repository of metadata crosswalks,” d-lib magazine 10, no. 12 (december 2004), www .dlib.org/dlib/december04/godby/12godby.html (accessed july 23, 2007). 33. digital library federation, “crosswalkinglogic,” june 22, 2007, http://webservices.itcs.umich.edu/mediawiki/oaibp/ index.php/crosswalkinglogic (accessed august 28, 2007). 34. karen coyle et al., “framework for a bibliographic future,” may 2007, http://futurelib.pbwiki.com/framework (accessed july 23, 2007). 35. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 36. andy powell et al., “dcmi abstract model,” dublin core metadata initiative, june 4, 2007, http://dublincore.org/ documents/abstract-model (accessed august 29, 2007). 37. joint steering committee for development of rda, “rda: resource description and access: background,” july 16, 2007, www.collectionscanada.ca/jsc/rda.html (accessed august 29, 2007). 38. 
joint steering committee for development of rda, “rda-frbr mapping,” june 14, 2007, www.collectionscanada .ca/jsc/docs/5rda-frbrmapping.pdf (accessed august 29, 2007). 39. joint steering committee for development of rda, “rda element analysis,” june 14, 2007, www.collectionscanada.ca/ jsc/docs/5rda-elementanalysis.pdf (accessed august 28, 2007). a revised version of the document was issued on december 16, 2007, at www.collectionscanada.gc.ca/jsc/docs/5rda-element analysisrev.pdf (accessed december 30, 2007). 40. “data model meeting: british library, london 30 april–1 may 2007,” www.bl.uk/services/bibliographic/meeting.html (accessed july 23, 2007). the task group has outlined its work plan, including deliverables, on its wiki at http://dublincore .org/dcmirdataskgroup (accessed october 4, 2007). 41. emily a hicks, jody perkins, and margaret beecher maurer, “application profile development for consortial digital libraries,” library resources and technical services 51, no. 2 (april 2007). 42. makx dekkers, “application profiles, or how to mix and match metadata schemas,” cultivate interactive, january 2001, www.cultivate-int.org/issue3/schemas (accessed august 29, 2007). 43. thomas baker et al., “dublin core application profile guidelines,” september 3, 2005, http://dublincore.org/usage/ documents/profile-guidelines (accessed october 8, 2007). 44. joint steering committee for development of rda, “rda element analysis.” 45. karen coyle and diane hillmann, “resource description and access (rda): cataloging rules for the 20th century,” d-lib magazine 13, no. 1/2 (jan./feb. 2007), www.dlib.org/dlib/ january07/coyle/01coyle.html (accessed august 24, 2007). 46. karen coyle, “astonishing announcement: rda goes 2.0,” coyle’s information, may 3, 2007, http://kcoyle.blogspot .com/2007/05/astonishing-announcement-rda-goes-20.html (accessed august 29, 2007). 18 information technology and libraries | june 2008 47. “drupal.org,” http://drupal.org (accessed august 30, 2007). 48. foster et al., “extensible catalog survey report,” 14. 49. “taxonomy: a way to organize your content,” drupal.org, http://drupal.org/handbook/modules/taxonomy (accessed september 12, 2007). 50. “blackboard learning system,” www.blackboard.com/ products/academic_suite/learning_system/index.bb (accessed august 31, 2007). 51. “sakai: collaboration and learning environment for education,” http://sakaiproject.org (accessed august 31, 2007). 52. for example, the library into blackboard project at california state fullerton has developed a toolkit for faculty that brings openurl resolver functionality into blackboard to create linked citations to resources. see “putting the library into blackboard: a toolkit for cal state fullerton faculty,” 2005, www .library.fullerton.edu/librarytoolkit/default.shtml (accessed august 31, 2007); and susan tschabrun, “putting the library into blackboard: using the sfx openurl generator to create a toolkit for faculty.” the sakaibrary project at indiana university and the university of michigan are working to integrate licensed library content into sakai using metasearch technology. see “sakaibrary: integrating licensed library resources with sakai,” june 28, 2007, www.dlib.indiana.edu/projects/sakai (accessed august 31, 2007). 53. university of rochester river campus libraries, “extensible catalog phase 2.” 54. susan gibbons, “library course management systems: an overview,” library technology reports 41, no. 3 (may/june 2005): 34–37. 55. marti a. 
hearst, “design recommendations for hierarchical faceted search interfaces,” august 2006, http:// flamenco.berkeley.edu/papers/faceted-workshop06.pdf (accessed august 31, 2007). 56. kristin antelman, emily lynema, and andrew k. pace, “toward a twenty-first century library catalog,” information technology and libraries 25, no. 3 (september 2006): 128–138. 57. “c4,” https://www.library.rochester.edu/c4 (accessed september 28, 2007). as of the time of this writing, the c4 prototype is available to the public. however, the prototype is no longer being developed, and this prototype may cease to be available at some point in the future. 58. charley pennell, “forward to the past: resurrecting faceted search @ ncsu libraries,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.lib.ncsu.edu/endeca/ presentations/200706-facetedcatalogs-pennell.ppt (accessed august 31, 2007). 59. mary charles lasater, “authority control meets faceted browse: vanderbilt and primo,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.ala.org/ala/lita/litamembership/ litaigs/authorityalcts/2007annualfiles/marycharleslasater.ppt (accessed august 31, 2007). 60. casey bisson, “faceting and clustering: an implementation report based on scriblio,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), http://oz.plymouth.edu/~cbisson/ presentations/alaannual_2-2007june24.pdf (accessed august 31, 2007). 61. “subject access fields (6xx),” in marc 21 concise format for bibliographic data (2006), www.loc.gov/marc/bibliographic/ ecbdsubj.html (accessed september 28, 2007). 62. pennell, “forward to the past: resurrecting faceted search@ ncsu libraries.” 63. “fast: faceted application of subject terminology,” www.oclc.org/research/projects/fast (accessed august 31, 2007). 64. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 65. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data.” 66. library of congress, network development and marc standards office, “functional analysis of the marc 21 bibliographic and holding formats,” april 6, 2006, www.loc. gov/marc/marc-functional-analysis/functional-analysis.html (accessed august 31, 2007); martha m. yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 2 (june 2005): 77–95; pat riva, “mapping marc 21 linking entry fields to frbr and tillett’s taxonomy of bibliographic relationships,” library resources and technical services 48, no. 2 (april 2004): 130–143. 67. trond aalberg, “a process and tool for the conversion of marc records to a normalized frbr implementation,” in digital libraries: achievements, challenges and opportunities (berlin/heidelberg: springer, 2006), 283–292; christian monch and trond aalberg, “automatic conversion from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2003): 405–411; david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib magazine 11, no. 10 (october 2005), www .dlib.org/dlib/october05/crane/10crane.html (accessed august 24, 2007). 68. 
trond aalberg, frank berg haugen, and ole husby, “a tool for converting from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2006), 453–456; “frbr work-set algorithm,” www .oclc.org/research/software/frbr/default.htm (accessed august 31, 2007); “xisbn (web service),” www.worldcat .org/affiliate/webservices/xisbn/app.jsp (accessed august 31, 2007). 69. for example, marc 21 data may need to be augmented to extract data attributes related to frbr works and expressions that are not explicitly coded within a marc 21 bibliographic record (such as a date associated with a work coded within a general note field); or to “sort out” the fields in a marc 21 bibliographic record for a single resource that contains various works and/or expressions (e.g. ,a sound recording with multiple tracks), to associate the various fields (performer access points, analytical entries, subject headings, etc.) with the appropriate work or expression. 70. while the rit-developed tool is not publicly available at the time of this writing, it is our intent to post it to sourceforge (www.sourceforge.net) in the near future. the final report of the rit project is available at http://docushare.lib.rochester.edu/ docushare/dsweb/get/document-27362 (accessed january 2, 2008). metadata to support next-generation library resource discovery | bowen 19 71. foster et al., “extensible catalog survey report.” 72. note the arrow pointing to the left in figure 1 between the user environments and the metadata services hub. 73. jane greenberg and eva mendez, knitting the semantic web (binghamton, ny: haworth information press, 2007). this volume, co-published simultaneously as cataloging and classification quarterly 43, no. 3/4, contains a wealth of articles that explore the role that libraries can, and should, play in the development of the semantic web. 74. corey a. harper and barbara b. tillett explore various methods for making these controlled vocabularies available in “library of congress controlled vocabularies and their application to the semantic web,” cataloging and classification quarterly 43, no. 3/4 (2007): 63. the development of skos (simple knowledge organization system), a semantic web language for representing controlled structured vocabularies, will also be valuable for xc. see alistair miles and jose r. perez-aguiera, “skos: simple knowledge organisation for the web,” catalogingand classification quarterly 43, no. 3/4 (2007). 75. marcus, “ethicshare planning phase final report.” 76. the talis platform provides another promising environment for experimentation and development. see “talis platform: semantic web application platform,” talis, www.talis.com/ platform (accessed september 2, 2007). usability as a method for assessing discovery | ipri, yunkin, and brown 181 tom ipri, michael yunkin, and jeanne m. brown usability as a method for assessing discovery the university of nevada las vegas libraries engaged in three projects that helped identify areas of its website that had inhibited discovery of services and resources. these projects also helped generate staff interest in the usability working group, which led these endeavors. the first project studied student responses to the site. the second focused on a usability test with the libraries’ peer research coaches and resulted in a presentation of those findings to the libraries staff. the final project involved a specialized test, the results of which also were presented to staff. 
all three of these projects led to improvements to the website and will inform a larger redesign. u sability testing has been a component of the university of nevada las vegas (unlv) libraries web management since our first usability studies in 2000.1 usability studies are a widely used and relatively standard set of tools for gaining insight into web functionality. these tests can explore issues such as the effectiveness of interactive forms or the complexity of accessing full-text articles from third-party databases. they can explore aesthetic and other emotional responses to a site. in addition, they can provide an opportunity to collect input concerning satisfaction with the layout and logic of the site. they can reveal mistakes on the site, such as coding errors, incorrect or broken links, and problematic wording. they also allow us to engage in testing issues of discovery to isolate site elements that facilitate or hamper discovery of the libraries’ resources and services. the libraries’ usability working group seized upon two library-wide opportunities to highlight findings of the past year’s studies. the first was the discovery summit, in which the staff viewed videos of staff attempting finding exercises on the homepage and discussed the finding process. the second was the discovery mini-conference, an outgrowth of a new evaluation framework and the libraries’ strategic plan. through a poster display, the working group highlighted areas dealing with discovery of library resources. the mini-conference allowed us to leverage library-wide interest in the topic of effective information-finding on the web to draw wider attention to usability’s importance in identifying the likelihood of our users discovering library resources independently. the usability working group engaged in three projects to help identify areas of the website that inhibited discovery and to generate staff interest in the process of usability. all three of these projects led to improvements to the website and will inform a larger redesign. the first project is an ongoing effort to study student responses to the site. the second was to administer a usability test with the libraries’ peer research coaches and present those findings to the libraries’ staff. the final project was requested by the dean of libraries and involved a specialized test, the results of which also were presented to staff. n student studies the usability working group began its ongoing evaluation of unlv libraries’ website by conducting two series of tests: one with five undergraduate students and one with five graduate students. not surprisingly, most students self-reported that the main reason they come to the libraries’ site is to find books and journal articles for assignments. the group created a set of fourteen tasks that were based on common needs for completing assignments: 1. find a journal article on the death penalty. (note: if students go somewhere other than the library, guide them back.) 2. find what floor the book the catcher in the rye is on. 3. find the most current issue of the journal popular mechanics. 4. identify a way to ask a question from home. 5. find a video on global warming. 6. you need to write a bibliography for a paper. find something on the website that would help you. 7. find out what lied library’s hours were for july 4. 8. find the libraries’ tutorial on finding books in the library. 9. the library offers workshops on how to use the library. find one you can take. 10. 
find a library-recommended website in business. 11. find out what books are checked out on this card. 12. find instructions for printing from your personal laptop. 13. your sociology professor, dr. lampert, has placed something on reserve for your class. please find the material. 14. your professor wants you to read the book efficiency and complexity in grammars by john a. hawkins. find a copy of the book for your assignment. (the tom ipri (tom.ipri@unlv.edu) is head, media and computer services; michael yunkin (michael.yunkin@unlv.edu) is web content manager/usability specialist; and jeanne m. brown (jeanne.brown@unlv.edu) is head, architecture studies library and assessment librarian, university of nevada las vegas libraries. 182 information technology and libraries | december 2009 moderator will prompt if the person stops at the catalog.) the results of these tests revealed that the site was not as conducive to discovery as was hoped. the libraries are planning on a complete redesign of the site in the near future; however, the results of these first two series of usability tests were compelling enough to prompt an intermediary redesign to improve some of the areas that were troublesome to students. that said, the tests also found certain parts of the old site (figure 1) to be very effective: 1. all participants used the tabbed box in the center of the page, which gives them access to the catalog, serials lists, databases, and reserves. 2. all students quickly found the “ask a librarian” link when prompted to find a way to ask a question from home. 3. most students found the libraries’ hours, partly because of the “hours” tab at the top of the page and partly because of multiple access points. 4. many participants used the “site search” tab to navigate to the search page, but few actually used it to conduct searches. they effectively used the site map information also included on the search page. the usability tests also revealed some variables that undermined the goal of discoverability. 1. due to the various sources of library-related information (website, catalog, vendor databases) navigation posed problems for students. although not a specific question in the usability tests, the results show students often struggled to get back to the libraries’ home page to start a new question. 2. students often expected to find different content under “help and instruction” than what was there. 3. students used the drop down boxes as a last resort. often, they would expand a drop down box and quickly navigate away without selecting anything from the list. 4. with some exceptions, students mainly ignored the tabs across the top of the home page. 5. although students made good use of the tabbed box in the center of the page, many could not distinguish between “journals” and “articles & databases.” 6. similarly, students easily found the “reserves” tab but could not make sense of the difference between “electronic reserves (e-reserves)” and “other reserves.” 7. no student found business resources via the “subject guides” drop down menu at the bottom of the home page. n peer-coach test and staff presentation unlv libraries employs peer research coaches, undergraduate students who serve as frontline research mentors to their peers. the usability working group administered the same test they used with the first group of undergraduate and graduate students to the peer research coaches. although these students are trained in library research, they still struggled with some of the usability tasks. 
the usability working group presented the findings of the peer research coach tests with staff. the peer research coaches are highly regarded in the libraries, so staff were surprised that they had so much difficulty navigating the site; this presentation was the first time many of the staff had seen the results of usability studies of the site. the shocking nature of these results generated a great deal of interest among the staff regarding the work of the usability working group. n the dean’s project in january 2009, the dean of libraries asked the usability working group for assistance in planning for the discovery summit. initially, she requested to view figure 1. unlv libraries’ original website design usability as a method for assessing discovery | ipri, yunkin, and brown 183 the video from some of the usability tests with the goal of identifying discovery-oriented problems on the libraries’ website. soon after, the dean tasked the group with performing a new set of usability tests using three subjects: a librarian, a library employee with little research or web expertise, and a faculty researcher. each participant was asked to complete three tasks, first using the libraries’ website, then using google. the tasks were based on items found in the libraries’ special collections: 1. find a photograph available in unlv libraries of the basic magnesium mine in henderson, nevada. 2. find some information about the baneberry nuclear test. are there any documents in unlv libraries about the lawsuit associated with the test? 3. find some information about the local greenpeace chapter. are there any documents in unlv libraries about the las vegas chapter? the dean viewed those videos and chose the most interesting clips for a presentation at the discovery summit. prior to this meeting, the libraries’ staff were instructed to try completing the tasks on their own so that they might see the potential difficulties users must overcome and to compare the user experience provided by our website with that provided by google. at the discovery summit, the dean presented the staff a number of clips from these special usability tests, giving the staff an opportunity to see where users familiar with the libraries collections stumble. the staff also were shown several clips of undergraduates using the website to perform basic tasks, such as finding journal articles or videos in the libraries, with varying degrees of success. these clips helped illustrate the various difficulties users encounter when attempting to discover library holdings, including unfamiliar search interfaces, library jargon, and a lack of clear relationships between the catalog and other databases. this discussion helped set the stage for the discovery mini-conference. n initial changes to the site unlv libraries’ website is in the process of being redesigned, and the results of the usability studies are being used to inform that process. however, because of the seriousness of some of the issues, some changes are being implemented into an intermediary design (figure 2). 
the new homepage n combines article and journal searching into one tab and removes the word “databases” from the page entirely; n adds a website search to the tabbed box; n adds a “music & video” search option; n makes better use of the picture on the page by incorporating rotating advertisements in that area; n widens the page, allowing more space on the rest of the site’s templates; n breaks the confusing “help & instruction” page into two more specific pages: “help” and “using the libraries”; and n adds the main library and the branch library hours to the homepage. this new homepage is just the beginning of our efforts to improve discovery through the libraries’ website. the usability working group already has plans to do a card sort for the “using the library” category to further refine the content and language of that section. the group plans to test the initial changes to the site to ensure that they are improving discovery. reference 1. jennifer church, jeanne brown, and diane vanderpol, “walking the web: usability testing of navigational pathways at the university of nevada las vegas libraries,” in usability assessment of library-related web sites: methods and case studies, ed. nicole campbell (chicago: ala, 2001). figure 2. unlv libraries’ new website design hutchinson this study focuses on the adoption and use of wireless technology by medium-sized academic libraries, based on responses from eighty-eight institutions. results indicate that wireless networks are already available in many medium-sized academic libraries and that respondents from these institutions feel this technology is beneficial. w ireless networking offers a way to meet the needs of an increasingly mobile, tech-savvy student population. while many research libraries offer wireless access to their patrons, academic libraries serving smaller populations must heavily weigh both the potential benefits and disadvantages of this new technology. will wireless networks become essential components of the modern academic library, or is this new technology just a passing fad? prompted by plans to implement a wireless network at the houston cole library (hcl) (jacksonville state university’s [jsu’s] library), which serves a student enrollment close to ten thousand, this study was conducted to gather information about whether libraries similar in size and mission to hcl have adopted wireless technology. the study also sought to find out what, if any, problems other libraries have encountered with wireless networks and how successful they have perceived those networks to be. other questions addressed include level of technical support offered, planning, type of equipment used to access the network, and patron-use levels. � review of literature a review of the literature on wireless networks revealed a number of articles on wireless networks and checkout programs for laptop computers at large research institutions. seventy percent of major research libraries surveyed by kwon and soules in 2003 offered some degree of wireless access to their networks.1 no articles, however, specifically addressed the use of wireless networks in medium-sized academic libraries. many articles can also be found on wireless-network use in medical libraries and other institutions. library instruction using wireless classrooms and laptops has been another subject of inquiry as well. 
breeding wrote that there are a number of successful uses for wireless technology in libraries, and a wireless local area network (wlan) can be a natural extension of existing networks. he added that since it is sometimes difficult to install wiring in library buildings, wireless is more cost effective.2 a yearly survey conducted by the campus computing project found that the number of schools planning for and deploying wireless networks rose dramatically from 2002 to 2003. “for example, the portion of campuses reporting strategic plans for wireless networks rose to 45.5 percent in fall 2003, up from 34.7 percent in 2002 and 24.3 percent in 2001.”3 the use of wireless access in academia is expected to keep growing. according to a summary of a study conducted by the educause center for applied research (ecar), the higher-education community will keep investing in the technology infrastructure, and institutions will continue to refine and update networks. the move toward wireless access “represents a user-centered shift, providing students and faculty with greater access than ever before.”4 in an article on ubiquitous computing, drew provides a straightforward look at how wlans work, security issues, planning, and the uses and ramifications of wireless technology in libraries. he suggests, “perhaps one of the most important reasons for implementing wireless networking across an entire campus or in a library is the highly mobile lifestyle of students and faculty.” the use of wireless will only increase with the advent of new portable devices, he added. wireless networking is the best and least expensive way for students, faculty, and staff to take their office with them wherever they go.5 the circulation of laptop computers is a frequent topic in the available literature. the 2003 study by kwon and soules primarily focused on laptop-lending services in academic-research libraries. fifty percent of the institutions that responded to their survey provided laptops for checkout. the majority indicated moderate-to-high use of laptop services. positive user response and improved “public reputation, image, and relations” were the greatest advantages reported with laptop circulation. the major disadvantages associated with these services were related to labor and cost.6 a study of laptop checkout service at the mildred f. sawyer library at suffolk university in boston revealed that laptop usage was popular during the fall semester of 1999. students checked out the computers to work on group projects. a laptop area was set aside on one library floor to provide wired internet access for eight users. however, students wanted to use the laptops anywhere, not one designated place. the wired laptop areas were not popular, dugan wrote, adding that “few students used the wired area and the wires were repeatedly stolen or intentionally broken.” an interim phase involved providing wireless network cards for checkout wireless networks in medium-sized academic libraries: a national survey paula barnett-ellis and laurie charnigo paula barnett-ellis (pbarnett@jsucc.jsu.edu) is health and sciences librarian, and laurie charnigo (charnigo@jsucc .jsu.edu) is education librarian at houston cole library, jacksonville state university, alabama. 
wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 13 14 information technology and libraries | march 2005 to encourage patrons to use their own laptops, and, when a wireless network was put into place in the fall of 2000, demand exceeded the number of available laptops for checkout.7 � method a survey (see appendix) was designed to find out how many libraries similar in size and mission to hcl have adopted wireless networks, the experiences they have encountered in offering wireless access, and, most importantly, whether they felt the investment in wireless technology has been worth the effort.8 the national center for education statistic’s academic library peer comparison tool, a database composed of statistical information on libraries throughout the united states, was used to select institutions for this study. a search on this database retrieved eighty-eight academic libraries that met two criteria: full-time enrollments of between five thousand and ten thousand, and classification by the carnegie classification of higher education as master’s colleges and universities i.9 the survey was administered to those thought most likely to be responsible for systems in the library; they were selected from staff listings on library web sites (library systems administrator, information tech-nology [it] staff). if such a person could not be identified, the survey was sent to the head of library systems or to the library director. the survey was divided into the following sections: implementation of wireless network, planning and installation stages, user services, technical problems, and benefits specific to use of network. surveys were mailed out in march 2004. an internet address was provided in the cover letter if participants wished to take the survey online rather than return it by mail. an e-mail reminder with a link to the online survey was sent out three weeks after the initial survey was mailed. all letters and e-mails were personalized, and a self-addressed stamped envelope and a ballpoint pen with the jsu logo were included with the mail surveys. in the e-mail reminder, the authors offered to share the results of the project with anyone who was interested, and received several enthusiastic responses. � results a total of fifty-three completed surveys were returned, resulting in a response rate of 60 percent. the overwhelming majority (85 percent) responded that their library offered wireless-network access. even if the thirty-five surveys that were not returned had reported that wireless networks were not available, more than 50 percent would still have offered wireless networks. survey results also pointed to the newness of the technology. only four of the fifty-three institutions have had wireless networks for more than three years. the majority (73 percent) has implemented wireless networks just within the last two years. when asked to identify the major reasons for offering wireless networks to their patrons, the three responses most chosen were: (1) to provide greater access to users; (2) the flexibility of a network unfettered by the limitations of tedious wiring; and (3) to keep up with technological innovation (see table 1). least significant factors in the decision to implement wireless networks were cost; use by library faculty and staff; to aid in bibliographic instruction; and use for carrying out technical services (taking inventory). 
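the “check all that apply” percentages reported here and in the accompanying tables are each option’s response count divided by the fifty-three returned surveys; a minimal sketch of that tally follows, with illustrative response data and option labels only, not the survey’s actual records.

```python
# tally "check all that apply" survey answers: for each option, report how many
# respondents selected it and what share of all returned surveys that represents.
from collections import Counter

# hypothetical responses: one set of selected options per returned survey
responses = [
    {"provide greater access to users", "flexibility"},
    {"provide greater access to users", "campuswide initiative"},
    {"flexibility", "keep up with technological innovation"},
]

total = len(responses)
counts = Counter(option for survey in responses for option in survey)

for option, n in counts.most_common():
    print(f"{option}: {n} ({100 * n / total:.0f}%)")
```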
somewhat to the authors’ surprise, wireless use in bibliographic instruction was not high on the list of reasons for installing a wireless network, identified by only 9 percent of respondents. the benefits of wireless for library instruction was stressed in the literature by mathias and heser and patton.10 in addition to obtaining an instrument for gauging how many libraries similar in scope and size to hcl have implemented wireless networks and why they chose to do so, questions on the survey were also designed to gather information on planning and implementation, user services, technical problems, and perceived benefits. � planning and implementation although tolson mentions that some schools have used committees composed of faculty, staff, and students to look into the adoption of wireless technology, responses from this survey indicated that the majority (60 percent) of the libraries did not form committees specifically for the planning of their wireless networks.11 in addition, 49 percent of the libraries took fewer than six months to plan for implementation of a network, 37 percent required six months to one year, and 15 percent reported more than one to two years. actual time spent on installation and configuration of wireless networks was relatively short, 98 percent indicating less than one year (see table 2 for specific times). one of the most important issues to consider when planning to implement a wireless network is extent of coverage—where wireless access will be available. survey responses revealed varying degrees of wireless coverage among institutions. twenty percent had campus-wide access, 55 percent had some level of coverage throughout the entire library, 37 percent provided a limited range of coverage outside the building, and 20 percent offered access only in certain areas within the library. according to a bulletin published by ecar, institutions vary in their approaches to networking depending on enrollment. smaller colleges and universities with fewer than ten thousand students are “more likely to implement campuswide wireless networks from the start. larger institutions are more likely to implement wireless technology in specific buildings, consistent with a desire to move forward at a modest pace, as resources and comfort with the technology grow.”12 questions on the survey also queried respondents about the popularity of spaces in the library where users access the library’s wireless network. answers revealed that the most popular areas for wireless access are study carrels, tables, and study rooms. nineteen percent indicated that accessing wireless networks in the stacks is popular. of particular concern to hcl, a thirteen-story building, was how the environment of the library would accommodate a wireless network. a thorough site survey is important to locate the best spots within the library to install access points and to determine whether there are architectural barriers in the building that might interfere with access. the majority of survey respondents indicated that the site survey conducted in their library for a wireless network was carried out by their academic institution’s it staff (59 percent). while library staff conducted 35 percent of site surveys, only 17 percent were conducted by outside companies. � user services an issue to be addressed by libraries deciding to go wireless is whether laptop computers should also be provided for checkout in the library. 
after all, it might be hard to justify the usefulness of a wireless network if users do not have access to laptops or other hardware with wireless capabilities. while one individual reported working at a “laptop university” in which campuswide wireless networking exists and all students are required to own laptops, not all college students will have that luxury. in order to provide more equal access to students, checking out laptops has become an increasingly common service in academic libraries. seventy percent of this survey’s respondents whose institutions offered wireless access also made laptops available for checkout. comments made throughout the survey seemed to imply that while checking out laptops to patrons is an invaluable complement to offering wireless access, librarians should be prepared for a myriad of hassles that accompany laptop checkout. wear and tear of laptops, massive battery use, cost of laptops, and maintenance were some of the biggest problems reported. one participant, whose institution decided to stop offering laptops for checkout to patrons in the library, wrote, “it required too much staff time to maintain and we decided the money was better spent elsewhere. the college now encourages students to purchase a laptop [instead of] a full-sized pc.” one participant worried that the rising use of laptops in his library would lead to the obsolescence of its more than one hundred wired desktops, writing, “our desktops are very popular and we think having them is one of the reasons our gate count has increased in recent years. what happens when everyone has a laptop?” the number of laptops checked out in the libraries varied. the majority of libraries had purchased between one and thirty laptops available for checkout (see table 3). three institutions had more than forty-one laptops available for checkout. one library could boast that it had sixty laptops available for checkout, with twelve pagers to notify students waiting in line to use laptops. when asked about the use of laptops in libraries, 46 percent observed moderate use, while 32 percent reported heavy use of laptops. only 3 percent indicated that they hardly ever noticed use of laptops in the library.

for those students who chose to bring their own laptop to access the library’s wireless network, half of the institutions surveyed required students to purchase their own network-interface cards for their laptops, while 19 percent allowed students to check them out from the library. in addition to laptops, personal digital assistants (pdas) were listed by 37 percent of respondents as devices that may access wireless networks. one librarian indicated that cell phones could access the wireless network in his library. fifty-six percent of respondents indicated that users are able to print to a central printer in the library from their wireless device.

table 1. main reasons for implementing a wireless network in absolute numbers and percentages (number of responses; percent of responses out of total)
provide greater access to users: 36 (67%)
flexibility (no wires, ease in setting up): 29 (54%)
to keep up with or provide technological innovation: 28 (52%)
campuswide initiative: 21 (39%)
requests expressed by users: 16 (30%)
provide greater online access due to shortage of computers per user in the library: 15 (28%)
other: 7 (13%)
offer network access outside the library building: 6 (11%)
aid in bibliographic instruction: 5 (9%)
for use by library faculty and staff: 5 (9%)
low cost: 5 (9%)
to carry out technical services (such as inventory): 4 (7%)

table 2. total length of time taken to completely configure and install the wireless network (number of responses; percent of responses out of total)
less than one month: 12 (28%)
one to two months: 11 (26%)
more than two months to four months: 10 (23%)
more than four months to six months: 4 (9%)
more than six months to one year: 5 (12%)
more than one year: 1 (2%)

table 3. total number of laptops available for checkout in the library (number of responses; percent of responses out of total)
one to five: 8 (26%)
six to ten: 5 (16%)
eleven to fifteen: 1 (3%)
sixteen to twenty: 5 (16%)
twenty-one to thirty: 8 (26%)
thirty-one to forty: 1 (3%)
more than forty: 3 (10%)

an important consideration for implementing a wireless network is how users will authenticate. authentication protocol is defined by the microsoft encyclopedia of networking as “any protocol used for validating the identity of a user to determine whether to grant the user access to resources over a network.”13 authentication methods listed by the institutions surveyed varied greatly, and the authors could not identify all of them. methods mentioned were lightweight directory access protocol (ldap), virtual private network (vpn), media access control (mac) addresses, bluesocket, remote authentication dial in user service (radius), pluggable graphical identification and authentication (pgina), protected extensible authentication protocol (peap), and e-mail logins. out of the thirty-nine responses to this question, seven individuals indicated that they do not require any type of authentication at present. although some individuals noted that they are planning to enable some type of authentication in the future, one participant suggested that there were ethical issues involved in requiring users to authenticate. this person argued that “anonymous access to information is valued” and praised his institution’s current policy of allowing “anyone who can find the network” to use it.

a concern about offering wireless network access in the library is how library staff will be prepared to handle the flood of technical questions that are likely to ensue. the level of technical support offered to users varied among the institutions surveyed. more than half of the respondents indicated that users receive help specifically from it staff or from the campus computer center. thirty-nine percent of users received help from the reference desk, while 19 percent received help from circulation staff. thirty-three percent of the responding institutions offered technical help from a web site, while 7 percent indicated that they did not offer any type of technical support to users.

� technical problems
the technical problems most often encountered with wireless networks centered on architectural barriers that cause blackouts or slow spots where wireless access fails. this confirms the importance of carrying out thorough site surveys and testing prior to installation of access points. site surveys may be carried out by companies specially equipped and trained to determine where access points should be installed, the most appropriate type of antennae (directional or omnidirectional), and how many access points are needed to provide the greatest amount of coverage.
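before turning to the remaining technical problems, a brief aside on the authentication methods listed above: as an illustration only, the sketch below shows one common pattern, validating a patron’s campus directory credentials with an ldap bind before granting network access. the directory host, the dn layout, and the surrounding captive-portal or radius machinery are hypothetical and would differ at each institution.

```python
# minimal sketch of ldap credential checking (hypothetical host and dn layout);
# production deployments wrap this in a captive portal, radius, vpn, or similar.
from ldap3 import Server, Connection

def can_join_network(username: str, password: str) -> bool:
    """return True if the campus directory accepts these credentials."""
    server = Server("ldaps://directory.example.edu")          # hypothetical host
    user_dn = f"uid={username},ou=people,dc=example,dc=edu"   # hypothetical dn layout
    conn = Connection(server, user=user_dn, password=password)
    ok = conn.bind()   # a successful simple bind means the credentials are valid
    conn.unbind()
    return ok

if __name__ == "__main__":
    print(can_join_network("student01", "not-a-real-password"))
```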
configuration of the network was the second most highly reported problem associated with installing wireless networks, seeming to suggest the need for librarians to coordinate their efforts and rely on the knowledge provided by the it coordinator (or similar type of personnel) within their institution. lack of technical support available to users, slow speed, and authentication were also indicated as technical problems most encountered (see table 4). integrating the wireless network with the existing wired network was the least-mentioned problem associated with wireless networks. although security problems, particularly concerning wired equivalency protocol (wep) vulnerabilities, have been pointed out as one of the major drawbacks of a wireless network, the majority of users had not as yet experienced security problems. although one participant wrote, “don’t be too casual about the security risks,” another individual wrote, “talk to your networking department,” as many of them are overly worried about security. perceived benefits respondents reported that the number-one benefit of offering wireless access was user satisfaction. giving patrons the ability to use their laptops anywhere in the library and do multiple tasks from one machine is simply becoming what more and more users expect. the secondlargest benefit revolved around flexibility and ease of use due to the lack of wires. thirty-five percent indicated that allowing students to roam the stacks while accessing the network was a significant benefit. although a few studies have suggested the promise of wireless networks for aiding bibliographic instruction, only 9 percent of respondents indicated this as a benefit of wireless technology. use of wireless technology for instruction, it might be recalled, was not a significant factor noted by respondents in the decision to implement a wireless network. likewise, use of this type of network to carry out technical services (such as inventory) was also low on the scale of benefits. seventy-three percent of users claimed that wireless networks have thus far been worth the cost-benefit ratio. while 70 percent indicated moderate to heavy use of the wireless network, 27 percent reported low usage. when asked what advice they would give to others considering adopting wireless networks in their libraries, the overwhelming majority of responses were positive, recommending that hcl take the plunge. as one individual wrote, “offer it and they will come. it has really increased the usage of our library.” other individuals noted that it is simply necessary to offer wireless access to keep up with technological innovation, and that students expect it. the most significant warning, however, revolved around checkout and maintenance of laptops, which, from the results of this survey, seems be both a big advantage and a headache. several individuals echoed the importance of doing site surveys to test bandwidth limitations and access. one particularly energized participant, using multiple exclamations for emphasis, shared a plethora of advice. “throttle connection speeds! allow only http access! block ports and unnecessary protocols! secure your network and disallow unauthenticated users! use access control lists! establish policies that describe wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 17 table 4. 
technical problems encountered problems total number of percent of responses encountered responses out of total number architectural barriers 15 28 configuration problems 12 22 not enough technical help available to users when needed 10 19 slow speed 10 19 authentication problems 10 19 blackouts 6 11 problems installing drivers 6 11 security problems 6 11 difficulty signing on 6 11 problems with operating systems 5 9 other 3 6 problems integrating the wireless network with an existing wired network 2 4 18 information technology and libraries | march 2005 [wireless fidelity] wi-fi risks and liabilities on your part!” useful advice on wireless-access implementation gleaned from this survey fell under the following categories: � be aware of slower speed � create a policy and guide for users � do it because more users are going wireless, it is necessary to keep up with technological innovation, and because students love it � provide plenty of access points � install access points in appropriate places � ensure continuous connectivity by allowing overlap between access points � purchase battery chargers and heavy-duty laptops with extended warranties � get support from it staff for planning and maintenance � offering wireless will increase library usage � perform or have an expert perform a careful site survey and do lots of testing to locate dead or slow spots in the library due to architectural barriers � enable some type of authorization � be aware of security concerns � although the majority of participants’ networks (70 percent) support 802.11b (which allows for throughput up to 11 megabits per second), a few participants suggest using the 802.11g standard (up to 54 megabits per second) because it is “the fastest” and “backwards compatible to 802.11b” � conclusion though it is a relatively new technology, this study found that a surprisingly large number of medium-sized academic libraries are already offering wireless access. not only are they offering wireless access, but they are also providing patrons with laptops for checkout in the library. although actual use of the network by patrons was not determined through survey responses (as individuals were only asked about their observations of network use), the comments and answers were overwhelmingly positive and enthusiastic about this new technology. problems that have been encountered with wireless networks largely revolve around configuration, slow speed, and laptop checkout. although much of the literature focuses on security issues that accompany wireless networking, few individuals reported problems with security. college and university students, like the rest of society, are becoming increasingly mobile. more often, they want access to library networks and the internet wherever they happen to be studying or working on group projects, not merely in computer labs or designated study areas. the majority of the libraries in this study are accommodating these students’ needs by offering wireless access. according to breeding, wireless networking is a rapidly growing niche in the networking world, and mobile computer users will become a larger and larger part of any library’s clientele.14 to encourage patrons to continue visiting them, academic libraries, large and small, should attempt to meet the demand for wireless access if at all possible. references and notes 1. 
myoung-ja lee kwon and aline soules, laptop computer services: spec kit 275 (washington, d.c.: association of research libraries office of leadership and management services, 2003), 11. 2. marshall breeding, “the benefits of wireless technologies,” information today 19, no. 3 (mar. 2002): 42–43. 3. kenneth c. green, “the campus computing project.” accessed mar. 3, 2004, www.campuscomputing.net/. 4. educause center for applied research, “respondent summary: wireless networking in higher education in the u.s. and canada.” accessed dec. 4, 2003, www.educause.edu/ ir/library/pdf/ecar_so/ers/ers0202/ekf0202.pdf. 5. wilfred drew, “wireless networks: new meaning to ubiquitous computing,” journal of academic librarianship 29, no. 2 (mar. 2003): 102–106. 6. kwon and soules, laptop computer services, 11, 15–17. 7. robert e. dugan, “managing laptops and the wireless networks at the mildred f. sawyer library,” journal of academic librarianship 27, no. 4 (jul. 2001): 295–98. 8. questions on the survey did not distinguish as to whether wireless network installations were initiated by it or library personnel. 9. national center for education statistics, “compare academic libraries.” accessed mar. 10, 2004, http://nces.ed.gov/ surveys/libraries/academicpeer/. 10. molly susan mathias and steven heser, “mobilize your instruction program with wireless technology,” computers in libraries 22, no.3 (mar. 2002): 24–30; janice k. patton, “wireless computing in the library: a successful model at st. louis community college,” community & junior-college libraries 10, no. 3 (mar. 2001): 11–16. 11. stephanie diane tolson, “wireless laptops and local area networks.” accessed dec. 11, 2003, www.thejournal.com/ magazine/vault/articleprintversion.cfm?aid=3536. 12. raymond boggs and paul arabasz, “research bulletin: the move to wireless networking in higher education.” accessed dec. 4, 2003, www.educause.edu/ir/library/pdf/erb0207.pdf. 13. mitch tulloch, microsoft encyclopedia of networking (redmond, wash.: microsoft pr., 2002), 122. 14. marshall breeding, “a hard look at wireless networks,” library journal 127, no. 12 (summer 2002): 14–17. 1. has a wireless network been implemented in your library? __yes __no 2. if your library has not adopted wireless networking, are you currently planning or seriously considering it for the near future? __yes (please skip to question 4) __no (please fill out questions 2 and 3 only) 3. what are your primary concerns about implementing a wireless network? check all that apply. __the technology is still new __unsure of its benefits __no need for one __questions regarding security __cost __would not be able to provide technical support that might be needed __funds must primarily support other types of technology at the moment __have not noticed many users with laptops in the library __slow speed of wireless networks __other 4. how long has a wireless network been implemented in your library? __fewer than 6 months __6 months to 1 year __more than 1 to 2 years __more than 2 to 3 years __more than 3 years 5. what were the main reasons for implementing a wireless network? check all that apply. 
__provide greater access to users __campuswide initiative __offer network access outside the library building __provide greater online access due to shortage of computers per user in the library __flexibility (no wires, ease in setting up) __requests expressed by users __low cost __to keep up with or provide technological innovation __to carry out technical services (such as inventory) __aid in bibliographic instruction __for use by library faculty and staff __other 6. please describe the coverage of your network. check all that apply. __campuswide __library building and limited range outside the library building __inside the library (all areas) __select areas within the library 7. what areas of the library are most popularly used for access to the wireless network? check all that apply. __reference and computer media center areas __in the stacks __librarians and staff offices __carrels, tables, reading or study rooms __area outside the library building 8. please list standards your wireless network supports. check all that apply. __802.11b __802.11a __802.11g __bluetooth __other planning and installation 1. was a committee established to plan the implementation and service of the wireless network? __yes __no 2. how long did it take to plan for implementation of the wireless network? __fewer than 6 months __6 months to 1 year __more than 1 to 2 years __more than 2 years 3. how long did it take to install and configure the network? __less than a month __1 to 2 months __more than 2 to 4 months __more than 4 to 6 months __more than 6 months to 1 year __more than 1 year 4. who performed the site survey? check all that apply. __an outside company or contractor appendix. survey: implementation of wireless networks wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 19 20 information technology and libraries | march 2005 __institution’s own information technology coordinator or computer staff __library staff with technical expertise __no site survey was conducted 5. if the site surveyor was an outside company or contractor, please list their company name and whether you would recommend them. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ user services 1. how are users authenticated? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2. does the library check out laptops to users (for either wired or wireless use)? __yes __no 3. if laptops are available for checkout, do they have wireless capability? __yes __no 4. how many laptops do you have for checkout? __one to five __six to ten __eleven to fifteen __sixteen to twenty __twenty-one to thirty __thirty-one to forty __more than forty 5. how would you describe use of laptops in your library on the average day? __heavy—very noticeable use of laptops __moderate use of laptops __low use of laptops __not sure __hardly even notice laptops are used 6. how do users obtain wireless cards for the network? check all that apply. __check out from library __purchase from library __purchase from the campus computer center __must purchase on their own 7. 
if the library checks out wireless cards, how many were purchased for checkout? __one to five __six to ten __eleven to fifteen __sixteen to twenty __twenty-one to twenty-five __twenty-six to thirty __more than thirty 8. what type of technical support does the library provide to users? check all that apply. __help from reference or help desk __help from the information technology staff or campus computer center __circulation staff __other library staff __from a web site __no technical help is provided to users 9. has the library created a policy for the use of wireless networks? __yes __no 10. are users able to print from the wireless network in the library? __yes __no 11. which of the following may access the wireless network? check all that apply. __laptops __desktop computers __pdas __cell phones __other technical problems 1. what technical problems have you or your users encountered? check all that apply. __blackouts __architectural barriers __slow speed __problems integrating the wireless network with an existing wired network __configuration problems __security problems __authentication problems __problems with operating systems __difficulty signing on __not enough technical help available to users when needed __problems installing drivers __other 2. have you experienced security problems with the network? check all that apply. __have not experienced any security problems __problems with unauthorized people accessing the internet through the wireless network __problems with restricted parts of the network being accessed by unauthorized users __other 3. how were security problems resolved? benefits of use of network 1. what have been the biggest benefits of wireless technology? check all that apply. __user satisfaction __increased access to the internet and online sources __flexibility and ease due to lack of wires __has improved technical services (use for library functions) __has aided in bibliographic instruction __provides access beyond the library building __allows students to roam the stacks while accessing the network __other 2. how would you describe current usage of the network? __heavy __moderate __low 3. in your opinion, has this technology been worth the benefit-cost ratio thus far? __yes __no __not sure 4. what advice would you give to librarians considering this technology? (editorial continued from page 3) design and implementation of complex systems to serve our users. writing about that should not be solitary either. i hope to publish think-pieces from leaders in our field. i hope to publish more articles on the management of information technologies. i hope to increase the number of manuscripts that provide retrospectives. libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. we should, upon occasion, remember and reflect upon our development as an information-technology profession. i hope to work with the editorial board, the lita publications committee, and the lita board to find a way, and soon, to facilitate the electronic publication of articles without endangering—but in fact enhancing—the absolutely essential financial contribution that the journal provides to the association. in short, i want to make ital a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of lita as a professional division of ala. 
to accomplish my goals, i need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the lita officers. i need all lita members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by fred kilgour and his colleagues, who were our predecessors in volume 1, number 1, march 1966, of our journal. reference 1. walt crawford, first have something to say: writing for the library profession (chicago: ala, 2003). wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 21 110 information technology and libraries | september 2009 employing virtualization in library computing: use cases and lessons learned arwen hutt, michael stuart, daniel suchy, and bradley d. westbrook this paper provides a broad overview of virtualization technology and describes several examples of its use at the university of california, san diego libraries. libraries can leverage virtualization to address many long-standing library computing challenges, but careful planning is needed to determine if this technology is the right solution for a specific need. this paper outlines both technical and usability considerations, and concludes with a discussion of potential enterprise impacts on the library infrastructure. o perating system virtualization, herein referred to simply as “virtualization,” is a powerful and highly adaptable solution to several library technology challenges, such as managing computer labs, automating cataloging and other procedures, and demonstrating new library services. virtualization has been used in one manner or another for decades,1 but it is only within the last few years that this technology has made significant inroads into library environments. virtualization technology is not without its drawbacks, however. libraries need to assess their needs, as well as the resources required for virtualization, before embarking on large-scale implementations. this paper provides a broad overview of virtualization technology and explains its benefits and drawbacks by describing some of the ways virtualization has been used at the university of california, san diego (ucsd) libraries.2 n virtualization overview virtualization is used to partition the physical resources (processor, hard drive, network card, etc.) of one computer to run one or more instances of concurrent, but not necessarily identical, operating systems (oss). traditionally only one instance of an operating system, such as microsoft windows, can be used at any one time. when an operating system is virtualized—creating a virtual machine (vm)—the vm communicates through virtualization middleware to the hardware or host operating system. this middleware also provides a consistent set of virtual hardware drivers that are transparent to the enduser and to the physical hardware. this allows the virtual machine to be used in a variety of heterogeneous environments without the need to reconfigure or install new drivers. with the majority of hardware and compatibility requirements resolved, the computer becomes simply a physical presentation medium for a vm. n two approaches to virtualization: host-based vs. hypervisor virtualization can be implemented using type 1 or type 2 hypervisor architectures. 
a type 2 hypervisor (figure 1), commonly referred to as "host-based virtualization," requires an os such as microsoft windows xp to host a "guest" operating system like linux or even another version of windows. in this configuration, the host os treats the vm like any other application. host-based virtualization products are often intended to be used by a single user on workstation-class hardware. in the type 1 hypervisor architecture (figure 2), commonly referred to as "hypervisor-based virtualization," the virtualization middleware interacts with the computer's physical resources without the need of a host operating system. such systems are usually intended for use by multiple users with the vms accessed over the network. realizing the full benefits of this approach requires a considerable resource commitment for both enterprise-class server hardware and information technology (it) staff.

arwen hutt (ahutt@ucsd.edu) is metadata specialist, michael stuart (mstuart@ucsd.edu) is information technology analyst, daniel suchy (dsuchy@ucsd.edu) is public services technology analyst, and bradley d. westbrook (bradw@library.ucsd.edu) is metadata librarian and digital archivist, university of california, san diego libraries.

use cases

archivists' toolkit

the archivists' toolkit (at) project is a collaboration of the ucsd libraries, the new york university libraries, and the five colleges libraries (amherst college, hampshire college, mt. holyoke college, smith college, and university of massachusetts, amherst) and is funded by the andrew w. mellon foundation. the at is an open-source archival data management system that provides broad, integrated support for the management of archives. it consists of a java client that connects to a relational database back-end (mysql, mssql, or oracle). the database can be implemented on a networked server or a single workstation. since its initial release in december 2006, the at has sparked a great deal of interest and rapid uptake of the application within the archival community. this growing interest has, in turn, created an increased demand for demonstrations of the product, workshops and training, and simpler methods for distributing the application. (of the use cases described here, the two for the at distribution and laptop classroom are exploratory, whereas the rest are in production.)

at workshops

the society of american archivists sponsors a two-day at workshop occurring on multiple dates at several locations. in addition, the at team provides one- and two-day workshops to different institutional audiences. at workshops are designed to give participants a hands-on experience using the at application. accomplishing this effectively requires, at the minimum, supplying all participants with identical but separate databases so that participants can complete the same learning exercises simultaneously and independently without concern for working in each other's space. in addition, an ideal configuration would reduce the workload of the instructors, freeing them from having to set up the at instructional database onsite for each workshop.
for these workshops we needed to do the following:

- provide identical but separate databases and database content for all workshop attendees
- create an easily reproducible installation and setup for workshops by preparing and populating the at instructional database in advance

virtualization allows the at workshop instructors to predefine the workstation configuration, including the installation and population of the at databases, prior to arriving at the workshop site. to accomplish this we developed a workshop vm configuration with mysql and the at client installed within a linux ubuntu os. the workshop instructors then built the at vm with the data they require for the workshop. the at client and database are loaded on a dvd or flash drive and shipped to the classroom managers at the workshop sites, who then need only to install a copy of the vm and the freely available vmplayer software (necessary to launch the at vm) onto each workstation in the classroom. the at vm, once built, can be used many times, both for multiple workstations in a classroom and for multiple workshops at different times and locations. this implementation has worked very well, saving both time and effort for the instructors and classroom support staff by reducing the time and communication necessary for deploying and reconfiguring the vm. it also reduces the chances that there will be an unexpected conflict between the application and the host workstation's configuration. but the method is not perfect. more than anything else, licensing costs motivated us to choose linux as the operating system instead of a proprietary os such as windows. this reduces the cost of using the vm, but it also requires workshop participants to use an os with which they are often unfamiliar. for some participants, unfamiliarity with linux can make the workshop more difficult than it would be if a more ubiquitous os were used.

figure 1. a type 2 hypervisor (host-based) implementation
figure 2. a type 1 (hypervisor-based) implementation

at demonstrations

in a similar vein, members of the at team are often called upon to demonstrate the application at various professional conferences and other venues. these demonstrations require the setup and population of a demonstration database with content for illustrating all of the application's functions. one of the constraints posed by the demonstration scenario is the importance of using a local database instance rather than a networked instance, since network connections can be unreliable or outright unavailable (network connectivity being an issue we've all faced at conferences). another constraint is that portions of the demonstrations need some level of preparation (for example, knowing what search terms will return a nonempty result set), which must be customized for the unique content of a database. a final constraint is that, because portions of the demonstration (import and data merging) alter the state of the database, changes to the database must be easily reversible, or else new examples must be created before the database can be reused. building on our experience of using virtualization to implement multiple copies of an at installation, we evaluated the possibility of using the same technology for simplifying the setup necessary for demonstrating the at.
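a minimal sketch of how the reset-and-launch step might be scripted on a classroom or presentation machine, assuming vmware's vmrun command-line utility is available and that the prebuilt vm contains a snapshot taken after the instructional database was populated; the paths and snapshot name here are hypothetical, not part of the at distribution:

# reset_at_vm.py -- revert the prebuilt archivists' toolkit vm to its prepared
# snapshot and start it, e.g., between workshop sessions or before a demonstration.
# assumes vmware's vmrun utility; paths and names are hypothetical examples.
import subprocess

VMX = "/vms/at-workshop/at-workshop.vmx"   # hypothetical location of the vm's .vmx file
SNAPSHOT = "clean"                         # snapshot taken after the database was populated

def vmrun(*args):
    # call vmrun for a workstation-class product ("ws"; vmware player uses "player")
    # and stop on any failure
    subprocess.run(["vmrun", "-T", "ws", *args], check=True)

if __name__ == "__main__":
    vmrun("revertToSnapshot", VMX, SNAPSHOT)   # discard whatever the last session changed
    vmrun("start", VMX)                        # launch the vm for the next group
    print("at vm reverted to snapshot '%s' and started" % SNAPSHOT)

the same revert step is what gives the demonstration database the easy reversibility described next.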
as with the workshops, the use of a vm for at demonstrations allows for easy distribution of a prepopulated database, which can be used by multiple team members at disparate geographic locations and on different host oss. this significantly reduces the cost of creating (and recreating) demonstration databases. in addition, demonstration scripts can be shared between team members, creating additional time savings as well as facilitating team participation in the development and refinement of the demonstration. perhaps most important is the ability to roll back the vm to a specific state or snapshot of the database. this means the database can be quickly returned to its original state after being altered during a demonstration. overall, despite our initial anxiety about depending on the vm for presentations to large audiences, this solution has proven very useful, reliable, and cost-effective. at distribution implementing the at requires installing both the toolkit client and a database application such as mysql, instantiating an at database, and establishing the connection between database and client. for many potential customers of the at, the requirements for database creation and management can be a significant barrier due to inexperience with how such processes work and a lack of readily available it resources. many of these customers simply desire a plug-and-play version of the application that they can install and use without requiring technical assistance. it is possible to satisfy this need for a plug-and-play at by constructing a vm containing a fully installed and ready-to-use at application and database instance. this significantly reduces the number and difficulty of steps involved in setting up a functional at instance. the customer would only need to transfer the vm from a dvd or other source to their computer, download and install the vm reader, and then launch the at vm. they would then be able to begin using the at immediately. this removes the need for the user to perform database creation and management; arguably the most technically challenging portion of the setup process. users would still have the option of configuring the application (default values, lookup lists, etc.) in accord with the practices of their repository. batch processing catalog records the rapid growth of electronic resources is significantly changing the nature of library cataloging. not only are types of library materials changing and multiplying, the amount of e-resources being acquired increases each year. electronic book and music packages often contain tens of thousands of items, each requiring some level of cataloging. because of these challenges, staff are increasingly cataloging resources with specialized programs, scripts, and macros that allow for semiautomated record creation and editing. such tools make it possible to work on large sets of resources—work that would not be financially possible to perform manually item by item. however, the specialized configuration of the workstation required for using these automated procedures makes it very difficult to use the workstation for other purposes at the same time. in fact, user interaction with the workstation while the process is running can cause a job to terminate prior to completion. in either scenario, productivity is compromised. virtualization offers an excellent remedy to this problem. 
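the kind of semiautomated job such a specialized workstation configuration runs might look like the following minimal sketch, which assumes the pymarc library and hypothetical file names; it simply sets aside newly loaded e-resource records that lack an 856 (electronic location) field so a cataloger can review them by hand:

# split_eresource_records.py -- scan a batch of marc records and separate any record
# without an 856 (electronic location) field for manual review.
# a sketch only: assumes the pymarc library; the file names are hypothetical.
from pymarc import MARCReader, MARCWriter

with open("new_eresources.mrc", "rb") as source, \
     open("ready_to_load.mrc", "wb") as ready, \
     open("needs_review.mrc", "wb") as review:
    ok_writer = MARCWriter(ready)
    review_writer = MARCWriter(review)
    for record in MARCReader(source):
        if record is None:                   # skip records pymarc could not parse
            continue
        if record.get_fields("856"):
            ok_writer.write(record)          # has at least one online-access field
        else:
            review_writer.write(record)      # no url; route to manual review
    ok_writer.close()
    review_writer.close()

running a job like this inside a vm keeps its environment and any client software isolated from the cataloger's interactive session, which is the point of the configuration that follows.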
a virtual machine configured for semiautomated batch processing allows for unused resources on the workstation to process the batch requests in an isolated environment while, at the same time and on the same machine, the user is able to work on other tasks. in cases where the user's machine is not an ideal candidate for virtualization, the vm can be hosted via a hypervisor-based solution, and the user can access the vm with familiar remote access tools such as remote desktop in windows xp.

secure sandbox

in addition to challenges posed by increasingly large quantities of acquisitions, the ucsd libraries is also encountering an increasing variety of library material types. most notable is the variety and uniqueness of digital media acquired by the library, such as specialized programs to process and view research data sets, new media formats and viewers, and application installers. cataloging some of these materials requires that media be loaded and that applications be installed and run to inspect and validate content. but running or opening these materials, which are sometimes from unknown sources, poses a security risk to both the user's workstation and to the larger pool of library resources accessible via the network. many installers require a user to have administrative privileges, which can pose a threat to network security. the virtual machine allows a user to have administrative privileges within the vm, but not outside of it. the user can be provided with the privileges needed for installing and validating content without modifying their privileges on the host machine. in addition, the vm can be isolated by configuring its network connection so that any potential security risks are limited to the vm instance and do not extend to either the host machine or the network.

laptop classroom

instructors at the ucsd libraries need a laptop classroom that meets the usual requirements for this type of service (mobility, dependability, etc.) but also allows for the variety of computing environments and applications in use throughout our several library locations. in a least-common-denominator scenario, computers are configured to meet a general standard (usually microsoft windows with a standard browser and office suite) and allow minimal customization. while this solution has its advantages and is easy to configure and maintain from the it perspective, it leaves much to be desired for an instructor who needs to use a variety of tools in the classroom, often on demand. the goal in this case is not to settle for a single generic build but instead to look for a solution that accommodates three needs:

- the ability to switch quickly between different customized os configurations
- the ability to add and remove applications on demand in a classroom setting
- the ability to restore a computer modified during class to its original state

of course, regardless of the approach taken, the laptops still needed to retain a high level of system security, application stability, and regular hardware maintenance. after a thorough review of the different technologies and tools already in use in the libraries, we determined that virtualization might also serve to meet the requirements of our laptop classroom. the need to support multiple users and multiple vms makes this scenario an ideal candidate for hypervisor-based virtualization.
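returning briefly to the secure sandbox above, the network isolation mentioned there is typically a small change to the vm's configuration rather than anything elaborate. a minimal sketch, assuming a vmware-style .vmx configuration file with the usual connection types ("bridged", "nat", "hostonly"); the path is hypothetical, and other products expose the same choice differently:

# isolate_sandbox_vm.py -- point a sandbox vm's first network adapter at the
# host-only network so software installed inside it cannot reach the campus network.
# a sketch assuming a vmware-style .vmx file; key names vary by product and version.
from pathlib import Path

VMX_PATH = Path("/vms/sandbox/sandbox.vmx")   # hypothetical sandbox vm

def restrict_to_host_only(vmx_path):
    lines = vmx_path.read_text().splitlines()
    # drop any existing connection-type setting for the first adapter
    kept = [line for line in lines if not line.startswith("ethernet0.connectionType")]
    kept.append('ethernet0.connectionType = "hostonly"')
    vmx_path.write_text("\n".join(kept) + "\n")

if __name__ == "__main__":
    restrict_to_host_only(VMX_PATH)
    print("sandbox vm restricted to the host-only network")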
we decided to use vdi (virtual desktop infrastructure), a commercially available hypervisor product from vmware. vmware is one of the largest providers of virtualization software, and we were already familiar with several iterations of its host-based vm services. the core of our project plan consists of a base vm to be created and managed by our it department. to support a wide variety of applications and instruction styles, instructors could create a customized vm specific to their library's instruction needs with only nominal assistance from it staff. the custom vm would then be made available on demand to the laptops from a central server (as depicted in figure 2 above). in this manner, instructors could "own" and maintain a personal instructional computing environment, while the classroom manager could still ensure that the laptop classroom as a whole maintained the necessary secure software environment required by it. as an added benefit, once these vms are established, they could be accessed and used in a variety of diverse locations.

considerations for implementation

before implementing any virtualization solution, in-depth analysis and testing are needed to determine which type of solution, if any, is appropriate for a specific use case in a specific environment. this analysis should include three major areas of focus: user experience, application performance in the virtualized environment, and effect on the enterprise infrastructure. in this section of the paper, we review considerations that, in hindsight, we would have found to be extremely valuable in the ucsd libraries' various implementations of virtualization.

user experience

traditionally, system engineers have developed systems and tuned performance according to engineering metrics (e.g., megabytes per second and network latency). while such metrics remain valuable to most assessments of a computer application, performance assessments are being increasingly defined by usability and user experience factors. in an academic computing environment, especially in areas such as library computer labs, these newer kinds of performance measures are important indicators of how effectively an application performs and, indirectly, of how well resources are being used. virtualization can be implemented in a way that allows library users to have access to both the virtualized and host oss or to multiple virtualized oss. since virtualization essentially creates layers within the workstation, multiple os layers (either host or virtualized) can cause users to become confused as to which os they are interacting with at a given moment. in that kind of implementation, the user can lose his or her way among the host and guest oss as well as become disoriented by differing features of the virtualized oss. for example, the user may choose to save a file to the desktop, but may not be aware that the file will be saved to the desktop of the virtualized os and not the host os. external device support can also be problematic for the end user, particularly with regard to common devices such as flash drives. the user needs to be aware of which operating system is in use, since it is usually the only one with which an external device is configured to work. authentication to a system is another example of how the relationship between the host and guest os can cause confusion.
the introduction of a second os implicitly creates a second level of authentication and authorization that must be configured separately from that of the host os. user privileges may differ between the host and guest os for a particular vm configuration. for instance, a user might need to remember two logins or at least enter the same login credentials twice. these unexpected differences between the host and guest os produce negative effects on a user’s experience. this can be a critical factor in a time-sensitive environment such as a computer lab, where the instructor needs to devote class time to teaching and not to preparing the computers for use and navigating students through applications. interface latency and responsiveness latency (meaning here the responsiveness or “sluggishness” of the software application or the os) in any interface can be a problem for usability. developers devote a significant amount of time to improving operating systems and application interfaces to specifically address this issue. however, users will often be unable to recognize when an application is running a virtualized os and will thus expect virtualized applications to perform with the same responsiveness as applications that are not-virtualized. in our experience, some vm implementations exhibit noticeable interface latency because of inherent limitations of the virtualization software. perhaps the most notable and restrictive limitation is the lack of advanced 3d video rendering capability. this is due to the lack of support for hardware-accelerated graphics, thus adding an extra layer of communication between the application and the video card and slowing down performance. in most hardware-accelerated 3d applications (e.g., google earth pro or second life), this latency is such a problem that the application becomes unusable in a virtualized environment. recent developments have begun to address and, in some cases, overcome these limitations.3 in every virtualization solution there is overhead for the virtualization software to do its job and delegate resources. in our experience, this has been found to cause an approximately 10–20 percent performance penalty. most applications will run well with little or moderate changes to configuration when virtualized, but the overhead should not be overlooked or assumed to be inconsequential. it is also valuable to point out that the combination of applications in a vm, as well as vms running together on the same host, can create further performance issues. traditional bottlenecks the bottlenecks faced in traditional library computing systems also remain in almost every virtualization implementation. general application performance is usually limited by the specifications of one or more of the following components: processor, memory, storage, and network hardware. in most cases, assuming adequate hardware resources are available, performance issues can be easily addressed by reconfiguring the resources for the vm. for example, a vm whose application is memorybound (i.e., performance is limited by the memory available to the vm), can be resolved by adjusting the amount of memory allocated to the vm. a critical component of planning a successful virtualization deployment includes a thorough analysis of user workflow and the ways in which the vm will be utilized. although the types of user workflows may vary widely, analysis and testing serve to predict and possibly avoid potential bottlenecks in system performance. 
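as a rough illustration of the sizing arithmetic such an analysis involves, the sketch below estimates how many vms of a given memory allocation fit on one host once an overhead in the 10-20 percent range is set aside; every figure is a hypothetical example rather than a measurement from the ucsd deployment:

# vm_capacity_estimate.py -- back-of-the-envelope sizing for a virtualization host.
# all numbers are hypothetical examples; substitute measured values from testing.

def vms_per_host(host_ram_gb, host_reserved_gb, vm_ram_gb, overhead=0.15):
    # estimate how many vms fit, charging each vm its allocation plus a
    # virtualization overhead (the 10-20 percent penalty discussed above)
    usable = host_ram_gb - host_reserved_gb        # ram left after the host os / hypervisor
    effective_per_vm = vm_ram_gb * (1 + overhead)  # each vm costs a bit more than its allocation
    return int(usable // effective_per_vm)

if __name__ == "__main__":
    # example: a 32 gb server, 4 gb held back for the hypervisor and management,
    # guests allocated 2 gb each, 15 percent overhead assumed
    print(vms_per_host(host_ram_gb=32, host_reserved_gb=4, vm_ram_gb=2))   # -> 12

the same arithmetic applies to processor and network budgets, and a memory-bound vm of the kind described above is addressed by raising the vm's allocation rather than by touching the host.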
enterprise impact

when assessing the effect virtualization will have on your library infrastructure, it is important to have an accurate understanding of the resources and capabilities that will form the foundation for the virtualized infrastructure. it is a misconception that it is necessary to purchase state-of-the-art hardware to implement virtualization. not only are organizations realizing how to utilize existing hardware better with virtualization for specific projects, they are discovering that the technology can be extended to the rest of the organization and be successfully integrated into their it management practices. virtualization does, however, impose certain performance requirements for large-scale deployments that will be used in a 24/7 production environment. in such scenarios, organizations should first compare the level of performance offered by their current hardware resources with the performance of new hardware. the most compelling reasons to buy new servers include the economies of scale that can be obtained by running more vms on fewer, more robust servers, as well as the enhanced performance supplied by newer, more virtualization-aware hardware. in addition, virtualization allows for resources to be used more efficiently, resulting in lower power consumption and cooling costs.

also, the network is often one of the most overlooked factors when planning a virtualization project. while a local virtualized environment (i.e., a single computer) may not necessarily require a high-performance network environment, any solution that calls for a hypervisor-based infrastructure requires considerable planning and scaling for bandwidth requirements. the current network hardware available in your infrastructure may not perform or scale adequately to meet the needs of this vm use. again, this highlights the importance of thorough user workflow analyses and testing prior to implementation.

depending on the scope of your virtualization project, deployment in your library can potentially be expensive and can have many indirect costs. while the initial investment in hardware is relatively easy to calculate, other factors, such as ongoing staff training and system administration overhead, are much more difficult to determine. in addition, virtualization adds another layer to oftentimes already complex software licensing terms. to deal with the increased use of virtualization, software vendors are devoting increasing attention to the intricacies of licensing their products for use in such environments. while virtualization can ameliorate some licensing constraints (as noted in the at workshop use case), it can also conceal and promote licensing violations, such as multiple uses of a single-license application or access to license-restricted materials. license review is a prudent and highly recommended component of implementing a virtualization solution. finally, concerning virtualization software itself, it should be noted that while commercial vm companies usually provide plentiful resources for aiding implementation, several worthy open-source options also exist. as with any open-source software, the total cost of operation (e.g., the costs of development, maintenance, and support) needs to be considered.

conclusion

as our use cases illustrate, there are numerous potential applications and benefits of virtualization technology in the library environment.
while we have illustrated a number of these, many more possibilities exist, and further opportunities for its application will be discovered as virtualization technology matures and is adapted by a growing number of libraries. as with any technology, there are many factors that must be taken into account to evaluate if and when virtualization is the right tool for the job. in short, successful implementation of virtualization requires thoughtful planning. when so implemented, virtualization can provide libraries with cost-effective solutions to long-standing problems. references and notes 1. alessio gaspar et al., “the role of virtualization in computing education,” in proceedings of the 39th sigcse technical symposium on computer science education (new york: acm, 2008): 131–32; paul ghostine, “desktop virtualization: streamlining the future of university it,” information today 25, no. 2 (2008): 16; robert p. goldberg, “formal requirements for virtualizable third generation architectures,” in communications of the acm 17, no. 7 (new york: acm, 1974): 412–21; and karissa miller and mahmoud pegah, “virtualization: virtually at the desktop,” in proceedings of the 35th annual acm siguccs conference on user services (new york: acm, 2007): 255–60. 2. for other, non–ucsd use cases of virtualization, see joel c. adams and w. d. laverell, “configuring a multi-course lab for system-level projects,” sigcse bulletin 37, no. 1 (2005): 525–29; david collins, “using vmware and live cd’s to configure a secure, flexible, easy to manage computer lab environment,” journal of computing for small colleges 21, no. 4 (2006): 273–77; rance d. necaise, “using vmware for dual operating systems,” journal of computing in small colleges 17, no. 2 (2001): 294–300; and jason nieh and chris vaill, “experiences teaching operating systems using virtual platforms and linux,” sigcse bulletin 37, no 1 (2005): 520–24. 3. h. andrés lagar-cavilla, “vmgl (formerly xen-gl): opengl hardware 3d acceleration for virtual machines,” www .cs.toronto.edu/~andreslc/xen-gl/ (accessed oct. 21, 2008). 6 information technology and libraries | march 2009 paul t. jaeger and zheng yan one law with two outcomes: comparing the implementation of cipa in public libraries and schools though the children’s internet protection act (cipa) established requirements for both public libraries and public schools to adopt filters on all of their computers when they receive certain federal funding, it has not attracted a great amount of research into the effects on libraries and schools and the users of these social institutions. this paper explores the implications of cipa in terms of its effects on public libraries and public schools, individually and in tandem. drawing from both library and education research, the paper examines the legal background and basis of cipa, the current state of internet access and levels of filtering in public libraries and public schools, the perceived value of cipa, the perceived consequences of cipa, the differences in levels of implementation of cipa in public libraries and public schools, and the reasons for those dramatic differences. after an analysis of these issues within the greater policy context, the paper suggests research questions to help provide more data about the challenges and questions revealed in this analysis. 
t he children’s internet protection act (cipa) established requirements for both public libraries and public schools to—as a condition for receiving certain federal funds—adopt filters on all of their computers to protect children from online content that was deemed potentially harmful.1 passed in 2000, cipa was initially implemented by public schools after its passage, but it was not widely implemented in public libraries until the 2003 supreme court decision (united states v. american library association) upholding the law’s constitutionality.2 now that cipa has been extensively implemented for five years in libraries and eight years in schools, it has had time to have significant effects on access to online information and services. while the goal of filtering requirements is to protect children from potentially inappropriate content, filtering also creates major educational and social implications because filters also limit access to other kinds of information and create different perceptions about schools and libraries as social institutions. curiously, cipa and its requirements have not attracted a great amount of research into the effects on schools, libraries, and the users of these social institutions. much of the literature about cipa has focused on practical issues—either recommendations on implementing filters or stories of practical experiences with filtering. while those types of writing are valuable to practitioners who must deal with the consequences of filtering, there are major educational and societal issues raised by filtering that merit much greater exploration. while relatively small bodies of research have been generated about cipa’s effects in public libraries and public schools,3 thus far these two strands of research have remained separate. but it is the contention of this paper that these two strands of research, when viewed together, have much more value for creating a broader understanding of the educational and societal implications. it would be impossible to see the real consequences of cipa without the development of an integrative picture of its effects on both public schools and public libraries. in this paper, the implications of cipa will be explored in terms of effects on public libraries and public schools, individually and in tandem. public libraries and public schools are generally considered separate but related public sphere entities because both serve core educational and information-provision functions in society. furthermore, the fact that public schools also contain school library media centers highlights some very interesting points of intersection between public libraries and school libraries in terms of the consequences of cipa: while cipa requires filtering of computers throughout public libraries and public schools, the presence of school library media centers makes the connection between libraries and schools stronger, as do the teaching roles of public libraries (e.g., training classes, workshops, and evening classes). n the legal road to cipa history under cipa, public libraries and public schools receiving certain kinds of federal funds are required to use filtering programs to protect children under the age of seventeen from harmful visual depictions on the internet and to provide public notices and hearings to increase public awareness of internet safety. senator john mccain (r-az) sponsored cipa, and it was signed into law by president bill clinton on december 21, 2000. 
paul t. jaeger (pjaeger@umd.edu) is assistant professor at the college of information studies and director of the center for information policy and electronic government of the university of maryland in college park. zheng yan (zyan@uamail.albany.edu) is associate professor at the department of educational and counseling psychology in the school of education of the state university of new york at albany.

cipa requires that filters at public libraries and public schools block three specific types of content: (1) obscene material (that which appeals to prurient interests only and is "offensive to community standards"); (2) child pornography (depictions of sexual conduct and/or lewd exhibitionism involving minors); and (3) material that is harmful to minors (depictions of nudity and sexual activity that lack artistic, literary, or scientific value). cipa focused on "the recipients of internet transmission," rather than the senders, in an attempt to avoid the constitutional issues that undermined the previous attempts to regulate internet content.4 using congressional authority under the spending clause of article i, section 8 of the u.s. constitution, cipa ties the direct or indirect receipt of certain types of federal funds to the installation of filters on library and school computers. therefore each public library and school that receives the applicable types of federal funding must implement filters on all computers in the library and school buildings, including computers that are exclusively for staff use. libraries and schools had to address these issues very quickly because the federal communications commission (fcc) mandated certification of compliance with cipa by funding year 2004, which began in summer 2004.5

cipa requires that filters on computers block three specific types of content, and each of the three categories of materials has a specific legal meaning. the first type—obscene materials—is statutorily defined as depicting sexual conduct that appeals only to prurient interests, is offensive to community standards, and lacks serious literary, artistic, political, or scientific value.6 historically, obscene speech has been viewed as being bereft of any meaningful ideas or educational, social, or professional value to society.7 statutes regulating speech as obscene have to do so very carefully and specifically, and speech can only be labeled obscene if the entire work is without merit.8 if speech has any educational, social, or professional importance, even for embodying controversial or unorthodox ideas, it is supposed to receive first amendment protection.9 the second type of content—child pornography—is statutorily defined as depicting any form of sexual conduct or lewd exhibitionism involving minors.10 both of these types of speech have a long history of being regulated and being considered as having no constitutional protections in the united states. the third type of content that must be filtered—material that is harmful to minors—encompasses a range of otherwise protected forms of speech.
cipa defines “harmful to minors” as including any depiction of nudity, sexual activity, or simulated sexual activity that has no serious literary, artistic, political, or scientific value to minors.11 the material that falls into this third category is constitutionally protected speech that encompasses any depiction of nudity, sexual activity, or simulated sexual activity that has serious literary, artistic, political, or scientific value to adults. along with possibly including a range of materials related to literature, art, science, and policy, this third category may involve materials on issues vital to personal well-being such as safe sexual practices, sexual identity issues, and even general health care issues such as breast cancer. in addition to the filtering requirements, section 1731 also prescribes an internet awareness strategy that public libraries and schools must adopt to address five major internet safety issues related to minors. it requires libraries and schools to provide reasonable public notice and to hold at least one public hearing or meeting to address these internet safety issues. requirements for schools and libraries cipa includes sections specifying two major strategies for protecting children online (mainly in sections 1711, 1712, 1721, and 1732) as well as sections describing various definitions and procedural issues for implementing the strategies (mainly in sections 1701, 1703, 1731, 1732, 1733, and 1741). section 1711 specifies the primary internet protection strategy—filtering—in public schools. specifically, it amends the elementary and secondary education act of 1965 by limiting funding availability for schools under section 254 of the communication act of 1934. through a compliance certification process within a school under supervision by the local educational agency, it requires schools to include the operation of a technology protection measure that protects students against access to visual depictions that are obscene, are child pornography, or are harmful to minors under the age of seventeen. likewise, section 1712 specifies the same filtering strategy in public libraries. specifically, it amends section 224 of the museum and library service act of 1996/2003 by limiting funding availability for libraries under section 254 of the communication act of 1934. through a compliance certification process within a library under supervision by the institute of museum and library services (imls), it requires libraries to include the operation of a technology protection measure that protects students against access to visual depictions that are obscene, child pornography, or harmful to minors under the age of seventeen. section 1721 is a requirement for both libraries and schools to enforce the internet safety policy with the internet safety policy strategy and the filtering technology strategy as a condition of universal service discounts. specifically, it amends section 254 of the communication act of 1934 and requests both schools and libraries to monitor the online activities of minors, operate a technical protection measure, provide reasonable public notice, and hold at least one public hearing or meeting to address the internet safety policy. this is through the 8 information technology and libraries | march 2009 certification process regulated by the fcc. 
section 1732, titled the neighborhood children’s internet protection act (ncipa), amends section 254 of the communication act of 1934 and requires schools and libraries to adopt and implement an internet safety policy. it specifies five types of internet safety issues: (1) access by minors to inappropriate matter on the internet; (2) safety and security of minors when using e-mail, chat rooms, and other online communications; (3) unauthorized access; (4) unauthorized disclosure, use, and dissemination of personal information; and (5) measures to restrict access to harmful online materials. from the above summary, it is clear that (1) the two protection strategies of cipa (the internet filtering strategy and safety policy strategy) were equally enforced in both public schools and public libraries because they are two of the most important social institutions for children’s internet safety; (2) the nature of the implementation mechanism is exactly the same, using the same federal funding mechanisms as the sole financial incentive (limiting funding availability for schools and libraries under section 254 of the communication act of 1934) through a compliance certification process to enforce the implementation of cipa; and (3) the actual implementation procedure differs in libraries and schools, with schools to be certified under the supervision of local educational agencies (such as school districts and state departments of education) and with libraries to be certified within a library under the supervision of the imls. economics of cipa the universal service program (commonly known as e–rate) was established by the telecommunications act of 1996 to provide discounts, ranging from 20 to 90 percent, to libraries and schools for telecommunications services, internet services, internal systems, and equipment.12 the program has been very successful, providing approximately $2.25 billion dollars a year to public schools, public libraries, and public hospitals. the vast majority of e-rate funding—about 90 percent—goes to public schools each year, with roughly 4 percent being awarded to public libraries and the remainder going to hospitals.13 the emphasis on funding schools results from the large number of public schools and the sizeable computing needs of all of these schools. but even 4 percent of the e-rate funding is quite substantial, with public libraries receiving more than $250 million between 2000 and 2003.14 schools received about $12 billion in the same time period.15 along with e-rate funds, the library services and technology act (lsta) program administered by the imls provides money to each state library agency to use on library programs and services in that state, though the amount of these funds is considerably lower than e-rate funds. the american library association (ala) has noted that the e-rate program has been particularly significant in its role of expanding online access to students and to library patrons in both rural and underserved communities.16 in addition to the effect on libraries, e-rate and lsta funds have significantly affected the lives of individuals and communities. these programs have contributed to the increase in the availability of free public internet access in schools and libraries. 
by 2001, more than 99 percent of public school libraries provided students with internet access.17 by 2007, 99.7 percent of public library branches were connected to the internet, and 99.1 percent of public library branches offered public internet access.18 however, only a small portion of libraries and schools used filters prior to cipa.19 since the advent of computers in libraries, librarians typically had used informal monitoring practices for computer users to ensure that nothing age-inappropriate or morally offensive was publicly visible.20 some individual school and library systems, such as in kansas and indiana, even developed formal or informal statewide internet safety strategies and approaches.21

why were only libraries and schools chosen to protect children's online safety?

while there are many social institutions that could have been the focus of cipa, the law places the requirements specifically on public libraries and public schools. if congress was so interested in protecting children from access to harmful internet content, it seems that the law would be more expansive and focused on the content itself rather than filtering access to the content. however, earlier laws that attempted to regulate access to internet content failed legal challenges specifically because they tried to regulate content. prior to the enactment of cipa, there were a number of other proposed laws aimed at preventing minors from accessing inappropriate internet content. the communications decency act (cda) of 1996 prohibited the sending or posting of obscene material through the internet to individuals under the age of eighteen.22 however, the supreme court found the cda to be unconstitutional, stating that the law violated free speech under the first amendment. in 1998, congress passed the child online protection act (copa), which prohibited commercial websites from displaying material deemed harmful to minors and imposed criminal penalties on internet violators.23 a three-judge panel of the district court for the eastern district of pennsylvania ruled that copa's focus on "contemporary community standards" violated the first amendment, and the panel subsequently imposed an injunction on copa's enforcement. cipa's force comes from congress's power under the spending clause; that is, congress can legally attach requirements to funds that it gives out. since cipa is based on economic persuasion—the potential loss of funds for technology—the law can only have an effect on recipients of those funds. while regulating internet access in other venues like coffee shops, internet cafés, bookstores, and even individual homes would provide a more comprehensive shield to limit children's access to certain online content, these institutions could not be reached under the spending clause. as a result, the burdens of cipa fall squarely on public libraries and public schools.

the current state of filtering

when did cipa actually come into effect in libraries and schools?

after overcoming a series of legal challenges that were ultimately decided by the supreme court, cipa came into effect in full force in 2003, though 96 percent of public schools were already in compliance with cipa in 2001. when the court upheld the constitutionality of cipa, the legal challenge by public libraries centered on the way the statute was written.24 the court's decision states that the wording of the law does not place unconstitutional limitations on free speech in public libraries.
to continue receiving federal dollars directly or indirectly through certain federal programs, public libraries and schools were required to install filtering technologies on all computers. while the case decided by the supreme court focused on public libraries, the decision virtually precludes public schools from making the same or related challenges.25 before that case was decided, however, most schools had already adopted filters to comply with cipa. as a result of cipa, a public library or public school must install technology protection measures, better known as filters, on all of its computers if it receives

- e-rate discounts for internet access costs,
- e-rate discounts for internal connections costs,
- lsta funding for direct internet costs,26 or
- lsta funding for purchasing technology to access the internet.

the requirements of cipa extend to public libraries, public schools, and any library institution that receives lsta and e-rate funds as part of a system, including state library agencies and library consortia. as a result of the financial incentives to comply, almost 100 percent of public schools in the united states have implemented the requirements of cipa,27 and approximately half of public libraries have done so.28

how many public schools have implemented cipa?

according to the latest report by the department of education (see table 1), by 2005, 100 percent of public schools had implemented both the internet filtering strategy and the safety policy strategy. in fact, in 2001 (the first year cipa was in effect), 96 percent of schools had implemented cipa, with 99 percent filtering by 2002. when compared to the percentage of all public schools with internet access from 1994 to 2005, internet access became nearly universal in schools between 1999 and 2000 (95 to 98 percent), and one can see that the internet access percentage in 2001 was almost the same as the cipa implementation percentage. according to the department of education, the above estimations are based on a survey of 1,205 elementary and secondary schools selected from 63,000 elementary schools and 21,000 secondary and combined schools.29 after reviewing the design and administration of the survey, it can be concluded that these estimations should be considered valid and reliable and that cipa was immediately and consistently implemented in the majority of public schools since 2001.30

table 1. implementation of cipa in public schools

year           1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2005
access (%)       35    50    65    78    89    95    98    99    99   100   100
filtering (%)     -     -     -     -     -     -     -    96    99    97   100

how many public libraries have implemented cipa?

in 2002, 43.4 percent of public libraries were receiving e-rate discounts, and 18.9 percent said they would not apply for e-rate discounts if cipa was upheld.31 since the supreme court decision upholding cipa, the number of libraries complying with cipa has increased, as has the number of libraries not applying for e-rate funds to avoid complying with cipa.
a number of state and local governments have also passed their own laws to encourage or require all libraries in the state to filter internet access regardless of e-rate or lsta funds.32 in 2008, 38.2 percent of public libraries were filtering access within the library as a result of directly receiving e-rate funding.33 furthermore, 13.1 percent of libraries were receiving e-rate funding as a part of another organization, meaning that these libraries also would need to comply with cipa’s requirements.34 as such, the number of public libraries filtering access is now at least 51.3 percent, but the number will likely be higher as a result of state and local laws requiring libraries to filter as well as other reasons libraries have implemented filters. in contrast, among libraries not receiving e-rate funds, the number of libraries now not applying for e-rate intentionally to avoid the cipa requirements is 31.6 percent.35 while it is not possible to identify an exact number of public libraries that filter access, it is clear that libraries overall have far lower levels of filtering than the 100 percent of public schools that filter access. e-rate and other program issues the administration of the e-rate program has not occurred without controversy. throughout the course of the program, many applicants for and recipients of the funding have found the program structure to be obtuse, the application process to be complicated and time consuming, and the administration of the decision-making process to be slow.36 as a result, many schools and libraries find it difficult to plan ahead for budgeting purposes, not knowing how much funding they will receive or when they will receive it.37 there also have been larger difficulties for the program. following revelations about the uses of some e-rate awards, the fcc suspended the program from august to december 2004 to impose new accounting and spending rules for the funds, delaying the distribution of over $1 billion in funding to libraries and schools.38 news investigations had discovered that certain school systems were using e-rate funds to purchase more technology than they needed or could afford to maintain, and some school systems failed to ever use technology they had acquired.39 while the administration of the e-rate program has been comparatively smooth since, the temporary suspension of the program caused serious short-term problems for, and left a sense of distrust of, the program among many recipients.40 filtering issues during the 1990s, many types of software filtering products became available to consumers, including serverside filtering products (using a list of server-selected blocked urls that may or may not be disclosed to the user), client-side filtering (controlling the blocking of specific content with a user password), text-based content-analysis filtering (removing illicit content of a website using real-time analysis), monitoring and timelimiting technologies (tracking a child’s online activities and limiting the amount of time he or she spends online), and age-verification systems (allowing access to webpages by passwords issued by a third party to an adult).41 but because filtering software companies make the decisions about how the products work, content and collection decisions for electronic resources in schools and public libraries have been taken out of the hands of librarians, teachers, and local communities and placed in the trust of proprietary software products.42 some filtering programs also have specific political 
agendas, which many organizations that purchase them are not aware of.43 in a study of over one million pages, for every webpage blocked by a filter as advertised by the software vendor, one or more pages were blocked inappropriately, while many of the criteria used by the filtering products go beyond the criteria enumerated in cipa.44 filters have significant rates of inappropriately blocking materials, meaning that filters misidentify harmless materials as suspect and prevent access to harmless items (e.g., one filter blocked access to the declaration of independence and the constitution).45 furthermore, when libraries install filters to comply with cipa, in many instances the filters will frequently be blocking text as well as images, and (depending on the type of filtering product employed) filters may be blocking access to entire websites or even all the sites from certain internet service providers. as such, the current state of filtering technology will create the practical effect of cipa restricting access to far more than just certain types of images in many schools and libraries.46

differences in the perceived value of cipa and filtering

based on the available data, there clearly is a sizeable contrast in the levels of implementation of cipa between schools and libraries. this difference raises a number of questions: for what reasons has cipa been much more widely implemented in schools? is this issue mainly value-driven, dollar-driven, both, or neither in these two public institutions? why are these two institutions so different regarding cipa implementation while they share many social and educational similarities?

reasons for nationwide full implementation in schools

there are various reasons—from financial, population, social, and management issues to computer and internet availability—that have driven the rapid and comprehensive implementation of filters in public schools. first, public schools have to implement cipa because of societal pressures and the lobbying of parents to ensure students' internet safety. almost all users of computers in schools are minors, the group most vulnerable to internet crimes and child pornography. public schools in america have been the focus of public attention and scrutiny for years, and the political and social responsibility of public schools for children's internet safety is huge. as a result, society has decided these students should be most strongly protected, and cipa was implemented immediately and most widely at schools. second, in contrast to public libraries (which average slightly less than eleven computers per library outlet), the typical number of computers in public schools ranges from one hundred to five hundred, which are needed to meet the needs of students and teachers for daily learning and teaching. since the number of computers is quite large, the financial incentives of e-rate funding are substantial and critical to the operation of the schools. this situation provides administrators in schools and school districts with the incentive to make decisions to implement cipa as quickly and extensively as possible. furthermore, the amount of money that e-rate provides for schools in terms of technology is astounding. as was noted earlier, schools received over $12 billion from 2000 to 2003 alone. schools likely would not be able to provide the necessary computers for students and teachers without the e-rate funds.
third, the actual implementation procedure differs in schools and libraries: schools are certified under the supervision of the local educational agencies such as school districts and state departments of education; libraries are certified within a library organization under the supervision of the imls. in other words, the certification process at schools is directly and effectively controlled by school districts and state departments of education, following the same fundamental values of protecting children. the resistance to cipa in schools has been very small in comparison to libraries. the primary concern raised has been the issue of educational equality. concerns have been raised that filters in schools may create two classes of students—ones with only filtered access at school and ones who also can get unfiltered access at home.47 reasons for more limited implementation in libraries in public libraries, the reasons for implementing cipa are similar to those of public schools in many ways. public libraries provide an average of 10.7 computers in each of the approximately seven thousand public libraries in the united states, which is a lot of technology that needs to be supported. the e-rate and lsta funds are vital to many libraries in the provision of computers and the internet. furthermore, with limited alternative sources of funding, the e-rate and lsta funds are hard to replace if they are not available. given that public libraries have become the guarantor of public access to computing and the internet, libraries have to find ways to ensure that patrons can access the internet.48 libraries also have to be concerned about protecting and providing a safe environment for younger patrons. while libraries serve patrons of all ages, one of the key social expectations of libraries is the provision of educational materials for children and young adults. children’s sections of libraries almost always have computers in them. much of the content blocked by filters is of little or no educational value. as such, “defending unfiltered internet access was quite different from defending catcher in the rye.”49 nevertheless, many libraries have fought against the filtering requirements of cipa because they believe that it violates the principles of librarianship or for a number of other reasons. in 2008, 31.6 percent of public libraries refused to apply for e-rate or lsta funds specifically to avoid cipa requirements, a substantial increase from the 15.3 percent of libraries that did not apply for e-rate because of cipa in 2006.50 in defending patrons’ rights to free access, the libraries that are not applying for e-rate funds because of the requirements of cipa are turning down funding that would help pay for internet access in order to preserve unfiltered community access to the internet. because many libraries feel that they cannot apply for e-rate funds, local and regional discrepancies are occurring in the levels of internet access that are available to patrons of public libraries in different parts of the country.51 for adult patrons who wish to access material on computers with filters, cipa states that the library has the option of disabling the filters for “bona fide research or other lawful purposes” when adult patrons request such disabling. the law does not require libraries to disable the filters for adult patrons, and the criteria for disabling of filters do not have a set definition in the law.
the potential problems in the process of having the filters disabled are many and significant, including librarians not allowing the filters to be turned off, librarians not knowing how to turn the filters off, the filtering software being too complicated to turn off without injuring the performance of the workstation in other applications, or the filtering software being unable to be turned off in a reasonable amount of time.52 it has been estimated that approximately 11 million low-income individuals rely on public libraries to access online information because they lack internet access at home or work.53 the e-rate and lsta programs have helped to make public libraries a trusted community source of internet access, with the public library being the only source of free public internet access available to all community residents in nearly 75 percent of communities in the united states.54 therefore, usage of computers and the internet in public libraries has continued to grow at a very fast pace over the past ten years.55 thus public libraries are torn between the values of providing safe access for younger patrons and broad access for adult patrons who may have no other means of accessing the internet. cipa, public policy, and further research while the diverse implementations, effects, and levels of acceptance of cipa across schools and libraries demonstrate the wide range of potential ramifications of the law, surprisingly little consideration is given to major assumptions in the law, including the appropriateness of the requirements to different age groups and the nature of information on the internet. cipa treats all users as if they are at the same level of maturity and need the same level of protection as a small child, as evidenced by the requirement that all computers in a library or school have filters regardless of whether children use a particular computer. in reality, children and adults interact in different social, physical, and cognitive ways with computers because of different developmental processes.56 cipa fails to recognize that children as individual users are active processors of information and that children of different ages are going to be affected in divergent ways by filtering programs.57 younger children benefit from more restrictive filters while older children benefit from less restrictive filters. moreover, filtering can be complemented by encouragement of frequent, positive internet usage and by informal instruction in constructive use. finally, children of all ages need a better understanding of the structure of the internet to encourage appropriate caution in terms of online safety. the internet represents a new social and cultural environment in which users simultaneously are affected by the social environment and also construct that environment with other users.58 cipa also is based on fundamental misconceptions about information on the internet. the supreme court’s decision upholding cipa represents several of these misconceptions, adopting an attitude that ‘we know what is best for you’ in terms of the information that citizens should be allowed to access.59 it assumes that schools and libraries select printed materials out of a desire to protect and censor rather than recognizing the basic reality that only a small number of print materials can be afforded by any school or library. the internet frees schools and libraries from many of these costs.
furthermore, the court assumes that libraries should censor the internet as well, ultimately upholding the same level of access to information for adult patrons and librarians in public libraries as for students in public schools. these two major unexamined assumptions in the law certainly have played a part in the difficulty of implementing cipa and in the resistance to the law. and this does not even address the problems of assuming that public libraries and public schools can be treated interchangeably in crafting legislation. these problematic assumptions point to a significantly larger issue: in trying to deal with the new situations created by the internet and related technology, the federal government has significantly increased the attention paid to information policy.60 over the past few years, government laws and standards related to information have begun to more clearly relate to social aspects of information technologies such as the filtering requirements of cipa.61 but the social, economic, and political ramifications of decisions about information policy are often woefully underexamined in the development of legislation.62 this paper has documented that many of the reasons for and statistics about cipa implementation are available by bringing together information from different social institutions. the biggest questions about cipa are about the societal effects of the policy decisions: • has cipa changed the education and information-provision roles of libraries and schools? • has cipa changed the social expectations for libraries and schools? • have adult patron information behaviors changed in libraries? • have minor patron information behaviors changed in libraries? • have student information behaviors changed in school? • how has cipa changed the management of libraries and schools? • will congress view cipa as successful enough to merit using libraries and schools as the means of enforcing other legislation? but these social and administrative concerns are not the only major research questions raised by the implementation of cipa. future research about cipa needs to focus not only on the individual, institutional, and social effects of the law; it must also explore the lessons that cipa can provide to the process of creating and implementing information policies with significant societal implications. the most significant research issues related to cipa may be the ones that help illuminate how to improve the legislative process to better account for the potential consequences of regulating information while the legislation is still being developed. such cross-disciplinary analyses would be of great value as information becomes the center of an increasing amount of legislation, and the effects of this legislation have continually wider consequences for the flow of information through society. it could also be of great benefit to public schools and libraries, which, if cipa is any indication, may play a large role in future legislation about public internet access. references 1. children’s internet protection act (cipa), public law 106-554. 2. united states v. american library association, 539 u.s. 154 (2003). 3. american library association, libraries connect communities: public library funding & technology access study 2007–2008 (chicago: ala, 2008); paul t. jaeger, john carlo bertot, and charles r.
mcclure, “the effects of the children’s internet protection act (cipa) in public libraries and its implications for research: a statistical, policy, and legal analysis,” journal of the american society for information science and technology 55, no. 13 (2004): 1131–39; paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology and libraries 26, no. 2 (2007): 4–14; paul t. jaeger et al., “cipa: decisions, implementation, and impacts,” public libraries 44, no. 2 (2005): 105–9; zheng yan, “limited knowledge and limited resources: children’s and adolescents’ understanding of the internet,” journal of applied developmental psychology (forthcoming); zheng yan, “differences in basic knowledge and perceived education of internet safety between high school and undergraduate students: do high school students really benefit from the children’s internet protection act?” journal of applied developmental psychology (forthcoming); zheng yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?,” developmental psychology 42 (2006): 418–28. 4. martha m. mccarthy, “filtering the internet: the children’s internet protection act,” educational horizons 82, no, 2 (winter 2004): 108. 5. federal communications commission, in the matter of federal–state joint board on universal service: children’s internet protection act, fcc order 03-188 (washington, d.c.: 2003). 6. cipa. 7. roth v. united states, 354 u.s. 476 (1957). 8. miller v. california, 413 u.s. 15 (1973). 9. roth v. united states. 10. cipa. 11. cipa. 12. telecommunications act of 1996, public law 104-104 (feb. 8, 1996). 13. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology & libraries 24, no. 2 (2005): 57–67. 14. ibid. 15. ibid. 16. american library association, “u.s. supreme court arguments on cipa expected in late winter or early spring,” press release, nov. 13, 2002, www.ala.org/ala/aboutala/hqops/ pio/pressreleasesbucket/ussupremecourt.cfm (accessed may 19, 2008). 17. kelly rodden, “the children’s internet protection act in public schools: the government stepping on parents’ toes?” fordham law review 71 (2003): 2141–75. 18. john carlo bertot, paul t. jaeger, and charles r. mcclure, “public libraries and the internet 2007: issues, implications, and expectations,” library & information science research 30 (2008): 175–184; charles r. mcclure, paul t. jaeger, and john carlo bertot, “the looming infrastructure plateau?: space, funding, connection speed, and the ability of public libraries to meet the demand for free internet access,” first monday 12, no. 12 (2007), www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ article/view/2017/1907 (accessed may 19, 2008). 19. mccarthy, “filtering the internet.” 20. leigh s. estabrook and edward lakner, “managing internet access: results of a national survey,” american libraries 31, no. 8 (2000): 60–62. 21. alberta davis comer, “studying indiana public libraries’ usage of internet filters,” computers in libraries (june 2005): 10–15; thomas m. reddick, “building and running a collaborative internet filter is akin to a kansas barn raising,” computers in libraries 20, no. 4 (2004): 10–14. 22. communications decency act of 1996, public law 104-104 (feb. 8, 1996). 23. child online protection act (copa), public law 105-277 (oct. 21, 1998). 24. united states v. 
american library association. 25. r. trevor hall and ed carter, “examining the constitutionality of internet filtering in public schools: a u.s. perspective,” education & the law 18, no. 4 (2006): 227–45; mccarthy, “filtering the internet.” 26. library services and technology act, public law 104-208 (sept. 30, 1996). 27. john wells and laurie lewis, internet access in u.s. public schools and classrooms: 1994–2005, special report prepared at the request of the national center for education statistics, nov. 2006. 28. american library association, libraries connect communities; john carlo bertot, charles r. mcclure, and paul t. jaeger, “the impacts of free public internet access on public library patrons and communities,” library quarterly 78, no. 3 (2008): 285–301; jaeger et al., “cipa.” 29. wells and lewis, internet access in u.s. public schools and classrooms. 30. ibid. 31. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 32. jaeger et al., “cipa.” 33. american library association, libraries connect communities. 34. ibid. 35. ibid. 36. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 37. ibid. 38. norman oder, “$40 million in e-rate funds suspended: delays caused as fcc requires new accounting standards,” library journal 129, no. 18 (2004): 16; debra lau whelan, “e-rate funding still up in the air: schools, libraries left in the dark about discounted funds for internet services,” school library journal 50, no. 11 (2004): 16. 39. ken foskett and paul donsky, “hard eye on city schools’ hardware,” atlanta journal-constitution, may 25, 2004; ken foskett and jeff nesmith, “wired for waste: abuses tarnish e-rate program,” atlanta journal-constitution, may 24, 2004. 40. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 41. department of commerce, national telecommunication and information administration, children’s internet protection act: study of technology protection measures in section 1703, report to congress (washington, d.c.: 2003). 42. mccarthy, “filtering the internet.” 43. paul t. jaeger and charles r. mcclure, “potential legal challenges to the application of the children’s internet protection act (cipa) in public libraries: strategies and issues,” first monday 9, no. 2 (2004), www.firstmonday.org/issues/issue9_2/jaeger/index.html (accessed may 19, 2008). 44. electronic frontier foundation, internet blocking in public schools (washington, d.c.: 2004), http://w2.eff.org/censorship/censorware/net_block_report (accessed may 19, 2008). 45. adam horowitz, “the constitutionality of the children’s internet protection act,” st. thomas law review 13, no. 1 (2000): 425–44. 46. tanessa cabe, “regulation of speech on the internet: fourth time’s the charm?” media law and policy 11 (2002): 50–61; adam goldstein, “like a sieve: the child internet protection act and ineffective filters in libraries,” fordham intellectual property, media, and entertainment law journal 12 (2002): 1187–1202; horowitz, “the constitutionality of the children’s internet protection act”; marilyn j. maloney and julia morgan, “rock and a hard place: the public library’s dilemma in providing access to legal materials on the internet while restricting access to illegal materials,” hamline law review 24, no. 2 (2001): 199–222; mary minow, “filters and the public library: a legal and policy analysis,” first monday 2, no.
12 (1997), www .firstmonday.org/issues/issue2_12/minnow (accessed may 19, 2008); richard j. peltz, “use ‘the filter you were born with’: the unconstitutionality of mandatory internet filtering for adult patrons of public libraries,” washington law review 77, no. 2 (2002): 397–479. 47. mccarthy, “filtering the internet.” 48. john carlo bertot et al., “public access computing and internet access in public libraries: the role of public libraries in e-government and emergency situations,” first monday 11, no. 9 (2006), www.firstmonday.org/issues/issue11_9/bertot (accessed may 19, 2008); john carlo bertot et al., “drafted: i want you to deliver e-government,” library journal 131, no. 13 (2006): 34–39; paul t. jaeger and kenneth r. fleischmann, “public libraries, values, trust, and e-government,” information technology and libraries 26, no. 4 (2007): 35–43. 49. doug johnson, “maintaining intellectual freedom in a filtered world,” learning & leading with technology 32, no. 8 (may 2005): 39. 50. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities.” 51. jaeger et al., “public libraries and internet access across the united states.” 52. paul t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. 53. goldstein, “like a sieve.” 54. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities”; jaeger and fleischmann, “public libraries, values, trust, and e-government.“ 55. bertot, jaeger, and mcclure, “public libraries and the internet 2007”; charles r. mcclure et al., “funding and expenditures related to internet access in public libraries,” information technology & libraries (forthcoming). 56. zheng yan and kurt w. fischer, “how children and adults learn to use computers: a developmental approach,” new directions for child and adolescent development 105 (2004): 41–61. 57. zheng yan, “age differences in children’s understanding of the complexity of the internet,” journal of applied developmental psychology 26 (2005): 385–96; yan, “limited knowledge and limited resources”; yan, “differences in basic knowledge and perceived education of internet safety”; yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?” 58. patricia greenfield and zheng yan, “children, adolescents, and the internet: a new field of inquiry in developmental psychology,” developmental psychology 42 (2006): 391–93. 59. john n. gathegi, “the public library as a public forum: the (de)evolution of a legal doctrine,” library quarterly 75 (2005): 12. 60. sandra braman, “where has media policy gone? defining the field in the 21st century,” communication law and policy 9, no. 2 (2004): 153–82; sandra braman, change of state: information, policy, & power (cambridge, mass.: mit pr., 2007); charles r. mcclure and paul t. jaeger, “government information policy research: importance, approaches, and realities,” library & information science research 30 (2008): 257–64; milton mueller, christiane page, and brendan kuerbis, “civil society and the shaping of communication-information policy: four decades of advocacy,” information society 20, no. 3 (2004): 169–85. 61. paul t. jaeger, “information policy, information access, and democratic participation: the national and international implications of the bush administration’s information politics,” government information quarterly 24 (2007): 840–59. 
62. mcclure and jaeger, “government information policy research.” editorial board thoughts the june issue of ital featured a new column entitled editorial board thoughts. the column features commentary written by ital editorial board members on the intersection of technology and libraries. in the june issue kyle felker made a strong case for gerald zaltman’s book how customers think as a guide to doing user-centered design and assessment in the context of limited resources and uncertain user needs. in this column i introduce another factor in the library–it equation, that of rapid technological change. in the midst of some recent spring cleaning in my library i had the pleasure of finding a report documenting the current and future it needs of purdue university’s hicks undergraduate library. the report is dated winter 1995. the following summarizes the hicks undergraduate library’s it resources in 1995: [the library] has seven public workstations running eight different databases and using six different search software programs. six of the stations support a single database only; one station supports one cd-rom application and three other applications (installed on the hard drive). none of the computers runs windows, but the current programs do not require it. five stations are equipped with six-disc cd-rom drives. we do not anticipate that we will be required to upgrade to windows capability in the near future for any of the application programs. today the hicks undergraduate library’s it resources are dramatically different. as opposed to seven public workstations, we have more than seventy computers distributed throughout the library and the digital learning collaboratory, our information commons. this excludes forty-six laptops available for patron checkout and eighty-eight laptops designated for instructional use. we have moved from eight cd-rom databases to more than four hundred networked databases accessible throughout the purdue university libraries, campus, and beyond. as a result, there are hundreds of “search software programs”—doesn’t that phrase sound odd today?—including the library databases, the catalog, and any number of commercial search engines like google. today all, or nearly all, of our machines run windows, and the macs have the capability of running windows. in addition to providing access to databases, our machines are loaded with productivity and multimedia software allowing students to consume and produce a wide array of information resources. beyond computers, our library now loans out additional equipment including hard drives, digital cameras, and video cameras. the 1995 report also includes system specifications for the computers. these sound quaint today. of the seven computers, six were 386 machines with processors clocking in at 25 mhz. the computers had between 640k and 2.5mb of ram with hard drives with capacities between 20 and 60mb. the seventh computer was a 286 machine, probably with a 12.5 mhz processor and correspondingly smaller memory and hard disc capacity. the report does not include monitor specifications, though, based on the time, they were likely fourteen- or fifteen-inch cga or ega cathode ray tube monitors. modern computers are astonishingly powerful in comparison. according to a member of our it unit, the computers we order today have 2.8 ghz dual core processors, 3gb of ram, and 250gb hard drives.
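as a quick cross-check of the comparison that follows, the ratios can be computed directly (a minimal sketch in python; it assumes the 25 mhz / 2.5 mb ram / 60 mb disk machine above as the 1995 baseline and uses decimal megabytes throughout):

```python
# 1995 baseline (386 workstation) versus a 2008 desktop, figures taken from the text
old = {"cpu_mhz": 25, "ram_mb": 2.5, "disk_mb": 60}
new = {"cpu_mhz": 2800, "ram_mb": 3000, "disk_mb": 250000}

for part in old:
    print(f"{part}: {new[part] / old[part]:,.0f}x")  # 112x, 1,200x, 4,167x

# moore's law benchmark: one doubling every two years allows six full doublings
# in thirteen years, i.e. a sixty-four-fold increase
print("moore factor:", 2 ** (13 // 2))
```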
this equates to being 112 times faster, 1,200 times more ram, and hard drives that are 4,167 times larger than the 1995 computers! as a benchmark, consider moore’s law, a doubling of capacitors every two years, a sixty-four-fold increase over a thirteen-year period. who would have thought that library computers would outpace moore’s law?! today’s computers are also smaller than those of 1995. our standard desktop machines serve as an example, but perhaps not as dramatically as laptops, mini-laptops, and any of the mobile computing machines small enough to fit into your pocket. monitors are smaller, though also bigger. each new computer we order today comes standard with a twenty-inch flat panel lcd monitor. it is smaller in terms of weight and overall size, but the viewing area is significantly larger. these trends are certainly not unique to purdue. nearly every other academic library could boast similar it advancements. with this in mind, and if moore’s law continues as projected, imagine the computer resources that will be available on the average desktop machine—although one wonders if it will in fact be a desktop machine—in the next thirteen years. what things out on the distant horizon will eventually become commonplace? here the quote from the 1995 report about windows is particularly revealing. what things that are currently state-of-the-art will we leave behind in the next decade? what’s dos? what’s a cd-rom? will we soon say, what’s a hard drive? what’s software? what’s a desktop computer? in the last thirteen years we have also witnessed the widespread adoption and proliferation of the internet, the network that is the backbone for many technologies that have become essential components of physical and digital libraries. earlier this year, i co-authored an arl spec kit entitled social software in libraries.1 the survey reports on the usage of ten types of social software within arl libraries: (1) social networking sites like myspace and facebook; (2) media sharing sites like youtube and flickr; (3) social bookmarking and tagging sites like del.icio.us and librarything; (4) wikis like wikipedia and library success: a best practices wiki; (5) blogs; (6) rss used to syndicate content from webpages, blogs, podcasts, etc.; (7) chat and instant messenger services; (8) voice over internet protocol (voip) services like googletalk and skype; (9) virtual worlds like second life and massively multiplayer online games (mmogs) like world of warcraft; and (10) widgets either developed by libraries, like facebook applications and firefox catalog search extensions, or implemented by libraries, like meebome and firefox plugins. of the 64 arl libraries that responded, a 52% response rate, 61 (95% of respondents) said they are using social software. of the three libraries not using social software, two indicated they plan to do so in the future. in combination then, 63 out of 64 respondents (98%) indicated they are either currently using or planning to use social software. as part of the survey there was a call for examples of social software used in libraries.
of the 370 examples we received, we selected around 70 for publication in the spec kit. the examples are captivating and they illustrate the wide variety of applications in use today. of the ten social software applications in the spec kit, how many of them were at our disposal in 1995? by my count three: chat and instant messenger services, voip, and virtual worlds such as text-based muds and moos. of these three, how many were in use in libraries? very few, if any. in our survey we asked libraries for the year in which they first implemented social software. the earliest applications were cu-seeme, a voip chat service at cornell university in 1996, im at the university of california riverside in 1996 as well, and interoffice chat at the university of kentucky in 1998. the remaining libraries adopted social software in year 2000 and beyond, with 2005 being the most common year with 22 responses, or 34% of the libraries that had adopted social software. a look at this data shows that my earlier use of a thirteen-year time period to illustrate how difficult it is to project technological innovations that may prove disruptive to our organizations is too broad a time frame. perhaps we should scale this back to looking at five-year increments of time. using the spec kit data, in year 2003, a total of 16 arl libraries had adopted social software. this represents 25% of the total number of institutions that responded when we did our survey. (figure 1. responses to the question, “please enter the year in which your library first began using social software” (n=61).) this seems like a more reasonable time frame to be looking to the future. so, what does the future hold for it and libraries, whether it be thirteen or five years in the future? i am not a technologist by training, nor do i consider myself a futurist, so i typically defer to my colleagues. there are three places i look to for prognostications of the future. the first is lita’s top technology trends, a recurring discussion group that is a part of ala’s annual conferences and midwinter meetings. past top technology trends discussions can be found on lita’s blog (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm) and on lita’s website (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm). the second source is the horizon project, a five-year qualitative research effort aimed at identifying and describing emerging technologies within the realm of teaching and learning. the project is a collaboration between the new media consortium and educause. the horizon project website (http://horizon.nmc.org/wiki/main_page) contains the annual horizon reports going back to 2004. a final approach to projecting the future of it and libraries is to consider the work of our peers. the next library innovation may emerge from a sister institution. or perhaps it may take root at your local library first! reference 1. bejune, matthew m. and jana ronan. social software in libraries. arl spec kit 304. washington, d.c.: association of research libraries, 2008. matthew m. bejune (mbejune@purdue.edu) is an ital editorial board member (2007–09), assistant professor of library science at purdue university, and doctoral student in the graduate school of library and information science at the university of illinois at urbana–champaign. a candid look at collected works: challenges of clustering aggregates in glimir and frbr gail thornburg abstract creating descriptions of collected works in ways consistent with clear and precise retrieval has long challenged information professionals.
this paper describes problems of creating record clusters for collected works and distinguishing them from single works: design pitfalls, successes, failures, and future research. overview and definitions the functional requirements for bibliographic records (frbr) was developed by the international federation of library associations (ifla) as a conceptual model of the bibliographic universe. frbr is intended to provide a more holistic approach to retrieval and access of information than any specific cataloging code. frbr defines a work as a distinct intellectual or artistic creation. put very simply, an expression of that work might be published as a book. in frbr terms, this book is a manifestation of that work.1 a collected work can be defined as “a group of individual works, selected by a common element such as author, subject or theme, brought together for the purposes of distribution as a new work.”2 in frbr, this type of work is termed an aggregate or “manifestation embodying multiple distinct expressions.”3 zumer describes an aggregate as “a bibliographic entity formed by combining distinct bibliographic units together.”4 here the terms are used interchangeably. in frbr, the definition of aggregates applies only to group 1 entities, i.e., not to groups of persons or corporate bodies. the ifla working group on aggregates has defined three distinct types of aggregates: (1) collections of expressions, (2) aggregates resulting from augmentation or supplementing of a work with additional material, and (3) aggregates of parallel expressions of one work in multiple languages.5 while noting the relationships between the categories, this paper will focus on the first type. aggregates of the first type include selections, anthologies, series, books with independent sections by different authors, and so on. aggregates may occur in any format, from a volume containing both of the j. d. salinger works catcher in the rye and franny and zooey to a sound recording containing popular adagios from several composers to a video containing three john wayne movies. gail thornburg (thornbug@oclc.org) is consulting software engineer and researcher at oclc, dublin, ohio. the environment the oclc worldcat database is replete with bibliographic records describing aggregates. it has been estimated that the database may contain more than 20 percent aggregates.6 this proportion may increase as worldcat coverage of recordings and videos tends to increase. in the global library manifestation identifier (glimir) project, automatic clustering of the records into groups of instances of the same manifestation of a work was devised. glimir finds and groups similar records for a given manifestation and assigns two types of identifiers for the clusters. the first type is a manifestation id, which identifies parallel records differing only in language of cataloging or metadata detail, some of which are probably true duplicates that cannot be safely merged by a machine process. the second type is a content id, which describes a broader clustering, for instance, physical and digital reproductions and reprints of the same title from differing publishers. this process started with the searching and matching algorithms developed for worldcat.
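to make the two levels of identifier concrete, here is a minimal sketch (hypothetical record fields and grouping keys; the production glimir comparison is far more detailed than simple key matching):

```python
# toy records: 1 and 2 describe the same printing with different cataloging
# languages, while 3 is a later reprint of the same title from another publisher
records = [
    {"num": 1, "title": "moby dick", "publisher": "harper", "year": 1851, "cat_lang": "eng"},
    {"num": 2, "title": "moby dick", "publisher": "harper", "year": 1851, "cat_lang": "fre"},
    {"num": 3, "title": "moby dick", "publisher": "penguin", "year": 1992, "cat_lang": "eng"},
]

def assign_ids(records, key_fn, prefix):
    """give every record sharing a key the same cluster identifier."""
    ids, result = {}, {}
    for rec in records:
        key = key_fn(rec)
        ids.setdefault(key, f"{prefix}{len(ids) + 1}")
        result[rec["num"]] = ids[key]
    return result

# manifestation id: parallel records for one manifestation (cataloging language ignored)
print(assign_ids(records, lambda r: (r["title"], r["publisher"], r["year"]), "man"))
# -> {1: 'man1', 2: 'man1', 3: 'man2'}

# content id: broader clustering that also pulls in reprints of the same title
print(assign_ids(records, lambda r: r["title"], "con"))
# -> {1: 'con1', 2: 'con1', 3: 'con1'}
```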
the glimir clustering software is a specialization of the matching software developed for the batch loading of records to worldcat, deduplicating the database, and other search and comparison purposes.7 this form of glimirization compares an incoming record to database search results to determine what should match for glimir purposes. this is a looser match in some respects than what would be done for merging duplicates. the initial challenges of tailoring matching algorithms to suit the needs of glimir have been described in thornburg and oskins8 and in gatenby et al.9 the goals of glimir are (1) to cluster together different descriptions of the same resource and to get a clearer picture of the number of actual manifestations in worldcat so as to allow the selection of the most appropriate description, and (2) to cluster together different resources with the same content to improve discovery and delivery for end users. according to richard greene, “the ultimate goal of glimir is to link resources in different sites with a single identifier, to cluster hits and thereby maximize the rank of library resources in the web sphere.”10 glimir is related conceptually to the frbr model. if the goal of frbr is to improve the grouping of similar items for one work, then glimir similarly groups items within a given work. manifestation clusters specify the closest matches. content clusters contain reproductions and may be considered to represent elements of the expression level of the frbr model. the frbr and glimir algorithms this paper discusses have evolved significantly over the past three years. in addition, it should be recognized that the frbr algorithms use a map/reduce keyed approach to cluster frbr works and some glimir content, while the full glimir algorithms use a more detailed and computationally expensive record comparison approach. the frbr batch process starts with worldcat enhanced with additional authority links, including the production glimir clusters. it makes several passes through worldcat, each pass constructing keys that pull similar records together for comparison and evaluation. as described by toves, “successive passes progressively build up knowledge about the groups allowing us to refine and expand clusters, ending up with the work, content and manifestation clusters to feed into production.”11 each approach to clustering has its limits of feasibility, but the frbr and glimir combined teams have endeavored to synchronize changes to the algorithms and to share insights. some materials are easier to cluster using one approach, and some with the other. clustering meets aggregates in the initial implementation of glimir, the issue of handling collected works was considered out of scope for the project. with experience, the team realized there can be no effective automatic glimir clustering if collected works are not identified and handled in some way. why is this? suppose a record exists for a text volume containing work a. this matches to a record containing work a, but actually also containing work b. this matches to a work containing b and also containing works c, d, and e. the effect is a snowballing of cluster members that serves no one. how could this happen? in a bibliographic database such as worldcat, items representing collected works can be catalogued in several ways. efforts to relax matching criteria in just the right degree to cluster records for the same work are difficult to devise and apply.
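the snowballing effect described above is easy to reproduce. in the toy sketch below (hypothetical data; the real comparison logic is much richer), a pairwise rule of "cluster any two records that share a work" is closed transitively with a simple union-find, and the single-work record is dragged into one oversized cluster:

```python
# r1 describes the single work a; r2 and r3 are collected works
records = {
    "r1": {"a"},
    "r2": {"a", "b"},
    "r3": {"b", "c", "d", "e"},
}

parent = {r: r for r in records}

def find(x):
    # follow parent pointers to the cluster representative
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

# pairwise rule: any two records sharing at least one work are matched
ids = list(records)
for i, x in enumerate(ids):
    for y in ids[i + 1:]:
        if records[x] & records[y]:
            union(x, y)

clusters = {}
for rec in records:
    clusters.setdefault(find(rec), []).append(rec)
print(clusters)  # one cluster containing r1, r2, and r3, spanning works a through e
```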
the glimir and frbr teams consulted several times to discuss clustering strategies for works, content, and manifestation clusters. practical experience with glimir led to rounds of enhancements and distinctions to improve the software’s decisions. while glimir clusters can and have been undone and redone on more than one occasion, it took experience from the team to realize that the clues to a collected work must be recognized. bible and beowulf as with many initial production startups, the output of glimir processing was monitored. reports for changes in any clusters of more than fifty were reviewed by quality control catalogers for suspicious combinations. and occasionally a library using a glimir- or frbr-organized display would report a strange cluster. this was the case with a huge malformed cluster of records for the bible. such a work set tends to be large and unmanageable by nature; there are a huge number of records for the bible in worldcat. however, it was noticed the set had grown suddenly over the previous two months. user interface applications stalled when attempting to present a view organized by such a set. one day, a local institution reported that a record for beowulf had turned up in this same work set. this started the team on an investigation. after much searching and analysis of the members of this cluster, the index case was uncovered. in many cases bibliographic records are allowed to cluster based on a uniform title. what the team found connecting these disparate records was a totally unexpected use of the uniform title, field 240 subfield a, contents: “b.”. that’s right, “b.”. once the first case was located, it was not hard to figure out that there were numerous uniform “titles” with other single letters of the alphabet. so in this odd usage, bible and beowulf could come together, if insufficient data were present in two records to discriminate by other comparisons. or potentially, so could other titles which started with “b.” seeing this unanticipated use of the uniform title field, the frbr and glimir algorithms were promptly modified to beware. the frbr and glimir clusters were then unclustered and redone. this was a data issue, and unanticipated uses of fields in a record will crop up, if usually with less drama. further experience showed more. in the examination of another ill-formed cluster, a reviewer realized that one record had the uniform title stated as “illiad” but the item title was homer’s “odyssey.” of course these have the same author, and may easily have the same publisher. even the same translator (e.g., richmond lattimore) is not improbable for a work like this. this was a case of bad data, but it imploded two very large clusters. music and identification of collected works as music catalogers know, musical works are very frequently presented in items that are collections of works. the rules for creating bibliographic records for music, whether scores or recordings or other, are intricate. the challenges to software to distinguish minor differences in wording from critical differences seem to be endless. moreover, musical sound recordings are largely collected works due to the nature of publication. as noted by papakhian, personal author headings are repeated more often in sound recording collections than in the general body of materials.12 there are several factors that may contribute to such an observation.
there are likely to be numerous recordings by the same performer of different works and numerous records of the same work by different performers. composers are also likely to be performers. the point is, for sound recordings an author statement and title may be less effective discriminators than for printed materials. vellucci13,14 and riley15 have written extensively on the problems of music in frbr models. the problems of distinguishing and relating whole/part relationships are particularly tricky. musical compositions often consist of units or segments that can be performed separately. so they are generally susceptible to extraction. these extractive relationships are seen in cases where parts are removed from the whole to exist separately, or perhaps parts for a violin or other instrument are extracted from the full score. software must be informed with rules as to significant differences in description of varying parts and varying descriptions of instruments, and in this team’s experience that is particularly difficult. krummel has noted that the bibliographic control of sound recordings has a dimension beyond item and work, that is, performance.16 different performances of the same beethoven symphony need to be distinguished. cast and performer list evaluation and dates checking are done by the software. however, the comparisons the software can make are sensitive to the fullness or scarcity of data provided in the bibliographic record. there is great variation observed in the numbers of cast members stated in a record. translator and adapter information can prove useful for the same sort of role discrimination in other types of materials. this is close scrutiny of a record. at the same time consider that an opera can include the creative contributions of an author (plot), a librettist, and a musical composer. yet these all come together to provide one work, not a collected work. tillett has categorized seven types of bibliographic relationships among bibliographic entities, including the following: 1. equivalence, as exact copies or reproductions of a work; photocopies and microforms are examples. 2. derivative relationships, or a modification such as variations, editions, translations. 3. descriptive, as in criticism, evaluation, review of a work. 4. whole/part, such as the relation of a selection from an anthology. 5. accompanying, as in a supplement or concordance or augmentation to a work. 6. sequential, or chronological relationships. 7. shared characteristic relationships, as in items not actually related that share a common author, director, performer, or other role.17 while it is highly desirable for a software system to notice category 1 to cluster different records for the same work, that same software could be confused by “clues,” such as in category 7. and the software needs to understand the significance of the other categories in deciding what to group and what to split. to handle these relations in bibliographic records, tillett discusses linking devices including, for instance, uniform titles. yet uniform titles are used for the categories of equivalence relationships, whole/part relationships, and derivative relationships. this becomes more and more complex for a machine to figure out. of course, uniform titles within bibliographic records are supposed to link to authority records via text string only.
consideration should ideally be given to linking via identifiers, as has been suggested elsewhere.18 thematic indexes a review of glimir clusters for scores and recordings showed a case where haydn’s symphonies a and b were brought together. these were outside the traditional canon of the 104 haydn symphonies and were referred to as “a” and “b” by the haydn scholar h. c. robbins landon. this misclustering highlighted the need for additional checks in the software. the original glimir software was not aware of thematic indexes as a tool for discrimination. thematic indexes are numbering systems for the works of a composer. the köchel mozart catalog, as in k. 626, is a familiar example. these designations are not globally unique; that is, they are intended to be unique for a given composer, but identical designators may coincidentally have been assigned to multiple composers. while “b” series numbers may be applied to works of chambonnières, couperin, dvořák, pleyel, and others, the presence of more than one b number is suggestive of collected work status. for more on the various numbering systems, see the interesting discussion by the music library association.19 however, the software cannot merely count likely identifiers in the usual place. this could lead to falsely flagging aggregates; one work by dvořák could have b.193, which is incidentally equivalent to opus 105. clearly, any detection of multiple identifiers of this sort must be restricted to identifiers of the same series. string quartet number 5, or maybe 6 cases of renumbering can cause problems in identifying collected works. an early suppressed or lost work, later discovered and added to the canon of the composer’s work, can cause renumbering of the later works. clustering software must be very attentive to discrete numbers in music, but can it be clever enough? the works of paul hindemith (1895–1963) offer an example. his first string quartet was written in 1915, but long suppressed. his publisher was generally schott. long after hindemith’s death, this first quartet was unearthed, and then was published by schott. the publisher then renumbered all the quartets. so quartets previously 1 through 6 became 2 through 7. the rediscovered work was then called “no. 1,” though sometimes called “no. 0” to keep the older numbering intact. further, the last two quartets did not even have opus numbers assigned and were both in the same key.20 this presents a challenge. anything musical another problem case emerged when reviewers noticed a cluster contained both the unrelated songs “old black joe” and “when you and i were young maggie.” on investigation, the cluster held a number of unrelated pieces. here the use of alternate titles in a 246 field had led to overclustering, and the rules for use of 246 fields were tightened in frbr and glimir. as in the other problem cases, cycles of testing were necessary to estimate sufficient yet not excessive restrictions. rules too strict split good clusters and defeat the purpose of frbr and glimir. at this point the glimir/frbr team recognized that rules changes were necessary but not sufficient. that is, a concerted effort to handle collected works was essential. strategies for identifying collected works the greatest problem, and most immediate need, was to stop the snowballing of clusters.
clusters containing some member records that are collected works can suddenly mushroom out of control. rule 1 was that a record for a collected work must never be grouped with a record for a single work. if all in a group are collected works, that is closer to tolerable (more on that later). with time and experimentation, a set of checks was devised to allow collected works to be flagged. these clues were categorized into two types: (1) conclusive evidence, or (2) partial evidence. type 2 needed another piece of evidence in the record. finding the best clues was a team effort. it was acknowledged that to prevent overclustering, overidentification of aggregates was preferable to failure to identify them. several cycles of tests were conducted and reviewed, assessing whether the software guessed right. table 1 illustrates the types of checks done for a given bibliographic record. here the “$” is used as an abbreviation for subfield, and “ind” equals indicator.

area: uniform title | field: 240 | rule: $a present with no $m, $n, $p, or $r, and the title in $a is on a list of collective terms, is collected work | notes: this is a long list of terms such as “symphonies,” “plays,” “concertos,” and so on.
area: title | field: 245 | rule: contains “selections,” is collected; 245 with multiple semicolons and doc type “rec,” is collected
area: title | field: 246 | rule: if four or more 246 fields with ind2 = 2, 3, or 4, is collected | notes: if more than one 246, consider partial evidence
area: extent | field: 300 | rule: if 300 $a has “pagination multiple” or “multiple pagings,” is collected
area: contents notes | field: 505 $a and $t | rule: 1. check $a for first and last occurrences of “movement”; if not multiple occurrences of “movement,” check whether it has multiple “ / ” patterns. 2. if the above doesn’t find multiple patterns, also look for “ ; ” patterns. 3. if the above checks don’t produce more than one pattern, look for multiple “ – ” patterns. 4. count 505 $t cases. 5. count $r cases. | notes: if all / any of the above produce more than one pattern instance, or more than one $t, or more than one $r, is collected.
area: thematic index clues | field: various fields; 505 $a | rule: if any 505 $a, check for differing opuses (this also checks for thematic index cases); if found, is collected | notes: for types score and recording
area: related work | field: 740 | rule: if one or more 740 and one has indicator 2 = “2,” is collected | notes: if only multiple 740s, partial evidence
area: author | field: 700/710/711/730 | rule: check for $t and $n, and check 730 ind2 value of “2”; if a 730 with ind2 = 2 or multiple $t is found, is collected | notes: if only one $t, partial evidence
area: author | field: 100/110/111, 700/710, 730 | rule: if format is recording and both records are collected works, require a cast list match to cluster anything but manifestation matches; that is, do not cluster at the content level without verifying by cast.

table 1. checks on bibliographic records.

frailties of collected works identification in well-cataloged records the above table illustrates many areas in a bibliographic record that can be mined for evidence of aggregates. the problem is that cataloging practice offers no one rule mandatory to catalog a collected work correctly. moreover, as worldcat membership grows, the use of multiple schemes of cataloging rules for different eras and geographic areas adds to the complexity, even assuming that all the bibliographic records are cataloged “correctly.” correct cataloging is not assumed by the team.
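before turning to the cases that defeated these checks, here is a minimal sketch of how a few of the table 1 clues might be combined (the record layout, the term list, and the two-clue threshold for partial evidence are simplified assumptions, not the production implementation):

```python
# a handful of collective uniform-title terms; the real list is much longer
COLLECTIVE_TERMS = {"symphonies", "plays", "concertos", "works", "selections"}

def flag_collected(rec):
    """return True if the record looks like a collected work."""
    conclusive = False
    partial = 0

    # 240: a collective term in $a with none of $m, $n, $p, $r is conclusive
    f240 = rec.get("240", {})
    if f240.get("a", "").rstrip(". ") in COLLECTIVE_TERMS and not any(
        code in f240 for code in ("m", "n", "p", "r")
    ):
        conclusive = True

    # 245: the word "selections" in the title statement is conclusive
    if "selections" in rec.get("245", ""):
        conclusive = True

    # 505 contents note: more than one $t (or more than one $r) is conclusive
    subfields = rec.get("505", [])  # list of (subfield code, value) pairs
    if sum(1 for code, _ in subfields if code == "t") > 1:
        conclusive = True
    if sum(1 for code, _ in subfields if code == "r") > 1:
        conclusive = True

    # 246: more than one variant title is only partial evidence
    if len(rec.get("246", [])) > 1:
        partial += 1

    # partial clues need corroboration from at least one other clue
    return conclusive or partial >= 2

sample = {"245": "piano concertos. selections", "246": ["first variant"]}
print(flag_collected(sample))  # True, via the 245 check
```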
software confounded with all the checks outlined in the table, the team still found cases of collected works that seemed to defy machine detection. one record had the two separate works, tom sawyer and huckleberry finn, in the same title field, with no other clues to the aggregate nature of the item. the work brustbild was another case. for this electronic resource set, brustbild appeared to be the collection set title, but the specific title for each picture was given in the publisher field. a cluster for the work gedichte von eduard mörike (score) showed problems with the uniform title, which was for the larger work, but the cluster records each actually represented parts of the work. the bad cluster for si ku quan shu zhen ben bie ji, an electronic resource, contained records which each appeared to represent the entire collection of 400 volumes, but the link in each 856 field pointed only to one volume in the set. limitations of the present approach the current processing rules for collected works adopt a strategy of containment. the problem may be handled in the near term by avoiding the mixing of collected works with noncollected works, but the clusters containing collected works need further analysis to produce optimal results. for example, it is one thing to notice “arrangements” in scores as a clue to the presence of an aggregate. the requirement also exists that an arrangement should not cluster with the original score. the rules for clustering and distinguishing different sets of arrangements present another level of complexity. checks to compare and equate the instruments involved in an arrangement are quite difficult; in this team’s experience, they fail more often than they succeed. without initial explication of the rules for separating arrangements, reviewers quickly found clusters such as haydn’s schöpfung, which included records for the full score, vocal score, and an arrangement for two flutes. an implementation that expects one manifestation to have the identifier of only one work is a conceptual problem for aggregates. a simple case: if the description of a recording of bernstein’s mass has an obscurely placed note indicating the second side contains the work candide, mass is likely to be dominant in the clustering effect, with the second work effectively “hidden.” this manifestation would seem to need three work ids, one for the combination, one for mass, and one for candide. this does not easily translate to an implementation of the frbr model but could perhaps be achieved via links. several layers of links would seem necessary. a manifestation needs to link to its collected work. a collected work needs links to records for the individual works that it contains, and vice versa, individual works need to link to collective works. this can be important for translations, for example, into russian, where collective works are common even where they do not exist in the original language. lessons learned first and foremost, plan to deal with collected works. for clustering efforts this must be addressed in some way for any large body of records. secondly, formats will need focused attention. the initial implementation of the glimir algorithms used test sets mainly composed of a specific work. after all, glimir clusters should all be formed within one work.
these sets were carefully selected to represent as many different types of work sets as possible, whether clear or difficult examples of work set members. plenty of attention was given to the compatibility of differing formats, given the looser content clustering. these were good tests of the software’s ability to cluster effectively and correctly within a set that contained numerous types of materials. random sets of records were also tested to cross check for unexpected side effects. what in retrospect the team would have expanded was sets that were focused on specific formats. recordings, scrutinized as a group, can show different problems than scores or books. the distinctions to be made are probably not complete. another lesson learned in glimir concerned the risks of clustering. the deliberate effort to relax the very conservative nature of the matching algorithms used in glimir was critical to success in clustering anything. singleton clusters don’t improve anyone’s view. in the efforts to decide what should and should not be clustered, it was initially hard to discern the larger scale risks of overclustering. risks from sparse records were probably handled fairly well in this initial effort, but risks from complex records needed more work. collected works are only one illustration of the risks of overclustering. future research the current research suggests a number of areas for possible further exploration: • the option for human intervention to rearrange clusters not easily clustered automatically would seem to be a valuable enhancement. • there is next the general question, what sort of processing is needed, and feasible, to distinguish the members of clusters flagged as collected works? • part versus whole relationships can be difficult to distinguish from the information in bibliographic records. further investigation of these descriptions is needed. • arrangements of works in music are so complex as to suggest an entire study by themselves. work on this area is in progress, but it needs rules investigation. • other derivative relationships among works: do these need consideration in a clustering effort? can and should they be brought together while avoiding overclustering of aggregates? • how much clustering of collected works may actually be helpful to persons or processes searching the database? how can clusters express relationships to other clusters? conclusion clustering bibliographic records in a database as large as worldcat takes careful design and undaunted execution. the navigational balance between underclustering and overclustering is never easy to maintain, and course corrections will continue to challenge the navigators. acknowledgments this paper would have been a lesser thing without the patient readings by rich greene, janifer gatenby, and jay weitz, as well as their professional insights and help in clarifying cataloging points. special thanks to jay weitz for explicating many complex cases in music cataloging and music history. references 1. barbara tillett, “what is frbr? a conceptual model for the bibliographic universe,” last modified 2004, accessed november 22, 2013, http://www.loc.gov/cds/frbr.html. 2. janifer gatenby, email message to the author, november 10, 2013. 3. international federation of library associations (ifla) working group on aggregates, final report of the working group on aggregates, september 12, 2011, http://www.ifla.org/files/assets/cataloguing/frbrrg/aggregatesfinalreport.pdf. 4.
checking out facebook.com: the impact of a digital trend on academic libraries
laurie charnigo and paula barnett-ellis
laurie charnigo (charnigo@jsu.edu) is an education librarian and paula barnett-ellis (pbarnett@jsu.edu) is a health, science, and nursing librarian at the houston cole library, jacksonville state university, alabama.
while the burgeoning trend in online social networks has gained much attention from the media, few studies in library science have addressed the topic in depth. this article reports on a survey of 126 academic librarians concerning their perspectives toward facebook.com, an online network for students. findings suggest that librarians are overwhelmingly aware of the "facebook phenomenon." those who are most enthusiastic about the potential of online social networking suggested ideas for using facebook to promote library services and events. few individuals reported problems or distractions as a result of patrons accessing facebook in the library. where problems have arisen, strict regulation of access to the site is viewed unfavorably. while some librarians were excited about the possibilities of facebook, the majority surveyed appeared to consider facebook outside the purview of professional librarianship.
during the fall of 2005, librarians noticed something unusual going on in the houston cole library (hcl) at jacksonville state university (jsu). students were coming into the library in droves. patrons waited in lines with photos to use the public-access scanner (a stack of discarded pictures quickly grew).
library traffic was noticeably busier than usual and the computer lab was constantly full, as were the public-access terminals. the hubbub seemed to center around one particular web site. once students found available computers, they were likely to stay glued to them for long stretches of time, mesmerized and lost in what was later determined to be none other than "facebook addiction." this addiction was all the more obvious the day the internet was down. withdrawal was severe. soon after the librarians noticed this curious behavior, an article in the chanticleer, the campus newspaper for jsu, dispelled the mystery surrounding the web site brouhaha. a campus reporter broke the exciting news to the jsu community that "after months of waiting and requests from across the country, it's finally here. jsu is officially on the facebook."1 the library suddenly became a popular hangout for students in search of computers to access facebook. apparently jsu jumped on the bandwagon relatively late. the facebook craze had already spread throughout other colleges and universities since the web site was founded in february 2004 by mark zuckerberg, a former student at harvard university. the creators of facebook vaguely define the site as "a social utility that connects you with the people around you."2 although originally created to allow students to search for other students at colleges and universities, the site has expanded to allow individuals to connect in high schools, companies, and within regions. recently, zuckerberg has also announced plans to expand the network to military bases.3 currently, students and alumni in more than 2,200 colleges and universities communicate, connect with other students, and catch up with past high school classmates daily through the network. students who may never physically meet on campus (a rather serendipitous occurrence in nature) have the opportunity to connect through facebook. establishing virtual identities by creating profiles on the site, students post photographs, descriptions of academic and personal interests such as academic majors, campus organizations of which they are members, political orientation, favorite authors and musicians, and any other information they wish to share about themselves. facebook's search engine allows users to search for students, faculty, and staff with similar interests by keyword. it would be hard to gauge how many of these students actually meet in person after connecting through facebook. the authors of this study have heard students mention that either they or their friends have made dates with other students on campus through facebook. many of the "friends" facebook users first add when they initially establish their accounts are the ones they are already acquainted with in the physical world. when facebook made its debut at jsu, it had become the "ninth most highly trafficked web site in the u.s."4 one source estimated that 85 percent of college students whose institutions are registered in facebook's directory have created personal profiles on the site.5 membership for the university network requires a university e-mail address, and an institution cannot be registered in the directory unless a significant number of students request that the school be added. currently, more than nine million people are registered on facebook.6 soon after jsu was registered on facebook's directory, librarians began to receive questions regarding use of the scanner and requests for help uploading pictures to facebook profiles.
students seemed surprisingly open about showing librarians their profiles, which usually contained more information than the librarians wanted to know. however, not all students were enthusiastic about facebook. complaints began to surface from students awaiting access to computers for academic work while classmates "tied up" computers on facebook. some students complained about the distraction facebook caused in the library's computer lab, a complaint that eventually reached the president of jsu. currently, the administration at jsu has decided to block access to facebook in the computer labs on campus, including the lab in the library. opinions of faculty and staff in the library about facebook vary. some librarians scoff at this new trend, viewing the site primarily as just another dating service. others have created their own facebook accounts just to see how it works, to connect with students, and to keep up with the latest internet fad.7
■ study rationale
prompted by the issues that have arisen at hcl as a result of heavy patron use of facebook, the authors surveyed academic librarians throughout the united states to find out what impact, if any, the site has had on other libraries. the authors sought information about the practical effect facebook has had on libraries, as well as librarians' perspectives on, perceived roles associated with, and awareness of internet social trends and their place in the library. online social networking, like e-mail and instant messaging, is emerging as a new method of communication. recently, the librarians have heard facebook being used as a verb (e.g., "i'll facebook you"). few would probably disagree that making social connections and friends (and facebook revolves around connecting friends) is an important aspect of the campus experience. much of the attraction students and alumni have toward college yearbooks (housed in the library) stems from the same fascination that viewing photos, student profiles, and searching for past and present classmates on facebook inspires. emphasis in this study centers on librarians' awareness of, experimentation with, and attitudes towards facebook and whether or not they have created policies to regulate or block access to the site on public-access computers. however trendy an individual web site such as facebook may appear, online social networking, a category facebook falls within, has become a new subject of inquiry for marketing professionals, sociologists, communication scholars, and library and information scientists. downes defines social networks as a "collection of individuals linked together by a set of relations."8 according to downes, "social networking web sites fostering the development of explicit ties between individuals as 'friends' began to appear in 2002."9 facebook is just one of many popular online social network sites (myspace, friendster, flickr), and survey respondents often asked why questions focused solely on facebook.
the authors decided to investigate it specifically because it is cur­ rently the largest online social network targeted for the academic environment. librarians are also increasingly exploring the use of what have loosely been referred to as “internet 2.0” com­ panies and services, such as facebook, to interact with and reach out to our users in new and creative ways. the term internet 2.0 was coined by o’reilly media to refer to internet services such as blogs, wikis, online social net­ working sites, and types of networks that allow users the ability to interact and provide feedback. o’reilly lists the core competencies that define internet 2.0 services. one of these competencies, which might be of particular inter­ est to librarians, is that internet 2.0 services must “trust the users” as “co­developers.”10 as librarians struggle to develop innovative ways to reach users beyond library walls, it seems logical to observe online services, such as facebook and myspace, which appeal to a huge portion of our clientele. from a purely evaluative standpoint of the site as a database, the authors were impressed by several of the search features offered in facebook. graph­theory algo­ rithms and other advanced network technology are used to process connections.11 some of the more interesting search options available in facebook include the ability to: ■ search for students by course field, class number, or section; ■ search for students in a particular major; ■ search for students in a particular student organiza­ tion or club; ■ create “groups” for student organizations, clubs, or other students with common interests; ■ post announcements about campus or organization events; ■ search specifically for alumni; and ■ block or limit who may view profiles, providing users with built­in privacy protection if the user so wishes. since the authors finished the study, the site has added a news feed and a mini feed, features that allow users to keep track of their friends’ notes, messages, profile changes, friend connections, and group events. in response to negative feedback about the news feeds and mini feeds by users who felt their privacy was being violated, facebook’s administrators created a way for users to turn off or limit information displayed in the feeds. the addition of this technology, however, provides a sophisticated level of connectivity that is a benefit to users who like to keep abreast of the latest happenings in their network of friends and groups. the pulse, another feature on the site, keeps daily track of popular interests (e.g., favorite books) and member demographics (number of members, political orientation) and compares them with overall facebook member averages. the authors were pleasantly surprised to discover that the beatles and led zeppelin, beloved bands of the baby boomers, article title | author 25checking out facebook.com | charnigo and barnett-ellis 25 continue to live on in the hearts of today’s students. these groups were ranked in the top ten favorite bands by stu­ dents at jsu. as of october 2006, the top campaign issues expressed by facebook users were: reducing the drinking age to eighteen (go figure) and legalization for same­sex marriage. arguably, much of the information provided by facebook is not academic in nature. however, an evaluation or review of facebook might provide useful information to instruction librarians and database ven­ dors regarding interface design and search capabilities that appeal to students. 
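as a purely illustrative aside, the connection processing described above can be pictured as simple set and graph operations over members, friendships, courses, and groups. the sketch below is a toy model under that assumption; the class, field, and member names are hypothetical and are not drawn from facebook's actual implementation, which the article does not describe.

```python
# toy model of a social graph supporting the kinds of lookups mentioned above:
# mutual friends, classmates by course and section, and group membership.
# all names and structures are hypothetical, for illustration only.
from collections import defaultdict

class ToySocialGraph:
    def __init__(self):
        self.friends = defaultdict(set)   # member -> set of friends (undirected)
        self.courses = defaultdict(set)   # (course, section) -> members enrolled
        self.groups = defaultdict(set)    # group name -> members

    def add_friendship(self, a, b):
        # friendship is mutual, so record the edge in both directions
        self.friends[a].add(b)
        self.friends[b].add(a)

    def enroll(self, member, course, section):
        self.courses[(course, section)].add(member)

    def join_group(self, member, group):
        self.groups[group].add(member)

    def mutual_friends(self, a, b):
        # intersection of two adjacency sets
        return self.friends[a] & self.friends[b]

    def classmates(self, member, course, section):
        return self.courses[(course, section)] - {member}

g = ToySocialGraph()
g.add_friendship("ann", "ben")
g.add_friendship("ann", "cal")
g.add_friendship("ben", "cal")
g.enroll("ann", "ls 101", "02")
g.enroll("cal", "ls 101", "02")
g.join_group("ann", "library book club")

print(g.mutual_friends("ann", "ben"))       # {'cal'}
print(g.classmates("ann", "ls 101", "02"))  # {'cal'}
```

even a model this small shows why searching by course section or shared group is cheap once memberships are stored as sets, which is consistent with the search options listed above.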
provitera­mcglynn suggests that facilitating learning among millennials, who “represent 70 to 80 million people” born after 1992 (a large percent­ age of facebook members) involves understanding how they interact and communicate.12 awareness of students’ cultural and social interests, and how they interact online, may help older generations of academic librarians better connect with their constituents. ■ the literature on online social networks although social networks have been the subject of study by sociologists for years and social network theories have been established to describe how these networks func­ tion, the study of online social networks has received little attention from the scholarly community. in 1997, garton, haythornthwaite, and wellman were among the first to describe a method, social network analysis, for studying online social networks.13 their work was published years before online social networks similar to facebook evolved. currently, the literature on these networks is predominantly limited to popular news pub­ lications, business magazines, occasional blurbs in library science and communications journals, and numerous student newspapers.14 privacy issues and concerns about sexual predators lurking on facebook and similar sites have been the focus of most articles. in the chronicle of higher education, read details numerous arrests, suspensions, and schol­ arship withdrawals that have resulted from police and administrators searching for incriminating information students have posted in facebook.15 read discovered that, because students naively reveal so much informa­ tion about their activities, some campus police were regularly trolling facebook, finding it “an invaluable ally in protecting their campuses.”16 students may feel a false sense of security when they post to facebook, regarding it as their private space. however, read warns that “as more and more colleges offer alumni e­mail accounts, and as campus administrators demonstrate more internet savvy, students are finding that their conversations are playing to a wider audience than they may have antici­ pated.”17 privacy concerns expressed about facebook appear to revolve more around surveillance than stalk­ ers. in a web seminar on issues regarding facebook use in higher education, shawn mcguirk, director of judicial affairs, mediation, and education at fitchburg state college, massachusetts, recommends that administrators and others concerned with students posting potentially incriminating, embarrassing, or overtly personal infor­ mation draft a document similar to the one created by cornell university’s office of information technologies, which advises students on how to safely and responsibly use online social networking sites similar to facebook.18 after pointing out the positive benefits of facebook and reassuring students that cornell university is proud of its liberal policy in not monitoring online social networks, the essay, entitled “thoughts on facebook,” provides poignant advice and examples of privacy issues revolv­ ing around facebook and similar web sites.19 the golden rule of this essay states: don’t say anything about someone else that you would not want said about yourself. and be gentle with your­ self too! 
what might seem fun or spontaneous at 18, given caching technologies, might prove to be a liability to an on­going sense of your identity over the longer course of history.20 a serious concern discussed in this document is the real possibility that potential employers may scan facebook profiles for the “real skinny” on job candidates. however, unless the employer uses an e­mail issued from the same school as the candidate, he or she is unable to look at the individual’s full profile without first request­ ing permission from the candidate to be added as a “friend.” all the employer is able to view is the user’s name, school affiliation, and picture (if the user has posted one). unless the user has posted an inappropriate picture or is applying for a job at the college he or she is attending, the threat of employers snooping for informa­ tion on potential candidates in facebook is minimal. the same, however, cannot be said of myspace, which is much more open and accessible to the public. additionally, three pilot research studies have also focused on privacy issues specifically relating to facebook, including those of stutzman, gross and acquisti, and govani and pashley. results from all three studies revealed strikingly close findings. individuals who participated in the studies seemed willing to dis­ close personal information about themselves—such as photos and sometimes even phone numbers and mailing addresses—on facebook profiles even though students also seemed to be aware that this information was not secure. in a study of fifty carnegie mellon university undergraduate users, govani and pashley concluded that these users “generally feel comfortable sharing their per­ sonal information in a campus environment. participants said they “had nothing to hide” and “they don’t really 26 information technology and libraries | march 200726 information technology and libraries | march 2007 care if other people see their information.”21 a separate study of more than four thousand facebook members at the same institution by gross and acquisti echoed these findings.22 comparing identity elements shared by members of facebook, myspace, friendster, and the university of north carolina directory, stutzman discov­ ered that a significant number of users shared personal information about themselves in online social networks, particularly facebook, which had the highest level of campus participation.23 gross and acquisti provide a list of explanations suggesting why facebook members are so open about sharing personal information online. three explanations that are particularly convincing are that “the perceived benefit of selectively revealing data to strang­ ers may appear larger than the perceived costs of possible privacy invasions”; “relaxed attitudes toward (or lack of interest in) personal privacy”; and “faith in the network­ ing service or trust in its members.”24 in public libraries, concern has primarily centered on teenagers accessing myspace.com, an online social net­ working site much larger than facebook. 
myspace, whose membership, unlike facebook, does not require an .edu e­mail address, has a staggering 43 million users, a num­ ber that continues to rise.25 julian aiken, a reference librar­ ian at the new haven free public library, wrote about the unpopular stance he took when his library decided to ban access to myspace due to the hysterical hype of media reports exposing the dangers from online predators lurking on the site.26 for aiken, the damage of censorship policies in libraries far outweighs the potential risk of sex crimes. furthermore, he suggests that there are even edu­ cational benefits of myspace, observing that “[t]eenagers are using myspace to work on collaborative projects and learn the computer and design skills that are increasingly necessary today.”27 what is apparent is that whether facebook continues to rise in popularity or fizzles out among the college crowd, the next generation of college students, who now constitute the largest percentage of myspace users, are already solidly entrenched and adept at using online social networks. librarians in institutions of higher education might need to consider what implica­ tions the communication style preferences of these future students could have, if any, on library services. while most of the academic attention regarding online social networks has centered on privacy concerns, perhaps the business sector has done a more thorough investiga­ tion of user behavior and students’ growing attraction towards these types of sites. business magazines have naturally focused on the market potential, growth, and fluctuating popularity of various online social networks. advertisers and investors have sought ways to capital­ ize on the exponential growth of these high­traffic sites. business week reported that as of october 2005, facebook .com had 4.2 million members. more than half of those members were between the ages of twelve and twenty­ four.28 while some portended that the site was losing momentum, as of august 2006, membership on facebook had expanded beyond eight million.29 marketing experts have closely studied, apparently more so than com­ munication scholars, the behavior of users in online social networks. in a popular business magazine, hempel and lehman describe user behavior of the “myspace generation”: “although networks are still in their infancy, experts think they’re already creating new forms of social behavior that blur the distinctions between online and real­world interactions.”30 the study of user behavior in online social networks, however, has yet to be addressed in length by those outside the field of marketing. although evidence of interest in online social net­ works is apparent in librarian weblogs and forums (many librarians have created facebook groups for their libraries), actual literature in the field of library and information science is scarce.31 dvorak questions the lack of interest displayed by the academic community toward online social networks as a focus of scholarly research. calling on academics to “get to work,” he argues “aca­ demia, which should be studying these phenomena, is just as out of the loop as anyone over 30.”32 this discon­ nect is also echoed by michael j. 
bugeja, director of the greenlee school of journalism and communication at iowa state university, who writes, “while i’d venture to say that most students on any campus are regular visitors to facebook, many professors and administrators have yet to hear about facebook, let alone evaluate its impact.”33 the lack of published research articles on these types of networks, however, is understandable given the newness of the technology. a few members of the academic community have sug­ gested opportunities for using facebook to communicate with and reach out to students. in a journal specifically geared toward student services in higher education, shier considers the impact of facebook on campus community building.34 although she cannot identify an academic purpose for facebook, she describes how the site can con­ tribute to the academic social life of a campus. facebook provides students with a virtual campus experience, particularly in colleges where students are commuters or are in distance education. shier writes, “as the student’s definition of community moves beyond the geographic and physical limitations, facebook.com provides one way for students to find others with common interests, feel as though they are part of a large community, and also find out about others in their classes.”35 furthermore, facebook membership extends beyond students to fac­ ulty, staff, and alumni. shier cites examples of professors who used facebook to connect or communicate with their students, including the president of the university of iowa and more than one hundred professors at duke university. professors who teach online courses make article title | author 27checking out facebook.com | charnigo and barnett-ellis 27 themselves seem more human or approachable by estab­ lishing facebook profiles.36 greeting students on their own turf is exactly the direction staff at washington university’s john m. olin library decided to take when they hired web services librarian joy weese moll to communicate and answer questions through a variety of new technologies, includ­ ing facebook.37 brian mathews, information services librarian at georgia institute of technology, also created a facebook profile in order to “interact with the students in their natural environment.”38 mathews decided to experiment with the possibilities of using facebook as an outreach tool to promote library services to 1,700 stu­ dents in the school of mechanical engineering after he discovered that 1,300 of these students were registered on facebook. advising librarians to become proactive in the use of online social networks, mathews reported that overall, his experience helped him to effectively “expand the goal of promoting the library.”39 bill drew was among the first librarians to create an account and profile for his library, the suny morrisville library. as of september 2006, nearly one hundred librarians had created profiles or accounts for their libraries on facebook. one month later, however, the administration at facebook began shutting down library accounts on the grounds that libraries and institutions were not allowed to represent themselves with profiles as though they were individu­ als. in response, many of these libraries simply created groups for their libraries, which is completely appropri­ ate, similar to creating a profile, and just as searchable as having an account. the authors of this study created the “houston cole library users want answers!” group, which currently has ninety­one members. 
library news and information of interest about the library is announced in the group.40 in this study, one trend the authors will try to identify is whether other librarians have considered or are already using facebook in similar ways that moll, mathews, and drew have explored as avenues for com­ municating with students or promoting library services. ■ the survey in february 2006, 244 surveys were mailed to reference or public service librarians (when the identity of those per­ sons could be determined). these individuals were chosen from a random sample of the 850 institutions of higher education classified by the carnegie classification listing of higher education institutions as “master’s colleges and universities (i and ii)” and “doctoral/ research universities (extensive and intensive).”41 the sample size provided a 5.3 percent margin error and a 95 percent confidence level. one hundred twenty­six surveys were completed, providing a response rate of 51 percent. fifteen survey questions (appendix a) were designed to target three areas of inquiry: awareness of facebook, practical impact of the site on library services, and perspectives of librarians toward online social networks. awareness of facebook a series of questions on the survey queried respondents about their awareness and degree of knowledge about facebook. the overwhelming majority of librarians were aware of facebook’s existence. out of 126 librarians, 114 had at least heard of facebook; 24 were not familiar with the site. as one individual wrote, “i had not heard of facebook before your survey came, but i checked and our institution is represented in facebook.” universities registered in facebook are easily located through a search­by­region on facebook’s home page. thirty­eight colleges and universities for alabama (jsu’s location) are registered in facebook. (in comparison, 143 academic institutions in california are listed.) out of those librar­ ians who had heard of the site, 27 were not sure whether their institutions were registered in facebook’s directory. sixty survey participants were aware that their institu­ tions were registered in the directory, while fifteen librar­ ians reported that their universities were not registered (figure 1). several comments at the end of the survey indicated that some of the institutions surveyed did not issue school e­mail accounts, making membership in facebook impossible for their university. interestingly, out of the sixty individuals who could claim that their universities were in the directory, 34 percent have created their own personal facebook accounts and two libraries have individual profiles (figure 2). one individual who established an account on the site wrote, “personally, i’m a little embarrassed by having an account because it’s such a teeny­bopper kind of thing and i’m a little old for it. but it’s an interesting cultural phenomenon and academic librarians need to get on the bandwagon with it, if only to better understand their constituents.” another survey respondent with an individual profile on the site reported a group created by his or her institution on facebook titled “i totally want to have sex in the library.” this individual wanted to make it clear, however, that the students—not the librarians—created this group. 
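a brief aside on the sampling arithmetic reported at the start of this section: the quoted 5.3 percent margin of error at a 95 percent confidence level is consistent with the standard worst-case formula plus a finite population correction applied to the 244 surveys mailed from a population of 850 institutions. the sketch below is a plausible reconstruction of that calculation, not the authors' own worksheet.

```python
# reconstructing the reported margin of error for a sample drawn from a
# finite population of institutions.  assumes the conventional worst-case
# proportion p = 0.5; this is an illustration, not the authors' calculation.
import math

def margin_of_error(sample_size, population_size, z=1.96, p=0.5):
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    # finite population correction: sampling a large share of a small
    # population reduces the sampling error
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc

print(round(margin_of_error(244, 850) * 100, 1))  # roughly 5.3 (percent)
```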
a particularly helpful participant went so far as to poll the reference colleagues in all nine of the libraries at his/her institution and found that "only a few had even heard of facebook." that librarians will become increasingly aware of online social networks was the sentiment expressed by another individual who wrote, "most librarians at my institution are unaware of social software in general, much less facebook. however, i think this will change in the future as social software is mentioned more often in traditional media (such as television and newspapers)." according to survey responses, it does not appear that use of facebook by students has been as noticeable or distracting in other libraries as it has been at hcl. when asked to describe their observation of student use of library computers to access facebook, 56 percent of those surveyed checked "rarely to never." only 20 percent indicated "most of the time" to "all of the time" (table 1). however, it is important to remember that only sixty individuals could verify that their institutions are registered on facebook. through comments, some librarians hinted that "snooping" or keeping mental notes of what students view on library computers is frowned upon. it simply is not our business. "we do not regulate or track student use of computers in the library," wrote one individual. several librarians noted that students were using facebook in the libraries, but more so on personal laptops than public-access computers.
practical impact of facebook
another goal of this study was to find out whether facebook has had any real impact on library services, such as an increase in bandwidth, library traffic, and noise, or in use of public-access computers, scanners, or other equipment. student complaints about monopolization of computers for use of facebook led administrators to block the site from computer labs at jsu. access to facebook on public-access terminals, however, was not regulated. survey responses revealed that facebook has had minimal impact on library services elsewhere. only one library was forced to develop a policy for specifically addressing computer-use concerns as a result of facebook use. one individual mailed the sign posted on every computer terminal in the library, which states, "if you are using a computer for games, chat, or other recreational activity, please limit your usage to thirty minutes. computers are primarily intended for academic use." another librarian reported that academic computing staff had to shut down access to facebook on library computers due to bandwidth and access issues. this individual, however, added, "interestingly, no one has complained to the library staff about its absence!" given a list of possible effects facebook may have had on library services and operations, 10 percent of respondents indicated that facebook has increased patron use of computers. seven percent agreed that it has increased patron traffic, and only 2 percent reported that the site has created bandwidth problems or slowed down internet access. only four individuals received patron complaints about other users "tying up" the computers with facebook (figure 3). since the advent of facebook, the public scanner has become one of the hottest items in hcl.
librarians at jsu know that use of the scanner has increased tremendously due to facebook because the scanner used by students to upload photos is attached to a public workstation next to the general reference desk. students often ask questions about uploading pictures to their facebook profiles as well as how to edit photos (e.g., resizing and cropping). one survey question asked whether scanner use had increased as a result of facebook. of the sixty-two respondents who answered this question (it was indicated that only those libraries that provide public access to scanners should answer the question), 77 percent reported that scanner use had not increased.
figure 1. institutions added to the facebook directory
figure 2. involvement with facebook
table 1. student use of library computers to access facebook (based on observation)
                    total   percentage
never                 23        32
rarely                17        24
some of the time      17        24
most of the time       7        10
all the time           7        10
furthermore, only two librarians have assisted students with the scanner or provided any other type of assistance, for that matter, with facebook. the assistance the two librarians gave included scanning photographs, editing photos, uploading photos to facebook profiles, and creating accounts. however, in a separate question, 21 percent of participants agreed that librarians should be responsible for helping students, when needed, with questions about facebook. no librarian has added additional equipment such as computers or scanners as a result of facebook. only one individual reported future plans by his/her library to add additional equipment in the future as a result of heavy use of the site.
perspectives toward facebook
one of the main goals of the study was to obtain a snapshot of the perspectives and attitudes of librarians toward facebook and online social networks in general. most of the librarians surveyed were neither enthusiastic nor disdainful of facebook. a small group of the respondents, however, when given the chance to comment, were extremely positive and excited about the possibilities of online social networking. twenty-one individuals saw no connection between libraries and facebook. sixty-seven librarians were in agreement that computer use for academic purposes should take priority, when needed, over use of facebook. however, fifty-one respondents indicated that librarians needed to keep up with internet trends, such as facebook, even when such trends are not academic in nature (table 2). out of 126 librarians who completed the survey, only 23 reported that facebook has generated discussion among library faculty and staff about online social networks. on the other hand, few individuals voiced negative opinions toward facebook. only 5 percent of those surveyed indicated that facebook annoyed faculty and staff. one individual wrote, "i don't like facebook or most social networking services. they encourage the formation of cliques and keep users from meeting and accepting those who are different than themselves." comments like this, however, were rare. although the majority of librarians seemed fairly apathetic toward facebook, few individuals expressed negative comments toward the site. few librarians indicated that facebook should be addressed or regulated in library policy. most individuals viewed the site as just another communication tool similar to instant messaging or cell phones.
in fact, while most librarians did not express much interest in facebook, many were quite vocal about not regulating its use. the following comment by one survey participant captures this sentiment: "attempts to restrict use of facebook in the library would be futile, in my opinion, in the same way it is now impossible to ban use of usb drives and aim in academic libraries."
table 2. access, assistance, and awareness of facebook and similar trends: perspectives
total   percentage
  67        53     computer use for academic purposes should take priority, when needed, over use of facebook.
  51        40     librarians need to "keep up" with internet trends, such as facebook, even when these trends are not academic in nature.
  35        28     library resources should not be monopolized with use of facebook.
  27        21     librarians should help students, when able, with questions regarding facebook.
  21        17     there is no connection between libraries and facebook.
  15        12     student use of facebook on library computers should not be regulated.
  11         9     library computers should be available for access to facebook, but librarians should not feel that it is their responsibility to assist students with questions regarding the site.
(respondents were allowed to check any or all responses that applied.)
figure 3. patron complaints about facebook
while most individuals agreed that academic use of computers should take priority over recreational use, a polite request that a patron using facebook allow another student to use the computer for academic purposes, when necessary, appears preferable to the creation and enforcement of strict policies. as one librarian put it, "i don't want students to see the library as a place where they are 'policed' unnecessarily." when asked if facebook serves any academic purpose, 54 percent of those surveyed indicated that it does not, while 34 percent were "not sure." twelve percent of the librarians identified academic potential or possible benefits of the site (figure 4). the authors were surprised to find that 46 percent of those surveyed were not completely willing to dismiss facebook as pure recreation. some librarians found facebook to be a distraction to academics: "maybe i'm old fashioned, but when do students find time for this kind of thing? i wonder about the impact of distractions like this on academic pursuits. there's still only twenty-four hours in a day." another individual asked two students who were using facebook in the library what they thought of the site and they admitted that it was "frequently a distraction from academic work." for the 34 percent who were not sure whether facebook has any academic value, there were comments such as "i am continuing to observe and will decide in the future." academic uses for facebook included suggestions that it be used as a communication tool for student collaboration in classes (facebook allows students to search for other students by course and section number). one individual suggested it could be used as an "online study hall," but then wondered if this might lead to plagiarism. some thought instructors could somehow use facebook for conducting online discussion forums, with one participant observing "it's 'cooler' than using blackboard." "building rapport" with students through a communication medium that many students are comfortable with was another benefit mentioned.
respondents who were enthusiastic about facebook thought it most beneficial as a virtual extension of the campus. facebook could potentially fill a void where face-to-face connections are absent in online and distance-education classes. several librarians suggested that facebook has had a positive influence in fostering collegiate bonds and school spirit. as one individual wrote, "[t]he academic environment is not only responsible for scholarly growth, but personal growth as well. this is just one method for students to interact in our highly technological society." facebook could provide students who are not physically on campus with a means to connect with other students at their institutions who have similar academic and social interests. some librarians were so enthusiastic about facebook that they suggested libraries use the site to promote their services. using the site to advertise library events and creating online library study groups and book clubs for students were some of the ideas expressed. one librarian wrote: "facebook (and other social networking sites) can be a way for libraries to market themselves. i haven't seen students using facebook in an academic manner, but there was a time when librarians frowned on e-mail and aim too. if it becomes a part of students' lives, we need to welcome it. it's part of welcoming them, too." more librarians, however, felt that facebook should serve as a space exclusively for students and that librarians, professors, administrators, police, and other uninvited folks should keep out. furthermore, as one individual noted, it is not "an appropriate venue" for librarians to promote their services. while the review of literature demonstrates that much has been made of online social networks and privacy issues, the librarians surveyed were not particularly concerned about privacy. only 19 percent indicated that they were concerned about privacy issues related to facebook. however, some librarians voiced concerns that many students are ignorant about the risks of posting personal information and photographs on facebook and do not seem fully aware of the possibility that individuals outside their social sphere might also have reason to access the site. one individual mentioned that the librarians at her institution have begun to emphasize this to students during library instruction sessions on internet research and evaluation.
figure 4. finds conceivable academic value in facebook
■ limitations
several limitations to this study must be noted when attempting to reach any type of conclusion. participants who had never heard of facebook obviously could not answer any questions except that they were not familiar with the site. some questions required respondents to "guesstimate." unless librarians have access to their institution's internet usage statistics, it would be hard for them to really know how much bandwidth is being used by students accessing facebook. librarians, having been trained in a profession that places a high value on freedom of access, might also be wary of activities that suggest any type of censorship. therefore, it is conceivable that some of the librarians surveyed do not know whether students are using facebook in the library because they make a point not to snoop or make note of individual web sites that students view.
■ discussion while online education is growing at a rapid rate across the united states, so is the presence of virtual academic social communities. although facebook might prove to be a passing fad, it is one of the earliest and largest online social networking communities geared specifically for students in higher education. it represents a new form of communication that connects students socially in an online environment. if online academics have evolved and continue to do so, then it is only natural that online academic social environments, such as facebook, will continue to evolve as well. while traditionally considered the heart of the campus, one is left to ponder the library’s presence in online academic social networks. what role the library will serve in these environments might largely depend on whether librarians are proactive and experi­ mental with this type of technology or whether they simply dismiss it as pure recreation. emerging technolo­ gies for communication should provoke, at the very least, an interest in and knowledge of their presence among library and information science professionals. this survey found that librarians were overwhelmingly aware of and moderately knowledgeable about facebook. some librarians were interested in and fascinated with facebook, but preferred to study it as outsiders. others had adopted the technology, but more for the purpose, it would seem, of having a better understanding of today’s students and why facebook (and other online social net­ working sites) appeals to so many of them. it is apparent from this study that there is a fine line between what now constitutes “academic” activity and “recreational” activity in the library. sites like facebook seem to blur this line fur­ ther and librarians do not seem eager or find it necessary to distinguish between the two unless absolutely pressed (e.g., asking a student to sign out of facebook when other patrons are waiting to use computers for academic work). one area of attention this study points to is a lack of con­ cern among librarians toward the internet and privacy issues. some individuals surveyed suggested that librari­ ans play a larger role in making students aware that people outside their society of friends—namely, administrative or authority figures—have the ability to access the informa­ tion they post online to social networks. participants were most enthusiastic about facebook’s role as a space where students in the same institution can connect and share a common collegiate bond. librarians who have not yet “checked out” facebook might consider one individual’s description of the site as “just another ver­ sion of the college yearbook that has become interactive.”42 among the most cherished books in hcl that document campus life at jsu are the mimosa yearbooks. alumni and students regularly flip through this treasure trove of pho­ tographs and memories. no administrator or librarian would dare weed this collection or find its presence irrele­ vant. while year books archive campus yesteryears, online social networks are dynamically documenting the here and now of campus life and shaping the future of how we communicate. as casey writes, “libraries are in the habit of providing the same services and the same programs to the same groups. 
we grow comfortable with our provision and we fail to change.”42 by exploring popular new types of internet services such as facebook instead of quickly dismissing them as irrelevant to librarianship, we might learn new ways to reach out and communicate better with a larger segment of our users. ■ acknowledgements the authors would like to acknowledge stephanie m. purcell, student worker at the houston cole library, for her excellent editing suggestions and insight into online social networks from the student’s point of view, and john­bauer graham, head of public services at the houston cole library, for his encouragement. references and notes 1. angela reid, “finally . . . the facebook,” the chanticleer, sept. 22, 2005, 4. 2. facebook.com, http://www.facebook.com/about.php (accessed dec. 2, 2005). 3. angus loten, “the great communicator,” inc.com., june 6, 2006, http://www.inc.com/30under30/zuckerberg.html (accessed dec. 4, 2005). 4. adam lashinsky, “facebook stares down success,” fortune, nov. 28, 2005, 4. 5. michael amington, “85 percent of college students use facebook,” testcrunch: tracking web 2.0 company review on facebook (sept. 7, 2005), http://www.techcrunch.com/2005/09/07/ 85­of­college­students­use­facebook (accessed dec. 2, 2005). 6. http://www.facebook.com/about.php. 7. facebook us! if you are a registered member of facebook, do a global search for “laurie charnigo” or “paula barnett­ ellis.” 32 information technology and libraries | march 200732 information technology and libraries | march 2007 8. stephen downes, “semantic networks and social net­ works,” the learning organization 12, no. 5 (2005): 411. 9. ibid. 10. tim o’reilly, “what is web 2.0?” http://www.oreilly net.com/pub/a/oreilly/tim/news/2005/09/30/what­is­web ­20.html (accessed aug. 6, 2006). 11. http://www.facebook.com/about.php. 12. angela provitera mcglynn, “teaching millennials, our newest cultural cohort,” the education digest 71, no. 4 (2005): 13. 13. laura garton, caroline haythornthwaite, and barry well­ man, “studying online social networks,” journal of computer mediated communication 31, no. 4 (1997). 14. facebook.com’s “about” page archives a collection of col­ lege newspaper articles about facebook: http://www.facebook .com/about.php (accessed dec. 4, 2005). 15. brock read, “think before you share,” the chronicle of higher education, jan. 20, 2006, a38–a41. 16. ibid., a41. 17. ibid., a40. 18. shawn mcguirk, “facebook on campus: understanding the issues,” magna web seminar presented live on june 14, 2006. transcripts available for a fee from magna pubs. http://www .magnapubs.com/catalog/cds/598755­1.html (accessed aug. 2, 2006). 19. tracy mitrano, “thoughts on facebook” (apr. 2006) cor­ nell university of information technologies, http://www.cit .cornell.edu/oit/policy/ memos/facebook.html (accessed june 22, 2006). 20. ibid., “conclusion.” 21. tabreez govani and harriet pashley, “student awareness of the privacy implications when using facebook,” unpublished paper presented at the “privacy poster fair” at the carnegie mellon university school of library and information science, dec. 14, 2005, 9, http://lorrie.cranor.org/courses/fa05/tubzhlp .pdf (accessed jan. 15, 2006). 22. ralph gross and alessandro acquisti, “information rev­ elation and privacy in online social networks,” paper presenta­ tion at the acm workshop on privacy in the electronic society, alexandria, va., nov. 7, 2005, 79, http://portal.acm.org/citation .cfm?id=1102214 (accessed nov. 30, 2005). 23. 
23. frederic stutzman, "an evaluation of identity-sharing behavior in social network communities," paper presentation at the idmaa and ims code conference, oxford, ohio, april 6–8, 2006, 3–6, http://www.ibiblio.org/fred/pubs/stutzman_pub4.pdf (accessed may 23, 2006).
24. gross and acquisti, "information revelation and privacy in online social networks," 73.
25. "myspace: design anarchy that works," business week, jan. 2, 2006, 16.
26. julian aiken, "hands off myspace," american libraries 37, no. 7 (2006): 33.
27. ibid.
28. jessi hempel and paula lehman, "the myspace generation," business week, dec. 12, 2005, 94.
29. http://www.facebook.com/about.php.
30. hempel and lehman, "the myspace generation," 87.
31. the authors created the "librarians and facebook" group on facebook to discuss issues concerning facebook and librarianship, such as censorship issues, policies, and ideas for connecting with students through facebook. this is a global group. if you have a facebook account, we invite you to do a search for "librarians and facebook" and join our group.
32. john c. dvorak, "academics get to work!" pc magazine online, http://www.pcmag.com/article2/0,1895,1928970,00.asp (accessed feb. 21, 2006).
33. michael j. bugeja, "facing the facebook," the chronicle of higher education, jan. 27, 2006, c1–c4; ibid.
34. maria tess shier, "the way technology changes how we do what we do," new directions for student services 112 (winter 2005): 83–84.
35. ibid., 84.
36. shier, "the way technology changes how we do what we do," 112; j. duboff, "'poke' your prof: faculty discovers thefacebook.com," yale daily news, mar. 24, 2005, http://www.yaledailynews.com/article.asp?aid=28845 (accessed jan. 15, 2006); mingyang liu, "would you friend your professor?" duke chronicle online, feb. 25, 2005, http://www.dukechronicle.com/media/paper884/news/2005/02/25/news/would.you.friend.your.professors-1472440.shtml?norewrite&sourcedomain=www.dukechronicle.com (accessed jan. 15, 2006).
37. brittany farb, "students can 'check out' new librarian on the facebook," student life (washington univ. in st. louis), feb. 27, 2006, http://www.studlife.com/home/index.cfm?event=displayarticle&ustory_id=5914a90d-53b (accessed feb. 27, 2006).
38. brian s. mathews, "do you facebook? networking with students online," college & research libraries news 37, no. 5 (2006): 306.
39. ibid., 307.
40. view the "houston cole library users want answers!" group by doing a search for the group title on facebook.
41. nces compare academic libraries, http://nces.ed.gov/surveys/libraries/compare/peervariable.asp (accessed dec. 2, 2005). the random sample was chosen using the research randomizer available online, http://www.randomizer.org/form.htm (accessed dec. 2, 2005).
42. michael e. casey and laura c. savastinuk, "library 2.0," library journal 131, no. 14 (2006): 40.

appendix a: survey on the impact of facebook on academic libraries

1. has your institution been added to the facebook directory?
   yes / no (skip to questions 10, 11, and 12) / not sure (skip to questions 10, 11, and 12) / i am not familiar with facebook (skip all questions and submit)
2. which best describes your involvement with facebook?
   i have a personal account / my library has an account / no involvement
3. which best describes your observation of student use of library computers to access facebook?
   all the time / most of the time / some of the time / rarely / never
4. has your library added additional equipment such as computers or scanners as a result of facebook use?
   yes / no / no, but we plan to in the future
5. have patrons complained about other patrons using library computers for facebook?
   yes / no / not sure
6. has your library had to develop a policy or had to address computer use concerns as a result of facebook use?
   yes / no / not sure
7. if your library provides public access to a scanner, has patron use of scanners increased due to the use of facebook?
   yes / no
8. have you assisted students with the library's scanner for facebook?
   yes / no
9. if you have provided assistance to students with facebook, please check all that apply:
   creating accounts
   scanning photographs or offering advice on where students can access a scanner
   editing photographs (e.g., resizing photos or use of a photo editor)
   uploading photographs to facebook profiles
   other __________________________________
10. check the responses that best describe your opinion about the responsibilities of librarians in assisting students with facebook questions and access to the web site:
   student use of facebook on library computers should not be regulated.
   library resources should not be monopolized with facebook use.
   computer use for academic purposes should take priority, when needed, over use of facebook.
   librarians should help students, when able, with facebook questions.
   librarians need to "keep up" with internet trends, such as facebook, even if they are not academic in nature.
   there is no connection between librarians, libraries, and facebook.
   library computers should be available for facebook use, but librarians should not feel that they need to assist students with facebook questions.
11. would you consider facebook to be a relevant academic endeavor?
   yes / no / not sure
12. if you answered "yes" to question 11, please describe how facebook could be considered an academic endeavor.
   ______________________________________________
13. please check all answers that best describe what effect, if any, use of facebook in the library has had on library services and operations?
   has increased patron traffic
   has increased patron use of computers
   has created computer access problems for patrons
   has created bandwidth problems or slowed down internet access
   has generated complaints from other patrons
   annoys library faculty and staff
   interests library faculty and staff
   has generated discussion among library faculty and staff about facebook
14. is privacy a concern you have about students using facebook in the library?
   yes / no / not sure

please list any observations, concerns, or opinions you have regarding facebook use in libraries.

extracted the paragraphs from my palm to my desktop, and saved that document and the tocs on a universal serial bus (usb) key. today, i combined them in a new document on my laptop and keyed the remaining paragraphs in my room at an inn on a pier jutting into commencement bay in tacoma on southern puget sound. i sought inspiration from the view out my window of the water and the fall color, from old crow medicine show on my ipod, and from early sixties beyond the fringe skits on my treo.
fred kilgour was committed to delivering informa­ tion to users when and where they wanted it. libraries must solve that challenge today, and i am confident that we shall. editorial continued from page 3 digitization has bestowed upon librarians and archivists of the late 20th and early 21st centuries the opportunity to reexamine how they access their collections. it draws these two traditional groups together with it specialists in order to collaborate on this new great challenge. in this paper, the authors offer a strategy for adapting a library system to traditional archival practice. t he librarian and the archivist . . . both collect, preserve, and make accessible materials for research; but significant differences exist in the way these materials are arranged, described, and used.”1 among the items usually collected by libraries are: published books and serials, and in more recent times, commercially available sound recordings, films, videos, and electronic resources of various types. archives, on the other hand, tend to collect original records of an organization, unique personal papers, as well as other effects of individuals and families. each type of institution, given its particular emphasis, has its own traditions and its own methods of dealing with its collections. most midto large-sized automated libraries in the united states and abroad use machine readable cataloging (marc) records to form the basis of their online catalogs. bibliographic records, including those in the marc format, generally represent an individually published item, or “information product,”2 and describe the physical characteristics of the item itself. the basic unit of archival description, however, is a much more complex entity than the basic unit of bibliographic description and often involves multiple hierarchical levels that may or may not extend down to the level of individual items. at portland state university (psu) the authors examined whether the capabilities of their present integrated library system could be expanded to capture the hierarchical structure of traditional archival finding aids. ■ background as early as 1841, the cataloging rules established by panizzi were geared toward locating individual published items. panizzi based his rules on the idea that any person looking for any particular book should be able to find it through the catalog.3 this tradition has continued over time up through current standards such as the anglo-american cataloguing rules and reaffirmed in marc, the standard for the representation and exchange of bibliographic information that has been widely used by libraries for over thirty years.4 archival description, on the other hand, is generally based on the fonds, that is, the entire collection of materials in any medium that were created, accumulated, and used by a particular person, family, or organization in the course of that creator’s activities and functions.5 thus, the basic unit of archival description, usually a finding aid, is a much more complex entity than the basic unit of bibliographic description, often involving multiple hierarchical levels of description that may or may not extend down to the level of individual items. before archival description begins, the archivist identifies related groups of materials and determines their proper arrangement. 
once the arrangement is determined, then the description of the materials reflects both their provenance and their original order.6 the first explicit statement of the levels of arrangement in an archival collection was by holmes and has since been elevated to the level of dogma in the archival community.7 a more recent statement in describing archives: a content standard (dacs) indicates that the actual levels of arrangement may differ for each collection. by custom, archivists have assigned names to some, but not all, levels of arrangement. the most commonly identified are collection, record group, series, file (or filing unit), and item. a large or complex body of material may have many more levels. the archivist must determine for practical reasons which groupings will be treated as a unit for purposes of description.8 rephrasing holmes, the five levels of arrangement can be defined as: 1. the collection level which holmes called the depository level—the breakdown of the depository’s complete holdings into a few major divisions based on the broadest common denominator 2. the record group level—the fonds or complete collection of the papers of a particular administrative division or branch of an organization or of a particular individual or family 3. the series level—the breakdown of the record group into natural series and the arrangement of each series with respect to the others 4. the filing unit level—the breakdown of each series into unit components, which are usually fairly obvious if the documents are kept in file folders 5. the document level—the level of individual items digital collection management through the library catalog michaela brenner, tom larsen, and claudia weston digital collection management through the library catalog | brenner, larsen, and weston 65 michaela brenner (brennerm@pdx.edu) and tom larsen (larsent@pdx.edu) are database maintenance and catalog librarians, and claudia weston (westonc@pdx.edu) is assistant university librarian for technical services, portland state university. 66 information technology and libraries | june 2006 the end result of archival description is usually a finding aid that ideally presents an accurate representation of the items in an archival collection so that users can, as independently as possible, locate them.9 building on the print finding aid, the archival community has explored a number of mechanisms for disseminating information on the availability of items in their collections. in 1983, the usmarc format for archival and manuscript control (marc-amc) was released and subsequently sanctioned for use as one possible standard data structure and communication protocol in the saa descriptive standard archives, personal papers, and manuscripts (appm) and its successor, dacs.10 its adoption, however, has been somewhat controversial among archivists.11 the difficulty in capturing the hierarchical nature of collections through the marc format is one factor that has limited the use of marc by the archival community. while it is possible to encode this hierarchical description in marc using notes and linking fields, few archivists in practice have actually made use of these linking fields.12 thus, in archival cataloging, marc records have been used primarily for collection-level description, allowing users to search and discover only general information about archival collections in online catalogs while the finding aid has remained the primary tool for detailed data at all levels of description. 
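the levels of arrangement described above lend themselves to a simple nested representation. the sketch below is only an illustration of that nesting, not a model of any actual repository or system; the class name and the sample units are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArchivalUnit:
    """One node in a finding-aid hierarchy: collection, record group, series, file, or item."""
    level: str
    title: str
    children: List["ArchivalUnit"] = field(default_factory=list)

    def walk(self, depth: int = 0):
        # Yield (depth, unit) pairs in finding-aid outline order.
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

# Invented example: a record group broken down through the lower levels.
papers = ArchivalUnit("record group", "jane doe papers", [
    ArchivalUnit("series", "correspondence", [
        ArchivalUnit("file", "letters, 1950-1955", [
            ArchivalUnit("item", "letter to the city council, 3 may 1952"),
        ]),
    ]),
])

for depth, unit in papers.walk():
    print("  " * depth + f"{unit.level}: {unit.title}")
```

because description follows arrangement, a description at any level can simply be attached to the node for that level, which is the point the preceding paragraphs make about archival practice.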
in 1995, the encoded archival description (ead) emerged as a new standard for encoding descriptions of archival collections. the ead standard, like the marc standard, allows for the electronic storage and exchange of archival information; but unlike marc, it is based on the finding aid. ead is well suited for encoding the hierarchical relationships between the different parts of the collection and displaying them to the user, and it has become more widely adopted by the archival community. as outlined, the standards and systems chosen by an institution are dictated by the needs and traditions of that institution. the archival community relies heavily on finding aids and, with increasing frequency, on ead, their electronic extension; whereas the library community heavily relies on the online public access catalog (opac) and marc records. new trends capitalizing on the strengths of both traditions are evolving as libraries and archives seek ways to improve access to their archival and digital collections. ■ access to digital archival collections in libraries when searching the web for collections of information, one frequently encounters separate interfaces for traditional library, archival, and digital collections even though these collections may be owned, sponsored, hosted, or licensed by a single institution. descriptive records for traditional library materials reside in the opac and are constructed according to standard library practice, while finding aids for the archival and digital collections increasingly appear on specially designed web sites. this, of course, means that users searching the opac may miss relevant materials that are described only in the archival and digital documents database or web site. similarly, users searching the archival and digital documents database or web site may miss relevant materials that are described only in the opac. in other instances, libraries, such as the library of congress, selectively add records to their opacs for individual items in their archival and digital document collections. this incorporation allows users more complete access to items within the library’s collections. authority control and the assignment of descriptors further enhance access to the item-level records. to minimize processing costs, however, libraries frequently create brief descriptive records for items, thereby limiting their value to patrons.13 by creating descriptive records for the items only, libraries also obscure the hierarchical relationships among the items and the collections in which they reside. these relationships can provide the user with a useful context for the individual items and are an essential part of archival description. still other libraries, such as the university of washington, include collection-level marc records in the opac for their archival and digital document collections. these are searchable in the opac in the same way as bibliographic records for other materials. these collection-level records can then in turn be linked to finding aids that describe the collections more fully.14 collection-level records often are used in libraries where library resources may be insufficient for cataloging large collections of materials at the item level.15 the guidelines for collection-level records in appm and dacs, however, allow for additional fields that are not ordinarily used in library bibliographic records. 
these include such things as descriptions of the organization and arrangement of the collection, citations for published descriptions of the collection and links to the finding aid, and acknowledgment of the donors, as well as ample subject access to the collection. despite their potential for detail, collectionlevel records cannot provide the same degree of access to individual items as full item-level records. ■ an approach taken at portland state university library in many ways, archival and digital-document collections are continuing resources. a continuing resource is defined as “. . . a bibliographic resource that is issued over time digital collection management through the library catalog | brenner, larsen, and weston 67 with no predetermined conclusion. continuing resources include serials and ongoing integrating resources.”16 like published continuing resources, archival and digital collections generally are created over time with no predetermined conclusion. in fact, some archival collections continue to grow even after part of the collection has been accessioned by a library or archive. thus, even though many of the individual items in the collection might be properly treated as monographic (not unlike serial analytics), it would not be unreasonable to treat the entire collection as a continuing resource. with this in mind, the authors examined whether their electronic-resource management system could be adapted to accommodate evolving collections of digitized and born-digital material. more specifically, the present system was examined to determine whether its capabilities could be expanded to capture the hierarchical structure found in traditional archival finding aides. the electronic resource management system in use by psu library is innovative interfaces’ electronic resource management (erm) product. according to innovative interfaces inc.’s (iii) marketing literature, “[erm] effectively controls subscription and licensing information for licensed resources such as e-journals, abstracting and indexing (a&i) databases, and full-text databases.”17 to control and provide improved access to these resources, erm stores details about purchase orders, aggregators and publishers, subscription terms, licensing conditions, breadth of holdings, internal and external contact information, and other aspects of these resources that individual libraries consider relevant. for increased security and data integrity, multilevel permissions restrict viewing and editing of data to the appropriate level of staff or patron. the ability of erm to replicate the two-level hierarchical relationships between aggregators or publishers and the electronic and print resources they provide was of particular interest to the authors. through erm and iii’s batch record load capabilities, bibliographic and resource records can be loaded into the iii system using delimited source files such as those provided by serials solutions. resource records are the mechanisms used by iii to describe digital resources at a collection, subcollection, or title level, thereby enabling the capture of descriptive information not permitted by standard bibliographic records. iii uses holdings records to document serial holdings statements. according to the marc 21 formats for holdings data, a holdings statement is the “record of the location(s) and bibliographic units of a specific bibliographic item held at one or more locations.”18 iii holdings records may also contain a url for connecting to an electronic resource. 
in figure 1, for example, the resource record shows that psu library provides limited access to a number of journal titles through its springer journals online resource. as seen in figure 2, the display of a holdings record embedded in a bibliographic record provides more specific information on the availability of a title through the library’s collection. in this particular example, the information display reveals that print volumes are available for this title but that psu only has this title available as a part of the springer-verlag electronic collection accessible by clicking on the hotlink. more information on the springer collection can be discovered by clicking on the about resource button to retrieve the springer journals online resource record. this example, then, represents a two-level hierarchy where the resource springer journals online is analogous to an archival collection and abdominal imaging is analogous to an archival series. adaptation of erm for library-created digital collections was explored through work being done to fulfill the requirements of a grant received in 2005 by psu library. the goal of this grant was “to develop a digital library under the sponsorship of the portland state university library to serve as a central repository for the collection, accession, and dissemination of key planning documents and reports, maps, and other ephemeral materials that have high value for oregon citizens and for scholars around the world.”19 the overall collection is called the oregon sustainable community digital library (oscdl). in addition to having its own web site, it was decided to make this collection accessible through the psu library catalog so that patrons could find digitized original documents about the city of portland together with other library materials. bibliographic records would be added to the database with hyperlinks to the digitized original documents using existing staff and tools. these bibliographic marc records would be as complete as possible. initially, attention was focused on documents originating from four different sources: ernest bonner, a former portland city planner; the city of portland archives; metro (the regional government for the portland, oregon, metropolitan area); and trimet (the portland metropolitan public transportation system). along with the documents, metadata was received from various databases. these descriptions ranged from almost nothing to detailed archival descriptions. unlike the challenge of shifting titles and holdings with typical serials collections, the challenge of this project was to reflect the four hierarchical levels of psu library’s collection (figure 3). innovative’s system structure was manipulated in order to accomplish this. at the core of iii’s erm module are resource records (rr) created to reflect the peculiarities of a particular collection. linked to these resource records are holdings records (hr) containing hyperlinks to the actual digitized documents (doc h1 – doc h3) as well as to their respective bibliographic records (bib doc h1 – bib doc h3) containing additional information on the individual items within the collection (figure 4). 68 information technology and libraries | june 2006 first, resource records were manually created for three of the subcollections within the bonner collection. these subcollections contained documents reflecting the development of harbor drive, front street, and the park blocks. 
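the linkages described for figure 4 can be pictured as a small graph of records. the sketch below is a generic illustration of that shape only; the field names, identifiers, and url are invented and do not represent iii record structures or apis.

```python
# Invented, simplified stand-in for the figure 4 structure: one resource record
# per (sub)collection, one holdings record per digitized document, and a
# bibliographic record behind each holdings record.
resource_record = {
    "resource_id": "RR-EXAMPLE",     # collection-specific identifier (hypothetical)
    "title": "example subcollection",
}

bib_records = {
    "DOC-1": {"title": "example digitized report", "summary": "..."},
}

holdings_records = [
    {
        "bib_id": "DOC-1",                                 # link to the bibliographic record
        "resource_id": resource_record["resource_id"],     # link back to the resource record
        "url": "http://example.org/collection/doc-1.pdf",  # link to the digitized document
    },
]

# Navigation mirrors the OPAC display: from the document-level record a user can
# reach the digitized file or climb to the collection-level resource record.
for hr in holdings_records:
    bib = bib_records[hr["bib_id"]]
    print(f'{bib["title"]} -> {hr["url"]} (part of {hr["resource_id"]})')
```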
the fields defined for the resource records include the resource title; type (digitized documents) and format (pdf) of the resource; a hyperlink to the new oscdl web site; content and systems contact names; a brief description of the resource; and, most importantly, the resource id used to connect holding records for individual documents to the corresponding resource record. next, the batch-loading function in erm was used to create bibliographic and holding records and associate them with the resource records. taking advantage of tracking data produced during the digitization process (figure 5), spreadsheets were created for each collection reflecting the data assigned to each individual digitized document. the document title, the date the document was created, number of pages, and summaries were included. coordinates for the streets mentioned in the documents were also included. because erm uses issn numbers and titles as match points for record loads, ”issn” numbers were also manufactured for each document and included in the spreadsheet. these homemade numbers were distinguished by using pdx as a prefix followed by collection and document numbers or letters, for example, pdx0022090 or pdxhdcoll. fortunately, erm accepted these dummy issns (figure 6). from this data spreadsheet, the system-required comma delimited coverage load file (*.csv) was also created. for this file, the system only allows a limited number of fields, and is very particular about the right terms, including correct capitalization, for the header row. individual document titles, the made-up issn numbers, individual urls to the documents, and a collection-specific resource id (provider) that connects all the documents from a collection to their respective resource record were included. the resource id is the same for all documents in one collection (figure 7). in the first attempt, the system was set up to produce holdings and bibliographic records automatically, using the data from the spreadsheets. for the bibliographic records, a system-provided template was created that included some general subject headings, genre headings, an author field, and selected fixed fields, such as language, bibliographic level, and material type (figure 8). records for the harbor drive collection were loaded, and the system created brief bibliographic and holdings records and linked them to the harbor drive resource record. the records were globally updated to add the general material designator (gmd) “electronic resource” to the title as well as the phrase “digitized document” as a local “call number” to make these documents more visible in the browse screen of the online catalog (opac) (figure 9). the digitized documents now could be found in the library catalog by author, subject, or keyword. the brief bibliographic records (figure 10) allow the user to go either to the digitized document via url or to the resource record with more information on the resource itself and links to other items in the same collection. the resource record then provides links either to the new oscdl web site (via the oregon sustainable community digital library link at the bottom of the resource record), to the bibliographic description of the individual document, or to the digitized document (figure 11). however, the quality of the brief bibliographic records that had been batch generated through the system-provided template was not satisfactory (figure 8). 
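a short script along the following lines could produce the coverage load file described above. it is a sketch under assumptions: the article does not reproduce erm's required header labels, so the ones below are placeholders, and the document titles and urls are invented; only the general shape (title, manufactured issn, document url, and a shared provider/resource id) follows the text.

```python
import csv

documents = [
    # Titles and URLs here are invented stand-ins for rows in the tracking spreadsheet.
    {"title": "harbor drive planning report", "issn": "pdx0022090",
     "url": "http://example.org/oscdl/harbordrive/pdx0022090.pdf"},
    {"title": "harbor drive correspondence",  "issn": "pdx0022091",
     "url": "http://example.org/oscdl/harbordrive/pdx0022091.pdf"},
]

RESOURCE_ID = "pdxhdcoll"   # same provider value for every document in the collection

with open("harbor_drive_coverage.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "ISSN", "URL", "Provider"])   # placeholder header labels
    for doc in documents:
        writer.writerow([doc["title"], doc["issn"], doc["url"], RESOURCE_ID])
```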
it was decided that more document-specific data like summaries, number of pages, the dates the documents were created, geographical information, and documentlevel local subject headings should be included. these data were already available from the original spreadsheets. with limited time and staff resources, full bibliographic marc records were batch created using the spreadsheets, detailed templates adjusted slightly to each collection, microsoft mail merge, and finally, the marcedit program created by terry reese of oregon state university (http://oregonstate.edu/~reeset/marcedit/html/index.html). this gave maximum control over the data to be included and the way they would be included. it also eliminated the need to clean up the data following the record load (figure 12). subsequently, full bibliographic records were created for the subcollections harbor drive, front street, and park blocks, to connect them to the next higher level, the bonner collection (figure 3). these records were also contributed to worldcat. mimicking the process used at the document level, a resource record was created for the bonner collection and the holdings records for the three subcollections were connected with their corresponding bibliographic records (figure 13). resource records with their corresponding item-level records for trimet, the city archives, and metro followed. the final step was then to add the resource record and the bibliographic record for the whole oscdl collection (figure 14). since this last bibliographic record is not connected to a collection above it, there is only a hyperlink to the oscdl resource record (figure 15). more subcollections and their corresponding digital documents are continually being added to oscdl. structures in psu library’s opac are adjusted as these collections change. digital collection management through the library catalog | brenner, larsen, and weston 69 ■ conclusion according to salter, “digitizing, the current challenge that straddles the 20th and 21st centuries, has given archivists and librarians pause to reconsider access to their collections. the world of digitization is the catalyst for it people, librarians, and archivists to unify the way they do things.”20 in this paper, a strategy has been offered for adapting a library system to traditional archival practice. by making use of some of the capabilities of the module in psu library’s integrated library system that was originally designed for managing electronic resources, a method was developed for managing digital archival collections in a way that incorporates some of the features of a traditional finding aid. the contents of the various hierarchical levels of the collection are fully represented through the manipulation of the record structures available through psu’s system. this technique provides for enhanced access to the individual items of a collection by giving the context of the item within the collection. links between the hierarchical levels facilitate navigation between the levels. although the records created for traditional library systems are not as rich as those found in traditional finding aids, or in ead, their electronic equivalent; and the visual arrangements are not as intriguing as a wellplanned web site, the ability to show how items fit within the greater context of their respective collection(s) is a step toward reconciling traditional library and archival practices. 
enabling the library user to virtually browse through the overall resources offered by the library and then, if desired, through the various levels of a collection for relevant resources enhances the opportunities presented to the user for finding relevant information. references and notes 1. society of american archivists, “so you want to be an archivist: an overview of the archival profession,” 2004, www.archivists.org/prof-education/arprof.asp (accessed apr. 24, 2006). 2. kent m. haworth, “archival description: content and context in search of structure,” journal of internet cataloging 4, no. 3/4 (2001): 7–26. 3. antonio panizzi, “rules for the compilation of the catalogue,” the catalogue of the british museum 1 (1841): v–ix. 4. joint steering committee for revision of aacr, angloamerican cataloguing rules, 2nd ed., 2002 revision (chicago: ala, 2002). 5. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004). 6. haworth, “archival description.” 7. oliver w. holmes, “archival arrangement: five different operations at five different levels,” american archivist 27, no. 1 (1964): 21–41; terry abraham, “oliver w. holmes revisited: levels of arrangement and description of practice,” american archivist 54, no. 3 (1991): 370–77. 8. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004); xiii. 9. haworth, “archival description.” 10. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004); steven l. hensen, comp., archives, personal papers, and manuscripts, 2nd ed. (chicago: society of american archivists, 1989). 11. peter carini and kelcy shepherd, “the marc standard and encoded archival description,” library hi tech 22, no. 1 (2004): 18–27; steven l. hensen, “archival cataloging and the internet: the implications and impact of ead,” journal of internet cataloging 4, no. 3/4 (2001): 75–95. 12. abraham, “oliver w. holmes revisited.” 13. elizabeth j. weisbrod and paula duffy, “keeping your online catalog from degenerating into a finding aid: considerations for loading microformat records into the online catalog,” technical services quarterly 11, no. 1 (1993): 29–42. 14. carini and shepherd, “the marc standard and encoded archival description.” 15. see, for example, margaret f. nichols, “finding the forest among the trees: the potential of collection-level cataloging,” cataloging & classification quarterly 23, no. 1 (1996): 53–71; and weisbrod and duffy, “keeping your online catalog from degenerating into a finding aid.” 16. joint steering committee for revision of aacr, angloamerican cataloguing rules, d-2. 17. innovative interfaces inc., “electronic resources management,” 2005, www.iii.com/pdf/lit/eng_erm.pdf (accessed apr. 24, 2006). 18. library of congress, marc 21 format for holdings data: including guidelines for content designation (washington, d.c.: cataloging distribution service, library of congres, 2000), appendix e–glossary. 19. carl abbot, “planning a sustainable portland: a digital library for local, regional, and state planning and policy documents—framing paper,” 2005, http://oscdl.research.pdx. edu/framing.php (accessed apr. 24, 2006). 20. anne a. salter, “21st-century archivist,” newsletter, 2003, www.lisjobs.com/newsletter/archives/sept03asalter.htm (accessed apr. 24, 2006). 70 information technology and libraries | june 2006 figure 1. 
example of resource record from the psu library catalog (search conducted nov. 4, 2005) appendix. figures digital collection management through the library catalog | brenner, larsen, and weston 71 figure 2. example of a bibliographic record for a journal title from the psu library catalog (search conducted nov. 4, 2005) 72 information technology and libraries | june 2006 figure 4. resource record harbor drive with linked holdings records, bibliographic records, and original documents figure 3. partial diagram of the hierarchical levels of the collection digital collection management through the library catalog | brenner, larsen, and weston 73 figure 7. comma delimited coverage load file (*.csv) figure 6. data spreadsheet figure 5. spreadsheet for tracking data 74 information technology and libraries | june 2006 figure 9. browse screen in opac figure 8. bibliographic records template digital collection management through the library catalog | brenner, larsen, and weston 75 figure 11. resource record with various links figure 10. system-created brief bibliographic record in opac 76 information technology and libraries | june 2006 figure 13. bonner resource record with linked holdings records, bibliographic records, and original documents figure 12. full bibliographic record in opac digital collection management through the library catalog | brenner, larsen, and weston 77 figure 15. bibliographic record for the oscdl collection figure 14. outline of linked records in the collection lib-mocs-kmc364-20131012113204 190 communications automation and the service attitudes of arl circulation managers james r. martin: university of rochester library, rochester, new york. the circulation function in our large academic libraries has undergone two important transformations since the turn of the century. the first of these is departmentalization; the second, automation. the departmentalization of the circulation function has tended to separate the circulation department from the library's educational and information functions , the more "professional " aspects of librarianship. laurence miller makes this point in his dissertation, "changing patterns of circulation services in university libraries, " which focuses on the rise of circulation departmentalization.1 miller surveyed large academic libraries to determine if certain services-reference, interlibrary loan, orientation, catalog assistance-were being withdrawn from the circulation function . after verifying a withdrawal of these services and identifying them as the "professional" ones, miller drew the conclusion that circulation is therefore suspect as a professional activity. 2 his are generally held conclusions as robert oram suggests: until recently, librarians have been reluctant to deal with circulation problems on an organized basis. the belief that circulation was, in part at least, custodial and clerical rather than managerial and professional underlies much of the reluctance to solve mutual circulation problems th rough a professional group.' paralleling this change in the circulation function's organizational setting, the mechanization of the circulation process has continued to move from the laborious and slow use of manual procedures and book cards toward the immediate updating and record keeping of the online system. 
circulation automation has passed from the early days of simply mechanizing files (represented by the batch system) to the present, where libraries have the potential capacity to perform the complete circulation control process with real-time systems. • sophisticated online systems have begun to truly control the complete circulation function. the metamorphosis of circulation automationfrom simple mechanization to full computerization-has had a tremendous impact on the technical side, the processes, of the circulation department. likewise it may well have had impact on the service attitudes, priorities, and leadership of the department. the level of automation may relate to the circulation manager's attitudes and priorities, and in the words of an american library association committee, "the impact of automation might change the image of the circulation librarian." 5 as it automates, gaining control over its own processes, the circulation department and its manager may actually become more responsive to its users-more service oriented, more "professional." in february 1980, a questionnaire was sent to circulation managers of all the ninety-eight academic libraries that hold membership in the association of research libraries. 6 it sought to (1) identify the degree and state of automation of the circulation function , classified by the three system categories of manual , batch , and online systems, and (2) to capture opinions on the circulation manager's view of his management role and his attitudes on service issues and user demands. these attitudes were related to the three types of systems. seventy-six questionnaires were returned, for a 78 percent response rate. circulation department characteristics circulation departments ranged in size from 4 to 78 ite employees. the average department size was 18, the median 14.25. the number of students employed ranged from 0 to 175. twenty-nine percent of managers said staffing was not adequate and 45 percent said they had to depend too heavily upon students. fifty-seven percent of managers of manual systems responded that they had to depend too heavily upon students, versus 27 percent of batch and 50 percent of online managers. (because of variations in what is counted, transaction volume figures are not particularly informative.) circulation system characteristics the seventy-six responding libraries reported approximately thirty-two different system configurations. thirty-nine percent of these systems were manual, 34 percent were batch, and 26 percent were online. nineteen percent of the total were manual mcbee systems and 15 percent were libs100 online systems. manual systems had been in use an average of twenty-six years, batch systems an average of eight years (range: ten months to eighteen years), and online systems an average of three years (range: three months to eight years). circulation manager characteristics typically, the circulation manager in an arl library is the head of a department. arl circulation managers had held their positions from six months to twenty years. five years was the average, but 68 percent listed five years or less. gender was evenly distributed: thirty-eight males and thirtyeight females. the managers of manual systems were 43 percent male/ 57 percent female, those of batch systems were 54 percent male/46 percent female, and of online systems 55 percent male/45 percent female. seventy percent of all managers had an mls, and 30 percent did not; 40 percent of managers of online systems did not have an mls. 
a majority of circulation mancommunications 191 agers (57 percent) reported spending over 25 percent of their time on matters outside of strictly circulation concerns. in fact a substantial minority, 23 percent of all managers, spent over 50 percent on extracirculation matters. satisfaction with circulation system as a group, arl circulation managers are not satisfied with their systems, as table 1 shows. online-system managers consistently rate their systems most highly. asked if their systems were "close to ideal," only 17 percent of all respondents were affirmative. only 3 percent of manual-system managers agreed that their system was "close to ideal" as compared to 12 percent of batch managers and 45 percent of online managers. hidden in these averages is the fact that three managers gave their systems perfect scores on all four questions and those systems were all online: geac, libsloo, and an ibm-based online system. (table 2 summarizes responses on the four system-performance statements.) hardware, software, and downtime circulation managers with automated systems also reported on their experience with equipment, software, and downtime. batch-system managers were more satisfied with hardware and software (7 4 percent for both) than were online managers (60 percent satisfied with hardware and 65 percent with software). however, open-ended questions revealed that dissatisfaction with online-system hardware and software centered around limitations of the libs100 system (used by 55 percent of online-system managers). the libs100 system was panned for "inflexible software," "poor fines system," and "lack of reserve book features. " (these are all long-recognized limitations that were partially addressed in the relatively recent release 24.) the downtime situation was more satisfactory, however, for online managers than batch managers. seventy-five percent reported downtime was not a problem as against more than 63 percent of batch-system managers. 192 journal of library automation vol. 14/3 september 1981 table 1. responses by type of system (n = 30 manual , 26 batch, 20 online) strongly no strongly disagree agree agree opinion disagree "our circulation system is completely adequate" manual 1(3%) 4(13 %) batch 1(4%) 5(19%) online 3(15%) 7(35%) "our circulation system is reliable" manual 1 (3%) 15(50 o/o ) batch 3(12%) 9(35%) online 5(25 %) 11(55 %) 1(3 %) 1(4 %) 1(5 %) 1(3 %) 12(40% ) 13(50 %) 6(30%) 10(33 %) 11(42 %) 3(15 %) 14(40 %) 6(23%) 3(15%) 3(10 %) 3(12 %) 1(5 %) "our circulation system's records are very accurate" manual 2(7%) 7(23%) 2(7 %) 16(53 %) 9(35%) 6(30%) 3(10 %) 3(12 % ) batch 3(8%) 12(46%) online 4(20 o/o) 10(50%) "our circulation system is close to ideal" manual 1(3 %) batch 3(12 o/o) online 3(15%) 6(30%) 3(15 %) 7(23%) 8(31 o/o) 4(20%) 22(73%) 13(50%) 4(20%) table 2. summary of responses on four system questions (detail given in table 1) standard minimum maximum mean median deviation value value variance manual 9. 9 3.27 4 16 11 batch 10.o8· 8.5 3.81 4 18 15 online 13.45. 14 4.57 5 20 21 •20 =strongly agree, 16 =agree, 12 =no opinion, 8 =disagree, 4 =strongly disagree. service attitudes respondents were asked to mark attitude statements on a five-point scale: "strongly agree," "agree," "no opinion," "disagree," and "strongly disagree." attitude statements fell into four categories: (1) specific service concerns, (2) the importance of the managerial role, (3) user problems, contacts and complaints, and (4) user demands and expectations. 
the averages of the last three groups were used to explore the question of association between level of automation and manager service attitudes (see table 3).

table 3. attitude responses, averages
                                          manual   batch   online
management role (9 questions)              4.38     4.34    4.48
demands and expectations (6 questions)     3.48     3.52    3.46
contacts and complaints (6 questions)      3.88     3.90    4.03
totals                                     3.913    3.92    3.99
5 = most positive response; 1 = least positive response.

specific service concerns
ninety percent of circulation managers agreed that "speed of service is very important to users," and no online-system manager disagreed. forty-three percent of manual-system managers agreed that "control of circulating books tends to be inadequate." this compares to 16 percent of batch managers and 15 percent of online-system managers. asked whether "users tend to expect more service than the department can give," 56 percent of manual managers agreed, as did 46 percent of batch managers and 40 percent of online-system managers.

attitudes toward management role
the study found that circulation managers are uniformly strong in their affirmation of the importance of their role, with a slight tendency for online managers to be more affirmative. in fact, 100 percent of respondents agreed with the statement that the "management of the circulation function is important." ninety-three percent agreed that "circulation management should rank high among the library's priorities." ninety-five percent disagreed with the negative statement that "circulation management offers little opportunity for the exercise of initiative." ninety-four percent of all managers disagreed that "circulation management lacks complexity."

attitudes toward user problems, contacts, and complaints
the study found that circulation managers are uniformly strong in their desire to respond to user complaints and problems, but with a slight tendency for online managers to be more favorable to the user. one hundred percent of online managers regarded user contacts as pleasant, as did 93 percent of manual and batch managers. ninety-five percent of online managers, 92 percent of batch managers, and 87 percent of manual managers affirm that patron contact provides the challenge in circulation work. eighty percent of online managers and 73 percent of manual and batch managers rejected the statement that "complaints tend to be unfounded." sixty-five percent of the respondents of online systems were more likely to favor the user by thinking "complaints are most often substantive," as compared to 50 percent of manual managers and 48 percent of batch managers. ninety percent of online managers disagreed that users "complain far too much," compared with 84 percent of batch managers and 79 percent of manual managers.

attitudes toward user demands and expectations
circulation managers are generally favorable in their attitudes toward user demands and expectations. several statements in this area, however, ran contrary to the tendency of online managers to agree slightly more with attitudes favorable to the user than managers of batch and manual systems. for example, while 93 percent of manual-system managers and 85 percent of batch managers agreed that "the circulation department should be oriented towards users' expectations," only 70 percent of online managers did. on the statement, "users should be more tolerant of limitations in circulation services," manual managers disagreed by 34 percent, batch managers by 40 percent, and online managers by 20 percent.
these responses against the trend of the online manager as more user oriented may be due to the fact that the study was not completely successful in differentiating between responses based on general attitudes and those based directly on the specific system in use. in other words, the relative quality of each circulation system or even the "bugs" peculiar to a ~pecific system may affect one's attitude toward the user's need to tolerate the limitations of that system. manual-system managers know the limitations on their service are keyed to inefficient systems, whereas online-system managers know their systems and services are already at a high level. this knowledge of the system in use colors service attitudes. conclusion the study found a depressed state of circulation-system development and support in arl libraries. seventy-four percent of circulation managers, on average, rated their systems negatively on basic system integrity, as shown in table 2. the thirty manual-system managers gave their systems an average score of 9, to the effect that their systems were ideal, adequate, reliable, and accurate. the twentysix batch managers gave their systems an average score of 10.08, the twenty online managers an average of 13.45. recognizing the considerable constraints under which 194 journal of library automation vol. 14/3 september 1981 today's large academic libraries struggle, there is, nonetheless, room for criticism of library priorities. this study must be viewed as only a first step (largely tentative and exploratory) in relating automation with service attitudes. it suggests that online systems may be associated with managers more positive in their view of the management role and more positive in their attitudes toward users than batchand manual-system managers. further research would be useful at this point to compare levels of automation (manual, batch, and online) with circulation-staff service attitudes or those of patrons using the systems. references l. laurence miller, "changing patterns of circulation services in university libraries" (ph.d. dissertation, florida state university, 1971), p.iii. 2. ibid., p.149. 3. robert oram, "circulation," in allen kent and harold lancour, eds., encyclopedia of library and information science, v.s (new york: marcel dekker, 1971), p.l. 4. william h. scholz, "computer-based circulation systemsa current review and evaluation," library technolo gy reports 13:237 (may 1977). 5. robert oram , " circulation," p.2. 6. james robert martin , "automation and the service environment of the circulation manager" (ph.d. dissertation, florida state university, 1980), p.22. statistics on headings in the marc file sally h. mccallum and james l. godwin: network development office, library of congress, washington, d.c. in designing an automated system, it is important to understand the characteristics of the data that will reside in the system. work is under way in the network development office of the library of congress (lc) that focuses on the design requirements of a nationwide authority file. in support of this work, statistics relating to headings that appear on the bibliographic records in the lc marc ii files were gathered. these statistics provide information on characteristics of headings and on the expected sizes and growth rates of various subsets of authority files. 
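the counting that underlies these size estimates is easy to state in code: tally every heading occurrence across a set of bibliographic records, then count how many distinct headings remain once duplicates are removed. the sketch below is only an illustration of that step; the sample records and field shape are invented and do not reflect the marc fields actually examined.

```python
from collections import Counter

records = [
    {"headings": ["smith, john", "chemistry--periodicals"]},
    {"headings": ["smith, john", "physics"]},
]

# Tally every heading occurrence across the bibliographic records.
occurrences = Counter(h for rec in records for h in rec["headings"])

total_occurrences = sum(occurrences.values())   # headings as they appear on records -> 4
distinct_headings = len(occurrences)            # size of the equivalent authority file -> 3

print(total_occurrences, distinct_headings)
```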
this information will assist in making decisions concerning the contents of authority files for different types of headings and the frequency of update required for the various file subsets. then ational commission on libraries and information science supported this work. use of these statistics to assist in system design is largely system-dependent; however, some general implications are given in the last section of this paper. in general , counts were made of the number of bibliographic records, headings that appear in those records, and distinct headings that appear on the records. the statistics were broken down by year, by type of heading, and by file. in this paper, distinct headings are those left in a file after removal of duplicates. distinctness will not be used to imply that a heading appears only once in a source bibliographic file, although distinct headings may in fact have only a single occurrence. thus, a file of records containing the distinct headings from a set of bibliographic records is equivalent in size to a marc authority file of the headings in those bibliographic records. methodology these statistics were derived from four marc ii bibliographic record files maintained internally at lc: books, serials, maps, and films. the files contain updated versions of all marc records that have been distributed by lc on the books, serials, maps, and films tape:; frum 1969 through october 1979, and a few records that were then in the process of distribution. the files do not contain cip records. a total of l ,336,182 bibliographic records were processed, including 1,134,069 from the books file, 90,174 from the serials file, 60,758 from the maps file, and 51,176 from the films file. a file of special records, called access point (ap) records, was created that contains one record for the contents of each occurrence of the following fields in the bibliographic records: 184 information technology and libraries | december 2009 thomas sommer unlv special collections in the twenty-first century university of nevada las vegas (unlv) special collections is consistently striving to provide several avenues of discovery to its diverse range of patrons. specifically, unlv special collections has planned and implemented several online tools to facilitate unearthing treasures in the collections. these online tools incorporate web 2.0 features as well as searchable interfaces to collections. t he university of nevada las vegas (unlv) special collections has been working toward creating a visible archival space in the twenty-first century that assists its patrons’ quest for historical discovery in unlv’s unique southern nevada, gaming, and las vegas collections. this effort has helped patrons ranging from researchers to students to residents. special collections has created a discovery environment that incorporates several points of access, including virtual exhibits, a collection-wide search box, and digital collections. unlv special collections also has added web 2.0 features to aid in the discovery and enrichment of this historical information. these new features range from a what’s new blog to a digital collection with interactive features. the first point of discovery within the unlv special collections website began with the virtual exhibits. staff created the virtual exhibits as static html pages that showcased unique materials housed within unlv special collections. they showed the scope and diversity of materials on a specific topic available to researchers, faculty, and students. 
one virtual exhibit is “dino at the sands” (figure 1), a point of discovery for the history not only of dean martin but of many rat pack exploits.1 the photographs in this exhibit come from the sands collection. it is a static html page, and it provides information and pictures regarding one of las vegas’ most famous entertainers. this exhibit contains links to rat pack information and various resources on dean martin, including photographs, books, and videotapes. a second mode of discovery within the unlv special collections website is its new “search special collections” google-like search box (figure 2). this is located on the homepage and searches the manuscript, photograph, and oral history primary source collections.2 the purpose is to aid in the discovery of material within the collections that is not yet detailed in the public online catalog. in the past researchers would have to work through the special collection’s website to locate the resources. they can now go to one place to search for various types of material—a one-stop shop. the search results are easy to read and highlight the search term (see figure 3).3 the third point of access is the digital collection. these collections are digital copies of original materials located within the archives. the digital copies are presented online, described, and organized for easy access. each collection offers full-text searches, browsing, zoom, pan, figure 2. unlv special collections search box figure 1. “dino at the sands” exhibit thomas sommer (thomas.sommer@unlv.edu) is university and technical services archivist in special collections at the university of nevada las vegas libraries. unlv special collections in the twenty-first century | sommer 185 side-by-side comparison, and exporting for presentation and reuse. the newest example of a digital collection is “southern nevada: the boomtown years” (figure 4).4 this collection brings together a wide range of original materials from various collections located within unlv special collections, the nevada state museum, the historical society in las vegas, and the clark county heritage museum. it even provides standards-based activities for elementary and high school students. this project was funded by the nevada state library and archives under the library services and technology act (lsta) as amended through the institute of museum figure 4. “southern nevada: the boomtown years” digital collection figure 5. “what’s new” blog figure 6. unlv special collection facebook page figure 3. hoover dam search results 186 information technology and libraries | december 2009 and library services (imls). unlv special collections director peter michel selected the content. the team included fourteen members, four of whom were funded by the grant. christy keeler, phd, created the educator pages and designed the student activities. new collections are great, but users have to know they exist. to announce new collections and displays, special collections first added a what’s new blog that includes an rss feed to keep patrons up-to-date on new messages (figure 5).5 another avenue of interaction was implemented in april 2009 when special collections created its own facebook page (figure 6).6 students and researchers are encouraged to become fans. status updates with images and links to southern nevada and las vegas resources lead the fans back to the main website where the other treasures can be discovered. 
special collections has implemented various web 2.0 features within its newest digital collections. specifically, it added a comments section, a “rate it” feature, and an rss feature to its latest digital collections (figures 7, 8, and 9). these latest trends enrich the collections’ resources with patron-supplied information.7 as is apparent, unlv special collections implemented several online tools to allow patrons to discover its extensive primary resources. these tools range from virtual exhibits and digital collections with web 2.0 features to blogs and social networking sites. special collections has endeavored to stay on top of the latest trends to benefit its patrons and facilitate their discovery of historical materials in the twenty-first century. figure 8. “rate it” feature for aerial view of hughes aircraft plant photograph figure 7. comments section for aerial view of hughes aircraft plant photograph figure 9. rss feature for the index to the “welcome home howard” digital collection continued on page 190 190 information technology and libraries | december 2009 as previously mentioned, these easy-to-use tools can allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots are library “how do i find . . .” research help guides. many of these “how do i find . . .” research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources. currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials. reference 1. “unlv libraries strategic plan 2009–2011,” http://www .library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2. unlv special collections continued from page 186 references 1. peter michel, “dino at the sands,” unlv special collections, http://www.library.unlv.edu/speccol/dino/index.html (accessed july 28, 2009). 2. peter michel, “unlv special collections search box.” unlv special collections. http://www.library.unlv.edu/speccol/ index.html (accessed july 28, 2009). 3. unlv special collections search results, “hoover dam,” http://www.library.unlv.edu/speccol/databases/index .php?search_query=hoover+dam&bts=search&cols[]=oh&cols []=man&cols[]=photocoll&act=2 (accessed october 27, 2009). 4. unlv libraries, “southern nevada: the boomtown years,” http://digital.library.unlv.edu/boomtown/ (accessed july 28, 2009). 5. unlv special collections, “what’s new in special collections,” http://blogs.library.unlv.edu/whats_new_in_special_ collections/ (accessed july 28, 2009). 6. unlv special collections, “unlv special collections facebook homepage,” http://www.facebook.com/home .php?#/pages/las-vegas-nv/unlv-special-collections/70053 571047?ref=search (accessed july 28, 2009). 7. unlv libraries, “comments section for the aerial view of hughes aircraft plant photograph,” http://digital.library .unlv.edu/hughes/dm.php/hughes/82 (accessed july 28, 2009); unlv libraries, “‘rate it’ feature for the aerial view of hughes aircraft plant photograph,” http://digital.library.unlv.edu/ hughes/dm.php/hughes/82 (accessed july 28, 2009); unlv libraries, “rss feature for the index to the welcome home howard digital collection” http://digital.library.unlv.edu/hughes/ dm.php/ (accessed july 28, 2009). 
statement of ownership, management, and circulation
information technology and libraries, publication no. 280-800, is published quarterly in march, june, september, and december by the library information and technology association, american library association, 50 e. huron st., chicago, illinois 60611-2795. editor: marc truitt, associate director, information technology resources and services, university of alberta, k adams/cameron library and services, university of alberta, edmonton, ab t6g 2j8 canada. annual subscription price, $65. printed in u.s.a. with periodical-class postage paid at chicago, illinois, and other locations. as a nonprofit organization authorized to mail at special rates (dmm section 424.12 only), the purpose, function, and nonprofit status for federal income tax purposes have not changed during the preceding twelve months. extent and nature of circulation (average figures denote the average number of copies printed each issue during the preceding twelve months; actual figures denote actual number of copies of single issue published nearest to filing date: september 2009 issue). total number of copies printed: average, 5,096; actual, 4,751. mailed outside country paid subscriptions: average, 4,090; actual, 3,778. sales through dealers and carriers, street vendors, and counter sales: average, 430; actual, 399. total paid distribution: average, 4,520; actual, 4,177. free or nominal rate copies mailed at other classes through the usps: average, 54; actual, 57. free distribution outside the mail (total): average, 127; actual, 123. total free or nominal rate distribution: average, 181; actual, 180. total distribution: average, 4,701; actual, 4,357. office use, leftover, unaccounted, spoiled after printing: average, 395; actual, 394. total: average, 5,096; actual, 4,751. percentage paid: average, 96.15; actual, 95.87. statement of ownership, management, and circulation (ps form 3526, september 2007) filed with the united states post office postmaster in chicago, october 1, 2009.

bibliographic displays in web catalogs: does conformity to design guidelines correlate with user performance?
by joan m. cherry, paul muter, and steve j. szigeti
joan m. cherry (joan.cherry@utoronto.ca) is a professor in the faculty of information studies; paul muter (muter@psych.utoronto.ca) is an assistant professor in the department of psychology; and steve j. szigeti (szigeti@fis.utoronto.ca) is a doctoral student in the faculty of information studies and the knowledge media design institute, all at the university of toronto, canada.

the present study investigated whether there is a correlation between user performance and compliance with screen-design guidelines found in the literature. rather than test individual guidelines and their interactions, the authors took a more holistic approach and tested a compilation of guidelines. nine bibliographic display formats were scored using a checklist of eighty-six guidelines. twenty-seven participants completed ninety search tasks using the displays in a simulated web environment. none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines. in some cases, user performance was actually significantly slower with greater conformity to guidelines. in a supplementary study, a different set of forty-three guidelines and the user performance data from the main study were used. again, none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines. attempts to establish generalizations are ubiquitous in science and in many areas of human endeavor.
it is well known that this enterprise can be extremely problematic in both applied and pure science.1 in the area of human-computer interaction, establishing and evaluating generalizations in the form of interface-design guidelines are pervasive and difficult challenges, particularly because of the intractably large number of potential interactions among guidelines. using bibliographic display formats from web catalogs, the present study utilizes global evaluation by correlating user performance in a search task with conformity to a compilation of eighty-six guidelines (divided into four subsets). the literature offers many design guidelines for the user interface, some of which cover all aspects of the user interface, some of which focus on one aspect of the user interface—e.g., screen design. tullis, in chapters in two editions of the handbook of human-computer interaction, reviews the work in this area.2 the earlier chapter provides a table describing the screen-design guidelines available at that time. he includes, for example, galitz, whom he notes have several hundred guidelines addressing general screen design, and smith and mosier, whom he notes have about three hundred guidelines addressing the display of data.3 earlier guidelines tended to be generic. more recently, guidelines have been developed for specific applications—e.g., web sites for airline travel agencies, multimedia applications, e-commerce, children, bibliographic displays, and public-information kiosks.4 although some of the guidelines in the literature are based on empirical evidence, many are based on expert opinion and have not been tested. some of the researchbased guidelines have been tested in isolation or in combination with only a few other guidelines. the national cancer institute (nci) web site, research-based web design and usability guidelines, rates sixty guidelines on a scale of 0 to 5 based on the strength of the evidence.5 the more valid the studies that directly support the guideline, the higher the rating. in interpreting the scores, the site advises that scores of 1, 2, or 3 suggest that “more evidence is needed to strengthen the designer’s overall confidence in the validity of a guideline.” of the sixty guidelines on the site, forty-six (76.7 percent) fall into this group. in 2003, the united states department of health and human services web site, research-based web design and usability guidelines, rated 187 guidelines on a different five-point scale.6 eightytwo guidelines (43.9 percent) meet the criteria of having strong or medium research support. another forty-eight guidelines (25.7 percent) are rated as having weak research support. thus, there is some research support for 69.6 percent of the guidelines. in addition to the issue of the validity of individual guidelines, there may be interactions among guidelines. an interaction occurs if the effect of a variable depends on the level of another variable—e.g., an interaction occurs if the usefulness of a guideline depends on whether some other guideline is being followed. a more severe problem is the potential for high-order interactions: the nature of a two-way interaction may depend on the level of a third variable, the nature of a three-way interaction may depend on the level of a fourth variable, and so on. because of the combinatorial explosion, if there are more than a few variables the number of possible interactions becomes huge. 
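the scale of the problem is easy to make concrete. with eighty-six yes/no guidelines there are 2 to the 86th power possible combinations, and the number of distinct k-way interactions grows as the binomial coefficient c(86, k). a short calculation (standard-library python only; the figure of eighty-six comes from the checklist used later in this study) shows why exhaustive factorial testing is out of reach:

```python
from math import comb

N_GUIDELINES = 86  # size of the checklist used in the main study

# every yes/no pattern across the whole checklist
print(f"possible yes/no combinations: 2**{N_GUIDELINES} = {2 ** N_GUIDELINES:,}")

# number of distinct k-way interactions among the guidelines
for k in (2, 3, 4, 5):
    print(f"{k}-way interactions: c({N_GUIDELINES}, {k}) = {comb(N_GUIDELINES, k):,}")
```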
as cronbach stated: “once we attend to interactions, we enter a hall of mirrors that extends to infinity.”7 with a large set of guidelines, it is impractical to test all of the guidelines and all of the interactions, including high-order interactions. muter suggested several approaches for handling the problem of intractable high-order interactions, including adapting optimizing algorithms such as simplex, seeking “robustness in variation,” re-construing the problem, and pruning the alternative space.8 the present study utilizes another approach: global evaluation by correlating user performance with conformity to a set of guidelines. using this method, particular guidelines and interactions are not tested, but the set and subsets are tested globally, and some of the interactions, including high-order interactions, are captured. bibliographic displays were scored using a compilation of guidelines, divided into four subsets, and the performance of users doing a set of search tasks using the displays was measured. an attempt was made to determine whether users find information more quickly on displays that receive high scores on checklists of screen-design guidelines. the authors are aware of only two studies that have investigated conformity with a set of guidelines and user performance, and they both included only ten guidelines. d’angelo and twining measured the correlation between compliance with a set of ten standards (d’angelo standards) and user comprehension.9 the d’angelo standards are in the form of principles for web-page design, based on a review of the literature.10 d’angelo and twining found a small correlation (.266) between number of standards met and user comprehension.11 they do not report on statistical significance, but from the data provided in the paper it appears that the correlation is not significant. gerhardt-powals compared an interface designed according to ten cognitive engineering principles to two control interfaces and found that the cognitively engineered interface resulted in statistically significantly superior user performance.12 the guidelines used in the present study were based on a list compiled by chan to evaluate displays of bibliographic records in online library catalogs.13 the set of guidelines was broken down into four subsets. participants in this study were given search tasks and clicked on the requested item on a bibliographic display. the main dependent variable of interest was response time. ฀ method participants twenty-seven participants were recruited through the university of toronto psychology 100 subject pool. seventeen were female; ten were male. most (twenty) were in the age group 17 to 24; three were in the age group 25 to 34 years, and four were in the age group 35 to 44. one had never used the web; all others reported using the web one or more hours per week. participants received course credit.
design to control for the effects of fatigue, practice runs, and the like, the order of trials was determined by two orthogonal 9 x 9 latin squares—one to select a display and one to select a book record. each participant completed five consecutive search tasks—author, title, call number, publisher, and date—in a random order, with each display-book combination. (the order of the five search tasks was randomized each time.) this procedure was repeated, so that in total each participant did ninety tasks (9 displays x 5 tasks x 2 repetitions). materials and apparatus the study used nine displays from library catalogs available on the web. they were selected to represent a variety of systems and to illustrate the remarkable diversity in bibliographic displays in web catalogs. the displays differed in the amount of information included, the structure of the display, employment of highlighting techniques, and use of graphical elements. four examples of the nine displays are presented in figures 1a, 1b, 1c, and 1d. the displays were captured and presented in an interactive environment using active server page (asp) software. the look of the displays was retained, but hypertext links were deactivated. nine different book records were used to provide the content for the displays. items selected were those that would be readily understood by most users—e.g., books by saul bellow, norman mailer, and john updike. the guidelines were based on a list compiled by chan from a review of the literature in human-computer interaction and library science.14 the list does not include guidelines about the process of design. chan formatted the guidelines as a checklist for bibliographic displays in online catalogs. in work reported in 1996, cherry and cox modified the checklist for use with bibliographic displays in web catalogs.15 in a 1998 paper, cherry reported on evaluations of bibliographic displays in catalogs of academic libraries, based on chan’s data for twelve opacs and data for ten web catalogs evaluated by cherry and cox using a modification of the 1996 checklist for web catalogs.16 the findings showed that, on average, displays in opacs scored 58 percent and displays in web catalogs scored 60 percent. the 1996 checklist of guidelines was modified by herrero-solana and de moya-anegón, who used it to explore the use of multivariate analysis in evaluating twenty-five latin american catalogs.17 for the present study, four questions that were considered less useful were removed from the checklist used in cherry’s 1998 analysis. the checklist consisted of four sections or subsets: labels (these identify parts of the bibliographic description); text (the display of the bibliographic, holdings/location, and circulation status information); instructions (includes instructions to users, informational messages, and options available); and layout (includes identification of the screen, the organization for the bibliographic information, spacing, and consistency of information presentation). items on the checklist were phrased as questions requiring yes/no responses.
examples of the items are: labels: “are all fields/variables labeled?” text: “is the text in mixed case (upper and lowercase)?” instructions: “are instructional sentences or phrases simple, concise, clear, and free of typographical errors?” and layout: “is the width of the display no more than forty to sixty characters?” the set used in the present study contained eighty-six guidelines in total, of which forty-eight were generic and could be applied to any application. thirty-eight were specific and applied to bibliographic displays in web catalogs. the experiment was run on a pentium computer with a seventeen-inch sony color monitor with a standard keyboard and mouse.

figure 1a. example of display
figure 1b. example of display
figure 1c. example of display
figure 1d. example of display

procedure participants were tested individually. five practice trials with a display and book record not used in the experiment familiarized the participant with the tasks and software. at the beginning of a trial, the message “when ready, click” appeared on the screen. when the participant clicked on the mouse, a bibliographic display appeared along with a message at the top of the screen indicating whether the participant should click on the author, title, call number, publisher, or date of publication—e.g., “current task: author.” participants clicked on what they thought was the correct answer. if they clicked on any other area, the display was shown again. an incorrect click was not defined as an error—in effect, percent correct was always 100—but an incorrect click would of course add to the response time. the software recorded the time to successfully complete each search, the identification for the display and the book record, and the search-task type. when a participant completed the five search tasks for a display, a message was shown indicating the average response time on that set of tasks. when participants completed the ninety search tasks, they were asked to rank the nine displays according to their preference. for this task, a set of laminated color printouts of the displays was provided. participants ranked the displays, assigning a rank of 1 to the display that they preferred most, and 9 to the one they preferred least. they were also asked to complete a short background questionnaire. the entire session took less than forty-five minutes. scoring the displays on screen design guidelines the authors’ experience has indicated that judging whether a guideline is met can be problematic: evaluators sometimes differ in their judgments. in this study, three evaluators assessed each of the nine displays independently. if there was any disagreement amongst the evaluators’ responses for a given question for a given display, that question was not used in the computation of the percentage score for that display. (a guideline regarding screen density was evaluated by only one evaluator because it was very time-consuming.) the total number of questions used to assess each display was eighty-six. the number of questions on which the evaluators disagreed ranged from twelve to thirty across the nine displays. all questions on which the three evaluators agreed for a given display were used in the calculation of the percentage score for that display. hence the percentage scores for the displays are based on a variable set and number of questions—from fifty-six to seventy-four.
the subset of questions on which the three evaluators agreed for all nine displays was small—twenty-two questions. ฀ results with regard to conformity to the guidelines, in addition to the overall scores for each display, which ranged from 42 percent to 65 percent, the percentage score was calculated for each subset of the checklist (labels, text, instructions, and layout). the time to successfully complete each search task was recorded to the nearest millisecond. (for some unknown reason, six of the 2,430 response times recorded [27 x 90] were 0 milliseconds. the program was written in such a way that the response-time buffer was cleared at the time of stimulus presentation, in case the participant clicked just before this time. these trials were treated as missing values in the calculation of the means.) six mean response times were calculated: author, title, call number, publisher, date, and the sum of the five response times, called all tasks. the mean of all tasks response times ranged from 13,671 milliseconds to 21,599 milliseconds for the nine formats. the nine display formats differed significantly on this variable according to an analysis of variance, f(8, 477) = 17.1, p < .001. the correlations between response times and guidelines-conformance scores are presented in table 1.

table 1. correlations between scores on the checklist of screen design guidelines and time to complete search tasks: pearson correlation (sig. 2-tailed); n=9 all cells

                all tasks     author        title         call #        publisher     year
total score:    .469 (.203)   .401 (.285)   .870 (.002)   .547 (.127)   .035 (.930)   .247 (.522)
labels:         .722 (.028)   .757 (.018)   .312 (.413)   .601 (.087)   .400 (.286)   .669 (.049)
text:          -.260 (.500)  -.002 (.997)   .595 (.091)  -.191 (.623)  -.412 (.271)  -.288 (.452)
instructions:   .422 (.258)   .442 (.234)   .712 (.032)   .566 (.112)   .026 (.947)   .126 (.748)
layout:         .602 (.086)  -.102 (.794)   .383 (.308)   .624 (.073)   .492 (.179)   .367 (.332)

it is important to note that a high correlation between response time and conformity to guidelines indicates a low correlation between user performance (speed) and conformity to guidelines. row 1 of table 1 contains correlations between the total guidelines score and response times; column 1 contains correlations between all tasks (the sum of the five response times) and guidelines scores. of course, the correlations in table 1 are not all independent of each other. only five of the thirty correlations in table 1 are significant at the .05 level, and they all indicate slower response times with higher conformity to guidelines. of the six correlations in table 1 indicating faster response times with higher conformity to guidelines, none approaches statistical significance. the upper left-hand cell of table 1 indicates that the overall correlation between total scores on the guidelines and the mean response time across all search tasks (all tasks) was 0.469 (df = 7, p = 0.203)—i.e., conformity to the overall checklist was correlated with slower overall response times, though this correlation did not approach statistical significance. figure 2 shows a scatter plot of the main independent variable, overall score on the checklist of guidelines, and the main dependent variable, the sum of the response times for the five tasks (all tasks). figure 3 shows a scatter plot for the highest obtained correlation: between score on the overall checklist of guidelines and the time to complete the title search task. visual inspection suggests patterns consistent with table 1: no correlation in figure 2, and slower search times with higher guidelines scores in figure 3.

figure 2. scatter plot for overall score on checklist of screen design guidelines and time to complete set of five search tasks
figure 3. scatter plot for overall score on checklist of screen design guidelines and time to complete “title” search tasks

finally, correlations were computed between preference and response times (all tasks response times and five specific-task response times) and between preference and conformity to guidelines (overall guidelines and the four subsets of guidelines). none of the eleven correlations approached statistical significance. ฀ supplementary study to further validate the results of the main study, it was decided to score the interfaces against a different set of guidelines based on the 2003 u.s.
department of health and human services research-based web design and usability guidelines. this set consists of 187 guidelines and includes a rating for each guideline based on strength of research evidence for that guideline. the present study started with eighty-two guidelines that were rated as having either moderate or strong research support, as the definitions of both of these include “cumulative research-based evidence.”18 compliance with guidelines that address the process of design can only be judged during the design process, or via access to the interface designers. since this review process did not allow for that, a total of nine process-focused guidelines were discarded. this set of seventy-three guidelines was then compared with the sixty-guideline 2001 nci set, research-based web design and usability guidelines, intending to add any outstanding nci guidelines supported by strong research evidence to the existing list of seventy-three. however, all of the strongly supported nci guidelines were already represented in the original seventy-three. finally, the guidelines in the iso 9241, ergonomic requirements for office work with visual display terminals (vdts), part 11 (guidance on usability), part 12 (presentation of information ), and part 14 (menu dialogues ) were compared to the existing set of seventy-three, with the intention that any prescriptive guideline in the iso set that was not already included in the original seventy-three would be added.19 again, there were none. the seventy-three guidelines were organized into three thematic groups: (1) layout (the organization of textual and graphic material on the screen), (2) interaction (which included navigation or any element with which the user would interact), and (3) text and readability. all of the guidelines used were written in a manner allowing readers room for interpretation. the authors explicitly stated that they were not writing rules, but rather, guidelines, and recognized that their application must allow for a level of flexibility.20 this ambiguity creates problems in terms of assessing displays. in this study, two evaluators independently assessed the nine displays. the first evaluator applied all seventy-three guidelines and found thirty to be nonapplicable to the specific types of interfaces considered. the second evaluator applied the shortened list of forty-three guidelines. following the independent evaluations, the two evaluators compared assessments. the initial rate of agreement between the two assessments ranged from 49 percent to 70 percent across the nine displays. in cases where there was disagreement, the evaluators discussed their rationale for the assessment in order to achieve consensus. ฀ results of supplementary study as with the initial study, in addition to the overall scores for each display, the percentage score was calculated for each subset of the checklist (labels, interaction, and text and readability). it is worth noting that the overall scores witnessed higher compliance to this second set of guidelines, ranging from 68 percent to 89 percent. the correlations between response times and guidelines-conformance scores are presented in table 2. again, it is important to note that a high correlation between response time and conformity to guidelines indicates a low correlation between user performance (speed) and conformity to guidelines. 
table 2. correlations between scores on subset of the u.s. dept. of health and human services (2003) research-based web design and usability guidelines and time to complete search tasks: pearson correlation (sig. 2-tailed); n=9 all cells

                all tasks     author        title         call #        publisher     year
total score:    .292 (.445)   .201 (.604)   .080 (.839)  -.004 (.992)   .345 (.363)   .499 (.172)
layout:        -.308 (.420)  -.264 (.492)  -.512 (.159)  -.332 (.383)   .046 (.906)  -.294 (.442)
text:           .087 (.824)  -.051 (.895)   .712 (.032)  -.059 (.879)  -.095 (.808)  -.259 (.500)
interaction:    .638 (.065)   .603 (.085)   .055 (.887)   .439 (.238)   .547 (.128)   .625 (.072)

row 1 of table 2 contains correlations between the total guidelines score and response times; column 1 contains correlations between all tasks (the sum of the five response times) and guidelines scores. of course, the correlations in table 2 are not all independent of each other. only one of the twenty-four correlations in table 2 is significant at the .05 level, and it indicates a slower response time with higher conformity to guidelines. of the ten correlations in table 2 indicating faster response times with higher conformity to guidelines, none approaches statistical significance. the upper left-hand cell of table 2 indicates that the overall correlation between total scores on the guidelines and the mean response time across all search tasks (all tasks) was 0.292 (p = 0.445)—i.e., conformity to the overall checklist was correlated with slower overall response times, though this correlation did not approach statistical significance. figure 4 shows a scatter plot of the main independent variable, overall score on the checklist of guidelines, and the main dependent variable, the sum of the response times for the five tasks (all tasks). figure 5 shows a scatter plot for the highest-obtained correlation: between score on the text and readability category of guidelines and the time to complete the title search task. visual inspection suggests patterns consistent with table 2: no correlation in figure 4, and slower search times with higher guidelines scores in figure 5.

figure 4. scatter plot for subset of u.s. department of health and human services (2003) research-based web design and usability guidelines conformance score and total time to complete five search tasks
figure 5. scatter plot for text and readability category of u.s. department of health and human services (2003) research-based web design and usability guidelines and time to complete “title” search tasks

฀ discussion in the present experiment and the supplementary study, none of the correlations indicating faster user performance with greater conformity to guidelines approached statistical significance. in some cases, user performance was actually significantly slower with greater conformity to guidelines—i.e., in some cases, there was a negative correlation between user performance and conformity to guidelines. the authors are aware of no other study indicating a negative correlation between user performance and conformity to interface design guidelines. some researchers would not be surprised at a finding of zero correlation between user performance and conformity to guidelines, but a negative correlation is somewhat puzzling. a negative correlation implies that there is something wrong somewhere—perhaps incorrect underlying theories or an incorrect body of assumptions. such a negative correlation is not without precedent in applied science. in the field of medicine, before the turn of the twentieth century, seeing a doctor actually decreased the chances of improving health.21 presumably, medical guidelines of the time were negatively correlated with successful practice, and the negative correlation implies not just worthlessness, but medical theories or beliefs that were actually incorrect and harmful.
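as a concrete illustration of the global-evaluation analysis reported above, the sketch below correlates per-display checklist scores with mean all-tasks response times. the nine score/time pairs are invented for illustration (the article reports only the ranges, 42 to 65 percent and 13,671 to 21,599 milliseconds, not the individual display values); a positive coefficient means that higher conformity went with slower searches.

```python
from math import sqrt

# hypothetical per-display values, one pair per display format;
# only the ranges (42-65 percent, 13,671-21,599 ms) come from the article.
scores = [42, 45, 48, 51, 54, 57, 60, 63, 65]          # checklist score, percent
times = [14200, 16800, 13671, 18100, 15900, 19400,     # mean all-tasks time, ms
         17300, 21599, 20100]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"r = {pearson(scores, times):.3f}")  # positive r: higher conformity, slower searches
```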
the boundary conditions of the present findings are unknown. the present findings may be specific to the tasks employed—fairly simple search tasks. the findings may apply only to situations in which the user is switching formats frequently, as opposed to situations in which each user is using only one format. (a between-subjects design would test this possibility.) the findings may be specific to the two sets of guidelines used. with sets of ten guidelines, d’angelo and twining and gerhardt-powals found positive correlations between user performance and conformity to guidelines (though apparently not statistically significantly in the former study).22 the guidelines used in the authors’ main study and supplementary study tended to be more detailed than in the other two studies. detailed guidelines are sometimes seen as advantageous, since developers who use guidelines need to be able to interpret the guidelines in order to implement them. however, perhaps following a large number of detailed guidelines reduces the amount of personal judgment used and results in less effective designs. (designers of the nine displays used in the present study would not have been using either of the sets of guidelines used in our studies but may have been using some of the sources from which our guidelines were extracted.) as noted by cheepen in discussing guidelines for voice dialogues, sometimes a designer’s experience may be more valuable than a particular guideline.23 the lack of agreement in interpreting the guidelines was an unexpected but interesting factor revealed during the collection of data in both the main study and the supplementary study. while a higher rate of agreement had been expected, the differences raised an important point in the use of guidelines. if guidelines intentionally leave room for interpretation, what role do expert opinion and experience play in design? in the main study, the number of guidelines on which the evaluators disagreed ranged from 14 percent to 35 percent across the nine displays. in the supplementary study, both evaluators had experience in interface design through a number of different roles in the design process (both academic and professional). this meant the evaluators’ interpretations of the guidelines were informed by previous experience. the initial level of disagreement ranged from 30 percent to 51 percent across the nine displays. while it was possible to quickly reach consensus
on a number of assessments (because both evaluators recognized the high degree of subjectivity that is involved in design), it also led to longer discussions regarding the intentions of the guideline authors. a majority of the differences involved lack of guideline clarity (where one evaluator had indicated a meet-or-fail score, while another felt the guideline was either unclear or not applicable). does this imply that guidelines can best be applied by committees or groups of designers? the dynamic of such groups would add another complex variable to understanding the relationship between guideline conformity and user performance. future research should test other tasks and other sets of guidelines to confirm or refute the findings of the present study. there should also be investigation of other potential predictors of display effectiveness. for example, would the ratings of usability experts or graphic designers for a set of bibliographic displays be positively correlated with user performance? crawford, in response to a paper presenting findings from an evaluation of bibliographic displays using a previous version of the checklist of guidelines used in the main study, commented that the design of bibliographic displays still reflects art, not science.24 several researchers have discussed aesthetics and user interface design. reed et al. noted the need to extend our understanding of the role of aesthetic elements in the context of user-interface guidelines and standards.25 ngo, teo, and byrne discussed fourteen aesthetic measures for graphic displays.26 norman discussed these ideas in “emotions and design: attractive things work better.”27 tractinsky, katz, and ikar found strong correlations between perceived aesthetic appeal and perceived usability.28 most empirical studies of guidelines have looked at one variable only or, at the most, a small number of variables. the opposite extreme would be to do a study that examines a large number of variables factorially. for example, assuming eighty-six yes/no guidelines for bibliographic displays, it would be theoretically possible to do a factorial experiment testing all possible combinations of yes/no—2 to the 86th power. in such an experiment, all two-way interactions and higher interactions could be assessed, but such an experiment is not feasible. what the authors have done is somewhere between these two extremes. this study has the disadvantage that we cannot say anything about any individual guideline, but it has the advantage that it captures some of the interactions, including high-order interactions. despite the present results, the authors are not recommending abandoning the search for guidelines in interface design. at a minimum, the use of guidelines may increase consistency across interfaces, which may be helpful.
however, in some research domains, particularly when huge numbers of potential interactions result in extreme complexity, it may be advisable to allocate resources to means other than attempting to establish guidelines, such as expert review, relying on tradition, letting natural selection take its course, utilizing the intuitions of designers, and observing user-interaction. indeed, in pure and applied research in general, perhaps more resources should be allocated to means other than searching for explicit generalizations. future research may better indicate when to attempt to establish generalizations and when to use other methods. ฀ acknowledgements this work was supported by a social sciences and humanities research council general research grant awarded by the faculty of information studies, university of toronto, and by the natural sciences and engineering research council of canada. the authors wish to thank mark dykeman and gerry oxford who developed the software for the experiment; donna chan, joan bartlett, and margaret english, who scored the displays with the first set of guidelines; everton lewis, who conducted the experimental sessions; m. max evans, who helped score the displays with the supplementary set of guidelines; and robert l. duchnicky, jonathan l. freedman, bruce oddson, tarjin rahman, and paul w. smith for helpful comments. references and notes 1. see, for example, a. chapanis, “some generalizations about generalization,” human factors 30, no. 3 (1988): 253–67. 2. t. s. tullis, “screen design,” in handbook of human-computer interaction, ed. m. helander (amsterdam: elsevier, 1988), 377–411; t. s. tullis, “screen design,” in handbook of humancomputer interaction, 2d ed., eds. m. helander, t. k. landauer, and p. prabhu (amsterdam: elsevier, 1997), 503–31. 3. w. o. galitz, handbook of screen format design, 2d ed. (wellesley hills, mass.: qed information sciences, 1985); s. l. smith and j. n. mosier, guidelines for designing user interface software, technical report esd-tr-86-278 (hanscom air force base, mass.: usaf electronic systems division, 1986). 4. c. chariton and m. choi, “user interface guidelines for enhancing the usability of airline travel agency e-commerce web sites,” chi ‘02 extended abstracts on human factors in computing systems, apr. 20–25, 2002 (minneapolis, minn.: acm press), 676–77, http://portal.acm.org/citation .cfm?doid=506443.506541 (accessed dec. 28, 2005); m. g. wadlow, “the andrew system; the role of human interface guidelines in the design of multimedia applications,” current psychology: research and reviews 9 (summer 1990): 181–91; j. kim and j. lee, “critical design factors for successful e-commerce systems,” behaviour and information technology 21, no. 3 (2002): 185–99; s. giltuz and j. nielsen, usability of web sites for children: 162 information technology and libraries | september 2006 70 design guidelines (fremont, calif.: nielsen norman group, 2002); juliana chan, “evaluation of formats used to display bibliographic records in opacs in canadian academic and public libraries,” master of information science research project report (university of toronto: faculty of information studies, 1995); m. c. maquire, “a review of user-interface design guidelines for public information kiosk systems,” international journal of human-computer studies 50, no. 3 (1999): 263–86. 5. national cancer institute, research-based web design and usability guidelines (2001), www.usability.gov/guidelines/index .html (accessed dec. 28, 2005). 6. u.s. 
department of health and human services, researchbased web design and usability guidelines (2003), http://usability .gov/pdfs/guidelines.html (accessed dec. 28, 2005). 7. l. j. cronbach, “beyond the two disciplines of scientific psychology,” american psychologist 30, no. 2 (1975): 116–27. 8. p. muter, “interface design and optimization of reading of continuous text,” in cognitive aspects of electronic text processing, eds. h. van oostendorp and s. de mul (norwood, n.j.: ablex, 1996), 161–80; j. a. nelder and r. mead, “a simplex method for function minimization,” computer journal 7, no. 4 (1965): 308–13; t. k. landauer, “research methods in human-computer interaction,” in handbook of human-computer interaction, ed. m. helander (amsterdam: elsevier, 1988), 905–28; r. n. shepard, “toward a universal law of generalization for psychological science,” science 237 (sept. 11, 1987): 1317–323. 9. j. d. d’angelo and j. twining, “comprehension by clicks: d’angelo standards for web page design, and time, comprehension, and preference,” information technology and libraries 19, no. 3 (2000): 125–35. 10. j. d. d’angelo and s. k. little, “successful web pages: what are they and do they exist?” information technology and libraries 17, no. 2 (1998): 71–81. 11. d’angelo and twining, “comprehension by clicks.” 12. j. gerhardt-powals, “cognitive engineering principles for enhancing human-computer performance,” international journal of human-computer interaction 8, no. 2 (1996): 189–211. 13. chan, “evaluation of formats.” 14. ibid. 15. joan m. cherry and joseph p. cox, “world wide web displays of bibliographic records: an evaluation,” proceedings of the 24th annual conference of the canadian association for information science (toronto, ontario: canadian association for information science, 1996), 101–14. 16. joan m. cherry, “bibliographic displays in opacs and web catalogs: how well do they comply with display guidelines?” information technology and libraries 17, no. 3 (1998): 124– 37; cherry and cox, “world wide web displays of bibliographic records.” 17. v. herrero-solana and f. de moya-anegón, “bibliographic displays of web-based opacs: multivariate analysis applied to latin-american catalogs,” libri 51 (june 2001): 75–85. 18. u.s. department of health and human services, researchbased web design and usability guidelines, xxi. 19. international organization for standardization, iso 924111: ergonomic requirements for office work with visual display terminals (vdts)—part 11: guidance on usability (geneva, switzerland: international organization for standardization, 1998); international organization for standardization, iso 9241-12: ergonomic requirements for office work with visual display terminals (vdts)—part 12: presentation of information (geneva, switzerland: international organization for standardization, 1997); international organization for standardization, iso 9241-14: ergonomic requirements for office work with visual display terminals (vdts)—part 14: menu dialogues (geneva, switzerland: international organization for standardization, 1997). 20. u.s. department of health and human services, researchbased web design and usability guidelines. 21. ivan illich, limits to medicine: medical nemesis: the expropriation of health (harmondsworth, n.y.: penguin, 1976). 22. d’angelo and twining, “comprehension by clicks”; gerhardt-powals, “cognitive engineering principles.” 23. c. cheepen, “guidelines for dialogue design—what is our approach? working design guidelines for advanced voice dialogues project. 
paper 3,” (1996), www.soc.surrey.ac.uk/research/reports/voice-dialogues/wp3.html (accessed dec. 29, 2005). 24. w. crawford, “webcats and checklists: some cautionary notes,” information technology and libraries 18, no. 2 (1999): 100–03; cherry, “bibliographic displays in opacs and web catalogs.” 25. p. reed et al., “user interface guidelines and standards: progress, issues, and prospects,” interacting with computers 12, no. 1 (1999): 119–42. 26. d. c. l. ngo, l. s. teo, and j. g. byrne, “formalizing guidelines for the design of screen layouts,” displays 21, no. 1 (2000): 3–15. 27. d. a. norman, “emotion and design: attractive things work better,” interactions 9, no. 4 (2002): 36–42. 28. n. tractinsky, a. s. katz, and d. ikar, “what is beautiful is usable,” interacting with computers 13, no. 2 (2000): 127–45.

a computer system for effective management of a medical library network
richard e. nance and w. kenneth wickham: computer science/operations research center, institute of technology, southern methodist university, dallas, texas, and maryann duggan: systems analyst, south central regional medical library program, dallas, texas

trips (talon reporting and information processing system) is an interactive software system for generating reports to nlm on regional medical library network activity and constitutes a vital part of a network management information system (nemis) for the south central regional medical library program. implemented on a pdp-10/sru 1108 interfaced system, trips accepts paper tape input describing network transactions and generates output statistics on disposition of requests, elapsed time for completing filled requests, time to clear unfilled requests, arrival time distribution of requests by day of month, and various other measures of activity and/or performance. emphasized in the trips design are flexibility, extensibility, and system integrity. processing costs, neglecting preparation of input which may be accomplished in several ways, are estimated at $.05 per transaction, a transaction being the transmittal of a message from one library to another.

introduction the talon (texas, arkansas, louisiana, oklahoma, and new mexico) regional medical library program is one of twelve regional programs established by the medical library assistance act of 1965. the regional programs form an intermediate link in a national biomedical information network with the national library of medicine (nlm) at the apex. unlike most of the regional programs that formed around a single library, talon evolved as a consortium of eleven large medical resource libraries with administrative headquarters in dallas. a major focus of the talon program is the maintenance of a document delivery service, created in march 1970, to enable rapid access to published medical information. twx units located in ten of the resource libraries and at talon headquarters in dallas comprise the major communication channel. in july 1970 a joint program was initiated to develop a statistical reporting system for the talon document delivery network. design and development of the system was done by the computer science/operations research center at southern methodist university, while training and operational procedures were developed by talon personnel.
both parties in the effort view the statistical reporting system as a vital first step in providing talon administrators with a comprehensive network management information system (nemis). an overview of this statistical reporting system, designated as trips (talon reporting and information processing system), and its relation to nemis is discussed in the following paragraphs. the objectives and design characteristics of nemis are stated in (1). design requirements there were two considerations for requirements for a network management information service (nemis) for talon: 1) in what environment would talon function? 2) what should be the objectives of a network management information service and what part does a statistical reporting system play in its development? the talon staff and the design team spent an intensive period in joint discussion of these two questions. talon environment the talon document delivery network operates in an expansive geographical area (figure 1). the decentralized structure of the network enables information transfer between any two resource libraries. in addition talon headquarters serves as a switching center, by accepting loan requests, locating documents, and relaying requests to holding libraries. a requirement placed on talon by nlm is the submission of monthly, quarterly, and annual reports giving statistical data on network activity. these statistics provide details on: 1) requests received by channel used (mail, telephone, twx, other), 2) disposition of requests (rejected, accepted and filled, accepted and unfilled), 3) response time for filled requests, 4) response time for unfilled requests, 5) most frequent user libraries, 6) requests received from each of the other regions, and 7) non-medlars reference inquiries.

fig. 1. location of the eleven resource libraries and talon headquarters.

monthly reports require cumulative statistics on year-to-date performance, and each of the eleven resource libraries and talon headquarters is required to submit a report on its activity. needs and objectives while the immediate need of the talon network was to develop a system to eliminate manual preparation of nlm reports, an initial decision was made to develop software also capable of assisting talon management in policy and decision making. eventual need for a network management information system (nemis) being recognized, the talon reporting and information processing system (trips) was designed as the first step in the creation of nemis. provision of information in a form suitable for analytical studies of policy and decision making (e.g., the message distribution problem described by nance (2)) placed some stringent requirements on trips. for instance, the identification of primitive data elements could not be made from report considerations only; an overall decision had to be made that no sub-item of information would ever be required for a data element. in addition the system demanded flexibility and extensibility, since it was to operate in a highly dynamic environment. these characteristics are quite apparent in the design of trips. trips design trips is viewed as a system consisting of hardware and software components. the description of this system considers: 1) the input, 2) the software subsystems (set of programs), 3) hardware components, and 4) the output.
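a minimal modern sketch (python rather than the original fortran iv/dystal) of how the per-transaction statistics itemized above, such as disposition counts and elapsed time for filled requests, might be tallied from a transaction log; the record fields shown are hypothetical and are not the actual trips data elements.

```python
from collections import Counter
from datetime import date

# hypothetical transaction records; the real trips data elements differ
log = [
    {"channel": "twx", "disposition": "filled", "received": date(1971, 3, 2), "cleared": date(1971, 3, 4)},
    {"channel": "mail", "disposition": "unfilled", "received": date(1971, 3, 3), "cleared": date(1971, 3, 10)},
    {"channel": "twx", "disposition": "rejected", "received": date(1971, 3, 5), "cleared": date(1971, 3, 5)},
]

by_channel = Counter(t["channel"] for t in log)
by_disposition = Counter(t["disposition"] for t in log)
fill_days = [(t["cleared"] - t["received"]).days
             for t in log if t["disposition"] == "filled"]

print("requests by channel:", dict(by_channel))
print("disposition of requests:", dict(by_disposition))
if fill_days:
    print("mean elapsed days for filled requests:", sum(fill_days) / len(fill_days))
```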
emphasis is placed on providing an overview, and no effort is made to give a detailed description. the environment in which trips is to operate is defined in a single file (for25.dat). this file assigns network parameters, e.g., number of reporting libraries, library codes, and library titles. the file is accessed by subprograms written in fortran iv and dystal (3), the latter being a set of fortran iv subprograms, termed dystal functions, that perform primitive list processing and dynamic storage allocation operations. because it requires only fortran iv, trips can be implemented easily on most computers. input a transaction log, maintained by each regional library and talon headquarters, constitutes the basic input to trips. copies of log sheets are used to create paper tape descriptions of the transactions. if and when compatibility is achieved between standard twx units and telephone entry to computer systems, the input could be entered directly by each regional library. (this is technically possible at present.) currently, talon headquarters is converting the transaction descriptions to machine readable form. initial data entry under normal circumstances is pictured in figure 2, which shows the sequence of operations and file accesses in two phases: 1) data entry and 2) report generation. data entry in turn comprises 1) collecting statistics, 2) diagnosis and verification of input data, and 3) backup of original verified input data. trips is designed to be extremely sensitive to input data. all data is subjected to an error analysis, and a specific file (for22.dat) is used to collect errors detected or diagnosed in the error analysis routine. only verified data records are transmitted to the statistical accumulation file (for20.dat).

fig. 2. trips structure: statistical collection and report generation phases, producing reimbursable and non-reimbursable statistics reports.

software subsystems trips comprises seven subsystems or modules. within each module are several fortran iv subprograms, dystal functions, and/or pdp-10 systems programs discussed under hardware components in the following section:
newy: run at the beginning of each year, newy builds an in-core data structure and transfers it to disk for each resource library in the network. it further creates the original data backup disk file (for23.dat). after disk formatting, record (the accessing and storage module) may be activated to begin accumulating statistics for the new year.
newq: run between quarters, newq purges the past quarter statistics for each library and prepares file for23.dat for the next quarter. the report for the quarter must be generated before newq is executed.
newm: run between months, newm purges the monthly statistics for each regional library and prepares file for23.dat for the backing up of next month's data.
dumpl: the utility module causes a dystal dump of the data base.
record: the accessing and storage module record incorporates the error diagnosis on input and the entry of validated data records into file for23.dat. no data record with an indicated error is permitted, and erroneous records are flagged for exception reporting. the error report (ermes.dat) may be printed on the teletype or line printer after execution of record.
report: the reporting module report generates all reimbursable statistics on a month-to-date, quarter-to-date, and year-to-date basis.
manage: utilization of trips as a network management tool is afforded by manage, which combines statistics from reimbursable and non-reimbursable transactions to generate a report providing measures of total network activity and performance.

the primary files used by the software subsystems are described briefly in table 1.

table 1. primary files in trips (file name: function of the file; comments; file type)
for25.dat: contains the system definition parameters and initialization values; created from card input to assure proper format; ascii.
for20.dat: statistical accumulation for validated data records; two parts: (1) input translator data structure, and (2) statistical data base; binary.
for21.dat: generation of reports from information in for20.dat; carriage control characters must be included to generate reports; ascii.
for22.dat: collects data records diagnosed as in error; errors accumulated in for22.dat are transmitted to ermes.dat for output; ascii.
for23.dat: enables creation and updating of the backup magnetic tape; each month's validated records added to tape; ascii.
for24.dat: enables recovery read of backup tape; tape information stored prior to transfer of file information to for20.dat; binary.
ermes.dat: serves to output messages on data records diagnosed as in error; if 6 or fewer errors occur, ermes is not created and messages are output to the teletype; if more than 6 errors, an estimate of typing time is given to the user, who has the option of printing them on the teletype or in a report form on the line printer; ascii.

a major concern in any management information system is the system integrity. in addition to the diagnosis of input data, trips concatenates sequential copies of disk file for23.dat to provide a magnetic tape backup containing all valid data records for the current year. a failsafe tape, containing all trips programs, is also maintained. hardware components conversion of transaction information to machine readable form is done off line currently. using a standard twx with ascii code, paper tapes are created and spliced together. fed through a paper tape reader to a pdp-10 (digital equipment company), the input data is submitted to trips. control of trips is interactive, with the user monitoring program execution from a teletype. all file operations are accomplished using the pdp-10 via the teletype, and the output reports are created on a high-speed line printer. with smu's pdp-10 and sru 1108 interface, report generation can be done on line printers at remote terminals to the sru 1108 as well. output trips output consists of a report for each library in the network and a composite for the entire network. the report may be limited to reimbursable statistics or include all statistics. information includes: 1) errors encountered in the input phase, 2) number of requests received by channel, 3) disposition of requests (i.e., rejected, accepted/filled, accepted/unfilled, etc.), 4) elapsed time for completing filled requests or clearing unfilled requests, 5) geographic origin of requests, 6) titles for which no holdings were located within the region, 7) types of requesting institutions, 8) arrival time distribution of requests by day of month, 9) invoice for reimbursement by talon, 10) node/network dependency coefficient as described by (4). summary trips is now entering its operational phase.
training of personnel at the resource libraries is concluded, and data on transactions are being entered into the system. input errors have decreased significantly (from fifteen or twenty percent to approximately two percent). talon personnel are enthusiastic, and needless to say the regional library staffs are happy to see a bothersome, time-consuming manual task eliminated. in summary, the following characteristics of trips deserve repeating: 1) with its modular construction, it is flexible and extensible. 2) implemented in dystal and fortran iv, it should allow installation on most computers without major modifications. 3) designed to operate in an interactive environment, it can be modified easily to function in a batch processing environment. 4) trips is extremely sensitive to system integrity, providing diagnosis of input data, reporting of errors, magnetic tape backup of data files, and a system failsafe tape. 5) definition of primitive data elements and the structural design of trips enable it to serve as the nucleus of a network management information system (nemis) as well as to generate reports required by nlm. 6) currently accepting paper tape as the input medium, trips could be modified easily to accept punched card input and with more extensive changes could derive the input information during the message transfer among libraries. finally, the processing cost of operating trips, neglecting the conversion to paper tape, is estimated to be $.05 per transaction (a message transfer from one library to another). extensive and thorough documentation of trips has been provided. availability of this documentation is under review by the funding agency. acknowledgment work described in this article was done under contract hew phs 1 g04 lm 00785-01, administered by the south central regional medical library program of the national library of medicine. the authors express their appreciation to dr. u. narayan bhat and dr. donald d. hendricks for their contributions to this work.

references
1. "nemis - a network management information system," status report of the south central regional medical library program, october 26, 1970.
2. nance, richard e.: "an analytical model of a library network," journal of the american society for information science, 21: (jan.-feb. 1970), 58-66.
3. sakoda, james m.: dystal dynamic storage allocation language manual, (providence, r. i.: brown university, 1965).
4. duggan, maryann, "library network analysis and planning (libnat)," journal of library automation, 2: (1969), 157-175.

thmanager: an open source tool for creating and visualizing skos
javier lacasta, javier nogueras-iso, francisco javier lópez-pellicer, pedro rafael muro-medrano, and francisco javier zarazaga-soria
javier lacasta (jlacasta@unizar.es) is assistant professor, javier nogueras-iso (jnog@unizar.es) is assistant professor, francisco javier lópez-pellicer (fjlopez@unizar.es) is research fellow, pedro rafael muro-medrano (prmuro@unizar.es) is associate professor, and francisco javier zarazaga-soria (javy@unizar.es) is associate professor in the computer science and systems engineering department, university of zaragoza, spain.

knowledge organization systems denotes formally represented knowledge that is used within the context of digital libraries to improve data sharing and information retrieval. to increase their use, and to reuse them when possible, it is vital to manage them adequately and to provide them in a standard interchange format. simple knowledge organization systems (skos) seem to be the most promising representation for the type of knowledge models used in digital libraries, but there is a lack of tools that are able to properly manage it. this work presents a tool that fills this gap, facilitating their use in different environments and using skos as an interchange format.
u nlike the largely unstructured information avail­ able on the web, information in digital libraries (dls) is explicitly organized, described, and man­ aged. in order to facilitate discovery and access, dl sys­ tems summarize the content of their data resources into small descriptions, usually called metadata, which can be either introduced manually or automatically generated (index terms automatically extracted from a collection of documents). most dls use structured metadata in accor­ dance with recognized standards, such as marc21 (u.s. library of congress 2004) or dublin core (iso 2003). in order to provide accurate metadata without ter­ minological dispersion, metadata creators use different forms of controlled vocabularies to fill the content of typi­ cal keyword sections. this increase of homogeneity in the descriptions is intended to improve the results provided by search systems. to facilitate the retrieval process, the same vocabularies used to create the descriptions are usu­ ally used to simplify the construction of user queries. as there are many different schemas for modeling controlled vocabularies, the term knowledge organization systems (kos) is intended to encompass all types of schemas for organizing information and promoting knowledge management. as hodge (2000) says, “a kos serves as a bridge between the users’ information need and the material in the collection.” some types of kos can be highlighted. examples of simple types are glossaries, which are only a list of terms (usually with definitions), and authority files that control variant ver­ sions of key information (such as geographic or personal names). more complex are subject headings, classifica­ tion schemes, and categorization schemes (also known as taxonomies) that provide a limited hierarchical structure. at a more complex level, kos includes thesauri and less traditional schemes, such as semantic networks and ontologies, that provide richer semantic relations. there is not a single kos on which everyone agrees. as lesk (1997) notes, while a single kos would be advantageous, it is unlikely that such a system will ever be developed. culture constrains the knowledge classifi­ cation scheme because what is meaningful to one area is not necessarily meaningful to another. depending on the situation, the use of one or another kos has its advan­ tages and disadvantages, each one having its place. these schemas, although sharing many characteristics, usually have been treated heterogeneously, leading to a variety of representation formats to store them. thesauri are an example of the format heterogeneity problem. according to iso­2788 (norm for monolingual thesauri) (iso 1986), a thesaurus is a set of terms that describe the vocabulary of a controlled indexing language, formally organized so that the a priori relationships between con­ cepts (for example, synonyms, broader terms, narrower terms, and related terms) are made explicit. this stan­ dard is complemented with iso­5964 (iso 1985), which describes the model for multilingual thesauri, but none of them describe a representation format. the lack of a stan­ dard representation model has caused a proliferation of incompatible formats created by different organizations. so each organization that wants to use several external thesauri has to create specific tools to transform all of them to the same format. 
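the a priori relationships that iso-2788 makes explicit (preferred terms, synonyms, and broader, narrower, and related terms) can be pictured with a small data-model sketch. the class below is purely illustrative and is not taken from any of the tools discussed in this article; the names are ours.

import java.util.ArrayList;
import java.util.List;

// minimal sketch (not part of any cited tool): one entry of an iso-2788 style
// thesaurus, holding the a priori relationships the standard makes explicit.
public class ThesaurusEntry {
    private final String preferredTerm;                              // the descriptor itself
    private final List<String> useFor = new ArrayList<>();           // synonyms / non-preferred terms
    private final List<ThesaurusEntry> broader = new ArrayList<>();  // BT
    private final List<ThesaurusEntry> narrower = new ArrayList<>(); // NT
    private final List<ThesaurusEntry> related = new ArrayList<>();  // RT

    public ThesaurusEntry(String preferredTerm) {
        this.preferredTerm = preferredTerm;
    }

    public void addUseFor(String synonym) { useFor.add(synonym); }

    // keep BT/NT symmetric so the hierarchy can be walked in both directions
    public void addNarrower(ThesaurusEntry child) {
        narrower.add(child);
        child.broader.add(this);
    }

    // RT is symmetric as well
    public void addRelated(ThesaurusEntry other) {
        related.add(other);
        other.related.add(this);
    }

    public String getPreferredTerm() { return preferredTerm; }
    public List<ThesaurusEntry> getNarrower() { return narrower; }
}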
in order to eliminate the heterogeneity of representation formats, the w3c initiative has promoted the development of simple knowledge organization systems (skos) (miles et al. 2005) for use in the semantic web environment. skos has been created to represent simple kos, such as subject heading lists, taxonomies, classification schemes, thesauri, folksonomies, and other types of controlled vocabulary, as well as concept schemes embedded in glossaries and terminologies. although skos has been proposed only recently, the number and importance of organizations involved in its creation process (and that publish their kos in this format) indicate that it will probably become a standard for kos representation. skos provides a rich, machine-readable language that is very useful to represent kos, but nobody would expect to have to create it manually or by just using a general-purpose resource description framework (rdf) editor (skos is rdf-based). however, in the digital library area, there are no specialized tools that are able to manage it adequately. therefore, this work tries to fill this gap, describing an open source tool, thmanager, that facilitates the construction of skos-based kos. although thmanager has been created to manage thesauri, it also is appropriate to create and manage any other models that can be represented using the skos format. this article describes the thmanager tool, highlighting its characteristics. thmanager's layer-based architecture permits the reuse of the components created for the management of thesauri in other applications where they are also needed. for example, it facilitates the selection of values from a controlled vocabulary in a metadata creation tool, or the construction of user queries in a search client. the tool is distributed as open source software accessible through the sourceforge platform (http://thmanager.sourceforge.net/).

■ state of the art in thesaurus tools and representation models
the problem of creating appropriate content for thesauri is of interest in the dl field and other related disciplines, and an increasing number of software packages have appeared in recent years for constructing thesauri. for instance, the web site of willpower information (http://www.willpower.demon.co.uk/thessoft.htm) offers a detailed revision of more than forty tools. some are only available as a module of a complete information storage and retrieval system, but others also allow the possibility of working independently of any other software. among these thesaurus creation tools, one may note the following products: ■ bibliotech (http://www.inmagic.com/).
this is a multiplatform tool that forms part of bibliotech pro integrated library system and can be used to build an ansi/niso standard thesaurus (standard z39.19 [ansi 1993]). ■ lexico (http://www.pmei.com/lexico.html). this is a java­based tool that can be accessed and/or manip­ ulated over the internet. thesauri are saved in a text­based format. it has been used by the u.s. library of congress to manage such vocabularies and thesauri as the thesaurus for graphic materials, the global legal information network thesaurus, the legislative indexing vocabulary, and the symbols of american libraries listing. ■ multites (http://www.multites.com/) is a windows­ based tool that provides support for ansi/niso relationships plus user­defined relationships and comment fields for an unlimited number of thesauri (both monolingual and multilingual). ■ termtree 2000 (http://www.termtree.com.au/) is a windows­based tool that uses access, sql server, or oracle for data storage. it can import and export trim thesauri (a format used by the towers records information management system [http://www.towersoft.com/]), as well as a defined termtree 2000 tag format. ■ webchoir (http://www.webchoir.com/) is a family of client­server web applications that provides dif­ ferent utilities for thesaurus management in multiple dbms platforms. termchoir is a hierarchical infor­ mation organizing and searching tool that enables one to create and search varieties of hierarchical subject categories, controlled vocabularies, and tax­ onomies based on either predefined standards or a user­defined structure, and is then exported to an xml­based format. linkchoir is another tool that allows indexers to describe information sources using terminology organized in termchoir. and seekchoir is a retrieval system that enables users to browse thesaurus descriptors and their references (broader terms, related terms, synonyms, and so on). ■ synaptica (http://www.synaptica.com/) is a client­ server web application that can be installed locally on a client’s intranet or extranet server. thesaurus data is stored in a sql server or oracle database. the application supports the creation of electronic the­ sauri in compliance with the ansi/niso standard. the application allows the exchange of thesauri in csv (comma­separated values) text format. ■ superthes (batschi et al. 2002) is a windows­based tool that allows the creation of thesauri. it extends the ansi/niso relationships, allowing many pos­ sible data types to enrich the properties of a concept. it can import and export thesauri in xml and tabular format. ■ tematres (hhttp://r020.com.ar/tematres/) is a web application specially oriented to the creation of thesauri, but it also can be used to develop web navigation structures or to manage the documentary languages in use. the thesauri are stored in a mysql database. it provides the created thesauri in zthes (tylor 2004) or in skos format. finally, it must be mentioned that, given that thesauri can be considered as ontologies specialized in organiz­ ing terminology (gonzalo et al. 1998), ontology editors have sometimes been used for thesaurus construction. a detailed survey of ontology editors can be found in the denny study (2002). all of these tools (desktop or web­based) present some problems in using them as general thesaurus editors. the main one is the incompatibility in the interchange formats that they support. these tools also present integration problems. 
some are deeply integrated in bigger sys­ tems and cannot easily be reused in other environments because they need specific software components to work article title | author �1thmanager | lacasta, nogueras-iso, lópez-pellicer, muro-medrano, and zarazaga-soria �1 (as dbms to store thesauri). others are independent tools (can be considered as general­purpose thesaurus editors), but their architecture does not facilitate their integration within other information management tools. and most of them are not open source tools, so there is no possibility to modify them to improve their functionality. focusing on the interchange format problem, the iso­5964 standard (norm for multilingual thesauri) is currently undergoing review by iso tc46/sc 9, and it is expected that the new modifications will include a stan­ dard exchange format for thesauri. it is believed that this format will be based on technologies such as rdf/xml. in fact, some initiatives in this direction have already arisen: ■ the adl thesaurus protocol (janée et al. 2003) defines an xml­ and http­based protocol for access­ ing thesauri. as a result of query operations, portions of the thesaurus encoded in xml are returned. ■ the language independent metadata browsing of european resources (limber) project has published a thesaurus interchange format in rdf (matthews et al. 2001). this work introduces an rdf representa­ tion of thesauri, which is proposed as a candidate thesaurus interchange format. ■ the california environmental resources evaluation system (ceres) and the nbii biological resources division are collaborating in a thesaurus partnership project (ceres/nbii 2003) for the development of an integrated environmental thesaurus and a thesau­ rus networking toolset for metadata development and keyword searching. one of the deliverables of this project is an rdf format to represent thesauri. ■ the semantic web advanced development for europe (swad­europe 2001) project includes the swad­europe thesaurus activity, which has defined the skos, a set of specifications to represent the knowledge organization systems (kos) on the semantic web (thesauri between them). the british standards bs­5723 (bsi 1987) and bs­6723 (bsi 1985) (equivalent to the international iso­2788 and iso­5964) also lack a representation format. the british standards institute idt/2/2 working group is now developing the bs­8723 standard that will replace them and whose fifth part will describe the exchange formats and protocols for interoperability of thesauri. the objec­ tive of this working group is to promote the standard to iso, to replace the iso­2788 and iso­5964. here, it is important to remark that given the direct involvement of the idt/2/2 working group with skos development; probably the two initiatives will not diverge. the new representation format will be, if not exactly skos, at least skos­based. taking into account all these circumstances, skos seems to be the most adequate representation model to store thesauri. given that skos is rdf­based, it can be created using any tool that is able to manage rdf (usually used to edit ontologies); for example, swoop (mindswap group 2006), protégé (noy et al. 2000), or triple20 (wielemaker et al. 2005). the problem with these tools is that they are too complex for editing and visualizing such a simple model as skos. they are thought to create complex ontologies, so they provide too many options not spe­ cifically adapted to the type of relations in skos. 
in addition, they do not allow an integrated management of collection of thesauri and other types of controlled vocabularies as needed in dl processes (for example, the creation of metadata of resources, or the construction of queries in a search system). ■ skos model skos is a representation model for simple knowledge organization systems, such as subject heading lists, tax­ onomies, classification schemes, thesauri, folksonomies, other types of controlled vocabulary, and also concept schemes embedded in glossaries and terminologies. this section describes the model, providing characteristics, showing the state of development, and indicating the problems found to represent some types of kos. skos was initially developed within the scope of the semantic web advanced development for europe (swad­europe 2001). swad­e was created to support w3c’s semantic web initiative in europe (part of the ist­7 programme). skos is based on a generic rdf schema for thesauri that was initially produced by the desire project (cross et al. 2001), and further developed in the limber project (matthews et al. 2001). it has been developed as a draft of an rdf schema for thesauri com­ patible with relevant iso standards, and later adapted to support other types of kos. among the kos already published using this new format are gemet (eea 2001), agrovoc (fao 2006), adl feature types (hill and zheng 1999), and some parts of wordnet lexical data­ base (miller 1990), all of them available on the skos project web page. skos is a collection of three different rdf schema application profiles: skos­core, to store common prop­ erties and relations; skos­mapping, whose purpose is to describe relations between different kos; and skos­ extension, to indicate specific relations and properties only contained in some type of kos. for the first step of the development of the thmanager tool, only the most stable part of skos has been consid­ ered. figure 1 shows the part of skos­core used. the rest of skos­core is still unstable, so its support has been delayed until it is approved. skos­mapping and skos­extension are still in their first steps of develop­ �2 information technology and libraries | september 2007�2 information technology and libraries | september 2007 ment and are very unstable, so their management in thmanager also has been delayed until the creation of stable versions. in skos­core, a kos (in our case, usually a the­ saurus) consists of a set of concepts (labelled as skos: concept) that are grouped by a concept scheme (skos: conceptscheme). to distinguish between different mod­ els provided, the skos:conceptscheme contains a uri that identifies it, but to describe the model content to humans, metadata following the dublin core standard also can be added. the relation of the concept scheme with the concepts of the kos is done through the skos: hastopconcept relation. this relation points at the most general concepts of the kos (top concepts), which are used as entry points to the kos structure. in skos, each concept consists of a uri and a set of properties and relations to other concepts. among the properties, skos.preflabel and skos.altlabel provide labels for a concept in different languages. the first one is used to show the label that better identifies a concept (for the­ sauri it must be unique). the second one is an alternative label that contains synonyms or spelling variations of the preferred label (it is used to redirect to the preferred label of the concept). 
the skos concepts also can contain three other properties called skos.scopenote, skos.definition, and skos.example. they contain annotations about the ways to use a concept, a definition, or examples of use in differ­ ent languages. last, the skos.prefsymbol and skos.altsymbol properties are used to provide a preferred or some alter­ native symbols that graphically represent the concept. for example, a graphical representation is very useful to identify the meaning of a mathematical formula. another example is a chemical formula, where a graphical repre­ sentation of the structure of the substance also provides valuable information to the user. with respect to the relations, each concept indicates by means of the skos:inscheme relation in which concept scheme it is contained. the skos.broader and the skos.narrower relations are inverse relations used to model the generalization and specialization characteristics present in many kos (including thesauri). skos.broader relates to more general concepts, and skos.narrower to more spe­ cific ones. the skos.related relation describes associative relationships between concepts (also present in many thesauri), indicating that two concepts are related in some way. with these properties and relations, it is perfectly possible to represent thesauri, taxonomies, and other types of controlled vocabularies. however, there is a problem for the representation of classification schemes that provide multiple coding of terms, as there is no place to store this information. under this category, one may find classification schemes such as iso­639 (iso 2002) (iso standard for coding of languages), which proposes different types of alphanumeric codes (for example, two letters and three letters). for this special case, the skos working group proposes the use of the property skos.notation. although this property is not in the skos vocabulary yet, it is expected to be added in future versions. given the need to work with these types of schemes, this property has been included in the thmanager tool. ■ thmanager architecture this section presents the architecture of thmanager tool. this tool has been created to manage thesauri in skos, but it also is a base infrastructure that facilitates the management of thesauri in dls, simplifying their inte­ gration in tools that need to use thesauri or other types of controlled vocabularies. in addition, to facilitate its use on different computer platforms, thmanager has been developed using the java object­oriented language. the architecture of thmanager tool is shown in figure 2. the system consists of three layers: first, a repository layer where thesauri are stored and identified by means of associated metadata describing them; second, a per­ sistence layer that provides an api for access to thesauri stored in the repository; and third, a gui layer that offers different graphical components to visualize thesauri, to search by their properties, and to edit them in different ways. the thmanager tool is an application that uses the different components provided by the gui layer to allow the user to manage the thesauri. in addition, the layered architecture allows other applications to use some of the visualization components or the method provided by the persistence layer to provide access to thesauri. 
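as a concrete illustration of the skos-core elements described above (a concept scheme, top concepts, preferred and alternative labels per language, and the broader/narrower links), the following sketch builds a two-concept scheme with the jena library, the same rdf library that the thmanager persistence layer relies on. the concept uris and labels are invented, the sketch uses the current apache jena package names (in 2007 the prefix was com.hp.hpl.jena), and it is not thmanager code.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class SkosCoreSketch {
    // skos-core namespace used by the drafts discussed in the article
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        m.setNsPrefix("skos", SKOS);

        Property prefLabel = m.createProperty(SKOS, "prefLabel");
        Property altLabel = m.createProperty(SKOS, "altLabel");
        Property broader = m.createProperty(SKOS, "broader");
        Property narrower = m.createProperty(SKOS, "narrower");
        Property inScheme = m.createProperty(SKOS, "inScheme");
        Property hasTopConcept = m.createProperty(SKOS, "hasTopConcept");
        Resource conceptType = m.createResource(SKOS + "Concept");
        Resource schemeType = m.createResource(SKOS + "ConceptScheme");

        // the concept scheme is identified by a uri (all uris below are invented)
        Resource scheme = m.createResource("http://example.org/scheme/water", schemeType);

        // a top concept: one preferred label per language, plus the entry point from the scheme
        Resource water = m.createResource("http://example.org/concept/water", conceptType)
                .addProperty(prefLabel, "water", "en")
                .addProperty(prefLabel, "agua", "es")
                .addProperty(inScheme, scheme);
        scheme.addProperty(hasTopConcept, water);

        // a narrower concept, linked in both directions as in most thesauri,
        // with an alternative label for a spelling variation
        Resource groundwater = m.createResource("http://example.org/concept/groundwater", conceptType)
                .addProperty(prefLabel, "groundwater", "en")
                .addProperty(altLabel, "ground water", "en")
                .addProperty(inScheme, scheme)
                .addProperty(broader, water);
        water.addProperty(narrower, groundwater);

        // serialize to rdf/xml, the form in which skos files are exchanged
        m.write(System.out, "RDF/XML-ABBREV");
    }
}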
the main features that have guided the design of these layers have been the following: a metadata-driven design, efficient management of thesauri, the possibility of interrelating thesauri, and the reusability of thmanager components. the following subsections describe these characteristics in detail.

figure 1. skos model

metadata-driven design
a fundamental aspect in the repository layer is the use of metadata to describe thesauri. thmanager considers metadata of thesauri as basic information in the thesaurus management process, stored in the metadata repository and managed by the metadata manager. the reason for this metadata-driven design is that thesauri must be described and classified to facilitate the selection of the one that best fits the user's needs, allowing the user to search them not only by their name but also by the application domain or the associated geographical area, among others. the lack of metadata makes the identification of useful thesauri (provided by other organizations) difficult, producing a low reuse of them in other contexts. to describe thesauri in our service, a metadata profile based on dublin core has been created. the reason to use dublin core as the basis of this profile has been its extensive use in the metadata community. it provides a simple way to describe a resource using very general metadata elements, which can be easily matched with complex domain-specific metadata standards. additionally, dublin core also can be extended to define application profiles for specific types of resources. following the metadata profile hierarchy described in tolosana-calasanz et al. (2006), the thesaurus metadata profile refines the definition and domain of dublin core elements, and it includes two new elements (metadata language and metadata identifier) to appropriately identify the metadata records describing a thesaurus. the profile for thesauri has been described using the iemsr format (heery et al. 2005) and is distributed with the tool. iemsr is an rdf-based format created by the jisc ie metadata schema registry project to describe metadata application profiles. figure 3 shows the metadata created for the gemet thesaurus (the resource), expressed as a hedgehog graph (a reinterpretation of rdf triplets: resources, named properties, and values). the purpose of these metadata is not only to simplify thesaurus location for a user, but also to facilitate the identification of thesauri useful for a specific task in machine-to-machine communication. for instance, one may be interested only in thesauri that cover a restricted geographical area or deal with a specific theme.

efficient thesauri storage
thesauri vary enormously in size, ranging from hundreds of concepts and properties to millions. so the time spent on load, navigation, and search processes is a functional constraint for a tool that has to manage them. skos is rdf-based, and because reading rdf to extract the content is a slow process, the format is not appropriate for internal storage. to provide better access time, thmanager transforms skos into a binary format when a new skos is imported. the persistence layer provides unified access to the thesaurus repository. this layer is used by the gui layer to access the thesauri, but it also can be employed by other tools that need to use thesauri outside a desktop environment (for example, a thematic search system accessible through the web that requires browsing a thesaurus to facilitate construction of user queries).
figure 2. thmanager (kos manager) architecture: a repository layer (concept repository, metadata repository, and concept core), a persistence layer (thesaurus persistence manager, metadata manager, concept manager, disambiguation tool, skos core and skos mapping support, and the jena api), and a gui layer (visualization, edition, search, viewer generator, and gui manager), used by the thmanager application and by other desktop tools and web services that use thesauri.

figure 3. metadata of the gemet thesaurus, expressed as a hedgehog graph: dc:title "general multilingual environmental thesaurus", dcterms:alternative "gemet", dc:creator european topic centre on catalogue of data sources (etc/cds) and european environment agency (eea), dc:subject terms from the unesco classification (environmental sciences and engineering; pollution, disasters and security; natural resources; natural sciences), dc:description "gemet was conceived as a 'general' thesaurus, aimed to define a common general language, a core of general terminology for the environment", dc:publisher european environment agency (eea), dc:date 2005-03-07, dc:type text.reference materials.ontology, dc:format skos, dc:identifier http://www.eionet.eu.int/gemet, dc:language en, es, fr, and others, dc:rights "it can be used whenever there is no commercial profit", dc:relation european environment information and observation network, dc:contributor us environmental protection agency (epa), dc:source eurovoc thesaurus, plus the metadata language and metadata identifier elements of the profile.

this layer performs the transformation of skos to the binary format when a thesaurus is imported. the transformation is provided using the jena library, a popular library to manipulate rdf documents that allows storing them in different kinds of repositories (http://jena.sourceforge.net/). jena provides an open model that can be extended with specialized modules to use other ways of storage, making it possible to easily change the storage format for another that is more efficient if needed. the data structure used is shown in figure 4. the model is an optimized representation of the information given by the rdf triplets.
the concepts map contains the concepts and their associated relations in the form of key-value pairs: the key is a uri identifying a concept, and the value is a relations object containing the properties of the concept. a relations object is a map that stores the properties of one concept in the form of (name, value) pairs. the keys used for this map are the names of the typical property types in the skos model (for example, narrower or broader). the only special cases for encoding these property types in the proposed data structure occur when they have a language attribute (for example, preflabel, definition, or scopenote). in those cases, we propose the use of a [lang] suffix to distinguish the property type for a particular language. for instance, preflabel_en indicates a preflabel property type in english. additionally, it must be noted that the data type of the property values assigned to each key in the relations map depends on the semantics given to each property type. the data types fall into the following categories: a string for a preflabel property type; a list of strings for altlabel, definition, scope note, and example property types; a uri for a prefsymbol property type; a list of uris for narrower, broader, related, and altsymbol property types; and a list of notation objects for a notation property type. the data type used for notation values is a complex object because there may be different notation types. a notation object consists of type and value attributes. the type attribute is a uri that identifies a particular notation type and qualifies the associated notation value. additionally, with the objective of increasing the speed of some operations (for example, navigation or search), some optimizations have been added. first, the uris of the top concepts are stored in the topconcepts list. this list contains redundant information, given that those concepts also are stored in the concepts map, but it makes their location immediate. second, to speed up the search of concepts and the drawing of the alphabetic viewer, the translations map has been added. for each language supported by the thesaurus, this map contains a list of translationterm objects, that is, pairs of concept uri and preferred label, ordered by preflabel. it also contains redundant information that allows the immediate creation of the alphabetic viewer for a language, simplifying the search process; as can be seen later, this does not introduce a big overhead in load time. in addition, if no alphabetic viewer and search are needed, this structure can be removed without affecting the hierarchical viewer. this solution has proven to be useful for managing the kind of thesauri we use (they do not surpass 50,000 concepts and about 330,000 properties), loading them into memory on an average computer in a reasonable time, and allowing immediate navigation and search (see section 6).

figure 4. persistence model: a concepts map from concept uris to relations objects, a relations map from property keys (with [lang] suffixes where needed) to typed values, a redundant topconcepts list of uris, and a translations map holding, per language, a list of translationterm (concept uri, preferred label) pairs.
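a hedged sketch of the kind of in-memory structure just described is given below: a concepts map keyed by uri, a relations map whose keys carry a [lang] suffix where needed, a redundant topconcepts list, and a per-language translations map kept sorted for the alphabetic viewer. the class and field names are ours, not thmanager's, and java serialization is only one of several ways the binary snapshot could be written.

import java.io.Serializable;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// illustrative sketch of the binary persistence model described above;
// the names are ours, not thmanager's actual classes.
public class ThesaurusStore implements Serializable {

    // key: property name, optionally suffixed with the language, e.g. "prefLabel_en";
    // value type depends on the property (string, list of strings, list of uris, ...)
    public static class Relations extends HashMap<String, Object> implements Serializable {}

    public static class Notation implements Serializable {
        public URI type;      // identifies a particular notation scheme (e.g. two- or three-letter codes)
        public String value;  // the code itself
    }

    public static class TranslationTerm implements Serializable {
        public URI concept;   // concept uri
        public String label;  // preferred label in one language
    }

    // every concept of the thesaurus, keyed by its uri
    public final Map<URI, Relations> concepts = new HashMap<>();

    // redundant entry points: uris of the top concepts, for immediate access
    public final List<URI> topConcepts = new ArrayList<>();

    // redundant per-language lists of (concept uri, preferred label) pairs,
    // kept sorted by label so the alphabetic viewer and the search are immediate;
    // key is the language code, e.g. "en"
    public final Map<String, List<TranslationTerm>> translations = new HashMap<>();
}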
interrelation of thesauri
the vast choice of thesauri available nowadays implies an undesired effect of content heterogeneity. although a thesaurus is usually created for a specific application domain, some of the concepts defined in thesauri from different application domains may be equivalent. in order to facilitate cross-domain classification of resources, users would benefit from the possibility of knowing the connections of a thesaurus in their application domain to thesauri used in other domains. however, it is difficult to manually detect the implicit links between those different thesauri. therefore, in order to automatically facilitate these interthesaurus connections, the persistence layer of the thmanager tool provides an interrelation function that relates a thesaurus to an upper-level lexical database (the concept core displayed in figure 2). the interrelation mechanism is based on the method presented in nogueras-iso, zarazaga-soria, and muro-medrano (2005). it is an unsupervised disambiguation method that uses the relations between concepts as disambiguation context. it applies a heuristic voting algorithm to select the most adequate sense of the used concept core for each thesaurus concept. at the moment, the concept core is the wordnet lexical database. wordnet is a large english lexical database that groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept. those synsets are interlinked by means of conceptual-semantic and lexical relations. the interrelation component has been conceived as an independent module that receives a thesaurus as input in skos and returns its relation to the concept core using an extended version of the skos mapping model (miles and brickley 2004). this model, as commented before, is a part of skos that allows describing exact, major, and minor mappings between concepts of two different kos (in this case between a thesaurus and the common core). skos mapping is still in an early stage of development and has been extended in order to provide the needed functionality. the base skos mapping provides the map:exactmatch, map:majormatch, and map:minormatch relations to indicate the degree of relation between two concepts. given that the interrelation algorithm cannot ensure that a mapping is 100 percent exact, only the major and minor match properties are used. the algorithm returns a list of possible mappings with the lexical database for each concept: the one with the highest probability is assigned as a major match, and the rest are assigned as minor matches. to store the interrelation probability, skos mapping has been extended by adding a blank node with the reliability of the mapping. also, to be able to know which concepts of which thesauri are equivalent to one of the common core, inverse relations of map:majormatch and map:minormatch have been created. an example of skos mapping can be seen in figure 5. there, concept 340 of the gemet thesaurus (alloy) is correctly mapped to the wordnet concept number 13751474 (alloy, metal) with a probability of 91.007 percent; an unrelated minor mapping also is found, but it is given a low probability (8.992 percent).

figure 5. skos mapping extension: gemet concept 340 (alloy) mapped to wordnet 2.0 synset 13751474 (alloy, metal) as a major match with probability 91.00727, and to synset 13664144 (admixture, alloy) as a minor match with probability 8.992731.
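the article does not spell out the exact rdf shape of the extended mapping beyond the blank node that carries the reliability of each match, so the sketch below assumes one plausible shape for the alloy example of figure 5: map:majormatch and map:minormatch point to blank nodes holding an iaaa:probability value and a link (here called iaaa:target, a made-up property name) to the wordnet concept. the namespace uris are also assumptions made for illustration only.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class SkosMappingSketch {
    // namespace uris below are assumptions for illustration only
    static final String MAP = "http://www.w3.org/2004/02/skos/mapping#";
    static final String IAAA = "http://iaaa.cps.unizar.es/mapping#";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        Property majorMatch = m.createProperty(MAP, "majorMatch");
        Property minorMatch = m.createProperty(MAP, "minorMatch");
        Property probability = m.createProperty(IAAA, "probability");
        Property target = m.createProperty(IAAA, "target"); // made-up link to the matched concept

        Resource alloy = m.createResource("http://www.eionet.eu.int/gemet/concept/340");
        Resource wnAlloy = m.createResource("http://wordnet.princeton.edu/wordnet_2.0/13751474");
        Resource wnAdmixture = m.createResource("http://wordnet.princeton.edu/wordnet_2.0/13664144");

        // major match: an anonymous (blank) node carrying the reliability of the mapping
        Resource major = m.createResource()
                .addProperty(probability, "91.00727")
                .addProperty(target, wnAlloy);
        alloy.addProperty(majorMatch, major);

        // the remaining candidate senses become minor matches with lower probabilities
        Resource minor = m.createResource()
                .addProperty(probability, "8.992731")
                .addProperty(target, wnAdmixture);
        alloy.addProperty(minorMatch, minor);

        m.write(System.out, "TURTLE");
    }
}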
reusability of thmanager components
on top of the api layer, the gui layer has been constructed. this layer contains several graphical interfaces to provide different types of viewers, searchers, and editors for thesauri. this layer is used as the base for the construction of the thmanager tool. the tool groups a subset of the provided components, relating them to obtain a final user application that allows the management of the stored thesauri, their visualization (navigation by the concept relations), their edition, and their importation and exportation using the skos format. the thmanager tool has been created not only as an independent tool to facilitate thesauri management, but also to allow easy integration in tools that need to use thesauri. this has been done by combining the information management with specific graphical interfaces in different black-box components. among the provided components there are a hierarchical viewer, an alphabetic viewer, a list viewer, a searcher, and an editor, but more components can be constructed if needed. the use of the gui layer as a library of reusable graphical components makes it possible to create, with minimum effort, different tools that are able to manage thesauri under different user requirements, allowing also the integration of this technology in other applications that need controlled vocabularies to improve their functionality. for example, in a metadata creation tool, it can be used to provide the graphical component to select controlled values from thesauri and automatically insert them in the metadata. it also can be used to provide the list of possible values to use in a web search system, or to provide a thesaurus-based navigation of a collection of resources in an exploratory search system. figure 6 shows the integration process of a thesaurus visualization component in an external tool. the provided thesaurus components have been constructed following the java beans philosophy (reusable software components that can be manipulated visually in a builder tool), where a component is a black box with methods to read and change its state that can be reused when needed. here, each thesaurus component is a thesaurusbean that can be directly inserted in a graphical application to use its functionality (visualize or edit thesauri) in a very simple way. the thesaurusbeans are provided by the thesaurusbeanmanager that, given the parameters of the thesaurus to visualize and the type of visualization, returns the most adequate component to use.

figure 6. gui component integration: a desktop tool asks the thesaurusbeanmanager for a thesaurusbean (type: tree, thesaurus: gemet) and embeds the returned component.
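the thesaurusbean and thesaurusbeanmanager names come from the article, but their method signatures are not given, so the call below (getBean with a visualization type and a thesaurus name) is an assumed api written only to show the black-box integration idea; the stand-in manager class exists only so the sketch compiles.

import javax.swing.JFrame;
import javax.swing.JPanel;

// hypothetical sketch: the class names ThesaurusBeanManager/ThesaurusBean come from the
// article, but getBean(...) and its arguments are assumed signatures, not the real api.
public class MetadataToolIntegration {
    public static void main(String[] args) {
        // ask the manager for a hierarchical (tree) viewer of the gemet thesaurus
        ThesaurusBeanManager manager = new ThesaurusBeanManager();
        JPanel gemetTree = manager.getBean("TREE", "GEMET");

        // drop the returned component into the host application's window
        JFrame frame = new JFrame("keyword selection");
        frame.getContentPane().add(gemetTree);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.pack();
        frame.setVisible(true);
    }
}

// stand-in for the real manager so the sketch compiles on its own
class ThesaurusBeanManager {
    public JPanel getBean(String visualizationType, String thesaurusName) {
        return new JPanel(); // the real implementation would return the requested viewer bean
    }
}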
■ description of thmanager functionality
the thmanager tool is a desktop application that is able to manage thesauri stored in skos. as regards the installation requirements, the application requires 100 mb of free space on the hard disk. the ram and cpu requirements depend greatly on the size and the number of thesauri loaded in the tool. considering the number and size of thesauri used as a testbed in section 6, ram consumption ranges from 256 to 512 mb, and with a 3 ghz cpu (for example, a pentium iv) the load times for the bigger thesauri are acceptable. however, if the thesauri are smaller, ram and cpu requirements decrease, and the tool is able to operate on a computer with just a 1 ghz cpu (for example, a pentium iii) and 128 mb of ram.

given that the management of thmanager is metadata oriented, the first window in the application shows a table including the metadata records describing all the thesauri stored in the system (figure 7). the selection of a record in this table indicates to the rest of the components the selected thesaurus. the creation or deletion of thesauri also is provided here. the only operation that can be performed when no record is selected is to import a new thesaurus stored in skos. to import it, the name of the skos file must be provided. the import tool also contains the option to interrelate the imported thesaurus to the concept core. the metadata of the thesaurus are extracted from inside the skos file if they are available, or they can be provided in an associated xml metadata file. if no metadata record is provided, the application generates a new one with minimum information, using the name of the skos file as a base.

figure 7. thesaurus selector

once the user has selected a thesaurus, its metadata or content can be visualized and modified, exported to skos, or, as commented before, deleted. with respect to the metadata describing a thesaurus, a metadata viewer visualizes the metadata in html, and a metadata editor allows the editing of metadata following the thesaurus metadata profile described in the metadata-driven design section (figure 8 shows a screenshot of the metadata editor). different html views can be provided by adding more css files to the application. the metadata editor is customizable: to add or delete metadata elements in the metadata editor window, it is only necessary to modify the description of the iemsr profile for thesauri included in the application.

figure 8. thesaurus metadata editor

the main functionality of the tool is to visualize the thesaurus structure, showing all properties of concepts and allowing navigation by relations (see figure 9). here, different read-only viewers are provided. there is an alphabetic viewer that shows all the concepts ordered by the preferred label in one language. a hierarchical viewer provides navigation by broader and narrower relations. additionally, a hypertext viewer shows all properties of a concept and provides navigation by all its relations (broader, narrower, and related) via hyperlinks. finally, there also is a search system that allows the typical searches needed for thesauri (equals, starts with, contains). currently, search is limited to preferred labels in the selected language, but it could be extended to allow searches by other properties, such as synonyms, definitions, or scope notes.
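the three search modes just mentioned (equals, starts with, contains) over the preferred labels of the selected language can be sketched as a simple scan of the per-language label list kept by the persistence model; the code below is illustrative and is not thmanager's search implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// illustrative sketch of the three search modes mentioned above (equals, starts with,
// contains), run over the preferred labels of the selected language.
public class LabelSearch {
    public enum Mode { EQUALS, STARTS_WITH, CONTAINS }

    // labels: the preferred labels of every concept in the selected language
    public static List<String> search(List<String> labels, String query, Mode mode) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String label : labels) {
            String l = label.toLowerCase(Locale.ROOT);
            boolean match =
                    (mode == Mode.EQUALS && l.equals(q)) ||
                    (mode == Mode.STARTS_WITH && l.startsWith(q)) ||
                    (mode == Mode.CONTAINS && l.contains(q));
            if (match) {
                hits.add(label);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("alloy", "aluminium", "water pollution");
        System.out.println(search(labels, "al", Mode.STARTS_WITH)); // prints [alloy, aluminium]
    }
}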
all of these viewers are synchronized, so the selection of a concept in one of them produces the selection of the same concept in the others. the layered architecture described previously allows these viewers to be reused in many situations, including other parts of the thmanager tool. for example, in the thesaurus metadata editor described before, the thesaurus viewer is used to facilitate the selection of values for the subject section of the metadata. also, in the thesaurus editor shown later, the thesaurus viewer simplifies the selection of a concept related (by some kind of relation) to the selected one, and it provides a preview of the hierarchical viewer to help detect wrong relations.

the third available operation is to edit the thesaurus structure. here, to create a thesaurus following the skos model, an edition component is provided (see figure 10). the graphical interface shows a list with all the concepts created in the selected thesaurus, allowing the creation of new ones (providing their uris) or the deletion of selected ones. once a concept has been selected, its properties and relations to other concepts are shown, allowing the creation of new ones and the deletion of others. to facilitate the creation of relations between concepts, a selector of concepts (based on the thesaurus viewer) is provided, allowing the user to add related concepts without manually typing the uri of the associated concept. also, to check whether the created thesaurus is correct, a preview of the hierarchical viewer can be shown, allowing the user to easily detect problems in the broader and narrower relations.

figure 9. thesaurus concept selector
figure 10. thesaurus concept editor

with respect to the interrelation functionality, at the moment the mapping obtained is shown in the thesaurus viewers, but the navigation between equivalent concepts of two thesauri must be done manually by the user. however, a navigation component still under development will allow the user to jump from a concept in a thesaurus to concepts in others that are mapped to the same concept in the common core. as mentioned before, for efficiency, the format used to store the thesauri in the repository is binary, but the interchange format used is skos. so a module for thesauri importation and exportation is provided. this module is able to import from and export to skos.
in addition, if the thesaurus has been interrelated with respect to the concept core, the module is able to export its mapping to the concept core using the extended version of skos mapping described above.

■ results of the work
this section shows some experiments performed with the thmanager tool for the storage and management of a selected set of thesauri. in particular, this set of thesauri is relevant in the context of the geographic information community. the increasing relevance of geographic information for decision-making and resource management in different areas of government has promoted the creation of geo-libraries and spatial data infrastructures to facilitate distribution of and access to geographic information (nogueras-iso, zarazaga-soria, and muro-medrano 2005). in this context, complex metadata schemes, such as iso-19115, have been proposed for a full-detail description of resources. many of the metadata elements in these schemes are either constrained to a selected vocabulary (iso-639 for language encoding, iso-3166 for country codes, and so on), or the user is told to pick a term from the most suitable thesaurus. the problems with this second case are that typically the choice of thesauri is quite open, the thesauri are frequently large, and the exchange format of available thesauri is quite heterogeneous. in such a context, the thmanager tool has proven to be very useful to simplify the management of the used thesauri. at the moment, eighty kos, comprising thesauri and other types of controlled vocabulary, have been created or transformed to skos and managed through this tool. table 1 shows some of them, indicating their names (name column), the number of concepts (nc column), their total number of properties and relations (np and nr columns), and the number of languages in which concept properties are provided (nl column). to give an idea of the cost of loading these structures, the sizes of the skos and binary files (ss and sb columns) are provided in kilobytes (kb). additionally, table 1 compares the performance time of thmanager with respect to other tools that load the thesauri directly from an rdf file using the jena library (time performance has been obtained using a 3 ghz pentium iv processor). for this purpose, three different load times (in seconds) have been computed. the bt column contains the load time of binary files without the cost of creating the gui for the thesauri viewers. the lt column contains the total load time of binary files (including the time of gui creation and drawing). the jt column contains the time spent by a hypothetical rdf-based editor tool to invoke jena and load the rdf skos files containing the thesauri into its memory model (it does not include gui creation). the difference between the bt and lt columns shows the time used to draw the gui once the thesauri have been loaded in memory. the difference between the bt and jt columns shows the gain in time of using binary storage instead of an rdf-based one.

table 1. sizes of some thesauri and other types of vocabularies (nc: concepts; np: properties; nr: relations; nl: languages; lt: total load time in seconds; bt: binary load time in seconds; jt: jena rdf load time in seconds; ss: skos file size in kb; sb: binary file size in kb)
name | nc | np | nr | nl | lt | bt | jt | ss | sb
adl ftt | 210 | 210 | 408 | 1 | 0.4 | 0.047 | 0.062 | 103 | 41
isoc-g | 5,136 | 5,136 | 1,026 | 1 | 2.4 | 1.063 | 1.797 | 2,796 | 1,332
iso-639 | 7,599 | 16,247 | 0 | 6 | 5.1 | 1.969 | 2.89 | 3,870 | 3,017
unesco | 8,600 | 13,281 | 21,681 | 3 | 2.1 | 1.406 | 2.984 | 4,034 | 2,135
epsg | 4,772 | 9,544 | 0 | 1 | 1.8 | 0.969 | 1.796 | 2,935 | 1,682
agrovoc | 16,896 | 103,484 | 30,361 | 3 | 7.5 | 4.953 | 14.75 | 15,859 | 5,089
eurovoc | 6,649 | 196,391 | 20,861 | 15 | 11.1 | 9.266 | 15.828 | 18,442 | 11,483
etu | 44,991 | 89,980 | 89,976 | 2 | 13.3 | 10.625 | 17.844 | 23,828 | 10,412
gemet | 5,244 | 326,602 | 12,750 | 21 | 13.7 | 11.828 | 25.61 | 28,010 | 15,048
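the jt and bt columns can be reproduced, at least in spirit, by timing a jena parse of the skos file against the deserialization of a previously saved binary snapshot of the same thesaurus. the sketch below assumes java serialization for the binary side and uses placeholder file names; thmanager's actual binary format is not described in the article.

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

// rough sketch of the jt-versus-bt comparison discussed above: parse the skos rdf file
// with jena, then deserialize a binary snapshot of the same thesaurus.
public class LoadTimeSketch {
    public static void main(String[] args) throws Exception {
        long t0 = System.currentTimeMillis();
        Model rdfModel = RDFDataMgr.loadModel("gemet.rdf");   // jt: rdf parse with jena (placeholder file)
        long jt = System.currentTimeMillis() - t0;

        long t1 = System.currentTimeMillis();
        Object binaryModel;
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("gemet.bin"))) {
            binaryModel = in.readObject();                     // bt: binary deserialization (placeholder file)
        }
        long bt = System.currentTimeMillis() - t1;

        System.out.println("statements parsed from rdf: " + rdfModel.size());
        System.out.println("jt = " + jt + " ms, bt = " + bt + " ms");
    }
}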
the thesauri shown in the table are the adl feature types thesaurus (adl ftt), the isoc thesaurus of geography (isoc-g), iso-639, the unesco thesaurus (unesco 1995), the ogp surveying and positioning committee code lists (epsg) (ogp 2006), the multilingual agricultural thesaurus (agrovoc), the european vocabulary thesaurus (eurovoc) (eupo 2005), the european territorial units (spain and france) (etu), and the general multilingual environmental thesaurus (gemet). they have been selected because they have different sizes and can be used to show how the load time evolves with the thesaurus size. among them, gemet and agrovoc can be highlighted. although they are provided as skos, they include nonstandard extensions that we have transformed to standard skos relations and properties. eurovoc and unesco are examples of thesauri provided in formats other than skos that we have completely transformed into skos. the former was in an xml-based format, and the latter used a plain-text format. another thesaurus transformed to skos is the european territorial units, which contains the administrative political units of spain and france. here, the original source was a collection of heterogeneous documents that contained parts of the needed information and have been processed to generate a skos file. some classification schemes also have been transformed to skos, such as iso-639 and the different epsg codes for coordinate reference systems (including datums, ellipsoids, and projections). with respect to controlled vocabularies created (by the authors) in skos using the thmanager tool, there is an extended version of the adl feature types that includes a more detailed classification of feature types, and there are different glossaries used for resource classification.

figure 11 depicts the comparison of the different load times shown in table 1 with respect to the size of the rdf skos files. the order of the thesauri in the figure is the same as in table 1.

figure 11. thesaurus load times: load time (s) versus skos file size (kb), comparing rdf loading with jena against thmanager binary loading.

it can be seen that the time to construct the model using a binary format is almost half the time spent to create the model using an rdf file. in addition, once the binary model is loaded, the time to generate the gui is not very dependent on thesaurus size. this is possible thanks to the redundant information added to facilitate access to top concepts and to speed up loading of the alphabetic viewer. this redundant information produces an overhead in the load of the model, but without it the drawing time would be much worse, as the tool would have to generate it on the fly. however, in spite of the improvements, for the larger thesauri considered, the load time starts to be long, given that it includes loading the entire structure of the thesaurus into memory and creating the objects used to manage it quickly once loaded. but, once it is loaded, future accesses are immediate (quicker than 0.5 seconds). these accesses include opening it again, navigating by thesaurus relations, changing the visualization language, and searching concepts by their preferred labels. to minimize the load time, thesauri can be loaded in the background when the application is launched, reducing in that way the user's perception of the load time. another interesting aspect in figure 11 is the peak of the third element.
it corresponds with the iso­639 classifica­ tion scheme. it has the special characteristic of not having hierarchy and having many notations. these two character­ istics produce a little increase in the model load time, given that the top concepts list contains all the concepts and the notations are more complex than other relations. but most of the time is used to generate the gui of the tree viewer. the tree viewer gets all the concepts that are top terms, and for each one it asks for their preferred labels in the selected language and sorts them alphabetically to show the first level of the tree. this is fast for a few hundred concepts, but not for the 7,599 in the iso­639. however, this problem could be easily solved if the metadata contained a descrip­ tion of the type of kos to visualize. if the tool knew that the kos does not have broader and narrower relations, it could use the structures used to visualize the alphabetic list, which are optimized to show all of the kos concepts rapidly, instead of trying to load it as a tree. the persistence approach used has the advantage of not requiring external persistence systems, such as a dbms, and providing rapid access after loading, but it has the drawback of loading all thesauri in memory (in time and space). so, for much bigger thesauri, the use of some kind of dbms would be necessary. if this change were necessary, minimum modifications would be needed (one class). however, if not all the concepts are loaded, the alphabetic viewer (shows all the concepts) would have to be updated (for example, showing the concepts by pages) or it would become too slow to work with it. ■ conclusions this article has presented a tool for managing the the­ sauri needed in a digital library, for creating metadata, and for running search processes using skos as the interchange format. this work revises the tools that are available to edit thesauri, highlighting the lack of a formalized way to exchange thesauri and the difficulty of integrating those tools in other environments. this work selects skos from the available interchange formats for thesauri as the most promising format to become a standard for skos repre­ sentation, and highlights the lack of tools that are able to manage it properly. the thmanager tool is offered as the solution to these problems. it is an open source tool that can manage the­ sauri stored in skos, allowing their visualization and editing. thanks to the layered architecture, its components can be easily integrated in other applications that need to use thesauri or other controlled vocabularies. additionally, the components can be used to control the possible values used in a web search service to facilitate traditional or exploratory searches based on a controlled vocabulary. the performance of the tool is proved through a series of experiments on the management of a selected set of thesauri. this work analyzes the features of this selected set of thesauri and compares the efficiency of this tool with respect to other tools that load the thesauri directly from a rdf file. in particular, it is shown that the internal representation used by thmanager helps to decrease the time spent for the graphical loading of thesauri, facilitating navigation of the thesaurus contents as well as other typical operations, such as sorting or change of visual­ ization language. 
additionally, it is worth noting that the tool can be used as a library of components to simplify the integration of thesauri in other applications that require the use of controlled vocabularies. thmanager has been integrated within the open source catmdedit tool (zarazaga-soria et al. 2003), a metadata editor tool for the documentation of geographic information resources (metadata compliant with the iso 19115 geographic information metadata standard). the thesaurusbeans provided in the thmanager library have been used to facilitate keyword selection for some metadata elements. the thmanager component library also has contributed to the development of catalog search systems guided by controlled vocabularies. for instance, it has been used to build a thematic catalog in the sdiger project (zarazaga-soria et al. 2007). sdiger is a pilot project on the implementation of the infrastructure for spatial information in europe (inspire) for the development of a spatial data infrastructure to support access to geographic information resources concerned with the european water framework directive. thanks to the thmanager components, the thematic catalog allows browsing of resources by means of several multilingual thesauri, including gemet, unesco, agrovoc, and eurovoc.

future work will enhance the functionalities provided by thmanager. first, the ergonomics will be improved to show connections between different thesauri. currently, these connections can be computed and annotated, but the gui does not allow the user to navigate them. as the base technology already has been developed, only a graphical interface is needed. second, the tool will be enhanced to support data types other than text (for example, images, documents, or other multimedia sources) for the encoding of concepts' property values. third, it has been noted that thesaurus concepts can evolve with time; thus, a mechanism for managing the different versions of thesauri will be necessary in the future. finally, improvements in usability also are expected. thanks to the component-based design of the thmanager widgets (thesaurusbeans), new viewers or editors can be readily created to meet the needs of specific users.

■ acknowledgments
this work has been partially supported by the spanish ministry of education and science through the projects tin2006-00779 and tic2003-09365-c02-01 from the national plan for scientific research, development, and technology innovation. the authors would like to express their gratitude to juan josé floristán for his support in the technical development of the tool.

references
american national standards institute (ansi). 1993. guidelines for the construction, format, and management of monolingual thesauri. ansi/niso z39.19-1993. revision of z39.19.
batschi, wolf-dieter, et al. 2002. superthes: a new software for construction, maintenance, and visualisation of multilingual thesauri. http://www.t-reks.cnr.it/docs/st_enviroinfo_2002.pdf (accessed sept. 6, 2007).

british standards institute (bsi). 1985. guide to establishment and development of multilingual thesauri. bs 6723.

british standards institute (bsi). 1987. guide to establishment and development of monolingual thesauri. bs 5723.

ceres/nbii. 2003. the ceres/nbii thesaurus partnership project. http://ceres.ca.gov/thesaurus/ (accessed june 12, 2007).

cross, phil, dan brickley, and traugott koch. 2001. rdf thesaurus specification. technical report 1011, institute for learning and research technology. http://www.ilrt.bris.ac.uk/discovery/2001/01/rdf-thes/ (accessed june 12, 2007).

denny, michael. 2002. ontology building: a survey of editing tools. xml.com. http://xml.com/pub/a/2002/11/06/ontologies.html (accessed june 12, 2007).

european environment agency (eea). 2004. general multilingual environmental thesaurus (gemet). version 2.0. european environment information and observation network. http://www.eionet.europa.eu/gemet/rdf (accessed june 12, 2007).

european union publication office (eupo). 2005. european vocabulary (eurovoc). publications office. http://europa.eu/eurovoc/ (accessed june 12, 2007).

food and agriculture organization of the united nations (fao). 2006. agriculture vocabulary (agrovoc). agricultural information management standards. http://www.fao.org/aims/ag%20alpha.htm (accessed june 12, 2007).

gonzalo, julio, et al. 1998. applying eurowordnet to cross-language text retrieval. computers and the humanities 32, no. 2/3 (special issue on eurowordnet): 185–207.

heery, rachel, et al. 2005. jisc metadata schema registry. in 5th acm/ieee-cs joint conference on digital libraries, 381. new york: acm pr.

hill, linda, and qi zheng. 1999. indirect geospatial referencing through place names in the digital library: alexandria digital library experience with developing and implementing gazetteers. in asis '99: proceedings of the 62nd asis annual meeting: knowledge: creation, organization, and use, 57–69. medford, n.j.: information today, for the american society for information science.

hodge, gail. 2000. systems of knowledge organization for digital libraries: beyond traditional authority files. washington, d.c.: the digital library federation.

international organization for standardization (iso). 1985. guidelines for the establishment and development of multilingual thesauri. iso 5964.

international organization for standardization (iso). 1986. guidelines for the establishment and development of monolingual thesauri. iso 2788.

international organization for standardization (iso). 2002. codes for the representation of names of languages. iso 639.

international organization for standardization (iso). 2003. information and documentation—the dublin core metadata element set. iso 15836:2003.

janée, greg, satoshi ikeda, and linda l. hill. 2003. the adl thesaurus protocol. http://www.alexandria.ucsb.edu/~gjanee/thesaurus/ (accessed june 12, 2007).

lesk, michael. 1997. practical digital libraries: books, bytes, and bucks. san francisco: morgan kaufmann.

matthews, brian m., et al. 2001. internationalising data access through limber.
in third international workshop on internationalisation of products and systems, 1–14. milton keynes (uk). http://epubs.cclrc.ac.uk/bitstream/401/limber_iwips.pdf (accessed june 12, 2007).

miles, alistair, and dan brickley, eds. 2004. skos mapping vocabulary specification. w3c. http://www.w3.org/2004/02/skos/mapping/spec/2004-11-11.html (accessed june 12, 2007).

miles, alistair, brian matthews, and michael wilson. 2005. skos core: simple knowledge organization for the web. in 2005 dublin core annual conference—vocabularies in practice, 5–13. madrid: universidad carlos iii de madrid.

miller, george a. 1990. wordnet: an on-line lexical database. int. j. lexicography 3: 235–312.

mindswap group. 2006. swoop: a hypermedia-based featherweight owl ontology editor. maryland information and network dynamics lab, semantic web agents project. http://www.mindswap.org/2004/swoop/ (accessed june 12, 2007).

nogueras-iso, javier, francisco javier zarazaga-soria, and pedro rafael muro-medrano. 2005. geographic information metadata for spatial data infrastructures—resources, interoperability, and information retrieval. new york: springer verlag.

noy, natalya f., ray w. fergerson, and mark a. musen. 2000. the knowledge model of protégé2000: combining interoperability and flexibility. in knowledge engineering and knowledge management: methods, models, and tools: 12th international conference, ekaw 2000, juan-les-pins, france, october 2–6, 2000: proceedings, 1–20 (lecture notes in computer science, 1937). new york: springer.

ogp surveying & positioning committee. 2006. surveying and positioning. http://www.epsg.org/ (accessed june 12, 2007).

semantic web advanced development for europe (swad-europe). 2001. semantic web advanced development for europe thesaurus activity. http://www.w3.org/2001/sw/europe/reports/thes (accessed june 12, 2007).

tolosana-calasanz, r., et al. 2006. semantic interoperability based on dublin core hierarchical one-to-one mappings. international journal of metadata, semantics, and ontologies 1, no. 3: 183–88.

taylor, mike. 2004. the zthes specifications for thesaurus representation, access, and navigation. http://zthes.z3950.org/ (accessed june 12, 2007).

united nations educational, scientific, and cultural organization (unesco). 1995. unesco thesaurus: a structured list of descriptors for indexing and retrieving literature in the fields of education, science, social and human science, culture, communication and information. paris: unesco publ.

u.s. library of congress, network development and marc standards office. 2004. marc standards. http://www.loc.gov/marc/ (accessed june 12, 2007).

wielemaker, jan, guus schreiber, and bob wielinga. 2005. using triples for implementation: the triple20 ontology-manipulation tool (lecture notes in computer science, 3729): 773–85. new york: springer.

zarazaga-soria, francisco javier, et al. 2003. a java tool for creating iso/fgdc geographic metadata. in geodaten- und geodiensteinfrastrukturen—von der forschung zur praktischen anwendung: beiträge zu den münsteraner gi-tagen, 26./27. juni 2003 (ifgi prints, 18). münster, germany: institut für geoinformatik, universität münster.

zarazaga-soria, francisco javier, et al. 2007. providing sdi services in a cross-border scenario: the sdiger project use case. in research and theory in advancing spatial data infrastructure concepts, 113–26. redlands, calif.: esri.
the mosc project: using the oai-pmh to bridge metadata cultural differences across museums, archives, and libraries
eulalia roel

eulalia roel (eulalia.roel@gmail.com) is coordinator of information resources at the federal reserve, atlanta.

the metascholar initiative of emory university libraries, in collaboration with the center for the study of southern culture, the atlanta history center, and the georgia music hall of fame, received an institute of museum and library services grant to develop a new model for library-museum-archives collaboration. this collaboration will broaden access to resources for learning communities through the use of the open archives initiative protocol for metadata harvesting (oai-pmh). the project, titled music of social change (mosc), will use oai-pmh as a tool to bridge the widely varying metadata standards and practices across museums, archives, and libraries. this paper will focus specifically on the unique advantages of the use of oai-pmh to concurrently maximize the exposure of metadata emergent from varying metadata cultures.

the metascholar initiative of emory university libraries, in collaboration with the center for the study of southern culture, the atlanta history center, and the georgia music hall of fame, received an institute of museum and library services grant to develop a new model for library-museum-archives collaboration to broaden access to resources for learning communities through the use of the open archives initiative protocol for metadata harvesting (oai-pmh).1 the collaborators of the project, entitled music of social change (mosc), are creating a subject-based virtual collection concerning music and musicians associated with social-change movements such as the civil-rights struggle. this paper will specifically focus on the advantages offered by oai-pmh in amalgamating and serving metadata from these institutional sources that are significantly different in kind.2 there has been a great deal of discussion within the library community as to the possibilities oai-pmh holds for harvesting, aggregating, and then disseminating research metadata. however, in reality, only a few institutions (be they museums, archives, or libraries) have actually begun to utilize oai-pmh to this end. there are some practical, historical barriers to implementing any shared system for distributing metadata across institutions that are, more than in degree, different in kind. one of these significant differences is that of metadata cultures and practices. libraries have traditionally assigned metadata incrementally at the item level within their collection(s). the strength of this model is that at least a minimal amount of metadata is assigned to a very high percentage of items within the collection. the challenge of such a system is that, for such metadata records to interoperate within a shared database and through a common interface (for example, the traditional union catalog), the metadata fields have been quite rigidly defined compared to those within archival and museum environments. due to tradition as well as the sheer volume of items collected by libraries, metadata at the item level are not greatly detailed or contextualized. often, items within library collections lack robust relational mapping to other items within or outside of the collection, as is done, for example, in archival processing. content contextualization is highly valued by archival metadata practices and culture as the central tenet of metadata creation. items at a subcollection level almost always have metadata derivative from and deferential to the collection-level metadata.
the great benefit of archival practices in metadata assignment is a contextualization of content that reflects the background, the topographic place in time and space of a given portion of a collection, and its organic, emergent relationship to the whole. the weaknesses of this model are a great inconsistency in description details and variables (at the collection and subcollection levels), as well as very disparate levels of granularity within the hierarchy of the structure of a collection at which metadata are assigned. such disparities among institutional types feed an unnecessary level of misunderstanding by libraries of the metadata culture and aims of archives as well as those of museums. museums often have very skeletal documented (as opposed to undocumented) metadata about their collections or the objects therein. often museums are not funded to make metadata on their collections freely available. it is common, in fact, for curatorial staff to view metadata as intellectual property to which they serve as gatekeepers, reflecting a professional value placed upon contextualizing materials for users. this is done on a user-by-user or exhibition-by-exhibition basis, depending on user background or the thesis of a given exhibition. additionally, museums perceive information on the aboutness of their collections to be a class of capital with which they can always potentially cost-recover or generate income. within the culture of museums, staff have traditionally been disinclined to make their collections available in an unmediated manner. additionally, there has been resistance to documenting information about collections in a systematic way. there is even greater resistance to adhering to any prescriptions on metadata as would be required for compliance with even the most minimally structured database. such regulation would discriminate against the nuanced information required for each and every object within a collection.

■ why oai-pmh to bridge these cultures?

oai-pmh was selected by the mosc project as a means to bridge some of these substantial disparities. the protocol is often mistakenly assumed to function only with metadata expressed as unqualified dublin core (dc). in fact, the protocol functions with any metadata format expressed in extensible markup language (xml); this is the minimal requirement for content to serve metadata through oai-pmh. this includes those formats that have been well received by institutions other than libraries, such as xml encoded archival description (ead) as it is used in archives. as per section 4.2 of the oai-pmh guidelines for repository implementers, communities are able to develop their own collection description xml schemas for use within <description> elements. if all that is desired is the ability to include an unstructured textual description, then it is recommended that repositories use the dublin core description element. seven existing schemes are: dublin core, encoded archival description (ead), the eprints schema, rslp collection description schema, uddi/wsdl, marc21, and the branding schema.3 the oai protocol has often been partnered with unqualified dc metadata, as this is the most minimal metadata structure necessary for participation in an oai harvesting system.
not only are these dc fields unqualified, but no fields are actually required. no structure or regulations are codified beyond requiring metadata contributors to adhere to this unqualified metadata schema. therefore, the oai protocol requires minimal technology support and resources at any given contributing site (such support varying more widely across institutions than even their metadata practices themselves). this maximizes flexibility in metadata contribution, as well as interoperability across the collective data pool that a user can search. granted, this unregulated framework does come at the cost of inconsistency in metadata detail and quality. however, the great advantage of such nominal requirements is that they enable contributors with minimal metadata-encoding practices to participate in the metadata collaborative. following is an example of a record as it may appear in the mosc collection:
<record>
  <header>
    <identifier>oai:atlantahistorycenter.com:10</identifier>
    <datestamp>2003-03-31</datestamp>
    <setSpec>south:blues</setSpec>
    <setSpec>south:mississippi-delta-region</setSpec>
  </header>
  <metadata>
    <oai_dc:dc>
      <dc:title>long hall recordings</dc:title>
      <dc:creator>morris, william</dc:creator>
      <dc:subject>blues</dc:subject>
      <dc:description>comment: sound amateur recording</dc:description>
      <dc:date>2003-05-16</dc:date>
      <dc:type>sound recording</dc:type>
      <dc:identifier>http://atlantahistorycenter.com/porcelain/10</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
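as a rough illustration of how a service provider might pull records like the one above from the project's data providers, the following sketch issues a standard oai-pmh listrecords request and extracts a few unqualified dc fields. the base url shown is hypothetical, and the sketch deliberately omits resumption-token paging and error handling; it is not code from the mosc project itself.

from urllib.request import urlopen
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records(base_url, metadata_prefix="oai_dc"):
    """yield (identifier, title, creator) tuples from a single listrecords response."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    with urlopen(f"{base_url}?{query}") as response:
        tree = ET.parse(response)
    for record in tree.iter(f"{OAI}record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier")
        title = record.findtext(f".//{DC}title")      # may be absent: unqualified dc requires no fields
        creator = record.findtext(f".//{DC}creator")
        yield identifier, title, creator

# hypothetical data-provider endpoint:
# for rec in list_records("http://example.org/oai"):
#     print(rec)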
additionally, with no fields required by the dc schema, institutions have absolute discretion as to what metadata are exposed if this is a concern (as it may be for privacy considerations in archives or for intellectual-property concerns in museums). however, one of the great strengths of implementing oai-pmh is that, while the threshold for regulating metadata is low, the protocol can also handle any metadata format expressed in xml, including data formats significantly more structured than dc: for example, ead, text encoding initiative (tei), and tei lite-defined documents. scholars are then able to access these scholarly objects via one point, while still being able to collectively access and utilize all metadata objects available in all collections, from the most to the least robust. the aim of the mosc project participants in selecting oai-pmh is to maximize participation from fairly disparate kinds of organizations, with equally disparate kinds of metadata cultures and practices. in comparison to other, currently available methods of metadata aggregation, oai-pmh is maximally forgiving of discordant metadata suppliers; thereby, the hope is, metadata contributions are maximized. concurrently, the protocol allows for highly robust metadata formats. in some aggregated systems, metadata objects are stripped down as the cost of inclusion; this need is eliminated when oai-pmh is utilized. the use of the protocol allows for the inclusion of objects consisting of the most skeletal unqualified dublin core elements, while still accommodating the most complicated metadata objects. optimally, this is a means to achieve a critical mass of contributed resources that will enable end users to utilize the mosc project as the premier site and a primary resource for information on materials about music and musicians associated with social-change movements.

■ acknowledgment

the author would like to express her sincerest gratitude to the institute of museum and library services for funding the music of social change project.

references

1. "metascholar: an emory university digital library research initiative," emory university libraries web site. accessed sept. 1, 2004, http://metascholar.org/; "the center for southern culture," university of mississippi web site. accessed sept. 1, 2004, www.olemiss.edu/depts/south/; "atlanta history center," atlanta history center web site. accessed sept. 1, 2004, www.atlantahistorycenter.com/; "georgia music hall of fame," georgia music hall of fame web site. accessed sept. 1, 2004, www.gamusichall.com/home.html; "institute of museum and library services: library-museum collaboration," institute of museum and library services web site. accessed sept. 1, 2004, www.imls.gov/grants/l-m/index.htm.

2. "implementation guidelines for the open archives initiative protocol for metadata harvesting," open archives initiative web site. accessed sept. 1, 2004, www.openarchives.org/oai/openarchivesprotocol.html#introduction.

3. "4.2 collection and set descriptions," open archives initiative web site. accessed sept. 1, 2004, www.openarchives.org/oai/2.0/guidelines-repository.htm#setdescription.

an assessment of student satisfaction with a circulating laptop service
louise feldmann, lindsey wess, and tom moothart

louise feldmann (louise.feldmann@colostate.edu) is the business and economics librarian at colorado state university libraries; she serves as the college liaison librarian to the college of business. lindsey wess (lindsey.wess@colostate.edu) coordinates assistive technology services and manages the information desk and the electronic information center at colorado state university libraries. tom moothart (tmoothar@library.colostate.edu) is the coordinator of on-site services at colorado state university libraries.

since may of 2000, colorado state university's (csu) morgan library has provided a laptop computer lending service.
in five years the service had expanded from 20 to 172 laptops. although the service was deemed a success, users complained about slow laptop startups, lost data, and lost wireless connections. in the fall of 2005, the program was formally assessed using a customer satisfaction survey. this paper discusses the results of the survey and changes made to the service based on user feedback. colorado state university (csu) is a land-grant institution located in fort collins, colorado. the csu libraries consist of the morgan library, the main library on the central campus; the veterinary teaching branch hospital library at the veterinary hospital campus; and the atmospheric branch library at the foothills campus. in 1997, morgan library completed a major renovation and expansion which provided a designated space for public desktop computers in an information commons environment. the library called this space the electronic information center (eic). due to the popularity of the eic, and with the intent of expanding computer access without expanding the computer lab, library staff began to explore the implementation of a laptop checkout service in 2000. library staff used heather lyle's (1999) article "circulating laptop computers at west virginia university" as a guide in planning the service. development funds were used to purchase twenty laptop computers, and the 3com corporation donated fifteen wireless network access points. the laptops were to be used in morgan library on a wireless network maintained by the library technology services department. these computers were to be circulated from the loan desk, the same desk used to check out books. although the building is open to the public, use of the laptops was limited to university students and staff and for library in-house use only. all the public desktop computers and laptops use microsoft windows and microsoft office. maintaining the security of the libraries' network and students' personal data in a wireless environment was paramount. to maintain a secure computing environment and present a standardized computing experience in the library, windows xp group policies were applied. currently, the laptop software is updated at least every semester using symantec ghost. ghost copies a standardized image to every laptop even when the library owns a variety of computer models from the same manufacturer. additionally, due to concerns over wireless computer security, morgan library implemented cisco's virtual private network (vpn) in 2004. the laptop service was launched in may 2000. more than 22,000 laptop transactions occurred in the initial year. since its inception, the use of the morgan library laptop service and the number of laptops available for checkout have steadily grown. using student technology funds, the service had grown to 172 laptops and ten presentation kits consisting of a laptop, projector, and a portable screen. circulation during the fall 2005 semester totaled 30,626 laptops and 102 presentation kits. in fiscal year 2005, 66,552 laptops and presentation kits were checked out. based on the high circulation statistics and anecdotal evidence, the service appeared to be successful. although morgan library replaced laptops every three years and upgraded the wireless network, laptop support staff noted that users complained of slow laptop startups, lost data, and lost wireless connections. the researchers also noted that large numbers of users queued at the circulation desk at 5:00 p.m.
even though large numbers of desktop computers were available in the eic. a customer satisfaction survey was developed to assess the service and test library staff's assumptions about the service. csu had a student population of 25,616 students at the time of the survey.

■ literature review

much of the published literature discussing laptop services focuses on the implementation of laptop lending programs and was published from 2001 to 2003, when many libraries were beginning this service (allmang 2003; block 2001; dugan 2001; myers 2001; oddy 2002; vaughan and burnes 2002; williams 2003). these articles deal primarily with topics such as how to deal with start-up technological, staffing, and maintenance issues. they have minimal discussion of the service post-implementation. researchers who have surveyed users of university laptop lending services include direnzo (2002), lyle (1999), jordy (1998), block (2001), oddy (2002), and monash university's caulfield library (2004). direnzo from the university of akron only briefly discusses a survey they conducted, with some information about additional software added as a result of their user comments. lyle from west virginia university discusses the percentage of respondents to particular questions such as what applications were used, problems encountered, and overall satisfaction with the service. jordy's report provides in-depth analysis of the survey results from the university of north carolina at chapel hill, but the focus of his survey is on the laptop service's impact on library employee work flow. monash university's caulfield library survey focuses on wireless access and awareness of the program by patrons. other survey results found on university library web sites include southern new hampshire university library (west 2005) and murray state university library (2002). additionally, the monmouth university library web site (2003) provides discussion and written analysis of a survey they conducted prior to implementation of their service, a survey which was used to gather information and assess patron needs in order to aid in the construction and planning of their service. from the survey results discussed in the literature and posted on web sites, overall comments from users are very consistent with one another. most users indicate that they use a loaned laptop computer rather than a desktop computer for privacy and portability (lyle 1999; oddy 2002; west 2005). in addition, the responses from patrons are overwhelmingly positive and users appreciated having the service made available (lyle 1999; jordy 1998; west 2005). both the west virginia university and the university of north carolina at chapel hill surveys found that 98 percent of respondents would check out a laptop again (lyle 1999; jordy 1998). southern new hampshire university's survey indicated that 88 percent of those responding would check one out again (west 2005).
many respondents stated that a primary drawback of using the laptops was the slowness of connectivity (lyle 1999; monash 2004; murray state 2002). the primary use of the laptops, reported in the surveys, was microsoft word (lyle 1999; jordy 1998; oddy 2002). there is a lack of published literature regarding laptop lending customer satisfaction surveys and analysis. this could be due to the relative newness of many programs, the lack of university libraries that provide laptops, or the reliance on circulation statistics alone to assess the program. articles that discuss circulation and usage statistics as an assessment indicator to judge the popularity of their programs include direnzo (2002), dugan (2001), and vaughan and burnes (2002). based on high circulation statistics and positive anecdotal evidence, it may appear that library users are pleased with laptop programs, and perhaps there has been a hesitation to survey users on a program that is perceived by those in the library as successful.

■ results

with the strong emphasis on assessment at colorado state university, it was decided to formally survey laptop users on their satisfaction with the program. the survey was distributed by the access services staff when the laptops were checked out from october 28, 2005, to november 28, 2005. this was a voluntary survey, and the respondents were asked to complete one survey. users returned 173 completed surveys. undergraduates are the predominant audience for the laptop service; of the 173 returned surveys, 160 identified themselves as undergraduates. as shown in table 1, the responses indicated that the library has a core of regular laptop users, with 33 percent using the laptops at least daily and 82 percent using the laptops at least weekly. only 3 percent indicated that they were using a laptop for the first time. many laptop users also utilized the eic, with 67 percent responding that they use the information commons at least weekly (see table 2).

table 1. how often do you use a library laptop?
frequency              percentage
more than once a day       3%
daily                     30%
weekly                    49%
monthly                   15%
my first time              3%
n=172

table 2. how often do you use a library pc?
frequency              percentage
more than once a day       3%
daily                     20%
weekly                    44%
monthly                   20%
never                     13%
n=169

the laptops were initially purchased with the intent that they would be used to support student team projects. presentation kits with a laptop, projector, and portable screen were an extension of this idea and were also made available for checkout. surprisingly, only 15 percent of the respondents noted that they were using the laptop with a group. during evenings, it was observed by staff that students were regularly queuing and waiting for a laptop even though pcs were available in the library computer lab. figure 1 shows hourly use statistics for the desktop and laptop public computers. the usage of the desktop computers drops in the late afternoon, just as the use of the laptop computers increases.

figure 1. computer use statistics for may 1, 2006 (percentage of use by time of day, for desktop computers and checkout laptops).

students were asked why they chose a laptop rather than a library pc and were allowed to choose from multiple answers. as can be seen in table 3, most students noted the advantages of portability and privacy. five respondents wrote in the "other" category that they were able to work better in quieter areas, and ten mention that the computer lab workspace is limited. the dense use of space in the library computer lab has been noted by morgan library staff and students.

table 3. why did you choose to use a laptop rather than a library pc?
response                                                  number
portability                                                   41
privacy                                                       12
easier to work with a group                                    7
portability and privacy                                       54
portability and easier to work with a group                   10
portability, privacy, and easier to work with a group         12
the desktop surrounding each library pc only provides about three feet of workspace. one respondent explained the choice of laptop over pc was because "i can take it to a table and spread out my notes vs. on a library pc." for many users, the desktops are too crowded to spread research material, and the eic is too noisy for contemplative thought. as can be noted from the use statistics, the public laptop program has been a very popular library service. prior to the survey, the perception of the morgan library staff was that students were waiting in the evening for extended periods of time for a laptop. when the library expanded the laptop pool from 20 in 2000 to 172 in 2005, it had seemingly no effect on reducing the number of students waiting to use them. as can be seen in table 4, when asked how long they had waited for a laptop, 74 percent of the students said they had access to a laptop immediately, and 15 percent waited less than a minute. the survey was administered during the second busiest time of the year for the library, the month before thanksgiving break. in the open comments, one respondent stated that it was possible to wait forty-five minutes to an hour for a laptop, and another noted that "during finals weeks it is almost impossible to get one."
surprisingly, 79 percent of the users reported rarely or never returning a non-functioning laptop. in addition, the library technicians have reported that no problems have been found on some of the laptops returned for repair. some of the returned computers may be due to frustration with the slow connection to the wireless network. forty-five percent of respondents reported at least occasionally having problems connecting to the wireless network. from the inception of the laptop program, the library has experienced problems with the wireless technology. from its original fifteen wireless access points to its current twenty-nine, the library has struggled to meet the demand of additional library laptops and users’ personal laptops. many written comments on the surveys complained about the slow connection speed of the wireless network such as, “find a way to make the boot-up process faster. i need to wait about five minutes for it to be totally booted and ready to use.” even with the slow connection to the wireless network, 41 percent of students responding to the survey rated their satisfaction with the library’s laptop service as excellent and 49 percent rated their satisfaction as good (see table 7). n discussion even with 90 percent of our users rating the laptop service as good or excellent, the survey noted some problems that needed attention. the morgan library laptops seamlessly connect to a wireless network through a login script when the computer is turned on. a new script was written to table 4. how long did you wait before you were able to check out your laptop? response percentage i did not wait 74% less than one minute 15% one to four minutes 11% five to ten minutes 2% more than ten minutes 0% n=171 table 5. how often have you experience problems saving files, connecting to the wireless network, or had a laptop that locked up or crashed? frequency saving files wireless connection locked up or crashed often <1% 5% <1% occasionally 8% 40% 17% rarely 33% 32% 35% never 58% 24% 49% n= 165 165 163 table 6. how often have you returned a library laptop that was not working properly? frequency percentage often 4% occasionally 18% rarely 30% never 49% n=165 24 information technology and libraries | june 2008 allow the connection and authentication to the cisco virtual private network (vpn) client. during testing it was found that some laptops took as long as ten minutes to connect to the wireless network, which resulted in numerous survey respondents commenting on our slow wireless network. to help correct this problem, the library’s network staff changed each laptop’s user profile from a mandatory roaming profile to a local profile and simplified the login script. the laptops connected faster to the wireless network with the new script, but they still did not meet the students’ expectations. in the fall of 2006, the library network staff moved the laptops from vpn to wi-fi protected access (wpa) wireless security, and laptop login time to the wireless network dropped to under two minutes. the number of customer complaints dropped dramatically after implementing wpa. additional access points were purchased to improve connectivity in morgan library’s wireless “dead zones.” in january 2006, the university’s central computing services audited the wireless network after continued wireless connectivity complaints. the audit recommended reconfiguring the access points channel assignments. 
in many cases it was found that the same channel had been assigned to access points adjacent to each other, ultimately compromising laptop connectivity. the audit also discovered noise interference on the wireless network from a 2.4-ghz cordless phone used by the loan desk staff. the phone was replaced with a 5.8-ghz one, which has resulted in fewer dropped connections near the loan desk. supporting almost 200 laptops has introduced several problems in the library. the morgan library building was not designed to support the use of large numbers of laptops. because it is impractical for the loan desk to charge nearly 200 laptop batteries throughout the day, laptops available for checkout must be connected to electrical outlets. these are seldom near study tables, and students are forced to crawl underneath tables to locate power or stretch adapter cords across aisles. a space plan for the morgan library is being developed that will increase the number of outlets near study tables. in the meantime, 100 power strips were added to tables used heavily by laptop users. the loan desk staff is very efficient at circulating, but has less success at troubleshooting technical problems. when the laptop service was first implemented, large numbers of laptops were not available due to servicing reasons. the public laptop downtime was lowered by hiring additional library technology students. a one-day onsite repair service agreement was purchased from the manufacturer, which resulted in many equipment repairs being completed within 48 hours. in order to reduce the downtime further, a plan to replace some loan desk student workers with library technology students is being evaluated. the technology students will be able to troubleshoot connectivity and hardware problems with the users when they return the defective computers to the loan desk. if a computer needs additional service, it can be handled immediately, which will allow more laptops for checkout since fewer will be removed for repair. when the laptop service was first envisioned, it was seen as a great service for those working in groups. as can be seen in table 3, very few students are using the laptops in a group setting. in survey written comments, students emphasize that they enjoy the portability and privacy enabled by using a laptop. the morgan library eic is cramped and noisy, with the configuration allowing very little room for students to spread out research materials and notes for writing. the morgan library space plan takes these issues into consideration and recommends reconfiguring the eic to lessen the noise and provide writing space near computers. this is intended to improve the student library experience and encourage students to use the desktop computers during the evenings when lines form for the laptops. in order to decrease the current laptop queue at the loan desk, more laptops will be added. as a result of survey comments requesting apple computers, five mac powerbooks were added to the library's laptop fleet. in addition, as morgan library adds more checkout laptops and the number of students arriving on campus with wireless laptops increases, the wireless infrastructure will need to be upgraded. upgrading the wireless access points to the 802.11g standard has been implemented. updating each laptop with a new hard-drive image has become problematic as the number of laptops has increased.
the wireless network capacity is not large enough for the ghost software to transmit the image to multiple laptops, and so each laptop must be physically attached to the library network. initially, when library technology services attempted imaging many laptops at once, it took six to eight hours and required up to eight staff members. this method of large-scale laptop imaging was so network intensive that it had to be performed when the library was closed to avoid disrupting public internet use. now imaging the laptop fleet is done piecemeal, twenty to thirty laptops at a time, in order to minimize complications with the ghost process and multicasting through the network switches. due to the staff time required, laptop software is not updated as often as the users would like. technological solutions continue to be investigated that will decrease the labor and network intensity of imaging.

■ conclusion

the morgan library laptop service was established in 2000 and has been a very popular addition to the library's services. as an example of its popularity, in fiscal year 2005 the laptops circulated 66,552 times. student government continues to support the use of student technology fees to support and expand the fleet of laptops. this survey was an attempt to assess users' perceptions of the service and identify areas that need improvement. the survey found that students rarely wait more than a few minutes for a laptop, and in open-ended survey questions, students noted that they waited for computers only during peak use periods. while relatively few survey respondents experienced technical difficulties with the laptops and wireless network, slow wireless connection time was a concern that students noted in the open comments section of the survey. overall, the students gave the laptop service a very high rating. when asked to suggest improvements to the service, many respondents recommended purchasing more laptops. the libraries made several changes to improve the laptop service based on survey responses. changes have been made to the login script files, wireless network, and security protocol to speed and stabilize the wireless connection process. additional wireless access points will be added to the building, and all access points will be upgraded to the 802.11g standard. in addition, five mac powerbooks have been added to the fleet of windows-based laptops. the library continues to investigate new service models to circulate and maintain the laptops.

works cited

allmang, nancy. 2003. our plan for a wireless loan service. computers in libraries 23, no. 3: 20–25.

block, karla j. 2001. laptops for loan: the experience of a multilibrary project. journal of interlibrary loan, document delivery, and information 12, no. 1: 1–12.

direnzo, susan. 2002. a wireless laptop-lending program: the university of akron experience. technical services quarterly 20, no. 2: 1–12.

dugan, robert e. 2001. managing laptops and the wireless network at the mildred f. sawyer library. journal of academic librarianship 27, no. 4: 295–298.

jordy, matthew l. 1998. the impact of user support needs on a large academic workflow as a result of a laptop check-out program. master's thesis, university of north carolina.
lyle, heather. 1999. circulating laptop computers at west virginia university. information outlook 3, no. 11: 30–32.

myers, penelope. 2001. laptop rental program, temple university libraries. journal of interlibrary loan, document delivery, and information supply 12, no. 1: 35–40.

monash university caulfield library. 2004. laptop users and wireless network survey. www.its.monash.edu.au/staff/networks/wireless/review/caul-lapandnetsurvey.pdf (accessed june 8, 2005).

monmouth university. 2003. testing the wireless waters: a survey of potential users before the implementation of a wireless notebook computer lending program in an academic library. http://bluehawk.monmouth.edu/~hholden/wwl/wireless_survey_results.html (accessed june 8, 2005).

murray state university. 2002. library laptop computer usage survey results. www.murraystate.edu/msml/laptopsurv.htm (accessed june 8, 2005).

oddy, elizabeth carley. 2002. laptops for loan. library and information update 1, no. 4: 54–55.

vaughn, james b., and brett burnes. 2002. bringing them in and checking them out: laptop use in the modern academic library. information technology and libraries 21, no. 2: 52–62.

west, carol. 2005. librarians pleased with results of student survey. southern new hampshire university. www.snhu.edu/3174/asp (accessed june 8, 2005).

williams, joe. 2003. taming the wireless frontier: pdas, tablets, and laptops at home on the range. computers in libraries 23, no. 3: 10–12, 62–64.

from our readers: virtues and values in digital library architecture
mark cyzyk

mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect, library digital programs group, sheridan libraries, johns hopkins university in baltimore.

editor's note: "from our readers" will be an occasional feature, highlighting ital readers' letters and commentaries on timely issues.

at the fall 2007 coalition for networked information (cni) conference in washington, d.c., i presented "a survey and evaluation of open-source electronic publishing systems." toward the end of my presentation was a slide enumerating some of the things i had personally learned as a web application architect during my review of the systems under consideration:

■ platform independence should not be neglected.
■ one inherits the flaws of external libraries and frameworks. choose with care.
■ installation procedures must be simple and flawless.
■ don't wake the sysadmin with "slap a gui on that xml!"—and push application administration out, as much as possible, to select users.
■ documentation must be concise, complete, and comprehensive. "i can't guess what you're thinking."

initially, these were just notes i thought might be useful to others, figuring it's typically helpful to share experiences, especially at international conferences. but as i now look at those maxims, it occurs to me that when abstracted further they point in the direction of more general concepts and traits—concepts and traits that accurately describe us and the products of our labor if we are successful, and prescribe to us the concepts and traits we need to understand and adopt if we are not. in short, peering into each maxim, i can begin to make out some of the virtues and values that underlie, or should underlie, the design and architecture of our digital library systems.

■ freedom and equality

platform independence should not be neglected.
“even though this application is written in platformindependent php, the documentation says it must be run on either red hat or suse, or maybe it will run on solaris too, but we don’t have any of these here.” while i no doubt will be heartily flamed for suggesting that microsoft has done more to democratize computing than any other single company, i nevertheless feel the need to point out that, for many of us, windows server operating systems and our responsibility for administering them way back when provided the impetus for adding our swipe-card barcodes to the acl of the data center—surely a badge of membership in the club of enterprise it if ever there was one. you may not like the way windows does things. you may not like the way microsoft plays with the other boys. but to act like they don’t exist is nothing more than foolish burying one’s head in the *nix sand. windows servers have proven themselves time and again as being affordable, easily managed, dependable, and, yes, secure workhorses. windows is the ford pickup truck of the server world, and while that pickup will some day inevitably suffer a blowout of its twenty-year-old head gasket (and will therefore be respectfully relegated to that place where all dearly departed trucks go), it’s been a long and good run. we should recognize and appreciate this. windows clearly has a place in the data center, sitting quietly humming alongside its unix and linux brothers. i imagine that it actually takes some effort to produce platform-dependent applications using platform-independent languages and frameworks. such effort should be put toward other things. keep it pure. and by that i mean, keep it platform independent. freedom to choose and presumed equality among the server-side oses should reign. n responsibility and good sense one inherits the flaws of external libraries and frameworks. choose with care. so you’ve installed the os, you’ve installed and configured the specified web server, you’ve installed and configured the application platform, you’ve downloaded and compiled the source, yet there remains a long list of external libraries to install and configure. one by one you install them. suddenly, when you get to library number 16 you hit a snag. it won’t install. it requires a previous version of library number 7, and multiple versions of library number 7 can’t be installed at the same time on the same box. worse yet, as you take a break to read some more of the documentation, it sure looks like required library number 19 is dependent on the current version of library number 7 and won’t work with any previous version. and could it be that library number 21 is dependent on library number 20, which is dependent on library number 23, which is dependent on—yikes—library number 21? mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect, library digital programs group, sheridan libraries, johns hopkins university in baltimore. from our readers: virtues and values in digital library architecture | cyzyk 9 all things come full circle. but let’s suppose you’ve worked out all of these dependencies, you’ve figured out the single, secret order in which they must install, you’ve done it, and it looks like it’s working! yet, when you go to boot up the web service, suddenly there are errors all over the place, a fearsome crashing and burning that makes you want to go home and take a nap. something in your configuration is wrong? something in the way your configuration is interacting with an external library is wrong? 
you search the logs. you gather the relevant messages. they don't make a lot of sense. now what to do? you search the lists, you search the wikis to no avail, and finally, in desperation, you e-mail the developers. "but that's a problem with library x, not with our application." au contraire. i would like to strongly suggest a copernican revolution in how we think about such situations. while it's obvious that the developers of the libraries themselves are responsible for developing and maintaining them, i'd like to suggest that this does not relieve you, the developer of a system that relies on their software, from responsibility for its bugs and peculiar configuration problems. i'd like to suggest that, far from pushing responsibility in the case mentioned above out to the developers of the malfunctioning external library, you, in choosing that library in the first place, have now inherited responsibility for it. even if you don't believe in this notion of inheritance, if you would please at least act as if it were true, we'd all be in a better place. part of accepting this kind of responsibility is you then acting as a conduit through which we poor implementers learn the true nature of the problem and any solutions or temporary workarounds we may apply so that we can get your system up and running pronto. in the end, it's all about your system. your system as a whole is only as strong as the weakest link in its chain of dependencies.

■ simplicity and perfection

installation procedures must be simple and flawless.

it goes without saying that if we can't install your system we a fortiori can't adopt it for use in our organization. i remember once having such a difficult time trying to get a system up and running that i almost gave up. i tried first to get it running against apache 1.3, then against apache 2.0. i had multiple interactions with the developers. i banged my head against the wall of that system for days in frustration. the documentation was of little help. it seemed to be more part of an internal documentation project, a way for the developers to communicate among themselves, than to inform outsiders like me about their system. and related to this i remember driving to work during this time listening to a report on npr about the famous hopkins pediatric neurosurgeon, dr. ben carson. apparently, earlier in the week he had separated the brains of siamese twins and the twins were now doing fine, recuperating. the npr commentator marveled at the intricacy of the operation and at the fact that the whole thing took, i believe, five hours. "five hours? five hours?!" i exclaimed while barreling down the highway in my vintage 1988 ford ranger pickup (head gasket mostly sealed tight, no compression leakage). "i can't get this system at work installed in five days!" our goal as system architects needs to be that we provide to our users simple and flawless installation procedures so that our systems can, on average, be installed and configured in equal or less time than it takes to perform major brain surgery.1 "all in an afternoon" should become our motto. i am happy to find that there are useful and easy to use package managers, e.g., yum and synaptic, for doing such things on various linux distributions. windows has long had solid and sophisticated installation utilities. tomcat supports drop-in-place war files. when possible and appropriate, we need to use them.
■ justice and e-z livin

don't wake the sysadmin with "slap a gui on that xml!"—and push application administration out, as much as possible, to select users.

i remember reading plato's republic as an undergraduate and the feeling of being let down when the climax of the whole thing was a definition in which "justice" simply is each man serving his proper place in society and not transgressing the boundaries of his role. "that's it?" i thought. "so you have this rigidly hierarchical society and each person in it knows his role and knows in which slot his role fits—and keeping to this is 'justice'?" this may not be such a great way to structure a society, but now that i think about it, it's a great way to structure a computer application. sit down and carefully look at the functions your program will provide. then create a small set of user roles to which these functions will be carefully mapped. in the end you will have a hierarchical structure of roles and functions that should look perfectly simple and rational when drawn on a piece of paper. and while the superuser role should have power over all and access to all functions in the application, the list of functions that he alone has access to should be small; i.e., the actual work of the superuser should be minimized as much as possible by making sure that most functions are delegated to the members of other, appropriate, proper user roles. doing this happily results in what i call the state of e-z livin: the last thing you want is for users to constantly be calling you with data issues to fix. you therefore will model management of the data—all of it—and the configuration of the application itself—most of it—directly into the architecture of the application, provide users the guis they need to configure and manage things themselves, and push as much functionality as you can out to them where it belongs. let them click their respective ways to happiness and computing goodness. you build the tool, they use it, and you retire back to the land of e-z livin. users are assigned to their roles, and all roles are in their proper places. application architecture justice is achieved.

■ clarity and wholeness

documentation must be concise, complete, and comprehensive. "i can't guess what you're thinking."

as system developers we've probably all had the magical experience of a mind meld with a fellow developer when working intensively on a project. i have had this experience with two other developers, separately, at different stages of my career. (one of them, in fact, used to point out to everyone that, "between the two of us, we make one good developer!") this is a wonderful and magical and productive working relationship in which to be, and it needs to be recognized, supported, and exploited whenever it happens. you are lucky if you find yourself designing and developing a system and your counterpart is reading your mind and finishing your sentences. however, just as it's best to leave that nice young couple cuddling in the corner booth alone, so too it really doesn't make a lot of sense to expect the mind-melded developers to turn out anything that remotely resembles coherent and understandable documentation. those undergoing a mind meld by definition know perfectly well what they mean. to the rest of us it just feels like we missed a memo.
if you have the luxury, make sure that the one writing the documentation is not currently undergoing a mind meld with anyone else on the development team. scotty typically stayed behind while he beamed the others down. beam them down. be that scotty. you do the world a great service by staying behind on the ship and dutifully reporting, clearly and comprehensively, what's happening down on the red planet. to these five maxims, and their corresponding virtues, i would add one more set, one upon which the others rely:

■ empathy and graciousness

you are not your audience.

at least in applied computing fields like ours, we need to break with the long-held "guru in the basement" mentality. the actions of various managerial strata have now ostensibly acknowledged for us that technical expertise, especially in applied fields, is a commodity, i.e., it can be bought. a dearth of such expertise is remedied by simply applying money to the situation—admittedly difficult to do at the majority of institutions of higher education, but a common occurrence at the wealthiest. nevertheless, the dogmatic hold of the guru has been broken, and the magical aura that once draped her is not now so resplendent—her relative rarity, and the clubby superiority that depended upon it, has been diluted significantly by the sheer number of counterparts who can and will gleefully fill her function. we respect, value, and admire her; it's just that her stranglehold on things has (rightfully) been broken. and while nobody is truly indispensable, what is more difficult and rare to find is someone who has the guru's same level of technical chops coupled with a genuine empathic ability to relate to those who are the intended users of her systems and services. unless your systems and services are geared primarily toward other developers, programmers, and architects—and presumably they are not, nor, in the library world, should they be—your users will typically be significantly unlike you. let me repeat that: your users are not like you. rephrased: you are not your audience. when looking back over the other maxims, values, and virtues mentioned in this essay, then, the moral-psychological glue that binds them all is composed of empathy for our users—faculty, students, librarians, non-technical staff—and the graciousness to design and carry out a project plan in a spirit of openness, caring, flexibility, humility, respect, and collaboration. when empathy for the users of our systems is absent—and there are cases where you can actually see this in the design and documentation of the system itself—our systems will ultimately not be used. when the spirit of graciousness is broken, men become robots, mere rule followers, and users will boycott using their systems and will look elsewhere, naturally preferring to avoid playing the simon-says games so often demanded by tech folk in their workaday worlds; there is a reason the comic strip dilbert is so funny and rings so true. when confronted with a lack of empathy and graciousness on our part, the users who can boycott using our systems and services will boycott using our systems and services.
and we'll be left out in the rain, feeling like, as bonnie raitt once sadly sang, "i can't make you love me if you don't / i can't make your heart feel something it won't." empathy and graciousness, while not guaranteeing enthusiastic adoption of our systems and services, are a necessary precondition for users even countenancing participation. there are undoubtedly other virtues and values that can usefully be expounded in the context of digital library architecture—consistency, coherence, and elegance immediately come to mind—and i could go on and on analyzing the various maxims surrounding these that bubble up through the stack of consciousness during the course of the day. yet doing so would conflict with another virtue i think is key to the success and enjoyment of opinion-piece essays like this and maybe even of other sorts of publications and presentations: brevity.

note

1. a colleague of mine has since informed me that carson's operation took twenty-five hours, not five. nevertheless, my admonition here still holds. when installation and configuration of our systems are taking longer, significantly longer, than it takes to perform major brain surgery, surely there is something amiss?

communications

marc format simplification

d. kaye capen: university of alabama, university.

this is a summary of a paper written on the consideration of the feasibility as well as the benefits, disadvantages, and consequences of simplification of the marc formats for bibliographic records.1 the original paper was commissioned in june 1981 by the arl task force on bibliographic control as one facet in exploring the perceived high costs of cataloging and adhering to marc formats in arl libraries. the conclusions and recommendations, however, are entirely those of the author, and the opinions and judgments stated here result from a wide-ranging canvass of technical services people, computer people, and/or library administrators. because the marc format has so many uses, the paper is divided into five perspectives from which the marc format can be viewed: history, standards, and codes; present purposes; library operations; computer operations; and online catalogs. the library of congress has already begun a review of the marc format and has distributed a draft document.2 the general thrust of that review is a close examination of the marc format in an attempt to begin to lay the foundation on which revised marc formats can firmly stand, particularly in regard to content designation (tags, indicators, and subfield codes used to identify and characterize the data explicitly). as that review deals with the very specific, this paper aims generally at attempting to paint with broad strokes a picture of today's marc in its many relationships, benefits, costs, and what the impact would be to the whole from any change to the part.

perspective: marc history, standards, and codes

relationships

the original marc format document established conventions for encoding data for monographs. though it was understood that early applications were going to relate to the production of catalog cards, the marc designers looked ahead to an increasing emphasis on data retrieval applications. other design considerations included, for example, the necessity for providing for complex computer filing, allowance for a variety of data processing equipment, and an attempt to provide for some analytical work (more specific description of contents notes or other types of analysis).
later the single marc ii format was transformed into a series of formats, and as time passed, those formats became inextricably tied to other developments at the national and international levels: the international standard bibliographic descriptions, the anglo-american cataloguing rules, 2d ed., unimarc, the national level bibliographic records, and the national and international communications standards, e.g., ansi z39.2-1979 and iso 2709.

benefits

the benefits of the marc formats and other standards and codes have been substantial both philosophically and pragmatically. the sharing of cataloging records through the computer-based, online networks has been shown in a variety of cost studies to have contained the rate of rise of per unit cost. a further benefit of the marc formats is the momentum its creation gave to the steady movement toward standardization, which can benefit individual libraries in a number of ways: first, bibliographic information can be exchanged among libraries and countries. second, in recent years we have moved steadily toward creating an environment in which the library of congress would become one of many authoritative libraries, thus enhancing the shareability of records.

costs

the early costs of the development and implementation of the marc formats were borne by lc (aided by council on library resources funds). lc continues to bear most of the costs of marc formats, such as new marbi proposals, duplication and distribution of documentation, and so forth. direct investment of library dollars came through the purchase of the marc tapes and the development of systems to receive, process, and output data in marc formats.

impact of change

throughout the years of its use, the marc format content designation and content rules have been augmented or modified. in the beginning, however, databases were small and changes could be absorbed more readily. the number and complexity of the formats have increased, as have the interrelationships of the marc formats with other standards and codes, resulting in a present environment in which the impact of change is felt more strenuously.

perspective: present relationships and constraints

relationships

today's close interrelationships between the marc formats and other codes and standards affect both library and computer operations. though, for example, the general international standard bibliographic description was implemented by the library community prior to the adoption of aacr2, the second edition of the rules has firmly incorporated the isbds. when this format description system is combined with the machine-based marc formats, some isbd information will be supplied by humans and some generated by programmed machine manipulations. as a second example, in the last couple of years, the library of congress has spearheaded the development of national level bibliographic record(s) which define the specific data elements that should be included by any organization creating cataloging records which may also be shared with other organizations or be acceptable for contribution to a national database. as the logical idea of a national database comes to fruition, it is necessary for the marc format to provide for greater specificity in the coding of originating library, modifying library, and so forth.

benefits

the benefits of the use of the marc format continue to lie in the ease with which bibliographic information can be shared and the concomitant beneficial impact on cost control.
in addition, the marc format supports a host of other standards and codes, and the benefit from these relationships has been consistency in and fostering of standards development. in the bibliographic arena, the more that standards are developed-locally, regionally, nationally, and internationally-the more we will be able to transmit and share bibliographic data, thus controlling the costs of original cataloging. on the other hand, we also "pay" when we standardize.

cost

the two costs associated with increased standardization are additional time and thus cost required to meet standards, and the increased expense of maintaining local practices which may often be idiosyncratic. in relation to the latter, while many local idiosyncrasies are often unnecessary and counterproductive, there are generally some which have become an integral part of a large catalog database or upon which a major procedural activity is based. but, to benefit from compliance with standards, increasingly we will move away from local practices. in terms of the time required to adhere to the marc format, it is possible to continue to utilize the format (or participate in systems that use it) and yet control the amount of complexity with which one has to deal. both aacr2 and national level bibliographic record documents allow for "levels of description" which provide for more or less description; and various online networks allow, in a similar manner, for limited input standards. as we view the array of standards and codes which together make up today's bibliographic scene, we can see that each of the separate elements is consistent within itself, is understandable, and counts for only a portion of the costs associated with the cataloging process. the combination of elements, however, begins an accretion of complexity that for most requires an effort of organization and education in order to control work flow and meet standards.

impact of change

because the marc format is closely interwoven with a number of national and international codes and standards, changes to the format would have implications far beyond the local library. at the very least, discussions would have to involve a host of individuals and groups, all at different stages of development and implementation based upon the present marc format.

perspective: library operations

relationships

in the library-operations perspective, any operations related to the marc format have to be viewed as only one of many elements which must be interfaced with daily work flow. let us look, for example, at the amount of time which might be expended in a typical large academic library by cataloging personnel in training and ongoing work activities required in marc-related operations. in those libraries which obtain access to cataloging databases as members of networks, contact with the marc format is filtered through the standards, requirements, marc implementation design, documentation and other related training facilities of the network. libraries which maintain their own databases do the same kind of filtering, though staff may have somewhat more control of the user cordiality of the interface. the shared networking environment, however, generally seems to imply more standards and requirements because of the attempt to guarantee as much "shareability" as possible.
libraries participating in oclc, for example, must train staff in the following codes: aacr1; aacr2; standard subject heading codes; standard classification codes; oclc/marc formats for each type of material being cataloged; oclc bibliographic input standards; oclc level i and level k input standards; oclc systems users guides; in some instances, input standards documents for regional or special-interest cooperatives; local library interpretations, procedures, and standards. any close review of the time library staff expend in the use of these tools for either training or ongoing operations reveals that marc per se requires only a limited proportion of a typical library staff person's day. while training may be intensive at either the beginning of a person's job or at the beginning of work with a new type/format of material, this portion of the cataloging unit cost is small.

benefits, costs

in the cataloging activity, the benefits from the use of the marc formats are at least two: first, the marc format as part of an online cataloging system permits the machine-production of catalog cards at a major savings over manual production. second, access to a shared cataloging database permits the use of "clerical" catalogers at an estimated unit cost saving per book of twenty dollars when compared to "original" cataloging.3 third, depending upon the information available in the cataloging record, the time required for decision making during the cataloging process can be decreased significantly.

impact of change

it was the general consensus of the technical services people i contacted that simplification of the formats through the consistent assignment of tags would make training and introduction to new formats somewhat easier, but that any savings of time would probably be trivial. there was no consensus that either simplification or shortening would result in any significant time or cost savings. to a certain extent, the use of the very specific marc formats has made the descriptive cataloging process (and the training to undertake it) clearer in that the logical relationships and description of the data elements are so clearly exposed through the assignment of tags and other codes. also, once initial familiarity with the format(s) is achieved, ongoing use becomes second nature. it is also possible for cataloging staff to control the complexity with which they will deal through the use of less than "full," but still nationally acceptable, levels of cataloging and, hence, marc coding. finally, most technical services people believe that cataloging and maintenance activities in libraries have always been complex, requiring long and detailed procedures and intricate work flow. while membership in networks requires new skills and knowledge, it is the sum of the whole rather than the difficulty of any single portion which affects unit costs today. changing the marc format through either simplification or shortening would have only a slight effect on the total technical services operation and costs.

perspective: the computer operations environment

relationships

in looking at computer operations, there are at least two major subdivisions: operations that serve only one client (e.g., a library system serving itself) or operations that serve many clients (e.g., rlin or blackwell/north america). the constraints differ for each operation and are further complicated by whether or not the computer operation must be able to produce as well as accept bibliographic records in a marc format.
each computer facility, for example, can have distinct operating software depending upon the type and mix of computing equipment used. in addition, each computing facility translates the marc-formatted records into an internal processing format which may differ extensively from marc. too, further tailoring may be done for batch processing as opposed to online operations, and computer operations which serve a single user may not have to re-create records in the marc format and may even more radically redesign the marc-formatted records for internal use. as changes to the marc format occur over the years, each computer system will write additional software to incorporate those changes into the then existing system. in some instances, it may be too difficult to attempt to convert old databases to reflect changes in marc coding, and there will then exist an "old" database and a "new" database for that particular marc field or subfield. since changes have occurred in many fields, most databases are an amalgam of new and old interpretations (this is true in relation to cataloging codes, too) of marc coding, and original internal software design may reflect the same type of patchwork quilt. operating these computer systems is complicated, in addition, by the fact that a wide range of user library needs and desires must be accommodated. indeed, a report prepared by hank epstein for the conference to explore machine-readable bibliographic interchange (cembi) revealed after an exhaustive review of the use of marc data elements that there was no data element not used by someone!4

benefits

benefits that accrue to computing operations as a result of the marc format include the use of what was called "a pretty decent general communications format," which facilitates communications, card/com production, and online information retrieval. as a communications format it is as coherent as any other structure for carrying bibliographic data. because the format allows for a very specific level of detail in description, computing operations can supply a variety of products to fill a variety of needs.

costs

while specific cost information was not available for inclusion in this paper, discussion does reveal some widely held generalizations. first, the marc format does not seem to be any more complex or costly to use than other variable field communications formats. beginning programmers are generally introduced first to the internal communications format of their particular computing system, and when they come to the marc tags they rapidly become familiar with the coding through experience. indeed, if the programmers know the structure of and have a specification for the format, they can work with that format even though they may be unfamiliar with it from the users' point of view. thus, the format itself, and training in its use, does not seem to be significantly costly. second, every change in the marc format requires some programming effort and may or may not require concomitant changes in the database. the consensus of the computer people with whom i spoke was that the sophistication and specificity of the marc formats was a good thing, but the inconsistencies among formats are problematical. the benefits of consistency can be important, but to justify changes financially, the major changes should be done at one time.
indeed, most individuals doubted whether or not there was sufficient capital in these straitened times to be able to implement consistently a major marc format change, and this is from the perspective of both the operations serving one and many users.

impact of change

without a philosophical and practical framework (or benchmark) against which to compare the benefits and costs of alternative solutions to marc format maintenance issues, and without a better and more comprehensive description of the requirements of the internal processing formats of the computer operations, it is difficult to assess clearly the costs and benefits of marc format changes. it does seem to be the case presently that, once established, computer operations can deal with the complexity and specificity of the marc format without undue ongoing financial investment. the strength of the marc format for computer operations lies in its specificity. for the batch processing environment especially, the marc format is a reasonably efficient format and one that facilitates development. its inefficiencies are not drastic and its specificity buys valuable flexibility. severe cuts or major simplifications would be a mistake since discontinuing specificity is a one-way street-once it is gone, it cannot be retrieved. the ability of the machine to assist in editing is weakened by the loss of specificity and it then becomes more difficult to edit out poor data. simplification through consistency, rather than shortening, would produce the most beneficial impact-though it must be done carefully to be cost beneficial.

perspective: online catalogs

relationship

the major difficulties facing us when we attempt to discuss the relationship of the marc format to online catalogs are that, first, we know so little about how people think when they use our card catalogs; and, second, we have so little experience with how those thought and use patterns might change when the online catalog replaces the card catalog. another aspect of online library system development is the combination of subsystems such as acquisitions, serials control, or authority control with the online catalog and the implications of such a combination for system design, the internal processing format, and compatibility with the marc format. the index design of most large online catalogs or information retrieval systems today relies upon precoordinated search keys in order to facilitate the large sorting activities that have to occur. the second indicator in the 700 field, for example, is designed for the purpose of formulating search keys, filing added entries, or for selecting alternative secondary added entries. this type of specificity is necessary for both card production and online retrieval. taken together, all of these considerations make most systems and library technical people hesitate to recommend any major changes to the marc format at this time.

benefits

at this time, therefore, in terms of information retrieval, there does not seem to be any major force toward either simplifying or shortening the marc format to facilitate retrieval. this becomes an even more cogent sentiment when we consider that major development efforts have already been begun in the areas of online catalog access and information retrieval. delays in these development efforts now caused by changes in the marc formats could be enormously wasteful of the time and effort already invested, and could postpone urgently needed implementation of new, easily maintainable online systems.
costs

there is no firm cost data to guide us in considering the impact of marc format changes in the information retrieval environment. generally accepted assumptions are, however, that because of our lack of knowledge and experience in this area, it is simply too risky and potentially costly to experiment.

impact of change

overall, without more experience in this area, it is the general opinion that the fullest level of descriptive specificity of the marc format might be required to design and implement online catalogs/information retrieval systems which can be responsive to the needs of a variety of users and levels of information. interaction with other subsystems and formats is also incomplete, thus clouding our vision of the impact of change over the breadth of the library community.

summary and conclusions

the original purpose of the marc format is still a cogent and necessary one-that of allowing for a great variety of individual library needs for products, practices, and policies via a standardizing communications format. both catalog card production and online retrieval necessitate the same level of specificity, though particular tags, indicators, and subfield codes may vary. as we look toward a variety of authoritative cataloging sources, the marc format, in addition to a specific coding of bibliographic information, might also have to specify descriptions of cataloging actions so that the greatest degree of "shareability" might exist. some of this related authority-type information will either be carried as part of the marc format or in some manner as linked records. the computer operations that utilize the marc formats exist under the constraints of a variety of internal processing formats and design constraints. for each internal processing system, however, the specificity of the marc format offers flexibility and efficiency for a number of different processes and products. taken by itself, the marc format is no more difficult to work with than any other standard or technique for both librarians and computer people. while it might be useful for librarians to implement training aids such as online documentation, access to library manuals (particularly that of the library of congress), and so forth, the benefits of aids such as these are trivial since the coding can be learned rather quickly through experience. for computing people, on the other hand, changes in the formats can be very expensive and disruptive. there is general agreement, moreover, that over the long term we have got to be able to maintain the marc format in response to experience with retrieval and other theoretical and technical advances. the main thrust of maintenance in the computing realm is consistency across formats, but approaching this type of simplification requires a number of preliminary steps if it is to be implemented effectively. we need to develop a vocabulary for jointly discussing the elements of the problem. in addition, a major review needs to be undertaken of the internal processing formats and design constraints of the major computer operations-both to serve as a benchmark for measuring the impact of format changes, and as a guideline for newly developing systems to assist in avoiding mistakes in the development of new computer operations. someone needs to be thinking about and designing the ultimate, comprehensive marc format-not to be implemented, but to serve as a springboard for discussion and for consideration of system design.
we need to establish limitations on what we will handle with the marc formats and where we will begin to rely on underlying formats instead. the development of a comprehensive marc conceptualization would also provide a protocol for undertaking the improvement of marc and would serve as a benchmark against which local systems could be compared. at the very least, the steps described here would facilitate the consideration and implementation of making the formats consistent across types of material, a goal which is seen by all to be highly desirable. we need a format which is consistent, easily maintainable without being uncontrollably disruptive, and responsive to changing needs which are likely to accelerate as we gain experience with online systems. rather than recommending or supporting the implementation of specific changes to the marc format, it is essential that the library community begin to establish the framework and benchmarks necessary to maintain the marc formats over the long term as well as to guide short-term considerations. arl and others can play an important role in undertaking and encouraging a broader approach to this pressing problem. such an approach will not only reduce the risk of decision making, but will also assist in the development of the cost/benefit data needed to enhance consideration of format changes.

references

1. d. kaye capen, simplification of the marc format: feasibility, benefits, disadvantages, consequences (washington, d.c.: association of research libraries, 1981), 22p.
2. "principles of marc format content designation," draft (washington, d.c.: library of congress, 1981), 66p.
3. ichiko t. morita and d. kaye capen, "a cost analysis of the ohio college library center on-line shared cataloging system in the ohio state university libraries," library resources & technical services 21:286-302 (summer 1977).
4. council on library resources bibliographic interchange committee, bibliographic interchange report, no. 1 (washington, d.c.: the council, 1981).

comparing fiche and film: a test of speed

terence crowley: division of library science, san jose state university, san jose, california.

introduction

for more than a decade librarians have been responding to budget pressures by altering the format of their library catalogs from labor-intensive card formats to computer-produced book and microformats. studies at bath,1 toronto,2 texas,3 eugene,4 los angeles,5 and berkeley6 have compared the forms of catalogs in a variety of ways ranging from broad-scale user surveys to circumscribed estimates of the speed of searching and the incidence of queuing. the american library association published a state-of-the-art report7 as well as a guide to commercial computer-output microfilm (com) catalogs pragmatically subtitled how to choose; when to buy.8 in general, com catalogs are shown to be more economical and faster to produce and to keep current, to require less space, and to be suitable for distribution to multiple locations. primary disadvantages cited are hardware malfunctions, increased need for patron instruction, user resistance (particularly due to eyestrain), and some machine queuing. the most common types of library com catalogs today are motorized reel microfilm and microfiche, each with advantages and disadvantages.
microfilm offers file-sequence integrity and thus is less subject to user abuse, i.e., theft, misfiling, and damage; in motorized readers with "captive" reels it is said to be easier to use. disadvantages include substantially greater initial cost for motorized readers; limits on the capacity of captive reels necessitating multiple units for large files; inexact indexing in the most widespread commercial reader; and eyestrain resulting from high speed film movement. microfiche offers a more nearly random retrieval, much less expensive and more versatile readers, and unlimited file size. conversely, the file integrity of fiche is lower and the need for patron assistance in use of machines is said to be greater than for self-contained motorized film readers.

the problem

one of the important considerations not fully researched is that of speed of searching. the toronto study included a self-timed "look-up" test of thirty-two items "not in alphabetical order" given to thirty-six volunteers, of whom thirty finished the test. the researchers found the results "inconclusive" but noted that seven of the ten librarians found film searching the fastest method. "average" time reported for searching in card catalogs was 37.3 minutes.

smartphones: a potential discovery tool

wendy starkweather and eva stowers

wendy starkweather (wendy.starkweather@unlv.edu) is director, user services division, and eva stowers (eva.stowers@unlv.edu) is medical/health sciences librarian at the university of nevada las vegas libraries.

the anticipated wide adoption of smartphones by researchers is viewed by the authors as a basis for developing mobile-based services. in response to the unlv libraries' strategic plan's focus on experimentation and outreach, the authors investigate the current and potential role of smartphones as a valuable discovery tool for library users.

when the dean of libraries announced a discovery mini-conference at the university of nevada las vegas libraries to be held in spring 2009, we saw the opportunity to investigate the potential use of smartphones as a means of getting information and services to students. being enthusiastic users of apple's iphone, we and the web technical support manager developed a presentation highlighting the iphone's potential value in an academic library setting. because wendy is unlv libraries' director of user services, she was interested in the applicability of smartphones as a tool for users to more easily discover the libraries' resources and services. eva, as the health sciences librarian, was aware of a long tradition of pda use by medical professionals. indeed, first-year bachelor of science nursing students are required to purchase a pda bundled with select software. together we were drawn to the student-outreach possibilities inherent in new smartphone applications such as twitter, facebook, and myspace.

n presentation

our brief review of the news and literature about mobile phones in general provided some interesting findings and served as a backdrop for our presentation:

n a total of 77 percent of internet experts agreed that the mobile phone would be "the primary connection tool" for most people in the world by 2020.1 the number of smartphone users is expected to top 100 million by 2013.
there are currently 25 million smartphone users, with sales in north america having grown 69 percent in 2008.2

n smartphones offer a combination of technologies, including gps tracking, digital cameras, and digital music, as well as more than fifty thousand specialized apps for the iphone and new ones being designed for the blackberry and the palm pre.3 the palm pre offered less than twenty applications at its launch, but one million application downloads had been performed by june 24, 2009, less than a month after launch.4

n the 2009 horizon report predicts that the time to adoption of these mobile devices in the educational context will be "one year or less."5

data gathered from campus users also was presented, providing another context. in march 2009, a survey of university of california, davis (uc-davis) students showed that 43 percent owned a smartphone.6 uc-davis is participating in apple's university education forum. here at unlv, 37 percent of students and 26 percent of faculty and staff own a smartphone.7 the presentation itself highlighted the mobile applications that were being developed in several libraries to enhance student research, provide library instruction, and promote library services. two examples were abilene christian university (http://www.acu.edu/technology/mobilelearning/index.html), which in fall 2008 distributed iphones and ipod touches to the incoming freshman class; and stanford university (http://www.stanford.edu/services/wirelessdevice/iphone/), which participates in "itunes u" (http://itunes.stanford.edu/). if the libraries were to move forward with smartphone technologies, it would be following the lead of such universities. readers also may be interested in joan lippincott's recent concise summary of the implications of mobile technologies for academic libraries as well as the chapter on library mobile initiatives in the july 2008 library technology report.8

n goals: a balancing act

ultimately the goal for many of these efforts is to be where the users are. this aspiration is spelled out in unlv libraries' new strategic plan relating to infrastructure evolution, namely, "work towards an interface and system architecture that incorporates our resources, internal and external, and allows the user to access from their preferred starting point."9 while such a goal is laudable and fits very well into the discovery emphasis of the mini-conference presentation, we are well aware of the need for further investigation before proceeding directly to full-scale development of a complete suite of mobile services for our users. of critical importance is ascertaining where our users are and determining whether they want us to be there and in what capacity. the value of this effort is demonstrated in booth's research report on student interest in emerging technologies at ohio state university. the report includes the results of an extensive environmental survey of their library users.
the study is part of ohio state's effort to actualize their culture of assessment and continuous learning and to use "extant local knowledge of user populations and library goals" to inform "homegrown studies to illuminate contextual nuance and character, customization that can be difficult to achieve when using externally developed survey instruments."10 unlv libraries are attempting to balance early experimentation and more extensive data-driven decision-making. the recently adopted strategic plan includes specific directions associated with both efforts. for experimentation, the direction states, "encourage staff to experiment with, explore, and share innovative and creative applications of technology."11 to that end, we have begun working with our colleagues to introduce easy, small-scale efforts designed to test the waters of mobile technology use through small pilot projects. "text-a-librarian" has been added to our existing group of virtual reference service, and we introduced a "text the call number and record" service to our library's opac in july 2009. unlv libraries' strategic plan helps foster the healthy balance by directing library staff to "emphasize data collection and other evidence based approaches needed to assess efficiency and effectiveness of multiple modes and formats of access/ownership" and "collaborate to educate faculty and others regarding ways to incorporate library collections and services into education experiences for students."12 action items associated with these directions will help the libraries learn and apply information specific to their users as the libraries further adopt and integrate mobile technologies into their services. as we begin our planning in earnest, we look forward to our own set of valuable discoveries.

references

1. janna anderson and lee rainie, the future of the internet iii, pew internet & american life project, http://www.pewinternet.org/~/media//files/reports/2008/pip_futureinternet3.pdf (accessed july 20, 2009).
2. sam churchill, "smartphone users: 110m by 2013," blog entry, mar. 24, 2009, dailywireless.org, http://www.dailywireless.org/2009/03/24/smartphone-users-100m-by-2013 (accessed july 20, 2009).
3. mg siegler, "state of the iphone ecosystem: 40 million devices and 50,000 apps," blog entry, june 8, 2009, tech crunch, http://www.techcrunch.com/2009/06/08/40-million-iphones-and-ipod-touches-and-50000-apps (accessed july 20, 2009).
4. jenna wortham, "palm app catalog hits a million downloads," blog entry, june 24, 2009, new york times technology, http://bits.blogs.nytimes.com/2009/06/24/palm-app-catalog-hits-a-million-downloads (accessed july 20, 2009).
5. larry johnson, alan levine, and rachel smith, horizon report, 2009 edition (austin, tex.: the new media consortium, 2009), http://www.nmc.org/pdf/2009-horizon-report.pdf (accessed july 20, 2009).
6. university of california, davis, "more than 40% of campus students own smartphones, yearly tech survey says," technews, http://technews.ucdavis.edu/news2.cfm?id=1752 (accessed july 20, 2009).
7. university of nevada las vegas, office of information technology, "student technology survey report: 2008–2009," http://oit.unlv.edu/sites/default/files/survey/surveyresults2008_students3_27_09.pdf (accessed july 20, 2009).
8. joan lippincott, "mobile technologies, mobile users: implications for academic libraries," arl bi-monthly report 261 (dec. 2008), http://www.arl.org/bm~doc/arl-br-261-mobile.pdf
(accessed july 20, 2009); ellyssa kroski, "library mobile initiatives," library technology reports 44, no. 5 (july 2008): 33–38.
9. "unlv libraries strategic plan 2009–2011," http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 20, 2009): 2.
10. char booth, informing innovation: tracking student interest in emerging library technologies at ohio university (chicago: association of college and research libraries, 2009), http://www.ala.org/ala/mgrps/divs/acrl/publications/digital/ii-booth.pdf (accessed july 20, 2009); "unlv libraries strategic plan 2009–2011," 6.
11. "unlv libraries strategic plan 2009–2011," 2.
12. ibid.

assignfast: an autosuggest-based tool for fast subject assignment

rick bennett, edward t. o'neill, and kerre kammerer

information technology and libraries | march 2014

abstract

subject assignment is really a three-phase task. the first phase is intellectual—reviewing the material and determining its topic. the second phase is more mechanical—identifying the correct subject heading(s). the final phase is retyping or cutting and pasting the heading(s) into the cataloging interface along with any diacritics, and potentially correcting formatting and subfield coding. if authority control is available in the interface, some of these tasks may be automated or partially automated. a cataloger with a reasonable knowledge of faceted application of subject terminology (fast)1,2 or even library of congress subject headings (lcsh)3 can quickly get to the proper heading but usually needs to confirm the final details—was it plural? am i thinking of an alternate form? is it inverted? etc. this often requires consulting the full authority file interface. assignfast is a web service that consolidates the entire second phase of the manual process of subject assignment for fast subjects into a single step based on autosuggest technology.

background

faceted application of subject terminology (fast) subject headings were derived from the library of congress subject headings (lcsh) with the goal of making the schema easier to understand, control, apply, and use while maintaining the rich vocabulary of the source. the intent was to develop a simplified subject heading schema that could be assigned and used by nonprofessional catalogers or indexers. faceting makes the task of subject assignment easier. without the complex rules for combining the separate subdivisions to form an lcsh heading, only the selection of the proper heading is necessary. the now-familiar autosuggest4,5 technology is used in web search and other text entry applications to help the user enter data by displaying and allowing the selection of the desired text before typing is complete. this helps with error correction, spelling, and identification of commonly used terminology. prior discussions of autosuggest functionality in library systems have focused primarily on discovery rather than on cataloging.6-11

rick bennett (rick_bennett@oclc.org) is a consulting software engineer in oclc research, edward t. o'neill (oneill@oclc.org) is a senior research scientist at oclc research and project manager for fast, and kerre kammerer (kammerer@oclc.org) is a consulting software engineer in oclc research, dublin, ohio.
the literature often uses synonyms for autosuggest, such as autocomplete or type-ahead. since assignfast can lead to terms that are not being typed, autosuggest seems most appropriate and will be used here. the assignfast web service combines the simplified subject choice capabilities of fast with the text selection features of autosuggest technology to create an in-interface subject assignment tool. much of a full featured search interface for the fast authorities, such as searchfast,12 can be integrated into the subject entry field of a cataloging interface. this eliminates the need to switch screens, cut and paste, and make control character changes that may differ between the authority search interface and the cataloging interface. as a web service, assignfast can be added to existing cataloging interfaces. in this paper, the actual operation of assignfast is described, followed by how the assignfast web service is connected to an interface, and finally by a description of the web service construction.

assignfast operation

an authority record contains the established heading, see headings, and control numbers that may be used for linking or other future reference. the relevant fields of the fast record for motion pictures are shown here:

control number: fst01027285
established heading: motion pictures
see: cinema
see: feature films--history and criticism
see: films
see: movies
see: moving-pictures

in fast, the facet of each heading is known. motion pictures is a topical heading. the see references are unauthorized forms of the established heading. if someone intended to enter cinema as a subject heading, they would be directed to use the established heading motion pictures. for a typical workflow, the subject cataloger would need to leave the cataloging interface, search for "cinema" in an authority file interface, find that the established heading was motion pictures, and return to the cataloging interface to enter the established heading. the figure below shows the same process when assignfast is integrated into the cataloging interface. without leaving the cataloging interface, typing only "cine" shows both the see term that was initially intended and the established heading in a selection list.

figure 1. assignfast typical selection choices.

selecting "cinema use motion pictures" enters the established term, and the entry process is complete for that subject.

figure 2. assignfast selection result.

the text above the entry box provides the fast id number and facet type. as a web service, assignfast headings can be manipulated by the cataloging interface software after selection and before they are entered into the box. for example, one option available in the assignfast demo is marcbreaker format.13 marcbreaker combines marc field tagging and allows diacritics to be entered using only ascii characters. using marcbreaker output, assignfast returns the following for " ":

=651 7$abrazil$zs{tilde}ao paulo$0(ocolc)fst01205761$2fast

in this case, the output includes marc tagging of 651 (geographic), as well as subfield coding ($z) that identifies the city within brazil, that it's a fast heading, and the fast control number.
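as a rough illustration of the kind of post-selection manipulation just described, the sketch below shows how an interface might turn an assignfast suggestion object into a marcbreaker-style field string. this is not the article's own code: the property names (idroot, tag, indicator, raw, breaker, auth) follow the web service response described later, but the authority-tag-to-bibliographic-tag mapping and the blanket use of $x for subdivisions are simplifying assumptions made only for illustration.

// hypothetical sketch: turn an assignfast suggestion into a marcbreaker-style field.
// property names follow the fastsuggest response; the tag mapping and the generic
// use of $x for subdivisions are assumptions, not oclc's implementation.
function toMarcBreakerField(item) {
  var bibTagByAuthTag = { 100: '600', 110: '610', 111: '611',
                          130: '630', 150: '650', 151: '651', 155: '655' };
  var bibTag = bibTagByAuthTag[item.tag] || '650';
  // prefer the marcbreaker form (ascii-escaped diacritics), then the raw subfielded
  // form, then the plain display form.
  var heading = item.breaker || item.raw || item.auth;
  // the display form separates subdivisions with "--"; convert those to $x markers.
  var body = '$a' + heading.replace(/--/g, '$x');
  return '=' + bibTag + '  7' + body + '$0(OCoLC)' + item.idroot + '$2fast';
}

// example: the topical record shown above would yield
// "=650  7$amotion pictures$0(OCoLC)fst01027285$2fast"
// toMarcBreakerField({ tag: 150, idroot: 'fst01027285',
//   auth: 'motion pictures', raw: '', breaker: '', indicator: ' ' });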
the information is available in the assignfast result to fill one or multiple input boxes and to reformat as needed for the particular cataloging interface.

addition to web browser interfaces

as a web service, assignfast could be added to any web-connected interface. a simple example is given here to add assignfast functionality to a web browser interface using javascript and jquery (http://jquery.com). these technologies are commonly used, and other implementation technologies would be similar. example files for this demo can be found on the oclc developers network under assignfast.14 the example uses the jquery.autocomplete function.15 first, the script packages jquery.js, jqueryui.js, and the style sheet jquery-ui.css are required. version 1.5.2 of jquery and version 1.8.7 of jquery-ui were used for this example, but other compatible versions should be fine. these are added to the html in script and link tags. the second modification to the cataloging interface is to surround the existing subject search input box with a set of div tags.
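the markup itself is not reproduced in the text, so the following is only a hypothetical sketch of what the two changes just described might look like. the element ids (existingbox, extrainformation) are taken from the setuppage() function shown next; the file paths, the wrapper class, and the assignfastcomplete.js include are assumptions standing in for the demo's actual files.

<!-- hypothetical sketch, not the published demo files -->
<link rel="stylesheet" href="jquery-ui.css">
<script src="jquery.js"></script>
<script src="jqueryui.js"></script>
<script src="assignfastcomplete.js"></script>  <!-- autosubjectexample, formatsuggest, etc. -->

<!-- the existing subject input box, wrapped in div tags as described above -->
<div class="subject-entry">
  <label for="existingbox">subject heading:</label>
  <input id="existingbox" type="text" size="60">
  <div id="extrainformation"></div>
</div>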
the final modification is to add javascript to connect the assignfast web service to the search input box. this function should be called when the page is set up:

function setuppage() {
  // connect the autosubject to the input areas
  jquery('#existingbox').autocomplete({
    source: autosubjectexample,
    minlength: 1,
    select: function(event, ui) {
      jquery('#extrainformation').html("fast id " + ui.item.idroot +
        " facet " + gettypefromtag(ui.item.tag) + "");
    } //end select
  }).data("autocomplete")._renderitem = function(ul, item) {
    formatsuggest(ul, item);
  };
} //end setuppage()

the source: autosubjectexample tells the autocomplete function to get the data from the autosubjectexample function, which in turn calls the assignfast web service. this is in the assignfastcomplete.js file. in the select: function, the extrainformation text is rewritten with additional information returned with the selected heading. in this case, the fast number and facet are displayed. the generic _renderitem of the jquery.autocomplete function is overwritten by the formatsuggest function (found in assignfastcomplete.js) to create a display that differentiates the see from the authorized headings that are returned in the search. the version used for this example shows "see heading use authorized heading" when a see heading is returned, or simply the authorized heading otherwise.

web service construction

the autosuggest service for a fast heading was constructed a little differently than the typical autosuggest. for a typical autosuggest for the term motion picture from the example given above, you would index just that term. as the term was typed, motion picture and other terms starting with the text entered so far would be shown until you resolved the desired heading. for example, typing in "m t" might give:

motion pictures
motion picture music
employee motivation
diesel motor
mothers and daughters

for the typical autosuggest, the term indexed is the term displayed and is the term returned when selected. for assignfast, both the established and see references are indexed. however, when typing resolves a see heading, both the see heading and its established heading are displayed. only the established heading is selected, even if you are typing the see heading. for assignfast, the "m t" result now becomes:

features (motion pictures) use feature films
motion pictures
motorcars (automobiles) use automobiles
motion picture music
background music for motion pictures use motion picture music
motion pictures for the hearing impaired use films for the hearing impaired
documentaries, motion picture use documentary films
mother of god use mary, blessed virgin, saint

the headings in assignfast are ranked by how often they are used in worldcat, so headings that are more common appear at the top. to place the established heading above the see heading when they are similar, the established heading is also ranked higher than the see for the same usage. assignfast can also be searched by facet, so if only topical or geographic headings are desired, only headings from these facets will be displayed. the web service uses a solr16 search engine running under tomcat.17 this provides full text search and many options for cleaning and manipulating the terms within the index.
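the autosubjectexample source function itself lives in the demo's assignfastcomplete.js and is not reproduced in the text, so the following is only a guess at its general shape: a jquery.ui.autocomplete "source" callback that sends a jsonp request to the fastsuggest service described below and maps the returned docs into menu items. the parameter values and property names mirror the request and response descriptions that follow; everything else is an assumption.

// hypothetical sketch of a "source" callback for jquery.ui.autocomplete;
// the real assignfastcomplete.js may differ.
function autosubjectexample(request, response) {
  jQuery.ajax({
    url: 'http://fast.oclc.org/searchfast/fastsuggest',
    dataType: 'jsonp',                      // the service supports a jsonp callback parameter
    data: {
      query: request.term,                  // the text typed so far
      queryIndex: 'suggestall',             // search all facets
      queryReturn: 'suggestall,idroot,auth,tag,type',
      suggest: 'autosubject',
      rows: 20
    },
    success: function(data) {
      response(jQuery.map(data.response.docs, function(doc) {
        return {
          label: doc.auth,                  // authorized heading shown in the menu
          value: doc.auth,                  // authorized heading entered on selection
          idroot: doc.idroot,               // fast control number
          tag: doc.tag,                     // marc authority tag (100, 150, 151, ...)
          type: doc.type,                   // "auth" or "alt" (see reference matched)
          matched: doc.suggestall           // the indexed form that actually matched
        };
      }));
    }
  });
}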
the particular option used for assignfast is the edgengramfilter.18 this option is used for autosuggest and has each word indexed one letter at a time, building to its entire length. the index for "cinema" would then contain "c," "ci," "cin," "cine," "cinem," and "cinema." solr handles utf-8 encoded unicode for both input and output. the assignfast indexes and queries are normalized using fast normalization19 to remove punctuation, diacritics, and capitalization. fast normalization is very similar to naco normalization, although in fast normalization the subfield indicator is replaced by a space and no commas retained.

assignfast is accessed using a rest request.20 rest requests consist of urls that can be invoked via either http post or get methods, either programmatically or via a web browser.

http://fast.oclc.org/searchfast/fastsuggest?&query=[query]&queryindex=[queryindex]&queryreturn=[queryreturn]&suggest=autosuggest&rows=[numrows]&callback=[callbackfunction]

where:

query: the query to search.
queryindex: the index corresponding to the fast facet. these include suggestall (all facets), suggest00 (personal names), suggest10 (corporate names), suggest11 (events), suggest30 (uniform titles), suggest50 (topicals), suggest51 (geographic names), and suggest55 (form/genre).
queryreturn: the information requested, as a comma-separated list. these include idroot (fast number); auth (authorized heading, formatted for display with "--" as subfield separator); type (alt or auth, indicating whether the match on the queryindex was to an authorized or see heading); tag (marc authority tag number for the heading: 100 = personal name, 150 = topical, etc.); raw (authorized heading, with subfield indicators; blank if identical to auth, i.e., no subfields); breaker (authorized heading in marcbreaker format; blank if identical to raw, i.e., no diacritics); and indicator (indicator 1 from the authorized heading).
numrows: maximum number of headings to return, restricted to 20.
callback: the callback function name for jsonp.

table 1. assignfast web service results description.

example response:

http://fast.oclc.org/searchfast/fastsuggest?&query=hog&queryindex=suggestall&queryreturn=suggestall%2cidroot%2cauth%2ctag%2ctype%2craw%2cbreaker%2cindicator&suggest=autosubject&rows=3&callback=testcall

yields the following response:

testcall({
  "responseheader":{
    "status":0,
    "qtime":148,
    "params":{
      "json.wrf":"testcall",
      "fl":"suggestall,idroot,auth,tag,type,raw,breaker,indicator",
      "q":"suggestall:hog",
      "rows":"3"}},
  "response":{"numfound":1031,"start":0,"docs":[
    {
      "idroot":"fst01140419",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"swine",
      "raw":"",
      "breaker":"",
      "suggestall":["hogs"]},
    {
      "idroot":"fst01140470",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"swine--housing",
      "raw":"swine$xhousing",
      "breaker":"",
      "suggestall":["hog houses"]},
    {
      "idroot":"fst00061534",
      "tag":100,
      "indicator":"1",
      "type":"auth",
      "auth":"hogarth, william, 1697-1764",
      "raw":"hogarth, william,$d1697-1764",
      "breaker":"",
      "suggestall":["hogarth, william, 1697-1764"]}]
}})

table 3. typical assignfast json data return.

the first response heading is the use for heading hogs, which has the authorized heading swine. the second is the use for heading for hog houses, which has the authorized heading swine--housing.
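to make the edgengramfilter and fast-normalization ideas above concrete, here is a small illustrative sketch (written in javascript rather than solr configuration, and not oclc's implementation) of normalizing a heading and expanding one word into its edge n-grams.

// illustrative only: fast-style normalization (drop diacritics, punctuation, case)
// followed by edge n-gram expansion of a single word.
function fastNormalize(heading) {
  return heading
    .normalize('NFD')                   // separate letters from combining diacritics
    .replace(/[\u0300-\u036f]/g, '')    // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, ' ')        // punctuation and subfield markers become spaces
    .replace(/\s+/g, ' ')
    .trim();
}

function edgeNgrams(word) {
  var grams = [];
  for (var i = 1; i <= word.length; i++) {
    grams.push(word.substring(0, i));   // "c", "ci", "cin", ..., "cinema"
  }
  return grams;
}

// edgeNgrams(fastNormalize('Cinéma')) -> ["c", "ci", "cin", "cine", "cinem", "cinema"]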
this authorized heading is also given in its raw form, including the $x subfield separator, which is unnecessary for the first heading. the third response matches the authorized heading for hogarth, william, 1697–1764, which is also given in its raw form. the breaker (marcbreaker) format is only added if it differs from the raw form, which is only when diacritics are present.

conclusions

subject assignment is a combination of intellectual and manual tasks. the assignfast web service can be easily integrated into existing cataloging interfaces, greatly reducing the manual effort required for good subject data entry and increasing the cataloger's productivity.

references

1. lois mai chan and edward t. o'neill, fast: faceted application of subject terminology, principles and applications (santa barbara, ca: libraries unlimited, 2010), http://lu.com/showbook.cfm?isbn=9781591587224.
2. oclc research activities associated with fast are summarized at http://www.oclc.org/research/activities/fast.
3. lois m. chan, library of congress subject headings: principles and application (westport, ct: libraries unlimited, 2005).
4. "autocomplete," wikipedia, last modified on october 1, 2013, http://en.wikipedia.org/wiki/autocomplete.
5. tony russell-rose, "designing search: as-you-type suggestions," ux magazine, article no. 828, may 16, 2012, http://uxmag.com/articles/designing-search-as-you-type-suggestions.
6. david ward, jim hahn, and kirsten feist, "autocomplete as research tool: a study on providing search suggestions," information technology & libraries 31, no. 4 (december 2012), 6–19.
7. jon jermey, "automated indexing: feeding the autocomplete monster," indexer 28, no. 2 (june 2010), 74–75.
8. holger bast, christian w. mortensen, and ingmar weber, "output-sensitive autocompletion search," information retrieval 11 (august 2008), 269–286.
9. elías tzoc, "re-using today's metadata for tomorrow's research: five practical examples for enhancing access to digital collections," journal of electronic resources librarianship 23, no. 1 (january–march 2011).
10. holger bast and ingmar weber, "type less, find more: fast autocompletion search with a succinct index," sigir '06 proceedings of the 29th annual international acm sigir conference on research and development in information retrieval (new york: acm, 2006), 364–71.
11. demian katz, ralph levan, and ya'aqov ziso, "using authority data in vufind," code4lib journal 11 (june 2011).
12. edward t. o'neill, rick bennett, and kerre kammerer, "using authorities to improve subject searches," in maja žumer and k. r e and edward t. o'neill, eds., "beyond libraries—subject metadata in the digital environment and semantic web," special issue, cataloging & classification quarterly 52, no. 1/2 (in press).
13. "marcmaker and marcbreaker user's manual," library of congress, network development and marc standards office, revised november 2007, http://www.loc.gov/marc/makrbrkr.html.
14. "oclc developers network—assignfast," submitted september 28, 2012, http://oclc.org/developer/services/assignfast [page not found].
15. "jquery autocomplete," accessed october 1, 2013, http://jqueryui.com/autocomplete.
16. "apache lucene—apache solr," accessed october 1, 2013, http://lucene.apache.org/solr.
17. "apache tomcat," accessed october 30, 2013, http://tomcat.apache.org.
18. "solr wiki—analyzers, tokenizers, tokenfilters," last edited october 29, 2013, http://wiki.apache.org/solr/analyzerstokenizerstokenfilters.
19. thomas b. hickey, jenny toves, and edward t. o'neill, "naco normalization: a detailed examination of the authority file comparison rules," library resources & technical services 50, no. 3 (2006), 166–72.
20. "representational state transfer," wikipedia, last modified on october 21, 2013, http://en.wikipedia.org/wiki/representational_state_transfer.

president's column

bonnie postlethwaite

information technology and libraries | june 2007

i write my final president's column a month after the midwinter meeting in seattle. you will read it as preparations for the ala annual conference in washington, d.c. are well underway. despite that disconnect in time, i am confident that the level of enthusiasm will continue uninterrupted between the two events. indeed, the midwinter meeting was highly charged with positive energy and excitement. the feelings are reignited if you listen to the numerous podcasts now found on the lita blog. the lita bloggers and podcasters were omnipresent reporting on all of the meetings and recording the musings of the lita top tech trendsters. by the time you have read this you will have also, hopefully, cast your ballot for lita officers and directors after having had the opportunity to listen to brief podcast interviews with the candidates. the lita board approved the election podcasts at the annual conference in new orleans. thanks to the collaborative efforts of the nominating committee and the bigwig members, we have this new input into our voting decision-making. the most exciting aspects of the midwinter meeting were the face-to-face networking opportunities that make lita so great. the lita happy hour crowd filled the six arms bar and lit it up with the wonderful lita glow badges. what was particularly gratifying to me was the number of new lita members alongside those of us who have been around longer than we care to count. the networking that went on there was phenomenal! the other important networking opportunity for lita members was the lita town meeting led by lita vice president mark beatty. the room was packed with eager members ready to brainstorm about what they think lita should be doing after consuming a wonderful breakfast. lita's sponsored emerging leader, michelle boule, and mark have collated the findings and will be working with the other emerging leaders to fine-tune a direction. the podcast interview of michelle and mark is an excellent summary of what you can expect in the next year when mark is president. as stated earlier, this is my last president's column, which means my term is winding down. using lita's strategic plan as a guide, i have worked with many of you in lita to ensure that we have a structure in place that allows us to be more adaptable to the rapidly changing world and to make sure that lita is relevant to lita members 365 x 24 x 7 and not just at conferences and lita national forum. attracting and retaining new members is critical for the health of any organization and in that vein, mark and i have used the ala emerging leaders program as a jumping off point to work with lita's emerging leaders.
the bigwig group is fomenting with energy and excitement as they rally bloggers and have this past year launched the podcasting initiative and the lita wiki. all of these things are making it easier for members to communicate about issues of interest in their work as well as to conduct lita business. the lita blog had over nine thousand downloads of its podcasts in the first three weeks after midwinter, which confirms the desire for these types of communications! i appointed two task forces that provided recommendations to the lita board at midwinter. the assessment and research task force has recommended that a permanent committee be established to monitor the collection of feedback and assessment data on lita programs and services. having an established assessment process will enable the board to know how well we are accomplishing our strategic plan and to keep us on the correct course to meet membership needs. the education working group has recommended the merger of two committees, the education and regional institutes committees, into one education committee. this merged committee will develop a variety of educational opportunities including online and face-to-face sessions. we hope to have both of these committees up and going later in 2007. happily, the feedback from the town meeting parallels the recommendations of the task forces. the board will be revisiting the strategic plan at the annual conference using information gathered at the town meeting. we will also be looking at what new services we should be initiating. all arrows seem to be pointing towards more educational and networking opportunities, both virtual and in person. i anticipate that lita members will see some great new things happening in the next year. i have very much enjoyed the opportunity to serve as the lita president this past year. the best part has been getting to know so many lita members who have such creative ideas and who roll up their sleeves and dig in to get the work done. i am very grateful for everyone who has volunteered their time and talents to make lita such a great organization. bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city.
today's large academic libraries struggle, there is, nonetheless, room for criticism of library priorities. this study must be viewed as only a first step (largely tentative and exploratory) in relating automation with service attitudes. it suggests that online systems may be associated with managers more positive in their view of the management role and more positive in their attitudes toward users than batch- and manual-system managers. further research would be useful at this point to compare levels of automation (manual, batch, and online) with circulation-staff service attitudes or those of patrons using the systems.
statistics on headings in the marc file
sally h. mccallum and james l. godwin: network development office, library of congress, washington, d.c.
in designing an automated system, it is important to understand the characteristics of the data that will reside in the system. work is under way in the network development office of the library of congress (lc) that focuses on the design requirements of a nationwide authority file. in support of this work, statistics relating to headings that appear on the bibliographic records in the lc marc ii files were gathered. these statistics provide information on characteristics of headings and on the expected sizes and growth rates of various subsets of authority files. this information will assist in making decisions concerning the contents of authority files for different types of headings and the frequency of update required for the various file subsets. the national commission on libraries and information science supported this work. use of these statistics to assist in system design is largely system-dependent; however, some general implications are given in the last section of this paper. in general, counts were made of the number of bibliographic records, headings that appear in those records, and distinct headings that appear on the records. the statistics were broken down by year, by type of heading, and by file. in this paper, distinct headings are those left in a file after removal of duplicates. distinctness will not be used to imply that a heading appears only once in a source bibliographic file, although distinct headings may in fact have only a single occurrence. thus, a file of records containing the distinct headings from a set of bibliographic records is equivalent in size to a marc authority file of the headings in those bibliographic records.
methodology
these statistics were derived from four marc ii bibliographic record files maintained internally at lc: books, serials, maps, and films. the files contain updated versions of all marc records that have been distributed by lc on the books, serials, maps, and films tapes from 1969 through october 1979, and a few records that were then in the process of distribution. the files do not contain cip records. a total of 1,336,182 bibliographic records were processed, including 1,134,069 from the books file, 90,174 from the serials file, 60,758 from the maps file, and 51,176 from the films file. a file of special records, called access point (ap) records, was created that contains one record for the contents of each occurrence of the following fields in the bibliographic records:
type of heading | heading fields
personal name | 100, 700, 400, 800, 600
corporate name | 110, 710, 410, 810, 610
conference name | 111, 711, 411, 811, 611
topical subject | 650
geographic subject | 651
uniform title | 130, 730, 830, 630
only the 6xx subject fields that contained lc subject headings (i.e., second indicator = 0) were selected as ap records. the main entry data string was substituted for the pronoun in the series (4xx) fields that contained pronouns. the ap records also contained information from the bibliographic records that assisted in making the counts, such as the date of entry of the record on the file, the identity of the type of bibliographic file, and the language of the bibliographic record.
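the selection of access points from the bibliographic records can be pictured with a short present-day sketch; it assumes the pymarc library and uses invented names (ap_fields, access_points), so it stands in for, rather than reproduces, the program lc actually ran.

```python
from pymarc import MARCReader  # assumption: pymarc is available for reading marc records

# heading fields by type of heading, as listed above
AP_FIELDS = {
    "personal name":      ["100", "700", "400", "800", "600"],
    "corporate name":     ["110", "710", "410", "810", "610"],
    "conference name":    ["111", "711", "411", "811", "611"],
    "uniform title":      ["130", "730", "830", "630"],
    "topical subject":    ["650"],
    "geographic subject": ["651"],
}

def access_points(path):
    """yield (heading type, tag, heading string) for every qualifying field
    in one bibliographic file, one tuple per ap record."""
    with open(path, "rb") as fh:
        for record in MARCReader(fh):
            if record is None:          # skip records pymarc could not parse
                continue
            for heading_type, tags in AP_FIELDS.items():
                for field in record.get_fields(*tags):
                    # keep only lc subject headings: second indicator '0' on 6xx fields
                    if field.tag.startswith("6") and field.indicators[1] != "0":
                        continue
                    yield heading_type, field.tag, field.format_field()
```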
a third file was derived from the ap file that contained a normalized character string for each ap record heading. these normalized ap records were used to produce the counts of distinct headings by clustering like data strings. normalization included conversion of all characters to uppercase, and masking of diacritics, marks of punctuation, and other characters that do not determine the distinctness of a heading, but would interfere with machine determination of uniqueness. the subfields included in the normalized string, hence used for all heading comparisons, are given below. only use-dependent subfields, such as the relator subfield, and those that belonged to title clusters in author/title headings were excluded. examples of the ap file field contents and the normalized forms are:
ap field contents | normalized form
chuang-tzu; chuang-tzu | chuang tzu
[blaeu, joan] 1596-1673; blaeu, joan. 1596-1673; blaeu, joan, 1596-1673 | blaeu joan 1596 1673
byron, george gordon noel byron, baron, 1788-1824; byron, george gordon noel byron, baron, 1788.1824 | byron george gordon noel byron baron 1788 1824
distinct headings for this study were determined by comparing on the following subfields:
type of heading | subfields
personal name | a, b, c, d
corporate name | a, b, k, f, p, s, g
conference name | a, q, e
topical subject | a, b, x, y, z
geographic subject | a, b, x, y, z
all occurrences of repeating subfields were included. the relator data subfields were dropped from personal and corporate name headings, as were the title subfields in author/title headings. a separate study will examine the occurrence of author/title headings. approximately 8 percent of the name headings in the files carry title subfields: 6 percent are series and 2 percent are author/title subjects or added entries. two types of distinct heading counts were generated for topical and geographic subject headings. one takes account only of main terms, the a and b subfields, excluding all subject subdivisions. the other compared the complete heading strings, including subject subdivisions.
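a minimal sketch of that normalization and clustering step follows, assuming only the python standard library; normalize_heading and cluster are illustrative names, and the rules shown (case folding, dropping diacritic marks, masking punctuation) only approximate the comparison rules actually used.

```python
import unicodedata
from collections import defaultdict

def normalize_heading(heading: str) -> str:
    """fold case, drop diacritic marks, and mask punctuation so that
    like heading strings cluster on the same normalized form."""
    decomposed = unicodedata.normalize("NFD", heading)
    chars = []
    for c in decomposed:
        if unicodedata.combining(c):      # discard diacritic marks
            continue
        chars.append(c.upper() if c.isalnum() else " ")
    return " ".join("".join(chars).split())

def cluster(headings):
    """group raw heading strings by normalized form; each cluster is one distinct heading."""
    clusters = defaultdict(list)
    for h in headings:
        clusters[normalize_heading(h)].append(h)
    return clusters

forms = ["blaeu, joan. 1596-1673", "blaeu,joan, 1596-1673", "[blaeu, joan] 1596-1673"]
print(list(cluster(forms)))   # -> ['BLAEU JOAN 1596 1673']
```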
characteristics of the files
the four bibliographic files from which the statistics were derived were begun in different years and are of unequal size. table 1 presents the number of bibliographic records added to each of the marc files by the year that the record was first entered into the file. the records added in the first months of 1979 have been eliminated from tables 1-3, thus the total number of records under consideration is 1,210,809. in the combined file, the records for books dominate the contributions from other forms of materials, representing 85 percent of the combined file records. after the addition of the films and serials records in 1972 and 1973 the total number of records added each year leveled off to around 115,000 but jumped to an average of slightly more than 150,000 records per year following the addition of major non-english roman alphabet language records in 1976.
table 1. number of records added to each file by year
year entered | book | serial | map | film | total
1968 | 11,812 | 0 | 0 | 0 | 11,812
1969 | 43,874 | 0 | 1,104 | 0 | 44,978
1970 | 86,004 | 0 | 3,487 | 0 | 89,491
1971 | 105,390 | 0 | 8,857 | 0 | 114,247
1972 | 73,437 | 0 | 4,665 | 6,280 | 84,382
1973 | 92,512 | 3,720 | 5,566 | 8,929 | 110,727
1974 | 99,004 | 10,682 | 6,246 | 8,457 | 124,389
1975 | 86,527 | 15,866 | 6,721 | 8,604 | 117,718
1976 | 120,106 | 19,098 | 6,876 | 5,432 | 151,512
1977 | 140,011 | 17,999 | 7,011 | 4,797 | 169,818
1978 | 169,044 | 12,643 | 5,584 | 4,464 | 191,735
total | 1,027,721 | 80,008 | 56,117 | 46,963 | 1,210,809
table 2. numbers of headings and distinct name headings added to all files by year
year entered | personal names | corporate names | conference names | distinct personal names | distinct corporate names | distinct conference names
1968 | 14,526 | 3,138 | 155 | 12,620 | 2,139 | 143
1969 | 53,134 | 21,206 | 1,027 | 39,184 | 9,364 | 909
1970 | 104,365 | 42,798 | 2,175 | 63,037 | 14,286 | 1,769
1971 | 129,617 | 57,496 | 2,742 | 64,029 | 15,216 | 2,158
1972 | 91,040 | 45,768 | 1,942 | 41,246 | 9,891 | 1,402
1973 | 118,188 | 57,847 | 2,625 | 48,703 | 12,653 | 1,862
1974 | 127,588 | 73,303 | 2,972 | 51,623 | 17,129 | 1,983
1975 | 113,622 | 76,417 | 2,519 | 50,291 | 18,135 | 1,742
1976 | 154,718 | 88,207 | 3,454 | 73,182 | 23,120 | 2,306
1977 | 182,860 | 87,985 | 3,487 | 89,353 | 23,906 | 2,333
1978 | 218,535 | 97,042 | 4,192 | 99,780 | 24,280 | 2,831
total | 1,308,193 | 651,207 | 27,290 | 633,048 | 170,119 | 19,438
table 3. numbers of subject headings and distinct subject headings added to all files by year
year entered | topical headings | geographic headings | distinct topical (first terms only) | distinct geographic (first terms only) | distinct topical (full headings) | distinct geographic (full headings)
1968 | 10,615 | 1,857 | 4,390 | 489 | 7,775 | 1,512
1969 | 45,161 | 9,047 | 8,104 | 1,980 | 23,617 | 5,426
1970 | 89,304 | 21,054 | 8,170 | 4,263 | 34,526 | 10,179
1971 | 115,220 | 31,278 | 6,853 | 5,417 | 36,689 | 12,862
1972 | 92,247 | 20,760 | 4,236 | 2,597 | 26,201 | 7,074
1973 | 121,161 | 27,890 | 4,460 | 3,105 | 33,061 | 9,819
1974 | 137,843 | 31,814 | 4,524 | 3,553 | 39,262 | 11,413
1975 | 130,980 | 30,650 | 4,203 | 3,417 | 40,129 | 11,818
1976 | 168,840 | 39,886 | 5,125 | 4,142 | 55,468 | 15,472
1977 | 185,331 | 44,973 | 5,718 | 4,194 | 59,529 | 16,676
1978 | 222,565 | 49,923 | 7,151 | 4,034 | 69,856 | 17,855
total | 1,319,267 | 309,132 | 62,934 | 37,191 | 426,113 | 120,106
the increase is noticeable primarily in the books and serials files since the maps file had been adding those languages since 1969 and only a limited number of non-english-language audiovisual materials are cataloged. the unusually large number of records added to the books file in 1971 resulted from a special project to add retrospective titles to the file. the large increase in books records in 1978 was due to the comarc project, in which retrospective lc records that had been converted to machine-readable form by other libraries were contributed to the lc marc file. approximately 12,000 comarc records were added in 1977 and 28,000 in 1978. the fall in numbers of film records produced in 1976-1978 reflects a general fall in production of instructional films in the united states. counts of items cataloged that are compiled by lc processing services from catalogers' statistics sheets show that lc cataloged approximately 225,000 titles in 1978; thus, approximately 73 percent of lc cataloging is currently going into machine-readable form. the principal exclusions are records for most nonroman material (only nonroman records for maps have been transliterated and added since 1969) and a few records for music, sound recordings, incunabula, and microforms.
the portion being put into machine-readable form should rise significantly as the romanized records for items in several nonroman alphabets are added in the next year.
name headings
table 2 presents the number of occurrences of name headings in the marc bibliographic files and the number of distinct name headings, both by type of heading and by year. the number of distinct headings that were new to the file in a year was determined by comparing the headings added in a given year against those added in all previous years. it is not surprising to find that 66 percent of name-heading occurrences are personal names, 33 percent are corporate, and only 1.4 percent are conference. the figures shift when considering the distinct names, where 77 percent are personal and only 21 percent are corporate. looking at the total figures in table 2, while 1,308,193 of the headings that appeared on the records were personal names, only 633,048 or 48 percent of these were distinct. of the rest, 52 percent were duplicates of the distinct headings. similarly, 26 percent of corporate names were distinct, with 74 percent being duplicates; and 71 percent of conference names were distinct, with only 29 percent being duplicates. in 1968, 87 percent and 68 percent of personal and corporate names, respectively, were distinct, i.e., 13 percent and 32 percent "had been used previously" when they appeared on a bibliographic record during the year. as the base file of names grows, the percentage of names appearing on new records but which "had been used previously" rises, to 60 percent and 77 percent in 1974. while the figures reported in table 2 indicate that the percentage of headings used that were repeats fell slightly again in 1977 (51 percent and 73 percent), this is probably due to the influx of new names with the addition of new languages in 1976-77. additional statistics gathered on english-language items show the percentage of repeating headings becoming steady after 1974.
subject headings
statistics concerning distinct topical and geographical subject headings were collected for main terms, excluding subdivisions, and for full subject heading strings. table 3 gives the numbers of headings and the numbers of distinct headings of each type found in the marc file. looking at the total figures, only 4.8 percent of topical first terms are distinct, the rest are duplicates. this indicates an average occurrence of 20.8 times for each first term. slightly more, 12 percent, of the geographic first terms are distinct. when the full headings with topical, period, form, and geographic subdivisions are considered, the percentage of headings that are distinct rises to 32.3 percent for topical subjects and 38.8 percent for geographic subjects. thus, 67.3 percent of topical and 61.2 percent of geographic are duplicates of existing headings. in the yearly figures, subject headings show the same tendency as name headings in that the percentages of headings that appear on new records but which "had been previously used" rises as the stock of headings increases and then levels off. subjects were also affected by the addition of other roman alphabet languages in 1976-77 but not to a very large degree. for all access points, name headings and full string subject headings, name headings account for 55 percent of the headings that occur in the bibliographic records, with only 45 percent attributable to topical and geographical headings.
it should be noted that 12 percent of the name headings that appear on the bibliographic records are names used as subjects.
frequencies of occurrence
counts were also made of the frequency with which name headings occurred in the bibliographic files. table 4 summarizes the frequency data: 66 percent of distinct personal names, 62 percent of distinct corporate names, and 84 percent of distinct conference names occur only once in the files. the percent of corporate names with single occurrences is surprisingly close to that for personal; however, the percent of names having multiple occurrences falls more slowly for corporate than for personal names. while 5.47 percent of corporate names occur ten or more times, only 1.92 percent of personal names occur ten or more times. the figures for personal names roughly correspond to those obtained by william potter from a sample taken from the main catalog at the university of illinois at urbana-champaign. that study showed 63.5 percent of personal names occurred only once.1
table 4. frequency of occurrence of name headings in all files
number of occurrences | distinct personal names | percent | distinct corporate names | percent | distinct conference names | percent
1 | 456,328 | 65.65 | 116,250 | 62.02 | 18,021 | 83.90
2 | 119,681 | 17.22 | 30,185 | 16.10 | 2,049 | 9.54
3 | 46,247 | 6.65 | 11,563 | 6.17 | 587 | 2.73
4 | 23,951 | 3.45 | 6,814 | 3.64 | 289 | 1.35
5 | 13,820 | 1.99 | 4,109 | 2.19 | 163 | .76
6 | 8,790 | 1.26 | 2,958 | 1.58 | 98 | .46
7 | 5,827 | .84 | 2,175 | 1.16 | 56 | .26
8 | 4,056 | .58 | 1,673 | .89 | 48 | .22
9 | 2,998 | .43 | 1,395 | .74 | 36 | .17
10 | 2,153 | .31 | 1,037 | .55 | 18 | .08
11-13 | 4,116 | .59 | 2,180 | 1.16 | 44 | .20
14-20 | 3,748 | .54 | 2,632 | 1.40 | 41 | .19
21-50 | 2,678 | .39 | 2,901 | 1.55 | 23 | .11
51-100 | 448 | .06 | 936 | .50 | 4 | .02
101-200 | 149 | .02 | 374 | .20 | 2 | .01
201-300 | 47 | .01 | 109 | .06 | 1 | .00
301-400 | 19 | .00 | 46 | .02 | 0 | .00
401-500 | 11 | .00 | 21 | .01 | 0 | .00
501-1000 | 5 | .00 | 53 | .03 | 0 | .00
1001+ | 2 | .00 | 18 | .01 | 0 | .00
total | 695,074 | 99.99 | 187,429 | 99.98 | 21,480 | 100.00
the number of occurrences of different types of headings are compared in figure 1. the bars show the numbers of personal, corporate, conference, topical, and geographic headings that appear in the bibliographic files. the shaded areas represent the number of headings that are distinct, thus the upper part of each bar represents additional occurrences of the headings from the shaded area. for personal, corporate, and conference headings a further distinction is made between distinct headings that occur only once, the crosshatched area, and those that have multiple occurrences. thus the multiple occurrences of corporate names may be seen to come from a small number of distinct corporate headings, as was indicated by the slow decrease of the multiple-heading occurrence rate (i.e., a small group of corporate names have a very large number of occurrences).
fig. 1. number of headings by type. (bar chart of personal, corporate, conference, topical, and geographic headings; shaded areas show distinct headings, and crosshatched areas show distinct headings that occur only once. values printed on the chart include 1,444,726 for personal names, 30,417 for conference names, and 1,468,804 for topical subjects.)
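the frequency tabulation behind table 4 is easy to reproduce in outline; a small sketch, assuming the normalized heading strings from the clustering step above, is given below (occurrence_distribution is an invented name, not lc's program).

```python
from collections import Counter

def occurrence_distribution(normalized_headings):
    """tabulate how many distinct headings occur once, twice, and so on,
    in the manner of table 4, from a stream of normalized heading strings."""
    per_heading = Counter(normalized_headings)        # occurrences of each distinct heading
    distribution = Counter(per_heading.values())      # distinct headings per occurrence count
    total_distinct = len(per_heading)
    return {count: (n, round(100.0 * n / total_distinct, 2))
            for count, n in sorted(distribution.items())}

# toy example: three distinct headings, one of which occurs twice
sample = ["SMITH JOHN", "SMITH JOHN", "BLAEU JOAN 1596 1673", "CHUANG TZU"]
print(occurrence_distribution(sample))   # -> {1: (2, 66.67), 2: (1, 33.33)}
```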
file growth
as a bibliographic file grows and the stock of names and subjects that are contained in the associated authority file increases, the number of new-to-the-file headings that are required for the new bibliographic records would be expected to fall. figure 2 illustrates that tendency and shows that there is a leveling off of the number of new-to-the-file headings per new bibliographic record after the bibliographic file reaches a certain size. for example, after approximately 700,000 bibliographic records are in the file, for every additional 100 bibliographic records approximately 298 name and subject headings will be assigned, and, of these, approximately 53 will be new personal names, 14 new corporate names, 2 new conference names, 35 new topical subjects, and 10 new geographic subjects; the remaining 184 headings used will already be established in the authority file. thus after a certain bibliographic file size is reached, the growth of the authority file is approximately a linear function of the growth of the bibliographic file.
fig. 2. number of new headings per record for all files. (personal names, corporate names, conference names, topical subjects, and geographic subjects plotted against the number of bibliographic records, in thousands, from 100 to 1,300; new headings per record range from roughly 0.3 to 1.2.)
implications
the reoccurrence frequency of headings in a bibliographic file is often cited as a factor in designing bibliographic and authority-file configurations. discussion centers on the necessity of carrying authority records for headings that occur only once in a bibliographic file. with reference to the name-heading data in table 4 and figure 1, carrying authority records only for headings that occur more than once could potentially reduce the size of the authority file from that indicated by the whole shaded area (including shaded and crosshatched) to the plain shaded area, i.e., from 903,983 records to 310,123, a 66 percent decrease. controlling multiple occurrences of a heading is, however, only one role of the authority record. more important perhaps is the control of cross-references connected with the heading. preliminary work with a random sample of personal names in the lc file indicates that less than 17 percent of personal names require cross-references. thus the personal name headings that occur only once but would require authority records because of cross-references could be less than 17 percent. the frequency data combined with reference structure data could have a significant impact on design. out of a total of 695,074 personal names in the authority files associated with the marc bibliographic files examined here, 456,328, or 66 percent, occur only once. of these, fewer than 77,575 would be expected to have cross-references; thus the name-authority file for personal names could be reduced in size from 695,074 records to 316,321, a 55 percent decrease. if separate authority records are a system requirement, the occurrence figures might then be useful for defining configurations that employ machine-generated provisional records for single-occurrence headings that do not have reference structures or that simplify in other ways the treatment of these headings. these figures may also be useful in making decisions on the addition of retrospective authority records to the automated files.
reference
1. william gray potter, "when names collide: conflict in the catalog and aacr2," library resources & technical services 24:7 (winter 1980).
rlin and oclc as reference tools
douglas jones: university of arizona, tucson.
the central reference department (social science, humanities, and fine arts) and the science-engineering reference department at the university of arizona library are currently evaluating the oclc and rlin systems as reference tools, to see if their use can significantly improve the effectiveness and efficiency of providing reference service. a significant number of the questions received by our librarians, and presumably by librarians elsewhere, involve incomplete or inaccurately cited references to monographs, conference proceedings, government documents, technical reports, and monographic serials. if by using a bibliographic utility a librarian can identify or verify an item not found in printed sources, then effectiveness has been improved. once a complete and accurate description of the item is found, it is a relatively simple task to determine whether or not the library has the item, and if not, to request it through interlibrary loan. additionally, if the efficiency of the librarian can be improved by reducing the amount of time required to verify or identify a requested item, then the patron, the library, and, in our case, the taxpayer, have been better served. the promise of near-immediate response from a computer via an online interactive terminal system is clearly beguiling when compared to the relatively time-consuming searching required with printed sources, which frequently provide only a limited number of access points and often become available weeks, months, or even years after the items they list. we realize, of course, that the promise of instantaneous electronic information retrieval is limited by a variety of factors, and presently we view access to rlin and oclc as potentially powerful adjuncts to, not replacements for, printed reference sources. given that rlin and oclc have databases and software geared to known-item searches for catalog card production, our evaluation attempts to document their usefulness in reference service. a preliminary study conducted during the spring semester of 1980-81 indicated that approximately 50 percent of the questionable citations requiring further bibliographic verification could be identified on oclc or rlin. the time required was typically five minutes or less. successful verification using printed indexes to identify the same items ranged from 20 percent in the central reference department to 50 percent in science-engineering. time required per item averaged approximately fifteen minutes. based on our findings, we plan a revised and more thorough test during the fall semester of 1981-82, which will include an assessment of the enhancements to the
engine of innovation: building the high-performance catalog
will owen and sarah c. michalak
information technology and libraries | june 2015
abstract
numerous studies have indicated that sophisticated web-based search engines have eclipsed the primary importance of the library catalog as the premier tool for researchers in higher education. we submit that the catalog remains central to the research process.
through  a  series  of  strategic   enhancements,  the  university  of  north  carolina  at  chapel  hill,  in  partnership  with  the  other   members  of  the  triangle  research  libraries  network  (trln),  has  made  the  catalog  a  carrier  of   services  in  addition  to  bibliographic  data,  facilitating  not  simply  discovery,  but  also  delivery  of  the   information  researchers  seek.   introduction in  2005,  an  oclc  research  report  documented  what  many  librarians  already  knew—that  the   library  webpage  and  catalog  were  no  longer  the  first  choice  to  begin  a  search  for  information.  the   report  states,   the  survey  findings  indicate  that  84  percent  of  information  searches  begin  with  a  search   engine.  library  web  sites  were  selected  by  just  1  percent  of  respondents  as  the  source  used  to   begin  an  information  search.  very  little  variability  in  preference  exists  across  geographic   regions  or  u.s.  age  groups.  two  percent  of  college  students  start  their  search  at  a  library  web   site.1   in  2006  a  report  by  karen  calhoun,  commissioned  by  the  library  of  congress,  asserted,  “today  a   large  and  growing  number  of  students  and  scholars  routinely  bypass  library  catalogs  in  favor  of   other  discovery  tools.  .  .  .  the  catalog  is  in  decline,  its  processes  and  structures  are  unsustainable,   and  change  needs  to  be  swift.”2     ithaka  s+r  has  conducted  national  faculty  surveys  triennially  since  2000.  summarizing  the  2000– 2006  surveys,  roger  schonfeld  and  kevin  guthrie  stated,  “when  the  findings  from  2006  are   compared  with  those  from  2000  and  2003,  it  becomes  evident  that  faculty  perceive  themselves  as   becoming  decreasingly  dependent  on  the  library  for  their  research  and  teaching  needs.”3   furthermore,  it  was  clear  that  the  “library  as  gateway  to  scholarly  information”  was  viewed  as   decreasingly  important.  the  2009  survey  continued  the  trend  with  even  fewer  faculty  seeing  the       will  owen  (owen@email.unc.edu)  is  associate  university  librarian  for  technical  services  and   systems  and  sarah  c.  michalak  (smichala@email.unc.edu)  is  university  librarian  and  associate   provost  for  university  libraries,  university  of  north  carolina  at  chapel  hill.     engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   6   gateway  function  as  critical.  these  results  occurred  in  a  time  when  electronic  resources  were   becoming  increasingly  important  and  large  google-­‐like  search  engines  were  rapidly  gaining  in   use.4     these  comments  extend  into  the  twenty-­‐first  century  more  than  thirty  years  of  concern  about  the   utility  of  the  library  catalog.  through  the  first  half  of  this  decade  new  observations  emerged  about   patron  perceptions  of  catalog  usability.  even  after  migration  from  the  card  to  the  online  catalog   was  complete,  the  new  tool  represented  primarily  the  traditionally  cataloged  holdings  of  a   particular  library.  providing  direct  access  to  resources  was  not  part  of  the  catalog’s  mission.   manuscripts,  finding  aids,  historical  photography,  and  other  special  collections  were  not  included   in  the  traditional  catalog.  
journal  articles  could  only  be  discovered  through  abstracting  and   indexing  services.  as  these  discovery  tools  began  their  migration  to  electronic  formats,  the   centrality  of  the  library’s  bibliographic  database  was  challenged.   the  development  of  google  and  other  sophisticated  web-­‐based  search  engines  further  eclipsed  the   library’s  bibliographic  database  as  the  first  and  most  important  research  tool.  yet  we  submit  that   the  catalog  database  remains  a  necessary  fixture,  continuing  to  provide  access  to  each  library’s   particular  holdings.  while  the  catalog  may  never  regain  its  pride  of  place  as  the  starting  point  for   all  researchers,  it  still  remains  an  indispensable  tool  for  library  users,  even  if  it  may  be  used  only   at  a  later  stage  in  the  research  process.   at  the  university  of  north  carolina  at  chapel  hill,  we  have  continued  to  invest  in  enhancing  the   utility  of  the  catalog  as  a  valued  tool  for  research.  librarians  initially  reasoned  that  researchers   still  want  to  find  out  what  is  available  to  them  in  their  own  campus  library.  gradually  they  began   to  see  completely  new  possibilities.  to  that  end,  we  have  committed  to  a  program  that  enhances   discovery  and  delivery  through  the  catalog.  while  most  libraries  have  built  a  wide  range  of   discovery  tools  into  their  home  pages—adding  links  to  databases  of  electronic  resources,  article   databases,  and  google  scholar—we  have  continued  to  enhance  both  the  content  to  be  found  in  the   primary  local  bibliographic  database  and  the  services  available  to  students  and  researchers  via  the   interface  to  the  catalog.   in  our  local  consortium,  the  triangle  research  libraries  network  (trln),  librarians  have   deployed  the  search  and  faceting  services  of  endeca  to  enrich  the  discovery  interfaces.  we  have   gone  beyond  augmenting  the  catalog  through  the  addition  of  marcive  records  for  government   documents,  by  including  encoded  archival  description  (ead)  finding  aids  and  selected  (and  ever-­‐ expanding)  digital  collections  that  are  not  easily  discoverable  through  major  search  engines.  we   have  similarly  enhanced  services  related  to  the  discovery  and  delivery  of  items  listed  in  the   bibliographic  database,  including  not  only  common  features  like  the  ability  to  export  citations  in  a   variety  of  formats  but  also  more  extensive  services  such  as  document  delivery,  an  auto-­‐suggest   feature  that  maximizes  use  of  library  of  congress  subject  headings  (lcsh),  and  the  ability  to   submit  cataloged  items  to  be  processed  for  reserve  reading.     information  technology  and  libraries  |  june  2015     7   both  students  and  faculty  have  embraced  e-­‐books,  and  in  adding  more  than  a  million  such  titles  to   the  unc-­‐chapel  hill  catalog  we  continue  to  blend  discovery  and  delivery,  but  now  on  a  very  large   scale.  
coupling  catalog  records  with  a  metadata  service  that  provides  book  jackets,  tables  of   contents,  and  content  summaries,  cataloging  geographic  information  systems  (gis)  data  sets,  and   adding  live  links  to  the  finding  aids  for  digitized  archival  and  manuscript  collections  have  further   enhanced  the  blended  discovery/delivery  capacity  of  the  catalog.   we  have  also  leveraged  the  advantages  of  operating  in  a  consortial  environment  by  extending  the   discovery  and  delivery  services  among  the  members  of  trln  to  provide  increased  scope  of   discovery  and  shared  processing  of  some  classes  of  bibliographic  records.  trln  comprises  four   institutions  and  content  from  all  member  libraries  is  discoverable  in  a  combined  catalog   (http://search.trln.org).  printed  material  requested  through  this  combined  catalog  is  often   delivered  between  trln  libraries  within  twenty-­‐four  hours.   at  unc,  our  search  logs  show  that  use  of  the  catalog  increases  as  we  add  new  capacity  and  content.   these  statistics  demonstrate  the  catalog’s  continuing  relevance  as  a  research  tool  that  adds  value   above  and  beyond  conventional  search  engines  and  general  web-­‐based  information  resources.  in   this  article  we  will  describe  the  most  important  enhancements  to  our  catalog,  include  data  from   search  logs  to  demonstrate  usage  changes  resulting  from  these  enhancements,  and  comment  on   potential  future  developments.   literature  review   an  extensive  literature  discusses  the  past  and  future  of  online  catalogs,  and  many  of  these   materials  themselves  include  detailed  literature  reviews.  in  fact,  there  are  so  many  studies,   reviews,  and  editorials,  it  becomes  clear  that  although  the  online  catalog  may  be  in  decline,  it   remains  a  subject  of  lively  interest  to  librarians.  two  important  threads  in  this  literature  report  on   user-­‐query  studies  and  on  other  usability  testing.  though  there  are  many  earlier  studies,  two   relatively  recent  articles  analyze  search  behavior  and  provide  selective  but  helpful  literature   surveys.5     there  are  many  efforts  to  define  directions  for  the  catalog  that  would  make  it  more  web-­‐like,  more   google-­‐like,  and  thus  more  often  chosen  for  search,  discovery,  and  access  by  library  patrons.   these  articles  aim  to  define  the  characteristics  of  the  ideal  catalog.  charles  hildreth  provides  a   benchmark  for  these  efforts  by  dividing  the  history  of  the  online  catalog  into  three  generations.   from  his  projections  of  a  third  generation  grew  the  “next  generation  catalog”—really  the  current   ideal.  he  called  for  improvement  of  the  second-­‐generation  catalog  through  an  enhanced  user-­‐ system  dialog,  automatic  correction  of  search-­‐term  spelling  and  format  errors,  automatic  search   aids,  enriched  subject  metadata  in  the  catalog  record  to  improve  search  results,  and  the   integration  of  periodical  indexes  in  the  catalog.  
as  new  technologies  have  made  it  possible  to   achieve  these  goals  in  new  ways,  much  of  what  hildreth  envisioned  has  been  accomplished.6       engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   8   second-­‐generation  catalogs,  anchored  firmly  in  integrated  library  systems,  operated  throughout   most  of  the  1980s  and  the  1990s  without  significant  improvement.  by  the  mid-­‐2000s  the  search   for  the  “next-­‐gen”  catalog  was  in  full  swing,  and  there  are  numerous  articles  that  articulate  the   components  of  an  improved  model.  the  catalog  crossed  a  generational  line  for  good  when  the   north  carolina  state  university  libraries  (ncsu)  launched  a  new  catalog  search  engine  and   interface  with  endeca  in  january  2006.  three  ncsu  authors  published  a  thorough  article   describing  key  catalog  improvements.  their  endeca-­‐enhanced  catalog  fulfilled  the  most  important   criteria  for  a  “next-­‐gen”  catalog:  improved  search  and  retrieval  through  “relevance-­‐ranked  results,   new  browse  capabilities,  and  improved  subject  access.”7     librarians  gradually  concluded  that  the  catalog  need  not  be  written  off  but  would  benefit  from   being  enhanced  and  aligned  with  search  engine  capabilities  and  other  web-­‐like  characteristics.   catalogs  should  contain  more  information  about  titles,  such  as  book  jackets  or  reviews,  than   conventional  bibliographic  records  offered.  catalog  search  should  be  understandable  and  easy  to   use.  additional  relevant  works  should  be  presented  to  the  user  along  with  result  sets.  the   experience  should  be  interactive  and  participatory  and  provide  access  to  a  broad  array  of   resources  such  as  data  and  other  nonbook  content.8     karen  markey,  one  of  the  most  prolific  online  catalog  authors  and  analysts,  writes,  “now  that  the   era  of  mass  digitization  has  begun,  we  have  a  second  chance  at  redesigning  the  online  library   catalog,  getting  it  right,  coaxing  back  old  users  and  attracting  new  ones.”9   marshall  breeding  predicted  characteristics  of  the  next-­‐generation  catalog.  his  list  includes   expanded  scope  of  search,  more  modern  interface  techniques,  such  as  a  single  point  of  entry,   search  result  ranking,  faceted  navigation,  and  “did  you  mean  .  .  .  ?”  capacity,  as  well  as  an  expanded   search  universe  that  includes  the  full  text  of  journal  articles  and  an  array  of  digitized  resources.10     a  concept  that  is  less  represented  in  the  literature  is  that  of  envisioning  the  catalog  as  a   framework  for  service,  although  the  idea  of  the  catalog  designed  to  ensure  customer  self-­‐service   has  been  raised.11  michael  j.  
bennett  has  studied  the  effect  of  catalog  enhancements  on  circulation   and  interlibrary  loan.12  service  and  the  online  catalog  have  a  new  meaning  in  morgan’s  idea  of   “services  against  texts,”  supporting  “use  and  understand”  in  addition  to  the  traditional  “find  and   get.”13  lorcan  dempsey  commented  on  the  catalog  as  an  identifiable  service  and  predicts  new   formulations  for  library  services  based  on  the  network-­‐level  orientation  of  search  and  discovery.14   but  the  idea  that  the  catalog  has  moved  from  a  fixed,  inward-­‐focused  tool  to  an  engine  for   services—a  locus  to  be  invested  with  everything  from  unmediated  circulation  renewal  and   ordering  delivery  to  the  “did  you  mean”  search  aid—has  yet  to  be  addressed  comprehensively  in   the  literature.   enhancing  the  traditional  catalog   one  of  the  factors  that  complicates  discussions  of  the  continued  relevance  of  the  library  catalog  to   research  is  the  very  imprecision  of  the  term  in  common  parlance,  especially  when  the  chief  point     information  technology  and  libraries  |  june  2015     9   of  comparison  to  today’s  ils-­‐driven  opacs  is  google  or,  more  specifically,  google  scholar.  from   first-­‐year  writing  assignments  through  advanced  faculty  research,  many  of  the  resources  that  our   patrons  seek  are  published  in  the  periodical  literature,  and  the  library  catalog,  the  one  descended   from  the  cabinets  full  of  cards  that  occupied  prominent  real  estate  in  our  buildings,  has  never  been   an  effective  tool  for  identifying  relevant  periodical  literature.   this  situation  has  changed  in  recent  years  as  products  like  summon,  from  proquest,  and  ebsco   discovery  service  have  introduced  platforms  that  can  accommodate  electronic  article  indexing  as   well  as  marc  records  for  the  types  of  materials—books,  audio,  and  video—that  have  long  been   discovered  through  the  opac.  in  the  following  discussion  of  “catalog”  developments  and   enhancements,  we  focus  initially  not  on  these  integrated  solutions,  but  on  the  catalog  as  more   traditionally  defined.  however,  as  electronic  resources  become  an  ever-­‐greater  percentage  of   library  collections,  we  shall  see  a  convergence  of  these  two  streams  that  will  portend  significant   changes  in  the  nature  and  utility  of  the  catalog.   much  work  has  been  done  in  the  first  decade  of  the  twenty-­‐first  century  to  enhance  discovery   services  and,  as  noted  above,  north  carolina  state  university’s  introduction  of  their  endeca-­‐based   search  engine  and  interface  was  a  significant  game-­‐changer.  in  the  years  following  the   introduction  of  the  endeca  interface  at  ncsu,  the  triangle  research  libraries  network  invested  in   further  development  of  features  that  enhanced  the  utility  of  the  endeca  software  itself.   programmed  enhancements  to  the  interface  provided  additional  services  and  functionality.  in   some  cases,  these  enhancements  were  aimed  at  improving  discovery.  in  others,  they  allowed   researchers  to  make  new  and  better  use  of  the  data  that  they  found  or  made  it  easier  to  obtain  the   documents  that  they  discovered.   
faceting  and  limiting  retrieval  results   perhaps  the  most  immediately  striking  innovation  in  the  endeca  interface  was  the  introduction  of   facets.  the  use  of  faceted  browsing  allowed  users  to  parse  the  bibliographic  record  in  new  ways   (and  more  ways)  than  had  preceding  catalogs.  there  were  several  fundamentally  important  ways   faceting  enhanced  search  and  discovery.   the  first  of  these  was  the  formal  recognition  that  keyword  searching  was  the  user’s  default  means   of  interacting  with  the  catalog’s  data.  ncsu’s  initial  implementation  allowed  for  searches  using   several  indexes,  including  authors,  titles,  and  subject  headings,  and  this  functionality  remains  in   place  to  the  present  day.  however,  by  default,  searches  returned  records  containing  the  search   terms  “anywhere”  in  the  record.  this  behavior  was  more  in  line  with  user  expectations  in  an   information  ecosystem  dominated  by  google’s  single  search  box.   the  second  was  the  significantly  different  manner  in  which  multiple  limits  could  be  placed  on  an   initial  result  set  from  such  a  keyword  search.  the  concept  of  limiting  was  not  a  new  one:  certain   facets  worked  in  a  manner  consistent  with  traditional  limits  in  prior  search  interfaces,  allowing   users  to  screen  results  by  language,  or  date  of  publication,  for  example.       engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   10   it  was  the  ease  and  transparency  with  which  multiple  limits  could  be  applied  through  faceting  that   was  revolutionary.  a  user  who  entered  the  keyword  “java”  in  the  search  box  was  quickly  able  to   discriminate  between  the  programming  language  and  the  indonesian  island.  this  could  be   achieved  in  multiple  ways:  by  choosing  between  subjects  (for  example,  “application  software”  vs.   “history”)  or  clearly  labeled  lc  classification  categories  (“q  –  science”  vs.  “d  –  history”).  these   limits,  or  facets,  could  be  toggled  on  and  off,  independently  and  iteratively.   the  third  and  highly  significant  difference  resulted  from  how  library  of  congress  subject   headings  (lcsh)  were  parsed  and  indexed  in  the  system.  by  making  lcsh  subdivisions   independent  elements  of  the  subject-­‐heading  index  in  a  keyword  search,  the  endeca   implementation  unlocked  a  trove  of  metadata  that  had  been  painstakingly  curated  by  catalogers   for  nearly  a  century.  the  user  no  longer  needed  to  be  familiar  with  the  formal  structure  of  subject   headings;  if  the  keywords  appeared  anywhere  in  the  string,  the  subdivisions  in  which  they  were   contained  could  be  surfaced  and  used  as  facets  to  sharpen  the  focus  of  the  search.  this  was   revolutionary.   utilizing  the  power  of  new  indexing  structures   the  liberation  of  bibliographic  data  from  the  structure  of  marc  record  indexes  presaged  yet   another  far-­‐reaching  alteration  in  the  content  of  library  catalogs.  to  this  day,  most  commercial   integrated  library  systems  depend  on  marc  as  the  fundamental  record  structure.  in  ncsu’s   implementation,  the  multiple  indexes  built  from  that  metadata  created  a  new  framework  for   information.     
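as one way to picture the subdivision parsing described above, the sketch below splits a subject field into a main term and its subdivisions so each piece can be indexed as a separate facet value; the field representation and function name are invented for the example and are not the trln endeca pipeline itself.

```python
def lcsh_facet_values(subject_field):
    """split an lcsh field, given as (subfield code, value) pairs, into its main
    term and its subdivisions ($x, $y, $z, $v) so each becomes its own facet value."""
    main_term = " ".join(value for code, value in subject_field if code in ("a", "b"))
    subdivisions = [value for code, value in subject_field if code in ("x", "y", "z", "v")]
    return [main_term] + subdivisions

# a 650 field for the programming-language sense of "java"
field = [("a", "java (computer program language)"), ("x", "history")]
print(lcsh_facet_values(field))   # -> ['java (computer program language)', 'history']
```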
this  change  made  possible  the  integration  of  non-­‐marc  data  with  marc  data,  allowing,  for   example,  dublin  core  (dc)  records  to  be  incorporated  into  the  universe  of  metadata  to  be  indexed,   searched,  and  retrieved.  there  was  no  need  to  crosswalk  dc  to  marc:  it  sufficed  to  simply  assign   the  dc  elements  to  the  appropriate  endeca  indexes.  with  this  capacity  to  integrate  rich  collections   of  locally  described  digital  resources,  the  scope  of  the  traditional  catalog  was  enlarged.   expanding  scopes  and  banishing  silos   at  unc-­‐chapel  hill,  we  began  this  process  of  augmentation  with  selected  collections  of  digital   objects.  these  collections  were  housed  in  a  contentdm  repository  we  had  been  building  for   several  years  at  the  time  of  the  library’s  introduction  of  the  endeca  interface.  image  files,  which   had  not  been  accessible  through  traditional  catalogs,  were  among  the  first  to  be  added.  for   example,  we  had  been  given  a  large  collection  of  illustrated  postcards  featuring  scenes  of  north   carolina  cities  and  towns.  these  postcards  had  been  digitized  and  metadata  describing  the  image   and  the  town  had  been  recorded.  other  collections  of  digitized  historical  photographs  were  also   selected  for  inclusion  in  the  catalog.  these  historical  resources  proved  to  be  a  boon  to  faculty   teaching  local  history  courses  and,  interestingly,  to  students  working  on  digital  projects  for  their   classes.  as  class  assignments  came  to  include  activities  like  creating  maps  enhanced  by  the     information  technology  and  libraries  |  june  2015     11   addition  of  digital  photographs  or  digitized  newspaper  clippings,  the  easy  discovery  of  these   formerly  hidden  collections  enriched  students’  learning  experience.   other  special  collection  materials  had  been  represented  in  the  traditional  catalog  in  somewhat   limited  fashion.  the  most  common  examples  were  manuscripts  collections.  the  processing  of   these  collections  had  always  resulted  in  the  creation  of  finding  aids,  produced  since  the  1930s   using  index  cards  and  typewriters.  during  the  last  years  of  the  twentieth  century,  archivists  began   migrating  many  of  these  finding  aids  to  the  web  using  the  ead  format,  presenting  them  as  simple   html  pages.  these  finding  aids  were  accessible  through  the  catalog  by  means  of  generalized   marc  records  that  described  the  collections  at  a  superficial  level.  however,  once  we  attained  the   ability  to  integrate  the  contents  of  the  finding  aids  themselves  into  the  indexes  underlying  the  new   interface,  this  much  richer  trove  of  keyword-­‐searchable  data  vastly  increased  the  discoverability   and  use  of  these  collections.   during  this  period,  the  library  also  undertook  systematic  digitization  of  many  of  these  manuscript   collections.  whenever  staff  received  a  request  for  duplication  of  an  item  from  a  manuscript   collection  (formerly  photocopies,  but  by  then  primarily  digital  copies),  we  digitized  the  entire   folder  in  which  that  item  was  housed.  we  developed  standards  for  naming  these  digital  surrogates   that  associated  the  individual  image  with  the  finding  aid.  
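a naming convention of that kind might look like the sketch below, which derives the finding-aid location for a digitized folder from an image filename; the pattern, url shape, and function name are all invented for illustration and are not the library's actual scheme.

```python
import re

# hypothetical convention: <collection number>_<folder number>_<image sequence>.jpg
SURROGATE_PATTERN = re.compile(r"^(?P<collection>\d{5})_(?P<folder>\d{4})_(?P<seq>\d{4})\.jpg$")

def finding_aid_location(filename, base_url="https://finding-aids.example.edu"):
    """derive the finding-aid page and folder anchor that a digital surrogate
    belongs to, so the image can be linked to the finding aid dynamically."""
    match = SURROGATE_PATTERN.match(filename)
    if match is None:
        return None
    collection = match.group("collection")
    folder = int(match.group("folder"))
    return f"{base_url}/{collection}/#folder_{folder}"

print(finding_aid_location("05420_0007_0001.jpg"))   # -> https://finding-aids.example.edu/05420/#folder_7
```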
it  then  became  a  simple  matter,  involving   the  addition  of  a  short  javascript  string  to  the  head  of  the  online  finding  aid,  to  dynamically  link   the  digital  objects  to  the  finding  aid  itself.     other  library  collections  likewise  benefited  from  the  new  indexing  structures.  some  uncataloged   materials  traditionally  had  minimal  bibliographic  control  provided  by  inventories  that  were  built   at  the  time  of  accession  in  desktop  database  applications;  funding  constraints  meant  that  full   cataloging  of  these  materials  (often  rare  books)  remained  elusive.  the  ability  to  take  the  data  that   we  had  and  blend  it  into  the  catalog  enhanced  the  discovery  of  these  collections  as  well.   we  also  have  an  extensive  collection  of  video  resources,  including  commercial  and  educational   films.  the  conventions  for  cataloging  these  materials,  held  over  from  the  days  of  catalog  cards,   often  did  not  match  user  expectations  for  search  and  discovery.  there  were  limits  to  the  number   of  added  entries  that  catalogers  would  make  for  directors,  actors,  and  others  associated  with  a   film.  many  records  lacked  the  kind  of  genre  descriptors  that  undergraduates  were  likely  to  use   when  seeking  a  film  for  an  evening’s  entertainment.  to  compensate  for  these  limitations,  staff  who   managed  the  collection  had  again  developed  local  database  applications  that  allowed  for  the   inclusion  of  more  extensive  metadata  and  for  categories  such  as  country  of  origin  or  folksonomic   genres  that  patrons  frequently  indicated  were  desirable  access  points.  once  again,  the  new   indexing  structures  allowed  us  to  incorporate  this  rich  set  of  metadata  into  what  looked  like  the   traditional  catalog.   each  of  the  instances  described  above  represents  what  we  commonly  call  the  destruction  of  silos.   information  about  library  collections  that  had  been  scattered  in  numerous  locations—and  not  all   of  them  online—was  integrated  into  a  single  point  of  discovery.  it  was  our  hope  and  intention  that     engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   12   such  integration  would  drive  more  users  to  the  catalog  as  a  discovery  tool  for  the  library’s  diverse   collections  and  not  simply  for  the  traditional  monographic  and  serials  collections  that  had  been   served  by  marc  cataloging.  usage  logs  indicate  that  the  average  number  of  searches  conducted  in   the  catalog  rose  from  approximately  13,000  per  day  in  2009  to  around  19,000  per  day  in  2013.  it   is  impossible  to  tell  with  any  certainty  whether  there  was  heavier  use  of  the  catalog  simply   because  increasingly  varied  resources  came  to  be  represented  in  it,  but  we  firmly  believe  that  the   experience  of  users  who  search  for  material  in  our  catalog  has  become  much  richer  as  a  result  of   these  changes  to  its  structure  and  content.   cooperation  encouraging  creativity   another  way  we  were  able  to  harness  the  power  of  endeca’s  indexing  scheme  involved  the  shared   loading  of  bibliographic  records  for  electronic  resources  to  which  multiple  trln  libraries   provided  access.  
trln’s  endeca  indexes  are  built  from  the  records  of  each  member.  each   institution  has  a  “pipeline”  that  feeds  metadata  into  the  combined  trln  index.  duplicate  records   are  rolled  up  into  a  single  display  via  oclc  control  numbers  whenever  possible,  and  the   bibliographic  record  is  annotated  with  holdings  statements  for  the  appropriate  libraries.   we  quickly  realized  that  where  any  of  the  four  institutions  shared  electronic  access  to  materials,  it   was  redundant  to  load  copies  of  each  record  into  the  local  databases  of  each  institution.15  instead,   one  institution  could  take  responsibility  for  a  set  of  records  representing  shared  resources.   examples  of  such  material  include  electronic  government  documents  with  records  provided  by   the  marcive  documents  without  shelves  program,  large  sets  like  early  english  books  online,  and   pbs  videos  streamed  by  the  statewide  services  of  nc  live.   in  practice,  one  institution  takes  responsibility  for  loading,  editing,  and  performing  authority   control  on  a  given  set  of  records.  (for  example,  unc,  as  the  regional  depository,  manages  the   documents  without  shelves  record  set.)  these  records  are  loaded  with  a  special  flag  indicating   that  they  are  part  of  the  shared  records  program.  this  flag  generates  a  holdings  statement  that   reflects  the  availability  of  the  electronic  item  at  each  institution.  the  individual  holdings   statements  contain  the  institution-­‐specific  proxy  server  information  to  enable  and  expedite  access.   in  addition  to  this  distributed  model  of  record  loading  and  maintenance,  we  were  able  to  leverage   oai-­‐pmh  feeds  to  add  selected  resources  to  the  searchtrln  database.  all  four  institutions  have   access  to  the  data  made  available  by  the  inter-­‐university  consortium  for  political  and  social   research  (icpsr).  as  we  do  not  license  these  resources  or  maintain  them  locally,  and  as  records   provided  by  icpsr  can  change  over  time,  we  developed  a  mechanism  to  harvest  the  metadata  and   push  it  through  a  pipeline  directly  into  the  searchtrln  indexes.  none  of  the  member  libraries’   local  databases  house  this  metadata,  but  the  records  are  made  available  to  all  nonetheless.   while  we  were  engaged  in  implementing  these  enhancements,  additional  sources  of  potential   enrichment  of  the  catalog  were  appearing.  in  particular,  vendors  began  providing  indexing   services  for  the  vast  quantities  of  electronic  resources  contained  in  aggregator  databases.     information  technology  and  libraries  |  june  2015     13   additionally,  they  made  it  possible  for  patrons  to  move  seamlessly  from  the  catalog  to  those   electronic  resources  via  openurl  technologies.  indeed,  services  like  proquest’s  summon  or   ebsco’s  discovery  service  might  be  taken  as  another  step  toward  challenging  the  catalog’s   primacy  as  a  discovery  tool  as  they  offered  the  prospect  of  making  local  catalog  records  just  a   fraction  of  a  much  larger  universe  of  bibliographic  information  available  in  a  single,  keyword-­‐ searchable  database.   
against this backdrop, it remains to be seen whether continuing to load many kinds of marc records into the local database is an effective aid to discovery even with the multiple delimiting capabilities that endeca provides. what is certain, however, is that our approach to indexing resources of any kind has undergone a radical transformation over the past few years, a transformation that goes beyond the introduction of any of the particular changes we have discussed so far.

promoting a culture of innovation

one important way endeca has changed our libraries is that a culture of constant innovation has become the norm, rather than the exception, for our catalog interface and content. once we were no longer subject to the customary cycle of submitting enhancement requests to an integrated library system vendor, hoping that fellow customers shared similar desires, and waiting for a response and, if we were lucky, implementation, we were able to take control of our aspirations. we had the future of the interface to our collections in our own hands, and within a few years of the introduction of endeca by ncsu, we were routinely adding new features to enhance its functionality.

one of the first of these enhancements was the addition of a "type-ahead" or "auto-suggest" option.16 inspired by google's autocomplete feature, this service suggests phrases that might match the keywords a patron is typing into the search box. ben pennell, one of the chief programmers working on endeca enhancement at unc-chapel hill, built a solr index from the ils author, title, and subject indexes and from a log of recent searches. as a patron typed, a drop-down box appeared below the search box. the drop-down contained matching terms extracted from the solr index almost immediately. for example, typing the letters "bein" into the box produced a list including "being john malkovich," "nature—effects of human beings on," "human beings," and "bein, alex, 1903–1988." the matched letters (italicized in the examples above) are highlighted in a different color in the drop-down display. in the case of terms drawn directly from an index, the index name appears, also highlighted, on the right side of the box. for example, the second and third terms in the examples above are tagged with the term "subject." the last example is an "author." (a minimal sketch of such a prefix lookup appears below.)

in allowing for the textual mining of lcsh, the initial implementation of faceting in the endeca catalog surfaced those headings for the patron by uniting keyword and controlled vocabularies in an unprecedented manner. there was a remarkable and almost immediate increase in the number of authority index searches entered into the system. at the end of the fall semester prior to the implementation of the auto-suggest feature, an average of around 1,400 subject searches was conducted per week.
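as an illustration of the kind of lookup just described (and not the production service, which pennell and sexton document in detail), a bare-bones prefix query against a solr index might look like the sketch below; the core name, url, and field names are hypothetical.

```python
"""type-ahead sketch: prefix lookup against a solr core (names hypothetical)."""
import requests

SOLR_SELECT = "http://localhost:8983/solr/suggest/select"  # hypothetical core

def suggest(prefix, limit=10):
    """return (heading, heading_type) pairs whose heading starts with prefix."""
    params = {
        "q": f"heading:{prefix}*",      # naive prefix query; real code would escape input
        "fl": "heading,heading_type",
        "rows": limit,
        "wt": "json",
    }
    docs = requests.get(SOLR_SELECT, params=params, timeout=2).json()["response"]["docs"]
    return [(d.get("heading"), d.get("heading_type")) for d in docs]

# suggest("bein") might return, e.g.,
# [("bein, alex, 1903-1988", "author"), ("being john malkovich", "title"), ...]
```

a production index would typically use edge n-gram analysis or a dedicated suggester so that matches inside a heading (as in "nature—effects of human beings on") are caught as well. the usage figures before and after the feature went live suggest how much difference it made.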
approximately  one  month  into  the  spring  semester,  that  average  had  risen  to   around  4,000  subject  searches  per  week.  use  of  the  author  and  title  indexes  also  rose,  although   not  quite  as  dramatically.  in  the  perpetual  tug-­‐of-­‐war  between  precision  and  recall,  the  balance   had  decidedly  shifted.   another  service  that  we  provide,  which  is  especially  popular  with  students,  is  the  ability  to   produce  citations  formatted  in  one  of  several  commonly  used  bibliographic  styles,  including  apa,   mla,  and  chicago  (both  author-­‐date  and  note-­‐and-­‐bibliography  formats).  this  functionality,  first   introduced  by  ncsu  and  then  jointly  developed  with  unc  over  the  years  that  followed,  works  in   two  ways.  if  a  patron  finds  a  monographic  title  in  the  catalog,  simply  clicking  on  a  link  labeled  “cite”   produces  a  properly  formatted  citation  that  can  then  be  copied  and  pasted  into  a  document.  the   underlying  technology  also  powers  a  “citation  builder”  function  by  which  a  patron  can  enter  basic   bibliographic  information  for  a  book,  a  chapter  or  essay,  a  newspaper  or  journal  article,  or  a   website  into  a  form,  click  the  “submit”  button,  and  receive  a  citation  in  the  desired  format.     an  additional  example  of  innovation  that  falls  somewhat  outside  the  scope  of  the  changes   discussed  above  was  the  development  of  a  system  that  allowed  for  the  mapping  of  simplified   chinese  characters  to  their  traditional  counterparts.  searching  in  non-­‐roman  character  sets  has   always  offered  a  host  of  challenges  to  library  catalog  users.  the  trln  libraries  have  embraced  the   potential  of  endeca  to  reduce  some  of  these  challenges,  particularly  for  chinese,  through  the   development  of  better  keyword  searching  strategies  and  the  automatic  translation  of  simplified  to   traditional  characters.   since  we  had  complete  control  over  the  endeca  interface,  it  proved  relatively  simple  to  integrate   document  delivery  services  directly  into  the  functionality  of  the  catalog.  rather  than  simply   emailing  a  bibliographic  citation  or  a  call  number  to  themselves,  patrons  could  request  the   delivery  of  library  materials  directly  to  their  campus  addresses.  once  we  had  implemented  this   feature,  we  quickly  moved  to  amplify  its  power.  many  catalogs  offer  a  “shopping  cart”  service  that   allows  patrons  to  compile  lists  of  titles.  one  variation  on  this  concept  that  we  believe  is  unique  to   our  library  is  the  ability  for  a  professor  to  compile  such  a  list  of  materials  held  by  the  libraries  on   campus  and  submit  that  list  directly  to  the  reserve  reading  department,  where  the  books  are   pulled  from  the  shelves  and  placed  on  course-­‐reserve  lists  without  the  professor  needing  to  visit   any  particular  library  branch.  these  new  features,  in  combination  with  other  service   enhancements  such  as  the  delivery  of  physical  documents  to  campus  addresses  from  our  on-­‐ campus  libraries  and  our  remote  storage  facility,  have  increased  the  usefulness  of  the  catalog  as   well  as  our  users’  satisfaction  with  the  library.  
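the simplified-to-traditional mapping mentioned above can be pictured as a lookup table applied to the query string before it is sent to the index. the toy sketch below is illustrative only: the character pairs are a tiny hypothetical sample, real conversion has to deal with one-to-many mappings, and this is not the trln implementation.

```python
"""toy simplified-to-traditional query expansion; mapping table is illustrative."""
S2T = {"图": "圖", "书": "書", "馆": "館", "历": "歷"}

def to_traditional(query: str) -> str:
    """character-by-character substitution; unmapped characters pass through."""
    return "".join(S2T.get(ch, ch) for ch in query)

def query_variants(query: str):
    """search both scripts so records cataloged in either form are retrieved."""
    trad = to_traditional(query)
    return [query] if trad == query else [query, trad]

# query_variants("图书馆") -> ["图书馆", "圖書館"]
```

searching on both variants, rather than rewriting the stored records, leaves the underlying cataloging untouched.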
we  believe  that  these  changes  have  contributed  to   the  ongoing  vitality  of  the  catalog  and  to  its  continued  importance  to  our  community.   in  december  2012,  the  libraries  adopted  proquest’s  summon  to  provide  enhanced  access  to   article  literature  and  electronic  resources  more  generally.  at  the  start  of  the  following  fall   semester,  the  libraries  instituted  another  major  change  to  our  discovery  and  delivery  services   through  a  combined  single-­‐search  box  on  our  home  page.  this  has  fundamentally  altered  how     information  technology  and  libraries  |  june  2015     15   patrons  interact  with  our  catalog  and  its  associated  resources.  first,  because  we  are  now   searching  both  the  catalog  and  the  summon  index,  the  type-­‐ahead  feature  that  we  had  deployed  to   suggest  index  terms  from  our  local  database  to  users  as  they  entered  search  strings  no  longer   functions  as  an  authority  index  search.  we  have  returned  to  querying  both  databases  through  a   simple  keyword  search.     second,  in  our  implementation  of  the  single  search  interface  we  have  chosen  to  present  the  results   from  our  local  database  and  the  retrievals  from  summon  in  two,  side-­‐by-­‐side  columns.  this  has   the  advantage  of  bringing  article  literature  and  other  resources  indexed  by  summon  directly  to   the  patron’s  attention.  as  a  result,  more  patrons  interact  directly  with  articles,  as  well  as  with   books  in  major  digital  repositories  like  google  books  and  hathitrust.  this  change  has   undoubtedly  led  patrons  to  make  less  in-­‐depth  use  of  the  local  catalog  database,  although  it   preserves  much  of  the  added  functionality  in  terms  of  discovering  our  own  digital  collections  as   well  as  those  resources  whose  cataloging  we  share  with  our  trln  partners.  we  believe  that  the   ease  of  access  to  the  resources  indexed  by  summon  complements  the  enhancements  we  have   made  to  our  local  catalog.   conclusion  and  further  directions   one  might  argue  that  the  integration  of  electronic  resources  into  the  “catalog”  actually  shifts  the   paradigm  more  significantly  than  any  previous  enhancements.  as  the  literature  review  indicates,   much  of  the  conversation  about  enriching  library  catalogs  has  centered  on  improving  the  means   by  which  search  and  discovery  are  conducted.  the  reasonably  direct  linking  to  full  text  that  is  now   possible  has  once  again  radically  shifted  that  conversation,  for  the  catalog  has  come  to  be  seen  not   simply  as  a  discovery  platform  based  on  metadata  but  as  an  integrated  system  for  delivering  the   essential  information  resources  for  which  users  are  searching.   once  the  catalog  is  understood  to  be  a  locus  for  delivering  content  in  addition  to  discovering  it,  the   local  information  ecosystem  can  be  fundamentally  altered.  at  unc-­‐chapel  hill  we  have  engaged  in   a  process  whereby  the  catalog,  central  to  the  library’s  web  presence  (given  the  prominence  of  the   single  search  box  on  the  home  page),  has  become  a  hub  from  which  many  other  services  are   delivered.  
the  most  obvious  of  these,  perhaps,  is  a  system  for  the  delivery  of  physical  documents   that  is  analogous  to  the  ability  to  retrieve  the  full  text  of  electronic  documents.  if  an  information   source  is  discovered  that  exists  in  the  library  only  in  physical  form,  enhancements  to  the  display  of   the  catalog  record  facilitate  the  receipt  by  the  user  of  the  print  book  or  a  scanned  copy  of  an  article   from  a  bound  journal  in  the  stacks.     in  2013,  ithaka  s+r  conducted  a  local  unc  faculty  survey.  the  survey  posed  three  questions   related  to  the  catalog.  in  response  to  the  question,  “typically  when  you  are  conducting  academic   research,  which  of  these  four  starting  points  do  you  use  to  begin  locating  information  for  your   research?,”  41  percent  chose  “a  specific  electronic  research  resource/computer  database.”  nearly   one-­‐third  (30  percent)  chose  “your  online  library  catalog.”17     engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   16   when  asked,  “when  you  try  to  locate  a  specific  piece  of  secondary  scholarly  literature  that  you   already  know  about  but  do  not  have  in  hand,  how  do  you  most  often  begin  your  process?,”  41   percent  chose  the  library’s  website  or  online  catalog,  and  40  percent  chose  “search  on  a  specific   scholarly  database  or  search  engine.”  in  response  to  the  question,  “how  important  is  it  that  the   library  .  .  .  serves  as  a  starting  point  or  ‘gateway’  for  locating  information  for  my  research?,”  78   percent  answered  extremely  important.     on  several  questions,  ithaka  provided  the  scores  for  an  aggregation  of  unc’s  peer  libraries.  for   the  first  question  (the  starting  point  for  locating  information),  18  percent  of  national  peers  chose   the  online  catalog  compared  to  30  percent  at  unc.  on  the  importance  of  the  library  as  gateway,  61   percent  of  national  peers  answered  very  important  compared  to  the  78  percent  at  unc.   in  2014,  the  unc  libraries  were  among  a  handful  of  academic  research  libraries  that  implemented   a  new  ithaka  student  survey.  though  we  don’t  have  national  benchmarks,  we  can  compare  our   own  student  and  faculty  responses.  among  graduate  students,  31  percent  chose  the  online  catalog   as  the  starting  point  for  their  research,  similar  to  the  faculty.18  of  the  undergraduate  students,  33   percent  chose  the  library’s  website,  which  provides  access  to  the  catalog  through  a  single  search   box.19   a  finding  that  approximately  a  third  of  students  began  their  search  on  the  unc  library  website   was  gratifying.  oclc’s  perceptions  of  libraries  2010  reported  survey  results  regarding  where   people  start  their  information  searches.  in  2005,  1  percent  said  they  started  on  a  library  website;   in  2010,  not  a  single  respondent  indicated  doing  so.20     the  gross  disparity  between  the  oclc  reports  and  the  ithaka  surveys  of  our  faculty  and  students   requires  some  explanation.  
the  libraries  at  the  university  of  north  carolina  at  chapel  hill  are   proud  of  a  long  tradition  of  ardent  and  vocal  support  from  the  faculty,  and  we  are  not  surprised  to   learn  that  students  share  their  loyalty.  for  us,  the  recently  completed  ithaka  surveys  point  out   directions  for  further  investigation  into  our  patrons’  use  of  our  catalog  and  why  they  feel  it  is  so   critical  to  their  research.   anecdotal  reports  indicate  that  one  of  the  most  highly  valued  services  that  the  libraries  provide  is   delivery  of  physical  materials  to  campus  addresses.  some  faculty  admit  with  a  certain  degree  of   diffidence  that  our  services  have  made  it  almost  unnecessary  to  set  foot  in  our  buildings;  that  is  a   trend  that  has  also  been  echoed  in  conversations  with  our  peers.  yet  the  online  presence  of  the   library  and  its  collections  continues  to  be  of  significant  importance—perhaps  precisely  because  it   offers  an  effective  gateway  to  a  wide  range  of  materials  and  services.   we  believe  that  the  radical  redesign  of  the  online  public  access  catalog  initiated  by  north  carolina   state  university  in  2006  marked  a  sea  change  in  interface  design  and  discovery  services  for  that   venerable  library  service.  without  a  doubt,  continued  innovation  has  enhanced  discovery.   however,  we  have  come  to  realize  that  discovery  is  only  one  function  that  the  online  catalog  can   and  should  serve  today.  equally  if  not  more  important  is  the  delivery  of  information  to  the     information  technology  and  libraries  |  june  2015     17   patron’s  home  or  office.  the  integration  of  discovery  and  delivery  is  what  sets  the  “next-­‐gen”   catalog  apart  from  its  predecessors,  and  we  must  strive  to  keep  that  orientation  in  mind,  not  only   as  we  continue  to  enhance  the  catalog  and  its  services,  but  as  we  ponder  the  role  of  the  library  as   place  in  the  coming  years.  far  from  being  in  decline,  the  online  catalog  continues  to  be  an  “engine   of  innovation”  (to  borrow  a  phrase  from  holden  thorp,  former  chancellor  of  unc-­‐chapel  hill)  and   a  source  of  new  challenges  for  our  libraries  and  our  profession.   references     1.     cathy  de  rosa  et  al.,  perceptions  of  libraries  and  information  resources:  a  report  to  the  oclc   membership  (dublin,  oh:  oclc  online  computer  library  center,  2005),  1–17,   https://www.oclc.org/en-­‐us/reports/2005perceptions.html.   2.     karen  calhoun,  the  changing  nature  of  the  catalog  and  its  integration  with  other  discovery   tools,  final  report,  prepared  for  the  library  of  congress  (ithaca,  ny:  k.  calhoun,  2006),  5,   http://www.loc.gov/catdir/calhoun-­‐report-­‐final.pdf.   3.     roger  c.  schonfeld  and  kevin  m.  guthrie,  “the  changing  information  services  needs  of   faculty,”  educause  review  42,  no.  4  (july/august  2007):  8,   http://www.educause.edu/ero/article/changing-­‐information-­‐services-­‐needs-­‐faculty.   4.     
ross housewright and roger schonfeld, ithaka's 2006 studies of key stakeholders in the digital transformation in higher education (new york: ithaka s+r, 2008), 6, http://www.sr.ithaka.org/sites/default/files/reports/ithakas_2006_studies_stakeholders_digital_transformation_higher_education.pdf.

5. xi niu and bradley m. hemminger, "beyond text querying and ranking list: how people are searching through faceted catalogs in two library environments," proceedings of the american society for information science & technology 47, no. 1 (2010): 1–9, http://dx.doi.org/10.1002/meet.14504701294; and cory lown, tito sierra, and josh boyer, "how users search the library from a single search box," college & research libraries 74, no. 3 (2013): 227–41, http://crl.acrl.org/content/74/3/227.full.pdf.

6. charles r. hildreth, "beyond boolean; designing the next generation of online catalogs," library trends (spring 1987): 647–67, http://hdl.handle.net/2142/7500.

7. kristen antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology and libraries 25, no. 3 (2006): 129, http://dx.doi.org/10.6017/ital.v25i3.3342.

8. karen coyle, "the library catalog: some possible futures," journal of academic librarianship 33, no. 3 (2007): 415–16, http://dx.doi.org/10.1016/j.acalib.2007.03.001.

9. karen markey, "the online library catalog: paradise lost and paradise regained?" d-lib magazine 13, no. 1/2 (2007): 2, http://dx.doi.org/10.1045/january2007-markey.

10. marshall breeding, "next-gen library catalogs," library technology reports (july/august 2007): 10–13.

11. jia mi and cathy weng, "revitalizing the library opac: interface, searching, and display challenges," information technology and libraries 27, no. 1 (2008): 17–18, http://dx.doi.org/10.6017/ital.v27i1.3259.

12. michael j. bennett, "opac design enhancements and their effects on circulation and resource sharing within the library consortium environment," information technology and libraries 26, no. 1 (2007): 36–46, http://dx.doi.org/10.6017/ital.v26i1.3287.

13. eric lease morgan, "use and understand; the inclusion of services against texts in library catalogs and discovery systems," library hi tech (2012): 35–59, http://dx.doi.org/10.1108/07378831211213201.

14. lorcan dempsey, "thirteen ways of looking at libraries, discovery, and the catalog: scale, workflow, attention," educause review online (december 10, 2012), http://www.educause.edu/ero/article/thirteen-ways-looking-libraries-discovery-and-catalog-scale-workflow-attention.

15. charles pennell, natalie sommerville, and derek a. rodriguez, "shared resources, shared records: letting go of local metadata hosting within a consortium environment," library resources & technical services 57, no. 4 (2013): 227–38, http://journals.ala.org/lrts/article/view/5586.

16.
benjamin  pennell  and  jill  sexton,  “implementing  a  real-­‐time  suggestion  service  in  a  library   discovery  layer,”  code4lib  journal  10  (2010),  http://journal.code4lib.org/articles/3022.     17.    ithaka  s+r,  unc  chapel  hill  faculty  survey:  report  of  findings  (unpublished  report  to  the   university  of  north  carolina  at  chapel  hill,  2013),  questions  20,  21,  33.   18.    ithaka  s+r,  unc  chapel  hill  graduate  student  survey:  report  of  findings  (unpublished  report   to  the  university  of  north  carolina  at  chapel  hill,  2014),  47.   19.    ithaka  s+r,  unc  chapel  hill  undergraduate  student  survey:  report  of  findings  (unpublished   report  to  the  university  of  north  carolina  at  chapel  hill,  2014),  39.   20.    cathy  de  rosa  et  al.,  perceptions  of  libraries,  2010:  context  and  community:  a  report  to  the   oclc  membership  (dublin,  oh:  oclc  online  computer  library  center,  2011),  32,   http://oclc.org/content/dam/oclc/reports/2010perceptions/2010perceptions_all.pdf.     editorial board thoughts: a considerable technology asset that has little to do with technology mark dehmlow information technology and libraries | march 2014 4 for this issue’s editorial, i thought i would set aside the trendy topics like discovery, the clo ud, and open . . . well, everything—source, data, science—and instead focus on an area that i think has more long-term implications for technologists and libraries. for technologists in libraries, probably any industry really, i believe our most important challenges aren’t technical at all. for the average “techie,” even if an issue is complex, it is often finite and ultimately traceable to a root cause—the programmer left off a semi-colon in a line of code, the support person forgot to plug in the network cable, or the systems administrator had a server choke after a critical kernel error. debugging people issues, on the other hand, is much less reductive. people are nothing but variables who respond to conflict with emotion and can become entrenched in their perspectives (right or wrong). at a minimum, people are unpredictable. the skill set to navigate people and personalities requires patience, flexibility, seeing the importance of the relationship through the 1s and 0s, and often developing mutual trust. working with technology benefits from one’s intelligence (iq), but working with people requires a deeper connection to perception, self-awareness, body language, and emotions, all parts of emotional intelligence (eq). eq is relevant to all areas of life and work, but i think particularly relevant to technology workers. of particular importance are eq traits related to emotional regulation, self-awareness, and the ability to pick up social queues. my primary reasoning for this is that technology is (1) fairly opaque to people outside of technology areas and (2) technology is driving so much of the rapid change we are experiencing in libraries. it units in traditional organizations have a significant challenge because many root issues in technology are not well understood, and change is uncomfortable for most, so it is easy to resent technology for being such a strong catalyst for change. 
as a result, it is becoming more incumbent upon us in technology to not only instantiate change in our organizations but also to help manage that change through clear communication, clear expectation setting, defining reasonable timeframes that accommodate individuals’ needs to adapt to change, a commitment to shift behavior through influence, and just plain old really good listening. i would like to issue a bit of a challenge to technology managers as you are making hiring decisions. if you want the best possible working relationships with other functional areas in the library, especially traditional areas, spend time evaluating candidates for soft skills like a relaxed demeanor; patience; clear, but not condescending, communication; and a personal commitment to mark dehmlow (mdehmlow@nd.edu), a member of lita and the ital editorial board, is director, information technology program, hesburgh libraries, university of notre dame, south bend, indiana. editorial board thoughts: a considerable technology asset | dehmlow 5 serving others. these skills are very hard to teach. they can be developed if one is committed to developing them, but more often than not, they are innate. if a candidate has those traits as a base but also has an aptitude for understanding technology, that individual will likely be the kind of employee people will want to keep, certainly much more so than someone who has incredible technical skill but little social intelligence. for those who are interested in developing their eq, there are many of tools available—a million management books on team building, servant leadership, influencing coworkers, providing excellent service, etc. personally, i have found that developing a better sense of self-awareness is one of the best ways to increase one’s eq. tests such as the meyers briggs type indicator ,1 the strategic leadership type indicator ,2 and the disc,3 which categorize your personality and work-style traits, can be very effective tools for understanding how you approach your work and how your work style may affect your peers. combined with a willingness to flex your style based on the personalities of your coworkers, these can be very powerful tools for influencing outcomes. most importantly, i have found putting the importance of the relationship above the task or goal can make a remarkable difference in cultivating trust and collaboration. self-awareness and flexible approaches not only have the opportunity to improve internal relationships between technology and traditional functional areas of the library, but between techies and end users. we are using technology in many new creative ways to support end users, meaning techies are more and more likely to have direct contact with users. in many ways, our reputation as a committed service profession will be affected by out tech staffs’ ability to interact well with end users, and ultimately, i believe the proportion of our tech staff that have a high eq could be one the strongest predictor s of the long-term success for technology teams in libraries. references 1. “my mbti personality type,” the myers briggs foundation, http://www.myersbriggs.org/mymbti-personality-type/mbti-basics. 2. “strategic leadership type indicator —leader’s self assessment,” hrd press, http://www.hrdpress.com/slti. 3. “remember that boss who you just couldn’t get through to? we know why…and we can help,” everything disc, http://www.everythingdisc.com/disc-personality-assessment-about.aspx. 
http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ http://www.hrdpress.com/slti http://www.everythingdisc.com/disc-personality-assessment-about.aspx a s i approach the end of my tenure as ital edi­ tor, i reflect on the many lita members who have not submitted articles for possible publica­ tion in our journal. i am especially mindful of the smaller number who have promised or hinted or implied that they intended to or might submit articles. admittedly, some of them may have done so because i asked them, and their replies to me were the polite ones that one expects of the honorable members of the library and information technology association of the american library association. librarians are as individuals almost all or almost always polite in their professional discourse. pondering these potential authors, particularly the smaller number, i conjured a mental picture of a fictional, male, potential ital author. i don’t know why my fic­ tional potential author was male—it may be because more males than females are members of that group; it may be because i’m a male; or it may be unconscious sex­ ism. i’m not very self­analytic. my mental picture of this fictional male potential author saw him driving home from his place of employ­ ment after having an after­work half gallon of rum when, into the picture, a rattlesnake crawled on to the seat of his car and bit him on the scrotum. lucky him: he was, after all, a figment of my imagina­ tion. (any resemblance between my fictional author and a real potential author is purely coincidental.) lucky me: we all know that such an incident is not unthinkable in library land. lucky lita: it is unlikely that any member will cancel his or her membership or any subscriber, his, her, or its subscription because the technical term “scro­ tum” found its way into my editorial. ital is, after all, a technology journal, and members and readers ought to be offended if our journal abjures technical terminology. likewise they should be offended if our articles discuss library technology issues misusing technical terms or concepts, or confusing technical issues with policy issues, or stating technology problems or issues in the title or abstract or introduction then omitting any mention of said problems until the final paragraph(s). ital referees are quite diligent in questioning authors when they think terminology has been used loosely. their close readings of manuscripts have caught more than one author mislabeling policies related to the uses of informa­ tion technologies as if the policies were themselves tech­ nical conundrums. most commonly, they have required authors who state major theses or technology problems at the beginnings of their manuscripts, then all but ignore these until the final paragraphs, to rewrite sections of their manuscripts to emphasize the often interesting questions raised at the outset. what, pray tell, is the editor trying to communicate to readers? two things, primarily. first, i have been following with interest the several heated discussions that have taken place on lita­l for the past number of months. sometimes, the idea of the traditional quarterly scholarly/professional journal in a field changing so rapidly may seem almost quaint. a typical ital article is five months old when it is pub­ lished. a typical discussion thread on lita­l happens in “real time” and lasts two days at most. 
a small number of participants raise and “solve” an issue in less than a half dozen posts. a few times, however, a question asked or a comment posted by a lita member has led to a flurry of irrelevant postings, or, possibly worse, sustained bomb­ ing runs from at least two opposing camps that have left some members begging to be removed from the list until the all clear signal has been sounded. i’ve read all of these, and i could not help but won­ der, what if ital accepted manuscripts as short as lita­l postings? what would our referees do? i suspect, for our readers’ sakes, most would be rejected. authors whose manuscripts are rejected receive the comments made by the referees and me explaining why we cannot accept their submissions. the most frequent reason is that they are out of scope, irrelevant to the purposes of lita. when someone posts a technology question to lita­l that gener­ ates responses advising the questioner that implementing the technology in question is bad policy, the responses are, from an editor’s point of view, out of scope. how many lita members have authority—real authority—to set policy for their libraries? a second “popular” reason for rejections is that the manuscripts pose “false” problems that may be technological but that are not technologies that are within the “control” of libraries. these are out of scope in a different manner. third, some manuscripts do not pass the “so what” test. some days i wish that lita­l responders would referee, honestly, their own responses for their relevance to the questions or issues or so­whatness and to the membership. second, and more importantly to me, lita members, whether or not your bodies include the part that we all have come to know and defend, do you have the “­” to send your ital editor a manuscript to be chewed upon not by rattlesnakes but by the skilled professionals who are your ital editorial board members and referees? i hope (and do i dare beg again?) so. your journal will not suffer quaintness unless you make it so. editorial: the virtues of deliberation john webb john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university, and editor of information technology and libraries. editorial | webb 3 36 information technology and libraries | march 200736 information technology and libraries | march 2007 author id box for 2 column layout opac design enhancements and their effects on circulation and resource sharing within the library consortium environment michael j. bennett a longitudinal study of three discrete online public access catalog (opac) design enhancements examined the possible effects such changes may have on circulation and resource sharing within the automated library consortium environment. statistical comparisons were made of both circulation and interlibrary loan (ill) figures from the year before enhancement to the year after implementation. data from sixteen libraries covering a seven-year period were studied in order to determine the degree to which patrons may or may not utilize increasingly broader opac ill options over time. results indicated that while ill totals increased significantly after each opac enhancement, such gains did not result in significant corresponding changes in total circulation. m ost previous studies of online public access catalog (opac) use and design have centered on transaction­log analysis and user survey results in the academic library environment. 
measures of patron success or lack thereof have traditionally been expressed in the form of such concepts as “zero­hit” analysis or the “branching” analysis of kantor and, later, ciliberti.1 missing from the majority of the literature on opac study, however, are the effects that use and design have had on public library patron borrowing practices. major drawbacks to transaction­log analyses and user surveys as a measure of successful opac use include a lack of standardization and the inherent difficulties in interpreting resulting data. as peters notes, “[s]urveys measure users’ opinions about online catalogs and their perceptions of their successes or failures when using them, while transaction logs simply record the searches conducted by users. surveys,” he concludes, “mea­ sure attitudes, while transaction logs measure a specific form of behavior.”2 in both cases it is difficult, in many instances, to draw clear conclusions from either method. circulation figures, on the other hand, measure a more narrowly defined level of patron success. circulation is a discrete output that is the direct result of patrons’ initiated interaction with one or many library collections, one or many levels of library technology. with the recent advent of such enhanced opac functionality as patron­placed holds on items from broader and broader catalogs, online catalogs now more than ever not only serve as search mechanisms but also as ways for patrons to directly obtain materials from multiple sources. it follows that an investigation of the possible effects such enhancements may have on general circulation trends is warranted. ■ literature review during the mid­to­late 1980s, transaction­log analysis was introduced as an inexpensive and easy method of looking at opac use in primarily the academic library environment. peters’s transaction­log survey of more than thirteen thousand searches executed over a five­ month period at the university of missouri­kansas city remains particularly instructive today for its large sample and transferable design as well as its interpreta­ tion of results.3 here analysis was broken into two phases. in phase one, usage patterns by search type and failure rates as measured by zero hits were examined as dependent vari­ ables with search type as the independent variable in a comparison study. phase two took this one step further in the assigning of what peters termed “probable cause” of zero hits. these probable causes fell into patterns that, in turn, resulted in the identification of fourteen discernable error types that included such things as typographical errors and searches for items not in the catalog. once again, search type formed the independent variable while error type shaped the dependent variable in a simple study of error types as a percentage of total searches. peters found that users rarely employed truncation or any advanced feature searches and that failures were due primarily to such consistent erroneous search patterns as typographical errors and misspellings. more importantly, however, he cogently reassessed transaction­log analysis as a tool and critiqued its limitations. zero hits, for exam­ ple, need not necessarily construe failure when a patron performs a quality search and finds that the library simply does not own the title in question. 
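as an aside, the tally peters describes (zero-hit rates broken out by search type) amounts to a simple aggregation over a transaction log. the sketch below assumes a hypothetical csv export with "search_type" and "hits" columns; it is not a reconstruction of peters's actual tooling.

```python
"""zero-hit rates by search type from a (hypothetical) transaction-log export."""
import csv
from collections import Counter

def zero_hit_rates(log_path):
    totals, zeros = Counter(), Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            stype = row["search_type"]          # e.g. author, title, subject, keyword
            totals[stype] += 1
            if int(row["hits"]) == 0:
                zeros[stype] += 1
    return {s: zeros[s] / totals[s] for s in totals}

# zero_hit_rates("opac_log.csv") might return {"subject": 0.41, "title": 0.22, ...}
```

as peters's own caveat makes clear, a high rate for one search type still says nothing by itself about why those searches failed.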
concerning intelligible outputs from transaction­log study, peters found that, “if the user is seen as carrying on a dialog of sorts with the online catalog, then it could be said that most transaction logs record only half of the conversa­ tion. more information about the system’s response to the user’s queries would help us better understand why patrons do what they do.”4 a look at subsequent transaction­log analyses into the 1990s reveals somewhat differing research approaches yet strikingly similar results. wallace (1993) duplicated peters’s methods at eleven terminals within the university of colorado library system.5 her efforts spanned twenty hours of search monitoring and resulted in 4,134 logged searches. these were defined by carl system search type, (e.g., word, subject), then analyzed as cumulative totals and percentages of all searches. in this case, how­ michael j. bennett michael j. bennett (mbennett@cwmars.org) is digital initiatives librarian, c/w mars library network, worcester, massachusetts. article title | author 37opac design enhancements | bennett 37 ever, failed searches (peters’s zero hits) were eliminated entirely from the sample as wallace focused primarily on patterns of completed searches and did not concern her­ self with questions of search success or failure, thus limit­ ing the scope of her findings. among searches analyzed, results were comparable to peters’s.6 in keeping with peters’s line of thinking, wallace remarked, intriguing vagaries in human behavior during an infor­ mation search process continue to stymie researchers’ efforts to understand that process. . . . current, widely used and described guidelines, rules and principles of searching simply do not take into account important aspects of what is really going on when an individual is using a computer to search for information.7 in 1998, ciliberti et al. conducted a materials avail­ ability study of 441 opac searches at adelphi university over a three­week period during fall semester.8 their work combined kantor’s branching­analysis methodol­ ogy with transaction­log analysis of opac use in order to better understand if users obtain the materials they need through the online catalog.9 sampling was accom­ plished during random open hours and drew informa­ tion from undergraduate, graduate, and faculty users. survey forms included questions of what patrons were searching for. forms were then picked randomly by staff for re­creation. the study was unclear as to the actual design of these forms and their queries. as a result their effectiveness remains questionable. a seven­category scheme was developed to code search failures that closely followed kantor’s branching analysis, where the concept of errors extends beyond just opac and its design to include such things as library collection devel­ opment and circulation practices.10 the survey itself along with the loss of accuracy that can be expected from patrons attempting to describe their searches on paper, then having these same searches re­created by research staff lead this author to question the data’s validity. as peters has noted, surveys are good for assessing opac users’ opinions but not necessarily their behavior.11 it would seem that in this instance the tool did not fit the task. this study did, however, use transaction logs after the initial survey analysis and indeed found discrepancies between the self­report (survey) and actual transaction­log data. 
search errors were subsequently categorized as pre­ viously described.12 though branching analysis is adept at examining on a holistic, entire­library scale (e.g., the ques­ tion of why patrons are not able to obtain materials), the method’s inherent breadth of focus does not lend itself to fine scrutiny of opac design issues in and of themselves. further refinement of the transaction­log analysis methodology may be seen in blecic’s et al. four­year longi­ tudinal study of opac use within the university of illinois library system.13 once again, failed searches, termed “zero postings” by the authors, were examined as dependent variables and percentages of the total number of searches and were used as a control. reasons for zero postings (e.g., searches missing search statements, author names entered in incorrect order) fell into seven separate catego­ ries. subsequent transaction­log sets were then culled after three incremental opac enhancements. enhancements included redesigns of general introductory and explain screens. z­test analysis of the level of equality between percentages of zero postings from log set to log set was then made in order to assess whether or not the enhance­ ments had any affect on diminishing said percentages and thus improving searching behavior. what blecic et al. found was temporary improve­ ment in patron searches followed by an unexpected lowering of patron performance over time. confounding attributes to the study include its longitudinal nature in an academic setting where user groups are not constant but variable. sadly, no attempt at tracking such possible changes in user populations was made. also of note was the fact that, as time passed, the command­based opac was increasingly being surrounded by web­based journal database search interfaces that did not require the use of sophisticated search statements and arguments. as users became accustomed to this type of searching, their com­ mand syntax skills may have suffered as a result.14 merits of the study include its straightforward design, logical data analysis, and plausible conclusions. longitudinal studies, though prone to the confound­ ing variables described, nevertheless form a persuasive template for further research into how incremental opac enhancements affect actual opac use over time. variations of transaction­log analysis also include the purely experimental. thomas’s 2001 simulation study of eighty­two first­year undergraduates at the university of pittsburg utilized four separate experimental screen inter­ faces.15 these interfaces included one that mimicked the current catalog with data labels and brief bibliographic displays, a second interface with the same bibliographic display but no data labels, and a third that contained the data labels but modified the brief display to include more subject­oriented fields. a fourth interface viewed the same brief displays as the third group but with the labels removed. users were pretested for basic demographic informa­ tion and randomly assigned to one of the four experi­ mental interface groups. each group was then given the same two search tasks. for the first task, users were asked to select items that they would examine further for a hypothetical research paper on big­band music and the music of duke ellington. the second task involved asking participants to examine twenty bibliographic records and to decide whether they would choose to look into these records further. 
participants were then asked to identify the data elements used to inform their 38 information technology and libraries | march 200738 information technology and libraries | march 2007 relevance choices. resulting user behavior was subse­ quently tracked through transaction logs. for thomas’s experimental purposes, though, trans­ action logs took on a higher level of sophistication than in earlier comparative studies. here participants’ actions were monitored with a greater level of granularity. quantitative data were tracked for screens visited, time spent viewing them, total number of screens, total number of bibliographic citations examined at each level of speci­ ficity, and total time it took to complete the task. because of the obtrusive nature of the project, a third party was hired to administer the experiment. chi­square analysis of demographic data found no significance among partici­ pant groups in terms of their experience in using comput­ ers, online catalogs, or prior knowledge of the problem topic. this important analysis allowed the researchers a higher level of confidence in their subsequent findings. results in many instances were, however, inconclu­ sive. factors impairing the clarity of conclusions included the number of variables analyzed and the artificiality of the test design itself. thomas comments on one particular example of this: one of the fields that previous researchers said that library users found important was the call number field. obviously, without the call number, locating the actual item on the shelf is greatly complicated. in this experi­ ment, however, participants were not asked to retrieve the items they selected; thus, their perceived need for the call number may well have been mitigated.16 here is further evidence that a study of opac activity viewed in the context of actual outcomes, namely circula­ tion, is a logical approach to consider. most recently, graham at the university of lethbridge, alberta, examined opac subject searching and no­ hit results and considered two possible experimental enhancement types in order to allow users the ability to conduct more accurate searches.17 over a one­week period, 1,521 no­hit subject searches were first sampled and placed into nine categories by error type. subtotals were then expressed as percentage distributions of the total. a similar examination of 37,987 no­hit findings was also made over the course of four calendar years, form­ ing a longitudinal approach. percent distribution of error types from the two studies were then compared and were found to be similar with “non­library of congress subject headings” being the predominant area of concern. graham then attempted to improve subject searching by systematically enhancing the catalog in two ways. first, cross­references were created based upon the original no­ hit search term and linked to existing library of congress subject headings (lcshs) that graham interpreted as appropriate to the searcher’s original intentions. second, in instances where the original search could not be easily linked to an existing lcsh, a pathfinder record was cre­ ated that suggested alternate search strategies. all total, 10,520 new authority records and 2,312 pathfinder records were created over the course of the longitudinal study.18 the experiment, unfortunately, only went this far. no attempt was subsequently made to test whether these two methods of adding value to an existing opac search interface made a difference in users’ experiences. 
though creative in its suggested ameliorations to no­hit searches, the study also lacked any statistical testing of comparative data among sample years. possible problematic design issues, such as the relative complexity of pathfinders and how this might affect their end use were discussed but never tested through the analysis of real outcomes. in summary, major weaknesses of the transaction­log analysis model as demonstrated through the literature include: 1. lack of standardization among general study methodologies. 2. lack of standardization of opacs themselves: command structure and screen layout differ among software vendors. 3. lack of standards on measurable levels of search “success” or “failure.” while the following study of opac design enhance­ ments in the public library consortium environment did not directly address the first two points of emphasis, it was this author’s expectation that the lack of stan­ dardized notions of opac search success or failure found throughout the literature may be better addressed through a longitudinal analysis of discrete circulation and ill statistics. in this way, these quantifiable outcomes, both the direct results of patron initiation, would better assume clearer measures of patron success or failure in opac end use. ■ purpose and methodology in recent years, both academic and public libraries have invested substantial capital in improving opac design and automated systems. to what extent have these improvements affected the use of library materials by public library patrons? in order to better examine the question, this study tracked, over a seven­year period dating back from july 1998 through june 2005, the circulation and systemwide holds statistical trends of sixteen member libraries of c/ w mars, a massachusetts automated library network of 140 libraries. during this time a number of discrete, incre­ mental opac modifications granted patrons the ability to accomplish tasks remotely through the opac that previ­ ously had required library staff mediation. among these article title | author 3�opac design enhancements | bennett 3� changes, the initiation of intra­consortium (c/w mars) patron­placed holds, and the subsequent introduction of a link from the existing opac to the massachusetts virtual catalog (nine massachusetts consortiums, four university of massachusetts system libraries) were examined. this author hypothesized that such opac enhance­ ments that allow for broader choices of patron­placed holds would result in increases in both total circulation and total network transfers (ill) of library materials one year after initial enhancement adoption. as both total cir­ culation and total ill grew, it was hypothesized that ill as a percent of total circulation would likewise increase due to the fact that each opac enhancement was targeted directly toward facets of ill procurement. opac enhancements followed the schedule below: 1. general c/w mars network systemwide holds (requests mediated through library staff only), november 2000 2. patron­placed holds (request button placed on c/ w mars opac screens), december 2002 3. c/w mars participation in the massachusetts virtual catalog (additional button for pass through opac searches and requests from c/w mars catalog into the massachusetts virtual catalog), august 2004 these dates served as independent variables in a study of separate dependent variables (total circulation and total ills received) for all eight libraries one year after initial adoption of a new enhancement. 
for the sake of continu­ ity the terms holds and ills were used interchangeably throughout this examination. t­test comparisons to fig­ ures from the year prior to enhancement were then made for statistical significance. in addition, ills received as a percentage of total circulation (dependent variable) for all fifteen libraries one year after initial adoption of a new enhancement were also calculated and compared to the year prior to enhancement through z­test analysis. libraries chosen were a random sample from both central and western geographic regions of the network. sampled institutions did not go through any substantial renovations, drastic open hours changes, or closures dur­ ing the study period in order to better avoid potential con­ founding variables that may have skewed the resulting data. raw circulation and ill figures were taken directly from the massachusetts board of library commissioners’ (mblc) data files for fiscal years 1999 through 2004.19 in the mblc’s data files, the following fields, sorted by library, correlated to this study’s statistical reporting: “dircirc” = “circulation” “loan from” = “ill” as fiscal year (fy) 2005 figures for circulation and ill had not yet been compiled by mblc at the time of this writing, these statistics were in turn taken directly from reports run off of c/w mars’s network servers. it should be noted that similar c/w mars reports are distributed and used by the consortium’s libraries them­ selves each fiscal year for reporting circulation and ill statistics to mblc. raw data by library were entered into microsoft excel spreadsheets. totals for circulation and ills received for all libraries by fy of opac enhancement were totaled and then compared to fy data prior to enhancement as a percent change value. excel’s data analysis tools were then employed to run t­tests (paired two sample for means) in tables 1 through 5 to analyze the level of change for significance from one sample to the next in both total circulation and total ills. (all tables and charts can be found in appendix following article.) tests for sig­ nificance employed two­tailed t­tests with an alpha level set to .05. raw data for these same libraries across identical study years were also entered into subsequent spread­ sheets (tables 6 through 10) for additional z­tests (two samples for means) to analyze the level of change for significance from one fy sample to the next in ills received as a percentage of total circulation. here tests for significance employed two­tailed z­tests with an alpha level set to .05. ■ results and discussion the results of a sixteen­library, seven­year longitudinal study of total circulation and total ills­received statistics are outlined in tables 1 through 5, charts 1 through 10. in addition, an analysis of ills received as a percentage of total circulation during this same time period among sampled libraries is represented in tables 6 through 10. over the course of the study a total of 22,277,245 circula­ tion and 624,286 ill transactions were examined from july 1998 through june 2005. yearly comparisons in total circulation and total ills received from fy ’99 to fy ’00 were made to analyze the level of changes in circulation and ill statistics between years before any opac ill enhancements were under­ taken. as such these numbers gave insight into what changes, if any, normally occur in circulation and ill fig­ ures prior to a schedule of substantial opac ill enhance­ ments. 
although the year­to­year comparisons over the course of subsequent enhancement rollouts were made to test for the statistical significance of the year prior and following a particular functionality addition, the ’99 to ’00 40 information technology and libraries | march 200740 information technology and libraries | march 2007 comparison was made to form a control of what circula­ tion and ill trends may look like between years of no drastic workflow or design changes. results showed that this yearly comparison prior to the beginning of opac enhancements (table 1, charts 1 and 2) showed no significant change from one year to the next in total circulation (t = 1.81, p > 0.05) or total ills received (t = ­0.76, p > 0.05). circulation from ’99 to ’00 declined slightly by 3.42 percent while total ills received increased 3.35 percent. the mblc’s available retrospec­ tive data set currently only goes back to fy ’99, so a deeper understanding beyond this two­year comparison of normal year­to­year trends was impossible to achieve. yet data from this sample suggest that both circulation and ills may trend statistically flat from one year of little if any alteration of ill design to the next. additionally, comparisons of the percent of total ills received to total circulation were made between ’99 and ’00 (as will be seen in table 6) and were found to be insignificantly different (z = ­0.23, p > 0.05). ills received made up 0.61 percent of total circulation in fy ’99 and 0.65 percent of total circulation in fy ’00. during fy ’01 (november 2001), c/w mars rolled out automated systemwide holds functionality whereby library staff were first able to place patron requests for materials at other c/w mars member libraries through the consortium’s automated circulation system. up until this point, holds (ills) were placed primarily by staff through e­mail or faxed requests from one ill depart­ ment to another. patrons would request material either verbally with staff or through the submission of a paper or electronic form. staff would then look up the item in the electronic catalog and make the request. with the advent of systemwide holds, staff still accepted requests in a similar fashion, but instead of using the fax or e­mail, they began to place requests directly into the network’s innovative millennium circu­ lation clients. from there, the automated system not only randomly chose the lending library within the system but also automatically queued paging slips at the lending library for material that would subsequently be sent in transit to the borrowing location. by this time in the network’s development, opac had also graduated from a character­based telnet system to a smoother web design. but the catalog, in terms of directly assisting in the placing of ill requests, func­ tioned as it always had—it was still individually a search­ ing mechanism. the introduction of systemwide holds led to the sec­ ond largest jump in ill figures out of all comparative samples (table 2, chart 4). interestingly enough, the con­ siderably significant 127.23­percent gain in ill activity from fy ’00 to fy ’01 (t = ­4.07, p < 0.05) did not translate into a significant increase in total circulation. in fact, cir­ culation declined during this period, not significantly (t = 1.87, p > 0.05), but by 2.40 percent nonetheless (table 2, chart 5). a comparison of the percent of ills to total circulation from fy ’00 to fy ’01 (table 7) indicated a sig­ nificant increase of 0.65 percent to 1.52 percent (z = ­4.20, p < 0.05). 
more on the possible effects that rising levels of ills may have on circulation will be touched upon later. though no statistical evaluations were made between fy '01 and fy '02 (as no novel ill changes were made over this period), it should be noted that during fy '02 the network first gave patrons the ability, through the opac, to log into their own accounts remotely. patrons were able to set up a personal identification number and view such things as a list of their checked-out items. patrons were also allowed to place checks next to such items and to renew these items remotely. fy '03 saw the first direct ill enhancement to the opac. during this year patrons were first given the opportunity to place ill requests of their own (patron-placed holds) for material found in the catalog through the addition of an opac screen request button. up until this time, all material requests had been mediated by library staff. comparative total circulation results from the year before enhancement to fy '03 (table 3, chart 5) showed a modest but significant 4.18 percent increase (t = -2.94, p < 0.05). ills-received figures (table 3, chart 6), however, jumped by a considerable 25.58 percent margin (t = -4.66, p < 0.05), strongly suggesting that the opac request-button addition and its facilitation of patron-placed holds had a positive effect upon total ill activity, as was hypothesized. finally, total ills received as a percentage of total circulation increased slightly from fy '02 (2.52 percent) to fy '03 (3.04 percent) (table 8) but did not represent a significant shift (z = -1.51, p > 0.05). the last augmentation to the network's opac design that this study examined was an additional link for ills through the massachusetts virtual catalog. the massachusetts virtual catalog at the time of this study was an online union catalog of nine massachusetts network consortia and four university of massachusetts system libraries. unlike the previous request-button enhancement that allowed for seamless patron-placed holds within the c/w mars catalog, the massachusetts virtual catalog link was not a button but a descriptive hyperlink ("can't find the title you want here? try the massachusetts virtual catalog next!") from the network's opac to the virtual catalog's own dedicated opac interface. once there, patrons were required to log in to the virtual catalog and re-create their search queries from scratch, as previous searches were not automatically passed through to the second catalog. in essence, the virtual catalog acted as an additional step for patrons to take beyond c/w mars's list of holdings to broaden their search for materials that the network's member libraries did not own. comparative figures for total circulation between fy '04 and fy '05 (table 4, chart 7), when the virtual catalog link was added to the c/w mars opac screen, showed circulation down an insignificant 2.04 percent (t = 0.97, p > 0.05), which ran counter to hypothesized expectations. total ills received between fy '04 and fy '05 (table 4, chart 8), however, rose 30.85 percent, which proved to be a highly significant increase (t = -7.03, p < 0.05).
additionally, ills as a percent of total circulation rose from 4.70 percent in fy '04 to 6.27 percent in fy '05 (table 9), which was statistically significant (z = -3.28, p < 0.05) and pointed not only to gains in ill itself after the introduction of the virtual catalog link but also to the ever-increasing proportion of total circulation that ill activity accounted for. the final statistical comparison in this study was a look at what cumulative effect, if any, both opac enhancements may have had from the year before the first enhancement's rollout (the patron-placed holds request button) to one year after the latest addition (the virtual catalog hyperlink from the opac). in turn, comparative numbers for circulation and ills between fy '02 and fy '05 were examined. total circulation over this time (table 5, chart 9) increased insignificantly by 3.46 percent (t = -1.47, p > 0.05). total ills received (table 5, chart 10), however, increased by 157.47 percent, the largest significant increase of any comparison (t = -7.20, p < 0.05). ills as a percent of total circulation also increased significantly from 2.52 percent in fy '02 to 6.27 percent in fy '05 (z = -7.71, p < 0.05) (table 10). if one steps back and examines the various comparisons discussed up to this point, certain trends become evident. over the course of the seven-year study, total circulation remained relatively flat, oscillating slightly from year to year, with only one significant increase, which occurred after the introduction of patron-placed holds in fy '03. these results, excluding fy '03, ran against hypothesized expectations that predicted that as ill enhancements were rolled out, correspondingly significant increases in circulation would result. the fy '99 to fy '00 control comparison of total ills received, made before the advent first of network systemwide holds and then of a succession of opac design enhancements allowing a broader range of patron-initiated ills, suggested that these totals run statistically flat from one year to the next. with the advent of systemwide holds, however, the ill picture began to change dramatically with a significant increase in total ills. this was followed by significant increases in ill activity in each study year that came after an opac ill enhancement. these results pointed toward the substantial effect that these enhancements had on total ill activity and supported hypothesized expectations. when such opac rollouts were examined as a cumulative influence, comparing ill levels of the most recent fiscal year (fy '05) to the year before their initial advent (fy '02), the positive effect that such enrichments had, not only on total ills but also on ills as a share of total circulation, becomes clearest. for it is through this comparison that it was found that not only did total ills increase significantly but that ills as a percentage of total circulation also increased significantly from the time before the first opac enhancement to the present. total circulation was surprisingly impervious to change and ran statistically flat during this time. it is clear from this longitudinal study that incrementally granting patrons access to online tools with which to initiate such traditional library business as ills spurs large and significant increases in such activity. in other words, these online tools are not ignored but are intellectually and literally grasped.
what may be surprising, however, is the degree to which ill has increased as a result of them, to a point where ill has not only taken up a significantly greater proportion of total circulation than ever before but also appears to be changing the very nature of circulation itself. future studies may include a deeper examination of the circulation and ill statistical picture farther back in time than this investigation covers to better clarify trends leading up to such major enhancement rollouts. also, similar longitudinal studies from different consortia environments may shed further light on the evidence discussed throughout this writing. consortia are uniquely poised to offer large statistical sample sizes and standardized workflows within their network-wide ill and circulation software packages and automated statistical programs. this, in turn, results in high-quality, consistent data samples from heterogeneous library sources that are relatively uncorrupted by scattershot recording methods and differing circulation and ill methodologies. finally, a future look at the effects that similar opac ill enhancements may have on borrowing trends beyond general raw transactional figures is warranted. chris anderson, for example, has recently commented on long tail statistical analysis and its relation to library catalogs; here outwardly shifting demand curves for library materials are hypothesized as collections become more visible and interconnected through the web.20 in a similar vein, a more granular examination of such concepts as possible circulation and ill-activity trends in terms of discrete material types borrowed, patron types who borrow, or a cross-tabulation of these data points would appear to be a fertile next step toward a greater knowledge of ills and circulation as a whole.
references
1. t. peters, "when smart people fail: an analysis of the transaction log of an online public access catalog," the journal of academic librarianship 15, no. 5 (1989): 267–73.
2. ibid., 272.
3. ibid.
4. ibid., 272.
5. p. wallace, "how do patrons search the online catalog when no one's looking? transaction-log analysis and implications for bibliographic instruction and system design," rq 33, no. 2 (1993): 239–43.
6. peters, "when smart people fail."
7. wallace, "how do patrons search the online catalog when no one's looking?" 239.
8. a. ciliberti et al., "empty handed? a material availability study and transaction-log analysis verification," the journal of academic librarianship 24, no. 4 (1998): 282–89.
9. p. kantor, "availability analysis," journal of the american society for information science 27, nos. 5–6 (1976): 311–19.
10. ciliberti et al., "empty handed? a material availability study and transaction-log analysis verification."
11. peters, "when smart people fail."
12. ciliberti et al., "empty handed? a material availability study and transaction-log analysis verification."
13. d. blecic et al., "a longitudinal study of the effects of opac screen changes on searching behavior and searcher success," college & research libraries 60, no. 6 (1999): 515–30.
14. ibid.
15. d. thomas, "the effect of interface design on item selection in an online catalog," library resources & technical services 45, no. 1 (2001): 20–46.
16. ibid., 41.
17. r. graham, "subject no-hits searches in an academic library online catalog: an exploration of two potential ameliorations," college & research libraries 65, no. 1 (2004): 36–54.
18. ibid.
19. massachusetts board of library commissioners, "public library data, data files," 2005, http://www.mlin.lib.ma.us/advisory/statistics/public/index.php (accessed oct. 13, 2005).
20. c. anderson, "the long tail," wired magazine 12, no. 10 (2004): 170–77; "q&a with chris anderson," oclc newsletter, 2005, no. 268, http://www.oclc.org/news/publications/newsletters/oclc/2005/268/interview.htm (accessed july 20, 2006).
appendix a: tables and charts
table 1. yearly comparison prior to the beginning of ill opac enhancements
table 2. general systemwide holds implementation (adopted 11/00)
table 3. opac design enhancement: patron-placed holds (adopted 12/02)
table 4. opac design enhancement: patron-placed massachusetts virtual catalog holds (adopted 8/04)
table 5. opac design enhancements: "cumulative effect" (fy '02 to fy '05)
table 6. yearly comparison prior to the beginning of ill opac enhancements, ill received as a percentage of total circulation
table 7. general systemwide holds (adopted 11/00), ill received as a percentage of total circulation
table 8. opac design enhancement: patron-placed holds (adopted 12/02), ill received as a percentage of total circulation
table 9. opac design enhancement: patron-placed massachusetts virtual catalog holds (adopted 8/04), ill received as a percentage of total circulation
table 10. opac design enhancements: "cumulative effect" (fy '02 to fy '05), ill received as a percentage of total circulation
chart 1. circulation comparison prior to any ill opac enhancement (fy '99 to fy '00)
chart 2. ill received comparison prior to any ill opac enhancement (fy '99 to fy '00)
chart 3. circulation comparison before and after general systemwide holds implementation (adopted 11/00)
chart 4. holds received comparison before and after general systemwide holds implementation (adopted 11/00)
chart 5. circulation comparison before and after patron-placed holds opac enhancement (adopted 12/02)
chart 6. holds received comparison before and after patron-placed holds opac enhancement (adopted 12/02)
chart 7. circulation comparison before and after massachusetts virtual catalog opac enhancement (adopted 8/04)
chart 8. holds received comparison before and after massachusetts virtual catalog opac enhancement (adopted 8/04)
chart 9. circulation comparison, opac enhancements "cumulative effect" (fy '02 to fy '05)
chart 10. ill comparison, opac enhancements "cumulative effect" (fy '02 to fy '05)
building pathfinders with free screen capture tools
patrick griffis
this article outlines freely available screen capturing tools, covering their benefits and drawbacks as well as their potential applications. in discussing these tools, the author illustrates how they can be used to build pathfinding tutorials for users and how these tutorials can be shared with users.
the author notes that the availability of these screen capturing tools at no cost, coupled with their ease of use, provides ample opportunity for low-stakes experimentation by library staff in building dynamic pathfinders to promote the discovery of library resources. one of the goals related to discovery in the university of nevada las vegas (unlv) libraries' strategic plan is to "expand user awareness of library resources, services and staff expertise through promotion and technology."1 screencasting videos and screenshots can be used effectively to show users how to access materials using finding tools in a systematic, step-by-step way. screencasting and screen capturing tools are becoming more intuitive to learn and use and can be downloaded for free. as such, these tools are becoming an efficient and effective method for building pathfinders for users. one such tool is jing (http://www.jingproject.com), freeware that is easy to download and use. jing allows short screencasts of five minutes or less to be created and uploaded to a remote server on screencast.com. once a jing screencast is uploaded, screencast.com provides a url for the screencast that can be shared via e-mail or instant message or on a webpage. another function of jing is recording screenshots, which can be annotated and shared by url or pasted into documents or presentations. jing serves as an effective tool for enabling librarians working with students via chat or instant messaging to quickly create screenshots and videos that visually demonstrate to students how to get the information they need. jing stores the screenshots and videos on its server, which allows those files to be reused in subject or course guides and in course management systems, course syllabi, and library instructional handouts. moreover, jing's file storage provides an opportunity for librarians to incorporate tutorials into a variety of spaces where patrons may need them, in a manner that does not require internal library server space or work from internal library web specialists. trailfire (http://www.trailfire.com) is another screen-capturing tool that can be utilized in the same manner. trailfire allows users to create a trail of webpage screenshots that can be annotated with notes and shared with others via a url. such trails can provide users with a step-by-step slideshow outlining how to obtain specific resources. when a trail is created with trailfire, a url is provided to share. like jing, trailfire is free to download and easy to learn and use. wink (http://debugmode.com/wink) was originally created for producing software tutorials, which makes it well suited for creating tutorials about how to use databases. although wink is much less sophisticated than expensive software packages, it can capture screenshots and add explanation boxes, buttons, titles, and voice to your tutorials. screenshots are captured automatically as you use your computer, on the basis of mouse and keyboard input. wink files can be converted into very compressed flash presentations and a wide range of other file types, such as pdf, but avi files are not supported. as such, wink tutorials converted to flash have a fluid movie feel similar to jing screencasts, but wink tutorials can also be converted to more static formats like pdf, which provides added flexibility. slideshare (http://www.slideshare.net) allows for the conversion of uploaded powerpoint, openoffice, or pdf files into online flash movies.
an option to sync audio to the slides is available, and widgets can be created to embed slideshows onto websites, blogs, subject guides, or even social networking sites. any of these tools can be utilized for just-in-time virtual reference questions in addition to the common use of just-in-case instructional tutorials. such just-in-time screen capturing and screencasting offer a viable solution for providing more equitable service and teachable moments within virtual reference applications. these tools allow library staff to answer patron questions via e-mail and chat reference in a manner that allows patrons to see the processes for obtaining information sources. demonstrations that are typically provided in face-to-face reference interactions and classroom instruction sessions can be provided to patrons virtually. the efficiency of this practice is that it is simpler and faster to capture and share a screencast tutorial when answering virtual reference questions than to explain complex processes in written form. additionally, the fact that these tools are freely available and easy to use provides library staff the opportunity to pursue low-stakes experimentation with screen capturing and screencasting. the primary drawback to these freely available tools is that none of them provides a screencast that allows for both voice and text annotations, unlike commercial products such as camtasia and captivate. however, tutorials rendered with these freely available tools can be repurposed into a tutorial within commercial applications like camtasia studio (http://www.techsmith.com/camtasia.asp) and adobe captivate (http://www.adobe.com/products/captivate/). as previously mentioned, these easy-to-use tools allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots is library "how do i find . . ." research help guides. many of these "how do i find . . ." research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources. currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials.
patrick griffis (patrick.griffis@unlv.edu) is business librarian, university of nevada las vegas libraries.
reference
1. "unlv libraries strategic plan 2009–2011," http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2.
statement of ownership, management, and circulation
information technology and libraries, publication no. 280-800, is published quarterly in march, june, september, and december by the library and information technology association, american library association, 50 e. huron st., chicago, illinois 60611-2795. editor: marc truitt, associate director, information technology resources and services, cameron library, university of alberta, edmonton, ab t6g 2j8, canada. annual subscription price, $65. printed in u.s.a. with periodical-class postage paid at chicago, illinois, and other locations. as a nonprofit organization authorized to mail at special rates (dmm section 424.12 only), the purpose, function, and nonprofit status for federal income tax purposes have not changed during the preceding twelve months. extent and nature of circulation (average figures denote the average number of copies printed each issue during the preceding twelve months; actual figures denote the actual number of copies of the single issue published nearest to the filing date: september 2009 issue). total number of copies printed: average, 5,096; actual, 4,751. mailed outside-county paid subscriptions: average, 4,090; actual, 3,778. sales through dealers and carriers, street vendors, and counter sales: average, 430; actual, 399. total paid distribution: average, 4,520; actual, 4,177. free or nominal rate copies mailed at other classes through the usps: average, 54; actual, 57. free distribution outside the mail (total): average, 127; actual, 123. total free or nominal rate distribution: average, 181; actual, 180. total distribution: average, 4,701; actual, 4,357. office use, leftover, unaccounted, spoiled after printing: average, 395; actual, 394. total: average, 5,096; actual, 4,751. percentage paid: average, 96.15; actual, 95.87. statement of ownership, management, and circulation (ps form 3526, september 2007) filed with the united states post office postmaster in chicago, october 1, 2009.
foreword
the editorial board of the journal of library automation is pleased to pay tribute to frederick g. kilgour who, with the able assistance of his assistant editor, eleanor m. kilgour, so firmly established this periodical and set its standards so high. especially in view of the fact that in these first years of journal publication mr. kilgour was also designing and implementing the complex system which is the ohio college library center, his achievement as first editor was remarkable. to him the information science and automation division of the american library association owes a great debt.
as library automation moves further into the seventies, the context of its existence changes. ever-increasing fiscal pressures have required economic justification for every alteration of traditional practice. the mere availability of equipment, of programs and tested system design, even of skilled and experienced manpower, can no longer be considered enough. novelty, the magic word "innovation," seldom now casts a spell on those who control institutional budgets. increasingly, in the issues of this journal, we hope that emphasis will be placed on reviews of experience, retrospective evaluations of operation rather than optimistic projections made in the first bright mornings of system design. we must have reports, if not of failures, at least of the alterations and accommodations enforced on operational systems by experience and the heavy hand of time. it is our further hope that the journal will receive more reports from public and school libraries which indicate an increasing dedication, in automation explications, to the social and educational goals of those institutions. -ajg
preparing locally encoded electronic finding aid inventories for union environments: a publishing model for encoded archival description
plato l. smith ii
plato l. smith ii (psmithii@fsu.edu) is digital initiatives librarian at florida state university libraries, tallahassee.
this paper will briefly discuss encoded archival description (ead) finding aids; the workflow and process involved in encoding finding aids using the ead metadata standard; our institution's current publishing model for ead finding aids; current ead metadata enhancement; and new developments in our publishing model for ead finding aids at florida state university libraries. for brevity and within the scope of this paper, fsu libraries will be referred to as fsu, an electronic ead finding aid and/or archival finding aid will be referred to as an ead or eads, and locally encoded electronic ead finding aid inventories will be referred to as eads @ fsu.
■ what is an ead finding aid?
many scholars, researchers, and learning and scholarly communities are unaware of the existence of rare, historic, and scholarly primary source materials such as inventories, registers, indexes, archival documents, papers, and manuscripts located within institutions' collections/holdings, particularly special collections and archives. a finding aid—a document providing information on the scope, contents, and locations of collections/holdings—serves as both an information provider and a guide for scholars, researchers, and learning and scholarly communities, directing them to the exact locations of rare, historic, and scholarly primary source materials within institutions' collections/holdings, particularly noncirculating and rare materials. the development of the finding aid led to the need for an encoding and markup language that was software/hardware independent, flexible, and extensible, and that allowed online presentation on the world wide web. in order to provide logical structure, content presentation, and hierarchical navigation, as well as to facilitate internet access to finding aids, the university of california–berkeley library in 1993 initiated a cooperative project that would later give rise to the development of the nonproprietary, sgml-based, xml-compliant, machine-readable markup encoding standard for finding aids, the encoded archival description (ead) document type definition (dtd) (loc, 2006a).
thus, an ead finding aid is a finding aid that has been encoded using encoded archival description and that should be validated against an ead dtd. the ead xml that produces the ead finding aid via an extensible stylesheet language (xsl) transformation should be checked for well-formedness with an xml validator (e.g., xml spy or oxygen) to ensure proper nesting of ead metadata elements. "the ead document type definition (dtd) is a standard for encoding archival finding aids using extensible markup language (xml)" (loc, 2006c). an ead finding aid includes descriptive and generic elements along with attribute tags that provide descriptive information about the finding aid itself, such as title, compiler, and compilation date, and about the archival material, such as the collection, record group, series, or container list. florida state university libraries has been creating locally encoded electronic encoded archival description (ead) finding aids using a note tab light text editor template and locally developed xsl style sheets to generate multiple ead manifestations in html, pdf, and xml formats online for over two years. the formal ead encoding descriptions and guidelines are developed with strict adherence to the best practice guidelines for the implementation of ead version 2002 in florida institutions (fcla, 2006), the manuscripts processing reference manual (altman & nemmers, 2006), and ead version 2002. an ead note tab light template is used to encode finding aids down to the collection level and create ead xml files. the ead xml files are transformed through xsl stylesheets to create ead finding aids for select special collections.
■ ead workflow, processes, and publishing model
the certified archivist and staff in special collections and a graduate assistant in the digital library center encode finding aids in the ead metadata standard using an ead clip library and template in the note tab light text editor, entering data for the various descriptive, administrative, and generic elements and attribute tags to generate ead xml files. the ead xml files are then checked for validity and well-formedness using xml spy 2006. currently, ead finding aids are encoded down to the folder level, but recent florida heritage project 2005–2006 grant funding has allowed selected special collections finding aids to be encoded down to the item level. currently, we use two xsl style sheets, ead2html.xsl and ead2pdf.xsl, to generate html and pdf formats, and we simply display the raw xml, thereby rendering ead finding aids as html, pdf, and xml and presenting these manifestations to researchers and end users. the ead2html.xsl style sheet used to generate the html versions was developed to specifications for the fsu seal, color, and display with input from the special collections department head. the ead2pdf.xsl style sheet used to generate pdf versions uses xsl-fo (formatting objects) and was also developed to layout and design specifications with input from the special collections department head. the html versions are generated using xml spy home edition with built-in xslt, and the pdf versions are generated using apache formatting objects processor (fop) software from the command line. ead finding aids, eads @ fsu, are available in html, pdf, and xml formats (see figure 1).
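as a minimal sketch of the validation and transformation steps just described, the lines below use python's lxml library rather than the xml spy and apache fop tools named in the article; the ead file name is hypothetical, while ead2html.xsl is the style sheet named above.

from lxml import etree

# parse the ead instance; the parse itself confirms well-formedness, and
# dtd_validation checks the file against the dtd named in its doctype declaration
parser = etree.XMLParser(load_dtd=True, dtd_validation=True)
ead_doc = etree.parse("MSS2003004.xml", parser)  # hypothetical file name

# apply the html transformation with the locally developed style sheet
transform = etree.XSLT(etree.parse("ead2html.xsl"))
html_result = transform(ead_doc)

with open("MSS2003004.html", "wb") as out:
    out.write(etree.tostring(html_result, pretty_print=True))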
the style sheets used, the ead authoring software, and the original eads @ fsu site are available via www.lib.fsu.edu/dlmc/dlc/findingaids.
■ enriching ead metadata
as ead standards and developments in the archival community advance, we had to begin enriching our ead metadata to prepare our locally encoded ead finding aids for future union catalog searching and opac access. the first step toward enriching the metadata of our ead finding aids was to run the rlg ead report card (oclc, 2008) on one of our ead finding aids. the test resulted in the display of missing required (req), mandatory (m), mandatory if applicable (ma), recommended (rec), optional (opt), and encoding analog (relatedencoding and encodinganalog attribute) metadata elements (see figure 2). the second test involved referencing the online archive of california best practice guidelines (oac bpg), specifically appendix b (cdl, 2005, ¶ 2), to create a formal public identifier (fpi) for our ead finding aids and make the ead fpis describing archives: a content standard (dacs) compliant. this second test resulted in the creation of our very first dacs-compliant ead formal public identifier. example: ftasu2003004.xml
the rlg ead report card and appendix b of the oac bpg together helped us modify our ead finding aid encoding template and workflow to enrich the ead document identifier metadata element, include missing mandatory ead metadata elements, and develop fpis for all of our ead finding aids.
figure 1. ead finding aids in html, pdf, and xml format
figure 2. rlg ead report card of xml ead file
prior to recent new developments in the publishing model for ead finding aids at fsu libraries, the ead finding aids in our eads @ fsu inventories could not be easily found using traditional web search engines, were part of the so-called "deep web" (prom & habing, 2002), and were "unidimensional in that they [were] based upon the assumption that there [was] an object in a library and there [was] a descriptive surrogate for that object, the cataloging record" (hensen, 1999). ead finding aids in our eads @ fsu inventories did not have a descriptive surrogate catalog record and lacked the relevant related-encoding and encoding-analog metadata elements within the ead metadata with which to facilitate "metadata crosswalks," that is, mapping one metadata standard to another to facilitate cross-searching. "to make the metadata in ead instance as robust as possible, and to allow for crosswalks to other encoding schemes, we mandate the inclusion of the relatedencoding and encodinganalog attributes in both the <eadheader> and <archdesc> segments" (meissner et al., 2002). incorporating an ead quality-checking tool such as the rlg report card, along with compliance with a content standard such as dacs, when authoring eads will assist in improving ead encoding and the ead finding aid publishing model.
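the checks below are a rough, homegrown illustration of this kind of quality control, not the rlg report card itself; element and attribute names follow ead 2002, and the file name is hypothetical.

from lxml import etree

doc = etree.parse("MSS2003004.xml")  # hypothetical file name
root = doc.getroot()
problems = []

# the <eadid> should carry a formal public identifier in its publicid attribute
eadid = root.find(".//eadid")
if eadid is None or not (eadid.text or "").strip():
    problems.append("missing or empty <eadid>")
elif not eadid.get("publicid"):
    problems.append("<eadid> lacks a publicid (formal public identifier)")

# relatedencoding attributes on <eadheader> and <archdesc> support crosswalks
for name in ("eadheader", "archdesc"):
    section = root.find(".//" + name)
    if section is None:
        problems.append(f"missing <{name}>")
    elif not section.get("relatedencoding"):
        problems.append(f"<{name}> lacks a relatedencoding attribute")

# encodinganalog attributes map individual elements to marc or dublin core fields
if not root.findall(".//*[@encodinganalog]"):
    problems.append("no encodinganalog attributes found")

print("\n".join(problems) if problems else "basic checks passed")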
■ some key issues with creating and managing ead finding aids
one of the major issues with creating and managing ead finding aids is the set of rules used for describing papers, manuscripts, and archival documents. the former set of rules used for providing consistent descriptions and anglo-american cataloguing rules (aacr) bibliographic catalog compliance for papers, manuscripts, and archival documents down to the collection level was archives, personal papers, and manuscripts (appm), which was compiled by steven l. hensen and published by the library of congress in 1983. however, the need for more descriptive granularity down to the item level, enhanced bibliographic catalog specificity, marc and ead metadata standards implementations and metadata standards crosswalks, and the inclusion of descriptors for archival material types beyond personal papers and manuscripts prompted the development of describing archives: a content standard (dacs), published in 2004, with a second edition published in 2007. "dacs [the u.s. implementation of international standards for the description of archival materials and their creators] is an output-neutral set of rules for describing archives, personal papers, and manuscripts collections, and can be applied to all material types" (pearce-moses, 2005). some international standards for describing archival materials are the general international standard archival description, isad(g), and the international standard archival authority record for corporate bodies, persons, and families, isaar(cpf). other issues with creating and managing ead finding aids include (the list is not exhaustive):
1. online presentation of finding aids
2. exposing finding aids electronically for searching
3. provision of a search interface to search finding aids
4. online public access catalog records (marc) and links to finding aids
5. finding aids linked to digitized content of collections
eads @ fsu exist in html for online presentation, pdf for printing, and xml for exporting, which allows researchers greater flexibility and options in the information-gathering and research process and has improved the way archivists communicate guides to archival collections to researchers, as compared with paper finding aids physically housed within institutions. eads @ fsu existed online in html, pdf, and xml formats for two years as static html documents and then moved to drupal (a mysql database with php) for about one year, which improved online maintenance but not researcher functionality. however, the purchase and upgrade of a digital content management system marked a huge advancement in the development of our ead finding aids implementation and thus resolved issues 1–3. researchers now have a single-point search interface to search eads @ fsu across all our digital collections/institutional repository (see figure 3); the ability to search within the finding aids via full-text indexing of pdfs; the option of brief (thumbnails with ead, htm, pdf, and xml manifestation icons), table (title, creator, and identifier), and full (complete ead finding aid dc record with manifestations) views of search results, which provides different levels of exposure of ead finding aids; and the ability to save or e-mail search results.
figure 3. online search gui for ead finding aids and digital collections within ir
future initiatives are underway to enhance the eads @ fsu implementation via the creation of ead marc records through a dublin core to marc metadata crosswalk, to deep link to ead finding aids via the 856 field in marc records, and to begin digitizing and linking to ead finding aids' archival content via the digital archival object <dao> ead element. <dao> is a "linking element that uses the attributes entityref or href to connect the finding aid information to electronic representations of the described materials. the <dao> and <daogrp> elements allow the content of an archival collection or record group to be incorporated in the finding aid" (loc, 2006b).
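a minimal sketch of the <dao> linkage quoted above, built with lxml: a digital archival object pointer is attached to a component's <did> so the finding aid can point to digitized content. the component structure and target url are purely illustrative.

from lxml import etree

component = etree.fromstring(
    '<c02 level="file"><did><unittitle>correspondence, 1942</unittitle></did></c02>'
)
dao = etree.SubElement(component.find("did"), "dao")
dao.set("href", "http://example.org/digital/correspondence-1942")  # hypothetical link
dao.set("linktype", "simple")
print(etree.tostring(component, pretty_print=True).decode())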
we have opted to create basic dublin core records of ead finding aids based on the information in the ead finding aid descriptive summary (front matter) first and then crosswalk to marc, but we are cognizant that this current workflow is subject to change in the pursuit of advancement. we are also seeking ways to improve the ead workflow and ead marc record creation through more communication and future collaboration with the fsu libraries cataloging department.
■ number of finding aids and percent of eads @ fsu
as of february 16, 2006, we had 700 collections with finding aids, of which 220 finding aids are electronic and encoded in html (31 percent of total finding aids). of the 220 electronic finding aids, 60 are available as html, pdf, and xml finding aids (about 27 percent of electronic finding aids are eads @ fsu). however, we currently have 63 ead finding aids available online in html, pdf, and xml formats.
■ new developments in publishing eads @ fsu
current eads @ fsu incorporate the recommendations from test 1 and test 2 (rlg bpg and dacs compliance) discussed earlier, and the digital content management system (digitool) creates a descriptive digital surrogate of the ead objects in the form of brief, basic dublin core metadata records for each ead finding aid, along with multiple ead manifestations (see figure 4).
figure 4. ead finding aids in ead (default), html, pdf, and xml manifestations
we have successfully built and launched our first new digital collection, fsu special collections ead inventories, in digitool 3.0 as part of the fsu libraries dlc digital repository (http://digitool3.lib.fsu.edu/r/), a relational-database digital content management system (dcms). digitool has an oracle 9i relational database management system backend and a searchable web-based gui; it provides a default ead style sheet that allows full-text searching of eads; it supports the marc, dc, and mets metadata standards and jpeg2000 (with built-in tools for images and thumbnails), as well as the z39.50 and oai protocols, which will enable resource discovery and exposure of eads @ fsu. you can visit the fsu special collections ead finding aids inventories at http://digitool3.lib.fsu.edu/r/?func=collections-result&collection_id=1076.
■ national, international, and regional aggregation of finding aids initiatives
rlg's archivegrid (http://archivegrid.org/web/index.jsp) is an international, cross-institutional search service aggregating primary source archival materials from more than 2,500 research libraries, museums, and archives, with a single-point interface for searching archival collections across research institutions. other international, cross-institutional searches of aggregated archival collections are:
■ intute: arts & humanities in the united kingdom, www.intute.ac.uk/artsandhumanities/cgi-bin/browse.pl?id=200025 (international guide to subcategories of archival materials)
■ archives made easy, www.archivesmadeeasy.org (guide to archives by country)
there are also some regional initiatives that provide cross-institutional searching of aggregations of finding aids:
■ publication of archival library and museum materials (palmm), http://palmm.fcla.edu (cross-institutional searches in florida; fsu participates)
■ virginia heritage: guides to manuscript and archival collections in virginia, http://ead.lib.virginia.edu/vivaead/ (cross-institutional searches in virginia)
■ texas archival resources online, www.lib.utexas.edu/taro/ (cross-institutional searches in texas)
■ online archive of new mexico, http://elibrary.unm.edu/oanm/ (cross-institutional searches in new mexico)
awareness of regional, national, and international aggregation of finding aids initiatives, and engagement in regional aggregation of finding aids, will enable consistent advancement in the development and implementation of eads @ fsu.
acknowledgments
the fsu libraries digital library center and special collections department, florida heritage project funding (fcla), chuck f. thomas (fcla), and robert mcdonald (sdsc) assisted in the development, implementation, and success of eads at fsu.
references
altman, b., & nemmers, j. (2006). manuscripts processing reference manual. florida state university special collections.
california digital library (cdl). (2005). oac best practice guidelines for encoded archival description, appendix b: formal public identifiers for finding aids. retrieved october 6, 2006, from www.cdlib.org/inside/diglib/guidelines/bpgead/bpgead_app.html#d0e2995.
digital library center, florida state university libraries. (2006). fsu special collections ead finding aids inventories. retrieved january 5, 2007, from http://digitool3.lib.fsu.edu/r/?func=collections-result&collection_id=1076.
florida center for library automation (fcla). (2004). palmm: publication of archival library and museum materials, archival collections. retrieved january 7, 2007, from http://palmm.fcla.edu.
florida center for library automation (fcla). (2006). best practice guidelines for the implementation of ead version 2002 in florida institutions (john nemmers, ed.). accessed april 21, 2008, at www.fcla.edu/dlini/openingarchives/new/floridaeadguidelines.pdf.
fox, m. (2003). the ead cookbook, 2002 edition. chicago: the society of american archivists. retrieved october 6, 2006, from www.archivists.org/saagroups/ead/ead2002cookbook.html.
hensen, s. l. (1999). nistf ii and ead: the evolution of archival description. in encoded archival description: context, theory, and case studies (pp. 23–34). chicago: the society of american archivists.
library of congress (loc). (2006a). development of the encoded archival description dtd. retrieved october 6, 2006, from www.loc.gov/ead/eaddev.html.
library of congress (loc). (2006b). digital archival object. encoded archival description tag library, version 2002. retrieved january 8, 2007, from www.loc.gov/ead/tglib.
library of congress (loc). (2006c). encoded archival description, version 2002 official site: ead dtd version 2002. retrieved april 19, 2008, from www.loc.gov/ead/ead2002a.html.
meissner, d., kinney, g., lacy, m., nelson, n., proffitt, m., rinehart, r., ruddy, d., stockling, b., webb, m., & young, t. (2002). rlg best practices guidelines for encoded archival description (pp. 1–24). mountain view: rlg. retrieved january 5, 2007, from www.rlg.org/en/pdfs/bpg.pdf.
national library of australia. (1999). use of encoded archival description (ead) for manuscript collections. retrieved january 4, 2007, from www.nla.gov.au/initiatives/ead/eadintro.html.
oclc. (2007). archivegrid: open the door to history. retrieved january 4, 2007, from http://archivegrid.org/web.
oclc. (2008). ead report card. retrieved april 11, 2008, from www.oclc.org/programs/ourwork/past/ead/reportcard.htm.
pearce-moses, r. (2005). a glossary of archival and records terminology. chicago: society of american archivists. retrieved january 8, 2007, from www.archivists.org/glossary/index.asp.
prom, c. j., & habing, t. g. (2002). using the open archives initiative protocols with ead. paper presented at the 2nd acm/ieee-cs joint conference on digital libraries, portland, oregon, usa, july 14–18, 2002. retrieved october 6, 2006, from http://portal.acm.org/citation.cfm?doid=544220.544255.
reese, t. (2005). building lite-weight ead repositories. paper presented at the 5th acm/ieee-cs joint conference on digital libraries. new york: acm. retrieved january 5, 2007, from http://doi.acm.org/10.1145/1065385.1065498.
special collections department, university of virginia. (2004). virginia heritage: guides to manuscripts and archival collections in virginia. retrieved january 7, 2007, from http://ead.lib.virginia.edu/vivaead/.
thomas, c., et al. (2006). best practices guidelines for the implementation of ead version 2002 in florida institutions. florida state university special collections.
university of texas libraries, university of texas at austin. (n.d.). texas archival resources online (taro). retrieved january 4, 2007, from www.lib.utexas.edu/taro.
a dynamic methodology for improving the search experience
marcia d. kerchner
marcia d. kerchner (mkerchner@mitre.org) is a principal information systems engineer at the mitre corporation, mclean, va.
in the early years of modern information retrieval, the fundamental way in which we understood and evaluated search performance was by measuring precision and recall. in recent decades, however, models of evaluation have expanded to incorporate the information-seeking task and the quality of its outcome, as well as the value of the information to the user. we have developed a systems engineering-based methodology for improving the whole search experience. the approach focuses on understanding users' information-seeking problems, understanding who has the problems, and applying solutions that address these problems. this information is gathered through ongoing analysis of site-usage reports, satisfaction surveys, help desk reports, and a working relationship with the business owners.
■ evaluation models
in the early years of modern information retrieval, the fundamental way in which we understood and evaluated search performance was by measuring precision and recall.1 in recent decades, however, models of evaluation have expanded to incorporate the information-seeking task and the quality of its outcome, cognitive models of information behavior, as well as the value of the information to the user.2 the conceptual framework for holistic evaluation of libraries described by nicholson defines multiple perspectives (internal and external views of the library system as well as internal and external views of its use) from which to measure and evaluate a library system.3 the work described in this paper is consistent with these frameworks, as it emphasizes that, while efforts to improve search may focus on optimizing precision or recall, it is equally important to recognize that the search experience involves more than a perfect set of high-precision, high-recall search results. the total search experience, and how well the system actually helps the user solve the search task, must be evaluated. a search experience begins when users enter words in a search box. it continues when the users view some representation (such as a list or a table) of candidate answers to their queries. it includes the users' reactions to the usefulness of those answers and their representation in satisfying information needs, and continues with the users clicking on a link (or links) to view content.
optimizing search results without considering the rest of the search experience and without considering user behavior is missing an opportunity to further improve user success. for example, the experience is a failure if typical users cannot recognize the answers to their information need because the items lack a recognizable title or an informative description, or because they involve extensive scrolling or hard-to-use content.
■ proposed solutions
problems with search, such as low precision or low recall, are often addressed either by metadata solutions (adding topical tags to content objects based on controlled vocabularies) or by replacement of the search engine. the problems with the metadata approach include the time and effort required to establish, evolve, and maintain taxonomies, and the need for trained intermediaries to apply the tags.4 a community of stakeholders may be convened to define the controlled vocabulary, but often the lowest common denominator prevails, the champions and stakeholders leave, and no one is happy with the resulting standard. even with trained intermediaries, inter-indexer inconsistency compromises this approach, and inconsistent term application can cause degradation of search results.5 another shortcoming of the metadata approach is that a specific metadata classification is just a snapshot in time and assumes that there is only one particular hierarchy of the information in the corpus. in reality, however, there is almost always more than one way to describe a concept, and the taxonomy is the view of only one individual or group of individuals. in addition, topical metadata is often implemented with little understanding of the types of queries that are submitted or the probable user search behavior. the other approach to improving search results, replacing the search engine, is not guaranteed to fix the problem because it focuses only on improving precision (and perhaps recall as well) without understanding the true barriers to a successful search experience.
■ irs.gov
irs.gov, one of the most widely used government web sites, is routinely accessed by millions of people each month (more than 27 million visits in april 2005). as an informational site, the key goal of irs.gov is to direct visitors quickly to useful information, either through navigation or a search function. given that there were almost 16 million queries submitted to irs.gov in april 2005, search is clearly a popular way for its users to look for information. this paper offers an alternative to conventional search-improvement approaches by presenting a systems engineering-based methodology for improving the whole search experience. this methodology was developed, honed, and modified in conjunction with work performed on the irs.gov web site over a three-year period.
a similar strategy of "sense-and-respond" for information technology (it) departments of public organizations, involving systematic intelligence gathering on potential customer demand, a rapid response to fulfill that demand, and metrics to determine how well the demand was satisfied, has recently been described.6 the methodology described in this paper focuses on analyzing the information-seeking behaviors and needs of users and on determining the requirements of the business owners (the irs business operating divisions that provide content to irs.gov, such as small business and self-employed, and wage and investment) for directing users to relevant content. it is based on the assumption that a web site must evolve based on its users' needs, rather than expecting users to adapt to its singularities. to support this evolution, the approach leverages techniques for query expansion and document-space modification.7 dramatic improvements in quality of service to the user have resulted, enhancing the user experience at the site and reducing the need to contact the help desk. the approach is particularly applicable for those government, corporate, and commercial web sites where there is some control over the content and usage can be categorized into regular patterns. the rest of this paper provides a case study in the application of the methodology and the application of metrics, in addition to precision and recall, to measure search-experience improvement.
■ conceptual framework
while analysis of search results often focuses on search syntax and search-engine performance, there are actually several steps in the retrieval process, from the user identifying an information need to the user receiving and reviewing query results. as shown in figure 1, finding information is a holistic process. there are several opportunities to improve the whole user experience by fine-tuning this process with a variety of tools, from document engineering to results categorization. once the user and business-owner needs are understood, the appropriate tools to address specific issues can be identified. the tools in our toolkit are described in the following sections.
figure 1. the information retrieval process
document engineering
document engineering includes:
■ document-space modification: modifying the document space by adding terms to content (especially to titles) that are good discriminators and reflect terms commonly entered by users. this approach has the added benefit of making the content more understandable to users.
■ establishment of content-quality standards: defining business processes that improve content quality and organization.
document-space modification
there is significant syntactic and semantic imprecision in the english language. in addition, because of the inadequacies of human or automatic keyword assignment, standard means of representing documents in indexes by statistical term associations and frequency counts, or by adding metadata tags, are not definitive enough to produce a space that is an exact image of the original documents. document-space modification moves documents in the document space closer to future similar queries by adding new terms or modifying the weight of existing terms in the content (figure 2).8 the document space is thus modified to improve retrieval. for irs.gov, rather than adjusting content weights, titles and content are modified to adjust to changing terminology and user needs.
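an illustrative sketch (not the irs.gov implementation) of document-space modification in this spirit: user-vocabulary terms drawn from query logs are folded into a document's title and indexed text so that the document sits closer to the queries people actually type. the document, terms, and field names below are made up.

from collections import Counter

document = {
    "title": "publication 503",
    "body": "rules for claiming the credit for child and dependent care expenses.",
}

# terms from query logs that should act as discriminators for this document
user_terms = ["child care", "dependent care", "daycare expenses"]

def modify_document_space(doc, added_terms):
    """return a copy of the document with user terms folded into the title and index text."""
    enriched = dict(doc)
    enriched["title"] = f'{doc["title"]}: {added_terms[0]}'
    enriched["index_text"] = doc["body"] + " " + " ".join(added_terms)
    # a simple bag-of-words view of the document after modification
    enriched["term_counts"] = Counter(enriched["index_text"].lower().split())
    return enriched

print(modify_document_space(document, user_terms)["title"])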
establishment of content-quality standards
the quality of the search correlates with the quality of the content. improved search results can be achieved by applying good content-creation practices, and retrieval can be significantly improved by addressing problems observed in the content. these problems include inconsistencies in term use—for example, earned income credit (eic) versus earned income tax credit (eitc)—duplicate content, insufficiently descriptive page titles, missing document summaries, misspellings, and inconsistent spellings. processes to improve content quality should establish standards for consistent term usage in content, as well as standards for consistent and descriptive naming of content types (for example, irs types include forms, instructions, and publications). these processes will not only improve search precision but will also help users identify appropriate content in the search results. for example, content entitled "publication 503" in response to the query "child care" may be the perfect answer (with excellent precision and recall), but the user will not recognize it as the right answer. a title such as "publication 503: child and dependent care expenses" will clearly point the user to the relevant information. usability tests conducted in march 2005 for irs.gov confirmed that content organization plays an important role in the perceived success of a user's search experience. long pages of links or scrolling pages of content left some users confused and overwhelmed, unable to find the needed information. for these queries, although the search results were perfect, with a precision of 100 percent after one document, the search experiences were still failures.
query enhancement
the technique of relevance feedback for query expansion improves retrieval in an iterative fashion.9 according to this approach, the user submits a query, reviews the search results, and then reports query-document relevance assessments to the system. these assessments are used to modify the initial query; that is, new terms are added to the initial query (hopefully) to improve it, and the query is resubmitted. if one visualizes the content in a collection as a space (figure 3), this approach attempts to move the query closer to the most relevant content. a drawback of relevance feedback is that it is not generally collected over multiple user sessions and over time, so the next user submitting the same query has to go through the same process of providing results evaluations for query expansion. borlund has noted that, given that an individual user's information need is personal and may change over session time, relevance assessments can only be made by a user at a particular time.10 however, on irs.gov, where there are many common queries for which there is a clear best-guess response, there is valuable relevance information that, if captured once, could benefit tens of thousands of users for specific queries. in fact, in april 2005, the top four hundred queries represented almost half of all the queries. another drawback of the relevance-feedback approach is that it forces the user, novice or expert, to become engaged in the search process. as noted previously, users are generally not interested in becoming search experts or in becoming intimately involved in the process of search. the relevance-feedback approach tries to change users' behavior and forces them to find the specific word or words that will best retrieve the relevant information.
in fact, some research has shown that the potential benefits of relevance feedback may be hard to achieve primarily because searchers have difficulty finding useful terms for effective query expansion.11 to avoid requiring users to submit relevance-feedback judgments, the methodology uses alternative approaches for gathering feedback: (1) mining sources of input that do not require any additional involvement on the part of the users; and (2) soliciting relevance judgments from subject matter experts. as noted above, while best results may be different per task and per user, particularly given the shortness of the queries, our goal is to maximize the good results for the maximum number of people. best-guess results are derived from a variety of sources, including usability testing, satisfaction survey questionnaires, and business content owners. for example, users entering the common query "1040ez" can be looking for information on the form or the form itself. given that—as shown in table 1 (based on the responses of 11,715 users to satisfaction surveys in 2005)—the goal of 39 percent of irs.gov searchers is to download a form as opposed to 28 percent seeking to obtain general tax information, the retrieval of the 1040ez form and its instructions is prioritized, while also retrieving any general related information.
figure 2. document-space modification
figure 3. query modification
we can determine the best-guess results as follows:
■ review the search results for terms that are on the frequently entered search-terms list
■ review help desk contacts, satisfaction-survey comments, and zero-results reports to identify information that users are having trouble finding or understanding
■ identify best results by working with the business owners as necessary
■ analyze why best results are not being retrieved for a particular query
■ add appropriate synonyms for this and related queries
■ engineer relevant documents (as described above)
in this way, the thesaurus, as the source for query enhancement, is an evolving structure that adapts to the needs of the users rather than being a fixed entity of elements based on someone's idea of a standardized vocabulary.
search improvement
we can intercept very popular queries and return a set of preconfigured results or a quick link at the top of the search-results listing. for example, the user entering "1040" sees a list of the most popular 1040-related forms and instructions in addition to a list of other search results. there were more than 31,000 users in april 2005 who requested the i-9 form. since the form is not an irs form, users are presented with a link to the bureau of citizenship and immigration services web site. the tens of thousands of users who look for state tax forms on irs.gov are directed either to the specific state-tax-form page or to a page with links to state tax sites. this unique and user-friendly approach provides a significant improvement over a page that tells the user that there is no matching result, leaving him to fend for himself. another technique for improving search precision (not currently used for irs.gov) is to tune and adjust parameters in the search engine, such as the relative weighting of basic metadata tags such as title (if they are used in the relevance calculation).
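a minimal sketch of the query-interception ("quick link") idea described in the search improvement section above; the mapping of queries to preconfigured results and the stand-in search engine are hypothetical examples, not the actual irs.gov configuration.

# intercept very popular queries and return preconfigured "quick link"
# results ahead of the normal search results. the mapping below is a
# hypothetical illustration.

QUICK_LINKS = {
    "1040": ["form 1040", "form 1040 instructions", "form 1040-es"],
    "i-9": ["link to the bureau of citizenship and immigration services site"],
    "state tax forms": ["page with links to state tax sites"],
}

def search_with_quick_links(query, search_engine):
    query_key = query.strip().lower()
    quick = QUICK_LINKS.get(query_key, [])
    # quick links are shown at the top, followed by the regular result list
    return quick + search_engine(query)

def fake_engine(q):            # stand-in for the real search engine
    return [f"regular result for '{q}'"]

print(search_with_quick_links("1040", fake_engine))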
results-ranking improvement
the search results can be programmatically re-ranked before being presented to the user. this approach (not used as yet on irs.gov) is a variation on the quick links described above for re-ranking more than one result.
categorization
a large set of search results can be automatically categorized into subsets to help the user find the information he needs. in addition, a "search within a search" function is available to help the user narrow down results. research on commercial products that support automatic categorization is planned for the future.
summarization
as noted earlier, a barrier to a successful user experience can be the lack of informative descriptions in the search results. therefore, an important tool for search-experience improvement is to make sure that content titles and summaries are informative, or as a second choice, that the search engine dynamically generates informative summaries. passage-based summaries and highlighted search terms in the summary and the content have become a feature of many commercial search engines as another way to improve the usability of the returned results. in addition, for those pdf publications that lacked informative titles in the title tag, descriptive information from a different metadata field was added to the search display programmatically, which improved the usability of such results significantly.
table 1. reasons for using irs.gov
reason for coming to irs.gov | % of total site visitors | % of total search users
download a tax form, publication, or instructions | 39 | 39
obtain general tax information | 27 | 28
obtain information on e-file | 10 | 10
other | 6 | 6
obtain info on tax regulations or written determinations | 4 | 4
order forms from the irs | 3 | 4
sign up or login to e-services | 3 | 3
link and learn (vita/vce) training | 3 | 3
obtain info on the status of your tax return | 2 | 2
use online tax calculators | 1 | 1
obtain info on revenue rulings or court cases | 1 | 1
obtain an employer identification number (ein) | 1 | —
note: due to rounding, totals may not equal 100%.
■ methodology
the methodology for evolving the search functionality is based on a logical, systems-engineering approach to the issue of getting users the information they seek: understanding the problems, understanding who has the problems, and applying solutions that address the problems. usability studies, weblogs, focus groups, help desk contacts, and user surveys provide different perspectives of the information system. the steps of the methodology are:
1. understand the user population.
2. identify the barriers to a successful search experience.
3. analyze the information-seeking behaviors of the users.
4. understand the needs of the business owners.
5. identify and use the appropriate tools to improve the user's search experience.
6. repeat as needed.
7. monitor new developments in search and analytic technologies and replace the search engine as appropriate.
step 1: understand the user population
the first step is to profile and understand the user population. as mentioned above, an online satisfaction survey was conducted during a six-week period in january–february 2005, to which 11,715 users responded. the users were asked the frequency of their usage of the site, their primary reason for coming to irs.gov, their category (individual, tax professional, business representative), and how they generally find information on irs.gov. as shown in tables 1–4,
76 percent of the irs.gov visitors use it once a month or less (the largest group being those who use it every six months or less) or are using it for the first time; 64 percent are individual taxpayers; 10 percent are tax professionals; 39 percent visit the site to download a form or publication; and 27 percent come for general tax or e-file information. forty-nine percent use the search engine.
table 2. frequency of visits to irs.gov
user group | first time | every six months or less | about once a month | about once a week | daily | more than once a day
site visitor | 29% | 34% | 13% | 13% | 7% | 4%
search user | 26% | 34% | 14% | 14% | 7% | 5%
not surprisingly, 44 percent of the frequent visitors (those who visit once a week or more) are tax professionals, while 72 percent of the infrequent visitors are individuals or those who represent a business. the most common task of both the most frequent and infrequent visitors is to download a form, publication, or instructions, followed by obtaining general tax information. most frequent and infrequent visitors use the search function to locate their information. thus, the largest group of irs.gov users consists of average citizens, unfamiliar with the site, who have a specific question or a need for a specific form or publication. these users require high-precision, highly relevant results, and a highly intuitive search interface. they do not want or need to read all the material generated by their search, but they want their question answered quickly. these users are generally not experienced with sophisticated query language syntax, and because they come to the site no more than once a month, they are not likely to be familiar with its navigational organization. as studies demonstrate, users in general do not want to learn a search engine interface or tailor their queries to the design of a particular search engine.12 they want to find their information now before "search rage" sets in. one study observed that, on average, searchers get frustrated in twelve minutes.13 tax professionals form a small but important group of irs.gov users that includes lawyers, accountants, and tax preparers. they generally use the site on a regular basis, which could be daily, weekly, or monthly. some of these users, particularly lawyers and accountants, require high recall in their search results; it is critical that they retrieve every relevant piece of information (e.g., all the tax regulations) related to a tax topic. they may be willing to sift through large results sets to make sure they have seen all the relevant items. in contrast, many tax preparers use the site primarily to download forms and instructions. while these different sets of users have different levels of expertise using the site and somewhat different precision and recall requirements, they do have one characteristic in common—they are not interested in search for its own sake. approaches to improving retrieval results that focus on forcing users to use tools to refine their query to get presumably better search results (e.g., leveraging the power of boolean or other search syntax) are not desirable in a public web site environment. the complexity of the search must be hidden behind the search box and users must be helped to find information rather than be expected to master a search function.
step 2: identify the barriers to a successful search experience
there are several categories of reasons why finding information on a public web site can be frustrating for the user.
■ mismatch between user terminology and content terminology
- the user search terms may not match the terminology or jargon used in the content (e.g., users ask for "tax tables" or "tax brackets"; the irs names them "tax rate schedules").
- multiple synonymous terms or acronyms are found because different authors are providing content on similar topics (e.g., "ein," "employer identification number," "federal id number"; "eic" versus "eitc").
- users request the same information in a variety of ways (e.g., "1040ez," "1040-ez," "ez," "form1040ez," "1040ez form," "2005 1040ez," "ez1040").
- related content may be inconsistently named, complicating the user's search process (e.g., "1040x" form versus "1040-x" instructions).
- the user may use a familiar acronym that is spelled out in the content (e.g., "poa" for "power of attorney").
■ mismatch between user requests and actual content
- many users ask for information that they expect to find on the site but is actually hosted at another site (e.g., "ds156," a department of state form; "it-201," a new york state tax form).
■ issues with results listing and content display
- content may lack informative titles.
- automatically generated summaries may not be sufficiently descriptive for users to recognize the relevant material in the results listing.
- content may consist of long, scrolling pages, which users find hard to manage.
■ incomplete user queries
- very short search phrases (average length of less than two words) can make it difficult for a search algorithm to deduce the specific content the user is seeking.
step 3: analyze the information-seeking behaviors of the users
site-usage reports, satisfaction surveys, help desk contact reports, zero-results reports, focus groups, and usability studies are valuable sources of information. they should be mined for information-seeking behaviors of the site's users and other barriers to a successful search experience, as follows:
■ review site-usage reports for the most frequently entered search terms and popular pages (both may change over time) and the zero-results search terms. look for:
- new terms
- variations on popular terms
- common misspellings or typos
- common searches, including searches for items not on the site, that could be candidates for preprogrammed "quick links"
- frequently entered terms—review search results to identify candidates for improvement
■ review satisfaction surveys over time
- look for new problems that caused satisfaction to decrease
- analyze answers to questions asking what people could not find, potentially identifying new barriers to success
■ conduct usability studies
- identify issues with the user interface as well as with content findability and usability
■ review help desk contact reports
- identify which topics users are having trouble finding or understanding
table 3. irs.gov user types
type of user | % of total site visitors | % of total search users
individual taxpayer | 64% | 64%
representing a business | 11% | 11%
tax professional | 10% | 11%
representing a charity or nonprofit | 3% | 3%
vita/vce volunteers | 3% | 3%
representing a government entity | 2% | 2%
student | 2% | 1%
irs employee | 1% | 2%
other | 4% | 3%
table 4. how users find information on irs.gov
how do you usually find information on irs.gov? | % of total site visitors
search engine | 49%
irs keyword | 18%
navigation to the web page | 11%
internet search engine (e.g., google, yahoo) | 7%
site map | 5%
other | 4%
bookmarks | 3%
links to irs.gov from other web sites | 3%
step 4: understand the needs of the business owners
the business owners are the irs business operating divisions that provide content to irs.gov, such as small business and self-employed, and wage and investment. it is important to involve them in the process of enhancing the user experience, because they may have specific goals for prioritizing information on a particular topic or may be managing campaigns for highlighting new information. thus it is desirable to:
■ meet with business owners regularly to understand their goals for providing information to users
■ work with them to increase the findability of their content
for example, when an issue in finding a particular content topic is identified (e.g., through an increase in help desk contacts), one approach is to show the business owner the actual results that common queries (based on the site-usage reports) on the topic retrieve and then present suggested alternative results that could be retrieved with a variety of enhancement techniques, such as thesaurus expansion or title improvement. the business owner can then evaluate which set of results presents the content in the most informative manner to the user. steps 1–4 facilitate work behind the scenes to gather the data needed to improve precision and recall and to make information more findable. the remaining steps use these data to adapt proven, widely used techniques for improving search experiences to a web site's specific environment.
step 5: identify appropriate tools to improve the information-retrieval process
as described in the previous section, the tools in our toolkit are document engineering, query enhancement, search improvement, results-ranking improvement, categorization, and summarization.
step 6: repeat as needed
the process of improving the user search experience is ongoing as the site evolves. at irs.gov, different search terms appear on the site-usage reports over time, depending on whether or not it is filing season, or as new content and applications are published. human intervention (with the help of applicable tracking software) is essential for incorporating business requirements, evaluating human behavior, and identifying changing terms.
step 7: monitor new developments in search and analytic technologies and replace the search engine as appropriate
although a new search engine will not address all the issues that have been described, new features such as passage-based summaries and term-highlighting can improve the search experience. of course, one should consider replacing a search engine if new technology can demonstrate significantly improved precision and recall.
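as a concrete illustration of the site-usage-report mining called for in step 3, the following sketch finds the most frequently entered queries and flags frequent zero-result queries as quick-link or synonym candidates. it assumes the usage report can be reduced to a plain list of query strings; the sample data are hypothetical.

from collections import Counter

# mine a query log for the most frequently entered terms and for
# zero-result queries that are candidates for quick links or synonyms.
# the inputs are assumed to be plain query strings from a usage report.

def top_queries(query_log, n=10):
    counts = Counter(q.strip().lower() for q in query_log)
    return counts.most_common(n)

def zero_result_candidates(query_log, zero_result_queries, min_count=5):
    counts = Counter(q.strip().lower() for q in query_log)
    return [(q, c) for q, c in counts.most_common()
            if q in zero_result_queries and c >= min_count]

log = ["1040ez", "EIN", "ein", "where to file", "ein", "it-201", "it-201", "1040ez"]
print(top_queries(log, 3))
print(zero_result_candidates(log, {"it-201"}, min_count=2))  # off-site form, quick-link candidate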
the application of the methodology and the use of the toolkit for irs.gov will be described in the next section.
■ findings
site-usage reports
in 2003, an example of a serious mismatch in user and content terminology was discovered when site-usage reports were analyzed. users entering the equivalent terms ein, employer number, employer id number, and employer identification number retrieved significantly different sets of results. we met with the business owner, who identified a key starting page that should be retrieved along with other highly relevant pages for all of these query terms. we recommended that "ein" be added to the title of the key page because, although ein is a very popular query, the acronym was not used in the content, but was instead spelled out. as a result, the key page was not being retrieved. synonyms were added to the query enhancement thesaurus to accommodate the variants on the ein concept. after these steps were implemented, the results were as follows:
■ for the query ein, the target page moved from #16 to #1
■ for the query ein number, it moved from #17 to #5
■ for the query employer identification number, it moved up to #2 (it was not in the top 20 previously)
■ all search results now retrieved on the first page for these terms were highly relevant
in january 2004, there were approximately twenty thousand queries using these terms, so the search experience has been improved for tens of thousands of users in one month and hundreds of thousands of users throughout the year.
■ review of help desk contacts
help desk reports summarize, for each call or e-mail, the general topic of the user's contact (filing information, employer id number, forms, and publications issues) and the specific question. for example, the report might indicate that a user needed help in finding or downloading the w-4 form or did not understand the instructions for amending a tax return. as help desk contact reports were reviewed, clusters of questions emerged indicating information that many users could not find or understand. by analyzing approximately 9,800 contacts (e-mail, telephone, chat) during a peak five-day period in april 2003, four particular areas were identified that were ripe for improvement: 480 users could not find previous years' forms, which, although they can be found on the site, are not indexed and thus not findable through search; 250 users had questions about where to send their tax returns; 170 users had questions about getting a copy of their tax return or w-2 form; and 77 users had problems finding the 1040x or 1040ez forms. utilizing the information retrieval toolkit, the following improvements were implemented:
a) search for previous years' forms
tool used: results-ranking improvement
a user requesting a previous year's forms (for example, 2002 1099misc) is now presented with a link directly to the page of forms for that specific year, as follows:
recommendation(s) for: 2002 1099misc
■ 2002 forms and publications
2002 forms, instructions, and publications available in pdf format
b) request for filing address
tools used: document engineering and query enhancement
a new "where to file" page was created. synonyms were added to the thesaurus to accommodate the variations on how people make this request (address, where to send, where to mail) and to prioritize retrieval of the "where to file" page.
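a minimal sketch of the thesaurus-based query enhancement used in improvement (b) and in the ein example above; the synonym entries and the "or" expansion syntax are hypothetical illustrations, not the actual irs.gov thesaurus.

# expand a user query with thesaurus synonyms before it is sent to the
# search engine, so that variant phrasings retrieve the same content.
# the synonym entries below are illustrative.

THESAURUS = {
    "where to send": ["where to file"],
    "where to mail": ["where to file"],
    "address": ["where to file"],
    "employer number": ["ein", "employer identification number"],
}

def enhance_query(query):
    q = query.strip().lower()
    expanded = [q]
    for phrase, synonyms in THESAURUS.items():
        if phrase in q:
            expanded.extend(synonyms)
    return " OR ".join(dict.fromkeys(expanded))   # de-duplicate, keep order

print(enhance_query("where to mail my return"))   # adds "where to file"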
c) request for information about obtaining a copy of a tax return or w-2 form
tools used: results-ranking improvement and query enhancement
a "quick link" was created to the target page for getting a copy of returns and w-2 forms, and synonyms were added to the thesaurus to prioritize related content for any query containing the word "copy."
d) requests for 1040x or 1040ez forms or instructions
tool used: query enhancement
synonyms were added to the thesaurus to address both the variations on how users requested the 1040x and 1040ez forms and instructions, and the inconsistencies in the titling of these documents (for example, the form and the instructions have different variations of the compound name).
■ results
in 2004, approximately 4,200 help desk contacts were reviewed during the same time period (the week before april 15) to see whether the changes actually did help users find the information. it should be noted that, during this period from april 2003 to april 2004, many other improvements to the user search experience based on the methodology were deployed. although the number of visits to irs.gov increased by approximately 50 percent compared with the same period in 2003, the total number of contacts with the help desk decreased by 47 percent (there were approximately 9,800 contacts in this period in 2003). the results for the specific improvements are shown in table 5.
table 5. comparison of 2004 and 2003 help desk contacts
problem area | number of contacts 2003 | number of contacts 2004 | change
1040x, 1040ez | 77 | 19 | -75%
prior year forms | 480 | 103 | -78%
copy of return | 170 | 91 | -47%
where to file | 250 | 104 | -58%
total | 977 | 317 | -68%
the average decrease in contacts for those four topics was 68 percent, compared with the overall average decrease of 47 percent. this approach has significantly improved the user experience by identifying and addressing subject areas users have trouble finding or understanding on irs.gov, eliminating the need for them to contact the help desk. as a result, an increase in resources at the help desk was avoided and, hopefully, user satisfaction improved.
■ conclusions
while the case presented in this article was specific to irs.gov, the methodology itself has wide application across domains. customer service for most government and commercial organizations depends on providing users with relevant information effectively and efficiently. there are many aspects to achieving this elusive goal of matching users with the specific information they need. in this paper, it has been demonstrated that, rather than focusing just on optimizing the search engine or developing a metadata-based solution, it is essential to view the user search experience from the time content is created to the moment when users have truly found the answer to their information needs. there is no one surefire solution, and one should not assume that enhanced metadata or a new search engine is the only solution to retrieval problems. the methodology described in this paper assumes that users, especially infrequent users of public web sites, do not wish to become search experts; that intuitive interfaces and meaningful results displays contribute to a successful user experience; and that keeping business owners involved is important. the methodology is based on understanding the behavior of a site's users in order to identify barriers to a successful search experience, and on understanding the needs of business owners. the methodology focuses on adapting the site to its users (rather than vice versa) through document modification, improved content-development processes, query enhancement, and targeted search improvement.
it includes improvements to the results phase of the search process, such as improved titles and summaries, as well as to the search-and-retrieval phase. this toolkit-based approach is effective and low-cost. it has been used over the past four years to improve the user search experience significantly for the millions of irs.gov users. interesting follow-on research could focus on identifying to what degree this methodology can be automated and how to leverage new tools to provide automated support for usage log analysis (such as mondosearch by mondosoft). it is clear from this case study that it is time to apply systems engineering rigor to search-experience improvement. this approach confirms the need to extend metrics for evaluating search beyond precision and recall to include the totality of the search experience.
■ future work
teleporting has been defined as an approach in which users try to jump directly to their information targets.14 trying to achieve perfect search results supports the information-seeking strategy of teleporting. but the search process may involve more than a single search. people often conduct "a series of interconnected but diverse searches on a single, problem-based theme, rather than one extended search session per task."15 this approach is similar to the sport of orienteering, with searchers using data from their present situation to determine where to go next—that is, looking for an overview first and then submitting more detailed searches. given the general, nonspecific nature of the short queries submitted by irs.gov users, the orienteering approach may well describe the information-seeking behaviors of many users. this paper is limited to the improvement of search results for individual searches, but the need to investigate improving the search experience to support orienteering behavior is acknowledged. future research will investigate how to leverage the theoretical models of the information-search process, such as the anomalous states of knowledge (ask) underlying information needs and the information search process model.16
references and notes
1. "common evaluation measures," the thirteenth text retrieval conference, nist special publication sp 500-261 (gaithersburg, md.: national institute of standards and technology, 2004), appendix a.
2. kalervo jarvelin and peter ingwersen, "information-seeking research needs extension towards tasks and technology," information research 10, no. 1 (2004), http://informationr.net/ir/10-1/paper212.html (accessed feb. 2, 2006); k. fisher, s. erdelez, and l. mckechnie, eds., theories of information behavior (medford, n.j.: information today, 2005); t. saracevic and paul b. kantor, "studying the value of library and information services, part i: establishing a theoretical framework," journal of the american society for information science 48, no. 6 (1997): 527–42.
3. scott nicholson, "a conceptual framework for the holistic measurement and cumulative evaluation of library services," journal of documentation 60, no. 2 (2004): 164–82.
4. avra michelson and michael olson, "dynamically enabling search and discovery tem," internal mitre presentation, mclean, va., mar. 30, 2005.
5. lawrence e. leonard, "inter-indexer consistency studies, 1954–1975: a review of the literature and summary of study results," occasional paper series, no. 131, graduate school of library science, university of illinois, urbana-champaign, 1977; tefko saracevic, "individual differences in organizing, searching and retrieving information," in proceedings of american society for information science '91 (new york: john wiley, 1991), 82–86; g. furnas et al., "the vocabulary problem in human-system communication," communications of the acm 30, no. 11 (1987): 964–71.
6. rajiv ramnath and david landsbergen, "it-enabled sense-and-respond strategies in complex public organizations," communications of the acm 48, no. 5 (2005): 58–64.
7. t. l. brauen et al., "document indexing based on relevance feedback," report no. isr-14 to the national science foundation, section xi, department of computer science, cornell university, ithaca, n.y., 1968; m. c. davis, m. d. linsky, and m. v. zelkowitz, "a relevance feedback system employing a dynamically evolving document space," report no. isr-14 to the national science foundation, section x, department of computer science, cornell university, ithaca, n.y., 1968; marcia d. kerchner, dynamic document processing in clustered collections, report no. isr-19 to the national science foundation, ph.d. thesis, department of computer science, cornell university, ithaca, n.y., 1971.
8. ibid.
9. gerard s. salton, dynamic information and library processing (englewood cliffs, n.j.: prentice-hall, 1975).
10. p. borlund, "the iir evaluation model: a framework for evaluation of interactive information retrieval systems," information research 8, no. 3 (2003), http://informationr.net/ir/8-3/paper152.html (accessed feb. 15, 2006).
11. ian ruthven, "re-examining the effectiveness of interactive query expansion," in proceedings of the 26th international acm sigir conference on research and development in information retrieval (new york: acm press, 2003), 213–20.
12. marc l. resnick and rebecca lergier, "things you might not know about how real people search," 2002, www.searchtools.com/analysis/how-people-search.html (accessed oct. 1, 2005).
13. danny sullivan, "webtop search rage study," the search engine report, 2001, http://searchenginewatch.com/sereport/article.php/2163451 (accessed sept. 10, 2005).
14. j. teevan et al., "the perfect search engine is not enough: a study of orienteering behavior in directed search," in proceedings of computer-human interaction conference '94 (new york: acm press, 2004), 415–22.
15. vicki o'day and robin jeffries, "orienteering in an information landscape: how information seekers get from here to there," in proceedings interchi '93 (new york: acm press, 1993), 438.
16. n. j. belkin, r. n. oddy, and h. m. brooks, "ask for information retrieval, part i. background and theory," the journal of documentation 38, no. 2 (1982): 61–71; n. j. belkin, r. n. oddy, and h. m. brooks, "ask for information retrieval, part ii. results of a design study," the journal of documentation 38, no. 3 (1982): 145–64; carol c. kuhlthau, seeking meaning: a process approach (norwood, n.j.: ablex, 1993).
lib-mocs-kmc364-20131012122710 292 journal of library automation vol. 14/4 december 1981
we need a format which is consistent, easily maintainable without being uncontrollably disruptive, and responsive to changing needs which are likely to accelerate as we gain experience with online systems.
rather than recommending or supporting the implementation of specific changes to the marc format, it is essential that the library community begin to establish the framework and benchmarks necessary to maintain the marc formats over the long term as well as to guide short-term considerations. arl and others can play an important role in undertaking and encouraging a broader approach to this pressing problem. such an approach will not only reduce the risk of decision making, but will also assist in the development of the cost/benefit data needed to enhance consideration of format changes.
comparing fiche and film: a test of speed
terence crowley: division of library science, san jose state university, san jose, california.
introduction
for more than a decade librarians have been responding to budget pressures by altering the format of their library catalogs from labor-intensive card formats to computer-produced book and microformats. studies at bath, 1 toronto, 2 texas, 3 eugene, 4 los angeles, 5 and berkeley, 6 have compared the forms of catalogs in a variety of ways ranging from broad-scale user surveys to circumscribed estimates of the speed of searching and the incidence of queuing. the american library association published a state-of-the-art report 7 as well as a guide to commercial computer-output microfilm (com) catalogs pragmatically subtitled how to choose; when to buy. 8 in general, com catalogs are shown to be more economical and faster to produce and to keep current, to require less space, and to be suitable for distribution to multiple locations. primary disadvantages cited are hardware malfunctions, increased need for patron instruction, user resistance (particularly due to eyestrain), and some machine queuing. the most common types of library com catalogs today are motorized reel microfilm and microfiche, each with advantages and disadvantages. microfilm offers file-sequence integrity and thus is less subject to user abuse, i.e., theft, misfiling, and damage; in motorized readers with "captive" reels it is said to be easier to use. disadvantages include substantially greater initial cost for motorized readers; limits on the capacity of captive reels necessitating multiple units for large files; inexact indexing in the most widespread commercial reader; and eyestrain resulting from high-speed film movement. microfiche offers a more nearly random retrieval, much less expensive and more versatile readers, and unlimited file size. conversely, the file integrity of fiche is lower and the need for patron assistance in use of machines is said to be greater than for self-contained motorized film readers.
the problem
one of the important considerations not fully researched is that of speed of searching.
the toronto study included a self-timed "look-up" test of thirty-two items "not in alphabetical order" given to thirty-six volunteers, of whom thirty finished the test. the researchers found the results "inconclusive" but noted that seven of the ten librarians found film searching the fastest method. "average" time reported for searching in card catalogs was 37.3 minutes, in film catalogs 41.6 minutes, and for fiche catalogs 41.7 minutes. a reanalysis of the original data shows a stronger advantage of fiche over film (45.3 minutes versus 51.7 minutes) when all times except duplicates are totaled, but that difference is almost entirely due to one extreme score (203 minutes). 9 the berkeley report of fiche/film comparability addressed the issue of retrieval speed directly. by constructing a series of look-up tests composed of items selected from a large public library com catalog, the researchers were able to compare microfiche and microfilm formats while holding other variables constant. in one test involving thirty-six paid users and 252 trials, microfilm was determined to be faster by 7.6 percent (±2.5 percent). in a second test, forty volunteer users were timed in 240 trials and the advantage of film over fiche dropped to 5.7 percent (±2.5 percent). 10 although rigorous in design and execution, the berkeley experimenters used in their look-up tests questions that naive users might misinterpret, e.g., "you want a book about paul robeson, written by eloise greenfield. find the listing and give the call number"; and some which could be confusing, e.g., "does the library have any joke books? if so, give the call number for one." 11 such questions potentially pose an element of uncertainty for subjects: should i look under robeson or greenfield? under joke books or humor? in addition, questions were selected by "browsing the file for target items," a procedure which could result in an uneven distribution of items which in turn could bias the results. since the number of observations is relatively large the reliability of the results is not questioned; the validity may be. the study reported here was executed by a class in research methods taught by the author during the same time as the berkeley study; we used the same two formats of the same catalog, and attempted to answer the same question: using the best available equipment, which microformat is faster to search?
assumptions
we assumed (1) the two forms of the catalog were identical; (2) the quality of the image was not significantly different; (3) a search for items selected randomly from the file and arranged randomly was a fair test of retrieval speed; and (4) graduate students in library science were reasonably representative users for a test of speed.
methodology
we used a dictionary catalog from a public library system with 436,791 entries, of which 5,631 were author, 111,158 were title or added entries, and 320,002 were subject entries. using a random number table, we selected from the catalog 16 entries which were reproduced and randomly arranged to form the test. of the 16 items, 3 were author entries, 8 were title or added entries, 5 were subject entries. the sequence, which presumably would affect the speed of retrieval more in the film format because of the necessity to scroll from one letter to another, was a, c, w, n, s, k, c, b, w, m, h, l, p, p, a, l. the test was then administered to thirty-seven volunteer graduate students randomly assigned to a micro-design 4020 fiche reader or an information design rom 3 film reader.
the two readers were located in the same room. the 86 fiche were held and displayed by a ring king binder. all times were measured by a stopwatch. questionnaires administered before and after the test established that the two groups did not differ significantly in age or in self-perceived mechanical ability. of the film users, 64 percent used microformats "occasionally" or "frequently" compared with 35 percent of the fiche users. of the total group, 73 percent wore glasses and 62 percent reported prior physical problems with both film and fiche readers used before the test.
results
table 1 shows that the mean speed of the film users was 16.7 minutes, significantly faster than the 25.3 minutes recorded by the fiche users; the range of speed for the film users was less than one-third that of the fiche users. even the slowest film user was faster than 70 percent of the fiche users. however, the fastest fiche user was faster than 70 percent of the film users. the range of fiche scores is more than 3 times that of the film scores (figure 1). the standard statistical test shows the difference of means to be significant at the .01 level.
table 1. speed of retrieval (in minutes)
format | low
microfilm (n = 17) | 12.3
microfiche (n = 20) | 14.6
t = 4.8, p < .01
discussion
searching motorized microfilm appears to be significantly faster than searching microfiche, on the average, for relatively inexperienced users. even the slowest time on the film was faster than most fiche times. the wide range of fiche scores suggests the possibility that frequent users could improve their searching times; very experienced users may be able to search fiche faster than film.* because of the relatively small numbers of subjects and observations
*the author, an experienced fiche user, was timed at 11.6 minutes; this was the fastest time recorded by either fiche or film users.

that we call "satisfaction frequency." it represents the regularity with which a particular preference value has been used in alerts positively evaluated by the user. this frequency measures the relative importance of the preferences stated by the user and allows the interface agent to generate a ranking list of results. the range of possible values for these frequencies is defined by a group of seven labels that we get from the fuzzy linguistic variable "frequency," whose expression domain is defined by the linguistic term set s = {always, almost_always, often, occasionally, rarely, almost_never, never}, being the default value and "occasionally" being the central value.
rss feeds
thanks to the popularization of blogs, there has been widespread use of several vocabularies specifically designed for the syndication of contents (that is, for making accessible to other internet users the content of a website by means of hyperlink lists called "feeds"). to create our current-awareness bulletin we use rss 1.0, a vocabulary that enables managing hyperlink lists in an easy and flexible way. it utilizes the rdf/xml syntax and data model and is easily extensible because of the use of modules that enable extending the vocabulary without modifying its core each time new describing elements are added.
figure 1. sample entry of a skos core thesaurus
figure 2. user profile sample
in this model several modules are used: the dublin core (dc) module to define the basic bibliographic information of the items utilizing the elements established by the dublin core metadata initiative (http://dublincore.org); the syndication module to facilitate software agents synchronizing and updating rss feeds; and the taxonomy module to assign topics to feed items. the structure of the feeds comprises two areas: one where the channel itself is described by a series of basic metadata like a title, a brief description of the content, and the updating frequency; and another where the descriptions of the items that make up the feed (see figure 3) are defined (including elements such as title, author, summary, hyperlink to the primary resource, date of creation, and subjects).
figure 3. rss feed item sample
recommendation log file
each document in the repository has an associated recommendation log file in rdf that includes the listing of evaluations assigned to that resource by different users since the resource was added to the system. each of the entries of the recommendation log files consists of a recommendation value, a uri that identifies the user that has done the recommendation, and the date of the record (see figure 4). the expression domain of the recommendations is defined by the following set of five fuzzy linguistic labels that are extracted from the linguistic variable "quality of the resource": q = {very_low, low, medium, high, very_high}.
figure 4. recommendation log file sample
these elements represent the raw materials for the sdi service that enable it to develop its activity through four processes or functional modules: the profiles updating process, rss feeds generation process, alert generation process, and collaborative recommendation process.
system processes
profiles updating process
since the sdi service's functions are based on generating passive searches to rss feeds from the preferences stored in a user's profile, updating the profiles becomes a critical task. user profiles are meant to store long-term preferences, but the system must be able to detect any subtle change in these preferences over time to offer accurate recommendations. in our model, user profiles are updated using a simple mechanism that enables finding users' implicit preferences by applying fuzzy linguistic techniques and taking into account the feedback users provide. users are asked about their satisfaction degree (e_j) in relation to the information alert generated by the system (i.e., whether the items retrieved are interesting or not). this satisfaction degree is obtained from the linguistic variable "satisfaction," whose expression domain is the set of seven linguistic labels: s' = {total, very_high, high, medium, low, very_low, null}. this mechanism updates the satisfaction frequency associated with each user preference according to the satisfaction degree e_j. it requires the use of a matching function similar to those used to model threshold weights in weighted search queries.31 the function proposed here rewards the frequencies associated with the preference values present when the resources assessed are satisfactory, and it penalizes them when the assessment is negative.
let e_j ∈ s' be the degree of satisfaction and f_i^l ∈ s the satisfaction frequency of property i (in this case i = "preference") with value l, the labels s_a, s_b of these term sets being indexed by a, b ∈ {0, ..., t}; we then define the updating function g as a mapping g: s' × s → s, with g(e_j, f_i^l) = ...
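the piecewise definition of g is cut off at this point in the text, so the following sketch assumes a simple illustrative rule in its place: the satisfaction-frequency label attached to a preference is shifted one step toward "always" when satisfaction is above the central label and one step toward "never" when it is below. the label orderings and the shift-by-one rule are assumptions, not the published definition.

# illustrative sketch of the profile-updating idea: reward or penalize the
# satisfaction frequency attached to a preference value according to the
# user's satisfaction with the alert. the shift-by-one rule is an assumption.

FREQUENCY = ["never", "almost_never", "rarely", "occasionally",
             "often", "almost_always", "always"]          # term set S
SATISFACTION = ["null", "very_low", "low", "medium",
                "high", "very_high", "total"]             # term set S'

def update_frequency(satisfaction, frequency):
    """assumed form of g: S' x S -> S: move the frequency label toward
    'always' when satisfaction is above 'medium', toward 'never' when below."""
    s = SATISFACTION.index(satisfaction)
    f = FREQUENCY.index(frequency)
    center = SATISFACTION.index("medium")
    if s > center:
        f = min(f + 1, len(FREQUENCY) - 1)   # reward
    elif s < center:
        f = max(f - 1, 0)                    # penalize
    return FREQUENCY[f]

print(update_frequency("very_high", "occasionally"))  # -> 'often'
print(update_frequency("low", "occasionally"))        # -> 'rarely'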